
TECHNISCHE UNIVERSITEIT EINDHOVEN

Department of Mathematics and Computer Science

Analysis and Design


of Pharmacokinetic Models

By
R. H. Eyzaguirre Pérez

Supervisors:

E. E. M. van Berkum (TU/e)


H. C. M van der Knaap (Unilever R&D Vlaardingen)

Eindhoven, August 2006


Preface

In this report I present the results of the research made to complete my master studies at the Department
of Mathematics and Computer Science of the Technische Universiteit Eindhoven. This research was
carried out at Unilever R&D Vlaardingen between March and August of 2006 in the field of the analysis
of pharmacokinetic models to describe biological systems.
This was a very instructive experience for my professional and academic training and for this reason I
want to thank Unilever R&D Vlaardingen that provided all the facilities to successfully conclude this
project. In particular I want to thank my Unilever supervisor in this project, Henk van der Knaap of the
Data Science Skillbase, and to Emiel van Berkum, my supervisor at the Technische Universiteit
Eindhoven. Thanks also go to Guus Duchateau, Pieter van de Pijl, and Martin Folz, members of the
Bioavailabilty & Gut Health Expertise Group at Unilever R&D Vlaardingen for their recommendations
and time.

Raul Eyzaguirre
Eindhoven, August 2006.

Contents

Definitions and Nomenclature

1. Introduction

2. Compartment Analysis
   2.1. Principal Compartment Models
   2.2. Further Considerations

3. Individual Pharmacokinetics
   3.1. Nonlinear Regression Model
   3.2. Inference for Functions of the Estimated Parameters
   3.3. Measures of Curvature or Nonlinearity
   3.4. Software
   3.5. Example: One-Compartment Model, Extravascular Administration

4. Population Pharmacokinetics
   4.1. Hierarchical Nonlinear Models
   4.2. Traditional Approaches
   4.3. Inference Based on Linearization
   4.4. Software
   4.5. Example 1: One-Compartment Model with Extravascular Administration
   4.6. Example 2: Comparison of Two Treatments in the One-Compartment Model with Extravascular Administration

5. Sampling Strategies
   5.1. Optimal Designs
   5.2. Simulation Studies
   5.3. Sparse Data Analysis

6. Conclusions and Recommendations
   6.1. Conclusions
   6.2. Data Analysis Recommendations

Appendix: R Code for the Measures of Curvature Computation

References

Definitions and Nomenclature

α : Hybrid constant related to the micro rate constants k12, k21, and kel
β : Hybrid constant related to the micro rate constants k12, k21, and kel
β : r×1 vector of fixed population parameters
γ : Vector of intra-individual covariance parameters in the functional part of the covariance model
δ : Intra-individual covariance parameter used to model heterogeneity of variances
θ : p×1 vector of regression parameters
θ̂_GLS : Generalized least squares estimator for θ
θ̂_OLS : Ordinary least squares estimator for θ
θ̂_WLS : Weighted least squares estimator for θ
µ : Mean response
ξ : Vector of intra-individual covariance parameters
σ² : Intra-individual variance
σ̃² : Maximum likelihood estimator for σ²
σ̂²_OLS : Ordinary least squares estimator for σ²
σ̂²_WLS : Weighted least squares estimator for σ²
Σ_OLS : p×p matrix given by Σ_OLS = (F.ᵀF.)⁻¹
Σ_GLS : p×p matrix given by Σ_GLS = (F.ᵀ S⁻¹(β, γ) F.)⁻¹

a : a×1 covariate vector of individual characteristics
A.. : (p+p′)×p×p compact acceleration array
A^θ : Parameter effects acceleration array
A^ι : Intrinsic acceleration array
AIC : Akaike's Information Criterion
AUC : Area under the concentration-time curve
AUMC : Area under the first moment curve
b : k×1 vector of random effects
C : (p+p′)×p×p array of relative curvatures
C^θ : Parameter effects relative curvature array
C^ι : Intrinsic relative curvature array

C (t ) : Concentration of drug at time t
ClT : Total clearance

c^θ : Root mean square parameter effects curvature
c^ι : Root mean square intrinsic curvature
cmax : Maximum concentration
Cp : Concentration of drug in plasma or central compartment
d : p-dimensional vector-valued function
D : Covariance matrix for the random effects
D : Administered dose
e : Random error
f : Fraction of administered dose which is absorbed

f(x, θ) : Nonlinear function of θ
f(θ) : n×1 vector, named the expectation surface, that contains the functions f(x, θ), ∀x
F. : n×p matrix of derivatives of f(θ) with respect to the elements of θ
F.. : n×p×p array of second derivatives of f(θ) with respect to the elements of θ

k : Rate constant
ka : Absorption rate constant
kel : Elimination rate constant
k12 : Distribution rate constant for transfer of drug from compartment 1 to compartment 2
k21 : Distribution rate constant for transfer of drug from compartment 2 to compartment 1
m : Number of subjects in the sample
MRT : Mean residence time
n : Number of measures per subject
p : Number of regression parameters in the nonlinear model
R : Intra-individual covariance matrix
s² : Residual mean square
S(θ) : Residual sum of squares
S : Matrix proportional to the intra-individual covariance matrix R, specifically S = σ²R
SBC : Schwarz's Bayesian Criterion
t1/2 : Elimination half-life
tlag : Lag-time
tmax : Time to maximum concentration


VC : Volume of the central compartment
VD : Volume of distribution
W : Weight matrix

Ŵ : Estimated weight matrix
x : Covariate vector for the nonlinear regression model
y : Response variable in the nonlinear regression model
z : Vector of constants which include some or all of the covariates in x

1. Introduction

Pharmacokinetics is dedicated to the study of the time course of substances and their relationship with an organism; a pharmacokinetic model is used to describe the concentration of such substances in the organism over time. In the modeling and analysis of these data an important distinction has to be made between the case where the data come from one individual and the case where the data come from several individuals. In the first case we deal with individual pharmacokinetics, and the results of the experiments are usually analyzed using nonlinear regression models where the concentration in the body of the administered substance at time t is the response variable; in the second case we deal with population pharmacokinetics, and the main statistical tools are hierarchical nonlinear regression models, which are also referred to as nonlinear mixed effects models. Unilever is constantly working on the development and improvement of healthy food, and in this task a lot of experimentation is done in the field of pharmacokinetics. The objective of this project is to give an answer to some particular questions in the analysis of such models.
A first concern is related to the parameterization of the model. In pharmacokinetics, practitioners are usually interested in several pharmacokinetic parameters which are related among themselves through nonlinear functions. For instance, the pharmacokinetic parameter total clearance (ClT) is related to the elimination rate constant (kel) and the volume of distribution (VD) through the function ClT = kel·VD. As a result, a pharmacokinetic model can be fitted under several different parameterizations, so a question arising here is which parameterization is more convenient in order to get more accurate parameter estimates.
A second subject is the sampling strategy. Pharmacokinetic data are gathered for each subject over time, so a first issue is to define the optimal sampling times. The strategy will be different depending on whether the main goal is to determine the most appropriate functional form to describe the data or to estimate the parameters for a model with a given functional form. Because data are frequently taken from human beings, there are several limitations on the sampling strategy (ethical concerns, budget limitations, poor control over the exact measurement times, etc.). In this respect it is important to be able to extract the maximum amount of information from sparse data, the extreme case being how to take advantage of individuals with fewer data points than parameters in the model (i.e., individuals for whom the individual nonlinear regression model is not estimable). Indeed, an important concern here is the trade-off between the number of subjects and the number of measures per subject.
A third concern is the estimation of the population parameters. Traditionally these parameters have been estimated by the naïve pooled data approach, which pools all the data as if they came from a single individual, and by the two-stage approach, where individual models are fitted first for each subject, and then the individual parameter estimates are used as building blocks to get estimators for the population parameters. Both approaches have limitations. The naïve pooled data approach does not recognize differences among individuals, so inter- and intra-individual variation is lumped together; the two-stage approach cannot take advantage of subjects with fewer data points than parameters in the individual model, and the population parameters obtained with this approach tend to be biased. Here we explore a third approach, which seems to be more efficient, based on a linearization of the hierarchical nonlinear model that is used to model the two sources of random variation: inter-individual and intra-individual variation.
This report is organized as follows. In Section 2 we present a short introduction to compartment models, which are the main type of parametric models used in pharmacokinetics. In Section 3 we deal with the analysis of individual pharmacokinetic models, with a main focus on the effect of the parameterization of the model. In Section 4 we approach the problem of population analysis, which is based on the estimation of the hierarchical nonlinear regression model by linearization. In Section 5 we treat the problem of sampling strategies; this problem is approached from the theory of optimal designs and by simulation studies. Finally, in Section 6 we present our conclusions and recommendations.
Throughout this report we illustrate the methods with some examples. All the computations and data
analysis are done using the R language and environment, version 2.2.1 for Windows. A good reference
for statistical applications with R is the book of Venables and Ripley (2002) and for the particular subject
of linear and nonlinear mixed effects models, which is the main statistical tool in this report, the book of
Pinheiro and Bates (2000).

2. Compartment Analysis

In pharmacokinetics a compartment is an entity which can be described by a definite volume and a concentration of drug in that volume. Although the human body is composed of millions of compartments, a simplification (theoretical model) is made to just a few of them (mainly one or two). In practice, rarely more than two compartments are used to model pharmacokinetic data.
Compartment models are a special class of nonlinear models where the response variable (concentration
of drug in the compartment) is described by an ordinary differential equation. These differential equations
describe the change of the concentration of drug in the compartment over time, a process that is usually of
first order kinetics. First order kinetics means that the rate of change of drug concentration in a
compartment at time t is directly proportional to the drug concentration in that compartment at that time.
Therefore, a process of first order kinetics can be described by the following differential equation:

dC(t)/dt = −k·C(t),    (2.1)

where C(t) is the drug concentration at time t, and k is named the rate constant.
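The report's own computations are done in R, but the behaviour of (2.1) can be checked with a short numerical sketch (here in Python with NumPy for illustration; the rate constant and initial concentration are invented values, not data from the report):

```python
import numpy as np

# Forward-Euler integration of dC/dt = -k*C(t), equation (2.1),
# compared with the closed-form solution C(t) = C(0)*exp(-k*t).
# k, C0, and the time horizon T are illustrative values.
k, C0, dt, T = 0.3, 100.0, 1e-4, 10.0
C = C0
for _ in range(int(T / dt)):
    C += -k * C * dt              # rate of change proportional to C itself

analytic = C0 * np.exp(-k * T)
print(C, analytic)                # agree closely for this small step size
```

The agreement of the two values illustrates why first order kinetics always leads to the exponential terms seen in the models below.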

2.1. Principal Compartment Models

In this section we present a description of some important compartment models. The main goal of this section is to give some insight into the kind of models we will deal with throughout this report, so this list is not exhaustive. In all the models presented here we assume first order kinetics for the rate constants and that the drug is administered as a single dose with intravascular or extravascular administration. In an intravascular administration the drug is directly introduced into the bloodstream, usually considered as the central compartment, and therefore we assume that the drug is rapidly mixed in the blood or plasma. With this kind of administration our model will contain just an elimination rate if a one-compartment model is used, and an elimination rate plus between-compartment distribution rates if a two-compartment (or larger) model is used. With extravascular administration (for instance oral or nasal), the drug must be absorbed by the central compartment, so an absorption rate is also included in the model. These descriptions are mainly based on Chapters 15 and 21 of Ritschel and Kearns (2004).

2.1.1. Open One-Compartment Model, Intravascular Administration

In this model there is just one rate constant, that is, the elimination rate constant. The pharmacokinetic
model is obtained by direct integration of (2.1) and hence defined by

C(t) = C(0)·e^(−kel·t).

In this model kel is the elimination rate constant, and C (0) is the drug concentration at time zero. Let D
be the administered dose. An important pharmacokinetic parameter is the apparent volume of distribution,
denoted by VD. In this model, VD is given by

VD = D / C(0),

so the model can be written as

C(t) = (D/VD)·e^(−kel·t).    (2.2)
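As a quick numerical illustration of (2.2) and of the relation VD = D/C(0), here is a small Python sketch (the dose, volume, and rate constant are made-up illustrative values):

```python
import numpy as np

# Model (2.2): C(t) = (D/VD) * exp(-kel*t), with VD = D / C(0).
# Illustrative values only: D in mg, VD in L, kel in 1/h.
D, VD, kel = 500.0, 40.0, 0.2

def conc_iv_1cpt(t):
    return (D / VD) * np.exp(-kel * t)

C0 = conc_iv_1cpt(0.0)
print(C0)                # D/VD = 12.5, the concentration at time zero
print(D / C0)            # recovers VD = 40, since VD = D / C(0)
```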

2.1.2. Open One-Compartment Model, Extravascular Administration

In this model we have absorption and elimination rate constants. The pharmacokinetic model is given by

C(t) = B·e^(−kel·t) − A·e^(−ka·t),

where ka is the absorption rate constant. The coefficients A and B are equal to

A = B = D·f·ka / (VD·(ka − kel)),

where f is the fraction of administered dose which is absorbed. Therefore, the model can be written as

C(t) = [D·f·ka / (VD·(ka − kel))] · [e^(−kel·t) − e^(−ka·t)].    (2.3)
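Model (2.3) rises to a single peak. Setting its derivative to zero gives the standard result tmax = ln(ka/kel)/(ka − kel), which a short sketch can verify numerically (all parameter values below are invented for illustration):

```python
import numpy as np

# For model (2.3) the concentration peaks where dC/dt = 0; solving
# ka*exp(-ka*t) = kel*exp(-kel*t) gives tmax = ln(ka/kel)/(ka - kel).
# Parameter values are illustrative, not taken from the report.
D, f, VD, ka, kel = 500.0, 0.9, 40.0, 1.5, 0.2

def conc_ev_1cpt(t):
    coef = D * f * ka / (VD * (ka - kel))
    return coef * (np.exp(-kel * t) - np.exp(-ka * t))

tmax = np.log(ka / kel) / (ka - kel)
cmax = conc_ev_1cpt(tmax)

# Numerical check: cmax is indeed the maximum over a fine time grid.
grid = np.linspace(0.0, 24.0, 100001)
print(tmax, cmax, conc_ev_1cpt(grid).max() <= cmax + 1e-9)
```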

2.1.3. Open Two-Compartment Model, Intravascular Administration

The pharmacokinetic model is given by

C(t) = B·e^(−β·t) + A·e^(−α·t).    (2.4)

In this model α and β are called hybrid constants and are related to the micro rate constants k12, k21, and kel by the following equations:

α + β = k12 + k21 + kel,
α·β = k21·kel.
The micro rate constant k12 is the distribution rate constant for transfer of drug from compartment 1
(central compartment) to compartment 2 (peripheral compartment), k21 is the distribution rate constant for
transfer of drug from compartment 2 to compartment 1, and kel is the elimination rate constant of drug
from the central compartment. The coefficients A and B are given by

A = D·(α − k21) / (VC·(α − β)),
B = D·(k21 − β) / (VC·(α − β)),

where VC is the volume of the central compartment. Therefore, this pharmacokinetic model can be written as

C(t) = [D / (VC·(α − β))] · [(k21 − β)·e^(−β·t) − (k21 − α)·e^(−α·t)].    (2.5)
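Since α + β and α·β are fixed by the micro rate constants, α and β are simply the two roots of the quadratic x² − (k12 + k21 + kel)·x + k21·kel = 0. This is easy to verify numerically (the micro constants below are invented for illustration):

```python
import numpy as np

# alpha and beta satisfy alpha + beta = k12 + k21 + kel and
# alpha * beta = k21 * kel, so they are the roots of the quadratic
# x^2 - (k12 + k21 + kel)*x + k21*kel = 0. Illustrative micro constants:
k12, k21, kel = 0.8, 0.5, 0.3
r1, r2 = np.roots([1.0, -(k12 + k21 + kel), k21 * kel])

print(r1 + r2, k12 + k21 + kel)   # sum of roots equals the sum of constants
print(r1 * r2, k21 * kel)         # product of roots equals k21*kel
```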

2.1.4. Open Two-Compartment Model, Extravascular Administration

The pharmacokinetic model is given by

C(t) = B·e^(−β·t) + A·e^(−α·t) − C(0)·e^(−ka·t).

Here, C(0) is the hypothetical drug concentration at time zero obtained from A + B = C(0). The coefficients A and B are given by

A = D·f·ka·(α − k21) / (VC·(α − β)·(ka − α)),
B = D·f·ka·(k21 − β) / (VC·(α − β)·(ka − β)).

In this model the volume of the central compartment, VC, is given by

VC = D·f·ka·(k21 − ka) / (−C(0)·(β − ka)·(α − ka)),

so the pharmacokinetic model can be written as

C(t) = (D·f·ka/VC) · [ (k21 − α)/((ka − α)·(β − α)) · e^(−α·t) + (k21 − β)/((ka − β)·(α − β)) · e^(−β·t) + (k21 − ka)/((α − ka)·(β − ka)) · e^(−ka·t) ].    (2.6)

2.2. Further Considerations

2.2.1. Lag-Time

When the drug is given by intravascular administration, we assume that the drug is rapidly mixed in the
central compartment in such a way that the first appearance of drug in the circulation system is virtually
immediate. However, with extravascular administration it is possible to observe a time interval between
administration of the drug and its first appearance. This time, denoted by tlag, is called lag-time and when
necessary must be incorporated in the model. For instance, in the one-compartment model with
extravascular administration, if a lag-time is considered, (2.3) becomes

C(t) = [D·f·ka / (VD·(ka − kel))] · [e^(−kel·(t − tlag)) − e^(−ka·(t − tlag))].

2.2.2. Number of Compartments

As mentioned before, the human body is composed of millions of compartments. A good theoretical model must use the fewest number of compartments necessary to adequately describe the experimental data. Most of the time, one or two compartments are enough.
In Section 2.1 we noted that the compartment models are sums of exponential terms. If the exponents of these terms are sufficiently separated (which is usually the case with pharmacokinetic data), we can split the model up into different straight lines, one for each exponential term, when depicting it in a semilog plot. Then the number of straight lines will give us insight into the number of exponential components of the model.
For instance, if β is considerably greater than α in model (2.4), then exp(−β·t) will tend to zero faster than exp(−α·t), and therefore, for sufficiently large t, we will have that

ln(C(t)) ≈ ln(A) − α·t.

This straight line can be observed in the right-hand side of Figure 2.1c. If we subtract A·exp(−α·t) from C(t) and depict a semilog plot of this amount against time, we will get a second straight line, namely

ln(C(t) − A·e^(−α·t)) ≈ ln(B) − β·t.
These two straight lines reveal the two compartments in the model, and each corresponds to a different phase of the kinetic process. We can clearly observe these phases in the plots of Figure 2.1. In Figure 2.1a we observe just one straight line, which corresponds to the elimination phase (the only phase in the open one-compartment model with intravascular administration). In Figure 2.1b we observe an absorption phase (first part of the curve, with positive slope) followed by the elimination phase (second part of the curve, with negative slope). In the open two-compartment model with intravascular administration shown in Figure 2.1c we have a distribution phase (in which the drug is distributed between the two compartments until equilibrium is reached) followed by the elimination phase. Finally, in Figure 2.1d we can see the three phases of the open two-compartment model with extravascular administration: absorption, distribution, and elimination. A very simple method that exploits this graphical characteristic to determine the number of compartments is the method of residuals, also known as feathering or peeling. A description of this method can be found in Chapter 4 of Shargel and Yu (1999). Besides its utility in determining the number of compartments, this method is also proposed, mostly in the pharmacokinetic literature, as a technique to obtain rough estimates of the pharmacokinetic parameters.
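A minimal numerical sketch of this peeling idea for model (2.4), following the convention above that β is the larger (faster) exponent; the data are noise-free and all parameter values are invented for illustration:

```python
import numpy as np

# Method of residuals ("feathering"/"peeling") sketch for model (2.4),
# C(t) = B*exp(-beta*t) + A*exp(-alpha*t) with beta >> alpha, so that the
# terminal phase is governed by the alpha term. Illustrative values:
A, alpha, B, beta = 10.0, 0.1, 40.0, 1.2
t = np.linspace(0.25, 24.0, 40)
C = B * np.exp(-beta * t) + A * np.exp(-alpha * t)

# 1. Fit the terminal (late-time) points on the semilog scale,
#    where the beta term is negligible: ln C ~ ln A - alpha*t.
late = t > 8.0
slope1, icept1 = np.polyfit(t[late], np.log(C[late]), 1)
A_hat, alpha_hat = np.exp(icept1), -slope1

# 2. Subtract the fitted terminal line and fit the residuals,
#    which follow ln(resid) ~ ln B - beta*t at early times.
early = t < 3.0
resid = C[early] - A_hat * np.exp(-alpha_hat * t[early])
slope2, icept2 = np.polyfit(t[early], np.log(resid), 1)
B_hat, beta_hat = np.exp(icept2), -slope2

print(alpha_hat, A_hat, beta_hat, B_hat)  # close to 0.1, 10, 1.2, 40
```

With noisy data the same steps give only rough estimates, which is why the method is mainly used for starting values and compartment counting rather than final estimation.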

[Figure 2.1: four semilog plots of log concentration against time. (a) Open one-compartment model, intravascular administration; (b) open one-compartment model, extravascular administration; (c) open two-compartment model, intravascular administration; (d) open two-compartment model, extravascular administration.]

FIGURE 2.1 Semilog plots for one-compartment and two-compartment models with intravascular and extravascular administration

The number of compartments may also depend on pharmacokinetic considerations. For example, theophylline follows the kinetics of a one-compartment model after oral administration but of a two-compartment model after intravascular administration. The reason is that the distribution phase is rapid, and therefore this phase is confounded with the absorption phase in oral administration (Shargel and Yu, 1999). Another factor that may affect the observed number of compartments is the choice of blood sampling times. We can see this in Figure 2.1c: if the first sample comes too late, we can miss the distribution phase. We will not pay attention to this issue here, so we will assume in our examples that the compartment model is given.

3. Individual Pharmacokinetics

With data gathered from a single individual, compartment pharmacokinetic models such as those shown in equations (2.2), (2.3), (2.5), and (2.6) are fitted by using standard nonlinear regression techniques. In all those models, the response variable, the concentration of drug, depends in a nonlinear fashion on the time after administration of the drug. In Section 3.1 we present a description of the nonlinear regression model, the estimation procedures, and asymptotic distributional results, and in Section 3.2 we focus on inference for functions of the parameters in the model and on reparameterization. This theory is mainly based on Chapter 2 of Davidian and Giltinan (1995). As we will see, inference in the nonlinear regression model is based on a linearization of the expectation surface, so that the exact results of linear regression can be applied asymptotically. This approximation will be appropriate if the expectation surface around θ̂, the estimated regression parameters, is fairly flat, and different parameterizations will produce better or worse linear approximations. We pay some attention to this subject in Section 3.3. In Section 3.4 we mention some available software to analyse nonlinear regression models or the more specific individual pharmacokinetic models, and in Section 3.5 we illustrate the theory with an example.

3.1. Nonlinear Regression Model

3.1.1. Model and Assumptions

The nonlinear regression model for a response variable yj taken at the jth covariate value xj, j = 1,…, n, is usually written as

yj = f(xj, θ) + ej.    (3.1)

In this expression θ is a p×1 vector of regression parameters, f is a function which depends in a nonlinear fashion on θ, and ej is the random error associated with the jth measured response. We can aggregate the n equations in the more compact model

y = f(θ) + e,    (3.2)

where the jth element of (3.2) is given by (3.1). We will call E(y) = f(θ) the expectation surface.

The classical assumptions for the model specified in (3.1) are the following:
(i) The errors ej have mean zero.
(ii) The errors ej are uncorrelated.
(iii) The errors ej have common variance σ 2.
(iv) The errors ej are identically distributed for all xj.
(v) The errors ej are normally distributed.
Some nonlinear models can be transformed to a linear model by applying a suitable transformation. For instance, the nonlinear model

f(x, θ) = θ1·θ2^x

can be transformed, applying a logarithmic transformation, to the linear model

ln(f(x, θ)) = ln(θ1) + ln(θ2)·x,

and the nonlinear model (known as the Michaelis-Menten model)

f(x, θ) = θ1·x / (θ2 + x)

can be transformed, with a reciprocal transformation, to the linear model

1/f(x, θ) = 1/θ1 + (θ2/θ1)·(1/x).

These models are called “transformably linear” or “intrinsically linear”, and transforming a model to a linear form makes computations easier. However, we must consider that a transformation of the model involves a transformation of the error terms too, and though sometimes the transformation can help to satisfy the classical assumptions, it can also produce departures from them. Indeed, nonlinear models are usually the result of some meaningful empirical or theoretical relation among the response variable, the covariates, and the parameters, and it is desirable to preserve this relation in the analysis.
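Both transformations above are exact identities on the mean function, which a short check confirms (parameter values are arbitrary illustrative choices):

```python
import numpy as np

# Check that the log transform linearizes f = theta1*theta2**x and that
# the reciprocal transform linearizes the Michaelis-Menten model.
# Parameter values are illustrative.
theta1, theta2 = 2.0, 0.8
x = np.linspace(1.0, 10.0, 25)

f1 = theta1 * theta2 ** x
# ln f1 is exactly linear in x: intercept ln(theta1), slope ln(theta2).
slope, icept = np.polyfit(x, np.log(f1), 1)
print(icept, slope)            # ln(2) and ln(0.8)

f2 = theta1 * x / (theta2 + x)
# 1/f2 is exactly linear in 1/x: intercept 1/theta1, slope theta2/theta1.
slope2, icept2 = np.polyfit(1.0 / x, 1.0 / f2, 1)
print(icept2, slope2)          # 0.5 and 0.4
```

Note that the exactness holds only for the noise-free mean; once random errors are added, the transformed errors no longer follow the classical assumptions in general, which is the caveat raised above.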
The last four assumptions are very restrictive and may not hold in some applications. However, the
classical nonlinear regression model may be generalized to accommodate some departures from these
assumptions. Specifically we will consider, by relaxing assumptions (ii) and (iii), the possibility of a
general covariance matrix R for the errors
Cov(e) = R(θ, ξ),    (3.3)

which can depend on the regression parameters θ and on some intra-individual covariance parameters
given in the vector ξ (σ included in ξ).

3.1.2. Least Squares Estimation

Under the classical assumptions, the ordinary least squares (OLS) estimator for θ, θ̂_OLS, which minimizes the error sum of squares

S(θ) = Σ_(j=1)^n {yj − f(xj, θ)}²    (3.4)

is also the maximum likelihood estimator of θ. The maximum likelihood estimator for σ² is

σ̃² = (1/n) Σ_(j=1)^n {yj − f(xj, θ̂_OLS)}²,

which is generally biased downward. As in the linear case, σ̃² is usually replaced by

σ̂²_OLS = (1/(n − p)) Σ_(j=1)^n {yj − f(xj, θ̂_OLS)}².
With a general covariance structure as specified in (3.3), we can apply the generalized least squares
principle. For convenience let us write
R(θ, ξ) = σ²·S(θ, γ),

with γ the vector of intra-individual covariance parameters but without including σ. If S(θ, γ) = W⁻¹ for some known matrix W, then, under normality, the maximum likelihood estimator for θ is the weighted least squares estimator θ̂_WLS minimizing

S(θ) = {y − f(θ)}ᵀ W {y − f(θ)},    (3.5)

and σ² may be estimated by

σ̂²_WLS = (1/(n − p)) {y − f(θ̂_WLS)}ᵀ W {y − f(θ̂_WLS)}.

In most cases it is unlikely that such a complete specification of the covariance structure is known. In those cases the following iterative process can be used:

1. Get an initial estimator for θ, for instance θ̂_OLS.
2. Obtain an estimator for γ, and form the estimated weight matrix Ŵ = S⁻¹(θ̂, γ̂).
3. Using Ŵ, reestimate θ by minimizing (3.5), and return to step 2.

The final estimator for θ is called the generalized least squares estimator and is denoted by θ̂_GLS.
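The iteration can be sketched in a few lines. The report's computations use R; below is an illustrative Python version for model (2.2), assuming (purely for illustration) a constant coefficient of variation error model, so that S(θ, γ) has diagonal elements f_j² and W = diag(1/f_j²); the data and starting values are simulated:

```python
import numpy as np
from scipy.optimize import least_squares

# GLS iteration sketch for model (2.2), C(t) = (D/VD)*exp(-kel*t),
# under a constant-CV error model Var(e_j) = sigma^2 * f_j^2.
# All numerical values are illustrative.
rng = np.random.default_rng(2)
D, VD_true, kel_true = 500.0, 40.0, 0.2
t = np.array([0.5, 1, 2, 4, 6, 8, 12, 24.0])
f_true = (D / VD_true) * np.exp(-kel_true * t)
y = f_true * (1 + rng.normal(0, 0.05, t.size))

def f(theta):                      # theta = (VD, kel)
    return (D / theta[0]) * np.exp(-theta[1] * t)

# Step 1: initial OLS estimate.
theta = least_squares(lambda th: y - f(th), x0=[30.0, 0.3]).x
# Steps 2-3, iterated: weights from the current fit, then re-estimate.
for _ in range(3):
    w = 1.0 / f(theta)             # square root of the diagonal of W
    theta = least_squares(lambda th: w * (y - f(th)), x0=theta).x

print(theta)   # close to (40, 0.2)
```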

Closed-form solutions for θ̂_OLS and θ̂_WLS are rarely available, and therefore minimization of expressions (3.4) and (3.5) requires the use of iterative algorithms. The most common algorithms are based on modifications of the Gauss-Newton algorithm, such as the ones proposed by Levenberg (1944), Marquardt (1963), and Hartley (1961). Documentation about these algorithms can be found in Chapters 2 and 14 of Seber and Wild (1989) and in Chapters 2 and 3 of Bates and Watts (1988). Other options to estimate the parameters are the steepest-descent method (see Section 13.2.3 of Seber and Wild, 1989) and the Nelder-Mead simplex algorithm (see Section 13.5.3 of Seber and Wild, 1989). All the Gauss-Newton based algorithms work on a linearization of the expectation surface. The same approximation is used to get the asymptotic results and is illustrated in the next section.
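The Gauss-Newton idea can be made concrete with a bare-bones sketch for model (2.2), written in Python for illustration (the data are simulated and the starting values invented; production algorithms add the step control of Levenberg-Marquardt):

```python
import numpy as np

# Bare Gauss-Newton iteration for model (2.2),
# f(t, theta) = (D/VD)*exp(-kel*t) with theta = (VD, kel).
# Each step solves the linearized problem (F.'F.) delta = F.' e,
# where F. is the derivative matrix of the expectation surface.
# Simulated data, illustrative values.
rng = np.random.default_rng(1)
D, VD_true, kel_true = 500.0, 40.0, 0.2
t = np.array([0.5, 1, 2, 4, 6, 8, 12, 24.0])
y = (D / VD_true) * np.exp(-kel_true * t) + rng.normal(0, 0.1, t.size)

theta = np.array([35.0, 0.25])            # starting values (VD, kel)
for _ in range(20):
    VD, kel = theta
    f = (D / VD) * np.exp(-kel * t)
    # F.: n x p matrix of derivatives of f with respect to (VD, kel)
    F = np.column_stack([-f / VD, -t * f])
    e = y - f
    delta = np.linalg.solve(F.T @ F, F.T @ e)
    theta = theta + delta

print(theta)   # close to (40, 0.2)
```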

3.1.3. Asymptotic Results

An important distinction between linear and nonlinear regression is that, even under the classical assumptions, it is not possible in the nonlinear case to obtain exact distributional results for the estimators. However, it is possible to obtain asymptotic results based on large sample theory that hold even when the assumption of normality does not hold. It is important to note that these asymptotic results are obtained from a linear approximation of the nonlinear expectation surface (cf. (3.6)). Here we present the basis of this approximation following the discussion of Section 2.1.2 of Seber and Wild (1989).
Let θ* be the true value of θ. We can approximate the expectation surface in a small neighbourhood of θ* by the linear Taylor expansion

f(xj, θ) ≈ f(xj, θ*) + Σ_(r=1)^p [∂f(xj, θ)/∂θr]|_(θ*) · (θr − θr*)

or

f(θ) ≈ f(θ*) + F.·(θ − θ*),    (3.6)

where F. is the n×p matrix of derivatives of the expectation function f(θ) with respect to the parameters θ, given by

F. = ∂f(θ)/∂θ = [∂f(xj, θ)/∂θr].
Equation (3.4) can be written as

S(θ) = {y − f(θ)}ᵀ{y − f(θ)},    (3.7)

and hence, applying approximation (3.6) in (3.7), we have

S(θ) ≈ {y − f(θ*) − F.(θ − θ*)}ᵀ{y − f(θ*) − F.(θ − θ*)}
     = {e − F.(θ − θ*)}ᵀ{e − F.(θ − θ*)}.    (3.8)

By analogy with the linear regression model, (3.8) is minimized when

(θ − θ*) = (F.ᵀF.)⁻¹F.ᵀe.

If θ̂ is within the small neighbourhood of θ*, then we have that

(θ̂ − θ*) ≈ (F.ᵀF.)⁻¹F.ᵀe.

This result shows how, under the linearization given in (3.6), we can get for the nonlinear regression
model similar results to those of the linear regression model, and therefore, that inference can be treated
in a similar fashion. The results can be extended to the case of generalized least squares. Below we
summarize the asymptotic results presented in Davidian and Giltinan (1995).
1. Under assumptions (i) to (iv) and under the additional condition that the errors are independent, we have that, asymptotically,

θ̂_OLS ~ N(θ, σ²Σ_OLS),   Σ_OLS = (F.ᵀF.)⁻¹.    (3.9)

2. For a general covariance structure, as specified in (3.3), the estimator θ̂_GLS has asymptotic normal distribution

θ̂_GLS ~ N(θ, σ²Σ_GLS),   Σ_GLS = (F.ᵀ S⁻¹(θ, γ) F.)⁻¹.    (3.10)

It is important to mention that this result holds whether γ has been estimated or is known.
These asymptotic results can be used to construct approximate confidence intervals and hypothesis testing
procedures. Some inferential procedures can be found in Davidian and Giltinan (1995), and a
comprehensive discussion for the OLS case is given in Chapter 5 of Seber and Wild (1989). For a single
parameter, inference is straightforward by using the corresponding marginal distribution. For linear and
nonlinear functions of the parameters we can use the results presented in the next section.
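The linearization above is also the basis of the Gauss-Newton algorithm used by the software discussed later in this chapter: at each iteration the expectation surface is replaced by its tangent-plane approximation and the resulting linear least squares problem is solved. A minimal sketch in Python (the one-parameter exponential model and the data here are invented purely for illustration):

```python
import numpy as np

def gauss_newton(f, jac, x, y, theta0, tol=1e-10, max_iter=50):
    """Gauss-Newton iteration: theta <- theta + (F.'F.)^(-1) F.'e."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        e = y - f(x, theta)                      # residual vector
        F = jac(x, theta)                        # n x p derivative matrix F.
        step = np.linalg.solve(F.T @ F, F.T @ e)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta

# Hypothetical exponential decay model and data, roughly exp(-0.5 x).
f = lambda x, th: np.exp(-th[0] * x)
jac = lambda x, th: (-x * np.exp(-th[0] * x)).reshape(-1, 1)

x = np.array([0.5, 1.0, 2.0, 4.0, 6.0])
y = np.array([0.78, 0.62, 0.37, 0.14, 0.05])

theta_hat = gauss_newton(f, jac, x, y, theta0=[1.0])
print(theta_hat)
```

Each update is exactly the linear least squares solution of (3.8) at the current iterate; at convergence the normal equations F.ᵀe = 0 hold, which gives a direct check of the result.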

3.2. Inference for Functions of the Estimated Parameters

It is sometimes of interest to make inference about quantities that are functions of the estimated parameters¹. That is the case, in pharmacokinetics, of parameters such as half-life or clearance. For instance, in the open one-compartment model with intravascular administration presented in (2.2),

C(t) = (D/VD) e^{−kel·t},

the elimination half-life parameter is defined by

t_{1/2} = ln 2 / kel,

¹ In pharmacokinetics these parameters are called secondary parameters.

and the total clearance by

ClT = kel · VD.

For a linear combination cᵀθ of the elements of θ, the asymptotic results in (3.9) and (3.10) imply that

cᵀθ̂ ∼ N(cᵀθ, σ²cᵀΣc).                                                 (3.11)

For a nonlinear function c(θ) of the elements of θ, we can use, in the same way as in (3.6), the linear Taylor expansion

c(θ̂) ≈ c(θ) + cᵀ(θ̂ − θ),                                             (3.12)

where c is now the p×1 vector of partial derivatives of c with respect to the elements of θ. With this linearization we have the following asymptotic result:

c(θ̂) ∼ N(c(θ), σ²cᵀΣc).                                               (3.13)

Note that this last expression also applies to the case of a linear function, where c(θ) = cᵀθ.

Since σ² and Σ are usually unknown, we can use the following result of standard statistical theory:

T = [c(θ̂) − c(θ)] / [σ̂ (cᵀΣ̂c)^{1/2}] ∼ t_{n−p},

where t_{n−p} represents the Student t distribution with n−p degrees of freedom. Hence, an approximate 100(1−α)% confidence interval for a function c(θ) is given by

c(θ̂) ± t_{α/2, n−p} σ̂ (cᵀΣ̂c)^{1/2}.                                  (3.14)

The vector c may be a function of the parameters in θ, and in that case it is replaced by ĉ, obtained by substituting θ by θ̂.
The problem of getting confidence intervals for pharmacokinetic parameters, which sometimes are
functions of the parameters fitted in the model, was treated by Sheiner (1986). He presents some
approaches to this problem, and we summarize them below:
1. If the parameter of interest, say φ, is a one-to-one function c(⋅) of only one of the original parameters, say θ, we can obtain a confidence interval for φ by applying the transformation c(⋅) to both endpoints of the original interval. That is, if [θ_L, θ_S] is a 100(1−α)% confidence interval for θ, then [c(θ_L), c(θ_S)] is a 100(1−α)% confidence interval for φ. This holds because the confidence level remains invariant under one-to-one transformations.
2. An approximate standard error can be computed for the new function (cf. (3.11) and (3.13)), and then
it can be used to construct confidence intervals or for hypothesis testing.
3. We can reparameterize the model in terms of the new parameter and then refit the data. If the parameter of interest, φ, is a function of some of the p original parameters, that is φ = c(θ1, ..., θp), then we can choose one of the original parameters that is of little interest, say θ1, and replace it in the model by θ1 = h(θ2, ..., θp, φ).

All the previous approaches seem reasonable, but the problem is that they can produce different results. For instance, under the asymptotic distributions presented in (3.9) and (3.10), the confidence intervals from approaches 2 and 3 will be symmetric, whereas approach 1 will not necessarily produce a symmetric interval. Similarly, if we obtain confidence intervals for θ and φ by using the appropriate parameterization in each case, the resulting intervals will not necessarily be equivalent through the function φ = c(θ). From a computational point of view, Sheiner (1986) emphasizes that with approaches 1 and 2 it is not necessary to refit the model, while approach 3 requires refitting. However, with the computational facilities available now, this is no longer an important consideration. Leaving the computational aspect aside, Sheiner (1986) recommends approach 3 because it is more direct. Nevertheless, if we use approach 2 with the results (3.11) or (3.13) to get the standard error of the function of interest, approaches 2 and 3 are equivalent. In Section 3.5 we will illustrate these ideas with an example.

3.3. Measures of Curvature or Nonlinearity

As explained in Section 3.1.3, the key step in obtaining the asymptotic results is the linearization of the expectation surface given in (3.6), by which the expectation surface is approximated by a tangent plane at θ̂. To some extent the precision of this approximation depends on the parameterization of the model, so a poor parameterization can produce misleading results. It is possible to gain some insight into the appropriateness of the approximation under different parameterizations by computing measures of curvature, or nonlinearity. A good explanation of this subject is given in Chapter 4 of Seber and Wild (1989) and Chapter 7 of Bates and Watts (1988). In this section we follow the theory of the latter.

3.3.1. Intrinsic and Parameter Effects Nonlinearity

There are two aspects that determine the appropriateness of the linear approximation of the expectation surface given in (3.6). The first is the intrinsic curvature of the expectation surface, also referred to as the planar assumption. The second is whether straight, parallel, equispaced lines in the parameter space map into nearly straight, parallel, equispaced lines on the expectation surface, also referred to as the uniform coordinate assumption. In this section we present measures of both characteristics based on the second derivatives of the expectation surface; for the planar assumption they are named measures of intrinsic nonlinearity, and for the uniform coordinate assumption they are named measures of parameter effects nonlinearity.
In Section 3.1.3 we defined the n×p matrix of derivatives of the expectation surface with respect to the parameters

F. = ∂f(θ)/∂θ = [∂f(x_j, θ)/∂θ_r].

Similarly, we now introduce the n×p×p array of second derivatives

F.. = ∂²f(θ)/∂θ∂θᵀ = [∂²f(x_j, θ)/∂θ_r∂θ_s].

This is an array of n faces F..j, where each face is a complete p×p matrix of second derivatives².
The matrix F. can be decomposed into p vectors f.r, and the array F.. can be regarded as consisting of p² vectors f..rs. Following the terminology of Bates and Watts (1988), the tangent vectors f.r are also called velocity vectors, since they give the rate of change of f(θ) with respect to each parameter, and the vectors f..rs are called acceleration vectors, since they give the rates of change of the velocity vectors with respect to the parameters. There are only p(p+1)/2 different acceleration vectors, so together with the p velocity vectors, the maximum dimension of the combined tangent and acceleration space is p(p+3)/2. Sometimes the combined dimension is only slightly larger than p, so we will denote the combined dimension by p+p′.

² This matrix is also called the Hessian matrix.
All the velocity vectors lie in the tangent plane. The acceleration vectors can be decomposed into two components: a tangential component (in the tangent plane) and a normal component (orthogonal to the tangent plane). This decomposition can be performed with a QR decomposition. To do so, we form a matrix D of dimension n×(p(p+3)/2) by collecting the p(p+1)/2 different acceleration vectors of F.. into a matrix W.. and joining them to the p vectors of F.:

D = (F., W..).                                                          (3.15)

Performing the QR decomposition of D we have

D = QR = (Q1 | Q′1 | Q2) R,

where Q1 contains the first p columns of Q and Q′1 the next p′. Then we form an array A.. by the multiplication

A.. = (Q1 | Q′1)ᵀ [F..].                                                (3.16)

In this multiplication the element in the kth face, rth row, sth column of A.. is given by

{A..}krs = Σ_{j=1}^{n} {(Q1 | Q′1)ᵀ}kj {F..}jrs.

Then A.. is a compact acceleration array of p + p′ faces of dimension p×p (instead of the n faces of F..).
The first p faces of A.. determine the projections of the acceleration vectors in the tangent space, so they
are the tangential components. These components measure the nonuniformity of the parameter lines on
the tangent plane, that is, they are a measure of parameter effects nonlinearity. The last p ' faces of A..,
which are the normal components, determine the projections of the acceleration vectors in the space
normal to the tangent space. These components measure how much the expectation surface deviates from
a plane, so they are a measure of intrinsic nonlinearity and do not depend on the parameterization but only
on the design and the form of the nonlinear function. To differentiate these two components, we will write
the first p faces of A.. as Aθ to denote the parameter effects acceleration array and the last p ' faces as Aι
to denote the intrinsic acceleration array.
To illustrate the concepts of intrinsic nonlinearity and parameter effects nonlinearity we present an example taken from Bates and Watts (1988). In this example the nonlinear function is

f(x, θ) = 60 + 70 e^{−xθ},

with the design xᵀ = (4, 41). In Figure 3.1 we plot the expectation surface for this design with marks for θ = 0, 0.05, 0.10, …, 0.95, 1.0. In Figure 3.2 we plot the expectation surface with a different parameterization, namely φ = log₁₀θ, so the nonlinear function is expressed as

f(x, φ) = 60 + 70 exp(−x·10^φ).
In this case we plot marks for φ = -2.0, -1.9, …, -0.1, 0. In both cases the expectation curves are identical.
This aspect is the intrinsic nonlinearity of the expectation surface and does not change with the
reparameterization because it is just a relabeling of the points on the curve. On the other hand, the points,
that are equally spaced in the θ and φ spaces, do not map into equally spaced points on the expectation
curves. This is the parameter effects nonlinearity which does depend on the parameterization, and as we
can see in Figures 3.1 and 3.2, is less severe with the φ parameterization. In Figure 3.3 we present the
expectation curve for a different design, namely with x T = ( 4,12 ) , and with the same parameterization
and the same marks for θ as in Figure 3.1. Now, as we can see, the different design affects the intrinsic
nonlinearity (the shape of the curve has changed).
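The nonuniform spacing visible in these figures can also be checked numerically: both parameterizations trace the same expectation curve, but equispaced θ values map to far more unevenly spaced points on the curve than equispaced φ values. A small sketch in Python (for illustration only; the computations in this report are otherwise done in R):

```python
import numpy as np

def surface(theta, x=np.array([4.0, 41.0])):
    """Points (f1, f2) on the expectation surface f(x, theta) = 60 + 70 e^(-x theta)."""
    return 60 + 70 * np.exp(-np.outer(theta, x))

def spacing_ratio(pts):
    """Max/min distance between consecutive points; 1 means perfectly uniform spacing."""
    d = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    return d.max() / d.min()

pts_theta = surface(np.arange(0.05, 1.0001, 0.05))       # equispaced in theta
pts_phi = surface(10.0 ** np.arange(-2.0, 0.0001, 0.1))  # equispaced in phi = log10(theta)

print(spacing_ratio(pts_theta))  # very uneven spacing
print(spacing_ratio(pts_phi))    # much closer to uniform
```

The intrinsic nonlinearity (the shape of the curve) is identical in both cases; only the spacing of the parameter marks, i.e. the parameter effects nonlinearity, changes.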

FIGURE 3.1 Expectation surface with design xᵀ = (4, 41) and parameterization in terms of θ.

FIGURE 3.2 Expectation surface with design xᵀ = (4, 41) and parameterization in terms of φ = log₁₀θ.

FIGURE 3.3 Expectation surface with design xᵀ = (4, 12) and parameterization in terms of θ.
Summarizing, the curvature of the expectation surface affects the precision of the asymptotic inferential
results. Intrinsic nonlinearity depends on the design (the time points to take the measures and the
nonlinear function in the regression model), and parameter effects nonlinearity depends on the
parameterization.
A final consideration concerning accelerations as nonlinearity measures is that they depend on the scaling of the data and parameters. To avoid this problem, accelerations are converted to relative curvatures with the following transformation:

C = R11⁻ᵀ A.. R11⁻¹ s√p.                                                (3.17)

Here, each face of A.. is premultiplied by R11⁻ᵀ and postmultiplied by R11⁻¹ to get each face of C, the relative curvature array. In a similar way as with A.., we will use the notation Cθ to denote the parameter effects relative curvature array and Cι to denote the intrinsic relative curvature array. The matrix R11 comes from the original QR decomposition and is given by

R1 = (Q1 | Q′1)ᵀ D = [ R11  R12 ; 0  R22 ],

and s is the root mean square error and p the number of parameters.

3.3.2. Reparameterization

There is little guidance in the literature about which parameterization is more suitable for a particular model based on curvature measures. The problem is that a parameterization that gives good results for a particular model with one dataset can produce poor results with another dataset (Bates and Watts, 1981). Indeed, small changes in the design may change both the intrinsic and the parameter effects curvatures in unexpected ways. For instance, Bates and Watts (1980) added two data points to each of two data sets; in one of them this increased the intrinsic curvatures and decreased the parameter effects curvatures, while in the other both curvatures increased. Therefore, to gain some insight into which parameterization is more suitable in a particular case, we need to compute the relative curvature measures for each parameterization with a specified model and design.
In order to facilitate comparisons, it is helpful to possess a single overall measure of nonlinearity. Bates and Watts (1988) use the root mean square (RMS) curvature, denoted by c and defined by

c = { [1/(p(p+2))] Σ_k [ 2 Σ_{r=1}^{p} Σ_{s=1}^{p} c_{krs}² + ( Σ_{r=1}^{p} c_{krr} )² ] }^{1/2},   (3.18)

with c_{krs} the (r,s)th element of the kth face of C. Running the index k from 1 to p we obtain the RMS parameter effects curvature cθ, and running k from p+1 to p+p′ we obtain the RMS intrinsic curvature cι.
With these measures we can compare different parameterizations to find the one with the smallest curvature effects. Hence, if we get the best results with a parameterization φ, where the elements of φ are functions of the parameters of main interest in θ, defined by

φ = G(θ),                                                               (3.19)

we can use the inverse of (3.19) to get confidence intervals on θ.
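Formula (3.18) is simple to implement. The sketch below (in Python, as an illustration of the formula rather than of the R code used elsewhere in this report) computes the RMS curvature from a stack of p×p faces of C; assuming the root applies to the whole bracketed average, for a single 1×1 face [[a]] the formula reduces to |a|, which provides an easy check.

```python
import numpy as np

def rms_curvature(faces):
    """RMS curvature (3.18); faces is a k x p x p stack of faces of C."""
    faces = np.asarray(faces, dtype=float)
    p = faces.shape[1]
    # For each face: 2 * sum of squared elements plus squared trace.
    total = sum(2 * np.sum(c ** 2) + np.trace(c) ** 2 for c in faces)
    return np.sqrt(total / (p * (p + 2)))

# Check: one parameter, one face [[a]] gives sqrt((2a^2 + a^2)/3) = |a|.
print(rms_curvature([[[0.5]]]))  # 0.5
```

Passing only the first p faces of C gives cθ, and passing the remaining p′ faces gives cι.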

Scaling (3.17) by s√p makes the curvature measures comparable with the F distribution because, in the linear case, a 100(1−α)% joint confidence region for θ is given by the set

{ θ : (θ − θ̂)ᵀ Σ⁻¹ (θ − θ̂) ≤ p s² F(1−α; p, n−p) }.
There is no clear criterion to decide whether the curvature measures in a particular case are satisfactory or not. Bates and Watts (1988) computed the RMS curvatures for 67 data sets and considered them acceptable if cθ√F and cι√F are less than 0.3, with F the 0.95 quantile of the F distribution with p and n−p degrees of freedom; this reference value strongly depends on geometrical considerations. We present an example of these computations in Section 3.5.4 to compare three parameterizations of the one-compartment model with extravascular administration.

3.4. Software

In this section we present an overview of the capabilities of the commercial packages SAS and WinNonlin and the free software R for fitting individual pharmacokinetic models and nonlinear regression models.

3.4.1. R

For nonlinear regression fitting, R provides the nls function. By default it works with the Gauss-Newton algorithm and produces least squares estimates of the parameters of the model. Weighted least squares estimation is not yet implemented.
R has two packages to estimate pharmacokinetic parameters and fit pharmacokinetic models: the PK and PKfit packages. The PK package estimates the area under the concentration–time curve (AUC), the area under the first moment curve (AUMC), and the half-life given concentration–time data for a single individual. The PKfit package allows fitting several compartment models (one-compartment, two-compartment, and macroconstant exponential functions with one, two, and three exponential terms) and performs simulations with each of them. Simulations can be done using normally and uniformly distributed random errors, and it is possible to relate their variability to the actual concentrations. The PKfit package is equipped with a menu-based interface which is invoked with the PKmenu() function. The package fits the models using three different methods (the Nelder-Mead Simplex algorithm, a genetic algorithm, and a call to the nls function) and offers three weighting schemes: equal weights, 1/Cp, and 1/Cp².

3.4.2. SAS

The NLIN procedure produces least squares and weighted least squares estimates of the parameters of a
nonlinear model. The estimation can be performed using four different algorithms: Steepest-Descent,
Newton, Gauss-Newton, and Marquardt. Confidence intervals for the parameters of the model are
computed using formula (3.14).

3.4.3. WinNonlin

WinNonlin fits one-, two-, and three-compartment models using three different algorithms: the Nelder-Mead Simplex, the Gauss-Newton algorithm with Hartley modification, and the Gauss-Newton algorithm with Hartley and Levenberg modification, the latter being the default. It allows different weighting schemes: user-specified weights, 1/Cp^n, and 1/(predicted Cp)^n for a user-specified power n. WinNonlin can run simulations for several predefined compartment models with one, two, and three compartments and also for user-defined models; however, the models must first be fitted to data, which is a serious limitation. Confidence intervals are computed for all the parameters of the model but not for secondary parameters. For secondary parameters, standard errors are computed using the second approach mentioned in Section 3.2: if the secondary parameter is a linear combination of the parameters of the model, the standard error follows directly from (3.11); otherwise, it is computed from the linear term of a Taylor series expansion of the secondary parameter according to (3.12) and (3.13).

3.5. Example: One-Compartment Model, Extravascular Administration

3.5.1. Data and Nonlinear Model

In this example we will analyze data from an open one-compartment model with intranasal administration
gathered from a single individual. The concentration time data is given in Table 3.1 and is stored in the
data frame ex1 in R.

TABLE 3.1 Concentration–time data for a single individual after intranasal administration of 5 mg of drug.
Time (hrs) Concentration (mg/l)
0.00 0.0000
0.25 0.0081
0.50 0.0092
0.75 0.0098
1.00 0.0089
2.00 0.0072
4.00 0.0043
6.00 0.0027

The open one-compartment model with extravascular administration, given in (2.3), is

C(t) = [D·f·ka / (VD(ka − kel))] [e^{−kel·t} − e^{−ka·t}].

The parameters to estimate in this model are the absorption constant ka, the elimination constant kel, and the volume of distribution VD. The administered dose D and the fraction f of the administered dose which is absorbed are given constants. In this example we assume that f = 1 and D = 5 mg. Note that from a statistical point of view D, f, and VD together form a single parameter, so any specification of the values of D and f will be reflected in the estimated value of VD. In order to follow standard nonlinear regression notation, we rewrite the model as

f(x, θ) = [5θ1 / (θ3(θ1 − θ2))] [e^{−θ2·x} − e^{−θ1·x}],

with θ1 = ka, θ2 = kel, and θ3 = VD.
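This model and the data of Table 3.1 can be fitted with any general nonlinear least squares routine. As an illustrative cross-check of the R results reported in this section (a sketch using Python's scipy, not one of the packages evaluated in this report):

```python
import numpy as np
from scipy.optimize import curve_fit

# Concentration-time data from Table 3.1.
time = np.array([0.0, 0.25, 0.5, 0.75, 1.0, 2.0, 4.0, 6.0])
conc = np.array([0.0, 0.0081, 0.0092, 0.0098, 0.0089, 0.0072, 0.0043, 0.0027])

def model(x, ka, kel, vd):
    """Open one-compartment model, extravascular administration, with D = 5 and f = 1."""
    return 5 * ka / (vd * (ka - kel)) * (np.exp(-kel * x) - np.exp(-ka * x))

# Same starting values as used for the R fit: ka = 5, kel = 0.2, Vd = 500.
theta, cov = curve_fit(model, time, conc, p0=(5.0, 0.2, 500.0))
print(theta)                   # approximately (5.521, 0.2438, 451.3)
print(np.sqrt(np.diag(cov)))   # approximate standard errors
```

Since this is an ordinary (equal-weight) least squares fit, the estimates should agree with those produced by nls and by the Nelder-Mead Simplex run of PKfit.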

3.5.2. R Output

In Table 3.2 we show the results obtained with the PKfit package of R. The initial values for the
parameters required for the estimation process are θ1 = 5, θ2 = 0.2, and θ3 = 500. Note that the PKfit
package uses the labels ka, kel, and Vd for the parameters. The model is fitted using equal weights, that
is, by ordinary least squares.
The estimated parameters are (according to the Nelder-Mead Simplex algorithm and the nls function) θ̂1 = 5.521 hr⁻¹, θ̂2 = 0.2438 hr⁻¹, and θ̂3 = 451.3 l, so the estimated model is (with D = 5 mg and f = 1)

f(x, θ̂) = [(5)(5.521) / (451.3(5.521 − 0.2438))] [e^{−0.2438·x} − e^{−5.521·x}]
         = 0.01159 [e^{−0.2438·x} − e^{−5.521·x}].
TABLE 3.2 Analysis of the concentration time data using the PKfit package.
<< The value of parameter fitted by genetic algorithm >>
Parameter Value
1 ka 7.3002211
2 kel 0.2174612
3 Vd 474.3527005

<< The value of parameter fitted by Nelder-Mead Simplex slgorithm >>


Parameter Value
1 ka 5.5209976
2 kel 0.2438322
3 Vd 451.3250264

<< Residual sun-of-squares and parameter values fitted by nls >>


2.591939e-07 : 5.5209976 0.2438322 451.3250264

<< Output >>


time Observed Calculated Wtd Residuals AUC AUMC
1 0.00 0.0000 0.000000000 0.000000e+00 0.0000000 0.000000000
2 0.25 0.0081 0.007989802 1.101983e-04 0.0010125 0.000253125
3 0.50 0.0092 0.009526818 -3.268180e-04 0.0031750 0.001081250
4 0.75 0.0098 0.009468888 3.311119e-04 0.0055500 0.002575000
5 1.00 0.0089 0.009036054 -1.360537e-04 0.0078875 0.004606250
6 2.00 0.0072 0.007116987 8.301315e-05 0.0159375 0.016256250
7 4.00 0.0043 0.004370272 -7.027193e-05 0.0274375 0.047856250
8 6.00 0.0027 0.002683714 1.628555e-05 0.0344375 0.081256250

<< AUC (0 to infinity) computed by trapezoidal rule >>


[1] 0.04551069

<< AUMC (0 to infinity) computed by trapezoidal rule >>


[1] 101.0658

<< Akaike's Information Criterion (AIC) >>


[1] -109.2580

<< Log likelihood >>


'log Lik.' 57.62902 (df=3)

<< Schwarz's Bayesian Criterion (SBC) >>


[1] -109.0197

Formula: conc ~ modfun(time, ka, kel, Vd)

Parameters:
Estimate Std. Error t value Pr(>|t|)
ka 5.52100 0.42989 12.84 5.10e-05 ***
kel 0.24383 0.01254 19.44 6.64e-06 ***
Vd 451.32503 9.27550 48.66 6.93e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0002277 on 5 degrees of freedom

Correlation of Parameter Estimates:


ka kel
kel -0.5188
Vd 0.6862 -0.761

In Figure 3.4 we show concentration time plots of the data together with the fitted curve in linear and
logarithmic scale, and plots of residuals3 versus time and versus fitted values. These plots are also
generated by the PKfit package.

3
These are standardized residuals.

FIGURE 3.4 Fitted curve and plots of residuals: concentration–time plots of the data with the fitted curve in linear and semi-log scale, and weighted residuals versus time and versus calculated concentrations.

The PKfit package computes the residuals and the calculated concentration values; they are shown in Table 3.2 below the title <<Output>>. The pharmacokinetic parameters area under the concentration–time curve (AUC) and area under the first moment curve (AUMC) are also computed.
The Akaike Information Criterion (AIC), the log-likelihood, and the Schwarz Bayesian Criterion (SBC⁴) are measures of the goodness of fit of the model and can be used as criteria to select the best model to describe the data. Evidently, a model with more parameters will fit the data better, and therefore it will always have a greater likelihood than a model with fewer parameters. To balance the goodness of fit of a model against its simplicity, the AIC and SBC criteria penalize the likelihood of the model according to the number of estimated parameters. In this respect, the AIC and SBC values are defined by⁵

AIC = −2 l(θ̂) + 2p,

SBC = −2 l(θ̂) + ln(n) p,

where l(θ̂) is the log-likelihood evaluated at the estimated parameters θ̂, n is the number of observations, and p the number of parameters in the model. Hence, according to these criteria, when comparing models the one with the lower AIC or SBC should be preferred. In our example we have

log-likelihood = 57.63,

AIC = −2 × 57.63 + 2 × 3 = −109.3,

⁴ In the literature it is usually called the Bayes Information Criterion (BIC).
⁵ These are the formulas used in R.
SBC = −2 × 57.63 + ln(8) × 3 = −109.0.
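These values are easy to verify directly from the formulas (a quick numerical check in Python):

```python
import math

log_lik = 57.62902   # log-likelihood reported by R
p, n = 3, 8          # number of parameters and observations

aic = -2 * log_lik + 2 * p
sbc = -2 * log_lik + math.log(n) * p
print(round(aic, 4), round(sbc, 4))  # -109.258 -109.0197
```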

Below these measures, the R output presents a table with the estimated parameters, their asymptotic standard errors, and the results of a t test. In this example all the parameters are significant. Finally, it presents the correlations between the estimated parameters. Since we did not use weighted least squares, these correlations are computed from the asymptotic result given in (3.9) with σ̂ = 0.0002277.

3.5.3. Confidence Intervals

Confidence Interval for the Absorption Constant, the Elimination Constant, and the Volume of
Distribution

Given a one-dimensional linear or nonlinear function c(θ) of the parameters in θ, we can compute an approximate 100(1−α)% confidence interval for c(θ) using formula (3.14),

c(θ̂) ± t_{α/2, n−p} σ̂ (cᵀΣ̂c)^{1/2}.

In this example σ̂ = 0.0002277, n = 8, p = 3, and

Σ̂ = [  3566181     −54034    52811240
        −54034       3037     −1709438
      52811240    −1709438  1660369983 ].

Hence, the approximate 95% confidence intervals for θ1, θ2, and θ3 are given by, respectively,

5.521 ± 2.571(0.4300) = [4.416; 6.626],
0.2438 ± 2.571(0.01255) = [0.2116; 0.2761],
451.3 ± 2.571(9.277) = [427.5; 475.2].
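These intervals can be reproduced directly from the reported estimates and standard errors (a sketch in Python; the t quantile comes from scipy):

```python
import numpy as np
from scipy import stats

estimates = np.array([5.521, 0.2438, 451.3])      # ka, kel, Vd
std_errors = np.array([0.4300, 0.01255, 9.277])   # asymptotic standard errors
n, p = 8, 3

t_crit = stats.t.ppf(0.975, n - p)   # about 2.571 for 5 degrees of freedom
lower = estimates - t_crit * std_errors
upper = estimates + t_crit * std_errors
for est, lo, hi in zip(estimates, lower, upper):
    print(f"{est:g}: [{lo:.4g}; {hi:.4g}]")
```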

Confidence Interval for the Elimination Half-Life

Now we compute a confidence interval for the elimination half-life parameter with the three approaches discussed in Section 3.2. For the open one-compartment model with extravascular administration we have t_{1/2} = ln 2 / kel, so the reparameterization φ = G(θ) is given by

φ1 = θ1,   φ2 = ln 2 / θ2,   φ3 = θ3.

Following the first approach, we apply this transformation to the endpoints of the confidence interval computed for θ2. Hence, with this method the estimated value of φ2 is 2.843 and the corresponding 95% confidence interval is [2.511; 3.276]. Note that this interval is not symmetric.

The second approach consists in computing an approximate standard error for the new parameter. The computations are straightforward if we apply the result (3.13) with c(θ) = ln 2/θ2 and the result (3.14). Then we have φ̂2 = 2.843 (the same as in approach 1) and cᵀ = [0, −ln 2/θ2², 0], and by replacing θ2 with its estimate, ĉᵀ = [0, −11.66, 0]. The resulting 95% confidence interval is

2.843 ± 2.571(0.1463) = [2.467; 3.219].
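The delta-method computation of approach 2 can be verified numerically (a Python sketch using the values of σ̂ and Σ̂ reported above):

```python
import numpy as np

sigma_hat = 0.0002277
Sigma_hat = np.array([[ 3566181,   -54034,   52811240],
                      [  -54034,     3037,   -1709438],
                      [52811240, -1709438, 1660369983]], dtype=float)

kel_hat = 0.2438322
half_life = np.log(2) / kel_hat                    # c(theta) = ln 2 / theta2
c = np.array([0.0, -np.log(2) / kel_hat**2, 0.0])  # gradient of c(theta)
se = sigma_hat * np.sqrt(c @ Sigma_hat @ c)        # delta-method standard error

t_crit = 2.571  # t quantile with 5 degrees of freedom, as in the text
print(half_life, se)                               # about 2.843 and 0.146
print(half_life - t_crit * se, half_life + t_crit * se)
```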

The third approach, reparameterization, implies fitting the model

f(x, φ) = [5φ1 / (φ3(φ1 − ln 2/φ2))] [e^{−(ln 2/φ2)·x} − e^{−φ1·x}].

The ordinary least squares estimation of this model using the nls function of R is shown in Table 3.3.
The starting values are φ1 = 5, φ2 = 3.5, and φ3 = 500.

TABLE 3.3 Analysis of the model reparameterized in terms of elimination half-life using the
nls function.
> half_life <- nls(Conc~5*Ka/(Vd*(Ka-log(2)/t_half))*(exp(-log(2)/t_half*Time)-
+ exp(-Ka*Time)), data=ex1, start = c(Ka=5, t_half=3.5, Vd=500), model=T)
> summary(half_life)

Formula: Conc ~ 5 * Ka/(Vd * (Ka - log(2)/t_half)) * (exp(-log(2)/t_half *


Time) - exp(-Ka * Time))

Parameters:
Estimate Std. Error t value Pr(>|t|)
Ka 5.5211 0.4300 12.84 5.10e-05 ***
t_half 2.8427 0.1463 19.43 6.66e-06 ***
Vd 451.3271 9.2781 48.64 6.94e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0002277 on 5 degrees of freedom

Correlation of Parameter Estimates:


Ka t_half
t_half 0.5192
Vd 0.6863 0.7612

With these results, the estimated half-life is φ̂2 = 2.843⁶, and again using formula (3.14), now with c(φ) = φ2, the approximate 95% confidence interval is

2.843 ± 2.571(0.1463) = [2.467; 3.219].

As we can see, approaches 2 and 3 give the same results.

Confidence Interval for Total Clearance

As a second example, we compute a confidence interval for total clearance, ClT, which in the one-compartment model is defined by

ClT = kel · VD.

Hence, the reparameterization φ = G(θ) is given by

φ1 = θ1,   φ2 = θ2,   φ3 = θ2θ3.

Applying the second approach⁷ we have c(θ) = θ2θ3 and cᵀ = [0, θ3, θ2], and by replacing the parameters with their estimates, ĉᵀ = [0, 451.3, 0.2438]. By (3.14), the approximate 95% confidence interval is

110.0 ± 2.571(4.205) = [99.24; 120.86].
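The same delta-method arithmetic, now with a gradient involving two parameters, reproduces this interval (a Python sketch using the σ̂ and Σ̂ reported earlier in this section):

```python
import numpy as np

sigma_hat = 0.0002277
Sigma_hat = np.array([[ 3566181,   -54034,   52811240],
                      [  -54034,     3037,   -1709438],
                      [52811240, -1709438, 1660369983]], dtype=float)

kel_hat, vd_hat = 0.2438322, 451.325
clearance = kel_hat * vd_hat                # Cl_T = kel * V_D
c = np.array([0.0, vd_hat, kel_hat])        # gradient of c(theta) = theta2 * theta3
se = sigma_hat * np.sqrt(c @ Sigma_hat @ c) # delta-method standard error

print(clearance, se)   # about 110.0 and 4.2
```

Note that the cross term of Σ̂ (the covariance between kel and Vd) contributes substantially here, so simply combining the two individual standard errors would not give the correct result.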

In Table 3.4 we present the results of the estimation of the model reparameterized in terms of total
clearance

⁶ Note that with the three approaches the point estimates are the same. This relies on an appealing property of least squares estimators, namely that for φ = G(θ), if θ̂ is the least squares estimator of θ, then φ̂ = G(θ̂) is the least squares estimator of φ.
⁷ The first approach does not apply in this case because total clearance is a function of more than one parameter.

f(x, φ) = [5φ1 / ((φ3/φ2)(φ1 − φ2))] [e^{−φ2·x} − e^{−φ1·x}].

TABLE 3.4 Analysis of the model reparameterized in terms of total clearance using the nls
function.
> total_clearance<-nls(Conc~5*Ka/(Cl/Kel*(Ka-Kel))*(exp(-Kel*Time)-exp(-Ka*Time)),
+ data=ex1, start=c(Ka=5, Kel=0.2, Cl=100), model=T)
> summary(total_clearance)

Formula: Conc ~ 5 * Ka/(Cl/Kel * (Ka - Kel)) * (exp(-Kel * Time) - exp(-Ka *


Time))

Parameters:
Estimate Std. Error t value Pr(>|t|)
Ka 5.52109 0.43000 12.84 5.10e-05 ***
Kel 0.24383 0.01255 19.43 6.66e-06 ***
Cl 110.04715 4.20543 26.17 1.52e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.0002277 on 5 degrees of freedom

Correlation of Parameter Estimates:


Ka Kel
Kel -0.5192
Cl -0.3300 0.9372

Thus, the approximate 95% confidence interval for total clearance with the third approach is
110.0 ± 2.571( 4.205 ) = [99.24; 120.86] .

3.5.4. Measures of Curvature

In this section we compute the measures of intrinsic and parameter effects nonlinearity for each of the three parameterizations treated in Section 3.5.3; the R code for these computations is presented in the final Appendix. In each case we have three parameters, so there are three velocity vectors and nine acceleration vectors, six of which are different.
Given the original parameterization

f(x, θ) = [5θ1 / (θ3(θ1 − θ2))] [e^{−θ2·x} − e^{−θ1·x}],

the elements of F. are defined by

∂f(x,θ)/∂θ1 = −5θ1(−e^{−θ1·x} + e^{−θ2·x}) / [(θ1−θ2)²θ3] + 5(−e^{−θ1·x} + e^{−θ2·x} + θ1·x·e^{−θ1·x}) / [(θ1−θ2)θ3],

∂f(x,θ)/∂θ2 = 5θ1(−e^{−θ1·x} + e^{−θ2·x}) / [(θ1−θ2)²θ3] − 5θ1·x·e^{−θ2·x} / [(θ1−θ2)θ3],

∂f(x,θ)/∂θ3 = −5θ1(−e^{−θ1·x} + e^{−θ2·x}) / [(θ1−θ2)θ3²],

and the elements of F.. by
∂²f(x,θ)/∂θ1² = 10θ1(−e^{−θ1·x} + e^{−θ2·x}) / [(θ1−θ2)³θ3] − 10(−e^{−θ1·x} + e^{−θ2·x} + θ1·x·e^{−θ1·x}) / [(θ1−θ2)²θ3] + 5(2x·e^{−θ1·x} − θ1·x²·e^{−θ1·x}) / [(θ1−θ2)θ3],

∂²f(x,θ)/∂θ1∂θ2 = −10θ1(−e^{−θ1·x} + e^{−θ2·x}) / [(θ1−θ2)³θ3] + 5(−e^{−θ1·x} + e^{−θ2·x} + θ1·x·e^{−θ1·x} + θ1·x·e^{−θ2·x}) / [(θ1−θ2)²θ3] − 5x·e^{−θ2·x} / [(θ1−θ2)θ3],

∂²f(x,θ)/∂θ1∂θ3 = 5θ1(−e^{−θ1·x} + e^{−θ2·x}) / [(θ1−θ2)²θ3²] − 5(−e^{−θ1·x} + e^{−θ2·x} + θ1·x·e^{−θ1·x}) / [(θ1−θ2)θ3²],

∂²f(x,θ)/∂θ2∂θ1 = ∂²f(x,θ)/∂θ1∂θ2,

∂²f(x,θ)/∂θ2² = 10θ1(−e^{−θ1·x} + e^{−θ2·x}) / [(θ1−θ2)³θ3] − 10θ1·x·e^{−θ2·x} / [(θ1−θ2)²θ3] + 5θ1·x²·e^{−θ2·x} / [(θ1−θ2)θ3],

∂²f(x,θ)/∂θ2∂θ3 = −5θ1(−e^{−θ1·x} + e^{−θ2·x}) / [(θ1−θ2)²θ3²] + 5θ1·x·e^{−θ2·x} / [(θ1−θ2)θ3²],

∂²f(x,θ)/∂θ3∂θ1 = ∂²f(x,θ)/∂θ1∂θ3,

∂²f(x,θ)/∂θ3∂θ2 = ∂²f(x,θ)/∂θ2∂θ3,

∂²f(x,θ)/∂θ3² = 10θ1(−e^{−θ1·x} + e^{−θ2·x}) / [(θ1−θ2)θ3³].

In Table 3.5 we present the velocity and acceleration vectors for our data evaluated at θ̂ᵀ = (5.521, 0.2438, 451.3).

TABLE 3.5 Velocity and acceleration vectors.

Velocity Acceleration
Time f.1 f.2 f.3 f..11 f..12 f..13 f..22 f..23 f..33
0.00 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
0.25 0.000662 -0.001212 -0.000018 -0.000169 -0.000139 -0.000001 0.000222 0.000003 0.000000
0.50 0.000287 -0.003325 -0.000021 -0.000159 -0.000245 -0.000001 0.001305 0.000007 0.000000
0.75 0.000059 -0.005446 -0.000021 -0.000076 -0.000268 0.000000 0.003366 0.000012 0.000000
1.00 -0.000029 -0.007370 -0.000020 -0.000019 -0.000254 0.000000 0.006289 0.000016 0.000000
2.00 -0.000059 -0.012886 -0.000016 0.000022 -0.000148 0.000000 0.023585 0.000029 0.000000
4.00 -0.000037 -0.016653 -0.000010 0.000014 -0.000018 0.000000 0.063615 0.000037 0.000000
6.00 -0.000022 -0.015594 -0.000006 0.000009 0.000034 0.000000 0.090703 0.000035 0.000000

The tangent space has dimension three. The combined tangent and acceleration spaces have dimension
five because four of the acceleration vectors are linear combinations of the velocity vectors. We show the
linear combinations below.

∂²f(x,θ)/∂θ1∂θ2 = [1/(θ1 − θ2)] ∂f(x,θ)/∂θ1 − [θ2/(θ1(θ1 − θ2))] ∂f(x,θ)/∂θ2 + [θ3/(θ1(θ1 − θ2))] ∂f(x,θ)/∂θ3,

∂²f(x,θ)/∂θ1∂θ3 = −(1/θ3) ∂f(x,θ)/∂θ1,

∂²f(x,θ)/∂θ2∂θ3 = −(1/θ3) ∂f(x,θ)/∂θ2,

∂²f(x,θ)/∂θ3² = −(2/θ3) ∂f(x,θ)/∂θ3.

Hence, for our future computations, p = 3 and p ' = 2 . The vectors in Table 3.5 form the matrix D given
in (3.15). Now we perform a QR decomposition on D and compute the compact acceleration array A..
given in (3.16). In Table 3.6 we show the results obtained with R for the array A.. which is composed of
five faces, each of them represented as a 3×3 matrix.

TABLE 3.6 R output for the compact acceleration array A...


> A1
[,1] [,2] [,3]
[1,] 2.245701e-04 2.222226e-04 1.613188e-06
[2,] 2.222226e-04 7.174226e-03 -4.267707e-07
[3,] 1.613188e-06 -4.267707e-07 -1.035020e-07

> A2
[,1] [,2] [,3]
[1,] 2.131654e-05 2.120827e-04 -1.596977e-21
[2,] 2.120827e-04 -1.017923e-01 -6.199946e-05
[3,] -1.596977e-21 -6.199946e-05 -1.276624e-07

> A3
[,1] [,2] [,3]
[1,] 7.864738e-05 3.801505e-04 -1.337680e-21
[2,] 3.801505e-04 4.114165e-02 3.243375e-21
[3,] -1.337680e-21 3.243375e-21 -1.087519e-07

> A4
[,1] [,2] [,3]
[1,] -6.095127e-05 1.325061e-20 -5.494046e-23
[2,] 1.325061e-20 1.847316e-02 -5.277412e-22
[3,] -5.494046e-23 -5.277412e-22 -7.973730e-24

> A5
[,1] [,2] [,3]
[1,] 9.367820e-22 -5.755854e-20 3.568832e-23
[2,] -5.755854e-20 -2.087244e-02 -1.273858e-21
[3,] 3.568832e-23 -1.273858e-21 9.649902e-24

The first three faces (A1, A2, and A3 in Table 3.6) give the projections of the acceleration vectors on the
tangent plane and constitute a measure of the parameter effects nonlinearity; we denote these three faces
by Aθ. The last two faces (A4 and A5) give the projections of the acceleration vectors on the subspace of the normal space spanned by the acceleration vectors; they are a measure of the intrinsic nonlinearity and are denoted by Aι. As we can see from these results, the parameter effects nonlinearity is larger than the intrinsic nonlinearity.
In Table 3.7 we present the relative curvatures, computed with formula (3.17). In this example,
s = 0.0002277 and p = 3 . The matrix R11 is also presented in Table 3.7.

TABLE 3.7 R output for the relative curvature array C
> R11
[,1] [,2] [,3]
[1,] -7.280754e-04 1.926132e-04 2.335663e-05
[2,] -2.268129e-21 2.798204e-02 2.880874e-05
[3,] 1.518704e-20 -1.389134e-19 2.454134e-05

> C1
[,1] [,2] [,3]
[1,] 0.167079417 -0.005451944 -0.188220720
[2,] -0.005451944 0.003680740 0.000867986
[3,] -0.188220720 0.000867986 0.144227621

> C2
[,1] [,2] [,3]
[1,] 0.015859437 -0.004214736 -0.01014622
[2,] -0.004214736 -0.051214704 0.02852463
[3,] -0.010146215 0.028524631 -0.06562669

> C3
[,1] [,2] [,3]
[1,] 0.058513386 -0.007761854 -0.04657717
[2,] -0.007761854 0.020826800 -0.01706114
[3,] -0.046577168 -0.017061138 -0.00685723

> C4
[,1] [,2] [,3]
[1,] -0.0453475368 0.0003121479 0.04279200
[2,] 0.0003121479 0.0093026323 -0.01121731
[3,] 0.0427919974 -0.0112173116 -0.02755840

> C5
[,1] [,2] [,3]
[1,] 6.969627e-19 1.399622e-18 -3.094041e-18
[2,] 1.399622e-18 -1.051328e-02 1.234139e-02
[3,] -3.094041e-18 1.234139e-02 -1.448739e-02

Again, the first three faces (C1, C2, and C3) correspond to the parameter effects nonlinearity and the last
two (C4 and C5) to the intrinsic nonlinearity. We use these values with formula (3.18) to get the RMS
curvatures for parameter effects and intrinsic nonlinearity

cθ = { [1/(3(3+2))] Σ_{k=1}^{3} [ 2 Σ_{r=1}^{3} Σ_{s=1}^{3} c_{krs}² + ( Σ_{r=1}^{3} c_{krr} )² ] }^{1/2} = 0.16124,

cι = { [1/(3(3+2))] Σ_{k=4}^{5} [ 2 Σ_{r=1}^{3} Σ_{s=1}^{3} c_{krs}² + ( Σ_{r=1}^{3} c_{krr} )² ] }^{1/2} = 0.03611.
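A sketch of this computation in R, entering the faces C1–C5 of Table 3.7 by hand (symmetrized, with the numerically zero entries of C5 set to 0):

```r
# Faces of the relative curvature array C (Table 3.7)
C1 <- matrix(c( 0.167079417, -0.005451944, -0.188220720,
               -0.005451944,  0.003680740,  0.000867986,
               -0.188220720,  0.000867986,  0.144227621), 3, 3)
C2 <- matrix(c( 0.015859437, -0.004214736, -0.010146215,
               -0.004214736, -0.051214704,  0.028524631,
               -0.010146215,  0.028524631, -0.065626690), 3, 3)
C3 <- matrix(c( 0.058513386, -0.007761854, -0.046577168,
               -0.007761854,  0.020826800, -0.017061138,
               -0.046577168, -0.017061138, -0.006857230), 3, 3)
C4 <- matrix(c(-0.045347537,  0.000312148,  0.042791997,
                0.000312148,  0.009302632, -0.011217312,
                0.042791997, -0.011217312, -0.027558400), 3, 3)
C5 <- matrix(c( 0,            0,            0,
                0,           -0.010513280,  0.012341390,
                0,            0.012341390, -0.014487390), 3, 3)

# RMS curvature over a set of faces, as in formula (3.18):
# c = sqrt( sum_k [ 2*sum_{r,s} c_krs^2 + (sum_r c_krr)^2 ] / (p*(p+2)) )
rms <- function(faces, p = 3) {
  s <- sum(sapply(faces, function(Ck) 2 * sum(Ck^2) + sum(diag(Ck))^2))
  sqrt(s / (p * (p + 2)))
}

c_theta <- rms(list(C1, C2, C3))  # parameter effects: 0.1612
c_iota  <- rms(list(C4, C5))      # intrinsic: 0.0361
```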
Finally, we compute the RMS curvatures for the parameterizations in terms of the half-life and the total clearance. The relative curvature array can be computed in the same way as for the original parameterization. For the half-life parameterization we have cθ = 0.17636 and cι = 0.03611, and for the total clearance parameterization cθ = 0.21682 and cι = 0.03611.
As we can see, different parameterizations change the parameter effects curvature, while the intrinsic curvature remains constant. For the original parameterization, with F the 0.95 quantile of the F distribution with 3 and 5 degrees of freedom, we have cθ√F = 0.375 and cι√F = 0.084. For the half-life parameterization cθ√F = 0.410, and for the total clearance parameterization cθ√F = 0.504. If we compare these values with the 0.3 reference value given by Bates and Watts (1988), the parameter effects curvatures are somewhat high. Even so, for this specific model and design, the parameterization in terms of ka, kel, and VD seems to be the most convenient.
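The scaling factor √F is available directly in R; a quick check of the comparisons above (assuming F = F(0.95; 3, 5)):

```r
# 0.95 quantile of the F distribution with 3 and 5 degrees of freedom
sqrtF <- sqrt(qf(0.95, 3, 5))

# RMS parameter effects curvatures for the three parameterizations
c_theta <- c(original = 0.16124, half.life = 0.17636, clearance = 0.21682)
c_iota  <- 0.03611    # intrinsic curvature, identical in all three cases

round(c_theta * sqrtF, 3)   # 0.375, 0.410, 0.504
round(c_iota  * sqrtF, 3)   # 0.084
```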
An important feature to note is that in all three cases the nonlinearity caused by the parameterization is much larger than the intrinsic nonlinearity of the model. This characteristic was noted by Bates and Watts (1980) in a study in which they computed nonlinearity measures for 24 data sets.

4. Population Pharmacokinetics

Different individuals respond in different ways to drugs. Hence, under the reasonable assumption that in a specific situation (that is, for a given drug, dose, and administration route) the pharmacokinetic model describing the concentrations of the drug in the body over time is the same for all individuals, the estimated parameters will differ from individual to individual; in fact, the actual unknown individual parameters must differ. If we consider the different individual parameters as members of a population, then the new goal is to find a probabilistic description of this population. In this section we discuss this problem, and as usual, the description will be based mainly on the first two moments of the distribution.
In Section 3 we dealt with individual pharmacokinetics, so there we considered only intra-individual variation. Now, with several individuals, inter-individual variation is incorporated as well. In Section 4.1 we start by setting up the general model that adequately accommodates both sources of variability. In Section 4.2 we present two traditional methods, often used in the past, to estimate the population pharmacokinetic parameters, and in Section 4.3 we present a different approach, based on linearization of the model, which appears to be more efficient. In Section 4.4 we briefly discuss available software, and in Sections 4.5 and 4.6 we illustrate the different methods presented here with examples.

4.1. Hierarchical Nonlinear Models

The hierarchical nonlinear model gives the general framework to analyze repeated measures data in
nonlinear models, which is the kind of data that arise in the field of population pharmacokinetics. The
theory presented in this section is mainly based on Chapter 4 of Davidian and Giltinan (1995).

4.1.1. The Model

Let yij denote the jth response, j = 1,…, ni, for the ith individual, i = 1,…, m. The hierarchical nonlinear
model is given by

yij = f ( xij , θi ) + eij ,

where f(x_ij, θ_i) is a nonlinear function common to all individuals, which depends on a vector of covariates x_ij and a vector of possibly different parameters θ_i of dimension p × 1, and e_ij is the random error term associated with the jth response of the ith individual.

4.1.2. Intra-Individual Variation

We can summarize the data for the ith individual as


y i = f i ( θ i ) + ei , (4.1)
with y_i = [y_{i1}, ..., y_{in_i}]^T, f_i(θ_i) = [f(x_{i1}, θ_i), ..., f(x_{in_i}, θ_i)]^T, and e_i = [e_{i1}, ..., e_{in_i}]^T.

Intra-individual variation is specified by the systematic variation given through the function f and the
random variation characterized by an assumption on the conditional distribution of the random errors
given the ith individual. The general assumption is that
E ( ei | θi ) = 0 , Cov ( ei | θi ) = R i ( θi , ξ ) , (4.2)

with some defined probability distribution. The covariance matrix R (cf. (3.3)) depends on the parameters
of the model θi and on some intra-individual covariance parameters given in the vector ξ. The functional
form of Ri and the covariance parameters ξ are the same for all the individuals, so Ri differs across
individuals only through its dependence on θi. Although it is possible to extend the model for intra-individual covariance to allow the covariance parameters to vary from individual to individual (in fact, this is possible through R), Davidian and Giltinan (1995) recommend against doing so because it could be impossible to reliably estimate the elements of such a complicated structure. Note that for a given
individual i, this setting is equal to the one presented in Section 3.1. The most common assumption for the
conditional distribution of ei given θi is

ei | θi ∼ N ( 0, R i ( θi , ξ ) ) .

4.1.3. Inter-Individual Variation

Differences among individuals are specified through differences in the individual parameters θi. In order
to have a quite general model for variation among individuals, we define the following model for θi
θ i = d ( a i , β, b i ) . (4.3)

In this expression, d is a p-dimensional vector-valued function, a_i is an a × 1 covariate vector corresponding to individual characteristics of individual i, b_i is a k × 1 vector of random effects associated with individual i, and β is an r × 1 vector of fixed population parameters. Each element of d is
associated with the corresponding element of θi, so the functional relation may be different for each
element. Inter-individual random variation is specified by an assumption on the distribution of the random
effects bi. The general assumption is that bi has some distribution with mean 0 and covariance matrix D,
and that they are independent and identically distributed. As in the case of intra-individual variation, the
most common assumption is that
bi ∼ N ( 0, D ) . (4.4)

Note that this assumption does not imply normality for θi since its distribution will depend on the form of
the function d. For instance, pharmacokinetic parameters such as clearance exhibit skewed distributions
with constant coefficient of variation. If we model the individual parameters by
θ ri = β exp ( bri )

and assume a normal distribution for the random effects bri (here, the subscripts r and i refer to the rth
parameter for the ith individual) then the individual parameters θri will have a lognormal distribution
which is skewed with constant coefficient of variation.
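This can be illustrated with a small simulation; a sketch (the values of β and of the random-effect standard deviation σ are chosen arbitrarily):

```r
set.seed(1)
sigma <- 0.3                       # SD of the random effect b_ri (illustrative)
b  <- rnorm(1e5, 0, sigma)
cv <- function(z) sd(z) / mean(z)  # coefficient of variation

# theta_ri = beta * exp(b_ri) is lognormal; its coefficient of variation,
# sqrt(exp(sigma^2) - 1), does not depend on beta
cv(300 * exp(b))                   # e.g. beta = 300 (a volume-like parameter)
cv(5 * exp(b))                     # e.g. beta = 5 (a rate-like parameter)
sqrt(exp(sigma^2) - 1)             # theoretical CV, ~0.307
```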

4.1.4. Modelling

We can model different kinds of relations and assumptions with suitable choices of the function d, the
matrix D, and the matrices Ri.

Individual Parameters Distribution

In the previous section we have seen that, although the general assumption is that the random effects have
a normal distribution, an adequate form of the function d will enable us to assume a different distribution
for the individual parameters. Similarly, we can incorporate a systematic dependence of the individual
parameters on subject characteristics by including some covariates in the function d. For example, if we
know that a parameter such as clearance depends on weight, we can define the function

θ ri = ( β1 + β 2 wi ) exp ( bri ) ,
with wi the weight of the ith individual. Furthermore, not all the pharmacokinetic parameters of the model need to be random. We can specify some parameters as fixed and others as random by setting the corresponding elements of the vectors bi to zero.

Treatment Comparisons

The effect of different treatments can also be incorporated in the analysis through the function d. For instance, consider an experiment to test two treatments in an open one-compartment model with intravascular administration (cf. Section 2.1.1). We can assume that both parameters, VD and kel, vary
between treatments by assuming that
θ_i = A_i β + b_i,

with β = (β1, β2, β3, β4)^T, where β1 and β2 represent the fixed effects for the parameters VD and kel for the first treatment, and β3 and β4 represent the corresponding fixed effects for the second treatment. If each individual is subjected to just one treatment, then the matrix A_i would be of the form

A_i = [I2 | 02×2]

if individual i receives the first treatment and

A_i = [02×2 | I2]

if individual i receives the second treatment.

if individual i receives the second treatment. If we assume that just one parameter varies between
treatments (let us say kel), then we can assume that

(θ1i, θ2i)^T = [1 0 0; 0 1 0] (β1, β2, β3)^T + (b1i, b2i)^T

for the first treatment and

(θ1i, θ2i)^T = [1 0 0; 0 0 1] (β1, β2, β3)^T + (b1i, b2i)^T

for the second treatment, with β1 the fixed effect for VD.
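The second specification can be written out directly; a minimal sketch in R (the numeric values of β and b_i are arbitrary illustrative choices):

```r
beta <- c(300, 0.25, 0.35)    # (beta1, beta2, beta3): Vd, kel trt 1, kel trt 2
b_i  <- c(-20, 0.02)          # random effects (b_1i, b_2i) for individual i

A1 <- rbind(c(1, 0, 0),       # design matrix if i receives treatment 1
            c(0, 1, 0))
A2 <- rbind(c(1, 0, 0),       # design matrix if i receives treatment 2
            c(0, 0, 1))

# theta_i = A_i beta + b_i gives (Vd_i, kel_i)
theta_trt1 <- drop(A1 %*% beta) + b_i   # c(280, 0.27)
theta_trt2 <- drop(A2 %*% beta) + b_i   # c(280, 0.37)
```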

Inter-Individual and Intra-Individual Structure

We can assume uncorrelated random effects by considering a diagonal matrix D. If we assume that some random effects are correlated and some are not, a block diagonal matrix is adequate. Autocorrelation and nonconstant variances in the measurements on an individual can be incorporated through the matrices Ri.

4.2. Traditional Approaches

In this section we present a brief description of two traditional approaches to the population parameter estimation problem. Since these approaches are well known and easy to implement, we will not go into further details, and we will pay more attention to the approach presented in Section 4.3.

4.2.1. The Naïve Pooled Data Approach (NPD)

In this approach all individuals' data are pooled together, as though there were no differences among individuals, and analyzed using a nonlinear regression model as though they had all come from one individual. Because this method ignores individuals, intra- and inter-individual variation are combined into a single error term. Although the method has the advantage of simplicity, it produces biased and imprecise estimators, as was shown by Sheiner and Beal in a series of papers where they applied it to estimate the pharmacokinetic parameters of three models: the Michaelis-Menten model (Sheiner and Beal, 1980), the biexponential model (Sheiner and Beal, 1981), and the monoexponential model (Sheiner and Beal, 1983). As a result of their research, the authors suggested that this approach be abandoned.

4.2.2. The Two-Stage Approach

This approach is useful when there are enough measurements per individual to fit individual models. Then, in a second stage, the individual parameter estimates are used as building blocks to obtain
estimators for the population parameters.
In the first stage the individual parameters are estimated using nonlinear regression methods in a similar
way as described in Section 3.1.2. If we consider a model as in (4.1) with a general covariance structure
as in (4.2), we can use the same iterative process described in Section 3.1.2 to obtain the individual
parameters with a slight modification such that the functional form of Ri and the intra-individual
covariance parameter ξ remain the same across individuals. We can do that by fitting the m individual
regressions simultaneously and then using the residuals from all these fits to estimate ξ.
In the second stage, following the model specification given in (4.3) and (4.4) for the inter-individual
variation, the objective is to estimate the population coefficients given in β and the covariance matrix D.
In the traditional method, named the Standard Two-Stage method (STS), the individual estimates θˆ i are
considered as if they were the true parameters θi . In the simplest case where θi = β + b i , we have that the
θi are independent and identically distributed N ( β, D ) , and then the STS estimates are the sample mean
and covariance of the θˆ i
θ̂_STS = (1/m) Σ_{i=1}^{m} θ̂_i,

D̂_STS = [1/(m − 1)] Σ_{i=1}^{m} (θ̂_i − θ̂_STS)(θ̂_i − θ̂_STS)^T.
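For instance, with the four individual estimates that will be obtained in Table 4.2 for the example of Section 4.5, the STS computation is a short sketch in R:

```r
# Individual estimates theta_hat_i (one row per subject), from Table 4.2
theta.hat <- rbind(c(5.521096, 0.2438301, 451.3271),
                   c(3.046351, 0.3135334, 220.1265),
                   c(8.984343, 0.3416069, 192.8556),
                   c(8.242108, 0.1410363, 353.0241))
colnames(theta.hat) <- c("Ka", "Kel", "Vd")

theta.STS <- colMeans(theta.hat)   # sample mean: 6.448, 0.260, 304.3
D.STS     <- cov(theta.hat)        # sample covariance (divisor m - 1)
```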
Since no account is taken of the uncertainty in estimating θi, the STS estimator for the variances in D is
upwardly biased. A further drawback of the STS method is that no refinement of the individual θˆ i such
as shrinkage toward the mean is implemented (Davidian and Giltinan, 1995). Sheiner and Beal (1980, 1981, 1983) suggest using the geometric mean instead of the arithmetic mean to estimate θ. Their
suggestion is based on the assumption that the functional relation (4.3) for pharmacokinetic parameters is
usually of the form
ln ( θi ) = ln ( β ) + b i .

These authors mention that in the data-rich situation the population pharmacokinetic parameter estimates obtained with this approach are as good as the ones obtained with the linearization-based methods that we treat in the next section. For the random effects, however, they mention that the inter-individual variance estimates are biased and imprecise.
Davidian and Giltinan (1995) recommend not using the STS method because of the drawbacks mentioned above. As an alternative they discuss the Global Two-Stage method (GTS), which does incorporate the uncertainty of estimating θ_i. However, iterative methods are needed to obtain the GTS estimators, which makes it computationally considerably more demanding than the STS method.

4.3. Inference Based on Linearization

The NPD approach uses all the data points but as if they came from a single individual. On the other hand, the Two-Stage approach treats the data of each individual as if they were single data points. The linearization approach takes a middle course between these two: it pools all the data while recognizing the individuals they come from. An advantage over the Two-Stage approach is that it can use data from all the individuals, whereas in the Two-Stage approach individuals without enough data points to fit the individual models must be discarded. Therefore, this approach is particularly valuable when extensive measurements are not available on all the subjects, which is typically the case in pharmacokinetics with routine-type data8.
Consider the model presented in Section 4.1
y i = f i ( θ i ) + e i , θ i = d ( a i , β, b i ) ,

with the general assumption that

ei | θi ∼ ( 0, R i ( θi , ξ ) ) , bi ∼ ( 0, D ) .

There are two problems in the estimation of this model: the random effects enter the model in a nonlinear fashion, and e_i and b_i are not independent, since the distribution of e_i depends on θ_i, which in turn depends on b_i. This approach is based on a linearization, by Taylor series expansion, of the hierarchical nonlinear model in such a way that both problems are solved and the marginal distribution of y_i may be computed. In this section we present two linear approximations: the first-order linearization suggested by
computed. In this section we present two linear approximations: the first-order linearization suggested by
Beal and Sheiner (1982) and a refinement of it, the conditional first-order linearization suggested by
Lindstrom and Bates (1990). This discussion is mainly based on Chapter 6 of Davidian and Giltinan
(1995).

4.3.1. First-Order Linearization

The first-order linearization scheme proceeds as follows. Let us consider

e_i = R_i^{1/2}(θ_i, ξ) ε_i,

with R_i^{1/2}(θ_i, ξ) the Cholesky decomposition of R_i(θ_i, ξ). Hence, ε_i has mean zero, covariance matrix I_{n_i}, and is independent of b_i. The model given in (4.1) may be written as

y_i = f_i(d(a_i, β, b_i)) + R_i^{1/2}(d(a_i, β, b_i), ξ) ε_i.    (4.5)

In this expression we replace θ_i by d(a_i, β, b_i) to show explicitly the dependence on the random effects b_i. A Taylor series expansion of (4.5) in b_i about its mean E(b_i) = 0, retaining the first two terms in the expansion of f_i(d(a_i, β, b_i)) and the first term in the expansion of R_i^{1/2}(d(a_i, β, b_i), ξ)ε_i, produces the approximation

y_i ≈ f_i(d(a_i, β, 0)) + F_i(β, 0) ∆_{bi}(β, 0) b_i + R_i^{1/2}(d(a_i, β, 0), ξ) ε_i,    (4.6)

with F_i(β, 0) the n_i × p matrix of derivatives of f_i(θ_i) with respect to θ_i evaluated at θ_i = d(a_i, β, 0), and ∆_{bi}(β, 0) the p × k matrix of derivatives of d(a_i, β, b_i) with respect to b_i evaluated at b_i = 0. Defining the n_i × k matrix Z_i(β, 0) = F_i(β, 0) ∆_{bi}(β, 0) and e*_i = R_i^{1/2}(d(a_i, β, 0), ξ) ε_i, (4.6) may be written as

y_i ≈ f_i(d(a_i, β, 0)) + Z_i(β, 0) b_i + e*_i.    (4.7)

8 These are the data collected from the routine care of patients receiving the drug of interest. In these cases, one usually takes a few samples per individual in a large group of individuals. This kind of data is quite important because it comes from the target population and not from healthy subjects.

From (4.7), the mean and covariance of y_i may be specified by

E(y_i) ≈ f_i(d(a_i, β, 0)),
V(y_i) ≈ Z_i(β, 0) D Z_i^T(β, 0) + R_i(d(a_i, β, 0), ξ).    (4.8)

If b i and e∗i are assumed to be normally distributed, it follows from (4.7) that the marginal distribution of
yi may be taken as approximately normal with moments given by (4.8). Maximum likelihood estimation
is based on taking the model (4.7) as exact and a normal distribution for b i and e∗i , and this is the
framework used by Sheiner and Beal (1980, 1981, 1983) in the studies where they compare the NPD,
STS, and NONMEM9 approaches. Generalized least squares methods rely on the assumption that the
model in (4.7) and the moments in (4.8) are exact, and are inspired by multivariate extensions of the individual nonlinear regression models.
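The role of the Cholesky factor R_i^{1/2} used above can be checked in a few lines of R. Note that R's chol() returns the upper triangular factor U with UᵀU = R_i, so the lower triangular factor is its transpose (the matrix R_i below is an arbitrary positive definite example):

```r
# An arbitrary positive definite intra-individual covariance matrix
Ri <- matrix(c(4, 1, 0,
               1, 3, 1,
               0, 1, 2), 3, 3)

U <- chol(Ri)   # upper triangular: t(U) %*% U reproduces Ri
L <- t(U)       # lower triangular "square root" of Ri

max(abs(L %*% t(L) - Ri))   # 0 up to rounding

# Hence, if eps has mean 0 and covariance I, then e_i = L %*% eps
# has covariance L %*% t(L) = Ri, as required by the construction above
```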

4.3.2. Conditional First-Order Linearization

Lindstrom and Bates (1990) argue that the approximation given in (4.6) by Taylor series expansion about b_i = 0 may be poor. Instead, they propose a Taylor series expansion of (4.5) about some value b*_i closer to b_i than 0. Then the linear approximation given in (4.6) becomes

y_i ≈ f_i(d(a_i, β, b*_i)) + F_i(β, b*_i) ∆_{bi}(β, b*_i)(b_i − b*_i) + R_i^{1/2}(d(a_i, β, b*_i), ξ) ε_i,    (4.9)

with F_i(β, b*_i) the n_i × p matrix of derivatives of f_i(θ_i) with respect to θ_i evaluated at θ_i = d(a_i, β, b*_i), and ∆_{bi}(β, b*_i) the p × k matrix of derivatives of d(a_i, β, b_i) with respect to b_i evaluated at b_i = b*_i. Defining, in a similar way as in the previous section, the n_i × k matrix Z_i(β, b*_i) = F_i(β, b*_i) ∆_{bi}(β, b*_i) and e*_i = R_i^{1/2}(d(a_i, β, b*_i), ξ) ε_i, (4.9) may be written as

y_i ≈ f_i(d(a_i, β, b*_i)) − Z_i(β, b*_i) b*_i + Z_i(β, b*_i) b_i + e*_i,

and the mean and covariance of y_i are specified by

E(y_i) ≈ f_i(d(a_i, β, b*_i)) − Z_i(β, b*_i) b*_i,
V(y_i) ≈ Z_i(β, b*_i) D Z_i^T(β, b*_i) + R_i(d(a_i, β, b*_i), ξ).
Estimation of this model requires a reasonable choice of b*_i. The strategy suggested by Lindstrom and Bates (1990) consists of obtaining a suitable estimate of b_i, using this value as b*_i in an ML or GLS estimation procedure, and using the resulting ML or GLS estimates to update the estimate of b_i. The process is then iterated, with new values of b*_i at each iteration.

Lindstrom and Bates (1990) developed their work under a more restricted model than the one presented in Section 4.1. They assumed that d is a linear function of the fixed and random effects given in β and b_i, respectively, and that the intra-individual covariance matrix R_i(θ_i, ξ) does not depend on θ_i (and therefore not on b_i) and depends on i only through its dimension. Davidian and Giltinan (1995) extend their technique to accommodate the more general model presented in Section 4.1.

9 NONMEM is an acronym for analysis of nonlinear mixed effects models. In the pharmacokinetic literature, this term refers to the analysis of hierarchical nonlinear models via linearization.

4.4. Software

Fitting a hierarchical nonlinear model by linearization (as discussed in Section 4.3) is, from a computational point of view, considerably more difficult than fitting a nonlinear model with data from a single individual. Even in the simplest case, when both the conditional distribution of e_i given θ_i and the distribution of b_i are normal and the covariance matrix R_i is given by σ²I_{n_i} (that is, under the assumption of uncorrelated random errors with common variance), the algorithms to fit the model may fail to converge or run into problems with singular matrices. In these cases it is necessary to try different initial values and to perform a meticulous data cleaning process. Here we discuss some capabilities of R and SAS to estimate these models. WinNonlin does not provide population pharmacokinetic analysis.

4.4.1. R

The nlme package fits nonlinear mixed effects models with the linearization method proposed by Lindstrom and Bates (1990) and presented in Section 4.3.2, but it also allows for nested random effects (we will see this feature in the example of Section 4.6). A normal distribution is assumed for both components of variation, and the intra-individual errors are allowed to be correlated and to have unequal variances (that is, a complete specification as in (4.2)). The parameters of the model can be linear functions of fixed and random effects, so the function d given in (4.3) is defined by
θi = A i β + B i b i ,

with Ai and Bi p×r and p×k design matrices respectively. Although this specification is more restrictive
than (4.3), it offers a considerable range of possibilities for modelling inter-individual variation. For
instance, as shown in Section 4.1.4, we can specify different fixed effects among groups of individuals by
defining different design matrices Ai for each group; in such a way, different treatment groups may be
compared. In an analogous way, random effects with different normal distributions may be assigned to
different groups. Dependence of the fixed and random effects on covariates may be included.

4.4.2. SAS

The procedure NLMIXED fits nonlinear mixed effects models. We can specify different distributions for e_i | θ_i; available options are normal, binomial, gamma, negative binomial, Poisson, or a general distribution defined using SAS programming statements. For the random effects b_i, only the normal distribution is available.
PROC NLMIXED fits the models by numerically maximizing an approximation to the marginal
likelihood, that is, the likelihood integrated over the random effects. The principal methods to
approximate this integral are adaptive Gaussian quadrature and a first-order Taylor series approximation
around zero. This procedure does not offer the possibility of modelling correlation structures or unequal variances for the intra-individual errors.
Another option in SAS to fit nonlinear mixed effects models is the %NLINMIX macro. This macro works by linearization of the nonlinear mixed effects model as described in Section 4.3. Both the first-order linearization and the conditional first-order linearization, treated in Sections 4.3.1 and 4.3.2, are available. The methods implemented in this macro are closer to the method used by R than those implemented in the NLMIXED procedure, and consequently, the results obtained with %NLINMIX are also closer to the R results.

4.5. Example 1: One-Compartment Model with Extravascular Administration

4.5.1. Data and Nonlinear Model

In this example we analyze data from an open one-compartment model with intranasal administration of drug, gathered from 4 individuals. This example is a continuation of the one treated in Section 3.5, and the goal now is to estimate the population parameters. The subject analyzed in Section 3.5 is also included in the current data (subject 1). The data are presented in Table 4.1, and they are stored in the ex2 grouped data object for the R computations.

TABLE 4.1 Concentration time data for 4 subjects after intranasal administration of 5 mg of drug.
Concentration (mg/l)
Time (hrs) Subject 1 Subject 2 Subject 3 Subject 4
0.00 0.0000 0.0000 0.0006 0.0007
0.25 0.0081 0.0128 0.0221 0.0127
0.50 0.0092 0.0152 0.0213 0.0108
0.75 0.0098 0.0164 0.0216 0.0127
1.00 0.0089 0.0180 0.0199 0.0121
2.00 0.0072 0.0147 0.0129 0.0152
4.00 0.0043 0.0055 0.0066 0.0072
6.00 0.0027 - 0.0036 0.0051
8.00 - 0.0034 0.0024 -

In the same way as in Section 3.5, for each individual we have the following pharmacokinetic model

C(t) = [D · f · ka / (VD (ka − kel))] [e^(−kel·t) − e^(−ka·t)].

Considering f = 1 and D = 5 mg, we express the model for individual i, following the standard nonlinear regression notation, by

f(x, θi) = [5θ1i / (θ3i (θ1i − θ2i))] [e^(−θ2i·x) − e^(−θ1i·x)],

with θ1i = ka, θ2i = kel, and θ3i = VD for individual i. In Figure 4.1 we present the concentration-time data for the four individuals.

> plot(ex2)

FIGURE 4.1 Concentration of drug over time after extravascular administration for four subjects
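As a small sketch of this model's behaviour in R, the concentration curve peaks at t_max = ln(ka/kel)/(ka − kel), a standard result for the one-compartment extravascular model; here it is checked for subject 1 with the parameter values that will be estimated in Table 4.2:

```r
# One-compartment extravascular model with f = 1 and D = 5 mg
conc <- function(t, ka, kel, vd) {
  5 * ka / (vd * (ka - kel)) * (exp(-kel * t) - exp(-ka * t))
}

ka <- 5.521096; kel <- 0.2438301; vd <- 451.3271   # subject 1, Table 4.2

# Time of the concentration peak: t_max = ln(ka/kel) / (ka - kel)
tmax <- log(ka / kel) / (ka - kel)                 # ~0.59 h

# A numerical search over the observation window finds the same maximum
opt <- optimize(conc, c(0, 8), ka = ka, kel = kel, vd = vd, maximum = TRUE)
```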

4.5.2. Analysis of the Population Pharmacokinetic Model

Here we consider the simplest setting, that is, that the intra-individual random errors are independent and
have normal distributions with constant variance, and that the random effects have normal distribution.
Hence, the hierarchical nonlinear model is defined by
y i = f i ( θ i ) + ei ,

with the assumptions

( )
ei | θi ∼ N 0, σ 2 I ni , θi = β + b i , bi ∼ N ( 0, D ) .

We start by fitting an individual model for each subject using the nlsList function; the results are
stored in the oneEV.lis object and are shown in Table 4.2. The initial values for the parameters
required for the estimation process are, as in Section 3.5, θ1 = 5, θ2 = 0.2, and θ3 = 500.

TABLE 4.2 Individual analysis using the nlsList function.


> oneEV.lis<-nlsList(Conc~5*Ka/(Vd*(Ka-Kel))*(exp(-Kel*Time)-exp(-Ka*Time)),
+ data=ex2, start=c(Ka=5, Kel=0.2, Vd=500)); oneEV.lis

Call:
Model: Conc ~ 5 * Ka/(Vd * (Ka - Kel)) * (exp(-Kel * Time) - exp(-Ka * Time)) |
Subject

Coefficients:
Ka Kel Vd
1 5.521096 0.2438301 451.3271
2 3.046351 0.3135334 220.1265
3 8.984343 0.3416069 192.8556
4 8.242108 0.1410363 353.0241

Degrees of freedom: 33 total; 21 residual


Residual standard error: 0.001414658

> fixed.effects(oneEV.lis)
Ka Kel Vd
6.4484745 0.2600017 304.3333496

The estimated parameters for subject 1 are equal to the ones obtained in Section 3.5 and presented in
Table 3.2. The function fixed.effects extracts fixed effects estimates, and they are shown at the end
of Table 4.2; in this case they are just the arithmetic means of the individual estimates10.
Pinheiro and Bates (2000) recommend using a diagonal matrix D when the number of random effects is large relative to the number of individuals. The reason is that a general positive definite structure for D would include too many parameters in the model, and this can lead to convergence problems due to an overparameterized model. Here, we follow their recommendation, and in Table 4.3 we present the results of the analysis considering a diagonal matrix D, that is, assuming that the random effects are uncorrelated.
As we can see on top of Table 4.3, the nlme function is applied directly over the oneEV.lis object
which contains the results of the first fitting. In this way, the fixed effects estimates obtained with the
nlsList function and presented in Table 4.2 are used as initial values. The results obtained with the
nlme function are stored in the oneEV1.nlme object.
In this analysis we consider that all the individual parameters, ka, kel, and VD, contain a random effect. However, it may sometimes be the case that some of the parameters can be considered fixed across subjects. We will use an ANOVA procedure to decide whether the ka parameter can be considered fixed, though in practice additional non-statistical criteria must be considered as well. To do that, we fit a second model, oneEV2.nlme, where just kel and VD have random effects, and then we compare the two models using a likelihood-ratio test. We present the fitted model in Table 4.4 and the ANOVA results
10 This corresponds to the STS approach treated in Section 4.2.2.

in Table 4.5. Comparing models 1 and 2, we see that the log-likelihood values do not differ very
much, and that the AIC and BIC values obtained with model 2 are lower. These results support the idea
that model 2 is more suitable. Indeed, the high p-value suggests that the more complicated model 1 does
not fit the data significantly better than model 2, so we will consider that the parameter ka has only a fixed
effect. In addition, we performed the same test for the kel and VD parameters, but in both cases the p-values
were significant (0.0182 and <0.0001, respectively).
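The test reported in Table 4.5 can be reproduced by hand from the log-likelihoods of Tables 4.3 and 4.4. The following sketch (in Python, purely for the arithmetic; it is not part of the original R session) recovers the reported statistic and p-value. Note that the usual chi-square reference with one degree of freedom is conservative here, because the standard deviation of a random effect is tested on the boundary of its parameter space.

```python
import math

# Log-likelihoods of model 1 (Table 4.3) and model 2 (Table 4.4)
ll_model1, ll_model2 = 157.8977, 157.2821

lrt = 2 * (ll_model1 - ll_model2)        # likelihood-ratio statistic
# Models differ by one parameter (the SD of the ka random effect), so df = 1;
# the chi-square(1) survival function equals erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(lrt / 2))
print(round(lrt, 4), round(p_value, 4))  # matches the 1.2312 and 0.2672 of Table 4.5
```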

TABLE 4.3 Estimated hierarchical nonlinear model under the assumption of uncorrelated
random effects.
> oneEV1.nlme<-nlme(oneEV.lis, random = pdDiag(Ka+Kel+Vd~1)); oneEV1.nlme

Nonlinear mixed-effects model fit by maximum likelihood


Model: Conc ~ 5 * Ka/(Vd * (Ka - Kel)) * (exp(-Kel * Time) - exp(-Ka * Time))
Log-likelihood: 157.8977
Fixed: list(Ka ~ 1, Kel ~ 1, Vd ~ 1)
Ka Kel Vd
5.3135642 0.2657170 294.2156330

Random effects:
Formula: list(Ka ~ 1, Kel ~ 1, Vd ~ 1)
Level: Subject
Structure: Diagonal
Ka Kel Vd Residual
StdDev: 1.605036 0.0667344 90.40686 0.001403244

Number of Observations: 33
Number of Groups: 4

TABLE 4.4 Estimated hierarchical nonlinear model under the assumption of uncorrelated
random effects. Parameters kel and VD are considered as random and ka as fixed.
> oneEV2.nlme<-update(oneEV1.nlme, random = pdDiag(Kel+Vd~1)); oneEV2.nlme

Nonlinear mixed-effects model fit by maximum likelihood


Model: Conc ~ 5 * Ka/(Vd * (Ka - Kel)) * (exp(-Kel * Time) - exp(-Ka * Time))
Log-likelihood: 157.2821
Fixed: list(Ka ~ 1, Kel ~ 1, Vd ~ 1)
Ka Kel Vd
6.2177185 0.2502124 305.4877720

Random effects:
Formula: list(Kel ~ 1, Vd ~ 1)
Level: Subject
Structure: Diagonal
Kel Vd Residual
StdDev: 0.07140224 93.36056 0.001514504

Number of Observations: 33
Number of Groups: 4

TABLE 4.5 ANOVA procedure to test if the random effect for the ka parameter can be removed.
> anova(oneEV1.nlme, oneEV2.nlme)

            Model df       AIC       BIC   logLik   Test  L.Ratio p-value
oneEV1.nlme     1  7 -301.7953 -291.3198 157.8977
oneEV2.nlme     2  6 -302.5641 -293.5851 157.2821 1 vs 2 1.231189  0.2672

In Table 4.6 we explore a possible correlation between kel and VD. To do that, we update the model stored
in oneEV2.nlme with a general covariance structure. In Table 4.7 we compare both models with an
ANOVA procedure. The high negative correlation obtained between these two random effects (-0.704)
suggests that model 3 could be appropriate, but the ANOVA results show that this more complicated
model does not fit the data significantly better. Therefore, we choose model 2, stored in the
oneEV2.nlme object, as the more appropriate model for these data.

TABLE 4.6 Estimated hierarchical nonlinear model under the assumption of a general
covariance structure. Parameters kel and VD are considered as random and ka as fixed.
> oneEV3.nlme<-update(oneEV2.nlme, random = Kel+Vd~1); oneEV3.nlme

Nonlinear mixed-effects model fit by maximum likelihood


Model: Conc ~ 5 * Ka/(Vd * (Ka - Kel)) * (exp(-Kel * Time) - exp(-Ka * Time))
Log-likelihood: 158.1040
Fixed: list(Ka ~ 1, Kel ~ 1, Vd ~ 1)
Ka Kel Vd
6.188203 0.243500 307.955015

Random effects:
Formula: list(Kel ~ 1, Vd ~ 1)
Level: Subject
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
Kel 0.074010013 Kel
Vd 98.107180378 -0.704
Residual 0.001513284

Number of Observations: 33
Number of Groups: 4

TABLE 4.7 ANOVA procedure to test if a correlation between the random effects of kel and VD
must be included in the model.
> anova(oneEV2.nlme, oneEV3.nlme)

            Model df       AIC       BIC   logLik   Test  L.Ratio p-value
oneEV2.nlme     1  6 -302.5641 -293.5851 157.2821
oneEV3.nlme     2  7 -302.2080 -291.7325 158.1040 1 vs 2 1.643892  0.1998

4.5.3. Heterogeneity of Variances

In the analysis so far we have assumed that the random errors have constant variance. However, with
concentration-time data, the variance of the random errors at measurement time t typically depends on the
concentration at that time. We can see this relation in Figure 4.2, where the random errors are more
dispersed for larger fitted values.
We can use the varPower function in R to model the variance of the random errors as a power function
of the concentrations. Following the general specification in (4.2),
Cov(ei | θi) = Ri(θi, ξ),

we will assume that the covariance matrices Ri are diagonal matrices with elements given by

Var(eij) = σ²·yij^(2δ),

so the standard deviation of the jth random error for individual i is proportional to some power δ of the
corresponding jth concentration¹¹. Therefore, in this case the vector of covariance parameters ξ contains
the scale parameter σ and the power parameter δ. With the varPower function in R, the best value for δ is
estimated, which, as shown in Table 4.8, turns out to be 0.6063. In Figure 4.3 we present the plot of residuals for this
new model, which looks much better than the one shown in Figure 4.2.
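Under this variance model the residual standard deviation scales as a power of the concentration, which is exactly the fan shape visible in Figure 4.2. A minimal sketch, using the rounded σ̂ and δ̂ later reported in Table 4.8 (Python used only to illustrate the arithmetic):

```python
import math

# Rounded fitted values from Table 4.8
sigma, delta = 0.02704, 0.6063

def sd_error(y):
    """Intra-individual SD under the power model: sd(e_ij) = sigma * y_ij**delta."""
    return sigma * y ** delta

# The residual SD grows with the fitted concentration: a tenfold higher
# concentration multiplies the SD by 10**delta (about 4)
print(sd_error(0.002) < sd_error(0.010) < sd_error(0.020))  # → True
```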

> plot(oneEV2.nlme)

FIGURE 4.2 Standardized residuals versus fitted values for model oneEV2.nlme

TABLE 4.8 Estimated hierarchical nonlinear model considering heterogeneity of variances for
the intra-individual random errors.
> oneEV4.nlme<- update(oneEV2.nlme, weights = varPower(0,form=~Conc+0.00001))
> oneEV4.nlme

Nonlinear mixed-effects model fit by maximum likelihood


Model: Conc ~ 5 * Ka/(Vd * (Ka - Kel)) * (exp(-Kel * Time) - exp(-Ka * Time))
Log-likelihood: 168.5022
Fixed: list(Ka ~ 1, Kel ~ 1, Vd ~ 1)
Ka Kel Vd
5.9268517 0.2555056 306.0218075

Random effects:
Formula: list(Kel ~ 1, Vd ~ 1)
Level: Subject
Structure: Diagonal
Kel Vd Residual
StdDev: 0.05066358 89.05437 0.02703865

Variance function:
Structure: Power of variance covariate
Formula: ~Conc + 1e-05
Parameter estimates:
power
0.6062694
Number of Observations: 33
Number of Groups: 4

¹¹ As we can see in the first line of Table 4.8, a small quantity (0.00001) is added to each concentration in the weight
function to avoid computational problems.
TABLE 4.8 Continuation.
> ranef(oneEV4.nlme)
Kel Vd
1 0.001624147 125.86434
2 -0.007037481 -39.39552
3 0.060773043 -108.27353
4 -0.055359709 21.80472

> plot(oneEV4.nlme)

FIGURE 4.3 Standardized residuals versus fitted values for model oneEV4.nlme

Finally, we compare models oneEV2.nlme and oneEV4.nlme with an ANOVA procedure. The
results of Table 4.9 confirm our conclusions from the residual plots. We can see that allowing for
heterogeneity of variances considerably improves the fit of the model.

TABLE 4.9 ANOVA procedure to compare the models with constant and nonconstant variances
> anova(oneEV2.nlme, oneEV4.nlme)

            Model df       AIC       BIC   logLik   Test  L.Ratio p-value
oneEV2.nlme     1  6 -302.5641 -293.5851 157.2821
oneEV4.nlme     2  7 -323.0044 -312.5289 168.5022 1 vs 2 22.44028  <.0001

Summarizing, the final proposed model is

    yij = [5β1 / ((β3 + b3i)(β1 − β2 − b2i))]·[exp(−(β2 + b2i)·x) − exp(−β1·x)] + eij,

with β1, β2, and β3 the fixed effects (population parameters), and b2i and b3i the random effects for
individual i. The individual parameters are given by β1, β2 + b2i, and β3 + b3i. For the random effects we
have that

    bi = (b2i, b3i)ᵀ ~ N( (0, 0)ᵀ, [ d22   0
                                      0   d33 ] ).

The estimated fixed and random effects are (from Table 4.8) β̂1 = 5.9269, β̂2 = 0.2555, β̂3 = 306.0218,
b̂21 = 0.0016, b̂22 = −0.0070, b̂23 = 0.0608, b̂24 = −0.0554, b̂31 = 125.86, b̂32 = −39.40, b̂33 = −108.27,
and b̂34 = 21.80. For the random errors, eij, we assume that they are independent and have a normal
distribution with mean 0 and variance given by

    Var(eij) = σ²·yij^(2δ),

where σ̂ = 0.02704 and δ̂ = 0.6063. In Figure 4.4 we present the population and individual fitted curves
for this last model.
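As a numerical check of the fitted population curve (the random effects set to zero), the sketch below evaluates the mean model at the fixed-effect estimates of Table 4.8; Python is used here purely for the arithmetic. The resulting peak of about 0.014, reached a little after half a time unit, is consistent with the vertical scale of Figure 4.4.

```python
import math

# Fixed-effect estimates from Table 4.8: beta1 = ka, beta2 = kel, beta3 = VD
b1, b2, b3 = 5.9269, 0.2555, 306.0218

def pop_conc(t):
    """Population curve of the final model (dose D = 5, random effects = 0)."""
    return 5 * b1 / (b3 * (b1 - b2)) * (math.exp(-b2 * t) - math.exp(-b1 * t))

t_max = math.log(b1 / b2) / (b1 - b2)   # time of the population peak
print(round(t_max, 2), round(pop_conc(t_max), 4))
```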

> plot(augPred(oneEV4.nlme, level=0:1))

FIGURE 4.4 Population and individual fitted curves for the final model

4.6. Example 2: Comparison of Two Treatments in the One-Compartment Model with Extravascular Administration

4.6.1. Data and Nonlinear Model

In this example we analyze data from an open one-compartment model with oral administration of drug,
gathered from 6 subjects. The main goal is to compare two different formulations (treatments), labeled
E and PO, in terms of their effects on the pharmacokinetic parameters, especially on the constant
of absorption. Each treatment was applied twice to each subject in randomized order, with a
sufficiently long time between applications to completely wash out the previous dose before the
administration of the next one. Hence we have four repetitions of the experiment per subject (two with
each treatment), which gives a total of 24 individual data sets. To avoid confusion we must keep in mind
that "subjects" and "individuals" are not the same; we will call each of the 24 subject-treatment-repetition
combinations an "individual", and we will refer to them in the R code as "Unit" (and we will use
"Subject" and "Treat" for the other two classification criteria). We show these data in Table 4.10, and
they are stored, for the R computations, in the grouped data object ex3.

TABLE 4.10 Concentration time data for 6 subjects after oral administration of 1.5 g of drug, with
two treatments and two repetitions per treatment.
Time Concentration (mg/l)
(min) Treat Rep Sub 1 Sub 2 Sub 3 Sub 4 Sub 5 Sub 6
5 E 1 2.374 0.000 0.000 0.000 0.000 0.000
10 E 1 7.711 1.860 4.672 2.177 0.000 0.000
15 E 1 16.374 12.277 10.115 9.918 0.000 2.026
20 E 1 21.817 13.623 16.374 13.940 3.735 4.022
30 E 1 16.269 13.003 19.202 16.102 5.519 7.363
40 E 1 14.711 14.862 25.884 14.454 10.327 8.089
50 E 1 14.817 16.102 20.774 12.489 10.841 7.771
60 E 1 14.182 15.785 20.562 12.186 12.413 6.320
75 E 1 11.052 13.517 15.966 15.074 10.735 6.426
90 E 1 11.990 12.277 14.500 15.376 9.903 7.469
105 E 1 9.586 11.355 13.668 13.214 9.797 8.301
120 E 1 8.542 9.193 12.307 12.700 9.586 8.920
150 E 1 7.076 8.361 10.417 10.735 9.903 7.666
180 E 1 6.138 6.925 9.072 8.875 8.542 6.214
240 E 1 4.884 4.657 6.033 6.501 5.413 4.642
5 E 2 2.132 0.000 0.000 0.000 0.000 0.000
10 E 2 5.277 1.512 0.771 2.797 0.423 0.000
15 E 2 12.171 4.748 1.194 6.290 1.149 6.154
20 E 2 18.219 7.469 7.257 10.826 8.164 7.363
30 E 2 21.772 13.320 14.560 14.530 13.320 9.676
40 E 2 18.854 14.363 24.070 13.683 18.370 12.489
50 E 2 11.430 12.685 21.046 12.171 15.694 12.398
60 E 2 12.368 12.791 16.238 15.195 14.243 12.700
75 E 2 10.599 12.065 16.556 14.469 13.003 15.014
90 E 2 9.767 9.344 14.772 12.262 11.355 17.024
105 E 2 9.344 8.618 13.623 11.324 11.566 12.700
120 E 2 7.877 8.406 11.959 10.493 10.946 9.873
150 E 2 7.666 8.195 9.344 9.026 9.298 7.862
180 E 2 5.685 5.579 7.257 8.089 8.467 7.167
240 E 2 3.281 3.598 5.065 5.897 5.685 4.944
5 PO 1 0.000 0.000 0.000 0.000 0.000 0.000
10 PO 1 11.869 17.115 3.523 0.000 0.257 0.922
15 PO 1 19.292 12.096 9.389 0.015 5.231 7.393
20 PO 1 25.688 16.072 5.987 3.432 16.329 10.009
30 PO 1 17.342 15.119 22.906 14.711 16.435 12.897
40 PO 1 16.919 13.562 18.884 13.562 11.461 15.271
50 PO 1 14.137 15.119 13.623 11.884 13.471 14.862
60 PO 1 12.594 13.139 12.383 10.629 13.789 13.003
75 PO 1 11.355 11.990 14.757 9.692 13.048 10.432
90 PO 1 11.385 11.264 14.545 11.778 11.672 9.389
105 PO 1 9.329 10.417 14.757 12.096 11.355 9.298
120 PO 1 8.301 8.860 13.940 11.158 10.085 8.164
150 PO 1 7.061 8.119 10.946 9.797 9.041 6.925
180 PO 1 5.715 5.624 8.467 8.225 8.830 6.411
240 PO 1 4.173 3.946 5.791 5.413 6.607 4.657
5 PO 2 6.758 0.000 0.000 0.000 0.000 0.000
10 PO 2 14.212 8.195 4.702 0.000 0.922 0.000

15 PO 2 19.141 15.800 18.446 1.618 1.724 5.095
20 PO 2 21.560 19.081 18.234 6.653 9.571 10.251
30 PO 2 22.966 12.201 18.446 10.478 16.722 11.294
40 PO 2 17.130 12.836 16.435 9.178 15.618 11.083
50 PO 2 14.212 13.683 15.271 12.594 12.700 11.294
60 PO 2 11.491 12.625 16.752 10.584 12.398 11.702
75 PO 2 9.178 10.402 14.106 9.178 11.491 11.597
90 PO 2 7.756 9.147 12.731 10.886 10.689 10.145
105 PO 2 7.454 9.253 12.096 10.085 10.085 9.435
120 PO 2 6.154 7.666 11.249 10.175 7.454 9.223
150 PO 2 4.944 6.607 9.873 9.072 8.966 8.089
180 PO 2 3.538 6.078 9.253 8.059 6.048 7.061
240 PO 2 2.026 3.115 5.655 5.957 4.748 4.475

In Figure 4.5 we present plots of the concentration-time data for each subject and treatment combination.
We observe a similar shape for both curves in each subject-treatment combination, with the exception of
subject 6 with treatment E. Another strange characteristic of these data is the presence of two peaks in
some curves (clearest for subject 4). These peaks may be related to some factor not considered in
the experiment, so attention must be paid to this strange effect in future experimentation.
In Table 4.10 we see that in most cases a zero concentration is recorded at 5 minutes after
the administration of the drug, and sometimes even at 10 and 15 minutes. Hence, we will include in this
example a lag-time in the model, denoted by tlag. With the addition of this component the model is defined
by

    C(t) = [D·f·ka / (VD(ka − kel))]·[exp(−kel·(t − tlag)) − exp(−ka·(t − tlag))].

In this example the dose is 1.5 g, and the fraction of drug which is absorbed is assumed to be 0.85, so
D·f = 1275 mg. The pharmacokinetic parameters ka, kel, and VD are positive quantities, and sometimes, in
order to ensure positivity of the estimates, the model is parameterized in terms of the logarithms of the
parameters. We will do that in this example, so the model is finally defined by

    C(t) = [1275·e^ka* / (e^VD*·(e^ka* − e^kel*))]·[exp(−e^kel*·(t − tlag)) − exp(−e^ka*·(t − tlag))],

with ka* = ln ka, kel* = ln kel, and VD* = ln VD. In the standard nonlinear notation we have for each
individual model

    f(x, θ) = [1275·e^θ1 / (e^θ3·(e^θ1 − e^θ2))]·[exp(−e^θ2·(x − tlag)) − exp(−e^θ1·(x − tlag))],        (4.10)

with θ1 = ka* = ln ka, θ2 = kel* = ln kel, and θ3 = VD* = ln VD. The inclusion of the lag-time as a parameter in
the hierarchical nonlinear model brings about computational problems. To avoid these problems, we
estimate a different lag-time for each of the 24 individual models by fitting equation (4.10) using
nonlinear regression, with the restriction that the lag-time should not be greater than the lowest time
with positive concentration in the data. We fit different lag-times for each individual model because the
lag-time can depend on the subjects and the treatments, and because the first appearance of drug differs
between the two repetitions of the treatments for some subjects. The estimated lag-times range
from 3 to 15 minutes. Then, we use the estimated individual lag-times as fixed values in the population
analysis, so in the population stage we consider for each individual the model

    f(x*, θ) = [1275·e^θ1 / (e^θ3·(e^θ1 − e^θ2))]·[exp(−e^θ2·x*) − exp(−e^θ1·x*)],

where x* = x − tlag. For the computations presented in the next section, we store the data from Table 4.10,
with the lag-time adjustment applied and the zero-concentration data points removed, in the grouped data
object ex3clean.
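Before fitting, it is worth checking that plausible parameter values put the log-parameterized model on the scale of Table 4.10. The sketch below (Python, only for the arithmetic) uses the starting values employed for the individual fits in the next section, lnKa = -2.3, lnKel = -4.6, and lnVd = 4.1, together with D·f = 1275 mg; the resulting concentration lands in the observed 0-25 mg/l range.

```python
import math

# Starting values of the next section, back-transformed to the natural scale
ka, kel, vd = math.exp(-2.3), math.exp(-4.6), math.exp(4.1)

def conc(x_star):
    """One-compartment extravascular model; x_star = t - tlag in minutes."""
    return (1275 * ka / (vd * (ka - kel))
            * (math.exp(-kel * x_star) - math.exp(-ka * x_star)))

print(round(conc(30), 1))   # concentration 30 min after absorption starts
```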

> plot(ex3, outer=~sub_treat)

FIGURE 4.5 Concentration of drug over time after extravascular administration for six subjects

There are two important features in this example: the inclusion of the treatment factor and the two levels
of random effects, given by subjects and the repetitions within subjects. These two features make the
modeling considerably more difficult.

4.6.2. Analysis of the Population Pharmacokinetic Model

In this example we have to include in the model the fixed effect of treatments and two levels of random
effects, given by subjects and the repetitions within subjects. Hence, we define the model by

    yij = fij(θij) + eij,    θij = β(h) + bi + bij,

    eij | θij ~ N(0, σ²·Inij),    bi ~ N(0, D1),    bij ~ N(0, D2),        (4.11)

    h = 1, 2;  i = 1, ..., 6;  j = 1, ..., 4,

where h, i, and j represent the hth treatment, ith subject, and jth repetition within the ith subject. β is the
vector of fixed effects, bi is the vector of random effects for subject i, and bij is the vector of random
effects for the repetition j within subject i. When repetition j within subject i corresponds to treatment E, h
equals 1, and when it corresponds to treatment PO, h equals 2. The random effects bi and bij are assumed
to be independent and normally distributed.
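The two-level structure of (4.11) can be seen in miniature by simulating one individual parameter vector. The sketch below draws θij = β + bi + bij on the log scale; for simplicity it uses diagonal covariance matrices with the (rounded) standard deviations that the fit in Table 4.11 will produce, although the fitted D1 and D2 are actually correlated, and the seed is arbitrary.

```python
import math
import random

random.seed(2006)

beta_E = [-2.3, -4.6, 4.1]        # fixed effects (lnKa, lnKel, lnVd), treatment E
sd_subject = [0.27, 0.30, 0.18]   # between-subject SDs (level of b_i)
sd_unit = [0.46, 0.27, 0.16]      # between-repetition SDs (level of b_ij)

def individual_parameters():
    """theta_ij = beta + b_i + b_ij for one subject/repetition (diagonal D1, D2)."""
    b_i = [random.gauss(0.0, s) for s in sd_subject]
    b_ij = [random.gauss(0.0, s) for s in sd_unit]
    return [b + u + v for b, u, v in zip(beta_E, b_i, b_ij)]

theta = individual_parameters()
ka, kel, vd = (math.exp(x) for x in theta)   # log scale keeps all three positive
print(ka > 0 and kel > 0 and vd > 0)         # → True
```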
We start the analysis by fitting individual nonlinear regression models using the nlsList function; the
initial values for the parameters required for the estimation process are θ1 = -2.3, θ2 = -4.6, and θ3 = 4.1.
The results of the nlsList function are stored in the onelag1.lis object, and in Figure 4.6 we show
the confidence intervals for the individual pharmacokinetic parameters.

> onelag1.lis<-nlsList(Conc~1275*exp(lnKa)/(exp(lnVd)*(exp(lnKa)-exp(lnKel)))*
+ (exp(-exp(lnKel)*t)-exp(-exp(lnKa)*t)),
+ data=ex3clean, start=c(lnKa=-2.3, lnKel=-4.6, lnVd=4.1))
> plot(intervals(onelag1.lis))

FIGURE 4.6 95% confidence intervals for the individual pharmacokinetic parameters

The first four intervals (numbered from bottom to top with 1, 2, 3, and 4) correspond to subject 1, the next
four to subject 2 and so on. Within each subject, the first two intervals correspond to treatment E and the
other two to treatment PO. The individual model 7 cannot be estimated (it corresponds to one repetition
with subject 2 and treatment PO). However there are not computational problems when we include this

44
model in the population analysis using the nlme function. We can see in this graph that for each subject,
the constant of absorption has a tendency to be greater with treatment PO.
We start the population analysis by fitting a hierarchical nonlinear model with all the fixed and random
effects defined in (4.11). After that, we will test the suitability of the fixed effects using likelihood-ratio
tests and the AIC value. The results of this first fit are stored in the onelag1.nlme object and
are shown in Table 4.11.

TABLE 4.11 Estimated hierarchical nonlinear model. All the fixed and random effects are
included.
> onelag1.nlme<-nlme(Conc~1275*exp(lnKa)/(exp(lnVd)*(exp(lnKa)-exp(lnKel)))*
+ (exp(-exp(lnKel)*t)-exp(-exp(lnKa)*t)),
+ fixed = lnKa+lnKel+lnVd~Treat, random = lnKa+lnKel+lnVd~1|Subject/Unit,
+ data=ex3clean, start=c(-2.3,0,-4.6,0,4.1,0))
> onelag1.nlme

Nonlinear mixed-effects model fit by maximum likelihood


Model: Conc ~ 1275 * exp(lnKa)/(exp(lnVd) * (exp(lnKa) - exp(lnKel))) *
(exp(-exp(lnKel) * t) - exp(-exp(lnKa) * t))
Log-likelihood: -712.0107
Fixed: lnKa + lnKel + lnVd ~ Treat
lnKa.(Intercept) lnKa.TreatPO lnKel.(Intercept) lnKel.TreatPO
-2.49726501 0.78591411 -5.16345068 -0.01963519
lnVd.(Intercept) lnVd.TreatPO
4.23964597 0.05720349

Random effects:
Formula: list(lnKa ~ 1, lnKel ~ 1, lnVd ~ 1)
Level: Subject
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
lnKa.(Intercept) 0.2692887 lnK.(In) lnKl.(I)
lnKel.(Intercept) 0.3023348 0.876
lnVd.(Intercept) 0.1814035 -0.503 -0.856

Formula: list(lnKa ~ 1, lnKel ~ 1, lnVd ~ 1)


Level: Unit %in% Subject
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
lnKa.(Intercept) 0.4603061 lnK.(In) lnKl.(I)
lnKel.(Intercept) 0.2654344 -0.445
lnVd.(Intercept) 0.1645496 0.217 -0.970
Residual 1.7502720

Number of Observations: 332


Number of Groups:
Subject Unit %in% Subject
6 24

> anova(onelag1.nlme)
numDF denDF F-value p-value
lnKa.(Intercept) 1 303 99.2063 <.0001
lnKa.Treat 1 303 24.8467 <.0001
lnKel.(Intercept) 1 303 259.2543 <.0001
lnKel.Treat 1 303 1.4860 0.2238
lnVd.(Intercept) 1 303 2572.0818 <.0001
lnVd.Treat 1 303 0.5483 0.4596

The ANOVA results at the end of the table suggest that the effect of treatments on the volume of
distribution and the constant of elimination could be negligible. Indeed, based on pharmacokinetic
considerations, this must be the case because both the constant of elimination and the volume of
distribution are subject characteristics, and therefore they should not depend on the treatment. To confirm
this assumption, we fit a model without the effect of treatments on the volume of distribution, stored in the
onelag2.nlme object, and a model without the effect of treatments on both the volume of distribution
and the constant of elimination, stored in the onelag3.nlme object. Then, we perform likelihood-ratio tests
to compare model onelag1.nlme with model onelag2.nlme and model onelag2.nlme with
model onelag3.nlme. The results of these comparisons are shown in Table 4.12.

TABLE 4.12 ANOVA procedures to evaluate the effect of treatments on the volume of
distribution and constant of elimination.
> onelag2.nlme<-nlme(Conc~1275*exp(lnKa)/(exp(lnVd)*(exp(lnKa)-exp(lnKel)))*
+ (exp(-exp(lnKel)*t)-exp(-exp(lnKa)*t)),
+ fixed = list(lnKa~Treat, lnKel~Treat, lnVd~1),
+ random = lnKa+lnKel+lnVd~1|Subject/Unit,
+ data=ex3clean, start=c(-2.3,0,-4.6,0,4.1))
> anova(onelag1.nlme,onelag2.nlme)

             Model df      AIC      BIC    logLik   Test   L.Ratio p-value
onelag1.nlme     1 19 1462.022 1534.319 -712.0107
onelag2.nlme     2 18 1460.571 1529.063 -712.2854 1 vs 2 0.5492719  0.4586

> onelag3.nlme<-nlme(Conc~1275*exp(lnKa)/(exp(lnVd)*(exp(lnKa)-exp(lnKel)))*
+ (exp(-exp(lnKel)*t)-exp(-exp(lnKa)*t)),
+ fixed = list(lnKa~Treat, lnKel~1, lnVd~1),
+ random = lnKa+lnKel+lnVd~1|Subject/Unit,
+ data=ex3clean, start=c(-2.3,0,-4.6,4.1))
> anova(onelag2.nlme,onelag3.nlme)

             Model df      AIC      BIC    logLik   Test  L.Ratio p-value
onelag2.nlme     1 18 1460.571 1529.063 -712.2854
onelag3.nlme     2 17 1460.234 1524.921 -713.1168 1 vs 2 1.662786  0.1972

We see that including the effect of treatments on the volume of distribution and on the constant of
elimination does not improve the fit significantly (p-values of 0.4586 and 0.1972). Indeed,
as we go from the complete model in onelag1.nlme to the models in onelag2.nlme and
onelag3.nlme, we get lower AIC values. Hence we conclude that the treatments affect only the
constant of absorption, which, as mentioned before, is pharmacokinetically meaningful. This final model
is shown in Table 4.13. Confidence intervals for the fixed and random effects parameters can be obtained
with the intervals command. From this table we can get the estimates for the parameters of the model
specified in (4.11), and they are

 −2.549   −1.664 
ˆβ(1) =  −5.170  , βˆ ( 2) =  −5.170  ,
   
 4.268  4.268

0.0707 0.0715 −0.0248 0.2157 −0.0524 0.0134 


ˆ 
D1 =  0.0913 −0.0461 , D2 = 
 ˆ 0.0637 −0.0400  ,
 0.0322  0.0273

σˆ 2 = 3.073 .
We can see in Table 4.13 that the estimated value of ka* with treatment PO is 0.8852 larger than with
treatment E. This means that the estimated constant of absorption with treatment PO is approximately
2.42 times (exp(0.8852) = 2.42) the estimated constant of absorption with treatment E.
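This back-transformation can be checked directly from the intervals output in Table 4.13: exponentiating the lnKa.TreatPO estimate and its confidence limits gives the ratio of absorption constants and an approximate 95% interval for that ratio (a quick Python cross-check of the arithmetic).

```python
import math

# Estimate and 95% CI for lnKa.TreatPO, from the intervals output in Table 4.13
est, lower, upper = 0.885244, 0.5390877, 1.231400

ratio = math.exp(est)   # ka(PO) / ka(E) on the natural scale
print(round(ratio, 2), round(math.exp(lower), 2), round(math.exp(upper), 2))
# → 2.42 1.71 3.43
```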

TABLE 4.13 Final estimated hierarchical nonlinear model. Factor Treatments only affects the
constant of absorption.
> onelag3.nlme

Nonlinear mixed-effects model fit by maximum likelihood


Model: Conc ~ 1275 * exp(lnKa)/(exp(lnVd) * (exp(lnKa) - exp(lnKel))) *
(exp(-exp(lnKel) * t) - exp(-exp(lnKa) * t))
Log-likelihood: -713.1168
Fixed: list(lnKa ~ Treat, lnKel ~ 1, lnVd ~ 1)
lnKa.(Intercept) lnKa.TreatPO lnKel lnVd
-2.548838 0.885244 -5.170350 4.268024

Random effects:
Formula: list(lnKa ~ 1, lnKel ~ 1, lnVd ~ 1)
Level: Subject
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
lnKa.(Intercept) 0.2658372 lK.(I) lnKel
lnKel 0.3021092 0.89
lnVd 0.1795318 -0.52 -0.85

Formula: list(lnKa ~ 1, lnKel ~ 1, lnVd ~ 1)


Level: Unit %in% Subject
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
lnKa.(Intercept) 0.4644615 lK.(I) lnKel
lnKel 0.2524257 -0.447
lnVd 0.1652529 0.174 -0.958
Residual 1.7529901

Number of Observations: 332


Number of Groups:
Subject Unit %in% Subject
6 24

> intervals(onelag3.nlme)
Approximate 95% confidence intervals

Fixed effects:
lower est. upper
lnKa.(Intercept) -2.8941050 -2.548838 -2.203570
lnKa.TreatPO 0.5390877 0.885244 1.231400
lnKel -5.4483971 -5.170350 -4.892303
lnVd 4.1050106 4.268024 4.431038
attr(,"label")
[1] "Fixed effects:"

Random Effects:
Level: Subject
lower est. upper
sd(lnKa.(Intercept)) 0.09278093 0.2658372 0.7616806
sd(lnKel) 0.14602780 0.3021092 0.6250176
sd(lnVd) 0.08650154 0.1795318 0.3726137
cor(lnKa.(Intercept),lnKel) -0.91630921 0.8903210 0.9997060
cor(lnKa.(Intercept),lnVd) -0.96099913 -0.5199324 0.6675543
cor(lnKel,lnVd) -0.97758356 -0.8500074 -0.2658948
Level: Unit
lower est. upper
sd(lnKa.(Intercept)) 0.3083837 0.4644615 0.6995327
sd(lnKel) 0.1444497 0.2524257 0.4411134
sd(lnVd) 0.1054579 0.1652529 0.2589520
cor(lnKa.(Intercept),lnKel) -0.8007522 -0.4467078 0.1386471
cor(lnKa.(Intercept),lnVd) -0.3768424 0.1743702 0.6343818
cor(lnKel,lnVd) -0.9957065 -0.9578051 -0.6448654

Within-group standard error:


lower est. upper
1.608233 1.752990 1.910777

5. Sampling Strategies

The sampling design is an important issue in most statistical applications, and in population
pharmacokinetics it has special relevance due to some particular characteristics of the field. Firstly,
since two sources of variation are present in the population model, that is, inter- and intra-individual
variation, the sample size determination is a two-sided problem, each side with its own difficulties.
For the intra-individual variation, the main problem is that in some situations it is not possible to obtain
several samples per individual. With routine clinical data, for instance, only a few samples per individual
are available in a group of several individuals, usually with just one or two samples in many of them.
With experimental data, other kinds of considerations (e.g., popular beliefs, superstitions, and clinical
considerations) tend to limit the number of samples per individual. For the inter-individual variation,
limitations depend more on budget considerations, although ethical and clinical considerations can also be
important. Hence, the trade-off between the number of individuals and the number of measurements per
individual is a central aspect here. Another important consideration in the sampling design of
pharmacokinetic studies is the measurement times. A limitation here is that sometimes it is quite difficult
to strictly control the measurement times. This is the case with routine clinical data, where measurements
are taken whenever patients arrive.
There are several studies addressing these topics in the literature, and in this section we present some
results. The problem of finding optimal sampling times has been approached with simulation studies
(mostly in the pharmacokinetic literature) and with the theory of optimal designs (mostly in the statistical
literature). In Section 5.1 we present some results from optimal design theory; the studies in this field
focus mainly on the individual model analysis. In Section 5.2 we present a summary of some simulation
studies which are more focused on the population model analysis; these studies address problems such as
the trade-off between the number of subjects and the number of measurements per subject, and the most
appropriate individual sampling times, including the effect of some randomness in the sampling times,
which is typical in pharmacokinetic studies. Finally, in Section 5.3 we perform some simulations to
evaluate the effect of subjects with just one or two measurements on the population analysis of the one-
compartment model with intra- and extravascular administration.

5.1. Optimal Designs

We start this section with definitions of D- and c-optimal designs, mainly based on the book by Atkinson
and Donev (1996). To go into this subject, we must first define a design.
A continuous design is represented by a measure ξ over the design region. If the design has trials at k
different points in the design region, we write

    ξ = { x1  x2  …  xk
          w1  w2  …  wk },

where the xi are the design points (or support points) and the wi the corresponding design weights. A design
with n trials is exact if it consists of ni trials at location xi with

    n = Σ(i=1..k) ni.

Given an optimal continuous design ξ*, if n trials are available, in practice we will perform the exact
design ξn with ni the integer approximation to wi*·n.
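A largest-remainder rounding is one simple way to obtain the integer approximations ni ≈ wi*·n. The helper below is an illustrative sketch of this rounding, not a procedure taken from the text.

```python
def exact_design(weights, n):
    """Round a continuous design (weights w_i summing to 1) to n integer trials."""
    raw = [w * n for w in weights]
    counts = [int(r) for r in raw]                       # start with the floors
    remainders = [r - c for r, c in zip(raw, counts)]
    # hand out the remaining trials to the largest fractional parts
    for _ in range(n - sum(counts)):
        i = remainders.index(max(remainders))
        counts[i] += 1
        remainders[i] = -1.0
    return counts

# e.g. three equally weighted support points and n = 10 trials
print(exact_design([1/3, 1/3, 1/3], 10))   # → [4, 3, 3]
```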

Consider a nonlinear regression model as defined in Section 3.1. A design is D-optimal if it maximizes
the determinant of the information matrix F.ᵀF.. A D-optimal design is appropriate if our interest is in
precise estimation of all the parameters of the model. If there is a particular interest in the estimation of
some linear combination of the parameters, cᵀθ, we must use a c-optimal design. A c-optimal design
minimizes the variance of the estimated linear combination of interest, which is given by

    var(cᵀθ̂) = σ²·cᵀ(F.ᵀF.)⁻¹·c.
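To make the D-optimality criterion concrete, the following sketch searches a time grid for the three-point design maximizing det(F.ᵀF.) in the one-compartment extravascular model, with the gradients obtained by finite differences. All parameter values, the dose, and the grid are hypothetical prior guesses for illustration, not values from the text.

```python
import math
from itertools import combinations

def conc(t, ka, kel, V, dose=100.0):
    """One-compartment extravascular concentration C(t)."""
    return dose * ka / (V * (ka - kel)) * (math.exp(-kel * t) - math.exp(-ka * t))

def grad(t, theta, h=1e-6):
    """Central-difference gradient of conc(t) with respect to theta = (ka, kel, V)."""
    g = []
    for i in range(len(theta)):
        up, lo = list(theta), list(theta)
        up[i] += h
        lo[i] -= h
        g.append((conc(t, *up) - conc(t, *lo)) / (2 * h))
    return g

def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e*i - f*h) - b * (d*i - f*g) + c * (d*h - e*g)

def d_crit(times, grads):
    """det(F'F) with one observation at each design point."""
    F = [grads[t] for t in times]
    return det3([[sum(row[r] * row[c] for row in F) for c in range(3)]
                 for r in range(3)])

theta0 = (2.0, 0.25, 10.0)                 # assumed prior values for (ka, kel, V)
grid = [0.5 * k for k in range(1, 49)]     # candidate times: 0.5 to 24 h
grads = {t: grad(t, theta0) for t in grid}

# p = 3 parameters, so the D-optimal design typically has 3 support points
best = max(combinations(grid, 3), key=lambda ts: d_crit(ts, grads))
print(best)
```

By construction the selected design beats any naive choice on the same grid, such as three equispaced late times; with other prior values the support points move, which is exactly the local-optimality issue discussed next.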

Because the model is nonlinear in the parameters, the matrix of derivatives F. (cf. (3.7)) depends on the
parameter values, and hence so does the optimum design. If the interest is in a nonlinear function of the
parameters c(θ) (which is the case for AUC, MRT, or t1/2), the nonlinear function must be expanded in a
Taylor series as in (3.12), and then the resulting c-optimal design will depend on the parameters through
both the information matrix and the derivatives of c(θ) with respect to θ. Therefore D- and c-optimal
designs are only locally optimum. We can deal with the problem of the dependence on the unknown
parameters with the following approaches:
1. Assume a prior value for the parameters.
2. Assume a prior distribution for the parameters (Bayesian approach).
3. Sequential designs.
In the sequential approach we assume a prior value or a prior distribution for the parameters, find the
optimal design for this prior specification, carry out an experiment, and estimate the model. If the
parameter estimates do not differ considerably from the prior specification the process stops; otherwise,
we repeat the process with the parameter estimates as the new prior values. The main limitations of this
approach are the costs and time for experimentation. However, in a population study, parameter
estimates from previous subjects could be used to compute optimal designs for the next subject
(D'Argenio, 1981).
D-optimal designs for nonlinear models constructed with a prior value for the parameters have some
limitations. If the nonlinear model has p parameters, the D-optimal design usually has p different support
points with equal weights, and hence there are no degrees of freedom to check the model. Moreover, if the
initial approximation for the parameters is far from the true values, the efficiency of the resulting optimal
design can be very low. For c-optimal designs, the number of support points can be even smaller than p,
and therefore not all the parameters can be estimated. In addition, a c-optimal design for a specific parameter
may produce poor estimates of other parameters in the model. Assuming a prior distribution for the
parameters will produce more support points, which allows for model checking, and the number of
support points will grow as the prior distribution becomes more dispersed. However, as the prior
distribution becomes more diffuse, the relative efficiency of the design will be lower. An approach similar
to the Bayesian one is the Maximin approach, where we specify a discrete set of possible parameter values
instead of a continuous prior distribution. In the Maximin approach we choose the design which
maximizes the minimum efficiency over the different parameter values. Maximin and Bayesian
optimal designs are more robust to misspecifications of the parameter values. Biedermann, Dette, and
Pepelyshev (2004) explore the Maximin approach in a two-exponential compartment model based on D-
efficiencies; Dette, Haines, and Imhof (2005) analyze the relationship between the Maximin and Bayesian
approaches for linear and nonlinear regression models.
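A minimal sketch of the Maximin idea under assumed values: each candidate two-point design is scored by its worst-case D-efficiency over a hypothetical discrete set of kel values, using the analytic locally optimal design {0, 1/kel} as the reference. The candidate designs, the prior set, and the parameter values are all illustrative assumptions, not results from the cited papers.

```python
# Sketch: Maximin choice among candidate two-point designs for the
# one-compartment intravascular model c(t) = (D/V) exp(-kel t).
import numpy as np

def det_info(times, kel, V=20.0, D=1.0):
    """det of the information matrix (equal weights) over (kel, V)."""
    F = np.array([[-t * (D / V) * np.exp(-kel * t),
                   -(D / V ** 2) * np.exp(-kel * t)] for t in times])
    M = F.T @ F / len(times)
    return np.linalg.det(M)

def d_efficiency(times, kel, p=2):
    """D-efficiency relative to the locally optimal design {0, 1/kel}."""
    return (det_info(times, kel) / det_info([0.0, 1.0 / kel], kel)) ** (1 / p)

kels = [0.05, 0.1, 0.2]                        # hypothetical prior set for kel
candidates = [(0.0, 5.0), (0.0, 10.0), (0.0, 20.0)]
maximin = max(candidates,
              key=lambda ts: min(d_efficiency(ts, k) for k in kels))
print(maximin)                                 # the middle design wins here
```

The design tuned to the middle kel value has the best worst case over the set, which illustrates why Maximin designs are more robust to misspecified parameter values than a design optimized at a single guess.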
Because of the dependence of the optimal designs on the parameter values it is not possible to make
general recommendations. However, there are some studies in the literature addressing this problem in the
field of pharmacokinetics that give some insights. Atkinson et al. (1993) computed D-optimal and c-
optimal designs for the area under the concentration curve (AUC), the maximum concentration (cmax), and
the time to maximum concentration (tmax) in an open one-compartment model with extravascular
administration and compared the efficiency of these designs with the typical geometric design12 with 18

12 In the geometric design readings are taken at approximately equal intervals in log-time. In this particular case,
measures are taken at times, in hours, 0.166, 0.333, 0.5, 0.666, 1, 1.5, 2, 2.5, 3, 4, 5, 6, 8, 10, 12, 24, 30, and 48.
measures. They also proposed a “c-omnibus” design where the sum of suitably scaled asymptotic
variances of the three parameters of interest is minimized. They worked with prior values for the
parameters and with prior distributions (Bayesian approach). The D-optimal design with a prior value for the
parameters has three support points with equal weights (it takes measures at 0.23, 1.39, and 18.42 hours).
The c-optimal designs rely on fewer than three points (2, 2, and 1 for AUC, tmax, and cmax respectively), so their
utility is dubious. Optimal designs based on a prior distribution for the parameters give more support
points. They show that if there is good knowledge about the parameter values, the D-optimal or c-
omnibus designs are considerably better than the 18-point geometric design, but if there is great uncertainty in the
parameter values, the 18-point geometric design might be reasonable to use. Moreover, if there is an interest
in the estimation of several parameters, the 18-point geometric design is a good alternative. Mentre, Mallet,
and Baccar (1997) analyzed the open one-compartment model with intravascular administration in a
population setting and paid attention to the trade-off between the number of subjects and the number of measures per
subject under a D-optimality criterion. Although they did not carry out a simulation study, we present
their results in the next section because their analysis is closer to the ones presented there. Nevertheless, for
this particular model (cf. (2.3)) it is possible to obtain an analytic solution for the D-optimality criterion,
which corresponds to taking the first measure at time 0 and the second one at time 1/kel (Melas, 2005).

Finally, it is important to note that all the preceding discussion holds only under the assumption of
independent errors with constant variance.

5.2. Simulation Studies

In this section we present a summary of some simulation studies approaching the sampling strategies
problem in pharmacokinetics. These studies focus mainly on the following aspects:
1. Optimal sampling times.
2. Trade-off between number of subjects and number of samples per subject.
3. The effect of randomness in the sampling times.
4. The gain in the precision of the estimates due to the inclusion of subjects with just one measure.

Sheiner and Beal (1983)

They worked with the open one-compartment model with intravascular administration of a regular
repetitive bolus dose. They simulated 2 measures per individual with random times for 50 individuals
(50×2 design) and then went over some departures from this base design. They also compared the STS
method (cf. Section 4.2.2) with the method based on linearization (cf. Section 4.3.1). They found that the
inter-individual variances were poorly estimated even with the large sample size of 50 individuals. They
compared the 50×2 design with 33×3 and 25×4 designs; while with the STS method the trade-off clearly
favored more samples per individual, with the method based on linearization the gain was not so
important. Finally, they investigated the effect of adding data from individuals with a single measure.
They added 50 and 100 single observations to the basic design. While the STS method clearly cannot take
advantage of these data, the method based on linearization can; with the addition of these points and
the linearization-based method, they observed an improvement in the precision of all the estimates except
for the intra-individual variance. With respect to the measurement times, they explored some rigid designs
and compared the results with those obtained from routine-type data, which were simulated by randomly
choosing the dosing interval and the sampling times; however, they did not find important differences in
the quality of the estimates with the better-designed data.

Al-Banna, Kelman, and Whiting (1989)

They worked with the open one-compartment model with intravascular administration of a single dose.
They ran simulations under the assumption of independent normal distributions for the individual
parameters and a normal distribution for the random errors with variance proportional to the actual
concentration to compare three sampling schemes: 2 measures per individual in a group of 50 (50×2
design), 3 measures per individual in a group of 50 (50×3 design), and 3 measures per individual in a

group of 33 (33×3 design). They tried different measurement times for each design, from as early as possible
after the intravenous bolus dose (5 minutes after administration) until as late as possible such that
a minimum response would still be observable (20 hours after administration); to each time they added a
random element from a uniform distribution with a range of ±1 hour to mimic a real study. The
evaluation of each sampling design was based on the precision and bias of the parameter estimates
obtained using a method based on linearization (cf. Section 4.3.1). Considering the trade-off between the
number of individuals and the number of measures per individual, they compared the 33×3 designs with the 50×2
designs, with the first measure at 5 minutes after administration, the second one at different times between
1 and 20 hours, and the third one, for the 33×3 designs, at 20 hours. The 33×3 designs gave considerably
better results in the estimation of parameters than the 50×2 designs, regardless of the time of the second measure.
There were no clear insights about the best moment to take the second measure with the 50×2 designs:
while earlier times produced better estimates for some parameters, later times worked better for
others.

Ette et al. (1994)

They worked with a similar setting to Al-Banna, Kelman, and Whiting (1989) but tested three- and four-
point designs with the first measure at 5 minutes after the dose, the last measure at 240 minutes after the dose, and
the intermediate measures at different time points. In both designs a fixed total of 48 measures was
considered. In the three-point design 16 different subjects were sampled at each time point, and in the
four-point design 12 different subjects were sampled at each time point, so just one measure per subject
was taken13. Similarly to the results of Al-Banna, Kelman, and Whiting (1989), they found that, with the
three-point designs, the overall estimation efficiency did not depend on the location of the intermediate
sample provided that the other two were as early and as late as possible. While earlier times for the
intermediate measure produced better estimates for the volume of distribution, later times produced better
estimates for the clearance. The four-point design was not markedly better than the three-point design in
overall efficiency.

Jonsson, Wade, and Karlsson (1996)

They evaluated the effect of taking two samples instead of one during each visit to a clinic (that is in the
context of routine clinical data) with simulated data from a one-compartment model with extravascular
administration at steady state. The dosing interval was 12 hours and the basic design considered two visits
per day, one during the morning and one during the afternoon. The second measures were taken at the
same time (time 0) or 1 or 2 hours after the first one. They also analyzed a real data set with similar
characteristics to the simulated data. They compared the following designs with a fixed number of 200
samples: 100 patients with two visits and one sample per visit, 75 patients of which 25 had two visits and
two samples per visit and 50 had two visits and one sample per visit, and 50 patients with two visits and
two samples per visit. They found that the quality of the parameter estimates, with respect to precision
and bias, was greater when two measures per visit were taking in some of the patients (the 75 patients
design), and perhaps more interesting, that this improvement does not depend on the time of the second
measure (at time 0, 1, or 2). Sampling designs where one fraction of the patients have only early samples
(morning) and the other fraction have only late samples (afternoon) were inferior to designs where the
patients had both early and late samples, even when the number of samples is the same in both designs.

Mentre, Mallet, and Baccar (1997)

They focused on the population analysis of the one-compartment model with intravascular administration
and restricted their analysis to a finite set of sampling times (0.5, 1, 2, 4, 7, and 24 hours after
administration). They compared different designs with a fixed total of 60 measures (60×1, 30×2, 20×3,
15×4, 12×5, and 10×6). Among them the optimal design is the 30×2 design with the first measure at 0.5
and the second one at 24 hours after administration. The 20×3 design with a third measure at 0.5 or 1 hour
is 83% as efficient as the optimal design, and the efficiency decreases as more measures per subject are
considered. If only one measure is available per subject (60×1 design), the optimum design takes 19, 23,
and 18 measures at 0.5, 7, and 24 hours after administration respectively, and it is 58% as efficient as the

13 This is the case in destructive population pharmacokinetic studies where animals are sacrificed when the
measure is taken.
optimal 30×2 design. An important result is that the optimal population design based on D-optimality
generally repeats the D-optimal design for all the individuals.

5.3. Sparse Data Analysis

In Section 4.3 we mentioned that an important advantage of the method based on linearization over the
Two-Stage approach is that the former is capable of using data from individuals without enough data
points to estimate the individual pharmacokinetic models. Since in pharmacokinetics it is
sometimes impossible to obtain several measures per subject, this characteristic of the methods based on
linearization is quite promising.
Sheiner and Beal (1983) performed a simulation study to investigate the effect of the addition of single
measures in the one-compartment model with intravascular administration. In this section we will
perform some simulations to get more information about how much we can gain from individuals with
fewer measures than parameters in the individual model. In Section 5.3.1 we will work with the one-
compartment model with intravascular administration but considering a correlation between the random
effects, which was not the case in the study of Sheiner and Beal (1983). In Section 5.3.2 we will work
with the one-compartment model with extravascular administration and assuming uncorrelated random
effects. In all cases we will run 200 simulations to get a clear picture of the distribution of the
estimated parameters.

5.3.1. Addition of Individuals with One Measure in the One-Compartment Model with
Intravascular Administration

Here we will simulate data from a one-compartment model with intravascular administration of a dose of
1 unit. Following the recommendation of Sheiner and Beal (1983) for the relation between the individual
parameters and the fixed and random effects, we have that the model is given by
yij = (1/φ2i) exp(−φ1i xij) + eij = (1/(β2 e^{b2i})) exp(−β1 e^{b1i} xij) + eij,

ln(φ1i) = ln(β1) + b1i,

ln(φ2i) = ln(β2) + b2i.

The constant of elimination and volume of distribution fixed effects are set at β1 = 0.01 and β2 = 100.
The random effects, b1i and b2i, are assumed to have a bivariate normal distribution, each of them with a
mean of 0 and a standard deviation of 0.15, and a correlation of -0.7. For the intra-individual variation we
assume that the error terms, eij, have independent normal distributions with mean 0 and standard deviation
0.1 times the actual concentration.
We simulate data from 10 subjects with 10 measures per subject taken at equidistant times in logarithmic
scale; the sampling times are, on linear scale, 10, 14, 20, 29, 41, 58, 83, 118, 168, and 240. Then we add
sequentially data simulated for 20, 40, and 60 subjects with single measures taken at times chosen at
random from the 10 original sampling times. Therefore we compare the following designs: 10×10, 10×10
+ 20×1, 10×10 + 40×1, and 10×10 + 60×1. We show the results in Figures 5.1, 5.2, 5.3, 5.4, 5.5, and 5.6
by using boxplots. The dashed horizontal lines represent the real value of the parameters and the “+”
symbols the estimated means.
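The data-generating setup above can be sketched as follows; this is an illustrative simulation of a single replicate with the stated parameter values (the fitting step, done in the report with linearization-based software, is omitted here):

```python
# Sketch of one replicate of this section's design: 10 subjects with
# 10 measures each plus 20 subjects with one randomly timed measure.
# Parameter values follow the text; the seed is arbitrary.
import numpy as np

rng = np.random.default_rng(1)
beta1, beta2 = 0.01, 100.0                  # kel and V fixed effects
sd_b, rho, sd_e = 0.15, -0.7, 0.1           # random-effect and error scales
cov_b = sd_b ** 2 * np.array([[1.0, rho], [rho, 1.0]])
times = np.array([10, 14, 20, 29, 41, 58, 83, 118, 168, 240.0])

def simulate_subject(t):
    """Concentrations for one subject at times t (dose = 1 unit)."""
    b1, b2 = rng.multivariate_normal([0.0, 0.0], cov_b)
    mean = np.exp(-beta1 * np.exp(b1) * t) / (beta2 * np.exp(b2))
    return mean * (1 + sd_e * rng.standard_normal(len(t)))  # proportional error

rich = [simulate_subject(times) for _ in range(10)]                    # 10x10
sparse = [simulate_subject(rng.choice(times, 1)) for _ in range(20)]   # 20x1
print(len(rich), len(sparse))
```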
In Figures 5.1 and 5.2 we can see that for the fixed effects the addition of individuals with just one
measure improves the precision of the estimators. Regarding bias, we see that for the constant of elimination
fixed effect the addition of the single measures increases the estimated value, while for the volume of
distribution we observe a slight decrease. The mean and median of the simulated values are quite close to
the real value for the constant of elimination and differ slightly from it for the volume of distribution. The
small bias that we get with all the estimators can, however, be a result of the linearization of the
expectation surface applied to fit the model. In Figures 5.3, 5.4, and 5.5 we see that for the random effects
the gain in precision is less important than for the fixed effects. For the random effect variances we
observe that the values increase as more individual measures are included, while for the correlation
between them the mean and median of the estimated values remain approximately constant. Finally we
see in Figure 5.6 that the addition of the individual measures has no effect on the intra-individual
variation, which is indeed theoretically consistent.

FIGURE 5.1 Boxplots for the estimated constants of elimination obtained from 200 simulations with different
designs. The horizontal dashed line represents the real parameter value and the + marks the sample means.
FIGURE 5.2 Boxplots for the estimated volumes of distribution obtained from 200 simulations with different
designs. The horizontal dashed line represents the real parameter value and the + marks the sample means.

FIGURE 5.3 Boxplots for the estimated constant of elimination random effect standard deviations obtained from
200 simulations with different designs. The horizontal dashed line represents the real parameter value and the +
marks the sample means.
FIGURE 5.4 Boxplots for the estimated volume of distribution random effect standard deviations obtained from 200
simulations with different designs. The horizontal dashed line represents the real parameter value and the + marks the
sample means.

FIGURE 5.5 Boxplots for the estimated constant of elimination and volume of distribution random effects
correlations obtained from 200 simulations with different designs. The horizontal dashed line represents the real
parameter value and the + marks the sample means.
FIGURE 5.6 Boxplots for the estimated intra-individual standard deviations obtained from 200 simulations with
different designs. The horizontal dashed line represents the real parameter value and the + marks the sample means.

5.3.2. Addition of Individuals with One and Two Measures in the One-Compartment Model with
Extravascular Administration

In this section we will simulate data from a one-compartment model with extravascular administration of
a dose of 1 unit, and for simplicity we will assume that the entire drug is absorbed by the body. In the
same way as in Section 5.3.1 we will parameterize the model in terms of the logarithms of the individual
parameters, so the model is given by

yij = [φ1i / (φ3i (φ1i − φ2i))] [exp(−φ2i xij) − exp(−φ1i xij)] + eij
    = [β1 e^{b1i} / (β3 e^{b3i} (β1 e^{b1i} − β2 e^{b2i}))] [exp(−β2 e^{b2i} xij) − exp(−β1 e^{b1i} xij)] + eij,

ln(φ1i) = ln(β1) + b1i,

ln(φ2i) = ln(β2) + b2i,

ln(φ3i) = ln(β3) + b3i.

The constant of absorption, constant of elimination, and volume of distribution fixed effects are set at
β1 = 0.1, β2 = 0.01, and β3 = 100. The random effects, b1i, b2i, and b3i, are assumed to have independent
normal distributions, each of them with a mean of 0 and a standard deviation of 0.15. For the intra-
individual variation we assume that the error terms, eij, have independent normal distributions with mean
0 and standard deviation 0.1 times the actual concentration.
We simulate data from 10 subjects with 10 measures per subject (10×10) taken at the same times as in the
previous section, that is, at 10, 14, 20, 29, 41, 58, 83, 118, 168, and 240. Then we generate three
additional designs by adding to the original 10×10 design data simulated for 20 subjects with single
measures (20×1), data simulated for 10 subjects with two measures per subject (10×2), and both the 20×1
and 10×2 data. Therefore we compare the following designs: 10×10, 10×10 + 20×1, 10×10 + 10×2, and
10×10 + 20×1 + 10×2. We show the results in Figures 5.7, 5.8, 5.9, 5.10, 5.11, 5.12, and 5.13.
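The extravascular setup above can be sketched in the same way; one illustrative replicate (with the stated parameter values, assuming dose = 1 and complete absorption, fitting omitted):

```python
# Sketch of one replicate of this section's design: the rich 10x10
# block plus 20 one-measure and 10 two-measure subjects. Parameter
# values and sampling times follow the text; the seed is arbitrary.
import numpy as np

rng = np.random.default_rng(2)
beta = np.array([0.1, 0.01, 100.0])     # ka, kel, V fixed effects
times = np.array([10, 14, 20, 29, 41, 58, 83, 118, 168, 240.0])

def simulate_subject(t):
    """One-compartment extravascular concentrations (dose = 1, F = 1)."""
    ka, kel, V = beta * np.exp(rng.normal(0.0, 0.15, size=3))
    mean = ka / (V * (ka - kel)) * (np.exp(-kel * t) - np.exp(-ka * t))
    return mean * (1 + 0.1 * rng.standard_normal(len(t)))   # proportional error

rich = [simulate_subject(times) for _ in range(10)]                       # 10x10
one = [simulate_subject(rng.choice(times, 1)) for _ in range(20)]         # 20x1
two = [simulate_subject(rng.choice(times, 2, replace=False)) for _ in range(10)]  # 10x2
print(len(rich), len(one), len(two))
```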
In Figures 5.7, 5.8, and 5.9 we can see that the gain in precision for the fixed effects is quite small. The
improvement in precision due to the addition of the 20×1 and the 10×2 points is quite similar, and adding
both the 20×1 and the 10×2 data points produces a clearer improvement in the precision of the estimates.
The addition of individuals with just one or two measures produces a small change in the mean and
median of the estimates; while for the constant of absorption and volume of distribution the addition of
these data decreases the estimated values, for the constant of elimination it increases them. For the
variances of the random effects the improvement in precision is not noticeable for the constant of
absorption in Figure 5.10, barely perceptible for the constant of elimination in Figure 5.11, and quite
clear for the volume of distribution in Figure 5.12. In Figure 5.13 we can see that the addition of the
data with two measures per individual does not improve the precision of the estimation of the
intra-individual variation.

FIGURE 5.7 Boxplots for the estimated constants of absorption obtained from 200 simulations with different
designs. The horizontal dashed line represents the real parameter value and the + marks the sample means.

FIGURE 5.8 Boxplots for the estimated constants of elimination obtained from 200 simulations with different
designs. The horizontal dashed line represents the real parameter value and the + marks the sample means.
FIGURE 5.9 Boxplots for the estimated volumes of distribution obtained from 200 simulations with different
designs. The horizontal dashed line represents the real parameter value and the + marks the sample means.

FIGURE 5.10 Boxplots for the estimated constant of absorption random effect standard deviations obtained from
200 simulations with different designs. The horizontal dashed line represents the real parameter value and the +
marks the sample means.
FIGURE 5.11 Boxplots for the estimated constant of elimination random effect standard deviations obtained from
200 simulations with different designs. The horizontal dashed line represents the real parameter value and the +
marks the sample means.

FIGURE 5.12 Boxplots for the estimated volume of distribution random effect standard deviations obtained from
200 simulations with different designs. The horizontal dashed line represents the real parameter value and the +
marks the sample means.
FIGURE 5.13 Boxplots for the estimated intra-individual standard deviations obtained from 200 simulations with
different designs. The horizontal dashed line represents the real parameter value and the + marks the sample means.

Considering the results of both simulations, it is clear that the method based on linearization takes
advantage of individuals whose individual models are not estimable when estimating the fixed and
random effects, and that this additional information is perhaps more important for the estimation of the
fixed effects. For the intra-individual variance it is clear that individuals with just one measure cannot
contribute to the estimation of this parameter, and our simulation does not show a contribution from the
individuals with two measures either. In addition we note that most of the boxplots show symmetric
distributions, with the exception of the random effects correlation in Figure 5.5 and the intra-individual
variances in Figures 5.6 and 5.13. The asymmetry is stronger for the intra-individual variances.

6. Conclusions and Recommendations

In this section we present our conclusions and recommendations. Our conclusions come in Section 6.1,
and in Section 6.2 we present a list of practical recommendations to take into account when analyzing
pharmacokinetic data. We hope that this list can serve as a reference point, and that the theory and
references presented in the previous sections will guide the reader who wants to go further.

6.1. Conclusions

6.1.1. Individual Pharmacokinetics

Estimation and Inference

 Individual pharmacokinetic analysis is carried out using standard nonlinear regression techniques,
which are based on a linearization of the expectation surface. Under this linearization, asymptotic
results similar to the ones from linear regression hold.
 The classical assumptions in nonlinear regression are that the random errors are independent and
normally distributed with constant variance. Deviations from these assumptions can be
accommodated in the model, but computations are considerably more complicated.
 Confidence intervals can be computed based on the asymptotic results. However, we must take into
account that the confidence intervals are only approximate, not just because of the asymptotic results,
but also because the standard errors are computed based on a linearization of the nonlinear model.
 Once the model has been fitted, it is straightforward to get approximate standard errors for any kind
of secondary pharmacokinetic parameters. These standard errors can be used to compute confidence
intervals.
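As an illustration of the last point, a delta-method sketch for one secondary parameter: the estimates and covariance matrix below are invented for the example, and AUC = D/(V·kel) is the one-compartment intravascular formula with dose D.

```python
# Hedged illustration: approximate standard error of a secondary
# parameter, AUC = D / (V * kel), by the delta method. The estimated
# (kel, V) and their asymptotic covariance are made-up numbers.
import numpy as np

D = 1.0
theta = np.array([0.1, 20.0])                 # assumed estimates (kel, V)
cov = np.array([[1e-4, -5e-4],
                [-5e-4, 4e-2]])               # assumed asymptotic covariance

kel, V = theta
auc = D / (V * kel)
grad = np.array([-D / (V * kel ** 2),         # d(AUC)/d(kel)
                 -D / (V ** 2 * kel)])        # d(AUC)/dV
se_auc = np.sqrt(grad @ cov @ grad)           # sqrt(g' C g)
print(auc, se_auc)
```

An approximate 95% confidence interval then follows as auc ± 1.96·se_auc, with the same caveats about linearization noted above.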

Transformations and Reparameterizations

 Transformations of the data change the structure of the random errors. The functional form of
nonlinear models usually corresponds to physical, biological, or chemical relationships which are
important to preserve in the analysis. Reparameterizations do not change the structure of the data or
the residuals as transformations do. Hence, we must be more cautious with transformations of the
model than with reparameterizations.
 Different parameterizations will produce different accuracies in the linear approximation of the
expectation surface and therefore will produce different inferential results. A methodology based on
second derivatives of the expectation surface, which yields measures of curvature or nonlinearity,
has been proposed in recent decades to quantify the adequacy of the linear approximation of
the expectation surface under different parameterizations. These measures constitute a mechanism to
gain insight into which parameterization is more convenient in each case. However, the results,
besides being difficult to interpret, are quite dependent on the design, so there are no general
guidelines about which parameterization is more convenient for each pharmacokinetic model. In our
opinion, although very promising, this theory turns out to be of little practical use.

 The computations involved in obtaining the measures of curvature are quite complicated. In our opinion,
the main difficulty is the determination of the dimension of the vector space spanned by the vectors of
second derivatives.

Software

 Nonlinear regression analysis is available in most statistical software, which mostly works with some
modification of the Gauss-Newton algorithm to estimate the model. Regarding departures from the
classical assumptions, these packages usually allow for different weighting schemes, so the
heterogeneity-of-variances problem is addressed. Flexibility to deal with autocorrelation or
departures from normality is not available in WinNonlin or in the nonlinear regression tools of SAS
and R. However, as mentioned in Section 3.1.3, the asymptotic results hold even when the normality
assumption does not.
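A small sketch of such a weighting scheme (in Python with SciPy rather than the packages named above; the data, parameter values, and starting values are all illustrative):

```python
# Weighted nonlinear least squares for c(t) = (D/V) exp(-kel t),
# with sigma proportional to the response, approximating a
# constant-coefficient-of-variation error model. Data are simulated.
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(3)

def model(t, kel, V, D=1.0):
    return (D / V) * np.exp(-kel * t)

t = np.array([10, 14, 20, 29, 41, 58, 83, 118, 168, 240.0])
y = model(t, 0.01, 100.0) * (1 + 0.1 * rng.standard_normal(len(t)))

# sigma = 0.1 * y weights down the large early concentrations.
popt, pcov = curve_fit(model, t, y, p0=[0.02, 50.0],
                       sigma=0.1 * y, absolute_sigma=True)
print(popt, np.sqrt(np.diag(pcov)))
```

The estimates should land near the true (0.01, 100), with asymptotic standard errors read off the diagonal of the returned covariance matrix, as in the inference discussion above.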

6.1.2. Population Pharmacokinetics

Estimation and Inference

 Hierarchical nonlinear models constitute the natural framework for the statistical analysis in
population pharmacokinetics. They provide great flexibility in the data modeling, so we can specify
different relations between the fixed effects and random effects, and their dependence on different
treatments.
 The naive pooled approach produces biased and imprecise estimators, and it is not able to estimate
the inter-individual variability. Therefore this approach should not be used in population
analysis.
 The Two-Stage approach cannot use the data from individuals whose individual models cannot be
estimated. This is a major drawback when analyzing routine clinical data, or in situations where there
are strong limitations on getting several samples per individual. On the other hand, when several samples
per individual are available, the population pharmacokinetic parameter estimates obtained with this
approach are, according to the simulation studies of Sheiner and Beal (1980, 1981, 1983), as good as
the ones obtained with the methods based on linearization.
 The methods based on linearization are the best alternative for analyzing population pharmacokinetic
models. This approach produces reasonably good estimates of the population and individual
pharmacokinetic parameters and of the inter- and intra-individual variation.
 The main difficulty with the methods based on linearization is that they rely heavily on intensive
computations, so numerical and convergence problems occur quite often. Because the
structure of a hierarchical nonlinear model is quite complicated, the modeling phase demands a
considerable amount of time, and sometimes it is necessary to sacrifice some data.

Transformations and Reparameterizations

 As mentioned before, the measures-of-curvature approach to deciding which parameterization is more
suitable is quite appealing but of little practical use. Moreover, these measures apply only to the case of
individual pharmacokinetics (that is, nonlinear regression models), and there is no similar approach
for the population case in the literature. Since the real interest in pharmacokinetics is most of
the time in population models, the measures-of-curvature theory turns out to be of even less
applicability.

Software

 It is possible to fit hierarchical nonlinear models by linearization with SAS and R. It is possible to
model heterogeneity of variances and autocorrelation schemas for the intra-individual errors with R.

6.1.3. Sampling Strategies

Optimal Designs

 Optimal designs for nonlinear models depend on the parameter values. Since these are unknown, we
have to make an assumption about their values, and therefore the resulting optimal designs are only
locally optimal. This characteristic constitutes a major limitation.
 There are three approaches to deal with the problem of unknown parameters: to assume a prior value
for the parameters, to assume a prior distribution (Bayesian approach), or to use sequential designs.
Assuming a prior value for the parameters can lead to misleading results if our guess is quite far
from the real values. Moreover, the resulting designs usually do not have enough support points for model
checking, and with c-optimal designs not all the parameters can be estimated. The sequential design
approach is of little practical use in pharmacokinetics. Therefore, the Bayesian approach, or the
similar-in-spirit Maximin approach, turns out in our opinion to be the best option. However, we must
consider that the lower the precision of the prior distribution, the lower the relative efficiency of the
resulting design. If the prior distribution is very dispersed, then the efficiency of the resulting
optimal design can be very close to the efficiency of a typical geometric design.
 Another limitation of optimal designs is that they do not assure the same efficiency for all the
pharmacokinetic parameters. Since we are usually interested in several parameters, the
optimal design can be quite efficient in the estimation of some parameters but of low efficiency for
others.
 In the population context, optimal designs can be of even less practical use because of
the intrinsic differences among individuals. Since the individual parameters differ among individuals,
the efficiency of the optimal design is not constant across them.
 Optimal designs are continuous designs. In practice an exact design must be implemented, so the
relative efficiency achieved may be below 100%.
 Considering all the previous factors (the uncertainty about the parameter values, the differences in
parameter values among individuals, the usual interest in several pharmacokinetic parameters at once,
and the need to implement exact designs), deciding on the best design for a specific case is quite
complicated. Since, in addition, the assumption of independent errors with constant variance usually
does not hold, the practical utility of optimal design in this field is fairly limited.

Sparse Data Analysis

 The methods based on linearization of the hierarchical nonlinear model can take advantage of
subjects with a small number of measurements, even of subjects whose individual models cannot be
fitted. This is a clear advantage over the traditional Two-Stage approach.

6.2. Data Analysis Recommendations

6.2.1. Individual Pharmacokinetics

Estimation and Inference

 We recommend fitting the model in its original functional form and with the variables on their usual
measurement scale. Data and model transformations should be considered only when strong statistical
and non-statistical grounds are available.
 Residual plots should be examined to decide on the best weighting scheme.
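For instance, on simulated data for the oral one-compartment model used in this report (the data set `dat` is hypothetical, and weighting by the squared fitted values is just one plausible scheme, corresponding to a constant coefficient of variation):

```r
set.seed(3)
Time <- rep(c(0.25, 0.5, 1, 2, 4, 8, 12, 24), each = 3)
mu   <- 5 * 5 / (500 * (5 - 0.2)) * (exp(-0.2 * Time) - exp(-5 * Time))
dat  <- data.frame(Time = Time,
                   Conc = mu * exp(rnorm(length(mu), 0, 0.15)))  # multiplicative error

## unweighted fit; the residual plot shows a funnel: spread grows with the mean
fit0 <- nls(Conc ~ 5 * Ka / (Vd * (Ka - Kel)) * (exp(-Kel * Time) - exp(-Ka * Time)),
            data = dat, start = c(Ka = 5, Kel = 0.2, Vd = 500))
plot(fitted(fit0), resid(fit0))

## refit with weights for a variance proportional to the squared mean
fit1 <- update(fit0, weights = 1 / fitted(fit0)^2)
```

A fully iterative scheme would recompute the weights from `fit1` and refit until the estimates stabilize; one pass usually illustrates the point.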
 In order to get confidence intervals for functions of the parameters of the original parameterization,
we suggest applying the first approach mentioned in Section 3.2 whenever possible, unless there is
enough evidence that the new parameterization produces a better linear approximation of the
expectation surface. When the parameter of interest is not a one-to-one function of a single original
parameter, or is a function of more than one of the original parameters, we can use the second
approach with the asymptotic results in (3.11) and (3.13).
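As a numerical illustration of the asymptotic route (a delta-method sketch on simulated data; the results (3.11) and (3.13) themselves are not reproduced here), consider the half-life t_half = log(2)/Kel obtained from a fit in the original (Ka, Kel, Vd) parameterization:

```r
set.seed(2)
Time <- c(0.25, 0.5, 1, 2, 4, 8, 12, 24)
mu   <- 5 * 5 / (500 * (5 - 0.2)) * (exp(-0.2 * Time) - exp(-5 * Time))
ex   <- data.frame(Time = Time, Conc = mu + rnorm(8, 0, 2e-4))

fit  <- nls(Conc ~ 5 * Ka / (Vd * (Ka - Kel)) * (exp(-Kel * Time) - exp(-Ka * Time)),
            data = ex, start = c(Ka = 5, Kel = 0.2, Vd = 500))
kel  <- coef(fit)[["Kel"]]
se   <- summary(fit)$coefficients["Kel", "Std. Error"]

t_half <- log(2) / kel                 # parameter of interest
se_th  <- log(2) / kel^2 * se          # delta method: |d t_half / d Kel| * se(Kel)
t_half + c(-1, 1) * qt(0.975, df.residual(fit)) * se_th   # approximate 95% CI

## refitting directly in the half-life parameterization gives the same Wald SE,
## since the linearized covariance transforms exactly by the chain rule
fit2 <- nls(Conc ~ 5 * Ka / (Vd * (Ka - log(2) / t_half)) *
              (exp(-log(2) / t_half * Time) - exp(-Ka * Time)),
            data = ex, start = c(Ka = 5, t_half = 3.5, Vd = 500))
summary(fit2)$coefficients["t_half", "Std. Error"]
```

The two standard errors agree, but neither interval reflects the curvature effects studied in Section 3.5; the appendix code quantifies those.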

6.2.2. Population Pharmacokinetics

Estimation and Inference

 Due to its simplicity, the Two-Stage approach can be an alternative for obtaining population
estimates of the pharmacokinetic parameters when there are several data points per individual or when
the methods based on linearization do not converge.
 When using the methods based on linearization, the NPD approach can be used to obtain initial
values for the iterative methods.
 As a first step in the population analysis, we can estimate the individual models. This allows us to
identify individuals with anomalous data points or individual curves that do not fit the specified
compartment model well. Moreover, individual estimates and individual confidence intervals are
useful to gain a first insight into the dependence of the fixed effects on the different treatments and
into the suitability of the random effects in the model.
 When the number of parameters in the model is large relative to the number of individuals, a
diagonal covariance matrix for the random effects may be useful to avoid numerical problems.
 We can use likelihood-ratio tests to decide whether a more complex model fits the data significantly
better than a simpler one, when the more complex model differs from the simpler one only by the
addition of one or more parameters. In addition, we can use the AIC or BIC to compare any pair of
models. With these tools we can decide on the best structure for the fixed and random effects in the
model.
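For example, with the theophylline data shipped with R (the model and starting values follow the well-known analysis in Pinheiro and Bates, 2000), we can compare a model with random effects on all three parameters against one without a random effect on lKe:

```r
library(nlme)

## first-order oral absorption model for the built-in theophylline data
fm1 <- nlme(conc ~ SSfol(Dose, Time, lKe, lKa, lCl), data = Theoph,
            fixed = lKe + lKa + lCl ~ 1,
            random = pdDiag(lKe + lKa + lCl ~ 1),
            start = c(lKe = -2.4, lKa = 0.45, lCl = -3.2))

## simpler model: no random effect on the elimination rate constant
fm2 <- update(fm1, random = pdDiag(lKa + lCl ~ 1))

## likelihood-ratio test plus AIC and BIC for both models
anova(fm1, fm2)
```

Note that the likelihood-ratio test for a variance component is conservative, because the null value lies on the boundary of the parameter space; the AIC and BIC columns of the same table give a complementary comparison.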

6.2.3. Sampling Strategies

Optimal Designs

 Optimal design studies, simulation studies, and practical experience should all be considered when
deciding on the sampling strategy.
 For the one-compartment model with intravascular administration, optimal designs and simulation
studies suggest taking one measurement as early as possible, one as late as possible, and a third in
between. For the first measurement, a short delay should be allowed so that the drug has mixed
throughout the compartment and a representative positive concentration is recorded. For the last
measurement, the optimal time is around 1/kel. There is no clear guidance on the timing of the
measurement in between.
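The claim about the last measurement can be checked numerically. The sketch below (assumed values kel = 0.2, Vd = 500, dose 5; the first sampling time is fixed at t1 = 0.1) maximizes the determinant of the information matrix of a two-point design for the one-compartment intravascular model:

```r
kel <- 0.2; Vd <- 500; Dose <- 5
t1  <- 0.1                              # earliest feasible sampling time

## determinant of the information matrix for the design {t1, t2},
## model C(t) = (Dose/Vd) * exp(-kel * t) with parameters (Vd, kel)
detinfo <- function(t2) {
  ts <- c(t1, t2)
  X  <- cbind(-Dose / Vd^2 * exp(-kel * ts),      # d C / d Vd
              -Dose / Vd * ts * exp(-kel * ts))   # d C / d kel
  det(t(X) %*% X)
}
grid  <- seq(t1 + 0.01, 20, by = 0.01)
t2opt <- grid[which.max(sapply(grid, detinfo))]
t2opt   # 5.1 = t1 + 1/kel, i.e. essentially 1/kel for small t1
```

The same conclusion follows analytically: the determinant is proportional to exp(-2*kel*t2)*(t2 - t1)^2, which is maximized at t2 = t1 + 1/kel.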

Sparse Data Analysis

 Individuals with just one or two measurements can be incorporated in the analysis by using the
methods based on linearization. However, we recommend including a small number of subjects with
several samples in order to obtain good estimates of the inter-individual and intra-individual variation;
estimation of the intra-individual variation in particular is quite poor with sparse data.
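A sketch of this point on simulated data (the names and the sampling pattern, 5 rich subjects with 8 samples and 20 sparse subjects with 2 samples each, are hypothetical): the linearization-based fit uses every subject, including those whose individual curves could never be fitted on their own.

```r
library(nlme)

set.seed(5)
times <- c(0.25, 0.5, 1, 2, 4, 6, 8, 10)
k_i   <- exp(rnorm(25, log(0.3), 0.25))   # individual elimination rates
obs   <- function(i, ts) data.frame(Subject = factor(i, levels = 1:25), Time = ts,
                                    Conc = 10 * exp(-k_i[i] * ts) *
                                           exp(rnorm(length(ts), 0, 0.1)))
rich   <- do.call(rbind, lapply(1:5,  function(i) obs(i, times)))            # 8 samples
sparse <- do.call(rbind, lapply(6:25, function(i) obs(i, sample(times, 2)))) # 2 samples
dat    <- rbind(rich, sparse)

## all 25 subjects enter the fit, although the 20 sparse ones could not be
## fitted individually (2 observations against 2 or more parameters)
fm <- nlme(Conc ~ A * exp(-k * Time), data = dat,
           fixed = A + k ~ 1, random = k ~ 1 | Subject,
           start = c(A = 10, k = 0.3))
fixef(fm)
```

The rich subjects are what anchors the intra-individual variance estimate; dropping them typically degrades that estimate markedly, in line with the recommendation above.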

Appendix: R Code for the Measures of Curvature Computation

########################################################################################
## These libraries are required for the computations presented in this report
########################################################################################

library(PK)
library(rgenoud)
library(odesolve)
library(PKfit)
library(nlme)
library(lattice)

########################################################################################
## This is the code for computing the measures of curvature on Section 3.5.4
## with the original parameterization
########################################################################################

## First and second derivatives


f<- expression(5*Ka/(Vd*(Ka-Kel))*(exp(-Kel*Time)-exp(-Ka*Time)))
df<-deriv(f,c("Ka","Kel","Vd"),hessian=TRUE)
## Input values
original<-nls(Conc~5*Ka/(Vd*(Ka-Kel))*(exp(-Kel*Time)-exp(-Ka*Time)),
data=ex1, start=c(Ka=5, Kel=0.2, Vd=500), model=T)
Ka<-coef(original)[1]
Kel<-coef(original)[2]
Vd<-coef(original)[3]
## Here we compute the matrices F.. (label F2) and D
F2<-matrix(nrow=8,ncol=9)
D<-matrix(nrow=8,ncol=9)
for (i in 1:8){
  Time<-ex1$Time[i]
  eval(df)
  ## Matrix F.. and D
  F2[i,]<-c(.hessian[1],.hessian[2],.hessian[3],.hessian[4],.hessian[5],.hessian[6],
            .hessian[7],.hessian[8],.hessian[9])
  D[i,]<-c(.grad[1],.grad[2],.grad[3],.hessian[1],.hessian[2],.hessian[3],.hessian[5],
           .hessian[6],.hessian[9])
}
D
## QR decomposition
## The matrix Q1 contains the first p+p' columns of Q
QR<-qr(D)
Q<-qr.Q(QR)
Q1<-matrix(nrow=8, ncol=5)
for (j in 1:5){
  for (i in 1:8){
    Q1[i,j]<-Q[i,j]
  }
}
R1<-t(Q1)%*%D
## Computing A..
## The array A.. has 5 faces. Each face is a 3x3 matrix, and they are given
## in A1, A2, A3, A4, and A5
A1<-matrix(nrow=3,ncol=3)
A2<-matrix(nrow=3,ncol=3)
A3<-matrix(nrow=3,ncol=3)
A4<-matrix(nrow=3,ncol=3)
A5<-matrix(nrow=3,ncol=3)
A<-t(Q1)%*%F2
for (i in 1:3){
  for (j in 1:3){
    A1[i,j]<-A[1,3*(i-1)+j]
    A2[i,j]<-A[2,3*(i-1)+j]
    A3[i,j]<-A[3,3*(i-1)+j]
    A4[i,j]<-A[4,3*(i-1)+j]
    A5[i,j]<-A[5,3*(i-1)+j]
  }
}
A1; A2; A3; A4; A5
## Relative Curvatures
R11<-matrix(nrow=3,ncol=3)
for (i in 1:3){
  for (j in 1:3){
    R11[i,j]<-R1[i,j]
  }
}
R11
C1<-t(solve(R11))%*%A1%*%solve(R11)*0.0002277*3^0.5; C1
C2<-t(solve(R11))%*%A2%*%solve(R11)*0.0002277*3^0.5; C2
C3<-t(solve(R11))%*%A3%*%solve(R11)*0.0002277*3^0.5; C3
C4<-t(solve(R11))%*%A4%*%solve(R11)*0.0002277*3^0.5; C4
C5<-t(solve(R11))%*%A5%*%solve(R11)*0.0002277*3^0.5; C5
## RMS parameter effects curvature: original parametrization
cb<-((2*sum(C1*C1,C2*C2,C3*C3)+sum(diag(C1))^2+sum(diag(C2))^2+sum(diag(C3))^2)/15)^.5
ci<-((2*sum(C4*C4,C5*C5)+sum(diag(C4))^2+sum(diag(C5))^2)/15)^.5
cb; cb*qf(0.95,3,5)^0.5
ci; ci*qf(0.95,3,5)^0.5

########################################################################################
## To compute the measures of curvature for the half-life parameterization, replace
## the first lines of the code given above with the following lines
########################################################################################

## First and second derivatives


f<- expression(5*Ka/(Vd*(Ka-log(2)/t_half))*(exp(-log(2)/t_half*Time)-exp(-Ka*Time)))
df<-deriv(f,c("Ka","t_half","Vd"),hessian=TRUE)
## Input values
half_life<-nls(Conc~5*Ka/(Vd*(Ka-log(2)/t_half))*(exp(-log(2)/t_half*Time)-exp(-Ka*
Time)), data=ex1, start = c(Ka=5, t_half=3.5, Vd=500), model=T)
Ka<-coef(half_life)[1]
t_half<-coef(half_life)[2]
Vd<-coef(half_life)[3]

########################################################################################
## To compute the measures of curvature for the total clearance parameterization,
## replace the first lines of the code given above with the following lines
########################################################################################

## First and second derivatives


f<- expression(5*Ka/(Cl/Kel*(Ka-Kel))*(exp(-Kel*Time)-exp(-Ka*Time)))
df<-deriv(f,c("Ka","Kel","Cl"),hessian=TRUE)
## Input values
total_clearance<-nls(Conc~5*Ka/(Cl/Kel*(Ka-Kel))*(exp(-Kel*Time)-exp(-Ka*Time)),
data=ex1, start=c(Ka=5, Kel=0.2, Cl=100), model=T)
Ka<-coef(total_clearance)[1]
Kel<-coef(total_clearance)[2]
Cl<-coef(total_clearance)[3]

References

1. Al-Banna M. K., Kelman A. W., and Whiting B. (1989). Experimental Design and Efficient
Parameter Estimation in Population Pharmacokinetics. Journal of Pharmacokinetics and
Biopharmaceutics, Vol. 18, 347–360.
2. Atkinson A. C., Chaloner K., Herzberg A. M., and Juritz J. (1993). Optimum Experimental Designs
for Properties of a Compartmental Model. Biometrics, Vol. 49, 325–337.
3. Atkinson A. C. and Donev A. N. (1996). Optimum Experimental Designs. Clarendon Press, Oxford.
4. Bates D. M. and Watts D. G. (1980). Relative Curvature Measures of Nonlinearity. Journal of the
Royal Statistical Society. Series B (Methodological), Vol. 42, 1–25.
5. Bates D. M. and Watts D. G. (1981). Parameter Transformations for Improved Approximate
Confidence Regions in Nonlinear Least Squares. The Annals of Statistics, Vol. 9, 1152–1167.
6. Bates D. M. and Watts D. G. (1988). Nonlinear regression analysis and its applications. Wiley.
7. Beal S. L. and Sheiner L. B. (1982). Estimating Population Kinetics. Critical Reviews in Biomedical
Engineering, Vol. 8, 195–222.
8. Biedermann S., Dette H., and Pepelyshev A. (2004). Maximin Optimal Designs for a Compartmental
Model. mODa 7 – Advances in Model-Oriented Design and Analysis. Physica Verlag, 41–49.
9. D’Argenio D. Z. (1981). Optimal Sampling Times for Pharmacokinetic Experiments. Journal of
Pharmacokinetics and Biopharmaceutics, Vol. 9, 739–756.
10. Davidian M. and Giltinan D. M. (1995). Nonlinear Models for Repeated Measurement Data.
Chapman & Hall.
11. Dette H., Haines L. M., and Imhof L. A. (2005). Maximin and Bayesian Optimal Designs for Linear
and Non-Linear Regression Models. Statistica Sinica.
12. Ette E. I., Howie C. A., Kelman A. W., and Whiting B. (1994). Experimental Design and Efficient
Parameter Estimation in Preclinical Pharmacokinetic Studies. Pharmaceutical Research, Vol. 12,
729–737.
13. Jonsson E. N., Wade J. R., and Karlsson M. O. (1996). Comparison of Some Practical Sampling
Strategies for Population Pharmacokinetic Studies. Journal of Pharmacokinetics and
Biopharmaceutics, Vol. 24, 245–263.
14. Lindstrom M. J. and Bates D. M. (1990). Nonlinear Mixed Effects Models for Repeated Measures
Data. Biometrics, Vol. 46, 673–687.
15. Melas V. B. (2005). On the Functional Approach to Optimal Designs for Nonlinear Models. Journal
of Statistical Planning and Inference, Vol. 132, 93–116.
16. Mentre F., Mallet A., and Baccar D. (1997). Optimal Design in Random-Effects Regression Models.
Biometrika, Vol. 84, 429–442.
17. Pinheiro J. C. and Bates D. M. (2000). Mixed Effects Models in S and S-Plus. Springer.
18. Ritschel W. A. and Kearns G. L. (2004). Handbook of Basic Pharmacokinetics …Including Clinical
Applications (sixth edition). American Pharmacists Association.

19. Seber G. A. F. and Wild C. J. (1989). Nonlinear Regression. Wiley.
20. Shargel L. and Yu A. B. C. (1999). Applied Biopharmaceutics and Pharmacokinetics (fourth
edition). McGraw-Hill.
21. Sheiner, L. B. (1986). Analysis of Pharmacokinetic Data Using Parametric Models. III. Hypothesis
Tests and Confidence Intervals. Journal of Pharmacokinetics and Biopharmaceutics, Vol. 14, 539–
555.
22. Sheiner, L. B. and Beal S. L. (1980). Evaluation of Methods for Estimating Population
Pharmacokinetic Parameters. I. Michaelis-Menten Model: Routine Clinical Data. Journal of
Pharmacokinetics and Biopharmaceutics, Vol. 8, 553–571.
23. Sheiner, L. B. and Beal S. L. (1981). Evaluation of Methods for Estimating Population
Pharmacokinetic Parameters. II. Biexponential Model and Experimental Pharmacokinetic Data.
Journal of Pharmacokinetics and Biopharmaceutics, Vol. 9, 635–651.
24. Sheiner, L. B. and Beal S. L. (1983). Evaluation of Methods for Estimating Population
Pharmacokinetic Parameters. III. Monoexponential Model: Routine Clinical Pharmacokinetic Data.
Journal of Pharmacokinetics and Biopharmaceutics, Vol. 11, 303–319.
25. Venables W. N. and Ripley B. D. (2002). Modern Applied Statistics with S (Fourth edition).
Springer.

