
Nonlinear Models for Repeated Measurement Data: An Overview and Update

Marie Davidian and David M. Giltinan

Nonlinear mixed eﬀects models for data in the form of continuous, repeated measurements on

each of a number of individuals, also known as hierarchical nonlinear models, are a popular platform

for analysis when interest focuses on individual-speciﬁc characteristics. This framework ﬁrst enjoyed

widespread attention within the statistical research community in the late 1980s, and the 1990s saw

vigorous development of new methodological and computational techniques for these models, the

emergence of general-purpose software, and broad application of the models in numerous substantive

ﬁelds. This article presents an overview of the formulation, interpretation, and implementation of

nonlinear mixed eﬀects models and surveys recent advances and applications.

Key Words: Hierarchical model; Inter-individual variation; Intra-individual variation; Nonlinear

mixed eﬀects model; Random eﬀects; Serial correlation; Subject-speciﬁc.

1. INTRODUCTION

A common challenge in biological, agricultural, environmental, and medical applications

is to make inference on features underlying proﬁles of continuous, repeated measurements

from a sample of individuals from a population of interest. For example, in pharmacokinetic

analysis (Sheiner and Ludden 1992), serial blood samples are collected from each of several

subjects following doses of a drug and assayed for drug concentration, and the objective is to

characterize pharmacological processes within the body that dictate the time-concentration

relationship for individual subjects and the population of subjects. Similar objectives arise

in a host of other applications; see Section 2.1.

The nonlinear mixed eﬀects model, also referred to as the hierarchical nonlinear model,

has gained broad acceptance as a suitable framework for such problems. Analyses based on

this model are now routinely reported across a diverse spectrum of subject-matter literature,

and software has become widely available. Extensions and modiﬁcations of the model to

Marie Davidian is Professor, Department of Statistics, North Carolina State University, Box 8203, Raleigh,

NC 27695 (E-mail: davidian@stat.ncsu.edu). David Giltinan is Staﬀ Scientist, Genentech, Inc., 1 DNA Way

South San Francisco, CA 94080-4990 (E-mail: giltinan@gene.com).


handle new substantive challenges continue to emerge. Indeed, since the publication of

Davidian and Giltinan (1995) and Vonesh and Chinchilli (1997), two monographs oﬀering

detailed accounts of the model and its practical application, much has taken place.

The objective of this article is to provide an updated look at the nonlinear mixed ef-

fects model, summarizing from a current perspective its formulation, interpretation, and

implementation and surveying new developments. In Section 2, we describe the model and

situations for which it is an appropriate framework. Section 3 oﬀers an overview of popular

techniques for implementation and associated software. Recent advances and extensions that

build on the basic model are reviewed in Section 4. Presentation of a comprehensive bibli-

ography is impossible, as, pleasantly, the literature has become vast. Accordingly, we note

only a few early, seminal references and refer the reader to Davidian and Giltinan (1995) and

Vonesh and Chinchilli (1997) for a fuller compilation prior to the mid-1990s. The remaining

work cited represents what we hope is an informative sample from the extensive modern

literature on the methodology and application of nonlinear mixed eﬀects models that will

serve as a starting point for readers interested in deeper study of this topic.

2. NONLINEAR MIXED EFFECTS MODEL

2.1 The Setting

To exemplify circumstances for which the nonlinear mixed eﬀects model is an appropriate

framework, we review challenges from several diverse applications.

Figure 1 goes here

Pharmacokinetics. Figure 1 shows data typical of an intensive pharmacokinetic study

(Davidian and Giltinan 1995, sec. 5.5) carried out early in drug development to gain insight

into within-subject pharmacokinetic processes of absorption, distribution, and elimination

governing concentrations of drug achieved. Twelve subjects were given the same oral dose

(mg/kg) of the anti-asthmatic agent theophylline, and blood samples drawn at several times

following administration were assayed for theophylline concentration. As ordinarily observed

in this context, the concentration proﬁles have a similar shape for all subjects; however, peak

concentration achieved, rise, and decay vary substantially. These differences are believed to be attributable to inter-subject variation in the underlying pharmacokinetic processes, understanding of which is critical for developing dosing guidelines.

To characterize these processes formally, it is routine to represent the body by a simple compartment model (e.g., Sheiner and Ludden 1992). For theophylline, a one-compartment

model is standard, and solution of the corresponding diﬀerential equations yields

C(t) = \frac{D k_a}{V(k_a - Cl/V)} \left\{ \exp\left(-\frac{Cl}{V}\, t\right) - \exp(-k_a t) \right\}, \qquad (1)

where C(t) is drug concentration at time t for a single subject following oral dose D at t = 0. Here, k_a is the fractional rate of absorption describing how drug is absorbed from the gut into the bloodstream; V is roughly the volume required to account for all drug in the body, reflecting the extent of drug distribution; and Cl is the clearance rate representing the volume of blood from which drug is eliminated per unit time. Thus, the parameters (k_a, V, Cl) in (1) summarize the pharmacokinetic processes for a given subject.
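Equation (1) is straightforward to evaluate numerically. The sketch below (Python with NumPy; the parameter values are purely illustrative, not estimates from the theophylline data) computes C(t) for one subject, assuming the non-degenerate case k_a ≠ Cl/V:

```python
import numpy as np

def conc_one_compartment(t, dose, ka, V, Cl):
    """One-compartment oral-dose model (1): concentration at time t.

    ka: fractional absorption rate, V: volume, Cl: clearance.
    Assumes ka != Cl/V (the usual non-degenerate case).
    """
    ke = Cl / V  # elimination rate constant Cl/V
    return dose * ka / (V * (ka - ke)) * (np.exp(-ke * t) - np.exp(-ka * t))
```

For k_a > Cl/V the profile starts at zero, rises to a single peak, and then decays, matching the shape of the profiles in Figure 1.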

More precisely stated, the goal is to determine, based on the observed profiles, mean or median values of (k_a, V, Cl) and how they vary in the population of subjects in order to design repeated dosing regimens to maintain drug concentrations in a desired range. If inter-subject variation in (k_a, V, Cl) is large, it may be difficult to design an "all-purpose" regimen; however, if some of the variation is associated with subject characteristics such as age or weight, this might be used to develop strategies tailored for certain subgroups.

Figure 2 goes here

HIV Dynamics. With the advent of assays capable of quantifying the concentration of

viral particles in the blood, monitoring of such “viral load” measurements is now a routine

feature of care of HIV-infected individuals. Figure 2 shows viral load-time proﬁles for ten

participants in AIDS Clinical Trial Group (ACTG) protocol 315 (Wu and Ding 1999) follow-

ing initiation of potent antiretroviral therapy. Characterizing mechanisms of virus-immune

system interaction that lead to such patterns of viral decay (and eventual rebound for many

subjects) enhances understanding of the progression of HIV disease.

Considerable recent interest has focused on representing within-subject mechanisms by a

system of diﬀerential equations whose parameters characterize rates of production, infection,


and death of immune system cells and viral production and clearance (Wu and Ding 1999, sec.

2). Under assumptions discussed by these authors, typical models for V (t), the concentration

of virus particles at time t following treatment initiation, are of the form

V(t) = P_1 \exp(-\lambda_1 t) + P_2 \exp(-\lambda_2 t), \qquad (2)

where (P_1, λ_1, P_2, λ_2) describe patterns of viral production and decay. Understanding the

“typical” values of these parameters, how they vary among subjects, and whether they are

associated with measures of disease status at initiation of therapy such as baseline viral load

(e.g., Notermans et al. 1998) may guide use of drugs in the anti-HIV arsenal. A complication

is that viral loads below the lower limit of detection of the assay are not quantiﬁable and

are usually coded as equal to the limit (100 copies/ml for ACTG 315).
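The biexponential model (2) and the detection-limit coding described above can be sketched as follows (Python with NumPy; the parameter values and helper names are hypothetical, chosen only to illustrate the censoring mechanism):

```python
import numpy as np

def viral_load(t, P1, lam1, P2, lam2):
    """Biexponential viral decay model (2): V(t) = P1*exp(-lam1*t) + P2*exp(-lam2*t)."""
    return P1 * np.exp(-lam1 * t) + P2 * np.exp(-lam2 * t)

def observe(t, params, limit=100.0):
    """Mimic assay censoring: loads below the detection limit are coded at the
    limit (100 copies/ml for ACTG 315). Returns coded values and censoring flags."""
    v = viral_load(t, *params)
    return np.maximum(v, limit), v < limit
```

Once the true trajectory falls below the limit, every subsequent observation is recorded at the limit itself, which is exactly the complication noted above for likelihood-based analysis.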

Dairy Science. Inﬂammation of the mammary gland in dairy cattle has serious economic

implications for the dairy industry (Rodriguez-Zas, Gianola, and Shook 2000). Milk somatic

cell score (SCS) is a measure reﬂecting udder health during lactation, and elucidating pro-

cesses underlying its evolution may aid disease management. Rodriguez-Zas et al. (2000)

represent time-SCS proﬁles by nonlinear models whose parameters characterize these pro-

cesses. Rekaya et al. (2001) describe longitudinal patterns of milk yield by nonlinear models

involving quantities such as peak yield, time-to-peak-yield, and persistency (reﬂecting ca-

pacity to maintain milk production after peak yield). Understanding how these parameters

vary among cows and are related to cow-speciﬁc attributes is valuable for breeding purposes.

Forestry. Forest growth and yield and the impact of silvicultural treatments such as

herbicides and fertilizers are often evaluated on the basis of repeated measurements on per-

manent sample plots. Fang and Bailey (2001) and Hall and Bailey (2001) describe nonlinear

models for these measures as a function of time that depend on meaningful parameters such

as asymptotic growth or yield and rates of change. Objectives are to understand among-

plot/tree variation in these parameters and determine whether some of this variation is

associated with site preparation treatments, soil type, and tree density to monitor and pre-

dict changes in forest stands with diﬀerent attributes, and to predict growth and yield for

individual plots/trees to aid forest management decisions. Similarly, Gregoire and Schabenberger (1996a,b) note that forest inventory practices rely on prediction of the volume of the

bole (trunk) of a standing tree to assess the extent of merchantable product available in trees

of a particular age using measurements on diameter-at-breast-height D and stem height H.

Based on data on D, H, and repeated measurements of cumulative bole volume at three-foot

intervals along the bole for several similarly-aged trees, they develop a nonlinear model to

be used for prediction whose parameters governing asymptote, growth rate, and shape vary

across trees depending on D and H; see also Davidian and Giltinan (1995, sec. 11.3).

Further Applications. The range of ﬁelds where nonlinear mixed models are used is vast.

We brieﬂy cite a few more applications in biological, agricultural, and environmental sciences.

In toxicokinetics (e.g., Gelman et al. 1996; Mezzetti et al. 2003), physiologically-based

pharmacokinetic models, complex compartment models including organs and tissue groups,

describe concentrations in breath or blood of toxic substances as implicit solutions to a system

of diﬀerential equations involving meaningful physiological parameters. Knowledge of the

parameters aids understanding of mechanisms underlying toxic eﬀects. McRoberts, Brooks,

and Rogers (1998) represent size-age relationships for black bears via nonlinear models whose

parameters describe features of the growth processes. Morrell et al. (1995), Law, Taylor, and

Sandler (2002), and Pauler and Finkelstein (2002) characterize longitudinal prostate-speciﬁc

antigen trajectories in men by subject-speciﬁc nonlinear models; the latter two papers relate

underlying features of the profiles to cancer recurrence. Müller and Rosner (1997) describe

patterns of white blood cell and granulocyte counts for cancer patients undergoing high-

dose chemotherapy by nonlinear models whose parameters characterize important features

of the proﬁles; knowledge of the relationship of these features to patient characteristics aids

evaluation of toxicity and prediction of hematologic proﬁles for future patients. Mikulich et

al. (2003) use nonlinear mixed models in analyses of data on circadian rhythms. Additional

applications are found in the literature in ﬁsheries science (Pilling, Kirkwood, and Walker

2002) and plant and soil sciences (Schabenberger and Pierce 2001).

Summary. These situations share several features: (i) repeated observations of a continu-

ous response on each of several individuals (e.g., subjects, plots, cows) over time or other condition (e.g., intervals along a tree bole); (ii) variability in the relationship between response and time or other condition across individuals; and (iii) availability of a scientifically-relevant

model characterizing individual behavior in terms of meaningful parameters that vary across

individuals and dictate variation in patterns of time-response. Objectives are to understand

the “typical” behavior of the phenomena represented by the parameters; the extent to which

the parameters, and hence these phenomena, vary across individuals; and whether some of

the variation is systematically associated with individual attributes. Individual-level predic-

tion may also be of interest. As we now describe, the nonlinear mixed eﬀects model is an

appropriate framework within which to formalize these objectives.

2.2 The Model

Basic model. We consider a basic version of the model here and in Section 3; extensions

are discussed in Section 4. Let y_ij denote the jth measurement of the response, e.g., drug concentration or milk yield, under condition t_ij, j = 1, . . . , n_i, and possible additional conditions u_i. E.g., for theophylline, t_ij is the time associated with the jth drug concentration for subject i following dose u_i = D_i at time zero. In many applications, t_ij is time and u_i is empty, and we use the word "time" generically below. We write for brevity x_ij = (t_ij, u_i), but note dependence on t_ij where appropriate. Assume that there may be a vector of characteristics a_i for each individual that do not change with time, e.g., age, weight, or diameter-at-breast-height. Letting y_i = (y_i1, . . . , y_in_i)^T, it is ordinarily assumed that the triplets (y_i, u_i, a_i) are independent across i, reflecting the belief that individuals are "unrelated." The usual nonlinear mixed effects model may then be written as a two-stage hierarchy as follows:

Stage 1: Individual-Level Model.

y_{ij} = f(x_{ij}, \beta_i) + e_{ij}, \quad j = 1, \ldots, n_i. \qquad (3)

In (3), f is a function governing within-individual behavior, such as (1)–(2), depending on a (p × 1) vector of parameters β_i specific to individual i. For example, in (1), β_i = (k_ai, V_i, Cl_i)^T = (β_1i, β_2i, β_3i)^T, where k_ai, V_i, and Cl_i are absorption rate, volume, and clearance for subject i. The intra-individual deviations e_ij = y_ij − f(x_ij, β_i) are assumed to satisfy E(e_ij | u_i, β_i) = 0 for all j; we say more about other properties of the e_ij shortly.

Stage 2: Population Model.

\beta_i = d(a_i, \beta, b_i), \quad i = 1, \ldots, m, \qquad (4)


where d is a p-dimensional function depending on an (r × 1) vector of fixed parameters, or fixed effects, β and a (k × 1) vector of random effects b_i associated with individual i. Here, (4) characterizes how elements of β_i vary among individuals, due both to systematic association with individual attributes in a_i and to "unexplained" variation in the population of individuals, e.g., natural, biological variation, represented by b_i. The distribution of the b_i conditional on a_i is usually taken not to depend on a_i (i.e., the b_i are independent of the a_i), with E(b_i | a_i) = E(b_i) = 0 and var(b_i | a_i) = var(b_i) = D. Here, D is an unstructured covariance matrix that is the same for all i, and D characterizes the magnitude of "unexplained" variation in the elements of β_i and associations among them; a standard such assumption is b_i ∼ N(0, D). We discuss this assumption further momentarily.
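The two-stage hierarchy (3)–(4) can be made concrete by simulation. The sketch below (Python with NumPy) is a minimal illustration, assuming a simple hypothetical individual-level mean f and illustrative parameter values; the log-scale shift in Stage 2 is one of many possible choices of d:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(t, beta):
    # Illustrative individual-level mean: simple exponential decay
    return beta[0] * np.exp(-beta[1] * t)

# Stage 2 ingredients: fixed effects (log scale) and random-effects covariance D
beta_fixed = np.log(np.array([10.0, 0.3]))
D = np.diag([0.05, 0.02])

def draw_individual(n_times=5, sigma=0.1):
    b_i = rng.multivariate_normal(np.zeros(2), D)   # b_i ~ N(0, D)
    beta_i = np.exp(beta_fixed + b_i)               # Stage 2: positive, lognormal parameters
    t = np.linspace(0.5, 8.0, n_times)
    e_ij = rng.normal(0.0, sigma, n_times)          # Stage 1 intra-individual deviations
    y_i = f(t, beta_i) + e_ij
    return t, y_i, beta_i

m = 50
betas = np.array([draw_individual()[2] for _ in range(m)])
```

Each simulated individual gets its own β_i, and the spread of the β_i across the m individuals reflects the "unexplained" population variation encoded in D.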

For instance, if a_i = (w_i, c_i)^T, where for subject i, w_i is weight (kg) and c_i = 0 if creatinine clearance is ≤ 50 ml/min, indicating impaired renal function, and c_i = 1 otherwise, then for a pharmacokinetic study under (1), an example of (4), with b_i = (b_1i, b_2i, b_3i)^T, is

k_{ai} = \exp(\beta_1 + b_{1i}), \quad V_i = \exp(\beta_2 + \beta_3 w_i + b_{2i}), \quad Cl_i = \exp(\beta_4 + \beta_5 w_i + \beta_6 c_i + \beta_7 w_i c_i + b_{3i}). \qquad (5)

Model (5) enforces positivity of the pharmacokinetic parameters for each i. Moreover, if b_i is multivariate normal, k_ai, Cl_i, V_i are each lognormally distributed in the population, consistent with the widely-acknowledged phenomenon that these parameters have skewed population distributions. Here, the assumption that the distribution of b_i given a_i does not depend on a_i corresponds to the belief that variation in the parameters "unexplained" by the systematic relationships with w_i and c_i in (5) is the same regardless of weight or renal status, similar to standard assumptions in ordinary regression modeling. For example, if b_i ∼ N(0, D), then log Cl_i is normal with variance D_33, and thus Cl_i is lognormal with coefficient of variation (CV) {exp(D_33) − 1}^{1/2}, neither of which depends on w_i, c_i. On the other hand, if this variation is thought to be different, the assumption may be relaxed by taking b_i | a_i ∼ N{0, D(a_i)}, where now the covariance matrix depends on a_i. E.g., if the parameters are more variable among subjects with normal renal function, one may assume D(a_i) = D_0(1 − c_i) + D_1 c_i, so that the covariance matrix depends on a_i through c_i and equals D_0 in the subpopulation of individuals with renal impairment and D_1 in that of healthy subjects.
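The population model (5) and the lognormal CV fact above can be checked numerically. The sketch below (Python with NumPy) codes (5) directly and verifies by simulation that if log Cl_i has variance D_33 then Cl_i has CV {exp(D_33) − 1}^{1/2}; the coefficient values in the test are hypothetical:

```python
import numpy as np

def stage2_pk(w, c, beta, b):
    """Population model (5): subject-specific PK parameters from weight w (kg)
    and renal indicator c, with random effects b = (b1, b2, b3).
    beta = (beta1, ..., beta7); exponentiation enforces positivity."""
    ka = np.exp(beta[0] + b[0])
    V  = np.exp(beta[1] + beta[2] * w + b[1])
    Cl = np.exp(beta[3] + beta[4] * w + beta[5] * c + beta[6] * w * c + b[2])
    return ka, V, Cl
```

Because each parameter is exp(linear predictor), skewed lognormal population distributions arise automatically from normal random effects.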


In (5), each element of β_i is taken to have an associated random effect, reflecting the belief that each component varies nonnegligibly in the population, even after systematic relationships with subject characteristics are taken into account. In some settings, "unexplained" variation in one component of β_i may be very small in magnitude relative to that in others. It is common to approximate this by taking this component to have no associated random effect; e.g., in (5), specify instead V_i = exp(β_2 + β_3 w_i), which attributes all variation in volumes across subjects to differences in weight. Usually, it is biologically implausible for there to be no "unexplained" variation in the features represented by the parameters, so one must recognize that such a specification is adopted mainly to achieve parsimony and numerical stability in fitting rather than to reflect belief in perfect biological consistency across individuals. Analyses in the literature to determine "whether elements of β_i are fixed or random effects" should be interpreted in this spirit.

A common special case of (4) is that of a linear relationship between β_i and fixed and random effects as in usual, empirical statistical linear modeling, i.e.,

\beta_i = A_i \beta + B_i b_i, \qquad (6)

where A_i is a design matrix depending on elements of a_i, and B_i is a design matrix typically involving only zeros and ones allowing some elements of β_i to have no associated random effect. For example, consider the linear alternative to (5) given by

k_{ai} = \beta_1 + b_{1i}, \quad V_i = \beta_2 + \beta_3 w_i, \quad Cl_i = \beta_4 + \beta_5 w_i + \beta_6 c_i + \beta_7 w_i c_i + b_{3i} \qquad (7)

with b_i = (b_1i, b_3i)^T, which may be represented as in (6) with β = (β_1, . . . , β_7)^T, and

A_i = \begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & w_i & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & w_i & c_i & w_i c_i \end{pmatrix}, \qquad B_i = \begin{pmatrix} 1 & 0 \\ 0 & 0 \\ 0 & 1 \end{pmatrix};

(7) takes V_i to vary negligibly relative to k_ai and Cl_i. Alternatively, if V_i = β_2 + β_3 w_i + b_2i, so including "unexplained" variation in this parameter above and beyond that explained by weight, b_i = (b_1i, b_2i, b_3i)^T and B_i = I_3, where I_q is a q-dimensional identity matrix.
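The design-matrix representation (6) of the linear model (7) can be verified mechanically: building A_i and B_i as above and forming A_i β + B_i b_i must reproduce (7) component by component. A minimal sketch (Python with NumPy; the numeric values in the test are arbitrary):

```python
import numpy as np

def design_matrices(w, c):
    """A_i and B_i for the linear population model (7), with beta = (beta1, ..., beta7)
    and b_i = (b1i, b3i); row 2 of B_i is zero since V_i has no random effect."""
    A = np.array([
        [1, 0, 0, 0, 0, 0, 0],
        [0, 1, w, 0, 0, 0, 0],
        [0, 0, 0, 1, w, c, w * c],
    ], dtype=float)
    B = np.array([[1, 0],
                  [0, 0],
                  [0, 1]], dtype=float)
    return A, B

def stage2_linear(w, c, beta, b):
    A, B = design_matrices(w, c)
    return A @ beta + B @ b   # beta_i = A_i beta + B_i b_i, eq. (6)
```

Row two of B_i being all zeros is exactly what removes the random effect from V_i in (7).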

If b_i is multivariate normal, a linear specification as in (7) may be unrealistic for applications where population distributions are thought to be skewed, as in pharmacokinetics.


Rather than adopt a nonlinear population model as in (5), a common alternative tactic is to reparameterize the model f. For example, if (1) is represented in terms of parameters (k*_a, V*, Cl*)^T = (log k_a, log V, log Cl)^T, then β_i = (k*_ai, V*_i, Cl*_i)^T, and

k^*_{ai} = \beta_1 + b_{1i}, \quad V^*_i = \beta_2 + \beta_3 w_i + b_{2i}, \quad Cl^*_i = \beta_4 + \beta_5 w_i + \beta_6 c_i + \beta_7 w_i c_i + b_{3i} \qquad (8)

is a linear population model in the same spirit as (5).
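The equivalence between the log-scale linear model (8) and the original-scale nonlinear model (5) amounts to exponentiation. A minimal sketch (Python with NumPy; coefficient and covariate values in the test are hypothetical):

```python
import numpy as np

def stage2_log_linear(w, c, beta, b):
    """Linear model (8) on the log-transformed parameters (ka*, V*, Cl*)."""
    ka_star = beta[0] + b[0]
    V_star  = beta[1] + beta[2] * w + b[1]
    Cl_star = beta[3] + beta[4] * w + beta[5] * c + beta[6] * w * c + b[2]
    return np.array([ka_star, V_star, Cl_star])

def stage2_original_scale(w, c, beta, b):
    # Exponentiating (8) recovers the positive parameters of (5)
    return np.exp(stage2_log_linear(w, c, beta, b))
```

Fitting (8) with normal b_i and then exponentiating therefore yields the same lognormal population distributions as (5), while keeping the population model linear in β and b_i.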

In summary, specification of the population model (4) is made in accordance with usual considerations for regression modeling and subject-matter knowledge. Taking A_i = I_p in (6) with β = (β_1, . . . , β_p)^T assumes E(β_ki) = β_k for k = 1, . . . , p, which may be done in the case where no individual covariates a_i are available or as a part of a model-building exercise; see Section 3.6. If the data arise according to a design involving fixed factors, e.g., a factorial experiment, A_i may be chosen to reflect this for each component of β_i in the usual way.

Within-individual variation. To complete the full nonlinear mixed model, a speciﬁcation

for variation within individuals is required. Considerations underlying this task are often

not elucidated in the applied literature, and there is some apparent confusion regarding

the notion of “within-individual correlation.” Thus, we discuss this feature in some detail,

focusing on phenomena taking place within a single individual i. Our discussion focuses on

model (3), but the same considerations are relevant for linear modeling.

Figure 3 goes here

A conceptual perspective on within-individual variation discussed by Diggle et al. (2001, Ch. 5) and Verbeke and Molenberghs (2000, sec. 3.3) is depicted in Figure 3 in the context of the one-compartment model (1) for theophylline, where y is concentration and t is time following dose D. According to the individual model (3), E(y_ij | u_i, β_i) = f(t_ij, u_i, β_i), so that f represents what happens "on average" for subject i, shown as the solid line in Figure 3. A finer interpretation of this may be appreciated as follows. Because f may not capture all within-individual physiological processes, or because response may exhibit "local fluctuations" (e.g., drug may not be "perfectly mixed" within the body as the compartment model assumes), when i is observed, the response profile actually realized follows the dotted line. In addition, as an assay is used to ascertain the value of the realized response at any time t, measurement error may be introduced, so that the measured values y, represented by the solid symbols, do not exactly equal the realized values. More precisely, then, f(t, u_i, β_i) is the average over all possible realizations of actual concentration trajectory and measurement errors that could arise if subject i is observed. The implication is that f(t, u_i, β_i) may be viewed as the "inherent tendency" for i's responses to evolve over time, and β_i is an "inherent characteristic" of i dictating this tendency. Hence, a fundamental principle underlying nonlinear mixed effects modeling is that such "inherent" properties of individuals, rather than actual response realizations, are of central scientific interest.

Thus, the y_ij observed by the analyst are each the sum of one realized profile and one set of measurement errors at intermittent time points t_ij, formalized by writing (3) as

y_{ij} = f(t_{ij}, u_i, \beta_i) + e_{R,ij} + e_{M,ij}, \qquad (9)

where e_ij has been partitioned into deviations from f(t_ij, u_i, β_i) due to the particular realization observed, e_R,ij, and possible measurement error, e_M,ij, at each t_ij. In (9), the "actual" realized response at t_ij, if it could be observed without error, is thus f(t_ij, u_i, β_i) + e_R,ij. We may think of (9) as following from a within-subject stochastic process of the form

y_i(t, u_i) = f(t, u_i, \beta_i) + e_{R,i}(t, u_i) + e_{M,i}(t, u_i) \qquad (10)

with E{e_R,i(t, u_i) | u_i, β_i} = E{e_M,i(t, u_i) | u_i, β_i} = 0, where e_R,i(t_ij, u_i) = e_R,ij, e_M,i(t_ij, u_i) = e_M,ij, and hence E(e_R,ij | u_i, β_i) = E(e_M,ij | u_i, β_i) = 0. The process e_M,i(t, u_i) is similar to the "nugget" effect in spatial models. Thus, characterizing within-individual variation corresponds to characterizing autocorrelation and variance functions (conditional on u_i, β_i) describing the pattern of correlation and variation of realizations of e_R,i(t, u_i) and e_M,i(t, u_i) and how they combine to produce an overall pattern of intra-individual variation.

First consider e_R,i(t, u_i). From Figure 3, two realizations (dotted line) at times close together tend to occur "on the same side" of the "inherent" trajectory, implying that e_R,ij and e_R,ij′ for t_ij, t_ij′ "close" would tend to be positive or negative together. On the other hand, realizations far apart bear little relation to one another. Thus, realizations over time are likely to be positively correlated within an individual, with correlation "damping out" as observations are more separated in time. This suggests models such as the exponential correlation function corr{e_R,i(t, u_i), e_R,i(s, u_i) | u_i, β_i} = exp(−ρ|t − s|); other popular models are


described, for example, by Diggle et al. (2001, Ch. 5). If "fluctuations" in the realization process are believed to be of comparable magnitude over time, then var{e_R,i(t, u_i) | u_i, β_i} = σ²_R for all t. Alternatively, the variance of the realization process need not be constant over time; e.g., if "actual" responses at any time within an individual are thought to have a skewed distribution with constant CV σ_R, then var{e_R,i(t, u_i) | u_i, β_i} = σ²_R f²(t, u_i, β_i). Letting e_R,i = (e_R,i1, . . . , e_R,in_i)^T, under specific variance and autocorrelation functions, define T_i(u_i, β_i, δ) to be the (n_i × n_i) diagonal matrix with diagonal elements var(e_R,ij | u_i, β_i) depending on parameters δ, say, and Γ_i(ρ) (n_i × n_i) with (j, j′) elements corr(e_R,ij, e_R,ij′ | u_i, β_i) depending on parameters ρ; then

\mathrm{var}(e_{R,i} \mid u_i, \beta_i) = T_i^{1/2}(u_i, \beta_i, \delta)\, \Gamma_i(\rho)\, T_i^{1/2}(u_i, \beta_i, \delta) \quad (n_i \times n_i), \qquad (11)

is the covariance matrix for e_R,i. E.g., with constant CV σ_R and exponential correlation, (11) has (j, j′) element σ²_R f(t_ij, u_i, β_i) f(t_ij′, u_i, β_i) exp(−ρ|t_ij − t_ij′|), and δ = σ²_R, ρ = ρ.
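The constant-CV, exponential-correlation version of (11) is easy to assemble explicitly. A minimal sketch (Python with NumPy; times and f values are illustrative):

```python
import numpy as np

def corr_exponential(times, rho):
    """Gamma_i(rho): exponential autocorrelation exp(-rho*|t - s|)."""
    t = np.asarray(times, dtype=float)
    return np.exp(-rho * np.abs(t[:, None] - t[None, :]))

def cov_realization(times, f_vals, sigma_R, rho):
    """var(e_R,i) = T^{1/2} Gamma T^{1/2} as in (11), with constant-CV
    variance function T_jj = sigma_R^2 * f(t_ij)^2."""
    sd = sigma_R * np.asarray(f_vals, dtype=float)   # diagonal of T^{1/2}
    return sd[:, None] * corr_exponential(times, rho) * sd[None, :]
```

The resulting matrix has diagonal σ²_R f², off-diagonals that damp out with |t_ij − t_ij′|, and is symmetric positive definite, as a covariance matrix must be.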

The process e_M,i(t, u_i) characterizes error in measuring "actual" realized responses on i. Often, the error committed by a measuring device at one time is unrelated to that at another; e.g., samples are assayed separately. Thus, it is usually reasonable to assume (conditional) independence of instances of the measurement process, so that corr{e_M,i(t, u_i), e_M,i(s, u_i) | u_i, β_i} = 0 for all t > s. The form of var{e_M,i(t, u_i) | u_i, β_i} reflects the nature of error variation; e.g., var{e_M,i(t, u_i) | u_i, β_i} = σ²_M implies this is constant regardless of the magnitude of the response. Defining e_M,i = (e_M,i1, . . . , e_M,in_i)^T, the covariance matrix of e_M,i would thus ordinarily be diagonal with diagonal elements var(e_M,ij | u_i, β_i); i.e.,

\mathrm{var}(e_{M,i} \mid u_i, \beta_i) = \Lambda_i(u_i, \beta_i, \theta) \quad (n_i \times n_i), \qquad (12)

a diagonal matrix depending on a parameter θ. (12) may also depend on β_i; an example is given below. For constant measurement error variance, Λ_i(u_i, β_i, θ) = σ²_M I_{n_i} and θ = σ²_M.

A common assumption (e.g., Diggle et al. 2001, Ch. 5) is that the realization and measurement error processes in (10) are conditionally independent, which implies

\mathrm{var}(y_i \mid u_i, \beta_i) = \mathrm{var}(e_{R,i} \mid u_i, \beta_i) + \mathrm{var}(e_{M,i} \mid u_i, \beta_i). \qquad (13)

If such independence were not thought to hold, var(y_i | u_i, β_i) would also involve a conditional covariance term. Thus, combining the foregoing considerations and adopting (13) as is


customary, a general representation of the components of within-subject variation is

\mathrm{var}(y_i \mid u_i, \beta_i) = T_i^{1/2}(u_i, \beta_i, \delta)\, \Gamma_i(\rho)\, T_i^{1/2}(u_i, \beta_i, \delta) + \Lambda_i(u_i, \beta_i, \theta) = R_i(u_i, \beta_i, \xi), \qquad (14)

where ξ = (δ^T, ρ^T, θ^T)^T fully describes the overall pattern of within-individual variation.
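Combining the two pieces gives the full within-subject covariance (14). A minimal sketch (Python with NumPy), assuming the constant-CV exponential-correlation realization process together with constant measurement error variance Λ_i = σ²_M I; all numeric values are illustrative:

```python
import numpy as np

def within_subject_cov(times, f_vals, sigma_R, rho, sigma_M):
    """R_i(u_i, beta_i, xi) as in (14): serially correlated realization
    variation T^{1/2} Gamma T^{1/2} plus independent measurement error
    Lambda_i = sigma_M^2 * I."""
    t = np.asarray(times, dtype=float)
    sd = sigma_R * np.asarray(f_vals, dtype=float)       # diagonal of T^{1/2}
    gamma = np.exp(-rho * np.abs(t[:, None] - t[None, :]))
    cov_R = sd[:, None] * gamma * sd[None, :]            # realization component
    return cov_R + sigma_M**2 * np.eye(len(t))           # + measurement error
```

Measurement error inflates only the diagonal, while the serial correlation from the realization process supplies the off-diagonal structure; setting sigma_M = 0 or rho very large recovers the simplified models discussed next.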

The representation (14) provides a framework for thinking about sources that contribute to the overall pattern of within-individual variation. It is common in practice to adopt models that are simplifications of (14). For example, measurement error may be nonexistent; if response is age of a plant or tree and recorded exactly, e_M,i(t, u_i) may be eliminated from the model altogether, so that (14) only involves T_i and Γ_i. In many instances, although ideally e_R,i(t, u_i) may have non-zero autocorrelation function, the observation times t_ij are far enough apart relative to the "locality" of the process that correlation has "damped out" sufficiently between any t_ij and t_ij′ as to be virtually negligible. Thus, for the particular times involved, Γ_i(ρ) = I_{n_i} in (11) may be a reasonable approximation.

In still other contexts, measurement error may be taken to be the primary source of variation about f. Karlsson, Beal, and Sheiner (1995) note that this is a common approximation in pharmacokinetics. Although this is generally done without comment, a rationale for this approximation may be deduced from the standpoint of (14). A well-known property of assays used to quantify drug concentration is that errors in measurement are larger in magnitude the larger the (true) concentration being measured. Thus, as the “actual” concentration in a sample at time t_{ij} is f(t_{ij}, u_i, β_i) + e_{R,ij}, measurement errors e_{M,ij} at t_{ij} would be thought to vary as a function of this value. This would violate the usual assumption (13) that e_{M,ij} is independent of e_{R,ij}. However, if the magnitude of “fluctuations” represented by the e_{R,ij} is “small” relative to errors in measurement, variation in measurement errors at t_{ij} may be thought mainly to be a function of f(t_{ij}, u_i, β_i). In fact, if var(e_{R,ij} | u_i, β_i) << var(e_{M,ij} | u_i, β_i), one might regard the first term in (14) as negligible. This leads to a standard model in this application, where R_i(u_i, β_i, ξ) = Λ_i(u_i, β_i, θ), with Λ_i(u_i, β_i, θ) a diagonal matrix with diagonal elements σ² f^{2θ}(t_{ij}, u_i, β_i) for some θ = (σ², θ)^T; often θ = 1. Karlsson et al. (1995) argue that this approximation, where effectively e_{R,i}(t, u_i) is entirely disregarded, may be optimistic, as serial correlation may be apparent. To demonstrate how considering (14) may


lead to a more realistic model, suppose that, although var(e_{R,ij} | u_i, β_i) may be small relative to var(e_{M,ij} | u_i, β_i), so that Λ_i(u_i, β_i, θ) is reasonably modeled as above, e_{R,i}(t, u_i) is not disregarded. As drug concentrations are positive, it may be appropriate to assume that realized concentrations have a skewed distribution for each t and take T_i(u_i, β_i, δ) to have diagonal elements σ²_R f²(t_{ij}, u_i, β_i). If θ = 1 and Γ_i(ρ) is a correlation matrix, one is led to R_i(u_i, β_i, ξ) with diagonal elements (σ²_R + σ²_M) f²(t_{ij}, u_i, β_i) and (j, j′) off-diagonal elements σ²_R f(t_{ij}, u_i, β_i) f(t_{ij′}, u_i, β_i) Γ_{i,jj′}(ρ), similar to models studied by these authors.
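To make this concrete, the covariance model just described can be assembled numerically. The sketch below (with hypothetical parameter values and an exponential correlation function chosen purely for illustration) builds R_i from its T_i^{1/2}, Γ_i, and Λ_i components as in (14):

```python
import numpy as np

def build_R_i(f_vals, times, sigma2_R, sigma2_M, rho):
    """Assemble R_i(u_i, beta_i, xi) as in (14):
    T_i^{1/2} Gamma_i T_i^{1/2} + Lambda_i, with
    T_i diagonal with elements sigma2_R * f^2,
    Lambda_i diagonal with elements sigma2_M * f^2 (theta = 1), and
    a hypothetical exponential correlation Gamma_{i,jj'} = rho^|t_j - t_j'|."""
    f_vals = np.asarray(f_vals, dtype=float)
    times = np.asarray(times, dtype=float)
    Gamma = rho ** np.abs(np.subtract.outer(times, times))
    T_half = np.diag(np.sqrt(sigma2_R) * f_vals)
    Lam = np.diag(sigma2_M * f_vals ** 2)
    return T_half @ Gamma @ T_half + Lam

f_vals = np.array([10.0, 6.0, 3.0])   # f(t_ij, u_i, beta_i) at three times
times = np.array([1.0, 2.0, 4.0])
R = build_R_i(f_vals, times, sigma2_R=0.01, sigma2_M=0.04, rho=0.5)
# Diagonal: (sigma2_R + sigma2_M) f^2; off-diagonal (j, j'):
# sigma2_R f(t_ij) f(t_ij') Gamma_{i,jj'}, matching the text.
```
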

In summary, in specifying a model for R_i(u_i, β_i, ξ) in (14), the analyst must be guided by subject-matter and practical considerations. As discussed below, accurate characterization of R_i(u_i, β_i, ξ) may be less critical when among-individual variation is dominant.

In this formulation of R_i(u_i, β_i, ξ), the parameters ξ are common to all individuals. Just as the β_i vary across individuals, in some contexts, parameters of the processes in (10) may be thought to be individual-specific. For instance, if the same measuring device is used for all subjects, var{e_{M,i}(t, u_i) | u_i, β_i} = σ²_M for all i is plausible; however, “fluctuations” may differ in magnitude for different subjects, suggesting, e.g., var{e_{R,i}(t, u_i) | u_i, β_i} = σ²_{R,i}, and it is natural to think of σ²_{R,i} as depending on covariates and random effects, as in (4). Such specifications are possible although more difficult (see Zeng and Davidian 1997 for an example), but have not been widely used. If inter-individual variation in these parameters is not large, postulating a common parameter may be a reasonable approximation.

The foregoing discussion tacitly assumes that f(t, u_i, β_i) is a correct specification of “inherent” individual behavior, with actual realizations fluctuating about f(t, u_i, β_i). Thus, deviations arising from e_{R,i}(t, u_i) in (10) are regarded as part of within-individual variation and hence not of direct interest. An alternative view is that a component of these deviations is due to misspecification of f(t, u_i, β_i). E.g., in pharmacokinetics, compartment models used to characterize the body are admittedly simplistic, so systematic deviations from f(t, u_i, β_i) due to their failure to capture the true “inherent” trajectory are possible. If, for example, other “local” variation is negligible, the process e_{R,i}(t, u_i) in (10) in fact reflects entirely such misspecification, and the true “inherent trajectory” would be f(t, u_i, β_i) + e_{R,i}(t, u_i), so that f(t, u_i, β_i) alone is a biased representation of the true trajectory. Here, e_{R,i}(t, u_i) should be regarded as part of the “signal” rather than as “noise.” More generally, under misspecification, part of e_{R,i}(t, u_i) is due to such systematic deviations and part is due to “fluctuations,” but without knowledge of the exact nature of the misspecification, distinguishing bias from within-individual variation makes within-individual covariance modeling a nearly impossible challenge. In the particular context of nonlinear mixed effects models, there are also obvious implications for the meaning and relevance of β_i under a misspecified model. We do not pursue this further; however, it is vital to recognize that published applications of nonlinear mixed effects models are almost always predicated on correctness of f(t, u_i, β_i).

Summary. We are now in a position to summarize the basic nonlinear mixed effects model. Let f_i(u_i, β_i) = {f(x_{i1}, β_i), . . . , f(x_{in_i}, β_i)}^T, and let z_i = (u_i^T, a_i^T)^T summarize all covariate information on subject i. Then, we may write the model in (3) and (4) succinctly as

Stage 1: Individual-Level Model.

E(y_i | z_i, b_i) = f_i(u_i, β_i) = f_i(z_i, β, b_i),  var(y_i | z_i, b_i) = R_i(u_i, β_i, ξ) = R_i(z_i, β, b_i, ξ).   (15)

Stage 2: Population Model.

β_i = d(a_i, β, b_i),  b_i ∼ (0, D).   (16)

In (15), dependence of f_i and R_i on the covariates a_i and on fixed and random effects through β_i is emphasized. This model represents individual behavior conditional on β_i and hence on b_i, the random component in (16). In (16), we assume that the distribution of b_i | a_i does not depend on a_i, so that all b_i have a common distribution with mean 0 and covariance matrix D. We adopt this assumption in the sequel, as it is routine in the literature, but the methods we discuss may be extended if it is relaxed in the manner described in Section 2.2.
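As a minimal illustration of the two-stage structure (15)–(16), the following sketch simulates data under a hypothetical exponential-decay f, the simple specification β_i = β + b_i for d, and constant measurement error variance; all numerical values are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(t, beta_i):
    """Hypothetical individual-level model: exponential decay,
    beta_i = (log scale, log decay rate)."""
    return np.exp(beta_i[0]) * np.exp(-np.exp(beta_i[1]) * t)

beta = np.array([2.0, -1.0])     # fixed effects ("typical" values)
D = np.diag([0.05, 0.10])        # among-individual covariance matrix
sigma_M = 0.1                    # constant measurement error SD
t = np.array([0.5, 1.0, 2.0, 4.0, 8.0])

def simulate_subject():
    b_i = rng.multivariate_normal(np.zeros(2), D)  # Stage 2: b_i ~ (0, D)
    beta_i = beta + b_i                            # simplest d(a_i, beta, b_i)
    return f(t, beta_i) + sigma_M * rng.standard_normal(t.size)  # Stage 1

y = np.array([simulate_subject() for _ in range(200)])  # m = 200 subjects
```
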

“Within-individual correlation.” The nonlinear mixed model (15)–(16) implies a model for the marginal mean and covariance matrix of y_i given all covariates z_i; i.e., averaged across the population. Letting F_b(b_i) denote the cumulative distribution function of b_i, we have

E(y_i | z_i) = ∫ f_i(z_i, β, b_i) dF_b(b_i),  var(y_i | z_i) = E{R_i(z_i, β, b_i, ξ) | z_i} + var{f_i(z_i, β, b_i) | z_i},   (17)

where the expectation and variance are with respect to the distribution of b_i. In (17), E(y_i | z_i) characterizes the “typical” response profile among individuals with covariates z_i. In the literature, var(y_i | z_i) is often referred to as the “within-subject covariance matrix;” however, this is misleading. In particular, var(y_i | z_i) involves two terms: E{R_i(z_i, β, b_i, ξ) | z_i}, which averages realization and measurement variation that occur within individuals across individuals having covariates z_i; and var{f_i(z_i, β, b_i) | z_i}, which describes how “inherent trajectories” vary among individuals sharing the same z_i. Note that E{R_i(z_i, β, b_i, ξ) | z_i} is a diagonal matrix only if Γ_i(ρ) in (14), reflecting correlation due to within-individual realizations, is an identity matrix. However, var{f_i(z_i, β, b_i) | z_i} has non-zero off-diagonal elements in general due to the common dependence of all elements of f_i on b_i. Thus, correlation at the marginal level is always expected due to variation among individuals, while there is correlation from within-individual sources only if serial associations among intra-individual realizations are nonnegligible. In general, then, both terms contribute to the overall pattern of correlation among responses on the same individual represented in var(y_i | z_i).
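The two-term decomposition in (17) can be checked by simulation. The sketch below uses a hypothetical scalar random effect and a diagonal R_i (no serial correlation), so any off-diagonal structure in the marginal covariance must come entirely from the among-individual term:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.array([1.0, 2.0, 3.0])
D = 0.09        # variance of a hypothetical scalar random effect b_i
sigma2 = 0.04   # R_i = sigma2 * I: measurement error only, Gamma_i = I

B = rng.normal(0.0, np.sqrt(D), size=100_000)
# inherent trajectories f_i(z_i, beta, b_i), nonlinear in b_i:
F = np.exp(1.0 + B[:, None]) * np.exp(-0.3 * t[None, :])

within = sigma2 * np.eye(3)        # E{R_i(z_i, beta, b_i, xi) | z_i}: diagonal
among = np.cov(F, rowvar=False)    # var{f_i(z_i, beta, b_i) | z_i}: dense
V = within + among                 # Monte Carlo version of var(y_i | z_i), (17)
```

Here `among` has strictly positive off-diagonal entries even though the within-individual component is diagonal, which is exactly the point made in the text.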

Thus, the terms “within-individual covariance” and “within-individual correlation” are better reserved to refer to phenomena associated with the realization process e_{R,i}(t, u_i). We prefer “aggregate correlation” to denote the overall population-averaged pattern of correlation arising from both sources. It is important to recognize that within-individual variance and correlation are relevant even if the scope of inference is limited to a given individual only. As noted by Diggle et al. (2001, Ch. 5) and Verbeke and Molenberghs (2000, sec. 3.3), in many applications, the effect of within-individual serial correlation reflected in the first term of var(y_i | z_i) is dominated by that from among-individual variation in var{f_i(z_i, β, b_i) | z_i}. This explains why many published applications of nonlinear mixed models adopt simple, diagonal models for R_i(u_i, β_i, ξ) that emphasize measurement error. Davidian and Giltinan (1995, secs. 5.2.4 and 11.3) suggest that, here, how one models within-individual correlation, or, in fact, whether one improperly disregards it, may have inconsequential effects on inference. It is the responsibility of the data analyst to evaluate critically the rationale for and consequences of adopting a simplified model in a particular application.

2.3 Inferential Objectives

We now state more precisely routine objectives of analyses based on the nonlinear mixed effects model, discussed at the end of Section 2.1. Implementation is discussed in Section 3.

Understanding the “typical” values of the parameters in f, how they vary across individuals in the population, and whether some of this variation is associated with individual characteristics may be addressed through inference on the parameters β and D. The components of β describe both the “typical values” and the strength of systematic relationships between elements of β_i and individual covariates a_i. Often, the goal is to deduce an appropriate specification of d in (16); i.e., as in ordinary regression modeling, to identify a parsimonious functional form involving the elements of a_i for which there is evidence of associations. In most of the applications in Section 2.1, knowledge of which individual characteristics in a_i are “important” in this way has significant practical implications. For example, in pharmacokinetics, understanding whether and to what extent weight, smoking behavior, renal status, etc. are associated with drug clearance may dictate whether and how these factors must be considered in dosing. Thus, an analysis may involve postulating and comparing several such models to arrive at a final specification. Once a final model is selected, inference on the elements of D corresponding to the included random effects provides information on the variation among subjects not explained by the available covariates. If such variation is relatively large, it may be difficult to make statements that are generalizable even to particular subgroups with certain covariate configurations. In HIV dynamics, for example, for patients with baseline CD4 count, viral load, and prior treatment history in a specified range, if λ_2 in (2), characterizing long-term viral decay, varies considerably, the difficulty of establishing broad treatment recommendations based only on these attributes will be highlighted, indicating the need for further study of the population to identify additional, important attributes.

In many applications, an additional goal is to characterize behavior for specific individuals, so-called “individual-level prediction.” In the context of (15)–(16), this involves inference on β_i or functions such as f(t_0, u_i, β_i) at a particular time t_0. For instance, in pharmacokinetics, there is great interest in the potential for “individualized” dosing regimens based on subject i’s own pharmacokinetic processes, characterized by β_i. Simulated concentration profiles based on β_i under different regimens may inform strategies for i that maintain desired levels. Of course, given sufficient data on i, inference on β_i may in principle be implemented via standard nonlinear model-fitting techniques using i’s data only. However, sufficient data may not be available, particularly for a new patient. The nonlinear mixed model provides a framework that allows “borrowing” of information from similar subjects; see Section 3.6. Even if n_i is large enough to facilitate estimation of β_i, as i is drawn from a population of subjects, intuition suggests that it may be advantageous to exploit the fact that i may have pharmacokinetic behavior similar to that of subjects with similar covariates.

2.4 “Subject-Specific” or “Population-Averaged?”

The nonlinear mixed effects model (15)–(16) is a subject-specific (SS) model in what is now standard terminology. As discussed by Davidian and Giltinan (1995, sec. 4.4), the distinction between SS and population-averaged (PA, or marginal) models may not be important for linear mixed effects models, but it is critical under nonlinearity, as we now exhibit. A PA model assumes that interest focuses on parameters that describe, in our notation, the marginal distribution of y_i given covariates z_i. From the discussion following (17), if E(y_i | z_i) were modeled directly as a function of z_i and a parameter β, β would represent the parameter corresponding to the “typical (average) response profile” among individuals with covariates z_i. This is to be contrasted with the meaning of β in (16) as the “typical value” of the individual-specific parameters β_i in the population.

Consider first linear such models. A linear SS model with second stage β_i = A_i β + B_i b_i as in (6), for a design matrix A_i depending on a_i, and first stage E(y_i | u_i, β_i) = U_i β_i, where U_i is a design matrix depending on the t_{ij} and u_i, leads to the linear mixed effects model E(y_i | z_i, b_i) = f_i(z_i, β, b_i) = X_i β + Z_i b_i for X_i = U_i A_i and Z_i = U_i B_i, where X_i thus depends on z_i. From (17), this model implies that

E(y_i | z_i) = ∫ (X_i β + Z_i b_i) dF_b(b_i) = X_i β,

as E(b_i) = 0. Thus, β in a linear SS model fully characterizes both the “typical value” of the β_i and the “typical response profile,” so that either interpretation is valid. Here, then, postulating the linear SS model is equivalent to postulating a PA model of the form E(y_i | z_i) = X_i β directly, in that both approaches yield the same representation of the marginal mean and hence allow the same interpretation of β. Consequently, the distinction between SS and PA approaches has not generally been of concern in the literature on linear modeling.

For nonlinear models, however, this is no longer the case. For definiteness, suppose that b_i ∼ N(0, D), and consider an SS model of the form in (15) and (16) for some function f nonlinear in β_i and hence in b_i. Then, from (17), the implied marginal mean is

E(y_i | z_i) = ∫ f_i(z_i, β, b_i) p(b_i; D) db_i,   (18)

where p(b_i; D) is the N(0, D) density. For nonlinear f such as (1), this integral is clearly intractable, and E(y_i | z_i) is a complicated expression, one that is not even available in a closed form and evidently depends on both β and D in general. Consequently, if we start with a nonlinear SS model, the implied PA marginal mean model involves both the “typical value” of the β_i (β) and D. Accordingly, β does not fully characterize the “typical response profile” and thus cannot enjoy both interpretations. Conversely, if we were to take a PA approach and model the marginal mean directly as a function of z_i and a parameter β, β would indeed have the interpretation of describing the “typical response profile.” But it seems unlikely that it could also be interpreted as the “typical value” of individual-specific parameters β_i in an SS model; indeed, identifying a corresponding SS model for which the integral in (18) turns out to be exactly the same function of z_i and β as in (16) and does not depend on D seems an impossible challenge. Thus, for nonlinear models, the interpretation of β in SS and PA models cannot be the same in general; see Heagerty (1999) for related discussion. The implication is that the modeling approach must be carefully considered to ensure that the interpretation of β coincides with the questions of scientific interest.
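A quick Monte Carlo calculation illustrates the point for a deliberately simple, hypothetical scalar model f(β, b_i) = exp(β + b_i): the marginal mean depends on D and differs from the “typical individual” response f(β, 0):

```python
import numpy as np

rng = np.random.default_rng(2)

def f(beta, b):
    # hypothetical nonlinear subject-specific model, scalar for simplicity
    return np.exp(beta + b)

beta = 0.0
b_small = rng.normal(0.0, 0.5, 400_000)   # D = 0.25
b_large = rng.normal(0.0, 1.0, 400_000)   # D = 1.00

m0 = f(beta, 0.0)                  # "typical individual" response: exactly 1
m_small = f(beta, b_small).mean()  # marginal mean, approx exp(0.25/2)
m_large = f(beta, b_large).mean()  # marginal mean, approx exp(1.00/2)
# The marginal mean (18) changes with D even though beta is held fixed.
```
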

In applications like those in Section 2.1, the SS approach and its interpretation are clearly more relevant, as a model describing individual behavior like (1)–(2) is central to the scientific objectives. The PA approach of modeling E(y_i | z_i) directly, where averaging over the population has already taken place, does not facilitate incorporation of an individual-level model. Moreover, using such a model for population-level behavior is inappropriate, particularly when it is derived from theoretical considerations. E.g., representing the average of time–concentration profiles across subjects by the one-compartment model (1), although perhaps giving an acceptable empirical characterization of the average, does not enjoy a meaningful subject-matter interpretation. Even when the “typical concentration profile” E(y_i | z_i) is of interest, Sheiner (2003, personal communication) argues that adopting an SS approach and averaging the subject-level model across the population, as in (17), is preferable, as this exploits the scientific assumptions about individual processes embedded in the model.

General statistical modeling of longitudinal data is often purely empirical in that there is no “scientific” model. Rather, linear or logistic functions are used to approximate relationships between continuous or discrete responses and covariates. The need to take into account (aggregate) correlation among elements of y_i is well recognized, and both SS and PA models are used. In SS generalized linear mixed effects models, for which there is a large, parallel literature (Breslow and Clayton 1993; Diggle et al. 2001, Ch. 9), within-individual correlation is assumed negligible, and random effects represent (among-individual) correlation and generally do not correspond to “inherent” physical or mechanistic features as in “theoretical” nonlinear mixed models. In PA models, aggregate correlation is modeled directly. Here, for nonlinear such empirical models like the logistic, the above discussion implies that the choice between PA and SS approaches is also critical; the analyst must ensure that the interpretation of the parameters matches the subject-matter objectives (interest in the “typical response profile” versus the “typical” value of individual characteristics).

From a historical perspective, pharmacokineticists were among the first to develop nonlinear mixed effects modeling in detail; see Sheiner, Rosenberg, and Marathe (1977).

3. IMPLEMENTATION AND INFERENCE

A number of inferential methods for the nonlinear mixed effects model are now in common use. We provide a brief overview and refer the reader to the cited references for details.

3.1 The Likelihood

As in any statistical model, a natural starting point for inference is maximum likelihood. This is only a starting point here, because the analytical intractability of the likelihood has motivated many approaches based on approximations; see Sections 3.2 and 3.3. Likelihood is also a fundamental component of Bayesian inference, discussed in Section 3.5.

The individual model (15), along with an assumption on the distribution of y_i given (z_i, b_i), yields a conditional density p(y_i | z_i, b_i; β, ξ), say; the ubiquitous choice is the normal. Under the popular (although not always relevant) assumption that R_i(z_i, β, b_i, ξ) is diagonal, the density may be written as the product of n_i contributions p(y_{ij} | z_i, b_i; β, ξ). Under this condition, the lognormal has also been used. At Stage 2, (16), adopting independence of b_i and a_i, one assumes a k-variate density p(b_i; D) for b_i. As with other mixed models, normality is standard. With these specifications, the joint density of (y_i, b_i) given z_i is

p(y_i, b_i | z_i; β, ξ, D) = p(y_i | z_i, b_i; β, ξ) p(b_i; D).   (19)

(If the distribution of b_i given a_i is taken to depend on a_i, one would substitute for p(b_i; D) in (19) the assumed k-variate density p(b_i | a_i), which may depend on different parameters for different values of a_i.) A likelihood for (β, ξ, D) may be based on the joint density of the observed data y_1, . . . , y_m given the z_i,

∏_{i=1}^m ∫ p(y_i, b_i | z_i; β, ξ, D) db_i = ∏_{i=1}^m ∫ p(y_i | z_i, b_i; β, ξ) p(b_i; D) db_i,   (20)

by independence across i. Nonlinearity means that the m k-dimensional integrations in (20) generally cannot be done in a closed form; thus, iterative algorithms to maximize (20) in (β, ξ, D) require a way to handle these integrals. Although numerical techniques for evaluation of such integrals are available, these can be computationally expensive when performed at each internal iteration of the algorithm. Hence, many approaches to fitting (15)–(16) are instead predicated on analytical approximations.
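For a scalar random effect, one such numerical technique is Gauss–Hermite quadrature. The sketch below evaluates a single subject’s integral in (20) under normality at both stages, for a hypothetical exponential-decay f and invented parameter values:

```python
import numpy as np

def subject_likelihood(y, t, beta, sigma2, D, n_nodes=40):
    """Gauss-Hermite evaluation of one subject's integral in (20), for a
    scalar b_i ~ N(0, D) and y_ij | b_i ~ N(f(t_ij), sigma2), where f is a
    hypothetical model f = exp(beta0 + b_i) * exp(-beta1 * t)."""
    nodes, weights = np.polynomial.hermite_e.hermegauss(n_nodes)
    lik = 0.0
    for x, w in zip(nodes, weights):
        b = np.sqrt(D) * x                    # transform node to N(0, D) scale
        f_vals = np.exp(beta[0] + b) * np.exp(-beta[1] * t)
        dens = np.prod(np.exp(-(y - f_vals) ** 2 / (2.0 * sigma2))
                       / np.sqrt(2.0 * np.pi * sigma2))
        lik += w * dens
    # hermegauss weights integrate against e^{-x^2/2}, so divide by sqrt(2 pi)
    return lik / np.sqrt(2.0 * np.pi)

t = np.array([1.0, 2.0])
y = np.array([1.0, 0.7])
L = subject_likelihood(y, t, beta=np.array([0.5, 0.3]), sigma2=0.04, D=0.1)
```

The full likelihood (20) would multiply such contributions over i = 1, . . . , m at every iteration of an optimizer, which is exactly the computational burden discussed in the text.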

3.2 Methods Based on Individual Estimates

If the n_i are sufficiently large, an intuitive approach is to “summarize” the responses y_i for each i through individual-specific estimates β̂_i and then use these as the basis for inference on β and D. In particular, viewing the conditional moments in (15) as functions of β_i, i.e., E(y_{ij} | u_i, β_i) = f(x_{ij}, β_i), var(y_i | u_i, β_i) = R_i(u_i, β_i, ξ), fit the model specified by these moments for each individual. If R_i(u_i, β_i, ξ) is a matrix of the form σ² I_{n_i}, as in ordinary regression where the y_{ij} are assumed (conditionally) independent, standard nonlinear OLS may be used. More generally, methods incorporating estimation of the within-individual variance and correlation parameters ξ are needed so that intra-individual variation is taken into appropriate account, as described by Davidian and Giltinan (1993; 1995, sec. 5.2).


Usual large-sample theory implies that the individual estimators β̂_i are asymptotically normal. Each individual is treated separately, so the theory may be viewed as applying conditionally on β_i for each i, yielding β̂_i | u_i, β_i ∼ N(β_i, C_i), approximately. Because of the nonlinearity of f in β_i, C_i depends on β_i in general, so C_i is replaced in practice by Ĉ_i, in which β̂_i is substituted. To see how this is exploited for inference on β and D, consider the linear second-stage model (6); the same developments apply to any general d in (16) (Davidian and Giltinan 1995, sec. 5.3.4). The asymptotic result may be expressed alternatively as

β̂_i ≈ β_i + e*_i = A_i β + B_i b_i + e*_i,  e*_i | z_i ∼ N(0, Ĉ_i),  b_i ∼ N(0, D),   (21)

where Ĉ_i is treated as known for each i, so that the e*_i do not depend on b_i. This has the form of a linear mixed effects model with known “error” covariance matrix Ĉ_i, which suggests using standard techniques for fitting such models to estimate β and D. Steimer et al. (1984) propose use of the EM algorithm; the algorithm is given by Davidian and Giltinan (1995, sec. 5.3.2) in the case k = p and B_i = I_p. Alternatively, it is possible to use linear mixed model software such as SAS proc mixed (Littell et al. 1996) or the Splus/R function lme() (Pinheiro and Bates 2000) to fit (21). Because Ĉ_i is known and different for each i, and the software default is to assume var(e*_i | z_i) = σ²_e I_p, say, this is simplified by “preprocessing” as follows. If Ĉ_i^{-1/2} is the Cholesky decomposition of Ĉ_i^{-1}; i.e., an upper triangular matrix satisfying Ĉ_i^{-1/2 T} Ĉ_i^{-1/2} = Ĉ_i^{-1}, then Ĉ_i^{-1/2} Ĉ_i Ĉ_i^{-1/2 T} = I_p, so that Ĉ_i^{-1/2} e*_i has identity covariance matrix. Premultiplying (21) by Ĉ_i^{-1/2} yields the “new” linear mixed model

Ĉ_i^{-1/2} β̂_i ≈ (Ĉ_i^{-1/2} A_i) β + (Ĉ_i^{-1/2} B_i) b_i + ε_i,  ε_i ∼ N(0, I_p).   (22)

Fitting (22) to the “data” Ĉ_i^{-1/2} β̂_i with “design matrices” Ĉ_i^{-1/2} A_i and Ĉ_i^{-1/2} B_i and the constraint σ²_e = 1 is then straightforward. For either this or the EM algorithm, valid approximate standard errors are obtained by treating (21) as exact.
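The preprocessing step can be illustrated numerically. The sketch below (with a hypothetical Ĉ_i and β̂_i) forms the upper-triangular factor Ĉ_i^{-1/2} and verifies that it standardizes the “error” covariance to the identity, as required for (22):

```python
import numpy as np

# Hypothetical known sampling covariance C_i for one subject (p = 2)
C = np.array([[0.10, 0.03],
              [0.03, 0.20]])
beta_hat = np.array([1.5, -0.2])   # hypothetical individual estimate
A = np.eye(2)                      # hypothetical second-stage design matrix

# Upper-triangular L with L^T L = C^{-1}: np.linalg.cholesky returns a
# lower-triangular G with G G^T = C^{-1}, so take L = G^T.
G = np.linalg.cholesky(np.linalg.inv(C))
L = G.T

# Then L C L^T = I, so the transformed "error" L e_i* has identity covariance,
# and the preprocessed "data" and "design matrix" for (22) are:
y_star = L @ beta_hat
A_star = L @ A
```
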

A practical drawback of these methods is that no general-purpose software is available. Splus/R programs implementing both approaches, which require some intervention by the user, are available at http://www.stat.ncsu.edu/~st762_info/.

If the β̂_i are viewed roughly as conditional “sufficient statistics” for the β_i, this approach may be thought of as approximating (20), with a change of variables to β_i, by replacing p(y_i | z_i, b_i; β, ξ) with p(β̂_i | u_i, β_i; β, ξ). In point of fact, as the asymptotic result is not predicated on normality of y_i | u_i, β_i, and because the estimating equations for β and D solved by normal-based linear mixed model software lead to consistent estimators even if normality does not hold, this approach does not require normality to achieve valid inference. However, the n_i must be large enough for the asymptotic approximation to be justified.

3.3 Methods Based on Approximation of the Likelihood

This last point is critical when the n_i are not large. E.g., in population pharmacokinetics, sparse, haphazard drug concentrations over intervals of repeated dosing are collected on a large number of subjects along with numerous covariates a_i. Although this provides rich information for building population models d, there are insufficient data to fit the pharmacokinetic model f to any one subject, or to assess its suitability. Implementation of (20) in principle imposes no requirements on the magnitude of the n_i. Thus, an attractive tactic is instead to approximate (20) in a way that avoids intractable integration. In particular, for each i, an approximation to p(y_i | z_i; β, ξ, D) = ∫ p(y_i | z_i, b_i; β, ξ) p(b_i; D) db_i is obtained.

First order methods. An approach usually attributed to Beal and Sheiner (1982) is motivated by letting R_i^{1/2} be the Cholesky decomposition of R_i and writing (15)–(16) as

y_i = f_i(z_i, β, b_i) + R_i^{1/2}(z_i, β, b_i, ξ) ε_i,  ε_i | z_i, b_i ∼ (0, I_{n_i}).   (23)

As nonlinearity in b_i causes the difficulty for integration in (20), it is natural to consider a linear approximation. A Taylor series expansion of (23) about b_i = 0 to linear terms, disregarding the term involving b_i ε_i as “small” and letting Z_i(z_i, β, b*) = ∂/∂b_i {f_i(z_i, β, b_i)} |_{b_i = b*}, leads to

y_i ≈ f_i(z_i, β, 0) + Z_i(z_i, β, 0) b_i + R_i^{1/2}(z_i, β, 0, ξ) ε_i,   (24)

E(y_i | z_i) ≈ f_i(z_i, β, 0),  var(y_i | z_i) ≈ Z_i(z_i, β, 0) D Z_i^T(z_i, β, 0) + R_i(z_i, β, 0, ξ).   (25)

When p(y_i | z_i, b_i; β, ξ) in (20) is a normal density, (24) amounts to approximating it by another normal density whose mean and covariance matrix are, respectively, linear in and free of b_i. If p(b_i; D) is also normal, the integral is analytically calculable, analogous to a linear mixed model, and yields an n_i-variate normal density p(y_i | z_i; β, ξ, D) for each i with mean and covariance matrix (25). This suggests the proposal of Beal and Sheiner (1982) to estimate (β, ξ, D) by jointly maximizing ∏_{i=1}^m p(y_i | z_i; β, ξ, D), which is equivalent to maximum likelihood under the assumption that the marginal distribution of y_i | z_i is normal with moments (25). The advantage is that this “approximate likelihood” is available in a closed form. Standard errors are obtained from the information matrix assuming the approximation is exact.
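The approximate moments (25) are easy to compute directly for a given model. The sketch below uses a hypothetical exponential-decay f with additive random effects and invented parameter values, obtains Z_i by finite differences, and assembles the first-order marginal mean and covariance:

```python
import numpy as np

t = np.array([1.0, 2.0, 4.0])
beta = np.array([0.5, 0.3])
D = np.diag([0.10, 0.02])   # hypothetical among-individual covariance
sigma2 = 0.04               # simplest choice R_i = sigma2 * I

def f_i(b):
    # hypothetical individual-level model with additive random effects
    return np.exp(beta[0] + b[0]) * np.exp(-(beta[1] + b[1]) * t)

def Z_i(b, eps=1e-6):
    # derivative of f_i with respect to b_i, by central differences
    cols = []
    for j in range(b.size):
        e = np.zeros(b.size)
        e[j] = eps
        cols.append((f_i(b + e) - f_i(b - e)) / (2.0 * eps))
    return np.column_stack(cols)

Z = Z_i(np.zeros(2))                             # Z_i(z_i, beta, 0)
mean_fo = f_i(np.zeros(2))                       # E(y_i|z_i) approx f_i(z_i, beta, 0)
var_fo = Z @ D @ Z.T + sigma2 * np.eye(t.size)   # var(y_i|z_i) as in (25)
```

For this particular f the derivatives are ∂f/∂b_1 = f and ∂f/∂b_2 = −t f, which gives a direct check on the numerical Jacobian.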

This approach is known as the fo (first-order) method in the package nonmem (Boeckmann, Sheiner, and Beal 1992, http://www.globomax.net/products/nonmem.cfm) favored by pharmacokineticists. It is also available in SAS proc nlmixed (SAS Institute 1999) via the method=firo option. As (25) defines an approximate marginal mean and covariance matrix for y_i given z_i, an alternative approach is to estimate (β, ξ, D) by solving generalized estimating equations (GEEs) (Diggle et al. 2001, Ch. 8). Here, the mean is not of the “generalized linear model” type and the covariance matrix is not a “working” model but rather an approximation to the true structure dictated by the nonlinear mixed model; however, the GEE approach is broadly applicable to any marginal moment model. Implementation is available in the SAS nlinmix macro (Littell et al. 1996, Ch. 12) with the expand=zero option. The SAS nlinmix macro and proc nlmixed are distinct pieces of software: nlinmix implements only the first order approximate marginal moment model (25), using the GEE approach, and a related first order conditional method, discussed momentarily. In contrast, in addition to implementing (25) via the normal likelihood method above, nlmixed offers several different alternative approaches to “exact” likelihood inference, described shortly.

Vonesh and Chinchilli (1997, Chs. 8, 9) and Davidian and Giltinan (1995, sec. 6.2.4) describe the connections between GEEs and the approximate methods discussed in this section. Briefly, because the approximation to var(y_i | z_i) in (25) depends on β, maximizing the approximate normal likelihood corresponds to what has been called a “GEE2” method, which takes full account of this dependence. The ordinary GEE approach noted above is an example of “GEE1;” here, the dependence of var(y_i | z_i) on β only plays a role in the formation of “weighting matrices” for each data vector y_i.

Vonesh and Carter (1992) and Davidian and Giltinan (1995, sec. 6.2.3) discuss further related methods. An obvious drawback of all first order methods is that the approximation may be poor, as they essentially replace E(y_i | z_i) = ∫ f_i(z_i, β, b_i) p(b_i; D) db_i by f_i(z_i, β, 0). This suggests that a more refined approximation would be desirable.

First order conditional methods. As p(y

i

|z

i

, b

i

; β, ξ) and p(b

i

; D) are ordinarily normal

densities, a natural way to approximate integrals like those in (20) is to exploit Laplace’s

method, a standard technique to approximate an integral of the form

_

e

−(b)

db that follows

from a Taylor series expansion of −(b) about the value

´

b, say, maximizing (b). Wolﬁnger

and Lin (1997) provide references for this method and a sketch of steps involved in using

this approach to approximate (20) in a special case of (15)–(16); see also Wolﬁnger (1993)

and Vonesh (1996). The derivations of these authors assume that the within-individual

covariance matrix R

i

does not depend on β

i

and hence b

i

, which we write as R

i

(z

i

, β, ξ).

The result is that p(y_i | z_i; β, ξ, D) may be approximated by a normal density with

E(y_i | z_i) ≈ f_i(z_i, β, b̂_i) − Z_i(z_i, β, b̂_i) b̂_i,
var(y_i | z_i) ≈ Z_i(z_i, β, b̂_i) D Z_i^T(z_i, β, b̂_i) + R_i(z_i, β, ξ),   (26)

b̂_i = D Z_i^T(z_i, β, b̂_i) R_i^{−1}(z_i, β, ξ) {y_i − f_i(z_i, β, b̂_i)},   (27)

where Z_i is defined as before, and b̂_i minimizes in b_i

ℓ(b_i) = {y_i − f_i(z_i, β, b_i)}^T R_i^{−1}(z_i, β, ξ) {y_i − f_i(z_i, β, b_i)} + b_i^T D^{−1} b_i.

In fact, b̂_i maximizes in b_i the posterior density for b_i,

p(b_i | y_i, z_i; β, ξ, D) = p(y_i | z_i, b_i; β, ξ) p(b_i; D) / p(y_i | z_i; β, ξ, D).   (28)

Lindstrom and Bates (1990) instead derive (26) by a Taylor series expansion of (23) about b_i = b̂_i.

Equations (26)–(27) suggest an iterative scheme whose essential steps are (i) given current estimates β̂, ξ̂, D̂ and b̂_i, say, update b̂_i by substituting these in the right hand side of (27); and (ii) holding b̂_i fixed, update estimation of β, ξ, D based on the moments in (26). Software is available implementing variations on this theme. The Splus/R function nlme() (Pinheiro and Bates 2000) and the SAS macro nlinmix with the expand=eblup option carry out step (ii) by a "GEE1" method. The nonmem package with the foce option instead uses a "GEE2" approach. Additional software packages geared to pharmacokinetic analysis also implement both this and the "first order" approach; e.g., winnonmix (http://www.pharsight.com/products/winnonmix) and nlmem (Galecki 1998). In all cases, standard errors are obtained assuming the approximation is exactly correct.

In principle, the Laplace approximation is valid only if n_i is large. However, Ko and Davidian (2000) note that it should hold if the magnitude of intra-individual variation is small relative to that among individuals, which is the case in many applications, even if the n_i are small. When R_i depends on β_i, the above argument no longer holds, as noted by Vonesh (1996), but Ko and Davidian (2000) argue that it is still approximately valid for "small" intra-individual variation. Davidian and Giltinan (1995, sec. 6.3) present a two-step algorithm incorporating dependence of R_i on β_i. The software packages above all handle this more general case (e.g., Pinheiro and Bates 2000, sec. 5.2). In fact, the derivation of (26) involves an additional approximation in which a "negligible" term is ignored (e.g., Wolfinger and Lin 1997, p. 472; Pinheiro and Bates 2000, p. 317). The nonmem laplacian method includes this term and invokes Laplace's method "as-is" in the case where R_i depends on b_i.

It is well documented by numerous authors that these "first order conditional" approximations work extremely well in general, even when the n_i are not large or the assumptions of normality that dictate the form of (28), on which b̂_i is based, are violated (e.g., Hartford and Davidian 2000). These features and the availability of supported software have made this approach probably the most popular way to implement nonlinear mixed models in practice.

Remarks. The methods in this section may be implemented for any n_i. Although they involve closed-form expressions for p(y_i | z_i; β, ξ, D) and the moments (25) and (26), maximization or solution of likelihoods or estimating equations can still be computationally challenging, and selection of suitable starting values for the algorithms is essential. Results from first order methods may also be used as starting values for a more refined "conditional" fit. A common practical strategy is to first fit a simplified version of the model and use the results to suggest starting values for the intended analysis. For instance, one might take D to be a diagonal matrix, which can often speed convergence of the algorithms; this implies that the elements of β_i are uncorrelated in the population and hence that the phenomena they represent are unrelated, which is usually highly unrealistic. The analyst must bear in mind that failure to achieve convergence is in general not valid justification for adopting a model specification that is at odds with features dictated by the science.

3.4 Methods Based on the “Exact” Likelihood

The foregoing methods invoke analytical approximations to the likelihood (20) or to the first two moments of p(y_i | z_i; β, ξ, D). Alternatively, advances in computational power have made routine implementation of "exact" likelihood inference feasible for practical use, where "exact" refers to techniques in which (20) is maximized directly, using deterministic or stochastic approximation to handle the integral. In contrast to an analytical approximation as in Section 3.3, whose accuracy depends on the sample size n_i, these approaches can be made as accurate as desired at the expense of greater computational intensity.

When p(b_i; D) is a normal density, numerical approximation of the integrals in (20) may be achieved by Gauss-Hermite quadrature. This is a standard deterministic method of approximating an integral by a weighted average of the integrand evaluated at suitably chosen points over a grid, where accuracy increases with the number of grid points. As the integrals over b_i in (20) are k-dimensional, Pinheiro and Bates (1995, sec. 2.4) and Davidian and Gallant (1993) demonstrate how to transform them into a series of one-dimensional integrals, which simplifies computation. As a grid is required in each dimension, the number of evaluations grows quickly with k, increasing the computational burden of maximizing the likelihood with the integrals so represented. Pinheiro and Bates (1995; 2000, Ch. 7) propose an approach they refer to as adaptive Gaussian quadrature; here, the b_i grid is centered around the value b̂_i maximizing (28) and scaled in a way that leads to a great reduction in the number of grid points required to achieve suitable accuracy. Use of one grid point in each dimension reduces to a Laplace approximation as in Section 3.3. We refer the reader to these references for details of these methods. Gauss-Hermite and adaptive Gaussian quadrature are implemented in SAS proc nlmixed (SAS Institute 1999); the latter is the default method for approximating the integrals, and the former is obtained via method=gauss noad.
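A one-dimensional Gauss-Hermite sketch makes the grid idea concrete. The integrand below is our own choice (not an example from the paper): for b ∼ N(0, σ²) and f(b) = exp(b), the exact value of ∫ f(b) p(b) db is exp(σ²/2), so the improvement with the number of grid points is easy to verify.

```python
import numpy as np

# Gauss-Hermite approximation of E{f(b)} for b ~ N(0, sigma2), f(b) = exp(b).
# After the change of variables b = sqrt(2*sigma2)*x, the integral becomes
# (1/sqrt(pi)) * integral of exp(-x^2) f(sqrt(2*sigma2)*x) dx, which the
# "physicists'" Gauss-Hermite rule approximates as a weighted sum.
sigma2 = 1.0
exact = np.exp(sigma2 / 2)
for n_points in (1, 3, 5, 10):
    x, w = np.polynomial.hermite.hermgauss(n_points)
    approx = (w * np.exp(np.sqrt(2 * sigma2) * x)).sum() / np.sqrt(np.pi)
    print(n_points, approx)   # converges toward exact = exp(1/2)
```

The one-point rule returns f(0) = 1 (the Laplace/first order value mentioned above, since here the integrand's mode is at zero), while ten points recover the exact answer essentially to machine precision; with k random effects, a product grid would need n_points^k evaluations, which is the burden adaptive quadrature alleviates.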

Although the assumption of normal b_i is commonplace, as in most mixed effects modeling, it may be an unrealistic representation of true "unexplained" population variation. For example, the population may be more prone to individuals with "unusual" parameter values than indicated by the normal. Alternatively, the apparent distribution of the β_i, even after accounting for systematic relationships, may appear multimodal due to failure to take into account an important covariate. These considerations have led several authors (e.g., Mallet 1986; Davidian and Gallant 1993) to place no or mild assumptions on the distribution of the random effects. The latter authors assume only that the density of b_i is in a "smooth" class that includes the normal but also skewed and multimodal densities. The density is represented in (20) by a truncated series expansion, where the degree of truncation controls the flexibility of the representation, and the density is estimated along with the other model parameters by maximizing (20); see Davidian and Giltinan (1995, sec. 7.3). This approach is implemented in the Fortran program nlmix (Davidian and Gallant 1992a), which uses Gauss-Hermite quadrature to do the integrals and requires the user to write problem-specific code. Other authors impose no assumptions and work with the β_i directly, estimating their distribution nonparametrically when maximizing the likelihood. Mallet (1986) shows that the resulting estimate is discrete, so that integrations in the likelihood are straightforward. With covariates a_i, Mentré and Mallet (1994) consider nonparametric estimation of the joint distribution of (β_i, a_i); see Davidian and Giltinan (1995, sec. 7.2). Software for pharmacokinetic analysis implementing this type of approach via an EM algorithm (Schumitzky 1991) is available at http://www.usc.edu/hsc/lab_apk/software/uscpack.html. Methods similar in spirit in a Bayesian framework (Section 3.5) have been proposed by Müller and Rosner (1997).

Advantages of methods that relax distributional assumptions are potential insight into the structure of the population provided by the estimated density or distribution and more realistic inference on individuals (Section 3.6). Davidian and Gallant (1992b) demonstrate how this can be advantageous in the context of selection of covariates for inclusion in d.

Other approaches to "exact" likelihood are possible; e.g., Walker (1996) presents an EM algorithm to maximize (20), where the "E-step" is carried out using Monte Carlo integration.
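The stochastic-approximation ingredient in such schemes is simply a Monte Carlo average over draws from p(b_i; D). The sketch below is our own toy case (not taken from the paper), chosen so the integrated likelihood contribution has a closed form to check against: with y = b + e, b ∼ N(0, d), e ∼ N(0, σ²), the marginal is N(0, d + σ²).

```python
import numpy as np
from scipy.stats import norm

# Monte Carlo approximation of p(y_i) = integral of p(y_i | b) p(b; D) db,
# for a toy linear case where the answer is known exactly.
rng = np.random.default_rng(3)
d, sigma2, y_i = 0.5, 0.25, 0.7

b_draws = rng.normal(0.0, np.sqrt(d), size=200_000)            # b ~ p(b; D)
mc = norm.pdf(y_i, loc=b_draws, scale=np.sqrt(sigma2)).mean()  # avg of p(y|b)
exact = norm.pdf(y_i, scale=np.sqrt(d + sigma2))               # N(0, d+sigma2)
print(mc, exact)
```

For a genuinely nonlinear f no closed form exists, but the same average of p(y_i | z_i, b^{(s)}; β, ξ) over simulated b^{(s)} approximates the likelihood contribution, with accuracy improving as the number of draws grows.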

3.5 Methods Based on a Bayesian Formulation

The hierarchical structure of the nonlinear mixed effects model makes it a natural candidate for Bayesian inference. Historically, a key impediment to implementation of Bayesian analyses in complex statistical models was the intractability of the numerous integrations required. However, vigorous development of Markov chain Monte Carlo (MCMC) techniques to facilitate such integration in the early 1990s and new advances in computing power have made such Bayesian analysis feasible. The nonlinear mixed model served as one of the first examples of this capability (e.g., Rosner and Müller 1994; Wakefield et al. 1994). We provide only a brief review of the salient features of Bayesian inference for (15)–(16); see Davidian and Giltinan (1995, Ch. 8) for an introduction in this specific context and Carlin and Louis (2000) for comprehensive general coverage of modern Bayesian analysis.

From the Bayesian perspective, β, ξ, D and b_i, i = 1, ..., m, are all regarded as random vectors on an equal footing. Placing the model within a Bayesian framework requires specification of distributions for (15) and (16) and adoption of a third "hyperprior" stage:

Stage 3: Hyperprior. (β, ξ, D) ∼ p(β, ξ, D).   (29)

The hyperprior distribution is usually chosen to reflect weak prior knowledge, and typically p(β, ξ, D) = p(β) p(ξ) p(D). Given such a full model (15), (16), and (29), Bayesian analysis proceeds by identifying the posterior distributions induced; i.e., the marginal distributions of β, ξ, D and the b_i or β_i given the observed data, upon which inference is based. Here, because the b_i are treated as "parameters," they are ordinarily not "integrated out" as in the foregoing frequentist approaches. Rather, writing the observed data as y = (y_1^T, ..., y_m^T)^T, and defining b and z similarly, the joint posterior density of (β, ξ, D, b) is given by

p(β, ξ, D, b | y, z) = [ ∏_{i=1}^m p(y_i, b_i | z_i; β, ξ, D) ] p(β, ξ, D) / p(y | z),   (30)

where p(y_i, b_i | z_i; β, ξ, D) is given in (19), and the denominator follows from integration of the numerator with respect to β, ξ, D, b. The marginals are then obtained by integration of (30); e.g., the posterior for β is p(β | y, z), and an "estimate" of β is the mean or mode, with uncertainty measured by the spread of p(β | y, z). The integration involved is a daunting analytical task (see Davidian and Giltinan 1995, p. 220).
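A bare-bones random-walk Metropolis sketch conveys how MCMC sidesteps this integration; the model, proposal, and all settings below are our own toy construction, not an algorithm from the paper. We take y_ij = exp(β + b_i) + e_ij with known variances, b_i ∼ N(0, d), and a flat hyperprior on β, so the sampler targets the joint posterior of (β, b_1, ..., b_m) as in (30), and the sampled β coordinate is a draw from its marginal posterior.

```python
import numpy as np

# Toy random-walk Metropolis sampler (our construction) for the joint
# posterior of (beta, b_1, ..., b_m) in a simple nonlinear mixed model.
rng = np.random.default_rng(2)
m, n_i, beta_true, d, sigma2 = 10, 8, 0.5, 0.1, 0.04
b_true = rng.normal(0, np.sqrt(d), m)
y = np.exp(beta_true + b_true)[:, None] + rng.normal(0, np.sqrt(sigma2), (m, n_i))

def log_post(beta, b):
    # log joint posterior up to a constant: normal likelihood + normal prior on b
    resid = y - np.exp(beta + b)[:, None]
    return -0.5 * (resid ** 2).sum() / sigma2 - 0.5 * (b ** 2).sum() / d

beta, b = float(np.log(y.mean())), np.zeros(m)   # crude starting values
lp, draws = log_post(beta, b), []
for it in range(4000):
    prop_beta = beta + 0.02 * rng.normal()       # joint random-walk proposal
    prop_b = b + 0.02 * rng.normal(size=m)
    lp_prop = log_post(prop_beta, prop_b)
    if np.log(rng.uniform()) < lp_prop - lp:     # Metropolis accept/reject
        beta, b, lp = prop_beta, prop_b, lp_prop
    draws.append(beta)

posterior_mean = float(np.mean(draws[2000:]))    # discard burn-in
print(posterior_mean)
```

As the surrounding text warns, such a naive all-at-once proposal mixes poorly for real nonlinear models; practical implementations update blocks of parameters with tailored proposals, and ξ, D would receive their own updates rather than being fixed.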

MCMC techniques yield simulated samples from the relevant posterior distributions, from which any desired feature, such as the mode, may then be approximated. Because of the nonlinearity of f (and possibly d) in b_i, generation of such simulations is more complex than in simpler linear hierarchical models and must be tailored to the specific problem in many instances (e.g., Wakefield 1996; Carlin and Louis 2000, sec. 7.3). This complicates implementation via all-purpose software for Bayesian analysis such as WinBUGS (http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml). For pharmacokinetic analysis, where certain compartment models are standard, a WinBUGS interface, PKBugs (http://www.med.ic.ac.uk/divisions/60/pkbugs_web/home.html), is available.

It is beyond our scope to provide a full account of Bayesian inference and MCMC implementation. The work cited above and Wakefield (1996), Müller and Rosner (1997), and Rekaya et al. (2001) are only a few examples of detailed demonstrations in the context of specific applications. With weak hyperprior specifications, inferences obtained by Bayesian and frequentist methods agree in most instances; e.g., results of Davidian and Gallant (1992b) and Wakefield (1996) for a pharmacokinetic application are remarkably consistent.

A feature of the Bayesian framework that is particularly attractive when a "scientific" model is the focus is that it provides a natural mechanism for incorporating known constraints on the values of model parameters and other subject-matter knowledge through the specification of suitable proper prior distributions. Gelman et al. (1996) demonstrate this capability in the context of toxicokinetic modeling.

3.6 Individual Inference

As discussed in Section 2.3, elucidation of individual characteristics may be of interest. Whether from a frequentist or Bayesian standpoint, the nonlinear mixed model assumes that individuals are drawn from a population and thus share common features. The resulting phenomenon of "borrowing strength" across individuals to inform inference on a randomly chosen such individual is often exhibited for the normal linear mixed effects model (f linear in b_i) by showing that E(b_i | y_i, z_i) can be written as a linear combination of population- and individual-level quantities (Davidian and Giltinan 1995, sec. 3.3; Carlin and Louis 2000, sec. 3.3). The spirit of this result carries over to general models and suggests using the posterior distribution of the b_i or β_i for this purpose.

In particular, such posterior distributions are a by-product of MCMC implementation of Bayesian inference for the nonlinear mixed model, so they are immediately available. Alternatively, from a frequentist standpoint, an analogous approach is to base inference on b_i on the mode or mean of the posterior distribution (28), where now β, ξ, D are regarded as fixed. As these parameters are unknown, it is natural to substitute estimates for them in (28). This leads to what is known as empirical Bayes inference (e.g., Carlin and Louis 2000, Ch. 3). Accordingly, the b̂_i in (27) are often referred to as empirical Bayes "estimates." For a general second-stage model (16), such "estimates" for β_i are then obtained as β̂_i = d(a_i, β̂, b̂_i).

In both frequentist and Bayesian implementations, "estimates" of the b_i are often exploited in an ad hoc fashion to assist with identification of an appropriate second-stage model d. Specifically, a common tactic is to fit an initial model in which no covariates a_i are included, such as β_i = β + b_i; obtain Bayes or empirical Bayes "estimates" b̂_i; and plot the components of b̂_i against each element of a_i. Apparent systematic patterns are taken to reflect the need to include that element of a_i in d and also suggest a possible functional form for this dependence. Davidian and Gallant (1992b) and Wakefield (1996) demonstrate this approach in a specific application. Mandema, Verotta, and Sheiner (1992) use generalized additive models to aid interpretation. Of course, such graphical techniques may be supplemented by standard model selection techniques; e.g., likelihood ratio tests to distinguish among nested such models or inspection of information criteria.
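The screening tactic is easiest to see in a deliberately simple linear toy example of our own (not one of the paper's applications): the individual intercepts depend on a covariate a_i that the initial fit omits, so the empirical Bayes "estimates" b̂_i from that fit inherit a systematic trend in a_i, flagging the covariate for inclusion in d.

```python
import numpy as np

# Toy linear random-intercept model (our construction): y_ij = beta_i + e_ij,
# with the second stage beta_i = beta + gamma * a_i + u_i deliberately fitted
# WITHOUT the covariate a_i, as in the screening tactic described in the text.
rng = np.random.default_rng(4)
m, n_i, beta, gamma, var_u, sigma2 = 200, 6, 2.0, 1.5, 0.3, 0.5
a = rng.normal(size=m)
beta_i = beta + gamma * a + rng.normal(0, np.sqrt(var_u), m)
y = beta_i[:, None] + rng.normal(0, np.sqrt(sigma2), (m, n_i))

# No-covariate fit: method-of-moments plug-ins, then the usual shrinkage
# (empirical Bayes) estimates of the b_i.
ybar = y.mean(axis=1)
beta_hat = ybar.mean()
D_hat = max(ybar.var(ddof=1) - sigma2 / n_i, 1e-8)  # implied var of beta_i
shrink = D_hat / (D_hat + sigma2 / n_i)
b_hat = shrink * (ybar - beta_hat)

# The diagnostic: plot b_hat against a_i; here we just check the association.
corr = np.corrcoef(a, b_hat)[0, 1]
print(corr)
```

The strong positive correlation is the "apparent systematic pattern" the text describes; in practice one would inspect the plot itself, since its shape also suggests a functional form for the dependence of d on a_i.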

3.7 Summary

With a plethora of methods available for implementation, the analyst has a range of options for nonlinear mixed model analysis. With sparse individual data (n_i "small"), the choice is limited to the approaches in Sections 3.3, 3.4, and 3.5. First order conditional methods yield reliable inferences for both rich (n_i "large") and sparse data situations; this is in contrast to their performance when applied to generalized linear mixed models for binary data, where they can lead to unacceptable biases. Here, we can recommend them for most practical applications. "Exact" likelihood and Bayesian methods require more sophistication and commitment on the part of the user. If rich intra-individual data are available for all i, in our experience, methods based on individual estimates (Section 3.6) are attractive, both on the basis of performance and the ease with which they are explained to non-statisticians. Demidenko (1997) has shown that these methods are asymptotically equivalent to first order conditional methods when m and the n_i are large; see also Vonesh (2002).

Many of the issues that arise for linear mixed models carry over, at least approximately, to the nonlinear case. For small samples, estimation of D and ξ may be poor, leading to concern over the impact on estimation of β. Lindstrom and Bates (1990) and Pinheiro and Bates (2000, sec. 7.2.1) propose approximate "restricted maximum likelihood" estimation of these parameters in the context of first order conditional methods. Moreover, the reliability of standard errors for estimators of β may be poor in small samples, in part due to failure of the approximate formulæ to take adequate account of uncertainty in estimating D and ξ. The issue of testing whether all elements of β_i should include associated random effects, discussed in Section 2.2, involves the same considerations as those arising for inference on variance components in linear mixed models; e.g., a null hypothesis that a diagonal element of D, representing the variance of the corresponding random effect, is equal to zero is on the boundary of the allowable parameter space for variances. Under these conditions, it is well known for the linear mixed model that the usual test statistics do not have standard null sampling distributions; e.g., the approximate distribution of the likelihood ratio test statistic is a mixture of chi-squares (see, for example, Verbeke and Molenberghs 2000, sec. 6.3.4). In the nonlinear case, the same issues apply to "exact" or approximate likelihood inference (under the first order or first order conditional approaches).
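For the simplest boundary case, testing that a single random-effect variance is zero, the asymptotic null distribution of the likelihood ratio statistic is a 50:50 mixture of a point mass at zero (chi-square with 0 degrees of freedom) and a chi-square with 1 degree of freedom, so the boundary-adjusted p-value is half the naive one; the numbers below are our generic illustration, not tied to any model in the text.

```python
from scipy.stats import chi2

# Boundary-adjusted p-value for testing one variance component equal to zero:
# the LRT statistic is asymptotically 0.5*chi2(0) + 0.5*chi2(1), so
# p = 0.5 * P(chi2_1 > T) rather than the naive P(chi2_1 > T).
T = 3.2                               # hypothetical observed LRT statistic
p_naive = chi2.sf(T, df=1)            # wrong: treats the null as interior
p_mixture = 0.5 * chi2.sf(T, df=1)    # correct for the boundary null
print(p_naive, p_mixture)
```

Ignoring the boundary thus makes the test conservative (the naive p-value is twice too large); mixtures with other weights arise when several variances are tested at once, as detailed in the Verbeke and Molenberghs reference cited above.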

4. EXTENSIONS AND RECENT DEVELOPMENTS

The end of the twentieth century and beginning of the twenty-first saw an explosion of research on nonlinear mixed models. We cannot hope to do this vast literature justice, so we note only selected highlights. The cited references should be consulted for details.

New approximations and computation. Vonesh et al. (2002) propose a higher-order conditional approximation to the likelihood than that obtained by the Laplace approximation and show that this method can achieve gains in efficiency over first order conditional methods and approach the performance of "exact" maximum likelihood. These authors also establish large-m/large-n_i theoretical properties. Raudenbush et al. (2000) discuss use of a sixth-order Laplace-type approximation, and Clarkson and Zhan (2002) apply so-called spherical-radial integration methods (Monahan and Genz 1997) for deterministic and stochastic approximation of an integral based on a transformation of variables, both in the context of generalized linear mixed models. These methods could also be applied to the likelihood (20).

Multilevel models and inter-occasion variation. In some settings, individuals may be observed longitudinally over more than one "occasion." One example is in pharmacokinetics, where concentration measurements may be taken over several distinct time intervals following different doses. In each interval, covariates such as enzyme levels, weight, or measures of renal function may also be obtained and are thus "time-dependent" in the sense that their values change across intervals for each subject. If pharmacokinetic parameters such as clearance and volume of distribution are associated with such covariates, then it is natural to expect their values to change with changing covariate values. If there are q dosing intervals I_h, say, h = 1, ..., q, then this may be represented by modifying (16) to allow the individual parameters to change; i.e., write β_ij = d(a_ih, β, b_i) to denote the value of the parameters at t_ij when t_ij ∈ I_h, where a_ih gives the values of the covariates during I_h. This assumes that such "inter-occasion" variation is entirely attributable to changes in covariates.

Similarly, Karlsson and Sheiner (1993) note that pharmacokinetic behavior may vary naturally over time. Thus, even without changing covariates, if subjects are observed over several dosing intervals, parameter values may fluctuate. This is accommodated by a second-stage model with nested random effects for individual and interval-within-individual. Again letting β_ij denote the value of the parameters when t_ij ∈ I_h, modify (16) to β_ij = d(a_i, β, b_i, b_ih), where now b_i and b_ih are independent with means zero and covariance matrices D and G, say. See Pinheiro and Bates (2000, sec. 7.1.2) for a discussion of such multilevel models. Levels of nested random effects are also natural in other settings. Hall and Bailey (2001) and Hall and Clutter (2003) discuss studies in forestry where longitudinal measures of yield or growth may be measured on each tree within a plot. Similarly, Rekaya et al. (2001) consider milk yield data where each cow is observed longitudinally during its first three lactations.

Multivariate response. Often, more than one response measurement may be taken longitudinally on each individual. A key example is again from pharmacology, where both drug concentrations and measures of some physiological response are collected over time on the same individual. The goal is to develop a joint pharmacokinetic/pharmacodynamic model, where a pharmacodynamic model for the relationship between concentration and response is postulated in terms of subject-specific parameters and linked to a pharmacokinetic model for the time-concentration relationship. This yields a version of (15) in which responses of each type are "stacked" and depend on random effects corresponding to parameters in each model, which are in turn taken to be correlated in (16). Examples are given by Davidian and Giltinan (1995, sec. 9.5) and Bennett and Wakefield (2001).

Motivated by studies of timber growth and yield in forestry, where multiple such measures are collected, Hall and Clutter (2003) extend the basic model and first order conditional fitting methods to handle both multivariate response and multiple levels of nested effects.

Mismeasured/missing covariates and censored response. As in any statistical modeling context, missing, mismeasured, and censored data may arise. Wang and Davidian (1996) study the implications for inference when the observation times for each individual are recorded incorrectly. Ko and Davidian (2000) develop first order conditional methods applicable when components of a_i are measured with error. An approach to take appropriate account of censored responses due to a lower limit of detection, as in the HIV dynamics setting in Section 2.1, is proposed by Wu (2002), who also extends the model to handle mismeasured and missing covariates a_i. Wu and Wu (2002b) propose a multiple imputation approach to accommodate missing covariates a_i.

Semiparametric models. Ke and Wang (2001) propose a generalization of the nonlinear mixed effects model where the model f is allowed to depend on a completely unspecified function of time and elements of β_i. The authors suggest that the model provides flexibility for accommodating possible model misspecification and may be used as a diagnostic tool for assessing the form of time dependence in a fully parametric nonlinear mixed model. Li et al. (2002) study related methods in the context of pharmacokinetic analysis. Lindstrom (1995) develops methods for nonparametric modeling of longitudinal profiles that involve random effects and may be fitted using standard nonlinear mixed model techniques.

Other topics. Methods for model selection and determination are studied by Vonesh, Chinchilli, and Pu (1996), Dey, Chen, and Chang (1997), and Wu and Wu (2002a). Young, Zerbe, and Hay (1997) propose confidence intervals for ratios of components of the fixed effects β. Oberg and Davidian (2000) propose methods for estimating a parametric transformation of the response under which the Stage 1 conditional density is normal with constant variance. Concordet and Nunez (2000) and Chu et al. (2001) discuss interesting applications in veterinary science involving calibration and prediction problems. Methods for combining data from several studies, where each is represented by a nonlinear mixed model, are discussed by Wakefield and Rahman (2000) and Lopes, Müller, and Rosner (2003). Yeap and Davidian (2001) propose "robust" methods for accommodating "outlying" responses within individuals or "outlying" individuals. Lai and Shih (2003) develop alternative methods to those of Mentré and Mallet (1994) cited in Section 3.4 that do not require consideration of the joint distribution of β_i and a_i. For a model of the form (16), the distribution of the b_i is left completely unspecified and is estimated nonparametrically; these authors also derive large-sample properties of the approach.

Pharmaceutical applications. A topic that has generated great recent interest in the pharmaceutical industry is so-called "clinical trial simulation." Here, a hypothetical population is simulated to which the analyst may apply different drug regimens according to different designs to evaluate potential study outcomes. Nonlinear mixed models are at the heart of this enterprise; subjects are simulated from mixed models for pharmacokinetic and pharmacodynamic behavior that incorporate variation due to covariates and "unexplained" sources thought to be present. See, for example, http://www.pharsight.com/products/trial_simulator. More generally, nonlinear mixed effects modeling techniques have been advocated for population pharmacokinetic analysis in a guidance issued by the U.S. Food and Drug Administration (http://www.fda.gov/cder/guidance/1852fnl.pdf).

5. DISCUSSION

This review of nonlinear mixed effects modeling is of necessity incomplete, as it is beyond the limits of a single article to document fully the extensive literature. We have chosen to focus much of our attention on revisiting the considerations underlying the basic model from an updated standpoint, and we hope that this will offer readers familiar with the topic additional insight and provide those new to the model a foundation for appreciating its rationale and utility. We have not presented an analysis of a specific application; the references cited in Section 2.1 and Clayton et al. (2003) and Yeap et al. (2003) in this issue offer detailed demonstrations of the formulation, implementation, and interpretation of nonlinear mixed models in practice. We look forward to continuing methodological developments for, and new applications of, this rich class of models in the statistical and subject-matter literature.

ACKNOWLEDGMENTS

This research was partially supported by NIH grants R01-CA085848 and R37-AI031789.

REFERENCES

Beal, S.L., and Sheiner, L.B. (1982), “Estimating Population Pharmacokinetics,” CRC Critical

Reviews in Biomedical Engineering, 8, 195–222.

Bennett, J., and Wakeﬁeld, J. (2001), “Errors-in-Variables in Joint Population Pharmacokinet-

ic/Pharmacodynamic Modeling,” Biometrics, 57, 803–812.

Boeckmann, A.J., Sheiner, L.B., and Beal S.L. (1992), NONMEM User’s Guide, Part V, Intro-

ductory Guide, San Francisco: University of California.

Breslow, N.E., and Clayton, D.G. (1993), “Approximate Inference in Generalized Linear Mixed

Models,” Journal of the American Statistical Association, 88, 9–25.

Carlin, B.P., and Louis, T.A. (2000), Bayes and Empirical Bayes Methods for Data Analysis,

Second Edition, New York: Chapman and Hall/CRC Press.

Chu, K.K., Wang, N.Y., Stanley, S., and Cohen, N.D. (2001), “Statistical Evaluation of the Reg-

ulatory Guidelines for Use of Furosemide in Race Horses,” Biometrics, 57, 294–301.

Clarkson, D.B., and Zhan, Y.H. (2002), “Using Spherical-Radial Quadrature to Fit Generalized

Linear Mixed Eﬀects Models,” Journal of Computational and Graphical Statistics, 11, 639–659.

Clayton, C.A., Starr, T.B., Sielken, R.L., Jr., Williams, R.L., Pontal, P.G., and Tobia, A.J. (2003), "Using a Non-linear Mixed Effects Model to Characterize Cholinesterase Activity in Rats Exposed to Aldicarb," Journal of Agricultural, Biological, and Environmental Statistics, 8, ????–????.

Concordet, D., and Nunez, O.G. (2000), “Calibration for Nonlinear Mixed Eﬀects Models: An

Application to the Withdrawal Time Prediction,” Biometrics, 56, 1040–1046.

Davidian, M., and Gallant, A.R. (1992a), Nlmix: A Program for Maximum Likelihood Estimation of the Nonlinear Mixed Effects Model With a Smooth Random Effects Density, Department of Statistics, North Carolina State University.

Davidian, M., and Gallant, A.R. (1992b), “Smooth Nonparametric Maximum Likelihood Estima-

tion for Population Pharmacokinetics, With Application to Quinidine,” Journal of Pharmacoki-

netics and Biopharmaceutics, 20, 529–556.

35

Davidian, M., and Gallant, A.R. (1993), “The Nonlinear Mixed Eﬀects Model With a Smooth

Random Eﬀects Density,” Biometrika, 80, 475–488.

Davidian, M., and Giltinan, D.M. (1993), “Some simple methods for estimating intra-individual

variability in nonlinear mixed eﬀects models,” Biometrics, 49, 59–73.

Davidian, M., and Giltinan, D.M. (1995), Nonlinear Models for Repeated Measurement Data, New

York: Chapman and Hall.

Demidenko, E. (1997), "Asymptotic Properties of Nonlinear Mixed Effects Models," in Modeling Longitudinal and Spatially Correlated Data: Methods, Applications, and Future Directions, eds. T.G. Gregoire, D.R. Brillinger, P.J. Diggle, E. Russek-Cohen, W.G. Warren, and R.D. Wolfinger, New York: Springer.

Dey, D.K., Chen, M.H., and Chang, H. (1997), “Bayesian Approach for Nonlinear Random Eﬀects

Models,” Biometrics, 53, 1239–1252.

Diggle, P.J., Heagerty, P., Liang, K.-Y., and Zeger, S.L. (2001), Analysis of Longitudinal Data,

Second Edition, Oxford: Oxford University Press.

Hall, D.B., and Bailey, R.L. (2001), “Modeling and Prediction of Forest Growth Variables Based

on Multilevel Nonlinear Mixed Models,” Forest Science, 47, 311–321.

Hall, D.B., and Clutter, M. (2003), "Multivariate Multilevel Nonlinear Mixed Effects Models for Timber Yield Predictions," Biometrics, in press.

Hartford, A., and Davidian, M. (2000), “Consequences of Misspecifying Assumptions in Nonlinear

Mixed Eﬀects Models,” Computational Statistics and Data Analysis, 34, 139–164.

Heagerty, P. (1999), “Marginally Speciﬁed Logistic-Normal Models for Longitudinal Binary Data,”

Biometrics, 55, 688–698.

Fang, Z. and Bailey, R.L. (2001), “Nonlinear Mixed Eﬀects Modeling for Slash Pine Dominant

Height Growth Following Intensive Silvicultural Treatments,” Forest Science, 47, 287–300.

Galecki, A.T. (1998), “NLMEM: A New SAS/IML Macro for Hierarchical Nonlinear Models,”

Computer Methods and Programs in Biomedicine, 55, 207–216.

Gelman, A., Bois, F., and Jiang, L.M. (1996), “Physiological Pharmacokinetic Analysis Using

Population Modeling and Informative Prior Distributions,”Journal of the American Statistical

Association, 91, 1400–1412.

Gregoire, T.G., and Schabenberger, O. (1996a), “Nonlinear Mixed-Eﬀects Modeling of Cumulative

Bole Volume With Spatially-Correlated Within-Tree Data,” Journal of Agricultural, Biological,

and Environmental Statistics, 1, 107–119.

Gregoire, T.G., and Schabenberger, O. (1996b), “A Non-Linear Mixed-Eﬀects Model to Predict

Cumulative Bole Volume of Standing Trees,” Journal of Applied Statistics, 23, 257–271.

Karlsson, M.O., Beal, S.L., and Sheiner, L.B. (1995), “Three New Residual Error Models for

Population PK/PD Analyses,” Journal of Pharmacokinetics and Biopharmaceutics, 23, 651–672.

Karlsson, M.O., and Sheiner, L.B. (1993), “The Importance of Modeling Inter-Occasion Variability

in Population Pharmacokinetic Analyses,” Journal of Pharmacokinetics and Biopharmaceutics,

21, 735–750.

Ke, C. and Wang, Y. (2001), “Semiparametric Nonlinear Mixed Models and Their Applications,”

Journal of the American Statistical Association, 96, 1272–1298.

Ko, H.J., and Davidian, M. (2000), “Correcting for Measurement Error in Individual-Level Co-

variates in Nonlinear Mixed Eﬀects Models,” Biometrics, 56, 368–375.

Lai, T.L., and Shih, M.-C. (2003), “Nonparametric Estimation in Nonlinear Mixed Eﬀects Models,”

Biometrika, 90, 1–13.


Law, N.J., Taylor, J.M.G., and Sandler, H. (2002), “The Joint Modeling of a Longitudinal Disease

Progression Marker and the Failure Time Process in the Presence of a Cure,” Biostatistics, 3,

547–563.

Li, L., Brown, M.B., Lee, K.H., and Gupta, S. (2002), “Estimation and Inference for a Spline-

Enhanced Population Pharmacokinetic Model,” Biometrics, 58, 601–611.

Lindstrom, M.J. (1995), “Self-Modeling With Random Shift and Scale Parameters and a Free-Knot

Spline Shape Function,” Statistics in Medicine, 14, 2009–2021.

Lindstrom, M.J., and Bates, D.M. (1990), “Nonlinear Mixed Eﬀects Models for Repeated Measures

Data,” Biometrics, 46, 673–687.

Littell, R.C., Milliken, G.A., Stroup, W.W., and Wolﬁnger, R.D. (1996), SAS System for Mixed

Models, Cary NC: SAS Institute Inc.

Lopes, H.F., Müller, P., and Rosner, G.L. (2003), “Bayesian Meta-Analysis for Longitudinal Data

Models Using Multivariate Mixture Priors,” Biometrics, 59, 66–75.

Mallet, A. (1986), “A Maximum Likelihood Estimation Method for Random Coeﬃcient Regression

Models,” Biometrika, 73, 645–656.

Mandema, J.W., Verotta, D., and Sheiner, L.B. (1992), “Building Population Pharmacokinetic/Phar-

macodynamic Models,” Journal of Pharmacokinetics and Biopharmaceutics, 20, 511–529.

McRoberts, R.E., Brooks, R.T., and Rogers, L.L. (1998), “Using Nonlinear Mixed Eﬀects Models

to Estimate Size-Age Relationships for Black Bears,” Canadian Journal of Zoology, 76, 1098–

1106.

Mentré, F., and Mallet, A. (1994), “Handling Covariates in Population Pharmacokinetics,” Inter-

national Journal of Biomedical Computing, 36, 25–33.

Mezzetti, M., Ibrahim, J.G., Bois, F.Y., Ryan, L.M., Ngo, L., and Smith, T.J. (2003), “A Bayesian

Compartmental Model for the Evaluation of 1,3-Butadiene Metabolism,” Applied Statistics, 52,

291–305.

Mikulich, S.K., Zerbe, G.O., Jones, R.H., and Crowley, T.J. (2003), “Comparing Linear and

Nonlinear Mixed Model Approaches to Cosinor Analysis,” Statistics in Medicine, 22, 3195–3211.

Monahan, J., and Genz, A. (1997), “Spherical-Radial Integration Rules for Bayesian Computa-

tion,” Journal of the American Statistical Association, 92, 664–674.

Morrell, C.H., Pearson, J.D., Carter, H.B., and Brant, L.J. (1995), “Estimating Unknown Tran-

sition Times Using a Piecewise Nonlinear Mixed-Eﬀects Model in Men With Prostate Cancer,”

Journal of the American Statistical Association, 90, 45–53.

Müller, P., and Rosner, G.L. (1997), “A Bayesian Population Model With Hierarchical Mixture

Priors Applied to Blood Count Data,” Journal of the American Statistical Association, 92, 1279–

1292.

Notermans, D.W., Goudsmit, J., Danner, S.A., de Wolf, F., Perelson, A.S., and Mittler, J. (1998),

“Rate of HIV-1 Decline Following Antiretroviral Therapy is Related to Viral Load at Baseline and

Drug Regimen,” AIDS, 12, 1483–1490.

Oberg, A., and Davidian, M. (2000), “Estimating Data Transformations in Nonlinear Mixed Eﬀects

Models,” Biometrics, 56, 65–72.

Pauler, D. and Finkelstein, D. (2002), “Predicting Time to Prostate Cancer Recurrence Based on

Joint Models for Non-linear Longitudinal Biomarkers and Event Time,” Statistics in Medicine,

21, 3897–3911.

Pilling, G.M., Kirkwood, G.P., and Walker, S.G. (2002), “An Improved Method for Estimating

Individual Growth Variability in Fish, and the Correlation Between von Bertalanﬀy Growth

Parameters,” Canadian Journal of Fisheries and Aquatic Sciences, 59, 424–432.


Pinheiro, J.C., and Bates, D.M. (1995), “Approximations to the Log-Likelihood Function in the

Nonlinear Mixed-Eﬀects Model,” Journal of Computational and Graphical Statistics, 4, 12–35.

Pinheiro, J.C., and Bates, D.M. (2000), Mixed-Eﬀects Models in S and Splus, New York: Springer.

Raudenbush, S.W., Yang, M.L., and Yosef, M. (2000), “Maximum Likelihood for Generalized Linear Models With Nested Random Effects Via High-Order, Multivariate Laplace Approximation,” Journal of Computational and Graphical Statistics, 9, 141–157.

Rekaya, R., Weigel, K.A., and Gianola, D. (2001), “Hierarchical Nonlinear Model for Persistency of Milk Yield in the First Three Lactations of Holsteins,” Livestock Production Science, 68, 181–187.

Rodriguez-Zas, S.L., Gianola, D., and Shook, G.E. (2000), “Evaluation of Models for Somatic Cell Score Lactation Patterns in Holsteins,” Livestock Production Science, 67, 19–30.

Rosner, G.L., and Müller, P. (1994), “Pharmacokinetic/Pharmacodynamic Analysis of Hematologic Profiles,” Journal of Pharmacokinetics and Biopharmaceutics, 22, 499–524.

SAS Institute (1999), PROC NLMIXED, SAS OnlineDoc, Version 8, Cary, NC: SAS Institute Inc.

Schabenberger, O., and Pierce, F.J. (2002), Contemporary Statistical Models for the Plant and Soil Sciences, New York: CRC Press.

Schumitzky, A. (1991), “Nonparametric EM Algorithms for Estimating Prior Distributions,” Applied Mathematics and Computation, 45, 143–157.

Sheiner, L.B., and Ludden, T.M. (1992), “Population Pharmacokinetics/Pharmacodynamics,” Annual Review of Pharmacology and Toxicology, 32, 185–209.

Sheiner, L.B., Rosenberg, B., and Marathe, V.V. (1977), “Estimation of Population Characteristics of Pharmacokinetic Parameters From Routine Clinical Data,” Journal of Pharmacokinetics and Biopharmaceutics, 8, 635–651.

Steimer, J.L., Mallet, A., Golmard, J.L., and Boisvieux, J.F. (1984), “Alternative Approaches to Estimation of Population Pharmacokinetic Parameters: Comparison with the Nonlinear Mixed Effect Model,” Drug Metabolism Reviews, 15, 265–292.

Verbeke, G., and Molenberghs, G. (2000), Linear Mixed Models for Longitudinal Data, New York: Springer.

Vonesh, E.F. (1992), “Mixed-Effects Nonlinear Regression for Unbalanced Repeated Measures,”

Biometrics, 48, 1–17.

Vonesh, E.F. (1996), “A Note on the Use of Laplace’s Approximation for Nonlinear Mixed-Effects

Models,” Biometrika, 83, 447–452.

Vonesh, E.F., and Chinchilli, V.M. (1997), Linear and Nonlinear Models for the Analysis of Re-

peated Measurements, New York: Marcel Dekker.

Vonesh, E.F., Chinchilli, V.M., and Pu, K.W. (1996), “Goodness-Of-Fit in Generalized Nonlinear

Mixed-Eﬀects Models,” Biometrics, 52, 572–587.

Vonesh, E.F., Wang, H., Nie, L., and Majumdar, D. (2002), “Conditional Second-Order General-

ized Estimating Equations for Generalized Linear and Nonlinear Mixed-Effects Models,” Journal

of the American Statistical Association, 97, 271–283.

Wakeﬁeld, J. (1996), “The Bayesian Analysis of Population Pharmacokinetic Models,” Journal of

the American Statistical Association, 91, 62–75.

Wakeﬁeld, J. and Rahman, N. (2000), “The Combination of Population Pharmacokinetic Studies,”

Biometrics, 56, 263–270.

Wakeﬁeld, J.C., Smith, A.F.M., Racine-Poon, A., and Gelfand, A.E. (1994), “Bayesian Analysis

of Linear and Nonlinear Population Models by Using the Gibbs Sampler,” Applied Statistics, 43,

201–221.


Walker, S.G. (1996), “An EM Algorithm for Nonlinear Random Effects Models,” Biometrics, 52,

934–944.

Wang, N., and Davidian, M. (1996), “A Note on Covariate Measurement Error in Nonlinear Mixed

Eﬀects Models,” Biometrika, 83, 801–812.

Wolﬁnger, R. (1993), “Laplace’s Approximation for Nonlinear Mixed Models,” Biometrika, 80,

791–795.

Wolﬁnger, R.D., and Lin, X. (1997), “Two Taylor-series Approximation Methods for Nonlinear

Mixed Models,” Computational Statistics and Data Analysis, 25, 465–490.

Wu, H.L., and Ding, A.A. (1999), “Population HIV-1 Dynamics in vivo: Applicable Models and

Inferential Tools for Virological Data From AIDS Clinical Trials,” Biometrics, 55, 410–418.

Wu, H.L., and Wu, L. (2002a), “Identiﬁcation of Signiﬁcant Host Factors for HIV Dynamics

Modelled by Non-Linear Mixed-Eﬀects Models,” Statistics in Medicine, 21, 753–771.

Wu, L. (2002), “A Joint Model for Nonlinear Mixed-Eﬀects Models With Censoring and Covariates

Measured With Error, With Application to AIDS Studies,” Journal of the American Statistical

Association, 97, 955–964.

Wu, L., and Wu, H.L. (2002b), “Missing Time-Dependent Covariates in Human Immunodeﬁciency

Virus Dynamic Models,” Applied Statistics, 51, 2002.

Yeap, B.Y., Catalano, P.J., Ryan, L.M., and Davidian, M. (2003), “Robust Two-Stage Approach to

Repeated Measurements Analysis of Chronic Ozone Exposure in Rats,” Journal of Agricultural,

Biological, and Environmental Statistics, 8, ????-????.

Yeap, B.Y., and Davidian, M. (2001), “Robust Two-Stage Estimation in Hierarchical Nonlinear

Models,” Biometrics, 57, 266–272.

Young, D.A., Zerbe, G.O., and Hay, W.W. (1997), “Fieller’s Theorem, Scheffé Simultaneous

Confidence Intervals, and Ratios of Parameters of Linear and Nonlinear Mixed-Effects Models,”

Biometrics, 53, 838–847.

Zeng, Q., and Davidian, M. (1997), “Testing Homogeneity of Intra-Run Variance Parameters in

Immunoassay,” Statistics in Medicine, 16, 1765–1776.


[Figure 1 plot omitted; axes: Time (hr) vs. Theophylline Conc. (mg/L)]

Figure 1. Theophylline concentrations for 12 subjects following a single oral dose.

[Figure 2 plot omitted; axes: Days vs. log10 Plasma RNA (copies/ml)]

Figure 2. Viral load profiles for 10 subjects from the ACTG 315 study. The lower limit of detection of 100 copies/ml is denoted by the dotted line.

[Figure 3 plot omitted; axes: t vs. y]

Figure 3. Intra-individual sources of variation. The solid line is the “inherent” trajectory, the dotted line is the “realization” of the response process that actually takes place, and the solid diamonds are measurements of the “realization” at particular time points that are subject to error.

handle new substantive challenges continue to emerge. Indeed, since the publication of Davidian and Giltinan (1995) and Vonesh and Chinchilli (1997), two monographs oﬀering detailed accounts of the model and its practical application, much has taken place. The objective of this article is to provide an updated look at the nonlinear mixed effects model, summarizing from a current perspective its formulation, interpretation, and implementation and surveying new developments. In Section 2, we describe the model and situations for which it is an appropriate framework. Section 3 oﬀers an overview of popular techniques for implementation and associated software. Recent advances and extensions that build on the basic model are reviewed in Section 4. Presentation of a comprehensive bibliography is impossible, as, pleasantly, the literature has become vast. Accordingly, we note only a few early, seminal references and refer the reader to Davidian and Giltinan (1995) and Vonesh and Chinchilli (1997) for a fuller compilation prior to the mid-1990s. The remaining work cited represents what we hope is an informative sample from the extensive modern literature on the methodology and application of nonlinear mixed eﬀects models that will serve as a starting point for readers interested in deeper study of this topic.

2. NONLINEAR MIXED EFFECTS MODEL

2.1 The Setting

To exemplify circumstances for which the nonlinear mixed effects model is an appropriate

framework, we review challenges from several diverse applications.

Figure 1 goes here

Pharmacokinetics. Figure 1 shows data typical of an intensive pharmacokinetic study (Davidian and Giltinan 1995, sec. 5.5) carried out early in drug development to gain insight into within-subject pharmacokinetic processes of absorption, distribution, and elimination governing concentrations of drug achieved. Twelve subjects were given the same oral dose (mg/kg) of the anti-asthmatic agent theophylline, and blood samples drawn at several times following administration were assayed for theophylline concentration. As ordinarily observed in this context, the concentration profiles have a similar shape for all subjects; however, peak concentration achieved, rise, and decay vary substantially. These differences are believed

to be attributable to inter-subject variation in the underlying pharmacokinetic processes, understanding of which is critical for developing dosing guidelines. To characterize these processes formally, it is routine to represent the body by a simple compartment model (e.g., Sheiner and Ludden 1992). For theophylline, a one-compartment model is standard, and solution of the corresponding differential equations yields

C(t) = D ka / {V (ka − Cl/V)} [exp{−(Cl/V) t} − exp(−ka t)],   (1)

where C(t) is drug concentration at time t for a single subject following oral dose D at t = 0. Here, ka is the fractional rate of absorption describing how drug is absorbed from the gut into the bloodstream; V is roughly the volume required to account for all drug in the body, reflecting the extent of drug distribution; and Cl is the clearance rate representing the volume of blood from which drug is eliminated per unit time. Thus, the parameters (ka, V, Cl) in (1) summarize the pharmacokinetic processes for a given subject. More precisely stated, the goal is to determine, based on the observed profiles, mean or median values of (ka, V, Cl) and how they vary in the population of subjects in order to design repeated dosing regimens to maintain drug concentrations in a desired range. If inter-subject variation in (ka, V, Cl) is large, it may be difficult to design an “all-purpose” regimen; however, if some of the variation is associated with subject characteristics such as age or weight, this might be used to develop strategies tailored for certain subgroups.

Figure 2 goes here

HIV Dynamics. With the advent of assays capable of quantifying the concentration of viral particles in the blood, monitoring of such “viral load” measurements is now a routine feature of care of HIV-infected individuals. Figure 2 shows viral load-time profiles for ten participants in AIDS Clinical Trial Group (ACTG) protocol 315 (Wu and Ding 1999) following initiation of potent antiretroviral therapy. Characterizing mechanisms of virus-immune system interaction that lead to such patterns of viral decay (and eventual rebound for many subjects) enhances understanding of the progression of HIV disease. Considerable recent interest has focused on representing within-subject mechanisms by a system of differential equations whose parameters characterize rates of production, infection,

and death of immune system cells and viral production and clearance (Wu and Ding 1999, sec. 2). Under assumptions discussed by these authors, typical models for V(t), the concentration of virus particles at time t following treatment initiation, are of the form

V(t) = P1 exp(−λ1 t) + P2 exp(−λ2 t),   (2)

where (P1 , λ1 , P2 , λ2 ) describe patterns of viral production and decay. Understanding the “typical” values of these parameters, how they vary among subjects, and whether they are associated with measures of disease status at initiation of therapy such as baseline viral load (e.g., Notermans et al. 1998) may guide use of drugs in the anti-HIV arsenal. A complication is that viral loads below the lower limit of detection of the assay are not quantiﬁable and are usually coded as equal to the limit (100 copies/ml for ACTG 315). Dairy Science. Inﬂammation of the mammary gland in dairy cattle has serious economic implications for the dairy industry (Rodriguez-Zas, Gianola, and Shook 2000). Milk somatic cell score (SCS) is a measure reﬂecting udder health during lactation, and elucidating processes underlying its evolution may aid disease management. Rodriguez-Zas et al. (2000) represent time-SCS proﬁles by nonlinear models whose parameters characterize these processes. Rekaya et al. (2001) describe longitudinal patterns of milk yield by nonlinear models involving quantities such as peak yield, time-to-peak-yield, and persistency (reﬂecting capacity to maintain milk production after peak yield). Understanding how these parameters vary among cows and are related to cow-speciﬁc attributes is valuable for breeding purposes. Forestry. Forest growth and yield and the impact of silvicultural treatments such as herbicides and fertilizers are often evaluated on the basis of repeated measurements on permanent sample plots. Fang and Bailey (2001) and Hall and Bailey (2001) describe nonlinear models for these measures as a function of time that depend on meaningful parameters such as asymptotic growth or yield and rates of change. 
Objectives are to understand amongplot/tree variation in these parameters and determine whether some of this variation is associated with site preparation treatments, soil type, and tree density to monitor and predict changes in forest stands with diﬀerent attributes, and to predict growth and yield for individual plots/trees to aid forest management decisions. Similarly, Gregoire and Schaben4

and Walker 2002) and plant and soil sciences (Schabenberger and Pierce 2001).3). We brieﬂy cite a few more applications in biological. In toxicokinetics (e. see also Davidian and Giltinan (1995. describe concentrations in breath or blood of toxic substances as implicit solutions to a system of diﬀerential equations involving meaningful physiological parameters. 2003). These situations share several features: (i) repeated observations of a continuous response on each of several individuals (e. M¨ller and Rosner (1997) describe u patterns of white blood cell and granulocyte counts for cancer patients undergoing highdose chemotherapy by nonlinear models whose parameters characterize important features of the proﬁles.berger (1996ab) note that forest inventory practices rely on prediction of the volume of the bole (trunk) of a standing tree to assess the extent of merchantable product available in trees of a particular age using measurements on diameter-at-breast-height D and stem height H. and Sandler (2002). sec.. and environmental sciences. cows) over time or other con- 5 . Knowledge of the parameters aids understanding of mechanisms underlying toxic eﬀects. the latter two papers relate underlying features of the proﬁles to cancer recurrence. Morrell et al. The range of ﬁelds where nonlinear mixed models are used is vast. Based on data on D. 11. growth rate. and Rogers (1998) represent size-age relationships for black bears via nonlinear models whose parameters describe features of the growth processes. (2003) use nonlinear mixed models in analyses of data on circadian rhythms. Law. Additional applications are found in the literature in ﬁsheries science (Pilling. (1995). Gelman et al. they develop a nonlinear model to be used for prediction whose parameters governing asymptote. and shape vary across trees depending on D and H. subjects.g. Taylor. 
and repeated measurements of cumulative bole volume at three-foot intervals along the bole for several similarly-aged trees.g. agricultural. 1996. Kirkwood. knowledge of the relationship of these features to patient characteristics aids evaluation of toxicity and prediction of hematologic proﬁles for future patients. Mikulich et al. and Pauler and Finkelstein (2002) characterize longitudinal prostate-speciﬁc antigen trajectories in men by subject-speciﬁc nonlinear models. plots. Brooks. Summary.. McRoberts. complex compartment models including organs and tissue groups. Further Applications. physiologically-based pharmacokinetic models. Mezetti et al. H.

age. . ui . . .. we say more about other properties of the eij shortly. . in (1). Individual-level prediction may also be of interest. . β i ) + eij . 6 i = 1. ai ) are independent across i. . Let yij denote the jth measurement of the response. vary across individuals. j = 1.g. yij = f (xij . β. We consider a basic version of the model here and in Section 3. The intra-individual deviations eij = yij − f (xij . . . but note dependence on tij where appropriate. extensions are discussed in Section 4. (3) In (3). it is ordinarily assumed that the triplets (y i . . . e.. and possible additional conditions ui .dition (e. tij is time and ui is empty. depending on a (p × 1) vector of parameters β i speciﬁc to individual i. volume. drug concentration or milk yield. where kai . and (iii) availability of a scientiﬁcally-relevant model characterizing individual behavior in terms of meaningful parameters that vary across individuals and dictate variation in patterns of time-response. ni . and hence these phenomena.2 The Model Basic model. (ii) variability in the relationship between response and time or other condition across individuals. e. for theophylline. . m.. β i = (kai . and clearance for subject i. . Assume that there may be a vector of characteristics ai for each individual that do not change with time. β2i . β3i )T . As we now describe. the extent to which the parameters. tij is the time associated with the jth drug concentration for subject i following dose ui = Di at time zero. .g. In many applications. Cli )T = (β1i . . β i ) are assumed to satisfy E(eij |ui . . j = 1. under condition tij .g. ni .g.. reﬂecting the belief that individuals are “unrelated.” The usual nonlinear mixed eﬀects model may then be written as a two-stage hierarchy as follows: Stage 1: Individual-Level Model. β i ) = 0 for all j. (4) . ui ). and Cli are absorption rate. Vi . and we use the word “time” generically below. We write for brevity xij = (tij . bi ). 
the nonlinear mixed eﬀects model is an appropriate framework within which to formalize these objectives. Objectives are to understand the “typical” behavior of the phenomena represented by the parameters. weight. β i = d(ai . . and whether some of the variation is systematically associated with individual attributes. E. intervals along a tree bole). Letting y i = (yi1 . yini )T . Vi . For example. Stage 2: Population Model. or diameter-at-breastheight. such as (1)–(2). f is a function governing within-individual behavior. 2.

We discuss this assumption further momentarily. For example. then for a pharmacokinetic study under (1). b2i . natural. where for subject i wi is weight (kg) and ci = 0 if creatinine clearance is ≤ 50 ml/min. and thus Cli is lognormal with coeﬃcient of variation (CV) exp(D 33 ) − 1.g. with bi = (b1i .. E. Cli = exp(β4 +β5 wi +β6 ci +β7 wi ci +b3i ). then log Cli is normal with variance D 33 . b3i )T . β and a (k × 1) vector of random eﬀects bi associated with individual i. ci )T . represented by bi . (5) Model (5) enforces positivity of the pharmacokinetic parameters for each i. D is an unstructured covariance matrix that is the same for all i. Cli . For instance. if ai = (wi . Here. 7 . or ﬁxed eﬀects.where d is a p-dimensional function depending on an (r × 1) vector of ﬁxed parameters. if bi is multivariate normal. where now the covariance matrix depends on ai . Vi = exp(β2 +β3 wi +b2i ). one may assume D(ai ) = D 0 (1 − ci ) + D 1 ci . kai .g. On the other hand.e. and D characterizes the magnitude of “unexplained” variation in the elements of β i and associations among them. the assumption may be relaxed by taking bi |ai ∼ N {0. D(ai )}. The distribution of the bi conditional on ai is usually taken not to depend on ai (i. an example of (4). D). biological variation.. the bi are independent of the ai ). if bi ∼ N (0. e. and ci = 1 otherwise. (4) characterizes how elements of β i vary among individuals. ci . D). consistent with the widely-acknowledged phenomenon that these parameters have skewed population distributions. indicating impaired renal function. is kai = exp(β1 +b1i ).. Vi are each lognormally distributed in the population. with E(bi |ai ) = E(bi ) = 0 and var(bi |ai ) = var(bi ) = D. due both to systematic association with individual attributes in ai and to “unexplained” variation in the population of individuals. a standard such assumption is bi ∼ N (0. neither of which depends on wi . 
the assumption that the distribution of bi given ai does not depend on ai corresponds to the belief that variation in the parameters “unexplained” by the systematic relationships with wi and ci in (5) is the same regardless of weight or renal status. if this variation is thought to be diﬀerent. similar to standard assumptions in ordinary regression modeling. Here. Here. Moreover. so that the covariance matrix depends on ai through ci and equals D 0 in the subpopulation of individuals with renal impairment and D 1 in that of healthy subjects. if the parameters are more variable among subjects with normal renal function.

a linear speciﬁcation as in (7) may be unrealistic for applications where population distributions are thought to be skewed. b3i )T . b2i . Alternatively. i. (6) where Ai is a design matrix depending on elements of ai . 0 1 0 1 wi c i wi c i (7) (7) takes Vi to vary negligibly relative to kai and Cli . Cli = β4 + β5 wi + β6 ci + β7 wi ci + b3i with bi = (b1i . Analyses in the literature to determine “whether elements of β i are ﬁxed or random eﬀects” should be interpreted in this spirit. in (5). which attributes all variation in volumes across subjects to diﬀerences in weight. where I q is a q-dimensional identity matrix.In (5). Usually. empirical statistical linear modeling. so including “unexplained” variation in this parameter above and beyond that explained by weight.g. “unexplained” variation in one component of β i may be very small in magnitude relative to that in others. even after systematic relationships with subject characteristics are taken into account. A common special case of (4) is that of a linear relationship between β i and ﬁxed and random eﬀects as in usual. . It is common to approximate this by taking this component to have no associated random eﬀect. b3i )T and B i = I 3 . . 8 . For example. as in pharmacokinetics. e. bi = (b1i . If bi is multivariate normal.. In some settings. which may 1 0 Ai = 0 1 0 0 be represented as in (6) with β = (β1 . and B i is a design matrix typically involving only zeros and ones allowing some elements of β i to have no associated random eﬀect. each element of β i is taken to have an associated random eﬀect. consider the linear alternative to (5) given by kai = β1 + b1i . β7 )T . it is biologically implausible for there to be no “unexplained” variation in the features represented by the parameters. β i = Ai β + B i bi . and 0 0 0 0 0 1 0 wi 0 0 0 0 . if Vi = β2 + β3 wi + b2i . 
so one must recognize that such a speciﬁcation is adopted mainly to achieve parsimony and numerical stability in ﬁtting rather than to reﬂect belief in perfect biological consistency across individuals. .. . Bi = 0 0 .e. Vi = β2 + β3 wi . specify instead Vi = exp(β2 + β2 wi ). reﬂecting the belief that each component varies nonnegligibly in the population.

Ai may be chosen to reﬂect this for each component of β i in the usual way. To complete the full nonlinear mixed model. Cl∗ )T = (log ka . Cli = β4 + β5 wi + β6 ci + β7 wi ci + b3i (8) is a linear population model in the same spirit as (5). β i ) = f (tij . . drug may not be “perfectly mixed” within the body as the compartment model assumes). where y is concentration and t is time following dose D.Rather than adopt a nonlinear population model as in (5). . In addition. so that f represents what happens “on average” for subject i.. E(yij |ui . (2001. or because response may exhibit “local ﬂuctuations” (e. a speciﬁcation for variation within individuals is required. see Section 3.6. Ch. βp )T assumes E(β i ) = β for = 1.3) is depicted in Figure 3 in the context of the one-compartment model (1) for theophylline. If the data arise according to a design involving ﬁxed factors. β i ). Cli )T . Taking Ai = I p in (6) with β = (β1 . According to the individual model (3). 3. Vi∗ = β2 + β3 wi + b2i . For example.g. we discuss this feature in some detail. log Cl)T .” Thus. Within-individual variation.g. Our discussion focuses on model (3). e. A ﬁner interpretation of this may be appreciated as follows. . ui . 5) and Verbeke and Molenberghs (2000. log V. . the response proﬁle actually realized follows the dotted line. shown as the solid line in Figure 3. p. Vi∗ . sec. if (1) is represented in terms of parameters ∗ ∗ ∗ (ka . . Considerations underlying this task are often not elucidated in the applied literature. then β i = (kai . and there is some apparent confusion regarding the notion of “within-individual correlation. which may be done in the case where no individual covariates ai are available or as a part of a model-building exercise. In summary. Figure 3 goes here A conceptual perspective on within-individual variation discussed by Diggle et al. and ∗ ∗ kai = β1 + b1i . . but the same considerations are relevant for linear modeling. V ∗ . when i is observed. 
speciﬁcation of the population model (4) is made in accordance with usual considerations for regression modeling and subject-matter knowledge. Because f may not capture all within-individual physiological processes. a factorial experiment. focusing on phenomena taking place within a single individual i.. a common alternative tactic is to reparameterize the model f . . . as an assay is used to ascertain the value of the realized response at any 9 .

Thus, the measured values y_ij, represented by the solid symbols, do not exactly equal the realized values. This is formalized by writing (3) as

y_ij = f(t_ij, u_i, β_i) + e_R,ij + e_M,ij,   (9)

where e_ij has been partitioned into deviations from f(t_ij, u_i, β_i) due to the particular realization observed, e_R,ij, and possible measurement error, e_M,ij. We may think of (9) as following from a within-subject stochastic process of the form

y_i(t, u_i) = f(t, u_i, β_i) + e_R,i(t, u_i) + e_M,i(t, u_i),   (10)

with E{e_R,i(t, u_i)|u_i, β_i} = E{e_M,i(t, u_i)|u_i, β_i} = 0, where e_R,ij = e_R,i(t_ij, u_i) and e_M,ij = e_M,i(t_ij, u_i). Here, f(t, u_i, β_i) may be viewed as the "inherent tendency" for i's responses to evolve over time, and β_i is an "inherent characteristic" of i dictating this tendency. More precisely, f(t, u_i, β_i) is the average over all possible realizations of actual concentration trajectory and measurement errors that could arise if subject i is observed. The "actual" realized response at t_ij, if it could be observed without error, is thus f(t_ij, u_i, β_i) + e_R,ij, and the y_ij observed by the analyst are each the sum of one realized profile and one set of measurement errors at intermittent time points t_ij. A fundamental principle underlying nonlinear mixed effects modeling is that such "inherent" properties of individuals, rather than actual response realizations, are of central scientific interest.

First consider e_R,i(t, u_i). From Figure 3, two realizations (dotted line) at times close together tend to occur "on the same side" of the "inherent" trajectory, implying that e_R,ij and e_R,ij' for t_ij, t_ij' "close" would tend to be positive or negative together. Thus, realizations over time are likely to be positively correlated within an individual, with correlation "damping out" as observations are more separated in time; realizations far apart bear little relation to one another. This suggests models such as the exponential correlation function corr{e_R,i(t, u_i), e_R,i(s, u_i)|u_i, β_i} = exp(-ρ|t - s|); other popular models are described by, for example, Diggle et al. (2001). On the other hand, the variance of the realization process need not be constant over time. Thus, characterizing within-individual variation corresponds to characterizing autocorrelation and variance functions (conditional on u_i, β_i) describing the pattern of correlation and variation of realizations of e_R,i(t, u_i) and e_M,i(t, u_i) and how they combine to produce an overall pattern of intra-individual variation.

The process e_M,i(t, u_i), similar to the "nugget" effect in spatial models, characterizes error in measuring the "actual" realized responses on i. Often, samples are assayed separately, so the error committed by a measuring device at one time is unrelated to that at another.
It is thus usually reasonable to assume (conditional) independence of instances of the measurement process, so that

corr{e_M,i(t, u_i), e_M,i(s, u_i)|u_i, β_i} = 0 for all t > s.   (13)

The form of var{e_M,i(t, u_i)|u_i, β_i} reflects the nature of error variation; for example, var{e_M,i(t, u_i)|u_i, β_i} = σ_M^2 implies that this is constant regardless of the magnitude of the response. A common assumption (e.g., Diggle et al. 2001, Ch. 5) is that the realization and measurement error processes in (10) are conditionally independent; if such independence were not thought to hold, var(y_i|u_i, β_i) would also involve a conditional covariance term.

Under specific variance and autocorrelation functions, these considerations may be summarized in covariance matrices. Letting e_R,i = (e_R,i1, ..., e_R,in_i)^T, write

var(e_R,i|u_i, β_i) = T_i^{1/2}(u_i, β_i, δ) Γ_i(ρ) T_i^{1/2}(u_i, β_i, δ)   (n_i x n_i),   (11)

where T_i(u_i, β_i, δ) is the (n_i x n_i) diagonal matrix with diagonal elements var(e_R,ij|u_i, β_i) depending on parameters δ, and Γ_i(ρ) is (n_i x n_i) with (j, j') elements corr(e_R,ij, e_R,ij'|u_i, β_i) depending on parameters ρ. For example, if "actual" responses at any time within an individual are thought to have a skewed distribution with constant CV σ_R, then var{e_R,i(t, u_i)|u_i, β_i} = σ_R^2 f^2(t, u_i, β_i); with this assumption and exponential correlation, (11) has (j, j) elements σ_R^2 f^2(t_ij, u_i, β_i) and (j, j') off-diagonal elements σ_R^2 f(t_ij, u_i, β_i) f(t_ij', u_i, β_i) exp(-ρ|t_ij - t_ij'|), with δ = σ_R^2 and ρ = ρ. If "fluctuations" in the realization process are instead believed to be of comparable magnitude over time, then var{e_R,i(t, u_i)|u_i, β_i} = σ_R^2 for all t. Similarly, defining e_M,i = (e_M,i1, ..., e_M,in_i)^T, the covariance matrix of e_M,i would ordinarily be diagonal with diagonal elements var(e_M,ij|u_i, β_i); write

var(e_M,i|u_i, β_i) = Λ_i(u_i, β_i, θ),   (12)

a diagonal matrix depending on a parameter θ that, as (11), may also depend on β_i. For constant measurement error variance, Λ_i(u_i, β_i, θ) = σ_M^2 I_{n_i} and θ = σ_M^2; an example in which the variance is not constant is given below.

Combining the foregoing considerations and adopting (13), as is customary, a general representation of the components of within-subject variation is

var(y_i|u_i, β_i) = var(e_R,i|u_i, β_i) + var(e_M,i|u_i, β_i) = T_i^{1/2}(u_i, β_i, δ) Γ_i(ρ) T_i^{1/2}(u_i, β_i, δ) + Λ_i(u_i, β_i, θ) = R_i(u_i, β_i, ξ),   (14)

where ξ = (δ^T, ρ^T, θ^T)^T fully describes the overall pattern of within-individual variation. The representation (14) provides a framework for thinking about the sources that contribute to this pattern.

It is common in practice to adopt models that are simplifications of (14). For example, if the observation times t_ij are far enough apart relative to the "locality" of the process that correlation has "damped out" sufficiently between any t_ij and t_ij' as to be virtually negligible for the particular times involved, Γ_i(ρ) = I_{n_i} in (11) may be a reasonable approximation. Karlsson, Beal, and Sheiner (1995) note that this is a common approximation in pharmacokinetics; although it is generally adopted without comment, a rationale may be deduced from the standpoint of (14). Karlsson et al. (1995) argue that the approximation may be optimistic, however, as serial correlation may be apparent.
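As a concrete numerical sketch of how (11)-(14) fit together (the sampling times, trajectory values, and parameter values below are invented for illustration), the within-individual covariance matrix R_i can be assembled as:

```python
import numpy as np

# Invented example: n_i = 4 sampling times and the subject's "inherent"
# trajectory values f(t_ij, u_i, beta_i) at those times.
t = np.array([0.5, 1.0, 2.0, 6.0])
f_vals = np.array([8.0, 6.5, 4.2, 1.1])

sigma_R = 0.15   # constant CV of the realization process (delta)
rho = 0.8        # exponential-correlation decay rate
sigma_M = 0.3    # constant measurement-error SD (theta)

# Gamma_i(rho): (j, j') element exp(-rho |t_j - t_j'|), as in (11)
Gamma = np.exp(-rho * np.abs(t[:, None] - t[None, :]))

# T_i^{1/2}: diagonal SDs of e_R,ij under the constant-CV model
T_half = np.diag(sigma_R * f_vals)

# Lambda_i: diagonal measurement-error covariance (constant variance), as in (12)
Lam = sigma_M**2 * np.eye(len(t))

# Equation (14): R_i = T^{1/2} Gamma T^{1/2} + Lambda
R_i = T_half @ Gamma @ T_half + Lam
```

Taking rho large makes Gamma approach the identity, recovering the diagonal simplification discussed above.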
Alternatively, if the magnitude of the "fluctuations" represented by the e_R,ij is "small" relative to errors in measurement, so that var(e_R,ij|u_i, β_i) << var(e_M,ij|u_i, β_i), one might regard the first term in (14) as negligible, and e_R,i(t, u_i) may be eliminated from the model altogether; measurement error is then taken to be the primary source of variation about f, and other "local" variation is negligible. In still other contexts, measurement error may be nonexistent; for example, if the response is age of a plant or tree and recorded exactly, e_M,i(t, u_i) may be disregarded, so that (14) only involves T_i and Γ_i.

To demonstrate how considering (14) may lead to a more realistic model, suppose that, in pharmacokinetics, interest focuses on drug concentrations measured by an assay. As drug concentrations are positive, with actual realizations fluctuating about f(t, u_i, β_i), it may be appropriate to assume that realized concentrations have a skewed distribution for each t and take T_i(u_i, β_i, δ) to have diagonal elements σ_R^2 f^2(t_ij, u_i, β_i). A well-known property of assays used to quantify drug concentration is that errors in measurement are larger in magnitude the larger the (true) concentration being measured. As the "actual" concentration in a sample at time t_ij is f(t_ij, u_i, β_i) + e_R,ij, measurement errors e_M,ij at t_ij would be thought to vary as a function of this value, which would violate the usual assumption that e_M,ij is independent of e_R,ij. However, if the "fluctuations" e_R,ij are small, variation in measurement errors at t_ij may be thought mainly to be a function of f(t_ij, u_i, β_i), suggesting, for example, Λ_i(u_i, β_i, θ) with diagonal elements σ_M^2 f^2(t_ij, u_i, β_i). If, in addition, correlation is taken to be negligible, so that Γ_i(ρ) = I_{n_i}, one is led to R_i(u_i, β_i, ξ) = Λ_i(u_i, β_i, ξ) with diagonal elements (σ_R^2 + σ_M^2) f^2(t_ij, u_i, β_i). This leads to a standard model in this application, with Λ_i(u_i, β_i, θ) a diagonal matrix with diagonal elements σ^2 f^{2θ}(t_ij, u_i, β_i) for θ = (σ^2, θ)^T, often with θ = 1, similar to models studied by Karlsson et al. (1995).

Just as the β_i vary across individuals, parameters of the processes in (10) may be thought to be individual-specific; for example, "fluctuations" may differ in magnitude for different subjects, with var{e_R,i(t, u_i)|u_i, β_i} = σ_R,i^2, say, and it is natural to think of σ_R,i^2 as depending on covariates and random effects. Such specifications are possible, although more difficult (see Zeng and Davidian 1997 for an example), but have not been widely used. In some contexts a common value is plausible; for example, if the same measuring device is used for all subjects, var{e_M,i(t, u_i)|u_i, β_i} = σ_M^2 for all i is reasonable. Moreover, if inter-individual variation in these parameters is not large, postulating a common parameter may be a reasonable approximation, and accurate characterization of R_i(u_i, β_i, ξ) may be less critical when among-individual variation is dominant. In the sequel, the parameters ξ are common to all individuals. In specifying a model for R_i(u_i, β_i, ξ), the analyst must be guided by subject-matter and practical considerations.
The foregoing discussion tacitly assumes that f(t, u_i, β_i), as in (4), is a correct specification of "inherent" individual behavior. In some contexts, however, systematic deviations from f(t, u_i, β_i) due to its failure to capture the true "inherent" trajectory are possible; in pharmacokinetics, for instance, compartment models used to characterize the body are admittedly simplistic. Ordinarily, deviations arising from e_R,i(t, u_i) in (10) are regarded as part of within-individual variation and hence not of direct interest. An alternative view is that a component of these deviations is due to misspecification of f(t, u_i, β_i); in the extreme, the process e_R,i(t, u_i) in (10) in fact reflects entirely such misspecification, and the true "inherent trajectory" would be f(t, u_i, β_i) + e_R,i(t, u_i). More generally, part of e_R,i(t, u_i) is due to such systematic deviations and part is due to "fluctuations," in which case the systematic part of e_R,i(t, u_i) should be regarded as part of the "signal" rather than as "noise."

Without knowledge of the exact nature of any misspecification, however, distinguishing bias from within-individual variation makes within-individual covariance modeling a nearly impossible challenge, and under a misspecified model there are also obvious implications for the meaning and relevance of β_i. It is thus vital to recognize that published applications of nonlinear mixed effects models are almost always predicated on correctness of f(t, u_i, β_i). We do not pursue this further.

Summary. We are now in a position to summarize the basic nonlinear mixed effects model. Let f_i(u_i, β_i) = {f(x_i1, β_i), ..., f(x_in_i, β_i)}^T, and let z_i = (u_i^T, a_i^T)^T summarize all covariate information on subject i. Then we may write the model in (3) and (4) succinctly as

Stage 1: Individual-Level Model.
E(y_i|z_i, b_i) = f_i(z_i, β, b_i) = f_i(u_i, β_i),   var(y_i|z_i, b_i) = R_i(z_i, β, b_i, ξ) = R_i(u_i, β_i, ξ).   (15)

Stage 2: Population Model.
β_i = d(a_i, β, b_i),   b_i ~ (0, D).   (16)

In (15), the dependence of f_i and R_i on the covariates a_i and on the fixed and random effects through β_i is emphasized; this model represents individual behavior conditional on β_i and hence on b_i. In (16), as is routine in the literature, we assume that the distribution of b_i|a_i does not depend on a_i, so that all b_i have a common distribution with mean 0 and covariance matrix D. We adopt this assumption in the sequel, but the methods we discuss may be extended if it is relaxed in the manner described earlier.

The nonlinear mixed model (15)-(16) implies a model for the marginal mean and covariance matrix of y_i given all covariates z_i. Letting F_b(b_i) denote the cumulative distribution function of b_i, we have

E(y_i|z_i) = ∫ f_i(z_i, β, b_i) dF_b(b_i),
var(y_i|z_i) = E{R_i(z_i, β, b_i, ξ)|z_i} + var{f_i(z_i, β, b_i)|z_i},   (17)

where expectation and variance are with respect to the distribution of b_i. Here, E(y_i|z_i) characterizes the "typical" response profile among individuals with covariates z_i.

"Within-individual correlation." In the literature, var(y_i|z_i) is often referred to as the "within-subject covariance matrix"; however, this is misleading. From (17), var(y_i|z_i) involves two terms: E{R_i(z_i, β, b_i, ξ)|z_i}, which averages the realization and measurement variation that occur within individuals across individuals having covariates z_i, and var{f_i(z_i, β, b_i)|z_i}, which describes how "inherent trajectories" vary among individuals sharing the same z_i. Note that E{R_i(z_i, β, b_i, ξ)|z_i} is a diagonal matrix only if Γ_i(ρ) in (14) is, in fact, an identity matrix, while var{f_i(z_i, β, b_i)|z_i} has non-zero off-diagonal elements in general due to the common dependence of all elements of f_i on b_i. In particular, both terms contribute to the overall pattern of correlation among responses on the same individual represented in var(y_i|z_i): correlation at the marginal level is always expected due to variation among individuals, while there is correlation from within-individual sources only if serial associations among intra-individual realizations are nonnegligible. Thus, the terms "within-individual covariance" and "within-individual correlation" are better reserved to refer to phenomena associated with the realization process e_R,i(t, u_i), and we prefer "aggregate correlation" to denote the overall population-averaged pattern of correlation arising from both sources.

As noted by Diggle et al. (2001, Ch. 5), Davidian and Giltinan (1995), and Verbeke and Molenberghs (2000), in many applications the effect of within-individual serial correlation reflected in the first term of var(y_i|z_i) is dominated by that from among-individual variation in var{f_i(z_i, β, b_i)|z_i}, so that how one models within-individual correlation, or, in fact, whether one improperly disregards it, may have inconsequential effects on inference. This explains why many published applications of nonlinear mixed models adopt simple, diagonal models for R_i(u_i, β_i, ξ) that emphasize measurement error. It is the responsibility of the data analyst to evaluate critically the rationale for and consequences of adopting a simplified model in a particular application. It is important to recognize that within-individual variance and correlation are relevant even if the scope of inference is limited to a given individual only. Implementation is discussed in Section 3.

2.3 Inferential Objectives

We now state more precisely routine objectives of analyses based on the nonlinear mixed effects model.
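The two components of var(y_i|z_i) in (17) can be illustrated by simulation under an invented exponential-decay individual model (all numerical values below are assumptions for illustration, not from the paper): the among-individual term induces non-zero off-diagonal covariance even when R_i is diagonal.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.array([0.5, 1.0, 2.0])

# Invented individual model: amplitude and decay rate are lognormal around
# typical values 10 and 0.5; D holds the log-scale variances.
D = np.diag([0.2**2, 0.3**2])
b = rng.multivariate_normal(np.zeros(2), D, size=200_000)
amp = np.exp(np.log(10.0) + b[:, 0])
rate = np.exp(np.log(0.5) + b[:, 1])
F = amp[:, None] * np.exp(-rate[:, None] * t[None, :])   # inherent trajectories

sigma_M = 0.1   # constant measurement-error SD, so E{R_i} = sigma_M^2 I
marg_mean = F.mean(axis=0)                        # Monte Carlo E{f_i}
marg_cov = np.cov(F.T) + sigma_M**2 * np.eye(3)   # var{f_i} + E{R_i}, as in (17)
typical = 10.0 * np.exp(-0.5 * t)                 # trajectory at b_i = 0
```

Even with a diagonal R_i, marg_cov has positive off-diagonal elements, and marg_mean differs from the b_i = 0 trajectory, foreshadowing the nonlinear-model point developed in Section 2.4.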

Understanding the "typical" values of the parameters in f, how they vary across individuals in the population, and whether some of this variation is associated with individual characteristics may be addressed through inference on the parameters β and D. The components of β describe both the "typical values" and the strength of systematic relationships between elements of β_i and individual covariates a_i. These objectives are central in most of the applications in Section 2.1. As in ordinary regression modeling, an analysis may involve postulating and comparing several such models to arrive at a final specification; that is, the goal is to deduce an appropriate specification d in (16) and to identify a parsimonious functional form involving the elements of a_i for which there is evidence of associations. Knowledge of which individual characteristics in a_i are "important" in this way has significant practical implications. In pharmacokinetics, for example, understanding whether and to what extent weight, renal status, smoking behavior, and so on are associated with drug clearance may dictate whether and how these factors must be considered in dosing.

Once a final model is selected, inference on D corresponding to the included random effects provides information on the variation among subjects not explained by the available covariates. If such variation is relatively large, it may be difficult to make statements that are generalizable even to particular subgroups with certain covariate configurations, indicating the need for further study of the population to identify additional, important attributes. In HIV dynamics, for example, if λ_2 in (2), characterizing long-term viral decay, varies considerably for patients with baseline CD4 count, viral load, and prior treatment history in a specified range, the difficulty of establishing broad treatment recommendations based only on these attributes will be highlighted.

In many applications, an additional goal is to characterize behavior for specific individuals, characterized by β_i, so-called "individual-level prediction." In the context of (15)-(16), this involves inference on β_i or on functions such as f(t_0, u_i, β_i) at a particular time t_0. In pharmacokinetics, for instance, there is great interest in the potential for "individualized" dosing regimens based on subject i's own pharmacokinetic processes; simulated concentration profiles based on β_i under different regimens may inform strategies for i that maintain desired levels. Of course, given sufficient data on i, inference on β_i may in principle be implemented via standard nonlinear model-fitting techniques using i's data only.

However, sufficient data may not be available, particularly for a new patient. Even if n_i is large enough to facilitate estimation of β_i, intuition suggests that it may be advantageous to exploit the fact that i is drawn from a population of subjects and so may have pharmacokinetic behavior similar to that of subjects with similar covariates. The nonlinear mixed model provides a framework that allows such "borrowing" of information from similar subjects; see Section 3.6.

2.4 "Subject-Specific" or "Population-Averaged?"

The nonlinear mixed effects model (15)-(16) is a subject-specific (SS) model in what is now standard terminology. As discussed by Davidian and Giltinan (1995), the distinction between SS and population-averaged (PA, or marginal) models may not be important for linear mixed effects models, but it is critical under nonlinearity, as we now exhibit.

Consider first linear such models. A linear SS model with second stage β_i = A_i β + B_i b_i as in (6), for a design matrix A_i depending on a_i, and first stage E(y_i|u_i, β_i) = U_i β_i, where U_i is a design matrix depending on the t_ij and u_i, leads to the linear mixed effects model E(y_i|z_i, b_i) = X_i β + Z_i b_i for X_i = U_i A_i and Z_i = U_i B_i, where X_i thus depends on z_i. From (17), as E(b_i) = 0, this model implies that

E(y_i|z_i) = ∫ (X_i β + Z_i b_i) dF_b(b_i) = X_i β.

A PA model assumes that interest focuses on parameters that describe the marginal distribution of y_i given covariates z_i; see Section 3.4. Here, if E(y_i|z_i) were modeled directly as a function of z_i and a parameter β, β would represent the parameter corresponding to the "typical (average) response profile" among individuals with covariates z_i. This is to be contrasted with the meaning of β in (16) as the "typical value" of the individual-specific parameters β_i in the population. Thus, postulating the linear SS model is equivalent to postulating a PA model of the form E(y_i|z_i) = X_i β directly, in that both approaches yield the same representation of the marginal mean and hence allow the same interpretation of β: in a linear SS model, β fully characterizes both the "typical value" of β_i and the "typical response profile," so that either interpretation is valid. Consequently, the distinction between SS and PA approaches has not generally been of concern in the literature on linear modeling.
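A quick simulation illustrates the linear-model equivalence just described: averaging the subject-specific mean X_i β + Z_i b_i over b_i with E(b_i) = 0 recovers X_i β (the matrices and values below are invented):

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented design and parameters for a linear SS model E(y|z, b) = X beta + Z b
X = np.array([[1.0, 0.5],
              [1.0, 1.0],
              [1.0, 2.0]])
Z = X.copy()
beta = np.array([2.0, -0.3])
D = np.diag([0.5, 0.2])

# Average the conditional (subject-specific) mean over b ~ N(0, D)
b = rng.multivariate_normal(np.zeros(2), D, size=500_000)
cond_means = X @ beta + b @ Z.T          # one row per simulated subject
marginal_mean = cond_means.mean(axis=0)  # approaches X beta as m grows
```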

For nonlinear models, this is no longer the case. For definiteness, suppose that b_i ~ N(0, D), and consider a SS model of the form in (15) and (16) for some function f nonlinear in β_i and hence in b_i. From (17), the implied marginal mean is

E(y_i|z_i) = ∫ f_i(z_i, β, b_i) p(b_i, D) db_i,   (18)

where p(b_i, D) is the N(0, D) density. For nonlinear f such as (1), this integral is clearly intractable, and E(y_i|z_i) is a complicated expression, one that is not even available in a closed form and evidently depends on both β and D in general. Thus, if we start with a nonlinear SS model, the implied PA marginal mean model involves both the "typical value" of β_i (β) and D, so β does not fully characterize the "typical response profile" and cannot enjoy both interpretations. Conversely, if we were to take a PA approach and model the marginal mean directly as a function of z_i and a parameter β, β would indeed have the interpretation of describing the "typical response profile," but it seems unlikely that it could also have the interpretation as the "typical value" of individual-specific parameters β_i in a SS model; identifying a corresponding SS model for which the integral in (18) turns out to be exactly the same function of z_i and β as in (16) and does not depend on D seems an impossible challenge. Hence, for nonlinear models, the interpretation of β in SS and PA models cannot be the same in general, and the modeling approach must be carefully considered to ensure that the interpretation of β coincides with the questions of scientific interest.

In applications like those in Section 2.1, as a model to describe individual behavior like (1)-(2) is central to the scientific objectives, the SS approach and its interpretation are clearly more relevant. Moreover, the PA approach of modeling E(y_i|z_i) directly, where averaging over the population has already taken place, does not facilitate incorporation of an individual-level model; using such a model for population-level behavior is inappropriate, particularly when it is derived from theoretical considerations. For example, representing the average of time-concentration profiles across subjects by the one-compartment model (1), although perhaps giving an acceptable empirical characterization of the average, does not enjoy meaningful subject-matter interpretation. Even when the "typical concentration profile" E(y_i|z_i) is of interest, Sheiner (2003, personal communication) argues that adopting a SS approach and averaging the subject-level model across the population, as in (17), is preferable, as this exploits the scientific assumptions about individual processes embedded in the model; see Heagerty (1999) for related discussion.

General statistical modeling of longitudinal data is often purely empirical in that there is no "scientific" model; rather, linear or logistic functions are used to approximate relationships between continuous or discrete responses and covariates, and both SS and PA models are used. In PA models (e.g., Diggle et al. 2001), aggregate correlation is modeled directly. In SS generalized linear mixed effects models, for which there is a large, parallel literature (Breslow and Clayton 1993), within-individual correlation is assumed negligible, and random effects represent (among-individual) correlation; they generally do not correspond to "inherent" physical or mechanistic features as in "theoretical" nonlinear mixed models. For nonlinear such empirical models, like the logistic, the above discussion implies that the choice between PA and SS approaches is also critical: the analyst must ensure that the interpretation of the parameters matches the subject-matter objectives (interest in the "typical response profile" versus the "typical" value of individual characteristics).

3. IMPLEMENTATION AND INFERENCE

A number of inferential methods for the nonlinear mixed effects model are now in common use. From a historical perspective, pharmacokineticists were among the first to develop nonlinear mixed effects modeling in detail; see Sheiner, Rosenberg, and Marathe (1977). We provide a brief overview of the main approaches and refer the reader to the cited references for details.

3.1 The Likelihood

As in any statistical model, a natural starting point for inference is maximum likelihood; it is a natural starting point here in particular because the analytical intractability of likelihood inference has motivated many approaches based on approximations. Likelihood is also a fundamental component of Bayesian inference.
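The point that the marginal mean (18) depends on D as well as β can be checked numerically for a scalar random effect using Gauss-Hermite quadrature; the one-compartment-style mean and all numeric values below are invented for illustration:

```python
import numpy as np

dose_over_V, t_obs = 5.0, 2.0     # invented dose/volume and observation time
typical_rate = 0.4                # invented typical elimination rate exp(beta)

def f(b):
    # one-compartment-style mean when the log-rate is shifted by b
    return dose_over_V * np.exp(-typical_rate * np.exp(b) * t_obs)

def marginal_mean(var_b, n=40):
    # E{f(b)} for b ~ N(0, var_b) via Gauss-Hermite quadrature, i.e. (18)
    x, w = np.polynomial.hermite.hermgauss(n)
    return (w * f(np.sqrt(2.0 * var_b) * x)).sum() / np.sqrt(np.pi)

m_small = marginal_mean(0.05)     # modest among-individual variation
m_large = marginal_mean(0.50)     # substantial among-individual variation
ss_typical = f(0.0)               # the "typical subject" value at b = 0
```

The marginal mean changes with the random-effect variance and differs from f at b = 0, so a PA fit of the same functional form cannot recover the SS "typical value" interpretation.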

The individual model (15), along with an assumption on the distribution of y_i given (z_i, b_i), viewing the conditional moments in (15) as functions of β_i, yields a conditional density p(y_i|z_i, b_i, β, ξ); the ubiquitous choice is the normal. If R_i(u_i, β_i, ξ) is diagonal, the density may be written as the product of n_i contributions p(y_ij|z_i, b_i, β, ξ), as in ordinary regression where the y_ij are assumed (conditionally) independent. At Stage 2, one assumes a k-variate density p(b_i, D) for b_i, adopting independence of b_i and a_i; normality is standard, although the lognormal has also been used. With these specifications, the joint density of (y_i, b_i) given z_i is

p(y_i, b_i|z_i, β, ξ, D) = p(y_i|z_i, b_i, β, ξ) p(b_i, D).   (19)

(If the distribution of b_i given a_i is taken to depend on a_i, one would substitute for p(b_i, D) in (19) the assumed k-variate density p(b_i|a_i), which may depend on different parameters for different values of a_i.) A likelihood for β, ξ, D may be based on the joint density of the observed data y_1, ..., y_m given the z_i; thus, by independence across i,

∏_{i=1}^m p(y_i|z_i, β, ξ, D) = ∏_{i=1}^m ∫ p(y_i|z_i, b_i, β, ξ) p(b_i, D) db_i.   (20)

Nonlinearity means that the m k-dimensional integrations in (20) generally cannot be done in a closed form. Although numerical techniques for evaluation of such an integral are available, these can be computationally expensive when performed at each internal iteration of iterative algorithms to maximize (20) in β, ξ, D, which require a way to handle these integrals. Hence, as with other mixed models, many approaches to fitting (15)-(16) are instead predicated on analytical approximations.

3.2 Methods Based on Individual Estimates

If the n_i are sufficiently large, an intuitive approach is to "summarize" the responses y_i for each i through individual-specific estimates β̂_i and then use these as the basis for inference on β and D. In particular, with E(y_i|u_i, β_i) = f_i(u_i, β_i), having elements f(x_ij, β_i), and var(y_i|u_i, β_i) = R_i(u_i, β_i, ξ), one may fit the model specified by these moments for each individual, treating each individual separately. Under the popular (although not always relevant) assumption that R_i(u_i, β_i, ξ) is a matrix of the form σ^2 I_{n_i}, standard nonlinear OLS may be used. More generally, methods incorporating estimation of the within-individual variance and correlation parameters ξ are needed so that intra-individual variation is taken into appropriate account, as described by Davidian and Giltinan (1993; 1995, Ch. 5).
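For a scalar random effect, one subject's contribution to (20) can be sketched directly with Gauss-Hermite quadrature (the individual model, data, and parameter values below are invented; in practice such integrals are k-dimensional and must be evaluated at every iteration):

```python
import numpy as np

# Invented single-subject data and model for one contribution to (20):
# y_ij = exp(beta + b_i) * exp(-0.5 t_ij) + normal error, scalar b_i.
t = np.array([1.0, 2.0, 4.0])
y = np.array([3.1, 1.9, 0.8])
beta, sigma, D_var = np.log(5.0), 1.0, 0.2

def cond_density(b):
    # p(y_i | z_i, b_i, beta, xi): independent N(f, sigma^2) components
    resid = y - np.exp(beta + b) * np.exp(-0.5 * t)
    return np.exp(-0.5 * np.sum((resid / sigma) ** 2)) / (np.sqrt(2 * np.pi) * sigma) ** len(y)

def likelihood_contribution(n=50):
    # integral of p(y|b) * N(b; 0, D_var) db by Gauss-Hermite quadrature
    x, w = np.polynomial.hermite.hermgauss(n)
    vals = np.array([cond_density(np.sqrt(2 * D_var) * xj) for xj in x])
    return (w * vals).sum() / np.sqrt(np.pi)

L_i = likelihood_contribution()
```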

this approach may be thought of as approximating (20) with a change of variables to β i by replacing 21 . this is simpliﬁed by “preprocessing” i −1/2 −1 as follows.3. C i depends on β i in general. e∗ |z i ∼ N (0. so that the e∗ do not depend on bi . i i i · · (21) where C i is treated as known for each i. For either this or the EM algorithm. (1984) propose use of the EM algorithm.. so the theory may be viewed as applying conditionally on β i for each i. This has the form i of a linear mixed eﬀects model with known “error” covariance matrix C i . bi ∼ N (0. Because of the nonlinearity of f in β i . β i ∼ N (β i . sec. Because C i is known and diﬀerent for each i.ncsu. Steimer et al. If the β i are viewed roughly as conditional “suﬃcient statistics” for the β i . If C i satisfying C i is the Cholesky decomposition of C i . To see how this is exploited for inference on β and D. sec. Premultiplying (21) by C i Ci −1/2 yields the “new” linear mixed model i β i ≈ (C i Ai )β + (C i B i )bi + i . C i ). The asymptotic result may be expressed alternatively as β i ≈ β i + e∗ = Ai β + B i bi + e∗ . 5. yielding β i |ui . the same developments apply to any general d in (16) (Davidian and Giltinan 1995. Each individual is treated separately.4). i. say.2) in the case k = p and B i = I p . it is possible to use linear mixed model software such as SAS proc mixed (Littell et al. the algorithm is given by Davidian and Giltinan (1995. 1996) or the Splus/R function lme() (Pinheiro and Bates 2000) to ﬁt (21). where β i is substituted.e. then C i −1/2 −1 −1/2 C iC i −1/2 −1/2 −1/2 T = I p . an upper triangular matrix −1/2 −1/2 T Ci = C i . ∼ N (0. consider the linear second-stage model (6). Splus/R programs implementing both approaches that require some intervention by the user are available at http://www.Usual large-sample theory implies that the individual estimators β i are asymptotically normal. so C i is replaced by C i in practice. 
which suggests using standard techniques for ﬁtting such models to estimate β and D. 5. D). valid ap- proximate standard errors are obtained by treating (21) as exact.3. so that C i −1/2 e∗ has identity i covariance matrix.stat. I p ). C i ). Ai and C i −1/2 (22) B i and the Fitting (22) to the “data” C i −1/2 β i with “design matrices” C i −1/2 2 constraint σe = 1 is then straightforward. A practical drawback of these methods is that no general-purpose software is available. Alternatively.edu/˜st762 info/. and the 2 software default is to assume var(e∗ |z i ) = σe I p .
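The "preprocessing" step leading to (22) is a standard whitening transformation; a minimal sketch (with an invented C_i and individual estimate, and using the lower-triangular Cholesky factor convention) is:

```python
import numpy as np

# Invented asymptotic covariance of an individual estimate (p = 2) and the
# estimate itself; A_i = I corresponds to the simple model beta_i = beta + b_i.
C_i = np.array([[0.20, 0.05],
                [0.05, 0.10]])
beta_hat_i = np.array([1.3, -0.4])
A_i = np.eye(2)

L = np.linalg.cholesky(C_i)   # lower-triangular factor with L @ L.T = C_i
M = np.linalg.inv(L)          # "whitening" matrix: M @ C_i @ M.T = I

y_star = M @ beta_hat_i       # transformed "data" for the fit of (22)
X_star = M @ A_i              # transformed design matrix

# After the transformation the error covariance is the identity, so linear
# mixed model software can be used with the residual variance fixed at 1.
err_cov = M @ C_i @ M.T
```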

3.3 Methods Based on Approximation of the Likelihood

This last point is critical when the n_i are not large. For example, in population pharmacokinetics, sparse, haphazard drug concentrations over intervals of repeated dosing are collected on a large number of subjects, along with numerous covariates a_i. Although this provides rich information for building population models d, there are insufficient data to fit the pharmacokinetic model f to any one subject (nor to assess its suitability). Implementation of (20) in principle imposes no requirements on the magnitude of the n_i, but, because the integrations are intractable, an attractive tactic is instead to approximate (20) in a way that avoids intractable integration.

First order methods. An approach usually attributed to Beal and Sheiner (1982) is motivated by letting R_i^{1/2} be the Cholesky decomposition of R_i and writing (15)-(16) as

y_i = f_i(z_i, β, b_i) + R_i^{1/2}(z_i, β, b_i, ξ) ε_i,   ε_i|z_i, b_i ~ (0, I_{n_i}).   (23)

As nonlinearity in b_i causes the difficulty for integration in (20), it is natural to consider a linear approximation. A Taylor series of (23) about b_i = 0 to linear terms, disregarding the term involving the derivative of R_i^{1/2} with respect to b_i as "small," and letting Z_i(z_i, β, b*_i) = ∂/∂b_i {f_i(z_i, β, b_i)} evaluated at b_i = b*_i, leads to

y_i ≈ f_i(z_i, β, 0) + Z_i(z_i, β, 0) b_i + R_i^{1/2}(z_i, β, 0, ξ) ε_i,   (24)

which amounts to approximating p(y_i|z_i, b_i, β, ξ) by another normal density whose mean and covariance matrix are, respectively, linear in and free of b_i, with approximate marginal moments

E(y_i|z_i) ≈ f_i(z_i, β, 0),   var(y_i|z_i) ≈ Z_i(z_i, β, 0) D Z_i^T(z_i, β, 0) + R_i(z_i, β, 0, ξ).   (25)

When p(y_i|z_i, b_i, β, ξ) in (20) is a normal density and p(b_i, D) is also normal, the integral under the approximation (24) is analytically calculable, analogous to a linear mixed model, and yields an n_i-variate normal density p(y_i|z_i, β, ξ, D) for each i with mean and covariance matrix (25). This suggests the proposal of Beal and Sheiner (1982) to estimate β, ξ, D by jointly maximizing the product of these approximate densities over i, which is equivalent to maximum likelihood under the assumption that the marginal distribution of y_i|z_i is normal with moments (25).
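The first order approximate moments (25) can be sketched by differentiating f_i numerically at b_i = 0 (the model, times, and parameter values below are invented, and R_i is simplified to σ² I):

```python
import numpy as np

# Invented two-random-effect exponential model and parameter values; the
# within-individual covariance is simplified to sigma2 * I.
t = np.array([0.5, 1.0, 2.0])
beta = np.array([np.log(8.0), np.log(0.6)])   # typical log-amplitude, log-rate
D = np.diag([0.04, 0.09])
sigma2 = 0.05

def f_i(b):
    return np.exp(beta[0] + b[0]) * np.exp(-np.exp(beta[1] + b[1]) * t)

def Z_at_zero(h=1e-6):
    # Z_i(z_i, beta, 0): central-difference derivative of f_i in b at b = 0
    Z = np.empty((len(t), 2))
    for k in range(2):
        e = np.zeros(2)
        e[k] = h
        Z[:, k] = (f_i(e) - f_i(-e)) / (2 * h)
    return Z

Z0 = Z_at_zero()
fo_mean = f_i(np.zeros(2))                         # E(y|z) in (25)
fo_cov = Z0 @ D @ Z0.T + sigma2 * np.eye(len(t))   # var(y|z) in (25)
```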

The advantage is that this "approximate likelihood" is available in a closed form; standard errors are obtained from the information matrix, assuming the approximation is exact. This approach is known as the fo (first-order) method in the package nonmem (Boeckmann, Sheiner, and Beal 1992; http://www.globomax.net/products/nonmem.cfm) favored by pharmacokineticists. It is also available in SAS proc nlmixed (SAS Institute 1999) via the method=firo option; in addition to implementing (25) via the normal likelihood method above, nlmixed offers several different alternative approaches to "exact" likelihood inference, discussed momentarily.

As (25) defines an approximate marginal mean and covariance matrix for y_i given z_i, an alternative approach is to estimate β, ξ, D by solving generalized estimating equations (GEEs) (Diggle et al. 2001). Briefly, the GEE approach is broadly applicable to any marginal moment model; here, the mean is not of the "generalized linear model" type, and the covariance matrix is not a "working" model but rather an approximation to the true structure dictated by the nonlinear mixed model. The ordinary GEE approach is an example of "GEE1": because the approximation to var(y_i|z_i) in (25) depends on β, this dependence plays a role only in the formation of "weighting matrices" for each data vector y_i. In contrast, maximizing the approximate normal likelihood corresponds to what has been called a "GEE2" method, which takes full account of this dependence. Vonesh and Carter (1992), Davidian and Giltinan (1995), and Vonesh and Chinchilli (1997) describe the connections between GEEs and the approximate methods discussed in this section.

Implementation is available in the SAS nlinmix macro (Littell et al. 1996, Ch. 12) with the expand=zero option. The SAS nlinmix macro and proc nlmixed are distinct pieces of software: nlinmix implements only the first order approximate marginal moment model (25), using the GEE approach, and a related first order conditional method, described shortly. An obvious drawback of all first order methods is that the approximation may be poor: they essentially replace E(y_i|z_i) = ∫ f_i(z_i, β, b_i) dF_b(b_i) by f_i(z_i, β, 0), and for f nonlinear in b_i these are not the same.

First order conditional methods. In principle, a natural way to approximate integrals like those in (20) is to exploit Laplace's method, a standard technique for approximating an integral of the form ∫ e^{−ℓ(b)} db that follows from a Taylor series expansion of −ℓ(b) about the value b̂, say, maximizing −ℓ(b). Wolfinger and Lin (1997) provide references for this method and a sketch of the steps involved in using it to approximate (20) in a special case of (15)–(16); see also Wolfinger (1993) and Vonesh (1996). The result is that p(y_i|z_i, β, ξ, D) may be approximated by a normal density with

E(y_i|z_i) ≈ f_i(z_i, β, b̂_i) − Z_i(z_i, β, b̂_i) b̂_i,  var(y_i|z_i) ≈ Z_i(z_i, β, b̂_i) D Z_i^T(z_i, β, b̂_i) + R_i(z_i, β, b̂_i, ξ),  (26)

where Z_i is defined as before and b̂_i minimizes in b_i

ℓ(b_i) = {y_i − f_i(z_i, β, b_i)}^T R_i^{−1}(z_i, β, b_i, ξ) {y_i − f_i(z_i, β, b_i)} + b_i^T D^{−1} b_i,

leading to

b̂_i = D Z_i^T(z_i, β, b̂_i) R_i^{−1}(z_i, β, b̂_i, ξ) {y_i − f_i(z_i, β, b̂_i)}.  (27)

In fact, b̂_i maximizes in b_i the posterior density for b_i,

p(b_i|y_i, z_i, β, ξ, D) = p(y_i|z_i, β, b_i, ξ) p(b_i; D) / p(y_i|z_i, β, ξ, D),  (28)

as p(y_i|z_i, β, b_i, ξ) and p(b_i; D) are ordinarily normal densities. Equations (26)–(27) suggest an iterative scheme whose essential steps are (i) given current estimates of β, ξ, D, update the b̂_i by substituting these in the right hand side of (27), and (ii) holding the b̂_i fixed, update estimation of β, ξ, D based on the moments in (26). The Splus/R function nlme() (Pinheiro and Bates 2000) and the SAS macro nlinmix with the expand=eblup option carry out step (ii) by a “GEE1” method; the nonmem package with the foce option instead uses a “GEE2” approach. Additional software packages geared to pharmacokinetic analysis, e.g., winnonmix (http://www.pharsight.com/products/winnonmix) and nlmem (Galecki 1998), also implement both this and the “first order” approach. Lindstrom and Bates (1990) instead derive (26) by a Taylor series of (23) about b_i = b̂_i. The derivations of these authors assume that the within-individual covariance matrix R_i does not depend on β_i and hence b_i, which we write as R_i(z_i, ξ); the software packages above all handle the more general case.
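To make the Laplace device underlying these conditional methods concrete, the following is a minimal, hypothetical sketch for a scalar random effect: it locates b̂ minimizing a supplied ℓ(b) by Newton's method with finite-difference derivatives and then applies ∫ e^{−ℓ(b)} db ≈ sqrt(2π/ℓ''(b̂)) e^{−ℓ(b̂)}. The function `laplace_integral` and the test integrand are illustrative only, not part of any package discussed in the text.

```python
import math

def laplace_integral(neg_log_f, b0=0.0, h=1e-5, iters=50):
    """Laplace approximation to the integral of exp(-l(b)) db for scalar b.

    Finds b-hat minimizing l by Newton's method with finite-difference
    first and second derivatives, then applies the standard formula
    sqrt(2*pi / l''(b-hat)) * exp(-l(b-hat)).
    """
    b = b0
    for _ in range(iters):
        d1 = (neg_log_f(b + h) - neg_log_f(b - h)) / (2 * h)
        d2 = (neg_log_f(b + h) - 2 * neg_log_f(b) + neg_log_f(b - h)) / h**2
        step = d1 / d2
        b -= step
        if abs(step) < 1e-10:
            break
    d2 = (neg_log_f(b + h) - 2 * neg_log_f(b) + neg_log_f(b - h)) / h**2
    return math.sqrt(2 * math.pi / d2) * math.exp(-neg_log_f(b))

# For a quadratic l (a Gaussian integrand) the approximation is exact:
# integral of exp(-(b-1)^2 / (2*0.25)) db equals sqrt(2*pi*0.25).
val = laplace_integral(lambda b: (b - 1.0) ** 2 / (2 * 0.25))
print(val, math.sqrt(2 * math.pi * 0.25))
```

For a quadratic ℓ the method is exact, which mirrors the point in the text that one quadrature point per dimension in adaptive Gaussian quadrature reduces to a Laplace approximation.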

Results from first order methods may also be used as starting values for a more refined “conditional” fit. As noted by Vonesh (1996), the derivation of (26) involves an additional approximation in which a “negligible” term is ignored (e.g., Wolfinger and Lin 1997, p. 472; Pinheiro and Bates 2000, p. 317); the nonmem laplacian method includes this term and invokes Laplace's method “as-is” in the case where R_i depends on b_i. Strictly speaking, the Laplace approximation is valid only if n_i is large. However, Ko and Davidian (2000) note that it should hold, even if the n_i are small, if the magnitude of intra-individual variation is small relative to that among individuals, which is the case in many applications. When R_i depends on β_i, the above argument no longer holds, but Ko and Davidian (2000) argue that the approximation is still valid approximately for “small” intra-individual variation; Davidian and Giltinan (1995) present a two-step algorithm incorporating dependence of R_i on β_i. It is well-documented by numerous authors that these “first order conditional” approximations work extremely well in general, even when the n_i are not large or the assumptions of normality that dictate the form of (28), on which b̂_i is based, are violated (e.g., Hartford and Davidian 2000). These features and the availability of supported software have made this approach probably the most popular way to implement nonlinear mixed models in practice.

Remarks. Although the methods in this section involve closed-form expressions for p(y_i|z_i, β, ξ, D) and the moments (25) and (26), maximization or solution of the likelihoods or estimating equations can still be computationally challenging, and selection of suitable starting values for the algorithms is essential. The methods may be implemented for any n_i; unlike methods based on individual estimates, the n_i need not be large enough for an asymptotic approximation to be justified for each individual. A common practical strategy is to first fit a simplified version of the model and use the results to suggest starting values for the intended analysis. For instance, one might take D to be a diagonal matrix, which simplifies computation and can often speed convergence of the algorithms. As a final specification, however, this may be an unrealistic representation of true “unexplained” population variation: as in most mixed effects modeling, it implies the elements of β_i are uncorrelated in the population and hence that the phenomena they represent are unrelated, which is usually highly unrealistic. The analyst must bear in mind that failure to achieve convergence in general is not valid justification for adopting a model specification that is at odds with features dictated by the science.

3.4 Methods Based on the “Exact” Likelihood

The foregoing methods invoke analytical approximations to the likelihood (20) or to the first two moments of p(y_i|z_i, β, ξ, D). Alternatively, “exact” methods maximize (20) directly, using deterministic or stochastic approximation to handle the integrals; here, “exact” refers to the fact that these approaches can be made as accurate as desired at the expense of greater computational intensity. Advances in computational power have made routine implementation of “exact” likelihood inference feasible for practical use.

In contrast to an analytical approximation as in Section 3.3, whose accuracy depends on the sample sizes n_i, numerical approximation of the integrals in (20) may be achieved by Gauss-Hermite quadrature, a standard deterministic method of approximating an integral by a weighted average of the integrand evaluated at suitably chosen points over a grid, where accuracy increases with the number of grid points. As the integrals over the b_i in (20) are k-dimensional, a grid is required in each dimension, so the number of evaluations grows quickly with k, increasing the computational burden of maximizing the likelihood with the integrals so represented; Pinheiro and Bates (1995) and Davidian and Gallant (1993) demonstrate how to transform them into a series of one-dimensional integrals. When p(b_i; D) is a normal density, Pinheiro and Bates (1995; 2000, Ch. 7) propose an approach they refer to as adaptive Gaussian quadrature, in which the b_i grid is centered around the b̂_i maximizing (28) and scaled in a way that leads to a great reduction in the number of grid points required to achieve suitable accuracy; use of one grid point in each dimension reduces to a Laplace approximation as in Section 3.3. Gauss-Hermite and adaptive Gaussian quadrature are implemented in SAS proc nlmixed (SAS Institute 1999); the latter is the default method for approximating the integrals, and the former is obtained via method=gauss noad. We refer the reader to these references for details of these methods.

Although the assumption of normal b_i is commonplace, the population may be more prone to individuals with “unusual” parameter values than indicated by the normal, or, even after accounting for systematic relationships, the apparent distribution of the β_i may appear multimodal, due, for example, to failure to take into account an important covariate.
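The quadrature idea can be illustrated in one dimension with a tiny, self-contained sketch (not the proc nlmixed implementation): using tabulated five-point Gauss-Hermite nodes and weights, it approximates E{g(b)} for b ~ N(0, σ²), the building block of each individual's contribution to (20). The integrand g and the value of σ below are purely hypothetical.

```python
import math

# Five-point Gauss-Hermite nodes/weights for weight function exp(-x^2)
# (standard tabulated values; the weights sum to sqrt(pi)).
NODES = [-2.020182870456086, -0.958572464613819, 0.0,
         0.958572464613819, 2.020182870456086]
WEIGHTS = [0.019953242059046, 0.393619323152241, 0.945308720482942,
           0.393619323152241, 0.019953242059046]

def gh_expectation(g, sigma):
    """Approximate E[g(b)] for b ~ N(0, sigma^2) by Gauss-Hermite quadrature.

    The change of variables b = sqrt(2)*sigma*x converts the normal
    expectation to (1/sqrt(pi)) * sum_k w_k g(sqrt(2)*sigma*x_k).
    """
    total = sum(w * g(math.sqrt(2.0) * sigma * x)
                for x, w in zip(NODES, WEIGHTS))
    return total / math.sqrt(math.pi)

# Check against a case with a closed form: E[exp(b)] = exp(sigma^2 / 2).
approx = gh_expectation(math.exp, 1.0)
exact = math.exp(0.5)
print(approx, exact)
```

In k dimensions the same rule would be applied coordinate-wise, which is exactly why the number of evaluations grows as (number of points)^k, the burden noted above.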

These considerations have led several authors (e.g., Mallet 1986; Davidian and Gallant 1993) to place no or mild assumptions on the distribution of the random effects. With covariates a_i, Mentré and Mallet (1994) consider nonparametric estimation of the joint distribution of (β_i, a_i); other authors impose no assumptions and work with the β_i directly, estimating their distribution nonparametrically when maximizing the likelihood. Mallet (1986) shows that the resulting estimate is discrete, so that integrations in the likelihood are straightforward. Davidian and Gallant (1993) assume only that the density of b_i lies in a “smooth” class that includes the normal but also skewed and multimodal densities; the density is represented in (20) by a truncated series expansion, where the degree of truncation controls the flexibility of the representation, and the density is estimated along with the other model parameters by maximizing (20). This approach is implemented in the Fortran program nlmix (Davidian and Gallant 1992a), which uses Gauss-Hermite quadrature to do the integrals and requires the user to write problem-specific code; Davidian and Gallant (1992b) demonstrate how this can be advantageous in the context of selection of covariates for inclusion in d. Advantages of methods that relax distributional assumptions are the potential insight on the structure of the population provided by the estimated density or distribution and more realistic inference on individuals (Section 3.6). Methods similar in spirit in a Bayesian framework (Section 3.5) have been proposed by Müller and Rosner (1997). Other approaches to “exact” likelihood are possible; for example, Walker (1996) presents an EM algorithm to maximize (20) in which the “E-step” is carried out using Monte Carlo integration. Software for pharmacokinetic analysis implementing this type of approach via an EM algorithm (Schumitzky 1991) is available at http://www.usc.edu/hsc/lab apk/software/uscpack.html.

3.5 Methods Based on a Bayesian Formulation

The hierarchical structure of the nonlinear mixed effects model makes it a natural candidate for Bayesian inference. Historically, a key impediment to implementation of Bayesian analyses in complex statistical models was the intractability of the numerous integrations required.

However, vigorous development of Markov chain Monte Carlo (MCMC) techniques to facilitate such integration in the early 1990s, together with new advances in computing power, has made such Bayesian analysis feasible, and the nonlinear mixed model served as one of the first examples of this capability (e.g., Rosner and Müller 1994; Wakefield et al. 1994). We provide only a brief review of the salient features of Bayesian inference for (15)–(16); see Davidian and Giltinan (1995, Ch. 8) for an introduction in this specific context and Carlin and Louis (2000) for comprehensive general coverage of modern Bayesian analysis.

Placing the model within a Bayesian framework requires specification of distributions for (15) and (16) and adoption of a third “hyperprior” stage:

Stage 3: Hyperprior.  (β, ξ, D) ∼ p(β, ξ, D).  (29)

The hyperprior distribution is usually chosen to reflect weak prior knowledge, and typically p(β, ξ, D) = p(β) p(ξ) p(D). From the Bayesian perspective, β, ξ, D and the b_i (equivalently, the β_i) are all regarded as random vectors on an equal footing. However, because the b_i are treated as “parameters,” they are ordinarily not “integrated out” as in the foregoing frequentist approaches; rather, Bayesian analysis proceeds by identifying the posterior distributions induced by the full model (15), (16), and (29), namely the marginal distributions of β, ξ, D and the b_i or β_i given the observed data. Writing the observed data as y = (y_1^T, ..., y_m^T)^T, and defining b and z similarly, the joint posterior density of (β, ξ, D, b) is given by

p(β, ξ, D, b|y, z) = { ∏_{i=1}^m p(y_i, b_i|z_i, β, ξ, D) } p(β, ξ, D) / p(y|z),  (30)

where p(y_i, b_i|z_i, β, ξ, D) is given in (19), and the denominator follows from integration of the numerator with respect to β, ξ, D, and b. The marginals upon which inference is based are then obtained by integration of (30); e.g., the posterior for β is p(β|y, z), and an “estimate” of β is the mean or mode of this posterior, with uncertainty measured by its spread. The integration involved is a daunting analytical task (see Davidian and Giltinan 1995, p. 220). MCMC techniques instead yield simulated samples from the relevant posterior distributions, from which any desired feature, such as the mode, may then be approximated. Because of the nonlinearity of f (and possibly d) in b_i, generation of such simulations is more complex than in simpler linear hierarchical models and must be tailored to the specific problem in many instances.
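The MCMC machinery itself can be shown on a deliberately tiny problem rather than a full nonlinear hierarchy: the random-walk Metropolis sketch below, with made-up data and a known-variance normal model, samples a single parameter whose exact posterior is available by conjugacy, so the simulated posterior mean can be checked. It is an illustration of the sampling idea only, not a tailored sampler of the kind the text describes.

```python
import math
import random

random.seed(1)

y = [1.2, 0.8, 1.0, 1.5, 0.5]   # hypothetical observations
sigma2, tau2 = 1.0, 100.0       # known error variance; prior variance

def log_post(beta):
    """Log posterior (up to a constant) for y_i ~ N(beta, sigma2),
    beta ~ N(0, tau2)."""
    ll = -sum((yi - beta) ** 2 for yi in y) / (2 * sigma2)
    return ll - beta ** 2 / (2 * tau2)

draws, beta = [], 0.0
for it in range(20000):
    prop = beta + random.gauss(0.0, 0.5)            # random-walk proposal
    if math.log(random.random()) < log_post(prop) - log_post(beta):
        beta = prop                                  # accept the move
    if it >= 2000:                                   # discard burn-in
        draws.append(beta)

post_mean = sum(draws) / len(draws)
# Conjugacy gives the exact posterior mean for comparison:
exact_mean = (sum(y) / sigma2) / (len(y) / sigma2 + 1 / tau2)
print(post_mean, exact_mean)
```

In a real nonlinear mixed model, each b_i and the population parameters would be updated in turn within a scheme of this kind, which is precisely where the problem-specific tailoring enters.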

With weak hyperprior specifications, inferences obtained by Bayesian and frequentist methods agree in most instances; e.g., the results of Davidian and Gallant (1992) and Wakefield (1996) for a pharmacokinetic application are remarkably consistent. A feature of the Bayesian framework that is particularly attractive when a “scientific” model is the focus is that it provides a natural mechanism for incorporating known constraints on values of model parameters and other subject-matter knowledge through the specification of suitable proper prior distributions; Gelman et al. (1996) demonstrate this capability in the context of toxicokinetic modeling.

The need to tailor the simulation scheme to the problem complicates implementation via all-purpose software for Bayesian analysis such as WinBUGS (http://www.mrc-bsu.cam.ac.uk/bugs/winbugs/contents.shtml). For pharmacokinetic analysis, where certain compartment models are standard, a WinBUGS interface, PKBugs (http://www.med.ic.ac.uk/divisions/60/pkbugs web/home.html), is available. It is beyond our scope to provide a full account of Bayesian inference and MCMC implementation; the work cited above and Wakefield (1996), Müller and Rosner (1997), and Rekaya et al. (2001) are only a few examples of detailed demonstrations in the context of specific applications.

3.6 Individual Inference

As discussed in Section 2, elucidation of individual characteristics may be of interest. Whether from a frequentist or Bayesian standpoint, the nonlinear mixed model assumes that individuals are drawn from a population and thus share common features. The resulting phenomenon of “borrowing strength” across individuals to inform inference on a randomly chosen such individual is often exhibited for the normal linear mixed effects model (f linear in b_i) by showing that E(b_i|y_i, z_i) can be written as a linear combination of population- and individual-level quantities (Davidian and Giltinan 1995, sec. 3.3; Carlin and Louis 2000). The spirit of this result carries over to general models and suggests using the posterior distribution of the b_i or β_i for this purpose. From the Bayesian perspective, such posterior distributions are a by-product of MCMC implementation of Bayesian inference for the nonlinear mixed model, so are immediately available.

Alternatively, from a frequentist standpoint, an analogous approach is to base inference on b_i on the mode or mean of the posterior distribution (28), where now β, ξ, D are regarded as fixed. As these parameters are unknown, it is natural to substitute estimates for them in (28); this leads to what is known as empirical Bayes inference (e.g., Carlin and Louis 2000, Ch. 3), and the b̂_i in (27) are accordingly often referred to as empirical Bayes “estimates.”

In both frequentist and Bayesian implementations, “estimates” of the b_i are often exploited in an ad hoc fashion to assist with identification of an appropriate second-stage model d. Specifically, a common tactic is to fit an initial model in which no covariates a_i are included, such as β_i = β + b_i; obtain Bayes or empirical Bayes “estimates” b̂_i; and plot the components of the b̂_i against each element of a_i. Apparent systematic patterns are taken to reflect the need to include that element of a_i in d and also to suggest a possible functional form for this dependence. Mandema, Verotta, and Sheiner (1992) use generalized additive models to aid interpretation. For a general second-stage model (16), such graphical techniques may be supplemented by standard model selection techniques, e.g., likelihood ratio tests to distinguish among nested such models, or inspection of information criteria. Davidian and Gallant (1992) and Wakefield (1996) demonstrate this approach in a specific application.

3.7 Summary

With a plethora of methods available for implementation, the analyst has a range of options for nonlinear mixed model analysis. If rich intra-individual data are available for all i, methods based on individual estimates (Section 3.2) are attractive. With sparse individual data (n_i “small”), the choice is limited to the approaches in Sections 3.3–3.5. In our experience, first order conditional methods yield reliable inferences for both rich (n_i “large”) and sparse data situations; this is in contrast to their performance when applied to generalized linear mixed models for binary data, where they can lead to unacceptable biases. Accordingly, both on the basis of performance and the ease with which they are explained to non-statisticians, we can recommend them for most practical applications. “Exact” likelihood and Bayesian methods require more sophistication and commitment on the part of the user.
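The empirical Bayes “estimate” just described is simply the mode of (28) with estimated population quantities plugged in. A minimal, hypothetical sketch for a scalar random effect: a one-parameter exponential-decay model with made-up data and made-up “current” population estimates, where the mode is found by minimizing the penalized residual criterion ℓ(b) via golden-section search. All names and numbers are illustrative only.

```python
import math

# Hypothetical setup: y_j = exp(-(beta + b) * t_j) + error for one subject.
t = [1.0, 2.0, 4.0, 8.0]
y = [0.60, 0.35, 0.14, 0.02]          # made-up concentrations
beta, sigma2, D = 0.4, 0.0025, 0.09   # "current" population estimates

def neg_log_post(b):
    """Negative log posterior of b up to a constant: residual sum of
    squares weighted by the error variance, plus the normal-prior penalty."""
    rss = sum((yi - math.exp(-(beta + b) * tj)) ** 2 for yi, tj in zip(y, t))
    return rss / (2 * sigma2) + b ** 2 / (2 * D)

def golden_min(f, lo, hi, tol=1e-8):
    """Golden-section search for the minimizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

b_hat = golden_min(neg_log_post, -1.0, 1.0)
print(b_hat)  # posterior mode: the empirical Bayes "estimate" of b
```

The prior penalty b²/(2D) is what produces the shrinkage toward zero that distinguishes this “estimate” from a purely individual least-squares fit.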

4. EXTENSIONS AND RECENT DEVELOPMENTS

The end of the twentieth century and beginning of the twenty-first saw an explosion of research on nonlinear mixed models. We cannot hope to do this vast literature justice, so note only selected highlights; the cited references should be consulted for details.

Many of the issues that arise for linear mixed models carry over, at least approximately, to the nonlinear case. For example, the reliability of standard errors for estimators of β may be poor in small samples, in part due to failure of the approximate formulæ to take adequate account of uncertainty in estimating D and ξ; moreover, for small samples, estimation of D and ξ may itself be poor, leading to concern over the impact on estimation of β. Lindstrom and Bates (1990) and Pinheiro and Bates (2000) propose approximate “restricted maximum likelihood” estimation of these parameters in the context of first order conditional methods. The issue of testing whether all elements of β_i should include associated random effects, discussed in Section 2, involves the same considerations as those arising for inference on variance components in linear mixed models: a null hypothesis that a diagonal element of D, representing the variance of the corresponding random effect, is equal to zero lies on the boundary of the allowable parameter space for variances, and it is well-known for the linear mixed model that the usual test statistics then do not have standard null sampling distributions; e.g., the approximate distribution of the likelihood ratio test statistic is a mixture of chi-squares (see, e.g., Verbeke and Molenberghs 2000). The same issues apply to “exact” or approximate likelihood inference (under the first order or first order conditional approaches) in the nonlinear case.

New approximations and computation. Vonesh et al. (2002) propose a higher-order conditional approximation to the likelihood than that obtained by the Laplace approximation and show that this method can achieve gains in efficiency over first order conditional methods and approach the performance of “exact” maximum likelihood; these authors also establish large-m/large-n_i theoretical properties (see also Vonesh 2002). Relatedly, Demidenko (1997) has shown that such methods are equivalent asymptotically to first-order conditional methods when m and the n_i are large. Raudenbush et al. (2000) discuss use of a sixth-order Laplace-type approximation, and Clarkson and Zhan (2002) apply so-called spherical-radial integration methods (Monahan and Genz 1997) for deterministic and stochastic approximation of an integral based on a transformation of variables, both in the context of generalized linear mixed models; these methods could also be applied to the likelihood (20).
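The chi-square-mixture null distribution mentioned above translates into a simple p-value correction. A minimal sketch for the single-variance-component case, where the mixture is 0.5 χ²₀ + 0.5 χ²₁ under the standard conditions cited above, using the closed-form χ²₁ tail:

```python
import math

def chi2_1_tail(t):
    """P(chi-square_1 > t), via the standard normal tail:
    2*(1 - Phi(sqrt(t))) = erfc(sqrt(t/2))."""
    return math.erfc(math.sqrt(t / 2.0))

def boundary_pvalue(lrt):
    """p-value for H0: variance of a single random effect equals zero,
    using the 0.5*chi2_0 + 0.5*chi2_1 mixture.

    chi2_0 is a point mass at zero, so only the chi2_1 component
    contributes tail probability when the observed statistic is positive.
    """
    return 0.5 * chi2_1_tail(lrt) if lrt > 0 else 1.0

# The naive chi2_1 p-value is exactly twice the mixture p-value:
print(chi2_1_tail(2.706), boundary_pvalue(2.706))  # ~0.100 vs ~0.050
```

The practical consequence is that the naive test is conservative: ignoring the boundary doubles the p-value and makes a genuine random effect harder to detect.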

Multilevel models and inter-occasion variation. In some settings, individuals may be observed longitudinally over more than one “occasion.” One example is in pharmacokinetics, where concentration measurements may be taken over several distinct time intervals following different doses. If subjects are observed over several dosing intervals, covariates such as enzyme levels, weight, or measures of renal function may also be obtained in each interval and are thus “time-dependent” in the sense that their values change across intervals for each subject. If pharmacokinetic parameters such as clearance and volume of distribution are associated with such covariates, then it is natural to expect their values to change with changing covariate values, and this may be represented by modifying (16) to allow the individual parameters to change. If there are q dosing intervals I_h, h = 1, ..., q, say, write β_ij = d(a_ih, β, b_i) to denote the value of the parameters at t_ij when t_ij ∈ I_h, where a_ih gives the values of the covariates during I_h. This assumes that such “inter-occasion” variation is entirely attributable to changes in covariates. However, Karlsson and Sheiner (1993) note that pharmacokinetic behavior may vary naturally over time, so that, even without changing covariates, parameter values may fluctuate. This is accommodated by a second-stage model with nested random effects for individual and interval-within-individual: again letting β_ij denote the value of the parameters when t_ij ∈ I_h, modify (16) to β_ij = d(a_i, β, b_i, b_ih), where now b_i and b_ih are independent with means zero and covariance matrices D and G, say. Levels of nested random effects are also natural in other settings. Hall and Bailey (2001) and Hall and Clutter (2003) discuss studies in forestry where longitudinal measures of yield or growth may be measured on each tree within a plot, and Rekaya et al. (2001) consider milk yield data where each cow is observed longitudinally during its first three lactations. See Pinheiro and Bates (2000) for a discussion of such multilevel models.

Multivariate response. Often, more than one response measurement may be taken longitudinally on each individual. A key example is again from pharmacology, where both drug concentrations and measures of some physiological response are collected over time on the same individual. The goal is to develop a joint pharmacokinetic/pharmacodynamic model, where a pharmacodynamic model for the relationship between concentration and response is postulated in terms of subject-specific parameters and linked to a pharmacokinetic model for the time-concentration relationship. This yields a version of (15) in which responses of each type are “stacked” and depend on random effects corresponding to parameters in each model, which are in turn taken to be correlated in (16). Examples are given by Davidian and Giltinan (1995, sec. 9.5) and Bennett and Wakefield (2001). Motivated by studies of timber growth and yield in forestry, where multiple such measures are collected, Hall and Clutter (2003) extend the basic model and first order conditional fitting methods to handle both multivariate response and multiple levels of nested effects.

Mismeasured/missing covariates and censored response. As in any statistical modeling context, mismeasured, missing, and censored data may arise. Ko and Davidian (2000) develop first order conditional methods applicable when components of a_i are measured with error, and Wu and Wu (2002b) propose a multiple imputation approach to accommodate missing covariates a_i. Wang and Davidian (1996) study the implications for inference when the observation times for each individual are recorded incorrectly. An approach to take appropriate account of censored responses due to a lower limit of detection, as in the HIV dynamics setting in Section 2.1, is proposed by Wu (2002), who also extends the model to handle mismeasured and missing covariates a_i; Li et al. (2002) study related methods in the context of pharmacokinetic analysis.

Semiparametric models. Ke and Wang (2001) propose a generalization of the nonlinear mixed effects model in which the model f is allowed to depend on a completely unspecified function of time and elements of β_i; these authors also derive large-sample properties of the approach. They suggest that the model provides flexibility for accommodating possible model misspecification and may be used as a diagnostic tool for assessing the form of time dependence in a fully parametric nonlinear mixed model. Lindstrom (1995) develops methods for nonparametric modeling of longitudinal profiles that involve random effects and may be fitted using standard nonlinear mixed model techniques.

Pharmaceutical applications. A topic that has generated great recent interest in the pharmaceutical industry is so-called “clinical trial simulation” (see, for example, http://www.pharsight.com/products/trial simulator). Here, a hypothetical population of subjects is simulated from mixed models for pharmacokinetic and pharmacodynamic behavior that incorporate variation due to covariates and “unexplained” sources thought to be present, and the analyst may apply different drug regimens according to different designs to evaluate potential study outcomes. Nonlinear mixed models are at the heart of this enterprise. More broadly, nonlinear mixed effects modeling techniques have been advocated for population pharmacokinetic analysis in a guidance issued by the U.S. Food and Drug Administration (http://www.fda.gov/cder/guidance/1852fnl.pdf). Concordet and Nunez (2000) and Chu et al. (2001) discuss interesting applications in veterinary science involving calibration and prediction problems.

Other topics. Yeap and Davidian (2001) propose “robust” methods for accommodating “outlying” responses within individuals or “outlying” individuals. Oberg and Davidian (2000) propose methods for estimating a parametric transformation of the response under which the Stage 1 conditional density is normal with constant variance. For a model of the form (16), Lai and Shih (2003) develop alternative methods to those of Mentré and Mallet (1994), cited in Section 3.4, that do not require consideration of the joint distribution of β_i and a_i; here, the distribution of the b_i is left completely unspecified and is estimated nonparametrically. Methods for model selection and determination are studied by Vonesh, Chinchilli, and Pu (1996); Dey, Chen, and Chang (1997); and Wu and Wu (2002a). Young, Zerbe, and Hay (1997) propose confidence intervals for ratios of components of the fixed effects β. Methods for combining data from several studies, where each is represented by a nonlinear mixed model, are discussed by Wakefield and Rahman (2000) and Lopes, Müller, and Rosner (2003).

5. DISCUSSION

This review of nonlinear mixed effects modeling is of necessity incomplete, as it is beyond
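The simulation step at the core of clinical trial simulation is just repeated draws from the two-stage hierarchy: draw each subject's parameters from the population model (Stage 2), then draw that subject's observations given those parameters (Stage 1). A minimal, self-contained sketch with a hypothetical one-compartment model and entirely illustrative population values:

```python
import math
import random

random.seed(7)

# Hypothetical population model: clearance CL and volume V vary across
# subjects on the log scale (Stage 2); concentrations are observed with
# multiplicative error (Stage 1). All numbers are illustrative only.
POP = {"CL": 3.0, "V": 20.0}        # population medians
OMEGA = {"CL": 0.25, "V": 0.15}     # between-subject SDs (log scale)
SIGMA = 0.10                        # intra-individual SD (log scale)
DOSE = 100.0
TIMES = [1.0, 2.0, 4.0, 8.0, 12.0]

def simulate_subject():
    """Draw one subject's parameters (Stage 2), then observations (Stage 1)."""
    cl = POP["CL"] * math.exp(random.gauss(0.0, OMEGA["CL"]))
    v = POP["V"] * math.exp(random.gauss(0.0, OMEGA["V"]))
    ke = cl / v                                      # elimination rate
    conc = [(DOSE / v) * math.exp(-ke * t) for t in TIMES]
    return [c * math.exp(random.gauss(0.0, SIGMA)) for c in conc]

trial = [simulate_subject() for _ in range(50)]
print(len(trial), len(trial[0]))  # 50 subjects x 5 sampling times
```

A trial simulator elaborates this skeleton with covariate effects in the Stage 2 model, dosing regimens, and study designs, then summarizes the simulated outcomes across designs.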

the limits of a single article to document fully the extensive literature. We have not presented an analysis of a specific application; the references cited in Section 2, and Clayton et al. (2003) and Yeap et al. (2003) in this issue, offer detailed demonstrations of the formulation, implementation, and interpretation of nonlinear mixed models in practice. We have chosen to focus much of our attention on revisiting the considerations underlying the basic model from an updated standpoint, and we hope that this will offer readers familiar with the topic additional insight and provide those new to the model a foundation for appreciating its rationale and utility. We look forward to continuing methodological developments for and new applications of this rich class of models in the statistical and subject-matter literature.

ACKNOWLEDGMENTS

This research was partially supported by NIH grants R01-CA085848 and R37-AI031789.

REFERENCES

Beal, S.L., and Sheiner, L.B. (1982), “Estimating Population Pharmacokinetics,” CRC Critical Reviews in Biomedical Engineering, 8, 195–222.

Bennett, J., and Wakefield, J. (2001), “Errors-in-Variables in Joint Population Pharmacokinetic/Pharmacodynamic Modeling,” Biometrics, 57, 803–812.

Boeckmann, A.J., Sheiner, L.B., and Beal, S.L. (1992), NONMEM User's Guide, Part V, Introductory Guide, San Francisco: University of California.

Breslow, N.E., and Clayton, D.G. (1993), “Approximate Inference in Generalized Linear Mixed Models,” Journal of the American Statistical Association, 88, 9–25.

Carlin, B.P., and Louis, T.A. (2000), Bayes and Empirical Bayes Methods for Data Analysis, Second Edition, New York: Chapman and Hall/CRC Press.

Chu, K.K., Wang, N., Stanley, S., and Cohen, N.D. (2001), “Statistical Evaluation of the Regulatory Guidelines for Use of Furosemide in Race Horses,” Biometrics, 57, 294–301.

Clarkson, D.B., and Zhan, Y. (2002), “Using Spherical-Radial Quadrature to Fit Generalized Linear Mixed Effects Models,” Journal of Computational and Graphical Statistics, 11, 639–659.

Clayton, C.A., Starr, T.B., Sielken, R.L., Jr., Williams, R., and Tobia, A.J. (2003), “Using a Non-linear Mixed Effects Model to Characterize Cholinesterase Activity in Rats Exposed to Aldicarb,” Journal of Agricultural, Biological, and Environmental Statistics, 8, ????–????.

Concordet, D., and Nunez, O.G. (2000), “Calibration for Nonlinear Mixed Effects Models: An Application to the Withdrawal Time Prediction,” Biometrics, 56, 1040–1046.

Davidian, M., and Gallant, A.R. (1992a), Nlmix: A Program for Maximum Likelihood Estimation of the Nonlinear Mixed Effects Model With a Smooth Random Effects Density, Department of Statistics, North Carolina State University.

Davidian, M., and Gallant, A.R. (1992b), “Smooth Nonparametric Maximum Likelihood Estimation for Population Pharmacokinetics, With Application to Quinidine,” Journal of Pharmacokinetics and Biopharmaceutics, 20, 529–556.

Davidian, M., and Gallant, A.R. (1993), “The Nonlinear Mixed Effects Model With a Smooth Random Effects Density,” Biometrika, 80, 475–488.

Davidian, M., and Giltinan, D.M. (1995), Nonlinear Models for Repeated Measurement Data, New York: Chapman and Hall.

Demidenko, E. (1997), “Asymptotic Properties of Nonlinear Mixed Effects Models,” in Modeling Longitudinal and Spatially Correlated Data: Methods, Applications, and Future Directions, eds. T.G. Gregoire, D.R. Brillinger, P.J. Diggle, E. Russek-Cohen, W.G. Warren, and R.D. Wolfinger, New York: Springer.

Dey, D.K., Chen, M.-H., and Chang, H. (1997), “Bayesian Approach for Nonlinear Random Effects Models,” Biometrics, 53, 1239–1252.

Diggle, P.J., Heagerty, P., Liang, K.-Y., and Zeger, S.L. (2002), Analysis of Longitudinal Data, Second Edition, Oxford: Oxford University Press.

Fang, Z., and Bailey, R.L. (2001), “Nonlinear Mixed Effects Modeling for Slash Pine Dominant Height Growth Following Intensive Silvicultural Treatments,” Forest Science, 47, 287–300.

Galecki, A.T. (1998), “NLMEM: A New SAS/IML Macro for Hierarchical Nonlinear Models,” Computer Methods and Programs in Biomedicine, 55, 207–216.

Gelman, A., Bois, F., and Jiang, J. (1996), “Physiological Pharmacokinetic Analysis Using Population Modeling and Informative Prior Distributions,” Journal of the American Statistical Association, 91, 1400–1412.

Gregoire, T.G., and Schabenberger, O. (1996a), “A Non-Linear Mixed-Effects Model to Predict Cumulative Bole Volume of Standing Trees,” Journal of Applied Statistics, 23, 257–271.

Gregoire, T.G., and Schabenberger, O. (1996b), “Nonlinear Mixed-Effects Modeling of Cumulative Bole Volume With Spatially-Correlated Within-Tree Data,” Journal of Agricultural, Biological, and Environmental Statistics, 1, 107–119.

Hall, D.B., and Bailey, R.L. (2001), “Modeling and Prediction of Forest Growth Variables Based on Multilevel Nonlinear Mixed Models,” Forest Science, 47, 311–321.

Hall, D.B., and Clutter, M. (2003), “Multivariate Multilevel Nonlinear Mixed Effects Models for Timber Yield Predictions,” Biometrics, in press.

Hartford, A., and Davidian, M. (2000), “Consequences of Misspecifying Assumptions in Nonlinear Mixed Effects Models,” Computational Statistics and Data Analysis, 34, 139–164.

Heagerty, P.J. (1999), “Marginally Specified Logistic-Normal Models for Longitudinal Binary Data,” Biometrics, 55, 688–698.

Karlsson, M.O., Beal, S.L., and Sheiner, L.B. (1995), “Three New Residual Error Models for Population PK/PD Analyses,” Journal of Pharmacokinetics and Biopharmaceutics, 23, 651–672.

Karlsson, M.O., and Sheiner, L.B. (1993), “The Importance of Modeling Inter-Occasion Variability in Population Pharmacokinetic Analyses,” Journal of Pharmacokinetics and Biopharmaceutics, 21, 735–750.

Ke, C., and Wang, Y. (2001), “Semiparametric Nonlinear Mixed-Effects Models and Their Applications,” Journal of the American Statistical Association, 96, 1272–1298.

Ko, H., and Davidian, M. (2000), “Correcting for Measurement Error in Individual-Level Covariates in Nonlinear Mixed Effects Models,” Biometrics, 56, 368–375.

Lai, T.L., and Shih, M.-C. (2003), “Nonparametric Estimation in Nonlinear Mixed Effects Models,” Biometrika, 90, 1–13.

G. 664–674. 73. McRoberts.. (1997). D.” Canadian Journal of Fisheries and Aquatic Sciences. “Predicting Time to Prostate Cancer Recurrence Based on Joint Models for Non-linear Longitudinal Biomarkers and Event Time. Stroup.A... J. Bois. R.. J. F. 424–432. N. Rate of HIV-1 decline following antiretroviral therapy is related to viral load at baseline and drug regimen.” Biometrika. “A Maximum Likelihood Estimation Method for Random Coeﬃcient Regression Models.M.. and Crowley.J. (1997). 547–563.M..H. 66–75. Mentr´. and Sandler. Lopes. “A Bayesian Population Model With Hierarchical Mixture u Priors Applied to Blood Count Data.. Milliken. and Rosner. 3195–3211. G.. “Self-Modeling With Random Shift and Scale Parameters and a Free-Knot Spline Shape Function. (2002). J. “Comparing Linear and Nonlinear Mixed Model Approaches to Cosinor Analysis. “Handling Covariates in Population Pharmacokinetics.. D. H. 37 . J. Jones. Taylor. 92. T. Morrell.J. S. Ryan. M. Littell.” Journal of the American Statistical Association. A. G. D. J. SAS System for Mixed Models.. 1279– 1292.C. (2003).. S.D. 673–687. de Wolf.J. Goudsmit. F.. (1995).. 645–656. Mandema. S. 511–529. A. G. L. Pilling.. and Mittler. 46. R.W. A. 14. and Gupta. and Rosner. Ibrahim. 25–33. 45–53. A. 1098– 1106. M. “Building Population Pharmacokinetic/Pharmacodynamic Models. J.K. and Wolﬁnger.E.J. (2002). P. A. Monahan.H. and Rogers. M¨ller..” Journal of the American Statistical Association. 3897–3911.G.. W.. 36. and Genz.” Biometrics. 20..M.” Statistics in Medicine..W. Verotta.. Brown. (1995)..G. R. T. L.H. (1994). (1992). (2002). 58. 92.. Ngo..” Applied Statistics. D. Zerbe. Kirkwood. L. Carter. “A Bayesian Compartmental Model for the Evaluation of 1..Y.L. Lee. 21. Pauler.” Biometrics.F. 56.” Statistics in Medicine. M. and the Correlation Between von Bertalanﬀy Growth Parameters. M. “Estimation and Inference for a SplineEnhanced Population Pharmacokinetic Model. 90..P. 2009–2021. Oberg. F.. K.B. 
“The Joint Modeling of a Longitudinal Disease Progression Marker and the Failure Time Process in the Presence of a Cure. 22. and Bates. 3.. 291–305. 59.T.O.B. D. and Mallet. (1998).. Li.3-Butadiene Metabolism.” Canadian Journal of Zoology.. Brooks. H.. and Walker. G.” Biometrics. (1996). “Bayesian meta-analysis for longitudinal data u models using multivariate mixture priors. 76. R. C.M. 601–611. and Smith.L. L.D.S..” Journal of Pharmacokinetics and Biopharmaceutics. R. “Estimating Data Transformations in Nonlinear Mixed Eﬀects Models. (1998).L. Lindstrom. H. Biometrics. P. (1990). and Sheiner.” Journal of the American Statistical Association. L.. Pearson. Lindstrom. Mezzetti. (2003). and Davidian. Cary NC: SAS Institute Inc. and Brant. “Using Nonlinear Mixed Eﬀects Models to Estimate Size-Age Relationships for Black Bears.W. S. Notermans. J.G.. “An Improved Method for Estimating Individual Growth Variability in Fish. 1483–1490.Law.J.. (1986).. L.. Danner.B.A. and Finkelstein. (2002). “Nonlinear Mixed Eﬀects Models for Repeated Measures Data. AIDS 12. 52. 59. M¨ller. Mikulich. Mallet. Perelson.” Biostatistics.J.. G. “Estimating Unknown Transition Times Using a Piecewise Nonlinear Mixed-Eﬀects Model in Men With Prostate Cancer. (2003)..” Intere national Journal of Biomedical Computing. 65–72.. M. (2000).” Statistics in Medicine. “Spherical-Radial Integration Rules for Bayesian Computation.

Contemporary Statistical Models for the Plant and Soil Sciences. 68. 15. S. D. F. and Rahman. Mixed-Eﬀects Models in S and Splus. 265–292. (1995).L. Sheiner.M. and M¨ller.F. 635–651.E. A. “Approximations to the Log-Likelihood Function in the Nonlinear Mixed-Eﬀects Model. K. and Molenberghs. 48.C. and Yosef. “The Combination of Population Pharmacokinetic Studies.. 32. and Ludden. “The Bayesian Analysis of Population Pharmacokinetic Models. and Pu. Nie.” Biometrics. 1–17.F.” Lifestock Production Science. 572–587. H. “Goodness-Of-Fit in Generalized Nonlinear Mixed-Eﬀects Models. 97. G. A.E. “Population pharmacokinetics/pharmacodynamics. (1996). New York: CRC Press. “Hierarchical Nonlinear Model for Persistency of Milk Yield in the First Three Lactations of Holsteins.F.” Journal of Pharmacokinetics and Biopharmaceutics. G. 4.. Pinheiro. 43.” Applied Mathematics and Computation. “Pharmacokinetic/Pharmacodynamic Analysis of Hematologic u Proﬁles. SAS Institute (1999)..G.. E. Yang. V. and Bates. and Shook. 8. Vonesh. (1977). 12–35.C. Vonesh.” Biometrics. 67... New York: Springer. A. E.. Wakeﬁeld. D. “Maximum Likelihood for Generalized Linear Models With Nested Random Eﬀects Via High-Order.M...A.L. “Mixed-Eﬀects Nonlinear Regression for Unbalanced Repeated Measures. E. D. 38 . L. “Nonparametric EM Algorithms for Estimating Prior Distributions. (2002). Vonesh. (2000). G.M.B.. 45. T. Chinchilli. 52. Wakeﬁeld. Vonesh.Pinheiro.. Vonesh. Version 8.. D.. 181–187. (2002). Racine-Poon. and Chinchilli. Multivariate Laplace Approximation.F. 19–30. 447–452. Linear and Nonlinear Models for the Analysis of Repeated Measurements.. (2000). M. (1992).. and Boisvieux. and Gianola.” Lifestock Production Science. Rodriguez-Zas.” Annual Review of Pharmacological Toxicology. 62–75.” Journal of the American Statistical Association. R. Smith. Rekaya. and Bates. A. O.V.L. Verbeke. B. Golmard. and Majumdar. Sheiner. 
“Estimation of Population Characteristics of Population Pharmacokinetic Parameters From Routine Clinical Data. (1984). 271–283. M. P.. 201–221. “A Note on the Use of Laplace’s Approximation for Nonlinear Mixed-Eﬀects Models.. (2000). Rosner. 143–157 Steimer.L. 91. New York: Springer. Mallet. 83. (1996). Rosenberg.B. Gianola.” Applied Statistics. 141–157. 22.” Journal of of Pharmacokinetics and Biopharmaceutics. (1994). E. E. “Bayesian Analysis of Linear and Nonlinear Population Models by Using the Gibbs Sampler.. J. J. Raudenbush.M. Cary.. S.J. and Marathe.M.W. A. (1994). “Alternative Approaches to Estimation of Population Pharmacokinetic Parameters: Comparison with the Nonlinear Mixed Eﬀect Model. (2000).L. “Evaluation of models for somatic cell score lactation patterns in Holsteins.. 263–270. N. (1997). and Pierce. 499–524. PROC NLMIXED. Linear Mixed Models for Longitudinal Data.M. “Conditional Second-Order Generalized Estimating Equations for Generalized Linear and Nonlinear Mixed-Eﬀects Models. Wang. (2001). J.” Drug Metabolism Reviews.” Journal of Computational and Graphical Statistics. 185–209.” Journal of the American Statistical Association. Schumitzky. New York: Marcel Dekker. V. L. Schabenberger. J.. J. J.C.. V. (1996). J.” Journal of Computational and Graphical Statistics. D. (2000). SAS OnlineDoc. Wakeﬁeld.” Biometrika.F..F. Weigel. 9. (1992). G..” Biometrics.W. (1991). 56. J. L. NC: SAS Institute Inc. and Gelfand.. K.

(1997). “Testing Homgeneity of Intra-run Variance Parameters in Immunoassay. and Wu. L.. 57. Yeap.” Applied Statistics. 97. “Population HIV-1 Dynamics in vivo: Applicable Models and Inferential Tools for Virological Data From AIDS Clinical Trials. H. A. G. and Davidian.” Journal of Agricultural. and Ratios of Parameters of Linear and Nonlinear Mixed-Eﬀects Models. 2002. and Hay.A. M. 16. (2002). and Davidian. 51. and Lin. Q. “An EM algorithm for Nonlinear Random Eﬀects Models. (2001). and Ding. Biological. and Wu. 410–418. (1996).L. (2002a). B. 934–944.. Wu. 8. 753–771. L. S. and Davidian.” Computational Statistics and Data Analysis.. and Davidian. (1996). “Missing Time-Dependent Covariates in Human Immunodeﬁciency Virus Dynamic Models. H. R. (1993).D.” Biometrics. Ryan. Wu. “A Joint Model for Nonlinear Mixed-Eﬀects Models With Censoring and Covariates Measured With Error. (2002b). 266–272.G. 55. 39 .” Biometrics. “Robust Two-Stage Estimation in Hierarchical Nonlinear Models. (1999). 25.” Journal of the American Statistical Association.Y.O. Yeap. M. 80. 83. L. Scheﬀ’e Simultaneous Conﬁdence Intervals. Wolﬁnger. X. 52. 21. (2003).. Wu. Zerbe.” Biometrika.M. D. 955–964.. “Laplace’s Approximation for Nonlinear Mixed Models.J. ‘Robust Two-Stage Approach to Repeated Measurements Analysis of Chronic Ozone Exposure in Rats.W. W.” Statistics in Medicine. “Identiﬁcation of Signiﬁcant Host Factors for HIV Dynamics Modelled by Non-Linear Mixed-Eﬀects Models. 1765–1776.A. Wang.. 53.. Zeng. P. 791–795. R. “Fieller’s Theorem. H. Wu. N.” Biometrics. “A Note on Covariate Measurement Error in Nonlinear Mixed Eﬀects Models. Catalano.. M.Walker. “Two Taylor-series Approximation Methods for Nonlinear Mixed Models.L. (1997).L. ????-????. 838–347.. and Environmental Statistics.Y. 801–812... L. 465–490. Wolﬁnger. M.” Statistics in Medicine. Young. With Application to AIDS Studies.” Biometrika.” Biometrics. B.. (1997).

Figure 1. Theophylline concentrations (mg/L) versus time (hr) for 12 subjects following a single oral dose.

Figure 2. Viral load profiles (log10 plasma RNA, copies/ml) versus days for 10 subjects from the ACTG 315 study. The lower limit of detection of 100 copies/ml is denoted by the dotted line.

Figure 3. Intra-individual sources of variation. The solid line is the "inherent" trajectory; the dotted line is the "realization" of the response process that actually takes place; and the solid diamonds are measurements of the "realization" at particular time points that are subject to error.
