Dose-Response Analysis Using R
Christian Ritz
Signe Marie Jensen
Daniel Gerhard
Jens Carl Streibig
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2020 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged, please write and let us know so we may rectify it in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Preface ix
1 Continuous data 1
1.1 Analysis of single dose-response curves . . . . . . . . . . . . 2
1.1.1 Inhibitory effect of secalonic acid . . . . . . . . . . . 2
1.1.1.1 Fitting the model . . . . . . . . . . . . . . . 3
1.1.1.2 Estimation of arbitrary ED values . . . . . 6
1.1.2 Data from a fish test in ecotoxicology . . . . . . . . . 6
1.1.3 Ferulic acid as an herbicide . . . . . . . . . . . . . . 9
1.1.4 Glyphosate in barley . . . . . . . . . . . . . . . . . . 13
1.1.5 Lower limits for dose-response data . . . . . . . . . . 19
1.1.6 A hormesis effect on lettuce growth . . . . . . . . . . 23
1.1.7 Nonlinear calibration . . . . . . . . . . . . . . . . . . 26
1.2 Analysis of multiple dose-response curves . . . . . . . . . . 31
1.2.1 Effect of an herbicide mixture on Galium aparine . . 31
1.2.2 Glyphosate and bentazone treatment of Sinapis alba 35
1.2.2.1 A joint dose-response model . . . . . . . . . 36
1.2.2.2 Fitting separate dose-response models . . . 39
5 Time-to-event-response data 95
5.1 Analysis of a single germination curve . . . . . . . . . . . . 97
5.1.1 Germination of Stellaria media seeds . . . . . . . . . 97
5.2 Analysis of data from multiple germination curves . . . . . 102
5.2.1 Time to death of daphnias . . . . . . . . . . . . . . . 104
5.2.1.1 Step 1 . . . . . . . . . . . . . . . . . . . . . 104
5.2.1.2 Step 2 . . . . . . . . . . . . . . . . . . . . . 107
5.2.2 A hierarchical three-way factorial design . . . . . . . 109
5.2.2.1 Step 1 . . . . . . . . . . . . . . . . . . . . . 112
5.2.2.2 Step 2 . . . . . . . . . . . . . . . . . . . . . 114
Bibliography 199
Index 211
Preface
The history of dose-response analysis goes back many hundreds of years. One of the more unusual applications is that numerous rulers had cupbearers who tasted the ruler's food and drink to guard against poisoning and the probable demise of the regent. The dose-response was the survival or health of the cupbearer.
In more recent times, dose-response analysis was applied to data from controlled experiments where a limited number of doses of a toxic chemical compound were compared to a control group (dose 0) in terms of binary responses, such as whether a treated insect was dead or alive after a certain time period (Finney, 1949). Later, dose-response analysis crystallized into a particular type of regression analysis. In the seminal work by Finney (1971) it is explained how to carry out the estimation in the so-called probit regression model through manual calculations. By the late 1970s, dose-response analysis had been extended to log-logistic models for continuous responses (Finney, 1979). In the beginning, such dose-response data were fitted through linearization (e.g., Streibig, 1981, 1983). Later, nonlinear estimation of such models became available through add-ons and macros for spreadsheet programs (e.g., Vindimian et al., 1983; Caux and Moore, 1997). General-purpose statistical software also included nonlinear estimation procedures, but without any specific focus on dose-response analysis.
By 2005, the first version of the extension package drc had been developed for the statistical programming environment R (R Core Team, 2018). Originally, it was developed for the nonlinear fitting of log-logistic models routinely carried out in weed science (Ritz and Streibig, 2005). Subsequently, however, the package has been modified and extended substantially, mostly in response to inquiries and questions from the user community, and it has developed into a veritable ecosystem for dose-response analysis (Ritz et al., 2015). Currently, such extensive functionality for dose-response analysis does not exist in any other statistical software. One of the problems that non-statistician scientists faced in the past was that guesstimates of the nonlinear regression parameters had to be provided up front before any estimation could take place; this was an insuperable problem for many practitioners. To a very large extent this problem has now been resolved in the package drc through the use of so-called self-starter routines.
The development of dose-response analysis has thus undergone dramatic changes: from struggling with cumbersome, more or less manual calculations and transformations with pen and paper to the blink-of-an-eye estimation of relevant parameters on any laptop.
A unified framework
The dose does not necessarily need to be a chemical compound. We define a dose (metameter) as any pre-specified amount of biological, chemical, or radiation stimulus or stress eliciting a certain, well-defined response. Other kinds of exposure or stress could also be imagined, e.g., the time elapsed in germination experiments. In any case, however, the dose is a non-negative quantity.
Specifically, we define the response evoked by a specific dose as the quantification of a biologically relevant effect, and as such, it is subject to random variation. The most common type is a continuous response such as biomass, enzyme activity, or optical density. A binary or aggregated binary (binomial) response is also frequently used to describe results such as dead/alive, immobile/mobile, or present/absent (Van der Vliet and Ritz, 2013). The response may also be discrete, such as the number of events observed in a specific time interval, e.g., the number of juveniles, offspring, or roots (Ritz and Van der Vliet, 2009). We will see more examples in later chapters.
A key feature of dose-response analysis is that the experimenter or researcher has to have some a priori idea about the type of model function that would be relevant for the analysis of her/his dose-response data. In principle, many nonlinear model functions could be considered for describing how the average response changes over the range of doses considered. In practice, only a limited number of functions are used in the majority of applications. Specifically, we will focus on modeling average trends through mostly s-shaped or related biphasic functions. These functions reflect an a priori basic understanding of the causal relationship between the dose and the response, e.g., as the dose increases, the response decreases between certain limits, referred to as the lower and upper limits, respectively. S-shaped functions have turned out to be extremely versatile for describing various biological mechanisms; one key feature is that the model parameters provide useful interpretations of observed effects within a biologically plausible framework. Specifically, dose-response analysis is often used for screening and ranking of compounds using estimated effective or lethal doses such as ED50 or LD50 (e.g., WHO, 2005).
The full specification of a statistical dose-response model involves both specifying the parametric model function and making assumptions about the distribution of the responses, i.e., how they randomly fluctuate around the average value determined by the assumed model function. Distributional assumptions depend on the type of response observed. However, the same model functions may be meaningful for different types of responses, and this is the unifying feature of dose-response analysis: it involves dose-response models that are a collection of statistical models with a certain mean structure in common. This is not a mathematical definition in any sense, but rather a definition driven by applications, which actually makes sense for a statistical methodology. Consequently, dose-response models encompass a range of statistical models that could be classified as nonlinear regression or generalized nonlinear regression.
Acknowledgment
We are fortunate in having some colleagues and experts, Florent Baty, Andrew
Kniss, Andrea Onofri, Janine Wong, and Ming Yi, who kindly agreed to read
sections or the entire manuscript. We are grateful for their valuable comments
and correction of the substance and language. We would stress, however, that
all these helpful people are in no way responsible for any mistakes which still
occur; these are ours alone.
1
Continuous data
y_i = f(x_i, β) + ε_i,   i = 1, ..., n   (1.1)
with the fixed and random contributions adding up to the observed response
value for each pair of dose and response (xi , yi ), for a total of n measurements.
It is common to assume that the random contributions, the ε_i's in Equation (1.1), follow a mean-zero normal distribution with an unknown residual standard deviation, which is also a model parameter to be estimated from the data. The residual standard error is a measure of the variation between measurements beyond what is explained by the assumed dose-response model function. We will also address how to deal with dose-response data that do not fully satisfy the above assumptions (see Subsections 1.1.3 and 1.1.4 for examples).
The model specification in Equation (1.1) relies on the assumption that the variation between replicates is the same for all doses (referred to as variance homogeneity). In this case, estimation may be carried out using nonlinear
least squares (see Section A.1 for more details) as the dose-response model is
a special case of a nonlinear regression model (Ritz and Streibig, 2008).
In the examples below we will only specify the model function f and implicitly assume that a statistical model is defined through Equation (1.1). However, we will also address situations where the assumptions of normality and variance homogeneity are not fulfilled.
In this chapter, we use the following extension packages:
library(drc)
library(devtools)
install_github("DoseResponse/drcData")
library(drcData)
library(boot)
library(lmtest)
library(metafor)
library(sandwich)
secalonic
## dose rootl
## 1 0.000 5.5
## 2 0.010 5.7
## 3 0.019 5.4
## 4 0.038 4.6
## 5 0.075 3.3
## 6 0.150 0.7
## 7 0.300 0.4
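The explanation that follows discusses the arguments of a drm() call that is missing from this excerpt; based on the variables rootl and dose and the LL.4() function described below, the fit would be created along these lines (a sketch, not the book's verbatim code):

```r
library(drc)

# Four-parameter log-logistic fit to the secalonic data;
# the resulting object secalonic.LL.4 is used in the text below
secalonic.LL.4 <- drm(rootl ~ dose, data = secalonic, fct = LL.4())
```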
The first argument supplied to the function drm() is a model formula relating the response to the predictor. The second argument, data, specifies the dataset where the variables rootl and dose are found. R will not automatically look for variables in the dataset secalonic because they are not in the search path, and it is a good habit to specify the relevant dataset every time a model is fitted. The third argument, fct, specifies the dose-response model function that we want to fit. As there are 7 different doses, a four-parameter model may easily be fitted (usually at least as many doses as parameters are required, but fewer will sometimes also do, possibly depending on the choice of model function and the number of replicates). The built-in function LL.4() in drc provides the four-parameter log-logistic model that is commonly used in toxicology (see Section B.1 for more details). In short, this model has four parameters: a lower limit, an upper limit, a parameter corresponding to ED50, and a parameter for the relative slope at the dose equal to ED50 (see Figure 1.1).
plot(secalonic.LL.4,
bp = 1e-3, broken = TRUE,
ylim = c(0, 7),
xlab = "Dose (mM)",
ylab = "Root length (cm)")
FIGURE 1.1
The four-parameter log-logistic model fitted to dose-response data from the
dataset secalonic is plotted together with the original data (no replicates).
In this instance we must choose a smaller value to ensure that the bp value is smaller than all positive concentrations/doses (otherwise some observations are not displayed!). The argument log = "" may be used to switch off the default logarithmic dose axis.
A summary of the fit is obtained using the summary method when applied
to the model fit secalonic.LL.4:
summary(secalonic.LL.4)
##
## Model fitted: Log-logistic (ED50 as parameter) (4 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) 2.6542086 0.6962333 3.8122 0.0317398 *
## c:(Intercept) 0.0917852 0.3747246 0.2449 0.8223012
## d:(Intercept) 5.5297495 0.2010300 27.5071 0.0001055 ***
## e:(Intercept) 0.0803547 0.0078829 10.1935 0.0020121 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error:
##
## 0.2957497 (3 degrees of freedom)
The output shows the type of model that was fitted and the parameter estimates for the four model parameters together with the corresponding (estimated) standard errors. Briefly, b is the slope parameter at the dose equal to e, where e is the dose resulting in a reduction halfway between the upper limit d and the lower limit c; e is also called ED50. We refer to Subsection B.1.1 for more details about the four-parameter log-logistic model.
For each parameter there is also a t-value, the parameter estimate divided by its standard error, and the resulting p-value, looked up in an appropriate t distribution; each corresponds to testing the null hypothesis that the parameter is equal to 0 (not necessarily a relevant null hypothesis to consider). The estimated residual standard error is also shown; although it is hardly ever reported in publications, it may still be useful for understanding the variation in the experiment. Possibly the most interesting quantity in the summary output is the parameter estimate for e; it is equal to 0.08 (standard error 0.0079).
We also see that the estimated lower limit, which is equal to 0.0918 with a standard error of 0.375, is not significantly different from zero (p-value = 0.82), possibly indicating that a three-parameter log-logistic model (with an assumed lower limit of 0) would also fit the data. However, such ad hoc data-driven model simplification should be used with caution.
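The "Estimated effective doses" output shown next lacks the call that produced it. It would come from the drc function ED(); the response levels c(10, 20) are inferred from the output itself:

```r
# Estimate ED10 and ED20 (doses giving a 10% and 20% reduction)
# for the fitted four-parameter log-logistic model
ED(secalonic.LL.4, c(10, 20))
```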
##
## Estimated effective doses
##
## Estimate Std. Error
## e:1:10 0.0351149 0.0078689
## e:1:20 0.0476628 0.0074229
predict(secalonic.LL.4,
data.frame(dose = ED(secalonic.LL.4, c(10), display = FALSE)))
## Prediction
## 4.985953
The argument display = FALSE switches off showing the output from ED().
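The summary below refers to a fit named O.mykiss.EXD.2 whose creation is missing from this excerpt. A sketch of the call, assuming the dataset's variables are named conc and weight (the na.action argument is confirmed by the text that follows the output):

```r
library(drc)

# Two-parameter exponential decay model (lower limit fixed at 0);
# na.omit removes the 9 missing values mentioned in the text
O.mykiss.EXD.2 <- drm(weight ~ conc, data = O.mykiss,
                      fct = EXD.2(), na.action = na.omit)
```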
summary(O.mykiss.EXD.2)
##
## Model fitted: Exponential decay with lower limit at 0 (2 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## d:(Intercept) 2.846794 0.092526 30.7674 < 2.2e-16 ***
## e:(Intercept) 111.738614 33.196876 3.3659 0.001347 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error:
##
## 0.5598508 (59 degrees of freedom)
Note that we specify the argument na.action to handle (in this case remove) the 9 missing values mentioned earlier. The resulting model fit plotted together with all data points (as indicated by the argument type = "all") is shown in Figure 1.2. Alternatively, the default argument type = "average" would plot averages of replicates for each dose. Showing all data may be helpful in evaluating the model fit, whereas showing averages may be a better choice when reporting results.
The dataset is somewhat larger than in the previous example, so it makes sense to investigate whether the model assumptions are fulfilled. Graphical model checking can be based on the residual plot and QQ plot shown in Figure 1.3 and Figure 1.4, respectively. The visual assessment of the model fit
plot(O.mykiss.EXD.2,
broken = TRUE,
type = "all",
xlim = c(0, 500), ylim = c(0, 4),
xlab = "Concentration (mg/l)",
ylab = "Weight (g)")
FIGURE 1.2
A two-parameter exponential decay model fitted to dose-response data from a fish test (the dataset named O.mykiss) with up to 10 replicates per concentration (all replicates shown).
by the residual plot in Figure 1.3 indicates that the chosen model function provides an appropriate description of the dose-response data: there is random scatter above and below the reference line (the x axis), as the residuals ought to be centered around 0. Furthermore, the residual plot shows no indication of deviations from the model assumption of variance homogeneity, as the scatter of points has the same spread across the entire range of the x axis (which is similar to the range of the response values). Likewise, visual assessment of the model fit by means of the QQ plot in Figure 1.4 indicates that the normality assumption for the response values is appropriate, because the ranked (raw) residuals approximately match the corresponding expected ordered values under a normal distribution.
plot(fitted(O.mykiss.EXD.2),
residuals(O.mykiss.EXD.2))
abline(h = 0, lty = 2)
FIGURE 1.3
Residual plot for the two-parameter exponential decay model fitted to the
dose-response data in the dataset O.mykiss. A reference line corresponding
to the x axis has been added.
As for the previous example, the natural next step would be to estimate
relevant ED values. However, we will not pursue this objective here.
qqnorm(residuals(O.mykiss.EXD.2))
abline(a = 0, b = sd(residuals(O.mykiss.EXD.2)))
FIGURE 1.4
QQ plot for the two-parameter exponential decay model fitted to the dose-
response data in the dataset O.mykiss. A reference line corresponding to the
line with intercept 0 and slope equal to the empirical standard deviation of
the residuals has been added.
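The next example, the ryegrass dataset (ferulic acid), is discussed below via a fit named ryegrass.LL.4, whose creation is missing from this excerpt. A sketch of the call, assuming the standard drc variable names rootl and conc:

```r
library(drc)

# Four-parameter log-logistic fit to the ryegrass data
ryegrass.LL.4 <- drm(rootl ~ conc, data = ryegrass, fct = LL.4())
```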
The fitted dose-response curve together with the data is shown in Figure 1.5.
We refer to Section C.1 for an example on how to make the plot by using the
package ggplot2 (Wickham, 2016).
Figure 1.5 shows that the variation in the response values is largest for mid-range concentrations, whereas the variation is smaller for lower concentrations and much smaller for larger concentrations. Therefore, the model assumption of variance homogeneity may be questionable. We can explore this further by means of a residual plot (Figure 1.6).
plot(ryegrass.LL.4,
broken = TRUE,
type = "all",
xlab = "Concentration (mM)",
ylab = "Root length (cm)")
FIGURE 1.5
Four-parameter log-logistic model fitted to the dataset ryegrass (all replicates shown).
plot(fitted(ryegrass.LL.4), residuals(ryegrass.LL.4))
abline(h = 0, lty = 2)
FIGURE 1.6
Residual plot for the four-parameter log-logistic model fitted to the ryegrass
dataset.
the concentrations). The QQ plot (not shown) does not reveal any departures
from normality.
One way to adjust for the model misspecification is to use robust standard errors, which do not require that a modified model be fitted (see Section A.5); the remedy is applicable to the already fitted model. We can use the function coeftest() from the extension package lmtest and rely on functionality from the package sandwich to obtain robust standard errors.
coeftest(ryegrass.LL.4,
vcov. = sandwich)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## b:(Intercept) 2.98222 0.47438 6.2865 3.882e-06 ***
## c:(Intercept) 0.48141 0.12779 3.7672 0.001212 **
To allow comparisons, the condensed summary output for the model fit (with
naive standard errors) is shown below.
coef(summary(ryegrass.LL.4))
The output shows a large decrease in the standard error for the parameter
estimate of c (the lower limit) and an increase in the standard error for the
estimated ED50. The standard errors for the estimates of b and d are less
affected. Given Figure 1.5, the changes in the standard errors make sense: in the initial model fit, where variance homogeneity was assumed, the increased variation observed in the mid-concentration range (where ED50 is found) was down-weighted, and the reduced variation observed for the larger concentrations, which mostly determine the lower limit (the parameter c), was up-weighted. The sandwich estimates do not rely on the assumption of variance homogeneity and therefore capture the actual variation in the data to a higher degree. It is, of course, a balance: how much variation do we want to accommodate? Ideally only variation caused by some systematic effect or feature, not random variation.
The next example introduces yet another way to handle some types of
model misspecification.
head(barley)
## Dose weight
## 1 0.00000 57.2
## 2 0.00000 49.8
## 3 21.09375 62.2
## 4 21.09375 30.6
## 5 42.18750 40.9
## 6 42.18750 70.9
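The summary below is for a fit named barley.LL.4 that the excerpt does not show being created. Using the variable names visible in head(barley), the call would be roughly:

```r
library(drc)

# Four-parameter log-logistic fit to the barley data
barley.LL.4 <- drm(weight ~ Dose, data = barley, fct = LL.4())
```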
summary(barley.LL.4)
##
## Model fitted: Log-logistic (ED50 as parameter) (4 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) 9.7084 42.9166 0.2262 0.82430
## c:(Intercept) 11.1275 3.7803 2.9435 0.01068 *
## d:(Intercept) 52.0478 3.2487 16.0212 2.123e-10 ***
## e:(Intercept) 286.2600 209.6374 1.3655 0.19364
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error:
##
## 9.241941 (14 degrees of freedom)
A residual plot also reveals the problem with the assumption of variance homogeneity: the variation in the response does not seem to be constant (Figure 1.8). In some cases, a transformation could help to achieve variance homogeneity. The logarithm transformation would be one option, as it is often successful in removing the pattern observed in Figure 1.7: small variation around small predicted values and large variation around large predicted values. However, in order to preserve the assumed dose-response model, both the response and the predicted values based on the model have to be transformed. This technique was used for the logarithm transformation by Streibig (1983) and, in general terms, it is described as the transform-both-sides approach (Carroll and Ruppert, 1988, Chapter 4). It is also possible to search for the optimal transformation within a family of transformations. We will consider Box-Cox transformations, which are power functions with negative and positive exponents (but also including the logarithm); see Section A.3 for more details. It should be mentioned that there is some evidence that the transform-both-sides approach may lead to inappropriate fitted dose-response curves in some
plot(barley.LL.4,
broken = TRUE,
type = "all",
xlab = "Dose (g a.i./ha)",
ylab = "Biomass (g/pot)")
FIGURE 1.7
The four-parameter log-logistic model fitted to the barley data (barley).
cases (Ritz and Van der Vliet, 2009), so it is not an approach that we would like to recommend for general use.
We can obtain the model fit for the optimal so-called Box-Cox transformation by refitting the model using the function boxcox(). The argument method specifies that we want to estimate and apply the optimal power exponent, which is determined from the slightly more general one-way ANOVA model rather than directly from the dose-response model (as there is a computational gain in doing so).
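The refitted model barley.LL.4.bc summarized next would be produced by drc's boxcox method for drm fits; a sketch, assuming the method value "anova" selects the ANOVA-based estimate of the power exponent described above:

```r
# Transform-both-sides refit with the estimated optimal Box-Cox
# exponent (lambda estimated from a one-way ANOVA model)
barley.LL.4.bc <- boxcox(barley.LL.4, method = "anova")
```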
summary(barley.LL.4.bc)
plot(fitted(barley.LL.4), residuals(barley.LL.4))
abline(h = 0, lty = 2)
FIGURE 1.8
Residual plot for the four-parameter log-logistic model fitted to the barley
data (barley).
##
## Model fitted: Log-logistic (ED50 as parameter) (4 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) 4.6567 4.3207 1.0778 0.299362
## c:(Intercept) 10.5303 1.1542 9.1231 2.875e-07 ***
## d:(Intercept) 51.4583 5.4983 9.3590 2.109e-07 ***
## e:(Intercept) 250.8884 52.3177 4.7955 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error:
##
The optimal power exponent (λ) for the Box-Cox transformation is −0.25, which is significantly different from 1 (no transformation), as the corresponding confidence interval does not include 1. However, the value 0 (corresponding to the log transformation) is included in the confidence interval. Except for the slope parameter b, none of the parameter estimates changed dramatically due to the transformation.
Alternatively, we may directly specify the appropriate power exponent using the argument bcVal in the model specification. This approach of making an informed choice is to be preferred over estimating the optimal transformation as we did above. Perhaps only a few transformations are really relevant in practice, e.g., the logarithm transformation and, occasionally, the square root transformation. The drawback of this approach is that some a priori information about the distribution of the response is required (possibly from other, similar experiments).
barley.LL.4.log <- drm(weight ~ Dose,
data = barley,
fct = LL.4(),
na.action = na.omit,
bcVal = 0)
summary(barley.LL.4.log)
##
## Model fitted: Log-logistic (ED50 as parameter) (4 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) 6.6523 8.8818 0.7490 0.466271
## c:(Intercept) 10.7738 1.1385 9.4628 1.843e-07 ***
## d:(Intercept) 51.1127 4.4105 11.5890 1.461e-08 ***
## e:(Intercept) 269.2678 76.8634 3.5032 0.003513 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error:
##
## 0.2457367 (14 degrees of freedom)
##
## Non-normality/heterogeneity adjustment through Box-Cox
## transformation
##
## Specified lambda: 0
We can add the fitted dose-response curve obtained using the transform-both-sides approach with the logarithm transformation to the plot of the fitted dose-response curve for untransformed data. The two fitted curves agree quite well (see Figure 1.9).
plot(barley.LL.4.log,
add = TRUE,
lty = 2)
Finally, we may choose to stay with our initial model but use sandwich estimates for the standard errors, as in the example above. It should, however, be noted that the sandwich standard errors rely on the mean structure, i.e.,
FIGURE 1.9
The four-parameter log-logistic model fitted to the barley data (barley), both for untransformed data (solid line) and when using a transform-both-sides approach with the logarithm transformation (dashed line).
the model function, being correctly specified, which may be hard to judge for
the present data example with sparse information in the middle of the curve.
coeftest(barley.LL.4,
vcov. = sandwich)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## b:(Intercept) 9.7084 33.8435 0.2869 0.7784
## c:(Intercept) 11.1275 1.1026 10.0918 8.341e-08 ***
## d:(Intercept) 52.0478 4.0530 12.8417 3.899e-09 ***
## e:(Intercept) 286.2600 165.3541 1.7312 0.1054
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The first example uses data from an experiment where ryegrass plants were treated with a mixture of commercial herbicides that prevents further growth but leaves the plants green and viable for a long time. The herbicides are acetolactate synthase inhibitors that do not block photosynthesis per se, but prevent the photosynthetic production from making the plant grow. The biomass of the plants was measured on the day of spraying (when the mixture would have no visible effect) and 15 days after spraying (for a range of doses). The data, which are unpublished, are found in the dataset ryegrass2.
head(ryegrass2)
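The plot below uses a fit named ryegrass2.LL.4 that is not created in this excerpt. A sketch of the call, with hypothetical variable names biomass and dose (the actual column names of ryegrass2 are not shown here):

```r
library(drc)

# Four-parameter log-logistic fit to the ryegrass2 data;
# 'biomass' and 'dose' are hypothetical variable names
ryegrass2.LL.4 <- drm(biomass ~ dose, data = ryegrass2, fct = LL.4())
```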
Figure 1.10 shows the fitted dose-response curve with a horizontal line added
showing the mean biomass level before spraying (at day 0).
In this instance, the biomass production (growth) at the time of spraying
(day 0) closely corresponds to the lower limit due to the mode of action of the
herbicide.
The second example involves the contact herbicide diquat, which was applied to red fescue; after 16 days the biomass was measured (unpublished data). The herbicide is a desiccant that kills the plant almost instantly, meaning that degradation of the plant commences immediately after spraying. Therefore, for high doses, we would expect the biomass to approach 0.
head(red.fescue, 4)
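The fit red.fescue.LL.4 plotted below is likewise missing from the excerpt. A sketch of the call, with hypothetical variable names biomass and dose (the actual column names of red.fescue are not shown here):

```r
library(drc)

# Four-parameter log-logistic fit to the red.fescue data;
# 'biomass' and 'dose' are hypothetical variable names
red.fescue.LL.4 <- drm(biomass ~ dose, data = red.fescue, fct = LL.4())
```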
plot(ryegrass2.LL.4,
broken = TRUE,
ylim = c(0, 250),
xlab = "Dose (g a.i./ha)",
ylab = "Weed biomass (g/pot)")
FIGURE 1.10
Fitted four-parameter log-logistic model for biomass of ryegrass 15 days after being sprayed with a mixture of herbicides that stop growth but leave affected plants green (using the dataset ryegrass2). The dashed line indicates biomass on day 0, the day when the experiment was sprayed with the herbicide mixture.
plot(red.fescue.LL.4,
broken = TRUE,
ylim = c(0, 150),
xlab = "Dose (g a.i./ha)",
ylab = "Biomass (g/pot)")
FIGURE 1.11
Fitted four-parameter log-logistic model for biomass of red fescue 16 days after being sprayed with the desiccant herbicide diquat, which kills the plant quickly. The dashed line indicates the biomass level at the time of spraying (day 0), and the dotted line shows the estimated lower limit after 16 days.
Figure 1.11 shows the fitted dose-response curve together with a horizontal line representing the level of biomass at the time of spraying (day 0), which does not coincide with the estimated lower limit that the mean response approaches for large doses. In this case, this reflects a different mode of action of the herbicide. It is worth noting that the data still seem to indicate a positive lower limit, but one that does not reflect growth before exposure.
lettuce
## conc weight
## 1 0.00 1.126
## 2 0.00 0.833
## 3 0.32 1.096
## 4 0.32 1.106
## 5 1.00 1.163
## 6 1.00 1.336
## 7 3.20 0.985
## 8 3.20 0.754
## 9 10.00 0.716
## 10 10.00 0.683
## 11 32.00 0.560
## 12 32.00 0.488
## 13 100.00 0.375
## 14 100.00 0.344
The variables conc and weight contain the concentrations and the corresponding weight measurements. A scatterplot of the data is shown in Figure 1.12. A warning is issued because of the argument log = "x", which imposes a logarithmic concentration axis that allows us to better appreciate the hormesis effect: there is a fairly clear inverse j-shape, with an increase at low concentrations before the expected decreasing trend for higher concentrations.
We fit the four-parameter Brain-Cousens hormesis model with the lower
limit fixed at 0 as we assume that there will not be any growth for high
concentrations. The special feature of this hormesis model, as well as other
hormesis models, is the additional model parameter that quantifies the degree
of hormesis. For the Brain-Cousens model, this parameter is denoted f , and
the interpretation is that f = 0 implies no hormesis effect, whereas f > 0
implies some degree of hormesis, the larger the value the larger the effect.
However, the actual value of f cannot directly be understood by the scale of
the response. The parameters b and d have the same interpretation as for the
four-parameter log-logistic model.
plot(weight ~ conc,
data = lettuce, log = "x",
xlab = "Isobutylalcohol concentration (mg/L)",
ylab = "Weight (g)")
FIGURE 1.12
Data in the lettuce dataset showing a hormesis effect.
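The drm() call producing the model fit lettuce.BC.4 used below does not appear in this excerpt; a minimal sketch of how such a fit may be specified (assuming the drc package is loaded):

```r
library(drc)

# Sketch (call not shown in the excerpt): fit the Brain-Cousens hormesis
# model; BC.4() fixes the lower limit at 0, leaving b, d, e, and f free
lettuce.BC.4 <- drm(weight ~ conc, data = lettuce, fct = BC.4())
```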
Figure 1.13 shows the original data with the fitted dose-response curve super-
imposed. The model appears to fit the data well.
The summary output shows that the hormesis parameter f is not signifi-
cantly different from 0 (p-value is 0.129), which means that we cannot reject
the hypothesis of no hormetic effect using the dataset lettuce.
plot(lettuce.BC.4,
broken = TRUE,
xlab = "Isobutylalcohol concentration (mg/L)",
ylab = "Weight (g)")
FIGURE 1.13
The fitted concentration-response curve for the four-parameter Brain-Cousens
model fitted to the dataset lettuce, shown together with the data (average
per concentration).
summary(lettuce.BC.4)
##
## Model fitted: Brain-Cousens (hormesis) with lower limit fixed
## at 0 (4 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) 1.282812 0.049346 25.9964 1.632e-10 ***
## d:(Intercept) 0.967302 0.077123 12.5423 1.926e-07 ***
## e:(Intercept) 0.847633 0.436093 1.9437 0.08059 .
## f:(Intercept) 1.620703 0.979711 1.6543 0.12908
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error:
##
## 0.1117922 (10 degrees of freedom)
More importantly, the presence of a hormesis effect will impact the estimation
of any EC value. Ignoring the hormesis effect will typically lead to too small
(conservative) estimated EC values (Cedergreen et al., 2005). Therefore it may
be relevant to fit a hormesis model even when there is no interest per se in
assessing or describing the hormesis effect.
head(nasturtium, 12)
## conc rep wt
## 1 0.000 1 920
## 2 0.025 1 919
## 3 0.075 1 870
## 4 0.250 1 880
## 5 0.750 1 693
## 6 2.000 1 429
## 7 4.000 1 200
## 8 0.000 2 889
## 9 0.025 2 878
## 10 0.075 2 825
## 11 0.250 2 834
## 12 0.750 2 690
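The model fit nasturtium.LL.3 summarized below is not created in the excerpt; a sketch of the likely call (the use of LL.3() is consistent with the refitting code in the bootstrap example later):

```r
library(drc)

# Sketch: three-parameter log-logistic calibration model with lower limit 0
nasturtium.LL.3 <- drm(wt ~ conc, data = nasturtium, fct = LL.3())
```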
summary(nasturtium.LL.3)
##
## Model fitted: Log-logistic (ED50 as parameter) with lower
## limit at 0 (3 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) 1.350256 0.113557 11.891 1.518e-14 ***
## d:(Intercept) 897.862776 13.844884 64.852 < 2.2e-16 ***
## e:(Intercept) 1.576200 0.095694 16.471 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error:
##
## 55.55935 (39 degrees of freedom)
Backfitting is a quick way to see how good the calibration model is. We use
the function backfit() in drc that takes the model fit object as the only
argument.
backfit(nasturtium.LL.3)
The agreement is good, although for low doses (large response values) esti-
mated doses are too large. The warning message may be ignored as it simply
means it is not possible to estimate dose 0.
Let us assume that a prediction experiment resulted in response values 690,
693, and 722; this is similar to but not identical to what Racine-Poon (1988)
did. For each single response value, we can use the function ED() for estimating
the corresponding dose with a 95% confidence interval. For instance, for the
value 690 we get:
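The call producing the output below is not shown in the excerpt; it presumably resembled the following (interval = "delta" is our assumption, made because the output contains confidence limits):

```r
# Sketch: inverse regression for the single response value 690
ED(nasturtium.LL.3, 690, type = "absolute", interval = "delta")
```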
##
## Estimated effective doses
##
## Estimate Std. Error Lower Upper
## e:1:690 0.648194 0.070708 0.505174 0.791214
Note that the argument type = "absolute" is essential. By default, the argu-
ment is type = "relative", which is suitable for estimating effective doses.
The 95% confidence interval of the estimated dose is [0.505, 0.791]. However,
this interval is likely somewhat too narrow as it does not incorporate the vari-
ation in the response value. Likewise, we could also estimate the dose based
on the mean of the response values:
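Again the call itself is missing; a sketch consistent with the output below:

```r
# Sketch: inverse regression based on the mean of the three response values
ED(nasturtium.LL.3, mean(c(690, 693, 722)),
   type = "absolute", interval = "delta")
```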
##
## Estimated effective doses
##
## Estimate Std. Error Lower Upper
## e:1:701.666666666667 0.613385 0.069337 0.473137 0.753633
One way to incorporate variation in the response values, while also utilizing that
three values were recorded, is through a bootstrap approach where estimated
doses are repeatedly computed for randomly sampled values from a normal
distribution assumed for the mean of the three response values. Specifically,
we will assume that the mean of 690, 693, and 722 follows a normal distribution
with mean (690 + 693 + 722)/3 and standard deviation 56/√3, where 56 is the
residual standard error from the above model fit.
To do a bootstrap, we use the extension package boot. As a first attempt
we will do a crude parametric bootstrap, which will incorporate the variation
in the response values, but it will not propagate the uncertainty on parameter
estimates when doing inverse regression. The key function boot() is specified
as follows: The first argument is the data, i.e., the mean response. The second
argument is the R function that converts the data value into an estimated
concentration; this is essentially ED() but wrapped up in a function environment
to allow only one value to vary. The third argument is the function
that returns randomly sampled values from the assumed normal distribution
of the mean; it is relying on the R function rnorm(). The fourth argument is
needed when giving the third argument. The last argument specifies the num-
ber of times to randomly draw mean values (it is common to use 1000). The
function set.seed() is used to ensure that the bootstrap procedure results
in the same results every time (based on the same pseudo-random numbers).
set.seed(201806061)
nasturtium.boot.res1 <- boot((693+722+690)/3,
statistic = function(simyVal){
ED(nasturtium.LL.3,
simyVal,
type = "absolute",
display = FALSE)[1]
},
ran.gen = function(yVal, mle){
rnorm(1, yVal, 55.55935/sqrt(3))
},
sim = "parametric",
R = 1000)
summary(nasturtium.boot.res1)
The resulting 95% confidence interval becomes [0.42, 0.8], which is somewhat
wider than the one we found by simply using ED(), [0.47, 0.75].
set.seed(201806062)
nasturtium.boot.res2 <- boot(c(nasturtium[["wt"]], 690, 693, 722),
statistic = function(yValues){
ED(drm(head(yValues, -3) ~ conc,
data = nasturtium,
fct = LL.3()),
mean(tail(yValues, 3)),
type = "absolute",
display = FALSE)[1]
},
ran.gen = function(yVal, mle){
rnorm(42+3,
c(fitted(nasturtium.LL.3),
690, 693, 722), 55.55935)
},
sim = "parametric",
R = 1000)
The resulting 95% bootstrap confidence interval is slightly wider than the
previous one, implying that not much additional variation is picked up by
repeating all the steps of the nonlinear calibration. The above results are sim-
ilar to the ones derived by Racine-Poon (1988) through a completely different
approach.
head(G.aparine)
The variable drymatter contains the measured dry weight per pot, the vari-
able dose contains doses of the two herbicide treatments, and the variable
treatment encodes the two herbicide treatments: 1 corresponds to phen-
medipham alone and 2 corresponds to the mixture of phenmedipham and
methyl oleate; the latter is an adjuvant that does not have any effect in the
plants.
For each of the two herbicide treatments, a single dose-response sub-
experiment with replicates per dose was carried out. Such an experimental
design implies that the herbicide treatment effects and the sub-experiment
effects (if present) are confounded; had there been multiple dose-response
sub-experiments per herbicide treatment these two effects could have been
disentangled.
We fit a joint model based on the entire dataset including both herbicide
treatments. In fact, as there is a shared control group, a joint model is the
only option that avoids counting the control group twice in the
analysis. Again, we use the model fitting function drm(), including the addi-
tional argument named curveid that provides information on the grouping
in the data and also including the argument pmodels that for each of the
four model parameters b, c, d, and e (in that order) specifies if the grouping
should be applied (∼treatment) or no grouping should be used, i.e., a single
common model parameter shared by all groups (∼1). As there is a shared
control group we have to insist on a shared parameter d for the upper limit.
The model specification looks like this.
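The drm() call itself is missing from the excerpt; based on the description above and the pmodels shown in the ANOVA output further below, it may be sketched as follows:

```r
library(drc)

# Sketch: grouping by treatment for b, c, and e; a single shared upper
# limit d because of the shared control group (pmodels order: b, c, d, e)
G.aparine.LL.4 <- drm(drymatter ~ dose, curveid = treatment,
                      data = G.aparine,
                      pmodels = list(~treatment, ~treatment, ~1, ~treatment),
                      fct = LL.4())
```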
plot(G.aparine.LL.4,
broken = TRUE,
xlab = "Dose (g/ha)",
ylab = "Dry weight (mg/pot)")
[Plot: dry weight (mg/pot) against dose (g/ha); two curves labeled 1 and 2.]
FIGURE 1.14
The fitted dose-response curves for the four-parameter log-logistic model (with
a shared control group) fitted to the dataset G.aparine. Average data per
concentration are also shown.
summary(G.aparine.LL.4)
##
## Model fitted: Log-logistic (ED50 as parameter) (4 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) 1.61291 0.33330 4.8392 2.373e-06 ***
## b:treatment2 0.13809 0.37301 0.3702 0.7115667
anova(G.aparine.LL.4.pooled, G.aparine.LL.4)
##
## 1st model
## fct: LL.4()
## pmodels: 1 (for all parameters)
## 2nd model
## fct: LL.4()
## pmodels: ~treatment, ~treatment, ~1, ~treatment
## ANOVA table
##
## ModelDf RSS Df F value p value
## 1st model 236 4927249
## 2nd model 233 2891677 3 54.673 0.000
As reported above, two out of three comparisons between the two treatments
of the individual model parameters gave significant results. Therefore, it is
not surprising that the global test is also significant (p < 0.0001): the p-value
from the global test may be thought of as a kind of weighted average of these
individual p-values.
head(S.alba.comp)
The first four columns are: exp denotes the sub-experiment (four levels),
herbicide denotes the herbicide applied, dose is the dose of the herbicide
applied (g a.i./ha), and drymatter is the response (in g/pot). There are four
more columns in the dataset, but they will not be used here.
We fit a joint model based on data from all four sub-experiments. Ide-
ally, we would fit a dose-response mixed-effects model that would treat the
sub-experiment variation as a separate source of variation. However, only hav-
ing two sub-experiments per herbicide treatment is not much and it may be
difficult to fit such a mixed model. We will come back to mixed models in
Chapter 7. Instead, we will look at two simpler approaches.
We may consider a joint model where we ignore the information about
the sub-experiments or, put in another way, pooling data from the two sub-
experiments for each herbicide. This approach corresponds to assuming that
double the number of replicates was used per dose (which is of course not
true). A joint model is convenient if we want to compare the two herbicides.
There is, however, no compelling reason for having a joint model encompassing
data from all sub-experiments. Moreover, it has been shown that, in theory,
estimated model parameters will be the same for a joint model and for separate
models per herbicide (Fang and Zhang, 2014). In practice, numerical issues
may lead to slight differences. Therefore, as an alternative, we may consider
separate models per sub-experiment and seek to combine them.
If there had been a priori knowledge available about some of the model pa-
rameters being the same for both herbicides, then the argument pmodels could
have been used to incorporate such assumptions: Perhaps the lower and upper
limits for the two herbicides are identical, whereas slopes and ED50 param-
eters are different from herbicide to herbicide? It depends on what we know
about the experiment before collecting data.
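The joint fit S.alba.LL.4 summarized below is not defined in the excerpt; a sketch, using the variable names listed above for S.alba.comp (the parameter names in the summary suggest plain grouping by herbicide without pmodels restrictions):

```r
library(drc)

# Sketch: joint four-parameter log-logistic model with separate
# parameters per herbicide
S.alba.LL.4 <- drm(drymatter ~ dose, curveid = herbicide,
                   data = S.alba.comp, fct = LL.4())
```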
The summary output shows that the lower and upper limits for the herbi-
cides are fairly similar, indicating that it may be meaningful to compare the
herbicides in terms of ED50 levels.
summary(S.alba.LL.4)
##
## Model fitted: Log-logistic (ED50 as parameter) (4 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:bentazone 4.421705 1.376978 3.2112 0.001658 **
## b:glyphosate 1.606429 0.322669 4.9786 1.949e-06 ***
## c:bentazone 0.679894 0.092323 7.3643 1.673e-11 ***
## c:glyphosate 0.894469 0.138322 6.4666 1.756e-09 ***
## d:bentazone 4.018895 0.107033 37.5481 < 2.2e-16 ***
## d:glyphosate 3.874354 0.122103 31.7302 < 2.2e-16 ***
## e:bentazone 21.293475 1.348860 15.7863 < 2.2e-16 ***
## e:glyphosate 46.245932 6.379062 7.2496 3.074e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error:
##
## 0.4952302 (133 degrees of freedom)
Figure 1.15 shows the fitted dose-response curves together with averages per
herbicide and dose. The model fit seems to be more or less appropriate.
The estimated relative potency based on the ED50s for the two herbicides
is obtained using the function EDcomp(), which takes the model fit as the
first argument and the ED levels (provided as percentages) to be compared
as the second argument. Additionally, we specify the argument interval =
"delta" to obtain confidence intervals based on the delta method.
##
## Estimated ratios of effect doses
##
## Estimate Lower Upper
## bentazone/glyphosate:50/50 0.46044 0.32220 0.59868
plot(S.alba.LL.4,
broken = TRUE,
xlab = "Dose (g a.i./ha)",
ylab = "Dry matter (g/pot)")
[Plot: dry matter (g/pot) against dose (g a.i./ha); curves for bentazone and glyphosate.]
FIGURE 1.15
Four-parameter log-logistic model fitted to data on the effect of glyphosate
and bentazone on growth of white mustard (bentazone: solid line and open
circles, glyphosate: dashed line and triangles).
argument. The first two arguments are the model fit and the parameter of
interest.
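The compParm() calls themselves are missing here; given the output below, they presumably were along these lines ("-" requests comparisons as differences):

```r
# Sketch: compare ED50 (e) and slope (b) between the two herbicides
compParm(S.alba.LL.4, "e", "-")
compParm(S.alba.LL.4, "b", "-")
```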
##
## Comparison of parameter 'e'
##
## Estimate Std. Error t-value p-value
## bentazone-glyphosate -24.9525 6.5201 -3.827 0.0001988 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Comparison of parameter 'b'
##
## Estimate Std. Error t-value p-value
## bentazone-glyphosate 2.8153 1.4143 1.9906 0.04857 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The next step is to extract the estimated ED50 values with the corresponding
standard errors for each sub-experiment: We define a 4×2 matrix of NAs. Then
we fill in row by row the estimated ED50 and the corresponding standard
errors (coef() and summary() and square brackets to extract the values in
row 4 and columns 1 and 2).
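The extraction step described above can be sketched as follows; the per-sub-experiment fit names (e.g., sub.fit1) are hypothetical, as they are not shown in the excerpt:

```r
# Sketch: 4 x 2 matrix of NAs, filled row by row with the estimated ED50
# and its standard error taken from row 4, columns 1-2 of the
# coefficient table of each sub-experiment fit
ed50.estimates <- matrix(NA, nrow = 4, ncol = 2)
ed50.estimates[1, ] <- coef(summary(sub.fit1))[4, 1:2]
# ... and similarly for rows 2 to 4
```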
ed50.estimates
## [,1] [,2]
## [1,] 29.26479 2.065501
## [2,] 18.85085 3.967966
## [3,] 25.56620 5.793800
## [4,] 62.73827 6.390966
Finally, we convert the matrix to a data frame with specified column names
(est and se) and a column with information about the treatments: the first
two rows contain estimates for the sub-experiments for bentazone and the last
two rows for glyphosate.
ed50.estimates
## est se treatment
## 1 29.26479 2.065501 bentazone
## 2 18.85085 3.967966 bentazone
The estimates may be pooled into combined estimates using functionality from
the R package metafor (Viechtbauer, 2010). Specifically we use the function
rma(), which takes the estimated ED50 values as the first argument and the
corresponding squared standard errors as the second argument; the smaller
the standard error, the more weight will be assigned to the corresponding
estimate. The treatment variable is specified by means of a formula (as also
for many other model fitting functions) using the argument mods. Finally, the
argument data is used for providing the dataset where the variables given in
the first three arguments are found.
Specifically, we fit a weighted version of a one-way analysis of variance
model for ED50 (est) with squared standard errors of estimated ED50 values
as weights (se), using the argument mods = ~ treatment. These variables are
found in the dataset ed50.estimates, which we defined above.
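Putting the description together, the rma() call may be sketched as follows (the object name S.alba.ED50 is taken from the coef() call below):

```r
library(metafor)

# Sketch: weighted one-way model for the ED50 estimates; rma() takes the
# estimates first and their squared standard errors (variances) second
S.alba.ED50 <- rma(est, se^2, mods = ~ treatment, data = ed50.estimates)
```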
round(coef(summary(S.alba.ED50)), 3)
Results from the joint and separate models disagree somewhat. The estimated
differences in ED50 are fairly similar. However, incorporating variation be-
tween sub-experiments, which is the more appropriate approach, makes it
more difficult to claim a significant difference as more variation in the data is
captured by this dose-response analysis.
2
Binary and binomial dose-response data
library(devtools)
install_github("DoseResponse/drcData")
library(drcData)
library(multcomp)
[Plot: proportion dead against dose (mg/ml).]
FIGURE 2.1
Two-parameter log-logistic model fitted to the dataset acute.inh.
A plot of the original data and the fitted dose-response curve can give a visual
impression of how well the model fits the data, and with such a small dataset
it is possibly the only reasonable means of assessing the model. The plot is
shown in Figure 2.1. The model seems to provide a good fit to the data.
To see the parameter estimates we use the summary method.
summary(acute.inh.LL.2)
##
## Model fitted: Log-logistic (ED50 as parameter) with lower
## limit at 0 and upper limit at 1 (2 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) -7.9301 5.0812 -1.5607 0.1186
From the output we can directly read off the estimated LD50 with the corre-
sponding standard error: 895 (84).
To obtain confidence intervals we can use the function confint(), which
by default will return (marginal) 95% confidence intervals.
confint(acute.inh.LL.2)
## 2.5 % 97.5 %
## b:(Intercept) -17.88909 2.028992
## e:(Intercept) 731.53397 1059.062485
The 95% confidence interval of the slope b extends below 0, indicating that
the dose-response data do not contain much information about the transition
between lower and upper limits. However, the 95% confidence interval of LD50
is well defined and it ranges from 732 to 1059. Both the estimated LD50 and
the 95% confidence intervals are in good agreement with the results obtained
by Racine et al. (1986), who used a different, more computationally involved
Bayesian approach.
coef(summary(acute.inh.glm))
As the parameterization used in glm() differs from the one used in drm(), the
parameter estimates will not all be the same: The estimated intercept in the
logistic regression model fit is equal to b·log(e) ((-7.9301) * log(895.2982)
= -53.90213). The estimated slope of log(dose) is identical to the estimated
slope parameter b in the above two-parameter log-logistic model fit except for
the change in sign. It is exactly the same model fit except for small numer-
ical differences due to different estimation procedures being used. For a re-
lated example using logistic regression, we refer to Venables and Ripley (2002,
pp. 190–194).
liver.tumor
The resulting summary output shows the estimated slope and EC50, where
EC50 may be interpreted as the concentration of TCDD resulting in a total
incidence of 50%:
summary(liver.tumor.LL.2)
##
## Model fitted: Log-logistic (ED50 as parameter) with lower
## limit at 0 and upper limit at 1 (2 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) -4.9750 1.6897 -2.9443 0.003237 **
##
## Estimated effective doses
##
## Estimate Std. Error Lower Upper
## e:1:5 20.5705 2.5294 15.6129 25.5281
## e:1:10 23.9041 1.9778 20.0278 27.7804
The argument interval specifies the method applied for calculating confi-
dence intervals. Here we ask for intervals based on the delta method (Piegorsch
and Bailer, 2005, pp. 436–437), which are commonly reported, even though
they are approximate, only reaching exactly 95% coverage for large sample
sizes.
It is important to realize that by choosing the two-parameter log-logistic
model, the lower and upper asymptotes are a priori fixed at the values 0 and
1, respectively. It means that any response obtained for dose 0 is redundant
and will not be used in the estimation at all. For the same reason, large doses
will also only contribute little to the model fit. In other words, the choice of
model already contributes information about the dose-response relationship
and this piece of information cannot be updated based on the data as there
is no relevant parameter in LL.2() that could utilize such information (the
lower limit is fixed at 0). One way to see that dose 0 data add nothing to the
analysis is to repeat the analysis without dose 0:
liver.tumor.LL.2.no.zero <- drm(incidence/total ~ conc,
weights = total,
data = subset(liver.tumor, conc > 0),
fct = LL.2(),
type = "binomial")
Estimates and standard errors are exactly the same as for the model fitted to
the entire dataset:
summary(liver.tumor.LL.2.no.zero)
##
## Model fitted: Log-logistic (ED50 as parameter) with lower
## limit at 0 and upper limit at 1 (2 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) -4.9750 1.6897 -2.9443 0.003237 **
## e:(Intercept) 37.1774 4.1838 8.8859 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
chlorac
## 4 40 40 38
## 5 80 40 40
## 6 160 40 40
The fitted dose-response curve captures the dose-response trend in the data
adequately, as shown in Figure 2.2.
Below, the summary output and the 95% confidence intervals are shown.
summary(chlorac.LN.3u)
##
## Model fitted: Log-normal with upper limit at 1 (3 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) 4.603773 1.043813 4.4105 1.031e-05 ***
## c:(Intercept) 0.099988 0.033573 2.9783 0.002899 **
## e:(Intercept) 28.291922 2.271962 12.4526 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(chlorac.LN.3u)
## 2.5 % 97.5 %
## b:(Intercept) 2.55793762 6.6496084
## c:(Intercept) 0.03418668 0.1657886
## e:(Intercept) 23.83895875 32.7448861
[Plot: proportion dead against concentration (mg/kg).]
FIGURE 2.2
Three-parameter log-normal model fitted to the dataset chlorac. The dashed
line shows the estimated non-zero lower limit.
The estimated natural mortality, which is the parameter estimate for c con-
verted and rounded to an integer percentage, is equal to 10% with a 95%
confidence interval ranging from 3.4% to 17%. The estimated EC50 is equal
to 28.29 with a 95% confidence interval ranging from 24 to 33; Hoekstra (1987)
found a very similar result.
The EC50 corresponds to a 50% increase between the estimated lower
limit and the fixed upper limit of 1, that is the concentration resulting in
50% mortality beyond the natural mortality. It does not correspond to the
concentration resulting in a total mortality of 50%, which is an EC value that
is defined in absolute terms based on the probability scale and not relative
to limits partly or fully estimated from the data. For binomial data these
absolute EC values are often more relevant than the relative ones and they
may also be estimated using the function ED().
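The ED() call producing the output below is not shown; a sketch:

```r
# Sketch: absolute EC50, i.e., the concentration resulting in a total
# mortality of 50%
ED(chlorac.LN.3u, 0.5, type = "absolute")
```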
##
## Estimated effective doses
##
## Estimate Std. Error
## e:1:0.5 27.4464 2.3155
If instead a two-parameter model (e.g., LN.2()) had been fitted, then the
estimated EC50 would become smaller, biased downwards, and it would have
narrower 95% confidence intervals. In short, a less accurate but more precise
estimate of EC50 would be the result; this is an example of the so-called
bias-variance tradeoff occurring when choosing between different models.
summary(earthworms.LL.3)
##
## Model fitted: Log-logistic (ED50 as parameter) with lower
## limit at 0 (3 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) 1.505679 0.338992 4.4416 8.928e-06 ***
## d:(Intercept) 0.604929 0.085800 7.0505 1.783e-12 ***
## e:(Intercept) 0.292428 0.083895 3.4856 0.000491 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Perhaps even better (truer to the experimental design) would be to fit a log-
logistic model where the upper limit is not estimated but instead fixed at the
value 0.5. This is achieved using the following R lines:
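The R lines referred to are not reproduced in the excerpt; a sketch, where the variable names (number, total, dose) in the earthworms dataset are assumptions:

```r
library(drc)

# Sketch: LL.3() has parameters b, d, e; fixed = c(NA, 0.5, NA) fixes the
# upper limit d at 0.5 while b and e remain free
earthworms.LL.3.fixed <- drm(number/total ~ dose, weights = total,
                             data = earthworms,
                             fct = LL.3(fixed = c(NA, 0.5, NA)),
                             type = "binomial")
```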
summary(earthworms.LL.3.fixed)
##
## Model fitted: Log-logistic (ED50 as parameter) with lower
## limit at 0 (2 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) 1.646689 0.376494 4.3737 1.221e-05 ***
## e:(Intercept) 0.377269 0.076785 4.9133 8.956e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
By fixing the upper limit, there is a very slight gain in precision for the esti-
mated ED50 (the parameter e), as the standard error becomes a little smaller.
In contrast, though of less interest in this case, the precision of the slope
parameter b is reduced, as the dose 0 is important for estimating the slope. Note
that the above comparison between model fits is meant to show how different
model choices affect results. However, we want to stress that in practice, ide-
ally, a single model should be chosen up front based on what is known about
the experimental design. If this is not feasible, then model averaging is an
alternative (see Chapter 6 for more details and examples).
head(fluoranthene)
[Plot: proportion dead against concentration (µg/L); one curve for each algal concentration, 0.7 and 1.5.]
FIGURE 2.3
Two-parameter log-normal model fitted to the dataset fluoranthene, one
curve for each algal concentration.
Note that the second argument in the model specification identifies the in-
dividual dose-response curves in the data and here we can only identify the
two treatment groups, but not the underlying individual dose-response ex-
periments. Figure 2.3 shows that the fitted dose-response curves capture the
trends in the data.
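The model specification referred to above is not shown in the excerpt; it may be sketched as follows, where the variable names (dead, total, conc) and the name of the grouping variable (algal) are assumptions:

```r
library(drc)

# Sketch: two-parameter log-normal fit; the second argument (curveid)
# identifies one curve per algal concentration (0.7 and 1.5)
fluoranthene.LN.2.1 <- drm(dead/total ~ conc, curveid = algal,
                           weights = total, data = fluoranthene,
                           fct = LN.2(), type = "binomial")
```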
##
## Model fitted: Log-normal with lower limit at 0 and upper
## limit at 1 (2 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:0.7 2.70147 0.28738 9.4004 < 2.2e-16 ***
## b:1.5 2.87566 0.31085 9.2511 < 2.2e-16 ***
## e:0.7 15.10005 0.67387 22.4080 < 2.2e-16 ***
## e:1.5 17.90466 0.75721 23.6456 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(fluoranthene.LN.2.1)
## 2.5 % 97.5 %
## b:0.7 2.138217 3.264721
## b:1.5 2.266412 3.484906
## e:0.7 13.779295 16.420807
## e:1.5 16.420556 19.388755
From the summary it seems that the two slopes could be assumed to be iden-
tical, an assumption about parallelism. However, ideally such an assumption
should have been imposed before fitting any dose-response model to avoid
inflation of standard errors.
As the confidence intervals for the two LC50 values are barely overlapping,
it could be tempting to claim that the p-value for testing that the two LC50
values are the same is just above 0.05, almost significant. However, in reality
the two LC50 values are clearly significantly different. This conclusion may
be reached from the direct comparison between the two LC50 values. We use
the function compParm() for making comparisons between treatments for a
specific model parameter:
compParm(fluoranthene.LN.2.1, "e", "-")
##
## Comparison of parameter 'e'
##
## Estimate Std. Error t-value p-value
## 0.7-1.5 -2.8046 1.0136 -2.7669 0.00566 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
where "e" indicates the model parameter of interest and "-" specifies that
comparisons should be in terms of differences (the other option being ratios).
The two LC50 values are clearly significantly different.
In an attempt to incorporate the experiment-level variation (as well as
any other model misspecification) into the dose-response analysis, we may
use the sandwich variance estimator for deriving robust standard errors (see
Section A.5 for more details).
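The call producing the robust comparison below is not shown; a sketch, assuming drc's vcov. argument is used to swap in the sandwich estimator:

```r
library(sandwich)

# Sketch: same comparison of LC50 values, but with robust (sandwich)
# standard errors
compParm(fluoranthene.LN.2.1, "e", "-", vcov. = sandwich)
```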
##
## Comparison of parameter 'e'
##
## Estimate Std. Error t-value p-value
## 0.7-1.5 -2.80460 0.97744 -2.8693 0.004113 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The conclusion remains the same: there is a clearly significant difference be-
tween the two groups in terms of LC50 values (estimated difference: –2.8
(0.977) µg/L, p = 0.004).
summary(selenium)
The plot of the data with the fitted dose-response curves superimposed shows
that the fitted dose-response curves capture the trends in the data quite well
(Figure 2.4).
[Plot: proportion dead against concentration; four curves labeled 1 to 4.]
FIGURE 2.4
Two-parameter log-logistic model fitted to the dataset selenium. The different
line types correspond to the different types of selenium.
The argument legendPos was used to specify the position of the legend
(more precisely: the top right corner of the box containing the legend).
Next, we fit a two-parameter log-logistic model, but now assuming different
slopes and a common LC50 parameter for all four types of selenium.
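The fitting call is missing from the excerpt; based on the pmodels shown in the ANOVA output below it may be sketched as follows, where the variable names (dead, total, conc, type) are assumptions:

```r
library(drc)

# Sketch: separate slopes per selenium type (~factor(type) - 1 for b) and
# a single common LC50 parameter (~1 for e)
selenium.LL.2.2 <- drm(dead/total ~ conc, curveid = type,
                       weights = total, data = selenium,
                       pmodels = list(~factor(type) - 1, ~1),
                       fct = LL.2(), type = "binomial")
```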
anova(selenium.LL.2.2, selenium.LL.2.1)
##
## 1st model
## fct: LL.2()
## pmodels: ~factor(type) - 1, ~1
## 2nd model
## fct: LL.2()
## pmodels: type (for all parameters)
## ANOVA-like table
##
## ModelDf Loglik Df LR value p value
## 1st model 5 -437.99
## 2nd model 8 -376.21 3 123.56 0
We can safely conclude that the four LC50 values are not identical (p <
0.0001). Jeske et al. (2009) reached the same conclusion. The next step is
to identify differences between the different types of selenium. The function
EDcomp() provides all comparisons of LC50 values in terms of ratios:
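The EDcomp() call itself is not shown; it may have been:

```r
# Sketch: all pairwise ratios of LC50 values between the four types
EDcomp(selenium.LL.2.1, c(50, 50))
```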
##
## Estimated ratios of effect doses
##
##
## Estimated effective doses
##
## Estimate Std. Error Lower Upper
## e:1:50 252.2556 13.8268 225.1555 279.3556
## e:2:50 378.4605 39.3707 301.2953 455.6256
## e:3:50 119.7132 5.9054 108.1389 131.2875
## e:4:50 88.8053 8.6161 71.9180 105.6926
To obtain adjusted confidence intervals, we also initially use the function ED()
but now supplying the argument multcomp = TRUE to augment the output
with a component named EDmultcomp that may be supplied directly to the
function glht() in the package multcomp (Hothorn et al., 2008). Further-
more, the argument display = FALSE suppresses the printing of marginal
confidence intervals, which we have already obtained above.
selenium.EDres <- ED(selenium.LL.2.1, c(50),
interval = "delta",
multcomp = TRUE,
display = FALSE)
##
## Simultaneous Confidence Intervals
##
## Fit: NULL
##
## Quantile = 2.4908
## 95% family-wise confidence level
##
##
## Linear Hypotheses:
## Estimate lwr upr
## e:1:50 == 0 252.2556 217.8151 286.6960
## e:2:50 == 0 378.4605 280.3942 476.5267
## e:3:50 == 0 119.7132 105.0039 134.4225
## e:4:50 == 0 88.8053 67.3438 110.2667
summary(glht(selenium.EDres[["EDmultcomp"]],
linfct = contrMat(1:4, "Tukey")))
##
## Simultaneous Tests for General Linear Hypotheses
##
## Multiple Comparisons of Means: Tukey Contrasts
##
##
## Linear Hypotheses:
## Estimate Std. Error z value Pr(>|z|)
## 2 - 1 == 0 126.20 41.73 3.024 0.0111 *
## 3 - 1 == 0 -132.54 15.04 -8.816 <0.001 ***
## 4 - 1 == 0 -163.45 16.29 -10.033 <0.001 ***
## 3 - 2 == 0 -258.75 39.81 -6.499 <0.001 ***
## 4 - 2 == 0 -289.66 40.30 -7.187 <0.001 ***
## 4 - 3 == 0 -30.91 10.45 -2.959 0.0131 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)
Using this approach, arbitrary LCxx values may be compared between groups.
Moreover, it also works for other types of dose-response data.
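As an illustration at a different effect level, LC10 values could be compared in the same way (a sketch; the exact numbers will depend on the fitted model):

```r
## pairwise comparisons of LC10 values between the four selenium types,
## with delta-method confidence intervals for the ratios
EDcomp(selenium.LL.2.1, c(10, 10), interval = "delta")
```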
3
Count dose-response data
Count data include numbers of fronds, offspring, juveniles, leaves, or roots, i.e., non-negative integers. In contrast to binomial data, the experimental design imposes no a priori upper limit on the counts. In theory there may be no upper limit at all, although very large counts will be very unlikely. In practice, a limited number of distinct counts may be observed, with some counts occurring multiple times (ties). Ideally, counts should be recorded over the
same time period in order to be comparable. However, there are ways to adjust
for varying durations of the time period. For instance, reproduction data,
which are commonly obtained from chronic toxicity tests in ecotoxicology, are
often counts of the number of offspring present at the end of the test period
(a certain pre-specified time period).
Dose-response analysis of count data is based on statistical models that
describe the mean trend on the scale of the counts, not on the scale of the
logarithm-transformed mean as would typically be the case for a generalized
linear model for count data (McCullagh and Nelder, 1989, Chapter 6). However, the same distributions as used in generalized linear models for counts
may be used when fitting dose-response models.
There may be several ways to analyze such count data, depending on the distributional and modeling assumptions one is willing to make (Ritz and Van der Vliet, 2009). In general, the closer the chosen distributional assumptions are to the true distributions that generated the data, the more efficient the statistical analysis will be (i.e., the smaller the standard errors). In practice we may not get close to the true distributions, and there will often be a trade-off between more bias in the parameter estimates if we are too confident and choose a quite specific model, and more imprecision (larger estimated standard errors) if we are less picky and choose a model that does not seem fully appropriate (a bias-variance trade-off).
Counts are often assumed to follow a Poisson distribution, although this
distribution is not always sufficiently flexible to describe the variation in the
count dose-response data (see also Subsection A.2.2.1). The standard deviation of Poisson-distributed counts is completely determined by the mean of the distribution; there is no separate parameter for the standard deviation, as is the case for the normal distribution. Therefore, it may easily happen that a
Poisson distribution does not adequately capture the variation in the observed
counts: the counts may exhibit less or more variation than predicted by the
Poisson distribution. Excess variation is usually referred to as over-dispersion
(e.g., Morgan, 1992, Chapter 6), just as for binomial data. Nevertheless, a
Poisson dose-response model may be a good starting point for the analysis.
Apart from the distributional assumptions, dose-response analysis is carried out in much the same way as for continuous dose-response data (Chapter 1), e.g., model checking by means of residuals.
In this chapter, we use the following extension packages:
library(drc)
library(devtools)
install_github("DoseResponse/drcData")
library(drcData)
library(ggplot2)
library(lmtest)
library(multcomp)
library(dplyr)
library(sandwich)
head(lemna)
## conc frond.num
## 1 0 70
## 2 0 66
## 3 0 61
## 4 0 65
## 5 0 65
## 6 0 61
lemna %>%
  group_by(conc) %>%
  summarize(min = min(frond.num),
            max = max(frond.num))
## # A tibble: 10 x 3
## conc min max
## <dbl> <dbl> <dbl>
## 1 0 61 70
## 2 0.38 64 67
## 3 0.76 54 65
## 4 1.52 55 58
## 5 3.03 50 63
## 6 6.06 49 56
## 7 12.1 41 47
## 8 24.2 37 37
## 9 48.5 31 36
## 10 97 29 34
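The model object lemna.minor.LL.3 used below is not shown in this excerpt; assuming a three-parameter log-logistic model with Poisson-distributed counts, it may be fitted as follows:

```r
## three-parameter log-logistic model (lower limit fixed at 0),
## assuming Poisson-distributed counts
lemna.minor.LL.3 <- drm(frond.num ~ conc,
                        data = lemna,
                        fct = LL.3(),
                        type = "Poisson")
```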
plot(fitted(lemna.minor.LL.3), residuals(lemna.minor.LL.3),
     xlab = "Fitted values",
     ylab = "Raw residuals")
abline(h = 0, lty = 2)
FIGURE 3.1
The residual plot for the three-parameter log-logistic model fitted to the
Lemna minor data.
Based on the residual plot in Figure 3.1 we can conclude that there are no substantial departures from random scatter around the horizontal line at 0. Therefore,
it seems reasonable to assume that a three-parameter log-logistic model will
suffice. We do not look at the standard QQ normal probability plot as it is
not helpful in assessing whether or not the data are Poisson distributed.
The plot of the data with the fitted dose-response curve superimposed is
shown below. Figure 3.2 also supports our initial impression that the three-
parameter log-logistic model describes the data adequately. However, it is
clear that the data hold almost no information about what would happen
for very large concentrations. The assumption about a lower limit of 0 is
crucial as it means that one piece of additional information (not found in the
data!) is incorporated into the model. This consideration is analogous to what
we discussed for continuous dose-response data in Subsection 1.1.5. In case
there are no strong biological reasons for a particular assumption, it may be
FIGURE 3.2
The fitted three-parameter log-logistic dose-response curve together with the
Lemna minor data.
zero counts. Assuming a non-zero lower limit (LL.4()) may yield a somewhat different result, because the estimated EC50 is interpreted relative to the estimated lower and upper limits. In this case, the estimated EC50 becomes approximately 5 times smaller, as shown by Van der Vliet and Ritz (2013). This point also applies to continuous dose-response data, but not to binomial dose-response data.
summary(lemna.minor.LL.3)
##
## Model fitted: Log-logistic (ED50 as parameter) with lower
## limit at 0 (3 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) 0.49207 0.07418 6.6335 3.278e-11 ***
## d:(Intercept) 66.79414 2.59857 25.7042 < 2.2e-16 ***
## e:(Intercept) 56.07520 14.27537 3.9281 8.562e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The estimated EC50 is 56.1 with a corresponding standard error of 14.3. Thus,
the average number of fronds is reduced by 50% relative to an average level
of 66.8, i.e., reaching an average of 33.4, at a concentration of 56.1 (effluent
% v/v).
As seen previously for continuous data, the generic way to extract EC50
would be to use the function ED(), which may also be used for obtaining other
EC values:
ED(lemna.minor.LL.3, c(10, 20, 50))
##
## Estimated effective doses
##
## Estimate Std. Error
## e:1:10 0.64497 0.46796
## e:1:20 3.35159 1.67765
## e:1:50 56.07520 14.27537
The standard errors of the estimated EC10, and to some extent also EC20, are quite large, and the corresponding Wald-type 95% confidence interval will have an unrealistic negative lower limit.
ED(lemna.minor.LL.3, c(10, 20, 50), interval = "delta")
##
## Estimated effective doses
##
## Estimate Std. Error Lower Upper
## e:1:10 0.644967 0.467963 -0.272224 1.562158
## e:1:20 3.351589 1.677654 0.063448 6.639731
## e:1:50 56.075199 14.275369 28.095989 84.054408
head(C.dubia)
## conc number
## 1 0 27
## 2 0 30
## 3 0 29
## 4 0 31
## 5 0 16
## 6 0 15
with(C.dubia, table(conc))
## conc
## 0 1.56 3.12 6.25 12.5
## 10 10 10 10 10
Bailer and Oris (1997) fitted a generalized linear model assuming a Poisson distribution and including both linear and quadratic terms of concentration in order to describe the hormetic effect, which may be inferred from the initial increase in the response at low concentrations, followed by a decreasing response with increasing concentration (Figure 3.3).
50
Number of offspring
40
30
20
10
0
0 2 4 6 8 10 12
FIGURE 3.3
The scatter plot of the number of offspring as a function of waste water concentration (%) for the C. dubia data.
The model fit shown in Figure 3.4 (top panel) is not good. For instance, the predicted mean number of offspring at concentration 0 is not centered among the data points for concentration 0: it has been shifted upwards because of the hormetic effect observed for higher concentrations. Perhaps even more pronounced: the predicted mean number of offspring is too small for the low concentrations that seemed to produce a strong hormetic effect.
As a second attempt we consider the four-parameter Brain-Cousens hormesis model, which may be specified using the function BC.4(). This means we retain the assumption that the lower limit is equal to 0:
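The corresponding model fit is not shown in this excerpt; a plausible specification, assuming Poisson-distributed counts, is:

```r
## four-parameter Brain-Cousens hormesis model (lower limit fixed at 0)
C.dubia.BC.4 <- drm(number ~ conc,
                    data = C.dubia,
                    fct = BC.4(),
                    type = "Poisson")
```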
The fitted dose-response curve is shown in Figure 3.4 (bottom panel). This
time the fitted dose-response curve describes the trend in the data very closely
(too closely?).
summary(C.dubia.BC.4)
##
## Model fitted: Brain-Cousens (hormesis) with lower limit fixed
## at 0 (4 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) 3.83037 0.39251 9.7585 < 2.2e-16 ***
## d:(Intercept) 21.80069 1.38689 15.7191 < 2.2e-16 ***
## e:(Intercept) 7.60327 0.61601 12.3427 < 2.2e-16 ***
## f:(Intercept) 4.03279 0.88706 4.5462 5.462e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
FIGURE 3.4
The fitted three-parameter log-logistic and four-parameter Brain-Cousens
hormesis models shown together with the C. dubia data.
The parameter f quantifies the extent of the hormetic effect in the sense that the more positive this parameter is, the larger the hormesis peak.
The parameter f is significantly different from 0, implying that there is a
substantial hormetic effect. Bailer and Oris (1997) reached the same conclusion
based on a quadratic generalized linear model.
The summary output does not provide any estimate of EC50. In contrast to the four-parameter log-logistic model, the parameter e has lost its interpretation as EC50 in the hormesis models. Therefore, to estimate EC50, we use the function ED():
ED(C.dubia.BC.4, 50)
##
## Estimated effective doses
##
## Estimate Std. Error
## e:1:50 11.7863 0.5086
For hormesis models and biphasic dose-response models in general, it may be
possible to estimate EC values for both phases, i.e., corresponding to effects
to the left and right of the peak. For instance, we can estimate EC-10 and
EC-20 as follows (notice the use of the additional argument bound = FALSE,
which switches off checking if the specified effect levels lie between 0 and 100):
ED(C.dubia.BC.4, c(-10, -20), bound = FALSE)
##
## Estimated effective doses
##
## Estimate Std. Error
## e:1:-10 8.09294 0.46267
## e:1:-20 7.61867 0.49695
Actually, EC-10 and EC-20 turn out to be almost as precisely determined from
the data as is EC50 (with comparable standard errors), reflecting that these
estimates are within the concentration range where actual concentrations are
found in the data.
head(chlordan)
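The fit of chlordan.LL.3 referenced below does not appear in this excerpt; assuming a Poisson three-parameter log-logistic model analogous to the previous examples, it may be obtained as:

```r
## three-parameter log-logistic model for total reproduction counts
chlordan.LL.3 <- drm(repro ~ conc,
                     data = chlordan,
                     fct = LL.3(),
                     type = "Poisson")
```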
The residual plot shown in Figure 3.5 looks reasonable. Likewise, the fitted dose-response model seems to describe the trend in the data adequately (Figure 3.6).
Now we can take a look at the summary output:
summary(chlordan.LL.3)
##
## Model fitted: Log-logistic (ED50 as parameter) with lower
## limit at 0 (3 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) 1.089500 0.070639 15.423 < 2.2e-16 ***
## d:(Intercept) 113.666336 2.984324 38.088 < 2.2e-16 ***
## e:(Intercept) 1.615563 0.127838 12.638 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The estimated EC50 is 1.62 (0.13). The estimated average total reproduction
over 21 days in the control group is 114 (2.98) offspring.
Using the confint method we find the 95% confidence interval for EC50.
confint(chlordan.LL.3)
## 2.5 % 97.5 %
## b:(Intercept) 0.9510493 1.22795
## d:(Intercept) 107.8171683 119.51550
## e:(Intercept) 1.3650065 1.86612
plot(fitted(chlordan.LL.3), residuals(chlordan.LL.3))
abline(h=0, lty=2)
FIGURE 3.5
The residual plot for the three-parameter log-logistic model fitted to the
chlordan dataset.
with(chlordan, table(time))
## time
## 2.5 9.5 11.5 14.5 17.5 18.5 20.5 21
## 1 2 3 1 1 9 1 42
The above dose-response model did not incorporate this imbalance and, hence,
it may be seen as a misspecified model (as the distributional assumptions are
partly misspecified). Ideally, such imbalance should be taken into account in
the statistical analysis in order to reduce bias in the estimates.
One approach is to fit a three-parameter log-logistic model but use the durations of the observation periods as weights; this means that the numbers of offspring are scaled by their durations. This again means that the average number of offspring per day for a given concentration is modeled. In particular, the parameter d can be interpreted as the average number of offspring per day in the unexposed control group. However, it is important to stress that the scaling or normalization by duration is taken care of by the model.
plot(chlordan.LL.3,
     broken = TRUE,
     type = "all",
     xlim = c(0, 10),
     xlab = "Concentration (mu g/L)",
     ylab = "Number of offspring")
FIGURE 3.6
The three-parameter log-logistic dose-response curve fitted to the chlordan dataset, shown together with the raw data.
The weighted Poisson model is fitted as follows using the argument
weights:
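The corresponding call is not shown in this excerpt; a sketch (the object name chlordan.LL.3.we is an assumption) is:

```r
## weighted Poisson fit: counts are scaled by the observation durations
chlordan.LL.3.we <- drm(repro ~ conc,
                        data = chlordan,
                        fct = LL.3(),
                        type = "Poisson",
                        weights = time)
summary(chlordan.LL.3.we)
```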
##
## Model fitted: Log-logistic (ED50 as parameter) with lower
## limit at 0 (3 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) 0.929859 0.065783 14.135 < 2.2e-16 ***
## d:(Intercept) 5.544291 0.146147 37.936 < 2.2e-16 ***
## e:(Intercept) 1.758823 0.151862 11.582 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The estimated EC50 is now 1.76 (0.15). The estimated average daily reproduction rate in the control group is 5.54 (0.146) offspring.
Looking at Figure 3.6 (note in particular the variation in the mid-range
of concentrations), we may suspect that model misspecification is not only
due to the varying durations but also caused by over-dispersion. Therefore,
another approach would be to fit a negative-binomial model. Such a model
may be specified using type = "negbin2"; there is also the option "negbin1"
for a slightly different model (Delignette-Muller et al., 2014b).
chlordan.LL.3.nb.we <- drm(repro ~ conc,
                           data = chlordan,
                           fct = LL.3(),
                           type = "negbin2",
                           weights = time)
##
## Model fitted: Log-logistic (ED50 as parameter) with lower
## limit at 0 (3 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) 0.91769 0.11430 8.0291 9.349e-16 ***
## d:(Intercept) 5.57532 0.27033 20.6242 < 2.2e-16 ***
## e:(Intercept) 1.68993 0.27040 6.2498 4.111e-10 ***
## O:(Intercept) -2.18109 0.28539 -7.6426 2.130e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
confint(chlordan.LL.3.nb.we)
## 2.5 % 97.5 %
## b:(Intercept) 0.6936773 1.141707
## d:(Intercept) 5.0454867 6.105160
## e:(Intercept) 1.1599602 2.219908
## O:(Intercept) -2.7404380 -1.621742
Slightly different parameter estimates are obtained, but the standard errors are almost twice as large compared to the Poisson model. The parameter O appearing in the summary output is exp(ω) in Equation (A.8) in Section A.2.2.2.
One last alternative approach would be to adjust for model misspecification in terms of the assumed distribution (while still assuming that the dose-response model function is correctly specified) by replacing the default naïve standard errors with robust standard errors (see Section A.5).
coeftest(chlordan.LL.3, vcov. = sandwich)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## b:(Intercept) 1.08950 0.17078 6.3795 3.413e-08 ***
## d:(Intercept) 113.66634 4.25420 26.7186 < 2.2e-16 ***
## e:(Intercept) 1.61556 0.25454 6.3470 3.862e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
This approach results in exactly the same parameter estimates as for the
unweighted Poisson model but with modified standard errors, which are almost
twice as large. Indeed, the results are similar to the results from the fitted
negative binomial model.
In terms of point estimates and 95% confidence intervals, the above results
for the models taking the weights into account are very similar to the results
originally reported by Delignette-Muller et al. (2014a) but obtained using a
much more complex Bayesian approach.
head(decontaminants)
summary(decontaminants)
to indicate that the parameter d is shared among the three groups (∼1), whereas the parameters b and e should be different between the two groups (∼group-1). Note that this model specification is only possible because the data
data have been arranged in such a way that the control group has been merged
with one of the two other groups (it does not matter which one). Following the
original analysis we fit a three-parameter Weibull type 1 model. This model
fitting is an example of Wadley’s problem (Finney, 1971, Chapter 10).
decon.W1.3.po <- drm(count ~ conc,
                     curveid = group,
                     data = decontaminants,
                     fct = W1.3(),
                     type = "Poisson",
                     pmodels = list(~ group - 1, ~ 1, ~ group - 1))
##
## Model fitted: Weibull (type 1) with lower limit at 0 (3 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:grouphpc 0.773346 0.051621 14.9813 < 2.2e-16 ***
## b:groupoxalic 0.371221 0.039024 9.5127 < 2.2e-16 ***
FIGURE 3.7
Average counts versus squared empirical standard deviations per concentration per decontaminant for the decontaminants dataset. The broken line corresponds to the variation predicted by the Poisson distribution.
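The robust standard errors discussed below may be obtained as before with coeftest(); a sketch:

```r
## robust (sandwich) standard errors for the Poisson fit
coeftest(decon.W1.3.po, vcov. = sandwich)
```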
The robust standard errors are approximately 2–3 times larger than the naïve standard errors shown in the above summary output.
An alternative approach would be to retain the three-parameter Weibull
type 1 model but replace the Poisson distribution by a negative binomial
distribution:
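The corresponding call is not shown in this excerpt; mirroring the Poisson fit above but with type = "negbin2", it may look as follows:

```r
## three-parameter Weibull type 1 model with a negative binomial
## distribution; d shared, b and e group-specific
decon.W1.3.nb <- drm(count ~ conc,
                     curveid = group,
                     data = decontaminants,
                     fct = W1.3(),
                     type = "negbin2",
                     pmodels = list(~ group - 1, ~ 1, ~ group - 1))
```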
summary(decon.W1.3.nb)
##
## Model fitted: Weibull (type 1) with lower limit at 0 (3 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:grouphpc 0.777719 0.083595 9.3034 < 2.2e-16 ***
## b:groupoxalic 0.368470 0.061458 5.9955 2.029e-09 ***
## d:(Intercept) 50.014470 1.988664 25.1498 < 2.2e-16 ***
## e:grouphpc 0.264038 0.026979 9.7868 < 2.2e-16 ***
## e:groupoxalic 1.463080 0.379882 3.8514 0.0001174 ***
## O:(Intercept) 0.505370 0.208863 2.4196 0.0155366 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We see the same picture: standard errors increased compared to the Poisson model, but less dramatically this time, only by around a factor of 2. The fitted dose-response curves based on the negative binomial model are shown in Figure 3.8.
The potency of two decontaminants may be compared in terms of the ratio
of their EC50 values using the function EDcomp():
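The call producing the output below is not shown in this excerpt; it may look like:

```r
## ratio of EC50 values (hpc relative to oxalic)
## with a delta-method confidence interval
EDcomp(decon.W1.3.nb, c(50, 50), interval = "delta")
```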
##
## Estimated ratios of effect doses
##
## Estimate Lower Upper
## hpc/oxalic:50/50 0.30459 0.11155 0.49763
FIGURE 3.8
Three-parameter Weibull type 1 model fitted to the dataset decontaminants.
4
Multinomial dose-response data
library(drc)
library(devtools)
install_github("DoseResponse/drcData")
library(drcData)
install_github("SigneMJensen/mmmVcov")
library(mmmVcov)
library(multcomp)
library(dplyr)
guthion
We fit the two-parameter log-logistic model to binomial data obtained from the
multinomial data by merging the two categories “alive” and “moribund” into
a single category while retaining the category “dead.” The following analysis
compares “dead” to “moribund” and “alive” combined:
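The corresponding model fit is not shown in this excerpt; mirroring the call used later for the other merging, it may be specified as:

```r
## "dead" vs. "moribund" + "alive": the response is the proportion dead
guthion.LL.2.am <- drm(dead/total ~ dose,
                       curveid = trt,
                       weights = total,
                       data = guthion,
                       fct = LL.2(),
                       type = "binomial")
```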
Estimated LC50 values and their ratio are obtained as in the previous chapters using the functions ED() and EDcomp():
ED(guthion.LL.2.am, c(50))
##
## Estimated effective doses
##
## Estimate Std. Error
## e:S:50 36.891558 1.887823
## e:T:50 1.432554 0.083677
##
## Estimated ratios of effect doses
##
## Estimate Lower Upper
## S/T:50/50 25.752 21.833 29.672
The estimated relative potency is close to the one obtained by Finney (1971, p. 226); the difference is most likely due to different estimation procedures.
Next, we fit the two-parameter log-logistic model to binomial data obtained by merging the neighbouring categories “moribund” and “dead” into a single category while keeping the category “alive” unchanged. In general, you may either merge more and more categories, starting from 100% affected/dead, to compare decreasing accumulated severity to “alive”, or start from unaffected/alive and include more and more severe stages step by step. The following analysis compares “moribund” and “dead” combined to “alive”:
guthion.LL.2.dm <- drm((moribund + dead)/total ~ dose,
                       curveid = trt,
                       weights = total,
                       data = guthion,
                       fct = LL.2(),
                       type = "binomial")
##
## Estimated effective doses
##
## Estimate Std. Error
## e:S:50 33.908787 1.552187
## e:T:50 1.324984 0.066308
##
## Estimated ratios of effect doses
##
## Estimate Lower Upper
## S/T:50/50 25.592 22.190 28.994
Again, the estimated relative potency is close to the one obtained by Finney (1971, p. 226). Figure 4.1 also shows that the two model fits are quite similar in terms of differences between the two groups, reflecting that the “moribund” category contains little information.
FIGURE 4.1
Two-parameter log-logistic models fitted to binomial data obtained from the dataset guthion. Black lines for the analysis of “dead” vs. “alive” and “moribund” combined, grey lines for the analysis of “dead” and “moribund” combined vs. “alive.”
Moreover, the fitted binomial dose-response models have fairly similar estimated slope parameters in both model fits (keeping in mind the narrow range of the doses applied: 1–45):
coef(summary(guthion.LL.2.am))[1:2, ]
coef(summary(guthion.LL.2.dm))[1:2, ]
relPot.pooled[["covar"]]
## [,1] [,2]
## [1,] 0.4024401 0.5548167
## [2,] 0.5548167 1.8861332
confint(glht(parm(relPot.pooled[["coef"]][, 1],
                  relPot.pooled[["covar"]]),
             linfct = matrix(c(0.5, 0.5), 1, 2)))
##
## Simultaneous Confidence Intervals
##
## Fit: NULL
##
## Quantile = 1.96
## 95% family-wise confidence level
##
##
## Linear Hypotheses:
## Estimate lwr upr
## 1 == 0 25.6721 23.8656 27.4786
The argument seType = "san" implies that robust standard errors and cor-
relations are estimated (Jensen and Ritz, 2018). Once the variance-covariance
matrix has been estimated (and stored under the name relPot.pooled)
the functions glht() and parm() from the package multcomp may be used
to calculate the average of the two estimates (specified using linfct =
matrix(c(0.5, 0.5), 1, 2)) and the corresponding estimated standard er-
ror. Finally, confint() produces the associated confidence interval (95% by
default). Note that the confidence interval is narrower than the two confidence
intervals derived from the binomial dose-response models.
We follow the same steps as in the previous subsection: First we fit a two-parameter log-logistic model to binomial data obtained by merging the not-deformed (alive) and deformed categories into a single category:
Next we fit the model where binomial data were obtained by merging the deformed and dead categories:
Figure 4.2 shows the fitted dose-response curves with the data. It is apparent that the estimated slope coefficients differ between the two analyses, i.e., the proportional odds assumption does not seem appropriate. We proceed to summarize the model fits one by one. First the results for the analysis of dead vs. deformed+alive:
##
## Estimated ratios of effect doses
##
## Estimate Lower Upper
## T/FP:50/50 1284.5 -1007.2 3576.1
We use the argument reverse = TRUE to reverse the calculation so that the reciprocal relative potency is estimated (showing how much more potent arbovirus FP is than arbovirus T); by default it would be the other way around. The estimated relative potency of arbovirus FP relative to arbovirus T is 1284. In other words, arbovirus FP is approximately 1300 times more potent than arbovirus T in killing chicken embryos.
For the second analysis where “deformed” and “dead” were merged we get
the following results:
FIGURE 4.2
Two-parameter log-logistic models fitted to binomial data derived from the
dataset arbovirus. Black lines for the analysis of dead vs. alive+deformed,
grey lines for the analysis of dead+deformed vs. alive.
##
## Estimated ratios of effect doses
##
## Estimate Lower Upper
## T/FP:50/50 9.4238 -2.7513 21.5988
5
Time-to-event-response data
library(drc)
library(devtools)
install_github("DoseResponse/drcData")
library(drcData)
library(lmtest)
library(metafor)
library(multcomp)
library(plyr)
library(sandwich)
head(chickweed, 3)
tail(chickweed, 3)
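The fit of chickweed.LL.3 used below is not shown in this excerpt; assuming an event-time model in which the variables start and end delimit the monitoring intervals and count gives the number of seeds germinating in each interval, it may be specified as:

```r
## three-parameter log-logistic event-time (germination) model
chickweed.LL.3 <- drm(count ~ start + end,
                      data = chickweed,
                      fct = LL.3(),
                      type = "event")
```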
plot(fitted(chickweed.LL.3), residuals(chickweed.LL.3))
abline(h=0, lty=2)
qqnorm(residuals(chickweed.LL.3))
FIGURE 5.1
Residual plot for the three-parameter log-logistic model fitted to the dataset
chickweed.
but it should be kept in mind that the residuals will only be approximately
normally distributed and the many intervals without germination (if included)
will blur the picture somewhat (see Figure 5.2).
summary(chickweed.LL.3)
##
## Model fitted: Log-logistic (ED50 as parameter) with lower
## limit at 0 (3 parms)
##
## Parameter estimates:
##
## Estimate Std. Error t-value p-value
## b:(Intercept) -20.76732 2.94421 -7.0536 1.743e-12 ***
## d:(Intercept) 0.20011 0.02830 7.0711 1.537e-12 ***
## e:(Intercept) 196.05291 2.50572 78.2422 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
FIGURE 5.2
QQ plot of raw residuals from the three-parameter log-logistic model fitted to
the dataset chickweed.
The summary output reveals that only 20 (2.8) percent of the seeds germinated
during the experiment. Moreover, it took 196 (2.5) hours to germinate half of
the seeds that germinated during the experiment.
One way to address this slight model misspecification, caused by the clustering due to the use of several petri dishes (even without having information about which seeds belonged to which dish), is to use robust standard errors (see Section A.5 for more explanation). We need to activate the packages lmtest and sandwich to be able to use the function coeftest() with the argument vcov. = sandwich.
coeftest(chickweed.LL.3,
vcov. = sandwich)
##
## t test of coefficients:
##
The resulting standard errors increased by a factor between 1.5 and 2.5, re-
flecting that the model was slightly misspecified.
Likewise, estimation of multiple time points corresponding to arbitrary
percentages of germination with robust standard errors may be achieved using
ED() with the same specification as in coeftest(). Below we estimate t10, t50, and t90 with robust standard errors and 95% confidence intervals.
ED(chickweed.LL.3,
   c(10, 50, 90),
   interval = "delta",
   vcov. = sandwich)
##
## Estimated effective doses
##
## Estimate Std. Error Lower Upper
## e:1:10 176.3697 8.2987 160.1045 192.6349
## e:1:50 196.0529 4.4069 187.4156 204.6902
## e:1:90 217.9328 9.7192 198.8834 236.9821
ED(chickweed.LL.3,
   c(0.05, 0.1, 0.15),
   interval = "delta",
   type = "absolute",
   vcov. = sandwich)
##
## Estimated effective doses
##
## Estimate Std. Error Lower Upper
## e:1:0.05 185.9445 5.7647 174.6459 197.2431
## e:1:0.1 196.0425 4.4069 187.4053 204.6798
## e:1:0.15 206.6817 6.0136 194.8953 218.4680
Using the method plot, we show the fitted regression curve together with
the cumulated proportions of germinated seeds. This plot may also serve as
a kind of graphical model diagnostics. We add a confidence band based on
95% pointwise Wald-type confidence intervals, which may easily be obtained
using the predict() method by providing a dataset of time points for which
to obtain predicted values (in this case 0:300, corresponding to 0, 1, 2, ..., 300 hours). Figure 5.3 shows the resulting plot. The fitted germination curve
seems to describe the increase in germination over time adequately.
As Wald-type confidence intervals are used for constructing the confidence
band, the lower limit of the band would become slightly negative for around
180 hours if it were not truncated at 0. However, by default the predict()
method has the argument constrain switched on so that truncation is en-
forced (whenever it is reasonable).
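The predictions behind the confidence band may be obtained along the following lines (a sketch; the exact form of newdata for an event-time fit, and the column names of the returned matrix, are assumptions):

```r
## predicted germination proportions with pointwise 95% Wald-type
## limits at the time points 0, 1, ..., 300 hours
newTimes <- data.frame(start = 0:300)
predGerm <- predict(chickweed.LL.3,
                    newdata = newTimes,
                    interval = "confidence")
## add the band to an existing plot of the fitted curve
lines(0:300, predGerm[, "Lower"], lty = 3)
lines(0:300, predGerm[, "Upper"], lty = 3)
```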
plot(chickweed.LL.3,
     xlim = c(0, 300), ylim = c(0, 0.30),
     xlab = "Time (hours)",
     ylab = "Proportion germinated",
     log = "")
FIGURE 5.3
The germination curve based on the three-parameter log-logistic model fitted to the dataset chickweed, plotted together with the observed cumulated proportions and a pointwise 95% confidence band.
head(CadmiumDaphnia)
The dataset contains four variables: Dose denotes the dose of cadmium chlo-
ride, Start and End denote the limits on the monitoring intervals, and, finally,
Dead denotes the number of dead daphnias recorded for the corresponding
monitoring interval. Now we proceed to analyze these data using the meta-
analytic approach.
5.2.1.1 Step 1
We will assume that a two-parameter log-logistic model will fit the data for all doses. This may seem questionable, as not all doses are equally toxic; however, the duration of the experiment was not long enough to observe such differences. Moreover, it is not a limitation: different models for different doses are also possible as long as the same parameter of interest may be estimated.
We start out by manually fitting a two-parameter log-logistic model sepa-
rately to data for each dose, and then estimated parameters from the resulting
model fits are combined. This is a somewhat repetitive task, but it sheds some light on how the meta-analytic approach works. So we start out fitting 7 models, one per dose, using the function subset() for defining anonymous subsets
on the fly.
daphnia.LL.2.00.0 <- drm(Dead ~ Start + End,
                         data = subset(CadmiumDaphnia, Dose == "0.0"),
                         fct = LL.2(),
                         type = "event")
The next step is to extract the estimated t50 and the corresponding standard
error from each of the above model fits. For the two-parameter log-logistic
model it is the estimate for the parameter e. We store the results in a dataset
called cadmium.t50, which we initially define as an empty dataset.
cadmium.t50 <- data.frame(t50 = rep(NA, 7),
                          t50.se = rep(NA, 7))
Then the dataset is filled in row by row using summary() in combination with
coef() for extracting estimates and standard errors.
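The row-by-row extraction can be sketched as a loop (assuming the dose levels are coded as shown below; adjust the subsetting if Dose is stored as a factor):

```r
doses <- c(0, 3.2, 5.6, 10, 18, 32, 56)
for (i in seq_along(doses)) {
  fit.i <- drm(Dead ~ Start + End,
               data = subset(CadmiumDaphnia, Dose == doses[i]),
               fct = LL.2(), type = "event")
  ## estimate and standard error of the parameter e (i.e., t50)
  cadmium.t50[i, ] <- coef(summary(fit.i))["e:(Intercept)", 1:2]
}
```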
The manual fitting of 7 models could have been avoided using a call of drm()
with the argument curveid to assume different model parameters for different
doses. The R lines are shown below but are not executed; we invite the reader
to do so.
In theory, exactly the same parameter estimates and standard errors could
then be obtained in one go. However, in practice, parameter estimates and
standard errors may be slightly different, as fitting this model involves estimation
of 7 + 7 = 14 parameters simultaneously. In general, it is more challenging
to fit such simultaneous models, as lack of convergence or convergence to
suboptimal parameter estimates may occur; that is, the estimation procedure
may run into problems when many parameters need to be estimated. Assuming
a shared parameter across doses (a shared slope parameter b) would be an
option (by means of the argument pmodels used in the same way as previ-
ously in Subsection 1.2.1). However, we believe it will in practice be difficult
to justify such assumptions based on a priori knowledge about the underly-
ing biological mechanisms. Moreover, as you will see in the next example (in
Subsection 5.2.2), fitting separate models to subsets of a dataset may be
automated, an approach that scales better to more complex experimental designs,
also when it comes to merging relevant information about the experimental design
with the estimates obtained. To ensure that all relevant information is carried
over to the second step, we add a column to the dataset with information on
the doses.
cadmium.t50 <- data.frame(Dose = c(0, 3.2, 5.6, 10, 18, 32, 56),
cadmium.t50)
cadmium.t50
The scatterplot in Figure 5.4 shows the estimated t50 values as a function of
dose. A distinct nonlinear trend is discernible, but the variation in the estimated
times-to-event seems to depend on the magnitude of the estimates: the larger
the estimate, the larger the variation.
5.2.1.2 Step 2
We use the package metafor to combine estimates from the individual model
fits into pooled estimates through the meta-analytic approach (Viechtbauer,
2010). Specifically we use the function rma(), which takes the estimated
t50 as the first argument and the corresponding squared standard errors as
the second argument. The explanatory variables are specified through a for-
mula in the argument mods. Finally, the argument data is used for provid-
ing the dataset where the variables given in the first three arguments are
found. Specifically, we fit a weighted version of a linear regression model for
the logarithm-transformed t50 as a function of dose through the specification
mods = Dose, exploiting that the estimated standard error for a logarithm-
transformed estimate is the estimated standard error of the untransformed
variable divided by the estimate itself (which may be obtained using the delta
method). The reason for applying a logarithm transformation is that the
summary output shows increased variation with increasing estimated t50 values;
at the same time, this transformation may remedy some of the observed
nonlinearity seen in the plot below (Figure 5.4).
plot(t50 ~ Dose,
data = cadmium.t50,
xlab = "Dose (mu g Cd/L)",
ylab = "Time to 50% died")
FIGURE 5.4
Estimated t50 values from the first step in the meta-analytic approach as a
function of dose of cadmium chloride (using the dataset CadmiumDaphnia).
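The second-step fit described above can be sketched as follows (the object name is taken from the coef() call below; the variance specification follows the delta-method argument in the text):

```r
library(metafor)
# Weighted linear meta-regression of log(t50) on Dose; the variances are the
# squared delta-method standard errors of log(t50), i.e., (t50.se / t50)^2
cadmium.t50.log.oneway <- rma(yi = log(t50),
                              vi = (t50.se / t50)^2,
                              mods = ~ Dose,
                              data = cadmium.t50)
```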
coef(summary(cadmium.t50.log.oneway))
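The object cadmium.predicted used next might be produced like this (a sketch; the dose grid 0 to 60 matches the plotting code below):

```r
# Predicted log(t50) values with 95% confidence limits for doses 0 to 60
cadmium.predicted <- predict(cadmium.t50.log.oneway, newmods = 0:60)
```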
head(cadmium.predicted)
The predicted values are found in the first column named pred, whereas the
third and fourth columns contain the lower and upper limits of the (95%) con-
fidence intervals (named ci.lb and ci.ub, respectively). Note that predicted
values and confidence intervals are back-transformed from the logarithmic
scale to the original time scale using the exponential function exp().
So we can redraw the scatterplot with the fitted nonlinear curve and con-
fidence band (Figure 5.5). The fitted regression curve describes the average
trend in the data quite satisfactorily.
plot(t50 ~ Dose,
data = cadmium.t50,
xlab = "Dose (mu g Cd/L)",
ylab = "Time to 50% died")
lines(0:60,
exp(cadmium.predicted[["pred"]]), lty = 1) # fitted line
lines(0:60,
exp(cadmium.predicted[["ci.lb"]]), lty = 2) # lower CI
lines(0:60,
exp(cadmium.predicted[["ci.ub"]]), lty = 2) # upper CI
FIGURE 5.5
Estimated t50 values from the first step in the meta-analytic approach plotted
vs. doses of cadmium chloride (using the dataset CadmiumDaphnia). The fitted
and back-transformed regression curve (solid line) with a 95% confidence band
(dashed lines) is also shown.
Data are in the dataset blackgrass. The three treatments are encoded in
the variables Bio, Depth, and Temp.
head(blackgrass)
## Exp Temp Popu Bio Depth Rep Start.Day End.Day Ger Accum.Ger TotalSeed
## 1 1 10 914 S 0 1 0 360 0 0 36
## , , Temp = 10
##
## Bio
## Depth R S
## 0 88 88
## 1 88 88
## 3 88 88
## 6 88 88
##
## , , Temp = 17
##
## Bio
## Depth R S
## 0 84 84
## 1 84 84
## 3 84 84
## 6 84 84
## Rep
## Exp 1 2 3 4
## 1 344 344 344 344
## 2 344 344 344 344
Therefore, we start out by adding such a variable (given the name Pot) to the
dataset blackgrass. It would be nice if such a unique identifier were always
included in such datasets as it may help understand where the random varia-
tion in the data is introduced.
Pot has 128 levels, corresponding to a total of 128 pots being used in the
experiment. Now we are ready to begin analyzing the data using the two-step
approach.
5.2.2.1 Step 1
A three-parameter log-logistic model is fitted to data from each pot sepa-
rately. We show how to do this in an automated way, avoiding fitting a model
manually for each pot.
Looping through data from all pots and fitting a dose-response model to
data from each pot separately may effectively be carried out using the R
package plyr in combination with the following helper function that defines
what has to be done for each pot (in this case fitting an LL.3() model). Note
that if you need to use different model functions within the same experiment
(e.g., both LL.2() and LL.3()) then automation may become more difficult
as a systematic approach for model selection is needed. In the code below
dataSet is a placeholder for the subset provided to the function applied for
each pot.
return(modelFit)
}
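Only the tail of the helper function appears above. A complete version might look like the following sketch (the response and interval variables Ger, Start.Day, and End.Day are taken from the head() output of blackgrass shown earlier; the details of the drm() call are assumptions):

```r
# Sketch: fit a three-parameter log-logistic event-time model to one pot;
# try() catches convergence failures so looping over all pots can continue
fitFct.LL.3 <- function(dataSet) {
  modelFit <- try(drm(Ger ~ Start.Day + End.Day,
                      data = dataSet,
                      fct = LL.3(),
                      type = "event"),
                  silent = TRUE)
  return(modelFit)
}
```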
The model may not converge for all pots, which is the reason why we use try()
inside the helper function; try() will catch any errors and thereby
ensure that fitting many models in a loop, as shown below, does not terminate
prematurely because a few cases fail to converge and yield no
model fit. For germination data, lack of convergence can happen because of
a low number of seedlings emerging or because all emerging seedlings emerge
at the same or at very few times. In both cases, it may be difficult to fit any
dose-response model. Sometimes manual fitting for problematic cases may still
result in a useful model fit being obtained, but it is more cumbersome and, if
different model functions are assumed, results may not be fully comparable
between automatic and manual model fits.
We use the function dlply() from plyr for the actual looping: For data cor-
responding to each combination of the three treatment variables Bio, Depth,
and Temp and the grouping variables Exp and Pot, reflecting the hierarchical
design, the above-defined function fitFct.LL.3() is applied. Note that
combining these 5 variables yields many more combinations than are actually present
in the dataset, but it ensures that the experimental design is carried along with
the estimates. dlply() takes a dataset (data frame) as input and returns a
list.
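The dlply() call described above might look like this sketch (the grouping variables follow the text; the call itself is an assumption):

```r
library(plyr)
# Loop over all combinations of the treatment and grouping variables,
# fitting one event-time model per pot; the result is a list of model fits
black.grass.modelfits2 <- dlply(blackgrass,
                                .(Bio, Depth, Temp, Exp, Pot),
                                fitFct.LL.3)
```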
The list black.grass.modelfits2 is quite large and you may not want to
look at it as it is. We proceed to extract the three parameter estimates and
corresponding estimated standard errors from each model fit. For that pur-
pose, we use another helper function as defined below.
returnVec
}
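As with the first helper, only the tail appears above. A sketch of the full extraction function might be as follows (the name extractFct and all details are assumptions; ED() is used to obtain the relative t50 and its standard error):

```r
# Sketch: pull out t50 (the relative ED50) and its standard error,
# returning NAs for pots where the model fit failed
extractFct <- function(modelFit) {
  if (inherits(modelFit, "try-error")) {
    returnVec <- c(t50 = NA, t50.se = NA)
  } else {
    edTab <- ED(modelFit, 50, display = FALSE)
    returnVec <- c(t50 = edTab[1, "Estimate"],
                   t50.se = edTab[1, "Std. Error"])
  }
  returnVec
}
```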
Again, we exploit functionality from the extension package plyr for looping
through the list of model fits (black.grass.modelfits2). This time we use
the function ldply(), which takes a list and returns a dataset. Note also that
we estimate t50 values that are relative to the estimated upper limits (this
could be changed using the argument type = "absolute" as seen previously
in Subsection 5.2.1).
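The ldply() step might then be sketched as follows (the name of the extraction helper defined above is assumed):

```r
# Convert the list of model fits into one dataset of estimates,
# carrying along the grouping variables used by dlply()
blackgrass.parms <- ldply(black.grass.modelfits2, extractFct)
```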
summary(blackgrass.parms)
5.2.2.2 Step 2
The second step of the analysis is fitting the meta-analytic random-effects
model for the parameter(s) of interest. We only consider a model for t50.
However, the same approach could be applied to the analysis of the slope and
maximum germination parameters.
Specifically, we consider a model that can be formulated as follows.
blackgrass.parms[["BioDepthTemp"]] <-
with(blackgrass.parms, interaction(Bio, Depth, Temp))
We use the function rma.mv() in the package metafor (Viechtbauer, 2010) for
estimating the random-effects meta-analytic model. The default method for
estimation is REML.
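The model fit itself does not appear above; its call can be reconstructed from the glht() output shown later in this subsection:

```r
# Random-effects meta-analytic model with random effects following
# the hierarchical structure Exp/Pot of the experimental design
blackgrass.t50.mm <- rma.mv(yi = t50,
                            V = (t50.se)^2,
                            mods = ~ BioDepthTemp - 1,
                            random = ~ 1 | Exp/Pot,
                            data = blackgrass.parms)
```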
The specification Exp/Pot ensures that the hierarchical structure of the ex-
perimental design is also incorporated in the statistical analysis. A warning
message is issued because missing values occurred and were left out of the
analysis. It reflects that for some pots the three-parameter log-logistic model
could not be fitted. However, as pointed out previously, it may still be possible
to fit models for these pots manually.
To obtain a condensed summary output with estimates, standard errors,
and confidence intervals, the coef() method may be used.
The summary output (see below) also contains the estimated standard de-
viations for the experiment and pot-specific random effects. These estimates
would be useful to report in a publication to give some insight into how
random variation is split into between-experiment and between-pot variation (we
observe that the standard deviation describing the between-pot variation is
approximately a factor of 3 larger than the standard deviation describing the
between-experiment variation).
summary(blackgrass.t50.mm)[["sigma2"]]
All pairwise comparisons with unadjusted p-values are then calculated (but
not shown) using glht().
summary(blackgrass.allpairwise,
test = adjusted(type = "none"))
targetedPairWiseComp <-
c("BioDepthTempR.0.10 - BioDepthTempS.0.10 = 0",
"BioDepthTempR.1.10 - BioDepthTempS.1.10 = 0",
"BioDepthTempR.3.10 - BioDepthTempS.3.10 = 0",
"BioDepthTempR.6.10 - BioDepthTempS.6.10 = 0",
"BioDepthTempR.0.17 - BioDepthTempS.0.17 = 0",
"BioDepthTempR.1.17 - BioDepthTempS.1.17 = 0",
"BioDepthTempR.3.17 - BioDepthTempS.3.17 = 0",
"BioDepthTempR.6.17 - BioDepthTempS.6.17 = 0")
The names of combinations as found in the summary output of the model fit
should be used. It is also important to include -1 in the model specification
as otherwise the parameterization will not allow the above specification of
contrasts. We can use glht() and the corresponding summary() method to
show the comparisons.
blackgrass.targeted.pairwise <- glht(blackgrass.t50.mm,
linfct = targetedPairWiseComp)
summary(blackgrass.targeted.pairwise)
##
## Simultaneous Tests for General Linear Hypotheses
##
## Fit: rma.mv(yi = t50, V = (t50.se)^2, mods = ~BioDepthTemp - 1,
## random = ~1 | Exp/Pot, data = blackgrass.parms)
##
## Linear Hypotheses:
## Estimate Std. Error z value
## BioDepthTempR.0.10 - BioDepthTempS.0.10 == 0 28.043 15.186 1.847
## BioDepthTempR.1.10 - BioDepthTempS.1.10 == 0 25.177 14.372 1.752
## BioDepthTempR.3.10 - BioDepthTempS.3.10 == 0 56.828 13.189 4.309
## BioDepthTempR.6.10 - BioDepthTempS.6.10 == 0 25.657 23.846 1.076
## BioDepthTempR.0.17 - BioDepthTempS.0.17 == 0 -1.010 14.119 -0.072
## BioDepthTempR.1.17 - BioDepthTempS.1.17 == 0 14.602 9.644 1.514
## BioDepthTempR.3.17 - BioDepthTempS.3.17 == 0 15.867 9.185 1.728
## BioDepthTempR.6.17 - BioDepthTempS.6.17 == 0 34.665 13.697 2.531
## Pr(>|z|)
## BioDepthTempR.0.10 - BioDepthTempS.0.10 == 0 0.414905
## BioDepthTempR.1.10 - BioDepthTempS.1.10 == 0 0.485949
## BioDepthTempR.3.10 - BioDepthTempS.3.10 == 0 0.000131 ***
## BioDepthTempR.6.10 - BioDepthTempS.6.10 == 0 0.929323
## BioDepthTempR.0.17 - BioDepthTempS.0.17 == 0 1.000000
## BioDepthTempR.1.17 - BioDepthTempS.1.17 == 0 0.671768
Benchmark doses are in a sense parallel to the concept of effective doses (see
Section A.13) (Ritz et al., 2013a), but while the latter usually occurs in a
dose-region with a reasonable amount of data available, the benchmark dose
estimation is essentially an interpolation method for the low-dose region where
little or no data is present.
To define a benchmark dose (BMD) we first need to define the background
level, p0 , as the probability of an adverse response for an unexposed popula-
tion. This level could be taken from the literature, estimated from data, or
pre-specified at a fixed level such as 0.05. A benchmark response or benchmark
risk (BMR) is a small increase above the background level in the probability
of an adverse event. The BMD is the dose eliciting a response equal to BMR
on average, assuming some dose-response model.
Usually risk assessment is not based on the BMD itself but on the bench-
mark dose lower limit (BMDL), which is defined as the lower limit of the
confidence interval for the BMD (Crump, 1984). As the estimate of BMDL is
partly determined by the uncertainty of the BMD estimate, it penalizes poor
designs and analysis strategies.
The BMDL may be found using different approaches. The simplest ap-
proach proposed by Crump (Crump, 1984) is using a one-sided Wald-type
confidence interval. However, it may result in negative BMDLs. One simple
solution may be to combine the Wald-type confidence intervals with a trans-
formation (typically the logarithm) to avoid negative values (Buckley et al.,
2009; Namata et al., 2008; Moon et al., 2013). Other approaches for estimat-
ing the BMDL include finding the dose associated with the upper limit of the
confidence band of the fitted dose-response curve (inverse regression) (Buckley
et al., 2009; Fang et al., 2015), different bootstrap strategies (Buckley et al.,
2009; Piegorsch et al., 2012, 2014; Zhu et al., 2007) and profile likelihood in-
tervals (Yu and Catalano, 2005; Izadi et al., 2012; Fox et al., 2017; Ringblom
et al., 2014).
In this chapter, we use the following extension packages:
library(drc)
library(devtools)
install_github("DoseResponse/drcData")
library(drcData)
install_github("DoseResponse/bmd")
library(bmd)
library(metafor)
library(sandwich)
BMR = f(BMD, β) − p0    (6.1)
head(echovirus)
As seen from the data, more people were exposed to the lower doses than
to the higher. This has two major advantages: Fewer people will experience
an adverse effect and we get more information about the low dose area, which
we are interested in when considering benchmark doses.
The purpose of this study was to estimate the BMD10 , i.e., to find the
safe dose associated with a predefined acceptable level, BMR = 10% increase
over the background level, of pathogen-specific infections. For the dataset
echovirus, this involves extrapolation, which we would usually recommend to
avoid. However, in the present case one could argue that we also know that
no one will be infected unless they are exposed to the specific pathogen. This
means we have the extra information that a potential dose 0 would result in
no infections. In practice, this information is incorporated in our model fitting
by assuming the background level is 0. Accordingly, we fit a two-parameter
log-logistic model.
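The model fit pathogen.m1 used below might be obtained as follows (the column names of echovirus are assumptions, as the head() output is not shown):

```r
# Sketch: two-parameter log-logistic model for binomial dose-response data;
# fixing the lower limit at 0 encodes a background level of 0
pathogen.m1 <- drm(infected/total ~ dose,
                   weights = total,
                   data = echovirus,
                   fct = LL.2(),
                   type = "binomial")
```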
In this case, we are interested in finding the dose that results in 10% being
infected. Figure 6.1 shows the data together with the estimated dose-response
model (R lines for the plot are provided in Subsection C.2.1). The figure
illustrates how BMD can be found as the dose resulting in the level of risk
determined by the pre-specified level of BMR.
We use the package bmd to estimate the BMD and the corresponding
BMDL as follows:
bmd(pathogen.m1,
0.10,
def = "additional",
backgType = "modelBased")
## BMD BMDL
## 90.32084 10.0272
The estimated BMD resulting in 10% being infected is 90.3 pfu with a BMDL
of 10.0 pfu based on a Wald-type confidence interval that relies on asymptotic,
large-sample results on the behaviour of parameter estimates.
The default method for finding BMDL in bmd() is using the lower limit
of Wald-type confidence intervals. An alternative approach for estimating
BMDL is to use a bootstrap method, which does not, to the same degree,
depend on asymptotic results. We choose a parametric bootstrap approach
where new datasets are generated from a binomial distribution with
parameters (Ni, Yi/Ni) for each dose, i, separately. Here Ni refers to the total number
in dose group i and Yi refers to the number of infected in dose group i.
FIGURE 6.1
Two-parameter log-logistic model fitted to the dataset echovirus. With a back-
ground level of 0, BMD is the dose associated with the risk equal to the
pre-specified BMR of 0.10.
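The bootstrap call producing the result below might be as follows (the arguments mirror the BCa call shown later, minus the interval type, and are an assumption):

```r
# Sketch: parametric bootstrap BMDL with the default percentile interval
bmdBoot(pathogen.m1,
        0.10,
        def = "additional",
        backgType = "modelBased",
        bootType = "parametric",
        R = 1000)
```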
## BMD BMDL
## 90.32084 28.78989
The BMDL found by the bootstrap method is much higher than the BMDL
we found above. It may be caused by the relatively few doses in the present
study and the fact that we work with binomial data. A simple histogram of
bootstrap estimates (Figure 6.2) shows that the distribution is skewed and
FIGURE 6.2
Histogram of the distribution of the bootstrap estimates used for estimating
BMDL for the dataset echovirus.
further investigations will reveal a dependence between the estimate and the
bootstrap standard error. This phenomenon is well known when working with
distributions like the binomial where the variance is a function of the mean
(Hesterberg, 2015). Consequently, using the percentile bootstrap confidence
interval can give misleading estimates of the BMDL.
As an alternative to the default percentile interval, the function bmdBoot()
has an option called “BCa” for the argument bootInterval, giving a
bias-corrected and accelerated bootstrap interval (DiCiccio and Efron, 1996).
bmdBoot(pathogen.m1,
0.10,
def = "additional",
backgType = "modelBased",
bootType = "parametric",
R = 1000,
bootInterval = "BCa")
## BMD BMDL
## 90.32084 27.51616
The corrected bootstrap interval results in almost the same BMDL, indicating
that the BCa adjustment may not be needed here.
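The next output block comes from a model fit named carbendazim.m1 (a four-parameter log-logistic model, per Figure 6.3) using the additional-risk definition; the call does not appear here but can be reconstructed by analogy with the excess-risk call that follows:

```r
# Sketch: BMD for an additional risk of 0.01 over the model-based background
bmd(carbendazim.m1,
    0.01,
    def = "additional",
    backgType = "modelBased")
```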
## BMD BMDL
## 756.442 683.1871
Alternatively, we could consider the excess risk definition. Still using the
estimated background level, BMD is the dose associated with a proportion of
cells being damaged of 0.01816 (0.01·(1-0.00825)+0.00825).
bmd(carbendazim.m1,
0.01,
def = "excess",
backgType = "modelBased")
## BMD BMDL
## 754.9196 682.057
plot(carbendazim.m1,
broken = TRUE,
xlab = "Carbendazim (ng/ml)",
ylab = "Proportion of damaged cells")
FIGURE 6.3
Four-parameter log-logistic model fitted to the dataset carbendazim.
For this example with a low background risk, the difference between
additional and excess is small. Looking at the definition of BMD in (6.1)
and (6.2) this is not surprising: for p0 ≈ 0 they both reduce to BMR = f(BMD, β).
Finally, if we knew from the literature that the background level is actually
0.01, then the BMD associated with a BMR = 0.01 is found by specifying a
user-defined background level, which will overrule any estimated background
level. The BMD is then the dose associated with a proportion of cells being
damaged equal to 0.0199 (0.01·(1-0.01)+0.01).
bmd(carbendazim.m1,
0.01,
def = "excess",
backgType = "absolute",
backg = 0.01)
## BMD BMDL
## 785.8377 703.7944
Finding the BMD corresponding to a BMR = 0.1 using excess risk is then
straightforward.
BMD.NTP$Results
## BMD BMDL
## 23.90412 20.651
The data are from a National Toxicology Program (NTP) study (National
Toxicology Program, 2006). Other studies, however, examined the same end-
points using the same species, sex, and exposure method. For instance, the
dataset below is from a similar study examining the effect of TCDD on liver
tumors in female Sprague-Dawley rats (Kociba et al., 1978).
head(TCDD)
BMD.Kociba$Results
## BMD BMDL
## 4.795546 1.175081
meta.m1
##
## Random-Effects Model (k = 2; tau^2 estimator: REML)
##
## tau^2 (estimated amount of total heterogeneity): 178.1907
## (SE = 258.1913)
## tau (square root of estimated tau^2 value): 13.3488
## I^2 (total heterogeneity / total variability): 97.60%
## H^2 (total variability / sampling variability): 41.70
##
## Test for Heterogeneity:
## Q(df = 1) = 41.7000, p-val < .0001
##
## Model Results:
##
## estimate se zval pval ci.lb ci.ub
## 17.5346 10.0634 1.7424 0.0814 0.9818 34.0874 .
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
By combining the two studies we find that the estimated BMD is equal to
17.53 ng/kg and the corresponding estimated BMDL is equal to 0.98 ng/kg.
p0 = 1 − Φ((x0 − f(0, β))/σ)    (6.4)
head(GiantKelp)
## tubeLength dose
## 1 19.58 0.0
## 2 18.75 0.0
## 3 19.14 0.0
## 4 16.50 0.0
## 5 17.93 0.0
## 6 18.26 5.6
plot(kelp.m1,
type = "all",
broken = TRUE,
xlab = expression(paste("Copper ", mu, "g/L",sep="")),
ylab = "Length germination tube (mm)")
FIGURE 6.4
Four-parameter log-logistic model fitted to the dataset GiantKelp.
bmd(kelp.m1,
0.1,
backgType = "absolute",
backg = 14,
def = "hybridAdd")
## BMD BMDL
## 12.30724 5.248608
We could also have used the cut-off for dichotomizing the continuous re-
sponse variable. Then everything would seemingly become simpler in the sense
that we could stick to the binomial dose-response model where BMD is per-
haps more conveniently defined.
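The dichotomized fit kelp.m2 might be constructed as in the following sketch (an assumption; the 14 mm cut-off matches the backg value used above, and the column names come from head(GiantKelp)):

```r
# Dichotomize: a germination tube shorter than 14 mm counts as a response
GiantKelp$short <- as.numeric(GiantKelp$tubeLength < 14)
# Two-parameter log-logistic model for the resulting binomial data
kelp.m2 <- drm(short ~ dose,
               data = GiantKelp,
               fct = LL.2(),
               type = "binomial")
```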
bmd(kelp.m2,
0.1,
backgType = "modelBased",
def = "additional")
## BMD BMDL
## 9.278657 -53.68048
The estimated BMD becomes somewhat smaller, but the estimated BMDL
becomes much smaller, taking on a negative value. So, by dichotomizing the
data, we lose much information. In this example so much information is lost
that we in principle are not able to say anything about the BMD; it could be
any value.
bmd(kelp.m1,
0.1,
backgType = "hybridSD",
def = "hybridAdd")
## BMD BMDL
## 8.878006 2.542278
Finally, looking at the residual plot in Figure 6.5 we may doubt the validity
of the assumption of variance homogeneity.
plot(resid(kelp.m1) ~ predict(kelp.m1),
ylab = "Residuals",
xlab = "Predicted")
abline(h = 0)
FIGURE 6.5
Residual plot for the four-parameter log-logistic model fitted to the dataset
GiantKelp.
## BMD BMDL
## 8.878006 1.275386
In this case the robust standard errors are larger and the estimated BMDL
becomes smaller.
plot(aconiazide.m1,
broken = TRUE,
type = "all",
xlab = "Aconiazide (mg/kg)",
ylab = "Weight change (g)")
FIGURE 6.6
Four-parameter log-logistic model fitted to the dataset aconiazide.
We use again the hybrid approach to estimate the BMD and BMDL for a
BMR = 0.05 but this time the background risk is based on a 3 SD cut-off. As an
alternative to the Wald-type confidence intervals used in the previous example,
we use an inverse regression approach (see Section A.9 and Section A.10). The
function bmd() is specified as follows.
bmd(aconiazide.m1,
0.05,
backgType = "hybridSD",
backg = 3,
def = "hybridAdd",
interval = "inv")
## BMD BMDL
## 97.61672 68.73968
bmd(aconiazide.m2,
0.05,
backgType = "hybridSD",
backg = 3,
def = "hybridAdd",
interval = "inv")
## BMD BMDL
## 97.24473 75.20937
BMD_MA = Σ_{k=1}^{K} w_k · BMD_k

w_k = exp(−AIC_k) / Σ_{k=1}^{K} exp(−AIC_k)
FIGURE 6.7
Three different two-parameter models fitted to the dataset echovirus.
Figure 6.7 indicates that the Weibull model may be the better choice in
this case (R lines for the plot are provided in Subsection C.2.1).
Besides the visual assessment of the different model fits, AIC may also be
used for comparing the model fits.
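Assuming the log-normal and Weibull fits were obtained analogously to pathogen.m1 (with the model functions LN.2() and W2.2()), the comparison can be sketched as:

```r
# Fit the two alternative two-parameter models (these calls are assumptions)
pathogen.LN.2 <- update(pathogen.m1, fct = LN.2())
pathogen.W2.2 <- update(pathogen.m1, fct = W2.2())
# Compare the three model fits by AIC
AIC(pathogen.m1, pathogen.LN.2, pathogen.W2.2)
```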
## df AIC
## pathogen.m1 2 18.31155
## pathogen.LN.2 2 17.99485
## pathogen.W2.2 2 16.80155
Based on the AIC values (smaller is better), the Weibull model provides
the best fit to the data. As differences between AIC values are small (less than
10), one might suspect that the choice of model has only little impact on the
resulting estimated BMD and BMDL values. However, when comparing the
results from the three model fits, large differences are found between estimated
BMD and BMDL values.
bmd.LL.2$Results
## BMD BMDL
## 90.32084 10.0272
bmd.LN.2$Results
## BMD BMDL
## 102.3873 19.13779
bmd.W2.2$Results
## BMD BMDL
## 60.85477 -5.673325
Notice that the BMDL estimated from the Weibull model is –5.67. A
negative estimated BMDL is not meaningful since a dose cannot be < 0. In practice,
a negative value could be truncated at 0, meaning that we cannot be sure that
any dose above 0 results in an added risk less than 10%. The possibility of
getting negative values is a drawback of using Wald-type confidence intervals for
estimating BMDL. An alternative approach is to use inverse regression. Esti-
mating BMD and BMDL using inverse regression results in positive estimates
for all 3 models.
bmd.LL.2.inv$Results
## BMD BMDL
## 90.32084 40.08553
bmd.LN.2.inv$Results
## BMD BMDL
## 102.3873 49.28374
bmd.W2.2.inv$Results
## BMD BMDL
## 60.85477 22.06095
The Weibull model still results in a much lower estimated BMDL compared
to the other models. In this case, both the choice of dose-response model
function and how to derive the BMDL really matters for the conclusion.
Instead of reporting the BMDL from a single “best fitting” model as we
did above, we can use model averaging. Choosing AIC weights results in the
following weights, which show that all three models will contribute to the
weighted average.
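Using the weight formula given earlier, the AIC weights can be computed directly (a sketch; subtracting the minimum AIC before exponentiating is a numerically stable way to obtain the same normalized weights):

```r
# AIC weights: w_k proportional to exp(-AIC_k), normalized to sum to 1
aicVals <- AIC(pathogen.m1, pathogen.LN.2, pathogen.W2.2)$AIC
aicWeights <- exp(-(aicVals - min(aicVals))) /
  sum(exp(-(aicVals - min(aicVals))))
aicWeights
```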
Model averaging can be carried out using the function bmdMA(), which takes
a list of the models as its first argument. The second argument specifies
the weights to be used. These can be directly specified by the user or indirectly
as shown below. Arguments for how BMD and BMR are specified follow the
style of the function bmd(). Finally, we need to specify how to estimate the
BMDL. The choice type = "Kang" results in an estimated BMDL being the
simple weighted average of the estimated BMDL values from the separately
fitted candidate models.
bmdMA(modelList = list(pathogen.m1,pathogen.LN.2,pathogen.W2.2),
modelWeights = "AIC",
bmr = 0.1,
def = "additional",
backgType = "modelBased",
interval = "inv",
type = "Kang")
## BMD_MA BMDL_MA
## 73.38836 30.08932
## BMD_MA BMDL_MA
## 98.46404 70.32581
FIGURE 6.8
Four different three-parameter models fitted to the dataset aconiazide.
## BMD_MA BMDL_MA
## 98.46404 70.32321
FIGURE 6.9
Four different three-parameter models fitted to the dataset aconiazide plot-
ted together with the model-averaged estimated dose-response curve.
bmr = 0.05,
backgType = "hybridSD",
backg = 3,
def = "hybridAdd",
type = "curve")
## BMD_MA BMDL_MA
## 98.76187 70.72517
The effects between the groups are modeled on the level of the individual
parameters for each group
β_j = A_j β̃ + B_j b_j
residuals ε_ij ∼ N(0, σ²), we can assume that the group effects also follow a
normal distribution

b_j ∼ N(0, Ψ)
summarizing their effect on each parameter as a variance component. With
this assumption of modeling the individual parameters as random effects, the
variance on the scale of each parameter is decomposed into between- and
within-group variability. Further, the parametrization of the individual effects
through variance components allows us to estimate population effects from
individual samples carrying different amounts of information, e.g., unbalanced
designs with different numbers of observations for each individual or different
dose allocations.
Estimating the unknown parameters in the hierarchical nonlinear model
by maximum likelihood is a challenge. In Appendix A.7 two-stage estimators
for β are discussed.
library(devtools)
install_github("DoseResponse/drcData")
install_github("DoseResponse/drc")
install_github("DoseResponse/medrc")
library(medrc)
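The summary output that follows comes from a two-stage fit to the vinclozolin data; its call (truncated in the output) might look like this sketch, with the formula, data, and model function matching the medrm() call shown later:

```r
# Sketch: two-stage meta-analysis fit; curves are grouped by assay (exper)
vinclozolin.meta <- metadrm(effect ~ conc,
                            data = vinclozolin,
                            fct = LL.3(),
                            ind = exper)
summary(vinclozolin.meta)
```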
##
## Two-stage meta-analysis dose-response model
## Model fitted: Log-logistic (ED50 as parameter) with lower
## limit at 0
##
## Call:
The estimates for the population parameters, slope (b), asymptote (d), and
ED50 (e), are presented in the Coefficients section of the summary out-
put, together with corresponding standard errors and hypothesis tests of each
coefficient being different from zero. The Variance estimates section contains
the estimated between-assay variance on the scale of each of the three param-
eters, together with the estimated standard deviation as the square root of
the variance.
Functions to obtain the effective dose in the package drc can be directly
applied to the medrc object; here, we are looking at the estimated ED25,
ED50, and ED75 and their corresponding confidence intervals derived by the
delta method.
##
## Estimated effective doses
##
## Estimate Std. Error Lower Upper
## e::25 0.0135045 0.0083568 -0.0043075 0.0313165
## e::50 0.1031208 0.0360900 0.0261969 0.1800448
## e::75 0.7874328 0.1531680 0.4609629 1.1139026
The effective dose estimates are derived from the estimated population pa-
rameters and, therefore, these estimates may also be interpreted as population
averages.
As an alternative to the two-stage approach, the full likelihood can be
maximized with the Lindstrom-Bates algorithm using function medrm().
mod <- medrm(effect ~ conc,
data = vinclozolin,
fct = LL.3(),
random = b + d + e ~ 1|exper,
start = c(0.5, 2000, 0.1))
summary(mod)
The output is structured in the same way as for the metadrc object with
the population parameter estimates in the Fixed effects section and the
between-assay standard deviation estimates in the Random effects section.
The pairwise correlations between random effects are presented as the lower
triangular part of the correlation matrix directly beside the estimated variance
components.
The function ED() can be directly applied to the mixed-effects model ob-
ject:
##
## Estimated effective doses
##
## Estimate Std. Error Lower Upper
## e:1:25 0.0131600 0.0071288 -0.0011895 0.0275096
## e:1:50 0.1013562 0.0313061 0.0383402 0.1643722
## e:1:75 0.7806264 0.1410174 0.4967731 1.0644798
For the comparison of the two herbicides, dose-response curves were fitted as-
suming a three-parameter log-logistic model (Section B.1.1.1) with a separate
set of slope, upper asymptote, and ED50 parameters for each of the two treat-
ments. Individual assay effects were included on the slope, upper asymptote,
and ED50 parameters to model the between-assay variability in the different
model scales. Using the information about the between-assay variability is es-
pecially advantageous as the dose levels for the two herbicides did not cover
the same dose range.
The curveid argument in function drm() defines a set of model parame-
ters for every level of a categorical predictor variable, which groups together
observations within an individual. When we want to estimate a separate set
of parameters for each herbicide, we need to extend the indicator matrix Aj
by adding indicator variables for multiple curves on the between-individual level
of the second stage. The metadrm() function contains an argument cid2 that
lets us define a curve identifier to group specific individual curves together.
Similar to the argument pmodels in function drm(), the argument pms2 al-
lows us to define different fixed-effects design matrices for each population
parameter on the between-individual level of the second stage.
metaspinach <- metadrm(SLOPE ~ DOSE,
                       data = spinach,
                       fct = LL.3(),
                       ind = CURVE,
                       cid2 = HERBICIDE,
                       struct = "UN")
summary(metaspinach)
##
## Two-stage meta-analysis dose-response model
## Model fitted: Log-logistic (ED50 as parameter) with lower
## limit at 0
##
## Call:
## metadrm(formula = SLOPE ~ DOSE, fct = LL.3(), ind = CURVE,
## data = spinach, cid2 = HERBICIDE, struct = "UN")
##
## Variance estimates:
## estim sqrt
## tau^2.1 0.0005 0.0221
## tau^2.2 0.1856 0.4308
## tau^2.3 0.0000 0.0009
##
## rho.b:(I rho.d:(I rho.e:(I
## b:(Intercept) 1 1.0000 -1.0000
## d:(Intercept) 1.0000 1 -1.0000
## e:(Intercept) -1.0000 -1.0000 1
##
##
## Coefficients:
## Estimate Std.Err t value Pr(>|t|)
## b:bentazon 0.5021927 0.0252658 19.8764 9.590e-09 ***
## b:diuron 1.6572431 0.1154678 14.3524 1.654e-07 ***
## d:bentazon 1.3091412 0.2495357 5.2463 0.0005303 ***
## d:diuron 1.9858581 0.3056731 6.4967 0.0001119 ***
## e:bentazon 1.7452319 0.1716524 10.1672 3.116e-06 ***
## e:diuron 0.2086242 0.0091462 22.8100 2.840e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
EDcomp(metaspinach,
       percVec = c(15, 50, 85),
       percMat = rbind(c(1, 1),
                       c(2, 2),
                       c(3, 3)),
       interval = "delta")
##
## Estimated ratios of effect doses
##
## Estimate Lower Upper
## bentazon/diuron:15/15 0.75332 0.38388 1.12275
## bentazon/diuron:50/50 8.36543 6.32764 10.40322
## bentazon/diuron:85/85 92.89634 47.22788 138.56480
The herbicide diuron achieves effective inhibition at much lower dose levels
than bentazon, although a comparison of relative effective dose levels is difficult
to interpret when the upper asymptotes differ this much.
The same three-parameter log-logistic model can be fitted with the
medrm() function, using a full likelihood approach.
modspinach <- medrm(SLOPE ~ DOSE,
                    curveid = b + d + e ~ HERBICIDE,
                    data = spinach,
                    fct = LL.3(),
                    random = b + d + e ~ 1 | CURVE,
                    start = c(0.5, 1, 1.5, 1.5, 1.5, 0.3))
round(summary(modspinach$fit)$tTable[, 1:2], 3)
## Value Std.Error
## b.HERBICIDEbentazon 0.503 0.028
## b.HERBICIDEdiuron 1.659 0.087
## d.HERBICIDEbentazon 1.311 0.197
## d.HERBICIDEdiuron 1.986 0.241
## e.HERBICIDEbentazon 1.848 0.205
## e.HERBICIDEdiuron 0.209 0.007
The effective doses can be compared in a similar way as for the two-stage
estimation, resulting in comparable estimates.
EDcomp(modspinach,
       percVec = c(15, 50, 85),
       percMat = rbind(c(1, 1),
                       c(2, 2),
                       c(3, 3)),
       interval = "fieller")
##
## Estimated ratios of effect doses
##
## Estimate Lower Upper
## bentazon/diuron:15/15 0.80056 0.39510 1.23745
## bentazon/diuron:50/50 8.86013 6.90605 10.89971
## bentazon/diuron:85/85 98.05904 63.37850 135.24171
The R code constructing the plot in Figure 7.1, which shows the predicted
curves for each individual assay together with the population prediction, is
given in Section C.3.1.
FIGURE 7.1
Scatterplot with the individual assay and population predictions for the two
herbicides bentazon and diuron.
All eight herbicide preparations have essentially the same mode of
action in the plant; they all act like the plant auxins, which are plant regulators
that affect cell elongation and other essential metabolic pathways.
head(auxins)
## dryweight dose replicate herbicide formulation
## 1 1.51 0.000 1 control control
## 2 1.43 0.000 1 control control
## 3 0.05 1.000 1 MCPA tech
## 4 0.06 0.500 1 MCPA tech
## 5 0.15 0.250 1 MCPA tech
## 6 0.40 0.125 1 MCPA tech
The fixed-effects design matrix Aj is defined by supplying a list to the argument
fixed, containing a formula for each of the model parameters. The 1 codes
for the combined intercept of the common control together with the reference
group: the herbicide MCPA as a technical-grade material. All further terms
denote the effect of each other treatment level relative to this intercept on the
scale of each model parameter.
The function meLL.3() is the medrc version of the three-parameter log-
logistic model, which is compatible with the nlme package machinery.
The Intercept denotes the estimated slope, asymptote, and ED50 for the
MCPA reference as a technical-grade material. The h24D, mp, and dp coefficients
show the largest effects on the ED50 scale; all herbicides as technical-grade
materials reach the ED50 at a higher dose level than MCPA. Most striking,
however, is an interaction effect on the ED50 scale (e.dpcomm), showing that
the commercial formulation of dichlorprop results in an average increase in
ED50 compared with technical-grade dichlorprop (the dp effect).
FIGURE 7.2
Scatterplot with the individual replicate and population predictions for the
four herbicides MCPA, 2,4-D, mecoprop, and dichlorprop, shown in separate
panels for the technical (tech) and commercial (comm) formulations.
The ggplot2 package is used to draw the scatterplot in Figure 7.2, adding
lines for the predictions on the individual replicate and population level. The
R code to construct the plot can be found in Section C.3.2.
sica oleracea var. italica Plenck). Two stress treatments (not watered and a
watered control) are randomly assigned to four plants per genotype (2 per
treatment) resulting in 192 plants in total. For the genotypes 5, 17, 31, and
48, an additional 12 plants (6 per treatment) are included in the completely
randomized design, which results in a total of 240 plants. For each plant, the
length of the youngest leaf at the beginning of the experiment is measured
daily for a period of 16 days. For the additional 12 plants of the 4 genotypes,
the leaf water potential was measured as a secondary endpoint (omitted here);
due to these destructive measurements, some dropouts occurred.
The growth curves are assumed to follow a three-parameter logistic model
(Section B.1.1.1), setting the lower asymptote to zero. Population parameters
are defined as an intercept at the control treatment and as the difference in
treatment levels when stressing the plants. For each genotype, an individual
intercept and stress effect is assumed as random effects with an unstructured
covariance matrix. The correlation between repeatedly observed leaf lengths
within a plant is estimated assuming an autoregressive AR1 structure for the
residuals.
brmod <- nlme(LeafLength ~ meL.3(Day, b, d, e), data = broccoli,
              fixed = b + d + e ~ 1 + Stress,
              random = b + d + e ~ 1 + Stress | Genotype,
              correlation = corAR1(form = ~ Day | Genotype/ID),
              start = c(-0.4, 0, 15, 0, 5, 0))
summary(brmod)
The summary output shows the estimated population parameters as fixed ef-
fects for the control group (intercept), and the difference between parameters
for drought stress plants compared to the control. The estimated negative
population effects indicate a slower growth with a smaller slope and the in-
flection point being located at an earlier time. Additionally, the estimated
effect for the asymptote shows that the drought stress treatment results in
smaller plants. On the scale of each of these parameters, the between-genotype
variability is estimated; the genotype standard deviation is presented in the
Random effects section together with the pairwise correlation coefficients.
The dependency between residuals from a single plant is parametrized as the
coefficient of the autoregressive AR1 structure. The estimate of the coefficient
is near one, indicating highly correlated residuals within a plant.
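As a minimal sketch (assuming the fitted object brmod from above), the estimated AR(1) coefficient can be extracted from the correlation structure of the nlme fit:

```r
## Sketch: extract the estimated AR(1) coefficient (phi) from the
## correlation structure of the nlme fit brmod defined above
library(nlme)
phi <- coef(brmod$modelStruct$corStruct, unconstrained = FALSE)
phi  # a value near 1 indicates strongly correlated within-plant residuals
```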
Additionally, the predicted random effects can be obtained with the func-
tion ranef() or alternatively the linear function of fixed and random effects
with the function coef() (not shown here). Instead, we can visualize the
genotype-stress interaction by plotting the predicted leaf lengths on the level
of each genotype-stress combination in Figure 7.3. The corresponding R code
can be found in Section C.3.3.
FIGURE 7.3
Plant-specific growth curve predictions, comparing drought-stressed broccoli
plants vs. a watered control for each genotype.
The drought stress mainly has an effect on the upper asymptote parameter,
that is, the stressed plants are smaller than the watered plants. But we can also
see a large negative correlation between the genotype-specific stress effects and
the genotype-specific asymptote at the control; hence, genotypes with larger
leaves also show a larger response to the drought stress treatment.
Appendix A
Estimation
Once a suitable dose-response model function has been found, the next step
is to choose a suitable estimation procedure, which should ideally exploit the
type of response as much as possible.
Least squares estimation, which is a special case of maximum likelihood
estimation, should be used for continuous response variables that are approx-
imately normally distributed with the same standard deviation for all doses
(variance homogeneity) (Meister and van den Brink, 2000). Section A.1 below
provides more details. More general maximum likelihood estimation, involving
distributional assumptions different from the normal distribution, may be
more appropriate in case the response is binary, a sum of binary variables
(Piegorsch and Bailer, 2005, pp. 172–179), or a count with a substantial por-
tion of zeros (Kerr and Meador, 1996; Ritz and Van der Vliet, 2009). More
details are provided in Section A.2 below. In practice, least squares estimation
is often applied also for responses that are not normally distributed. This ap-
proach makes sub-optimal use of the information available in the data, often
resulting in a loss in efficiency seen as unnecessarily large standard errors (Szöcs
and Schäfer, 2015). Therefore, it is important to ensure alignment between the
type of response and the estimation procedure used.
To deal with model misspecification the results from maximum likelihood
estimation or even the entire estimation procedure may be modified (see Sec-
tion A.3 on transformations, Section A.4 on robust estimation, and Section A.5
on sandwich variance estimators).
Constrained estimation may be needed in case certain restrictions have to
be enforced for certain parameters (see Section A.6). However, surprisingly,
in most cases unconstrained estimation will suffice, i.e., no bounds need to be
imposed on ranges of any parameters even though ranges for some parameters
may be restricted (e.g., ED50/LD50 has to be non-negative), as usually reason-
able parameter estimates will be obtained if data carry sufficient information
about the parameters. Unreasonable parameter estimates often imply lack of
information. It can also happen that the estimation procedure is not successful
in finding the optimal parameter estimates. For dose-response analysis such
problems often happen due to poor choice of starting values for the parameter
estimates: the estimation procedure needs to be provided with good starting
values when searching for the optimal parameter estimates. The package drc
relies heavily on self-starter routines that compute data-driven starting values.
Section A.8 explains the idea in more detail.
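As an illustration (a sketch using the ryegrass dataset shipped with drc, not an example from the text), the self-starter makes explicit starting values unnecessary, although they may still be supplied through the argument start:

```r
library(drc)
## Self-starter: data-driven starting values are computed automatically
m1 <- drm(rootl ~ conc, data = ryegrass, fct = LL.4())
## Manually supplied starting values (hypothetical; order: b, c, d, e)
m2 <- drm(rootl ~ conc, data = ryegrass, fct = LL.4(),
          start = c(1, 0.5, 8, 3))
```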
where the wi are user-specified weights (often left unspecified, i.e., equal to 1),
and β is the vector of all model parameters, i.e., the parameters b, c, d, . . . of
Appendix B.
In drm(), weights are specified through the argument weights and they
should be on the same scale as the response, e.g., expressed as standard de-
viations and not empirical variances. Weights may be used for addressing
variance heterogeneity in the response. However, the transform-both-sides ap-
proach should be preferred instead of using often very imprecisely determined
weights.
Equation (A.1) has to be solved numerically in an iterative manner. One
approach is to use iteratively weighted least squares as is done in nls(), which
is part of the standard installation of R (Ritz and Streibig, 2008). Another
approach is to use a general-purpose minimizer directly, as in drm() where
optim() is used in combination with some pre-scaling of parameters. In our
experience drm() is more robust than nls(); lack of robustness of nls()
has also been pointed out elsewhere (Nash, 2014).
The scaling factor σ̂ is the residual standard error, which is estimated in the
same way as in linear regression. The observed information matrix (the "Hessian")
in equation (A.2) is approximated numerically by optim() upon convergence.
The inverse is not even required to exist: if it is not available, NAs are
returned for the standard errors.
E(Yi) = f(xi, β)
SD(Yi) = √( f(xi, β) + ω(xi, β)² )    (A.8)
Monitoring intervals where no event was observed may be left out of the
analysis as they will not contribute to the estimation (nj = 0 implies a zero
term in the log likelihood); in some cases they have to be left out to achieve
convergence of the estimation procedure.
The implicit assumption of the above log likelihood is that right-censoring
only happens at the end of the experiment, i.e., it is possible to follow all
seeds or organisms until the end of the experiment unless the event of interest
happens. However, it could also happen that some seeds or organisms are
right-censored during the experiment, e.g., seeds are dormant. Thus, the log
likelihood would need to be modified slightly (Ritz et al., 2013b).
Model checking can be done by visually assessing the agreement between
the observed cumulative germination curve and the fitted curve, or by looking
at the residual plot based on cumulative residuals (McCullagh and Nelder,
1989, p. 179).
with the transformed response on the left-hand side and the transformed
model function on the right-hand side. Usually the function gλ is taken to
be the Box-Cox transformation gλ (y) = (y λ − 1)/λ for some suitable choice of
λ ∈ R. The value λ = 1 implies no transformation whilst λ = 0 corresponds
to the logarithm transformation. All other values of λ correspond to power
transformations. It is noteworthy that the Box-Cox transformation may al-
leviate variance heterogeneity and some skewness in the distribution of the
response and thus recover a normal distribution, but it may not remedy other
problems with the distributional assumptions such as counts observed with
ties (Ritz and Van der Vliet, 2009).
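The transformation itself is a one-liner in R; a minimal sketch (the function name bcTrans is ours):

```r
## Box-Cox transformation: log for lambda = 0, scaled power otherwise
bcTrans <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}
bcTrans(2, 1)  # 1, i.e., y - 1: effectively no transformation
bcTrans(2, 0)  # log(2)
```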
The package drc provides a boxcox method, which has been implemented
in much the same way as the corresponding method for linear models available
in the package MASS (Venables and Ripley, 2002). There is, however, a choice
between a profiling approach as used for linear models in MASS or a more
robust analysis of variance (ANOVA) approach where the optimal λ is esti-
mated from a more general ANOVA model, i.e., a linear model, and not from
the specified dose-response model; the latter requires replicate observations
for at least some doses.
Manual specification of the transformation is also possible through the
arguments bcVal and bcAdd, which correspond to λ and C in equation (A.10),
respectively. Using a specific transformation (i.e., a specific value of λ and
possibly C) based on previous experience should be preferred over using a
data-driven choice.
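For instance (a sketch using the ryegrass data from drc), fixing λ = 0 and C = 0 applies a logarithm transformation to both sides:

```r
library(drc)
## Transform-both-sides fit with a fixed log transformation (lambda = 0)
m.tbs <- drm(rootl ~ conc, data = ryegrass, fct = LL.4(),
             bcVal = 0, bcAdd = 0)
summary(m.tbs)
```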
a robust estimation procedure, which will weigh down the influence of such
observations.
In drc robust dose-response analysis with a continuous response is available
through the argument robust using the same models as considered in the case
of robust linear regression (Venables and Ripley, 2002), except for the model
based on Hampel’s ψ, which is currently not implemented. However, it may
be difficult to fit such robust nonlinear models unless accurate starting values
for the model parameters are provided.
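A sketch of such a robust fit, again using the ryegrass data from drc for illustration:

```r
library(drc)
## Robust fit based on least absolute deviations ("median")
m.rob <- drm(rootl ~ conc, data = ryegrass, fct = LL.4(),
             robust = "median")
summary(m.rob)
```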
The estimated “information”-type variance-covariance has the following
form (Huber, 1981; Stromberg, 1993):
cov(β̂) = σ̃² ( ∂²ρ / ∂βp1 ∂βp2 )⁻¹    (A.11)
where the function ρ controls the influence of observations on the estimation
procedure (e.g., ρ(y) = y 2 corresponds to ordinary nonlinear least squares
estimation). Once parameter estimates have been obtained, all methods and
extractors available in drc may be used as if the model fit had been obtained
using maximum likelihood estimation.
2013). Several estimators for β are available, where the most popular among
R users utilizes the Lindstrom-Bates algorithm.
X̂j = ∂fj/∂β̃ᵀ and Ẑj = ∂fj/∂bjᵀ, both evaluated at the current estimates (β̃̂, b̂j).
The precision factor ∆ that is estimated in the second step is again plugged
into the penalized least-squares step.
two steps until convergence. Pinheiro and Bates (2000) and Demidenko (2013)
provide additional information about the algorithm and numerical details of
its implementation in R.
Following Pinheiro and Bates (2000), we can assume the following distri-
bution for the fixed-effects estimator:
β̃̂ ∼ N( β̃, σ² ( Σ_{j=1..M} X̂jᵀ Σj⁻¹ X̂j )⁻¹ ),  where Σj = I + Ẑj ∆⁻¹ ∆⁻ᵀ Ẑjᵀ
Note that these limits are well defined and finite for dose-response models (for
most models in drc they correspond to the parameters d and c, respectively).
For a monotonically increasing mean function the same equation is obtained
by interchanging the limits.
The resulting effective dose is a relative quantity, defined in terms of a
percentage reduction. For instance, ED50 (α = 0.50) is the dose resulting
in a 50% reduction in the average response relative to the lower and upper
limits of f . Such relative effective doses are mostly suitable for continuous
f(EDy0, β) = y0    (A.14)
An absolute effective dose for y0 ∈]c, d[ may always be calculated as some rel-
ative effective dose for a suitably derived α value. In some cases, this approach
will involve parameter estimates of the lower and upper limits, but at present
the variation in these estimates will not be propagated to the estimated effec-
tive dose.
Estimated effective doses are obtained by inserting parameter estimates
and solving equation (A.13) with regard to the derived parameter ED100α.
In drc, the function ED() will calculate estimated effective doses, and the
argument type controls the type of effective dose being calculated: relative
(default) or absolute. Note also that for hormesis models, effective doses
may also be meaningfully defined for some α < 0 (by setting the argument
bound = FALSE in ED()). Model-averaged estimated ED values may be obtained
through the function maED() (Kang et al., 2000).
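Both types of effective dose can be sketched with the ryegrass data from drc (our choice of response levels is arbitrary):

```r
library(drc)
m <- drm(rootl ~ conc, data = ryegrass, fct = LL.4())
ED(m, c(10, 50, 90))          # relative effective doses ED10, ED50, ED90
ED(m, 2, type = "absolute")   # dose at which the mean response equals 2
```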
(Seber and Wild, 1989, p. 337). The parameter c denotes the lower asymptote
or limit of the dose-response model, and the parameter d denotes the upper
asymptote or horizontal limit of the dose-response curve. There will always be
a parameter b, which, in one way or another, will reflect the rate of change of
the dose-response curve between the upper and lower limits, analogous to
the slope coefficient of a simple linear regression model. However, in general,
dose-response models have no equivalent to an intercept and hence there is
no model parameter named a. The number of additional parameters (denoted
. . .) will depend on the choice of F (usually 1–3) and these parameters will be
denoted e, f etc.
In addition, many of the model functions are scale invariant in the sense
that the model itself, usually through the parameter e, which acts as a scaling
factor, accommodates the magnitude of doses. Likewise, many of the model
functions involve the logarithm. We use the natural logarithm and its inverse,
the exponential function (power of e), but most models may also be formulated
using any other logarithm (resulting in a slightly different reparameterization
of the models). For instance, models using the base-10 logarithm are occasion-
ally used.
So, apart from the sign, there is a scaling factor, depending on c, d, and e,
that converts the parameter b into the slope; therefore, the use of the phrase
“relative slope.” Note that the scaling factor may be viewed as a kind of
normalization factor involving the range of the response. As a consequence,
estimated b values often lie in the range 0.5 to 20 in absolute value, roughly
centered around 1, regardless of the assay or experiment generating the data.
The four-parameter log-logistic model may also be parameterized in other
ways than shown in Eq. (B.2). One such alternative parameterization is where
the logarithm of ED50 denoted by ẽ, say, is a model parameter instead of e
Dose-response model functions 179
B.1.2 Extensions
B.1.2.1 Generalized log-logistic models
Generalized log-logistic model functions may be obtained based on cdf’s of the
Burr type III and XII distributions (Pant and Headrick, 2013). These functions
involve one additional asymmetry parameter compared to the four-parameter
log-logistic model and they include the four-parameter log-logistic model as a
special case. Inclusion of the parameter f implies that the parameter e loses
its interpretation as ED50, although ED50 may still be estimated as a derived
parameter.
The five-parameter Burr type III generalized log-logistic model function
was proposed by Finney (1979) for describing continuous dose-response data.
It is also known as the Richards model (Seber and Wild, 1989, pp. 332–333,
Sand et al., 2006). Specifically, the model function is defined as follows:
f(x, (b, c, d, e, f)) = c + (d − c)/(1 + exp[b{log(x) − log(e)}])^f
                      = c + (d − c)(1 + (x/e)^b)^(−f)    (B.9)
where the parameter f , which should be positive, controls the degree of asym-
metry: f < 1 and f > 1 lead to a more rapid or a slower descent towards
the limits (for a more detailed explanation, see below under Weibull models);
see Sand et al. (2006) for an illustration of the different types of asymmetry.
For f = 1 the four-parameter log-logistic model is recovered, i.e., there is no
asymmetry. Gottschalk and Dunn (2005) provide a more detailed description
of the asymmetry.
For binomial data, the special case of the Burr type III generalized log-
logistic model obtained by fixing the lower and upper limits at 0 and 1, respec-
tively, is the so-called convenient three-parameter model proposed by Prentice
(1976); see also Shao (2000); Scholze et al. (2001).
The five-parameter Burr type XII generalized log-logistic model is defined
as follows:
This five-parameter model does not seem to have been used in practice for
fitting dose-response data. Currently, this model is not in drc.
For binomial data there are a number of special cases: the three-parameter
model obtained by fixing the lower and upper limits at 0 and 1 (Prentice,
1976; Stukel, 1988), in another parameterization the related three-parameter
Aranda-Ordaz model (Aranda-Ordaz, 1981), and the so-called approximate
beta-Poisson model for b = 1, c = 0, and d = 1 (e.g., Namata et al., 2008).
at a value larger than 0 and less than or equal to 1. The closer α is to 1 the
steeper the ascent is towards the hormesis peak.
In contrast to the other log-logistic type models, the Brain-Cousens and
Cedergreen-Ritz-Streibig models are sensitive to the magnitudes of the doses,
which may need to be manually up- or downscaled appropriately prior to
model fitting (Belz and Piepho, 2012).
Moreover, these models can only describe decreasing trends, i.e., they are
suitable for modeling so-called inverse j-shaped hormesis. However, modified
Cedergreen-Ritz-Streibig model functions for u-shaped hormesis, describing
increasing dose-response trends, are also available:
f(x, (b, c, d, e, f)) = d − (d − c + f exp(−1/x^α))/(1 + (x/e)^b)    (B.14)
f(x, (b1, b2, c1, d1, d2, e1, e2)) = c1 + (d1 − c1)/(1 + (x/e1)^b1) + d2/(1 + (x/e2)^b2)    (B.15)
The powers p1 and p2 need to be specified in advance (they are not estimated
from the data) and p1 has to be negative, whereas p2 has to be positive.
Following the recommendations of Royston and Altman (1994) the powers
should be chosen among the numbers –2, –1, –0.5, 0, 0.5, 1, 2, 3, e.g., some
choices are (–0.5, 0.5), (–1,1), (–1,2), (–1,3), or (–2,3).
Fractional polynomial models have been proposed as a flexible class of
candidate models for model averaging by including several choices of (p1 , p2 )
(Faes et al., 2007; Namata et al., 2008).
Finally, we note that fractional polynomial models may also be derived for
log-normal and Weibull models (Namata et al., 2008).
where the only restriction on the parameters is that the parameter e has to
be positive.
This is the Weibull growth model considered by Piegorsch and Bailer (2005,
pp. 79–82) (in a slightly different parameterization). In contrast to the log-
logistic model, the parameter e is not equal to ED50, but it is still the location
of the inflection point of the dose-response curve. The parameter b reflects
the steepness of the dose-response curve, being proportional to the slope at
the dose equal to e: the larger the absolute value, the steeper the curve. The
asymmetry can be characterized by comparison to the symmetric log-logistic
model. The dose-response curve descends slowly from the upper limit, but the
curve approaches the lower limit rapidly. In fact, the behaviour of the Weibull
type 1 model at the upper limit is very similar to the behaviour of the log-
logistic model (which can be seen by using the approximation exp u ≈ 1 + u
for u close to 0). The figure shown by McCullagh and Nelder (1989, p. 109)
is insightful in order to understand the differences between the four classes of
models.
B.4.3 NEC
Another built-in model is the so-called “no effect concentration” (NEC) model
proposed by Pires et al. (2002). The corresponding model function involves
four parameters:
f(x, (b, c, d, e)) = d                               if x ≤ e
f(x, (b, c, d, e)) = c + (d − c) exp(b(x − e))       if x > e
This model is a so-called threshold model: the corresponding dose-response
curve is non-differentiable at the dose where the constant low-dose level
(equal to d) abruptly changes into exponential decay. This model is
similar to the hockey stick model (Environment Canada, 2005, pp. 108–110;
Ritz and Streibig, 2008, p. 41), except that linear decay has been replaced by
exponential decay.
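A sketch of fitting the NEC model in drc (using the ryegrass data purely for illustration; whether a threshold model is appropriate for these data is a separate question):

```r
library(drc)
## Four-parameter "no effect concentration" threshold model
m.nec <- drm(rootl ~ conc, data = ryegrass, fct = NEC.4())
summary(m.nec)
```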
This model may be used to describe biphasic dose-response data that exhibit
symmetric behaviour in terms of increase to and decrease from the maximum
response level, i.e., same behaviour for both phases.
The log-Gaussian model is defined in a similar way, but due to the log-
arithmic terms, asymmetry in the behaviour of the two phases (e.g., rapid
increase vs. slower decrease) may be incorporated:
For both models the parameter f may be useful for capturing varying shapes
in the dose-response data.
• Fixing f = 1:

  c + (d − c)/(1 + exp(b(log(x) − log(e))))

  In drc: LL.5(fixed = c(NA, NA, NA, NA, 1)) or abbreviated LL.4()

• Fixing both f = 1 and c = 0:

  d/(1 + exp(b(log(x) − log(e))))
Note that NAs are used to indicate that parameters are to be estimated
from the data. Note also that model parameters come in alphabetical order:
b, c, d, e, f , which is assumed when specifying the argument fixed.
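The equivalence of the two specifications can be checked directly (a sketch with the ryegrass data from drc):

```r
library(drc)
## Both calls fit the same four-parameter log-logistic model
m.fix <- drm(rootl ~ conc, data = ryegrass,
             fct = LL.5(fixed = c(NA, NA, NA, NA, 1)))
m.ll4 <- drm(rootl ~ conc, data = ryegrass, fct = LL.4())
coef(m.fix)  # agrees with coef(m.ll4) up to numerical accuracy
```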
TABLE B.1
List of more commonly used model functions with the corresponding names
in drc.

Model                  No. of parameters   Function in drc
NEC                    4                   NEC.4()
                       3                   NEC.3()
                       2                   NEC.2()
Biphasic with a peak   5                   gaussian()
                       5                   lgaussian()
TABLE B.2
List of more specialized model functions with the corresponding names in drc.

Model                      Variant                     Function in drc
Brain-Cousens              Five-parameter              BC.5()
                           Four-parameter              BC.4()
Cedergreen-Ritz-Streibig   Five-parameter, α = 1       CRSa.5()
                           Five-parameter, α = 0.5     CRSb.5()
                           Five-parameter, α = 0.25    CRSc.5()
                           Four-parameter, α = 1       CRSa.4()
                           Four-parameter, α = 0.5     CRSb.4()
                           Four-parameter, α = 0.25    CRSc.4()
In this appendix we provide R code for the plotting examples where more than
a few lines are needed to obtain the result. We use the following extension
packages:
library(drc)
library(devtools)
install_github("DoseResponse/drcData")
library(drcData)
install_github("DoseResponse/medrc")
library(medrc)
library(ggplot2)
plot(pathogen.m1,
broken = TRUE, bp = 10,
xlab = "Echovirus 12 (pfu)",
ylab = "Proportion infected",
xlim = c(0, 100000),
ylim = c(0, 1),
axes = FALSE)
R code for plots 193
FIGURE C.1
Four-parameter log-logistic model with confidence bands fitted to the dataset
ryegrass (root length in cm against ferulic acid dose in mM).
Likewise, the R lines for showing model averaging in Figure 6.7 in Subsec-
tion 6.3.1 look like this:
The R lines for producing Figure 6.9 in Subsection 6.3.2 are provided below.
AICWeights <- exp(-aconiazide.AIC$AIC) /
  sum(exp(-aconiazide.AIC$AIC))
library(dplyr)
library(tidyr)
pdata <- spinach %>%
group_by(CURVE, HERBICIDE) %>%
expand(DOSE = exp(seq(-5, 5, length = 50)))
With the information about each predicted curve in the new dataframe, the
package ggplot2 is used to add the predictions to a scatterplot of the set of
observations, shown in Figure 7.2.
library(dplyr)
library(tidyr)
pauxdata <- auxins %>%
group_by(replicate, herbicide, formulation,
h24D, mp, dp, comm, h24Dcomm, mpcomm, dpcomm) %>%
expand(dose=exp(seq(-4, 0, length = 25)))
pauxdata <- subset(pauxdata, formulation != "control")
ggplot(broccoli,
aes(y = LeafLength, x = Day,
linetype = Stress, group = ID)) +
geom_line(colour = "grey") +
geom_line(data = pbroc, aes(y = ygeno)) +
facet_wrap(~ Genotype) +
theme_bw()
Bibliography
van der Hoeven, N. (1997). How to measure no effect. Part III: Statistical aspects of NOEC, ECx and NEC estimates. Environmetrics 8, 255–261.
Hothorn, T., Bretz, F. and Westfall, P. (2008). Simultaneous inference in general parametric models. Biometrical Journal 50, 346–363.
van Houwelingen, H. C., Arends, L. R. and Stijnen, T. (2002). Advanced methods in meta-analysis: Multivariate approach and meta-regression. Statistics in Medicine 21, 589–624.
Huber, P. J. (1981). Robust Statistics. John Wiley & Sons, Hoboken.
Inderjit, Streibig, J. C. and Olofsdotter, M. (2002). Joint action of phenolic acid mixtures and its significance in allelopathy research. Physiologia Plantarum 114, 422–428.
Izadi, H., Grundy, J. E. and Bose, R. (2012). Evaluation of the benchmark dose for point of departure determination for a variety of chemical classes in applied regulatory settings. Risk Analysis 32, 830–835.
Jager, T., Albert, C., Preuss, T. G. and Ashauer, R. (2011). General unified threshold model of survival – a toxicokinetic-toxicodynamic framework for ecotoxicology. Environmental Science & Technology 45, 2529–2540.
Jensen, S. M., Andreasen, C., Streibig, J. C., Keshtkar, E. and Ritz, C. (2017). A note on the analysis of germination data from complex experimental designs. Seed Science Research 27, 321–327.
Jensen, S. M. and Ritz, C. (2015). Simultaneous inference for model averaging of derived parameters. Risk Analysis 35, 68–76.
Jensen, S. M. and Ritz, C. (2018). A comparison of approaches for simultaneous inference of fixed effects for multiple outcomes using linear mixed models. Statistics in Medicine 37, 2474–2486.
Jeske, D. R., Xu, H. K., Blessinger, T., Jensen, P. and Trumble, J. (2009). Testing for the equality of EC50 values in the presence of unequal slopes with application to toxicity of selenium types. Journal of Agricultural, Biological, and Environmental Statistics 14, 469–483.
Jiang, X. and Kopp-Schneider, A. (2014). Summarizing EC50 estimates from multiple dose-response experiments: A comparison of a meta-analysis strategy to a mixed-effects model approach. Biometrical Journal 56, 493–512.
Kalaian, H. A. and Raudenbush, S. W. (1996). A multivariate mixed linear model for meta-analysis. Psychological Methods 1, 227–235.
Kang, S.-H., Kodell, R. L. and Chen, J. J. (2000). Incorporating model uncertainties along with data uncertainties in microbial risk assessment. Regulatory Toxicology and Pharmacology 32, 68–72.
Keller, F., Giehl, M., Czock, D. and Zellner, D. (2002). PK-PD curve-fitting problems with the Hill equation? Try one of the 1-exp functions derived from Hodgkin, Douglas or Gompertz. International Journal of Clinical Pharmacology and Therapeutics 40, 23–29.
Kerr, D. R. and Meador, J. P. (1996). Modeling dose response using generalized linear models. Environmental Toxicology and Chemistry 15, 395–401.
Keshtkar, E., Mathiassen, S. K., Beffa, R. and Kudsk, P. (2017). Seed germination and seedling emergence of blackgrass (Alopecurus myosuroides) as affected by non-target-site herbicide resistance. Weed Science 65, 732–742.
Kociba, R., Keyes, D., Beyer, J., Carreon, R., Wade, C., Dittenber, D., Kalnins, R., Frauson, L., Park, C., Barnard, S., Hummel, R. and Humiston, C. (1978). Results of a two-year chronic toxicity and oncogenicity study of 2,3,7,8-tetrachlorodibenzo-p-dioxin in rats. Toxicology and Applied Pharmacology 46, 279–303.
Kodell, R. L. and West, R. W. (1993). Upper confidence limits on excess risk for quantitative responses. Risk Analysis 13, 177–182.
Kooijman, S. (1981). Parametric analyses of mortality rates in bioassays. Water Research 15, 107–119.
Kratzer, D. D. and Littell, R. C. (2004). Appropriate statistical methods to compare dose responses of methionine sources. Poultry Science 85, 947–954.
Lindstrom, M. J. and Bates, D. M. (1990). Nonlinear mixed effects models for repeated measures data. Biometrics 46, 673.
MacDougall, J. (2006). Analysis of dose-response studies – Emax model. In Dose Finding in Drug Development (ed. N. Ting). Springer.
Martin-Betancor, K., Ritz, C., Fernández-Piñas, F., Leganés, F. and Rodea-Palomares, I. (2015). Defining an additivity framework for mixture research in inducible whole-cell biosensors. Scientific Reports 5, 17200.
McCullagh, P. and Nelder, J. A. (1989). Generalized Linear Models. Chapman & Hall, Boca Raton, Florida, second edn.
Meister, R. and van den Brink, P. J. (2000). The analysis of laboratory toxicity experiments (Chapter 4). In Statistics in Ecotoxicology (ed. T. Sparks), 99–118. John Wiley & Sons, Chichester.
Moon, H., Kim, S. B., Chen, J. J., George, N. I. and Kodell, R. L. (2013). Model uncertainty and model averaging in the estimation of infectious doses for microbial pathogens. Risk Analysis 33, 220–231.
Pipper, C. B., Ritz, C. and Bisgaard, H. (2012). A versatile method for confirmatory evaluation of the effects of a covariate in multiple models. Applied Statistics 61, 315–326.
Pires, A. M., Branco, J. A., Picado, A. and Mendonça, E. (2002). Models for the estimation of a "no effect concentration." Environmetrics 13, 15–27.
Prentice, R. L. (1976). A generalization of the probit and logit methods for dose response curves. Biometrics 32, 761–768.
R Core Team (2018). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Racine, A., Grieve, A. P., Fluhler, H. and Smith, A. F. M. (1986). Bayesian methods in practice: Experiences in the pharmaceutical industry. Applied Statistics 35, 93–150.
Racine-Poon, A. (1988). A Bayesian approach to nonlinear calibration problems. Journal of the American Statistical Association 83, 650–656.
Ricketts, J. H. and Head, G. A. (1999). A five-parameter logistic equation for investigating asymmetry of curvature in baroreflex studies. American Journal of Physiology (Regulatory Integrative Comp. Physiol. 46) 277, 441–454.
Ringblom, J., Johanson, G. and Öberg, M. (2014). Current modeling practice may lead to falsely high benchmark dose estimates. Regulatory Toxicology and Pharmacology 69, 171–177.
Ritz, C. (2010). Towards a unified approach to dose-response modeling in ecotoxicology. Environmental Toxicology and Chemistry 29, 220–229.
Ritz, C., Baty, F., Streibig, J. C. and Gerhard, D. (2015). Dose-response analysis using R. PLoS ONE 10, e0146021.
Ritz, C., Cedergreen, N., Jensen, J. E. and Streibig, J. C. (2006). Relative potency in nonsimilar dose-response curves. Weed Science 54, 407–412.
Ritz, C., Gerhard, D. and Hothorn, L. A. (2013a). A unified framework for benchmark dose estimation applied to mixed models and model averaging. Statistics in Biopharmaceutical Research 5, 79–90.
Ritz, C., Pipper, C. B. and Streibig, J. C. (2013b). Analysis of germination data from agricultural experiments. European Journal of Agronomy 45, 1–6.
Ritz, C. and Streibig, J. C. (2005). Bioassay analysis using R. Journal of Statistical Software 12, 1–22.
Ritz, C. and Van der Vliet, L. (2009). Handling non-normality and variance heterogeneity for quantitative sublethal toxicity tests. Environmental Toxicology and Chemistry 28, 2009–2017.
Royston, P. and Altman, D. G. (1994). Regression using fractional polynomials of continuous covariates: Parsimonious parametric modelling. Applied Statistics 43, 429–467.
Sand, S., Filipsson, A. F. and Victorin, K. (2002). Evaluation of the benchmark dose method for dichotomous data: Model dependence and model selection. Regulatory Toxicology and Pharmacology 36, 184–197.
Sand, S., von Rosen, D., Victorin, K. and Filipsson, A. F. (2006). Identification of a critical dose level for risk assessment: Developments in benchmark dose analysis of continuous endpoints. Toxicological Sciences 90, 241–251.
Sand, S., Victorin, K. and Filipsson, A. F. (2008). The current state of knowledge on the use of the benchmark dose concept in risk assessment. Journal of Applied Toxicology 28, 405–421.
Scholze, M., Boedeker, W., Faust, M., Backhaus, T., Altenburger, R. and Grimme, L. H. (2001). A general best-fit method for concentration-response curves and the estimation of low-effect concentrations. Environmental Toxicology and Chemistry 20, 448–457.
Seber, G. A. F. and Wild, C. J. (1989). Nonlinear Regression. John Wiley & Sons, New York.
Shao, K. (2012). A comparison of three methods for integrating historical information for Bayesian model averaged benchmark dose estimation. Environmental Toxicology and Pharmacology 34, 288–296.
Shao, Q. (2000). Estimation for hazardous concentrations based on NOEC toxicity data: An alternative approach. Environmetrics 11, 583–595.
Slob, W. (2002). Dose-response modeling of continuous endpoints. Toxicological Sciences 66, 298–312.
Stephenson, G. L., Koper, N., Atkinson, G. F., Solomon, K. R. and Scroggins, R. P. (2000). Use of nonlinear regression techniques for describing concentration-response relationships of plant species exposed to contaminated site soils. Environmental Toxicology and Chemistry 19, 2968–2981.
Streibig, J. C. (1981). A method for determining the biological effect of herbicide mixtures. Weed Science 29, 469–473.
Yu, Z.-F. and Catalano, P. J. (2005). Quantitative risk assessment for multivariate continuous outcomes with application to neurotoxicology: The bivariate case. Biometrics 61, 757–766.
Zeileis, A. (2006). Object-oriented computation of sandwich estimators. Journal of Statistical Software 16, 1–16.
Zeileis, A. and Hothorn, T. (2002). Diagnostic checking in regression relationships. R News 2, 7–10.
Zhu, Y., Wang, T. and Jelsovsky, J. Z. (2007). Bootstrap estimation of benchmark doses and confidence limits with clustered quantal data. Risk Analysis 27, 447–465.
Index

Derived parameter, 48, 173, 174
Diquat, 20
Douglas model, 186
E-max model, 179, 189
Earthworms, 49, 52
Ecotoxicology, 6
Effective dose, 3, 6, 119, 128, 172, 173, 178
    Absolute, 51
    Interpretation, 68
Estimation, 161
Excess risk, 120
Exponential decay model, 7, 184, 189
Extra risk, 120
Extrapolation, 121
Fieller interval, 175
Fixing parameters, 188
Fluoranthene, 54
Fractional polynomial, 182, 190
Fronds, 64
Fungicide, 124
Gamma model, 186, 189
Generalized linear models, 43, 63, 69
Generalized log-logistic model, 180
Generalized nonlinear model, 50, 52
Genetic study, 124
Germination, 95, 97, 103, 109
Giant Kelp, 129
Global test, 34
Gompertz model, 184
Hierarchical design, 109, 113, 145
Hill model, 179
Hockey stick model, 187
Hodgkin-Huxley model, 186
Hormesis, 23
Hormesis model, 173, 181, 190
Hormetic effect, 71
Hybrid approach, 129
In vitro study, 124
Insecticide residues, 86
Interpolation, 135
Inverse regression, 134, 138, 172
J-shaped model, 182
Joint model, 36, 79
Least squares estimation, 161
Lettuce, 23
Likelihood ratio test, 59, 163
Link function, 70
Log-Gaussian model, 187
Log-logistic model, 96, 178, 189
    Four parameters, 3, 9, 13, 20, 21, 32, 36, 124, 130, 133
    Three parameters, 27, 52, 65, 74, 98, 112, 135, 140, 147, 151, 155, 158
    Two parameters, 44, 47, 58, 86, 91, 105, 121, 126, 136
Log-normal model, 183, 189
    Three parameters, 50, 140
    Two parameters, 54, 136
Logarithm transformation, 15, 107
Logarithmic axis, 4
Logistic model, 178
Logistic regression model, 46, 179
Lower limit, 6, 19
Lymphocytes, 124
Maximum likelihood estimation, 163
Median event time, 96
Median germination time, 96
Meta-analytic approach, 39, 105, 114, 127
Michaelis-Menten model, 179, 189
Model averaging, 135, 174, 183
Model checking, 8, 32
Model misspecification, 11, 77
Model reduction, 59
Model selection, 112, 135
Model-averaged curve, 142
Monitoring intervals, 95
Multinomial data, 85
Multistage model, 186, 189
Naive standard errors, 13
Natural mortality, 49