TUTORIAL
The lognormal distribution has a widespread application in the pharmacometric community. The focus of nonlinear mixed effects approaches in pharmacometrics has been on the development of pharmacokinetic models initially, shifting toward pharmacokinetic–pharmacodynamic (PKPD) models later on.1 The variability in pharmacokinetic models is typically of a limited magnitude, specifically when measured in smaller and well-controlled studies, for example, with coefficient of variation (CV) in a range of 5–30%. Variability in outpatient studies tends to be larger, e.g., 25–75 %CV, whereas biomarkers as assessed with PKPD models often present with much larger variability, e.g., 50–150 %CV.

Rationale for applying the lognormal distribution
Large variability, as typically encountered in pharmacological data sets with biomarkers, requires modeling assumptions different from those captured by the normal distribution. An early example in pharmacology dates back to 1972,2 a study of effective doses in various tissues in which a clear case for the use of the lognormal distribution was made.
Most of the physiological processes modeled by pharmacometricians are strictly positive because drug exposure and biomarkers are often measured as concentrations in blood that cannot be negative. At the same time, parameters in pharmacokinetic and pharmacological models also need to remain positive to retain their meaning. Clearance, for example, describes the body’s capacity to remove xenobiotics that it cannot produce on its own. The standard deviation (SD) of a normal or Gaussian distribution cannot extend to more than half of the mean without the distribution also getting substantial coverage into negative values. The lognormal distribution is a natural alternative that can span well beyond this range while never producing negative values.
Moreover, a form of asymmetry is commonly required for models on physiological processes. This has been shown for observations on pharmacokinetic evaluations3,4 but also for general applications.5 The skewness has even been derived on theoretical grounds.6,7 Paraphrasing Gronholm and Annila,6 thermodynamic laws require that results from multiple reactions depend on intermediate states and therefore produce skewed distributions, independent of the distribution of individual reaction steps. The simplest distribution that complies with the criteria of positivity and asymmetry is the lognormal distribution. It is obtained by raising the base of choice, typically e (corresponding to the natural logarithm used as default), to the power of a normally distributed variable. The result is strictly positive but has many more properties of interest. Although more distributions are available and used by the pharmacometric community through transformations,8 the lognormal distribution is by far the most commonly encountered.

Formal comparison of the approximated lognormal through PIs
Many pharmacometric applications have the objective to support decision making during pharmaceutical development or in the regulatory review process.1 Whether the actual topic is trial outcome probability, population coverage, or the impact of special populations, the underlying property that drives the result is the model-derived PI, typically established at 80 or 90% of the population. Properties of the lognormal distribution will be evaluated against such PIs as the proverbial yardstick. The base distribution that we will use to compare the lognormal distribution against is the normal or Gaussian distribution. The normal distribution is obviously a popular distribution throughout scientific analysis. Also in pharmacometric textbooks the normal distribution is discussed
1PD-Value B.V., Houten, The Netherlands; 2Leiden Academic Center for Drug Research (LACDR), Leiden University, The Netherlands; 3Mathematical Institute, Leiden University, Leiden, The Netherlands. *Correspondence: Jeroen Elassaiss-Schaap (jeroen@pd-value.com)
Received: November 20, 2019; accepted: March 3, 2020. doi:10.1002/psp4.12507
21638306, 2020, 5, Downloaded from https://ascpt.onlinelibrary.wiley.com/doi/10.1002/psp4.12507 by National Health And Medical Research Council, Wiley Online Library on [04/04/2024]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
Log Domain Variability
Elassaiss-Schaap and Duisters
246
extensively, see, for example, ref. 9. Up to about 1990, the normal distribution was the only one available for the pharmacometric community.10–12 The normal distribution is therefore the reference distribution of choice. It will be investigated to which extent PIs generated using the lognormal distribution can be approximated by those of the normal distribution.
The standard approaches to describe variability in pharmacometric models are better suited for Gaussian distributions, with a focus on CV; see also the description in the previous few paragraphs. A lack of helpful statistical instruments and/or the application thereof may impede proper interpretation of results. Although the CV directly relates to the variability that is described, it does not reflect on the shape of the lognormal distribution. We therefore also revisit old metrics such as “skewness” and discuss the possibility of applying other metrics to capture the impact of variability magnitude on the shape of the distribution. The primary objective, however, is to precisely compare the lognormal distribution to the default Gaussian or normal distribution.
To be able to evaluate how well the normal distribution is capable of describing the lognormal distributions, a formal procedure is set up to calculate optimal normal parameters for given lognormal parameters using the Kullback–Leibler (KL) divergence of distributions and their probability density functions (PDFs). With these optimal parameters, the quality of approximated PIs is explored as a function of variability magnitude. To enable a wider audience to understand how these results are derived, the properties of the normal and lognormal distributions, as well as the KL divergence, will be derived in this article starting from first principles.
The derived optimal approach will then be used to determine the extent to which the lognormal distribution can be approximated by a normal distribution. The approximation was devised specifically for this purpose, and its use ensures that the limitations in approximating the lognormal distribution can be assessed objectively and robustly.

THEORETICAL
Formal properties of the lognormal distribution
This section presents theoretical properties of the lognormal distribution as can be found in textbooks using the normal distribution as starting point. Highlighting formal differences between these distributions will aid the more intuitive discussion in the remainder of this article. A glossary of terms is provided in Supplemental Materials 13. To preserve the flow, mathematical derivations are provided in the Supplemental Materials 11.

Motivation. One of the basic properties of the lognormal distribution is that its mean is not equal to its median, in contrast to the normal distribution. This has a large impact on interpretation. For example, pharmacometricians often need to explicitly explain to outsiders that predictions with mixed effects models are representative of the typical individual rather than the mean of the data. The impact of nonlinear mixed effects makes this necessary, e.g., in the classic example of averaging Emax curves.
pharmacokinetics. It therefore is worthwhile to derive the basic properties of the lognormal distribution from first principles and establish them robustly.

Some basic definitions. Statistical models, such as those used in pharmacometrics, are typically calibrated on observations of random variables (i.e., collected data). In turn, these random variables are defined by the distribution they are assumed to follow. The first step in deriving the basic properties of a random variable is to define the distribution itself, which will be done by specification of its PDF. The collection of all values a random variable could take in theory is known as the “support.” For instance, a Gaussian (or normal) random variable can attain any value on the real line, i.e., its support is (−∞, ∞). The PDF defines for each value in the support what the probability density of (the occurrence of) that particular value is. This concept of density is a generalization of probability for discrete random variables, for instance, 1/6 being the probability that a single roll of a fair die shows a six. Similarly, one reads off a density for a particular realization from a continuous random variable, such as the normal and lognormal. An important characteristic of any PDF is that its area under the curve (integrating over the entire support) equals one.

The normal distribution. Suppose random variable X is normally distributed with mean 𝜇 and SD 𝜎 > 0, denoted as X ∼ N(𝜇, 𝜎). Then, the PDF f_X of X for some realization x on (−∞, ∞) is given by:

$$ f_X(x) := \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right). \tag{1} $$

Throughout, equalities that follow by definition are denoted as “:=”. The reader may verify that indeed $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$, for which a standard Gaussian integration result is required.
Using the PDF, basic characteristics such as the mean (𝔼[X]) and SD (the square root of the variance Var[X]) can be derived.

$$ \mathbb{E}[X] := \int_{-\infty}^{\infty} x\, f_X(x)\,dx \tag{2} $$

In words, the expected value consists of the values that X can take multiplied by how often those values occur relative to each other; a density-weighted average. Again, intuition may be borrowed from discrete random variables. For instance, a random variable supported on {2,4,8}, realized with probabilities 1/4, 1/2, and 1/4, respectively, has an expected value of 4.5.
The expected value (Eq. 2) is a general definition for any continuous random variable X. In particular, for X Gaussian, we have:

$$ \mathbb{E}[X] := \int_{-\infty}^{\infty} x\, f_X(x)\,dx = \int_{-\infty}^{\infty} x\, \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx \overset{*}{=} \mu. \tag{3} $$
Here, equality * is worked out in Supplemental Materials 11. For the variance, it always follows that

$$ \mathrm{Var}[X] := \mathbb{E}\left[\left(X - \mathbb{E}[X]\right)^2\right] = \mathbb{E}[X^2] - \left(\mathbb{E}[X]\right)^2. \tag{4} $$

For X Gaussian the first term on the right-hand side equals

$$ \mathbb{E}[X^2] = \int_{-\infty}^{\infty} x^2\, \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) dx \overset{*}{=} \sigma^2 + \mu^2, \tag{5} $$

and from Eq. 3 we know

$$ \left(\mathbb{E}[X]\right)^2 = \mu^2. \tag{6} $$

In summary, Var[X] = 𝜎² for X ∼ N(𝜇, 𝜎). Again, mathematical details (*) are discussed in Supplemental Materials 11. The CV can now be plugged in easily:

$$ \mathrm{CV}[X] := \frac{\sqrt{\mathrm{Var}[X]}}{\mathbb{E}[X]} = \frac{\sigma}{\mu}. \tag{7} $$

Finally, we mention that symmetry and unimodality of the Gaussian PDF around 𝜇 lead to some well-known properties; the median equals 𝜇, the skewness is 0, and the mode of the distribution is located at its mean. This is in sharp contrast to the lognormal distribution discussed next.

The lognormal distribution. Suppose random variable U is lognormally distributed with statistical parameters 𝜃 and 𝜔, denoted as U ∼ logN(𝜃, 𝜔). Starting from a normal random variable X with mean 𝜃 and SD 𝜔, this U can be defined as follows:

$$ U := \exp(X) \quad \text{with } X \sim N(\theta, \omega). \tag{8} $$

Remark that 𝜃 now takes the role of 𝜇 and 𝜔 that of 𝜎 in the notation of the previous subsection. Again, let us define the values that the distribution of U can take as u. From the definition in Eq. 8, it follows that the support of U is (0, ∞). For any realization u on this support, the PDF of the lognormal is given by:

$$ f_U(u) := \frac{1}{u\sqrt{2\pi}\,\omega} \exp\left(-\frac{(\log(u)-\theta)^2}{2\omega^2}\right). \tag{9} $$

Throughout this article, log() is understood as the natural logarithm with respect to base e unless mentioned otherwise. This PDF expression can be taken at face value or derived through the normal PDF using Eq. 8 as is done in Supplemental Materials 11. To derive the expected value, variance, and CV for U ∼ logN(𝜃, 𝜔), the following proposition is used, for which a proof is provided in Supplemental Materials 11:

$$ \mathbb{E}[U^k] = \exp\left(k\theta + \tfrac{1}{2}k^2\omega^2\right) \quad \text{for any } k = 1, 2, \ldots \tag{10} $$

Hence, the expected value is obtained with k = 1

$$ \mathbb{E}[U] := \int_{0}^{\infty} u\, f_U(u)\,du = \exp\left(\theta + \tfrac{1}{2}\omega^2\right), \tag{11} $$

and the variance using k = 2 and k = 1 (squared).

$$ \mathrm{Var}[U] := \mathbb{E}[U^2] - \left(\mathbb{E}[U]\right)^2 = \exp\left(2\theta + \tfrac{1}{2}2^2\omega^2\right) - \left(\exp\left(\theta + \tfrac{1}{2}\omega^2\right)\right)^2, \tag{12} $$

which can be rewritten as exp(2𝜃 + 𝜔²)(exp(𝜔²) − 1). The CV is trivially implied:

$$ \mathrm{CV}[U] = \frac{\sqrt{\mathrm{Var}[U]}}{\mathbb{E}[U]} = \sqrt{\exp(\omega^2) - 1}. \tag{13} $$

Finally, as established in Supplemental Materials 11, the median of U ∼ logN(𝜃, 𝜔) is located at exp(𝜃) and the mode at exp(𝜃 − 𝜔²). Observe that the mean of U is given by exp(𝜃 + ½𝜔²); a property clearly different from the Gaussian “mean = median = mode” characteristic. This is also reflected in the skewness of U given by (exp(𝜔²) + 2)√(exp(𝜔²) − 1).

Overview. Table 1 summarizes the differences between the normal and lognormal distributions established so far. The variables X or U can be replaced by the name of the parameter of interest to the pharmacometrician, for example, clearance (CL), i.e., CL ∼ logN(𝜃, 𝜔).
In conclusion, several properties of the lognormal distribution can be straightforwardly derived from first statistical principles. The take-home messages are that the mean of the lognormal distribution, in contrast to the normal, is different from the median and that it is related to both 𝜃 and 𝜔. The CV is only a function of 𝜔. Lastly, the parameter value with largest density, i.e., the mode, is different from the mean and the median.

Graphical exploration of the lognormal distribution
To augment the formal definition of normal and lognormal parameters, a graphical exploration is presented. A normally distributed parameter with N(𝜇, 𝜎) is shown in Figure S1 with 𝜇 = 8 and 𝜎 = 4, as its probability density and its cumulative density, overlaid with a histogram of 1000 draws from the normal distribution. Remark that for any mean and (positive) SD, the normal distribution has at least some mass below zero, in other words, the parameter can attain negative values. The figure also exemplifies that the mean lies at the peak of the distribution and that the cumulative distribution is exactly 50% at that point, in other words: The mean, mode, and median are equal.

Table 1 Main properties of the normal and lognormal distribution

Property    Normal X ∼ N(𝜇, 𝜎)    Lognormal U ∼ logN(𝜃, 𝜔)
Support     (−∞, ∞)                (0, ∞)
Mean        𝜇                      exp(𝜃 + ½𝜔²)
SD          𝜎                      exp(𝜃 + ½𝜔²) √(exp(𝜔²) − 1)
CV          𝜎/𝜇                    √(exp(𝜔²) − 1)
Median      𝜇                      exp(𝜃)
Mode        𝜇                      exp(𝜃 − 𝜔²)
Skewness    0                      (exp(𝜔²) + 2) √(exp(𝜔²) − 1)

CV, coefficient of variation; SD, standard deviation.
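The closed-form entries of Table 1 can be checked numerically. The following is a minimal sketch, not part of the original analysis: the function name `lognormal_summary`, the seed, and the number of draws are illustrative assumptions, and the parameter values are borrowed from the Figure S2 example (𝜃 = log(8), 𝜔 = 0.5). It evaluates the lognormal column of Table 1 and compares the mean and median with a simulation based on Eq. 8.

```python
import math
import random


def lognormal_summary(theta, omega):
    """Closed-form summaries of U = exp(X) with X ~ N(theta, omega), following Table 1."""
    w = math.exp(omega ** 2)
    mean = math.exp(theta + 0.5 * omega ** 2)
    cv = math.sqrt(w - 1.0)
    return {
        "mean": mean,
        "sd": mean * cv,
        "cv": cv,
        "median": math.exp(theta),
        "mode": math.exp(theta - omega ** 2),
        "skewness": (w + 2.0) * cv,
    }


# Illustrative values (an assumption): theta = log(8), omega = 0.5.
theta, omega = math.log(8.0), 0.5
summary = lognormal_summary(theta, omega)

# Simulation check via Eq. 8: draw X ~ N(theta, omega) and exponentiate.
rng = random.Random(2020)  # arbitrary fixed seed for reproducibility
draws = sorted(math.exp(rng.gauss(theta, omega)) for _ in range(200_000))
sample_mean = sum(draws) / len(draws)
sample_median = draws[len(draws) // 2]

for name, value in summary.items():
    print(f"{name:9s} {value:8.3f}")
print(f"simulated mean {sample_mean:.3f}, simulated median {sample_median:.3f}")
```

With these inputs the ordering mode < median < mean should be visible directly in the printed values, in line with the take-home messages above.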
www.psp-journal.com
An example of the lognormal distribution is shown in Figure S2 with 𝜃 = log(8) and 𝜔 = 0.5 together with the normal distribution from Figure S1. The lognormal 𝜔 was set equal to the CV of the normal distribution (4/8). The example visualizes that the lognormal distribution cannot assume negative values in contrast to the normal distribution. Other differences are that the point of maximum density does not coincide with its mean and that there is a heavier tail in its density toward higher values compared with the normal distribution. In other words, the cumulative densities of the lognormal distribution occur at higher values. The difference between the normal and the lognormal distribution increases with increasing variability; see Table 1. It is notable that the mean of the lognormal distribution now lies higher than 50% cumulative probability; in other words, the mean has become larger than the median (the mean of log-transformed values is equal to the log of the median); see Eq. 11.
The extent of the difference between the normal and lognormal distribution is further evaluated in Figure 1. The 10th and 90th percentiles are plotted as a function of 𝜎 for the normal and as a function of 𝜔 for the lognormal distribution, where the same values for 𝜎 and 𝜔 are used. The plot therefore reflects what would happen if one would mistakenly assume the normal and the lognormal distribution behave similarly. The percentiles of the normal distribution relate linearly to 𝜎. Although at low values the 10th and 90th percentiles of the lognormal are close to those of the normal distribution, at mildly higher values they start to deviate. The increasing skewness of the lognormal distribution with increasing 𝜔 is clearly visible, as the 10th percentile shrinks toward zero and the 90th percentile increases exponentially.
These graphs demonstrate that the behavior of the lognormal distribution is not easy to capture in a single number. At the backtransformed scale, the median and the mean become further separated as a function of 𝜔; see Figure 1. This can have counterintuitive effects, as the basic expectation is for the mean to be located at the center of the distribution. The mean of the lognormal distribution also is not the value with the highest probability density. In addition, the distribution becomes more and more asymmetric, as shown in the left panel of Figure 1. These properties all are governed by one parameter, 𝜔, but are not easily and transparently derived from it. For example, in the right panel of the same figure it can be observed how different the mean and the mode become as a function of 𝜔. The increasing skewness of the lognormal distribution is not always presented and
[Figure 1 graphic. Left panel: 10th and 90th percentile against 𝜔, 𝜎 for the normal and the lognormal distribution. Right panel: mean, median, and mode against 𝜔.]
Figure 1 Differences between the lognormal and the normal distribution. The 10th and 90th percentiles of the normal (dashed, 𝜇 = 1, 𝜎 varies) and the lognormal (solid, 𝜃 = 0, i.e., the same median value, 𝜔 = 𝜎), as a function of 𝜎 (left panel). The mean (dark solid), median (dotted), and mode (light solid) plotted against 𝜔 where 𝜃 is set to zero; at zero the curve therefore starts with one (right panel). Note how quickly the mean runs off the scale and overtakes the position associated with the upper probability, as plotted in the left panel, at an 𝜔 of about two. The mode similarly decreases to a position lower than associated with the lower probability, at an 𝜔 of one.
appreciated as such. A perhaps more intuitive plot demonstrating the increase in skewness with increasing 𝜔 can be found in Figure 2. The skewness clearly increases more than linearly with increasing 𝜔, and it becomes more and more difficult to summarize the distribution.
In the following part of this article, the consequences of interpretation of lognormal distributions as if they were normal will be discussed. The conclusion for now is that the shape of the lognormal distribution does clearly depend on the value of its variability.

RESULTS
Interpretation of the lognormal distribution as normal
Proxy interpretation as normal and its limitations. One aspect of potential misrepresentation is the difference between mean and median. As discussed previously, the median and mean are equal for the normal but not for the lognormal distribution. The median of the lognormal distribution is exp(𝜃) and therefore is independent of 𝜔. All other characteristics of the lognormal distribution, however, are dependent on 𝜔, including the mean and also the mode. A first deviation when interpreting the lognormal distribution as normal therefore relates to the difference between the median, mean, and mode, as is discussed further in section “Statistics to Describe the Lognormal Distribution”.
A second risk that is inherent in the interpretation of a lognormal distribution as if it were normal is that the tails of the distribution are misjudged. Frequently, the end result of a pharmacometric evaluation regards the tails of the distribution, e.g., to determine whether exposure in a subpopulation corresponds with that of the general population for at least 90% of the subpopulation. Therefore, it is important to establish how closely the lognormal distribution corresponds with a normal distribution in its tails.
Suppose one has estimated 𝜃 and 𝜔² (estimates denoted as 𝜃̂ and 𝜔̂², respectively, here) based on gathered data about the random variable CL ∼ logN(𝜃, 𝜔), for instance, in NONMEM with declaration CL = EXP(THETA(1) + ETA(1)). Model results for CL could be: 𝜃̂ = 0.45; 𝜔̂² = 0.45, and CV̂ = 75%. A pharmacokinetic parameter was chosen as a relevant example, but the same principles hold for any parameter or response variable such as concentration.
Table 2 lists the implied mean, SD, mode, median, and two percentiles for CL in the population for several model interpretations treated next, the top line (A) being the correct one. In scenario B, the parameters are mistaken for those of the normal distribution. Scenario C is a rule of thumb that interprets 𝜃̂ as (Gaussian) mean. Scenarios D to F apply a formal approximation of 𝜎̂ assuming, respectively, the mean, mode, and median are equal to 𝜃̂.
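The relation between the quoted estimates can be made concrete with a small simulation. The sketch below is a Python analog of the declaration CL = EXP(THETA(1) + ETA(1)), not NONMEM code; it assumes ETA(1) is normal with mean zero and variance 𝜔², and the seed, sample size, and variable names are illustrative choices. It derives the CV from 𝜔̂² = 0.45 with Eq. 13 and checks that simulated clearances show this CV, with the median near exp(𝜃̂) and the mean above it.

```python
import math
import random

# Estimates as quoted in the text; omega2_hat is the variance of ETA(1).
theta_hat = 0.45
omega2_hat = 0.45
omega_hat = math.sqrt(omega2_hat)

# Exact lognormal CV implied by omega^2 (Eq. 13).
cv_hat = math.sqrt(math.exp(omega2_hat) - 1.0)

# Python analog (assumption: ETA(1) ~ N(0, omega^2)) of CL = EXP(THETA(1) + ETA(1)).
rng = random.Random(1)  # arbitrary fixed seed for reproducibility
n = 100_000
cl = sorted(math.exp(theta_hat + rng.gauss(0.0, omega_hat)) for _ in range(n))

mean_cl = sum(cl) / n
sd_cl = math.sqrt(sum((x - mean_cl) ** 2 for x in cl) / (n - 1))
median_cl = cl[n // 2]

print(f"omega = {omega_hat:.2f}, implied CV = {100 * cv_hat:.0f}%")
print(f"simulated CV = {100 * sd_cl / mean_cl:.0f}%")
print(f"simulated median = {median_cl:.3f}, exp(theta) = {math.exp(theta_hat):.3f}")
print(f"simulated mean = {mean_cl:.3f}")
```

The typical value exp(𝜃̂) corresponds to the median of the simulated clearances, whereas the simulated mean lies above it.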
[Figure 2 graphic. Left panel: probability density against values (0 to 25) for 𝜔 = 0.5, 1, and 2, with the constant median marked. Right panel: cumulative density against values (0 to 25) for the same 𝜔 values.]
Figure 2 Probability density (left) and cumulative density (right) of the lognormal distribution at different values of 𝜔 and fixed median. The probabilities according to the lognormal distribution at 𝜔 values of 0.5 (gray), 1 (black stripes), and 2 (black) are plotted against parameter values between 0 and 25; 𝜃 was set to log(8).
Table 2 Mean; SD; mode; 10%, 50% (median), 90% percentiles; and associated RE% for several candidate model interpretations for 𝜃̂ = log(8.0) and 𝜔̂ = 0.67
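Some of the Table 2 entries quoted in the text can be recomputed from the caption values. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the 10th and 90th percentiles use the standard normal quantile of about 1.28, and the rule of thumb (C) is implemented as described in the text (mean exp(𝜃̂), SD exp(𝜃̂) ⋅ CV̂, percentiles at ± 2 SD) using the exact CV of Eq. 13. Scenarios D to F are not covered because they depend on the KL-optimal 𝜎̂ of the Proofs section.

```python
import math
from statistics import NormalDist

# Values from the Table 2 caption.
theta_hat = math.log(8.0)
omega_hat = 0.67

# Standard normal quantile for the 90th percentile (about 1.28).
z90 = NormalDist().inv_cdf(0.90)


def scenario_a(theta, omega):
    """Correct interpretation: 10th and 90th percentiles of logN(theta, omega)."""
    return math.exp(theta - z90 * omega), math.exp(theta + z90 * omega)


def scenario_b(theta, omega):
    """Estimates mistaken for the parameters of a normal distribution N(theta, omega)."""
    return theta - z90 * omega, theta + z90 * omega


def scenario_c(theta, omega):
    """Rule of thumb: mean exp(theta), SD = mean * CV, percentiles at mean +/- 2 SD."""
    cv = math.sqrt(math.exp(omega ** 2) - 1.0)
    mean = math.exp(theta)
    sd = mean * cv
    return mean - 2.0 * sd, mean + 2.0 * sd


for label, func in (("A", scenario_a), ("B", scenario_b), ("C", scenario_c)):
    low, high = func(theta_hat, omega_hat)
    print(f"scenario {label}: 10% = {low:6.2f}, 90% = {high:6.2f}")
```

Before any truncation at zero, the lower bound of scenario C is negative, which is the infeasible value discussed in the text.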
In the best case (A), the model estimates 𝜃̂ and 𝜔̂ are used to infer the mean, SD, mode, and percentile statistics from the corresponding lognormal distribution. This may sound trivial, but it is not uncommon to encounter one of the following five alternative interpretations (B–F) based on the Gaussian distribution in practice instead. A worst case scenario (B) would result from interpreting the estimators directly as if they specify a normal distribution CL ∼ N(𝜃̂, 𝜔̂), leading to a substantial underrepresentation of effect size and variability (80% PI [1.2–2.9]). Furthermore, the authors have seen a form of the following rule of thumb (C) being informally applied. Using 𝜃̂ and CV̂, approximate the mean by exp(𝜃̂) and consequently the SD by exp(𝜃̂) ⋅ CV̂. Then, based on characteristics of the Gaussian, suppose the mode and median are similar to the mean and construct 10% and 90% percentiles symmetrically based on exp(𝜃̂) ± 2SD̂, which is a strong inflation with respect to the normal 80% PI that uses factor 1.28 instead of 2. In case this inflation causes the 10% percentile to become negative (which is impossible for a lognormal random variable), it is replaced by 0 (or considered “small”).
Realistically, interpretations D, E, and F try to mimic the lognormal distribution with a Gaussian by matching the mean, mode, or median of the underlying lognormal, respectively. By fixing the normal mean 𝜇̂ in this way, focus is placed on what value of 𝜎̂ does the best job in approximating the targeted logN(𝜃̂, 𝜔̂) distribution. Despite 𝜇̂ having improved, it may be clear that 𝜎̂ = 𝜔̂ is completely on the wrong scale as before. With the goal of approximating the logN(𝜃̂, 𝜔̂) density with the N(𝜇̂, 𝜎̂) in mind, we turn to the widely adopted KL divergence12 as pseudometric for approximation error, see the Proofs section, which defines different 𝜎̂s in Eq. 20. Examining Table 2, the mean-match Gaussian (D) seems to be closest to the benchmark (A) in terms of the 10% and 90% percentiles, although the 80% PI [0.35, 19.7] is too wide (compare [3.4, 18.9]) and the median too high (10.0 vs. 8.0). Despite the infeasible estimate for the lower 10% percentile (−4.0), the rule of thumb (C) does not do much worse than the KL mean-match (D) in terms of the 90% percentile.
How large the deviations at the tails of the approximation are also depends on the magnitude of variability. At very small values of 𝜔, the difference will be negligible because the shape of the distributions is similar, but at larger values the locations will diverge exponentially; see Figure 3. (In the Proofs section it is shown that the ratio of any percentile in the KL optimized normal approximation against the true lognormal percentiles is only dependent on 𝜔̂, i.e., not on 𝜃̂.) The tails of the distribution are specifically important when simulations are used to generate PIs and the tolerance for deviating locations is not very high. The relative error in the lower 10% tail of the approximation gets to 10% at an 𝜔 of about 0.25. The upper tail hits the 10% error level at higher values because the approximated SDs are large compared with the lognormal SDs, analogous to the difference between the correct equation for CV compared with the first-order Taylor approximation as also described in the next paragraph, to be able to cover the bulk of the density. The rule of thumb approximates the 10% tail worse, but the 90% tail best among all approximations, and performs reasonably well up to an 𝜔 of about 1.1. The results at the 10% tail nevertheless lead to the conclusion that above an 𝜔 of ≈ 0.25, the lognormal distribution cannot be interpreted as a normal distribution even when using KL-optimal estimates. The results remain similar when the 95% PI is chosen instead of the 80% PI; see Figure S3.

STATISTICS TO DESCRIBE THE LOGNORMAL DISTRIBUTION
CV
The usual way of representing a distribution in pharmacometric reports and papers is the mean and CV of the distribution. The CV is an adequate measure to summarize a normal distribution as it is clear how far values extend below and above the mean and what the probability is of finding negative values. The CV is, however, more difficult to interpret for the lognormal distribution with larger values of 𝜔. Optimally, one would convert the CV back into the lognormal SD 𝜔 and use that for further interpretation. In the pharmacometric literature, some scrutiny needs to be applied before doing that because often the CV is approximated by its first-order Taylor expansion around zero, √(𝜔²), instead of the correct √(exp(𝜔²) − 1) (see also ref. 3). At higher values, for example, 2 for 𝜔, the traditionally reported CV would be 200% instead of 732%. Regardless of this confusion, the CV remains difficult to interpret beyond 𝜔 values that allow for sufficient precision in the interpretation of the lognormal as a normal distribution as explored previously.
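The gap between the two CV expressions is easy to tabulate. The following is a minimal sketch; the function names are illustrative and not taken from any pharmacometric software. It compares the exact CV of Eq. 13 with the first-order approximation and includes the back-conversion from an exact CV to 𝜔 mentioned above.

```python
import math


def cv_exact(omega):
    """Exact lognormal CV (Eq. 13): sqrt(exp(omega^2) - 1)."""
    return math.sqrt(math.exp(omega ** 2) - 1.0)


def cv_first_order(omega):
    """First-order Taylor approximation around zero: sqrt(omega^2) = omega."""
    return math.sqrt(omega ** 2)


def omega_from_cv(cv):
    """Invert Eq. 13 to recover the lognormal SD omega from an exact CV."""
    return math.sqrt(math.log(cv ** 2 + 1.0))


for omega in (0.1, 0.25, 0.5, 0.67, 1.0, 2.0):
    exact = 100.0 * cv_exact(omega)
    approx = 100.0 * cv_first_order(omega)
    print(f"omega = {omega:4.2f}: exact CV = {exact:6.1f}%, first-order CV = {approx:6.1f}%")
```

Note that `omega_from_cv` only recovers 𝜔 when the reported CV was computed with the exact expression; a CV reported with the first-order approximation already equals 𝜔.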
Figure 3 The percentage error of the location of the lower (left panel) and upper (right panel) 10% tail of the Kullback–Leibler optimally
approximated normal distribution representing a given lognormal distribution or as assessed by rule of thumb, see text for an
explanation. The relative error in the location of the approximate normal tail compared with the given true, i.e., the tail of the lognormal
distribution, is plotted as percentage against the lognormal standard deviation 𝜔. The Kullback–Leibler optimal approximation was
calculated for an assumed equal mean, median, or mode. The horizontal lines indicate the 10% and 25% error levels in either direction.
[Figure 4 graphic. Left panel: normalized peak density and mode against 𝜔, 𝜎 (lognormal peak density, normal peak density, and lognormal mode), with vertical lines at mean = 1.25 x median, mode = 0.5 x median, and peak = 1.10 x minimum. Right panel: probability density against values (0 to 25) for selected 𝜔 values, including 0.5, ≈ 1.3, and 2, with the constant median marked.]
Figure 4 The lognormal distribution increases in sharpness after reaching a minimum at an 𝜔 of about one. Combined plot of the peak
density value (solid) of the lognormal distribution, normalized to its lowest value, and of the mode of the lognormal distribution (dark
gray) as a function of the lognormal standard deviation 𝜔 (left panel). The peak density of the normal distribution as a function of 𝜎
(dashed gray) has been added as reference. Vertical lines highlight some 𝜔 values of special interest; from left to right, the values where
the mean is 25% higher than the median, where the mode is half of the median, and where the peak density becomes 10% larger than
the minimum density. Selected examples of the lognormal probability density at several values of 𝜔; 𝜃 was set to log (8) (right panel).
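The 𝜔 values of special interest follow from the closed-form expressions in Table 1. The sketch below rests on a short calculation that is not taken from the article: the mean-to-median ratio is exp(𝜔²/2), the mode-to-median ratio is exp(−𝜔²), and evaluating Eq. 9 at the mode gives a peak density of exp(𝜔²/2 − 𝜃)/(𝜔√(2π)), which is smallest at 𝜔 = 1. The helper names are illustrative. The last value is the 𝜔 above which the mean exceeds the 90th percentile, quoted in the text as ≈ 2.56.

```python
import math
from statistics import NormalDist


def peak_ratio(omega):
    """Lognormal peak density relative to its minimum over omega (theta cancels)."""
    return math.exp(0.5 * omega ** 2 - 0.5) / omega


def bisect(func, lo, hi, iterations=80):
    """Plain bisection for a root of func on [lo, hi], assuming a sign change."""
    f_lo = func(lo)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Mean 25% above the median: exp(omega^2 / 2) = 1.25.
omega_mean_125 = math.sqrt(2.0 * math.log(1.25))
# Mode at half of the median: exp(-omega^2) = 0.5.
omega_mode_half = math.sqrt(math.log(2.0))
# Peak density 10% above its minimum, searched to the right of omega = 1.
omega_peak_110 = bisect(lambda w: peak_ratio(w) - 1.10, 1.0, 3.0)
# Mean above the 90th percentile: omega^2 / 2 > z90 * omega.
omega_mean_over_p90 = 2.0 * NormalDist().inv_cdf(0.90)

print(f"mean = 1.25 x median at omega = {omega_mean_125:.2f}")
print(f"mode = 0.5 x median  at omega = {omega_mode_half:.2f}")
print(f"peak = 1.10 x minimum at omega = {omega_peak_110:.2f}")
print(f"mean above 90th percentile for omega > {omega_mean_over_p90:.2f}")
```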
lognormal distribution these three statistics are different and their difference furthermore increases with increasing variability, see also Figure 1. The mode shrinks toward zero and the mean increases more than exponentially (exponentially with 𝜔²). Given large enough 𝜔 values, the mean will be even higher than most percentiles because percentiles above 50% increase exponentially with 𝜔; for example, the mean of the lognormal distribution is above its 90th percentile for 𝜔 larger than ≈ 2.56. At such values, the mode and the mean therefore represent extreme values of the lognormal distribution. The median on the other hand is at the center of the lognormal distribution regardless of 𝜔 value.

DISCUSSION

We have explored the properties of the lognormal distribution in mathematical and graphical detail, and the limitations one encounters when interpreting it as if it were a normal distribution. These limitations have been established using formal and therefore objective approximation methods.
The lognormal distribution can be considered to have normal-like properties up to a value of 𝜔 ≈ 0.25, the point at which a normal approximation would lead to more than 10% error in the location of its lower tail. A gray area follows, with a distribution peak that is still somewhat normal-like, while increasingly eccentric mode and mean values are realized. At an 𝜔 of about 1.1, the gray area ends where the best normal approximation would lead to more than 10% error in the location of its upper tail.
The previous interpretation can be considered reasonable but somewhat strict. Depending on the context and application, e.g., in the context of small efficacy–safety windows, a more stringent set of cutoff values could be selected, 0.12 and 0.83 as the border where the error in the tail would be 5% and the mode is found at 50% of the median, respectively. Or in case modeled distributions are used in a descriptive fashion, a more liberal lower cutoff value could be selected with ≈ 0.67, the point at which the mean gets 25% larger than the median. A more liberal alternative for the higher cutoff could be an 𝜔 of ≈ 1.3, where the distribution sharpens by more than 10% and the mode and mean become too eccentric. A distribution below the lower cutoff is interpreted as almost identical to a normal distribution, whereas above the higher cutoff the distribution is considered as a completely different
[Figure 5 panels: probability density (vertical axis, 0.0 to 0.2) against Values (horizontal axis, 0 to 20).]
Figure 5 Probability densities at 𝜔 values of interest computed at 𝜃 = log(8). The insets tabulate values of CV, MMR, MDI, relP90, and skewness. Vertical dotted lines indicate the position of the median (left) and the mean (right); from an 𝜔 value of two onward, the position of the mean falls outside the horizontal axis range. CV, coefficient of variation; MDI, mode density inflation; MMR, mean-to-median ratio; relP90, 90th percentile relative to the median.
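
Several of the statistics named in the Figure 5 caption have standard closed forms for a lognormal distribution, and the cutoff values quoted in the text can be checked against them. The following is a minimal sketch, not the authors' code: the function name `lognormal_summary` and the list of 𝜔 values are assumptions chosen for illustration (the values are ones mentioned in the text, not necessarily the panels of Figure 5), and the MDI is left out because its definition is not restated in this section.

```python
import math
from statistics import NormalDist

# 90th percentile of the standard normal distribution (about 1.28).
Z90 = NormalDist().inv_cdf(0.90)


def lognormal_summary(theta, omega):
    """Closed-form summaries of logN(theta, omega); omega is the SD on the log scale.

    Hypothetical helper for illustration only.
    """
    w2 = omega ** 2
    median = math.exp(theta)
    mean = math.exp(theta + 0.5 * w2)
    mode = math.exp(theta - w2)
    return {
        "median": median,
        "mean": mean,
        "mode": mode,
        "cv": math.sqrt(math.exp(w2) - 1.0),       # coefficient of variation
        "mmr": mean / median,                      # mean-to-median ratio
        "mode_over_median": mode / median,
        "rel_p90": math.exp(Z90 * omega),          # 90th percentile / median
        "skewness": (math.exp(w2) + 2.0) * math.sqrt(math.exp(w2) - 1.0),
    }


theta = math.log(8)  # as in the Figure 5 caption
for omega in (0.25, 0.67, 0.83, 1.1, 1.3, 2.0, 2.56):  # assumed selection
    s = lognormal_summary(theta, omega)
    print(
        f"omega={omega:4.2f}  CV={100 * s['cv']:8.1f}%  MMR={s['mmr']:7.3f}  "
        f"mode/median={s['mode_over_median']:6.3f}  relP90={s['rel_p90']:7.3f}  "
        f"skewness={s['skewness']:10.2f}"
    )
```

Under these formulas the MMR is close to 1.25 at 𝜔 ≈ 0.67, the mode sits at about half the median at 𝜔 ≈ 0.83, and the MMR overtakes relP90 near 𝜔 ≈ 2.56, in line with the cutoffs discussed in the text.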
distribution, with diverging mean and median, and a peak in density that is close to zero.

Cutoff values and regions of ω values to guide interpretation
Three scenarios can be discerned when reporting on lognormal distributions; see Table 3 and Supplemental Material 12. In case the 𝜔 is small, the results can be interpreted comparably to normal distributions, focusing on 𝜔 and, certainly if normal distributions are presented alongside, %CV. Furthermore, instead of the mean, the median would be reported as appropriate. From 𝜔 values of about 0.25–0.67, the lognormal starts to deviate from the normal distribution, and the mean-to-median ratio or MMR starts to deviate from one. The MMR can function as a warning of asymmetry and as an indication that the mean explicitly cannot be assumed to be similar to the median. It also starts to indicate the extent of spread of the distribution into higher values. With these 𝜔
values, the tails of the distribution can no longer be recovered using an interpretation as normal. Beyond an 𝜔 of 1.1, the distributional shape gets very different, and it is suggested to include the third parameter, the mode density inflation or MDI, which serves to illustrate the increased peak sharpness at values close to zero. In this scenario, the text of the report would include some discussion of additional support for the distributional shape. The tails of the lognormal have lost any similarity to those of the normal distribution at these high 𝜔 values.
In reports that discuss population coverages, for example, 80% or 90% PIs, under such high 𝜔 it is especially advisable to carefully convey how different the tails of the lognormal distribution are. For example, the sensitivity of coverages to parameter uncertainty or misspecification could become quite different for lower and upper intervals. Explicit discussion of such properties where relevant may help prevent unrealistic expectations among the audience.
Notwithstanding the preceding discussion, the mathematically best presentation of a lognormal distribution is indeed strictly as a lognormal distribution, with 𝜃 and 𝜔 presented in their untransformed values. Such a presentation, however, might receive a less-than-favorable reception by a wider target audience and therefore cannot be deemed optimal. We hope that our findings and discussion of possible statistics can help in interpretation.

Confirmation of lognormal distributions with high variability
Fleming et al.2 investigated the distribution of potency values of acetylcholine. Their paper clearly showed highly skewed distributions of observed potencies with high counts close to zero, consistent with the skewed and asymmetric properties of a high-variability lognormal distribution, i.e., a high MDI. It is, however, not always clear that such a curvature is indeed needed. Therefore it is recommended to perform additional investigations to confirm the modeled skewness if a lognormal 𝜔 falls beyond the gray zone. The actual indicators could vary dependent on the amount of data available, the background knowledge, and the available software. Three types of confirmation could be sought: (i) typical run completion checks, foremost whether the standard error of 𝜔 is not overly large; (ii) post hoc checks, such as whether the post hocs indeed show, at the normal scale, a high probability at low values consistent with the MDI and whether the mode of the distribution does indeed occur frequently; and (iii) simulation-based diagnostics such as mirror plots and visual predictive checks. When the number of observations/individuals is small, it is easy to see that these diagnostics perform poorly, and research into additional (external) support for large 𝜔 values would be advisable. In case the results of the investigations decrease the support for the large 𝜔 lognormal distribution, alternative parameterizations such as mixture models or semiparametric conversions could be explored.

PROOFS

Analytical expression for KL divergence of the normal distribution against the lognormal distribution
The KL divergence12 between two continuous distributions P and Q with densities p(·) and q(·), respectively, is defined as

\[
\mathrm{KL}(P\|Q) := \int_{-\infty}^{\infty} \log\left(\frac{p(u)}{q(u)}\right) p(u)\,du. \tag{18}
\]

Because it is our goal to evaluate how different a given lognormal distribution is from (the best-fitting) normal distribution, we will denote logN(𝜃,𝜔) by P and the N(𝜇,𝜎) distribution by Q. The following may be considered a technicality. Because p(u) is only supported on (0,∞) we have to assume that p(u)log(p(u)) = 0 for p(u) = 0, i.e., for u ≤ 0. Hence, the difference between the lognormal and the normal is merely considered on (0,∞) in this setting. Strictly speaking, that is not how the KL divergence is intended because Q is formally no longer a probability measure on this restricted space (0,∞), but it serves its purpose here. Finally, one may observe that the reverse definition KL(Q||P) is ill defined (+∞), which is why we speak of “divergence” rather than “distance.”
As a first step toward optimizing 𝜎̂ by KL minimization (on (0,∞)) for fixed 𝜃, 𝜔, and 𝜇, we derive an analytical expression for the divergence.

\[
\mathrm{KL}(P\|Q) = -\frac{1}{2} - \theta + \log\left(\frac{\sigma}{\omega}\right) + \frac{1}{2\sigma^2}\left(\exp\left(2\theta + 2\omega^2\right) - 2\mu\exp\left(\theta + \tfrac{1}{2}\omega^2\right) + \mu^2\right) \tag{19}
\]

Proof.

\[
\mathrm{KL}(P\|Q) = \int_{0}^{\infty} p(u)\log\left(\frac{p(u)}{q(u)}\right) du = I + II + III + IV
\]

Substituting p and q gives the following four expressions after rewriting.

Part I.

\[
I = -\frac{1}{2}\int_{0}^{\infty} \frac{1}{u\omega\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\log(u) - \theta}{\omega}\right)^2\right) \left(\frac{\log(u) - \theta}{\omega}\right)^2 du
\]

Substitute (log(u) − 𝜃)∕𝜔 = s, u = exp(𝜃 + 𝜔s), du = 𝜔exp(𝜃 + 𝜔s)ds.

\[
I = -\frac{1}{2}\int_{-\infty}^{\infty} s^2\,\frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}s^2\right) ds = -\frac{1}{2}
\]
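Here, the last equality follows from a standard Gaussian integral (which one may recognize as the variance of a standard normal random variable).

Part II.

\[
II = -\int_{0}^{\infty} \frac{1}{u\omega\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\log(u) - \theta}{\omega}\right)^2\right) \log(u)\,du
\]

Substitute log(u) = s, u = exp(s), du = exp(s)ds.

\[
II = -\int_{-\infty}^{\infty} s\,\frac{1}{\sqrt{2\pi}\,\omega} \exp\left(-\frac{1}{2}\left(\frac{s - \theta}{\omega}\right)^2\right) ds = -\theta
\]

Here, the last equality follows by recognizing the expression as the expected value of a Gaussian random variable with mean 𝜃.

Part III.

\[
III = \int_{0}^{\infty} \frac{1}{u\omega\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\log(u) - \theta}{\omega}\right)^2\right) \log\left(\frac{\sigma}{\omega}\right) du = \log\left(\frac{\sigma}{\omega}\right)
\]

The result follows immediately since the integral of the lognormal PDF, \(\int_{0}^{\infty} p(u)\,du\), over its entire support equals 1.

Part IV.

\[
IV = \frac{1}{2\sigma^2}\int_{0}^{\infty} \frac{1}{u\omega\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\log(u) - \theta}{\omega}\right)^2\right) (u - \mu)^2\,du
\]

Writing (u − 𝜇)² = u² − 2𝜇u + 𝜇², we again recognize three terms. First, for u² we plug in k = 2 in Eq. 10.

\[
\int_{0}^{\infty} u^2\,\frac{1}{u\omega\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\log(u) - \theta}{\omega}\right)^2\right) du = \exp\left(2\theta + 2\omega^2\right)
\]

Second, for −2𝜇u, we use k = 1.

\[
-2\mu\int_{0}^{\infty} u\,\frac{1}{u\omega\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\log(u) - \theta}{\omega}\right)^2\right) du = -2\mu\exp\left(\theta + \tfrac{1}{2}\omega^2\right)
\]

Third, 𝜇² is again simply a multiplication factor to the total area under the curve for the lognormal PDF (which equals one) as in part III.

\[
\mu^2\int_{0}^{\infty} \frac{1}{u\omega\sqrt{2\pi}} \exp\left(-\frac{1}{2}\left(\frac{\log(u) - \theta}{\omega}\right)^2\right) du = \mu^2
\]

Collecting all terms for part IV gives

\[
IV = \frac{1}{2\sigma^2}\left(\exp\left(2\theta + 2\omega^2\right) - 2\mu\exp\left(\theta + \tfrac{1}{2}\omega^2\right) + \mu^2\right)
\]

Finally, adding parts I + II + III + IV completes the proof.

Optimizing 𝜎̂ of the distribution to match a given lognormal distribution
Show that 𝜎̂ is equal to one of the following three cases:

\[
\hat{\sigma} =
\begin{cases}
\exp(\hat{\theta})\sqrt{\exp(2\hat{\omega}^2) - \exp(\hat{\omega}^2)} & \text{(D)} \\
\exp(\hat{\theta})\sqrt{\exp(2\hat{\omega}^2) - 2\exp\left(-\tfrac{1}{2}\hat{\omega}^2\right) + \exp(-2\hat{\omega}^2)} & \text{(E)} \\
\exp(\hat{\theta})\sqrt{\exp(2\hat{\omega}^2) - 2\exp\left(\tfrac{1}{2}\hat{\omega}^2\right) + 1} & \text{(F)}
\end{cases}
\tag{20}
\]

Proof.
Fixing 𝜃, 𝜔, and 𝜇, we set the partial derivative of Eq. 19 with respect to 𝜎 to zero.

\[
\frac{\partial\,\mathrm{KL}}{\partial\sigma} = \frac{1}{\sigma}\left(1 - \frac{1}{\sigma^2}\left(\exp\left(2\theta + 2\omega^2\right) - 2\mu\exp\left(\theta + \tfrac{1}{2}\omega^2\right) + \mu^2\right)\right)
\]

We work out the details of this equation case by case, substituting 𝜇.

D. Set 𝜇 = exp(𝜃 + 𝜔²∕2).

\[
0 = 1 - \frac{1}{\sigma^2}\left(\exp(2\theta + 2\omega^2) - \exp(2\theta + \omega^2)\right) \;\Rightarrow\; \sigma = \exp(\theta)\sqrt{\exp(2\omega^2) - \exp(\omega^2)}
\]

E. Set 𝜇 = exp(𝜃 − 𝜔²).

\[
0 = 1 - \frac{1}{\sigma^2}\left(\exp(2\theta + 2\omega^2) - 2\exp\left(2\theta - \tfrac{1}{2}\omega^2\right) + \exp(2\theta - 2\omega^2)\right) \;\Rightarrow\; \sigma = \exp(\theta)\sqrt{\exp(2\omega^2) - 2\exp\left(-\tfrac{1}{2}\omega^2\right) + \exp(-2\omega^2)}
\]

F. Set 𝜇 = exp(𝜃).

\[
0 = 1 - \frac{1}{\sigma^2}\left(\exp(2\theta + 2\omega^2) - 2\exp\left(2\theta + \tfrac{1}{2}\omega^2\right) + \exp(2\theta)\right) \;\Rightarrow\; \sigma = \exp(\theta)\sqrt{\exp(2\omega^2) - 2\exp\left(\tfrac{1}{2}\omega^2\right) + 1}
\]

Now, substituting 𝜃 and 𝜔 by their estimators 𝜃̂ and 𝜔̂ provides the resulting expression for 𝜎̂.

Ratio of percentiles of lognormal and KL-optimized, matched normal does not depend on 𝜃̂
In Supplemental Materials 11, it is explained that the median of U ∼ logN(𝜃,𝜔), i.e., U = exp(X) with X ∼ N(𝜃,𝜔), can be found as exp(𝜃) (note: 𝜃 being the median of X) because the exponential is a strictly increasing function. The same holds for other percentiles than the median. Denote the p × 100%-percentile of the standard normal distribution Z ∼ N(0,1) by Z_p. Then, the p × 100%-percentile of U is given by

\[
U_p = \exp(\theta + Z_p\,\omega), \tag{21}
\]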
where we have used the well-known fact that for normal distributions, X ∼ N(𝜇,𝜎), percentiles X_p can be stated as

\[
X_p = \mu + Z_p\,\sigma. \tag{22}
\]

Now, in Table 2, the best-case (A) percentiles of the lognormal distribution are given by Eq. 21, plugging in 𝜃̂ and 𝜔̂. One may note that −Z_0.10 = Z_0.90 ≈ 1.28. On the other hand, for any of the KL-matched scenarios (D, E, F) percentiles follow Eq. 22 because of the interpretation as Gaussian, plugging in 𝜇̂ according to Table 2 and the corresponding 𝜎̂ from Eq. 20. The relative difference X_p∕U_p is given for the respective cases D, E, and F, by:

approximation remained valid up to about an 𝜔 of 1.1, with an error of 25%. Above this level of variability, the lognormal distribution does not resemble a normal distribution anymore, and other statistics may be helpful in the reporting and discussion of lognormal distributions at high variability values.

Supporting Information. Supplementary information accompanies this paper on the CPT: Pharmacometrics & Systems Pharmacology website (www.psp-journal.com).
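
The relations in Eqs. 19 to 22 can be cross-checked numerically. The sketch below is an illustration under stated assumptions rather than the authors' implementation: the function names (`kl_lognormal_vs_normal`, `matched_normal`, `percentile_ratio`) are hypothetical, and the values of 𝜃 and 𝜔 in the loop are arbitrary. It evaluates Eq. 19, the KL-optimal 𝜎 of Eq. 20 for cases D, E, and F, and the percentile ratio built from Eqs. 21 and 22.

```python
import math
from statistics import NormalDist


def kl_lognormal_vs_normal(theta, omega, mu, sigma):
    """Eq. 19: KL(P||Q) on (0, inf) for P = logN(theta, omega), Q = N(mu, sigma)."""
    second_moment = math.exp(2 * theta + 2 * omega ** 2)  # E[U^2]
    first_moment = math.exp(theta + 0.5 * omega ** 2)     # E[U]
    return (-0.5 - theta + math.log(sigma / omega)
            + (second_moment - 2 * mu * first_moment + mu ** 2) / (2 * sigma ** 2))


def matched_normal(theta, omega, case):
    """Return (mu, sigma) with sigma from Eq. 20.

    Case D places mu at the lognormal mean, E at the mode, F at the median.
    """
    w2 = omega ** 2
    if case == "D":
        g = math.exp(0.5 * w2)
        h2 = math.exp(2 * w2) - math.exp(w2)
    elif case == "E":
        g = math.exp(-w2)
        h2 = math.exp(2 * w2) - 2 * math.exp(-0.5 * w2) + math.exp(-2 * w2)
    elif case == "F":
        g = 1.0
        h2 = math.exp(2 * w2) - 2 * math.exp(0.5 * w2) + 1.0
    else:
        raise ValueError("case must be 'D', 'E' or 'F'")
    scale = math.exp(theta)
    return scale * g, scale * math.sqrt(h2)


def percentile_ratio(theta, omega, case, p):
    """X_p / U_p, with X_p from Eq. 22 and U_p from Eq. 21."""
    z = NormalDist().inv_cdf(p)
    mu, sigma = matched_normal(theta, omega, case)
    return (mu + z * sigma) / math.exp(theta + z * omega)


omega = 0.5  # arbitrary illustrative value
for case in ("D", "E", "F"):
    for theta in (0.0, math.log(8)):  # arbitrary illustrative values
        mu, sigma = matched_normal(theta, omega, case)
        print(case, round(theta, 3),
              "sigma_hat =", round(sigma, 4),
              "KL =", round(kl_lognormal_vs_normal(theta, omega, mu, sigma), 4),
              "X_p/U_p at p = 0.10, 0.90:",
              round(percentile_ratio(theta, omega, case, 0.10), 4),
              round(percentile_ratio(theta, omega, case, 0.90), 4))
```

Because both 𝜇 and 𝜎 in each case carry the common factor exp(𝜃), that factor cancels against exp(𝜃) in Eq. 21, so the printed ratios should coincide for the two 𝜃 values within each case.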