
The Stata Journal (2019) 19, Number 2, pp. 261–293. DOI: 10.1177/1536867X19854002
© 2019 StataCorp LLC

qmodel: A command for fitting parametric quantile models
Matteo Bottai
Division of Biostatistics
Institute of Environmental Medicine
Karolinska Institutet
Stockholm, Sweden
matteo.bottai@ki.se

Nicola Orsini
Biostatistics Team
Department of Public Health Sciences
Karolinska Institutet
Stockholm, Sweden
nicola.orsini@ki.se

Abstract. In this article, we introduce the qmodel command, which fits para-
metric models for the conditional quantile function of an outcome variable given
covariates. Ordinary quantile regression, implemented in the qreg command, is a
popular, simple type of parametric quantile model. It is widely used but known to
yield erratic estimates that often lead to uncertain inferences. Parametric quantile
models overcome these limitations and extend modeling of conditional quantile
functions beyond ordinary quantile regression. These models are flexible and ef-
ficient. qmodel can estimate virtually any possible linear or nonlinear parametric
model because it allows the user to specify any combination of qmodel-specific
built-in functions, standard mathematical and statistical functions, and substi-
tutable expressions. We illustrate the potential of parametric quantile models
and the use of the qmodel command and its postestimation commands through
real- and simulated-data examples that commonly arise in epidemiological and
pharmacological research. In addition, this article may give insight into the close
connection that exists between quantile functions and the true mathematical laws
that generate data.
Keywords: st0555, qmodel, qmodel postestimation, predict, qmodel_quantile,
qmodel_plot, quantile regression, quantile regression coefficient models, integrated
loss function

1 Introduction
The interest in modeling quantiles, such as the median and the 90th percentile, of a
variable of interest given a set of covariates has spurred research in many fields of sci-
ence. Applications and new methods have been increasingly appearing in the literature
(among others, Koenker et al. [2018]). Quantile regression is arguably the most pop-
ular method for modeling conditional quantiles given covariates. The qreg command
was released early on, and related commands such as qreg2, lqreg, and laplacereg
have since been developed (Machado, Parente, and Santos Silva 2011; Orsini and Bottai
2011; Bottai and Orsini 2013).
Quantile regression estimates one conditional quantile at a time. It imposes no
restrictions on the quantile function other than the assumed functional relationship
between the quantile and the regression coefficients. For example, linear quantile
regression assumes
that a given quantile (for example, median) is a linear combination of covariates and
unknown regression coefficients. While this gives ordinary quantile regression flexibility,
it can also cause high variability of its estimator. Erratic estimates occur frequently in
applications and often lead to uncertain inferences. For example, the coefficient at one
percentile may be significantly different from zero, while those at adjacent percentiles
are not; the induced quantile function may be decreasing for some covariate patterns;
the estimated interquartile range may be larger than the 20th-to-80th-percentile range.
In this article, we introduce the qmodel command, which fits parametric quantile
models. The latter extend modeling of conditional quantile functions far beyond ordi-
nary quantile regression. By specifying parametric forms, they can improve efficiency
and ease interpretation at the possible cost of introducing bias (Frumento and Bottai
2016, 2017; Bottai and Cilluffo 2017). The qmodel command can fit parametric quantile
models, allowing for most general specifications of conditional quantile functions given
covariates, of which ordinary linear quantile regression and nonlinear quantile regression
are special cases.
As a motivating example, we now introduce a simple linear parametric quantile
model, on which we further elaborate in section 3. We analyze data from 930 male
subjects at least 20 years old with body mass index (BMI) of at least 30 kg/m² from the 2017
U.S. National Health and Nutrition Examination Survey (NHANES). Figure 1 shows the
box plots of BMI over age groups. The bottom quantiles (for example, the 10th per-
centile) do not vary with age, while the top quantiles (for example, the 90th percentile)
decline steeply.

[Figure 1 here: box plots; vertical axis: Body mass index (30 to 60); horizontal axis: age groups 20−, 30−, 40−, 50−, 60−, 70−, 80−]

Figure 1. Box plots of BMI over age groups with data from NHANES

We define a model for the conditional quantile function of BMI given age,

\[ Q(p \mid \text{age}) = \beta_0(p) + \beta_1(p)(\text{age} - 20) \]


For example, Q(0.5|20) represents the conditional median in 20-year-old individuals.
qreg can estimate the parameters β0 (p) and β1 (p) for any given quantile p ∈ (0, 1). We
use it to estimate all the percentiles, Q(0.01|age), . . . , Q(0.99|age).
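The percentile-by-percentile fits described above could be scripted along the following lines. This is a minimal sketch, not the authors' code: it assumes that the nhanes_lite dataset used in section 3 is in the working directory and contains the variables bmi and age, and the names age20, memhold, and qreg_coefs are hypothetical.

```stata
* Ordinary quantile regression at each percentile, one fit at a time
use nhanes_lite, clear
generate age20 = age - 20

tempname memhold
postfile `memhold' p b0 b1 using qreg_coefs, replace
forvalues k = 1/99 {
    local p = `k'/100
    quietly qreg bmi age20, quantile(`p')
    * store the order of the quantile, the intercept, and the age coefficient
    post `memhold' (`p') (_b[_cons]) (_b[age20])
}
postclose `memhold'

use qreg_coefs, clear
twoway line b1 p, ytitle("Age") xtitle("p")
```

Plotting b0 or b1 against p gives curves of the kind shown in the top row of figure 2, without the confidence bands.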
Figure 2 displays the regression coefficients estimated with both ordinary quantile
regression and a parametric quantile model. The top row shows the estimated intercept,
β0 (p), and coefficient for age, β1 (p), as functions of p, along with shaded areas indicating
their 95% confidence intervals. The inclusion criteria explain the vanishing confidence
intervals at the smallest quantiles. The estimates show a markedly erratic behavior,
which can be explained only by sampling variability.
The bottom row in figure 2 shows the estimates for the intercept and the coefficient
associated with age obtained with a parametric quantile model. These can be
compared with the estimates at the 99 percentiles, Q(0.01|age), . . . , Q(0.99|age),
shown in the top row, which were obtained with qreg. Although the overall trends are
comparable, the parametric estimates are smoother, and their confidence intervals are
narrower than their nonparametric counterparts.

[Figure 2 here: panels A and B (top row) and C and D (bottom row); vertical axes: Intercept (A, C) and Age (B, D); horizontal axis: p]

Figure 2. Point estimates (lines) and 95% confidence intervals (shaded areas) for the
intercept (first column) and the coefficient for age (second column) from ordinary quantile
regression (first row) and from the parametric quantile model (second row) as functions
of the order of the quantile, p, with data from NHANES

The above is an example of a simple parametric quantile model. In the remainder
of this article, we illustrate the potential of parametric quantile models. Section 2
contains the syntax of the qmodel command and its postestimation commands; section 3
expands on the example from the NHANES epidemiological study introduced above;
section 4 describes a nonlinear parametric quantile model arising from an example of
pharmacological modeling; section 5 provides additional details on parametric quantile
models; and section 6 concludes with final remarks.

2 Syntax
This section describes the syntax of the qmodel command and its postestimation
commands predict, qmodel_plot, and qmodel_quantile.

2.1 qmodel
qmodel fits parametric models for conditional quantile functions given covariates by
minimizing the definite integral between 0 and 1 of the quantile loss function with
respect to the order of the quantile, p.
    
    qmodel exp_varname = quantile_function [if] [in] [, cluster(varname)
        npoints(integer) qpoints(numlist) initial(values) log]

exp_varname is the variable whose quantile function is modeled or an expression of
it, such as log(varname) or logit(varname).
quantile_function is an expression representing the parametric quantile model. It
can be any combination of built-in functions, substitutable expressions, and mathematical
functions. Its argument is indicated by the letter p. For example, the standard
exponential quantile function is -log(1-p).

Options

cluster(varname) specifies the cluster variable used in the cluster–robust sandwich
estimator for the standard errors.
npoints(integer) specifies the number of equally spaced internal points for the numer-
ical integration. The default is npoints(99), which results in the following set of
points: 0.01, 0.02, . . . , 0.99.
qpoints(numlist) specifies a list of points for the numerical integration. For exam-
ple, qpoints(.05 .1(.1).9 .95). If both qpoints() and npoints() are specified,
npoints() is ignored.
initial(values) specifies the initial values of the parameters for the optimization algo-
rithm. The values are separated by commas. The default is a vector of zeros. The
initial values correspond to the parameters in the order they appear in the quantile
model expression quantile_function. For example, initial(10, 0, -1).
log shows the iteration log.
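
To make the role of the integration points concrete, the sketch below approximates the integrated quantile loss of a candidate model on the default grid 0.01, 0.02, ..., 0.99. It assumes the usual quantile (check) loss, u{p − I(u < 0)}, and a simple average over the grid; the exact quadrature rule used internally by qmodel is not stated in this section. The auto dataset shipped with Stata, the normal quantile model, and the values of m and s are illustrative choices only.

```stata
* Candidate model: Q(p) = m + s*invnormal(p) for the variable price
sysuse auto, clear
scalar m = 6000
scalar s = 3000
generate double loss = 0
forvalues k = 1/99 {
    local p = `k'/100
    * residual u = price - Q(p), then check loss u*(p - 1(u<0)), averaged over the grid
    quietly replace loss = loss + (price - (m + s*invnormal(`p'))) * ///
        (`p' - (price < m + s*invnormal(`p'))) / 99
}
quietly summarize loss
scalar avg_loss = r(mean)
display "average integrated loss = " avg_loss
```

Rerunning the block with other values of m and s shows how the criterion ranks candidate parameter values; an optimizer would search for the pair that makes it smallest.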

Built-in functions

alogistic
Description: asymmetric logistic quantile function
Expression: (exp({alogistic:log(a)})*log(p)-exp({alogistic:log(b)})
*log(1-p))
beta
Description: beta quantile function
Expression: invibeta(exp({beta:log(a)}),exp({beta:log(b)}),p)
chi2
Alias: chi-square
Description: chi-square quantile function
Expression: invchi2(exp({chi2:log(df)}),p)
clog
Description: complementary log term
Expression: ({clog:log(1-p)}*log(1-p))
cnormal
Description: centered normal quantile function
Expression: (exp({cnormal:log(sd)})*invnormal(p))
cubic
Alias: poly3
Description: centered cubic polynomial
Expression:
({cubic:constant}+{cubic:p}*(p-.5)+{cubic:p^2}*(p-.5)^2
+{cubic:p^3}*(p-.5)^3)
cp3
Description: centered cubic term
Expression: ({cp3:(p-.5)^3}*(p-.5)^3)
cp2
Description: centered quadratic term
Expression: ({cp2:(p-.5)^2}*(p-.5)^2)
cp
Description: centered linear term
Expression: ({cp:p-.5}*(p-.5))

exponential
Description: exponential quantile function
Expression: (-exp({exponential:log(mean)})*log(1-p))
flat
Alias: cons, poly0
Description: flat quantile function
Expression: ({flat:constant})
gamma
Description: gamma quantile function
Expression: (exp({gamma:log(scale)})
*invgammap(exp({gamma:log(shape)}),p))
lognormal
Description: log-normal quantile function
Expression: (exp({lognormal:mean}
+exp({lognormal:log(sd)})*invnormal(p)))
logitnormal
Description: logit-normal quantile function
Expression: invlogit({logitnormal:mean}
+exp({logitnormal:log(sd)})*invnormal(p))
linear
Alias: uniform, poly1
Description: centered linear (uniform) quantile function
Expression: ({linear:constant}+{linear:p}*(p-.5))
log
Description: log term
Expression: ({log:log(p)}*log(p))
normal
Description: normal quantile function
Expression: ({normal:mean}+exp({normal:log(sd)})*invnormal(p))
p3
Alias: p-cubed
Description: cubic term
Expression: {p3:p^3}*p^3
p2
Alias: p-squared
Description: quadratic term
Expression: {p2:p^2}*p^2
p
Description: linear term
Expression: {p:p}*p
quadratic
Alias: poly2
Description: centered quadratic polynomial
Expression: ({quadratic:constant}+{quadratic:p}*(p-.5)
+{quadratic:p^2}*(p-.5)^2)
root2
Alias: sqrt
Description: square root term
Expression: ({root2:p^(1/2)}*p^.5)
root3
Description: cubic root term
Expression: ({root3:p^(1/3)}*p^(1/3))
splineD K
Description: spline of order D with K equally spaced internal knots between 0 and 1
Expression for linear splines with one knot:
{spline:constant}+{spline:p}*p^1+{spline:spline^1_1}*(p-1/2)^1
*(p>(1/2))
weibull
Description: Weibull quantile function
Expression: (exp({weibull:log(mean)})
*(-log(1-p))^exp(-{weibull:log(shape)}))

Stored results

qmodel stores the following in e():

Scalars
    e(N)            number of observations
    e(bic)          Bayesian information criterion
    e(aic)          Akaike’s information criterion
    e(loss)         loss function
Macros
    e(cmd)          qmodel
    e(cmdline)      command as typed
    e(Q_b)          quantile model for Mata
    e(Q)            quantile model with substitutable expressions
    e(properties)   b V
    e(predict)      program used to implement predict
Matrices
    e(b)            coefficient vector
    e(V)            variance–covariance matrix of the estimators
Functions
    e(sample)       marks estimation sample

2.2 predict
The predict command predicts specified functions of parameters at the proportions
stored in the existing variable specified in proportion(varname). The functions of
parameters to be predicted are specified in quantile_function of qmodel by enclosing
them in the special brackets _( and )_ or, equivalently, the corresponding square
brackets. Standard errors of the predicted quantiles can be obtained with the se option.

    predict newvarlist, proportion(varname) [se]

Options

proportion(varname) specifies the name of an existing variable containing proportions.
proportion() is required.
se specifies the standard error of the prediction.

2.3 qmodel_plot

The qmodel_plot command plots specified functions of the parameters against the
proportion. The functions of parameters to be plotted are specified in quantile_function
of qmodel by enclosing them in the special brackets _( and )_ or, equivalently, the
corresponding square brackets.

    qmodel_plot [, ci replace addplot(string) twoway_options]

Options

ci shows confidence intervals of the quantiles.
replace replaces previous graph.
addplot(string) adds other plots to the generated graph.
twoway_options are any of the options documented in [G-3] twoway_options.

2.4 qmodel_quantile

The qmodel_quantile command computes point estimates, standard errors, test statistics,
significance levels, and confidence intervals for the quantile of exp_varname in
qmodel at the proportions specified in numlist. The default is the median.

    qmodel_quantile [numlist] [, at(varname=# [varname=# ...]) nlcom_options]

Options
  
at(varname=# [varname=# ...]) specifies the values of the covariates at which the
quantiles are to be estimated.
nlcom_options specifies standard nlcom options; see [R] nlcom.

3 Example 1: Body mass and age


In this section, we describe the use of the qmodel command and its postestimation
commands. We expand on the simple linear regression model introduced in section 1.
We present an example of nonlinear models in section 4.
The data arise from all the male participants in the 2017 NHANES who were at least
20 years old and had a BMI of at least 30 kg/m². We consider the simple linear regression
model introduced in section 1,

\[ Q(p \mid \text{age}) = \beta_0(p) + \beta_1(p)(\text{age} - 20) \]

The variable age is centered at the value 20, the smallest observed value in our data.
The β0 (p) function therefore represents the quantile function of BMI in 20-year-olds, that
is, Q(p|20) = β0(p). We generally recommend ensuring that the value zero is within the
observed range of all covariates because this eases the interpretation of the intercept
function and improves the numerical stability of the estimation algorithm.
To construct parametric models for the regression coefficients, β0 (p) and β1 (p), one
can obtain point estimates for a set of proportions with ordinary quantile regression,
as shown in the top panels of figure 2, and then find a parametric quantile model that
most closely approximates them.
Based on panel A in figure 2, we start by approximating the quantile function β0 (p)
with the quantile function of an exponential distribution with support BMI ≥ 30,

\[ \beta_0(p) = 30 - \theta_0 \log(1 - p) \]

with θ0 > 0. Because the individuals were included in the study if their BMI was at
least 30 kg/m², the smallest BMI value is known to be 30 with no sampling variability.
Hence, we do not include a parameter representing a location shift of the baseline value
but rather the fixed number 30.
We chose an initial approximating function for β0 (p) based on the estimates for the
intercept of ordinary quantile regression. If these were unavailable, we could consider
the quantile function of BMI within a range of small age values. For example, we build
the empirical quantile function of BMI in individuals between 20 and 25 years of age
shown in figure 3.

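. use nhanes_lite
(NHANES 2015-16, merged Demographics + Body Measures)
. * rank BMI within the 20-25 age group so that p is the within-group proportion
. generate byte young = inrange(age, 20, 25)
. sort young bmi
. by young: generate p = _n/_N if young
(869 missing values generated)
. line bmi p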

Technical note

The quantile function of a variable is the inverse of its cumulative distribution function:
its graph is that of the cumulative distribution function with the x axis and y axis
swapped. The quantile function maps from the proportion (x axis) to the values of the
outcome variable (y axis).
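
The inverse relationship can be verified numerically for the exponential distribution, whose quantile function is used for the intercept in this section. In this minimal sketch, the value of the mean and the proportion are illustrative, and the scalar names are hypothetical.

```stata
* Round trip between an exponential quantile function and its CDF
scalar theta_ex = 6.5                            // mean of the exponential
scalar prop_ex  = 0.9                            // proportion
scalar q_ex     = -theta_ex*log(1 - prop_ex)     // quantile function: proportion -> outcome
scalar back_ex  = 1 - exp(-q_ex/theta_ex)        // CDF: outcome -> proportion
display "Q(p) = " q_ex "   F(Q(p)) = " back_ex
assert reldif(prop_ex, back_ex) < 1e-12
```

Applying the cumulative distribution function to the quantile returns the proportion that was supplied, which is what swapping the axes means in practice.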

[Figure 3 here: line plot; vertical axis: Body mass index (30 to 60); horizontal axis: p (0 to 1)]

Figure 3. Empirical quantile function of BMI in individuals between 20 and 25 years old
with data from NHANES

When no graphical representations are available, one can start with flexible models
such as cubic splines or step functions. Understanding the substantive meaning of the
outcome variable can help one make sound decisions. For example, in our model, β0 (p)
represents the top tail of the conditional distribution of BMI truncated at 30 kg/m².
While the exponential function would not be a reasonable choice for the entire distribu-
tion of BMI, which may be expected to be unimodal, it is a sensible initial approximation
for its extreme top tail.
We now consider the regression coefficient associated with age, β1 (p). Based on
panel B in figure 2, we start by approximating it with a third-order polynomial with no
intercept,
\[ \beta_1(p) = \theta_1 p + \theta_2 p^2 + \theta_3 p^3 \]

Other flexible functions, such as splines or step functions, could be used instead.
The quantile parametric model resulting from the above specifications is

\[ Q(p \mid \text{age}) = 30 - \theta_0 \log(1 - p) + (\theta_1 p + \theta_2 p^2 + \theta_3 p^3)(\text{age} - 20) \]

The above model constrains the smallest BMI value to be equal to 30 kg/m², Q(0|age) =
30, at all age values, in accordance with the inclusion criterion BMI ≥ 30 kg/m².
We estimate the parameters of the above model with the qmodel command:
. qmodel bmi = 30 + _exponential + (_p + _p2 + _p3)*(age - 20), npoints(200)
Parametric Quantile Model Number of obs = 930
AIC = 15.11179
Loss function = 1226.338993 BIC = 34.45253

bmi Coef. Std. Err. z P>|z| [95% Conf. Interval]

exponential
log(mean) 1.88083 .0636308 29.56 0.000 1.756116 2.005544

p
p -.0280415 .0169581 -1.65 0.098 -.0612788 .0051958

p2
p^2 .0197677 .041465 0.48 0.634 -.0615022 .1010376

p3
p^3 -.121129 .0425205 -2.85 0.004 -.2044676 -.0377903

The table reports the estimates for all the model parameters along with their stan-
dard errors, z statistics, p-values, and 95% confidence intervals. The level of the confi-
dence intervals can be changed with the set level command. The header above the
table shows the value of the loss function, the number of observations used in the esti-
mation, and the values of Akaike’s information criterion and the Bayesian information
criterion.
The qmodel command above contains the following built-in functions: _exponential,
_p, _p2, and _p3. The complete list of functions, each with a short
description and its expanded expression, is given in section 2 and in the help files that
open with the help qmodel command.
The built-in functions are internally expanded into standard substitutable expres-
sions. qmodel saves the expanded syntax in the e(Q) macro, which can be retrieved as
follows:
. display e(Q)
30+(-exp({exponential:log(mean)})*log(1-p))+({p:p}*p+{p2:p^2}*p^2+{p3:p^3}*p^3)*
> (age-20)

Typing the following command would give identical output to the above. For brevity,
we suppress the output with the quietly command.
. quietly qmodel bmi = 30+(-exp({exponential:log(mean)})*log(1-p))
> +({p:p}*p+{p2:p^2}*p^2+{p3:p^3}*p^3)*(age-20), npoints(200)

The model parameters are named within curly brackets. The letter p is the symbol
that represents the proportion p, the argument of the quantile function Q(p). Any
occurrences of the symbol p in the model expression are interpreted as such, so if a
covariate named or abbreviated as p is included in the model, it is interpreted as the
proportion, not the covariate. Similarly, if a covariate named or abbreviated as one of
the built-in functions is included in the model, it is interpreted as the function, not the
covariate. To introduce such a covariate, one needs to rename it first.
qmodel allows one to specify any function of parameters, covariates, and the propor-
tion p. Curly bracket syntax, qmodel built-in functions, and Stata standard functions
can be used in any combination, as demonstrated in the following sections and in the
help documentation.

Technical note

The exponential built-in function constrains its mean to be positive by estimating
the logarithm of the mean, which can take on any real value. This is a popular method
for constraining parameters that must be positive. It ensures positive estimates for the
mean and often improves the performance of the estimation algorithm.
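
As an illustration of this parameterization, the estimate of log(mean) from the first model fitted above can be transformed back to the scale of the mean. This is a minimal sketch: the two numbers are copied from that output table, the standard error uses a first-order delta-method approximation, and the scalar names are hypothetical.

```stata
* Back-transform the estimated log(mean) and its confidence limits
scalar logmean_hat = 1.88083
scalar logmean_se  = .0636308
scalar mean_hat = exp(logmean_hat)            // positive for any real log(mean)
scalar mean_se  = mean_hat*logmean_se         // first-order delta method
scalar mean_lb  = exp(logmean_hat - invnormal(.975)*logmean_se)
scalar mean_ub  = exp(logmean_hat + invnormal(.975)*logmean_se)
display "mean = " mean_hat "  (se " mean_se ")  95% CI [" mean_lb ", " mean_ub "]"
```

Because the interval is computed on the log scale and then exponentiated, its limits are always positive, although the interval is not symmetric around the estimated mean.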

The above qmodel command specifies the npoints(200) option, which increases the
number of quadrature points in the estimation algorithm to 200. With the default 99
quadrature points, the algorithm fails to converge because the functions _p2 and _p3 are
nearly collinear over the interval p ∈ (0, 1).
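
The near collinearity can be checked directly by correlating the two terms on the default grid of proportions. This is a minimal sketch that clears the data in memory; the variable names are hypothetical.

```stata
* Correlation between p^2 and p^3 on the grid 0.01, 0.02, ..., 0.99
clear
set obs 99
generate double p  = _n/100
generate double p2 = p^2
generate double p3 = p^3
correlate p p2 p3
quietly correlate p2 p3
display "corr(p^2, p^3) = " r(rho)
```

A correlation very close to 1 means that the two terms carry almost the same information over (0, 1), so their coefficients are hard to estimate separately.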
The quadratic term of the coefficient for age is not significant, and we omit it from
the model.
. qmodel bmi = 30 + _exponential + (_p + _p3)*(age - 20)
Parametric Quantile Model Number of obs = 930
AIC = 13.11667
Loss function = 1232.344029 BIC = 27.62223

bmi Coef. Std. Err. z P>|z| [95% Conf. Interval]

exponential
log(mean) 1.879772 .0473395 39.71 0.000 1.786989 1.972556

p
p -.0230036 .0110279 -2.09 0.037 -.0446179 -.0013893

p3
p^3 -.1028131 .020904 -4.92 0.000 -.1437842 -.061842

After we remove the quadratic term from the model, the near collinearity among
covariates vanishes, and specifying the npoints(200) option becomes unnecessary.
We now check the goodness of fit of the exponential function by fitting the more
flexible Weibull function, of which the exponential is a special case. We use the qmodel
command with the built-in Weibull function:

. qmodel bmi = 30 + _weibull + (_p + _p3)*(age - 20)


Parametric Quantile Model Number of obs = 930
AIC = 15.11667
Loss function = 1232.338098 BIC = 34.45741

bmi Coef. Std. Err. z P>|z| [95% Conf. Interval]

weibull
log(mean) 1.877965 .0557546 33.68 0.000 1.768688 1.987242
log(shape) -.0079472 .048473 -0.16 0.870 -.1029526 .0870582

p
p -.020373 .0188415 -1.08 0.280 -.0573016 .0165556

p3
p^3 -.1085563 .0439087 -2.47 0.013 -.1946157 -.0224968

The logarithm of the shape parameter of the Weibull quantile function is not signifi-
cantly different from zero, which is equivalent to stating that the shape parameter is not
significantly different from one. A Weibull distribution with shape equal to one is an
exponential distribution. We therefore opt for the exponential quantile function. The
values of Akaike’s information criterion and the Bayesian information criterion with the
exponential function are smaller than those with the Weibull, further supporting our
decision.
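
Both steps of this argument can be checked by hand. The sketch below first confirms that the _weibull expression listed in section 2 reduces to the _exponential expression when log(shape) equals zero, using illustrative values of log(mean) and p, and then recomputes the Wald z statistic for log(shape) from the Weibull output table. The scalar names are hypothetical.

```stata
* With log(shape) = 0, the Weibull quantile equals the exponential quantile
scalar lm_ex = 1.88                               // illustrative log(mean)
scalar pr_ex = 0.9                                // illustrative proportion
scalar q_weibull = exp(lm_ex)*(-log(1 - pr_ex))^exp(-0)
scalar q_expon   = -exp(lm_ex)*log(1 - pr_ex)
assert reldif(q_weibull, q_expon) < 1e-12

* Wald test of log(shape) = 0 from the reported estimate and standard error
scalar z_shape = -.0079472/.048473
scalar p_shape = 2*normal(-abs(z_shape))
display "z = " z_shape "   two-sided p-value = " p_shape
```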

Technical note

Akaike’s information criterion and the Bayesian information criterion are often used
to compare parametric quantile models that are nonnested. Although they are widely
accepted, they are not always reliable and should generally be regarded as guidelines
(Burnham and Anderson 2004).

We consider the model with the exponential intercept our final model. It can be
written as
\[ Q(p \mid \text{age}) = 30 - \theta_0 \log(1 - p) + (\theta_1 p + \theta_2 p^3)(\text{age} - 20) \]
Its estimates for the functions β0(p) = 30 − θ0 log(1 − p) and β1(p) = θ1 p + θ2 p³ are
displayed on the bottom row of figure 2.

Technical note

The above final model sets the conditional distribution of BMI among 20-year-olds
to be exponential, Q(p|20) = 30 − θ0 log(1 − p), with mean equal to θ0 and support over
the half line (30, ∞). For increasing values of age, the conditional distribution smoothly
morphs into a sequence of different distributions that are progressively farther away from
the exponential. Their quantile function is the weighted sum of an exponential function
and a cubic function, with weights varying with age. Except at age equal to 20 years, the
corresponding conditional cumulative distribution function and conditional probability
density function do not have a closed-form expression. Attempting to estimate the model
parameters by maximizing the likelihood function would therefore be cumbersome.
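
Although the cumulative distribution function has no closed form, it can be obtained numerically by evaluating the fitted quantile function on a fine grid and inverting it. The following minimal sketch does so at age 60, an arbitrary choice, using the final-model estimates reported above; it clears the data in memory, and the grid size and variable names are hypothetical.

```stata
* Fitted quantile function at age 60 on a grid of proportions
clear
set obs 999
generate double p = _n/1000
generate double q60 = 30 - exp(1.879772)*log(1 - p) ///
    + (-.0230036*p - .1028131*p^3)*(60 - 20)

* a valid quantile function is nondecreasing in p
assert q60 >= q60[_n-1] if _n > 1

* numerical CDF at BMI = 40: largest grid proportion whose quantile is at most 40
quietly summarize p if q60 <= 40
scalar F40 = r(max)
display "Estimated P(BMI <= 40 | age = 60) is about " F40
```

The monotonicity check is worth running at the covariate values of interest because nothing in the model expression by itself guarantees a nondecreasing quantile function.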

3.1 The qmodel_plot command

This and the following two subsections describe the three postestimation commands of
qmodel: qmodel_plot, predict, and qmodel_quantile.
The qmodel_plot postestimation command produces plots for any sets of parameters
specified in the latest qmodel command. We use qmodel_plot to build the two plots on
the bottom row of figure 2. To plot the β0(p) function, qmodel_plot requires identifying
the set of parameters that define the function to be plotted. This is accomplished by
enclosing the relevant part of the qmodel command within the pair of special brackets
_( and )_. The corresponding square brackets can be used instead.

. quietly qmodel bmi = _(30 + _exponential)_ + (_p + _p3)*(age - 20)

If the function to be plotted is specified by the special brackets in the qmodel command,
as above, we can use the qmodel_plot command to plot it.

. qmodel_plot, ci ylabel(30 50 70, angle(h)) ytitle("Intercept")

Panel C in figure 2 shows the resulting plot. The ci option requires pointwise
confidence bands to be laid over the point estimates. The standard two-way plot options,
such as ylabel() and ytitle(), are allowed. We refer the reader to the documentation
of the twoway command for more details.
The special brackets in the qmodel command are required only for later qmodel_plot
commands, which need them to identify the functions to plot. The qmodel command
itself does not need them, as demonstrated in the examples presented in the first part
of section 3.
We now use the qmodel_plot command to plot the β1(p) function shown in panel D
in figure 2 after enclosing this function within special brackets in the qmodel command.

. quietly qmodel bmi = 30 + _exponential + _(_p + _p3)_*(age - 20)


. qmodel_plot, ci ylabel(-.3(.1)0) ytitle("Age")

When multiple sets of special brackets are included, the qmodel_plot command
produces multiple graphs. For example, the following two lines of code produce the
above two plots simultaneously (graphs not shown in this article).

. quietly qmodel bmi = _(30 + _exponential)_ + _(_p + _p3)_*(age - 20)


. qmodel_plot, ci

When multiple graphs are specified, these are given default names, and the name()
option is not allowed. Any options specified in the qmodel_plot command apply
to all the graphs.

As noted in section 1, the overall trends of the parametric and nonparametric
estimates are similar, but the former are smoother and have narrower confidence
intervals than the latter. The confidence intervals vanish as the proportion p tends to 0
because of the inclusion criterion BMI ≥ 30 kg/m².

3.2 The predict command


The predict command predicts specified functions of parameters and their standard er-
rors at values of p stored in an existing variable, specified by the required proportion()
option. For example, we estimate the 99 percentiles after generating a new variable
named proportion that contains the proportions 0.01 to 0.99 by 0.01 steps. We also
estimate the corresponding standard errors by specifying the se option.

. quietly qmodel bmi = _(30 + _exponential)_ + _(_p + _p3)_*(age - 20)


. range proportion .01 .99 99
(831 missing values generated)
. predict beta0 beta1, p(proportion)
. predict se_beta0 se_beta1, p(proportion) se

The estimates for the quantiles are stored in the newly created variables named
beta0 and beta1. The corresponding estimates for the standard errors are stored in
the newly created variables named se_beta0 and se_beta1. Because the above qmodel
command contains two sets of enclosed parameters, in the predict command, we specify
two new variable names, one for each set of parameters. If only one variable name is
provided, predict predicts only the first set of enclosed parameters given in the qmodel
command. In general, if there are fewer new variable names listed in the predict
command than the number of parameter sets, predict predicts only those specified in
the order they appear in the model given in the qmodel command.
We list the estimates for the specified sets of parameters at the median, β0 (0.5) and
β1 (0.5), with their respective standard errors.

. clist proportion beta0 se_beta0 beta1 se_beta1 if proportion==.5


proport~n beta0 se_beta0 beta1 se_beta1
50. .5 34.54151 .2149928 -.0243534 .0060572

Plotting the newly generated variables, beta0 and beta1, against the newly created
variable proportion would produce the same plots as those created by the qmodel_plot
command, shown on the bottom row of figure 2.
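
A plot of this kind, with pointwise confidence bands, could be assembled from the predict output as sketched here. The sketch assumes that qmodel is installed and that nhanes_lite is in the working directory; the bands are normal-approximation intervals built from the predicted standard errors, which may or may not be exactly how qmodel_plot computes them, and the names lb_beta1 and ub_beta1 are hypothetical.

```stata
* Refit the model and predict the enclosed functions and their standard errors
use nhanes_lite, clear
quietly qmodel bmi = _(30 + _exponential)_ + _(_p + _p3)_*(age - 20)
range proportion .01 .99 99
predict beta0 beta1, p(proportion)
predict se_beta0 se_beta1, p(proportion) se

* Pointwise 95% bands for the age coefficient
generate double lb_beta1 = beta1 - invnormal(.975)*se_beta1
generate double ub_beta1 = beta1 + invnormal(.975)*se_beta1
twoway (rarea lb_beta1 ub_beta1 proportion) (line beta1 proportion), ///
    ytitle("Age") xtitle("p") legend(off)
```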

3.3 The qmodel_quantile command

The qmodel_quantile command can estimate any quantiles at any specified covariate
values from the latest qmodel command. For example, we estimate the median BMI in
the 30-year-old male and obese NHANES population.

. qmodel_quantile, at(age=30)
Quantile p=.5 at age=30
quantile: (30+(-exp(_b[exponential:log(mean)])*log(1-p)))+(_b[p:p]*p
> +_b[p3:p^3]*p^3)*(age-20)

bmi Coef. Std. Err. z P>|z| [95% Conf. Interval]

quantile 34.29798 .1701276 201.60 0.000 33.96453 34.63142

The above table shows the estimated quantile along with its standard error, z statistic,
p-value, and 95% confidence interval. The estimated median is 34.3 kg/m² with a
standard error of 0.170 and a 95% confidence interval equal to [34.0, 34.6].
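
The point estimate can be reproduced by plugging the coefficients of the final model into the quantile expression printed in the header of the output. This is a minimal sketch that uses the rounded estimates reported above, so agreement is up to rounding; the scalar names are hypothetical.

```stata
* Q(0.5 | age = 30) by hand from the final-model estimates
scalar b_logmean = 1.879772
scalar b_p       = -.0230036
scalar b_p3      = -.1028131
scalar pr_med    = 0.5
scalar b0_at_p = 30 - exp(b_logmean)*log(1 - pr_med)   // intercept function at p
scalar b1_at_p = b_p*pr_med + b_p3*pr_med^3            // age coefficient at p
scalar q_med   = b0_at_p + b1_at_p*(30 - 20)
display "beta0 = " b0_at_p "  beta1 = " b1_at_p "  Q(.5|30) = " q_med
```

The two intermediate quantities correspond to the values of beta0 and beta1 listed at proportion 0.5 in section 3.2.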
Above, we did not specify any proportion, and the qmodel_quantile command
defaulted to the median. The proportion can be specified as a numeric list. For example,
we estimate the 95th, 96th, and 97th percentiles of BMI in 30-year-old individuals.
. qmodel_quantile .95(.01).97, at(age=30) noheader
Quantile p=.95 at age=30

bmi Coef. Std. Err. z P>|z| [95% Conf. Interval]

quantile 48.52805 .7529977 64.45 0.000 47.0522 50.0039

Quantile p=.96 at age=30

bmi Coef. Std. Err. z P>|z| [95% Conf. Interval]

quantile 49.95966 .8175188 61.11 0.000 48.35735 51.56196

Quantile p=.97 at age=30

bmi Coef. Std. Err. z P>|z| [95% Conf. Interval]

quantile 51.81354 .9018836 57.45 0.000 50.04588 53.5812

In our study population, the 95th percentile is estimated to be 48.5 kg/m² with a
standard error of 0.753 and a 95% confidence interval equal to [47.1, 50.0].
The above qmodel_quantile command specifies the noheader option. The command
is based on the nlcom command and allows all the options of that command. We
refer the reader to the documentation of nlcom for more details.
Thanks to the parametric assumptions, we can obtain estimates for any percentile,
however extreme. For example, we estimate the 0.9999 quantile as follows:
. qmodel_quantile .9999, at(age=30) noheader
Quantile p=.9999 at age=30

bmi Coef. Std. Err. z P>|z| [95% Conf. Interval]

quantile 89.08735 2.653125 33.58 0.000 83.88732 94.28738


Our data contain only 930 individuals. This implies that the above estimate of
89.1 kg/m² is an extrapolation, valid only if the parametric quantile model is true.
Ordinary quantile regression would be unable to estimate such an extreme percentile.
As with any other statistical method, one should generally be cautious when interpreting
extrapolated inferences.
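
The extent of the extrapolation can be quantified with simple arithmetic. The sketch below recomputes the 0.9999 quantile at age 30 from the rounded final-model estimates and compares the sample size with the number of observations expected above that quantile; small differences from the reported value in the third decimal can arise from rounding and numerical precision. The scalar names are hypothetical.

```stata
* Q(0.9999 | age = 30) by hand, and expected number of observations above it
scalar q_tail = 30 - exp(1.879772)*log(1 - .9999) ///
    + (-.0230036*.9999 - .1028131*.9999^3)*(30 - 20)
scalar n_above = 930*(1 - .9999)
display "Q(.9999|30) = " q_tail
display "expected observations above it in a sample of 930 = " n_above
```

With fewer than one observation expected beyond this quantile, the estimate rests entirely on the assumed shape of the upper tail.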

4 Example 2: Nonlinear dose–response relationships


Parametric quantile models extend beyond the ordinary linear quantile regression frame-
work. Some examples of the range of possibilities are described in section 5. This sec-
tion describes the estimation of a nonlinear parametric quantile model frequently used
in pharmacological research.
We analyze the data from a fictitious laboratory experiment in which 200 animals
were injected with 5 doses of an agent, and 400 more animals were injected with the
same 5 doses but of a different agent. The liver concentration of the agent was measured
one hour after injection. Figure 4 shows the data and true conditional median of the
liver concentration for the two agents.

[Figure 4: two panels, each plotting Liver concentration (vertical axis, 0 to 80) against Log(dose) (horizontal axis, 0 to 4)]

Figure 4. Observed liver concentration (dots) and true median concentration (line) at
5 injected doses of the first agent (left panel) and the second agent (right panel) with
simulated data

In our data, the location, spread, and overall shape of the distribution of liver con-
centration change over dose and between agents. We consider a popular dose–response
function, the Hill function,
$$\text{concentration} = \theta_0 + \frac{\theta_1 - \theta_0}{1 + 10^{\theta_2(\theta_3 - \text{dose})}} + \varepsilon$$

The parameters of the above model can be interpreted as follows: minimal response
(θ0 ), maximal response (θ1 ), slope (θ2 ), and dose corresponding to half the maximal
response (θ3 ). The error term, ε, is generally assumed to follow a normal distribution.
A possible specification of the Hill quantile function is
$$Q(p \mid \text{dose}) = \theta_0 + \frac{\theta_1 - \theta_0}{1 + 10^{\theta_2(\theta_3 - \text{dose})}} + \varepsilon(p)$$
For the error term, we use the specification

ε(p) = exp(θ4 + θ5 dose)z(p)

with the first injected agent and

ε(p) = exp(θ4 + θ5 dose)z(p) + θ6 {log(0.5) − log(1 − p)}

with the second injected agent. The term z(p) denotes the standard normal quantile
function.
The error term with the first agent follows a heteroskedastic normal distribution.
With the second agent, the quantile function of the error term is the sum of a het-
eroskedastic normal quantile function and an exponential quantile function centered at
the median. The two parameters θ4 and θ5 allow for the observed increasing variabil-
ity of liver concentration and dose. The exponential function is applied to constrain
the standard deviation of the error term to be positive. The parameter θ6 models the
changing shape of the error distribution between agents.
Because ε(0.5) = 0, the conditional median concentration given dose is

$$Q(0.5 \mid \text{dose}) = \theta_0 + \frac{\theta_1 - \theta_0}{1 + 10^{\theta_2(\theta_3 - \text{dose})}}$$
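To see the shape of this median curve, one can evaluate it at the five dose levels. The sketch below assumes the values θ0 = 10, θ1 = 60, θ2 = 2, and θ3 = 2, which are the ones used to simulate the data in this section; it relies only on official Stata commands.

```stata
* Conditional median of the Hill quantile function at dose = 0, 1, ..., 4,
* assuming theta0 = 10, theta1 = 60, theta2 = 2, theta3 = 2
forvalues d = 0/4 {
    display "dose = `d'   median = " %8.4f 10 + (60 - 10)/(1 + 10^(2*(2 - `d')))
}
```

The median rises from about 10.005 at dose 0 to about 59.995 at dose 4 and equals 35, halfway between θ0 and θ1, at dose = θ3 = 2.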

We generated the data shown in figure 4 with the following code:


. clear all
. set seed 12345
. set obs 600
number of observations (_N) was 0, now 600
. generate agent = _n>200
. generate dose = mod(_n,5)
. generate p = runiform()
. generate concentration = 10 + 50/(1 + 10^(2*(2 - dose))) + exp(-.5 + .5*dose)*
> invnormal(p) + (log(.5) - log(1-p))*agent

We use the qmodel command to estimate the parameters of the Hill quantile function
only for the first agent.

. qmodel concentration =
> {theta0} + ({theta1}-{theta0})/(1 + 10^({theta2}*({theta3} - dose)))
> + exp({theta4} + {theta5}*dose)*invnormal(p)
> if agent==0, initial(10, 60, 1, 2, 0, 0)
Parametric Quantile Model Number of obs = 200
AIC = 16.81733
Loss function = 123.6346142 BIC = 36.60723

concentrat~n Coef. Std. Err. z P>|z| [95% Conf. Interval]

theta0 10.02316 .1168091 85.81 0.000 9.794214 10.2521


theta1 60.14847 .4468234 134.61 0.000 59.27271 61.02423
theta2 2.120641 .2564068 8.27 0.000 1.618093 2.623189
theta3 1.998072 .0060535 330.07 0.000 1.986207 2.009936
theta4 -.3235488 .1243375 -2.60 0.009 -.5672458 -.0798517
theta5 .4400006 .0681094 6.46 0.000 .3065087 .5734926

The confidence intervals of all the parameters contain the respective true values
used in the above data-generating code: θ0 = 10, θ1 = 60, θ2 = 2, θ3 = 2,
θ4 = −0.5, and θ5 = 0.5.

Technical note

The expression of the quantile function in the above qmodel command is the same as
that used in the generate command in the simulation above, except that the parameters
are replaced by numeric values in the latter command. This may help one see quantile
functions as true data-generating laws.
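The same idea can be illustrated with a simpler law. The following sketch, which is ours and uses only official Stata commands, draws uniform proportions and passes them through an exponential quantile function with mean 2 (an arbitrary value chosen for illustration); the sample median should then be close to the median implied by the quantile function.

```stata
clear
set seed 2019
set obs 100000
generate p = runiform()
* Exponential quantile function with mean 2 evaluated at random proportions
generate y = -2*log(1 - p)
* Sample median of the generated data
_pctile y, percentiles(50)
display r(r1)
* Median implied by the quantile function, Q(.5) = 2*log(2)
display -2*log(1 - .5)
```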

To facilitate convergence of the estimation algorithm, we specified the initial values
of the parameters with the initial() option. Pretending not to know the true values,
we determined the initial values as follows: based on visual inspection of figure 4,
we set the minimal response θ0 = 10, the maximal response θ1 = 60, and the dose
corresponding to half the maximal response θ3 = 2. The slope was assigned the value
θ2 = 1, a reasonable positive number, and the variance parameters were assigned the
values θ4 = 0 and θ5 = 0, corresponding to homoskedastic standard normal errors. If
the initial values were not specified, the qmodel command would default to the initial
value zero for all the parameters and fail to converge.

We now use the qmodel_quantile command to estimate the median and 95th
percentile of liver concentration at the control dose.
. qmodel_quantile .5 .95, at(dose=0) noheader
Quantile p=.5 at dose=0

concentrat~n Coef. Std. Err. z P>|z| [95% Conf. Interval]

quantile 10.02606 .1145456 87.53 0.000 9.801553 10.25056

Quantile p=.95 at dose=0

concentrat~n Coef. Std. Err. z P>|z| [95% Conf. Interval]

quantile 11.21624 .1969464 56.95 0.000 10.83023 11.60224

The estimated 95th percentile is equal to 11.2 mg/ml with a standard error of 0.2
and a 95% confidence interval equal to [10.8, 11.6].
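The reported quantiles can be reproduced by hand from the coefficient table of the first-agent model, which is a useful check on how the fitted quantile function is evaluated. The sketch below plugs the printed estimates into the Hill quantile function at dose = 0; the scalar names are ours and chosen only for illustration.

```stata
* Estimates copied from the qmodel coefficient table for the first agent
scalar b0 = 10.02316
scalar b1 = 60.14847
scalar b2 = 2.120641
scalar b3 = 1.998072
scalar b4 = -.3235488
scalar b5 = .4400006

* Fitted median at dose = 0: the error term vanishes because z(.5) = 0
scalar q50 = b0 + (b1 - b0)/(1 + 10^(b2*(b3 - 0)))
display q50

* Fitted 95th percentile at dose = 0: add exp(b4 + b5*0)*z(.95)
scalar q95 = q50 + exp(b4 + b5*0)*invnormal(.95)
display q95
```

Up to rounding of the printed coefficients, the two values agree with the estimates 10.026 and 11.216 shown in the output above.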
Presently, no command can estimate ordinary quantile regression with nonlinear
models (Koenker and Park 1996). We therefore compare the above estimates with those
from the nl command, which can fit nonlinear models for the conditional mean of the
outcome variable. Under the normal errors generated in our fictitious example with the
first injected agent, the conditional median is identical to the conditional mean.
. nl (concentration =
> {theta0} + ({theta1}-{theta0})/(1 + 10^({theta2}*({theta3} - dose))))
> if agent==0, vce(robust) nolog
> initial(theta0 10 theta1 60 theta2 1 theta3 2)
(obs = 200)
Nonlinear regression Number of obs = 200
R-squared = 0.9963
Adj R-squared = 0.9962
Root MSE = 2.565692
Res. dev. = 940.4262

Robust
concentrat~n Coef. Std. Err. t P>|t| [95% Conf. Interval]

/theta0 10.0339 .2281207 43.99 0.000 9.584015 10.48379


/theta1 60.03977 .5185948 115.77 0.000 59.01703 61.06251
/theta2 2.135745 .5143438 4.15 0.000 1.121387 3.150104
/theta3 1.997798 .0060065 332.61 0.000 1.985952 2.009644

The above nl command also requires specifying the initial values of the
parameters because not doing so would produce wrong estimates. Owing to the
heteroskedasticity of the errors, we request the robust estimator for the standard errors.
Finally, we model the differences between the two injected agents with the qmodel
command. We include the additional parameter θ6 to model the different shape of the
error distribution between the two agents. We set the initial values for the parameters
θ0 to θ5 as before. We set the initial value for the additional parameter θ6 equal to 0,
corresponding to the case of equal error distribution with both agents.

. qmodel concentration =
> {theta0} + ({theta1}-{theta0})/(1 + 10^({theta2}*({theta3} - dose)))
> + exp({theta4} + {theta5}*dose)*invnormal(p)
> + {theta6}*(log(.5) - log(1-p))*agent,
> initial(10, 60, 1, 2, 0, 0, 0)
Parametric Quantile Model Number of obs = 600
AIC = 20.13355
Loss function = 461.0722091 BIC = 50.91206

concentrat~n Coef. Std. Err. z P>|z| [95% Conf. Interval]

theta0 9.959535 .0990781 100.52 0.000 9.765346 10.15372


theta1 60.18075 .2856419 210.69 0.000 59.6209 60.7406
theta2 1.958932 .14838 13.20 0.000 1.668112 2.249751
theta3 2.005584 .004521 443.61 0.000 1.996723 2.014446
theta4 -.320077 .1055352 -3.03 0.002 -.5269221 -.1132319
theta5 .451795 .0391254 11.55 0.000 .3751107 .5284793
theta6 .8890369 .1585979 5.61 0.000 .5781908 1.199883

The confidence intervals of all the parameters contain the respective true values
used in the above data-generating code. The true value for the
additional parameter is θ6 = 1. With the second agent, the conditional median is
different from the conditional mean, and the above estimates cannot be compared with
those from the nl command.
All the qmodel commands in this section estimate the respective true quantile func-
tions. These are known because we generated the data ourselves. In real-data settings,
the true quantile functions obviously are unknown, and good-fitting models should be
found. For brevity, however, we do not discuss model-building strategies in this section;
some are described in section 3.

5 More details about parametric quantile models


This section contains further details about parametric quantile models and their
potential. Parametric quantile models were introduced by Frumento and Bottai (2016,
2017) and Bottai and Cilluffo (2017). See those articles for more information.
The conditional quantile function of an outcome variable of interest y given a k-
dimensional vector of covariates x is defined as

$$Q(p \mid x) = \inf\{y : P(Y \le y \mid x) \ge p\}$$

The symbol p ∈ (0, 1) represents the order of the quantile. For example, p = 0.5
corresponds to the conditional median. The function Q(p|x) is nondecreasing with
respect to its argument, the proportion p.
Parametric quantile models assume the quantile function to be known up to a pa-
rameter vector θ. For example, the quantile function of a variable uniformly distributed
between zero and θ > 0 is
Q(p|θ) = θp

and that of an exponential variable with mean equal to θ > 0 is


Q(p|θ) = −θ log(1 − p)
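Both expressions follow from inverting the corresponding cumulative distribution function F(y|θ), that is, from solving F(y|θ) = p for y. The short derivation below is added for illustration and uses the standard forms of the two distribution functions.

```latex
\begin{align*}
\text{Uniform on } (0,\theta):\quad
  & F(y \mid \theta) = \frac{y}{\theta} = p
  \;\Longrightarrow\; Q(p \mid \theta) = \theta p \\
\text{Exponential with mean } \theta:\quad
  & F(y \mid \theta) = 1 - \exp(-y/\theta) = p
  \;\Longrightarrow\; Q(p \mid \theta) = -\theta \log(1 - p)
\end{align*}
```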

Parametric quantile functions provide a flexible framework for modeling shapes of
distributions because of their properties. First, they are invariant to monotone
transformations of the outcome variable. For example, if Q(p) is the quantile function
of a variable y, then
$$g\{Q(p)\}$$
is the quantile function of g(y) for any monotone nondecreasing function
$g: \mathbb{R} \to \mathbb{R}$. For example, the quantile function of a variable whose
square root is exponential with mean equal to θ is
$$Q(p \mid \theta) = \theta^2 \{\log(1 - p)\}^2$$
and that of a standard log-normal variable is
$$Q(p) = \exp\{z(p)\}$$
where z(p) indicates the standard normal quantile function.
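The invariance property can be checked by simulation. The sketch below is ours and uses only official Stata commands: it generates a variable whose square root is exponential with mean 3 (an arbitrary value) and compares its sample 90th percentile with the value obtained by squaring the exponential quantile function.

```stata
clear
set seed 2019
set obs 100000
* The square root of y is exponential with mean 3
generate y = (-3*log(1 - runiform()))^2
* Sample 90th percentile of y
_pctile y, percentiles(90)
display r(r1)
* 90th percentile from the transformed quantile function, theta^2*{log(1 - p)}^2
display 3^2*(log(1 - .9))^2
```

The second value is about 47.72; the sample percentile should be close to it, up to simulation error.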
Second, the modeling potential of parametric quantile models can be expanded by
considering that sums, products, and functions of nondecreasing functions are them-
selves nondecreasing. For example, if two functions g1 (p) and g2 (p) are nondecreasing
over the interval p ∈ (0, 1), then
g1 (p) + g2 (p)
is nondecreasing. If g1 is nondecreasing on the entire real line, then
g1 {g2 (p)}
is nondecreasing. If two functions g1 (p) and g2 (p) are nondecreasing and positive over
p ∈ (0, 1), then
g1 (p)g2 (p)
is nondecreasing.
The possible dependence of the quantile function on the covariates can be evaluated
by including the covariates in the model. For example,
Q(p|x1 , x2 ) = exp(θ0 + θ1 x1 + θ2 x2 )p
defines a uniform distribution with support between zero and a value that depends on
the values of the two covariates x1 and x2 ;
Q(p|x1 , x2 ) = log(2p) − exp(θ0 + θ1 x1 + θ2 x2 ) log(2 − 2p)
defines a unimodal, asymmetric, and zero-median logistic distribution whose variance
and skewness depend on the covariate values; and
$$Q(p \mid x_1, x_2) = \sqrt{\frac{\exp(\theta_0 + \theta_1 x_1 + \theta_2 x_2)}{2 + \exp(\theta_0 + \theta_1 x_1 + \theta_2 x_2)}}\; t\{p \mid 2 + \exp(\theta_0 + \theta_1 x_1 + \theta_2 x_2)\}$$

with t(p|ν) representing the quantile function of the Student’s t-distribution with ν
degrees of freedom, defines a distribution whose kurtosis changes along with values of
covariates, while the mean, variance, and skewness remain unchanged.
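The reason the variance does not change with the covariates can be verified directly. In the sketch below, η and T_ν are shorthand introduced here for the linear predictor and for a Student's t variable with ν degrees of freedom; the variance and excess kurtosis of the t-distribution are the standard ones, which are finite for ν > 2 and ν > 4, respectively.

```latex
\begin{align*}
\nu &= 2 + \exp(\eta), \qquad \eta = \theta_0 + \theta_1 x_1 + \theta_2 x_2, \\
\operatorname{Var}(T_\nu) &= \frac{\nu}{\nu - 2} = \frac{2 + \exp(\eta)}{\exp(\eta)}, \\
\operatorname{Var}\!\left(\sqrt{\frac{\exp(\eta)}{2 + \exp(\eta)}}\; T_\nu\right)
  &= \frac{\exp(\eta)}{2 + \exp(\eta)} \cdot \frac{2 + \exp(\eta)}{\exp(\eta)} = 1, \\
\text{excess kurtosis} &= \frac{6}{\nu - 4} = \frac{6}{\exp(\eta) - 2}
  \quad \text{when } \exp(\eta) > 2 .
\end{align*}
```

The scale factor therefore standardizes the variance to one for every covariate value, while the kurtosis decreases as the linear predictor increases.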
Ordinary quantile regression assumes

Q(p|x1 , x2 ) = β0 (p) + β1 (p)x1 + β2 (p)x2

The above conditional quantile function is the weighted sum of three functions of p,
β0 (p), β1 (p), and β2 (p), with weights defined by the covariate values, x1 and x2 .
All the above models can be fit with the qmodel command.

5.1 Transformations of the outcome variable


This subsection illustrates the potentials of transforming the outcome variable. The
qmodel command easily allows one to model transformation of the outcome variable
of interest. For example, we generate 1,000 random observations from a log-normal
distribution and estimate its parameters with three alternative and equivalent syntax
specifications.

. clear all
. set seed 12345
. set obs 1000
number of observations (_N) was 0, now 1,000
. generate y = exp(rnormal())
. qmodel log(y) = _normal
Parametric Quantile Model Number of obs = 1,000
AIC = 9.667566
Loss function = 289.3293655 BIC = 19.48308

log(y) Coef. Std. Err. z P>|z| [95% Conf. Interval]

normal
mean .0191749 .0323673 0.59 0.554 -.0442638 .0826136
log(sd) .0058047 .0246659 0.24 0.814 -.0425396 .054149

. qmodel y = _lognormal
Parametric Quantile Model Number of obs = 1,000
AIC = 10.12017
Loss function = 454.9417122 BIC = 19.93568

y Coef. Std. Err. z P>|z| [95% Conf. Interval]

lognormal
mean .0197244 .0330788 0.60 0.551 -.0451089 .0845577
log(sd) .0011235 .023555 0.05 0.962 -.0450434 .0472904

. qmodel y = exp(_normal)
Parametric Quantile Model Number of obs = 1,000
AIC = 10.12017
Loss function = 454.9417122 BIC = 19.93568

y Coef. Std. Err. z P>|z| [95% Conf. Interval]

normal
mean .0197244 .0330788 0.60 0.551 -.0451089 .0845577
log(sd) .0011235 .023555 0.05 0.962 -.0450434 .0472904

The first syntax models log(y) with a normal distribution, the second models y with
a log-normal distribution using the _lognormal built-in function, and the third models
y applying the exponential function to the _normal built-in function. The estimates
from the last two specifications are identical to each other. The estimates from the
first slightly differ from the other two because of numeric approximations. The loss
function is substantially smaller in the first specification because its outcome,
log(y), is on a different scale from y. The first syntax is often the most stable
computationally, especially when the outcome variable takes on large values that cannot
be exponentiated at double precision.
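The limit of double-precision exponentiation can be seen directly in Stata. The sketch below is an illustration of ours and relies on the documented domain of the official exp() function, which returns a missing value when the result would overflow.

```stata
* Largest integer argument that exp() can represent in double precision
display exp(709)
* One unit larger overflows, and Stata returns a missing value (.)
display exp(710)
* On the log scale the same quantity poses no problem
display log(exp(709))
```

Modeling log(y) keeps all computations on the log scale and so avoids this overflow.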
As a second example, we generate 1,000 random observations from a logit-normal
distribution and estimate its parameters with three alternative and equivalent syntax
specifications.
. clear all
. set seed 12345
. set obs 1000
number of observations (_N) was 0, now 1,000
. generate y = invlogit(rnormal())
. qmodel logit(y) = _normal
Parametric Quantile Model Number of obs = 1,000
AIC = 9.667566
Loss function = 289.3293661 BIC = 19.48308

logit(y) Coef. Std. Err. z P>|z| [95% Conf. Interval]

normal
mean .0191749 .0323673 0.59 0.554 -.0442638 .0826136
log(sd) .0058047 .0246659 0.24 0.814 -.0425396 .054149

. qmodel y = _logitnormal
Parametric Quantile Model Number of obs = 1,000
AIC = 8.095893
Loss function = 60.09300672 BIC = 17.9114

y Coef. Std. Err. z P>|z| [95% Conf. Interval]

logitnormal
mean .0205917 .0329158 0.63 0.532 -.043922 .0851054
log(sd) -.0074835 .0275169 -0.27 0.786 -.0614155 .0464486

. qmodel y = invlogit(_normal)
Parametric Quantile Model Number of obs = 1,000
AIC = 8.095893
Loss function = 60.09300672 BIC = 17.9114

y Coef. Std. Err. z P>|z| [95% Conf. Interval]

normal
mean .0205917 .0329158 0.63 0.532 -.043922 .0851054
log(sd) -.0074835 .0275169 -0.27 0.786 -.0614155 .0464486

The logit normal is a flexible distribution for outcome variables that are bounded
within a known interval, such as visual analogue scales and percentages. It constitutes
an alternative to the beta distribution, which is implemented in the beta built-in
function.

5.2 Sums of functions for modeling skewness and kurtosis


This subsection briefly introduces the use of sums of quantile functions as a flexible
modeling tool. In particular, we discuss sums of functions for modeling skewness and
kurtosis. As an example of a model for skewness, we consider a quantile function
defined as the sum of a standard normal quantile function and a standard exponential
quantile function. The left-hand-side panel of figure 5 shows the right-skewed histogram
of 1,000 observations generated from this quantile function. The quantile function is
depicted by the thick line in the right-hand-side panel. Its two components are shown as
the solid thin line (standard normal) and the dashed thin line (standard exponential).

[Figure 5: left panel, histogram with Frequency on the vertical axis and y on the horizontal axis; right panel, y plotted against p]

Figure 5. Left-hand-side panel: histogram of 1,000 observations generated from a
quantile function defined as the sum of a standard normal and standard exponential
distribution; right-hand-side panel: the estimated quantile function (thick line) and its
constituent parts, standard normal (solid thin line) and standard exponential (dashed
thin line).

We generate 1,000 random observations and estimate the parameters with qmodel
with the two built-in functions _normal and _exponential.

. clear all
. set seed 12345
. set obs 1000
number of observations (_N) was 0, now 1,000
. generate p = runiform()
. generate y = invnormal(p) + invexponential(1,p)
. qmodel y = _normal + _exponential
Parametric Quantile Model Number of obs = 1,000
AIC = 12.28237
Loss function = 535.0572157 BIC = 27.00564

y Coef. Std. Err. z P>|z| [95% Conf. Interval]

normal
mean -.1011351 .1207899 -0.84 0.402 -.337879 .1356088
log(sd) -.0589565 .0911043 -0.65 0.518 -.2375176 .1196046

exponential
log(mean) .0614916 .116641 0.53 0.598 -.1671205 .2901037

The estimated parameters are not significantly different from zero, which is their
true value.
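The true quantile function and its two components can be drawn with official Stata commands alone. The sketch below is ours; it assumes the true parameter values (standard normal plus standard exponential) rather than the estimates, and the plotting range of 0.01 to 0.99 is an arbitrary choice that avoids the unbounded tails.

```stata
* True quantile function (thick), normal component (solid), exponential component (dashed)
twoway (function y = invnormal(x) + invexponential(1, x), range(.01 .99) lwidth(thick)) ///
       (function y = invnormal(x), range(.01 .99))                                    ///
       (function y = invexponential(1, x), range(.01 .99) lpattern(dash)),            ///
       xtitle("p") ytitle("y")                                                         ///
       legend(order(1 "Sum" 2 "Standard normal" 3 "Standard exponential"))

* Median of the sum: z(.5) = 0, so the median equals that of the exponential, log(2)
display invnormal(.5) + invexponential(1, .5)
```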
As an example of a model for kurtosis, we consider a quantile function defined as
the sum of the standard normal quantile function and another standard normal raised
to the third power. The left-hand-side panel of figure 6 shows the thick-tailed
histogram of 1,000 observations generated from this quantile function. The resulting
quantile function is depicted as the thick line in the right-hand-side panel. Its two
components are shown as the solid thin line (standard normal) and the dashed thin line
(standard normal cubed).

[Figure 6: left panel, histogram with Frequency on the vertical axis and y on the horizontal axis; right panel, y plotted against p]

Figure 6. Left-hand-side panel: histogram of 1,000 observations generated from a
quantile function defined as the sum of the standard normal quantile function and another
standard normal raised to the third power; right-hand-side panel: the estimated
quantile function (thick line) and its constituent parts, standard normal (solid thin line) and
the standard normal cubed (dashed thin line).

We generate 1,000 random observations and estimate the parameters with qmodel
with the two built-in functions _normal and _cnormal:

. clear all
. set seed 12345
. set obs 1000
number of observations (_N) was 0, now 1,000
. generate p = runiform()
. generate y = invnormal(p) + (exp(-.5)*invnormal(p))^3

. qmodel y = _normal + _cnormal^3


Parametric Quantile Model Number of obs = 1,000
AIC = 12.08712
Loss function = 440.1508302 BIC = 26.81038

y Coef. Std. Err. z P>|z| [95% Conf. Interval]

normal
mean -.0328258 .0403337 -0.81 0.416 -.1118784 .0462268
log(sd) .0074105 .0470209 0.16 0.875 -.0847488 .0995698

cnormal
log(sd) -.5022805 .0306044 -16.41 0.000 -.562264 -.4422971

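All the confidence intervals contain their respective true values: 0 for the mean and
log standard deviation of _normal, and −0.5 for the log standard deviation of _cnormal,
as set by the factor exp(-.5) in the generate command above. With the same
thick-tailed yet symmetric data as above, we fit a model with the three built-in functions
_normal, _cnormal, and _clog: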
. qmodel y = _normal + _cnormal^3 + _clog
Parametric Quantile Model Number of obs = 1,000
AIC = 14.08712
Loss function = 440.1500494 BIC = 33.71814

y Coef. Std. Err. z P>|z| [95% Conf. Interval]

normal
mean -.0456682 .0979872 -0.47 0.641 -.2377196 .1463831
log(sd) -.0057807 .1222153 -0.05 0.962 -.2453182 .2337569

cnormal
log(sd) -.50253 .0366421 -13.71 0.000 -.5743471 -.4307128

clog
log(1-p) -.0159823 .1307528 -0.12 0.903 -.2722531 .2402885

The _clog built-in function is similar to _exponential except it does not constrain
its parameter to be strictly positive. It therefore can model distributions skewed either
to the left or to the right. Its estimated parameter is not significantly different from
zero, which is its true value. The estimates of the other parameters in the model are
close to those of the previous model, which did not allow for skewness.

5.3 Modeling conditional parametric quantile functions


This subsection illustrates the use of qmodel and its postestimation commands when the
quantile function depends on a set of covariates. We generate 1,000 random observations
from a quantile function defined as
Q(p|x) = z(p) − x1 + x2
where z(p) represents the standard normal quantile function, x1 is a binary covariate
with 0.5 probability, and x2 is a standard normal covariate. The above conditional

distribution is normal with unit standard deviation and mean that depends on a linear
combination of the two covariates.
. clear all
. set obs 1000
number of observations (_N) was 0, now 1,000
. set seed 12345
. generate x1 = rbinomial(1,.5)
. generate x2 = rnormal()
. generate y = rnormal() - x1 + x2

We estimate the parameters with the two built-in functions _normal and _flat. The
latter represents a flat or constant function of the proportion p. When the same built-in
function is specified multiple times, qmodel enumerates them in the order they appear
in the model.
. qmodel y = _normal + _flat*x1 + _flat*x2
Parametric Quantile Model Number of obs = 1,000
AIC = 13.62765
Loss function = 278.0091105 BIC = 33.25868

y Coef. Std. Err. z P>|z| [95% Conf. Interval]

normal
mean .0190015 .0454167 0.42 0.676 -.0700137 .1080166
log(sd) -.0063547 .0246736 -0.26 0.797 -.0547141 .0420046

flat
constant -.9719143 .0649784 -14.96 0.000 -1.09927 -.844559

flat.2
constant .9504763 .0298755 31.81 0.000 .8919215 1.009031

All the estimates are close to the true values with which the data were generated.
The estimates are similar to those from linear regression.
. regress y x1 x2, noheader vce(robust)

Robust
y Coef. Std. Err. t P>|t| [95% Conf. Interval]

x1 -.9642124 .0612819 -15.73 0.000 -1.084469 -.8439561


x2 .95578 .0290417 32.91 0.000 .8987902 1.01277
_cons .018081 .0429056 0.42 0.674 -.0661147 .1022767

The data were generated with a homoskedastic error with standard deviation equal
to 1. Hence, the true value of the logarithm of the standard deviation of the normal
function in the quantile model is zero. The estimate from qmodel, −0.0063547, can be
compared with that from regress.
. display log(e(rmse))
-.03025572

To help explain parametric quantile models and the qmodel command, we now fit
a misspecified model in which the flat function for the covariate x2 is replaced by a
normal quantile function.

. qmodel y = _normal + _flat*x1 + _normal*x2


Parametric Quantile Model Number of obs = 1,000
AIC = 15.62765
Loss function = 278.0091114 BIC = 40.16643

y Coef. Std. Err. z P>|z| [95% Conf. Interval]

normal
mean .0190112 .0454166 0.42 0.676 -.0700038 .1080261
log(sd) -.0063617 .0246728 -0.26 0.797 -.0547196 .0419962

flat
constant -.9719214 .0649785 -14.96 0.000 -1.099277 -.8445659

normal.2
mean .9504952 .0298754 31.82 0.000 .8919405 1.00905
log(sd) -14.5226 4.031293 -3.60 0.000 -22.42379 -6.621412

The estimated mean of the normal for the coefficient of x2 , 0.95, is close to the esti-
mate for the flat function in the previous model, and its estimated standard deviation,
exp(−14.5226) = 0.0000004931, is nearly 0. Therefore, the resulting function for the
coefficient of x2 is essentially flat, which is the true shape under which the data were
generated. The estimates of all the other parameters are nearly unchanged from the
previous model, where the coefficient of x2 was correctly specified as a flat function.

5.4 Ordinary quantile regression as a special case of quantile models


Ordinary quantile regression can be seen as a special case of the more general parametric
quantile models. For example, we estimate the conditional median of a variable y given
a covariate x.

. clear all
. set obs 1000
number of observations (_N) was 0, now 1,000
. set seed 12345
. generate x = rnormal()
. generate y = rnormal() + x

. qmodel y = _flat + _flat*x


Parametric Quantile Model Number of obs = 1,000
AIC = 10.0004
Loss function = 403.5896671 BIC = 19.81591

y Coef. Std. Err. z P>|z| [95% Conf. Interval]

flat
constant .0392935 .0477642 0.82 0.411 -.0543225 .1329095

flat.2
constant .9385966 .0370856 25.31 0.000 .8659102 1.011283

The above estimates are similar to those that can be obtained with the qreg com-
mand.

. qreg y x, nolog
Median regression Number of obs = 1,000
Raw sum of deviations 565.6781 (about .0568939)
Min sum of deviations 403.8584 Pseudo R2 = 0.2861

y Coef. Std. Err. t P>|t| [95% Conf. Interval]

x .9405279 .0406677 23.13 0.000 .8607239 1.020332


_cons .0391333 .0414907 0.94 0.346 -.0422857 .1205524

When the interest is only in the median and not in the entire quantile function, a
computationally faster alternative is to specify only one quadrature point at 0.5 with
the qpoints() option.

. qmodel y = _flat + _flat*x, qpoints(.5)


Parametric Quantile Model Number of obs = 1,000
AIC = 10.0004
Loss function = 403.5896671 BIC = 19.81591

y Coef. Std. Err. z P>|z| [95% Conf. Interval]

flat
constant .0392935 .0477648 0.82 0.411 -.0543238 .1329109

flat.2
constant .9385966 .0370867 25.31 0.000 .8659079 1.011285
Any quantile can be estimated by specifying the corresponding proportion in the
qpoints() option.
. qmodel y = _flat + _flat*x, qpoints(.9)
Parametric Quantile Model Number of obs = 1,000
AIC = 9.172938
Loss function = 176.4323748 BIC = 18.98845

y Coef. Std. Err. z P>|z| [95% Conf. Interval]

flat
constant 1.378923 .0561666 24.55 0.000 1.268838 1.489007

flat.2
constant .9533475 .0696382 13.69 0.000 .8168592 1.089836
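For comparison, the same quantile can be fit with the official qreg command. The sketch below regenerates the data of this subsection and requests the 0.9 quantile through qreg's quantile() option. Under the data-generating model, the true intercept of the 0.9 conditional quantile is z(.9) and the true slope is 1.

```stata
clear all
set obs 1000
set seed 12345
generate x = rnormal()
generate y = rnormal() + x
* Ordinary quantile regression at the 90th percentile
qreg y x, quantile(.9) nolog
* True intercept of the 0.9 conditional quantile, z(.9)
display invnormal(.9)
```

The qreg estimates should be close to those from the qmodel fit above, though not necessarily identical, as was the case for the median.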

6 Final remarks
Parametric quantile models define the entire conditional distribution of the outcome
variable of interest. If of interest, they can be used to generate simulated data, plot
quantile functions and cumulative distribution functions by simply swapping the axes,
plot probability density functions by differentiating the cumulative distribution func-
tion with the dydx command, and estimate treatment effects (Frölich and Melly 2010;
Cattaneo, Drukker, and Holland 2013). The large- and small-sample behavior of the
estimator of parametric quantile models is described by Frumento and Bottai (2016)
for linear models, by Frumento and Bottai (2017) for linear models with censored and
truncated data, and by Bottai and Cilluffo (2017) for nonlinear models. The qmodel
command can provide estimates for conditional quantiles that are generally more effi-
cient than those obtained by ordinary quantile regression. However, misspecified para-
metric models may yield biased estimates. Using the qmodel command requires careful
model building. As illustrated in sections 3 and 4, one can avail oneself of visual rep-
resentations and comparison of nested and nonnested parametric models with varying
degrees of complexity. Frumento and Bottai (2016) presented an overall goodness-of-fit
test based on the Kolmogorov–Smirnov test statistic.
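As an illustration of the density plot mentioned above, the sketch below evaluates a known quantile function on a grid of proportions and differentiates p with respect to the quantile using the official dydx command. The quantile function (standard normal plus standard exponential, as in section 5.2), the grid of 199 points, and the variable names are our choices; with a fitted model, one would use predicted quantiles instead.

```stata
clear
set obs 199
* Grid of proportions strictly inside (0, 1)
generate p = _n/200
* Quantile function evaluated on the grid
generate q = invnormal(p) + invexponential(1, p)
* Plotting p against q gives the cumulative distribution function
line p q
* The numerical derivative dp/dq approximates the probability density function
dydx p q, generate(density)
line density q
```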

7 Acknowledgment
We are grateful to an anonymous referee for helpful comments on earlier versions of the
article.

8 References
Bottai, M., and G. Cilluffo. 2017. Nonlinear quantile parametric models. Unpublished
manuscript.

Bottai, M., and N. Orsini. 2013. A command for Laplace regression. Stata Journal 13:
302–314.

Burnham, K. P., and D. R. Anderson. 2004. Multimodel inference: Understanding AIC
and BIC in model selection. Sociological Methods and Research 33: 261–304.

Cattaneo, M. D., D. M. Drukker, and A. D. Holland. 2013. Estimation of multivalued
treatment effects under conditional independence. Stata Journal 13: 407–450.

Frölich, M., and B. Melly. 2010. Estimation of quantile treatment effects with Stata.
Stata Journal 10: 423–457.

Frumento, P., and M. Bottai. 2016. Parametric modeling of quantile regression
coefficient functions. Biometrics 72: 74–84.

Frumento, P., and M. Bottai. 2017. Parametric modeling of quantile regression
coefficient functions with censored and truncated data. Biometrics 73: 1179–1188.

Koenker, R., V. Chernozhukov, X. He, and L. Peng, eds. 2018. Handbook of Quantile
Regression. Boca Raton, FL: Chapman & Hall/CRC.

Koenker, R., and B. J. Park. 1996. An interior point algorithm for nonlinear quantile
regression. Journal of Econometrics 71: 265–283.

Machado, J. A. F., P. M. D. C. Parente, and J. M. C. Santos Silva. 2011. qreg2: Stata
module to perform quantile regression with robust and clustered standard errors.
Statistical Software Components S457369, Department of Economics, Boston College.
https://ideas.repec.org/c/boc/bocode/s457369.html.

Orsini, N., and M. Bottai. 2011. Logistic quantile regression in Stata. Stata Journal 11:
327–344.

About the authors


Matteo Bottai is a professor of biostatistics in the Division of Biostatistics at the Institute of
Environmental Medicine, Karolinska Institutet in Stockholm, Sweden.
Nicola Orsini is an associate professor of medical statistics in the Biostatistics Team at the
Department of Public Health Sciences, Karolinska Institutet in Stockholm, Sweden.
