
QUANTILE REGRESSION

Motivation: Linear Regression Modeling and Its Shortcomings

Recall: Ordinary Least Squares Model

Note:
A fundamental aspect of linear-regression models is that they attempt to describe how the location of
the conditional distribution behaves by utilizing the mean of a distribution to represent its central
tendency.
It invokes a homoscedasticity assumption; that is, the conditional variance, Var(y|x), is assumed to be
a constant $\sigma^2$ for all values of the covariate.
A third distinctive feature of the OLS is its normality assumption.
Outliers (cases that do not follow the relationship for the majority of the data) tend to have undue
influence on the fitted regression line.
Consider an extreme situation:

Note that these results show that the LRM approach can be inadequate for a variety of reasons, including
heteroscedasticity, undue sensitivity to outliers, and the failure to detect multiple forms of shape shifts.
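
The undue influence of a single outlier on the fitted OLS line can be illustrated with a small sketch. The data below are made up for illustration (they are not the income data used in this handout), and `ols_slope` is a hypothetical helper that applies the textbook least-squares slope formula.

```python
def ols_slope(x, y):
    """Slope of the least-squares line, S_xy / S_xx."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

# Illustrative data (assumption): six cases follow y = 2x exactly.
x = [1, 2, 3, 4, 5, 6, 7]
y_clean = [2, 4, 6, 8, 10, 12, 14]      # all cases on the line y = 2x
y_outlier = [2, 4, 6, 8, 10, 12, 100]   # last case replaced by an outlier

slope_clean = ols_slope(x, y_clean)
slope_outlier = ols_slope(x, y_outlier)
print(slope_clean, slope_outlier)
```

In this toy example one case out of seven moves the OLS slope from 2 to roughly 11.2, although the relationship for the majority of the data is unchanged.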

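      Source |       SS       df       MS              Number of obs =   22624
-------------+------------------------------           F(  2, 22621) = 2235.92
       Model |  7.7702e+12     2  3.8851e+12           Prob > F      =  0.0000
    Residual |  3.9306e+13 22621  1.7376e+09           R-squared     =  0.1651
-------------+------------------------------           Adj R-squared =  0.1650
       Total |  4.7076e+13 22623  2.0809e+09           Root MSE      =   41684

------------------------------------------------------------------------------
      income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ed |   6313.654   100.8045    62.63   0.000      6116.07    6511.237
       white |   11451.75   799.8409    14.32   0.000      9884.01     13019.5
       _cons |  -42655.95   1442.537   -29.57   0.000    -45483.42   -39828.47
------------------------------------------------------------------------------

. estat hettest, normal

Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
         Ho: Constant variance
         Variables: fitted values of income

         chi2(1)      =  5180.30
         Prob > chi2  =   0.0000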

Ordinary Least Squares Versus Quantile Regression Model

                       Ordinary Least Squares            Quantile Regression Model

objective function     sum of squared residuals          weighted sum of absolute residuals

estimates              conditional mean function         conditional quantile functions,
                                                         such as conditional median functions

allows                 no                                yes
heteroskedasticity?

distributional         normality and homoskedasticity    none
assumptions            of error terms

comprehensiveness      only yields information about     yields information about the
                       the conditional mean E(Y|X)       whole conditional distribution of Y

Quantile Regression Model

Proposed by Koenker and Bassett (1978), quantile regression models conditional quantiles as functions of
predictors. It estimates the effect of a covariate on various quantiles in the conditional distribution.

Quantile Regression Estimation

In Quantile Regression, the distance of points from a line is measured using a weighted sum of vertical
distances (without squaring):
points below the fitted line are given a weight 1-p;
points above the fitted line are given a weight p.
Each choice for this proportion p gives rise to a different fitted conditional-quantile function. The task is to find
an estimator with the desired property for each possible p.
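
The weighting scheme above can be sketched in a few lines. This is a minimal illustration, not any package's implementation: `check_loss`, `weighted_distance`, and `best_constant` are hypothetical names, and the data are made up. For a model with only a constant, the minimizer of the weighted sum is a sample pth quantile, so searching over the observed values is sufficient.

```python
def check_loss(u, p):
    """Weight p for points above the fit (u >= 0), weight 1 - p for points below."""
    return u * (p - (1 if u < 0 else 0))

def weighted_distance(y, fitted, p):
    """Weighted sum of vertical distances (no squaring)."""
    return sum(check_loss(yi - fi, p) for yi, fi in zip(y, fitted))

def best_constant(y, p):
    """Constant fit minimizing the weighted sum; a minimizer lies at a data point."""
    return min(y, key=lambda c: weighted_distance(y, [c] * len(y), p))

# Illustrative data (assumption), with one large value.
y = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = best_constant(y, 0.5)
lower_quartile_fit = best_constant(y, 0.25)
print(median_fit, lower_quartile_fit)
```

With p = 0.5 the best constant is the sample median, which is unaffected by the size of the largest value; other choices of p pick out other sample quantiles.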
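The quantile regression is described by the following equation:

$$y_i = x_i' \beta^{(p)} + \varepsilon_i^{(p)}, \qquad Q^{(p)}(y_i \mid x_i) = x_i' \beta^{(p)},$$

where $0 < p < 1$ is the proportion of the population lying below the quantile at $p$. The estimator is

$$\hat{\beta}^{(p)} = \arg\min_{\beta} \sum_{i=1}^{n} \rho_p(y_i - x_i' \beta),$$

where the loss function $\rho_p$ is defined as

$$\rho_p(u) = u \, (p - I(u < 0)) = \begin{cases} p \, u, & u \ge 0, \\ (p - 1) \, u, & u < 0. \end{cases}$$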

The following are the existing algorithms to obtain the regression estimator:
Simplex Method - for moderate data size
Interior Point Method - for large data size
Interior Point Method with Preprocessing - for very large data sets ($n > 10^5$)
Smoothing Method
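
None of the algorithms listed above is reproduced here. As a rough sketch of what they compute, the following brute-force search handles the one-covariate case by relying on the linear-programming property that some optimal line passes through at least two data points. The helper name `fit_quantile_line` and the data are assumptions for illustration; the search costs on the order of n cubed operations and is only sensible for tiny samples.

```python
from itertools import combinations

def check_loss(u, p):
    """Weight p for positive residuals, 1 - p for negative residuals."""
    return u * (p - (1 if u < 0 else 0))

def fit_quantile_line(x, y, p):
    """Return (intercept, slope) minimizing the weighted sum of distances.

    Enumerates every line through two data points with distinct x values and
    keeps the one with the smallest objective. Illustrative only.
    """
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line is not a regression line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        total = sum(check_loss(yk - intercept - slope * xk, p)
                    for xk, yk in zip(x, y))
        if best is None or total < best[0]:
            best = (total, intercept, slope)
    return best[1], best[2]

# Illustrative data (assumption): y = 2x except for one outlier.
x = [1, 2, 3, 4, 5, 6, 7]
y = [2, 4, 6, 8, 10, 12, 100]
intercept, slope = fit_quantile_line(x, y, 0.5)
print(intercept, slope)
```

On these data the median-regression line is y = 2x: the outlier contributes to the objective but does not pull the line away from the majority of the points.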

Properties of Quantile Regression Estimators

1. Scale Equivariant
2. Regression Shift Equivariant
3. Equivariant to Reparametrization of Design
4. Equivariant to Monotone Transformation
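
The four properties can be written out as follows. This is a sketch in the notation of Koenker and Bassett (1978), where $\hat{\beta}^{(p)}(y, X)$ denotes the pth quantile-regression estimate computed from response $y$ and design matrix $X$; the symbols $a$, $\gamma$, $A$ and $h$ are introduced here for illustration.

```latex
\begin{align*}
\text{Scale:} \quad
  & \hat{\beta}^{(p)}(a y, X) = a \, \hat{\beta}^{(p)}(y, X), \quad a > 0, \\
  & \hat{\beta}^{(p)}(-a y, X) = -a \, \hat{\beta}^{(1-p)}(y, X), \quad a > 0, \\
\text{Regression shift:} \quad
  & \hat{\beta}^{(p)}(y + X\gamma, X) = \hat{\beta}^{(p)}(y, X) + \gamma, \\
\text{Reparametrization of design:} \quad
  & \hat{\beta}^{(p)}(y, XA) = A^{-1} \hat{\beta}^{(p)}(y, X), \quad A \text{ nonsingular}, \\
\text{Monotone transformation:} \quad
  & Q^{(p)}(h(Y) \mid x) = h\big(Q^{(p)}(Y \mid x)\big), \quad h \text{ nondecreasing}.
\end{align*}
```

The last property has no counterpart for the conditional mean: in general $E(h(Y) \mid x) \neq h(E(Y \mid x))$.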

Inference in Quantile Regression

Methods of Constructing Confidence Intervals
1. Sparsity
- based on the asymptotic distribution of the estimator $\hat{\beta}^{(p)}$: the asymptotic dispersion matrix involves the
reciprocal of the density function of the error terms
- this reciprocal is called the sparsity function and this must be estimated first before confidence
intervals can be constructed
- yields different estimates for the case of i.i.d. error terms and for the case of non-i.i.d. error
terms
2. Inversion of Rank Tests
- generalization of sign tests
- based on the relationship between order statistics and rank scores
- involves linear programming (simplex method)
- computationally burdensome for large data sets
3. Bootstrap (Resampling)
- does not make use of any distributional assumption
- the number of resamples, M, is usually between 50 and 200
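
The bootstrap approach can be sketched as follows for the slope of a one-covariate median regression. Everything here is illustrative: the data are made up, `fit_quantile_line` is a hypothetical brute-force fitter (it is not what Stata or SAS use), cases are resampled as (x, y) pairs, and M = 100 resamples are drawn with a fixed seed so the result is reproducible.

```python
import random
from itertools import combinations

def check_loss(u, p):
    return u * (p - (1 if u < 0 else 0))

def fit_quantile_line(x, y, p):
    """Brute-force (intercept, slope); returns None if all x values coincide."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        total = sum(check_loss(yk - intercept - slope * xk, p)
                    for xk, yk in zip(x, y))
        if best is None or total < best[0]:
            best = (total, intercept, slope)
    return None if best is None else (best[1], best[2])

def bootstrap_se_slope(x, y, p, m=100, seed=12345):
    """Standard deviation of the slope over m resamples of (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < m:
        idx = [rng.randrange(n) for _ in range(n)]  # sample cases with replacement
        fit = fit_quantile_line([x[i] for i in idx], [y[i] for i in idx], p)
        if fit is None:  # degenerate resample with a single distinct x value
            continue
        slopes.append(fit[1])
    mean = sum(slopes) / m
    return (sum((s - mean) ** 2 for s in slopes) / (m - 1)) ** 0.5

# Illustrative data (assumption): roughly y = 2x with one large last value.
x = list(range(1, 13))
y = [2.3, 3.8, 6.4, 7.9, 10.5, 11.6, 14.2, 16.3, 17.7, 20.4, 21.9, 30.0]
se = bootstrap_se_slope(x, y, p=0.5)
print(se)
```

No distributional assumption on the errors enters the calculation; the spread of the resampled slopes serves as the standard error.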
Recommendation:
Let n be the number of observations and k be the number of parameters.
n 1000 and k 10

Bootstrap

Sparsity

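1. Wald Test
Ho: $\beta_2^{(p)} = 0$, where $\beta^{(p)} = (\beta_1^{(p)\prime}, \beta_2^{(p)\prime})'$ is partitioned so that $\beta_2^{(p)}$ contains the parameters being tested.

Test statistic: $T_W = \hat{\beta}_2^{(p)\prime} \, \hat{\Sigma}^{-1} \, \hat{\beta}_2^{(p)}$, where $\hat{\Sigma}$ is the estimated covariance matrix of $\hat{\beta}_2^{(p)}$. Under Ho, $T_W$ is asymptotically chi-square with degrees of freedom equal to the number of parameters in $\beta_2^{(p)}$.

2. Likelihood Ratio Test
Ho: $\beta_2^{(p)} = 0$, where $\beta^{(p)} = (\beta_1^{(p)\prime}, \beta_2^{(p)\prime})'$.

Test statistic: $T_{LR} = \dfrac{2 \, [\tilde{V}(p) - \hat{V}(p)]}{p (1 - p) \, \hat{s}(p)}$, where $\hat{V}(p)$ and $\tilde{V}(p)$ are the minimized sums of weighted distances for the full model and for the model restricted by Ho, and $\hat{s}(p)$ is the estimated sparsity. Under Ho, $T_{LR}$ is asymptotically chi-square with degrees of freedom equal to the number of parameters in $\beta_2^{(p)}$.

Remark: Koenker and Machado (1999) prove that these two tests are asymptotically equivalent.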

Test for Equality of Coefficients Across Quantiles

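Let p and q be distinct quantiles.

For a single coefficient $\beta_j$:

Ho: $\beta_j^{(p)} = \beta_j^{(q)}$

Ha: $\beta_j^{(p)} \neq \beta_j^{(q)}$

Test statistic: $W = \dfrac{(\hat{\beta}_j^{(p)} - \hat{\beta}_j^{(q)})^2}{\widehat{Var}(\hat{\beta}_j^{(p)} - \hat{\beta}_j^{(q)})}$,

where $\widehat{Var}(\hat{\beta}_j^{(p)} - \hat{\beta}_j^{(q)}) = \widehat{Var}(\hat{\beta}_j^{(p)}) + \widehat{Var}(\hat{\beta}_j^{(q)}) - 2 \, \widehat{Cov}(\hat{\beta}_j^{(p)}, \hat{\beta}_j^{(q)})$. Under Ho, $W$ is approximately chi-square with 1 degree of freedom.

For several coefficients jointly:

Ho: $\beta_j^{(p)} = \beta_j^{(q)}$ for every coefficient $j$ under test
Ha: $\beta_j^{(p)} \neq \beta_j^{(q)}$ for at least one such $j$

Test statistic: $W = \hat{d}' \, \hat{V}_d^{-1} \, \hat{d}$,

where $\hat{d}$ is the vector of differences $\hat{\beta}_j^{(p)} - \hat{\beta}_j^{(q)}$ and $\hat{V}_d$ is its estimated covariance matrix. Under Ho, $W$ is approximately chi-square with degrees of freedom equal to the number of parameters specified in Ho.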

Goodness of Fit
Recall:
In ordinary least squares, the goodness of fit is measured by $R^2$, the coefficient of determination. It is
interpreted as the proportion of the variation in the dependent variable explained by the predictor variables in
the model.

An analog of the $R^2$ statistic is developed for quantile-regression models. Since quantile-regression models are
based on minimizing a sum of weighted distances, with different weights used depending on whether
$y_i > \hat{y}_i$ or $y_i < \hat{y}_i$, goodness of fit is measured in a way that is consistent with this criterion.

Koenker and Machado (1999) suggest measuring goodness of fit by comparing the sum of weighted distances
for the model of interest with the sum in which only the intercept appears. Let $V^1(p)$ be the sum of weighted
distances for the full pth quantile regression model and let $V^0(p)$ be the sum of weighted distances for the
model that includes only a constant term.
8 CELOSO | LIBRES | MARCELINO | RIGODON | SAMILEY

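$$V^1(p) = \sum_{y_i \ge \hat{y}_i} p \, |y_i - \hat{y}_i| + \sum_{y_i < \hat{y}_i} (1 - p) \, |y_i - \hat{y}_i|$$

and

$$V^0(p) = \sum_{y_i \ge \hat{Q}^{(p)}} p \, |y_i - \hat{Q}^{(p)}| + \sum_{y_i < \hat{Q}^{(p)}} (1 - p) \, |y_i - \hat{Q}^{(p)}|,$$

where $\hat{y}_i$ is the fitted pth conditional quantile for case $i$ under the full model and $\hat{Q}^{(p)}$ is the pth sample quantile of the response.

Then, the goodness of fit is defined as

$$R(p) = 1 - \frac{V^1(p)}{V^0(p)}.$$

Since $V^1(p)$ and $V^0(p)$ are nonnegative, R(p) is at most 1. Also, $V^0(p)$ is greater than or equal to $V^1(p)$,
implying that R(p) is greater than or equal to zero. Hence, R(p) lies in [0, 1], with larger R(p) indicating better fit.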
R (p) allows for comparison of a fitted model with any number of covariates to the model in which only the
intercept is present.
To extend the concept of R(p), relative R(p) is introduced. It measures the fit relative to a more restricted
form of the model. It can be expressed as

$$\text{Relative } R(p) = 1 - \frac{V^2(p)}{V^1(p)},$$

where
$V^2(p)$ is the sum of weighted distances for the less restricted pth quantile regression model, and
$V^1(p)$ is the sum of weighted distances for the more restricted model.
Stata reports R(p) as its measure of goodness of fit and refers to it as the pseudo-R2.
Remark:
R(p) accounts for the appropriate weight each observation takes for specific quantile equation. It is easy to
comprehend and its interpretation follows the familiar R-squared for the OLS.
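
A minimal sketch of the R(p) calculation, under stated assumptions: the data are made up, the fitted values are assumed to come from a median regression whose line is y = 2x, and `weighted_distance` and `goodness_of_fit` are hypothetical helper names. The intercept-only sum is obtained by searching over the observed values, since a sample pth quantile is attained at a data point.

```python
def weighted_distance(y, fitted, p):
    """Sum of weighted distances: weight p above the fit, 1 - p below it."""
    total = 0.0
    for yi, fi in zip(y, fitted):
        if yi >= fi:
            total += p * (yi - fi)
        else:
            total += (1 - p) * (fi - yi)
    return total

def goodness_of_fit(y, fitted, p):
    """R(p) = 1 - V1(p) / V0(p)."""
    v1 = weighted_distance(y, fitted, p)
    # Intercept-only model: the best constant is a sample pth quantile.
    v0 = min(weighted_distance(y, [c] * len(y), p) for c in y)
    return 1 - v1 / v0

# Illustrative data (assumption) and an assumed median-regression fit y = 2x.
x = [1, 2, 3, 4, 5, 6, 7]
y = [2, 4, 6, 8, 10, 12, 100]
fitted = [2 * xi for xi in x]
r_half = goodness_of_fit(y, fitted, 0.5)
print(r_half)
```

Here the full-model sum is 43 and the intercept-only sum is 55, so R(0.5) is 1 - 43/55, about 0.22.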
Interpretation of Coefficients
In OLS, fitted coefficients can be interpreted as the estimated change in the mean of the response variable
resulting from one unit increase in a continuous covariate.

Similarly, the QRM coefficient estimate is interpreted as the estimated change in the pth quantile of the
response variable corresponding to a unit change in the regressor.
Median-Regression Model
The simplest QRM is the median-regression model (MRM), which expresses the conditional median of a response
variable given predictor variables and is an alternative to OLS, which fits the conditional mean. MRM and OLS both
attempt to model the central location of a response variable.
The median-regression model is more suitable for modeling the behavior of a collection of skewed conditional
distributions. For instance, if these conditional distributions are skewed to the right, their means reflect what
is happening in the upper tail and not in the middle.
Interpretation: In the case of a continuous covariate, the coefficient estimate is interpreted as the change in
the median of the response variable corresponding to a unit change in the predictor.
Using QRM Results to Interpret Shape Shifts
Two of the most important features to consider are scale (spread) and skewness.
The analysis of shape effects reveals more information than the analysis of location effects alone.
Arrays of QRM coefficients for a range of quantiles can be used to determine how a one-unit increase in the
covariate affects the shape of the response distribution. This shape shift is highlighted using the graphical
method. For a particular covariate, we plot the coefficients and the confidence envelope, with the effect of the
predictor variable on the y-axis and the value of p on the x-axis.

Graphical patterns for the effect of a covariate on the response:

1. A horizontal line indicates a pure location shift by a one-unit increase in the covariate.
2. An upward-sloping curve indicates an increase in the scale of the conditional-response distribution:
the effect of a one-unit increase of the regressor is positive for all values of p and steadily
increasing with p.
3. A downward-sloping curve indicates a decrease in the scale of the conditional-response
distribution.
Note that a regressor produces a shape shift if $\hat{\beta}^{(p)} > \hat{\beta}^{(q)}$ whenever p > q.

Scale Shifts
The standard deviation is a commonly employed measure of the scale or spread for symmetric distributions. For
skewed distributions, the distances between selected quantiles provide a more informative description of the
spread than the standard deviation. For a value of p between 0 and .5, we identify two sample quantiles: Q(1-p)
(the (1-p)th quantile) and Q(p) (the pth quantile). The pth interquantile range, IQR(p) = Q(1-p) - Q(p), is a
measure of spread. This quantity describes the range of the middle (1 - 2p) proportion of the distribution.
Suppose the reference group and comparison group have the same median. Fixing some choice of p, we can
measure the interquantile ranges $IQR_r = U_r - L_r$ and $IQR_c = U_c - L_c$ for the reference group and comparison group,
respectively. The difference-in-differences $IQR_c - IQR_r$ serves as a measure of the scale shift.
The QRM fits provide an alternative approach to estimating scale-shift effects. Here, $\hat{\beta}^{(p)}$ is the fitted
coefficient indicating the increase or decrease in any particular quantile brought about by a unit increase in
the covariate. Thus, when we increase the covariate by one unit, the corresponding pth interquantile range
changes by the amount

$$SCS(p) = \hat{\beta}^{(1-p)} - \hat{\beta}^{(p)}, \qquad 0 < p < 0.5,$$

which is the scale shift, SCS(p).

When SCS(p) is zero, there is apparently no evidence of scale change. A negative value indicates that increasing
the covariate results in a decrease in scale, while a positive value indicates the opposite effect.
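
The link between the interquantile ranges and the QRM coefficients can be spelled out in one short derivation. It is a sketch under the assumption that the comparison group differs from the reference group by a one-unit increase in the covariate, so that each fitted quantile moves by the corresponding coefficient, $\hat{Q}_c^{(p)} = \hat{Q}_r^{(p)} + \hat{\beta}^{(p)}$; the hatted $Q$ notation is introduced here for illustration.

```latex
\begin{align*}
SCS(p) &= IQR_c(p) - IQR_r(p) \\
       &= \big[\hat{Q}_c^{(1-p)} - \hat{Q}_c^{(p)}\big] - \big[\hat{Q}_r^{(1-p)} - \hat{Q}_r^{(p)}\big] \\
       &= \big[\hat{Q}_c^{(1-p)} - \hat{Q}_r^{(1-p)}\big] - \big[\hat{Q}_c^{(p)} - \hat{Q}_r^{(p)}\big] \\
       &= \hat{\beta}^{(1-p)} - \hat{\beta}^{(p)}, \qquad 0 < p < 0.5.
\end{align*}
```

The reference-group quantiles cancel, which is why the scale shift can be read directly from the coefficients at the pth and (1-p)th quantiles.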

Skewness Shifts
A disproportional scale shift that relates to greater skewness indicates an additional effect on the shape of the
response distribution.
Let $M_r$ and $M_c$ indicate the median of the reference and the comparison, respectively. The upper spread is
$U_r - M_r$ and $U_c - M_c$ for the reference and comparison, respectively. The lower spread is $M_r - L_r$ for the
reference and $M_c - L_c$ for the comparison. The disproportion can be measured by taking the ratio of
$(U_c - M_c)/(U_r - M_r)$ to $(M_c - L_c)/(M_r - L_r)$.
If this ratio-of-ratios equals 1, then there is no skewness shift. If the ratio-of-ratios is less than 1, the
right-skewness is reduced. If the ratio-of-ratios is greater than 1, the right-skewness is increased. The shift in
terms of percentage change can be obtained by subtracting 1 from this quantity. The result is known as the
skewness shift, or SKS.


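In general, using the QRM coefficients, model-based SKS is obtained. This involves the conditional quantiles of
the reference group. The SKS for the middle 100(1-2p)% of the population is:

$$SKS(p) = \frac{\left(\hat{Q}_r^{(1-p)} + \hat{\beta}^{(1-p)} - \hat{Q}_r^{(.5)} - \hat{\beta}^{(.5)}\right) / \left(\hat{Q}_r^{(1-p)} - \hat{Q}_r^{(.5)}\right)}{\left(\hat{Q}_r^{(.5)} + \hat{\beta}^{(.5)} - \hat{Q}_r^{(p)} - \hat{\beta}^{(p)}\right) / \left(\hat{Q}_r^{(.5)} - \hat{Q}_r^{(p)}\right)} - 1,$$

where $\hat{Q}_r^{(p)}$ is the fitted pth conditional quantile of the reference group and $\hat{\beta}^{(p)}$ is the fitted coefficient of the covariate at the pth quantile.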

Note that because we take the ratio of two ratios, SKS effectively eliminates the influence of a proportional
scale shift. When SKS=0, it indicates either no scale shift or a proportional scale shift. SKS<0 indicates a
reduction of right-skewness due to the effect of the explanatory variable whereas SKS>0 indicates an
exacerbation of right-skewness.
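
The ratio-of-ratios logic can be sketched in code. The function name `skewness_shift`, its argument names, and the numbers are assumptions for illustration; the comparison-group quantiles are taken to be the reference-group quantiles plus the coefficient at the same quantile.

```python
def skewness_shift(l_ref, m_ref, u_ref, b_low, b_med, b_high):
    """Model-based SKS for one choice of p.

    l_ref, m_ref, u_ref: reference-group quantiles at p, .5 and 1 - p.
    b_low, b_med, b_high: covariate coefficients at p, .5 and 1 - p.
    """
    upper_ratio = ((u_ref + b_high) - (m_ref + b_med)) / (u_ref - m_ref)
    lower_ratio = ((m_ref + b_med) - (l_ref + b_low)) / (m_ref - l_ref)
    return upper_ratio / lower_ratio - 1

# Hypothetical right-skewed reference group: lower spread 10, upper spread 20.
proportional = skewness_shift(10, 20, 40, 0, 5, 15)   # both spreads grow by 50%
more_skewed = skewness_shift(10, 20, 40, 0, 5, 25)    # upper spread grows faster
less_skewed = skewness_shift(10, 20, 40, 0, 5, 10)    # upper spread grows slower
print(proportional, more_skewed, less_skewed)
```

The first call shows why a proportional scale shift yields SKS = 0: both spreads are multiplied by the same factor, so the ratio of ratios is 1.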

Quantile Regression in Stata

Example 1:
income = household income
ed = number of years of education of household head
white = 1 if household head is white, 0 if black
. qreg income ed white
Iteration  1:  WLS sum of weighted deviations =  6.202e+08

Iteration  1: sum of abs. weighted deviations =  6.202e+08
Iteration  2: sum of abs. weighted deviations =  6.151e+08
Iteration  3: sum of abs. weighted deviations =  6.086e+08
note:  alternate solutions exist
Iteration  4: sum of abs. weighted deviations =  6.043e+08
Iteration  5: sum of abs. weighted deviations =  6.020e+08
Iteration  6: sum of abs. weighted deviations =  6.018e+08
note:  alternate solutions exist
Iteration  7: sum of abs. weighted deviations =  6.018e+08
Iteration  8: sum of abs. weighted deviations =  6.018e+08

Median regression                                    Number of obs =     22624
  Raw sum of deviations 6.68e+08 (about 39977.45)
  Min sum of deviations 6.02e+08                     Pseudo R2     =    0.0985

------------------------------------------------------------------------------
      income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          ed |   4794.333   91.68188    52.29   0.000      4614.63    4974.036
       white |   9792.334   727.3664    13.46   0.000     8366.645    11218.02
       _cons |  -29927.67   1312.101   -22.81   0.000    -32499.47   -27355.86
------------------------------------------------------------------------------

An additional one year of education will increase the median income by about \$4,794. The median income of
whites is \$9,792 higher than that of the blacks. Both ED and WHITE are significant predictors of INCOME based
on the t-statistics. The coefficient for ED in the MRM is lower than the coefficient in the OLS model (\$6,314).
This suggests that while an increase of one year of education gives rise to an average increase of \$6,314 in
income, the increase would not be as substantial for most of the population. Similarly, the coefficient for
white in the MRM is lower than the corresponding coefficient in the OLS model (\$11,452).
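
As a quick check on the median-regression output, the pseudo R2 can be reproduced approximately from the two sums of deviations that Stata prints, using R(p) = 1 - V1(p)/V0(p). The variable names below are assumptions; the two numbers are the rounded values shown in the output, so agreement is only to about three decimals.

```python
raw_sum = 6.68e+08   # "Raw sum of deviations": intercept-only model, V0(.5)
min_sum = 6.02e+08   # "Min sum of deviations": fitted median regression, V1(.5)

pseudo_r2 = 1 - min_sum / raw_sum
print(round(pseudo_r2, 4))   # 0.0988, against the reported 0.0985
```

The small discrepancy comes from the sums being displayed to three significant digits.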

. test ed white

 ( 1)  ed = 0
 ( 2)  white = 0

       F(  2, 22621) = 1589.34
            Prob > F =    0.0000

Reject the null hypothesis of $\beta_{ed} = \beta_{white} = 0$. There is sufficient
evidence to say that ED and WHITE are jointly significant predictors of
INCOME.

Quantile Regression Estimates for Income

Quantile      .05      .10      .20      .25      .30      .40      .50      .60      .70      .75      .80      .90      .95
ED           1130     1782     2757     3172     3571     4266     4794     5571     6224     6598     6954     8279     9575
WHITE        3197     4689     6557     6724     7541     8744     9792    11091    11739    12142    12972    14049    17484
CONS        -7910   -13536   -20721   -22986   -25590   -29104   -29928   -33090   -32909   -32344   -30702   -27562   -22126

We see that one more year of education can increase income by \$1,782 at the .10th quantile and \$1,130 at
the .05th quantile. Examining the estimates of education at the .90th and .95th quantiles, the coefficient for
the .95th quantile is \$9,575, much larger than at the .90th quantile (\$8,279). These results suggest the
contribution of prestigious higher education to income disparity.

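. sqreg income ed white, quantile(0.1 0.25 0.5 0.75 0.9)

(fitting base model)
(bootstrapping ....................)

Simultaneous quantile regression                     Number of obs =     22624
  bootstrap(20) SEs                                  .10 Pseudo R2 =    0.0441
                                                     .25 Pseudo R2 =    0.0726
                                                     .50 Pseudo R2 =    0.0985
                                                     .75 Pseudo R2 =    0.1141
                                                     .90 Pseudo R2 =    0.1208

------------------------------------------------------------------------------
             |              Bootstrap
      income |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
q10          |
          ed |   1782.333   59.18355    30.12   0.000     1666.329    1898.337
       white |   4688.667   300.6245    15.60   0.000     4099.422    5277.912
       _cons |     -13536   715.7417   -18.91   0.000     -14938.9    -12133.1
-------------+----------------------------------------------------------------
q25          |
          ed |   3172.222   45.30373    70.02   0.000     3083.424    3261.021
       white |   6723.666   541.5137    12.42   0.000     5662.262     7785.07
       _cons |  -22985.67   814.5297   -28.22   0.000     -24582.2   -21389.13
-------------+----------------------------------------------------------------
q50          |
          ed |   4794.333   51.30182    93.45   0.000     4693.778    4894.888
       white |   9792.334    565.642    17.31   0.000     8683.637    10901.03
       _cons |  -29927.67   570.7646   -52.43   0.000     -31046.4   -28808.93
-------------+----------------------------------------------------------------
q75          |
          ed |   6598.182   169.8196    38.85   0.000     6265.324     6931.04
       white |   12141.82   827.9499    14.66   0.000     10518.98    13764.66
       _cons |  -32344.18   1995.658   -16.21   0.000    -36255.81   -28432.55
-------------+----------------------------------------------------------------
q90          |
          ed |    8278.88   224.7802    36.83   0.000     7838.295    8719.465
       white |   14049.07   1900.115     7.39   0.000     10324.71    17773.43
       _cons |  -27561.84    3388.43    -8.13   0.000    -34203.39   -20920.28
------------------------------------------------------------------------------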

Testing for equality of coefficients at the .10th and .90th quantiles:

. test [q10]ed=[q90]ed

 ( 1)  [q10]ed - [q90]ed = 0

       F(  1, 22621) =  780.16
            Prob > F =    0.0000

. test [q10]white=[q90]white

 ( 1)  [q10]white - [q90]white = 0

       F(  1, 22621) =   14.71
            Prob > F =    0.0001

. test ([q10]ed=[q90]ed) ([q10]white=[q90]white)

 ( 1)  [q10]ed - [q90]ed = 0
 ( 2)  [q10]white - [q90]white = 0

       F(  2, 22621) =  395.42
            Prob > F =    0.0000

The effect of an additional year of education is different for the lower-income bracket and the higher-income
bracket. Likewise, the effect of being white is also different for the lower-income bracket and the
higher-income bracket. The joint effect of ED and WHITE is also significant, i.e., the effect of an additional
year of schooling and being white at the .10th quantile differs from the effect at the .90th quantile.

Shape Shifts

[Figure: estimated coefficient of ed (y-axis, 2000 to 10000) plotted against the quantile (x-axis), with confidence envelope]

The effect of ED can be described as the change in the income quantile brought about by one additional year of
education, at any level of education, fixing race. The education effect is significantly positive, because the
confidence envelope does not cross the horizontal zero line. The graph shows an upward-sloping curve for the
effects of education: the effect of one more year of schooling is positive for all values of p and steadily
increasing with p. The increase accelerates after the .80th quantile.

[Figure: estimated coefficient of white (y-axis, 0 to 25000) plotted against the quantile (x-axis), with confidence envelope]

The effect of WHITE can be described as the change in the income quantile brought about by changing the race from
black to white, fixing the education level. The effect of being white is significantly positive, as the zero line
is far below the confidence envelope. The graph shows an upward-sloping curve for the effect of being white as
compared with being black. The slopes below the .15th quantile and above the .90th quantile are steeper than
those at the middle quantiles.

The estimate is monotonically increasing with p. This tells us that an additional year of education or changing
race from black to white has a greater effect on income for higher-income brackets than for lower-income
brackets. The monotonicity also has scale-effect implications. Changing race from black to white or adding a
year of education increases the scale of the response.
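
The monotonicity claim can be checked directly against the table of quantile regression estimates for income. The sketch below copies the ED and WHITE rows from that table; `is_increasing` is a hypothetical helper.

```python
quantiles = [.05, .10, .20, .25, .30, .40, .50, .60, .70, .75, .80, .90, .95]
ed = [1130, 1782, 2757, 3172, 3571, 4266, 4794, 5571, 6224, 6598, 6954, 8279, 9575]
white = [3197, 4689, 6557, 6724, 7541, 8744, 9792, 11091, 11739, 12142, 12972,
         14049, 17484]

def is_increasing(values):
    """True when every value is strictly larger than the one before it."""
    return all(a < b for a, b in zip(values, values[1:]))

ed_monotone = is_increasing(ed)
white_monotone = is_increasing(white)
print(ed_monotone, white_monotone)
```

Both rows are strictly increasing in p, which is the pattern read as a positive scale effect in the text.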

Shape Shifts: Scale Shifts

pth Interquantile Range    SCS (ED)    SCS (WHITE)
0.25                           3426           5418
0.10                           6497           9360
0.05                           8445          14287

The scale shift brought about by one more year of schooling for the middle 50% of the population is \$3,426.
One more year of schooling increases the scale of income by \$6,497 for the middle 80% of the population, and
by \$8,445 for the middle 90% of the population. Controlling for education, whites' income spread is higher
than blacks' income spread by \$5,418 for the middle 50% of the population, \$9,360 for the middle 80%, and
\$14,287 for the middle 90%.
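
The SCS figures can be reproduced from the table of quantile regression estimates for income by differencing the coefficients at the (1-p)th and pth quantiles. The dictionaries below copy the relevant entries of the ED and WHITE rows; `scale_shift` is a hypothetical helper name.

```python
# Coefficients copied from the quantile regression estimates for income.
ed = {0.05: 1130, 0.10: 1782, 0.25: 3172, 0.75: 6598, 0.90: 8279, 0.95: 9575}
white = {0.05: 3197, 0.10: 4689, 0.25: 6724, 0.75: 12142, 0.90: 14049, 0.95: 17484}

def scale_shift(coef, lower, upper):
    """SCS(p): coefficient at the (1-p)th quantile minus coefficient at the pth."""
    return coef[upper] - coef[lower]

scs_ed = [scale_shift(ed, lo, hi) for lo, hi in [(0.25, 0.75), (0.10, 0.90), (0.05, 0.95)]]
scs_white = [scale_shift(white, lo, hi) for lo, hi in [(0.25, 0.75), (0.10, 0.90), (0.05, 0.95)]]
print(scs_ed, scs_white)
```

The differences match the tabulated SCS values for both covariates.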

Note: The

middle 100(1-2p)% of the population    SKS (ED)    SKS (WHITE)
middle 50% (p=0.25)                      -0.047         -0.087
middle 80% (p=0.10)                      -0.037         -0.085
middle 90% (p=0.05)                      -0.016         -0.066

One more year of schooling reduces right-skewness by 1.6% for the middle 90% of the population, 3.7% for
the middle 80% and 4.7% for the middle 50%. The impact of being white also decreases right-skewness by 6.6%
for the middle 90%, 8.5% for the middle 80% and 8.7% for the middle 50%. This finding indicates a greater
expansion of the white upper middle class than the black upper middle class.

Summary:
One more year of education induces a positive location and scale shift but a negative skewness shift. Similarly,
being white induces a positive location and scale shift with a negative skewness shift. The model suggests that
while higher education and being white are associated with a higher median income and a wider income
spread, the income distributions for the less educated and for the blacks are more skewed.

Quantile Regression in SAS

Example 2:
murders = number of murders per 1,000,000 inhabitants per annum
inhabitants = number of inhabitants
income = percentage of families with incomes below \$5000
unemp = percentage of unemployed inhabitants

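/* Fit quantile regressions at eleven quantiles; rank-based confidence intervals */
PROC QUANTREG DATA = sample CI = rank;
   MODEL murders = inhabitants income unemp
         / QUANTILE = 0.05 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.95
           PLOT = quantplot;
   /* Joint Wald and likelihood ratio tests of the three covariates */
   TEST inhabitants income unemp / wald lr;
RUN;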

Note: If we consider all quantiles, the rank option for computing confidence intervals is not available. (You
may use only sparsity and resampling.) Likewise, it is not possible to use Wald and Likelihood Ratio Tests.

Quantile Regression Estimates for Number of Murders

Quantile         0.05     0.10     0.20     0.25     0.30     0.40     0.50     0.60     0.70     0.75     0.80     0.90     0.95
Intercept      -58.38   -58.38   -59.91   -39.30   -37.18   -46.68   -67.90   -86.95   -76.14  -103.34  -103.42  -104.40  -164.52
Inhabitants      1.96     1.96     1.88     0.72     0.63     1.22     1.86     3.28     3.05     5.07     5.07     5.03     9.41
income           0.86     0.86     1.12     1.53     1.34     1.44     1.39     1.26     1.10     1.04     1.04     1.12     1.06
unemp            4.36     4.36     4.04     2.44     2.76     2.88     5.06     5.46     5.08     5.25     5.26     5.31     5.72

An additional inhabitant will increase the median number of murders by 1.86; a unit increase in the
percentage of families with incomes below \$5000 will increase the median number of murders by 1.39; a unit
increase in the percentage of unemployed inhabitants will increase the median number of murders by 5.06.

Ho: $\beta_{inhabitants}^{(p)} = \beta_{income}^{(p)} = \beta_{unemp}^{(p)} = 0$

Test Results
Quantile    Test                Test Statistic    Chi-Square    Pr > ChiSq
0.05        Wald                      955.7538        955.75        <.0001
0.10        Wald                      144.0549        144.05        <.0001
0.10        Likelihood Ratio                          309.54        <.0001
0.20        Wald                       60.6047         60.60        <.0001
0.20        Likelihood Ratio                           45.29        <.0001
0.30        Wald                       55.0154         55.02        <.0001
0.30        Likelihood Ratio                           33.00        <.0001
0.40        Wald                       32.7730         32.77        <.0001
0.40        Likelihood Ratio                           37.22        <.0001
0.50        Wald                       58.0711         58.07        <.0001
0.50        Likelihood Ratio                           37.88        <.0001
0.60        Wald                       96.7067         96.71        <.0001
0.60        Likelihood Ratio                           36.74        <.0001
0.70        Wald                      139.9484        139.95        <.0001
0.70        Likelihood Ratio                           24.65        <.0001
0.80        Wald                      233.4782        233.48        <.0001
0.80        Likelihood Ratio                           31.72        <.0001
0.90        Wald                     1267.9173       1267.92        <.0001
0.90        Likelihood Ratio           26.3500         26.35        <.0001
0.95        Wald                      978.7139        978.71        <.0001

For all quantiles in consideration, there is sufficient evidence to conclude that the number of inhabitants, the
percentage of families with incomes below \$5000, and the percentage of unemployed inhabitants are jointly
significant predictors of the number of murders.

Test for Equality of Coefficients

PROC QUANTREG DATA = sample CI = rank;
MODEL murders = inhabitants income unemp/quantile = 0.75 0.8;
TEST inhabitants income unemp/qinteract;
RUN;
Test Results Equal Coefficients Across Quantiles
Chi-Square    DF    Pr > ChiSq
    0.0056              0.9999

Thus, there is insufficient evidence to conclude that the coefficients for the 0.75th and the 0.8th quantiles
jointly differ.


Shape Shifts

The effect of inhabitants on the number of murders is only significant from around the 0.5th quantile onwards.
The effect of income on the number of murders is only significant until somewhere around the 0.7th quantile.
The effect of unemployment on the number of murders is only significant until somewhere around the
0.5th quantile. Thus, the lower quantiles of income and unemployment significantly affect the number of
murders while the upper quantiles of the number of inhabitants significantly affect the number of murders.
Scale Shifts
pth interquantile range    SCS(inhabitants)    SCS(income)    SCS(unemp)
0.25                                 4.3518        -0.4872        2.8115
0.10                                 3.0699         0.2602        0.9506
0.05                                 7.455          0.2017        1.3628

An additional inhabitant increases the scale of the number of murders by 4.3518 for the middle 50% of the
population, by 3.0699 for the middle 80% of the population, and by 7.455 for the middle 90% of the
population. A unit increase in the percentage of families with incomes below \$5000 decreases the scale of the
number of murders by 0.4872 for the middle 50% of the population, while it increases the scale of the number
of murders by 0.2602 for the middle 80% of the population, and by 0.2017 for the middle 90% of the
population. A unit increase in the percentage of unemployed inhabitants increases the scale of the number of
murders by 2.8115 for the middle 50% of the population, by 0.9506 for the middle 80% of the population, and
by 1.3628 for the middle 90% of the population.


Skewness Shifts
middle 100(1-2p)% of the population    SKS(inhabitants)    SKS(income)     SKS(unemp)
middle 50% (p=0.25)                            -0.50852     0.114957637      -0.73607
middle 80% (p=0.10)                            0.100869    -0.465156696      -0.44369
middle 90% (p=0.05)                            0.520159    -0.403896185      -0.36112

An additional inhabitant reduces the right-skewness by 50.9% for the middle 50% of the population, while it
increases the right-skewness by 10.1% for the middle 80% and by 52% for the middle 90%. A unit increase in
the percentage of families with incomes below \$5000 increases the right-skewness by 11.5% for the middle 50%
of the population, while it reduces the right-skewness by 46.5% for the middle 80% and by 40.4% for the
middle 90%. A unit increase in the percentage of unemployed inhabitants reduces the right-skewness by 73.6%
for the middle 50% of the population, by 44.4% for the middle 80%, and by 36.1% for the middle 90%.