QUANTILE REGRESSION-III
Dr. Muhammad Shahadat Hossain Siddiquee
Professor, Department of Economics
University of Dhaka
Email: [Link]@[Link]
Cell: +8801719397749
Starting Points of QR: Theoretical Background
• Classical regression focuses on the expectation (mean) of the variable Y,
E(Y|X), given a collection of variables X.
• It only provides information about a specific region of the conditional
distribution of Y.
• Quantile regression (QR) extends this approach by allowing the study of the
conditional distribution of Y on X at different locations, providing a global
view of the relationship between Y and X.
• To illustrate this point, QR relates to classical regression in the same
way that the quantiles of a distribution relate to its mean: each quantile
describes a different location of the distribution.
• QR was first developed by Koenker and Bassett (1978), who extended the
ordinary least squares (OLS) estimate of the conditional mean models to
conditional quantile functions.
• The quantile function takes a probability value (p, where 0 ≤ p ≤ 1) as input and
returns the value of the random variable below which that proportion of the probability lies.
For example, if Q(0.75) = 60, then 75% of the data values lie below 60.
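As a minimal numerical sketch of this definition (not from the lecture; the sample data are made up), the empirical quantile can be computed with NumPy:

```python
import numpy as np

# Hypothetical sample: the integers 1 through 100
data = np.arange(1, 101)

# Q(0.75): the value below which 75% of the data lie
q75 = np.quantile(data, 0.75)  # 75.25 with NumPy's default linear interpolation
```

With this evenly spaced sample, roughly 75% of the values fall below the returned quantile, matching the Q(0.75) = 60 illustration above.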
Theoretical Background: QR…
• The estimation of conditional quantile functions can be formulated as an
optimization problem, which allows QR to utilize mathematical techniques
commonly used for the conditional mean function.
• Compare mean and quantiles while considering their objective functions.
• Assuming Y is a generalized random variable (i.e., a variate: a random
variable not tied to a specific type of probabilistic experiment), the mean is
the point c of the distribution that minimizes the sum of squared deviations,
which corresponds to the solution of the following minimization problem
(Davino et al., 2013):
𝜇 = arg min_c E(Y − c)²
• In mathematics, the argument of the minimum (abbreviated arg min or argmin)
is the input point at which a function attains its minimum value.
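A minimal numerical sketch of this arg min characterization (my own illustration, with made-up data): evaluating the squared-deviation objective over a grid of candidate centers and locating its minimizer recovers the sample mean.

```python
import numpy as np

# Hypothetical sample
y = np.array([2.0, 3.0, 5.0, 7.0, 13.0])

# Evaluate the sum of squared deviations over a fine grid of candidate centers c
grid = np.linspace(y.min(), y.max(), 100001)
squared_loss = ((y[:, None] - grid[None, :]) ** 2).sum(axis=0)

# The minimizer (arg min) of the squared loss is the sample mean
c_star = grid[np.argmin(squared_loss)]
```

Up to the grid resolution, c_star coincides with y.mean(), which is the point of the definition.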
Theoretical Background: QR…
• The median, instead, minimizes the sum of absolute deviations. In
terms of a minimization problem, the median is thus:
𝑀𝑒 = arg min_c E|Y − c|
• We can get the sample estimators 𝜇̂ and 𝑀𝑒̂ for such centers using the
sample observations.
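The same grid-search sketch applies to the median (again my own illustration with made-up data): replacing the squared loss with the absolute loss moves the minimizer from the mean to the median.

```python
import numpy as np

# Hypothetical sample (odd length, so the median is one of the data points)
y = np.array([2.0, 3.0, 5.0, 7.0, 13.0])

# Evaluate the sum of absolute deviations over a fine grid of candidate centers c
grid = np.linspace(y.min(), y.max(), 100001)
absolute_loss = np.abs(y[:, None] - grid[None, :]).sum(axis=0)

# The minimizer of the absolute loss is the sample median
c_star = grid[np.argmin(absolute_loss)]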
• It is well known that the univariate quantiles are identified by
specific points in the distribution. For example, the 𝜃th quantile is the
value of y for which 𝑃(𝑌 ≤ 𝑦) = 𝜃.
• Taking the cumulative distribution function (CDF) as a starting point:
F(y) = P(Y ≤ y)
Theoretical Background: QR…
• The quantile function is defined as the inverse of the cumulative
distribution function (CDF):
Q(θ) = F⁻¹(θ) = inf{y : F(y) ≥ θ}, for θ ∈ [0, 1],
where "inf" stands for infimum, which is the greatest lower bound of a set or
function. If F(·) is strictly increasing and continuous, then Q(θ) is the
unique real number y such that F(y) = θ (Gilchrist, 2000: 13).
• The CDF, F(y), gives the probability that a random variable Y is less than or
equal to y.
• The quantile function, Q(p), finds the value y such that F(y) = p.
• As explained in our first lecture, the quantile function is also known
as the percentile function, percent-point function, or inverse cumulative
distribution function.
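A small sketch of the inverse-CDF definition (my own illustration with made-up data): for an empirical distribution, F is a step function, so Q(θ) uses the generalized inverse inf{y : F(y) ≥ θ}.

```python
import numpy as np

# Empirical CDF and its generalized inverse for a hypothetical sample
y = np.sort(np.array([4.0, 1.0, 3.0, 2.0, 5.0]))
n = len(y)

def ecdf(t):
    """F(t) = P(Y <= t) under the empirical distribution."""
    return np.searchsorted(y, t, side="right") / n

def quantile(theta):
    """Q(theta) = inf{ y : F(y) >= theta }."""
    idx = np.searchsorted(np.arange(1, n + 1) / n, theta, side="left")
    return y[min(idx, n - 1)]
```

For this sample, F(3) = 0.6 ≥ 0.5 while F(2) = 0.4 < 0.5, so Q(0.5) = 3: the smallest y whose CDF value reaches the requested probability.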
Theoretical Background: QR…
• Less frequently, quantiles are presented as specific distributional centers,
minimizing a weighted sum of absolute deviations (Hao and
Naiman, 2007). In such a view the 𝜃th quantile is thus:
q(θ) = arg min_c E[ρ_θ(Y − c)]
where ρ_θ(·) denotes the following loss function (Davino et al., 2013):
ρ_θ(y) = y[θ − I(y < 0)]
• It follows that this loss function is an asymmetric absolute loss
function, which is a weighted sum of absolute deviations with (1 − θ)
weight applied to the negative deviations and (θ) weight applied to
the positive deviations.
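A numerical sketch of this asymmetric ("check") loss (my own illustration with made-up data): minimizing the summed check loss over candidate centers recovers the θth sample quantile.

```python
import numpy as np

def rho(u, theta):
    """Check loss: theta*|u| for u >= 0, (1 - theta)*|u| for u < 0."""
    return np.where(u >= 0, theta * u, (theta - 1) * u)

# Minimizing the total check loss over candidate centers c
# recovers the theta-th quantile (here theta = 0.75)
y = np.arange(1.0, 101.0)  # hypothetical sample: 1, 2, ..., 100
theta = 0.75
grid = np.linspace(y.min(), y.max(), 9901)
loss = rho(y[:, None] - grid[None, :], theta).sum(axis=0)
q_hat = grid[np.argmin(loss)]
```

For this sample the objective is flat on [75, 76], so any point in that interval is a 0.75-quantile; the asymmetric weights θ and (1 − θ) are what pull the minimizer away from the median toward the upper tail.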
Theoretical Background: QR…
• In the case of a discrete variable Y with probability distribution 𝑓(𝑦) =
𝑃(Y = 𝑦), the previous minimization problem becomes:
q(θ) = arg min_c [ θ Σ_{y ≥ c} |y − c| f(y) + (1 − θ) Σ_{y < c} |y − c| f(y) ]
• The same criterion is adopted in the case of a continuous random
variable, replacing summations with integrals:
q(θ) = arg min_c [ θ ∫_{y ≥ c} |y − c| f(y) dy + (1 − θ) ∫_{y < c} |y − c| f(y) dy ]
Theoretical Background: QR…
• Where 𝑓(𝑦) denotes the probability density function of Y.
• The sample estimator 𝑞̂(θ) for θ ∈ [0, 1] is likewise obtained by applying
the previous formula to the sample observations.
• Finally, it is straightforward to verify that for θ = 0.5 we obtain the
median solution.
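As a quick check of that last claim (my own derivation, using the notation above), setting θ = 0.5 makes the two weights equal, and the objective reduces, up to the constant factor 1/2, to the sum of absolute deviations that defines the median:

```latex
q(0.5) = \arg\min_{c}\left[\tfrac{1}{2}\sum_{y \ge c}|y - c|\,f(y)
       + \tfrac{1}{2}\sum_{y < c}|y - c|\,f(y)\right]
       = \arg\min_{c}\,\tfrac{1}{2}\,E\,|Y - c| = Me
```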