
Department of Finance & Banking, University of Malaya

Multiple Regression Analysis: OLS Asymptotics
Dr. Aidil Rizal Shahrin
aidil_rizal@um.edu.my

October 10, 2020


Contents

1 Introduction

2 Consistency
2.1 Deriving the Inconsistency in OLS

3 Asymptotic Normality and Large Sample Inference


3.1 The Lagrange Multiplier Statistics



Introduction

i. Previously, what we covered are called finite sample, small
sample, or exact properties of the OLS estimator.
ii. Unbiasedness under the first four Gauss-Markov assumptions is a
finite sample property because it holds for any sample size n
(with the restriction that n ≥ k + 1).
iii. The fact that OLS is BLUE under the full set of Gauss-Markov
assumptions is also a finite sample property.
iv. Assumption MLR.6 allows us to derive the exact sampling
distributions of the OLS estimators. The OLS estimator has a normal
sampling distribution (Theorem 1 of the previous topic) because of
this assumption, which leads directly to the t and F distributions
for the t and F statistics. If the error is not normally
distributed, the t statistic does not have an exact t distribution,
and likewise the F statistic does not have an exact F distribution,
for any sample size.

v. Asymptotic properties, or large sample properties, of estimators
and test statistics are defined as the sample size grows without
bound.
vi. Fortunately, OLS has satisfactory large sample properties.
vii. One important practical implication is that, even without the
normality assumption (MLR.6), the t and F statistics have
approximately t and F distributions, at least in large samples.



Consistency

i. Unbiasedness of estimators, although important, cannot always be
obtained.
ii. Although not all useful estimators are unbiased, economists
agree that consistency is a minimal requirement for an estimator.
iii. The intuition behind consistency is simple.
a. Let β̂j be the estimator for βj.
b. For each n, β̂j has a probability distribution.
c. Under Assumptions MLR.1-MLR.4, β̂j is unbiased with mean value βj.
d. If the estimator is consistent, the distribution of β̂j becomes
more tightly concentrated around βj as n grows.
e. As n tends to infinity, the distribution of β̂j collapses to the
single point βj; technically, it converges in probability to βj, as
shown in Fig.1.
f. A sufficient condition for consistency is that both the bias
(E(β̂j) − βj) and the variance tend to zero as the sample size
increases. This is not a necessary condition, however: an estimator
can still be consistent even if its bias does not tend to zero. The
simulation sketch after Fig.1 reproduces this tightening numerically.

Figure 1: Sampling distribution of β̂1 for sample sizes n1 < n2 < n3
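
The tightening in Fig.1 is easy to reproduce by simulation. Below is a
minimal sketch in Python (the data-generating process, parameter values,
and sample sizes are illustrative assumptions, not from the slides):
repeated samples are drawn, the OLS slope is computed each time, and its
spread is seen to shrink as n grows.

    import numpy as np

    rng = np.random.default_rng(0)
    beta0, beta1 = 1.0, 0.5          # illustrative true population parameters
    reps = 2000                      # Monte Carlo replications per sample size

    for n in (25, 100, 400):         # n1 < n2 < n3, as in Fig. 1
        b1_hats = np.empty(reps)
        for r in range(reps):
            x = rng.normal(size=n)
            u = rng.normal(size=n)   # MLR.1-MLR.4 all hold here
            y = beta0 + beta1 * x + u
            # OLS slope estimator (cf. Eq.1 below)
            b1_hats[r] = ((x - x.mean()) * y).sum() / ((x - x.mean()) ** 2).sum()
        print(f"n={n:4d}  mean={b1_hats.mean():.4f}  sd={b1_hats.std():.4f}")

    # The mean stays near beta1 = 0.5 (unbiasedness), while the standard
    # deviation shrinks roughly like 1/sqrt(n), i.e. the distribution
    # collapses around beta1 (consistency).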



Theorem 1: Consistency of OLS


Under Assumptions MLR.1 through MLR.4, the OLS estimator
β̂j is consistent for βj , for all j = 0, 1, . . . , k.

iv. The proof of consistency of OLS:

    β̂1 = [Σᵢ (xi1 − x̄1)yi] / [Σᵢ (xi1 − x̄1)²]                      (1)
       = β1 + [n⁻¹ Σᵢ (xi1 − x̄1)ui] / [n⁻¹ Σᵢ (xi1 − x̄1)²]         (2)

where all sums run over i = 1, . . . , n and we substitute
yi = β0 + β1 xi1 + ui in Eq.1. We divide both numerator and
denominator by n in Eq.2 because it allows us to apply the law of
large numbers (LLN). We conclude that the numerator and denominator
in the second part of Eq.2 converge in probability to the population
quantities:

    n⁻¹ Σᵢ (xi1 − x̄1)ui →ᵖ Cov(x1, u)                               (3)

    n⁻¹ Σᵢ (xi1 − x̄1)² →ᵖ Var(x1)                                   (4)

v. Provided that Var(x1) ≠ 0 (which is assumed in MLR.3), we can use
the probability limit to get:

    plim β̂1 = β1 + plim[n⁻¹ Σᵢ (xi1 − x̄1)ui] / plim[n⁻¹ Σᵢ (xi1 − x̄1)²]
            = β1 + Cov(x1, u)/Var(x1)                                (5)
            = β1
because Cov(x1, u) = 0. Remember, we assume E(u|x1) = 0 in
Assumption MLR.4, which implies that the covariance between them is
zero. This leads to the key assumption for consistency of OLS:

Assumption MLR.4': Zero Mean and Zero Correlation

E(u) = 0 and Cov(xj, u) = 0, for j = 1, 2, . . . , k.

vi. Assumption MLR.4' is weaker than MLR.4: MLR.4 implies MLR.4',
since E(u|x1, . . . , xk) = 0 gives E(u) = 0 and, by iterated
expectations, Cov(xj, u) = E(xj u) = E(xj E(u|x)) = 0.
vii. In MLR.4, the assumption E(u|x1, . . . , xk) = 0 means that u
is mean independent of the explanatory variables, so any function of
the explanatory variables is uncorrelated with u.
viii. But MLR.4' only requires that each xj is uncorrelated with u
(remember, this rules out linear association only). The sketch below
illustrates the gap between the two assumptions.
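
As an illustration (my own construction, not from the slides): take
x ∼ N(0, 1) and u = x² − 1. Then E(u) = 0 and Cov(x, u) = E(x³) = 0,
so MLR.4' holds, yet E(u|x) = x² − 1 ≠ 0, so MLR.4 fails. A minimal
Python sketch confirms that OLS remains consistent for the slope:

    import numpy as np

    rng = np.random.default_rng(1)
    beta0, beta1 = 1.0, 0.5              # illustrative true values
    n = 200_000                          # large n so the plim is visible

    x = rng.normal(size=n)
    u = x ** 2 - 1                       # E(u) = 0, Cov(x, u) = 0: MLR.4' holds
    y = beta0 + beta1 * x + u            # but E(u|x) = x**2 - 1 != 0: MLR.4 fails

    print("sample Cov(x, u):", np.cov(x, u)[0, 1])   # approximately 0
    b1_hat = ((x - x.mean()) * y).sum() / ((x - x.mean()) ** 2).sum()
    print("b1_hat:", b1_hat)                         # approximately beta1 = 0.5

    # OLS is still consistent because consistency only needs the weaker
    # zero-correlation condition MLR.4', even though the stronger
    # conditional-mean condition MLR.4 fails here.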
Deriving the Inconsistency in OLS

i. Point to remember: if the error is correlated with any of the
independent variables, then OLS is biased and inconsistent.
ii. In the simple regression case, the inconsistency (or asymptotic
bias) in β̂1 is

    plim β̂1 − β1 = Cov(x1, u)/Var(x1)                               (6)

Since Var(x1) > 0, the inconsistency in β̂1 is positive if x1 and u
are positively correlated, and negative if they are negatively
correlated. A numerical check follows below.
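
Eq.6 can be verified by simulation. In the sketch below (an
illustrative construction, not from the slides), x1 and u share a
common component z with Cov(x1, u) = 0.8 and Var(x1) = 2, so the slope
estimate should settle near β1 + 0.8/2 rather than near β1:

    import numpy as np

    rng = np.random.default_rng(2)
    beta0, beta1 = 1.0, 0.5              # illustrative true values
    n = 500_000                          # large n so the plim is visible

    z = rng.normal(size=n)
    x1 = z + rng.normal(size=n)          # Var(x1) = 2
    u = 0.8 * z + rng.normal(size=n)     # Cov(x1, u) = 0.8 > 0
    y = beta0 + beta1 * x1 + u

    b1_hat = ((x1 - x1.mean()) * y).sum() / ((x1 - x1.mean()) ** 2).sum()
    print("b1_hat                :", b1_hat)              # approximately 0.9
    print("beta1 + Cov/Var (Eq.6):", beta1 + 0.8 / 2.0)   # = 0.9 exactly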



Asymptotic Normality and Large Sample
Inference

i. Consistency alone does not allow us to perform statistical
inference.
ii. The exact normality of the OLS estimators depends on the
normality of the distribution of the error u in the population.
iii. If u is not normal, β̂j will not be normally distributed.
iv. With β̂j not normally distributed, the t statistics will not have
t distributions, and the F statistics will not have F distributions.

v. Since u is not observed but y is, under Assumption MLR.6 the
normality of u is equivalent to:

    y|x ∼ N(β0 + β1 x1 + β2 x2 + . . . + βk xk , σ²)

where σ² = Var(u) = Var(y|x). So we have:

    E(y|x) = β0 + β1 x1 + β2 x2 + . . . + βk xk
    Var(y|x) = σ²

vi. To understand this, let us assume k = 2 (assuming normality, of
course). Fig.2 shows that u and y have the same distribution
(normal) but differ in location; the variance σ² is the same for
both.


Figure 2: Probability density function of u and y.

vii. Whenever y takes on just a few values, it cannot have anything
close to a normal distribution. Or sometimes the distribution of y
is heavily skewed to the right, as in Fig.3. Since Assumption MLR.6
is crucial for the t and F tests, what can we do?

Figure 3: Histogram of prate using the data in 401k



viii. The answer is that we rely on the central limit theorem (CLT)
to conclude that the OLS estimators satisfy asymptotic normality,
meaning that they are approximately normally distributed in large
enough sample sizes.
ix. The key point of Theorem 2 (stated at the end of this section)
is that Assumption MLR.6 has been dropped. The only restriction on
the error distribution is that it has finite variance.
x. Regardless of the population distribution of u, the OLS
estimators, when properly standardized (part c. of Theorem 2), have
an approximate standard normal distribution. Thanks to the CLT for
this!
xi. However, based on Theorem 2, we should now use the standard
normal distribution for inference rather than the t distribution
(part c.).
xii. But from a practical perspective, it is still legitimate to
write:

    (β̂j − βj)/se(β̂j) ∼ᵃ tn−k−1 = tdf

because tdf approaches the N(0, 1) distribution as df gets large,
even when MLR.6 does not hold.
xiii. Caveat: if the sample size is not very large, the t
distribution can be a poor approximation to the distribution of the
t statistic when u is not normally distributed.
xiv. However, there is no rule of thumb for how big the sample
should be before the CLT can play its role. Some say n = 30, but
that cannot be sufficient for all distributions of u; the simulation
sketch below illustrates this.
xv. The quality of the approximation also depends on df: if k is
large, an even larger n is needed before the t approximation works
well.
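
The following sketch (illustrative settings, not from the slides) draws
errors from a heavily skewed distribution, a centered exponential, so
that MLR.6 fails, and tracks how often |t| > 1.96. The rejection rate
drifts toward the nominal 5% level only as n grows:

    import numpy as np

    rng = np.random.default_rng(3)
    beta0, beta1 = 1.0, 0.5                  # illustrative true values
    reps = 5000

    for n in (15, 30, 500):
        t_stats = np.empty(reps)
        for r in range(reps):
            x = rng.normal(size=n)
            u = rng.exponential(1.0, size=n) - 1.0   # skewed error, E(u) = 0
            y = beta0 + beta1 * x + u
            X = np.column_stack([np.ones(n), x])
            b, *_ = np.linalg.lstsq(X, y, rcond=None)
            resid = y - X @ b
            sigma2_hat = resid @ resid / (n - 2)     # k = 1, so df = n - 2
            se_b1 = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])
            t_stats[r] = (b[1] - beta1) / se_b1
        # Under exact normality, |t| > 1.96 would occur about 5% of the
        # time for large df; with skewed errors this holds only for large n.
        print(f"n={n:4d}  P(|t| > 1.96) = {np.mean(np.abs(t_stats) > 1.96):.3f}")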

xvi. Remember that Theorem 2 requires the homoskedasticity
assumption (along with the zero conditional mean assumption). If
Var(y|x) is not constant, the usual t statistics and confidence
intervals are invalid no matter how big the sample size is; the CLT
does not rescue us here.
xvii. Remember the sufficient condition for consistency from the
previous section? OLS is already unbiased if it fulfills the first
four Gauss-Markov assumptions, and from Theorem 2 the estimated
variance of β̂j shrinks to zero at the rate 1/n, which is
effectively what Theorem 1 showed.
xviii. The asymptotic normality of the OLS estimators also implies
that the F statistics have approximate F distributions in large
sample sizes.

Theorem 2: Asymptotic Normality of OLS

Under the Gauss-Markov Assumptions MLR.1 through MLR.5:
a. √n(β̂j − βj) ∼ᵃ N(0, σ²/a²j), where σ²/a²j > 0 is the asymptotic
variance of √n(β̂j − βj); for the slope coefficients,
a²j = plim(n⁻¹ Σᵢ r̂²ij), where the r̂ij are the residuals from
regressing xj on the other independent variables. We say that β̂j is
asymptotically normally distributed;
b. σ̂² is a consistent estimator of σ² = Var(u);
c. For each j,

    (β̂j − βj)/sd(β̂j) ∼ᵃ N(0, 1)

and

    (β̂j − βj)/se(β̂j) ∼ᵃ N(0, 1)

where se(β̂j) is the usual OLS standard error.
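
Part a. of Theorem 2 can be checked by simulation. The sketch below (an
illustrative setup with two correlated regressors and a uniform, hence
non-normal, error) compares the simulated variance of √n(β̂1 − β1) with
σ²/a²1, where a²1 is the population residual variance of x1 after
partialling out x2:

    import numpy as np

    rng = np.random.default_rng(4)
    n, reps = 400, 5000
    beta = np.array([1.0, 0.5, -0.3])        # illustrative [beta0, beta1, beta2]
    sigma2 = 1.0

    root_n_dev = np.empty(reps)
    for r in range(reps):
        x1 = rng.normal(size=n)
        x2 = 0.6 * x1 + rng.normal(size=n)               # correlated regressors
        u = (rng.uniform(size=n) - 0.5) * np.sqrt(12)    # non-normal, Var(u) = 1
        X = np.column_stack([np.ones(n), x1, x2])
        y = X @ beta + u
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        root_n_dev[r] = np.sqrt(n) * (b[1] - beta[1])

    # a1^2 = plim n^{-1} sum r1_hat^2 = Var(x1) - Cov(x1, x2)^2 / Var(x2)
    #      = 1 - 0.6^2 / (0.6^2 + 1) = 1/1.36 in this design.
    a1_sq = 1 - 0.36 / 1.36
    print("simulated Var(sqrt(n)*(b1_hat - beta1)):", root_n_dev.var())
    print("theoretical sigma^2 / a1^2             :", sigma2 / a1_sq)  # = 1.36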



The Lagrange Multiplier Statistics

i. Another way to test multiple exclusion restrictions (in large
samples) is the Lagrange multiplier (LM) statistic.
ii. The LM statistic we derive here relies on the Gauss-Markov
assumptions (without MLR.6).
iii. The advantage of the LM statistic over the F statistic is that
it requires estimation of the restricted model only. Let k = 4, so

    y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + u

and suppose we are interested in whether q = 3 of these variables
have zero population parameters:

    H0 : β2 = 0, β3 = 0, β4 = 0                                     (7)

where the alternative is that at least one of them is nonzero. Then
estimate the restricted model:

    y = β̃0 + β̃1 x1 + ũ                                             (8)



then run the regression of:

    ũ on x1 , x2 , x3 and x4                                        (9)

which is called the auxiliary regression. If the null in Eq.7 is
true, the R² from Eq.9 should be 'close' to zero, because ũ will be
approximately uncorrelated with all the independent variables.
iv. Under Eq.7:

    n · R² ∼ᵃ χ²q                                                   (10)

where, as usual, the null is rejected at a given significance level
if Eq.10 exceeds the corresponding critical value. Remember, the R²
in Eq.10 is from the auxiliary regression in Eq.9. A worked sketch
of the whole procedure follows below.
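
The whole LM procedure is short to code. A minimal Python sketch for the
k = 4, q = 3 setup of Eqs.7-10 (the data-generating process and variable
names are illustrative; here H0 is true by construction):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    n, q = 300, 3
    x = rng.normal(size=(n, 4))                   # columns: x1, x2, x3, x4
    y = 1.0 + 0.5 * x[:, 0] + rng.normal(size=n)  # H0 holds: beta2=beta3=beta4=0

    def ols(y, X):
        """OLS of y on X; return residuals and R-squared."""
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ b
        r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        return resid, r2

    ones = np.ones((n, 1))
    # Step 1: restricted model (Eq.8): y on x1 only; keep residuals u_tilde.
    u_tilde, _ = ols(y, np.hstack([ones, x[:, [0]]]))
    # Step 2: auxiliary regression (Eq.9): u_tilde on ALL regressors; keep R^2.
    _, r2_aux = ols(u_tilde, np.hstack([ones, x]))
    # Step 3: LM = n * R^2, compared with a chi-squared(q) distribution (Eq.10).
    lm = n * r2_aux
    print("LM =", lm, "  p-value =", stats.chi2.sf(lm, df=q))
    # A large LM (small p-value) rejects the q exclusion restrictions.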

