
Department of Finance & Banking, University of Malaya

Multiple Regression Analysis: OLS Asymptotics
Dr. Aidil Rizal Shahrin
aidil_rizal@um.edu.my

October 10, 2020


Contents

1 Introduction

2 Consistency
2.1 Deriving the Inconsistency in OLS

3 Asymptotic Normality and Large Sample Inference


3.1 The Lagrange Multiplier Statistics



Introduction

i. Previously, what we covered are called finite sample, small
sample, or exact properties of the OLS estimator.
ii. Unbiasedness under the first four Gauss-Markov assumptions is a
finite sample property because it holds for any sample size n
(with the restriction that n ≥ k + 1).
iii. The fact that OLS is BLUE under the full set of Gauss-Markov
assumptions is also a finite sample property.
iv. Assumption MLR.6 allows us to derive the exact sampling
distributions of the OLS estimators. The OLS estimator has a normal
sampling distribution (Theorem 1 of the previous topic) because of
this assumption, which leads directly to the t and F distributions
for the t and F statistics. If the error is not normally
distributed, the t statistic does not have an exact t distribution,
and likewise the F statistic does not have an exact F distribution,
for any sample size.

v. Asymptotic properties, or large sample properties, of estimators
and test statistics are defined as the sample size grows without
bound.
vi. Fortunately, OLS has satisfactory large sample properties.
vii. One important practical implication is that, even without the
normality assumption (MLR.6), the t and F statistics have
approximately t and F distributions, at least in large samples.



Consistency

i. Unbiasedness of estimators, although important, cannot always be
obtained.
ii. Although not all useful estimators are unbiased, economists
agree that consistency is a minimal requirement for an estimator.
iii. The intuition behind consistency is simple.
a. Let β̂j be the estimator for βj.
b. For each n, β̂j has a probability distribution.
c. Under Assumptions MLR.1-MLR.4, β̂j is unbiased with mean value βj.
d. If the estimator is consistent, the distribution of β̂j becomes
more tightly concentrated around βj as n grows.
e. As n tends to infinity, the distribution of β̂j collapses to the
single point βj; technically, it converges in probability to βj, as
shown in Fig.1.
f. A sufficient condition for consistency is that both the bias
(E(β̂j) − βj) and the variance tend to zero as the sample size
increases. This is not a necessary condition, however: an estimator
can still be consistent even if its bias does not tend to zero. The
simulation sketch after Fig.1 reproduces this tightening numerically.

Figure 1: Sampling distribution of β̂1 for sample sizes n1 < n2 < n3
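
The tightening in Fig.1 is easy to reproduce by simulation. Below is a
minimal sketch in Python (the data-generating process, parameter values,
and sample sizes are illustrative assumptions, not from the slides):
repeated samples are drawn, the OLS slope is computed each time, and its
spread is seen to shrink as n grows.

    import numpy as np

    rng = np.random.default_rng(0)
    beta0, beta1 = 1.0, 0.5          # illustrative true population parameters
    reps = 2000                      # Monte Carlo replications per sample size

    for n in (25, 100, 400):         # n1 < n2 < n3, as in Fig. 1
        b1_hats = np.empty(reps)
        for r in range(reps):
            x = rng.normal(size=n)
            u = rng.normal(size=n)   # MLR.1-MLR.4 all hold here
            y = beta0 + beta1 * x + u
            # OLS slope estimator (cf. Eq.1 below)
            b1_hats[r] = ((x - x.mean()) * y).sum() / ((x - x.mean()) ** 2).sum()
        print(f"n={n:4d}  mean={b1_hats.mean():.4f}  sd={b1_hats.std():.4f}")

    # The mean stays near beta1 = 0.5 (unbiasedness), while the standard
    # deviation shrinks roughly like 1/sqrt(n), i.e. the distribution
    # collapses around beta1 (consistency).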



Theorem 1: Consistency of OLS


Under Assumptions MLR.1 through MLR.4, the OLS estimator
β̂j is consistent for βj , for all j = 0, 1, . . . , k.

iv. The proof of consistency of OLS:

    β̂1 = [Σᵢ (xi1 − x̄1)yi] / [Σᵢ (xi1 − x̄1)²]                      (1)
       = β1 + [n⁻¹ Σᵢ (xi1 − x̄1)ui] / [n⁻¹ Σᵢ (xi1 − x̄1)²]         (2)

where all sums run over i = 1, . . . , n and we substitute
yi = β0 + β1 xi1 + ui in Eq.1. We divide both numerator and
denominator by n in Eq.2 because it allows us to apply the law of
large numbers (LLN). We conclude that the numerator and denominator
in the second part of Eq.2 converge in probability to the population
quantities:

    n⁻¹ Σᵢ (xi1 − x̄1)ui →ᵖ Cov(x1, u)                               (3)

    n⁻¹ Σᵢ (xi1 − x̄1)² →ᵖ Var(x1)                                   (4)

v. Provided that Var(x1) ≠ 0 (which is assumed in MLR.3), we can use
the probability limit to get:

    plim β̂1 = β1 + plim[n⁻¹ Σᵢ (xi1 − x̄1)ui] / plim[n⁻¹ Σᵢ (xi1 − x̄1)²]
            = β1 + Cov(x1, u)/Var(x1)                                (5)
            = β1
because Cov(x1, u) = 0. Remember, we assume E(u|x1) = 0 in
Assumption MLR.4, which implies that the covariance between them is
zero. This leads to the key assumption for consistency of OLS:

Assumption MLR.4': Zero Mean and Zero Correlation

E(u) = 0 and Cov(xj, u) = 0, for j = 1, 2, . . . , k.

vi. Assumption MLR.4' is weaker than MLR.4: MLR.4 implies MLR.4',
since E(u|x1, . . . , xk) = 0 gives E(u) = 0 and, by iterated
expectations, Cov(xj, u) = E(xj u) = E(xj E(u|x)) = 0.
vii. In MLR.4, the assumption E(u|x1, . . . , xk) = 0 means that u
is mean independent of the explanatory variables, so any function of
the explanatory variables is uncorrelated with u.
viii. But MLR.4' only requires that each xj is uncorrelated with u
(remember, this rules out linear association only). The sketch below
illustrates the gap between the two assumptions.
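
As an illustration (my own construction, not from the slides): take
x ∼ N(0, 1) and u = x² − 1. Then E(u) = 0 and Cov(x, u) = E(x³) = 0,
so MLR.4' holds, yet E(u|x) = x² − 1 ≠ 0, so MLR.4 fails. A minimal
Python sketch confirms that OLS remains consistent for the slope:

    import numpy as np

    rng = np.random.default_rng(1)
    beta0, beta1 = 1.0, 0.5              # illustrative true values
    n = 200_000                          # large n so the plim is visible

    x = rng.normal(size=n)
    u = x ** 2 - 1                       # E(u) = 0, Cov(x, u) = 0: MLR.4' holds
    y = beta0 + beta1 * x + u            # but E(u|x) = x**2 - 1 != 0: MLR.4 fails

    print("sample Cov(x, u):", np.cov(x, u)[0, 1])   # approximately 0
    b1_hat = ((x - x.mean()) * y).sum() / ((x - x.mean()) ** 2).sum()
    print("b1_hat:", b1_hat)                         # approximately beta1 = 0.5

    # OLS is still consistent because consistency only needs the weaker
    # zero-correlation condition MLR.4', even though the stronger
    # conditional-mean condition MLR.4 fails here.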
Deriving the Inconsistency in OLS

i. Point to remember: if the error is correlated with any of the
independent variables, then OLS is biased and inconsistent.
ii. In the simple regression case, the inconsistency (or asymptotic
bias) in β̂1 is

    plim β̂1 − β1 = Cov(x1, u)/Var(x1)                               (6)

Since Var(x1) > 0, the inconsistency in β̂1 is positive if x1 and u
are positively correlated, and negative if they are negatively
correlated. A numerical check follows below.
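
Eq.6 can be verified by simulation. In the sketch below (an
illustrative construction, not from the slides), x1 and u share a
common component z with Cov(x1, u) = 0.8 and Var(x1) = 2, so the slope
estimate should settle near β1 + 0.8/2 rather than near β1:

    import numpy as np

    rng = np.random.default_rng(2)
    beta0, beta1 = 1.0, 0.5              # illustrative true values
    n = 500_000                          # large n so the plim is visible

    z = rng.normal(size=n)
    x1 = z + rng.normal(size=n)          # Var(x1) = 2
    u = 0.8 * z + rng.normal(size=n)     # Cov(x1, u) = 0.8 > 0
    y = beta0 + beta1 * x1 + u

    b1_hat = ((x1 - x1.mean()) * y).sum() / ((x1 - x1.mean()) ** 2).sum()
    print("b1_hat                :", b1_hat)              # approximately 0.9
    print("beta1 + Cov/Var (Eq.6):", beta1 + 0.8 / 2.0)   # = 0.9 exactly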



Asymptotic Normality and Large Sample
Inference

i. Consistency alone does not allow us to perform statistical
inference.
ii. The exact normality of the OLS estimators depends on the
normality of the distribution of the error u in the population.
iii. If u is not normal, β̂j will not be normally distributed.
iv. With β̂j not normally distributed, the t statistics will not have
t distributions, and the F statistics will not have F distributions.

v. Since u is not observed but y is, under Assumption MLR.6 the
normality of u is equivalent to:

    y|x ∼ N(β0 + β1 x1 + β2 x2 + . . . + βk xk , σ²)

where σ² = Var(u) = Var(y|x). So we have:

    E(y|x) = β0 + β1 x1 + β2 x2 + . . . + βk xk
    Var(y|x) = σ²

vi. To understand this, let us assume k = 2 (assuming normality, of
course). Fig.2 shows that u and y have the same distribution
(normal) but differ in location; the variance σ² is the same for
both.


Figure 2: Probability density function of u and y.

vii. Whenever y takes on just a few values, it cannot have anything
close to a normal distribution. Or sometimes the distribution of y
is heavily skewed to the right, as in Fig.3. Since Assumption MLR.6
is crucial for the t and F tests, what can we do?

Figure 3: Histogram of prate using the data in 401k



viii. The answer is that we rely on the central limit theorem (CLT)
to conclude that the OLS estimators satisfy asymptotic normality,
meaning that they are approximately normally distributed in large
enough sample sizes.
ix. The key point of Theorem 2 (stated at the end of this section)
is that Assumption MLR.6 has been dropped. The only restriction on
the error distribution is that it has finite variance.
x. Regardless of the population distribution of u, the OLS
estimators, when properly standardized (part c. of Theorem 2), have
an approximate standard normal distribution. Thanks to the CLT for
this!
xi. However, based on Theorem 2, we should now use the standard
normal distribution for inference rather than the t distribution
(part c.).
xii. But from a practical perspective, it is still legitimate to
write:

    (β̂j − βj)/se(β̂j) ∼ᵃ tn−k−1 = tdf

because tdf approaches the N(0, 1) distribution as df gets large,
even when MLR.6 does not hold.
xiii. Caveat: if the sample size is not very large, the t
distribution can be a poor approximation to the distribution of the
t statistic when u is not normally distributed.
xiv. However, there is no rule of thumb for how big the sample
should be before the CLT can play its role. Some say n = 30, but
that cannot be sufficient for all distributions of u; the simulation
sketch below illustrates this.
xv. The quality of the approximation also depends on df: if k is
large, an even larger n is needed before the t approximation works
well.
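
The following sketch (illustrative settings, not from the slides) draws
errors from a heavily skewed distribution, a centered exponential, so
that MLR.6 fails, and tracks how often |t| > 1.96. The rejection rate
drifts toward the nominal 5% level only as n grows:

    import numpy as np

    rng = np.random.default_rng(3)
    beta0, beta1 = 1.0, 0.5                  # illustrative true values
    reps = 5000

    for n in (15, 30, 500):
        t_stats = np.empty(reps)
        for r in range(reps):
            x = rng.normal(size=n)
            u = rng.exponential(1.0, size=n) - 1.0   # skewed error, E(u) = 0
            y = beta0 + beta1 * x + u
            X = np.column_stack([np.ones(n), x])
            b, *_ = np.linalg.lstsq(X, y, rcond=None)
            resid = y - X @ b
            sigma2_hat = resid @ resid / (n - 2)     # k = 1, so df = n - 2
            se_b1 = np.sqrt(sigma2_hat * np.linalg.inv(X.T @ X)[1, 1])
            t_stats[r] = (b[1] - beta1) / se_b1
        # Under exact normality, |t| > 1.96 would occur about 5% of the
        # time for large df; with skewed errors this holds only for large n.
        print(f"n={n:4d}  P(|t| > 1.96) = {np.mean(np.abs(t_stats) > 1.96):.3f}")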

xvi. Remember that Theorem 2 requires the homoskedasticity
assumption (along with the zero conditional mean assumption). If
Var(y|x) is not constant, the usual t statistics and confidence
intervals are invalid no matter how big the sample size is; the CLT
does not rescue us here.
xvii. Remember the sufficient condition for consistency from the
previous section? OLS is already unbiased if it fulfills the first
four Gauss-Markov assumptions, and from Theorem 2 the estimated
variance of β̂j shrinks to zero at the rate 1/n, which is
effectively what Theorem 1 showed.
xviii. The asymptotic normality of the OLS estimators also implies
that the F statistics have approximate F distributions in large
sample sizes.

Theorem 2: Asymptotic Normality of OLS

Under the Gauss-Markov Assumptions MLR.1 through MLR.5:
a. √n(β̂j − βj) ∼ᵃ N(0, σ²/a²j), where σ²/a²j > 0 is the asymptotic
variance of √n(β̂j − βj); for the slope coefficients,
a²j = plim(n⁻¹ Σᵢ r̂²ij), where the r̂ij are the residuals from
regressing xj on the other independent variables. We say that β̂j is
asymptotically normally distributed;
b. σ̂² is a consistent estimator of σ² = Var(u);
c. For each j,

    (β̂j − βj)/sd(β̂j) ∼ᵃ N(0, 1)

and

    (β̂j − βj)/se(β̂j) ∼ᵃ N(0, 1)

where se(β̂j) is the usual OLS standard error.
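
Part a. of Theorem 2 can be checked by simulation. The sketch below (an
illustrative setup with two correlated regressors and a uniform, hence
non-normal, error) compares the simulated variance of √n(β̂1 − β1) with
σ²/a²1, where a²1 is the population residual variance of x1 after
partialling out x2:

    import numpy as np

    rng = np.random.default_rng(4)
    n, reps = 400, 5000
    beta = np.array([1.0, 0.5, -0.3])        # illustrative [beta0, beta1, beta2]
    sigma2 = 1.0

    root_n_dev = np.empty(reps)
    for r in range(reps):
        x1 = rng.normal(size=n)
        x2 = 0.6 * x1 + rng.normal(size=n)               # correlated regressors
        u = (rng.uniform(size=n) - 0.5) * np.sqrt(12)    # non-normal, Var(u) = 1
        X = np.column_stack([np.ones(n), x1, x2])
        y = X @ beta + u
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        root_n_dev[r] = np.sqrt(n) * (b[1] - beta[1])

    # a1^2 = plim n^{-1} sum r1_hat^2 = Var(x1) - Cov(x1, x2)^2 / Var(x2)
    #      = 1 - 0.6^2 / (0.6^2 + 1) = 1/1.36 in this design.
    a1_sq = 1 - 0.36 / 1.36
    print("simulated Var(sqrt(n)*(b1_hat - beta1)):", root_n_dev.var())
    print("theoretical sigma^2 / a1^2             :", sigma2 / a1_sq)  # = 1.36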



The Lagrange Multiplier Statistics

i. Another way to test multiple exclusion restrictions (in large
samples) is the Lagrange multiplier (LM) statistic.
ii. The LM statistic we derive here relies on the Gauss-Markov
assumptions (without MLR.6).
iii. The advantage of the LM statistic over the F statistic is that
it requires estimation of the restricted model only. Let k = 4, so

    y = β0 + β1 x1 + β2 x2 + β3 x3 + β4 x4 + u

and suppose we are interested in whether q = 3 of these variables
have zero population parameters:

    H0 : β2 = 0, β3 = 0, β4 = 0                                     (7)

where the alternative is that at least one of them is nonzero. Then
estimate the restricted model:

    y = β̃0 + β̃1 x1 + ũ                                             (8)



then run the regression of:

    ũ on x1 , x2 , x3 and x4                                        (9)

which is called the auxiliary regression. If the null in Eq.7 is
true, the R² from Eq.9 should be 'close' to zero, because ũ will be
approximately uncorrelated with all the independent variables.
iv. Under Eq.7:

    n · R² ∼ᵃ χ²q                                                   (10)

where, as usual, the null is rejected at a given significance level
if Eq.10 exceeds the corresponding critical value. Remember, the R²
in Eq.10 is from the auxiliary regression in Eq.9. A worked sketch
of the whole procedure follows below.
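
The whole LM procedure is short to code. A minimal Python sketch for the
k = 4, q = 3 setup of Eqs.7-10 (the data-generating process and variable
names are illustrative; here H0 is true by construction):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    n, q = 300, 3
    x = rng.normal(size=(n, 4))                   # columns: x1, x2, x3, x4
    y = 1.0 + 0.5 * x[:, 0] + rng.normal(size=n)  # H0 holds: beta2=beta3=beta4=0

    def ols(y, X):
        """OLS of y on X; return residuals and R-squared."""
        b, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ b
        r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        return resid, r2

    ones = np.ones((n, 1))
    # Step 1: restricted model (Eq.8): y on x1 only; keep residuals u_tilde.
    u_tilde, _ = ols(y, np.hstack([ones, x[:, [0]]]))
    # Step 2: auxiliary regression (Eq.9): u_tilde on ALL regressors; keep R^2.
    _, r2_aux = ols(u_tilde, np.hstack([ones, x]))
    # Step 3: LM = n * R^2, compared with a chi-squared(q) distribution (Eq.10).
    lm = n * r2_aux
    print("LM =", lm, "  p-value =", stats.chi2.sf(lm, df=q))
    # A large LM (small p-value) rejects the q exclusion restrictions.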

