
Multiple Regression Analysis

y = 0 + 1x1 + 2x2 + . . . kxk + u

2. Statistical Inference
(Hypothesis Testing)

2011-3-28 Lectured by Dr. Jin Hongfei 1


Multiple Regression: Recap
Linear in parameters: y = β0 + β1x1 + β2x2 + … + βkxk + u [MLR.1]
{(xi1, xi2, …, xik, yi): i = 1, 2, …, n} is a random sample from the population model, so that yi = β0 + β1xi1 + β2xi2 + … + βkxik + ui [MLR.2]
E(u|x1, x2, …, xk) = E(u) = 0. Conditional mean independence. [MLR.3]
No exact multicollinearity. [MLR.4]
Var(u|x) = Var(u) = σ². Homoscedasticity. [MLR.5]



Multiple Regression: Recap
Estimation by Ordinary Least Squares (OLS) leads to the fitted regression “line”

  ŷ = β̂0 + β̂1x1 + β̂2x2 + … + β̂kxk

where β̂j is the OLS estimate of βj.

When we consider β̂j as an estimator,

  E(β̂j) = βj

  Var(β̂j) = σ² / [SSTj(1 − Rj²)]



Multiple Regression: Recap
σ̂² = Σûi² / (n − k − 1) = SSR/df

thus se(β̂j) = σ̂ / [SSTj(1 − Rj²)]^(1/2)

se(β̂j) is the (estimated) standard error of β̂j

Previous results on goodness of fit and functional form apply
Statistical Inference: Overview
• The statistical properties of the least squares
estimators derive from the assumptions of the model
• These properties tell us something about the
optimality of the estimators (Gauss-Markov)
• But they also provide the foundation for the process of statistical inference:
• “How confident are we about the estimates that we
have obtained?”.
Statistical Inference: Overview
• Suppose we have estimated the model:

[Estimated equation not reproduced.]

• We could have got the value of 0.74 for the coefficient on education by chance. How confident are we that the true parameter value is not 0.8 or 1.5 or −3.4 or 0?
• Statistical inference addresses this kind of question.



Assumptions of the Classical
Linear Model (CLM)
So far, we know that given the Gauss-Markov assumptions, OLS is BLUE.
In order to do classical hypothesis testing, we need to add another assumption (beyond the Gauss-Markov assumptions).
Assume that u is independent of x1, x2, …, xk and u is normally distributed with zero mean and variance σ²: u ~ Normal(0, σ²) [MLR.6]
Adding MLR.6 gives the CLM assumptions



Outline of Lecture
1. tests of a single linear restriction (t tests), e.g. H0: βj = βj0

2. tests of multiple linear restrictions (F tests), e.g. H0: βi = βj = 0

3. [if time] tests of linear combinations (t tests, also possible using 2. above), e.g. H0: βi = βj
CLM Assumptions (cont)
Under CLM, OLS is not only BLUE, but is the
minimum variance unbiased estimator
We can summarize the population assumptions of
CLM as follows:
y|x ~ Normal(0 + 1x1 +…+ kxk, 2)
While we assume normality, sometimes that is
not the case
Large samples will let us drop the normality
assumption



Normal Sampling Distributions
Under the CLM assumptions, conditional on the sample values of the independent variables,

  β̂j ~ Normal(βj, Var(β̂j)), so that

  (β̂j − βj) / sd(β̂j) ~ Normal(0, 1)

β̂j is distributed normally because it is a linear combination of the errors
The t Test
Under the CLM assumptions,

  (β̂j − βj) / se(β̂j) ~ t(n−k−1)

Note this is a t distribution (vs normal) because we have to estimate σ² by σ̂²

Note the degrees of freedom: n − k − 1
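As a concrete sketch of the mechanics, in Python with scipy (the estimate, standard error, and sample sizes below are made up for illustration, not taken from any slide):

```python
from scipy import stats

# Hypothetical values, purely for illustration
beta_hat = 0.52     # OLS estimate of beta_j
se_beta = 0.21      # its estimated standard error
n, k = 100, 3       # sample size, number of slope parameters
df = n - k - 1      # degrees of freedom: n - k - 1 = 96

# t statistic for H0: beta_j = 0
t_stat = beta_hat / se_beta

# two-sided 5% critical value from the t(df) distribution
c = stats.t.ppf(1 - 0.05 / 2, df)
print(t_stat, c, abs(t_stat) > c)
```

With these numbers |t| exceeds the critical value, so this hypothetical βj would be judged significantly different from zero at the 5% level.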
The t Distribution…
looks like the standard normal except it has fatter
tails
is a family of distributions characterized by
degrees of freedom
gets more like the standard normal as degrees of
freedom increase
is practically indistinguishable from the standard normal once df exceeds about 120

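These properties are easy to confirm numerically; a sketch with scipy comparing 97.5th percentiles across degrees of freedom:

```python
from scipy import stats

# 97.5th percentile of the t distribution for increasing df,
# versus the corresponding standard normal percentile
crits = {df: stats.t.ppf(0.975, df) for df in (5, 30, 120)}
z = stats.norm.ppf(0.975)
print(crits, z)
```

The critical values shrink toward the normal value 1.96 as df grows, which is why normal critical values are an acceptable shortcut in large samples.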


The t Test (cont)
Start with a null hypothesis
An important null hypothesis is H0: βj = 0
If this null is true, then xj has no effect on y, controlling for the other x’s
If H0 is true, then xj should be excluded from the model



The t Test: Null Hypothesis
The null hypothesis is a maintained or status
quo view of the world.
We reject the null only if there is a lot of
evidence against it.
Analogy with the presumption of innocence in
English law.
Only reject if β̂j is “sufficiently far” from zero.



The t Test (cont)
• We want to make Pr(Reject H0 | H0 true) “sufficiently small”.
• Need the distribution of β̂j if the null is true.
• Know the distribution of t = β̂j / se(β̂j).
t Test: Significance Level
If we want to have only a 5% probability of
rejecting H0 if it is really true, then we say
our significance level is 5%.
The significance level is often denoted α.
Significance levels usually chosen to be 1%,
5% or 10%.
5% is the most common (default) value.



t Test: Alternative Hypotheses
Besides our null, H0, we need an alternative
hypothesis, H1, and a significance level (α)
H1 may be one-sided, or two-sided
H1: j > 0 and H1: j < 0 are one-sided
H1: j  0 is a two-sided alternative



One-Sided Alternative: βj > 0
Consider the alternative H1: βj > 0
Having picked a significance level, α, we look up the (1 − α)th percentile of a t distribution with n − k − 1 df and call this c, the critical value
We can reject the null hypothesis in favour of the
alternative hypothesis if the observed t statistic is
greater than the critical value
If the t statistic is less than the critical value then
we do not reject the null



One-Sided Alternative: βj > 0
yi = 0 + 1xi1 + … + kxik + ui

H0: j = 0 H1: j > 0

Fail to reject
reject
  
0 c
One-sided (tailed) vs Two-sided
Because the t distribution is symmetric, testing H1: βj < 0 is straightforward. The critical value is just the negative of the one used before.
We reject the null if the t statistic < −c; if the t statistic > −c, we fail to reject the null.
For a two-sided test, we set the critical value based on α/2 and reject H0 in favour of H1: βj ≠ 0 if the absolute value of the t statistic exceeds c (|t| > c).



Two-Sided Alternatives: βj ≠ 0
yi = β0 + β1xi1 + … + βkxik + ui

H0: βj = 0  H1: βj ≠ 0

[Diagram: t distribution; fail to reject in the middle, reject in both tails, below −c and above c.]
Summary for H0: j = 0
Unless otherwise stated, the alternative is
assumed to be two-sided
If we reject the null, we typically say “xj is statistically significant at the α level”
If we fail to reject the null, we typically say “xj is statistically insignificant at the α level”



Testing other hypotheses
A more general form of the t statistic recognizes that we may want to test something like H0: βj = βj0

In this case, the appropriate t statistic is

  t = (β̂j − βj0) / se(β̂j), where

βj0 = 0 for the standard test



Confidence Intervals
Another way to use classical statistical testing is
to construct a confidence interval using the same
critical value as was used for a two-sided test
A (1 − α) confidence interval is defined as

  β̂j ± c·se(β̂j), where c is the (1 − α/2) percentile of a t(n−k−1) distribution

Interpretation (loose): “We are 95% confident the true parameter lies in the interval.” (α = 5%)
Interpretation (better): in repeated samples, intervals constructed this way contain the true parameter 95% of the time.
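A minimal sketch of the construction with scipy (the estimate, standard error, and df below are hypothetical, used only to show the mechanics):

```python
from scipy import stats

# Hypothetical estimate and standard error, for illustration only
beta_hat, se_beta, df = 0.52, 0.21, 96

c = stats.t.ppf(1 - 0.05 / 2, df)              # (1 - alpha/2) percentile
ci = (beta_hat - c * se_beta, beta_hat + c * se_beta)
print(ci)
```

Note that the same critical value c drives both the two-sided test and the interval: the null value is rejected at level α exactly when it falls outside the (1 − α) interval.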
Computing p-values for t tests
An alternative to the classical approach is to ask,
“what is the smallest significance level at which the
null would be rejected?”
So, compute the t statistic, and then look up what
percentile it is in the appropriate t distribution – this
is the p-value
The p-value is the probability of observing a t statistic at least as extreme as the one we did, if the null were true
Smaller p-values mean a “more significant” regressor
p < 0.05 means reject at the 5% significance level
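A sketch of the computation with scipy (the t statistic and df below are made up for illustration):

```python
from scipy import stats

t_stat, df = 2.5, 60   # illustrative t statistic and degrees of freedom

# two-sided p-value: probability of a |t| at least this large under H0
p = 2 * stats.t.sf(abs(t_stat), df)
print(p)
```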
Eviews and p-values, t tests, etc.
Most computer packages will compute the
p-value for you, assuming a two-sided test
If you really want a one-sided alternative, just divide the two-sided p-value by 2 (valid when the estimate lies on the same side as H1)
Eviews provides the t statistic and p-value
for H0: j = 0 for you, in columns labeled “t-
Statistic” and “Prob.” respectively



Example (4.1)
log(wage)=β0 + β1educ + β2exper +
β3tenure + u
H0:β2 = 0 H1:β2≠0
Null says that experience does not affect
the expected log wage
Using data from wage1.wf1 we obtain the Eviews output…



[Eviews regression output not reproduced.]
Estimated Model: Equation Form

[Estimated equation not reproduced.]

This is a standard way of writing estimated regression models in equation form.



The hypothesis test
H0:β2 = 0 H1:β2 ≠ 0
We have n-k-1=526-3-1=522 degrees of freedom
Reject if |t| > c
Can use standard normal critical values
c = 1.96 for a two-tailed test at 5% significance
level
t = 0.0041/0.0017 = 2.41 > 1.96
So reject H0
Interpretation: experience has a statistically
significant impact on the expected log wage.

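The arithmetic above can be checked directly; a sketch with scipy, where the exact t(522) critical value replaces the normal approximation 1.96:

```python
from scipy import stats

# Numbers from the slide: coefficient 0.0041, se 0.0017, df = 522
t_stat = 0.0041 / 0.0017
c = stats.t.ppf(0.975, 522)    # exact two-sided 5% critical value
print(round(t_stat, 2), c, abs(t_stat) > c)
```

The exact critical value barely differs from 1.96, so the conclusion (reject H0) is unchanged.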


Notice…

Eviews reports the t ratio (2.39) and the p-value for this test (0.017) for the unknown parameter β2. (The hand calculation gave 2.41 because the reported coefficient and standard error are rounded.)



Extensions to example
Claim: the returns to education are less
than 10%
Test this formally
H0:β1 = 0.1 H1:β1 < 0.1
Notice two things have changed
 A non-zero value under the null
 A one-tailed alternative



Extension continued
Reject the null hypothesis if t < −c = −1.645 (5% test)
t = (0.092029 − 0.1)/0.0073299 = −1.0875
So we do not reject the null hypothesis
No evidence to suggest the returns to education are less than 10%
Note: Eviews does not report the t statistic or the p-value for this test – you have to do the calculation yourself

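This calculation can be reproduced with scipy, using the slide's numbers (the exact t(522) left-tail critical value is used instead of the normal approximation −1.645):

```python
from scipy import stats

# Slide numbers: estimate 0.092029, se 0.0073299, H0: beta1 = 0.1
t_stat = (0.092029 - 0.1) / 0.0073299
c = stats.t.ppf(0.05, 522)     # left-tail 5% critical value
print(t_stat, c, t_stat < c)
```

Since t is not below the critical value, the null (returns to education of 10%) is not rejected.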


Multiple Linear Restrictions
We may want to jointly test multiple
hypotheses about our parameters
A typical example is testing “exclusion
restrictions” – we want to know if a group
of parameters are all equal to zero
If we fail to reject then those associated
explanatory variables should be excluded
from the model



Testing Exclusion Restrictions
The null hypothesis might be something like H0: βk−q+1 = 0, …, βk = 0
The alternative is just H1: H0 is not true
This means that at least one of the
parameters is not zero in the population
Can’t just check each t statistic separately,
because we want to know if the q
parameters are jointly significant at a given
level – it is possible for none to be
individually significant at that level
Exclusion Restrictions (cont)
To do the test we need to estimate the “restricted model” without xk−q+1, …, xk included, as well as the “unrestricted model” with all x’s included
Intuitively, we want to know if the change in SSR is big enough to warrant inclusion of xk−q+1, …, xk

  F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)], where

r denotes restricted and ur unrestricted
The F statistic
The F statistic is always positive, since the
SSR from the restricted model can’t be less
than the SSR from the unrestricted
Essentially the F statistic is measuring the
relative increase in SSR when moving from
the unrestricted to restricted model
q = number of restrictions, or dfr – dfur
n – k – 1 = dfur



The F statistic (cont)
To decide if the increase in SSR when we
move to a restricted model is “big enough”
to reject the exclusions, we need to know
about the sampling distribution of our F stat
Not surprisingly, F ~ Fq,n-k-1, where q is
referred to as the numerator degrees of
freedom and n – k – 1 as the denominator
degrees of freedom
The F statistic (cont)
[Diagram: F distribution with rejection region in the right tail; fail to reject if F ≤ c, reject H0 at the α significance level if F > c.]
Example (4.9)
bwght = β0 + β1cigs + β2parity +
β3faminc + β4motheduc + β5fatheduc + u
bwght – birth weight (lbs); cigs – average number of cigarettes smoked per day during pregnancy; parity – birth order; faminc – annual family income; motheduc – years of schooling of the mother; fatheduc – years of schooling of the father.
Example (4.9) continued
Test whether, controlling for other factors,
parents’ education has any impact on birth
weight
H0: β4 = β5 = 0 H1: H0 not true
q = 2 restrictions
Restricted model:
bwght = β0 + β1cigs + β2parity +
β3faminc + u



Example (4.9) continued

[Eviews output for the unrestricted model not reproduced; it reports SSRur = 464041.135.]
Example (4.9) continued

[Eviews output for the restricted model not reproduced; it reports SSRr = 465166.792.]
Example (4.9) continued
Reject if F > F2,1185 = 3.00 (5% critical value)

  F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)]
    = [(465166.792 − 464041.135)/2] / [464041.135/(1191 − 5 − 1)]
    ≈ 1.44

so we do not reject H0. Parental education is not a significant determinant of birth weight.
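The F calculation above can be verified with scipy, using the SSR values from the slides:

```python
from scipy import stats

# Numbers from Example 4.9: n = 1191, k = 5, q = 2
ssr_r, ssr_ur = 465166.792, 464041.135
q, df = 2, 1191 - 5 - 1

F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df)
c = stats.f.ppf(0.95, q, df)    # 5% critical value, about 3.00
print(round(F, 2), c, F > c)
```

The observed F is well below the critical value, matching the slide's conclusion of not rejecting H0.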
Example (4.9) easier way!
Use the Wald test command in Eviews. Type the following commands in the command area:

Equation eq1.ls bwght c cigs parity faminc motheduc fatheduc
eq1.wald c(5)=0, c(6)=0



Output of F test

[Eviews Wald test output not reproduced.]


The R2 form of the F statistic
Because the SSRs may be large and unwieldy, an alternative form of the formula is useful
We use the fact that SSR = SST(1 − R²) for any regression, so we can substitute in for SSRr and SSRur

  F = [(R²ur − R²r)/q] / [(1 − R²ur)/(n − k − 1)]

where again r denotes restricted and ur unrestricted


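A sketch of this form of the statistic (the R² values, sample size, and restriction count below are hypothetical):

```python
# R-squared form of the F statistic; all inputs are assumed for illustration
r2_ur, r2_r = 0.30, 0.25     # unrestricted and restricted R-squared
q, n, k = 2, 100, 5
df = n - k - 1

F = ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df)
print(F)
```

Because both models share the same dependent variable (and hence the same SST), this is algebraically identical to the SSR form.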
Overall Significance
A special case of exclusion restrictions is to test H0: β1 = β2 = … = βk = 0
It can be shown that in this case

  F = [(SST − SSR)/k] / [SSR/(n − k − 1)] = [R²/k] / [(1 − R²)/(n − k − 1)]

In these formulae everything refers to the unrestricted model
Eviews reports the observed F statistic and associated p-value for this “test of overall significance” every time you estimate a regression
Eviews output

[Output not reproduced; annotations point to the observed F statistic for the test of overall significance and its p-value.]


General Linear Restrictions
The basic form of the F statistic will work
for any set of linear restrictions
First estimate the unrestricted model and
then estimate the restricted model
In each case, make note of the SSR
Imposing the restrictions can be tricky – you will likely have to redefine variables



Example:
Unrestricted model:
  y = β0 + β1x1 + β2x2 + β3x3 + u
H0: β1 = 1 and β3 = 0 (q = 2)
Restricted model:
  y − x1 = β0 + β2x2 + u
Estimate both (you need to create y − x1)
Use:
  F = [(SSRr − SSRur)/q] / [SSRur/(n − k − 1)] ~ Fq, n−k−1
F Statistic Summary
Just as with t statistics, p-values can be
calculated by looking up the percentile in
the appropriate F distribution
If only one exclusion is being tested, then F = t², and the p-values will be the same

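The F = t² relationship extends to the critical values, which is easy to verify with scipy: the F(1, df) critical value at level α equals the square of the two-sided t(df) critical value.

```python
from scipy import stats

# With one restriction, the two tests agree exactly
df, alpha = 522, 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)
f_crit = stats.f.ppf(1 - alpha, 1, df)
print(t_crit**2, f_crit)
```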


Testing a Linear Combination
Suppose instead of testing whether β1 is equal to a constant, you want to test whether it is equal to another parameter, that is H0: β1 = β2
Note that this could be done with an F test
However, we could also consider forming:

  t = (β̂1 − β̂2) / se(β̂1 − β̂2)
Testing Linear Combo (cont)
Since se(β̂1 − β̂2) = [Var(β̂1 − β̂2)]^(1/2), and

  Var(β̂1 − β̂2) = Var(β̂1) + Var(β̂2) − 2Cov(β̂1, β̂2),

we have

  se(β̂1 − β̂2) = [se(β̂1)² + se(β̂2)² − 2s12]^(1/2)

where s12 is an estimate of Cov(β̂1, β̂2)

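A sketch of the standard-error formula (the variance and covariance estimates below are hypothetical placeholders):

```python
import math

# Hypothetical variance estimates, for illustration only
var1, var2, cov12 = 0.0004, 0.0009, 0.0002   # Var(b1), Var(b2), Cov(b1, b2)

# se of the difference uses the covariance term, not just the two variances
var_diff = var1 + var2 - 2 * cov12
se_diff = math.sqrt(var_diff)
print(se_diff)
```

Note that ignoring the covariance term would give the wrong standard error whenever the two estimates are correlated, which they generally are.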


Testing a Linear Combo (cont)
So, to use formula, need s12, which standard
output does not have
Many packages will have an option to get it, or
will just perform the test for you
In Eviews, after ls y c x1 x2 … xk you would type test c(2) = c(3) to get a p-value for the test
More generally, you can always restate the
problem to get the test you want



Example (Section 4.4)
log(wage) = 0 + 1jc + 2univ + 3exper + u
Under investigation is whether the returns to
junior college (jc) are the same as the returns
to university (univ)
H0: 1 = 2, or H0: 1 = 1 - 2 = 0
1 = 1 + 2, so substitute in and rearrange
 log(wage) = 0 + 1jc + 2(jc + univ) +
3exper + u
Example (cont):
This is the same model as originally, but now you get a standard error for θ1 = β1 − β2 directly from the regression output
Any linear combination of parameters could be tested in a similar manner
Using the data from twoyear.wf1:



Eviews output: original model

[Output not reproduced.]


Eviews output: reparameterised model

[Output not reproduced.]


Notice
The two estimated regressions are the same
The estimated standard error of θ̂1 is 0.0069
Testing H0: β1 = β2 is equivalent to testing H0: θ1 = 0
For a two-tailed test, use the Eviews p-value (0.142) ⇒ do not reject H0
For a one-tailed test (θ1 < 0), c = −1.645, t = −1.47, so do not reject H0
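The reparameterisation trick can also be verified numerically. A sketch with numpy on simulated data (standing in for twoyear.wf1, with all coefficient values assumed; the equality it demonstrates holds in any sample):

```python
import numpy as np

# Simulated data standing in for twoyear.wf1 (names follow the slides)
rng = np.random.default_rng(42)
n = 500
jc = rng.normal(size=n)
univ = rng.normal(size=n)
exper = rng.normal(size=n)
lwage = 1.5 + 0.07 * jc + 0.10 * univ + 0.01 * exper + rng.normal(scale=0.4, size=n)

def ols(X, y):
    """OLS coefficients and conventional standard errors."""
    xtx_inv = np.linalg.inv(X.T @ X)
    b = xtx_inv @ X.T @ y
    resid = y - X @ b
    s2 = resid @ resid / (len(y) - X.shape[1])
    return b, np.sqrt(s2 * np.diag(xtx_inv))

ones = np.ones(n)
# original model: lwage on jc, univ, exper
b, se = ols(np.column_stack([ones, jc, univ, exper]), lwage)
# reparameterised model: the coefficient on jc is now theta1 = beta1 - beta2,
# and its standard error comes straight off the output
b2, se2 = ols(np.column_stack([ones, jc, jc + univ, exper]), lwage)
print(b[1] - b[2], b2[1], se2[1])
```

The coefficient on jc in the reparameterised model equals β̂1 − β̂2 from the original model exactly, and se2[1] is the standard error that would otherwise require the covariance term.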
Even easier…
Run the original regression in Eviews, then type:

Equation eq1.ls lwage c jc univ exper
eq1.wald c(2)=c(3)

Eviews reports a p-value for the required test (Prob > F)
