
Slides by K. Clark, adapted from P. Anderson
Multiple Regression Analysis
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_k x_k + u$

2. Statistical Inference
(Hypothesis Testing)
Multiple Regression: Recap
Linear in parameters: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_k x_k + u$ [MLR.1]
$\{(x_{i1}, x_{i2}, \ldots, x_{ik}, y_i) : i = 1, 2, \ldots, n\}$ is a random sample from the population model, so that $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_k x_{ik} + u_i$ [MLR.2]
$E(u \mid x_1, x_2, \ldots, x_k) = E(u) = 0$. Conditional mean independence. [MLR.3]
No exact multicollinearity. [MLR.4]
$Var(u \mid \mathbf{x}) = Var(u) = \sigma^2$. Homoscedasticity. [MLR.5]
Multiple Regression: Recap
Estimation by Ordinary Least Squares (OLS) leads to the fitted regression line
$\hat{y} = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2 + \ldots + \hat\beta_k x_k$
where $\hat\beta_j$ is the OLS estimate of $\beta_j$.
When we consider $\hat\beta_j$ as an estimator,
$E(\hat\beta_j) = \beta_j$ and $Var(\hat\beta_j) = \dfrac{\sigma^2}{SST_j (1 - R_j^2)}$
Multiple Regression: Recap
$\hat\sigma^2 = \dfrac{\sum_i \hat{u}_i^2}{n - k - 1} = \dfrac{SSR}{df}$, thus $se(\hat\beta_j) = \dfrac{\hat\sigma}{\sqrt{SST_j (1 - R_j^2)}}$
$se(\hat\beta_j)$ is the (estimated) standard error of $\hat\beta_j$.
Previous results on goodness of fit and functional form apply.
Statistical Inference: Overview
The statistical properties of the least squares estimators derive from the assumptions of the model.
These properties tell us something about the optimality of the estimators (Gauss-Markov).
But they also provide the foundation for the process of statistical inference:
How confident are we about the estimates that we have obtained?
Statistical Inference: Overview
Suppose we have estimated the model:
$\widehat{wage} = 0.23 + 0.74\, educ + 0.39\, exper$
We could have obtained the value of 0.74 for the coefficient on education by chance. How confident are we that the true parameter value is not 0.8 or 1.5 or -3.4 or 0?
Statistical inference addresses this kind of question.
Assumptions of the Classical Linear Model (CLM)
So far, we know that given the Gauss-Markov assumptions, OLS is BLUE.
In order to do classical hypothesis testing, we need to add another assumption (beyond the Gauss-Markov assumptions).
Assume that u is independent of $x_1, x_2, \ldots, x_k$ and u is normally distributed with zero mean and variance $\sigma^2$: $u \sim \mathrm{Normal}(0, \sigma^2)$ [MLR.6]
Adding MLR.6 gives the CLM assumptions.
Outline of Lecture
1. Tests of a single linear restriction (t-tests), e.g. $H_0: \beta_j = \beta_j^0$
2. Tests of multiple linear restrictions (F-tests), e.g. $H_0: \beta_i = \beta_j = 0$
3. [if time] Tests of linear combinations (t-tests, also possible using 2. above), e.g. $H_0: \beta_i + \beta_j = \beta^0$
CLM Assumptions (cont)
Under CLM, OLS is not only BLUE, but is the minimum variance unbiased estimator.
We can summarize the population assumptions of CLM as follows:
$y \mid \mathbf{x} \sim \mathrm{Normal}(\beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k,\; \sigma^2)$
While we assume normality, sometimes that is not the case.
Large samples will let us drop the normality assumption.
Normal Sampling Distributions
Under the CLM assumptions, conditional on the sample values of the independent variables,
$\hat\beta_j \sim \mathrm{Normal}\left(\beta_j,\; Var(\hat\beta_j)\right)$, so that
$\dfrac{\hat\beta_j - \beta_j}{sd(\hat\beta_j)} \sim \mathrm{Normal}(0, 1)$
$\hat\beta_j$ is distributed normally because it is a linear combination of the errors (u).
The t Test
Under the CLM assumptions,
$\dfrac{\hat\beta_j - \beta_j}{se(\hat\beta_j)} \sim t_{n-k-1}$
Note this is a t distribution (vs normal) because we have to estimate $\sigma^2$ by $\hat\sigma^2$.
Note the degrees of freedom: $n - k - 1$.
The t Distribution
looks like the standard normal except it has fatter tails
is a family of distributions characterised by degrees of freedom
gets more like the standard normal as degrees of freedom increase
is practically indistinguishable from the standard normal when df is greater than 120
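As an illustration (a sketch using Stata's built-in distribution functions, not part of the original slides), compare two-sided 5% critical values as the df grow:
. display invttail(10, 0.025)     // t with 10 df: about 2.23
. display invttail(120, 0.025)    // t with 120 df: about 1.98
. display invnormal(0.975)        // standard normal: about 1.96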
The t Test (cont)
Start with a null hypothesis.
An important null hypothesis: $H_0: \beta_j = 0$
If this null is true, then $x_j$ has no effect on y, controlling for the other x's.
If $H_0$ is true then $x_j$ should be excluded from the model (efficiency argument, extraneous regressor).
The t Test: Null Hypothesis
The null hypothesis is a maintained or status quo view of the world.
We reject the null only if there is a lot of evidence against it.
Analogy with the presumption of innocence in English law.
Only reject if $\hat\beta_j$ is sufficiently far from zero.
The t Test (cont)
$t_{\hat\beta_j} \equiv \dfrac{\hat\beta_j}{se(\hat\beta_j)}$
We want to make Pr(Reject $H_0 \mid H_0$ true) sufficiently small.
We need the distribution of $t_{\hat\beta_j}$ if the null is true.
We know the distribution of $\hat\beta_j$.
t Test: Significance Level
If we want to have only a 5% probability of rejecting $H_0$ if it is really true, then we say our significance level is 5%.
The significance level is often denoted $\alpha$.
Significance levels are usually chosen to be 1%, 5% or 10%.
5% is the most common or default value.
t Test: Alternative Hypotheses
Besides our null, $H_0$, we need an alternative hypothesis, $H_1$, and a significance level ($\alpha$).
$H_1$ may be one-sided or two-sided.
$H_1: \beta_j > 0$ and $H_1: \beta_j < 0$ are one-sided.
$H_1: \beta_j \neq 0$ is a two-sided alternative.
One-Sided Alternative: $\beta_j > 0$
Consider the alternative $H_1: \beta_j > 0$.
Having picked a significance level, $\alpha$, we look up the $(1 - \alpha)$th percentile in a t distribution with $n - k - 1$ df and call this c, the critical value.
We can reject the null hypothesis in favour of the alternative hypothesis if the observed t statistic is greater than the critical value.
If the t statistic is less than the critical value then we do not reject the null.
One-Sided Alternative: $\beta_j > 0$
[Figure: sampling distribution of the t statistic under $H_0: \beta_j = 0$ in the model $y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_k x_{ik} + u_i$, tested against $H_1: \beta_j > 0$. The area $1 - \alpha$ below the critical value c is the fail-to-reject region; the area $\alpha$ above c is the rejection region.]
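The critical value c can be looked up directly in Stata (an illustrative sketch; 522 df anticipates the wage example used later):
. display invttail(522, 0.05)    // one-sided 5% critical value: about 1.65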
One-sided (tailed) vs Two-sided
Because the t distribution is symmetric, testing $H_1: \beta_j < 0$ is straightforward. The critical value is just the negative of before.
We can reject the null if the t statistic $< -c$, and if the t statistic $> -c$ then we fail to reject the null.
For a two-sided test, we set the critical value based on $\alpha/2$ and reject $H_0$ in favour of $H_1: \beta_j \neq 0$ if the absolute value of the t statistic exceeds c ($|t| > c$).
Two-Sided Alternatives: $\beta_j \neq 0$
[Figure: sampling distribution of the t statistic under $H_0: \beta_j = 0$ in the model $y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_k x_{ik} + u_i$, tested against $H_1: \beta_j \neq 0$. Each tail beyond $\pm c$ has area $\alpha/2$ (reject); the middle region has area $1 - \alpha$ (fail to reject).]
Summary for $H_0: \beta_j = 0$
Unless otherwise stated, the alternative is assumed to be two-sided.
If we reject the null, we typically say $x_j$ is statistically significant at the $\alpha$% level.
If we fail to reject the null, we typically say $x_j$ is statistically insignificant at the $\alpha$% level.
Testing other hypotheses
A more general form of the t statistic recognizes that we may want to test something like $H_0: \beta_j = \beta_j^0$.
In this case, the appropriate t statistic is
$t = \dfrac{\hat\beta_j - \beta_j^0}{se(\hat\beta_j)}$, where $\beta_j^0 = 0$ for the standard test.
Confidence Intervals
Another way to use classical statistical testing is to construct a confidence interval using the same critical value as was used for a two-sided test.
A $(1 - \alpha) \cdot 100\%$ confidence interval is defined as
$\hat\beta_j \pm c \cdot se(\hat\beta_j)$, where c is the $\left(1 - \frac{\alpha}{2}\right)$ percentile in a $t_{n-k-1}$ distribution.
Interpretation (loose): We are 95% confident the true parameter lies in the interval ($\alpha$ = 5%).
Interpretation (better): In repeated samples, intervals constructed this way will contain the true parameter 95% of the time.
Computing p-values for t tests
An alternative to the classical approach is to ask: what is the smallest significance level at which the null would be rejected?
So, compute the t statistic, and then look up what percentile it is in the appropriate t distribution: this is the p-value.
The p-value is the probability of observing a t statistic as extreme as the one we did, if the null were true.
Smaller p-values mean a more significant regressor.
p < 0.05 means reject at the 5% significance level.
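In Stata, a two-sided p-value can be computed from the t statistic and degrees of freedom with the ttail() function (a sketch; the numbers anticipate the wage example below):
. display 2*ttail(522, 2.39)    // two-sided p-value: about 0.017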
Stata and p-values, t tests, etc.
Most computer packages will compute the p-value for you, assuming a two-sided test.
If you really want a one-sided alternative, just divide the two-sided p-value by 2.
Stata provides the t statistic, p-value, and 95% confidence interval for $H_0: \beta_j = 0$ for you, in the columns labeled "t", "P>|t|" and "[95% Conf. Interval]", respectively.
Example (4.1)
$\log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 tenure + u$
$H_0: \beta_2 = 0$   $H_1: \beta_2 \neq 0$
The null says that experience does not affect the expected log wage.
Using data from wage1.dta we obtain the Stata output:
. use wage1

. regress lwage educ exper tenure

Source | SS df MS Number of obs = 526
-------------+------------------------------ F( 3, 522) = 80.39
Model | 46.8741776 3 15.6247259 Prob > F = 0.0000
Residual | 101.455574 522 .194359337 R-squared = 0.3160
-------------+------------------------------ Adj R-squared = 0.3121
Total | 148.329751 525 .28253286 Root MSE = .44086

------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .092029 .0073299 12.56 0.000 .0776292 .1064288
exper | .0041211 .0017233 2.39 0.017 .0007357 .0075065
tenure | .0220672 .0030936 7.13 0.000 .0159897 .0281448
_cons | .2843595 .1041904 2.73 0.007 .0796756 .4890435
------------------------------------------------------------------------------

.
(The "Std. Err." column reports $se(\hat\beta_0)$, $se(\hat\beta_1)$, $se(\hat\beta_2)$ and $se(\hat\beta_3)$.)
Estimated Model: Equation Form
Hence we can write the fitted regression line as:
$\widehat{\log(wage)} = 0.284 + 0.092\, educ + 0.0041\, exper + 0.022\, tenure$
                       (0.104)  (0.007)       (0.0017)        (0.003)
$n = 526$, $R^2 = 0.316$. Note: standard errors in parentheses.
This is a standard way of writing estimated regression models in equation form.
The hypothesis test
$H_0: \beta_2 = 0$   $H_1: \beta_2 \neq 0$
We have $n - k - 1 = 526 - 3 - 1 = 522$ degrees of freedom.
Reject if $|t| > c$.
Can use standard normal critical values: c = 1.96 for a two-tailed test at the 5% significance level.
$t = 0.0041/0.0017 = 2.41 > 1.96$, so reject $H_0$.
Interpretation: experience has a statistically significant impact on the expected log wage.
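The same test with the unrounded output values, using Stata's display calculator (a check, not part of the original slides):
. display .0041211/.0017233      // t statistic: 2.39, as reported by Stata
. display invttail(522, .025)    // 5% two-sided critical value with 522 df: about 1.96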
Notice
------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .092029 .0073299 12.56 0.000 .0776292 .1064288
exper | .0041211 .0017233 2.39 0.017 .0007357 .0075065
tenure | .0220672 .0030936 7.13 0.000 .0159897 .0281448
_cons | .2843595 .1041904 2.73 0.007 .0796756 .4890435
------------------------------------------------------------------------------


Stata reports this t ratio (2.39), a p-value for this test (0.017) and the lower (0.00074) and upper (0.0075) bounds of a confidence interval for the unknown parameter $\beta_2$.
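Those confidence interval bounds can be reproduced from the coefficient and standard error (a sketch, not part of the original slides):
. display .0041211 - invttail(522, .025)*.0017233    // lower bound: about .0007
. display .0041211 + invttail(522, .025)*.0017233    // upper bound: about .0075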
Extensions to example
Claim: the returns to education are less than 10%.
Test this formally:
$H_0: \beta_1 = 0.1$   $H_1: \beta_1 < 0.1$
Notice two things have changed:
a non-zero value under the null
a one-tailed alternative
Extension continued
Reject the null hypothesis if $t < -c = -1.645$ (5% test).
$t = (0.092029 - 0.1)/0.0073299 = -1.0875$
So we do not reject the null hypothesis.
No evidence to suggest returns to education are less than 10%.
Note: Stata does not report the t statistic or the p-value for this test; you have to do the work.
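That work is a short calculation in Stata; by symmetry of the t distribution, the left-tail p-value is ttail() evaluated at |t| (a sketch):
. display (.092029 - .1)/.0073299    // t statistic: about -1.09
. display ttail(522, 1.0875)         // one-sided p-value: about 0.14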
Multiple Linear Restrictions
We may want to jointly test multiple hypotheses about our parameters.
A typical example is testing exclusion restrictions: we want to know if a group of parameters are all equal to zero.
If we fail to reject, then the associated explanatory variables can be excluded from the model.
Testing Exclusion Restrictions
The null hypothesis might be something like $H_0: \beta_{k-q+1} = 0, \ldots, \beta_k = 0$.
The alternative is just $H_1$: $H_0$ is not true.
This means that at least one of the parameters is not zero in the population.
Can't just check each t statistic separately, because we want to know if the q parameters are jointly significant at a given level; it is possible for none to be individually significant at that level.
Exclusion Restrictions (cont)
To do the test we need to estimate the restricted model without $x_{k-q+1}, \ldots, x_k$ included, as well as the unrestricted model with all x's included.
Intuitively, we want to know if the change in SSR is big enough to warrant inclusion of $x_{k-q+1}, \ldots, x_k$:
$F = \dfrac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)}$, where r is restricted and ur is unrestricted.
The F statistic
The F statistic is always positive, since the SSR from the restricted model can't be less than the SSR from the unrestricted.
Essentially the F statistic is measuring the relative increase in SSR when moving from the unrestricted to the restricted model.
$q$ = number of restrictions, or $df_r - df_{ur}$
$n - k - 1 = df_{ur}$
The F statistic (cont)
To decide if the increase in SSR when we move to a restricted model is "big enough" to reject the exclusions, we need to know about the sampling distribution of our F statistic.
Not surprisingly, $F \sim F_{q,\, n-k-1}$, where q is referred to as the numerator degrees of freedom and $n - k - 1$ as the denominator degrees of freedom.
The F statistic (cont)
[Figure: density f(F) of the $F_{q,\, n-k-1}$ distribution. Reject $H_0$ at the $\alpha$ significance level if F > c: the area $\alpha$ above the critical value c is the rejection region, and the area $1 - \alpha$ below it is the fail-to-reject region.]
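The critical value comes from Stata's invFtail() function (a sketch; the df anticipate the birth-weight example below):
. display invFtail(2, 1185, 0.05)    // 5% critical value: about 3.00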
Example (4.9)
$bwght = \beta_0 + \beta_1 cigs + \beta_2 parity + \beta_3 faminc + \beta_4 motheduc + \beta_5 fatheduc + u$
bwght = birth weight (lbs), cigs = average no. of cigarettes smoked per day during pregnancy, parity = birth order, faminc = annual family income, motheduc = years of schooling of mother, fatheduc = years of schooling of father.
Example (4.9) continued
Test whether, controlling for other factors, parents' education has any impact on birth weight:
$H_0: \beta_4 = \beta_5 = 0$   $H_1$: $H_0$ not true
q = 2 restrictions
Restricted model:
$bwght = \beta_0 + \beta_1 cigs + \beta_2 parity + \beta_3 faminc + u$
Example (4.9) continued

Unrestricted Model


regress bwght cigs parity faminc motheduc fatheduc

Source | SS df MS Number of obs = 1191
-------------+------------------------------ F( 5, 1185) = 9.55
Model | 18705.5567 5 3741.11135 Prob > F = 0.0000
Residual | 464041.135 1185 391.595895 R-squared = 0.0387
-------------+------------------------------ Adj R-squared = 0.0347
Total | 482746.692 1190 405.669489 Root MSE = 19.789

------------------------------------------------------------------------------
bwght | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs | -.5959362 .1103479 -5.40 0.000 -.8124352 -.3794373
parity | 1.787603 .6594055 2.71 0.007 .493871 3.081336
faminc | .0560414 .0365616 1.53 0.126 -.0156913 .1277742
motheduc | -.3704503 .3198551 -1.16 0.247 -.9979957 .2570951
fatheduc | .4723944 .2826433 1.67 0.095 -.0821426 1.026931
_cons | 114.5243 3.728453 30.72 0.000 107.2092 121.8394
------------------------------------------------------------------------------



(The Residual SS, 464041.135, is $SSR_{ur}$.)
Example (4.9) continued
Using data in bwght.dta:
Restricted Model


. regress bwght cigs parity faminc if e(sample)

Source | SS df MS Number of obs = 1191
-------------+------------------------------ F( 3, 1187) = 14.95
Model | 17579.8997 3 5859.96658 Prob > F = 0.0000
Residual | 465166.792 1187 391.884408 R-squared = 0.0364
-------------+------------------------------ Adj R-squared = 0.0340
Total | 482746.692 1190 405.669489 Root MSE = 19.796

------------------------------------------------------------------------------
bwght | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
cigs | -.5978519 .1087701 -5.50 0.000 -.8112549 -.3844489
parity | 1.832274 .6575402 2.79 0.005 .5422035 3.122345
faminc | .0670618 .0323938 2.07 0.039 .0035063 .1306173
_cons | 115.4699 1.655898 69.73 0.000 112.2211 118.7187
------------------------------------------------------------------------------


(The Residual SS, 465166.792, is $SSR_r$.)
Example (4.9) continued
Reject if $F > F_{2,\,\infty} = 3.00$ (5% critical value)
$F = \dfrac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)} = \dfrac{(465166.792 - 464041.135)/2}{464041.135/(1191 - 5 - 1)} = 1.44$
so we do not reject $H_0$. Parental education is not a significant determinant of birth weight.
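The same arithmetic can be checked in Stata's display calculator (not part of the original slides):
. display ((465166.792 - 464041.135)/2) / (464041.135/1185)    // F = 1.44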
Example (4.9): an easier way!
Use the test command in Stata.
The command here is: test motheduc=fatheduc=0

regress bwght cigs parity faminc motheduc fatheduc

(output suppressed)

. test motheduc=fatheduc=0

( 1) motheduc - fatheduc = 0
( 2) motheduc = 0

F( 2, 1185) = 1.44
Prob > F = 0.2380

The R² form of the F statistic
Because the SSRs may be large and unwieldy, an alternative form of the formula is useful.
We use the fact that $SSR = SST(1 - R^2)$ for any regression, so we can substitute in for $SSR_r$ and $SSR_{ur}$:
$F = \dfrac{(R_{ur}^2 - R_r^2)/q}{(1 - R_{ur}^2)/(n - k - 1)}$, where again r is restricted and ur is unrestricted.
Overall Significance
A special case of exclusion restrictions is to test
$H_0: \beta_1 = \beta_2 = \ldots = \beta_k = 0$
It can be shown that in this case
$F = \dfrac{(SST - SSR)/k}{SSR/(n - k - 1)} = \dfrac{R^2/k}{(1 - R^2)/(n - k - 1)}$
In these formulae everything refers to the unrestricted model.
Stata reports the observed F statistic and associated p-value for this test of overall significance every time you estimate a regression.
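As a check (not on the original slide), the F statistic reported in the output below can be recovered from its R²:
. display (0.3160/3) / ((1 - 0.3160)/522)    // about 80.4, matching F(3, 522) = 80.39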
Stata output
(In the header of the output below, "F( 3, 522)" is the observed F for a test of overall significance and "Prob > F" is its p-value.)
. regress lwage educ exper tenure

Source | SS df MS Number of obs = 526
-------------+------------------------------ F( 3, 522) = 80.39
Model | 46.8741776 3 15.6247259 Prob > F = 0.0000
Residual | 101.455574 522 .194359337 R-squared = 0.3160
-------------+------------------------------ Adj R-squared = 0.3121
Total | 148.329751 525 .28253286 Root MSE = .44086

------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
educ | .092029 .0073299 12.56 0.000 .0776292 .1064288
exper | .0041211 .0017233 2.39 0.017 .0007357 .0075065
tenure | .0220672 .0030936 7.13 0.000 .0159897 .0281448
_cons | .2843595 .1041904 2.73 0.007 .0796756 .4890435
------------------------------------------------------------------------------
General Linear Restrictions
The basic form of the F statistic will work for any set of linear restrictions.
First estimate the unrestricted model and then estimate the restricted model.
In each case, make note of the SSR.
Imposing the restrictions can be tricky; it will likely require redefining variables.
Example
Unrestricted model:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$
$H_0: \beta_1 = 1$ and $\beta_3 = 0$
Restricted model:
$y - x_1 = \beta_0 + \beta_2 x_2 + u$
Estimate both (you need to create $y - x_1$) and use:
$F = \dfrac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n - k - 1)} \sim F_{q,\, n-k-1}$
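A sketch of how this might be done in Stata, with the slide's hypothetical variable names y, x1, x2, x3:
. generate ydiff = y - x1    // dependent variable for the restricted model (hypothetical name)
. regress y x1 x2 x3         // unrestricted model: note its SSR
. regress ydiff x2           // restricted model: note its SSR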
F Statistic Summary
Just as with t statistics, p-values can be calculated by looking up the percentile in the appropriate F distribution.
Stata will do this by entering: display fprob(q, n-k-1, F), where the appropriate values of F, q, and n-k-1 are used.
If only one exclusion is being tested, then $F = t^2$, and the p-values will be the same.
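For the birth-weight example this reproduces the p-value reported by the test command; note that in recent Stata versions the function is named Ftail() (fprob is an older name, so check your release):
. display Ftail(2, 1185, 1.44)    // about 0.238, matching Prob > F = 0.2380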
Testing a Linear Combination
Suppose instead of testing whether $\beta_1$ is equal to a constant, you want to test if it is equal to another parameter, that is $H_0: \beta_1 = \beta_2$.
Note that this could be done with an F test.
However, we could also consider forming:
$t = \dfrac{\hat\beta_1 - \hat\beta_2}{se(\hat\beta_1 - \hat\beta_2)}$
Testing Linear Combo (cont)
Since $se(\hat\beta_1 - \hat\beta_2) = \sqrt{Var(\hat\beta_1 - \hat\beta_2)}$, and
$Var(\hat\beta_1 - \hat\beta_2) = Var(\hat\beta_1) + Var(\hat\beta_2) - 2\, Cov(\hat\beta_1, \hat\beta_2)$, then
$se(\hat\beta_1 - \hat\beta_2) = \left\{ [se(\hat\beta_1)]^2 + [se(\hat\beta_2)]^2 - 2 s_{12} \right\}^{1/2}$
where $s_{12}$ is an estimate of $Cov(\hat\beta_1, \hat\beta_2)$.
Testing a Linear Combo (cont)
So, to use the formula, we need $s_{12}$, which standard output does not include.
Many packages will have an option to get it, or will just perform the test for you.
In Stata, after reg y x1 x2 ... xk you would type test x1 = x2 to get a p-value for the test.
More generally, you can always restate the problem to get the test you want.
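Stata's lincom command is one such option: after a regression it reports the estimated linear combination with its standard error, t statistic and p-value (a sketch with the slide's hypothetical variable names):
. regress y x1 x2 x3
. lincom x1 - x2    // estimate and se of the difference in coefficients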
Example (Section 4.4)
$\log(wage) = \beta_0 + \beta_1 jc + \beta_2 univ + \beta_3 exper + u$
Under investigation is whether the returns to junior college (jc) are the same as the returns to university (univ):
$H_0: \beta_1 = \beta_2$, or $H_0: \theta_1 = \beta_1 - \beta_2 = 0$
$\beta_1 = \theta_1 + \beta_2$, so substitute in and rearrange:
$\log(wage) = \beta_0 + \theta_1 jc + \beta_2 (jc + univ) + \beta_3 exper + u$
Example (cont)
This is the same model as originally, but now you get a standard error for $\beta_1 - \beta_2 = \theta_1$ directly from the regression output.
Any linear combination of parameters could be tested in a similar manner.
Using the data from twoyear.dta:
Stata output: original model

. regress lwage jc univ exper

Source | SS df MS Number of obs = 6763
-------------+------------------------------ F( 3, 6759) = 644.53
Model | 357.752575 3 119.250858 Prob > F = 0.0000
Residual | 1250.54352 6759 .185019014 R-squared = 0.2224
-------------+------------------------------ Adj R-squared = 0.2221
Total | 1608.29609 6762 .237843255 Root MSE = .43014

------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
jc | .0666967 .0068288 9.77 0.000 .0533101 .0800833
univ | .0768762 .0023087 33.30 0.000 .0723504 .0814021
exper | .0049442 .0001575 31.40 0.000 .0046355 .0052529
_cons | 1.472326 .0210602 69.91 0.000 1.431041 1.51361
------------------------------------------------------------------------------

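The reparameterised regression below uses the combined variable jc + univ; presumably it was created along these lines (a sketch):
. generate totcoll = jc + univ    // total years of college: junior college plus university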
Stata output: reparameterised model

. regress lwage jc totcoll exper

Source | SS df MS Number of obs = 6763
-------------+------------------------------ F( 3, 6759) = 644.53
Model | 357.752575 3 119.250858 Prob > F = 0.0000
Residual | 1250.54352 6759 .185019014 R-squared = 0.2224
-------------+------------------------------ Adj R-squared = 0.2221
Total | 1608.29609 6762 .237843255 Root MSE = .43014

------------------------------------------------------------------------------
lwage | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
jc | -.0101795 .0069359 -1.47 0.142 -.0237761 .003417
totcoll | .0768762 .0023087 33.30 0.000 .0723504 .0814021
exper | .0049442 .0001575 31.40 0.000 .0046355 .0052529
_cons | 1.472326 .0210602 69.91 0.000 1.431041 1.51361
------------------------------------------------------------------------------



Notice
The two estimated regressions are the same.
The estimated standard error of $\hat\theta_1$ is 0.0069.
Testing $H_0: \beta_1 = \beta_2$ is equivalent to testing $H_0: \theta_1 = 0$.
For a two-tailed test, use the Stata p-value (0.142): do not reject $H_0$.
For a one-tailed test ($\theta_1 < 0$), c = -1.645, t = -1.47, so do not reject $H_0$.
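A quick check of the reported two-tailed p-value (not on the original slide):
. display 2*ttail(6759, 1.47)    // about 0.142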
Even easier
Run the original regression in Stata, then type
. test univ=jc
Stata reports a p-value for the required test (Prob > F):

. regress lwage jc univ exper
{output suppressed}

. test jc=univ

( 1) jc - univ = 0.0

F( 1, 6759) = 2.15
Prob > F = 0.1422


Next time
What happens when MLR.6 is not a
reasonable assumption to make?
Can we still find reasonable estimators and
perform inference?
Chapter 7 of Wooldridge
Also: a one-hour class test on lectures 1-6.