Var(\hat{\beta}_j) = \sigma^2 / [SST_j (1 - R_j^2)]
E(\hat{\beta}_j) = \beta_j
Multiple Regression: Recap
\hat{\sigma}^2 = \frac{1}{n-k-1} \sum_i \hat{u}_i^2 = SSR/df; thus
se(\hat{\beta}_j) = \hat{\sigma} / [SST_j (1 - R_j^2)]^{1/2}
se(\hat{\beta}_j) is the (estimated) standard error of \hat{\beta}_j
Previous results on goodness of
fit and functional form apply
Statistical Inference: Overview
The statistical properties of the least squares
estimators derive from the assumptions of the model
These properties tell us something about the
optimality of the estimators (Gauss-Markov)
But also provide the foundation for the process of
statistical inference:
How confident are we about the estimates that we
have obtained?
Statistical Inference: Overview
Suppose we have estimated the model:
\hat{wage} = 0.23 + 0.74 educ + 0.39 exper
We could have got the value of 0.74 for the
coefficient on education by chance. How confident
are we that the true parameter value is not 0.8 or 1.5
or 3.4 or 0?
Statistical inference addresses this kind of question.
Assumptions of the Classical Linear Model (CLM)
So far, we know that given the Gauss-Markov
assumptions, OLS is BLUE.
In order to do classical hypothesis testing, we
need to add another assumption (beyond the
Gauss-Markov assumptions)
Assume that u is independent of x_1, x_2, \ldots, x_k and u
is normally distributed with zero mean and
variance \sigma^2: u ~ Normal(0, \sigma^2) [MLR.6]
Adding MLR.6 gives the CLM assumptions
Outline of Lecture
1. tests of a single linear restriction (t-tests),
e.g. H_0: \beta_j = 0
2. tests of multiple linear restrictions (F-tests),
e.g. H_0: \beta_i = \beta_j = 0
3. [if time] tests of linear combinations (t-tests,
also possible using 2. above), e.g. H_0: \beta_i + \beta_j = 0
CLM Assumptions (cont)
Under CLM, OLS is not only BLUE, but is the
minimum variance unbiased estimator
We can summarize the population assumptions of
CLM as follows:
y|x ~ Normal(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k, \sigma^2)
While we assume normality, sometimes that is
not the case
Large samples will let us drop the normality
assumption
Normal Sampling Distributions
Under the CLM assumptions, conditional on
the sample values of the independent variables,
\hat{\beta}_j ~ Normal(\beta_j, Var(\hat{\beta}_j)), so that
(\hat{\beta}_j - \beta_j)/sd(\hat{\beta}_j) ~ Normal(0, 1)
The t Test (cont)
t_{\hat{\beta}_j} \equiv \hat{\beta}_j / se(\hat{\beta}_j)
We want to make Pr(Reject H_0 | H_0 true)
sufficiently small.
Need the distribution of t_{\hat{\beta}_j} if
the null is true.
Know the distribution of \hat{\beta}_j.
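As a quick check (a minimal sketch in Stata, using the wage1.dta example that appears later in these slides), the t ratio can be recomputed from the stored coefficient and standard error:

* Sketch: recompute a t ratio by hand after estimation (wage1.dta, used below)
use wage1, clear
regress lwage educ exper tenure
display _b[exper]/_se[exper]    // matches the t column of the output (2.39)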
t Test: Significance Level
If we want to have only a 5% probability of
rejecting H_0 if it is really true, then we say
our significance level is 5%.
The significance level is often denoted \alpha.
Significance levels are usually chosen to be
1%, 5% or 10%.
5% is the most common or default value.
t Test: Alternative Hypotheses
Besides our null, H_0, we need an alternative
hypothesis, H_1, and a significance level (\alpha)
H_1 may be one-sided or two-sided
H_1: \beta_j > 0 and H_1: \beta_j < 0 are one-sided
H_1: \beta_j \neq 0 is a two-sided alternative
If we want to have only a 5% probability of
rejecting H_0 if it is really true, then we say our
significance level is 5%
One-Sided Alternative: \beta_j > 0
Consider the alternative H_1: \beta_j > 0
Having picked a significance level, \alpha, we look up
the (1 - \alpha)th percentile in a t distribution with n - k -
1 df and call this c, the critical value
1 df and call this c, the critical value
We can reject the null hypothesis in favour of the
alternative hypothesis if the observed t statistic is
greater than the critical value
If the t statistic is less than the critical value then
we do not reject the null
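For example (a sketch; invttail is Stata's inverse upper-tail t function), the critical values for 522 df, as in the wage example below, are:

* Sketch: 5% critical values with 522 degrees of freedom
display invttail(522, 0.05)     // one-sided c, approx 1.65
display invttail(522, 0.025)    // two-sided c, approx 1.96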
One-Sided Alternative: \beta_j > 0
y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + u_i
H_0: \beta_j = 0    H_1: \beta_j > 0
[Figure: t distribution with critical value c. The area 1 - \alpha to the left of c is the fail-to-reject region; the area \alpha to the right of c is the rejection region.]
One-Sided (One-Tailed) vs Two-Sided
Because the t distribution is symmetric, testing
H_1: \beta_j < 0 is straightforward. The critical value is
just the negative of the one from before
We can reject the null if the t statistic < -c, and if
the t statistic > -c then we fail to reject the null
For a two-sided test, we set the critical value
based on \alpha/2 and reject H_0 in favour of H_1: \beta_j \neq 0
if the absolute value of the t statistic > c (|t| > c)
Two-Sided Alternative: \beta_j \neq 0
y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} + u_i
H_0: \beta_j = 0    H_1: \beta_j \neq 0
[Figure: t distribution with critical values -c and c. The areas \alpha/2 in each tail are the rejection regions; the area 1 - \alpha between -c and c is the fail-to-reject region.]
Summary for H_0: \beta_j = 0
Unless otherwise stated, the alternative is
assumed to be two-sided
If we reject the null, we typically say x_j is
statistically significant at the \alpha% level
If we fail to reject the null, we typically say
x_j is statistically insignificant at the \alpha% level
Testing other hypotheses
A more general form of the t statistic
recognizes that we may want to test
something like H_0: \beta_j = \beta_j^0
In this case, the appropriate t statistic is
t = (\hat{\beta}_j - \beta_j^0) / se(\hat{\beta}_j),
where \beta_j^0 is the hypothesized value of \beta_j
Confidence Intervals
Another way to use classical statistical testing is
to construct a confidence interval using the same
critical value as was used for a two-sided test
A (1 - \alpha)% confidence interval is defined as
\hat{\beta}_j \pm c \cdot se(\hat{\beta}_j), where c is the (1 - \alpha/2) percentile
in a t_{n-k-1} distribution
Interpretation (loose): We are 95% confident the
true parameter lies in the interval (\alpha = 5%).
Interpretation (better): In repeated samples, the true
parameter will lie in the computed interval 95% of the time.
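A sketch of how Stata's reported 95% interval can be rebuilt from stored results (e(df_r) is the residual degrees of freedom; the exper numbers come from the wage example below):

* Sketch: rebuild the 95% confidence interval for exper by hand
use wage1, clear
regress lwage educ exper tenure
scalar c = invttail(e(df_r), 0.025)
display _b[exper] - c*_se[exper], _b[exper] + c*_se[exper]   // approx .0007 to .0075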
Computing p-values for t tests
An alternative to the classical approach is to ask:
what is the smallest significance level at which
the null would be rejected?
So, compute the t statistic, and then look up what
percentile it is in the appropriate t distribution;
this gives the p-value
The p-value is the probability of observing a t
statistic as extreme as the one we did, if the null were true
Smaller p-values mean a more significant regressor
p < 0.05 means reject at the 5% significance level
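A sketch of the calculation in Stata (ttail gives the upper-tail probability of a t distribution; the example uses the exper coefficient from wage1.dta below):

* Sketch: two-sided p-value for exper from its t ratio
use wage1, clear
regress lwage educ exper tenure
display 2*ttail(e(df_r), abs(_b[exper]/_se[exper]))   // approx 0.017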
Stata and p-values, t tests, etc.
Most computer packages will compute the
p-value for you, assuming a two-sided test
If you really want a one-sided alternative,
just divide the two-sided p-value by 2
Stata provides the t statistic, p-value, and
95% confidence interval for H_0: \beta_j = 0 for
you, in columns labeled t, P>|t| and
[95% Conf. Interval], respectively
Example (4.1)
log(wage) = \beta_0 + \beta_1 educ + \beta_2 exper + \beta_3 tenure + u
H_0: \beta_2 = 0    H_1: \beta_2 \neq 0
Null says that experience does not affect
the expected log wage
Using data from wage1.dta we obtain the
Stata output
. use wage1
. regress lwage educ exper tenure

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  3,   522) =   80.39
       Model |  46.8741776     3  15.6247259           Prob > F      =  0.0000
    Residual |  101.455574   522  .194359337           R-squared     =  0.3160
-------------+------------------------------           Adj R-squared =  0.3121
       Total |  148.329751   525   .28253286           Root MSE      =  .44086

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |    .092029   .0073299    12.56   0.000     .0776292    .1064288
       exper |   .0041211   .0017233     2.39   0.017     .0007357    .0075065
      tenure |   .0220672   .0030936     7.13   0.000     .0159897    .0281448
       _cons |   .2843595   .1041904     2.73   0.007     .0796756    .4890435
------------------------------------------------------------------------------

The Std. Err. column reports se(\hat{\beta}_1) (educ), se(\hat{\beta}_2) (exper), se(\hat{\beta}_3) (tenure) and se(\hat{\beta}_0) (_cons).
Estimated Model: Equation Form
Hence we can write the fitted regression line as:
\widehat{log(wage)} = 0.284 + 0.092 educ + 0.0041 exper + 0.022 tenure
                     (0.104)   (0.007)       (0.0017)       (0.003)
n = 526, R^2 = 0.316
Note: standard errors in parentheses.
This is a standard way of writing estimated regression
models in equation form.
The hypothesis test
H_0: \beta_2 = 0    H_1: \beta_2 \neq 0
We have n - k - 1 = 526 - 3 - 1 = 522 degrees of freedom
Reject if |t| > c
Can use standard normal critical values:
c = 1.96 for a two-tailed test at the 5% significance level
t = 0.0041/0.0017 = 2.41 > 1.96
(Stata reports 2.39 because it uses the unrounded
coefficient and standard error)
So reject H_0
Interpretation: experience has a statistically
significant impact on the expected log wage.
Notice

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |    .092029   .0073299    12.56   0.000     .0776292    .1064288
       exper |   .0041211   .0017233     2.39   0.017     .0007357    .0075065
      tenure |   .0220672   .0030936     7.13   0.000     .0159897    .0281448
       _cons |   .2843595   .1041904     2.73   0.007     .0796756    .4890435
------------------------------------------------------------------------------

Stata reports this t ratio (2.39), a p-value for this
test (0.017) and the lower (0.00074) and upper
(0.0075) bounds of a confidence interval for the
unknown parameter \beta_2
Extensions to example
Claim: the returns to education are less
than 10%
Test this formally:
H_0: \beta_1 = 0.1    H_1: \beta_1 < 0.1
Notice two things have changed:
A non-zero value under the null
A one-tailed alternative
Extension continued
Reject the null hypothesis if t < -c = -1.645 (5% test)
t = (0.092029 - 0.1)/0.0073299 = -1.0875
So we do not reject the null hypothesis
No evidence to suggest returns to education are
less than 10%
Note: Stata does not report the t statistic or the
p-value for this test; you have to do the work
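A sketch of that work in Stata (test reports a two-sided p-value, which can be halved for the one-sided alternative):

* Sketch: one-sided test of H0: beta_educ = 0.1 vs H1: beta_educ < 0.1
use wage1, clear
regress lwage educ exper tenure
display (_b[educ] - 0.1)/_se[educ]    // t approx -1.09
display -invttail(e(df_r), 0.05)      // 5% one-sided critical value, approx -1.645
test educ = 0.1                       // two-sided p-value; halve for the one-sided test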
Multiple Linear Restrictions
We may want to jointly test multiple
hypotheses about our parameters
A typical example is testing exclusion
restrictions: we want to know if a group
of parameters are all equal to zero
If we fail to reject then those associated
explanatory variables should be excluded
from the model
Testing Exclusion Restrictions
The null hypothesis might be something
like H_0: \beta_{k-q+1} = 0, \ldots, \beta_k = 0
The alternative is just H_1: H_0 is not true
This means that at least one of the
parameters is not zero in the population
Can't just check each t statistic separately,
because we want to know if the q
parameters are jointly significant at a given
level; it is possible for none to be
individually significant at that level
Exclusion Restrictions (cont)
To do the test we need to estimate the restricted
model without x_{k-q+1}, \ldots, x_k included, as well as
the unrestricted model with all x's included
Intuitively, we want to know if the change in SSR
is big enough to warrant inclusion of x_{k-q+1}, \ldots, x_k
F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)},
where r is restricted and ur is unrestricted
The F statistic
The F statistic is always positive, since the
SSR from the restricted model can't be less
than the SSR from the unrestricted model
Essentially the F statistic is measuring the
relative increase in SSR when moving from
the unrestricted to the restricted model
q = number of restrictions, or df_r - df_{ur}
n - k - 1 = df_{ur}
The F statistic (cont)
To decide if the increase in SSR when we
move to a restricted model is big enough
to reject the exclusions, we need to know
about the sampling distribution of our F statistic
Not surprisingly, F ~ F_{q, n-k-1}, where q is
referred to as the numerator degrees of
freedom and n - k - 1 as the denominator
degrees of freedom
The F statistic (cont)
[Figure: density f(F) of the F_{q, n-k-1} distribution with critical value c. The area 1 - \alpha to the left of c is the fail-to-reject region; the area \alpha to the right is the rejection region.]
Reject H_0 at the \alpha significance level if F > c
Example (4.9)
bwght = \beta_0 + \beta_1 cigs + \beta_2 parity + \beta_3 faminc
+ \beta_4 motheduc + \beta_5 fatheduc + u
bwght = birth weight (ounces), cigs = average number
of cigarettes smoked per day during
pregnancy, parity = birth order, faminc =
annual family income, motheduc = years of
schooling of mother, fatheduc = years of
schooling of father.
Example (4.9) continued
Test whether, controlling for other factors,
parents' education has any impact on birth
weight
H_0: \beta_4 = \beta_5 = 0    H_1: H_0 not true
q = 2 restrictions
Restricted model:
bwght = \beta_0 + \beta_1 cigs + \beta_2 parity + \beta_3 faminc + u
Example (4.9) continued
Unrestricted Model
. regress bwght cigs parity faminc motheduc fatheduc

      Source |       SS       df       MS              Number of obs =    1191
-------------+------------------------------           F(  5,  1185) =    9.55
       Model |  18705.5567     5  3741.11135           Prob > F      =  0.0000
    Residual |  464041.135  1185  391.595895           R-squared     =  0.0387
-------------+------------------------------           Adj R-squared =  0.0347
       Total |  482746.692  1190  405.669489           Root MSE      =  19.789

------------------------------------------------------------------------------
       bwght |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cigs |  -.5959362   .1103479    -5.40   0.000    -.8124352   -.3794373
      parity |   1.787603   .6594055     2.71   0.007      .493871    3.081336
      faminc |   .0560414   .0365616     1.53   0.126    -.0156913    .1277742
    motheduc |  -.3704503   .3198551    -1.16   0.247    -.9979957    .2570951
    fatheduc |   .4723944   .2826433     1.67   0.095    -.0821426    1.026931
       _cons |   114.5243   3.728453    30.72   0.000     107.2092    121.8394
------------------------------------------------------------------------------

SSR_{ur} = 464041.135 (the Residual SS)
Example (4.9) continued
Using data in bwght.dta:
Restricted Model
. regress bwght cigs parity faminc if e(sample)

      Source |       SS       df       MS              Number of obs =    1191
-------------+------------------------------           F(  3,  1187) =   14.95
       Model |  17579.8997     3  5859.96658           Prob > F      =  0.0000
    Residual |  465166.792  1187  391.884408           R-squared     =  0.0364
-------------+------------------------------           Adj R-squared =  0.0340
       Total |  482746.692  1190  405.669489           Root MSE      =  19.796

------------------------------------------------------------------------------
       bwght |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        cigs |  -.5978519   .1087701    -5.50   0.000    -.8112549   -.3844489
      parity |   1.832274   .6575402     2.79   0.005     .5422035    3.122345
      faminc |   .0670618   .0323938     2.07   0.039     .0035063    .1306173
       _cons |   115.4699   1.655898    69.73   0.000     112.2211    118.7187
------------------------------------------------------------------------------

SSR_r = 465166.792 (the Residual SS)
Example (4.9) continued
Reject (at the 5% level) if F > c = 3.00, the 5%
critical value from the F_{2,\infty} distribution
F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)}
  = \frac{(465166.792 - 464041.135)/2}{464041.135/(1191-5-1)} = 1.44
so we do not reject H_0. Parental education
is not a significant determinant of
birth weight.
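The same calculation can be scripted from Stata's stored results (a sketch; e(rss) is the residual sum of squares, e(df_r) the residual df, and if e(sample) keeps the restricted regression on the unrestricted sample):

* Sketch: compute the exclusion-restriction F statistic by hand
use bwght, clear
regress bwght cigs parity faminc motheduc fatheduc
scalar ssr_ur = e(rss)
scalar df_ur  = e(df_r)
regress bwght cigs parity faminc if e(sample)
scalar ssr_r = e(rss)
display ((ssr_r - ssr_ur)/2) / (ssr_ur/df_ur)   // F approx 1.44
display invFtail(2, df_ur, 0.05)                // 5% critical value, approx 3.00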
Example (4.9): easier way!
Use the test command in Stata
Command here is: test motheduc=fatheduc=0
. regress bwght cigs parity faminc motheduc fatheduc
(output suppressed)
. test motheduc=fatheduc=0

 ( 1)  motheduc - fatheduc = 0
 ( 2)  motheduc = 0

       F(  2,  1185) =    1.44
            Prob > F =    0.2380
The R^2 form of the F statistic
Because the SSRs may be large and unwieldy, an
alternative form of the formula is useful
We use the fact that SSR = SST(1 - R^2) for any
regression, so we can substitute in for SSR_r and SSR_{ur}
F = \frac{(R^2_{ur} - R^2_r)/q}{(1 - R^2_{ur})/(n-k-1)},
where again r is restricted and ur is unrestricted
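As a check, plugging the R^2 values reported for Example 4.9 into this form gives F = [(0.0387 - 0.0364)/2] / [(1 - 0.0387)/1185] \approx 1.42, which matches the SSR-based value of 1.44 up to rounding in the displayed R^2 values.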
Overall Significance
A special case of exclusion restrictions is to test
H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0
It can be shown that in this case
F = \frac{(SST - SSR)/k}{SSR/(n-k-1)} = \frac{R^2/k}{(1 - R^2)/(n-k-1)}
In these formulae everything refers to the
unrestricted model
Stata reports the observed F statistic and
associated p-value for this test of overall
significance every time you estimate a regression
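As a check, using the reported values from the wage regression below: F = (0.3160/3) / ((1 - 0.3160)/522) \approx 80.4, matching Stata's F(3, 522) = 80.39 up to rounding.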
Stata output
. regress lwage educ exper tenure

      Source |       SS       df       MS              Number of obs =     526
-------------+------------------------------           F(  3,   522) =   80.39
       Model |  46.8741776     3  15.6247259           Prob > F      =  0.0000
    Residual |  101.455574   522  .194359337           R-squared     =  0.3160
-------------+------------------------------           Adj R-squared =  0.3121
       Total |  148.329751   525   .28253286           Root MSE      =  .44086

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |    .092029   .0073299    12.56   0.000     .0776292    .1064288
       exper |   .0041211   .0017233     2.39   0.017     .0007357    .0075065
      tenure |   .0220672   .0030936     7.13   0.000     .0159897    .0281448
       _cons |   .2843595   .1041904     2.73   0.007     .0796756    .4890435
------------------------------------------------------------------------------

F(3, 522) = 80.39 is the observed F for a test of overall significance; Prob > F = 0.0000 is its p-value.
General Linear Restrictions
The basic form of the F statistic will work
for any set of linear restrictions
First estimate the unrestricted model and
then estimate the restricted model
In each case, make note of the SSR
Imposing the restrictions can be tricky;
you will likely have to redefine variables
Example:
Unrestricted model:
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u
H_0: \beta_1 = 1 and \beta_3 = 0
Restricted model:
y - x_1 = \beta_0 + \beta_2 x_2 + u
Estimate both (you need to create y - x_1)
Use:
F = \frac{(SSR_r - SSR_{ur})/q}{SSR_{ur}/(n-k-1)} \sim F_{q, n-k-1}
F Statistic Summary
Just as with t statistics, p-values can be
calculated by looking up the percentile in
the appropriate F distribution
Stata will do this by entering: display
fprob(q, n - k - 1, F), where the appropriate
values of F, q, and n - k - 1 are used
If only one exclusion is being tested, then F
= t^2, and the p-values will be the same
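Note that fprob is the older name for this function; in current Stata versions the upper-tail probability is Ftail (a sketch using the Example 4.9 numbers):

* Sketch: p-value for F = 1.44 with (2, 1185) df
display Ftail(2, 1185, 1.44)    // approx 0.24; test reported 0.2380 using the unrounded F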
Testing a Linear Combination
Suppose instead of testing whether \beta_1 is equal to a
constant, you want to test if it is equal to another
parameter, that is H_0: \beta_1 = \beta_2
Note that this could be done with an F test
However we could also consider forming:
t = \frac{\hat{\beta}_1 - \hat{\beta}_2}{se(\hat{\beta}_1 - \hat{\beta}_2)}
Testing Linear Combo (cont)
Since se(\hat{\beta}_1 - \hat{\beta}_2) = \sqrt{Var(\hat{\beta}_1 - \hat{\beta}_2)}, and
Var(\hat{\beta}_1 - \hat{\beta}_2) = Var(\hat{\beta}_1) + Var(\hat{\beta}_2) - 2Cov(\hat{\beta}_1, \hat{\beta}_2), then
se(\hat{\beta}_1 - \hat{\beta}_2) = \{[se(\hat{\beta}_1)]^2 + [se(\hat{\beta}_2)]^2 - 2s_{12}\}^{1/2},
where s_{12} is an estimate of Cov(\hat{\beta}_1, \hat{\beta}_2)
Testing a Linear Combo (cont)
So, to use the formula, we need s_{12}, which
standard output does not report
Many packages will have an option to get
it, or will just perform the test for you
In Stata, after reg y x1 x2 ... xk you would
type test x1 = x2 to get a p-value for the test
More generally, you can always restate the
problem to get the test you want
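A sketch of both routes in Stata (e(V) is the estimated covariance matrix of the coefficients, el() extracts its elements, and lincom performs the same test directly; the variable names are from the two-year college example that follows):

* Sketch: t statistic for H0: beta_jc = beta_univ via the covariance matrix
use twoyear, clear
regress lwage jc univ exper
matrix V = e(V)                               // rows/cols ordered jc, univ, exper, _cons
scalar se_diff = sqrt(el(V,1,1) + el(V,2,2) - 2*el(V,1,2))
display (_b[jc] - _b[univ])/se_diff           // t approx -1.47
lincom jc - univ                              // same estimate, se, t and p-value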
Example (Section 4.4)
log(wage) = \beta_0 + \beta_1 jc + \beta_2 univ + \beta_3 exper + u
Under investigation is whether the returns
to junior college (jc) are the same as the
returns to university (univ)
H_0: \beta_1 = \beta_2, or H_0: \theta_1 = \beta_1 - \beta_2 = 0
\beta_1 = \theta_1 + \beta_2, so substitute in and rearrange:
log(wage) = \beta_0 + \theta_1 jc + \beta_2 (jc + univ) + \beta_3 exper + u
Example (cont):
This is the same model as originally, but
now you get a standard error for \beta_1 - \beta_2 = \theta_1
directly from the regression output
Any linear combination of parameters
could be tested in a similar manner
Using the data from twoyear.dta:
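A sketch of the reparameterisation (totcoll = jc + univ is the combined regressor used in the output below):

* Sketch: create the combined regressor and estimate the reparameterised model
use twoyear, clear
gen totcoll = jc + univ       // the coefficient on jc is then theta1 = beta1 - beta2
regress lwage jc totcoll exper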
Stata output: original model
. regress lwage jc univ exper

      Source |       SS       df       MS              Number of obs =    6763
-------------+------------------------------           F(  3,  6759) =  644.53
       Model |  357.752575     3  119.250858           Prob > F      =  0.0000
    Residual |  1250.54352  6759  .185019014           R-squared     =  0.2224
-------------+------------------------------           Adj R-squared =  0.2221
       Total |  1608.29609  6762  .237843255           Root MSE      =  .43014

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          jc |   .0666967   .0068288     9.77   0.000     .0533101    .0800833
        univ |   .0768762   .0023087    33.30   0.000     .0723504    .0814021
       exper |   .0049442   .0001575    31.40   0.000     .0046355    .0052529
       _cons |   1.472326   .0210602    69.91   0.000     1.431041     1.51361
------------------------------------------------------------------------------
Stata output: reparameterised model
. regress lwage jc totcoll exper

      Source |       SS       df       MS              Number of obs =    6763
-------------+------------------------------           F(  3,  6759) =  644.53
       Model |  357.752575     3  119.250858           Prob > F      =  0.0000
    Residual |  1250.54352  6759  .185019014           R-squared     =  0.2224
-------------+------------------------------           Adj R-squared =  0.2221
       Total |  1608.29609  6762  .237843255           Root MSE      =  .43014

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          jc |  -.0101795   .0069359    -1.47   0.142    -.0237761     .003417
     totcoll |   .0768762   .0023087    33.30   0.000     .0723504    .0814021
       exper |   .0049442   .0001575    31.40   0.000     .0046355    .0052529
       _cons |   1.472326   .0210602    69.91   0.000     1.431041     1.51361
------------------------------------------------------------------------------
Notice
The two estimated regressions are equivalent
The estimated standard error of \hat{\theta}_1 is 0.0069
Testing H_0: \beta_1 = \beta_2 is equivalent to testing
H_0: \theta_1 = 0
For a two-tailed test, use the Stata p-value
(0.142): do not reject H_0
For a one-tailed test (\theta_1 < 0), c = -1.645 and
t = -1.47 > -1.645, so do not reject H_0
Even easier
Run the original regression in Stata, then type
. test univ=jc
Stata reports a p-value for the required test
(Prob > F)
. regress lwage jc univ exper
{output suppressed}
. test jc=univ

 ( 1)  jc - univ = 0.0

       F(  1,  6759) =    2.15
            Prob > F =    0.1422
Next time
What happens when MLR.6 is not a
reasonable assumption to make?
Can we still find reasonable estimators and
perform inference?
Chapter 7 of Wooldridge
Also: one-hour class test on lectures 1-6