
0. Introduction
What is econometrics?
Econometrics is the application of statistics and economic theory to
data in order to test economic hypotheses.
Economic theory describes relationships between economic variables.
For example, the law of demand tells us that as prices go down, the
quantity demanded will go up.
However, as the owner of a firm or as a policymaker, we are often
interested in the magnitude of the relationship between two variables.
For example, if cigarette taxes increase, the quantity demanded falls.
By how much? What will be the impact on tax revenues?
To answer these questions, we need to know something about the
empirical relationship between cigarette prices and cigarette demand.
1

We could ask a variety of other questions:


1) What is the impact of education on earnings?
2) How much do increases in government transfers (e.g., TANF)
reduce work effort?
3) What is the effect of an increased police force on the amount of
crime committed in a city?

Econometrics is also useful for forecasting.


1) Firms forecast revenues and costs.
2) Governments forecast consumer spending and unemployment rates.

Does econometrics always give the right answer?


For example, suppose the relationship between earnings and years of education is
$Earnings = \beta_0 + \beta_1 (\text{Years of Education})$,
where $\beta_1$ is the statistical relationship between years of education and earnings.
One more year of education will increase earnings by $\beta_1$.
However, we only have observational data to estimate this statistical
relationship or correlation.
We typically will not be analyzing a randomized experiment.
Does one more year of school really cause earnings to increase?
Or, do more able people, who would have earned more anyway, get more
education?
We will have to rely both on economic theory as well as our understanding
of econometric theory to interpret our findings.
3

Is econometrics the same as program evaluation?


Program evaluation undertakes an examination of a program (or policy)
through the study of the program's goals, processes, and outcomes.
For example, an evaluation of the Pittsburgh Promise program, which
provides scholarships and other college-related needs to graduates of
Pittsburgh Public Schools, would likely include a study of whether the
program increased the educational attainment of city school graduates.
Such an evaluation would implement statistical and econometric
methodologies as part of the study.
While economists would find the results of this evaluation very useful, they
would be interested in knowing whether this program also informs us about
the relationship between educational attainment and outcomes of interest to
economists such as wages, crime, intergenerational outcomes, etc.

Example
In 1973, the Indonesian government decided that it was important to
provide equity across the country's provinces.
Indonesia undertook a massive school building program in which
over 61,000 primary schools were built within the next six years.
The intent of the program was to target new schools in areas where
enrollments were previously low, which was likely due, in part, to the
long distances students had to travel to attend school.
Between 1973 and 1978, the school enrollment rate of 7- to 12-year-old
Indonesians rose from 69 percent to 83 percent.
From the perspective of whether or not the program increased
education levels in Indonesia, it appears to have been successful.

From an economist's viewpoint, this program can be used to ask a
question of great interest, such as: does increased education raise wages?
Duflo (2001) uses the Indonesian school building program to
answer precisely this question.
The idea is that this program is effectively an experiment in that it
raised education levels in some parts of Indonesia but not in others.
In terms of an experiment, children who reside in areas where school
building increased are the treatment group while those in areas where
no new schools were built are the control group.
She is able to study whether the increase in education causes an
increase in the wages of those affected by the program.

In addition, we can also use economic theory to think about how the
program might impact those who were not directly affected by it.
An increase in the supply of educated workers will shift the labor
supply curve and therefore lead to a new, lower equilibrium wage, which
will indirectly affect those born before the school building program.
Duflo (2004) examines the impact of the school building program
on those born before the program took effect in their province.
She finds that the increase in educated workers due to the program
reduces the wages of workers in older age cohorts by 4 to 10 percent.
By thinking through the economic theory for how an increased supply
of workers will affect the economy overall, we can find implications
for how those who do not participate in a program may be affected.

The goal of this course is to impart a basic understanding of


econometric theory in order to be able to interpret the findings from
studies that implement econometric methodologies.
As mentioned earlier, not all studies will use true experiments or
natural experiments to estimate the impact of a program or policy.
As such, we will require a number of assumptions to be maintained in
order for these observational studies to have a causal interpretation.
Therefore, it is very important to understand the theory behind the
methods that we will learn, the assumptions that they require, under
what circumstances these assumptions are violated, and what, if
anything, we can learn when the assumptions are incorrect.
The empirical examples in class and empirical exercises in homework
assignments are aimed to link the theory you learn in class with
applications that illustrate these important issues.

I. Statistical Review
For this course, we will assume that everyone understands basic
probability and statistics.
However, we will spend the first two or three classes reviewing these
concepts for two reasons.
First, we want to be certain that everyone has seen the same topics
presented in a similar manner before moving on to econometrics.
Second, many of the statistical concepts you have previously seen will
be applied and extended in econometrics.
By reviewing these concepts, it will be much easier to see the parallels
between what you already know and how they are applied.

Appendix B Fundamentals of Probability


Section B.1 Random Variables and Their Probability Distributions
A random variable is a variable whose value is determined by the
outcome of an experiment.
A discrete random variable takes on a finite or a countably infinite
number of values.
Examples
Tossing a coin, rolling a pair of dice, drawing a card
A discrete random variable, X, is described by its probability density
function (pdf), denoted $f(x)$, which is a list of all of the values the
random variable can take on and the associated probabilities:
$f(x_j) = P(X = x_j) = p_j, \quad j = 1, 2, \ldots, k,$
where X can be one of the k possible values $x_1, x_2, \ldots, x_k$.
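As a small added illustration (a fair six-sided die, not an example taken from the textbook), the pdf puts equal probability on each of the $k = 6$ possible values:
$f(x_j) = P(X = x_j) = \frac{1}{6}, \quad x_j \in \{1, 2, 3, 4, 5, 6\}, \qquad \sum_{j=1}^{6} f(x_j) = 1.$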


10

A continuous random variable has a sample space that contains an


uncountably infinite number of outcomes.
Examples
Temperature, height, and an amount of time.
However, the probability that a continuous random variable takes on
any particular value exactly is zero.
Thus, for continuous random variables, we work with the cumulative
distribution function (cdf), which is written as
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt,$
where $f(\cdot)$ is the continuous pdf.

11

Section B.2 Joint and Conditional Distributions, and Independence


Let X and Y be discrete random variables.
The joint distribution of X and Y is fully described by their joint
probability density function
$f_{X,Y}(x, y) = P(X = x, Y = y).$
The random variables X and Y are independent if and only if their
joint pdf can be written as
$f_{X,Y}(x, y) = f_X(x) f_Y(y),$
where $f_X(x)$ and $f_Y(y)$ are the marginal pdfs for X and Y,
respectively.
We will not examine the joint pdf of continuous random variables
this semester, which is why it is not discussed here.

12

In economics, we are often interested in the pdf of one random
variable given a particular value of another random variable.
The conditional pdf of Y given X is defined as
$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}.$
Notice that $f_{Y|X}(y \mid x)$ is only defined if $f_X(x) > 0$.
When both random variables are discrete we can write
$f_{Y|X}(y \mid x) = P(Y = y \mid X = x),$
which is read as the probability that Y = y given that X = x.
When X and Y are independent, then knowing the value of X
provides no information about Y, and vice versa, such that
$f_{Y|X}(y \mid x) = f_Y(y)$ and $f_{X|Y}(x \mid y) = f_X(x)$.
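As an added illustration (the numbers are hypothetical), suppose X and Y each take the values 0 and 1 with joint pdf
$f_{X,Y}(0,0) = 0.4, \quad f_{X,Y}(0,1) = 0.2, \quad f_{X,Y}(1,0) = 0.1, \quad f_{X,Y}(1,1) = 0.3.$
Then $f_X(0) = 0.6$, $f_Y(0) = 0.5$, and $f_{Y|X}(0 \mid 0) = 0.4/0.6 = 2/3$.
Since $f_{X,Y}(0,0) = 0.4 \ne f_X(0) f_Y(0) = 0.3$, X and Y are not independent.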

13

Section B.3 Features of Probability Distribution Functions


Expected Value
The expected value, or mean, of a probability distribution
function that takes on k discrete values $x_1, x_2, \ldots, x_k$ is
$E(X) = \sum_{j=1}^{k} x_j f(x_j),$
where $f(x)$ is the pdf of X.
If X is a continuous random variable, then
$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx.$
We write $E(X)$, or sometimes $\mu_X$ or simply $\mu$, and refer to it
as the population mean.
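Continuing the illustrative die example added earlier (not from the textbook):
$E(X) = \sum_{j=1}^{6} x_j \cdot \frac{1}{6} = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5.$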

14

We can also compute the expected value of a function, $g(X)$,
of the random variable X.
If X is a discrete random variable, then the expected value of
the random variable $g(X)$ is given by
$E[g(X)] = \sum_{j=1}^{k} g(x_j) f(x_j).$
If X is a continuous random variable, then the expected value of the
random variable $g(X)$ is
$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx.$

15

Properties of the Expected Value, $E(X)$
1) For any constant c, $E(c) = c$.
2) For any constants a and b, $E(aX + b) = aE(X) + b$.
3) If $a_1, a_2, \ldots, a_n$ are constants and $X_1, X_2, \ldots, X_n$ are
random variables, then
$E(a_1 X_1 + a_2 X_2 + \cdots + a_n X_n) = a_1 E(X_1) + a_2 E(X_2) + \cdots + a_n E(X_n).$
Alternatively, we can write this expression as
$E\!\left(\sum_{i=1}^{n} a_i X_i\right) = \sum_{i=1}^{n} a_i E(X_i).$

16

Variance
The variance measures the dispersion of a pdf.
The variance is the expected value of the (squared) difference between
a value of X and the mean of the distribution:
$Var(X) = E[(X - \mu)^2] = E(X^2) - \mu^2.$
We can apply the formulas for the expected value of a
function of X to compute the variance.
For example, the variance of a discrete random variable is
$Var(X) = \sum_{j=1}^{k} (x_j - \mu)^2 f(x_j).$

17

Properties of $Var(X)$
1) If c is a constant, then $Var(c) = 0$.
2) If a and b are constants, then $Var(aX + b) = a^2 Var(X)$.
One issue with using the variance is that its units are the
square of the units of the random variable.
For example, if the random variable X is measured in feet, then
$Var(X)$ is measured in feet squared.
In some instances it is useful to work with the positive square
root of the variance, which is known as the standard deviation
and is denoted as $sd(X) = \sqrt{Var(X)} = \sigma$.
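Continuing the illustrative die example (added here), with $\mu = 3.5$:
$Var(X) = \sum_{j=1}^{6} (x_j - 3.5)^2 \cdot \frac{1}{6} = \frac{2(2.5^2 + 1.5^2 + 0.5^2)}{6} = \frac{17.5}{6} \approx 2.92, \qquad sd(X) \approx 1.71.$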

18

Section B.4 Features of Joint and Conditional Distributions


Covariance
The covariance is a measure of how much two random
variables move together (co-vary):
$Cov(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - \mu_X \mu_Y.$
Notice that if X tends to be above its mean when Y is above its
mean, then $Cov(X, Y) > 0$.
Similarly, if X tends to be below its mean when Y is above its
mean, or vice versa, then $Cov(X, Y) < 0$.

19

Properties of $Cov(X, Y)$
1) If X and Y are independent, then $Cov(X, Y) = 0$.
When X and Y are independent, $E(XY) = E(X)E(Y)$.
Thus, $Cov(X, Y) = E(XY) - \mu_X \mu_Y = E(X)E(Y) - \mu_X \mu_Y = 0$.
2) If a, b, c, and d are constants, then
$Cov(aX + b, cY + d) = ac\,Cov(X, Y).$
3) $|Cov(X, Y)| \le sd(X)\,sd(Y).$
This last result is known as the Cauchy-Schwarz Inequality.

20

Correlation Coefficient
The correlation coefficient offers an advantage over the
covariance since it is on a rather intuitive scale:
$Corr(X, Y) = \frac{Cov(X, Y)}{sd(X)\,sd(Y)}.$
Notice that $Corr(X, Y)$ will have the same sign as $Cov(X, Y)$.
In addition, $-1 \le Corr(X, Y) \le 1$.
Whereas $Cov(X, Y)$ can take on any real value, $Corr(X, Y)$
allows us to scale the degree to which two variables co-vary.
+1 means X and Y are perfectly positively correlated.
-1 means X and Y are perfectly negatively correlated.

21

Variance of Sums of Random Variables
For constants a and b,
$Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab\,Cov(X, Y).$
Notice that if $a = 1$ and $b = 1$, then
$Var(X + Y) = Var(X) + Var(Y) + 2\,Cov(X, Y),$
while if $a = 1$ and $b = -1$, then
$Var(X - Y) = Var(X) + Var(Y) - 2\,Cov(X, Y).$
Recall that if X and Y are independent, the covariance between
the two variables is zero ($Cov(X, Y) = 0$), so that for two
independent random variables the above formulas reduce to
$Var(X + Y) = Var(X - Y) = Var(X) + Var(Y).$

22

Conditional Expectation
While the covariance and correlation treat the relationship between
X and Y symmetrically, in many instances we will be interested in
explaining a variable in terms of another variable.
For example, we may be interested in knowing whether earnings
depend upon an individual's level of education.
One set of statistics we might compute is the expected amount of
earnings for people conditional on their levels of education.
The conditional expectation of Y given X = x, for a discrete random
variable Y that takes on m different values $y_1, y_2, \ldots, y_m$, is
$E(Y \mid X = x) = \sum_{j=1}^{m} y_j f_{Y|X}(y_j \mid x).$

23

Properties of Conditional Expectation
1) $E[c(X) \mid X] = c(X)$ for any function $c(X)$.
Once we know X, any function of X behaves like a constant.
2) For functions $a(X)$ and $b(X)$,
$E[a(X)Y + b(X) \mid X] = a(X)E(Y \mid X) + b(X).$
3) If X and Y are independent, then $E(Y \mid X) = E(Y)$.
4) $E[E(Y \mid X)] = E(Y).$
This property is known as the law of iterated expectations.
We can first compute $E(Y \mid X = x)$ for each possible value x of X.
We then take the expected value of $E(Y \mid X)$ over the
distribution of X, which yields $E(Y)$.
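As an added illustration with hypothetical numbers: suppose 40 percent of workers have a college degree with expected earnings of 60 (thousand dollars) and 60 percent do not, with expected earnings of 40. Then the law of iterated expectations gives
$E(\text{earnings}) = E[E(\text{earnings} \mid \text{educ})] = 0.4(60) + 0.6(40) = 48.$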

24

5) $E[E(Y \mid X, Z) \mid X] = E(Y \mid X).$
This property is a more general version of the law of iterated
expectations.
6) If $E(Y \mid X) = E(Y)$, then $Cov(X, Y) = 0$ (and $Corr(X, Y) = 0$).
Moreover, every function of X is uncorrelated with Y.
Note that the converse of this last property is not true; if
$Cov(X, Y) = 0$, then it is possible that $E(Y \mid X)$ depends on X.
Combining these last two properties, notice that if U and X
are random variables where $E(U \mid X) = 0$, then
i. $E(U) = E[E(U \mid X)] = 0$
ii. $Cov(X, U) = 0$, i.e. U and X are uncorrelated, since every
function of X is uncorrelated with U.

25

Section B.5 The Normal and Related Distributions


The Normal Distribution
A random variable X is said to be normally distributed with
mean $\mu$ and variance $\sigma^2$ if its pdf is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right), \quad -\infty < x < \infty.$
The symbol $X \sim N(\mu, \sigma^2)$ is used to denote the fact that X is
normally distributed with mean $\mu$ and variance $\sigma^2$.
A random variable Z follows the standard normal distribution
when $\mu = 0$ and $\sigma^2 = 1$, such that $Z \sim N(0, 1)$, and has pdf
$\phi(z) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{z^2}{2}\right).$

26

Properties of the Normal Distribution
1) If $X \sim N(\mu, \sigma^2)$, then $Z = \dfrac{X - \mu}{\sigma} \sim N(0, 1)$.
Applying properties of the expected value, notice that
$E(Z) = \frac{1}{\sigma}\,[E(X) - \mu] = \frac{1}{\sigma}(\mu - \mu) = 0,$
and applying properties of the variance we see that
$Var(Z) = \frac{1}{\sigma^2}\,Var(X) = \frac{\sigma^2}{\sigma^2} = 1.$
2) If $X \sim N(\mu, \sigma^2)$, then $aX + b \sim N(a\mu + b, a^2\sigma^2)$.
3) If X and Y are jointly normally distributed, then they are
independent if, and only if, $Cov(X, Y) = 0$.

27

4) Any linear combination of independent, identically distributed
normal random variables has a normal distribution.
This last property has implications for the average of
independent, identically distributed normal random variables.
If $Y_1, Y_2, \ldots, Y_n$ are independent random variables, each of
which is distributed $N(\mu, \sigma^2)$, then the average of the random
variables, $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$, is normally distributed.
Furthermore,
$\bar{Y} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right).$

28

If $Y_1, Y_2, \ldots, Y_n$ are independent random variables, each of
which is distributed $N(\mu, \sigma^2)$, then
$E(\bar{Y}) = \mu \quad \text{and} \quad Var(\bar{Y}) = \frac{1}{n^2}\sum_{i=1}^{n} Var(Y_i) = \frac{\sigma^2}{n}.$
Furthermore, recall that the first property of the normal
distribution is that if $X \sim N(\mu, \sigma^2)$, then $\frac{X - \mu}{\sigma} \sim N(0, 1)$.
Therefore, if $Y_1, Y_2, \ldots, Y_n$ are independent random variables,
each of which is distributed $N(\mu, \sigma^2)$, then
$\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$
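As an added worked example with hypothetical numbers: if each $Y_i \sim N(65, 6.25)$ and $n = 9$, then $\bar{Y} \sim N(65, 6.25/9)$ and
$P(\bar{Y} > 66) = P\!\left(\frac{\bar{Y} - 65}{2.5/3} > \frac{66 - 65}{2.5/3}\right) = P(Z > 1.2) = 1 - \Phi(1.2) \approx 0.115.$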

29

The Chi-Square Distribution
Let $X = \sum_{i=1}^{n} Z_i^2$, where $Z_1, Z_2, \ldots, Z_n$ are independent standard
normal random variables.
Then X follows the chi-square distribution with n degrees of
freedom (which is a special case of the gamma distribution),
which we write as $X \sim \chi^2_n$.
Degrees of freedom generally refers to the number of
independent pieces of information used to create a random variable.
We will use the abbreviation d.f. to refer to degrees of freedom.
If a random variable X is distributed $\chi^2_n$, then it has an
expected value of n and a variance of 2n.

30

The following figure plots the chi-square distribution for three


different values of the degrees of freedom: 3, 5, and 7.

[Figure: chi-square probability density functions with 3, 5, and 7 degrees of freedom.]

31

The F Distribution
Suppose U and V are independent chi-square random variables
with n and m degrees of freedom, respectively.
A random variable of the form
$F = \frac{V/m}{U/n}$
is said to have an F distribution with m and n degrees of freedom.
We will use the notation $F \sim F_{m,n}$
to denote an F random variable with m and n degrees of freedom.

32

Since the $F_{m,n}$ random variable is the ratio of two independent
chi-square random variables, it will always be non-negative
and its distribution will depend on the values of both m and n.
[Figure: F distribution probability density functions for (m, n) = (5, 10), (10, 10), and (20, 10).]

33

The t Distribution
Let Z be a standard normal random variable.
Let U be a chi-square random variable, independent of Z, which
has n degrees of freedom.
The Student's t ratio with n degrees of freedom is denoted
$t_n = \frac{Z}{\sqrt{U/n}}.$
The t distribution with n degrees of freedom has an expected
value of zero and a variance of $\frac{n}{n-2}$ (for $n > 2$).

34

The standard normal distribution and the t distribution have
similar shapes.
[Figure: probability density functions of the Z ratio and the t ratio with 4 and 10 degrees of freedom.]
Both have an expected value of zero, and the variance of the
t distribution, $\frac{n}{n-2}$, converges to 1 as $n \to \infty$.

35

Note that a useful statistical relationship exists between the F
distribution and the t distribution by recalling that the t ratio
$t_n = \frac{Z}{\sqrt{U/n}}$
is comprised of a standard normal random variable, Z, and a
chi-square random variable, U, with n degrees of freedom.
Squaring both sides of the above expression yields
$t_n^2 = \frac{Z^2}{U/n}.$
Since $Z^2 \sim \chi^2_1$ by the definition of a chi-square random variable
and U is a chi-square random variable, we see that
$t_n^2 = \frac{Z^2/1}{U/n} \sim F_{1,n}.$

36

Appendix C Fundamentals of Mathematical Statistics


Section C.1 Populations, Parameters, and Random Sampling
In this course we will spend a great deal of time focusing on
statistical inference, which is the use of statistics to draw
inferences about a population parameter.
A population is a well-defined group such as a group of
individuals, a group of firms, etc.
Statistical inference is usually split into two categories:
1) Estimation - We can estimate a population parameter such as
the average household income in the state of Michigan.
2) Hypothesis Testing - We can test a hypothesis such as the
average household income in Michigan equals $50,000.

37

Sampling
In many instances, we will be interested in knowing the value
of one or more population parameters.
For example, if we want to know about the degree of income
inequality in society, we would be curious to know about the
expected value and variance of the population income
distribution.
If we have a Census, then we would be able to learn the true
characteristics of the income distribution.
However, interviewing everyone in the population is a very
costly exercise in terms of both time and money.

38

Random Sampling
Instead, we will observe a sample of the population and use
the sample to generate our best guess as to what the true
characteristics of the population distribution actually are.
Suppose that Y is a random variable with a probability density
function $f(y; \theta)$, where $\theta$ is an unknown parameter.
A random sample from $f(y; \theta)$ is n observations, $Y_1, Y_2, \ldots, Y_n$,
that are drawn independently from the pdf $f(y; \theta)$.
We sometimes refer to the random sample $Y_1, Y_2, \ldots, Y_n$ as
independent, identically distributed (i.i.d.) random variables.

39

Section C.2 Finite Sample Properties of Estimators


We now turn to estimators of population parameters and note that
there are two types of properties of these estimators.
The first set of properties is finite sample properties which are
sometimes referred to as small sample properties.
The latter title is somewhat misleading since it refers to samples of
any size whether the number of observations is small or large.
The second set of properties is asymptotic properties which refer to
the behavior of estimators as the sample size approaches infinity.

40

Estimators and Estimates


Any function of a random sample whose objective is to
approximate a parameter is called an estimator.
Example
Suppose that $Y_1, Y_2, \ldots, Y_n$ is a random sample from a
population with a mean of $\mu$.
The sample average, $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$, is an estimator of the
unknown population mean, $\mu$.
After we collect the actual data, $y_1, y_2, \ldots, y_n$, and we
compute the estimator by using the values that we measure in
the sample, the resulting value is known as an estimate.

41

More generally, the estimator W of the parameter $\theta$ is a
function of the random variables $Y_1, Y_2, \ldots, Y_n$, where
$W = h(Y_1, Y_2, \ldots, Y_n).$
For any random sample of size n that we draw, we can
compute an estimate of $\theta$ by using W.
Notice that since W depends on the random variables
$Y_1, Y_2, \ldots, Y_n$, W is itself a random variable.
Thus, the estimator W has a distribution, which is known as a
sampling distribution.
For example, as with all estimators, the sample average has a
sampling distribution.

42

Unbiasedness
An estimator W of $\theta$ is said to be unbiased if
$E(W) = \theta.$
That is, if the expected value of the estimator equals the
parameter being estimated, then the estimator is unbiased.
Example
The sample average, $\bar{Y}$, is an unbiased
estimator of the unknown population mean, $\mu$, since
$E(\bar{Y}) = E\!\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(Y_i) = \frac{1}{n}(n\mu) = \mu.$

43

We define the bias of an estimator as
$Bias(W) = E(W) - \theta.$
Example
For the sample average, $\bar{Y}$, we have already
seen that $E(\bar{Y}) = \mu$.
Therefore, we can compute the bias of $\bar{Y}$:
$Bias(\bar{Y}) = E(\bar{Y}) - \mu = \mu - \mu = 0.$
The bias of the sample average is 0, which
means, as we have already seen, that $\bar{Y}$ is unbiased.

44

An example of a biased estimator of the population mean is
$W = \frac{1}{n-1}\sum_{i=1}^{n} Y_i.$
This alternative estimator is biased since
$E(W) = E\!\left(\frac{1}{n-1}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n-1}\sum_{i=1}^{n} E(Y_i) = \frac{n}{n-1}\,\mu \ne \mu.$

45

The Sampling Variance of an Estimator


While unbiasedness is an important property of an estimator, it
is not the only criterion to consider when selecting an estimator.
Even if an estimator is unbiased, we know that in a given
random sample of observations, our estimate of the parameter
may be quite different from the true value of the parameter.
Intuitively, we would like the estimator we select, , to have a
high probability of being close to the true parameter, .
Therefore, the sampling variance of the estimator, which is
associated with the sampling distribution of the estimator, is
an important factor to consider.

46

Example
We can compute the sampling variance of the sample
average, $\bar{Y}$:
$Var(\bar{Y}) = Var\!\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} Var(Y_i) = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}.$
Notice that the sampling variance of $\bar{Y}$ gets smaller as the
sample size, n, gets larger.

47

Efficiency
It is possible that we may encounter multiple estimators for
the same parameter.
Example
We have already seen that the sample average is an
unbiased estimator of the population mean, $\mu$.
An alternative estimator is to only use the first
observation of the random sample, $Y_1$, as an estimator for
the population mean, $\mu$.
Notice that this alternative estimator is also an unbiased
estimator since $E(Y_1) = \mu$.

48

If we have multiple estimators for the same parameter, it is


helpful to have a method for choosing between them.
As discussed above, estimators have a sampling distribution
which means that they will yield a different estimate each time
that a new random sample is collected.
However, some estimators are more likely to be closer to the
population parameter than other estimators.
Intuitively, when comparing estimators, the estimator with the
smaller sampling variance is more likely to be closer to $\theta$.
If $W_1$ and $W_2$ are both unbiased estimators for a parameter $\theta$,
then we say that $W_1$ is more efficient than $W_2$ if
$Var(W_1) < Var(W_2).$

49

The following graph compares the sampling distributions of two
estimators, where the estimator with the smaller sampling
variance is shown with the solid red line.
[Figure: sampling distributions of two unbiased estimators centered at $\theta$; the distribution with the smaller variance is more concentrated around $\theta$.]
Notice that for a given interval around $\theta$, the estimator
with the smaller variance has more probability in this range.

50

Example
As we have seen, the sample average, $\bar{Y}$, is an
unbiased estimator for $\mu$ and has a sampling variance of
$Var(\bar{Y}) = \frac{\sigma^2}{n}.$
The alternative estimator using only the first observation
of the random sample, $Y_1$, is also an unbiased estimator for $\mu$.
The variance of the alternative estimator is $Var(Y_1) = \sigma^2$.
Therefore, $\bar{Y}$ is more efficient than $Y_1$ since
$Var(\bar{Y}) = \frac{\sigma^2}{n} < \sigma^2 = Var(Y_1) \quad \text{for } n > 1.$

51

Section C.3 Asymptotic or Large Sample Properties of Estimators


Another useful set of properties of estimators is
asymptotic, or large sample, properties of the estimators.
One useful reason for investigating the asymptotic properties
of estimators is that we can examine the performance of an
estimator as the sample size grows which gives us another
way to choose between estimators.
Another useful reason for examining asymptotic properties is
that determining the sampling distribution in finite samples is
rather difficult for some estimators.
However, in many cases, it is easier to determine the
asymptotic sampling distribution and to use it as an
approximation in order to draw inferences.

52

Consistency
One useful property for an estimator is that as the sample
grows infinitely large, the estimator gets arbitrarily close to the true parameter.
Formally, if $W_n$ is an estimator of $\theta$ with a sample size n, then
$W_n$ is a consistent estimator of $\theta$ if, for every $\varepsilon > 0$,
$\lim_{n \to \infty} P(|W_n - \theta| \le \varepsilon) = 1.$
If $W_n$ is not consistent for $\theta$, then we say it is inconsistent.
In addition, if $W_n$ is consistent, then we say that $\theta$ is the
probability limit of $W_n$, which is written as $\mathrm{plim}(W_n) = \theta$.

53

A useful illustration of consistency is the sample average, $\bar{Y}_n$,
from a population with mean $\mu$ and variance $\sigma^2$.
We have already seen that $\bar{Y}_n$ is unbiased for $\mu$ and, in addition,
we saw that $Var(\bar{Y}_n) = \sigma^2/n$.
Notice that as $n \to \infty$, $Var(\bar{Y}_n) \to 0$.
Therefore, $\bar{Y}_n$ is a consistent estimator of $\mu$.
Thus, if $Y_1, Y_2, \ldots, Y_n$ are independent and identically
distributed random variables with mean $\mu$, then
$\mathrm{plim}(\bar{Y}_n) = \mu,$
which is known as the law of large numbers.
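A quick simulation can illustrate this point; the sketch below is added here (it assumes a Stata session with no data in memory, and the population values are hypothetical).

* Illustrative sketch: the sample average settles down near the
* population mean (here 63) as the number of observations grows.
clear
set seed 12345
set obs 100000
generate y = rnormal(63, 2.5)
summarize y in 1/10          // sample average with n = 10
summarize y in 1/100000      // sample average with n = 100,000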

54

A (biased) alternative estimator for the population mean is
$W_n = \frac{1}{n-1}\sum_{i=1}^{n} Y_i.$
We have seen that $E(W_n) = \frac{n}{n-1}\,\mu$, so that
$Bias(W_n) = \frac{n}{n-1}\,\mu - \mu = \frac{1}{n-1}\,\mu.$
Notice that as $n \to \infty$, $Bias(W_n) \to 0$.
In addition, we can show that
$Var(W_n) = \frac{1}{(n-1)^2}\,(n\sigma^2) = \frac{n}{(n-1)^2}\,\sigma^2.$
Notice that as $n \to \infty$, $Var(W_n) \to 0$.
It can be shown that $\mathrm{plim}(W_n) = \mu$, so that $W_n$
is a consistent (but biased) estimator of $\mu$.

55

Asymptotic Normality
In order to draw inferences, we need to know not only the
estimator, but we also need to know information about the
sampling distribution of the estimator.
Many econometric estimators are approximated by the normal
distribution as the sample size gets large.
Let $\{Z_n : n = 1, 2, \ldots\}$ be a sequence of random variables such
that for all numbers z,
$P(Z_n \le z) \to \Phi(z) \quad \text{as } n \to \infty,$
where $\Phi(z)$ is the standard normal cdf.
Then, $Z_n$ is said to have an asymptotic standard normal
distribution, which we write as $Z_n \stackrel{a}{\sim} N(0, 1)$, where the a
stands for either asymptotically or approximately.

56

Recall from before that if $Y_1, Y_2, \ldots, Y_n$ are independent random
variables, each of which is distributed $N(\mu, \sigma^2)$, then
$\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$
A very powerful result, known as the Central Limit Theorem,
applies to a random sample $Y_1, Y_2, \ldots, Y_n$ drawn from any
distribution with mean $\mu$ and variance $\sigma^2$.
As the sample size gets infinitely large, the following ratio
involving the sample average,
$Z_n = \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}},$
has an asymptotic standard normal distribution.
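The sketch below, added here with a hypothetical setup (a Stata session with no data in memory), illustrates the theorem: standardized means of 30 draws from a skewed chi-square(1) distribution behave approximately like N(0,1) draws.

* Illustrative Monte Carlo sketch of the Central Limit Theorem.
clear
set seed 2023
set obs 2000
generate zbar = 0
forvalues i = 1/30 {
    quietly generate u`i' = rchi2(1)
    quietly replace zbar = zbar + u`i'
}
replace zbar = (zbar/30 - 1)/sqrt(2/30)   // chi2(1) has mean 1, variance 2
summarize zbar                            // mean near 0, std. dev. near 1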

57

Section C.5 Interval Estimation and Confidence Intervals


While estimation of a population parameter generally yields a
single number as an estimate, that overlooks the fact that there
is uncertainty about the true parameter.
Example
The sample average yields a point estimate, $\bar{y}$, of the true
population average, $\mu$.
However, simply reporting this point estimate ignores the
fact that $\bar{Y}$ has a sampling distribution.
Instead, we can generate an interval estimate, which
is a range that is likely to contain the true parameter.

58

Example
Suppose that $Y_1, Y_2, \ldots, Y_n$ are independent random
variables, each of which is distributed $N(\mu, \sigma^2)$.
We have already seen that
$\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).$
As we have seen, the sample average, $\bar{Y}$, is an unbiased
point estimate for the population mean, $\mu$.
How can we use this information to create an interval
estimate for the true population mean, $\mu$?

59

Since $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$, we can create an interval that has a 95% probability of
containing the population mean, $\mu$.
We call such an interval a 95% confidence interval.
In general, we can create a $100(1 - \alpha)$% confidence
interval by choosing a level of significance, $\alpha$.
The smaller the value of $\alpha$ that we choose, the higher our
level of confidence.
However, to increase our confidence, we will need a
larger interval.

60

We form a 95% confidence interval from
$P(-z_{0.025} \le Z \le z_{0.025}) = 0.95,$
where we have chosen $(-z_{0.025}, z_{0.025})$ such that there is a
5% chance that Z falls outside of the confidence interval.
Thought of alternatively, there is a 2.5% chance of Z
being in either tail of the distribution.
[Figure: standard normal pdf with the critical values $-z_{0.025}$ and $z_{0.025}$ marking the 2.5% tails.]

61

[Figure: standard normal pdf with the area between $-z_{0.025}$ and $z_{0.025}$ equal to 0.95.]
For example, using textbook Appendix Table G.1, we see
that the probability that a standard normal random
variable falls between -1.96 and +1.96 is 0.95, or
$P(-1.96 \le Z \le 1.96) = 0.95.$

62

The cdf for the standard normal distribution shown below
is similar to Appendix Table G.1 for $-3.1 < Z \le -1.8$.

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
-3.0   0.0013  0.0013  0.0013  0.0012  0.0012  0.0011  0.0011  0.0011  0.0010  0.0010
-2.9   0.0019  0.0018  0.0018  0.0017  0.0016  0.0016  0.0015  0.0015  0.0014  0.0014
-2.8   0.0026  0.0025  0.0024  0.0023  0.0023  0.0022  0.0021  0.0021  0.0020  0.0019
-2.7   0.0035  0.0034  0.0033  0.0032  0.0031  0.0030  0.0029  0.0028  0.0027  0.0026
-2.6   0.0047  0.0045  0.0044  0.0043  0.0041  0.0040  0.0039  0.0038  0.0037  0.0036
-2.5   0.0062  0.0060  0.0059  0.0057  0.0055  0.0054  0.0052  0.0051  0.0049  0.0048
-2.4   0.0082  0.0080  0.0078  0.0075  0.0073  0.0071  0.0069  0.0068  0.0066  0.0064
-2.3   0.0107  0.0104  0.0102  0.0099  0.0096  0.0094  0.0091  0.0089  0.0087  0.0084
-2.2   0.0139  0.0136  0.0132  0.0129  0.0125  0.0122  0.0119  0.0116  0.0113  0.0110
-2.1   0.0179  0.0174  0.0170  0.0166  0.0162  0.0158  0.0154  0.0150  0.0146  0.0143
-2.0   0.0228  0.0222  0.0217  0.0212  0.0207  0.0202  0.0197  0.0192  0.0188  0.0183
-1.9   0.0287  0.0281  0.0274  0.0268  0.0262  0.0256  0.0250  0.0244  0.0239  0.0233
-1.8   0.0359  0.0351  0.0344  0.0336  0.0329  0.0322  0.0314  0.0307  0.0301  0.0294

The value of Z is determined by combining the row, which gives the
integer and tenths digits, with the column, which gives the hundredths digit.
As the table shows, $P(Z \le -1.96) = 0.025$.

63

The cdf for the standard normal distribution shown below
is similar to Appendix Table G.1 for $1.8 \le Z < 3.1$.

Z      0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
1.8    0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706
1.9    0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9761  0.9767
2.0    0.9772  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817
2.1    0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9857
2.2    0.9861  0.9864  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890
2.3    0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9913  0.9916
2.4    0.9918  0.9920  0.9922  0.9925  0.9927  0.9929  0.9931  0.9932  0.9934  0.9936
2.5    0.9938  0.9940  0.9941  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952
2.6    0.9953  0.9955  0.9956  0.9957  0.9959  0.9960  0.9961  0.9962  0.9963  0.9964
2.7    0.9965  0.9966  0.9967  0.9968  0.9969  0.9970  0.9971  0.9972  0.9973  0.9974
2.8    0.9974  0.9975  0.9976  0.9977  0.9977  0.9978  0.9979  0.9979  0.9980  0.9981
2.9    0.9981  0.9982  0.9982  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986
3.0    0.9987  0.9987  0.9987  0.9988  0.9988  0.9989  0.9989  0.9989  0.9990  0.9990

As the table shows, $P(Z \le 1.96) = 0.975$.
Therefore,
$P(-1.96 \le Z \le 1.96) = P(Z \le 1.96) - P(Z \le -1.96) = 0.975 - 0.025 = 0.95.$
64

Since $P(-1.96 \le Z \le 1.96) = 0.95$ and $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$
is a standard normal random variable, the
probability is 0.95 that it falls between -1.96 and +1.96, or
$P\!\left(-1.96 \le \frac{\bar{Y} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95.$
We can then re-write the expression inside of $P(\cdot)$ to
find the 95% confidence interval for $\mu$:
$P\!\left(-1.96\,\frac{\sigma}{\sqrt{n}} \le \bar{Y} - \mu \le 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95$
$P\!\left(\bar{Y} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{Y} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) = 0.95.$

65

Example
The height of white females who registered to vote in
Allegheny County, PA during the 1960s is normally
distributed with a variance of 6.25 (heights measured in inches).
If a random sample of 9 women is selected and the
sample average height is $\bar{y} = 65.5$, construct a 95%
confidence interval for the true average height, $\mu$.
Noting that $\sigma = \sqrt{6.25} = 2.5$, the 95%
confidence interval for $\mu$ is
$65.5 - 1.96\,\frac{2.5}{\sqrt{9}} \le \mu \le 65.5 + 1.96\,\frac{2.5}{\sqrt{9}}$
$63.87 \le \mu \le 67.13.$
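The same interval can be checked directly in Stata; the sketch below is added here and simply plugs in the numbers from the example (sigma = 2.5, n = 9, ybar = 65.5).

* Added sketch: 95% CI with known sigma
display "z_.025 = " invnormal(0.975)
display "lower  = " 65.5 - 1.96*2.5/sqrt(9)
display "upper  = " 65.5 + 1.96*2.5/sqrt(9)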

66

Confidence Intervals for the Mean from a Normally Distributed


Population
Assuming that we have a random sample from a normally
distributed population and that we know the variance of the
distribution, we can use the approach on the preceding slides
to construct a confidence interval for $\mu$.
In situations in which we have much prior experience with the
items being sampled, such as a manufacturer with detailed
knowledge of the weight of its product,
we may know the variance of the distribution.
However, in many instances we will not know the variance of
the distribution so that we cannot use the above methods.

67

In order to construct a confidence interval for a random


sample that is drawn from a normal distribution but with
unknown variance, we must first estimate the variance.
The sample variance, $S^2$, for a sample of size n is computed as
$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2.$
It can be shown that $S^2$ is an unbiased estimator of the true
population variance, $\sigma^2$.
If the random sample is drawn from a normal distribution,
then the ratio $\frac{(n-1)S^2}{\sigma^2}$ follows the chi-square distribution with
$n - 1$ degrees of freedom.

68

It can also be shown that if $Y_1, Y_2, \ldots, Y_n$ is a random sample
from the normal distribution with mean $\mu$ and variance $\sigma^2$,
then the t ratio
$t = \frac{\bar{Y} - \mu}{S/\sqrt{n}}$
has a Student's t distribution with $n - 1$ degrees of freedom,
where S is the square root of the sample variance, $S^2$.
We can create a $100(1 - \alpha)$% confidence interval for $\mu$
using an approach similar to the method used earlier for Z.
Thus, for the 95% confidence interval, we must find the
appropriate values such that
$P\!\left(-t_{0.025,\,n-1} \le \frac{\bar{Y} - \mu}{S/\sqrt{n}} \le t_{0.025,\,n-1}\right) = 0.95.$

69

Appendix Table G.2 in the textbook contains a number of
useful values for the Student's t distribution.
The table gives values $t_{\alpha,\,df}$ such that $P(T > t_{\alpha,\,df}) = \alpha$, where
the value of $\alpha$ is shown across the top row and each row of the
table corresponds to a different degrees of freedom.

df      0.10     0.05     0.025     0.01      0.005
1       3.078    6.314    12.706    31.821    63.657
2       1.886    2.920    4.303     6.965     9.925
3       1.638    2.353    3.182     4.541     5.841
4       1.533    2.132    2.776     3.747     4.604
5       1.476    2.015    2.571     3.365     4.032
6       1.440    1.943    2.447     3.143     3.707
...
30      1.310    1.697    2.042     2.457     2.750
inf     1.282    1.645    1.960     2.326     2.576

70

For a sample of size n, we must find the corresponding value
from the t distribution table with $n - 1$ degrees of freedom.
Using that value, $t_{0.025,\,n-1}$, we can construct the 95%
confidence interval in a manner similar to what we did above:
$P\!\left(-t_{0.025,\,n-1} \le \frac{\bar{Y} - \mu}{S/\sqrt{n}} \le t_{0.025,\,n-1}\right) = 0.95$
$P\!\left(\bar{Y} - t_{0.025,\,n-1}\,\frac{S}{\sqrt{n}} \le \mu \le \bar{Y} + t_{0.025,\,n-1}\,\frac{S}{\sqrt{n}}\right) = 0.95.$

71

Example
Returning to the height example of white females who
registered to vote in Allegheny County, PA during the
1960s, where height is normally distributed.
Suppose that for the random sample of $n = 9$ women,
we compute the sample average height $\bar{y} = 65.5$.
However, we do not know the variance of height in the
population, $\sigma^2$, but are able to compute the sample
variance of height, $s^2 = 8.5$.
Construct a 95% confidence interval for the true average
height, $\mu$, among white female registered voters.

72

First, notice that with $n - 1 = 8$ d.f., $t_{0.025,\,8} = 2.306$.

df      0.10     0.05     0.025    0.01     0.005
8       1.397    1.860    2.306    2.896    3.355

Inserting the appropriate values into the expression below
yields the 95% confidence interval for $\mu$:
$65.5 - 2.306\,\sqrt{\frac{8.5}{9}} \le \mu \le 65.5 + 2.306\,\sqrt{\frac{8.5}{9}}$
$63.26 \le \mu \le 67.74.$
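The interval can also be computed in Stata using the exact t critical value; the sketch below is added here and uses the example's numbers (n = 9, ybar = 65.5, s^2 = 8.5).

* Added sketch: 95% CI with unknown sigma
display "t_.025,8 = " invttail(8, 0.025)
display "lower    = " 65.5 - invttail(8, 0.025)*sqrt(8.5/9)
display "upper    = " 65.5 + invttail(8, 0.025)*sqrt(8.5/9)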

73

Asymptotic Confidence Intervals for Nonnormal Populations


Although the preceding two methods work when the
population is drawn from a normal distribution, what should
we do if the data come from another distribution?
If we are unable to compute the sampling distribution of $\bar{Y}$,
then one alternative is to rely on the Central Limit Theorem.
By this theorem, as the sample gets very large, the
sampling distribution of $\frac{\bar{Y} - \mu}{\sigma/\sqrt{n}}$ approaches the standard normal.
In cases where $\sigma$ is unknown, the sampling distribution of
$\frac{\bar{Y} - \mu}{S/\sqrt{n}}$ also approaches the standard normal as $n \to \infty$.
Thus, we can rely on the Central Limit Theorem to construct
confidence intervals for $\mu$ in large samples.

74

Section C.6 Hypothesis Testing


Suppose that we have limited information about a population
parameter.
We may develop an idea or a hypothesis about the true
parameter value.
If we are able to randomly sample from our population, we
could then test our hypothesis.
When testing our hypothesis, we call our hypothesis the null
hypothesis ( ).

75

Example
In our height example, we may formulate a null
hypothesis that the true mean height of white female
registered voters in Allegheny County is 63 inches.
We would write
$H_0: \mu = 63.$
We test the null hypothesis against an alternative
hypothesis ($H_1$), of which there are multiple options.
$H_1: \mu > 63$ and $H_1: \mu < 63$ are called one-sided alternative hypotheses.
$H_1: \mu \ne 63$ is a two-sided alternative hypothesis.

76

To test the null hypothesis against one of these alternative


hypotheses, we need to develop a decision rule.
Once we have done so, we then use this decision rule to decide
if we will reject our null hypothesis or if we fail to reject it.
There are two approaches to forming our decision rules for
hypothesis testing:
1) using confidence intervals
2) using the test of significance

77

We will now illustrate both of these approaches using our


height example where the population variance is unknown.
In this example, $n = 9$, $\bar{y} = 65.5$, and $s = 2.92$.
We will test
$H_0: \mu = 63$
with the alternative hypothesis being
$H_1: \mu \ne 63.$
Furthermore, we will choose our level of significance to be $\alpha = 0.05$,
which corresponds to a 95%, or $100(1 - \alpha)$%, confidence
interval.

78

Using confidence intervals in hypothesis testing


We first need to decide how we form our confidence interval
for our hypothesis test.
We previously showed that we form our confidence interval
using the t distribution:
$P\!\left(\bar{Y} - t_{0.025,\,n-1}\,\frac{S}{\sqrt{n}} \le \mu \le \bar{Y} + t_{0.025,\,n-1}\,\frac{S}{\sqrt{n}}\right) = 0.95.$
We will next form our decision rule.
Decision rule: Reject $H_0$ if our hypothesized mean, $\mu = 63$, is
not within the 95% confidence interval.

79

We are now ready to test our hypothesis.


We previously found for our example that the 95% confidence
interval for $\mu$ is
$63.26 \le \mu \le 67.74.$

The 95% confidence interval does not contain our null


hypothesis value of 63.
We therefore reject our null hypothesis at the 95% level of
confidence.
In hypothesis testing, the confidence interval is also called the
acceptance region.
The area outside the region is the critical region and the limits
of the regions are the critical values.

80

Using tests of significance in hypothesis testing


The test statistic that we will use for our problem is from the
t distribution (called the t ratio):
$t = \frac{\bar{Y} - \mu}{S/\sqrt{n}}.$
From our sample we can compute all of the values found in
the t ratio except for $\mu$.
However, once we have formed a null hypothesis, we
substitute the hypothesized value for $\mu$.
We will then be asking the question: "If our null hypothesis is
true, what is the probability that our sample estimate of the
mean is that far from the true value?"

81

We next examine the t distribution tables for the value that
corresponds to $\alpha = 0.05$ for the sample degrees of freedom.
As we previously saw, the corresponding critical t value is
$t_{0.025,\,8} = 2.306$.
We are now able to form our decision rule.
Decision rule: Reject $H_0$ if $|t| > 2.306$.
$t = \frac{65.5 - 63}{2.92/\sqrt{9}} = 2.57$
Since our t ratio is greater than our critical t value, we reject
our null hypothesis.
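The same calculation can be checked in Stata; the sketch below is added here and uses the example's values (ybar = 65.5, hypothesized mu = 63, s = 2.92, n = 9).

* Added sketch: test-of-significance calculation
display "t ratio    = " (65.5 - 63)/(2.92/sqrt(9))
display "critical t = " invttail(8, 0.025)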

82

What if we have a one-tailed hypothesis instead?


The alternative hypothesis could have been $H_1: \mu > 63$.
Even though we will still use a 95% level of confidence, how
will our critical value change?
[Figure: t distribution pdf with the critical value $t_{0.05}$ marking a single 5% right-hand tail.]
Thus, we want to have 5% probability in the right-hand tail
rather than 2.5% in each tail.

83

Examining the t distribution table with 8 d.f.,

df      0.10     0.05     0.025    0.01     0.005
8       1.397    1.860    2.306    2.896    3.355

we see that the critical value is $t_{0.05,\,8} = 1.860$.
The decision rule is to reject $H_0$ if $t > 1.860$.

We form the t ratio as before and we previously found the t


ratio to be 2.57.
The difference is now the critical value we use in our decision
rule which is 1.860 as opposed to 2.306.
Since the t ratio is still greater than our critical value, we reject
the null hypothesis.

84

Type I and Type II errors


Can our null hypothesis still be true even if we rejected it?
We only have a 95% confidence interval, not 100%.
It could be that we have made a Type I error, which is where we
reject the null hypothesis when it is really true.
The probability of making a Type I error is known as the level
of significance, $\alpha$, which is the same $\alpha$ we use to choose the
size of our confidence interval.
We also could make a Type II error, which is where we accept
the null hypothesis when it is in fact false.
We designate $\beta$ to be the probability of making a Type II error.
$1 - \beta$ is known as the power of the test.

85

How can we think about Type I and Type II errors?


In a trial, the null hypothesis is that the defendant is not guilty
while the alternative hypothesis is that the defendant is guilty.
A Type I error is to reject the null hypothesis when it is true.
In a trial, that would mean finding a defendant guilty when she
is really not guilty.
A Type II error is to accept the null when it is false.
In a trial, that would mean finding a defendant not guilty when
she is really guilty.
Most jurists prefer to reduce the number of not guilty persons
who are sent to prison (i.e. reduce the number of Type I errors).

86

P-values
Our hypothesis testing proceeds by finding a critical value
and then testing either whether the hypothesized mean lies within
the confidence interval or whether the t statistic exceeds a threshold.
Another approach is to ask the question: "How likely is it that
we would observe the sample mean, $\bar{y}$, that we find in our
sample if the population mean is really $\mu$?"
In our height example, we would ask how likely it is that we
would observe $\bar{y} = 65.5$ in our sample if the true population
mean is $\mu = 63$ (the null hypothesis).
We proceed as before when we were using the test of
significance approach:
$t = \frac{65.5 - 63}{2.92/\sqrt{9}} = 2.57.$

87

Using the t distribution table and where $t = 2.57$,

df      0.10     0.05     0.025    0.01     0.005
8       1.397    1.860    2.306    2.896    3.355

we see that the estimated t ratio of 2.57 falls between 2.306,
which corresponds to a probability of p = 0.05 (in both tails),
and 2.896, which corresponds to a probability of p = 0.02 (in both tails).
Thus, the probability that we would observe $\bar{y} = 65.5$ under
the null hypothesis that $\mu = 63$ is between 0.02 and 0.05.
We call this probability the p-value.
Why concern ourselves with p-values?
Instead of arbitrarily choosing a level of significance as we did
before, we are now reporting the probability that we would
find a sample mean at least this far from the hypothesized value
if the null hypothesis is in fact true.
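The exact p-value can be computed in Stata; the sketch below is added here and assumes t = 2.57 with 8 degrees of freedom, as in the example.

* Added sketch: exact two-sided p-value
display "p-value = " 2*ttail(8, 2.57)    // roughly 0.03, between 0.02 and 0.05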

88

Testing the Equality of Two Population Means


Suppose that individuals can belong to one of two populations
and we are interested in knowing if the means of the two
populations are the same.
Observations from both populations are normally distributed,
with $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$.
The null hypothesis is $H_0: \mu_X = \mu_Y$,
which we can also write as $H_0: \mu_X - \mu_Y = 0$.
The alternative hypothesis is
$H_1: \mu_X \ne \mu_Y$, or $H_1: \mu_X - \mu_Y \ne 0$.

89

We use the version of the test that allows not only the sample sizes
to differ between the two samples but also the variances
in the two populations to differ.
To test this null hypothesis, suppose that we draw large
(n>30), independent samples from each population.
We make this assumption since the degrees of freedom
calculation is slightly complicated.
However, if the sample is large enough we can simply
approximate the sample distribution of the test statistic with
the standard normal distribution by application of the Central
Limit Theorem.

90

From the sample of size m from the first population we can
calculate the sample mean $\bar{X}$ and the sample variance $S_X^2$.
From the second sample, with a sample size n, we can
determine $\bar{Y}$ and $S_Y^2$.
The test statistic that we use is a t ratio, which is
$t = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\frac{S_X^2}{m} + \frac{S_Y^2}{n}}}.$
The degrees of freedom subscript is intentionally left off since,
as discussed above, we are assuming that we have a large sample
such that this statistic has a standard normal distribution.

91

Notice the similarity between the t ratio for the test of equality
of two means and the t ratio that we used previously to test a
hypothesis about a population mean:
$t = \frac{\bar{Y} - \mu}{\sqrt{S^2/n}}.$
The difference is that the sample mean and the population
mean from the initial t ratio are replaced by the differences in
the sample means and the population means, while the variance
of the sample mean is replaced by the variance of the
difference in means.
difference in means.

92

Example
We can test whether mean height differs between male and
female voters in Allegheny County.
We draw a new sample of 36 male and 36 female voters where
the means of the sample are
summarize height sex

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      height |         72    67.29861    3.850791       59.5         75
         sex |          0

Notice that the variable sex appears to be missing.


However, since it is a string (character) variable, not a numeric
variable, we must use the tabulate command.
tabulate sex

        sex |      Freq.     Percent        Cum.
------------+-----------------------------------
          F |         36       50.00       50.00
          M |         36       50.00      100.00
------------+-----------------------------------
      Total |         72      100.00

93

To test the null hypothesis that the mean heights of men and
women are equal, we need to compute the means and standard
deviations separately for men and women.
summarize height if sex=="F"
Variable |
Obs
Mean
Std. Dev.
Min
Max
-------------+-------------------------------------------------------height |
36
64.52778
2.850926
59.5
72
summarize height if sex=="M"
Variable |
Obs
Mean
Std. Dev.
Min
Max
-------------+-------------------------------------------------------height |
36
70.06944
2.481799
64
75

The bysort command yields the same result.


bysort sex: summarize height

------------------------------------------------------------------------
-> sex = F

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      height |         36    64.52778    2.850926       59.5         72

------------------------------------------------------------------------
-> sex = M

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      height |         36    70.06944    2.481799         64         75

94


The sample statistics are, for women, $\bar{X} = 64.5$, $S_X = 2.85$, and $S_X^2 = 8.12$,
and, for men, $\bar{Y} = 70.1$, $S_Y = 2.48$, and $S_Y^2 = 6.15$.
Since our null hypothesis is $H_0: \mu_X - \mu_Y = 0$ while our
alternative hypothesis is $H_1: \mu_X - \mu_Y \ne 0$, we will perform a
two-tailed test.
At the $\alpha = 0.05$ level of significance, the critical t value is
1.96, so our decision rule is to reject $H_0$ if $|t| > 1.96$.

95

Women: $\bar{X} = 64.5$, $S_X = 2.85$, and $S_X^2 = 8.12$
Men: $\bar{Y} = 70.1$, $S_Y = 2.48$, and $S_Y^2 = 6.15$
We can insert these values, along with the sample sizes of 36
observations in each sample, into the test statistic:
$t = \frac{64.5 - 70.1}{\sqrt{\frac{8.12}{36} + \frac{6.15}{36}}} = -8.89.$
Since $|t| = |-8.89| = 8.89 > 1.96$, we reject $H_0$.
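If the height/sex dataset shown on the earlier slides is loaded in Stata, the built-in ttest command performs a very similar unequal-variance comparison (it uses Satterthwaite degrees of freedom rather than the normal approximation); this sketch is added here and is not part of the original notes.

* Added sketch: two-sample test of equal means with unequal variances
ttest height, by(sex) unequal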

96

Practice Problems for This Set of Notes


Appendix B: Problems B.2 and B.10
Appendix C: Problems C.1, C.3, C.6 and C.7

97
