James B. McDonald
Brigham Young University
5/2010
I. Introduction to Econometrics
Objective: Make this one of the most interesting and useful courses you take in your
undergraduate program.
Outline: A. Models and Basic Concepts, B. Data, C. Econometric Projects, D.
Problem set
Econometrics deals with the problem of estimating relationships between variables. These
techniques are widely used in the public and private sectors as well as in academic settings. They
help provide an understanding about relationships between variables which can also be useful in
policy analysis and in quantifying expectations about future events.
Some applications of econometric procedures include:
• Economics and Business
o Estimation of demand relationships
impact of advertising on demand
pricing decisions
determinants of market share
estimation of income elasticities
o Estimation of cost relationships
o International trade and the balance of payments
o Macro models
o Rational expectations
o Predicting corporate bankruptcy or individual default on loans
o Identifying takeover targets
• Education
o Production functions
Tradeoffs between different education techniques
o Estimation of supply and demand for teachers
o Predicting acceptance into graduate and professional programs
o Estimating the impact of different types of schools on graduates’ salaries
• Political Science
o Analysis of voting behavior
• Public Sector
o Forecasting tax receipts
o Public Sector production functions
• Legal Profession
o Models of jury selection
o Discrimination
In each application there is the question of (1) MODEL FORMULATION (functional form,
variable classification as well as the theoretical foundation), (2) ESTIMATION of unknown
parameters, (3) TESTING hypotheses, and (4) PREDICTION.
A. Models and Basic Concepts
1. The formulation of the model is generally based upon economic considerations.
Example 1. Consumer Demand Theory
Maximize $U(X_1, X_2)$
subject to $P_1 X_1 + P_2 X_2 = Y$
where Y denotes income and $P_i$ and $X_i$, respectively, denote the price and quantity of the
i-th good.
The solution of this problem yields demand equations for $X_1$ and $X_2$:
$X_i = D_i(P_1, P_2, Y)$, i = 1, 2
where the functional form is unknown unless the utility function U( ) is specified. If
advertising (A) affects preferences, $U(X_1, X_2, A)$, then demand will also depend upon
advertising expenditure, $X_i = D_i(P_1, P_2, Y, A)$. Statistical data for $X_i$, $P_i$ and Y and
econometric procedures are then used to estimate the demand equations and any unknown
parameters.
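To see how specifying U( ) pins down the functional form, here is a short worked sketch. The Cobb-Douglas utility function and the parameter $\alpha$ are illustrative assumptions, not part of these notes.

```latex
% Illustrative assumption (not from the notes): Cobb-Douglas utility
\begin{align*}
\max_{X_1, X_2}\; & U(X_1, X_2) = X_1^{\alpha} X_2^{1-\alpha}, \quad 0 < \alpha < 1,\\
\text{subject to}\; & P_1 X_1 + P_2 X_2 = Y .
\end{align*}
% Setting the marginal rate of substitution equal to the price ratio
% and substituting into the budget constraint gives the demand equations:
\[
\frac{\alpha X_2}{(1-\alpha) X_1} = \frac{P_1}{P_2}
\quad\Longrightarrow\quad
X_1 = \frac{\alpha Y}{P_1}, \qquad X_2 = \frac{(1-\alpha) Y}{P_2}.
\]
```

Under this assumed utility function the only unknown parameter in the demand equations is $\alpha$, which would be estimated from data on $X_i$, $P_i$ and Y.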
Example 2. A Simple Macro Model
$C_t = \beta_1 + \beta_2 (Y_t - T_t)$
$Y_t = C_t + I_t + G_t + X_t$
where $C_t$, $Y_t$, $I_t$, $G_t$, $T_t$, and $X_t$ respectively denote consumption, total production, investment,
government expenditure, taxes and net exports. $\beta_1$ and $\beta_2$ are unknown parameters.
It is important to remember that models are not complete descriptions of a situation, but
rather attempt to summarize the main relationships between the variables.
a. Classification of Variables
(1) Endogenous variables (dependent): those variables determined by the model, e.g.,
$X_1$ and $X_2$ in example 1 and $Y_t$ and $C_t$ in example 2.
(2) Exogenous variables (independent): those variables not determined by the model,
but which are assumed to be given. $P_1$, $P_2$ and Y would be exogenous in example
1. $I_t$, $G_t$, $T_t$ and $X_t$ would be the exogenous variables in model 2.
(3) Predetermined variables
(a) lagged endogenous variables: endogenous variables from a previous time
period;
(b) exogenous variables as defined above.
b. Representation of Models
(1) Structural representation: a mathematical representation of a hypothesized
model (based on economic theory) which determines the value of endogenous
variables collectively explained by the model. The structural equations may
include more than one endogenous or dependent variable per equation.
Examples:
(a) A simple macro model
$C_t = \beta_1 + \beta_2 (Y_t - T_t) + \varepsilon_t$
$Y_t = C_t + I_t + G_t + X_t$
Dependent variables: C, Y
Independent variables: T, I, G, X
Unknown parameters: $\beta_1$, $\beta_2$
(b) Demand: $Q_t = \beta_1 + \beta_2 P_t + \gamma_1 Y_t + \varepsilon_{1t}$
Supply: $Q_t = \beta_3 + \beta_4 P_t + \gamma_2 w_t + \varepsilon_{2t}$
Dependent variables: Q, P
Independent variables: Y, w
Unknown parameters: $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, $\gamma_1$, $\gamma_2$
The ε's in these equations represent the "errors" not explained in the model. The
errors can represent the impact of other explanatory factors or measurement errors.
In each case we will want to use data to estimate the unknown parameters.
(2) Reduced form representation: expresses the current level of each of the
endogenous variables as a function of predetermined variables (exogenous and/or
lagged dependent).
Examples: The reduced form representation corresponding to the two previous
structural models can be shown to be as follows:
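(a)
$Y_t = \dfrac{\beta_1}{1-\beta_2} + \dfrac{1}{1-\beta_2}\,(I_t + G_t - \beta_2 T_t + X_t) + \dfrac{\varepsilon_t}{1-\beta_2}$
$C_t = \dfrac{\beta_1}{1-\beta_2} + \dfrac{\beta_2}{1-\beta_2}\,(I_t + G_t - T_t + X_t) + \dfrac{\varepsilon_t}{1-\beta_2}$
(b)
$P_t = \dfrac{\beta_1-\beta_3}{\beta_4-\beta_2} - \dfrac{\gamma_2}{\beta_4-\beta_2}\,w_t + \dfrac{\gamma_1}{\beta_4-\beta_2}\,Y_t + \dfrac{\varepsilon_{1t}-\varepsilon_{2t}}{\beta_4-\beta_2}$
$Q_t = \dfrac{\beta_1\beta_4-\beta_2\beta_3}{\beta_4-\beta_2} - \dfrac{\beta_2\gamma_2}{\beta_4-\beta_2}\,w_t + \dfrac{\beta_4\gamma_1}{\beta_4-\beta_2}\,Y_t + \dfrac{\beta_4\varepsilon_{1t}-\beta_2\varepsilon_{2t}}{\beta_4-\beta_2}$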
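One way to convince yourself that a reduced form is correct is to plug numbers into it and confirm that the resulting values satisfy the structural equations. The sketch below does this for the simple macro model (a). All parameter and data values are hypothetical and chosen only for illustration.

```python
def reduced_form(b1, b2, I, G, T, X, eps):
    """Reduced-form values of Y_t and C_t for the simple macro model (a)."""
    denom = 1.0 - b2
    Y = b1 / denom + (I + G - b2 * T + X) / denom + eps / denom
    C = b1 / denom + b2 * (I + G - T + X) / denom + eps / denom
    return Y, C


# Hypothetical values (not from the notes), used only to illustrate the check.
b1, b2 = 50.0, 0.8
I, G, T, X, eps = 100.0, 200.0, 150.0, 20.0, 5.0

Y, C = reduced_form(b1, b2, I, G, T, X, eps)

# The reduced-form solution must satisfy both structural equations,
# so both gaps should be zero up to rounding error.
consumption_gap = C - (b1 + b2 * (Y - T) + eps)
identity_gap = Y - (C + I + G + X)

print(f"Y = {Y:.2f}, C = {C:.2f}")
print(f"structural gaps: {consumption_gap:.2e}, {identity_gap:.2e}")
```

The same check can be repeated for the demand and supply model (b) by computing P and Q from the reduced form and substituting them into both structural equations.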
Economics 388 will introduce the analysis of structural economic models, but will primarily
focus on models written in the reduced form representation, i.e., with the dependent variable
on the left and predetermined variables on the right hand side. However, there are some
very important problems with endogenous variables (endogenous regressors) on the right
hand side of the equation.
2. Estimation of Unknown Parameters
The coefficients of the variables in the reduced form and structural representations are
referred to as parameters and are generally unknown. The notation $\hat{\beta}$ will be used to denote
the estimator of the unknown population parameter $\beta$. In order to obtain any quantitative (as
opposed to qualitative) estimates of the impact of changes in exogenous variables upon the
dependent variables, the unknown parameters must be estimated. As an example of this we
note that based upon the macro model just considered
$\dfrac{\partial Y_t}{\partial G_t} = \dfrac{1}{1-\beta_2}$.
Recalling that $\beta_2 = \partial C_t / \partial Y_t$ (marginal propensity to consume) is generally assumed to be
between zero and one, we can deduce that in this model an increase in government
expenditure will result in an increase in the equilibrium level of income. However, in order
to estimate the magnitude of the increase in $Y_t$ associated with the increase in $G_t$, $\beta_2$ must be
estimated. Sometimes it may be easier to estimate the reduced form coefficient $\dfrac{1}{1-\beta_2}$ directly.
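A quick numerical illustration of why the magnitude matters. The value $\beta_2 = 0.8$ and the 10-unit change in $G_t$ are hypothetical, not estimates.

```latex
% Hypothetical marginal propensity to consume, for illustration only
\[
\beta_2 = 0.8
\quad\Longrightarrow\quad
\frac{\partial Y_t}{\partial G_t} = \frac{1}{1-0.8} = 5,
\qquad
\Delta G_t = 10 \;\Longrightarrow\; \Delta Y_t = 5 \times 10 = 50 .
\]
% With beta_2 = 0.5 the multiplier would be only 1/(1 - 0.5) = 2.
```

The sign of the effect follows from theory alone, but the size of the effect (5 versus 2 in this sketch) depends entirely on the estimated value of $\beta_2$.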
3. Tests of Hypotheses
Many times we are faced with the problem of determining whether a particular variable
is an important explanatory factor: does wealth or advertising have a significant impact on
consumption; what is the direction of influence of a change in a variable; or how can we test
hypotheses about the magnitude of an elasticity under consideration. All of these problems
involve hypothesis testing and require a knowledge of the density of the estimator under
consideration or of a related test statistic.
For example, assume that the density of $\hat{\beta}_2$, $f(\hat{\beta}_2)$, under the null hypothesis $H_0$: $\beta_2 = 0$
appears as follows:
[Figure: density $f(\hat{\beta}_2)$ centered at $\beta_2 = 0$]
Assume that $\hat{\beta}_2$ denotes the estimated value of $\beta_2$. If $\hat{\beta}_2$ is far out in the tail, which is
unlikely under the null hypothesis, we will agree to reject the null hypothesis that $\beta_2 = 0$.
4. Prediction
A frequent application of econometrics is to obtain predictions for the dependent
variables corresponding to a certain value for the independent variable(s) [X]. In order to
obtain a prediction for the dependent variable (Y) in some future period, we need to obtain a
prediction for the independent variables (X) (say X*) in that period and also assume that the
relationship between X and Y observed in the sample period continues to be valid in the
future. Substituting the predicted value of X (X*) into the estimated relationship yields
the estimated value of Y ($Y^* = \hat{\beta}_1 + \hat{\beta}_2 X^*$). We know that Y* will probably not be exactly
correct and so we will also discuss methods of obtaining confidence intervals for the actual
value of Y.
[Figure: estimated relationship between X and Y, $Y^* = \hat{\beta}_1 + \hat{\beta}_2 X^*$, with confidence intervals]
The first exercise set attempts to clarify the notion of reduced form and structural
representations of economic models. The importance of the structural parameters is also
illustrated in these exercises. We now turn to some important issues related to the data used
in estimating economic models.
B. Data
Applied econometrics involves the four steps just discussed: (1) model formulation and
interpretation of variables, (2) estimation of unknown parameters, (3) hypothesis testing, and
(4) prediction. The process summarized in these four steps is an integral part of empirical
research in the physical and social sciences. However, the results of this research may be
sensitive to the formulation of the model AND the data used. Frequently the desired data are
not available or are not in the desired form. Some data types and issues involve:
quantity and price indices: Paasche, Laspeyres
real or nominal values
total or per capita levels
stocks vs. flows
deseasonalized vs. seasonalized
An important question is whether the data we are using measure what we really want [story:
museum]. A useful reference to the importance of data and data limitations is O.
Morgenstern, On the Accuracy of Economic Observations.
1. Data Characteristics:
a. Quantitative vs. Qualitative
Quantitative variables measure "quantities" such as
price, sales volume, weight or income.
Qualitative variables are used to model "either/or" situations and might be used to
model membership in one of several groups such as:
⋅homeowner or nonhomeowner
⋅employed/unemployed
⋅male/female
⋅accurate or inaccurate income tax returns
Dependent and independent variables can be quantitative or qualitative variables.
Example: Consider a possible relationship between salary, years of employment
and gender. This model might be formulated as:
Salary = $\beta_1$ + $\beta_2$ (years employed) + $\beta_3$ Gender
where we will discuss ways in which “Gender” can be included in the econometric
model in another section dealing with binary or qualitative variables.
b. Time Series, Cross Sectional, Pooled Data
Time Series Data: measures a particular variable over successive time periods (annual,
quarterly, monthly, weekly; e.g., income, consumer price index (CPI)).
Cross Sectional Data: measures a particular variable at a given point in time for
different entities. An example of cross sectional data would be the wholesale price of
unleaded gas at 2:30 p.m. on January 2, 2009 across different gas stations.
Pooled or Merged Cross Sectional/Time Series Data
Per Capita Income, by State and Year

State      1980   1985   1990   1995   2000   2005
Alabama
Alaska
…
Utah
…

Any single column (one year, all states) alone would be cross-sectional.
Any single row (one state, all years), such as Alaska, alone would be time-series.
Panel Data: pooled cross sectional data in which the same cross section is sampled over
time. A well-known panel data set is the National Longitudinal Study. This study
surveys family expenditures of approximately 20,000 people.
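The row/column distinction in the table can be sketched as a small data structure. The numbers below are synthetic placeholders (not real income figures), and the helper names `time_series` and `cross_section` are hypothetical, introduced only for this illustration.

```python
# Synthetic placeholder values (NOT real income data), keyed by (state, year).
panel = {
    ("Alabama", 1980): 1.0, ("Alabama", 1985): 2.0, ("Alabama", 1990): 3.0,
    ("Alaska", 1980): 4.0, ("Alaska", 1985): 5.0, ("Alaska", 1990): 6.0,
    ("Utah", 1980): 7.0, ("Utah", 1985): 8.0, ("Utah", 1990): 9.0,
}


def time_series(data, state):
    """One row of the table: a single state followed over the years."""
    return {year: v for (s, year), v in sorted(data.items()) if s == state}


def cross_section(data, year):
    """One column of the table: all states observed in a single year."""
    return {s: v for (s, y), v in sorted(data.items()) if y == year}


alaska_series = time_series(panel, "Alaska")
states_1985 = cross_section(panel, 1985)

print("Time series for Alaska:", alaska_series)
print("Cross section for 1985:", states_1985)
```

Because the same states appear in every year, this toy data set also has the panel structure described above.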
c. Nonexperimental vs. Experimental Data
Nonexperimental data: typical in the social sciences.
Observations drawn from a system not subject to experimental control.
Experimental data: common in natural sciences, but experimental data are becoming
more commonly used in economics.
examples: Physics/chemistry
Negative income tax (different tax rates, direct subsidies)
Health insurance
Influence of housing allowance
Split cable: different commercials
2. Data problems
a. Degrees of freedom
Not enough observations to estimate model (the number of observations must be greater
than the number of parameters)
b. Multicollinearity: multicollinearity refers to the tendency of economic variables to
move together, making it difficult to accurately estimate the impact of changes in
individual variables. This is often encountered in nonexperimental data available in
the social sciences.
c. Measurement error and accuracy.
o Changing definitions of variables (government statistics): money, automobiles
(include station wagons?)
o Measurement error: error boxes
o More accuracy reported than justified [Story: Weigh hogs in Texas]
o Combining data with different accuracies [Story: Age of river]
o Accuracy isn't necessarily symmetric, hence the errors need not "cancel" out:
income tax reports (individual and corporate profits)
women's age in surveys: not many report ages between forty and forty-five
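The multicollinearity problem in item b can be made concrete with a small sketch. It uses a standard result for a regression with two explanatory variables: the variance of each slope estimator is inflated by the factor 1/(1 - r²), where r is the sample correlation between the two regressors. The data values are hypothetical.

```python
import math


def correlation(a, b):
    """Sample correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    s_aa = sum((ai - mean_a) ** 2 for ai in a)
    s_bb = sum((bi - mean_b) ** 2 for bi in b)
    return s_ab / math.sqrt(s_aa * s_bb)


# Hypothetical regressors that move almost in lockstep.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]

r = correlation(x1, x2)
vif = 1.0 / (1.0 - r ** 2)  # variance inflation factor, two-regressor case

print(f"correlation between x1 and x2: {r:.4f}")
print(f"variance inflation factor:     {vif:.1f}")
```

When the two variables are nearly collinear, as here, the inflation factor is very large, which is what makes the separate effects of the individual variables hard to pin down.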
3. Some data sources
Excellent websites include
http://www.ciser.cornell.edu/ASPs/datasource.asp and
http://www.econdata.net/.
Both of these websites provide access to a wide variety of data sources. Included in the
description of econdata.net is a list of the ten best sites based on user feedback. Some are
copied below for your convenience:
• Bureau of the Census
The Census Bureau site will lead you to the full range of popular and obscure Census
data series. The site has a comprehensive A-to-Z listing of data subjects, as well as
American FactFinder and CenStats, query-based means for accessing data for
your area from a variety of Census series.
• Bureau of Labor Statistics
Bureau of Labor Statistics (BLS) has a wealth of information available through its
Web site. BLS jobs, wages, unemployment, occupation, and prices data series are
available through a much improved query-based system. Also see Economy at a
Glance for an integrated set of BLS data for states and metro areas.
• Bureau of Economic Analysis
The Bureau of Economic Analysis (BEA) makes its Gross State Product, Regional
Economic Information System (REIS), and foreign direct investment data available
on its Web site. You can also use this site to access BEA's national income account
data and its publication of record, the Survey of Current Business.
• http://www.econdata.net/
This website includes links to many different types of data, including some of the
following sites.
• http://www.Census.Gov
This site includes all data for the Census of Population and Housing and U.S. and
World Population data.
• http://www.census.gov. United Nations Statistical Division
• http://www.stls.frb.org [St. Louis Federal Reserve Economic Data Base]
Price indices, interest rates, balance of payments, employment, and monetary data.
• [Resources for Economists on the Internet]
U.S. macro and regional data, other U.S. data, international data, financial data, and
academic journal archive data.
• http://rfe.org (Resources for Economists)
• http://www.bea.doc.gov
The Bureau of Economic Analysis provides time-series data on a
variety of U.S. macroeconomic variables.
• http://www.psidonline.org
The Panel Study of Income Dynamics (PSID) is a nationally representative
longitudinal study of families and individuals begun in 1968. The initial focus
was to examine employment, earnings, and income over the life cycle for 5000
families. Interviews for many of these families and their descendants have
continued.
• http://www.icpsr.umich.edu
• http://www.icpsr.umich.edu/icpsrweb/ICPSR/
The Inter-university Consortium for Political and Social Research (ICPSR)
provides access to an extensive collection of downloadable data. Try it, you may
like it.
• http://www.ipums.umn.edu
Integrated Public Use Microdata Series. Registration is free and registered users
can select “Create Extract” to choose variables to include in their data set.
• International is an integrated series of census microdata samples from 1960 to
the present. At this time, the series includes eighty samples drawn from twenty-six
countries, with more scheduled for release in the future.
• USA is an integrated series of representative samples drawn from the U.S.
censuses of the period from 1850 to 2000. IPUMS-USA also includes American
Community Survey (ACS) data from 2000 to 2005.
• CPS provides integrated data and documentation from the March Current
Population Survey (CPS) from 1962 to 2006. The harmonized CPS data are also
compatible with the data from IPUMS-USA.
Some other internet resources
• National Bureau of Economic Research
o http://www.nber.org/data/
• Another excellent data site which has data to explore the impact of religious
practices on the family is
http://www.people.cornell.edu/pages/jpp34/religion_datasets.htm
• For those interested in sports data, try espn.com, pgatour.com, nba.com,
basketball-reference.com, hoopdata.com
• For those considering purchasing a diamond, you might try www.diamonds.net
DataFerrett is a popular data mining tool that accesses data stored in TheDataWeb through
the internet. DataFerrett can be installed as an application on your desktop or used as a Java applet
within an internet browser. DataFerrett is compatible with Windows operating systems.
http://dataferrett.census.gov/
• National Center for Health Statistics
• National Retirement Survey
Google is also an excellent resource to assist in locating data and studies related to your area
of interest.
C. Econometric Projects
The purpose of the project is to provide an opportunity to formulate a model of interest,
collect relevant data, estimate the model and interpret the results. This experience will
facilitate an integration of the statistical and econometric methodologies discussed in class
with other economics courses which may focus more on institutional descriptions of events
and organizations or an analysis of theoretical models. These models are merely
hypothesized explanations of observed economic data and should be estimated and tested.
Econometrics provides a method of testing the validity of the hypotheses underlying
economic models.
1. Model Selection and Data
The selection of a model and data to be used are the first steps in an econometric
project. Other economics courses or related journal articles may provide a source of
interesting models. The determination of an econometric project should be based on both an
interesting model and available data. A common problem encountered with econometric
projects is the unavailability of relevant data. Some helpful data sources are contained in the
section I.B.3 of the notes. A growing number of journals provide data used in published
articles. Replicating and updating the research in a published paper can be a productive
exercise. Alternatively, you might consider selecting a project related to your future career
aspirations, a unique data source to which you have special connections, or a passion you
have long held. A pre-med student used epidemiology data he was already working on with
a professor from the Microbiology Department. A pre-law student studied the determinants
of law school rankings. A BYU basketball player studied the impact of various statistics on
total BYU points scored. A student working for a direct-sales company used Census data to
predict what counties would be most successful for his company. Another student had a job
in the energy industry and built a model predicting natural gas prices. One approach is to
think about topics that would be good talking points in future job interviews. Previous
topics have truly been very diverse in terms of both topic and scope. Some more examples:
• Determination of factors related to admission to medical school (one student wrote
the admissions committee and requested anonymous data, one student’s father was
the president of a college)
• The relationship between the value of diamonds and cut, color, and clarity (one
student found an online database of diamond prices and characteristics)
• Factors best determining the probability of divorce (one student used IPUMS.org,
one student obtained the data from one of his BYU MFHD professors)
• Interplay between state hunting licenses and state deer population (student requested
data from Minnesota State Hunting Department)
• Financial applications such as estimating betas of stocks (students have used
Marriott School resources, such as Bloomberg and Compustat)
• Production functions
• Phillips Curve (students have used publicly available unemployment and inflation
data)
• Prediction of consumer default on loans
• Estimating the likelihood of medical doctors to commit suicide (student used
DataFerrett to access National Center for Health Statistics microdata)
• Impact of foreign aid on national stability and economic development (one student
had done research with a Political Science professor that provided him with the
development data, one student’s sister was working for an international aid NGO)
• Determinants of profit in used car sales (student used his roommate’s dad’s
dealership’s proprietary data)
• Relationship between consumer debt, credit ratings, and demographics (student used
American FactFinder for demographic data and used credit ratings from the small
business he worked for)
• Impact of weather, daylight saving time, advertising and local events on retail sales
(one student requested sales data from his boss at a local store, another asked his
brother for sales and advertising data from his startup restaurant)
Once a topic has been selected you should review the previous literature on the topic. A
computer literature search will be helpful. Google Scholar is a useful starting point. Once
you find some good papers that deal with your topic, it is often useful to follow their
citations to identify other relevant literature. In specifying your model, you should clearly
identify the endogenous (dependent) variables to be explained as well as the exogenous
(independent) variables in your model. If you are replicating a previously published
empirical study, it would also be interesting to update the analysis. For Economics 388 you
may want to restrict the model to explain one or two endogenous variables. For Economics
588, four endogenous variables is a reasonable upper limit with at least six or eight
exogenous variables. If you are working with a simultaneous equations model, both the
structure and reduced form parameters should be estimated.
2. Model Estimation
For single equation models or reduced form representations, ordinary least squares can
be used if neither autocorrelation nor heteroskedasticity is present. Multicollinearity makes
it difficult to obtain accurate estimates of the effects of individual variables. Improved
estimation procedures are available if either autocorrelation or heteroskedasticity is present.
Simultaneous structural equation models are better treated with estimation techniques
specifically developed for these models. The most widely used of these techniques is
probably two stage least squares or instrumental variables estimation. Alternative methods
are also available for structural models and will be discussed in Economics 588.
Ordinary least squares, two stage least squares, instrumental variables, and many other
estimators are available in such computer packages as SAS, Stata, SHAZAM, SPSS,
EVIEWS, RATS, TSP, Matlab, Gretl, and R, to mention only a few. Gretl and R are free.
3. Organization of the write-up
The format for your paper should be modeled after that required by scholarly refereed
journals and would include:
(a) Title page
(b) Abstract. This should be less than one page in length and summarize the topic,
methodology and findings.
(c) Introduction. This section should state the nature and objectives of the project along
with a review of the relevant literature.
(d) Description of the model. The model should be defined and each equation carefully
explained. The variables should be clearly defined. The expected impact of each
exogenous variable on the dependent variable and the reasons explained, i.e., discuss
the comparative statics of the model.
(e) Interpretation of the variables and estimated model. The interpretation of the variables
and data references should be included in the paper. Also include a copy of the data or
references to the data. Basic statistical descriptions for the variables, such as the mean,
variance, minimum, and maximum should be summarized in a table. The results of
estimating the model should be reported and discussed in this section and would
include: parameter estimates, standard errors, t-statistics, F-statistics, $R^2$, tests for
normality, autocorrelation, heteroskedasticity and possibly the degree of
multicollinearity.
(f) Economic analysis of the estimated model and implications. This section would include
a comparison of the estimated results with the comparative static implications of the
economic model. Policy implications, if any, and the predictive capability of the model
could also be included in this section.
(g) Summary and conclusions. Review the major findings as well as possible future work.
(h) Bibliography. Include complete citations for all references in the paper including data
sources.
(i) Include copies of your data in an appendix or give a complete citation to the data
sources. This facilitates a replication of your work which is an important component of
scientific research.
D. Problem set
Intro Problem Set
Introduction and Stata
Theory
1. Consider the labor model
Demand: w = 100 - 5N
Supply: w = 50 + 5N
where w denotes the wage rate and N denotes the number of individuals.
a. Graph these schedules and solve for the equilibrium wage and employment level.
b. Graphically depict the effect of imposing a minimum wage of w = 80. What is the
associated level of unemployment?
(JM)
2. Now consider the demand and supply schedules:
Demand: $w = \beta_1 - \beta_2 N$
Supply: $w = \gamma_1 + \gamma_2 N$
a. Demonstrate that the equilibrium wage rate ($\bar{w}$) is given by
$\bar{w} = \dfrac{\beta_1 \gamma_2 + \beta_2 \gamma_1}{\beta_2 + \gamma_2}$
b. Demonstrate that the level of unemployment associated with the imposition of a minimum
wage rate of $\bar{w} + 10$ is given by
$10\left(\dfrac{1}{\beta_2} + \dfrac{1}{\gamma_2}\right)$.
(Hint: What is the level of unemployment at $\bar{w}$?)
c. What is the importance of knowing the values of the structural parameters for policy
implications?
(JM)
3. Assume the demand for gasoline is given by $Q_d = \beta_1 - \beta_2 P_g$ and the supply of gasoline is
given by $Q_s = 100 + 10 P_g - 2 P_c$, where Q, $P_g$, and $P_c$ denote the quantity of gasoline, the price of
gasoline and the price of crude oil.
a. Obtain an expression for the equilibrium price of gasoline ($\bar{P}_g$) in terms of $\beta_1$, $\beta_2$, and
$P_c$.
b. Evaluate the effect that an increase in $P_c$ of 10 units will have upon the equilibrium
price of gasoline. Do the values of $\beta_1$ and $\beta_2$ have any effect on the magnitude of the
effect?
(JM)
4. Application in Stata
There are two ways to execute commands in Stata: writing a simple program file of commands
(called a do-file) or entering each command one at a time at Stata’s command line prompt.
We will use the latter method here, but you are encouraged to learn how to use do-files. They
are especially useful when you want to be able to replicate results several times, such as for
your projects.
First we enter the data. Open Stata, type “edit” and hit enter.
Stata’s Data Editor should appear. Starting with the top left cell, enter the data below, in
two columns:
3.9 75
4.0 63
3.0 45
3.5 45
2.0 27
3.0 36
3.5 54
2.5 18
2.5 24
This represents students’ GPAs along with the corresponding level of
parental income in thousands of dollars. The first student, for example, has a
3.9 GPA and comes from a family having an annual income of $75,000.
Close the data editor by clicking on the X in the top right corner. Stata has
saved your data and automatically named the two columns “var1” and “var2”
respectively. You can see them in the Variables window in the top left. Let’s
make sure that the data are as we want them.
Type “list” and hit enter. You should see a little table listing the data you have just entered.
Since “var1” and “var2” are vague variable names, let’s rename them.
Type “rename var1 gpa” and hit enter. Then type “rename var2 income”. Now when
you type “list” you will see the new variable names.
To see summary statistics for the two variables, use the summarize command: “summarize gpa
income”. (You can also just type “summarize” and Stata will summarize all of the variables
in memory.)
To see a scatter plot of the two variables with gpa on the y-axis and income on the x-axis, use
the plot command: “plot gpa income”. (In Stata the dependent variable always goes first in a
list.)
To run a simple linear regression showing the estimated effect of parental income on GPA,
use the regress command: “regress gpa income”.
To generate a new variable equal to the square of income, use the generate command:
“generate incomesq = income^2”. Use the list command again to look at a table of all three
variables.
Print the Stata output to turn in with this assignment (either using File… Print, or by copying
the output to a text editor like Notepad).
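As a cross-check on the Stata output from “regress gpa income”, the intercept and slope can be computed by hand from the nine observations in this exercise. The sketch below is in Python rather than Stata and assumes the standard two-variable least squares formulas (slope equal to the sum of cross-deviations divided by the sum of squared deviations of income).

```python
# GPA and parental income (thousands of dollars) from the exercise.
gpa = [3.9, 4.0, 3.0, 3.5, 2.0, 3.0, 3.5, 2.5, 2.5]
income = [75, 63, 45, 45, 27, 36, 54, 18, 24]

n = len(gpa)
mean_income = sum(income) / n
mean_gpa = sum(gpa) / n

# Sums of cross-deviations and squared deviations about the means.
s_xy = sum((x - mean_income) * (y - mean_gpa) for x, y in zip(income, gpa))
s_xx = sum((x - mean_income) ** 2 for x in income)

slope = s_xy / s_xx                           # coefficient on income
intercept = mean_gpa - slope * mean_income    # constant term

print(f"mean income = {mean_income:.1f}, mean gpa = {mean_gpa:.2f}")
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
```

The slope should come out near 0.033 and the intercept near 1.69, which can be compared with the coefficient column of the Stata regression table.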
*For most Stata commands, you don’t have to type out the entire command word. For
example, instead of typing out “generate” you can use “g”, “ge” or “gen”.
*You may have Stata keep a log of your results for you using the log command. At the
beginning of your Stata session, type “log using mynewlog”, where “mynewlog” is the name of
your log file. Stata will open a new log in the “working directory.” To find out where the
working directory is, type “pwd” and hit enter (in Stata for Windows, typing “cd” by itself
also displays it). When you are done using the log and before exiting the program, close the log by
typing “log close”.
5. Select a data website such as http://www.oswego.edu/~kane/econometrics/data.htm, select
two variables, calculate the means and variances, and plot the observations on the two
variables.
II. TWO VARIABLE LINEAR REGRESSION MODEL
Several applications about the importance of having information about the relationship
between economic variables were illustrated in the introduction. This section provides some
essential building blocks used in estimating and analyzing "appropriate" functional relationships
between two variables. We first consider estimation problems associated with linear relationships.
The properties and distribution of the least squares estimators are considered. Diagnostic and test
statistics which are important in evaluating the adequacy of the specified model are then discussed.
A methodology for forecasting and the determination of confidence intervals associated with the
linear model is presented. Finally, some alternative functional forms (nonlinear) which can be
estimated using techniques of regular least squares are presented.
A. INTRODUCTION
Consider the model
$Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$
with n observations $(X_1, Y_1), \ldots, (X_n, Y_n)$ which are graphically depicted as
[Figure: scatter of the observations around the population regression line]
$\varepsilon_t$: true random disturbance or error term
(vertical distance from the observation to the line)
• Random behavior
• Measurement error (Y)
• Omitted variables
$\beta_1 + \beta_2 X_t$: population regression line
• $\beta_1$ and $\beta_2$ are unknown
Population Regression Function:
The observations don't have to lie on the population regression line, but it is usually
assumed that
$E(Y_t \mid X_t) = \beta_1 + \beta_2 X_t$, i.e.,
the expected value or the "average" value of Y corresponding to any given value of X lies on
the population regression line.
An important objective of econometrics is to estimate the unknown parameters ($\beta_1$, $\beta_2$),
and thereby estimate the unknown population regression line. This estimated regression line is
referred to as the sample regression line. Again, the sample regression line is an estimator of
the population regression line.
Sample Regression Function:
$e_t$ (the residual) is the vertical distance from $Y_t$ to the sample regression line, so
$e_t = Y_t - \hat{\beta}_1 - \hat{\beta}_2 X_t = Y_t - \hat{Y}_t$, whereas $\varepsilon_t = Y_t - \beta_1 - \beta_2 X_t$.
It is important to recognize that the residual ($e_t$) is an estimate of the equation error or
random disturbance ($\varepsilon_t$) and may have different properties.
Sample:
$\underbrace{Y_t}_{\text{observed } Y} = \underbrace{\hat{\beta}_1 + \hat{\beta}_2 X_t}_{\text{estimated regression line}} + \underbrace{e_t}_{\text{random disturbance or "residual"}} = \underbrace{\hat{Y}_t}_{\text{estimated } Y \text{ for a given } X} + e_t$
Population:
$\underbrace{Y_t}_{\text{observed } Y} = \underbrace{\beta_1 + \beta_2 X_t}_{\text{population regression line}} + \underbrace{\varepsilon_t}_{\text{error or random disturbance}}$
B. THE ESTIMATION PROBLEM
(1) Given a sample of $(X_t, Y_t)$: $(X_1, Y_1), \ldots, (X_n, Y_n)$,
[Figure: scatter plot of the sample, with $Y_t$ on the vertical axis and $X_t$ on the horizontal axis]
(2) estimate $\beta_1$, $\beta_2$: $(\hat\beta_1, \hat\beta_2)$.
Note that each different guess of $\beta_1$ and $\beta_2$, i.e., $\hat\beta_1$ and $\hat\beta_2$, gives a different sample regression line. How should $\hat\beta_1$ and $\hat\beta_2$ be selected? There are many possible approaches to this problem. We now review five possible alternatives and then carefully develop a method known as least squares.
Criteria: (five of many)
(1) minimize "vertical" distances
$\min_{\hat\beta_1, \hat\beta_2} \sum e_t$: no unique solution
$\min_{\hat\beta_1, \hat\beta_2} \sum e_t^2$: least squares or ordinary least squares (OLS)
(2) $\min_{\hat\beta_1, \hat\beta_2} \sum |e_t|^p$: robust estimators
p = 2 gives least squares
p = 1 gives least absolute deviations (LAD)
(3) $\min_{\hat\beta_1, \hat\beta_2} \sum (\text{horizontal distances})^2$
(4) $\min_{\hat\beta_1, \hat\beta_2} \sum_t (\text{perpendicular distances from regression line})^2$
(5) Method of moments (MM) estimators
Sample average of estimated residuals = $E(\varepsilon_t)$ = 0:
$$\sum_{t=1}^{n} e_t = 0$$
Sample covariance between residual and X = $E(\varepsilon_t X_t)$ = 0:
$$\sum e_t X_t = 0$$
The solution of these equations yields the OLS estimators.
Many techniques are available and each may have different properties. We will want
to use the best estimators. One of the most popular procedures is least squares.
Derivation of Least Squares Estimators (OLS)*
The sum of squares of the vertical distances between $Y_t$ and the sample regression line is called, by many authors, the sum of squared errors and is denoted SSE. The SSE can be written as
$$SSE = \sum e_t^2 = \sum (Y_t - \hat\beta_1 - \hat\beta_2 X_t)^2$$
Different $\hat\beta$'s (sample regression lines) are associated with different SSE. This can be visualized as in the next figure. Least squares amounts to selecting the estimators with the smallest SSE.
____________
*Since the SSE involves squaring the residuals, least squares estimators may be very sensitive to
"outlying" observations. This will be discussed in more detail later.
[Figure: SSE plotted as a surface over $(\hat\beta_1, \hat\beta_2)$]
Minimizing SSE with respect to $\hat\beta_1$ and $\hat\beta_2$ yields
$$\hat\beta_1 = \bar Y - \hat\beta_2 \bar X \quad \text{(the sample regression line goes through } (\bar X, \bar Y)\text{)}$$
$$\hat\beta_2 = \frac{\sum X_t Y_t - n \bar X \bar Y}{\sum X_t^2 - n \bar X^2} = \frac{\sum (X_t - \bar X)(Y_t - \bar Y)}{\sum (X_t - \bar X)^2} = \frac{Cov(X, Y)}{Var(X)}$$
Proof: In order to minimize the SSE with respect to $\hat\beta_1$ and $\hat\beta_2$, we differentiate SSE with respect to $\hat\beta_1$ and $\hat\beta_2$, yielding:
$$(1) \quad \frac{\partial SSE}{\partial \hat\beta_1} = 2 \sum (Y_t - \hat\beta_1 - \hat\beta_2 X_t)(-1) = -2 \sum e_t$$
$$(2) \quad \frac{\partial SSE}{\partial \hat\beta_2} = 2 \sum (Y_t - \hat\beta_1 - \hat\beta_2 X_t)(-X_t) = -2 \sum (X_t Y_t - \hat\beta_1 X_t - \hat\beta_2 X_t^2) = -2 \sum e_t X_t.$$
We see that setting these derivatives equal to zero, $\partial SSE / \partial \hat\beta_1 = 0$ and $\partial SSE / \partial \hat\beta_2 = 0$, implies
$$\sum_{t=1}^{n} e_t = 0$$
$$\sum_{t=1}^{n} e_t X_t = 0.$$
These two equations are often referred to as the normal equations. Note that the normal equations imply that the sample mean of the residuals is equal to zero and that the sample covariance between the residuals and X is zero, which were also the conditions used in method of moments estimation.
Solving the first normal equation for $\hat\beta_1$ yields
$$\hat\beta_1 = \bar Y - \hat\beta_2 \bar X,$$
which implies that the regression line goes through the point $(\bar X, \bar Y)$. The slope of the sample regression line is obtained by substituting $\hat\beta_1 = \bar Y - \hat\beta_2 \bar X$ into the second normal equation ($\partial SSE / \partial \hat\beta_2 = 0$, or $\sum e_t X_t = 0$) and solving for $\hat\beta_2$. This yields
$$\hat\beta_2 = \frac{\sum Y_t X_t - n \bar X \bar Y}{\sum X_t^2 - n \bar X^2} = \frac{Cov(X, Y)}{Var(X)}$$
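The closed-form solution of the normal equations can be checked numerically. The following is a minimal Python sketch, not part of the original notes: it applies the two formulas to the Anscombe_A data listed later in this chapter, and the function name `ols_simple` is a hypothetical helper chosen for illustration.

```python
# Anscombe_A data as listed in the Stata output later in these notes.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

def ols_simple(x, y):
    """Return (b1, b2, residuals) for the model Y = b1 + b2*X + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations over sum of squared X deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = sxy / sxx
    # Intercept: the fitted line passes through (x_bar, y_bar).
    b1 = y_bar - b2 * x_bar
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    return b1, b2, resid

b1, b2, resid = ols_simple(x, y)
print("intercept:", b1, "slope:", b2)
# The two normal equations: both sums should be zero up to rounding error.
print("sum e_t     :", sum(resid))
print("sum e_t X_t :", sum(e * xi for e, xi in zip(resid, x)))
```

The estimates can be compared with the `reg y x` output for this data set reported in the Stata section of this chapter.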
C. PROPERTIES OF LEAST SQUARES ESTIMATORS
The properties of the $\hat\beta_1$ and $\hat\beta_2$ derived in the previous section will be very sensitive to which of the following five assumptions are satisfied:
(A.1) $\varepsilon_t$ are normally distributed
(A.2) $E(\varepsilon_t \mid X_t) = 0$
(A.3) Homoskedasticity:
$Var(\varepsilon_t \mid X_t) = \sigma_t^2 = \sigma^2$ for every t
[Figure: two panels contrasting homoskedasticity and heteroskedasticity]
(A.4) No Autocorrelation:
$Cov(\varepsilon_t, \varepsilon_s) = 0$, $t \neq s$
(A.5) The X's are nonstochastic (fixed in repeated sampling) and Var(X) is finite, or in other words:
$$0 < \lim_{n \to \infty} \sum_{t=1}^{n} (X_t - \bar X)^2 < \infty.$$
(This assumption can be relaxed, but the X's need to be uncorrelated with the errors in order for OLS estimators to be unbiased and consistent.)
A linear model satisfying (A.2)-(A.5) is referred to as the classical linear regression model. If (A.1)-(A.5) are satisfied, then we have the classical normal linear regression model. We will now summarize the properties of the least squares estimators in each of these two cases.
1. The Classical Linear Regression Model (A.2 – A.5)
If $Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$ where (A.2)-(A.5) are satisfied, then the $\hat\beta_i$'s are
⋅ unbiased: $E(\hat\beta_i) = \beta_i$
⋅ consistent: $Var(\hat\beta_i) \to 0$ as $n \to \infty$
⋅ the minimum variance of all linear unbiased estimators.
⋅ These estimators are referred to as BLUE: best linear unbiased estimators.
⋅ (A.2)-(A.5) are known as the Gauss-Markov Assumptions.
2. The Classical Normal Linear Regression Model (A.1 – A.5)
If $Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$ where (A.1)-(A.5) are satisfied, then the least squares estimators are:
⋅ unbiased
⋅ consistent
⋅ minimum variance of all unbiased estimators (not just linear estimators)
⋅ normally distributed. This result facilitates t and F tests, which will be discussed in another section.
⋅ least squares estimators will also be maximum likelihood estimators.
Since these desirable properties are conditional on the assumptions, it is important to test for their validity. These tests will be outlined in another section of the notes.
We now attempt to give some intuitive motivation to the concept of maximum likelihood estimation, then we prove that the least squares estimators are maximum likelihood estimators if (A.1)-(A.5) are valid.
a. Pedagogical examples of maximum likelihood estimation:
(1) Estimation of µ (population mean)
The observed values of a normally distributed random variable $Y_t$ are denoted by ($Y_t$'s) on the horizontal axis. Assume that we know that these data were generated by one of two populations (#1, #2). Is it possible that the data were generated from #1? From #2? Which is the "most likely" population to have generated the sample?
(2) Regression models
In this example, which of the two population regression lines is most likely* to
have generated the random sample?
*It might be useful to think about these “pdf’s” as “coming out” of the page in a
third dimension with the “points” being thought of as being normally distributed
around the population regression line.
b. Maximum likelihood estimation: Derivation
How can we quantify the ideas illustrated by these two examples and obtain the "most likely" sample regression line? We now formally derive the maximum likelihood estimators of $\beta_1$ and $\beta_2$ under the assumptions (A.1)-(A.5).
For the model
$$Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$$
(1) $E(Y_t) = \beta_1 + \beta_2 X_t$
(2) $Var(Y_t \mid X_t) = Var(\beta_1 + \beta_2 X_t + \varepsilon_t \mid X_t) = \sigma^2$;
hence, we can write $Y_t \sim N[\beta_1 + \beta_2 X_t; \sigma^2]$, which means that the density of $Y_t$, given $X_t$, is given by
$$f(Y_t \mid X_t) = \frac{e^{-(Y_t - \beta_1 - \beta_2 X_t)^2 / 2\sigma^2}}{\sqrt{2 \pi \sigma^2}}.$$
These results can be visually depicted as in the following figure:
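The Likelihood Function for a random sample is defined by the product of the density functions. Since each density function gives the likelihood or relative frequency of an individual observation being realized, when we multiply these values, we obtain the likelihood of observing the entire sample, given the current parameters:
$$L(Y; \beta_1, \beta_2, \sigma^2) = f(Y_1) \cdots f(Y_n) = \frac{e^{-\sum (Y_t - \beta_1 - \beta_2 X_t)^2 / 2\sigma^2}}{(2\pi)^{n/2} (\sigma^2)^{n/2}}$$
and the Log Likelihood Function is given by:
$$\ell(Y; \beta_1, \beta_2, \sigma^2) = \ln L(Y; \beta_1, \beta_2, \sigma^2) = \sum_t \ln f(Y_t)$$
$$= -\sum_t (Y_t - \beta_1 - \beta_2 X_t)^2 / 2\sigma^2 - \frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \sigma^2$$
$$= -SSE / 2\sigma^2 - \frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln \sigma^2$$
Maximum Likelihood Estimators (MLE) are obtained by maximizing $\ell(Y; \beta_1, \beta_2, \sigma^2)$ over $\beta_1$, $\beta_2$, and $\sigma^2$. This maximization requires that we solve the following equations:
$$(1) \quad \frac{\partial \ell}{\partial \beta_1} = -\frac{1}{2\sigma^2} \frac{\partial SSE}{\partial \beta_1} = 0$$
$$(2) \quad \frac{\partial \ell}{\partial \beta_2} = -\frac{1}{2\sigma^2} \frac{\partial SSE}{\partial \beta_2} = 0$$
$$(3) \quad \frac{\partial \ell}{\partial \sigma^2} = \frac{SSE}{2 (\hat\sigma^2)^2} - \frac{n}{2} \frac{1}{\hat\sigma^2} = 0$$
[Figure: log-likelihood (LogL) surface over $(\beta_1, \beta_2)$, maximized at $(\hat\beta_1, \hat\beta_2)$]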
Results:
• $\hat\beta_1$ and $\hat\beta_2$ (the MLE) are also the OLS estimators of $\beta_1$ and $\beta_2$ when (A.1)-(A.5) hold.
• $\hat\sigma^2 = \dfrac{\sum e_t^2}{n} = \dfrac{\sum (Y_t - \hat\beta_1 - \hat\beta_2 X_t)^2}{n}$ = average of squared vertical deviations is the MLE of $\sigma^2$.
• $\hat\sigma^2$ is biased. $s^2 = \sum e_t^2 / (n - 2)$ is an unbiased estimator of $\sigma^2$. The reason $\hat\sigma^2$ is biased is that not all of the $e_t$'s are independent. Recall that there are two constraints on the $e_t$'s:
$\sum e_t = 0$
$\sum e_t X_t = 0$;
hence, (n - 2) of the residuals (estimated errors) are independent. In other words, if we had (n - 2) of the $e_t$'s, we could solve for the remaining two using the two constraints above.
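As an illustration of these results, the sketch below (Python, standard library only; not part of the original notes) computes the MLE $\hat\sigma^2 = SSE/n$, the unbiased estimator $s^2 = SSE/(n-2)$, and the maximized log-likelihood for the Anscombe_A data used later in this chapter. Evaluating $\ell$ at the OLS estimates and at $\hat\sigma^2$ follows directly from the log-likelihood formula above; the variable names are hypothetical.

```python
import math

# Anscombe_A data as listed in the Stata output later in these notes.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
n = len(x)

# OLS (= ML) estimates of the intercept and slope.
x_bar = sum(x) / n
y_bar = sum(y) / n
b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b1 = y_bar - b2 * x_bar

sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
sigma2_mle = sse / n        # MLE of sigma^2 (biased)
s2 = sse / (n - 2)          # unbiased estimator of sigma^2

# Log-likelihood evaluated at the ML estimates.
loglik = (-sse / (2 * sigma2_mle)
          - (n / 2) * math.log(2 * math.pi)
          - (n / 2) * math.log(sigma2_mle))

print("SSE =", sse)
print("sigma^2 (MLE) =", sigma2_mle, " s^2 =", s2)
print("log-likelihood =", loglik)
```

The log-likelihood value can be compared with the ll(model) entry reported by `estat ic` for the same data in the Stata section, and $s^2$ with the residual mean square in the `reg y x` output.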
3. Important observation:
If the assumptions (A.1)-(A.5) are not satisfied, we may be
able to "do better" than least squares. It is important to test
the validity of (A.1)-(A.5).
D. DISTRIBUTION OF $\hat\beta_1$ AND $\hat\beta_2$.
1. Distribution
In this section we give, without proof, the distribution of the least squares estimators if (A.2)-(A.5) hold. We also consider factors impacting estimator precision and finally provide some simulation results to provide intuition to the distributional results. The main results are then summarized. The proofs will be given in the next chapter using matrix algebra.
$\hat\beta_1$ and $\hat\beta_2$ are linear functions of the $Y_t$'s, which are random variables; hence, $\hat\beta_1$ and $\hat\beta_2$ are random variables.
Expected Value: (unbiased estimators)
$E(\hat\beta_1) = \beta_1$
$E(\hat\beta_2) = \beta_2$
Variance (Population)
$$\sigma^2_{\hat\beta_2} = \sigma^2 / \sum (X_t - \bar X)^2 = \frac{\sigma^2}{n\, Var(X)}$$
$$\sigma^2_{\hat\beta_1} = \sigma^2 \left( 1/n + \bar X^2 / \sum (X_t - \bar X)^2 \right) = \sigma^2 / n + \bar X^2 \sigma^2_{\hat\beta_2}$$
$\hat\beta_1$ and $\hat\beta_2$ are consistent because they are unbiased and their variances approach zero as the sample size increases.
Furthermore, if (A.1) holds ($\varepsilon_t \sim N(0, \sigma^2)$), then $Y_t \sim N[\beta_1 + \beta_2 X_t; \sigma^2]$, which implies the $\hat\beta_i$'s will be normally distributed since they will be linear combinations of normally distributed variables.
These results can be summarized by stating that if (A.1)-(A.5) are valid, then
$$\hat\beta_i \sim N\left[\beta_i; \sigma^2_{\hat\beta_i}\right]$$
where the equations for the variances are given above.
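2. What factors contribute to increased precision (reduced variance) of parameter estimators?
Consider the density of $\hat\beta_1$ and recall that
$$\sigma^2_{\hat\beta_1} = \sigma^2 \left( \frac{1}{n} + \bar X^2 / \sum (X_t - \bar X)^2 \right) = \sigma^2 \left( \frac{1}{n} + \frac{\bar X^2}{n\, Var(X)} \right).$$
[Figure: densities of the estimator in the "precise" and "less precise" cases, compared across Var(X), n, and σ]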
3. Interpretation of $\hat\beta_i \sim N[\beta_i; \sigma^2_{\hat\beta_i}]$ using Monte Carlo Simulations
In this section we report the results of some Monte Carlo simulations which provide additional intuition about the distribution of $\hat\beta_i$. We first construct the model used to generate the data and then generate the data. Parameter estimates are then obtained, another sample is generated, and the process is continued until we can consider the histograms of the estimators. Most Monte Carlo studies are similar in structure.
Consider the simple model, which is referred to as the data generating process (DGP):
$$Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t = 4 + 1.5 X_t + \varepsilon_t$$
where $\varepsilon_t \sim N(0, \sigma^2 = 4)$. We will let the X's be given by $X_t$ = 1, 2, . . ., 20. The selection of $\beta_1$, $\beta_2$, $\sigma^2$, and the X's is arbitrary.
We then generate 20 random disturbances (ε) using a random number generator for $N(0, \sigma^2 = 4)$.
The X's and ε's are then substituted into
$$Y_t = 4 + 1.5 X_t + \varepsilon_t$$
to determine corresponding Y's. We now have 20 observations on $X_t$ and $Y_t$.
Pretend that we don't know what $\beta_1$, $\beta_2$, $\sigma^2$ are. The only thing we observe are the $(X_t, Y_t)$. This might be visualized as
X → [$\beta_1$, $\beta_2$, $\sigma^2$, ε] → Y
We now estimate the unknown parameters ($\beta_1$, $\beta_2$, $\sigma^2$) using the previously discussed formulas. This could yield, for example:
$(\hat\beta_1, \hat\beta_2, s^2)$ = (3.618, 1.615, 2.499).
If 14 more samples were generated, we would have a total of 15 estimates of $\beta_1$, $\beta_2$, $\sigma^2$.
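The Monte Carlo procedure just described can be sketched in a few lines of Python (standard library only). This is an illustrative sketch, not the program used to produce the results in these notes: the seed, the number of replications (`REPS`), and all variable names are assumptions, so the individual draws will differ from any reported values.

```python
import random
import statistics

random.seed(0)  # arbitrary seed, chosen only to make the run repeatable

# Data generating process from the text: Y = 4 + 1.5 X + e, e ~ N(0, 4).
BETA1, BETA2, SIGMA2 = 4.0, 1.5, 4.0
X = list(range(1, 21))
N = len(X)
X_BAR = sum(X) / N
SXX = sum((x - X_BAR) ** 2 for x in X)   # sum of squared deviations of X

def one_trial():
    """Generate one sample of 20 observations and return (b1, b2, s2)."""
    eps = [random.gauss(0.0, SIGMA2 ** 0.5) for _ in X]
    y = [BETA1 + BETA2 * x + e for x, e in zip(X, eps)]
    y_bar = sum(y) / N
    b2 = sum((x - X_BAR) * (yi - y_bar) for x, yi in zip(X, y)) / SXX
    b1 = y_bar - b2 * X_BAR
    sse = sum((yi - b1 - b2 * x) ** 2 for x, yi in zip(X, y))
    return b1, b2, sse / (N - 2)

REPS = 2000  # hypothetical number of replications (the text uses 15)
b1s, b2s, s2s = zip(*(one_trial() for _ in range(REPS)))

# Population variances implied by the formulas in section D.1.
var_b2_theory = SIGMA2 / SXX
var_b1_theory = SIGMA2 * (1 / N + X_BAR ** 2 / SXX)

print("mean b1, b2, s2:", statistics.mean(b1s), statistics.mean(b2s),
      statistics.mean(s2s))
print("var(b1): simulated", statistics.variance(b1s), "theory", var_b1_theory)
print("var(b2): simulated", statistics.variance(b2s), "theory", var_b2_theory)
```

With many replications, the averages of the estimates should settle near the true parameters and the sample variances of the estimates near their population counterparts, which is the intuition the simulation is meant to convey.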
The results of these random simulations are given by:
Trial   $\hat\beta_1$   $s^2_{\hat\beta_1}$   $\hat\beta_2$   $s^2_{\hat\beta_2}$   $s^2$   $R^2$   D.W.*
________________________________________________________________________
1 3.618 .539 1.615 .00372 2.499 .974 2.14
2 3.794 .992 1.494 .00689 4.599 .947 2.32
3 5.770 .826 1.346 .00578 3.838 .946 2.10
4 3.491 .646 1.516 .00449 2.997 .966 2.41
5 4.443 .566 1.438 .00397 2.623 .967 2.20
6 4.697 .968 1.491 .00672 4.486 .948 2.83
7 5.428 .504 1.363 .00348 2.333 .967 2.40
8 4.685 .923 1.394 .00672 4.278 .944 1.73
9 6.122 .653 1.337 .00449 3.025 .956 2.21
10 2.589 .885 1.624 .00624 4.100 .960 1.63
11 4.046 1.447 1.514 .01000 6.707 .927 3.35
12 4.384 1.362 1.488 .00941 6.314 .928 1.32
13 3.452 .797 1.594 .00563 3.693 .962 2.06
14 4.301 .598 1.495 .00423 2.770 .968 1.51
15 3.196 .910 1.566 .00640 4.221 .955 2.17
Average 4.27 .8411 1.485 .0059 3.8989 .954 2.16
*D.W. denotes Durbin Watson statistic which can be used to test the validity of (A.4).
Given that $\sum_{t=1}^{n} (X_t - \bar X)^2 = 665$.
Questions:
(1) Evaluate the population variance of $\hat\beta_1$ and $\hat\beta_2$; i.e., $\sigma^2_{\hat\beta_1}$, $\sigma^2_{\hat\beta_2}$.
(2) Compare the average of $s^2_{\hat\beta_1}$ and $s^2_{\hat\beta_2}$ with their population counterparts obtained in (1).
(3) Evaluate the sample variance of the fifteen estimates of $\hat\beta_1$ and $\hat\beta_2$ and compare them with their population counterparts.
(4) Use a chi-square test to determine whether the average of the $s^2$'s is consistent with $\sigma^2 = 4$. Hint: $\sum \left( \dfrac{(n-2) s^2}{\sigma^2} \right) \sim \chi^2(15(18) = 270)$.
A histogram of the estimated $\hat\beta_1$'s might yield a result similar to the following:
Note the relationship between the histogram and the normal density $N(\beta_1, \sigma^2_{\hat\beta_1})$.
In practice we only have one sample of X's and Y's; hence, we only have one observation of $\hat\beta_1$, $\hat\beta_2$, and $s_{\hat\beta_i}$ (the estimate of $\sigma_{\hat\beta_i}$), and these distributional results must be interpreted accordingly.
4. Review:
Model: $Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$
A.1 $\varepsilon_t$ is distributed normally
A.2 $E(\varepsilon_t \mid X_t) = 0$
A.3 $Var(\varepsilon_t) = \sigma^2 \;\; \forall t$
A.4 $Cov(\varepsilon_t, \varepsilon_s) = 0$, $t \neq s$
A.5 The X's are nonstochastic and $0 < \lim_{n \to \infty} \sum_{t=1}^{n} (X_t - \bar X)^2 < \infty$.
Unknown parameters: $\beta_1$, $\beta_2$, $\sigma^2$
Problem: Given a sample of size n: $(X_1, Y_1), \ldots, (X_n, Y_n)$, obtain estimators of the unknown parameters.
Estimators of the unknown parameters are given by:
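Parameter   Estimator
$\beta_1$:   $\hat\beta_1 = \bar Y - \hat\beta_2 \bar X$
$\beta_2$:   $\hat\beta_2 = \dfrac{\sum (X_t - \bar X)(Y_t - \bar Y)}{\sum (X_t - \bar X)^2} = \dfrac{Cov(X, Y)}{Var(X)} = \dfrac{\sum X_t Y_t - n \bar X \bar Y}{\sum X_t^2 - n \bar X^2}$
$\sigma^2$:   $s^2 = \sum e_t^2 / (n - 2) = \dfrac{\sum (Y_t - \hat\beta_1 - \hat\beta_2 X_t)^2}{n - 2}$
Distributions:
$$\hat\beta_1 \sim N\left[\beta_1, \; \sigma^2_{\hat\beta_1} = \sigma^2 / n + \bar X^2 \sigma^2 / \sum (X_t - \bar X)^2 \right]$$
$$\hat\beta_2 \sim N\left[\beta_2, \; \sigma^2_{\hat\beta_2} = \sigma^2 / \sum (X_t - \bar X)^2 \right]$$
The covariance between $\hat\beta_1$ and $\hat\beta_2$ is given by
$$\sigma_{\hat\beta_1 \hat\beta_2} = -\bar X \sigma^2_{\hat\beta_2} = -\bar X \, var(\hat\beta_2) = -\bar X \sigma^2 / \sum (X_t - \bar X)^2$$
and will be proven later.
The $\sigma^2_{\hat\beta_i}$ are estimated by
$$s^2_{\hat\beta_1} = \frac{s^2}{n} + \bar X^2 s^2 / \sum (X_t - \bar X)^2$$
$$s^2_{\hat\beta_2} = s^2 / \sum (X_t - \bar X)^2.$$
It should be mentioned that
$$\frac{(n-2) s^2}{\sigma^2} = \frac{(n-2) s^2_{\hat\beta_1}}{\sigma^2_{\hat\beta_1}} = \frac{(n-2) s^2_{\hat\beta_2}}{\sigma^2_{\hat\beta_2}} = \frac{\sum (Y_t - \hat\beta_1 - \hat\beta_2 X_t)^2}{\sigma^2} \sim \chi^2(n - 2)$$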
E. DESCRIPTIVE STATISTICS AND HYPOTHESIS TESTS
In this section we assume that (A.1)-(A.5) are valid and consider test statistics which can be used to test whether the model has any explanatory power. Z and t statistics and $R^2$ (the coefficient of determination) are important tools in this analysis. An important hypothesis is whether the exogenous variable X helps explain Y. Normally, we would hope to reject the hypothesis $H_0: \beta_2 = 0$ ($Y_t = \beta_1 + \varepsilon_t$). We also consider how to test more general hypotheses of the form $H_0: \beta_i = \beta_i^0$.
1. $H_0: \beta_i = \beta_i^0$, where $\sigma^2_{\hat\beta_i}$ is known
$$Z = \frac{\hat\beta_i - \beta_i^0}{\sigma_{\hat\beta_i}} \sim N(0, 1)$$
The test statistic measures the number of standard deviations that $\hat\beta_i$ differs from the hypothesized value. Large values provide the basis for rejecting the null hypothesis. The critical value is 1.96 for a two-tailed test at the 5% level.
2. $H_0: \beta_i = \beta_i^0$, where $\sigma^2_{\hat\beta_i}$ is unknown
$$t = \frac{\hat\beta_i - \beta_i^0}{s_{\hat\beta_i}} = \frac{\hat\beta_i - \beta_i^0}{\sqrt{s^2_{\hat\beta_i}}} \sim t(n - 2)$$
Note that the structure of the t-statistic and the Z-statistic are the same, except that the standard error in the Z-statistic is replaced by its estimate $s_{\hat\beta_i}$, which is based on the unbiased variance estimator $s^2$. $s_{\hat\beta_i}$ would, in some sense, get closer to $\sigma_{\hat\beta_i}$ as the sample size increases. We see this as we compare critical values for the t and Z-statistics.
Relationship between t statistics and the standard normal
Critical values for two-tailed tests (upper-tail areas of .05, .025, and .01)
            90%       95%       98%
N(0,1)     1.645     1.960     2.326
t(1)       6.314    12.706    31.821
t(2)       2.920     4.303     6.965
t(3)       2.353     3.182     4.541
t(4)       2.132     2.776     3.747
t(10)      1.812     2.228     2.764
t(25)      1.708     2.060     2.485
t(∞)       1.645     1.960     2.326   = N(0,1)
Note that the critical values for a t-statistic are larger than for a standard normal, because the t density has thicker tails.
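The N(0,1) row of this table can be reproduced with the Python standard library; the sketch below is illustrative and not part of the original notes. It shows that the three tabulated values are the 0.95, 0.975, and 0.99 quantiles of the standard normal, i.e., they correspond to upper-tail areas of .05, .025, and .01.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Quantile p leaves an area of (1 - p) in the upper tail.
for p in (0.95, 0.975, 0.99):
    print(f"quantile {p}: critical value {z.inv_cdf(p):.3f}")

# For comparison: a two-tailed test at the 1% level uses the 0.995 quantile.
print(f"quantile 0.995: critical value {z.inv_cdf(0.995):.3f}")
```

The standard library has no Student t quantile function, so the t rows are not reproduced here.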
Confidence Intervals and t-statistics:
We note, from the following, the close relationship between the t-statistic just discussed and confidence intervals.
$$\Pr\left( -t_{\alpha/2} < \frac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}} < t_{\alpha/2} \right) = \Pr\left( \hat\beta_i - t_{\alpha/2}\, s_{\hat\beta_i} < \beta_i < \hat\beta_i + t_{\alpha/2}\, s_{\hat\beta_i} \right) = 1 - \alpha$$
Thus, the use of confidence intervals or "test statistics" are just two different ways of looking at the same problem.
3. Coefficient of Determination ($R^2$)
The coefficient of determination measures the fraction of the total sum of squares "explained" by the model. The following figure will provide motivation and definition of important terms.
[Figure: an observation and the sample regression line $\hat Y_t = \hat\beta_1 + \hat\beta_2 X_t$, showing the deviations $e_t = Y_t - \hat Y_t$ and $\hat Y_t - \bar Y$]
Define the total sum of squares (SST) to be
$$SST = \sum (Y_t - \bar Y)^2 = \sum (Y_t - \hat Y_t + \hat Y_t - \bar Y)^2$$
$$= \sum (Y_t - \hat Y_t)^2 + \sum (\hat Y_t - \bar Y)^2 + \{\text{cross products} = 0 \text{ if least squares is used}\}$$
$$= \sum e_t^2 + \sum (\hat Y_t - \bar Y)^2$$
$$= SSE + SSR,$$
where SSE and SSR, respectively, denote the sum of squared errors and sum of squares explained by the regression model.
• total sum of squares = sum of squared errors + sum of squares "explained" by regression model.
• SST = SSE + SSR
The coefficient of determination ($R^2$) is defined by
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST} = 1 - \frac{\sum e_t^2}{\sum (Y_t - \bar Y)^2}$$
= fraction of total sum of squares "explained" by the model.
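The decomposition SST = SSE + SSR can be verified numerically. The following Python sketch (standard library only; an illustration, not part of the original notes) uses the Anscombe_A data listed later in this chapter and also checks that the cross-product term vanishes under least squares.

```python
# Anscombe_A data as listed in the Stata output later in these notes.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
b2 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b1 = y_bar - b2 * x_bar

y_hat = [b1 + b2 * xi for xi in x]               # fitted values
e = [yi - yh for yi, yh in zip(y, y_hat)]        # residuals

sst = sum((yi - y_bar) ** 2 for yi in y)         # total sum of squares
sse = sum(ei ** 2 for ei in e)                   # sum of squared errors
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)     # explained sum of squares
cross = sum(ei * (yh - y_bar) for ei, yh in zip(e, y_hat))

r2 = ssr / sst
print("SST =", sst, " SSE + SSR =", sse + ssr)
print("cross-product term =", cross)
print("R^2 =", r2, " 1 - SSE/SST =", 1 - sse / sst)
```

The resulting $R^2$ can be compared with the R-squared entry in the `reg y x` output for this data set.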
Note that increasing the number of independent variables in the model will not change SST, but will decrease the SSE as long as the estimated coefficient of the new variable(s) is not equal to zero; hence, increase $R^2$. This is true even if the additional variables are not statistically significant. This has provided the motivation for considering the adjusted $R^2$ ($\bar R^2$) instead of $R^2$. The adjusted $\bar R^2$ is defined by
$$\bar R^2 = 1 - \frac{\sum e_t^2 / (n - K)}{\sum (Y_t - \bar Y)^2 / (n - 1)}$$
where K = the number of β's (coefficients) in the model. $\bar R^2$ will only increase with the addition of a new variable if the associated t-statistic is greater than 1 in absolute value. This result follows from the equation
$$\bar R^2_{New} - \bar R^2_{Old} = \left\{ \frac{(n - 1)\, SSE_{New}}{(n - K)(n - K - 1)\, SST} \right\} \left\{ \left( \frac{\hat\beta_{New\_var} - 0}{s_{\hat\beta_{New\_var}}} \right)^2 - 1 \right\}$$
where the last term in the product is $(t^2 - 1)$, K denotes the number of coefficients in the "old" regression model, and the "new" regression model includes K+1 coefficients.
4. Analysis of Variance (ANOV)
We have just decomposed the total sum of squares (SST) into two components:
• sum of squares error (SSE)
• sum of squares explained by regression (SSR).
This decomposition is commonly summarized in the form of an analysis of variance (ANOV) table.
Source of Variation    SS     d.f.     MSE
Model                  SSR    K - 1    SSR/(K - 1)
Error                  SSE    n - K    SSE/(n - K)
Total                  SST    n - 1
K = number of coefficients in model
where SS denotes the sum of squares and degrees of freedom, d.f., is the number of independent terms in SS. The mean squared error (MSE) is the corresponding sum of squares (SS) divided by the degrees of freedom.
Dividing the MSE for the model by the MSE for the error ($s^2$) gives an F-statistic:
$$F = \frac{SSR/(K - 1)}{SSE/(n - K)} = \left( \frac{n - K}{K - 1} \right) \left( \frac{R^2}{1 - R^2} \right) \sim F(K - 1, n - K)$$
The F-statistic can be used to test the hypothesis that all non-intercept (slope) coefficients are equal to zero.
In the case of a single exogenous variable,
$$Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$$
the F statistic
$$\left( \frac{n - 2}{1} \right) \left( \frac{R^2}{1 - R^2} \right) \sim F(1, n - 2)$$
tests the hypothesis $H_0: \beta_2 = 0$ (all non-intercept coefficients = 0).
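5. Sample Stata regression output (general format and a numerical example)

. sum lwage educ

    Variable |   Obs   Mean          Std. Dev.   Min              Max
-------------+---------------------------------------------------------------
       lwage |   N     sample mean   s_lwage     smallest value   largest value
        educ |   N     sample mean   s_educ      smallest value   largest value

. reg lwage educ

ANOVA (Analysis of Variance Table)
      Source |   SS    df         MS                 Number of obs        = N
-------------+-----------------------------------   F(#coef-1, N-#coef)  =
       Model |   SSR   #coef-1    SSR/(#coef-1)      Prob > F             = 0.0000
    Residual |   SSE   N-#coef    SSE/(N-#coef)      R-squared            = SSR/SST = 1 - SSE/SST
-------------+-----------------------------------   Adj R-squared        = 1 - [SSE/(N-#coef)]/[SST/(N-1)]
       Total |   SST   N-1        SST/(N-1)          Root MSE             = s
Note: $s^2$ = MSE = SSE/(N - #coef), and Root MSE = $s = \sqrt{s^2}$.

Regression results
       lwage |  Coef.            Std. Err.              t                                      P>|t|                            [95% Conf. Interval]
-------------+---------------------------------------------------------------------------------------------------------------------------------------
        educ |  $\hat\beta_2$    $s_{\hat\beta_2}$      $\hat\beta_2 / s_{\hat\beta_2}$        Probability of a larger |t| stat.   $\hat\beta_i \pm t_{\alpha/2}\, s_{\hat\beta_i}$
       _cons |  $\hat\beta_1$    $s_{\hat\beta_1}$      $\hat\beta_1 / s_{\hat\beta_1}$        Same as above

. sum lwage educ

    Variable |       Obs        Mean    Std. Dev.        Min        Max
-------------+----------------------------------------------------------
       lwage |       428    1.190173    .7231978  -2.054164   3.218876
        educ |       753    12.28685    2.280246          5         17

. reg lwage educ

      Source |       SS       df       MS              Number of obs =     428
-------------+------------------------------           F(  1,   426) =   56.93
       Model |  26.3264193     1  26.3264193           Prob > F      =  0.0000
    Residual |  197.001022   426  .462443713           R-squared     =  0.1179
-------------+------------------------------           Adj R-squared =  0.1158
       Total |  223.327441   427  .523015084           Root MSE      =  .68003

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1086487   .0143998     7.55   0.000     .0803451    .1369523
       _cons |  -.1851968   .1852259    -1.00   0.318    -.5492673    .1788736
------------------------------------------------------------------------------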
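The summary statistics in the numerical example can be recomputed from the sums of squares using the formulas of this section. The Python sketch below (standard library only; an illustration, not part of the original notes) takes the SS, d.f., coefficient, and standard error values directly from the output above; the variable names are hypothetical.

```python
import math

# Values copied from the "reg lwage educ" output above.
ssr, sse, sst = 26.3264193, 197.001022, 223.327441
n, k = 428, 2                       # observations, number of coefficients
coef_educ, se_educ = 0.1086487, 0.0143998

msr = ssr / (k - 1)                 # mean square for the model
mse = sse / (n - k)                 # mean square error, s^2
f_stat = msr / mse                  # F(K-1, n-K)
r2 = ssr / sst
adj_r2 = 1 - (sse / (n - k)) / (sst / (n - 1))
root_mse = math.sqrt(mse)           # s
t_educ = coef_educ / se_educ        # t statistic for H0: beta_2 = 0

print(f"F = {f_stat:.2f}, R^2 = {r2:.4f}, adj R^2 = {adj_r2:.4f}")
print(f"Root MSE = {root_mse:.5f}, t(educ) = {t_educ:.2f}")
# With a single regressor, the F statistic equals the squared t statistic.
print(f"t^2 = {t_educ ** 2:.2f}")
```

The last line illustrates that, with one exogenous variable, the F test of $H_0: \beta_2 = 0$ and the two-tailed t test are equivalent.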
F. FORECASTS
If we have determined that our model has significant explanatory power, we may want to use it
to obtain predictions. We turn to constructing predictions or forecasts and confidence intervals
for the (1) regression line (or mean Y corresponding to a given X) and (2) individual value of
Y corresponding to an arbitrary value of X.
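Sample: $(X_t, Y_t)$, t = 1, 2, . . ., n
Estimators: $\hat\beta_1$, $\hat\beta_2$
Sample Regression Line: $\hat Y_t = \hat\beta_1 + \hat\beta_2 X_t$
Uncertainty about $\hat\beta_1$, $\hat\beta_2$ implies uncertainty about $\hat Y_t$.
$$E(\hat Y_t) = \beta_1 + \beta_2 X_t$$
$$Var(\hat Y_t) = \sigma^2_{\hat\beta_1} + X_t^2 \sigma^2_{\hat\beta_2} + 2 X_t \sigma_{\hat\beta_1 \hat\beta_2}$$
$$= \frac{\sigma^2}{n} + \bar X^2 \sigma^2_{\hat\beta_2} + X_t^2 \sigma^2_{\hat\beta_2} + 2 X_t (-\bar X) \sigma^2_{\hat\beta_2}$$
$$= \frac{\sigma^2}{n} + (X_t - \bar X)^2 \sigma^2_{\hat\beta_2}$$
$$= \sigma^2_{\hat Y_t}$$
Therefore,
$$\hat Y_t \sim N(\beta_1 + \beta_2 X_t; \; \sigma^2_{\hat Y_t}).$$
$\sigma^2_{\hat Y}$ can be estimated by
$$s^2_{\hat Y} = \frac{s^2}{n} + (X_t - \bar X)^2 s^2_{\hat\beta_2}$$
From these results we can construct confidence intervals for the regression line and for individual values of Y.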
Confidence Intervals for $\beta_1 + \beta_2 X_t$: (regression line or E(Y|X))
$$\hat Y_t \pm t_c\, s_{\hat Y}, \quad \text{i.e.,} \quad \hat\beta_1 + \hat\beta_2 X_t \pm t_c\, s_{\hat Y},$$
where $t_c = t_{\alpha/2}(n - 2)$.
The forecasting problem is more often concerned with finding confidence intervals for the actual value of $Y_t$ (not $E(Y_t \mid X_t)$) rather than the "mean" or expected value of $Y_t$ corresponding to an arbitrary value of $X_t$. To do this we consider an analysis of the forecast error (FE):
$$FE = Y_t - \hat Y_t$$
$$E(FE) = 0$$
$$\sigma^2_{FE} = Var(FE \mid X) = Var(Y_t) + Var(\hat Y_t) = \sigma^2 + \sigma^2_{\hat Y}$$
where $\sigma^2$ is due to the error term and $\sigma^2_{\hat Y}$ is due to uncertainty about the population regression line, with $\sigma^2_{FE}$ being estimated by $s^2_{FE} = s^2_{\hat Y} + s^2$.
Note that $s_{\hat Y}$ and $s_{FE}$ are functions of $(X_t - \bar X)^2$, i.e., the further X is from the mean value, the larger $s_{\hat Y}$ and $s_{FE}$. This can also be seen in the following figure.
Confidence Intervals (CI) for actual $Y_t$: (not $\beta_1 + \beta_2 X_t$)
$$\hat Y_t \pm t_c\, s_{FE}$$
where
$$s_{FE} = \sqrt{s^2_{FE}}$$
$$s^2_{FE} = s^2 + s^2_{\hat Y} = s^2 + \frac{s^2}{n} + (X_t - \bar X)^2 s^2_{\hat\beta_2}$$
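The two kinds of intervals can be computed side by side. The Python sketch below (standard library only; an illustration, not part of the original notes) uses the Anscombe_A data listed later in this chapter. The critical value 2.262 is the tabulated $t_{.025}(9)$ and is hardcoded as an assumption, since the standard library has no t quantile function; the function name `intervals` is hypothetical.

```python
import math

# Anscombe_A data as listed in the Stata output later in these notes.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b2 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b1 = y_bar - b2 * x_bar
s2 = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
s2_b2 = s2 / sxx                    # estimated variance of the slope

T_CRIT = 2.262  # assumed: t_{.025}(n-2 = 9) from a standard t table

def intervals(x0):
    """Return (y_hat, CI for the regression line, CI for actual Y) at x0."""
    y_hat = b1 + b2 * x0
    s_yhat = math.sqrt(s2 / n + (x0 - x_bar) ** 2 * s2_b2)
    s_fe = math.sqrt(s2 + s_yhat ** 2)
    ci_line = (y_hat - T_CRIT * s_yhat, y_hat + T_CRIT * s_yhat)
    ci_actual = (y_hat - T_CRIT * s_fe, y_hat + T_CRIT * s_fe)
    return y_hat, ci_line, ci_actual

for x0 in (9, 14):
    y_hat, ci_line, ci_actual = intervals(x0)
    print(f"X = {x0}: Y_hat = {y_hat:.3f}")
    print(f"  CI for regression line: ({ci_line[0]:.3f}, {ci_line[1]:.3f})")
    print(f"  CI for actual Y:        ({ci_actual[0]:.3f}, {ci_actual[1]:.3f})")
```

Both intervals are narrowest at $X = \bar X$ and widen as X moves away from the mean, and the interval for the actual Y is always wider than the interval for the regression line.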
The two curved lines closest to the sample regression line correspond to CI's for the population regression line, and the two curved lines furthest from the sample regression line are the CI's for the actual value of Y corresponding to different values of X.
[Figure: sample regression line with two pairs of curved bands. Inner intervals: C.I. for $\beta_1 + \beta_2 X_t$. Outer intervals: C.I. for $Y_t$, given by $\hat Y_t \pm t\, s_{FE}$.]
G. ESTIMATION USING Stata
These calculations can be very tedious for even moderate sample sizes. Fortunately, calculators and many computer programs make this part of econometrics relatively painless, even exciting. Thus, we will be able to focus on understanding the statistical procedures, the validity of the assumptions, and interpreting the statistical output. We will outline the commands used in least squares estimation using the program Stata. Extensive manuals and abbreviated information describing additional procedures and options are available for Stata and other programs such as SAS, EVIEWS, Gretl, R, SHAZAM and TSP. Gretl is quite user friendly and it is free.
Stata
The data files can be created with Microsoft Excel (saving the file as a csv file). Stata will automatically read in any column headings the data have. With a file named FUN388.CSV, we can easily perform least squares estimation of the relationship
$$Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$$
using the commands:
. insheet using "C:\FUN388.CSV", clear
      Reads the data into Stata. This can also be done by opening the data editor and manually pasting the data.
. sum Y X
      Gives statistical characteristics of Y and X.
. plot Y X
      Plots Y on the vertical axis, X on the horizontal axis.
. reg Y X
      Uses OLS to estimate the given model.
To view additional residual diagnostics, use the following commands:
After the ". reg Y X" command, type
. predict error, resid
      (the variable "error" now contains the estimated residuals)
1. To test for normality of the errors, type
. sktest error
      Tests for normality using a skewness/kurtosis test.
OR
. swilk error
      Tests for normality using a Shapiro-Wilk test.
OR
. sfrancia error
      Tests for normality using a Shapiro-Francia test.
OR
. qnorm error
      Displays a plot of error against quantiles of the normal distribution.
OR
. findit jb
      The "findit" command is useful in Stata to find commands that are not yet installed. "findit jb" will find the command for a Jarque-Bera test for normality. After installing the command, type "jb error" to run a Jarque-Bera test.
2. To test for heteroskedasticity, the following post-regression commands are useful:
. whitetst
      Tests for heteroskedasticity using White's test.
. estat hettest varnames
      Tests for heteroskedasticity using a Breusch-Pagan and Cook and Weisberg test.
. estat hettest, rhs iid or fstat
      Uses all right-hand-side variables and a chi-square or F test.
. estat imtest, preserve white
      Tests for heteroskedasticity (using White's test) and for skewness and kurtosis.
More postestimation commands are explained in the Stata help file titled "regress postestimation."
3. To test for autocorrelation (serial independence or randomness) of the error terms you must first declare your data to be time series with the command
. tsset timevar
      timevar is the name of the time variable in your dataset.
You can then test for autocorrelation in your time series data with the commands
. estat dwatson
      Tests for first-order autocorrelation.
. estat bgodfrey
      Breusch-Godfrey test for higher-order serial correlation.
. estat archlm
      Tests for ARCH effects in the residuals.
. runtest varname
      varname is the name of the variable being tested for random order.
4. Some other options:
a. To calculate the sum of absolute errors (SAE), type
. egen SAE = sum(abs(error))
"SAE" will appear as a constant column in the data editor.
b. To view information criteria, including the log-likelihood value and the Akaike and Schwarz Bayesian information criteria, type
. estat ic
c. To display the variance covariance matrix, type
. estat vce
d. To display the correlation matrix, type
. estat vce, corr
e. Help files: use the Help menu or type "help keyword"
Sample Stata output corresponding to the Anscombe_A data set in problem 1.2 (#4)

. infile x y using "C:\anscombe_a.txt", clear
(11 observations read)

. list y x

     +------------+
     |     y    x |
     |------------|
  1. |  8.04   10 |
  2. |  6.95    8 |
  3. |  7.58   13 |
  4. |  8.81    9 |
  5. |  8.33   11 |
     |------------|
  6. |  9.96   14 |
  7. |  7.24    6 |
  8. |  4.26    4 |
  9. | 10.84   12 |
 10. |  4.82    7 |
     |------------|
 11. |  5.68    5 |
     +------------+

. plot y x
[Text-mode scatter plot: y on the vertical axis (4.26 to 10.84) against x on the horizontal axis (4 to 14)]
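. sum y x

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
           y |        11    7.500909    2.031568       4.26      10.84
           x |        11           9    3.316625          4         14

. reg y x

      Source |       SS       df       MS              Number of obs =      11
-------------+------------------------------           F(  1,     9) =   17.99
       Model |  27.5100011     1  27.5100011           Prob > F      =  0.0022
    Residual |  13.7626904     9  1.52918783           R-squared     =  0.6665
-------------+------------------------------           Adj R-squared =  0.6295
       Total |  41.2726916    10  4.12726916           Root MSE      =  1.2366

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5000909   .1179055     4.24   0.002     .2333701    .7668117
       _cons |   3.000091   1.124747     2.67   0.026     .4557369    5.544445
------------------------------------------------------------------------------

. whitetst
White's general test statistic : .6998421 Chi-sq( 2) P-value = .7047

. estat hettest
Breusch-Pagan / Cook-Weisberg test for heteroskedasticity
Ho: Constant variance
Variables: fitted values of y
chi2(1) = 0.41
Prob > chi2 = 0.5232

. estat ic

-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df        AIC         BIC
-------------+---------------------------------------------------------------
           . |     11   -22.88101   -16.84069      2   37.68137    38.47717
-----------------------------------------------------------------------------
*ll(model) corresponds to the optimized log-likelihood value for the specified model, whereas ll(null) is obtained by estimating the model without any explanatory variables. Twice the difference of the log-likelihood values is distributed as a chi-square with df equal to the number of explanatory variables.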
H. FUNCTIONAL FORMS
In many applications the relationships between variables are not linear. A simple test for the presence of nonlinear relationships is the Regression Specification Error Test (RESET, Ramsey, 1969). This test can be performed as follows:
$H_o$: $y_t = X_t \beta + \varepsilon_t$ (estimate a linear model)
$H_a$: $y_t = X_t \beta + \delta_1 \hat y_t^2 + \delta_2 \hat y_t^3 + \varepsilon_t$ (the $\hat y$'s denote OLS predicted values)
An F test of the hypothesis that both delta coefficients are simultaneously equal to zero is approximately distributed as an F(2, N-K). Alternatively, nonlinear functions of x can be added to the linear terms and the collective explanatory power of the nonlinear terms can be tested. Box-Cox transformations provide another approach.
The linear regression model just considered is more general than might first appear. Many nonlinear models can be transformed so that "linear techniques" can be used. We can consider two types of nonlinear models:
o transformable types: estimable by least squares
o nontransformable: use nonlinear optimization algorithms
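1. Transformable Models
a. Log-Log or Double Log Model
$$Y_t = A X_t^{\beta} \varepsilon_t$$
The slope and elasticity are given by:
$$\frac{dY}{dX} = \beta A X^{\beta - 1} = \beta \frac{Y}{X}$$
$$\eta_{Y \cdot X} = \frac{dY}{dX} \frac{X}{Y} = \beta$$
[Figure: shapes of the log-log model for β < 0, β = 0, 0 < β < 1, β = 1, and β > 1]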
This model can be estimated using least squares by taking the logarithm of the model to yield
$$\ln Y_t = \ln A + \beta \ln X_t + \ln \varepsilon_t = \beta_1 + \beta_2 \ln X_t + \ln \varepsilon_t$$
where $\beta_1 = \ln A$ and $\beta_2 = \beta$. Regressing $\ln(Y_t)$ on $\ln(X_t)$ gives estimates for $\beta_1$ and $\beta_2$; hence $\hat A = e^{\hat\beta_1}$ and $\hat\beta = \hat\beta_2$.
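The log transformation can be illustrated with a small Python sketch (standard library only; not part of the original notes). The data are hypothetical: they are generated from $Y = A X^{\beta}$ with $A = 2$ and $\beta = 0.7$ and no disturbance, so that regressing $\ln Y$ on $\ln X$ should recover both parameters up to rounding error.

```python
import math

# Hypothetical parameter values and X values, chosen only for illustration.
A_TRUE, BETA_TRUE = 2.0, 0.7
X = [1.0, 2.0, 4.0, 8.0, 16.0]
Y = [A_TRUE * x ** BETA_TRUE for x in X]   # no disturbance term

# Transform to the linear model: ln Y = beta_1 + beta_2 ln X.
ln_x = [math.log(x) for x in X]
ln_y = [math.log(v) for v in Y]

n = len(X)
lx_bar = sum(ln_x) / n
ly_bar = sum(ln_y) / n
b2 = (sum((a - lx_bar) * (b - ly_bar) for a, b in zip(ln_x, ln_y))
      / sum((a - lx_bar) ** 2 for a in ln_x))
b1 = ly_bar - b2 * lx_bar

a_hat = math.exp(b1)     # A_hat = exp(beta_1_hat)
beta_hat = b2            # beta_hat = beta_2_hat (the elasticity)
print("A_hat =", a_hat, " beta_hat =", beta_hat)
```

With a multiplicative disturbance in the data, the same regression would give estimates rather than exact values, and the slope would still be interpreted as the elasticity of Y with respect to X.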
b. Semi Log Models
(1) $Y_t = A B^{X_t} \varepsilon_t$
The slope and elasticities are given by
$$\frac{dY}{dX} = Y \ln B; \qquad \eta_{Y \cdot X} = X \ln B$$
Estimation: Least squares can again be applied to the logarithmic transformation of the original model.
$$\ln Y_t = \ln A + (\ln B) X_t + \ln \varepsilon_t = \beta_1 + \beta_2 X_t + \ln \varepsilon_t.$$
Hence $\hat A = e^{\hat\beta_1}$ and $\hat B = e^{\hat\beta_2}$.
(2) $X_t = A B^{Y_t} \varepsilon_t$
The slope and elasticity are given by
$$\frac{dY}{dX} = \frac{1}{X \ln B} \quad \text{and} \quad \eta_{Y \cdot X} = 1/(Y \ln B).$$
[Figure: panels for 0 < B < 1, B = 1, and B > 1]
Estimation: Applying least squares to
$$Y_t = -\ln A / \ln B + (1/\ln B) \ln X_t - \left( \frac{1}{\ln B} \right) \ln \varepsilon_t = \beta_1 + \beta_2 \ln X_t + \eta_t$$
which yields $\hat B = e^{1/\hat\beta_2}$ and $\hat A = e^{-\hat\beta_1 / \hat\beta_2}$.
c. Reciprocal Transformations
$$Y_t = A + B/X_t + \varepsilon_t$$
The slope and elasticity are:
$$\frac{dY}{dX} = -B/X^2 \quad \text{and} \quad \eta_{Y \cdot X} = -B/XY.$$
[Figure: panels for β > 0 and β < 0]
Estimation: Let Z = 1/X, then estimate
$$Y_t = A + B Z_t + \varepsilon_t = \beta_1 + \beta_2 Z_t + \varepsilon_t$$
and $\hat A = \hat\beta_1$ and $\hat B = \hat\beta_2$.
d. Logarithmic Reciprocal Transformations
$$Y_t = e^{A - B/X_t + \varepsilon_t}$$
$$\frac{dY}{dX} = BY/X^2; \qquad \eta_{Y \cdot X} = B/X$$
Estimation: This model can be estimated using least squares on
$$\ln Y_t = A - B/X_t + \varepsilon_t = \beta_1 + \beta_2 (1/X_t) + \varepsilon_t$$
where $\hat A = \hat\beta_1$ and $\hat B = -\hat\beta_2$.
Application: market share, where Y approaches an asymptotic level as X increases.
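e. Polynomials
$$y = \beta_1 + \beta_2 x + \beta_3 x^2 + \beta_4 x^3$$
[Figure: three panels showing the linear case ($\beta_3 = \beta_4 = 0$), the quadratic case ($\beta_4 = 0$), and the cubic case ($\beta_4 \neq 0$)]
Cost Function:
$$TC(q) = \beta_1 + \beta_2 q + \beta_3 q^2 + \beta_4 q^3$$
$$MC(q) = \beta_2 + 2 \beta_3 q + 3 \beta_4 q^2$$
• the desired shape requires $\beta_4 > 0$
• a minimum for positive q requires
$MC'(q) = 2\beta_3 + 6\beta_4 q = 0$
$q = -2\beta_3 / 6\beta_4 > 0$
$\beta_3 < 0$
• minimum MC > 0 requires
$4\beta_3^2 - 4\beta_2 \cdot 3\beta_4 < 0$
$\beta_3^2 < 3\beta_2 \beta_4$
$\beta_2 > 0$
Restrictions (Summary):
$\beta_1 \geq 0$, $\beta_2 > 0$, $\beta_3 < 0$, $\beta_4 > 0$
$\beta_3^2 < 3\beta_2 \beta_4$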
II
38
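The restrictions on the cubic cost function can be checked numerically. The sketch below is illustrative only: the function name `check_cubic_cost` and the coefficient values are hypothetical, chosen so that one set satisfies the restrictions and one violates $\beta_3^2 < 3\beta_2\beta_4$.

```python
def check_cubic_cost(b1, b2, b3, b4):
    """Check the sign restrictions and locate the minimum of marginal cost.

    Returns (ok, q_star, mc_min); q_star and mc_min are None when b4 <= 0,
    because MC(q) then has no interior minimum.
    """
    ok = b1 >= 0 and b2 > 0 and b3 < 0 and b4 > 0 and b3 ** 2 < 3 * b2 * b4
    if b4 <= 0:
        return ok, None, None
    q_star = -b3 / (3 * b4)                          # solves MC'(q) = 2*b3 + 6*b4*q = 0
    mc_min = b2 + 2 * b3 * q_star + 3 * b4 * q_star ** 2
    return ok, q_star, mc_min

# Hypothetical coefficients that satisfy every restriction.
ok_good, q_good, mc_good = check_cubic_cost(10.0, 5.0, -1.0, 0.1)
# Hypothetical coefficients that violate b3^2 < 3*b2*b4, so minimum MC is negative.
ok_bad, q_bad, mc_bad = check_cubic_cost(10.0, 2.0, -1.0, 0.1)
print(ok_good, q_good, mc_good)
print(ok_bad, q_bad, mc_bad)
```

In the second case marginal cost dips below zero at its minimum, which is exactly what the restriction $\beta_3^2 < 3\beta_2\beta_4$ rules out.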
f. Production Functions
(1) Cobb-Douglas (CD)
$Y_t = e^{\beta_1 + \beta_2 t} L_t^{\beta_3} K_t^{\beta_4} \varepsilon_t$
$\ln Y_t = \beta_1 + \beta_2 t + \beta_3 \ln L_t + \beta_4 \ln K_t + \ln \varepsilon_t$
Production Characteristics:
$\beta_3 + \beta_4 = 1$: constant returns to scale
$\beta_3$ = percent of total revenue paid to labor
$\sigma = \frac{\%\Delta (K/L)}{\%\Delta (W_L/W_K)} = 1$: elasticity of substitution
(2) Translog Transformation
$\ln Y_t = \beta_1 + \beta_2 \ln L_t + \beta_3 \ln K_t + \beta_4 (\ln L_t)^2 + \beta_5 (\ln K_t)^2 + \beta_6 (\ln L_t)(\ln K_t)$
Note that this model includes the Cobb-Douglas as a special case ($\beta_4 = \beta_5 = \beta_6 = 0$).
(3) Constant Elasticity of Substitution (CES)
$Y_t = e^{\beta_1 + \beta_2 t} \left[ \delta L_t^{\rho} + (1 - \delta) K_t^{\rho} \right]^{M/\rho} \varepsilon_t$, $\sigma = \frac{1}{1 - \rho}$
M = returns to scale.
Cost functions can be estimated from estimated production functions.
Estimation: (?)
$\ln Y_t = \beta_1 + \beta_2 t + (M/\rho) \ln[\delta L_t^{\rho} + (1 - \delta) K_t^{\rho}] + \ln \varepsilon_t$
This function is a "nontransformable" type.
2. "Nontransformable" Models
Problem: Estimate the parameters in
$Y_t = F(\beta_1, \beta_2, \ldots, \beta_s; X_{1t}, \ldots, X_{Kt}) + \varepsilon_t$.
Two possible approaches include using nonlinear optimization programs or
approximations.
(a) Nonlinear Optimization Approach
(1) Define the objective function
Min SSE or
Maximum Likelihood
(2) Specify an initial guess for parameters.
(3) "Press go."
Start at initial value and iterate to a solution.
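The three steps above can be sketched for a simple nontransformable model. Everything here is an assumption made for illustration: the model $Y = a e^{bX} + \varepsilon$, the made-up data (generated without error from a = 2, b = 0.3), the starting bracket [0, 1] for b, and the use of a golden-section search in place of whatever optimizer a statistics package would apply.

```python
import math

def concentrated_sse(b, x, y):
    """SSE of y = a*exp(b*x) with a replaced by its least squares value for this b."""
    g = [math.exp(b * xi) for xi in x]
    a = sum(yi * gi for yi, gi in zip(y, g)) / sum(gi * gi for gi in g)
    sse = sum((yi - a * gi) ** 2 for yi, gi in zip(y, g))
    return sse, a

def golden_section(f, lo, hi, iters=80):
    """Minimize a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    for _ in range(iters):
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if f(c) < f(d):
            hi = d          # the minimum lies in [lo, d]
        else:
            lo = c          # the minimum lies in [c, hi]
    return (lo + hi) / 2

# Step 1: objective function (minimum SSE) on hypothetical error-free data.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 * math.exp(0.3 * xi) for xi in x]
objective = lambda b: concentrated_sse(b, x, y)[0]

# Step 2: initial bracket for b. Step 3: iterate to a solution.
b_hat = golden_section(objective, 0.0, 1.0)
sse_min, a_hat = concentrated_sse(b_hat, x, y)
print(a_hat, b_hat, sse_min)
```

Concentrating out the linear parameter a leaves a one-dimensional search over b, which keeps the sketch short; models with several nonlinear parameters need a multivariate optimizer.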
(b) Examples:
(1) Logistic Model
[ ]
ε
γ β
α
δ γ
t
X +
t
e
+ +
=
Y
t
Estimation:
(
¸
(
¸
δ γ


¹

\

β
α
∑
X
  
Y
ln = SSE
t
t
2
= Σ(ln ε
t
)
2
(2) Constant elasticity of substitution (CES) production function
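(3) Box-Cox. Define
$Y^{(\lambda)} = \frac{Y^{\lambda} - 1}{\lambda}$
Consider $Y^{(\lambda)} = \beta_1 + \beta_2 X^{(\lambda)} + \varepsilon_t$.
λ = 0: $\ln Y = \beta_1 + \beta_2 \ln X + \varepsilon_t$
λ = 1: $Y - 1 = \beta_1 + \beta_2 (X - 1) + \varepsilon$
or $Y = 1 + \beta_1 - \beta_2 + \beta_2 X + \varepsilon$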
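A short sketch of the transformation itself may help show why λ = 0 corresponds to the logarithm: $(Y^{\lambda} - 1)/\lambda$ approaches $\ln Y$ as λ approaches 0. The function name `box_cox` is hypothetical.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform (y**lam - 1)/lam, with the limiting value ln(y) at lam = 0."""
    if y <= 0:
        raise ValueError("the Box-Cox transform requires y > 0")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

linear_case = box_cox(2.0, 1.0)      # lam = 1 gives y - 1
log_case = box_cox(2.0, 0.0)         # lam = 0 gives ln(y)
near_zero = box_cox(2.0, 1e-7)       # a small lam is already close to ln(y)
print(linear_case, log_case, near_zero)
```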
Stata will estimate "Box-Cox" models with the command format
boxcox depvar [indepvars] [, options]
Options (list from help file "boxcox" in Stata).
model(lhsonly) applies the Box-Cox transform to depvar only.
model(lhsonly) is the default.
model(rhsonly) applies the transform to the indepvars only.
model(lambda) applies the transform to both depvar and indepvars, and
they are transformed by the same parameter.
model(theta) applies the transform to both depvar and indepvars, but this
time, each side is transformed by a separate parameter.
notrans(varlist) specifies that the variables in varlist be included as
nontransformed independent variables.
I. PROBLEM SETS
Problem Set 2.1
Simple Linear Regression
Theory
1. Let kids denote the number of children ever born to a woman, and let educ denote the years of
education for the woman. A simple model relating fertility to years of education is
$kids = \beta_0 + \beta_1 \, educ + u$
where u is the unobserved error.
a. All of the factors besides a woman’s education that affect fertility are lumped into the
error term, u. What kinds of factors are contained in u? Which of these are likely to be
correlated with level of education, which are not?
b. Will a simple regression analysis uncover the ceteris paribus effect of education on
fertility? Explain.
(Wooldridge 2.1)
2. Demonstrate that
$\hat{\beta}_2 = \frac{\sum (X_t - \bar{X})(Y_t - \bar{Y})}{\sum (X_t - \bar{X})^2} = \frac{Covariance(X, Y)}{Variance(X)}$
is equivalent to
a. $\frac{\sum X_t Y_t - n \bar{X} \bar{Y}}{\sum X_t^2 - n \bar{X}^2}$
b. $\frac{\sum (X_t - \bar{X}) Y_t}{\sum (X_t - \bar{X})^2}$
(Hints: Expand the numerator and denominator and remember that $\sum X_t = n\bar{X}$.)
c. If you only have two observations (n=2), $(X_1, Y_1)$ and $(X_2, Y_2)$, demonstrate that the
equation for $\hat{\beta}_2$ can be simplified to $\frac{rise}{run} = \frac{Y_2 - Y_1}{X_2 - X_1}$.
(JM IIB, JM Math)
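A numerical check is not a proof, but it can help catch algebra mistakes while working this problem. The sketch below evaluates the three expressions for the slope on a small made-up data set and the rise-over-run form on two made-up points; all function names are hypothetical.

```python
def slope_deviation_form(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return num / sum((xi - xbar) ** 2 for xi in x)

def slope_form_a(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum(xi * yi for xi, yi in zip(x, y)) - n * xbar * ybar
    return num / (sum(xi ** 2 for xi in x) - n * xbar ** 2)

def slope_form_b(x, y):
    n = len(x)
    xbar = sum(x) / n
    num = sum((xi - xbar) * yi for xi, yi in zip(x, y))
    return num / sum((xi - xbar) ** 2 for xi in x)

# Hypothetical data, used only to compare the three formulas.
X = [1.0, 2.0, 4.0, 7.0]
Y = [2.0, 3.0, 7.0, 8.0]
b_dev = slope_deviation_form(X, Y)
b_a = slope_form_a(X, Y)
b_b = slope_form_b(X, Y)

# Two observations: the slope is rise over run.
b_two = slope_deviation_form([1.0, 3.0], [2.0, 8.0])
print(b_dev, b_a, b_b, b_two)
```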
3. Demonstrate that the sample regression line obtained from least squares with an estimated
intercept passes through $(\bar{X}, \bar{Y})$. (Hint: $\hat{Y} = \hat{\beta}_1 + \hat{\beta}_2 X$, substitute $X = \bar{X}$, and simplify)
(JM IIB)
4. Consider the model
$Y_t = \beta X_t + \varepsilon_t$, where
A.1 $\varepsilon_t$ distributed normally
A.2 $E(\varepsilon_t) = 0 \ \forall t$
A.3 $Var(\varepsilon_t) = \sigma^2 \ \forall t$
A.4 $Cov(\varepsilon_t, \varepsilon_s) = 0 \ \forall t, s \ (t \neq s)$
A.5 $X_t$ nonstochastic.
a) Find the least squares estimator of β.
Hint: $SSE = \sum \varepsilon_t^2 = \sum (Y_t - \beta X_t)^2$.
b) Find the MLE of β and $\sigma^2$.
Hint: $l(Y; \beta, \sigma^2) = \sum \ln f(Y_t; \beta, \sigma^2)$
$= -\frac{n}{2} \ln(\sigma^2) - \frac{n}{2} \ln(2\pi) - \sum (Y_t - \beta X_t)^2 / (2\sigma^2)$
c) Will the sample regression line ($\hat{Y}_t = \hat{\beta} X_t$) obtained in (a) or (b) pass through $(\bar{X}, \bar{Y})$?
Explain.
(JM IIB)
Applied
5. The data set in CEOSAL2.RAW contains information on chief executive officers for U.S.
corporations. The variable salary is annual compensation, in thousands of dollars, and ceoten
is prior number of years as company CEO.
i) Find the average salary and average tenure in the sample.
ii) How many CEO’s are in their first year as CEO (that is, ceoten = 0)?
iii) Estimate the simple regression model
$\log(salary) = \beta_0 + \beta_1 \, ceoten + \varepsilon$
and report your results in the usual form*. What is the predicted percentage increase in
salary given one more year as CEO?
(Wooldridge C.2.2)
*The usual form is to write out the equation with the estimated betas and their standard errors
underneath in parentheses. For example, if I were estimating
$Y_t = \alpha + \beta X_t + \varepsilon_t$
and estimated α to be .543 with a standard error of .001 and β to be 1.43 with a standard error of
1.01, then I would report my results in the "usual form" as follows:
$Y_t$ = .543 + 1.43*$X_t$     $R^2$ = .955
      (.001)   (1.01)
N = 123.
** We will review the required Stata commands in class/TA sessions.
Problem Set 2.2
Simple Linear Regression
Theory
Consider the model
$Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$.
1. BACKGROUND: The purpose of this problem is to show that, using OLS, the total sum of
squares can be partitioned into two parts as follows:
$\sum_{t=1}^{n} (Y_t - \bar{Y})^2 = \sum_{t=1}^{n} (Y_t - \hat{Y}_t + \hat{Y}_t - \bar{Y})^2$
$= \sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2 + 2 \sum_{t=1}^{n} (Y_t - \hat{Y}_t)(\hat{Y}_t - \bar{Y}) + \sum_{t=1}^{n} (\hat{Y}_t - \bar{Y})^2$
where the terms $\sum_{t=1}^{n} (Y_t - \bar{Y})^2$, $\sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2$, $\sum_{t=1}^{n} (\hat{Y}_t - \bar{Y})^2$
are referred to as the total sum of
squares (SST), sum of squares error (SSE), sum of squares "explained by the regression"
(SSR), respectively. This notation differs from that used by Wooldridge, but conforms with
notation used in a number of other econometrics texts.
QUESTION: Explain why the cross product term
$\sum_{t=1}^{n} (Y_t - \hat{Y}_t)(\hat{Y}_t - \bar{Y}) = \sum_{t=1}^{n} e_t (\hat{Y}_t - \bar{Y}) = \sum_{t=1}^{n} e_t (\hat{\beta}_1 + \hat{\beta}_2 X_t - \bar{Y}) = 0$
when least squares estimators are used. (Remember the first order conditions or normal
equations.)
(JM IIB)
Applied
2. For the population of firms in the chemical industry, let rd denote annual expenditures on
research and development, and let sales denote annual sales (both are in millions of
dollars).
a. Write down a model (not an estimated equation) that implies a constant elasticity
between rd and sales. Which parameter is the elasticity? (Hint: what functional
form should be used?)
b. Now estimate the model using the data in RDCHEM.RAW. Write out the estimated
equation in the usual form*. What is the estimated elasticity of rd with respect to
sales? Explain in words what this elasticity means.
(Wooldridge C 2.5)
* report the estimated parameters, standard errors, and $R^2$
3. Consider the following four sets of data (see footnote 1)
Data Set            A               B               C               D
Variable          X      Y        X      Y        X      Y        X      Y
Obs. No.  1    10.0   8.04     10.0   9.14     10.0   7.46      8.0   6.58
          2     8.0   6.95      8.0   8.14      8.0   6.77      8.0   5.76
          3    13.0   7.58     13.0   8.74     13.0  12.74      8.0   7.71
          4     9.0   8.81      9.0   8.77      9.0   7.11      8.0   8.84
          5    11.0   8.33     11.0   9.26     11.0   7.81      8.0   8.47
          6    14.0   9.96     14.0   8.10     14.0   8.84      8.0   7.04
          7     6.0   7.24      6.0   6.13      6.0   6.08      8.0   5.25
          8     4.0   4.26      4.0   3.10      4.0   5.39     19.0  12.50
          9    12.0  10.84     12.0   9.13     12.0   8.15      8.0   5.56
         10     7.0   4.82      7.0   7.26      7.0   6.42      8.0   7.91
         11     5.0   5.68      5.0   4.74      5.0   5.73      8.0   6.89
a. For each of the data sets estimate the relationship
$Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$, using least squares.
b. Compare and explain the four sets of results. (Hint: plot the data.)
c. In each of the four cases obtain a prediction of the value of Y
t
corresponding to a value of X = 20.
Which of the forecasts would you feel most comfortable with? Explain.
d. Based upon these examples comment on the following widely held notions.
i) "Numerical calculations are exact, but graphs are rough."
ii) "For any particular kind of statistical data there is just one set of calculations constituting a
correct statistical analysis."
iii) "Performing intricate calculations is rigorous, whereas actually looking at the data is cheating."
(JM II)
Footnote 1. Reference: Anscombe, F. J., "Graphs in Statistical Analysis," The American Statistician, Vol. 27 (1973), pp. 17-21.
4. The following Stata printout corresponds to the first Anscombe data set.
a. From the printout, determine the values of the following:
$\bar{X} =$
$s^2 =$
$s^2_{\hat{\beta}_2} =$
b. Calculate the predicted value of Y and the variance of the forecast error
corresponding to x=20.
(1) $\hat{Y} =$
(2) $s^2 + s^2_{\hat{Y}} =$
(3) $s^2_{\hat{Y}} =$
Hint: Recall that $s^2_{\hat{Y}} = \frac{s^2}{n} + (20 - \bar{X})^2 s^2_{\hat{\beta}_2}$ and $s^2_{FE} = s^2 + s^2_{\hat{Y}}$
c. Calculate 95% confidence intervals for the actual value of Y corresponding to X=20.
d. Calculate 95% confidence intervals for the population regression line corresponding
to X=20.
Yet another hint: the sample and population regression lines, respectively,
are defined by $\hat{Y}_t = \hat{\beta}_1 + \hat{\beta}_2 X_t$ and $\beta_1 + \beta_2 X_t$, so use $s_{\hat{Y}}$ for part (d) and $s_{FE}$ for part
(c).
Check your work: Recall that the confidence interval for the population regression
line is narrower than the confidence interval for the actual value of Y corresponding
to a given X.
5. Consider the attached data file (functional forms 2.dta).
X denotes the independent variable, x=1,2,3, ..., 100. Corresponding to this independent
variable, various dependent variables were generated. Plot and estimate an appropriate
functional form between
a. the dependent variable denoted loglog and x;
b. semilog1 and x;
c. reciptrans and x;
d. polya and x;
e. polyb and x; and
f. polyc and x.
STATA Output (for problem #4)
. infile x y using "anscombe_a.txt", clear
(11 observations read)
. summ y x
    Variable |  Obs       Mean   Std. Dev.    Min      Max
-------------+---------------------------------------------
           y |   11   7.500909    2.031568   4.26    10.84
           x |   11          9    3.316625      4       14
. reg y x
      Source |         SS    df           MS      Number of obs =      11
-------------+--------------------------------    F(1, 9)       =   17.99
       Model | 27.5100011     1   27.5100011      Prob > F      =  0.0022
    Residual | 13.7626904     9   1.52918783      R-squared     =  0.6665
-------------+--------------------------------    Adj R-squared =  0.6295
       Total | 41.2726916    10   4.12726916      Root MSE      =  1.2366

           y |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+-----------------------------------------------------------
           x |  .5000909   .1179055    4.24   0.002    .2333701   .7668117
       _cons |  3.000091   1.124747    2.67   0.026    .4557369   5.544445

. set obs 12
obs was 11, now 12
. replace x=20 in 12
(1 real change made)
. predict yhat
(option xb assumed; fitted values)
. predict sfe, stdf
. list in 11/12 //only lists observations 11 and 12
     |  x      y       yhat        sfe |
     |---------------------------------|
 11. |  5   5.68   5.500546   1.375003 |
 12. | 20      .   13.00191   1.830386 |
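The printout can be cross-checked by hand. The Python sketch below is a check under the standard simple-regression formulas, not a replacement for the Stata session: it recomputes the slope, the fitted value at x = 20, and the standard error of the forecast from Anscombe data set A, using the forecast-error variance $s^2 + s^2/n + (20 - \bar{X})^2 s^2_{\hat{\beta}_2}$.

```python
import math

# Anscombe data set A, as listed in the problem set.
x = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b1 = ybar - b2 * xbar

sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
s2 = sse / (n - 2)                    # estimated error variance
var_b2 = s2 / sxx                     # estimated variance of the slope

x0 = 20
yhat0 = b1 + b2 * x0                              # predicted Y at x = 20
var_yhat0 = s2 / n + (x0 - xbar) ** 2 * var_b2    # variance of the fitted value
s_fe = math.sqrt(s2 + var_yhat0)                  # std. error of the forecast
print(b1, b2, yhat0, s_fe)
```

The values agree with the `yhat` and `sfe` entries that Stata lists for observation 12.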
1
II'
James B. McDonald
Brigham Young University
1/7/2010
III. Classical Normal Linear Regression Model Extended to the Case of k
Explanatory Variables
A. Basic Concepts
Let y denote an n x 1 vector of random variables, i.e., $y = (y_1, y_2, \ldots, y_n)'$.
1. The expected value of y is defined by
$E(y) = \begin{pmatrix} E(y_1) \\ E(y_2) \\ \vdots \\ E(y_n) \end{pmatrix}$
2. The variance of the vector y is defined by
$Var(y) = \begin{pmatrix} Var(y_1) & Cov(y_1, y_2) & \cdots & Cov(y_1, y_n) \\ Cov(y_2, y_1) & Var(y_2) & \cdots & Cov(y_2, y_n) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(y_n, y_1) & Cov(y_n, y_2) & \cdots & Var(y_n) \end{pmatrix}$
NOTE: Let µ = E(y), then
$Var(y) = E[(y - \mu)(y - \mu)']$
$= E\left[ \begin{pmatrix} y_1 - \mu_1 \\ \vdots \\ y_n - \mu_n \end{pmatrix} (y_1 - \mu_1, \ldots, y_n - \mu_n) \right]$
$= \begin{pmatrix} E(y_1 - \mu_1)^2 & E(y_1 - \mu_1)(y_2 - \mu_2) & \cdots & E(y_1 - \mu_1)(y_n - \mu_n) \\ E(y_2 - \mu_2)(y_1 - \mu_1) & E(y_2 - \mu_2)^2 & \cdots & E(y_2 - \mu_2)(y_n - \mu_n) \\ \vdots & \vdots & \ddots & \vdots \\ E(y_n - \mu_n)(y_1 - \mu_1) & E(y_n - \mu_n)(y_2 - \mu_2) & \cdots & E(y_n - \mu_n)^2 \end{pmatrix}$
$= \begin{pmatrix} Var(y_1) & Cov(y_1, y_2) & \cdots & Cov(y_1, y_n) \\ Cov(y_2, y_1) & Var(y_2) & \cdots & Cov(y_2, y_n) \\ \vdots & \vdots & \ddots & \vdots \\ Cov(y_n, y_1) & Cov(y_n, y_2) & \cdots & Var(y_n) \end{pmatrix}$.
3. The n x 1 vector of random variables, y, is said to be distributed as a multivariate
normal with mean vector µ and variance covariance matrix Σ (denoted y ~
N(µ, Σ)) if the probability density function of y is given by
$f(y; \mu, \Sigma) = \frac{e^{-\frac{1}{2}(y - \mu)' \Sigma^{-1} (y - \mu)}}{(2\pi)^{n/2} |\Sigma|^{1/2}}$.
Special case (n = 1): $y = (y_1)$, $\mu = (\mu_1)$, $\Sigma = (\sigma^2)$.
$f(y_1; \mu_1, \sigma^2) = \frac{e^{-\frac{1}{2}(y_1 - \mu_1)(\sigma^2)^{-1}(y_1 - \mu_1)}}{(2\pi)^{1/2} (\sigma^2)^{1/2}} = \frac{e^{-(y_1 - \mu_1)^2 / 2\sigma^2}}{\sqrt{2\pi\sigma^2}}$.
4. Some Useful Theorems
a. If $y \sim N(\mu_y, \Sigma_y)$, then $z = Ay \sim N(\mu_z = A\mu_y; \Sigma_z = A\Sigma_y A')$ where A is a
matrix of constants.
b. If y ~ N(0,I) and A is a symmetric idempotent matrix, then $y'Ay \sim \chi^2(m)$
where m = Rank(A) = trace(A).
c. If y ~ N(0,I) and L is a k x n matrix of rank k, then Ly and y'Ay are
independently distributed if LA = 0.
d. If y ~ N(0,I), then the idempotent quadratic forms y'Ay and y'By are
independently distributed $\chi^2$ variables if AB = 0.
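NOTE:
(1) Proof of (a)
$E(z) = E(Ay) = AE(y) = A\mu_y$
$Var(z) = E[(z - E(z))(z - E(z))']$
$= E[(Ay - A\mu_y)(Ay - A\mu_y)']$
$= E[A(y - \mu_y)(y - \mu_y)'A']$
$= AE[(y - \mu_y)(y - \mu_y)']A'$
$= A\Sigma_y A' = \Sigma_z$
(2) Example: Let $y_1, \ldots, y_n$ denote a random sample drawn from $N(\mu, \sigma^2)$, so that
$y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} \sim N\left[ \begin{pmatrix} \mu \\ \vdots \\ \mu \end{pmatrix}, \begin{pmatrix} \sigma^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma^2 \end{pmatrix} \right]$.
The "Useful" Theorem 4.a implies that:
$\bar{y} = \frac{1}{n} y_1 + \cdots + \frac{1}{n} y_n = \left( \frac{1}{n}, \ldots, \frac{1}{n} \right) y \sim N(\mu, \sigma^2/n)$.
Verify that
(a) $\left( \frac{1}{n}, \ldots, \frac{1}{n} \right) \begin{pmatrix} \mu \\ \vdots \\ \mu \end{pmatrix} = \mu$
(b) $\left( \frac{1}{n}, \ldots, \frac{1}{n} \right) \sigma^2 I \begin{pmatrix} \frac{1}{n} \\ \vdots \\ \frac{1}{n} \end{pmatrix} = \sigma^2/n$.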
B. The Basic Model
Consider the model defined by
(1) $y_t = \beta_1 x_{t1} + \beta_2 x_{t2} + \cdots + \beta_k x_{tk} + \varepsilon_t$ (t = 1, . . ., n).
If we want to include an intercept, define $x_{t1} = 1$ for all t and we obtain
(2) $y_t = \beta_1 + \beta_2 x_{t2} + \cdots + \beta_k x_{tk} + \varepsilon_t$.
Note that $\beta_i$ can be interpreted as the marginal impact of a unit increase in $x_i$ on the
expected value of y.
The error terms ($\varepsilon_t$) in (1) will be assumed to satisfy:
(A.1) $\varepsilon_t$ distributed normally
(A.2) $E(\varepsilon_t) = 0$ for all t
(A.3) $Var(\varepsilon_t) = \sigma^2$ for all t
(A.4) $Cov(\varepsilon_t, \varepsilon_s) = 0$, t ≠ s.
Rewriting (1) for each t (t = 1, 2, . . ., n) we obtain
$y_1 = \beta_1 x_{11} + \beta_2 x_{12} + \cdots + \beta_k x_{1k} + \varepsilon_1$
$y_2 = \beta_1 x_{21} + \beta_2 x_{22} + \cdots + \beta_k x_{2k} + \varepsilon_2$
(3)  . . .
$y_n = \beta_1 x_{n1} + \beta_2 x_{n2} + \cdots + \beta_k x_{nk} + \varepsilon_n$.
The system of equations (3) is equivalent to the matrix representation
y = Xβ + ε
where the matrices y, X, β and ε are defined as follows:
$y_{(n \times 1)} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}$, $X_{(n \times k)} = \begin{pmatrix} x_{11} & \cdots & x_{1k} \\ x_{21} & \cdots & x_{2k} \\ \vdots & & \vdots \\ x_{n1} & \cdots & x_{nk} \end{pmatrix}$, $\beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_k \end{pmatrix}$ and $\varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$.
columns of X: n observations on k individual variables.
rows of X: may represent observations at a given point in time.
NOTE: (1) Assumptions (A.1)-(A.4) can be written much more
compactly as
(A.1)' $\varepsilon \sim N(0; \Sigma = \sigma^2 I)$.
(2) The model to be discussed can then be summarized as
y = Xβ + ε.
(A.1)' $\varepsilon \sim N(0; \Sigma = \sigma^2 I)$
(A.5)' The $x_{tj}$'s are nonstochastic and
$\lim_{n \to \infty} \left( \frac{X'X}{n} \right) = \Sigma_x$ is nonsingular.
C. Estimation
We will derive the least squares, MLE, BLUE and instrumental variables estimators in
this section.
1. Least Squares:
The basic model can be written as
$y = X\beta + \varepsilon = X\hat{\beta} + e = \hat{Y} + e$
where $\hat{Y} = X\hat{\beta}$ is an n x 1 vector of predicted values for the dependent variable and
e denotes a vector of residuals or estimated errors.
The sum of squared errors is defined by
$SSE(\hat{\beta}) = \sum_{t=1}^{n} e_t^2 = (e_1, e_2, \ldots, e_n) \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix} = e'e$
$= (y - X\hat{\beta})'(y - X\hat{\beta})$
$= y'y - \hat{\beta}'X'y - y'X\hat{\beta} + \hat{\beta}'X'X\hat{\beta}$
$= y'y - 2\hat{\beta}'X'y + \hat{\beta}'X'X\hat{\beta}$.
The least squares estimator of β is defined as the $\hat{\beta}$ which minimizes $SSE(\hat{\beta})$. A
necessary condition for $SSE(\hat{\beta})$ to be a minimum is that
$\frac{dSSE(\hat{\beta})}{d\hat{\beta}} = 0$ (see Appendix A for how to differentiate a real
valued function with respect to a vector)
$\frac{dSSE(\hat{\beta})}{d\hat{\beta}} = -2X'y + 2X'X\hat{\beta} = 0$ or
$X'X\hat{\beta} = X'y$   (Normal Equations)
$\hat{\beta} = (X'X)^{-1}X'y$
is the least squares estimator.
Note that $\hat{\beta}$ is a vector of least squares estimators of $\beta_1, \beta_2, \ldots, \beta_k$.
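The normal equations can be solved directly on a small example. The following sketch uses only the Python standard library; the data are made up (y is an exact linear function of the regressors, so the true coefficients are known), and the helper names `transpose`, `matmul`, and `solve` are hypothetical.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n + 1):
                M[r][c] -= f * M[i][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][c] * x[c] for c in range(i + 1, n))) / M[i][i]
    return x

# Hypothetical data: intercept column plus two regressors, y = 1 + 2*x2 - 0.5*x3.
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0]
X = [[1.0, a, b] for a, b in zip(x2, x3)]
y = [1.0 + 2.0 * a - 0.5 * b for a, b in zip(x2, x3)]

Xt = transpose(X)
XtX = matmul(Xt, X)                                      # X'X
Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]  # X'y
beta_hat = solve(XtX, Xty)                               # solves X'X b = X'y

residuals = [yi - sum(b * xij for b, xij in zip(beta_hat, row)) for yi, row in zip(y, X)]
Xte = [sum(a * e for a, e in zip(row, residuals)) for row in Xt]   # X'e
print(beta_hat, Xte)
```

Solving the linear system avoids forming $(X'X)^{-1}$ explicitly, and the last line illustrates that the residuals satisfy X'e = 0 up to rounding.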
2. Maximum Likelihood Estimation (MLE)
Likelihood Function: (Recall $y \sim N(X\beta; \Sigma = \sigma^2 I)$)
$L(y; \beta, \Sigma = \sigma^2 I) = \frac{e^{-\frac{1}{2}(y - X\beta)' \Sigma^{-1} (y - X\beta)}}{(2\pi)^{n/2} |\Sigma|^{1/2}}$
$= \frac{e^{-\frac{1}{2}(y - X\beta)' (\sigma^2 I)^{-1} (y - X\beta)}}{(2\pi)^{n/2} |\sigma^2 I|^{1/2}}$
$= \frac{e^{-(y - X\beta)'(y - X\beta)/2\sigma^2}}{(2\pi)^{n/2} (\sigma^2)^{n/2}}$.
The natural log of the likelihood function,
$l = \ln L = -\frac{(y - X\beta)'(y - X\beta)}{2\sigma^2} - \frac{n}{2} \ln 2\pi - \frac{n}{2} \ln \sigma^2$
is known as the log likelihood function. l is a function of β and $\sigma^2$.
The MLE of β and $\sigma^2$ are defined by the two equations (necessary conditions for a
maximum):
$\frac{\partial l}{\partial \beta} = -\frac{1}{2\overset{\Delta}{\sigma}{}^2} \left( -2X'y + 2(X'X)\overset{\Delta}{\beta} \right) = 0$
$\frac{\partial l}{\partial \sigma^2} = \frac{(y - X\overset{\Delta}{\beta})'(y - X\overset{\Delta}{\beta})}{2(\overset{\Delta}{\sigma}{}^2)^2} - \frac{n}{2} \left( \frac{1}{\overset{\Delta}{\sigma}{}^2} \right) = 0$
i.e.,
$\overset{\Delta}{\beta} = (X'X)^{-1}X'y$
$\overset{\Delta}{\sigma}{}^2 = \frac{1}{n} (y - X\overset{\Delta}{\beta})'(y - X\overset{\Delta}{\beta}) = \frac{e'e}{n} = \frac{\sum e_t^2}{n}$
NOTE: (1) $\hat{\beta} = \overset{\Delta}{\beta}$
(2) $\overset{\Delta}{\sigma}{}^2$ is a biased estimator of $\sigma^2$; whereas,
$s^2 = \frac{1}{n - k} e'e = \frac{(y - X\overset{\Delta}{\beta})'(y - X\overset{\Delta}{\beta})}{n - k} = \frac{SSE}{n - k}$
is an unbiased estimator of $\sigma^2$.
A proof of the unbiasedness of $s^2$ is given in Appendix B.
Only n - k of the estimated residuals are independent. The
necessary conditions for least squares estimates impose k
restrictions on the estimated residuals (e). The restrictions
are summarized by the normal equations $X'X\hat{\beta} = X'y$, or
equivalently
X'e = 0
(3) Substituting $\sigma^2 = SSE/n$ into the log likelihood function
yields what is known as the concentrated log likelihood
function
$l = -\frac{n}{2} \left[ 1 + \ln(2\pi) + \ln\left( \frac{SSE}{n} \right) \right]$
which expresses the log-likelihood value as a function of β
only. This equation also clearly demonstrates the
equivalence of maximizing l and minimizing SSE.
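The relationship between the log likelihood and its concentrated form can be checked numerically. In the sketch below the function names are hypothetical, and the values SSE = 13.7626904 and n = 11 are borrowed from the Anscombe regression printout in Problem Set 2.2 purely as convenient inputs.

```python
import math

def loglik(sse, n, sigma2):
    """Normal log likelihood written in terms of SSE and sigma^2."""
    return -sse / (2 * sigma2) - (n / 2) * math.log(2 * math.pi) - (n / 2) * math.log(sigma2)

def concentrated_loglik(sse, n):
    """Log likelihood after substituting sigma^2 = SSE/n."""
    return -(n / 2) * (1 + math.log(2 * math.pi) + math.log(sse / n))

sse, n = 13.7626904, 11
sigma2_mle = sse / n

l_at_mle = loglik(sse, n, sigma2_mle)
l_conc = concentrated_loglik(sse, n)
# For a given SSE, moving sigma^2 away from SSE/n lowers the log likelihood.
l_low = loglik(sse, n, 0.9 * sigma2_mle)
l_high = loglik(sse, n, 1.1 * sigma2_mle)
print(l_at_mle, l_conc, l_low, l_high)
```

Because the concentrated function is decreasing in SSE, the β that minimizes SSE also maximizes l.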
3. BLUE ESTIMATORS OF β, $\tilde{\beta}$.
We will demonstrate that assumptions (A.2)-(A.5) imply that the best
(least variance) linear unbiased estimator (BLUE) of β is the least squares
estimator. We first consider the desired properties and then derive the associated
estimator.
Linear: $\tilde{\beta} = Ay$ where A is a k x n matrix of constants
Unbiased: $E(\tilde{\beta}) = AE(y) = AX\beta$
We note that $E(\tilde{\beta}) = AX\beta = \beta$ requires AX = I.
Minimum Variance:
$Var(\tilde{\beta}_i) = A_i Var(y) A_i' = \sigma^2 A_i A_i'$
where $A_i$ = the i-th row of A and $\tilde{\beta}_i = A_i y$.
Thus, the construction of BLUE is equivalent to selecting the matrix A so that the
rows of A solve
Min $A_i A_i'$   i = 1, 2, . . ., k
s.t. AX = I
or equivalently, min $Var(\tilde{\beta}_i)$ s.t. AX = I (unbiased).
The solution to this problem is given by
$A = (X'X)^{-1}X'$; hence, the BLUE of β is given by $\tilde{\beta} = Ay = (X'X)^{-1}X'y$.
The details of this derivation are contained in Appendix C.
NOTE: (1) $\hat{\beta} = \tilde{\beta} = \overset{\Delta}{\beta} = (X'X)^{-1}X'y$
(2) $AX = (X'X)^{-1}X'X = I$; thus $\tilde{\beta}$ is unbiased
4. Instrumental Variables Estimators
y = Xβ + ε
Let Z denote an n x k matrix of “instruments” or "instrumental" variables.
Consider the solution of the modified normal equations:
$Z'y = Z'X\hat{\beta}_Z$; hence, $\hat{\beta}_Z = (Z'X)^{-1}Z'y$.
$\hat{\beta}_Z$
is referred to as the instrumental variables estimator of β based on the
instrumental variables Z. Instrumental variables can be very useful if the
variables on the right hand side include “endogenous” variables or in the case of
measurement error. In this case OLS will yield biased and inconsistent
estimators; whereas, instrumental variables can yield consistent estimators.
NOTE: (1) The motivation for the selection of the instruments (Z) is
that the covariance of Z and ε approaches 0 and Z and X are
correlated. Thus $Z'y = Z'(X\beta + \varepsilon) = Z'X\beta + Z'\varepsilon \approx Z'X\beta$.
(2) If $\lim_{n \to \infty} \left( \frac{Z'X}{n} \right)$ is nonsingular and $\lim_{n \to \infty} \left( \frac{Z'\varepsilon}{n} \right) = 0$, then
$\hat{\beta}_Z$ is a consistent estimator of β.
(3) Many calculate an $R^2$ after instrumental variables
estimation using the formula $R^2 = 1 - SSE/SST$. Since this
can be negative, there is not a natural interpretation of $R^2$
for instrumental variables estimators. Further, the $R^2$ can't
be used to construct F-statistics for IV estimators.
(4) If Z includes "weak" instruments (weakly correlated
with the X's), then the variances of the IV estimator can
be large and the corresponding asymptotic biases can be
large if the Z and error are correlated. This can be
seen by noting that the bias of the instrumental variables
estimator is given by
$E\left[ (Z'X/n)^{-1} (Z'\varepsilon/n) \right]$.
(5) As a special case, if Z = X, then
$\hat{\beta}_Z = \hat{\beta}_X = \hat{\beta} = \tilde{\beta} = \overset{\Delta}{\beta}$.
(6) If Z is an n x k* matrix where k < k* (Z contains more
variables than X), then the IV estimator defined above must
be modified. The most common approach in this case is to
replace Z in the "IV" equation by the projections** of X on
the columns of Z, i.e. $\hat{X} = Z(Z'Z)^{-1}Z'X$.
This substitution yields the IV estimator
$\hat{\beta}_{IV} = (\hat{X}'\hat{X})^{-1}\hat{X}'Y$
$= \left[ X'Z(Z'Z)^{-1}Z'X \right]^{-1} X'Z(Z'Z)^{-1}Z'Y$
which yields estimates for k ≤ k*.
The Stata command for the instrumental variables estimator
is given by
ivregress 2sls depvar (varlist_1 = varlist_iv) [varlist_2]
where the estimator can be 2sls, gmm, or liml (2sls is used here),
for the model
depvar = (varlist_1)$b_1$ + (varlist_2)$b_2$ + error
where varlist_iv are the instrumental variables for varlist_1.
A specific example is given by:
ivregress 2sls y1 (y2=z1 z2 z3) x1 x2 x3
Identical results could be obtained with the command,
ivregress 2sls y1 (y2 x1 x2 x3=z1 z2 z3)
which is equivalent to regressing all of the right hand side
variables on the set of instrumental variables. This can be
thought of as being of the form
ivregress 2sls y (X=Z)
**The projections of X on Z can be obtained by obtaining
estimates of Π
in the "reduced form" equation X = ZΠ + V to yield
$\hat{\Pi} = (Z'Z)^{-1}Z'X$; hence, the estimate of X is given by
$\hat{X} = Z\hat{\Pi} = Z(Z'Z)^{-1}Z'X$.
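A small numerical sketch may make the contrast between OLS and IV concrete. With an intercept, one regressor, and one instrument, $(Z'X)^{-1}Z'y$ reduces to a ratio of sample covariances for the slope. The data below are constructed by hand (not taken from the course files) so that the error is correlated with x but exactly uncorrelated with z in the sample; the function names are hypothetical.

```python
def slope_ols(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    num = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return num / sum((xi - xbar) ** 2 for xi in x)

def slope_iv(z, x, y):
    """Just-identified IV slope: sample cov(z, y) / sample cov(z, x)."""
    n = len(z)
    zbar, xbar, ybar = sum(z) / n, sum(x) / n, sum(y) / n
    num = sum((zi - zbar) * (yi - ybar) for zi, yi in zip(z, y))
    den = sum((zi - zbar) * (xi - xbar) for zi, xi in zip(z, x))
    return num / den

# Constructed example: x = z + u, y = 2*x + u, so the error u is part of x.
z = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
u = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]      # uncorrelated with z in this sample
x = [zi + ui for zi, ui in zip(z, u)]
y = [2.0 * xi + ui for xi, ui in zip(x, u)]

b_ols = slope_ols(x, y)
b_iv = slope_iv(z, x, y)
print(b_ols, b_iv)
```

Here OLS overstates the true slope of 2 because x and the error move together, while the instrument isolates the variation in x that is unrelated to the error.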
D. Distribution of $\hat{\beta}$, $\tilde{\beta}$, $\overset{\Delta}{\beta}$
Recall that under the assumptions (A.1)-(A.5) $y \sim N(X\beta, \Sigma = \sigma^2 I)$ and
$\hat{\beta} = \tilde{\beta} = \overset{\Delta}{\beta} = (X'X)^{-1}X'y$;
hence, by useful theorem (II'.A.4.a), we conclude that
$\hat{\beta} = \tilde{\beta} = \overset{\Delta}{\beta} \sim N(A\mu_y, A\Sigma_y A') = N[AX\beta, \sigma^2 AIA']$
where $A = (X'X)^{-1}X'$.
The desired derivations can be simplified by noting that
$AX\beta = (X'X)^{-1}X'X\beta = \beta$
$\sigma^2 AA' = \sigma^2 (X'X)^{-1}X'((X'X)^{-1}X')'$
$= \sigma^2 (X'X)^{-1}X'X((X'X)^{-1})'$
$= \sigma^2 ((X'X)^{-1})'$
$= \sigma^2 ((X'X)')^{-1}$
$= \sigma^2 (X'X)^{-1}$.
Therefore
$\hat{\beta} = \tilde{\beta} = \overset{\Delta}{\beta} \sim N\left( \beta; \sigma^2 (X'X)^{-1} \right)$
NOTE: (1) $\sigma^2 (X'X)^{-1}$ can be shown to be the Cramer-Rao matrix, the matrix
of lower bounds for the variances of unbiased estimators.
(2) $\hat{\beta}$, $\tilde{\beta}$, $\overset{\Delta}{\beta}$ are
• unbiased
• consistent
• minimum variance of all (linear and nonlinear) unbiased
estimators
• normally distributed
(3) An unbiased estimator of $\sigma^2 (X'X)^{-1}$ is given by
$s^2 (X'X)^{-1}$
where $s^2 = e'e/(n - k)$ and is the formula used to calculate the
"estimated variance covariance matrix" in many computer
programs.
(4) To report $s^2 (X'X)^{-1}$ in STATA type
. reg y x
. estat vce
(5) Distribution of the variance estimator
$\frac{(n - k)s^2}{\sigma^2} \sim \chi^2(n - k)$
NOTE: This can be proven using the theorem (II'.A.4(b)) and noting that
$(n - k)s^2 = e'e = (Y - X\hat{\beta})'(Y - X\hat{\beta})$
$= (X\beta + \varepsilon)'(I - X(X'X)^{-1}X')(X\beta + \varepsilon)$
$= \varepsilon'(I - X(X'X)^{-1}X')\varepsilon$.
Therefore,
$\frac{(n - k)s^2}{\sigma^2} = \left( \frac{\varepsilon}{\sigma} \right)' (I - X(X'X)^{-1}X') \left( \frac{\varepsilon}{\sigma} \right) = \left( \frac{\varepsilon}{\sigma} \right)' M \left( \frac{\varepsilon}{\sigma} \right)$
where $\left( \frac{\varepsilon}{\sigma} \right) \sim N[0, I]$;
hence $\frac{(n - k)s^2}{\sigma^2} \sim \chi^2(n - k)$ because M is idempotent with rank and trace equal to n - k.
E. Statistical Inference
1. $H_0$: $\beta_2 = \beta_3 = \ldots = \beta_k = 0$
This hypothesis tests for the statistical significance of overall explanatory power
of the explanatory variables by comparing the model with all variables included to
the model without any of the explanatory variables, i.e., $y_t = \beta_1 + \varepsilon_t$ (all non-intercept
coefficients = 0). Recall that the total sum of squares (SST) can be
partitioned as follows:
$\sum_{t=1}^{N} (y_t - \bar{y})^2 = \sum_{t=1}^{N} (y_t - \hat{y}_t)^2 + \sum_{t=1}^{N} (\hat{y}_t - \bar{y})^2$
or
SST = SSE + SSR.
Dividing both sides of the equation by $\sigma^2$ yields quadratic forms, each having a
chi-square distribution:
$\frac{SST}{\sigma^2} = \frac{SSE}{\sigma^2} + \frac{SSR}{\sigma^2}$
$\chi^2(n - 1) = \chi^2(n - k) + \chi^2(k - 1)$.
This result provides the basis for using
$F = \frac{SSR/(K - 1)}{SSE/(n - K)} = \frac{\chi^2(K - 1)/(K - 1)}{\chi^2(n - K)/(n - K)} \sim F(K - 1, n - K)$
to test the hypothesis that $\beta_2 = \beta_3 = \ldots = \beta_k = 0$.
NOTE: (1) $\frac{SSR}{SSE} = \frac{SSR}{SST - SSR} = \frac{SSR/SST}{1 - SSR/SST} = \frac{R^2}{1 - R^2}$
hence, the F-statistic for this hypothesis can also be rewritten as
$F = \frac{R^2/(k - 1)}{(1 - R^2)/(n - k)} = \left( \frac{R^2}{1 - R^2} \right) \left( \frac{n - k}{k - 1} \right) \sim F(k - 1, n - k)$.
Recall that this decomposition of SST can be summarized in an ANOVA table as
follows:
Source of Variation    SS     d.f.     MSE
Model                  SSR    K - 1    SSR/(K - 1)
Error                  SSE    n - K    SSE/(n - K) = $s^2$
Total                  SST    n - 1
K = number of coefficients in model
where the ratio of the model and error MSE's yields the F statistic just discussed.
Additionally, remember that the adjusted $R^2$ ($\bar{R}^2$), defined by
$\bar{R}^2 = 1 - \frac{\sum e_t^2/(n - K)}{\sum (Y_t - \bar{Y})^2/(n - 1)}$,
will only increase with the addition of a new variable if the t-statistic associated with
the new variable is greater than 1 in absolute value. This result follows from the
equation
$\bar{R}^2_{New} - \bar{R}^2_{Old} = \left\{ \frac{(n - 1)}{(n - K - 1)(n - K)} \frac{SSE_{New}}{SST} \right\} \left\{ \left( \frac{\hat{\beta}_{New\_var} - 0}{s_{\hat{\beta}_{New\_var}}} \right)^2 - 1 \right\}$
where the last
term in the product is $(t^2 - 1)$ and K denotes the number of coefficients in the "old"
regression model and the "new" regression model includes K+1 coefficients.
The Lagrangian Multiplier (LM) test can also be used to test this hypothesis
$LM = NR^2 \overset{a}{\sim} \chi^2(k - 1)$
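The ANOVA relationships can be verified against the Stata printout for Anscombe data set A in Problem Set 2.2 (Model SS = 27.5100011, Residual SS = 13.7626904, n = 11, K = 2). The sketch below simply applies the formulas of this section to those printed numbers; the variable names are illustrative.

```python
# Sums of squares and sample size taken from the Stata printout for problem #4.
ssr, sse, n, K = 27.5100011, 13.7626904, 11, 2
sst = ssr + sse

r2 = ssr / sst
f_from_ss = (ssr / (K - 1)) / (sse / (n - K))          # ratio of model and error MSEs
f_from_r2 = (r2 / (1 - r2)) * ((n - K) / (K - 1))      # same statistic written with R^2
adj_r2 = 1 - (sse / (n - K)) / (sst / (n - 1))
print(r2, f_from_ss, f_from_r2, adj_r2)
```

Up to rounding these reproduce the printed values R-squared = 0.6665, F(1, 9) = 17.99, and Adj R-squared = 0.6295. With a single slope coefficient, F is also the square of the printed t statistic (4.24).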
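2. Testing hypotheses involving individual $\beta_i$'s
Recall that
$\hat{\beta} \sim N(\beta; \sigma^2 (X'X)^{-1})$
where
$\sigma^2 (X'X)^{-1} = \begin{pmatrix} \sigma^2_{\hat{\beta}_1} & \sigma_{\hat{\beta}_1 \hat{\beta}_2} & \cdots & \sigma_{\hat{\beta}_1 \hat{\beta}_k} \\ \sigma_{\hat{\beta}_2 \hat{\beta}_1} & \sigma^2_{\hat{\beta}_2} & \cdots & \sigma_{\hat{\beta}_2 \hat{\beta}_k} \\ \vdots & & \ddots & \vdots \\ \sigma_{\hat{\beta}_k \hat{\beta}_1} & \sigma_{\hat{\beta}_k \hat{\beta}_2} & \cdots & \sigma^2_{\hat{\beta}_k} \end{pmatrix}$
which can be estimated by
$s^2 (X'X)^{-1} = \begin{pmatrix} s^2_{\hat{\beta}_1} & s_{\hat{\beta}_1 \hat{\beta}_2} & \cdots & s_{\hat{\beta}_1 \hat{\beta}_k} \\ s_{\hat{\beta}_2 \hat{\beta}_1} & s^2_{\hat{\beta}_2} & \cdots & s_{\hat{\beta}_2 \hat{\beta}_k} \\ \vdots & & \ddots & \vdots \\ s_{\hat{\beta}_k \hat{\beta}_1} & s_{\hat{\beta}_k \hat{\beta}_2} & \cdots & s^2_{\hat{\beta}_k} \end{pmatrix}$
Hypotheses of the form $H_0$: $\beta_i = \beta_i^0$ can be tested using the result
$\frac{\hat{\beta}_i - \beta_i^0}{s_{\hat{\beta}_i}} \sim t(n - k)$
The validity of this distributional result follows from
$\frac{N(0,1)}{\sqrt{\chi^2(d)/d}} \sim t(d)$
since $\frac{\hat{\beta}_i - \beta_i}{\sigma_{\hat{\beta}_i}} \sim N(0,1)$ and $\frac{(n - k) s^2_{\hat{\beta}_i}}{\sigma^2_{\hat{\beta}_i}} \sim \chi^2(n - k)$.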
3. Tests of hypotheses involving linear combinations of coefficients
A linear combination of the $\beta_i$'s can be written as
$\sum_{i=1}^{k} \delta_i \beta_i = (\delta_1, \ldots, \delta_k) \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_k \end{pmatrix} = \delta'\beta$.
We now consider testing hypotheses of the form
$H_0$: $\delta'\beta = \gamma$.
Recall that
$\hat{\beta} \sim N(\beta; \sigma^2 (X'X)^{-1})$;
therefore,
$\delta'\hat{\beta} \sim N(\delta'\beta; \sigma^2 \delta'(X'X)^{-1}\delta)$
hence,
$\frac{\delta'\hat{\beta} - \delta'\beta}{s_{\delta'\hat{\beta}}} = \frac{\delta'\hat{\beta} - \gamma}{\sqrt{s^2 \delta'(X'X)^{-1}\delta}} \sim t(n - k)$.
The t-test of a hypothesis involving a linear combination of the coefficients
involves running one regression and estimating the variance of $\delta'\hat{\beta}$ from $s^2 (X'X)^{-1}$
to construct the test statistics.
4. More general tests
a. Introduction
We have considered tests of the overall explanatory power of the
regression model ($H_0$: $\beta_2 = \beta_3 = \ldots = \beta_k = 0$), tests involving individual parameters
(e.g., $H_0$: $\beta_3 = 6$), and testing the validity of a linear constraint on the coefficients
($H_0$: $\delta'\beta = \gamma$). In this section we will consider how more general tests can be
performed. The testing procedures will be based on the Chow and Likelihood
ratio (LR) tests. The hypotheses may be of many different types and involve the
previous tests as special cases. Other examples might include joint hypotheses of
the form: $H_0$: $\beta_2 + 6\beta_5 = 4$, $\beta_3 = \beta_7 = 0$. The basic idea is that if the hypothesis is
really valid, then goodness of fit measures such as SSE, $R^2$ and log-likelihood
values (l) will not be significantly impacted by imposing the valid hypothesis in
estimation. Hence, the SSE, $R^2$ or l values will not be significantly different for
constrained (via the hypothesis) and unconstrained estimation of the underlying
regression model. The tests of the validity of the hypothesis are based on
constructing test statistics, with known exact or asymptotic distributions, to
evaluate the statistical significance of changes in SSE, $R^2$, or l.
Consider the model
y = Xβ + ε
and a hypothesis, $H_0$: g(β) = 0 which imposes individual and/or multiple
constraints on the β vector.
The Chow and likelihood ratio tests for testing $H_0$: g(β) = 0 can be
constructed from the output obtained from estimating the two following
regression models.
(1) Estimate the regression model y = Xβ + ε without imposing any
constraints on the vector β. Let the associated sum of squared errors,
coefficient of determination, log-likelihood value and degrees of freedom
be denoted by SSE, $R^2$, l, and (n - k).
(2) Estimate the same regression model where the β is constrained as
specified by the hypothesis ($H_0$: g(β) = 0) in the estimation process. Let
the associated sum of squared errors, $R^2$, log-likelihood value and degrees
of freedom be denoted by SSE*, $R^{2*}$, l* and (n - k)*, respectively.
b. Chow test
The Chow test is defined by the following statistic:
$F = \frac{(SSE^* - SSE)/r}{SSE/(n - k)} \sim F(r, n - k)$
where r = (n - k)* - (n - k) is the number of independent restrictions imposed on β by
the hypothesis. For example, if the hypothesis was $H_0$: $\beta_2 + 6\beta_5 = 4$, $\beta_3 = \beta_7 = 0$,
then the numerator degrees of freedom (r) is equal to 3. In applications where the
SST is unaltered by imposing the restrictions, we can divide the numerator and
denominator by SST to yield the Chow test rewritten in terms of the change in the
$R^2$ between the constrained and unconstrained regressions:
$F = \left( \frac{R^2 - R^{2*}}{1 - R^2} \right) \left( \frac{n - k}{r} \right) \sim F(r, n - k)$
Note that if the hypothesis ($H_0$: g(β) = 0) is valid, then we would expect $R^2$ (SSE)
and $R^{2*}$ (SSE*) to not be significantly different from each other. Thus, it is only
large values (greater than the critical value) of F which provide the basis for
rejecting the hypothesis. Again, the $R^2$ form of the Chow test is only valid if the
dependent variable is the same in the constrained and unconstrained regression.
References:
(1) Chow, G. C., "Tests of Equality Between Subsets of Coefficients in Two
Linear Regressions," Econometrica, 28(1960), 591-605.
(2) Fisher, F. M., "Tests of Equality Between Sets of Coefficients in Two Linear
Regressions: An Expository NOTE," Econometrica, 38(1970), 361-66.
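The two forms of the Chow statistic can be compared on a small example. All numbers below are hypothetical (they do not come from any data set in these notes), and the function names are illustrative.

```python
def chow_f(sse_restricted, sse_unrestricted, df_restricted, df_unrestricted):
    """Chow F statistic; the df arguments are (n - k)* and (n - k)."""
    r = df_restricted - df_unrestricted          # number of independent restrictions
    numerator = (sse_restricted - sse_unrestricted) / r
    denominator = sse_unrestricted / df_unrestricted
    return numerator / denominator, r

def chow_f_from_r2(r2_unrestricted, r2_restricted, df_unrestricted, r):
    """R^2 form; valid only when SST is the same in both regressions."""
    return ((r2_unrestricted - r2_restricted) / (1 - r2_unrestricted)) * (df_unrestricted / r)

# Hypothetical regression output: n = 50, k = 4, two restrictions, SST = 200.
sst = 200.0
sse_u, sse_r = 100.0, 120.0
df_u, df_r = 46, 48

F, r = chow_f(sse_r, sse_u, df_r, df_u)
F_r2 = chow_f_from_r2(1 - sse_u / sst, 1 - sse_r / sst, df_u, r)
print(F, r, F_r2)
```

Both forms give the same statistic, which would then be compared with the critical value of F(2, 46).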
c. Likelihood ratio (LR) test.
The LR test is a common method of statistical inference in classical
statistics. The motivation behind the LR test is similar to that of the Chow test
except that it is based on determining whether there has been a significant
reduction in the value of the log-likelihood value as a result of imposing the
hypothesized constraints on β in the estimation process. The LR test statistic is
defined to be twice the difference between the values of the constrained and
unconstrained log-likelihood values (2(l - l*)) and, under fairly general
regularity conditions, is asymptotically distributed as a chi-square with degrees of
freedom equal to the number of independent restrictions (r) imposed by the
hypothesis. This may be summarized as follows:
$LR = 2(l - l^*) \overset{a}{\sim} \chi^2(r)$.
The LR test is more general than the Chow test and for the case of
independent and identically distributed normal errors, with known $\sigma^2$, LR is equal
to $LR = [SSE^* - SSE]/\sigma^2$.
Recall that $s^2 = SSE/(n - k)$ appears in the denominator of the Chow test statistic
and that for large values of (n - k), $s^2$ is "close" to $\sigma^2$; hence, we can see the
similarity of the LR and Chow tests. If $\sigma^2$ is unknown, substituting the
concentrated log-likelihood function into LR yields
$LR = 2(l - l^*)$
$= n[\ln(SSE^*) - \ln(SSE)]$
$= n[\ln(SSE^*/SSE)]$.
If the hypothesis $H_0$: $\beta_2 = \beta_3 = \ldots = \beta_k = 0$ is being tested in the classical
normal linear regression model, then SSE* = SST and LR can be rewritten in
terms of the $R^2$ as follows:
$LR = n \ln[1/(1 - R^2)] = -n \ln[1 - R^2] \overset{a}{\sim} \chi^2(k - 1)$.
In this case, the Chow test is identical to the F test for overall explanatory power
discussed earlier.
Thus the Chow test and LR test are similar in structure and purpose. The
LR test is more general than the Chow test; however, its distribution is
asymptotically (not exactly) chi-square even for nonnormally distributed errors.
The LR test provides a unified method of testing hypotheses.
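As a rough illustration of the LR calculation when $\sigma^2$ is unknown, the sketch below computes $LR = n \ln(SSE^*/SSE)$ for hypothetical values (n = 50, SSE = 100, SSE* = 120, two restrictions). The p-value line relies on the fact that a chi-square variable with 2 degrees of freedom has upper-tail probability $e^{-x/2}$; for other degrees of freedom a chi-square table or a statistics package would be needed.

```python
import math

def lr_from_sse(sse_restricted, sse_unrestricted, n):
    """LR = n * ln(SSE*/SSE), from the concentrated log likelihood."""
    return n * math.log(sse_restricted / sse_unrestricted)

def lr_overall_from_r2(r2, n):
    """LR for H0: all slope coefficients are zero, where SSE* = SST."""
    return -n * math.log(1 - r2)

# Hypothetical values: two restrictions, so LR is compared with chi-square(2).
n, sse_u, sse_r = 50, 100.0, 120.0
lr = lr_from_sse(sse_r, sse_u, n)
p_value = math.exp(-lr / 2)          # upper-tail probability for chi-square(2) only

# Overall-significance version with a hypothetical SST = 200 (R^2 = 0.5).
lr_overall = lr_overall_from_r2(0.5, n)
print(lr, p_value, lr_overall)
```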
d. Applications of the Chow and LR tests:
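(1) Model: $y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \beta_4 x_{t4} + \varepsilon_t$
$H_0$: $\beta_2 = \beta_3 = 0$ (two independent constraints)
(a) Estimate $y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \beta_4 x_{t4} + \varepsilon_t$
to obtain $SSE = \sum e_t^2 = (n - 4)s^2$, $R^2$,
$l = -\frac{n}{2} \left[ 1 + \ln(2\pi) + \ln\left( \frac{SSE}{n} \right) \right]$,
n - k = n - 4
(b) Estimate $y_t = \beta_1 + \beta_4 x_{t4} + \varepsilon_t$ to obtain
$SSE^* = \sum e_t^{*2} = (n - 2)s^{*2}$, $R^{2*}$, l* and (n - k)* = n - 2
(c) Construct the test statistics
$Chow = \frac{(SSE^* - SSE)/[(n - k)^* - (n - k)]}{SSE/(n - k)} = \frac{(SSE^* - SSE)/2}{SSE/(n - 4)} = \left( \frac{SSE^* - SSE}{SSE} \right) \left( \frac{n - 4}{2} \right)$
$= \left( \frac{R^2 - R^{2*}}{1 - R^2} \right) \left( \frac{n - 4}{2} \right) \sim F(2, n - 4)$
$LR = 2(l - l^*) \overset{a}{\sim} \chi^2(2)$.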
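(2) Tests of equality of the regression coefficients in two different regression
models.
(a) Consider the two regression models
$y^{(1)} = X^{(1)}\beta^{(1)} + \varepsilon^{(1)}$, with $n_1$ observations and k independent variables
$y^{(2)} = X^{(2)}\beta^{(2)} + \varepsilon^{(2)}$, with $n_2$ observations and k independent variables
$H_0: \beta^{(1)} = \beta^{(2)}$ (k independent restrictions)
(b) Rewrite the model as

(1)' $$y = \begin{pmatrix} y^{(1)} \\ y^{(2)} \end{pmatrix} = \begin{pmatrix} X^{(1)} & 0 \\ 0 & X^{(2)} \end{pmatrix}\begin{pmatrix} \beta^{(1)} \\ \beta^{(2)} \end{pmatrix} + \begin{pmatrix} \varepsilon^{(1)} \\ \varepsilon^{(2)} \end{pmatrix}$$

Estimate (1)' using least squares and determine SSE, $R^2$, $\ell$,
and $(n - k) = n_1 + n_2 - 2k$.
Now impose the hypothesis that $\beta^{(1)} = \beta^{(2)} = \beta$ and write (1)'
as

(2)' $$y = \begin{pmatrix} y^{(1)} \\ y^{(2)} \end{pmatrix} = \begin{pmatrix} X^{(1)} \\ X^{(2)} \end{pmatrix}\beta + \begin{pmatrix} \varepsilon^{(1)} \\ \varepsilon^{(2)} \end{pmatrix}$$

Estimate (2)' using least squares to obtain the constrained
sum of squared errors ($SSE^*$), $R^{2*}$, $\ell^*$, and
$(n - k)^* = n_1 + n_2 - k$.
(c) Construct the test statistics

$$Chow = \frac{(SSE^* - SSE)/[(n-k)^* - (n-k)]}{SSE/(n-k)} = \left(\frac{R^2 - R^{2*}}{1 - R^2}\right)\left(\frac{n_1 + n_2 - 2k}{k}\right) \sim F(k, n_1 + n_2 - 2k)$$

$$LR = 2(\ell - \ell^*) \overset{a}{\sim} \chi^2(k).$$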
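The two statistics above need only the constrained and unconstrained sums of squared errors. The following is a minimal Python sketch of that arithmetic, not part of the course's Stata workflow; the function names and the numerical values (n = 30, SSE = 120, SSE* = 150) are hypothetical and chosen only for illustration.

```python
import math

def concentrated_loglik(sse, n):
    """Concentrated normal loglikelihood: -(n/2) * [1 + ln(2*pi) + ln(SSE/n)]."""
    return -(n / 2.0) * (1.0 + math.log(2.0 * math.pi) + math.log(sse / n))

def chow_and_lr(sse_u, sse_r, n, k, r):
    """Return (Chow F statistic, LR statistic) for r independent restrictions.

    sse_u: unconstrained SSE; sse_r: constrained SSE (SSE*);
    n: number of observations; k: coefficients in the unconstrained model.
    """
    chow = ((sse_r - sse_u) / r) / (sse_u / (n - k))
    lr = 2.0 * (concentrated_loglik(sse_u, n) - concentrated_loglik(sse_r, n))
    return chow, lr

# Hypothetical values for application (1): k = 4 coefficients, r = 2 restrictions
chow, lr = chow_and_lr(sse_u=120.0, sse_r=150.0, n=30, k=4, r=2)
print(round(chow, 3), round(lr, 3))
```

The Chow value would be compared with an F(r, n - k) critical value and the LR value with a chi-square(r) critical value. Because the loglikelihoods are concentrated, the LR value equals n ln(SSE*/SSE).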
5. Testing Hypotheses using Stata
a. Stata reports the log likelihood values when the command
estat ic
follows a regression command and can be used in constructing LR tests.
b. Stata can also perform many tests based on t or Chow-type tests.
Consider the model
(1) $Y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \beta_4 X_{t4} + \varepsilon_t$
with the hypotheses:
(2) $H_1: \beta_2 = 1$
$H_2: \beta_3 = 0$
$H_3: \beta_3 + \beta_4 = 1$
$H_4: \beta_3 \beta_4 = 1$
$H_5: \beta_2 = 1$ and $\beta_3 = 0$
The Stata commands to perform tests of these hypotheses follow OLS
estimation of the unconstrained model.
reg Y X2 X3 X4                  (estimates the unconstrained model)
test X2 = 1                     (Tests $H_1$)
test X3 = 0                     (Tests $H_2$)
test X3 + X4 = 1                (Tests $H_3$)
testnl _b[X3]*_b[X4] = 1        (Tests $H_4$. The "testnl" command is
                                for testing nonlinear hypotheses. The
                                prefix "_b", along with the square
                                brackets, must be used when testing
                                nonlinear hypotheses)
test (X2 = 1) (X3 = 0)          (Tests $H_5$)
95% confidence intervals on coefficient estimates are automatically calculated in
Stata. To change the confidence level, use the “level” option as follows:
reg Y X2 X3 X4, level(90) (changes the confidence level
to 90%)
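For a single linear restriction such as $H_1$ or $H_3$, the test is a t statistic built from the coefficient estimates and their estimated covariance matrix. Its square is the F statistic for the same restriction. The Python sketch below shows that calculation; the coefficient values and covariance matrix are hypothetical, and the function name is an assumption made for illustration.

```python
import math

def linear_restriction_t(beta_hat, cov, delta, c):
    """t statistic for H0: delta'beta = c, given an estimate `cov` of Var(beta_hat)."""
    k = len(beta_hat)
    estimate = sum(delta[i] * beta_hat[i] for i in range(k))
    variance = sum(delta[i] * cov[i][j] * delta[j]
                   for i in range(k) for j in range(k))
    return (estimate - c) / math.sqrt(variance)

# Hypothetical estimates of (beta1, beta2, beta3, beta4) and their covariance matrix
beta_hat = [2.0, 0.9, 0.4, 0.5]
cov = [
    [0.50, 0.00, 0.00, 0.00],
    [0.00, 0.04, 0.00, 0.00],
    [0.00, 0.00, 0.02, -0.01],
    [0.00, 0.00, -0.01, 0.03],
]

t_h1 = linear_restriction_t(beta_hat, cov, [0, 1, 0, 0], 1.0)  # H1: beta2 = 1
t_h3 = linear_restriction_t(beta_hat, cov, [0, 0, 1, 1], 1.0)  # H3: beta3 + beta4 = 1
print(round(t_h1, 3), round(t_h3, 3))
```

Note that the variance for $H_3$ includes the covariance between the two estimators, which is why the full covariance matrix, and not just the standard errors, is needed.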
F. Stepwise Regression
Stepwise regression is a method for determining which variables might be
considered as being included in a regression model. It is a purely mechanical approach,
adding or removing variables in the model solely determined by their statistical
significance and not according to any theoretical reason. While stepwise regression can be
considered when deciding among many variables to include in a model, theoretical
considerations should be the primary factor for such a decision.
A stepwise regression may use forward selection or backward selection. Using
forward selection, a stepwise regression will add one independent variable at a time to see
if it is significant. If the variable is significant, it is kept in the model and another variable
is added. If the variable is not significant, or if a previously added variable becomes
insignificant, it is not included in the model. This process continues until no additional
variables are significant.
Stepwise regression using Stata
To perform a stepwise regression in Stata, use the following commands:
Forward:
stepwise, pe(#): reg depvar indepvars
stepwise, pe(#) lockterm1: reg depvar (forced_indepvars) other_indepvars
Backward:
stepwise, pr(#): reg depvar indepvars
stepwise, pr(#) lockterm1: reg depvar (forced_indepvars) other_indepvars
where the "#" in "pr(#)" is the significance level at which variables are removed, such as
0.051, and the "#" in "pe(#)" is the significance level at which variables are entered or
added to the model. If pr(#1) and pe(#2) are both included in a stepwise regression
command, #1 must be greater than #2. Also, "depvar" represents the dependent variable,
"forced_indepvars" represents the independent variables which the user wishes to remain
in the model no matter what their significance level may be, and "other_indepvars"
represents the other independent variables which the stepwise regression will consider
including or excluding. Forward and backward stepwise regression may yield different
results.
G. Forecasting
Let $y_t = F(X_t, \beta) + \varepsilon_t$ denote the stochastic relationship between the variable $y_t$
and the vector of variables $X_t$, where $X_t = (x_{t1}, \dots, x_{tk})$. β represents a vector of unknown parameters.
Forecasts are generally made by estimating the vector of parameters β (giving $\hat\beta$),
determining the appropriate vector $X_t$ (giving $\hat X_t$), and then evaluating

$$\hat y_t = F(\hat X_t, \hat\beta).$$

The forecast error is $FE = y_t - \hat y_t$.
There are at least four factors which contribute to forecast error.
1. Incorrect functional form (This is an example of specification error and will be
discussed later.)
2. Existence of random disturbance ($\varepsilon_t$)
Even if the "appropriate" future value of $X_t$ and the true parameter values, β,
were known with certainty,

$$FE = y_t - \hat y_t = y_t - F(X_t, \beta) = \varepsilon_t$$

$$\sigma^2_{FE} = \text{Variance}(FE) = \text{Var}(\varepsilon_t) = \sigma^2.$$

In this case confidence intervals for $y_t$ would be obtained from

$$\Pr[F(X_t, \beta) - t_{\alpha/2}\,\sigma < y_t < F(X_t, \beta) + t_{\alpha/2}\,\sigma] = 1 - \alpha$$

which could be visualized as follows for the linear case:
[Figure: $Y_t$ plotted against $X_t$]
3. Uncertainty about β
Assume $F(X_t, \beta) = X_t\beta$ in the model $y_t = F(X_t, \beta) + \varepsilon_t$; then the predicted
value of $y_t$ for a given value of $X_t$ is given by

$$\hat y_t = X_t\hat\beta,$$

and the variance of $\hat y_t$ (sample regression line), $\sigma^2_{\hat y_t}$, is given by

$$\sigma^2_{\hat y_t} = X_t\,\text{Var}(\hat\beta)\,X_t',$$

with the variance of the forecast error (actual y) given by:

$$\sigma^2_{FE} = \sigma^2 + \sigma^2_{\hat y_t}.$$

Note that $\sigma^2_{FE}$ takes account of the uncertainty associated with the unknown
regression line and the error term and can be used to construct confidence
intervals for the actual value of Y rather than just the regression line.
Unbiased sample estimators of $\sigma^2_{\hat y_t}$ and $\sigma^2_{FE}$
can be easily obtained by replacing $\sigma^2$ with its unbiased estimator $s^2$.
Confidence intervals for $E(Y_t \mid X_t) = X_t\beta$, the population regression line:

$$\Pr[X_t\hat\beta - t_{\alpha/2}\,s_{\hat y_t} < E(Y_t \mid X_t) < X_t\hat\beta + t_{\alpha/2}\,s_{\hat y_t}] = 1 - \alpha$$

Confidence intervals for $Y_t$:

$$\Pr[X_t\hat\beta - t_{\alpha/2}\,s_{FE} < Y_t < X_t\hat\beta + t_{\alpha/2}\,s_{FE}] = 1 - \alpha$$

[Figure: $Y_t$ plotted against $X_t$]
4. A comparison of confidence intervals.
Some students have found the following table facilitates their understanding of the different confidence intervals for the
population regression line and actual value of Y. The column for the estimated coefficients is only included to compare
the organizational parallels between the different confidence intervals.
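Statistic: $\hat\beta = (X'X)^{-1}X'Y$
  Distribution: $\hat\beta \sim N[\beta, \sigma^2(X'X)^{-1}]$
  t-stat: $1 - \alpha = \Pr\left(-t_{\alpha/2} < \frac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}} < t_{\alpha/2}\right) = \Pr\left(\hat\beta_i - t_{\alpha/2}\,s_{\hat\beta_i} < \beta_i < \hat\beta_i + t_{\alpha/2}\,s_{\hat\beta_i}\right)$
  C.I. for $\beta_i$: $\left(\hat\beta_i - t_{\alpha/2}\,s_{\hat\beta_i},\ \hat\beta_i + t_{\alpha/2}\,s_{\hat\beta_i}\right)$

Statistic: $\hat Y_t = X_t\hat\beta$ = sample regression line = predicted Y values corresponding to $X_t$
  Distribution: $\hat Y_t \sim N[X_t\beta, \sigma^2_{\hat Y_t}]$, with $\sigma^2_{\hat Y_t} = \sigma^2 X_t(X'X)^{-1}X_t'$
  t-stat: $1 - \alpha = \Pr\left(-t_{\alpha/2} < \frac{X_t\hat\beta - X_t\beta}{s_{\hat Y_t}} < t_{\alpha/2}\right) = \Pr\left(X_t\hat\beta - t_{\alpha/2}\,s_{\hat Y_t} < X_t\beta < X_t\hat\beta + t_{\alpha/2}\,s_{\hat Y_t}\right)$
  C.I. for $X_t\beta$: $\left(X_t\hat\beta - t_{\alpha/2}\,s_{\hat Y_t},\ X_t\hat\beta + t_{\alpha/2}\,s_{\hat Y_t}\right)$

Statistic: FE (forecast error), $FE = Y_t - \hat Y_t = Y_t - X_t\hat\beta$
  Distribution: $FE \sim N[0, \sigma^2_{FE}]$, with $\sigma^2_{FE} = \sigma^2 + \sigma^2_{\hat Y_t}$
  t-stat: $1 - \alpha = \Pr\left(-t_{\alpha/2} < \frac{FE - 0}{s_{FE}} < t_{\alpha/2}\right) = \Pr\left(FE - t_{\alpha/2}\,s_{FE} < 0 < FE + t_{\alpha/2}\,s_{FE}\right) = \Pr\left(X_t\hat\beta - t_{\alpha/2}\,s_{FE} < Y_t < X_t\hat\beta + t_{\alpha/2}\,s_{FE}\right)$
  C.I. for $Y_t$: $\left(X_t\hat\beta - t_{\alpha/2}\,s_{FE},\ X_t\hat\beta + t_{\alpha/2}\,s_{FE}\right)$

where $s_{\hat Y}$ is used to compute confidence intervals for the regression line ($E(Y_t) = X_t\beta$) and $s_{FE}$ is used in the calculation of
confidence intervals for the actual value of Y. Recall that $s^2_{FE} = s^2 + s^2_{\hat Y}$; hence, $s^2_{FE} > s^2_{\hat Y}$ and the confidence intervals for
Y are larger than for the population regression line.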
5. Uncertainty about X. In many situations the value of the independent variable also
needs to be predicted along with the value of y. Not surprisingly, a "poor" estimate of
$X_t$ will likely result in a poor forecast for y. This can be represented graphically as
follows:
[Figure: $Y_t$ plotted against $X_t$, with the forecast evaluated at the predicted value $\hat X_t$]
6. Hold out samples and a predictive test.
One way to explore the predictive ability of a model is to estimate the model on a
subset of the data and then use the estimated model to predict known outcomes which
are not used in the initial estimation.
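A hold-out check of this kind can be sketched in a few lines of Python for the one-regressor case. The data and function names below are hypothetical. The forecast standard error uses the simple-regression form of $s^2_{FE} = s^2 + s^2_{\hat y}$, namely $s^2[1 + 1/n + (x_0 - \bar x)^2/\sum(x_t - \bar x)^2]$.

```python
import math

def fit_simple_ols(x, y):
    """OLS for y = b1 + b2*x; returns (b1, b2, s2, xbar, sxx, n)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    sse = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)
    return b1, b2, s2, xbar, sxx, n

def holdout_check(x, y, n_holdout):
    """Fit on the early observations, then forecast the held-out ones.

    Returns a list of (actual, forecast, s_FE), one tuple per held-out observation.
    """
    x_fit, y_fit = x[:-n_holdout], y[:-n_holdout]
    b1, b2, s2, xbar, sxx, n = fit_simple_ols(x_fit, y_fit)
    results = []
    for x0, y0 in zip(x[-n_holdout:], y[-n_holdout:]):
        y_hat = b1 + b2 * x0
        # s2_FE = s2 + s2_yhat, where s2_yhat = s2 * (1/n + (x0 - xbar)^2 / Sxx)
        s_fe = math.sqrt(s2 * (1.0 + 1.0 / n + (x0 - xbar) ** 2 / sxx))
        results.append((y0, y_hat, s_fe))
    return results

# Hypothetical data; the last two observations are held out of the estimation
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]
for actual, forecast, s_fe in holdout_check(x, y, n_holdout=2):
    print(round(actual - forecast, 3), round(s_fe, 3))
```

A held-out forecast error that is large relative to $s_{FE}$ (compared with the appropriate t critical value) would cast doubt on the model's predictive ability.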
7. Example

$$\hat y_t = \hat\beta_1 + \hat\beta_2 G_t + \hat\beta_3 M_t = 10 + 2.5\,G_t + 6\,M_t$$

where $y_t$, $G_t$, $M_t$ denote GDP, government expenditure, and money supply.
Assume that

$$s^2(X'X)^{-1} = 10^{-3}\begin{pmatrix} 10 & 5 & 2 \\ 5 & 20 & 3 \\ 2 & 3 & 15 \end{pmatrix}, \qquad s^2 = 10.$$

a. Calculate an estimate of GDP (y) which corresponds to
$G_t = 100$, $M_t = 200$, i.e., $X_t = (1, 100, 200)$.

$$\hat y_t = X_t\hat\beta = (1, 100, 200)\begin{pmatrix} 10 \\ 2.5 \\ 6 \end{pmatrix} = 10 + 250 + 1200 = 1460.$$

b. Evaluate $s^2_{\hat y_t}$ and $s^2_{FE}$ corresponding to the $X_t$ in question (a).

$$s^2_{\hat y_t} = X_t\left(s^2(X'X)^{-1}\right)X_t' = (1, 100, 200)\,10^{-3}\begin{pmatrix} 10 & 5 & 2 \\ 5 & 20 & 3 \\ 2 & 3 & 15 \end{pmatrix}\begin{pmatrix} 1 \\ 100 \\ 200 \end{pmatrix} = 921.81$$

$$s_{\hat y_t} = 30.36$$

$$s^2_{FE} = s^2 + s^2_{\hat y_t} = 10 + 921.81 = 931.81$$

$$s_{FE} = 30.53$$
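The arithmetic in this example can be checked with a short Python sketch. It uses only the numbers given in the example; the variable names are assumptions made for illustration. Note that the square root of 921.81 is approximately 30.36.

```python
import math

beta_hat = [10.0, 2.5, 6.0]       # estimated coefficients from the example
s2 = 10.0                         # s^2
# s^2 (X'X)^{-1}, entered in units of 10^-3 as given in the example
cov_times_1000 = [[10, 5, 2],
                  [5, 20, 3],
                  [2, 3, 15]]
x_t = [1, 100, 200]               # (1, G_t, M_t)

# Point forecast: X_t * beta_hat
y_hat = sum(b * x for b, x in zip(beta_hat, x_t))

# Variance of the sample regression line: X_t [s^2 (X'X)^{-1}] X_t'
s2_yhat = sum(x_t[i] * cov_times_1000[i][j] * x_t[j]
              for i in range(3) for j in range(3)) / 1000.0

# Variance of the forecast error: s^2 + s^2_yhat
s2_fe = s2 + s2_yhat

print(y_hat, s2_yhat, round(math.sqrt(s2_yhat), 2))
print(s2_fe, round(math.sqrt(s2_fe), 2))
```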
8. Forecasting: basic Stata commands
a) The data file should include values for the explanatory variables
corresponding to the desired forecast period, say in observations $n_1 + 1$ to $n_2$.
b) Estimate the model using least squares
reg Y X1 . . . XK, [options]
c) Use the predict command, picking the name you want for the predictions, in
this case yhat, e, sfe, and syhat (for $\hat Y$, e, $s_{FE}$, and $s_{\hat Y}$).
predict yhat, xb          ← this option predicts $\hat Y$
predict e, resid          ← this option predicts the residuals (e)
predict sfe, stdf         ← this option predicts the standard
                            error of the forecast ($s_{FE}$)
predict syhat, stdp       ← this option predicts the standard
                            error of the prediction ($s_{\hat Y}$)
list Y yhat sfe           ← this command lists the indicated variables
These commands result in the calculation and reporting of Y, $\hat Y$, e, $s_{FE}$,
and $s_{\hat Y}$ for observations 1 through $n_2$. The predictions will show up in the Data
Editor of Stata under the variable names you picked (in this case, yhat,
e, sfe and syhat).
You may want to restrict the calculations to $t = n_1 + 1, \dots, n_2$ by using
predict yhat if(_n > n1), xb
where "n1" is the numerical value of $n_1$.
d) The variance of the predicted value can be calculated as follows:

$$s^2_{\hat y_t} = s^2_{FE} - s^2$$
H. PROBLEM SETS: MULTIVARIATE REGRESSION
Problem Set 3.1
Theory
OBJECTIVE: The objective of problems 1 & 2 is to demonstrate that the matrix equations and
summation equations for the estimators and variances of the estimators are equivalent.
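Remember $\sum_{t=1}^{N} X_t = N\bar X$ and don't get discouraged!!
1. BACKGROUND: Consider the model (1) $Y_t = \beta_1 + \beta_2 X_t + \varepsilon_t$ (t = 1, . . ., N) or
equivalently,

(1)' $$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_N \end{pmatrix} = \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_N \end{pmatrix}\begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_N \end{pmatrix}$$

(1)'' $Y = X\beta + \varepsilon$
The least squares estimator of $\beta = (\beta_1, \beta_2)'$ is

$$\hat\beta = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = (X'X)^{-1}X'Y.$$

If (A.1) - (A.5) (see class notes) are satisfied, then

$$\text{Var}(\hat\beta) = \begin{pmatrix} \text{Var}(\hat\beta_1) & \text{Cov}(\hat\beta_1, \hat\beta_2) \\ \text{Cov}(\hat\beta_2, \hat\beta_1) & \text{Var}(\hat\beta_2) \end{pmatrix} = \sigma^2(X'X)^{-1}.$$

QUESTIONS: Verify the following:
*Hint: It might be helpful to work backwards on parts c and e.
a. $X'X = \begin{pmatrix} N & N\bar X \\ N\bar X & \sum X_t^2 \end{pmatrix}$ and $X'Y = \begin{pmatrix} N\bar Y \\ \sum X_t Y_t \end{pmatrix}$
b. $\hat\beta_2 = \left(\sum X_t Y_t - N\bar X\bar Y\right) / \left(\sum X_t^2 - N\bar X^2\right)$
c. $\hat\beta_1 = \bar Y - \hat\beta_2\bar X$
d. $\text{Var}(\hat\beta_2) = \sigma^2 / \left(\sum X_t^2 - N\bar X^2\right)$
e. $\text{Var}(\hat\beta_1) = \sigma^2\left(\frac{1}{N} + \frac{\bar X^2}{\sum X_t^2 - N\bar X^2}\right) = \text{Var}(\bar Y) + \bar X^2\,\text{Var}(\hat\beta_2)$
f. $\text{Cov}(\hat\beta_1, \hat\beta_2) = -\bar X\,\text{Var}(\hat\beta_2)$
(JM II'A, JM Stats)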
2. Consider the model:

$$Y_t = \beta X_t + \varepsilon_t$$

a. Show that this model is equivalent to $Y = X\beta + \varepsilon$
where

$$Y = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$$

b. Using the matrices in 2(a), evaluate $(X'X)^{-1}X'Y$ and compare your answer with
the results obtained in question 4 in Problem Set 1.1.
c. Using the matrices in 2(a), evaluate $\sigma^2(X'X)^{-1}$.
(JM II'A)
Applied
3. Use the data in HPRICE1.RAW to estimate the model
price = $\beta_0 + \beta_1$ sqrft $+ \beta_2$ bdrms + u
where price is the house price measured in thousands of dollars, sqrft is
the floorspace measured in square feet, and bdrms is the number of bedrooms.
a. Write out the results in equation form.
b. What is the estimated increase in price for a house with one more bedroom, holding
square footage constant?
c. What is the estimated increase in price for a house with an additional bedroom that is 140
square feet in size? Compare this to your answer in part (b).
d. What percentage variation in price is explained by square footage and number of
bedrooms?
e. The first house in the sample has sqrft = 2,438 and bdrms = 4. Find the predicted selling
price for this house from the OLS regression line.
f. The actual selling price of the first house in the sample was $300,000 (so price = 300).
Find the residual for this house. Does it suggest that the buyer underpaid or overpaid for
the house?
Problem Set 3.2
Theory
1. $R^2$, Adjusted $R^2$ ($\bar R^2$), F Statistic, and LR
The $R^2$ (coefficient of determination) is defined by

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$

where $SSE = \sum e_t^2$, $SSR = \sum(\hat Y_t - \bar Y)^2$, and $SST = \sum(Y_t - \bar Y)^2$.
Given that SST = SSR + SSE when using OLS,
a. Demonstrate that $0 \le R^2 \le 1$.
b. Demonstrate that n = k implies $R^2 = 1$. (Hint: n = k implies that X is square. Be
careful! Show $\hat Y = X\hat\beta = Y$.)
c. If an additional independent variable is included in the regression equation, will
the $R^2$ increase, decrease, or remain unaltered? (Hint: What is the effect upon
SST, SSE?)
d. The adjusted $R^2$, $\bar R^2$, is defined by

$$\bar R^2 = 1 - \frac{SSE/(n - k)}{SST/(n - 1)}.$$

Demonstrate that $\frac{1 - k}{n - k} \le \bar R^2 \le R^2 \le 1$, i.e., the adjusted $R^2$ can be negative.
(Hint: $1 - \bar R^2 = \frac{SSE}{SST}\left(\frac{n - 1}{n - k}\right) = (1 - R^2)\left(\frac{n - 1}{n - k}\right)$)
e. Verify that
$LR = \frac{SSE^* - SSE}{\sigma^2}$ if $\sigma^2$ is known
$LR = n\ln(SSE^*/SSE)$ if $\sigma^2$ is unknown,
where SSE* denotes the restricted SSE.
f. For the hypothesis $H_0: \beta_2 = \dots = \beta_k = 0$, verify that the corresponding LR statistic
can be written as $LR = n\ln\left(\frac{1}{1 - R^2}\right) = -n\ln(1 - R^2)$.
FYI: The corresponding LM test statistic for this hypothesis can be written in
terms of the coefficient of determination as $LM = NR^2$.
2. Demonstrate that
a. $X'e = 0$ is equivalent to the normal equations $X'X\hat\beta = X'Y$.
b. $X'e = 0$ implies that the sum of estimated error terms will equal zero if the regression
equation includes an intercept.
Remember: $e = Y - \hat Y = Y - X\hat\beta$
(JM IIB)
Applied
3. The following model can be used to study whether campaign expenditures affect election
outcomes:
voteA = $\beta_0 + \beta_1$ ln(expendA) $+ \beta_2$ ln(expendB) $+ \beta_3$ prtystrA + u
where voteA is the percent of the vote received by Candidate A, expendA and expendB are
campaign expenditures by Candidates A and B, and prtystrA is a measure of party
strength for Candidate A (the percent of the most recent presidential vote that went to A's
party).
i) What is the interpretation of β
1
?
ii) In terms of the parameters, state the null hypothesis that a 1% increase in A's
expenditures is offset by a 1% increase in B's expenditures.
iii) Estimate the model above using the data in VOTE1.RAW and report the results in
the usual form. Do A's expenditures affect the outcome? What about B's
expenditures? Can you use these results to test the hypothesis in part (ii)?
iv) Estimate a model that directly gives the t statistic for testing the hypothesis in part
(ii). What do you conclude? (Use a two-sided alternative.) A possible approach:
to test $H_0: \beta_1 + \beta_2 = D$, substitute $D - \beta_2$ for $\beta_1$ and simplify.
(Wooldridge C. 4.1)
4. Consider the data
t    Output ($Y_t$)    Labor ($L_t$)    Capital ($K_t$)
1 40.26 64.63 133.14
2 40.84 66.30 139.24
3 42.83 65.27 141.64
4 43.89 67.32 148.77
5 46.10 67.20 151.02
6 44.45 65.18 143.38
7 43.87 65.57 148.19
8 49.99 71.42 167.12
9 52.64 77.52 171.33
10 57.93 79.46 176.41
The Cobb Douglas Production function is defined by
(1) $$Y_t = e^{\beta_1 + \beta_2 t}\, L_t^{\beta_3}\, K_t^{\beta_4}\, \varepsilon_t$$

where ($\beta_2 t$) takes account of changes in output for any reason other than a change in $L_t$ or
$K_t$; $\varepsilon_t$ denotes a random disturbance having the property that $\ln\varepsilon_t$ is distributed $N(0, \sigma^2)$.
Labor's share (total wage receipts / total sales receipts) is given by $\beta_3$ if $\beta_3 + \beta_4$ (the returns to scale) is
equal to one. $\beta_2$ is frequently referred to as the rate of technological change,
$\left(\frac{dY_t}{dt}\right)/Y_t$ for L and K fixed.
Taking the natural logarithm of equation (1), we obtain

(2) $$\ln Y_t = \beta_1 + \beta_2 t + \beta_3\ln(L_t) + \beta_4\ln(K_t) + \ln(\varepsilon_t).$$

If $\beta_3 + \beta_4$ is equal to 1, then equation (2) can be rewritten as

(3) $$\ln(Y_t/K_t) = \beta_1 + \beta_2 t + \beta_3\ln(L_t/K_t) + \ln\varepsilon_t.$$
a. Estimate equation (2) using the technique of least squares.
b. Corresponding to equation (2)
1) Test the hypothesis $H_0: \beta_2 = \beta_3 = \beta_4 = 0$. Explain the implications of this
hypothesis. (95% confidence level)
2) Perform and interpret individual tests of significance of $\beta_2$, $\beta_3$, and $\beta_4$, i.e., test
$H_0: \beta_i = 0$, $\alpha = .05$.
3) Test the hypothesis of constant returns to scale, i.e., $H_0: \beta_3 + \beta_4 = 1$, using
a. a t-test for a general linear hypothesis, with restriction vector δ = (0,0,1,1);
b. a Chow test;
c. an LR test.
c. Estimate equation (3) and test the hypothesis that labor's share is equal to .75, i.e., $\beta_3 = .75$.
d. Reestimate the model (equation 2) with the first nine observations and check to see if the actual
log(output) for the 10th observation lies in the 95% forecast confidence interval.
(JM II)
5. The translog production function corresponding to the previous problem is given by
$$\ln(Y) = \beta_1 + \beta_2 t + \beta_3\ln(L) + \beta_4\ln(K) + \beta_5(\ln L)^2 + \beta_6(\ln K)^2 + \beta_7\ln(L)\ln(K) + \ln(\varepsilon_t)$$
a. What restrictions on the translog production function result in a Cobb-Douglas
production function?
b. Estimate the translog production function using the data in problem 4 and use the Chow and
LR tests to determine whether it provides a statistically significant improved fit to the data,
relative to the Cobb-Douglas function.
(JM II)
6. The transcendental production function corresponding to the data in problem 4 is defined by

$$Y = e^{\beta_1 + \beta_2 t + \beta_3 L + \beta_4 K}\, L^{\beta_5}\, K^{\beta_6}$$

a. What restrictions on the transcendental production function result in a Cobb-Douglas
production function?
b. Estimate the transcendental production function using the data in problem 4 and use the Chow
and LR tests to compare it with the Cobb-Douglas production function.
(JM II)
APPENDIX A
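Some important derivatives:
Let

$$X = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \quad a = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \quad A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \quad (\text{symmetric: } a_{12} = a_{21})$$

1. $\frac{d(a'X)}{dX} = \frac{d(X'a)}{dX} = a$
2. $\frac{d(X'AX)}{dX} = 2AX$

Proof of $\frac{d(X'a)}{dX} = a$
Note: $a'X = X'a = a_1x_1 + a_2x_2$

$$\frac{d(X'a)}{dX} = \begin{pmatrix} \partial(X'a)/\partial x_1 \\ \partial(X'a)/\partial x_2 \end{pmatrix} = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} = a$$

Proof of $\frac{d(X'AX)}{dX} = 2AX$
Note: $X'AX = a_{11}x_1^2 + (a_{12} + a_{21})x_1x_2 + a_{22}x_2^2$

$$\frac{d(X'AX)}{dX} = \begin{pmatrix} \partial(X'AX)/\partial x_1 \\ \partial(X'AX)/\partial x_2 \end{pmatrix} = \begin{pmatrix} 2a_{11}x_1 + 2a_{12}x_2 \\ 2a_{12}x_1 + 2a_{22}x_2 \end{pmatrix} = 2\begin{pmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \end{pmatrix} = 2\begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = 2AX.$$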
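The second result can be checked numerically by comparing 2AX with a finite-difference gradient of X'AX. The Python sketch below does this for a hypothetical symmetric 2 by 2 matrix; the helper names and numbers are assumptions made for illustration.

```python
def quad_form(A, x):
    """Return x'Ax for a square matrix A (list of lists) and a vector x."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def numerical_gradient(f, x, h=1e-6):
    """Central-difference approximation to the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad

A = [[2.0, 1.0],
     [1.0, 3.0]]          # symmetric, hypothetical values
x = [0.5, -1.5]

# Analytic gradient from the appendix: 2AX
analytic = [2.0 * sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
numeric = numerical_gradient(lambda v: quad_form(A, v), x)
print(analytic, numeric)
```

The two gradients should agree up to floating-point error. The agreement relies on A being symmetric, as assumed in the appendix.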
APPENDIX B
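An unbiased estimator of $\sigma^2$ is given by

$$s^2 = \frac{1}{n - k}\left(y'(I - X(X'X)^{-1}X')y\right) = SSE/(n - k).$$

Proof: To show this, we need some results on traces, where $\text{tr}(A) = \sum_{i=1}^{n} a_{ii}$:
1) tr(I) = n
2) If A is idempotent, tr(A) = rank of A
3) tr(A+B) = tr(A) + tr(B)
4) tr(AB) = tr(BA) if both AB and BA are defined
5) tr(ABC) = tr(CAB)
6) tr(kA) = k tr(A)
Now, remember that $\hat\sigma^2 = \frac{1}{n}e'e$ and $s^2 = \frac{1}{n - k}e'e$.

$$e = y - X\hat\beta = y - X(X'X)^{-1}X'y = My = M(X\beta + \varepsilon) = MX\beta + M\varepsilon = M\varepsilon,$$

where $M = I - X(X'X)^{-1}X'$ (so that MX = 0).
Note that M is symmetric and idempotent (problem set R.2).
So

$$\hat\sigma^2 = \frac{1}{n}e'e = \frac{1}{n}\varepsilon'M'M\varepsilon = \frac{1}{n}\varepsilon'MM\varepsilon = \frac{1}{n}\varepsilon'M\varepsilon$$

and

$$s^2 = \frac{1}{n - k}\varepsilon'M\varepsilon.$$

Then

$$E(\hat\sigma^2) = \frac{1}{n}E(\varepsilon'M\varepsilon) = \frac{1}{n}E(\text{tr}(\varepsilon'M\varepsilon))$$
$$= \frac{1}{n}E\,\text{tr}(M\varepsilon\varepsilon') = \frac{1}{n}\text{tr}(M\,E(\varepsilon\varepsilon'))$$
$$= \frac{1}{n}\text{tr}(M\sigma^2 I) = \frac{1}{n}\text{tr}(\sigma^2 M) \quad (\text{because } \text{cov}(\varepsilon_i, \varepsilon_j) = 0,\ i \ne j)$$
$$= \frac{\sigma^2}{n}\text{tr}(M)$$
$$= \frac{\sigma^2}{n}\text{tr}(I - X(X'X)^{-1}X')$$
$$= \frac{\sigma^2}{n}\left(n - \text{tr}(X(X'X)^{-1}X')\right)$$
$$= \frac{\sigma^2}{n}\left(n - \text{tr}(X'X(X'X)^{-1})\right)$$
$$= \frac{\sigma^2}{n}\left(n - \text{tr}(I_k)\right)$$
$$= \frac{\sigma^2}{n}(n - k),$$

so $E(\hat\sigma^2) = \frac{n - k}{n}\sigma^2$ and $E(s^2) = \frac{n}{n - k}E(\hat\sigma^2) = \sigma^2$.
Therefore $\hat\sigma^2$ is biased, but $E(s^2) = \frac{n}{n - k}E(\hat\sigma^2) = \sigma^2$ and $s^2$
is unbiased.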
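The key step in this proof, tr(M) = n - k with M idempotent, can be illustrated numerically. The Python sketch below builds M for a small hypothetical design matrix with an intercept and one regressor (n = 5, k = 2), using hand-written matrix helpers so that no external library is assumed.

```python
def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    """Matrix product of A and B (lists of rows)."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse_2x2(A):
    """Inverse of a 2 by 2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical design matrix: n = 5 observations, k = 2 (intercept and one regressor)
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 4.0], [1.0, 5.0], [1.0, 7.0]]
n, k = len(X), len(X[0])

Xt = transpose(X)
P = matmul(matmul(X, inverse_2x2(matmul(Xt, X))), Xt)    # X (X'X)^{-1} X'
M = [[(1.0 if i == j else 0.0) - P[i][j] for j in range(n)] for i in range(n)]

trace_M = sum(M[i][i] for i in range(n))
MM = matmul(M, M)
max_idempotency_gap = max(abs(MM[i][j] - M[i][j])
                          for i in range(n) for j in range(n))
print(trace_M, n - k, max_idempotency_gap)
```

Up to floating-point error, the trace equals n - k and MM equals M, which is what makes dividing SSE by n - k, and not by n, produce an unbiased estimator.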
APPENDIX C
$\tilde\beta = AY = (X'X)^{-1}X'Y$ is BLUE.
Proof: Let $\tilde\beta_i = A_iY$ where $A_i$ denotes the i-th row of the matrix A. Since the result will be
symmetric for each $\beta_i$ (hence, for each $A_i$), denote $A_i$ by a' where a is an (n by 1) vector.
The problem then becomes:
Min a'Ia where I is n×n
s.t. AX = I where X is n×k (for unbiasedness)
or min a'Ia
s.t. X'a = i where i is the i-th column of the identity matrix.
Let $\mathcal{L} = a'Ia + \lambda'(X'a - i)$, which is the associated Lagrangian function where λ is k×1.
The necessary conditions for a solution are:

$$\frac{\partial\mathcal{L}}{\partial a'} = 2a'I + \lambda'X' = 0$$

$$\frac{\partial\mathcal{L}}{\partial\lambda'} = (X'a - i)' = 0.$$

This implies

$$a' = -(1/2)\lambda'X'.$$

Now substitute $a = -(1/2)X\lambda$ into the expression for $\frac{\partial\mathcal{L}}{\partial\lambda'} = 0$
and we obtain

$$-(1/2)X'X\lambda = i$$

$$\lambda = -2(X'X)^{-1}i$$

$$a' = -(1/2)(-2)\,i'(X'X)^{-1}X' = i'(X'X)^{-1}X' = A_i$$

which implies

$$A = (X'X)^{-1}X';$$

hence,

$$\tilde\beta = (X'X)^{-1}X'y.$$
III A
1
James B. McDonald
Brigham Young University
2/9/2010
IV. Miscellaneous Topics
A. Multicollinearity
1. Introduction
The least squares estimator of β in the model
y = Xβ + ε
is defined by
$\hat\beta = (X'X)^{-1}X'y$.
As long as the columns of the X matrix are linearly independent, $(X'X)^{-1}$ exists and $\hat\beta$ can
be evaluated. If any one column of X can be expressed as a linear combination of the
remaining columns, $|X'X| = 0$ and $(X'X)^{-1}$ is not defined.
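Consider the matrix

$$\text{Cor}(X) = \begin{pmatrix} \text{Cor}(X_1, X_1) & \text{Cor}(X_1, X_2) & \cdots & \text{Cor}(X_1, X_k) \\ \text{Cor}(X_2, X_1) & \text{Cor}(X_2, X_2) & \cdots & \text{Cor}(X_2, X_k) \\ \vdots & \vdots & & \vdots \\ \text{Cor}(X_k, X_1) & \text{Cor}(X_k, X_2) & \cdots & \text{Cor}(X_k, X_k) \end{pmatrix} = \begin{pmatrix} 1 & \rho_{12} & \cdots & \rho_{1k} \\ \rho_{21} & 1 & \cdots & \rho_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{k1} & \rho_{k2} & \cdots & 1 \end{pmatrix}$$

where $\rho_{ij}$ = correlation($X_i$, $X_j$). Recall that $0 \le |\text{Cor}(X)| \le 1$.
One "polar" case is that in which the "independent" or exogenous variables are
orthogonal or uncorrelated with each other, i.e., Cor(X) = I; hence, |Cor(X)| = 1.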
Another polar case is the situation in which one exogenous variable can be written as a
linear combination of the remaining exogenous variables, e.g.,
Sales Revenue$_t$ = $\beta_1 + \beta_2$ (Sales of right ski boots) $+ \beta_3$ (Sales of left ski boots) $+ \varepsilon_t$,
where $x_{t2}$ denotes sales of right ski boots and $x_{t3}$ denotes sales of left ski boots.
In this case,

$$\text{Cor}(X) = \begin{pmatrix} 1 & \text{Cor}(X_2, X_3) \\ \text{Cor}(X_3, X_2) & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$

and |Cor(X)| = 0.
While the extreme case of |Cor(X)| = 0 is not particularly common, frequent instances in
which |Cor(X)| is small may arise in which some rather "strange" results may occur. We
will define multicollinearity to exist whenever |Cor(X)| < 1. |Cor(X)| = 0 is referred to
as exact multicollinearity. Multicollinearity is not necessarily bad, but it may make it
difficult to accurately estimate the impact of individual variables on the expected value of
the dependent variable. The question of interest is generally not whether we have
multicollinearity, but what is the "degree" of multicollinearity, what are the associated
consequences, and what can be done about it? While multicollinearity can contribute to
imprecise estimates, it is not the only cause or explanation of imprecise estimation. In
summary, the impact of multicollinearity is that if two or more independent variables move
together, then it can be difficult to obtain precise estimates of the effects of the individual
variables, $\beta_i = \partial E(y_t)/\partial X_{ti}$.
2. A special case of two explanatory variables.
In order to illustrate some of the consequences of multicollinearity, consider the
following model:
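(1) $y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \varepsilon_t$, t = 1, 2, . . ., n.
Summing (1) over t and dividing by n we obtain
(2) $\bar y = \beta_1 + \beta_2\bar x_2 + \beta_3\bar x_3 + \bar\varepsilon$
where $\bar y$, $\bar x_2$, $\bar x_3$, and $\bar\varepsilon$, respectively, denote the sample means of $y_t$, $x_{t2}$, $x_{t3}$, and $\varepsilon_t$.
Subtracting (2) from (1) yields
(3) $\tilde y_t = \beta_2\tilde x_{t2} + \beta_3\tilde x_{t3} + \tilde\varepsilon_t$
where $\tilde y_t = y_t - \bar y$, $\tilde x_{t2} = x_{t2} - \bar x_2$, $\tilde x_{t3} = x_{t3} - \bar x_3$, and $\tilde\varepsilon_t = \varepsilon_t - \bar\varepsilon$.
The least squares estimators of $\beta_2$ and $\beta_3$ are given by (Appendix A.1)

(4) $$\begin{pmatrix} \hat\beta_2 \\ \hat\beta_3 \end{pmatrix} = (\tilde X'\tilde X)^{-1}\tilde X'\tilde y$$

where

$$\tilde X'\tilde X = \begin{pmatrix} m_{22} & m_{23} \\ m_{32} & m_{33} \end{pmatrix}, \quad \tilde X'\tilde y = \begin{pmatrix} m_{2y} \\ m_{3y} \end{pmatrix}$$

$$m_{ij} = \sum_{t=1}^{n}\tilde x_{ti}\tilde x_{tj} = \sum_{t=1}^{n}(x_{ti} - \bar x_i)(x_{tj} - \bar x_j)$$

$$m_{iy} = \sum_{t=1}^{n}\tilde x_{ti}\tilde y_t = \sum_{t=1}^{n}(x_{ti} - \bar x_i)(y_t - \bar y)$$

and

(5) $$\text{Var}\begin{pmatrix} \hat\beta_2 \\ \hat\beta_3 \end{pmatrix} = \sigma^2(\tilde X'\tilde X)^{-1}.$$

From equation (5) it can be shown that

(6) $$\sigma^2_{\hat\beta_i} = \frac{\sigma^2}{n\,\text{Var}(X_i)(1 - \rho_{23}^2)}$$

(7) $$t_{\hat\beta_i} = \frac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}}$$

where

$$\rho_{23}^2 = \frac{\left(\sum\tilde x_{t2}\tilde x_{t3}\right)^2}{\sum\tilde x_{t2}^2\sum\tilde x_{t3}^2} = \frac{\left\{\sum(x_{t2} - \bar x_2)(x_{t3} - \bar x_3)\right\}^2}{\sum(x_{t2} - \bar x_2)^2\sum(x_{t3} - \bar x_3)^2} = \text{Correlation}^2(X_2, X_3).$$

The confidence intervals for $\beta_i$ are given by

(8) $$\hat\beta_i \pm t_{\alpha/2}\,s_{\hat\beta_i} = \hat\beta_i \pm t_{\alpha/2}\,\frac{s}{\left[n\,\text{Var}(x_{ti})(1 - \rho_{23}^2)\right]^{1/2}}.$$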
Equation (6) can be used to illustrate the point made on page 3 about multicollinearity
only being one of several factors which may impact estimator precision. From (6) we note
that (other things being equal) increasing the sample size (n), increasing the variance of the
variable whose coefficient is being estimated ($X_i$), reducing $\sigma^2$, or reducing the square of the
correlation between the independent variables will increase the precision of our estimators,
i.e., reduce the variance of the estimator. A graphical analysis may be helpful.
In order to focus on the effect of multicollinearity on the variance of, say, $\hat\beta_2$, consider
the ratio of $\sigma^2_{\tilde\beta_2}$ with multicollinearity ($\rho_{23} \ne 0$) to $\sigma^2_{\hat\beta_2}$ without multicollinearity ($\rho_{23} = 0$). In
other words, for different values of $\rho_{23}^2$, we calculate this ratio, which reflects how many
times worse (greater) the variance is of an estimator subject to multicollinearity compared to
one without. This ratio is equal to $1/(1 - \rho_{23}^2)$.

$\rho_{23}^2$        $\sigma^2_{\tilde\beta_2} / \sigma^2_{\hat\beta_2}$
0              1
1/2            2
2/3            3
9/10           10
99/100         100
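The entries in this table follow directly from the ratio $1/(1 - \rho_{23}^2)$. A short Python sketch using exact fractions reproduces them; the function name is an assumption made for illustration.

```python
from fractions import Fraction

def variance_ratio(rho_sq):
    """Variance of beta_hat_2 relative to the case rho_23 = 0: 1 / (1 - rho_23^2)."""
    return 1 / (1 - rho_sq)

for rho_sq in [Fraction(0), Fraction(1, 2), Fraction(2, 3),
               Fraction(9, 10), Fraction(99, 100)]:
    print(rho_sq, variance_ratio(rho_sq))
```

The ratio grows without bound as the squared correlation approaches one, which is the exact multicollinearity case.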
Note again that, other things being equal, the larger the correlation between the two
independent variables in equation (1), the larger the variance of $\hat\beta_2$ and the less "precise" will be
the estimator. The effect can be substantial. However, it is important to recall that
multicollinearity is not the only factor having an impact on estimator precision as measured
by $\sigma^2_{\hat\beta_2}$; see equation (6).
The following figure of the density of $\hat\beta_2$ for different values of $\rho_{23}$ (and hence $\sigma^2_{\hat\beta_2}$) will be
useful in our discussion of the possible impact of multicollinearity.
[Figure: Density of $\hat\beta_2$ for $\sigma^2_{\hat\beta_2}$ = 0.5, 1.0, and 1.5]
Recall that (i) the points of inflection on the normal density curve occur at µ ± σ so that
if we are testing the hypothesis $H_0: \beta_2 = 1$
(ii) $\Pr(-\sigma_{\hat\beta_2} < \hat\beta_2 - 1 < \sigma_{\hat\beta_2}) \approx 0.68$
(iii) $\Pr(-2\sigma_{\hat\beta_2} < \hat\beta_2 - 1 < 2\sigma_{\hat\beta_2}) \approx 0.95$
(iv) $\Pr(\hat\beta_2 < 0) = \Pr\left(\frac{\hat\beta_2 - 1}{\sigma_{\hat\beta_2}} < \frac{-1}{\sigma_{\hat\beta_2}}\right) = \Pr\left(\frac{\hat\beta_2 - 1}{\sigma_{\hat\beta_2}} < -\frac{\sqrt{m_{22}}\sqrt{1 - \rho_{23}^2}}{\sigma}\right)$
From (iv) we can evaluate the probability of $\hat\beta_2$ assuming the "wrong sign" for the case in which
$\beta_2 = 1$ for given $m_{22}$ and σ. In the previous figure these probabilities are shown as the area to
the left of the vertical dotted line. If $\sigma = \sqrt{m_{22}}$ (strictly for purposes of exposition), the
probability of an "incorrect" sign would be given in the following table.

$\rho_{23}^2$        Probability of an incorrect sign
0              .16
1/2            .24
2/3            .28
9/10           .38
99/100         .46
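The probabilities in this table can be approximately reproduced from (iv) with the standard normal distribution function. The Python sketch below assumes, as in the text, that $\beta_2 = 1$ and $\sigma = \sqrt{m_{22}}$, and that the tabulated argument is the squared correlation; the function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_wrong_sign(rho_sq, beta=1.0, sigma_over_root_m22=1.0):
    """Pr(beta_hat_2 < 0) when beta_hat_2 ~ N(beta, sigma^2 / (m22 * (1 - rho_sq)))."""
    sd = sigma_over_root_m22 / math.sqrt(1.0 - rho_sq)
    return normal_cdf(-beta / sd)

for rho_sq in [0.0, 1 / 2, 2 / 3, 9 / 10, 99 / 100]:
    print(round(rho_sq, 2), round(prob_wrong_sign(rho_sq), 4))
```

The computed values agree with the table to within rounding, and they approach 0.5 as the squared correlation approaches one.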
Based on our previous discussion we note that increases in $\rho_{23}^2$ and "severe" multicollinearity
can be associated with the following situations.
(1) The precision of estimation is reduced (Var($\hat\beta_i$) increases) so that it becomes difficult to
accurately estimate individual effects of variables which move together.
(2) It was noted that the probability of obtaining estimates having the "wrong" sign
increases as $\text{Corr}^2(x_2, x_3)$ increases.
(3) Note from (7) that as $\rho_{23} \to 1$, the t-statistics get smaller; hence, based upon a strict
adherence to a "t-criterion" for deleting variables, a variable may be deleted from an
equation when that variable does have an effect. This is always a possibility in
statistical inference, but with severe multicollinearity the confidence intervals can
become so wide (see equation (8)) as to make it difficult to reject "almost any
hypothesis." Recall that confidence intervals for $\beta_i$ are given by

$$\hat\beta_i \pm t_c\,\frac{s}{\sqrt{n\,\text{Var}(x_{ti})(1 - \rho_{23}^2)}}$$

for the case in which k = 3.
(4) Severe multicollinearity is frequently associated with "significant" F statistics and
"insignificant" t statistics for a group of variables which are expected to be important.
The collective importance of a group of variables can be checked using a Chow test.
Huge F-statistics but small t-statistics? Likely diagnosis: multicollinearity.
To visualize this situation consider the joint confidence intervals for $\beta_2$ and $\beta_3$ which
might appear as
[Figure: joint confidence region for $\beta_2$ and $\beta_3$]
Note that the individual confidence intervals for $\beta_2$ and $\beta_3$ include 0; hence, we
would not be able to reject the hypothesis that $\beta_2 = 0$ or $\beta_3 = 0$. The joint confidence
interval for $\beta_2$ and $\beta_3$ does not include the origin; hence, the F statistic will be
statistically significant. It is the high correlation between $x_2$ and $x_3$ that contributes
to the elliptical shape of the joint confidence interval.
(5) Coefficient estimates may be extremely sensitive to the addition of more data.
(6) $|\text{Corr}(X)| = \begin{vmatrix} 1 & \rho_{23} \\ \rho_{23} & 1 \end{vmatrix} = 1 - \rho_{23}^2$ may be close to zero.
(7) Various pairwise correlations between the X's may be close to 1.
(8) Condition index (CI).
High pairwise correlations between explanatory variables are sufficient for
multicollinearity problems, but are not necessary. Belsley, Kuh and Welsch (BKW)
define a condition index

$$CI = \frac{\text{Maximum eigenvalue}}{\text{Minimum eigenvalue}}$$

where the eigenvalues correspond to the correlation matrix of the x's. BKW's rule
of thumb is that multicollinearity is high if CI > 30.
Consider the condition index for the two polar cases in the introduction of this section.

$$C_1 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \qquad C_2 = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$$

which have respective eigenvalues
$(\lambda_{11}, \lambda_{12}) = (1, 1)$ and $(\lambda_{21}, \lambda_{22}) = (0, 2)$.
The corresponding condition indices are then

$$CI_1 = \frac{1}{1} = 1, \qquad CI_2 = \frac{2}{0} \ \text{(undefined)}, \ \text{so } CI \to \infty \text{ as } |C| \to 0.$$

We remind the reader that the CI merely provides a rule of thumb.
In problem number 3.1(1), the reader is asked to verify that the condition index
corresponding to the correlation matrix

$$C = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$$

is given by

$$\frac{1 + |\rho|}{1 - |\rho|}.$$

Note that CI increases as |ρ| increases and includes $C_1$ and $C_2$ as special cases.
3. Some results for the case of an arbitrary number of independent variables.
Consider the more general model
(9) $Y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \dots + \beta_k X_{tk} + \varepsilon_t$.
Some of the results obtained in the previous section can be extended to the more general
case as follows:

(10a-c)
$$\sigma^2_{\hat\beta_i} = \frac{\sigma^2}{n\,s_i^2\,(1 - \rho_i^2)}$$

$$s^2_{\hat\beta_i} = \frac{s^2}{n\,s_i^2\,(1 - \rho_i^2)}$$

$$t_{\hat\beta_i} = \frac{\hat\beta_i - \beta_i}{s_{\hat\beta_i}} = \frac{(\hat\beta_i - \beta_i)\,s_i\left[n(1 - \rho_i^2)\right]^{1/2}}{s}$$

where
$s_i^2 = \sum(X_{ti} - \bar X_i)^2/n$
$\rho_i^2$ = Correlation$^2$ (between $X_i$ and all other independent variables)
= $R^2$ obtained from regressing $X_i$ on the other independent variables.
These results seem reasonable. In particular, the higher the correlation between an
independent variable and the set of other independent variables, the less precise the
associated coefficient estimator as measured by the variance. Again, we note that
"multicollinearity" is only one factor contributing to poor estimator precision (large
$\sigma^2_{\hat\beta_i}$). Large values of $\sigma^2$, small N, and small $s_i^2$ have the same impact.
The impact of multicollinearity as measured by pairwise correlations between
independent variables becomes much less clear. In particular, if $c_{ij}$ is the correlation
between the ith and jth independent variable, it can be shown that

(11) $$\frac{\partial\sigma^2_{\hat\beta_i}}{\partial c_{ik}} = -\frac{\sigma^2}{N s_i^2}\,(c^{ii})(c^{ik})$$

where $c^{st}$ denotes the st-th element in the inverse of the correlation matrix. Consequently,
the impact of an increase in the pairwise correlation between two variables upon
estimator precision is indeterminate.
Finally, for a given "degree of multicollinearity," individual coefficient estimators
may be statistically significant if the overall fit of the model ($\bar R^2$) is good
enough. To be more specific,
(12) $\left| \frac{\hat\beta_i}{s_{\hat\beta_i}} \right| > t_{\alpha/2}$
if and only if
$\bar R^2 > 1 - \frac{N \hat\beta_i^2 s_i^2}{t_{\alpha/2}^2 \, s_y^2} (1 - \rho_i^2)$.
In other words, for any degree of multicollinearity, as measured by $\rho_i^2$, the
estimate of $\beta_i$ will be statistically significant if the adjusted $R^2$
($\bar R^2$) is large enough to satisfy the inequality in equation (12). This
inequality can be easily derived by squaring both sides of the first inequality,
replacing $s^2_{\hat\beta_i}$ by $\frac{s^2}{n \, \mathrm{Var}(x_{ti}) (1 - \rho_i^2)}$,
noting that
$\bar R^2 = 1 - \frac{SSE/(n - k)}{SST/(n - 1)} = 1 - \frac{s^2}{s_y^2}$,
and manipulating the resulting expression. The second inequality in (12) can also be
rewritten in terms of $R^2$.
4. Some proposed "solutions" to the multicollinearity problem
There have been numerous solutions proposed to circumvent the multicollinearity
problem. However, the basic problem with multicollinearity is that the variables
(exogenous) may be moving so closely together as to make it difficult to obtain accurate
estimates of individual effects and, consequently, each proposed technique has associated
problems. It should be mentioned that even for the case of severe (not perfect)
multicollinearity, least squares estimators are unbiased, minimum variance among all
unbiased estimators, consistent, and asymptotically efficient as long as (A.1)-(A.5)
are satisfied.
Some suggested solutions include:
(1) Obtain more data: If additional data had been available it would probably have been
used initially. One might try combining cross sectional and time series data. Panel
data often includes more variability and less collinearity among the variables.
(2) Principal components: Replace "problem variables" with a smaller number of linear
combinations of those variables which "account for most of their explanatory
power (variance)." This approach is associated with interpretational problems and may
also result in biased estimators.
(3) Delete a variable: The deletion of one of the variables which is "nearly" linearly related
to the other independent variables is a common practice, but may result in biased
estimators if it is an important variable.
(4) Impose constraints on the parameters: This approach is really a generalization of
(3), since deleting a variable amounts to imposing $\beta_i = 0$. However, there may be
theoretical reasons for imposing constraints on the parameters such as constant returns
to scale in a production function or no money illusion in demand equations. The validity
of these constraints could be investigated using a Chow or likelihood ratio test. Judge
has shown that the least squares estimator which takes account of linear constraints is
minimum variance among estimators satisfying the constraint. If the constraint is not
true, the estimator will be biased, although its variance is still no larger than that
of unconstrained least squares.
(5) Ridge Regression Techniques
A simple ridge regression estimator is given by the following
$\hat\beta(k) = (X'X + kI)^{-1} X'y$.
The ridge regression estimator will be biased (bias($\hat\beta(k)$) =
$-k(X'X + kI)^{-1}\beta$), but the value of k is often selected to minimize
MSE($\hat\beta(k)$), say at k*. Note that for k = 0 the ridge estimator is the OLS
estimator of β, i.e., $\hat\beta(0) = \hat\beta$. It can be shown that
MSE($\hat\beta(k^*)$) ≤ MSE($\hat\beta(0)$).
The basis for selecting $\hat\beta(k^*)$ is motivated by considering the following figure.
[Figure: sampling distributions of the OLS estimator $\hat\beta(0)$ and the ridge
estimator $\hat\beta(k^*)$ relative to the true β.]
In this case the OLS estimator is unbiased, but has a large variance relative to the
biased ridge estimator. Recall that it can be shown that
MSE($\hat\beta$) = var($\hat\beta$) + (bias($\hat\beta$))$^2$.
This figure suggests possible benefits from selecting a slightly biased estimator if
there are significant reductions in variance. The MSE is often used to quantify this
tradeoff. Ridge estimators are biased and the problem of statistical inference has not
been worked out.
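The ridge formula can be illustrated for the special case of two regressors and no
intercept, where the 2x2 inverse can be written out by hand. This is only a sketch:
the data are made up, the function name is hypothetical, and no attempt is made to
choose k* or to standardize the regressors as is common in applied work.

```python
def ridge_2var(x1, x2, y, k):
    """Ridge estimator (X'X + kI)^{-1} X'y for two regressors and no intercept."""
    a = sum(v * v for v in x1) + k          # (X'X + kI)[0][0]
    d = sum(v * v for v in x2) + k          # (X'X + kI)[1][1]
    b = sum(u * v for u, v in zip(x1, x2))  # off-diagonal element of X'X
    g1 = sum(u * v for u, v in zip(x1, y))  # first element of X'y
    g2 = sum(u * v for u, v in zip(x2, y))  # second element of X'y
    det = a * d - b * b
    # Closed-form inverse of a symmetric 2x2 matrix applied to X'y.
    return ((d * g1 - b * g2) / det, (a * g2 - b * g1) / det)

# Hypothetical, nearly collinear regressors.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 1.9, 3.2, 3.9, 5.1]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b_ols = ridge_2var(x1, x2, y, 0.0)    # k = 0 reproduces OLS
b_ridge = ridge_2var(x1, x2, y, 1.0)  # k = 1 shrinks the coefficient vector
print(b_ols, b_ridge)
```

Setting k = 0 returns the OLS estimates, while k > 0 shortens the coefficient vector,
which is the source of both the bias and the variance reduction discussed above.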
5. PROBLEM SET 4.1
Multicollinearity
Theory
1. Prove that the condition index (C.I.) corresponding to the correlation matrix
$C = \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}$ is C.I. $= \sqrt{\frac{1 + \rho}{1 - \rho}}$.
Hint: Use the quadratic formula from college algebra.
(JM IIIA)
2. Prove and discuss equation (12) in the notes on collinearity. (Hint: this problem basically
involves algebraic manipulation, be patient). Based on the result in equation (12), you
can see that statistical significance of individual estimators is retained for an arbitrary
degree of multicollinearity if the explanatory power of the model is high enough.
(JM IIIA 6)
Applied
3. Consider the following data:
Y_t     C_t     W_t
1883    1749    2.36
1909    1756    2.39
1969    1814    2.47
2015    1867    2.52
2126    1943    2.65
2239    2047    2.81
2335    2127    2.93
2403    2164    3.01
2486    2256    3.12
2534    2315    3.18
2534    2328    3.70
where $Y_t$, $C_t$, and $W_t$, respectively, denote income, consumption, and wage rates.
a. Estimate
(1) $C_t = \alpha_1 + \alpha_2 Y_t + \varepsilon_t$
(2) $C_t = \beta_1 + \beta_2 W_t + \varepsilon'_t$
(3) $C_t = \gamma_1 + \gamma_2 Y_t + \gamma_3 W_t + \varepsilon''_t$
using the first ten observations. Also, estimate equation (3) for the entire data set (11
observations). Explain the results.
(JM IIIA)
4. Refer to problem 4 from "HW 2.2: K-Variate Regression". Test the hypothesis that
$\beta_3 = \beta_4 = 0$ in equation (2) and reconcile the results with the results
obtained based upon individual tests of significance for $\beta_3$ and $\beta_4$ using
t-statistics.
(JM IIIA)
5. Consider the following set of data:
Y     X_2    X_3
2     1      1
4     2      4
6     3      7
8     4      10
10    5      13
12    6      16
14    7      19
16    8      22
18    9      25
20    10     28
Discuss any problems associated with estimating $\beta_1$, $\beta_2$ and $\beta_3$ in
the model
$Y_t = \beta_1 + \beta_2 X_{t2} + \beta_3 X_{t3} + \varepsilon_t$.
(JM IIIA)
6. In a study relating college grade point average (GPA) to time spent in various activities,
you distribute a survey to several students. The students are asked how many hours they
spend each week in four activities: studying, sleeping, working, and leisure. Any
activity is put into one of four categories, so that for each student, the sum of hours in the
four activities must be 168.
a. What problems will you encounter in estimating the model
$GPA = \alpha_1 + \alpha_2 \, \text{study} + \alpha_3 \, \text{sleep} + \alpha_4 \, \text{work} + \alpha_5 \, \text{leisure} + \varepsilon_t$
b. How could you reformulate the model so that its parameters have a useful
interpretation? (Wooldridge, 3rd edition, problem 3.5)
7. A problem of interest to health officials (and others) is to determine the effects of
smoking during pregnancy on infant health. One measure of infant health is birth
weight: a birth weight that is too low can put an infant at risk for contracting various
illnesses. Since factors other than cigarette smoking that affect birth weight are likely to
be correlated with smoking, we should take those factors into account. For example,
higher income generally results in access to better prenatal care, as well as better
nutrition for the mother. An equation that recognizes this is
$bwght = \beta_0 + \beta_1 \, cigs + \beta_2 \, faminc + u$
a) What do you think is the most likely sign for $\beta_2$?
b) Do you think cigs and faminc are likely to be correlated? Explain why the
correlation might be positive or negative.
c) Now estimate the equation with and without faminc, using the data in BWGHT.RAW.
Report the results in equation form, including the sample size and R-squared.
Discuss your results, focusing on whether adding faminc substantially changes the
estimated effect of cigs on bwght. Is the estimate of $\beta_2$ statistically
significant?
Appendix 1. Derivation of equation (4)
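$y_t = \beta_1 + \beta_2 x_{t2} + \beta_3 x_{t3} + \varepsilon_t$
$\bar y = \beta_1 + \beta_2 \bar x_2 + \beta_3 \bar x_3 + \bar\varepsilon$
$(y_t - \bar y) = \beta_2 (x_{t2} - \bar x_2) + \beta_3 (x_{t3} - \bar x_3) + \varepsilon_t - \bar\varepsilon$
$\tilde y_t = \beta_2 \tilde x_{t2} + \beta_3 \tilde x_{t3} + \tilde\varepsilon_t$
The $\tilde X$ matrix is given by
$\tilde X = \begin{pmatrix} \tilde x_{12} & \tilde x_{13} \\ \tilde x_{22} & \tilde x_{23} \\ \tilde x_{32} & \tilde x_{33} \\ \tilde x_{42} & \tilde x_{43} \\ \vdots & \vdots \\ \tilde x_{n2} & \tilde x_{n3} \end{pmatrix}$
and
$\tilde X'\tilde X = \begin{pmatrix} \tilde x_{12} & \tilde x_{22} & \cdots & \tilde x_{n2} \\ \tilde x_{13} & \tilde x_{23} & \cdots & \tilde x_{n3} \end{pmatrix} \begin{pmatrix} \tilde x_{12} & \tilde x_{13} \\ \tilde x_{22} & \tilde x_{23} \\ \vdots & \vdots \\ \tilde x_{n2} & \tilde x_{n3} \end{pmatrix}$
$= \begin{pmatrix} \sum \tilde x_{t2}^2 & \sum \tilde x_{t2} \tilde x_{t3} \\ \sum \tilde x_{t3} \tilde x_{t2} & \sum \tilde x_{t3}^2 \end{pmatrix} = \begin{pmatrix} m_{22} & m_{23} \\ m_{32} & m_{33} \end{pmatrix}$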
Appendix 2. Derivation of equation (6)
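$\begin{pmatrix} m_{22} & m_{23} \\ m_{32} & m_{33} \end{pmatrix}^{-1} = \frac{1}{m_{22} m_{33} - m_{23}^2} \begin{pmatrix} m_{33} & -m_{23} \\ -m_{23} & m_{22} \end{pmatrix}$
$\mathrm{Var}(\hat\beta_2) = \frac{\sigma^2 m_{33}}{m_{22} m_{33} - m_{23}^2}$
$= \frac{\sigma^2}{(m_{22} m_{33} - m_{23}^2) / m_{33}}$
$= \frac{\sigma^2}{m_{22} - m_{23}^2 / m_{33}}$
$= \frac{\sigma^2}{m_{22} - m_{22} \, m_{23}^2 / (m_{22} m_{33})}$
$= \frac{\sigma^2}{m_{22} - m_{22} \rho_{23}^2}$
$= \frac{\sigma^2}{m_{22} (1 - \rho_{23}^2)}$
$= \frac{\sigma^2}{\left( \sum \tilde x_{t2}^2 \right) (1 - \rho_{23}^2)}$
Similarly,
$\mathrm{Var}(\hat\beta_3) = \frac{\sigma^2 m_{22}}{m_{22} m_{33} - m_{23}^2}$
$= \frac{\sigma^2}{m_{33} (1 - \rho_{23}^2)}$
$= \frac{\sigma^2}{\left( \sum \tilde x_{t3}^2 \right) (1 - \rho_{23}^2)}$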
III B 1
James B. McDonald
Brigham Young University
2/18/2010
IV. Miscellaneous Topics
B. Binary Variables (Dummy Variables)
Many variables which we may want to include in an econometric model may not be
quantitative (measurable), but rather are qualitative in nature. For example, an
individual will be a homeowner, or will not; will be married or not. Such
characteristics may have a bearing on an individual's behavior, but are not
quantifiable. One way to include the effect of such characteristics is to introduce
binary or dummy variables. For example, let the binary variable $D_t$ indicate whether
a given individual is married or not by defining $D_t = 0$ if the t-th individual is
single and $D_t = 1$ if the t-th individual is married.
We now consider several models which make use of dummy variables, discuss the dummy
variable trap, indicate some interesting generalizations, and investigate applications
of these techniques to several problems in economics.
1. Models with binary explanatory variables
a. An example: the relationship between salary and a college degree
Let $Y_t$ = annual salary of the t-th person in the sample,
$D_{1t}$ = 1 if the t-th person is a college graduate
         = 0 otherwise,
$D_{2t}$ = 1 if the t-th person isn't a college graduate
         = 0 otherwise.
Note that $D_{2t} = 1 - D_{1t}$.
Consider the following two models which can be used to study the impact of a college
degree on annual salary.
Model 1: $Y_t = \alpha_1 + \alpha_2 D_{1t} + \varepsilon_t$
Model 2: $Y_t = \beta_1 D_{1t} + \beta_2 D_{2t} + \varepsilon_t$.
The coefficients in the two representations have different interpretations as
summarized in the following table.
                                    Model 1                  Model 2
E(Y_t | college graduate)           $\alpha_1 + \alpha_2$    $\beta_1$
E(Y_t | not a college graduate)     $\alpha_1$               $\beta_2$
In the model with one fewer dummy variable than categories (model 1; categories =
college graduate, not a college graduate) the coefficient of the binary variable
represents the expected difference or differential between the income levels associated
with the state of the included dummy variable and the state (benchmark) associated with
the deleted dummy variable, i.e.,
$\alpha_2$ = E(Y_t | college graduate) - E(Y_t | not a college graduate).
The coefficients in the representation which includes the same number of binary
variables as categories (model 2) represent the expected income level associated with
each category.
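b. Estimation:
Assume that we have a total of n observations with the first $n_1$ ($n_1 + n_2 = n$)
having college degrees. The two different models can be written in matrix notation as
Model 1:
$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ \vdots & \vdots \\ 1 & 1 \\ 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$
or $Y = X\alpha + \varepsilon$
Model 2:
$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \vdots & \vdots \\ 1 & 0 \\ 0 & 1 \\ \vdots & \vdots \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$
or $Y = X^{*}\beta + \varepsilon$.
The least squares estimators of the vectors α and β are given by
$\hat\alpha = (X'X)^{-1} X'Y = \begin{pmatrix} \hat\alpha_1 \\ \hat\alpha_2 \end{pmatrix} = \begin{pmatrix} \bar Y_2 \\ \bar Y_1 - \bar Y_2 \end{pmatrix}$
and
$\hat\beta = (X^{*\prime} X^{*})^{-1} X^{*\prime} Y = \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = \begin{pmatrix} \bar Y_1 \\ \bar Y_2 \end{pmatrix}$
where $\bar Y_1$ and $\bar Y_2$, respectively, denote the sample mean income for those
having college degrees and those without a degree. Note that these are sample estimates
(sample means) of the population means.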
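The link between the Model 1 estimates and the two group means can be verified on a
small example. The sketch below uses made-up salary figures and a hand-written simple
regression helper (ols_simple is a hypothetical name); any regression routine would
serve equally well.

```python
def ols_simple(x, y):
    """Least squares intercept and slope for a regression of y on a constant and x."""
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical salaries (in thousands); d1 = 1 for college graduates.
salary = [52.0, 61.0, 58.0, 40.0, 35.0, 42.0, 39.0]
d1 = [1, 1, 1, 0, 0, 0, 0]

a1, a2 = ols_simple(d1, salary)  # Model 1: Y = a1 + a2 * D1 + error

mean_grad = sum(s for s, d in zip(salary, d1) if d == 1) / sum(d1)
mean_non = sum(s for s, d in zip(salary, d1) if d == 0) / (len(d1) - sum(d1))

# a1 matches the non-graduate mean; a2 matches the difference in means.
print(a1, mean_non)
print(a2, mean_grad - mean_non)
```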
c. Dummy Variable Trap
Consider the model
$Y_t = \gamma_1 + \gamma_2 D_{1t} + \gamma_3 D_{2t} + \varepsilon_t$
or in matrix form
$\begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 1 \end{pmatrix} \begin{pmatrix} \gamma_1 \\ \gamma_2 \\ \gamma_3 \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}$,
$Y = X^{**}\gamma + \varepsilon$.
The least squares estimators of γ, if they exist, are given by
$\hat\gamma = (X^{**\prime} X^{**})^{-1} X^{**\prime} Y$.
Note that
$X^{**\prime} X^{**} = \begin{pmatrix} 1 & \cdots & 1 & 1 & \cdots & 1 \\ 1 & \cdots & 1 & 0 & \cdots & 0 \\ 0 & \cdots & 0 & 1 & \cdots & 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \\ \vdots & \vdots & \vdots \\ 1 & 1 & 0 \\ 1 & 0 & 1 \\ \vdots & \vdots & \vdots \\ 1 & 0 & 1 \end{pmatrix} = \begin{pmatrix} n & n_1 & n_2 \\ n_1 & n_1 & 0 \\ n_2 & 0 & n_2 \end{pmatrix}$;
hence, the first column is equal to the sum of the second and third columns and
$|X^{**\prime} X^{**}| = 0$.
Therefore, $(X^{**\prime} X^{**})^{-1}$ and the vector $\hat\gamma$ are not defined.
Note that this problem could be detected by noting that the first column in $X^{**}$ is
equal to the sum of the second and third columns.
The dummy variable trap corresponds to including an intercept in a model in which the
same number of dummy variables have been included as categories for the qualitative
characteristic. The dummy variable trap can be thought of as resulting in perfect
multicollinearity.
Two approaches to avoiding the dummy variable trap are:
(1) use an intercept and one fewer dummy variable than categories, or
(2) include the same number of dummy variables as categories (with only one
characteristic), but delete the intercept.
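The singularity behind the dummy variable trap can be confirmed numerically. The
following sketch builds the cross-product matrix for a hypothetical sample with
n1 = 3 graduates and n2 = 4 non-graduates and evaluates its determinant; det3 is an
illustrative helper written for this example.

```python
def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

n1, n2 = 3, 4  # hypothetical group sizes
# Rows of the design matrix: (intercept, D1, D2).
rows = [(1, 1, 0)] * n1 + [(1, 0, 1)] * n2

# Cross-product matrix X'X, element (i, j) = sum over rows of x_i * x_j.
xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]

print(xtx)        # matches the pattern [[n, n1, n2], [n1, n1, 0], [n2, 0, n2]]
print(det3(xtx))  # zero, so the inverse does not exist
```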
d. Generalizations
There are numerous ways in which dummy variables can be advantageously used in
formulating econometric models. Several qualitative characteristics can be modeled in
the same equation with or without quantitative variables. If several qualitative
characteristics are to be included in a model as explanatory variables, an intercept
and one fewer dummy variable than categories should be included for each qualitative
characteristic. Interaction terms (products of binary variables) can be included. The
dependent variable can be chosen to be a binary variable in applications such as
selecting good loan applicants or in determining which income tax returns to audit.
Alternative approaches to using dummy variables as dependent variables are available
and a few will be discussed in Section 2 (III.B.2).
e. Some examples and precautionary comments
(1) Consumption behavior in wartime (or other unique time periods)
Define $Z_t$ = 1 if t corresponds to wartime and 0 otherwise.
Indicate how to model each of the following situations.
[Figure: three panels labeled (1), (2), and (3), with intercept and slope labels
$\beta_1$ and $\beta_2$.]
where $C_t$ and $Y_t$ denote consumption and income in period t. Case (1) corresponds
to a model with different intercepts and a common slope, (2) a common intercept and
different slopes, and (3) the possibility of different intercepts and slopes.
It can be shown that using dummy variables to estimate the intercept(s) and slope(s)
is more efficient than running separate regressions in cases (1) and (2) but is
equivalent to running separate regressions for case (3).
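One way the three cases might be written is sketched below. The coefficient labels
attached to the $Z_t$ terms are assumptions made for this illustration and need not
match the labels used in the original figure.

```latex
% Case (1): different intercepts, common slope
C_t = \beta_1 + \beta_2 Y_t + \beta_3 Z_t + \varepsilon_t
% Case (2): common intercept, different slopes
C_t = \beta_1 + \beta_2 Y_t + \beta_3 (Z_t Y_t) + \varepsilon_t
% Case (3): different intercepts and different slopes
C_t = \beta_1 + \beta_2 Y_t + \beta_3 Z_t + \beta_4 (Z_t Y_t) + \varepsilon_t
```

In each case the peacetime relationship is obtained by setting $Z_t = 0$, and the
wartime shift in the intercept or slope is the coefficient on the corresponding $Z_t$
term.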
(2) Interaction Terms
The use of binary variables in regression models takes account of "additive" effects.
For example, consider the model
Salary = $\beta_1$ + $\beta_2$(income) + $\beta_3$(Gender) + $\beta_4$(Race)
where
Gender = 1 female
       = 0 otherwise
Race = 1 minority
     = 0 otherwise.
$\beta_3$ and $\beta_4$, respectively, measure the additive impact on salaries of being
a woman and a member of a minority. If the data suggest that there is an extra impact
(positive or negative) of being a woman and a minority, this can be modeled using an
interaction term Z = (Gender)(Race) by estimating the model
Salary = $\beta_1$ + $\beta_2$(income) + $\beta_3$(Gender) + $\beta_4$(Race) + $\beta_5$Z.
$\hat\beta_5$ could be tested for statistical significance. A similar approach could be
taken to allow gender, race, and interaction effects to impact the slope.
(3) The Ratchet Effect
This example does not use dummy variables, but illustrates how imaginative use of data
can be profitably utilized. Let $Y_t^*$ = highest income level experienced. Consider
the following figures.
The consumption function depicted in the first figure can be estimated from the
following equation
$C_t = \beta Y_t + \gamma (Y_t^* - Y_t)$.
Note that for periods in which there is "growth" (not just recovery) $Y_t = Y_t^*$ and
$C_t = \beta Y_t$, and during a recession or associated recovery $Y_t^*$ is fixed and
is greater than $Y_t$ and $C_t = \gamma Y_t^* + (\beta - \gamma) Y_t$. In order to test
whether aggregate behavioral differences exist during growth periods as compared with
recession or recovery periods, the hypothesis $H_0$: γ = 0 could be tested.
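Constructing the regressor $Y_t^* - Y_t$ only requires a running maximum of income.
A minimal sketch follows; the income series is made up and previous_peak is a
hypothetical helper name.

```python
def previous_peak(income):
    """Running maximum: highest income level experienced up to and including period t."""
    peak = []
    best = float("-inf")
    for y in income:
        best = max(best, y)
        peak.append(best)
    return peak

# Hypothetical income series with a recession after the third period.
income = [100, 105, 110, 104, 108, 112, 115]
ystar = previous_peak(income)

# The ratchet regressor is zero in growth periods and positive in recession/recovery.
gap = [p - y for p, y in zip(ystar, income)]
print(ystar)
print(gap)
```

Consumption would then be regressed on income and gap (without an intercept, as in the
equation above) to obtain estimates of β and γ.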
(4) A Precautionary Note
Consider the problem of modeling the impact of education upon salary where education
for each individual is reported as being (a) high school (HS) or less, (b) having
attended college (BS), (c) Master's degree (MS), or (d) having a Ph.D. (PhD).
The level of education might be measured in several ways, three of which might be
(E1, E2 or E3):
         E1    E2
HS       1     12
BS       2     16
MS       3     18
PhD      4     20
E3: number of years attending school
E1 assigns an index to the categories (assuming a monotonic relationship), E2 is a
rough measure of the number of years of school, and E3 assumes a linear relationship
between the dependent variable and the number of years of school.
Alternatively, binary variables could be used which allow differentiated impacts for
different degrees. To explore this approach further, let
D1 = 1 HS
   = 0 otherwise
D2 = 1 BS
   = 0 otherwise
D3 = 1 MS
   = 0 otherwise
D4 = 1 PhD
   = 0 otherwise
Now consider the four models for relating salary to the level of education:
Model 1. $S_t = \alpha_1 + \alpha_2 E1_t + \xi_t$
Model 2. $S_t = \beta_1 + \beta_2 E2_t + \eta_t$
Model 3. $S_t = \gamma_1 + \gamma_2 E3_t + \psi_t$
Model 4. $S_t = \delta_1 + \delta_2 D_{2t} + \delta_3 D_{3t} + \delta_4 D_{4t} + \varepsilon_t$
These formulations have very different implications for the estimated marginal benefit
of obtaining a higher degree or an additional year of school. These results are
summarized in the next table.
Marginal Benefit of an Additional Degree*
         Model 1       Model 2       Model 4
BS       $\alpha_2$    $4\beta_2$    $\delta_2$
MS       $\alpha_2$    $2\beta_2$    $\delta_3 - \delta_2$
PhD      $\alpha_2$    $2\beta_2$    $\delta_4 - \delta_3$
*Model 3 assigns a constant marginal expected value of $\gamma_2$ to each additional
year of school at all educational levels.
Note that only model 4 allows for differentiated returns to degrees. These returns can
even be negative. If $\delta_2$ and $\delta_3 - \delta_2$ are positive and
$\delta_4 - \delta_3$ is negative, this suggests that expected salaries are higher for
individuals having a BS or MS rather than the lower degree, but that the expected
salary for those with PhDs is lower than salaries of those with an MS. Model 1 implies
a constant marginal benefit for attaining each additional degree. Also note that in
models 1, 2, and 4 the marginal benefit of additional years of schooling in each
formulation is zero unless there is a change in group membership (an additional degree
is earned).
The formulation associated with Model 1 implies that the marginal benefit is linear in
the education variable. The estimates also depend upon how the groups are numbered. For
example, suppose the variable had been defined as
         E1*
HS       1
PhD      2
BS       3
MS       4
This would suggest that the marginal benefit of a Ph.D. over having not gone past high
school is the same as the expected benefit of having an MS degree instead of stopping
at a BS degree.
We need to be very careful about the implications of the adopted specification. Some
representations of the impact of marital status on dependent variables are subject to
the previously mentioned issues. Introducing different binary variables for different
categories allows the greatest flexibility. We may also want to allow for nonlinear
relationships between variables such as wealth, regressing personal income or wealth
on age and (age)$^2$ to take account of a life cycle effect.
2. Models with binary dependent variables or limited dependent variables
a. Introduction
Consider models in which one might want to explain
(1) when there will be a default on a loan (Y = 1) or no default (Y = 0)
(2) whether a tax return has been filed by someone who has misrepresented their
financial position (Y = 1) or accurately reflects the situation (Y = 0)
(3) the market share of a firm (0 ≤ Y ≤ 1)
These are known as limited dependent variable problems. Amemiya (1981) has an excellent
survey paper in the Journal of Economic Literature.
In each case the dependent variable (Y) in the function
Y = f(X; β) + ε
is constrained in value.
Numerous approaches have been adopted for this problem and these include regression
analysis, linear probability models, discriminant analysis, and limited dependent
models.
b. Linear Probability Model (LPM)
Let $y_t = \alpha + \beta X_t + \varepsilon_t$
$y_t$ = 1 if first option chosen
      = 0 otherwise
$x_t$ = vector of values of attributes (independent variable(s))
$\varepsilon_t$ = independently distributed random variable with a zero mean
Implications of the LPM:
• $E(y_t) = X_t \beta$
Now let $P_t = \mathrm{Prob}(y_t = 1)$
$Q_t = 1 - P_t = \mathrm{Prob}(y_t = 0)$
so that
$E(y_t) = 1 \cdot \mathrm{Prob}(y_t = 1) + 0 \cdot \mathrm{Prob}(y_t = 0)$
$= 1 \cdot P_t + 0 \cdot Q_t$
$= P_t$
Thus the regression equation describes the probability that the first choice is made.
The vector β measures the effect of a unit change in the explanatory variables on the
probability of choosing the first alternative. OLS can be used to estimate the LPM;
however, there is some question about the appropriateness of OLS in this model. To
appreciate the reasons for this concern, note the following:
$\varepsilon_t = y_t - X_t \beta$
• Since y can only assume the values of 0 or 1, $\varepsilon_t$ can't be distributed
normally.
Further, $E(\varepsilon_t) = P_t (1 - X_t\beta) + (1 - P_t)(-X_t\beta)$ and if
$E(\varepsilon_t) = 0$ this implies
$P_t = X_t\beta$ and
$(1 - P_t) = 1 - X_t\beta$.
Now to find the variance of the error term $\varepsilon_t$:
• $\mathrm{Var}(\varepsilon_t) = E(\varepsilon_t^2) = (1 - X_t\beta)^2 P_t + (-X_t\beta)^2 (1 - P_t)$
$= (1 - X_t\beta)^2 (X_t\beta) + (X_t\beta)^2 (1 - X_t\beta)$
$= (1 - X_t\beta)(X_t\beta)$
which shows that the variance of the error depends on the independent variables and,
by definition, is heteroskedastic. One possible solution to this problem is to use
weighted least squares.
• Another problem with the LPM is that of prediction:
Note that with the linear probability model there is a chance that predicted values
for $y_t$ may lie outside the interval [0, 1]. One possible solution is to set all
predictions greater than 1 equal to 1 and all predictions less than 0 equal to zero.
However, these observations present a problem in running weighted least squares.
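Both difficulties, predictions outside [0, 1] and the resulting problem for weighted
least squares, show up even in a very small example. The sketch below fits the LPM by
OLS to made-up data; ols_simple is a hypothetical helper written for this illustration.

```python
def ols_simple(x, y):
    """Least squares intercept and slope for a regression of y on a constant and x."""
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical data: the choice switches from 0 to 1 as x increases.
x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 0, 0, 1, 1, 1, 1]

a, b = ols_simple(x, y)
fitted = [a + b * xi for xi in x]  # estimated probabilities

# Estimated error variances p(1 - p); these are the basis for WLS weights.
variances = [p * (1.0 - p) for p in fitted]
out_of_range = [p for p in fitted if p < 0.0 or p > 1.0]

print([round(p, 3) for p in fitted])
print([round(v, 3) for v in variances])
```

The smallest and largest fitted values fall outside [0, 1], so their estimated
variances are negative and the usual weights 1/sqrt(p(1 - p)) cannot be computed for
those observations.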
c. Qualitative Response Models
(1) Introduction
Another possibility for binary or limited dependent variables is to use constrained
estimation. Discriminant analysis is still another approach. Since observed values for
$Y_t$ are constrained to the interval (0, 1), functional forms $F(X_t)$ which are
constrained to the interval (0, 1) can be selected. This quite naturally suggests using
cumulative probability distributions for $F(X_t)$:
$F(X_t) = P_t$
This possibility admits many alternative models:
$P_t = \Pr(Y_t = 1 \mid X_t) = F(X_t\beta; \theta) = \int_{-\infty}^{X_t\beta} f(s; \theta) \, ds$
where f(s; θ) denotes a "well behaved" probability density function with distributional
parameters θ. $F(X_t\beta; \theta)$ is the corresponding cumulative distribution
function evaluated at $X_t\beta$, which is sometimes referred to as the score. Two
models which have been widely used are the standard normal and logistic models:
            f(s; θ)                                 $F(z) = \int_{-\infty}^{z} f(s; \theta) \, ds$
Normal      $\frac{e^{-s^2/2}}{\sqrt{2\pi}}$        $\int_{-\infty}^{z} \frac{e^{-s^2/2}}{\sqrt{2\pi}} \, ds$
Logistic    $\frac{e^{-s}}{(1 + e^{-s})^2}$         $\frac{1}{1 + e^{-z}}$
These two distributions are only two of many which could have been used, but currently
dominate this literature and are respectively known as probit (based on the normal) and
logit (based on the logistic) models.
(2) Estimation
The estimation of limited dependent models depends upon the model or density selected
and the nature of the data:
(a) $Y_t$ = 0 or 1 and (b) 0 < $Y_t$ < 1.
If we have data based on discrete choices, then we have the case
(a) $Y_t$ = 0 or 1.
The likelihood function in this case is given by
$L(\beta, \theta; Y) = \prod_{t=1}^{n} P_t^{Y_t} (1 - P_t)^{1 - Y_t}$
$= \prod_{t=1}^{n} F(x_t\beta; \theta)^{Y_t} \left( 1 - F(x_t\beta; \theta) \right)^{1 - Y_t}$
and the log likelihood function is
$\ell(\beta, \theta; Y) = \sum_{t=1}^{n} \left\{ Y_t \ln F(x_t\beta; \theta) + (1 - Y_t) \ln\left( 1 - F(x_t\beta; \theta) \right) \right\}$.
This expression is maximized over the parameters β and θ to obtain maximum likelihood
estimators. This procedure can be quite involved if the expression for the cumulative
distribution is complicated. Recall that
$F(x_t\beta; \theta) = \Pr(z \le x_t\beta) = \int_{-\infty}^{x_t\beta} f(s; \theta) \, ds$,
where θ denotes unknown distributional parameters. Any pdf could be selected in the
previous framework. The predicted impact of a change in the explanatory variables
depends on the pdf as
$\frac{\partial \Pr(Y_t = 1 \mid X_t)}{\partial X_{it}} = f(X_t\beta) \, \beta_i$.
Thus, the $\beta_i$ coefficients alone do not provide estimates of the marginal impact
of a change in $X_t$ on $\Pr(Y_t = 1 \mid X_t)$.
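The log likelihood and the marginal effect formula can be written in a few lines for
the logit and probit cases. This is a sketch only: the data and coefficient values are
made up, the function names are hypothetical, and no maximization step is shown.

```python
import math

def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_pdf(z):
    p = logistic_cdf(z)
    return p * (1.0 - p)  # equals exp(-z) / (1 + exp(-z))^2

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)

def log_likelihood(beta, xs, ys, cdf):
    """Sum over t of Y_t ln F(x_t beta) + (1 - Y_t) ln(1 - F(x_t beta))."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = cdf(sum(b * v for b, v in zip(beta, x)))
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def marginal_effect(beta, x, i, pdf):
    """d Pr(Y = 1 | x) / d x_i = f(x beta) * beta_i."""
    return pdf(sum(b * v for b, v in zip(beta, x))) * beta[i]

# Hypothetical data: each row is (1, x) so beta[0] is the intercept.
xs = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0), (1.0, 2.0)]
ys = [0, 0, 1, 1]
beta = [-0.5, 1.0]  # assumed coefficient values, not estimates

print(log_likelihood(beta, xs, ys, logistic_cdf))
print(log_likelihood(beta, xs, ys, normal_cdf))
print(marginal_effect(beta, xs[1], 1, logistic_pdf))
print(marginal_effect(beta, xs[1], 1, normal_pdf))
```

The same coefficient vector implies different marginal effects under the two
distributions, and the effect varies with the point x at which it is evaluated.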
Stata commands for estimating limited dependent variable models. As noted earlier, the
two most commonly used pdf's in qualitative response models are the normal and logistic
distributions, with the corresponding qualitative response models being referred to as
the probit and logit models, which can be estimated in most common econometric software
packages. Some useful Stata commands in working with binary variables are given below:
• To create dummy variables in Stata, use the "gen" command as follows:
      gen dummy_var = exp
  where exp is an expression that categorizes the dummy_var as a 0 or 1. For example,
  to take a continuous variable on income and create a dummy variable where a 0
  represents "less than $50,000 annually" and a 1 represents "$50,000 or more
  annually," use the following command:
      gen income_dummy = income >= 50000
• The probit model can be estimated using Stata with the command
      probit Y X1 X2, options
  The maximum likelihood estimates of β_1, β_2, β_3 and log likelihood values will be
  reported. The marginal impact of changes in the explanatory variables on the
  predictions (f(X_t β) β_i) rather than β_i can be obtained by using the command
      dprobit Y X1 X2, options
  A prediction matrix can be printed using the command:
      estat classification, cutoff(#)
  The elements on the main diagonal are the number of correct predictions and the
  off-diagonal elements indicate the number of misses.
                       Observed
                       D        ~D
      Predicted   +    M_11     M_12
                  –    M_21     M_22
  The option cutoff(#), for example
      estat classification, cutoff(.5)
  specifies the value at which an observation has a predicted positive outcome. The
  default cutoff point is 0.5.
• Similar Logit results can be obtained using the command
      logit Y X1 X2, options
• Prediction matrices for the LPM can be obtained as follows
      reg y X's
      predict yhat
      gen predy = yhat > .5
      tabulate y predy
(b) Limited dependent variables models where 0 < $Y_t$ < 1
If we have a discrete choice model with grouped data or a model with the dependent
variable strictly between 0 and 1, alternative estimation techniques are available.
One approach is to use
$\hat p_t = \frac{v_t}{m_t}$
$v_t$ = number choosing the first response in the t-th group
$m_t$ = number in the t-th group
$F^{-1}(\hat P_t) = X_t\beta$ or
$F^{-1}(Y_t) = X_t\beta$
If F is known, then regression techniques can be employed to estimate the vector β.
Recall that the probit model is based upon the normal cumulative distribution function
$F(x_t\beta) = \int_{-\infty}^{x_t\beta} \frac{e^{-s^2/2}}{\sqrt{2\pi}} \, ds$.
The Logit model is based upon the logistic distribution function
$F(x_t\beta + \varepsilon_t) = \frac{1}{1 + e^{-x_t\beta - \varepsilon_t}}$.
The probit model involves rather complicated estimation and there is no compelling
reason that the normal should be used. The Logit has thicker tails, but approximates
the probit model.
The Logit model is particularly well suited for grouped data or other situations in
which
0 < $Y_t = F(X_t\beta)$ < 1.
This can be seen by solving
$F(x_t\beta + \varepsilon_t) = \frac{1}{1 + e^{-x_t\beta - \varepsilon_t}} = Y_t$
for $X_t\beta + \varepsilon_t$, which yields
$Z_t = F^{-1}(Y_t) = \ln\left( \frac{Y_t}{1 - Y_t} \right) = x_t\beta + \varepsilon_t$.
Regression techniques can be directly used to obtain estimators of β where the
dependent variable ($Z_t = \ln(Y_t / (1 - Y_t))$) is regressed on the $X_t$'s. Note
that $Y_t$ ≠ 0 or 1 in this representation.
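The log-odds regression for grouped data can be sketched as follows. The group counts
and the single regressor are made up, and ols_simple and log_odds are hypothetical
helper names; the example uses unweighted least squares for simplicity.

```python
import math

def log_odds(p):
    """Inverse of the logistic cdf: Z = ln(p / (1 - p)), defined only for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def ols_simple(x, y):
    """Least squares intercept and slope for a regression of y on a constant and x."""
    n = len(y)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical grouped data: v_t of m_t individuals in group t chose the first response.
v = [5, 12, 20, 31, 40]
m = [50, 50, 50, 50, 50]
x = [1.0, 2.0, 3.0, 4.0, 5.0]

p_hat = [vi / mi for vi, mi in zip(v, m)]  # group proportions, strictly inside (0, 1)
z = [log_odds(p) for p in p_hat]           # transformed dependent variable
a, b = ols_simple(x, z)                    # regress Z_t on a constant and x_t

print([round(zi, 3) for zi in z])
print(round(a, 3), round(b, 3))
```

A group in which nobody or everybody chooses the first response would give a proportion
of 0 or 1, for which the log-odds transformation is undefined.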
3. PROBLEM SET 4.2
Dummy/Binary variables
Problems 1, 2, 3, 4, and 5 deal with binary independent variables, including the use of
interaction terms. Problems 5 and 6 focus on modeling binary dependent variables.
Theory
1. Suppose you collect data from a survey on wages, education, experience, and gender.
In addition you ask for information about marijuana usage. The original question is:
"On how many occasions last month did you smoke marijuana?"
a) Write an equation that would allow you to estimate the effects of marijuana usage on
wage, while controlling for other factors. You should be able to make statements such
as, "Smoking marijuana five more times per month is estimated to change wage by x%."
b) Write a model that would allow you to test whether drug usage has different effects
on wages for men and women, while controlling for other variables. How would you test
that there are no differences in the effects of drug usage for men and women? You may
want to model the impact of interactions.
c) Suppose you think it is better to measure marijuana usage by putting people into one
of four categories: nonuser, light user (1-5 times per month), moderate user (6-10
times per month), and heavy user (more than 10 times per month). Now write a model that
allows you to estimate the effects of marijuana usage on wage, while controlling for
other variables and avoiding the dummy variable trap.
d) Using the model in part (c), explain in detail how to test the null hypothesis that
marijuana usage has no effect on wage. Be very specific and include a careful listing
of degrees of freedom.
e) What are some potential problems with drawing causal inference using the survey data
you collected?
(Wooldridge 7.8)
Applied
2. The file TRAFFIC2.RAW contains data on traffic accidents in California from 1981 to 1989, with each month being a separate observation. You suspect that California traffic accidents (listed in the data file as variable totacc) may be correlated with the month of the year.
a) Run a regression that shows the effect of the month on the number of traffic accidents. Does it appear that seasonal adjustment is appropriate when monitoring the number of California traffic accidents? Justify.
b) You may have noticed that the data did not include the variable jan, so that the number of dummy variables would be one less than the number of classifications. Insert a variable jan, and set jan = 1 for January observations (i.e., when all other month variables equal zero). What estimation problems are there with having the same number of dummy variables as classifications? Estimate this regression and compare your results with the results of part (a).
(RST)
3. Consider the following data on the length of employment and associated salary level.
    Employee   Salary   Years Employed
        1       425          1
        2       480          3
        3       905         20
        4       520          5
        5       505          4
        6       540         15
        7       380          6
        8       440          2
        9       420          1
       10       405          4
       11       650         10
The salary figures are reviewed by employee numbers 1 and 7, and they note that employee numbers 1, 2, 7, 9, and 10 are members of a minority group and they claim that there is evidence of discrimination in the salary structure. Analyze this assertion.
(JM IIIB4)
4. Consider the following models:
a.  $\text{Consump} = \alpha_1 + \alpha_2\,\text{Income} + \alpha_3\,\text{Wealth} + \alpha_4\,(\text{Income})(\text{Wealth}) + \varepsilon$
where Consump denotes consumption expenditures in dollars and Income and Wealth are measured in dollars.
(1) Evaluate the marginal propensity to consume ($\partial\,\text{Consump} / \partial\,\text{Income}$).
(2) What is the interpretation of $\alpha_4$?
b.  $\text{Wage} = \beta_1 + \beta_2\,\text{Female} + \beta_3\,\text{Race} + \beta_4\,(\text{Female})(\text{Race}) + \beta_5\,\text{Education} + \beta_6\,\text{Experience} + \varepsilon$
where Wage represents the hourly wage in dollars, Education measures years of education beyond high school, Experience is job experience measured in years, and Female and Race are binary variables with Female = 1 for female employees and Race = 1 for non-white and non-Hispanic employees.
(1) What is the interpretation of each of the following parameters?
    $\beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \beta_6$
(2) What joint hypothesis could be tested to check for gender or racial discrimination?
(3) How could the model be modified to allow the possibility of different annual increases in the hourly wage rate for females?
5. Consider the following hypothetical data (adapted from Gujarati, p. 473). Y is a binary variable (Y = 1 if the family owns a home, 0 otherwise) and X is family income in thousands of dollars.
Family Y X Family Y X
1 0 8 21 1 22
2 1 16 22 1 16
3 1 18 23 0 12
4 0 11 24 0 11
5 0 12 25 1 16
6 1 19 26 0 11
7 1 20 27 1 20
8 0 13 28 1 18
9 0 9 29 0 11
10 0 10 30 0 10
11 1 17 31 1 17
12 1 18 32 0 13
13 0 14 33 1 21
14 1 20 34 1 20
15 0 6 35 0 11
16 1 19 36 0 8
17 1 16 37 0 17
18 0 10 38 1 16
19 0 8 39 0 7
20 1 18 40 1 17
a. Fit a linear probability model (LPM)
    $Y = \beta_1 + \beta_2 X + \varepsilon$
to the data and investigate the predictive ability of the estimated model.
b. Fit probit and logit models to this same data set and compare the prediction results. Include the prediction matrices.
For probit or logit models of the form
    $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k$
Stata uses the commands:
    probit y x1 x2 . . . xk
    logit y x1 x2 . . . xk
In order to print the prediction matrix using a .5 threshold use the command
c. Compare the forecasting ability of the three models (LPM, probit, and logit) corresponding to a cutoff value of .3. Use the command
    estat class, cutoff(.3)
d. Compare the marginal impact of a change in income on the likelihood of homeownership using the three models.
6. Let grad be a dummy variable for whether a student-athlete at a large university graduates in five years. Let hsGPA and SAT be high school grade point average and SAT score, respectively. Let study be the number of hours spent per week in an organized study hall. Suppose that, using data on 420 student-athletes, the following logit model is obtained:
    $\hat{P}(\text{grad} = 1 \mid \text{hsGPA}, \text{SAT}, \text{study}) = \Lambda(-1.17 + .24\,\text{hsGPA} + .00058\,\text{SAT} + .073\,\text{study})$
where $\Lambda(z) = \exp(z)/(1 + \exp(z)) = F(X_t\beta)$ is the cdf for the logit model. Holding hsGPA fixed at 3.0 and SAT fixed at 1,200, compute the estimated difference in the graduation probability for someone who spent 10 hours per week in study hall and someone who spent 5 hours per week.
(Wooldridge, 4th edition, problem 17.2)

James B. McDonald
Brigham Young University
7/14/2009
IV. Miscellaneous Topics
C. Lagged Variables
Individuals frequently respond to a change in independent variables with a time lag. Consequently, economic models describing individual behavior, as well as models which attempt to represent the relationships between aggregated variables, will often include lagged independent variables or lagged dependent variables. We first consider models which include lagged independent variables (distributed lag models) and then investigate models containing lagged dependent variables (autoregressive models). Distributed lag and autoregressive models provide an attempt to model dynamic behavior.
1. Lagged Independent Variables - Distributed Lag Models
a. Distributed lag models are of the form:
    $y_t = \delta + \beta_0 x_t + \beta_1 x_{t-1} + \dots + \beta_s x_{t-s} + u_t$
where $\partial y_t/\partial x_t = \beta_0$ denotes the immediate impact of a change in x on y, and $\partial y_t/\partial x_{t-i} = \beta_i$ denotes the impact of a change in x on y after i periods. Thus, the $\beta_i$'s indicate the distributional (over time) impact of x on y.
(1) Distributed lag models can be estimated using least squares if n (sample size) > number of coefficient parameters (s + 2 = # lags + 2 (for $\delta$ and $\beta_0$)), and least squares yields BLUE if $u_t \sim NID(0, \sigma^2)$.
(2) Several possible problems can arise in distributed lag models: (a) how many lags should be used (s = ?), (b) the degrees of freedom (n - k) = n - 2s - 2 may be small for large lags (s), because s observations are lost in forming the lags and s + 2 coefficients are estimated, and (c) a serious multicollinearity problem can arise if the x's are strongly intercorrelated, with the corresponding $\hat{\beta}_i$ being very erratic.
b. Alternative Estimation Procedures: An alternative estimation procedure which has been proposed to "circumvent" the impact of possible multicollinearity is to impose some "reasonable" pattern on the $\beta_i$'s in the estimation procedure. Ideally, the validity of these hypothesized constraints would be tested. Two of the most commonly encountered patterns for the $\beta_i$'s are the Koyck scheme and Almon polynomial weights. The Koyck model assumes that the $\beta_i$'s decline geometrically, and the Almon formulation assumes that the patterns in the $\beta_i$'s can be modeled by a polynomial in "i". We will first discuss the Koyck model, then the Almon procedure, and then consider an application of these procedures to estimating the relationship between sales and advertising expenditure.
(1) Koyck Scheme
Model:  $y_t = \delta + \beta_0 x_t + \beta_1 x_{t-1} + \dots + u_t$
Koyck suggested that the $\beta_i$ be approximated by
    $\beta_i = \beta_0 \lambda^i$.
The Koyck weights ($\beta_i$) decline geometrically for $0 < \lambda < 1$.
We now derive an equation which can be used in estimating the Koyck formulation of distributed lag coefficients with geometrically declining weights. This derivation is done in two ways: (1) using a linear operator and (2) using algebraic manipulations. Let $L x_t = x_{t-1}$, $L^2 x_t = x_{t-2}$, etc.
(1) Substituting the Koyck expression for $\beta_i$ into the distributed lag model yields
    $y_t = \delta + \beta_0 \left( \sum_{i=0}^{\infty} \lambda^i L^i \right) x_t + u_t$
or
    $y_t = \delta + \left( \frac{\beta_0}{1 - \lambda L} \right) x_t + u_t$.
Multiplying both sides of this equation by $(1 - \lambda L)$ yields
    $y_t - \lambda y_{t-1} = (1 - \lambda L) y_t = (1 - \lambda)\delta + \beta_0 x_t + u_t - \lambda u_{t-1}$
or
    $y_t = \delta(1 - \lambda) + \beta_0 x_t + \lambda y_{t-1} + u_t - \lambda u_{t-1}$.
Note that this equation can be estimated by regressing $y_t$ on $x_t$ and $y_{t-1}$.
(2) Another way to derive the estimating equation for the Koyck distributed lag model without the lag operator (L) is as follows:
Substitute $\beta_j = \beta_0 \lambda^j$ into the equation for the distributed lag model to obtain
    $y_t = \delta + \beta_0 x_t + \beta_0 \lambda x_{t-1} + \beta_0 \lambda^2 x_{t-2} + \dots + u_t$.
Now replace t by "t - 1" in this equation and multiply by $\lambda$:
    $\lambda y_{t-1} = \delta\lambda + \beta_0 \lambda x_{t-1} + \beta_0 \lambda^2 x_{t-2} + \dots + \lambda u_{t-1}$.
Subtract these two equations to obtain
    $y_t - \lambda y_{t-1} = \delta(1 - \lambda) + \beta_0 x_t + u_t - \lambda u_{t-1}$
or
    $y_t = \delta(1 - \lambda) + \beta_0 x_t + \lambda y_{t-1} + v_t$
where $v_t = u_t - \lambda u_{t-1}$, and this estimating equation is the same as obtained in (1).
Note: (a) The assumption of a Koyck weighting scheme reduces the number of parameters to be estimated to 3 ($\delta$, $\lambda$, $\beta_0$).
(b) If the $u_t$'s in the original model are independently distributed, then the last representation of the model is characterized by autocorrelation (through $v_t = u_t - \lambda u_{t-1}$) and contains a lagged dependent variable, which poses special estimation problems and will be considered later.
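The Koyck transformation can be checked numerically. The sketch below is illustrative only: the parameter values and the series x are made up, the disturbance is set to zero, and x is taken to be zero before the sample starts so that the infinite lag sum is exact.

```python
def koyck_series(x, delta, beta0, lam):
    """y_t = delta + beta0 * sum_i lam**i * x_{t-i}, with x = 0 before the sample and u_t = 0."""
    y = []
    for t in range(len(x)):
        lagged_sum = sum(lam ** i * x[t - i] for i in range(t + 1))
        y.append(delta + beta0 * lagged_sum)
    return y

# Illustrative (assumed) parameter values and data.
delta, beta0, lam = 10.0, 0.4, 0.6
x = [3.0, 5.0, 2.0, 8.0, 6.0, 4.0, 7.0]
y = koyck_series(x, delta, beta0, lam)

# The transformed equation: y_t - lam*y_{t-1} = delta*(1 - lam) + beta0*x_t for t >= 1.
gaps = [
    (y[t] - lam * y[t - 1]) - (delta * (1 - lam) + beta0 * x[t])
    for t in range(1, len(x))
]
max_gap = max(abs(g) for g in gaps)
print(max_gap)  # differs from zero only by floating-point rounding
```

With a nonzero disturbance the same subtraction leaves the moving-average error $u_t - \lambda u_{t-1}$ on the right-hand side, which is the source of the autocorrelation mentioned in note (b).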
(2) Almon Polynomial Distributed Lags
The Almon polynomial distributed lag formulation is one of the most widely used in practice. We begin with a model with a finite number of lags:
Model:  $y_t = \delta + \beta_0 x_t + \beta_1 x_{t-1} + \dots + \beta_s x_{t-s} + u_t$.
The Almon weighting scheme is defined by:
    $\beta_j = f(j) = a_0 + a_1 j + \dots + a_p j^p$,   j = 0, 1, 2, . . . , s
    s = # of lags = # of $\beta$'s - 1
    p = degree of polynomial.
Polynomials are extremely flexible and can be used to approximate any continuous function as accurately as desired by selecting p to be large enough.
The corresponding estimating equation can be obtained by substituting f(j) for $\beta_j$ into the distributed lag model, collecting terms involving the $a_i$'s, and then estimating the $a_i$'s using least squares. Given estimates for the $a_i$'s, corresponding estimates of the $\beta_j$'s can be obtained from the estimated f(j). By using such a specification we are estimating (p + 2) parameters ($\delta, a_0, \dots, a_p$) rather than (s + 2) parameters ($\delta, \beta_0, \dots, \beta_s$). If p (the degree of the polynomial defining the weights) is smaller than s (the maximum lag), then the Almon weighting scheme results in fewer parameters needing to be estimated. In general p is usually selected to be rather small (2, 3, 4).
To perform this estimation procedure in Stata, generate the polynomial variables (the "$z_i$'s"), run the regression of the dependent variable on the polynomial variables, and then recover the $\beta_j$'s from the estimation. For example, the following code will estimate the previous model with three lags (s=3) using a second order (p=2) polynomial to describe the patterns of the $\beta_i$'s:
*generate the polynomial variables
gen z0 = X + X[_n-1] + X[_n-2] + X[_n-3]
gen z1 = X[_n-1] + X[_n-2]*2 + X[_n-3]*3
gen z2 = X[_n-1] + X[_n-2]*4 + X[_n-3]*9
*regress the Y variable on the polynomial variables
reg Y z0 z1 z2
estat ic
*recover the betas
scalar b0 = _b[z0]
scalar b1 = _b[z0] + _b[z1] + _b[z2]
scalar b2 = _b[z0] + _b[z1]*2 + _b[z2]*4
scalar b3 = _b[z0] + _b[z1]*3 + _b[z2]*9
*display the betas
display b0, b1, b2, b3
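The reason regressing on the polynomial variables works is that the lag sum $\sum_j \beta_j x_{t-j}$ equals $\sum_k a_k z_{kt}$ with $z_{kt} = \sum_j j^k x_{t-j}$ whenever $\beta_j = a_0 + a_1 j + \dots + a_p j^p$. The following sketch checks that identity for s = 3 and p = 2; the coefficient values, the series x, and the function names are assumptions made for illustration.

```python
def almon_z(x, s, p):
    """z[k][i] = sum_{j=0..s} j**k * x[t-j] for t = s, s+1, ... and k = 0..p."""
    return [
        [sum((j ** k) * x[t - j] for j in range(s + 1)) for t in range(s, len(x))]
        for k in range(p + 1)
    ]

def almon_betas(a, s):
    """beta_j = a_0 + a_1*j + ... + a_p*j**p for j = 0..s."""
    return [sum(a_k * j ** k for k, a_k in enumerate(a)) for j in range(s + 1)]

s, p = 3, 2
a = [0.5, -0.2, 0.02]                       # assumed polynomial coefficients
x = [4.0, 7.0, 1.0, 9.0, 3.0, 6.0, 2.0, 8.0]  # assumed data

betas = almon_betas(a, s)
z = almon_z(x, s, p)

# Lag sum computed directly from the betas ...
direct = [sum(betas[j] * x[t - j] for j in range(s + 1)) for t in range(s, len(x))]
# ... and from the transformed variables z0, z1, z2.
via_z = [sum(a[k] * z[k][i] for k in range(p + 1)) for i in range(len(direct))]

max_gap = max(abs(d - v) for d, v in zip(direct, via_z))
print(betas, max_gap)
```

Here `z[0]`, `z[1]`, and `z[2]` play the role of z0, z1, and z2 in the Stata code, with the first s observations dropped because their lags are unavailable.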
The mathematical details behind these transformations are illustrated in the first section of the appendix. This estimation procedure is automated by such programs as SAS and SHAZAM. For example, the SHAZAM command to estimate the previous model with three lags (s=3) using a second order (p=2) polynomial to describe the patterns of the $\beta_i$'s is given by:
OLS Y X(0.3,2)
This command will not only estimate the $a_i$'s, but will also generate the $\hat{\beta}_i$'s. However, many calculations are going on in the background. The related details and distributional details are summarized in the appendix "A Few Details for the Almon Distributed Lag."
Examples:
The Almon estimators have a smaller variance than the least squares estimator, whether the assumption of a polynomial lag is valid or not. If the assumption is incorrect, the Almon estimator is biased and inconsistent [cf. Schmidt & Sickles, IER (October 1975); Schmidt & Waud, JASA (March 1973)].
TESTING the Almon scheme
    $H_0: \beta_j = f(j) = a_0 + a_1 j + \dots + a_p j^p$,   j = 0, 1, 2, . . . , s
can be performed using LR or Chow tests to compare the Almon and OLS results.
c. A Review and Application of Distributed Lag Models to Estimating the Relationship Between Sales and Advertising
In many situations the economic agents whose behavior is being modeled don't react immediately or completely to changes in the economic environment. Instead, the adjustment may be gradual and take place over several periods of time. The delay may be due to habit persistence, the cost of frequent changes, the delay in gathering data, or other technological, institutional or behavioral factors. Well-known examples would include the response of such macroeconomic variables as GDP or prices to unexpected changes in the money supply, government spending or the tax system. Advertising has also been shown to have an impact on sales which generally lasts for more than one period of time.
Distributed lag models provide a convenient descriptive model of situations in which changes in an independent variable may have an impact which lasts for several time periods.
A simple example of such a model is given by
    $S_t = \delta + \beta_0 A_t + \beta_1 A_{t-1} + \beta_2 A_{t-2} + \dots + \beta_k A_{t-k} + \varepsilon_t$
where $S_t$ and $A_t$ represent sales and advertising expenditure during the t-th time period. In this model "$\delta$" represents the level of sales which would take place without any advertising. The impact of advertising can be readily determined. An increase in advertising of one unit would be expected to increase sales by $\beta_0$ during the same period. Sales in the next period would increase by $\beta_1$ units. Similarly, the impact on sales after k time periods is given by $\beta_k$.
The "distributed lag" effect of advertising on sales might be visually represented as follows:
[Figure 2. Distributed lag coefficients; vertical axis: $\beta_i$]
This figure corresponds to the case in which increased advertising has an immediate impact on sales, the impact increases for two periods, then declines, and then there is no impact after four periods. An alternative scenario might be where advertising has the greatest impact on sales in the same time period, followed by a gradually declining impact. This could be represented as in Figure 3.
[Figure 3. Declining distributed lag coefficients; vertical axis: $\beta_i$]
Distributed lag models are extremely flexible in terms of admissible behavior. However, this flexibility can lead to estimation problems. In principle, least squares estimates of the coefficients are the minimum variance estimators of all unbiased estimators of the coefficients in distributed lag models under the standard assumptions associated with the model.
In practice, several difficulties are encountered. In order to illustrate these problems, assume that monthly observations on sales and advertising for three years are available. In order to estimate the distributed impact of advertising on sales, we might consider estimating the model:
    $S_t = \delta + \beta_0 A_t + \beta_1 A_{t-1} + \dots + \beta_{12} A_{t-12} + \varepsilon_t$.
This specification contains 14 unknown parameters (coefficients) and requires observations on each of the variables, i.e., $S_t, A_t, A_{t-1}, \dots, A_{t-12}$. These data are reported in the Table in the Appendix labeled "Sales and Advertising Data." In order to have an observation for each variable including $A_{t-12}$, the first twelve observational values on sales must be deleted, with the first usable time period corresponding to t = 13. Hence, the usable sample size is reduced from 36 to 24 by the inclusion of the 12 lagged variables for advertising. The degrees of freedom associated with this model are 10 (usable sample size - number of coefficients to be estimated). In fact, if 17 lags had been included, the usable sample size would be equal to the number of coefficients to be estimated and the degrees of freedom would be zero.
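The degrees-of-freedom arithmetic in this example follows the n - 2s - 2 rule stated earlier. A small sketch (the function name is hypothetical) reproduces both cases discussed above:

```python
def distributed_lag_df(n_obs, n_lags):
    """Usable sample, number of coefficients, and degrees of freedom
    for y_t = delta + beta_0*x_t + ... + beta_s*x_{t-s} + u_t."""
    usable = n_obs - n_lags        # the first s observations are lost to lagging
    n_params = n_lags + 2          # delta plus beta_0, ..., beta_s
    return usable, n_params, usable - n_params

print(distributed_lag_df(36, 12))  # three years of monthly data, 12 lags
print(distributed_lag_df(36, 17))  # 17 lags exhaust the degrees of freedom
```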
Another problem arises when the explanatory variable is associated with a trend over time. If the trend is approximately linear, then multicollinearity between the current and lagged values of the explanatory variables may make it difficult to accurately estimate individual parameter coefficients. The pairwise correlations of lagged advertising are given in the following table:
Table 2
Pairwise Correlations of Lagged Advertising
            A      A(-1)   A(-2)   A(-3)   . . .   A(-12)
A           1      .874    .866    .859    . . .   .892
A(-1)              1       .874    .855    . . .   .896
A(-2)                      1       .863    . . .   .839
A(-3)                              1       . . .
  .                                          .
  .                                          .
A(-12)                                             1
Each of these situations (low degrees of freedom and multicollinearity) can result in unreliable estimates of the distributed lag coefficients ($\beta_i$).
OLS estimation (demonstration using Stata):
As a case in point, if we regress sales on advertising expenditure for the current and previous twelve months using the commands:
. tsset t
. reg S A A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12
or
. reg S A A1-A12
. estat ic        (reports the corresponding log likelihood value)
where each of the AJ has been generated by adding an "L." in front of the variable
. gen A1 = l.A
. gen A2 = l.A1
…
. gen A12 = l.A11
We then obtain
      Source |       SS       df       MS              Number of obs =      24
-------------+------------------------------           F( 13,    10) =    3.51
       Model |  8029.73337    13  617.671797           Prob > F      =  0.0268
    Residual |  1760.76663    10  176.076663           R-squared     =  0.8202
-------------+------------------------------           Adj R-squared =  0.5864
       Total |      9790.5    23  425.673913           Root MSE      =  13.269

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           A |   .4270829   .2063794     2.07   0.065    -.0327592    .8869249
          A1 |   .0015484   .2161103     0.01   0.994    -.4799754    .4830721
          A2 |   .1026181   .1849852     0.55   0.591    -.3095545    .5147907
          A3 |   .1387561   .1593701     0.87   0.404    -.2163427    .4938549
          A4 |  -.0324424   .1771302    -0.18   0.858     -.427113    .3622282
          A5 |  -.0431555   .1744989    -0.25   0.810    -.4319632    .3456522
          A6 |   .2148685   .1721424     1.25   0.240    -.1686887    .5984256
          A7 |    .114542   .1544704     0.74   0.475    -.2296396    .4587236
          A8 |  -.1045846   .1490156    -0.70   0.499     -.436612    .2274427
          A9 |  -.2443856   .1460974    -1.67   0.125    -.5699108    .0811397
         A10 |  -.1016249    .173713    -0.59   0.572    -.4886817    .2854318
         A11 |  -.0571411   .2020959    -0.28   0.783    -.5074388    .3931567
         A12 |   .0085637     .20028     0.04   0.967    -.4376881    .4548154
       _cons |   478.7293   18.94364    25.27   0.000     436.5202    520.9383
------------------------------------------------------------------------------
Log-likelihood value = -85.6
Note: lags can also be created in Stata using the command:
. gen A1 = A[_n-1]
The following figure shows the corresponding OLS estimates of the $\beta_i$.
[Figure 4. Distributed Lag Coefficients (No Constraints); vertical axis: $\beta_i$]
The estimator volatility, large standard errors and small t-statistics for the estimated OLS $\beta$'s suggest a multicollinearity problem. Neither the pattern nor the signs of the $\beta_i$'s are consistent with a reasonable explanation of the impact of advertising on sales.
The most common approach for dealing with these problems is to assume that the $\beta_i$'s follow a "reasonable" pattern which is described by fewer parameters. The associated model is estimated and used in analyzing the impact of the variable in question. Clearly, the advantages of this approach are conditional upon the accuracy of the assumptions made about the $\beta_i$'s, and these assumptions should be tested. The Koyck distributed lag and polynomial distributed lag models will be applied.
KOYCK DISTRIBUTED LAGS:
If the model builder is willing to assume that the impact of the independent variable (advertising) on the dependent variable (sales) declines geometrically over time, the Koyck model can provide a reasonable possibility. In this model the coefficients are assumed to be of the form
    $\beta_i = \lambda^i \beta_0$,   i = 1, 2, . . .
This can be visually represented (for two different values of $\lambda$) as
[Figure: Koyck weights $\beta_i$ plotted against i for $\lambda = 0.6$ and $\lambda = 0.9$]
The Koyck assumption implies that
    $\partial S_t / \partial A_{t-i} = \beta_i = \lambda^i \beta_0$,   i = 1, 2, . . . ,
i.e., a change of one unit of advertising will have an immediate impact ($\beta_0$) on sales and will continue to affect sales thereafter, but at an exponentially declining rate. In other words, sales will be influenced by not only current advertising, but all past values of advertising.
Rewriting the distributed lag model and substituting for the Koyck coefficients yields
    $S_t = a + \beta_0 A_t + \beta_1 A_{t-1} + \beta_2 A_{t-2} + \dots + \varepsilon_t$
    $\;\;\;\; = a + \beta_0 A_t + \lambda\beta_0 A_{t-1} + \lambda^2\beta_0 A_{t-2} + \dots + \varepsilon_t$.
Notice that by assuming that the coefficients follow a Koyck model, only three coefficients (a, $\beta_0$ and $\lambda$) need be estimated. This representation can be written in a form which facilitates estimation by replacing t by t - 1, and multiplying by $\lambda$ to yield:
    (ORIGINAL)   $S_t = a + \beta_0 A_t + \lambda\beta_0 A_{t-1} + \lambda^2\beta_0 A_{t-2} + \dots + \varepsilon_t$
    (MODIFIED)   $\lambda S_{t-1} = a\lambda + \lambda\beta_0 A_{t-1} + \lambda^2\beta_0 A_{t-2} + \dots + \lambda\varepsilon_{t-1}$.
Subtracting the "modified representation" from the "original representation" yields
    $S_t - \lambda S_{t-1} = a - a\lambda + \beta_0 A_t + \varepsilon_t - \lambda\varepsilon_{t-1}$
or equivalently,
    $S_t = a(1 - \lambda) + \beta_0 A_t + \lambda S_{t-1} + \varepsilon_t - \lambda\varepsilon_{t-1}$.
This is the form we have previously discussed, which can be estimated using least squares with the Stata commands
tsset t
gen S1 = S[_n-1]
reg S A S1
with the following Stata output:
      Source |       SS       df       MS              Number of obs =      35
-------------+------------------------------           F(  2,    32) =   77.63
       Model |  21128.4531     2  10564.2265           Prob > F      =  0.0000
    Residual |  4354.68977    32  136.084055           R-squared     =  0.8291
-------------+------------------------------           Adj R-squared =  0.8184
       Total |  25483.1429    34  749.504202           Root MSE      =  11.666

------------------------------------------------------------------------------
           S |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           A |   .3732443   .0621284     6.01   0.000     .2466929    .4997957
          S1 |   .1628443    .128893     1.26   0.216    -.0997022    .4253907
       _cons |   407.1455   63.71989     6.39   0.000     277.3523    536.9387
------------------------------------------------------------------------------
The estimated intercept in this model corresponds to $\hat{a}(1 - \hat{\lambda})$; hence,
    $\hat{a} = 407.145/(1 - .1628) = 486.32$.
The distributed lag coefficients can be easily recovered from the equation
    $\hat{\beta}_i = \hat{\beta}_0 \hat{\lambda}^i = (.3732)(.1628)^i$;
therefore, the immediate impact of a one dollar increase in advertising is estimated to be $\hat{\beta}_0 = .3732$, with subsequent increases in sales estimated to be (.0608, .0099, .0016, .0003, 0) for the first through the fifth periods. The long run impact of a one dollar increase in advertising is obtained from the following:
    immediate:            $\hat{\beta}_0$
    + lag one period:     $\hat{\beta}_0 \hat{\lambda}$
    + lag two periods:    $\hat{\beta}_0 \hat{\lambda}^2$
      . . . (continue)
    Total Long Run Impact:  $\hat{\beta}_0 / (1 - \hat{\lambda}) = .446$
Several comments need to be made. First, it is very important to test for autocorrelation. The least squares estimators will be biased and inconsistent if the model contains lagged dependent variables and autocorrelated random disturbances. Estimation techniques have been developed which yield consistent estimators in this case, but will not be discussed here. Least squares applied to an equation with a lagged dependent variable and uncorrelated errors will yield biased, but consistent estimators. Secondly, if it is felt that the assumption that the impact of the independent variable begins declining immediately is too restrictive, this can be relaxed. The Koyck procedure can be modified to correspond to declining weights after an arbitrary transition period.
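The recovery of $\hat{a}$, the lag weights, and the long run impact from the Koyck regression is simple arithmetic. The sketch below uses the unrounded coefficients from the Stata output above, so the last digits can differ slightly from the hand calculations based on rounded values; the function name is hypothetical.

```python
def koyck_summary(beta0_hat, lam_hat, intercept_hat, n_lags=5):
    """Recover a, the lag weights beta_0*lam**i, and the long run impact
    from the estimated equation S_t = a(1 - lam) + beta_0*A_t + lam*S_{t-1}."""
    a_hat = intercept_hat / (1.0 - lam_hat)
    weights = [beta0_hat * lam_hat ** i for i in range(n_lags + 1)]
    long_run = beta0_hat / (1.0 - lam_hat)
    return a_hat, weights, long_run

# Coefficients on A, S1, and _cons from the regression output.
a_hat, weights, long_run = koyck_summary(0.3732443, 0.1628443, 407.1455)

print(round(a_hat, 2))
print([round(w, 4) for w in weights])
print(round(long_run, 3))
```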
POLYNOMIAL DISTRIBUTED LAGS:
As indicated earlier, polynomial distributed lag models provide one of the most common approaches to distributed lag models. The basic idea is to approximate the desired form for the $\beta_i$'s with a polynomial which is described by fewer parameters than the original $\beta_i$'s in the model. In practice, p is rarely chosen to be larger than two or three, i.e., the $\beta_i$'s follow a quadratic or cubic form. As an example, if p = 2, the $\beta_i$'s are completely described by three parameters ($a_0, a_1, a_2$) in the equation:
    $\beta_i = a_0 + a_1 i + a_2 i^2$.
Consequently, the model
    $S_t = a + \beta_0 A_t + \beta_1 A_{t-1} + \beta_2 A_{t-2} + \dots + \beta_s A_{t-s} + \varepsilon_t$
only involves the parameters ($a, a_0, a_1, a_2$) regardless of the number of lags (s) included in the equation. Once $a_0, a_1, a_2$ are estimated, the corresponding estimates of $\beta_i$ can be obtained from
    $\beta_i = a_0 + a_1 i + a_2 i^2$,
i.e.,
    $\beta_0 = a_0$
    $\beta_1 = a_0 + a_1 + a_2$
    $\beta_2 = a_0 + 2a_1 + 4a_2$, etc.
Also note that specifying the $\beta_i$'s to be quadratic allows considerable flexibility.
[Figure 5. Quadratic Distributed Lags; vertical axes: $\beta_i$]
Stata Example
As an example of estimating polynomial distributed lag coefficients, we estimate the distributed lag impact of advertising on sales using polynomial distributed lags with the following Stata commands (where s = 12 and p = 2):
*z0 is the sum of current and lagged advertising (lags 0 through 12)
gen z0 = A
forvalues j = 1/12 {
    replace z0 = z0 + A[_n-`j']
}
*index `i' should range up to the order of the polynomial (p)
forvalues i = 1/2 {
    gen z`i' = 0
    forvalues j = 1/12 {
        replace z`i' = z`i' + A[_n-`j']*`j'^`i'
    }
}
*regress S on the p+1 transformed variables
reg S z0 z1 z2
*Recover the betas from the coefficients of the zi's
*(beta0 will be the same as a0, the coefficient of z0)
scalar b0 = _b[z0]
display "beta0 = " b0
forvalues i = 1/12 {
    scalar b`i' = _b[z0] + _b[z1]*`i' + _b[z2]*`i'^2
    display "beta`i' = " b`i'
}
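. reg s z0 z1 z2
      Source |       SS       df       MS              Number of obs =      24
-------------+------------------------------           F(  3,    20) =   14.37
       Model |     6688.46     3     2229.49           Prob > F      =  0.0000
    Residual |     3102.04    20      155.10           R-squared     =  0.6832
-------------+------------------------------           Adj R-squared =  0.6356
       Total |      9790.5    23      425.67           Root MSE      =  12.454

--------------------------------------------------------
           s |      Coef.   Std. Err.      t    P>|t|
-------------+------------------------------------------
          z0 |   .2366588   .1137905     2.08   0.051
          z1 |  -.0611558   .0432326    -1.41   0.173
          z2 |   .0032403   .0032659     0.99   0.333
       _cons |     484.40      15.95    30.36   0.000
--------------------------------------------------------
. estat ic
-----------------------------------------------------------------------------
       Model |    Obs    ll(null)   ll(model)     df          AIC         BIC
-------------+---------------------------------------------------------------
           . |     24   -106.1879   -92.39565      4     192.7913    197.5035
-----------------------------------------------------------------------------
The polynomial distributed lag coefficients can then be obtained from the equation
    $\beta_i = a_0 + a_1 i + a_2 i^2 = .2366 - .0612\,i + .0032\,i^2$.
The resulting coefficients are given below:
     i     $\beta_i$
     0      .237
     1      .179
     2      .127
     3      .082
     4      .044
     5      .012
     6     -.013
     7     -.033
     8     -.045
     9     -.051
    10     -.051
    11     -.044
    12     -.031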
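The lag coefficients in this table follow directly from the three estimated polynomial coefficients. A short sketch (the function name is hypothetical; the coefficient values are those on z0, z1, and z2 in the output above, with z1 negative):

```python
def pdl_betas(a, n_lags):
    """beta_i = a_0 + a_1*i + a_2*i**2 + ... for i = 0..n_lags."""
    return [sum(a_k * i ** k for k, a_k in enumerate(a)) for i in range(n_lags + 1)]

a_hat = [0.2366588, -0.0611558, 0.0032403]   # coefficients on z0, z1, z2
betas = pdl_betas(a_hat, 12)

for i, b in enumerate(betas):
    print(i, round(b, 3))

# First lag at which the fitted quadratic turns negative.
first_negative = next(i for i, b in enumerate(betas) if b < 0)
print(first_negative)
```

Because the fitted quadratic crosses zero between lags 5 and 6, the coefficients for the later lags are small and negative rather than positive.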
The $\beta_i$'s (polynomial distributed lag model) can be illustrated as in Fig. 6.
[Figure 6. Polynomial Distributed Lag Coefficients; vertical axis: $\beta_i$, horizontal axis: i]
The results from these three techniques (OLS, Koyck, PDL) are summarized in Figure 7.
[Figure 7. Alternative Estimates of Distributed Lag Effects; series: OLS distributed lag, Koyck distributed lag, polynomial distributed lag; vertical axis: $\beta_i$, horizontal axis: i]
Note that the distributed lag coefficients associated with the Koyck and polynomial models decline, but at different rates. The polynomial distributed lag model suggests that the impact of advertising isn't statistically significant beyond three or four months. The estimated weights from the Koyck model "die out" even more quickly. This is in sharp contrast to the weights which were estimated without any constraints (OLS). The advantage of the alternatives to unconstrained estimation should be apparent. The related literature contains a discussion of many alternatives. The methodology is similar to that already discussed: (1) specify a "form for the $\beta_i$'s" which reduces the number of parameters to be estimated; (2) these new parameters are then estimated and the corresponding $\beta$'s obtained.
The reader may want to gain experience by estimating some alternative specifications. It would be instructive to consider the sensitivity of polynomial distributed lag $\beta_i$'s to the number of lags, the degree of the underlying polynomial, as well as assumptions about end points. The reader might also demonstrate that if we assume the effect of advertising doesn't begin to decay exponentially until period two (rather than in the first period), the relevant model can be written as
    $S_t = a(1 - \lambda) + \lambda S_{t-1} + \beta_0 A_t + (\beta_1 - \lambda\beta_0) A_{t-1} + \varepsilon_t - \lambda\varepsilon_{t-1}$
where $\beta_i = \lambda^{i-1}\beta_1$ for i = 1, 2, . . . Estimate this model and compare the results with those obtained using the Koyck model. The consistency of the polynomial distributed lag model specification with the unconstrained estimates can be easily tested using a likelihood ratio test.
2. Lagged Dependent Variables - Autoregressive model
Autoregressive models include lagged values of dependent variables, can be viewed as being dynamic models, and link different time periods. We first interpret and summarize the statistical properties of OLS estimators of autoregressive models. The coefficients in these models have important "dynamic" interpretations concerning comparative static results. Finally, we show that the famous partial adjustment and adaptive expectations models can be expressed as autoregressive models.
a. Interpreting the coefficients in autoregressive models. A model is said to be dynamic if values of the dependent variable from the current and previous time periods are included in the same equation. The inclusion of lagged dependent variables presents several problems to the econometrician. In order to discuss some of these problems, consider the following autoregressive model:
$$Y_t = \alpha + \beta I_t + \gamma Y_{t-1} + \varepsilon_t$$
where Y_t and I_t denote some aggregate measures of production and investment.
(1) Properties of estimators and statistical inference
If the ε_t's are independent of each other (i.e., A.4), then least squares estimators of α, β, γ ($\hat{\alpha}$, $\hat{\beta}$, $\hat{\gamma}$) will be biased, but consistent; whereas, if the ε_t are serially correlated, $\hat{\alpha}$, $\hat{\beta}$, $\hat{\gamma}$ will be biased and inconsistent. In neither case will the t and F statistics be appropriate (more on this in another section). The properties of least squares estimators can be compactly summarized as in the following table:

Properties of Least Squares

                                        Residuals
                                Uncorrelated       Correlated
No Lagged Dependent Variable    unbiased           unbiased
                                consistent         consistent
                                efficient          not efficient
Lagged Dependent Variable       biased             biased
                                consistent         inconsistent
                                not efficient      not efficient

Thus it is important to test for autocorrelation. The D.W. test can be used for models without lagged dependent variables, and Durbin's h test or the Breusch-Godfrey test can be used for autoregressive models. (See the discussion of autocorrelation in section IV of the notes.)
(2) Interpretation of coefficients
For notational simplicity delete ε_t from the previous equation and consider
$$Y_t = \alpha + \beta I_t + \gamma Y_{t-1}.$$
The coefficient β is referred to as the impact multiplier for this model and is not what is generally referred to as "the investment multiplier." The impact multiplier
$$\frac{\partial Y_t}{\partial I_t} = \beta$$
measures the change in Y_t during the same period as I_t changes.
We note that since
$$Y_t = \alpha + \beta I_t + \gamma Y_{t-1}$$
it follows that
$$Y_{t-1} = \alpha + \beta I_{t-1} + \gamma Y_{t-2};$$
hence,
$$Y_t = \alpha + \beta I_t + \gamma(\alpha + \beta I_{t-1} + \gamma Y_{t-2}) = \alpha(1+\gamma) + \beta[I_t + \gamma I_{t-1}] + \gamma^2 Y_{t-2}.$$
Continuing this process we obtain
$$Y_t = \alpha(1 + \gamma + \gamma^2 + \dots) + \beta[I_t + \gamma I_{t-1} + \gamma^2 I_{t-2} + \dots].$$
What will the total effect of a change in I_t be on Y_t, Y_{t+1}, ...? When ΔI_t = 1,
$$\Delta Y_t = \beta,\qquad \Delta Y_{t+1} = \beta\gamma,\qquad \Delta Y_{t+2} = \beta\gamma^2,\qquad \dots$$
$$\text{Total impact} = \beta(1 + \gamma + \gamma^2 + \dots) = \frac{\beta}{1-\gamma}$$
The two period cumulative multiplier is given by β + βγ, the three period by β + βγ + βγ^2, and so on.
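The cumulative and long run multipliers above are partial sums and the limit of a geometric series. A minimal sketch follows; the function names and the numerical values β = 0.5 and γ = 0.6 are illustrative assumptions, not estimates from the notes.

```python
def cumulative_multipliers(beta, gamma, periods):
    """Return [beta, beta + beta*gamma, beta + beta*gamma + beta*gamma**2, ...]."""
    total = 0.0
    out = []
    for j in range(periods):
        total += beta * gamma ** j   # effect on Y in period t + j of a unit change in I_t
        out.append(total)
    return out

def long_run_multiplier(beta, gamma):
    """Limit of the cumulative multipliers; the series converges only if |gamma| < 1."""
    if abs(gamma) >= 1:
        raise ValueError("the geometric series converges only for |gamma| < 1")
    return beta / (1 - gamma)

multipliers = cumulative_multipliers(0.5, 0.6, 3)
long_run = long_run_multiplier(0.5, 0.6)
print(multipliers, long_run)
```

With these values the impact multiplier is 0.5, while the long run multiplier is 0.5 / 0.4 = 1.25, so more than half of the total effect arrives after the first period.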
The long run investment multiplier is given by $\dfrac{\beta}{1-\gamma}$. Summing the geometric series in the intercept of the recursive solution (which requires |γ| < 1) gives
$$Y_t = \frac{\alpha}{1-\gamma} + \beta I_t + \beta\gamma I_{t-1} + \beta\gamma^2 I_{t-2} + \beta\gamma^3 I_{t-3} + \dots$$
The long run multiplier can be interpreted in two ways: (1) the cumulative (over time) change in Y corresponding to a one time increase in investment expenditure; or (2) the increase in long run equilibrium Y corresponding to a sustained increase in investment expenditure. These two interpretations are represented in the following figure.

[Figure: Impact of change in investment. Left panel: one period change (ΔI = 1). Right panel: sustained change (ΔI = 1). Each panel plots Y_t and I against time t; the annotated long run effect is ΔY = β/(1 - γ).]
b. Some common autoregressive models
(1) Partial adjustment model
Optimal: The optimal value of y_t, y_t*, is a function of x_t:
$$y_t^* = \alpha + \beta x_t + u_t$$
Adjustment mechanism:
$$y_t - y_{t-1} = \gamma(y_t^* - y_{t-1}), \qquad 0 < \gamma \le 1$$
Note: (1) γ = 1 corresponds to complete adjustment.
(2) This adjustment mechanism is consistent with the minimization of costs, c_t, where
$$c_t = \alpha(y_t - y_t^*)^2 + \beta(y_t - y_{t-1})^2$$
The first term is the cost of being out of equilibrium and the second is the cost of change; y_{t-1} and y_t* are given.
Combining the basic equation and adjustment mechanism yields
$$y_t = \alpha\gamma + \beta\gamma x_t + (1-\gamma) y_{t-1} + \gamma u_t$$
which can be estimated using OLS.
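Once the combined equation has been estimated, the structural parameters can be backed out from the reduced-form coefficients. The sketch below assumes the estimated equation is y_t = b0 + bx·x_t + by·y_{t-1}, so that b0 = αγ, bx = βγ and by = 1 - γ; the function name and the numerical values are hypothetical.

```python
def partial_adjustment_structure(b0, bx, by):
    """Recover (alpha, beta, gamma) from y_t = b0 + bx*x_t + by*y_{t-1}.

    Uses b0 = alpha*gamma, bx = beta*gamma, by = 1 - gamma.
    """
    gamma = 1.0 - by                 # speed of adjustment
    if not 0.0 < gamma <= 1.0:
        raise ValueError("implied gamma lies outside (0, 1]")
    return {"alpha": b0 / gamma, "beta": bx / gamma, "gamma": gamma}

# Illustrative reduced-form coefficients (not estimates from the notes).
structure = partial_adjustment_structure(b0=2.5, bx=0.5, by=0.75)
print(structure)
```

Here the coefficient on x_t (0.5) is the short run effect, while the recovered β (2.0) is the long run effect of x on y once adjustment is complete.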
(2) Adaptive Expectations Model. This model relaxes the assumption that the dependent variable depends only on the current level of the independent variable. Let x_t* denote the "expected" level of x_t and assume the dependent variable immediately adjusts to x_t*.
Basic Relationship:
$$y_t = \alpha + \beta x_t^* + u_t$$
Adjustment Mechanism:
$$x_t^* - x_{t-1}^* = \delta(x_t - x_{t-1}^*), \qquad 0 < \delta \le 1$$
δ = 1 corresponds to complete adjustment.
Combining these expressions yields
$$y_t = \alpha\delta + \beta\delta x_t + (1-\delta) y_{t-1} + (u_t - (1-\delta)u_{t-1})$$
Note the similarity and differences between the forms for the Koyck, partial adjustment, and adaptive expectations models.
(3) Partial Adjustment and Adaptive Expectations Model
Basic Relationship (optimal y as a function of expected x):
$$y_t^* = \alpha + \beta x_t^*$$
Adjustment Mechanisms:
$$y_t - y_{t-1} = \gamma(y_t^* - y_{t-1}) + u_t, \qquad 0 < \gamma \le 1$$
$$x_t^* - x_{t-1}^* = \delta(x_t - x_{t-1}^*), \qquad 0 < \delta \le 1$$
Combining these expressions yields
$$y_t = \alpha\gamma\delta + \beta\gamma\delta x_t + [(1-\delta) + (1-\gamma)] y_{t-1} - (1-\delta)(1-\gamma) y_{t-2} + (u_t - (1-\delta)u_{t-1})$$
c. Estimation of Autoregressive models
Consider the model
$$y_t = \beta_1 + \beta_2 y_{t-1} + \beta_3 x_t + \varepsilon_t$$
with the following assumptions for the error term.
Assumption I. $\varepsilon_t \sim NID(0, \sigma^2)$, where NID stands for independently and identically distributed as N(0, σ^2).
Assumption II. $\varepsilon_t = u_t - \lambda u_{t-1}$ (Koyck), with either
   a. $u_t \sim NID(0, \sigma_u^2)$, or
   b. $u_t = \rho u_{t-1} + \eta_t$, $|\rho| < 1$, $\eta_t \sim NID(0, \sigma_\eta^2)$.
Assumption III. $\varepsilon_t = \rho\varepsilon_{t-1} + u_t$, $u_t \sim NID(0, \sigma_u^2)$.
(1) Assumption I. Least squares estimators of β = (β_1, β_2, β_3) will be biased, but consistent.
(a) Remember that OLS estimators are unbiased and consistent in the presence of autocorrelation, but are no longer minimum variance estimators.
(b) The presence of lagged dependent variables results in least squares estimators which are biased, but are still consistent.
(c) The presence of autocorrelation and lagged dependent variables implies that least squares estimators will be biased and inconsistent. This situation arises with assumptions II and III. Hence, estimators other than least squares estimators need to be developed for the case of lagged dependent variables and autocorrelation.
(d) The inclusion of lagged dependent variables biases the value of the Durbin-Watson statistic towards 2 and therefore the standard interpretation of D.W. is not valid.
The h test has been proposed as a test for autocorrelation in this case:
$$h = \hat{\rho}\,\sqrt{\frac{n}{1 - n\,\widehat{\mathrm{Var}}(\text{est. coef. of } y_{t-1})}}$$
The asymptotic distribution of h is
h ~ N(0, 1).
There are two main problems with this test:
(i) The h test is not valid if $n\,\widehat{\mathrm{Var}}(\hat{\beta}_2) > 1$, where $\hat{\beta}_2$ is the estimated coefficient of y_{t-1}.
(ii) N(0, 1) seems to yield a poor fit to the distribution of h for frequently encountered sample sizes. Some have argued that the use of du and 4 - du to define critical regions appears to provide more accurate results. Du corresponds to the upper limit
( )
1 t t t 2 1 1 t t
Y 1 C C
− −
ε − ε + γ β + λ − β = λ −
for a Durbin-Watson test statistic, which will be discussed later.

[Diagram: a number line marked at du, 2, and 4 - du.]
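Durbin's h statistic is simple to compute once the regression output is available. The sketch below is hedged: the function name is hypothetical, `rho_hat` is taken as given (a common choice, assumed here rather than stated in the notes, is 1 - DW/2), and `var_coef_lag_y` is the estimated variance of the coefficient on y_{t-1}. The function returns None when the statistic is undefined.

```python
import math

def durbin_h(rho_hat, n, var_coef_lag_y):
    """Durbin's h = rho_hat * sqrt(n / (1 - n * Var(coef of y_{t-1}))).

    Returns None when n * Var >= 1, because the term under the square
    root is then negative or undefined and the test cannot be applied.
    """
    denom = 1.0 - n * var_coef_lag_y
    if denom <= 0:
        return None
    return rho_hat * math.sqrt(n / denom)

# Illustrative numbers only: n = 50, rho_hat = 0.3, Var = 0.01.
h = durbin_h(0.3, 50, 0.01)
print(h)                           # compare with N(0, 1) critical values
print(durbin_h(0.3, 50, 0.03))     # None: n * Var = 1.5 > 1
```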
Other tests for the presence of autocorrelation in a model with lagged dependent variables are available. For example, the Breusch-Godfrey and Ljung-Box tests can be modified to apply to autoregressive models. The Breusch-Godfrey test can be applied by regressing the OLS residuals e_t on the lagged y's and the lagged e_t's implied by the model (the autoregressive terms and the number of autoregressive or moving average errors) and testing for the collective explanatory power of the coefficients of the lagged errors using an F test.
A brief treatment of estimation in the case of II or III is reported in the appendix.
D. Causality or Exogeneity
The existence of a relationship does not imply that either variable causes the other variable. There is an extensive literature on what it means for X to cause Y or for X to be exogenous to Y. A related concept is Granger causality. X is said to not Granger-cause Y if the conditional distribution of Y, given lagged Y and lagged X, is equal to the conditional distribution of Y, given lagged Y. Alternatively, lagged X's do not help explain current levels of Y. A test of whether X Granger-causes Y can be performed as follows:
(1) Estimate the following model:
$$y_t = a + b_1 y_{t-1} + \dots + b_p y_{t-p} + c_1 x_{t-1} + \dots + c_p x_{t-p} + \varepsilon_t.$$
(2) Test the joint hypothesis $H_0: c_1 = \dots = c_p = 0$ (X does not Granger-cause Y) using an F test. A "large" F statistic provides evidence that X Granger-causes Y.
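The two steps of the Granger test can be sketched with a restricted regression (lagged y's only) and an unrestricted regression (lagged y's and lagged x's). The code below is a minimal illustration using only the Python standard library; the helper names `solve`, `sse` and `granger_f` and the simulated data are assumptions made for the example, not part of the notes.

```python
import random

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def sse(X, y):
    """Sum of squared OLS residuals from regressing y on the columns of X."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = solve(XtX, Xty)
    return sum((yi - sum(bj * xj for bj, xj in zip(b, r))) ** 2
               for r, yi in zip(X, y))

def granger_f(y, x, p):
    """F statistic for H0: c_1 = ... = c_p = 0 (x does not Granger-cause y)."""
    rows = range(p, len(y))
    dep = [y[t] for t in rows]
    restricted = [[1.0] + [y[t - j] for j in range(1, p + 1)] for t in rows]
    unrestricted = [r + [x[t - j] for j in range(1, p + 1)]
                    for r, t in zip(restricted, rows)]
    sse_r, sse_u = sse(restricted, dep), sse(unrestricted, dep)
    df = len(dep) - (2 * p + 1)          # observations minus estimated parameters
    return ((sse_r - sse_u) / p) / (sse_u / df)

# Simulated data in which lagged x helps predict y, but not the reverse.
rng = random.Random(0)
n = 300
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.0]
for t in range(1, n):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + rng.gauss(0, 1))

f_x_to_y = granger_f(y, x, p=2)   # expected to be large
f_y_to_x = granger_f(x, y, p=2)   # expected to be small
print(f_x_to_y, f_y_to_x)
```

The statistic is compared with an F(p, n - p - (2p + 1)) critical value; in practice a regression package would be used, and the hand-rolled solver here only keeps the example self-contained.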
APPENDIX PDL MODELS
1. "A Few Details for the Almon Distributed Lag."
Consider the problem of estimating an Almon distributed lag model with p = 2 and s = 3, so we have a 2nd degree polynomial with 3 lags. The β_i's can be expressed in terms of the a_i's (recall: $\beta_i = a_0 + a_1 i + a_2 i^2$) as
$$\beta_0 = a_0$$
$$\beta_1 = a_0 + a_1 + a_2$$
$$\beta_2 = a_0 + 2a_1 + 4a_2$$
$$\beta_3 = a_0 + 3a_1 + 9a_2.$$
Substituting these expressions into the original distributed lag model for β_i yields:
$$y_t = \alpha + a_0 x_t + (a_0 + a_1 + a_2)x_{t-1} + (a_0 + 2a_1 + 4a_2)x_{t-2} + (a_0 + 3a_1 + 9a_2)x_{t-3} + u_t$$
$$\;\; = \alpha + a_0(x_t + x_{t-1} + x_{t-2} + x_{t-3}) + a_1(x_{t-1} + 2x_{t-2} + 3x_{t-3}) + a_2(x_{t-1} + 4x_{t-2} + 9x_{t-3}) + u_t$$
For a more general case, assume p = 3 and s = 10.
s = 10: $y_t = \delta + \beta_0 x_t + \beta_1 x_{t-1} + \dots + \beta_{10} x_{t-10} + u_t$
p = 3: $\beta_i = a_0 + a_1 i + a_2 i^2 + a_3 i^3$
$$\beta_0 = a_0$$
$$\beta_1 = a_0 + a_1 + a_2 + a_3 = \sum_{j=0}^{3} a_j$$
$$\beta_2 = a_0 + a_1 2 + a_2 2^2 + a_3 2^3 = \sum_{j=0}^{3} a_j 2^j$$
$$\vdots$$
$$\beta_{10} = a_0 + a_1 10 + a_2 10^2 + a_3 10^3 = \sum_{j=0}^{3} a_j 10^j$$
Again, after substituting for β_i, we obtain
$$y_t = \delta + a_0 x_t + \Big(\sum_j a_j\Big) x_{t-1} + \Big(\sum_j a_j 2^j\Big) x_{t-2} + \dots + \Big(\sum_j a_j 10^j\Big) x_{t-10} + u_t.$$
Rearranging terms we obtain
$$y_t = \delta + a_0(x_t + x_{t-1} + \dots + x_{t-10}) + a_1(x_{t-1} + 2x_{t-2} + \dots + 10x_{t-10}) + a_2(x_{t-1} + 2^2 x_{t-2} + \dots + 10^2 x_{t-10}) + a_3(x_{t-1} + 2^3 x_{t-2} + \dots + 10^3 x_{t-10}) + u_t$$
$$\;\; = \delta + a_0\sum_{i=0}^{10} x_{t-i} + a_1\sum_{i=1}^{10} i\,x_{t-i} + a_2\sum_{i=0}^{10} i^2 x_{t-i} + a_3\sum_{i=1}^{10} i^3 x_{t-i} + u_t$$
Defining
$$z_{tj} = \sum_{i=0}^{10} \left(i^j x_{t-i}\right)$$
we can estimate the a_i (and hence the β_i) by obtaining estimates of
$$y_t = \delta + a_0 z_{t0} + a_1 z_{t1} + a_2 z_{t2} + a_3 z_{t3} + u_t$$
$$\widehat{\mathrm{Var}}\begin{pmatrix}\hat{\delta}\\ \hat{a}_0\\ \vdots\\ \hat{a}_3\end{pmatrix} = \sigma_u^2 (Z'Z)^{-1}$$
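Now since
$$\begin{pmatrix}\delta\\ \beta_0\\ \beta_1\\ \beta_2\\ \beta_3\\ \vdots\\ \beta_{10}\end{pmatrix} = \begin{pmatrix}1 & 0 & 0 & 0 & 0\\ 0 & 1 & 0 & 0 & 0\\ 0 & 1 & 1 & 1 & 1\\ 0 & 2^0 & 2^1 & 2^2 & 2^3\\ 0 & 3^0 & 3^1 & 3^2 & 3^3\\ \vdots & \vdots & \vdots & \vdots & \vdots\\ 0 & 10^0 & 10^1 & 10^2 & 10^3\end{pmatrix}\begin{pmatrix}\delta\\ a_0\\ a_1\\ a_2\\ a_3\end{pmatrix} = C\begin{pmatrix}\delta\\ a_0\\ \vdots\\ a_3\end{pmatrix}$$
then
$$\widehat{\mathrm{Var}}\begin{pmatrix}\hat{\delta}\\ \hat{\beta}_0\\ \hat{\beta}_1\\ \vdots\\ \hat{\beta}_{10}\end{pmatrix} = \sigma_u^2\, C (Z'Z)^{-1} C'$$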
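The Almon transformation can be checked numerically: the constructed regressors z_tj and the polynomial weights β_i must give the same fitted lag term, since Σ_i β_i x_{t-i} = Σ_j a_j z_tj. The sketch below uses the p = 2, s = 3 case with made-up values for the a_j and x; the function names are hypothetical.

```python
def almon_betas(a, s):
    """Lag weights beta_i = sum_j a[j] * i**j for i = 0, ..., s."""
    return [sum(aj * i ** j for j, aj in enumerate(a)) for i in range(s + 1)]

def almon_z(x, t, s, p):
    """Constructed regressors z_tj = sum_{i=0}^{s} i**j * x[t-i], j = 0, ..., p."""
    return [sum(i ** j * x[t - i] for i in range(s + 1)) for j in range(p + 1)]

a = [1.0, 2.0, 3.0]            # illustrative a_0, a_1, a_2 (p = 2)
x = [5.0, 1.0, 4.0, 2.0, 3.0]  # illustrative data, x[4] is the current value
s, t = 3, 4

betas = almon_betas(a, s)                  # beta_0, ..., beta_3
z = almon_z(x, t, s, p=2)                  # z_t0, z_t1, z_t2
lag_term_direct = sum(b * x[t - i] for i, b in enumerate(betas))
lag_term_via_z = sum(aj * zj for aj, zj in zip(a, z))
print(betas, z, lag_term_direct, lag_term_via_z)
```

In an application the a_j would be estimated by regressing y_t on a constant and the z_tj, and `almon_betas` would then convert the estimates into lag weights; note that Python evaluates 0**0 as 1, which matches the convention β_0 = a_0.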
PROBLEM SET 4.3: LAGGED VARIABLES
Applied problems
1. Replicate the results in the applications of OLS, Koyck, and PDL models to estimate the relationship between sales and advertising expenditures reported in the notes. (The data are available in file hw3_3_table1.txt.) In particular,
(a) estimate
$$S_t = a + \beta_0 A_t + \dots + \beta_{12} A_{t-12} + \varepsilon_t$$
using (1) OLS
(2) Koyck Lags (report λ, α, β_0)
(3) Polynomial distributed lags, order = 2
(b) Compare the distributed lag coefficients with OLS.
(c) Test the PDL specification against the OLS using a Chow and LR test.
(d) Re-estimate the model using a polynomial distributed lag with order = 3 and test whether the differences between p = 2 and p = 3 are statistically significant.
(e) (Bonus) Estimate a modified Koyck model which declines geometrically after the first lag.
Hint: replicate the commands contained in the PDL section of the class notes. The TA will be a great resource.
(JM IIIC)
Table 1
Sales and Advertising
t   S_t   A_t   A_{t-1}   A_{t-2}   A_{t-3}   A_{t-4}   ...   A_{t-12}
1 521 73
2 515 94 73
3 533 88 94 73
4 531 103 88 94 73
5 544 104 103 88 94 73
6 528 73 104 103 88 94
7 537 121 73 104 103 88
8 541 134 121 73 104 103
9 531 102 134 121 73 104
10 535 79 102 134 121 73
11 527 119 79 102 134 121
12 517 118 119 79 102 134
13 547 145 118 119 79 102 73
14 560 128 145 118 119 79 94
15 557 145 128 145 118 119 88
16 548 191 145 128 145 118 103
17 543 159 191 145 128 145 104
18 580 169 159 191 145 128 73
19 564 162 169 159 191 145 121
20 581 181 162 169 159 191 134
21 557 170 181 162 169 159 102
22 575 183 170 181 162 169 79
23 585 205 183 170 181 162 119
24 568 185 205 183 170 181 118
25 569 200 185 205 183 170 145
26 551 173 200 185 205 183 128
27 586 243 173 200 185 205 145
28 581 215 243 173 200 185 191
29 559 210 215 243 173 200 159
30 594 229 210 215 243 173 169
31 593 227 229 210 215 243 162
32 579 249 227 229 210 215 181
33 609 265 249 227 229 210 170
34 602 257 265 249 227 229 183
35 617 253 257 265 249 227 205
36 601 239 253 257 265 249 185
2. In Example 11.4 (Wooldridge p.389) it may be expected that the expected value of the return at time t is a quadratic function of return_{t-1}. To check this possibility, use the data in NYSE.RAW to estimate
$$return_t = \beta_0 + \beta_1\, return_{t-1} + \beta_2\, return_{t-1}^2 + u_t$$
(a) Report the results in standard form.
(b) State and test the null hypothesis that E(return_t | return_{t-1}) does not depend on return_{t-1}. (Hint: There are two restrictions to test here.)
(c) Drop return_{t-1}^2 from the model, but add the interaction term return_{t-1}·return_{t-2}. Now test the efficient markets hypothesis (β_1 = β_2 = 0).
(d) What do you conclude about predicting weekly stock returns based on past stock returns?
(Wooldridge C. 11.3)
James B. McDonald
Brigham Young University
7/12/2010
V. Violations of the Basic Assumptions in the Classical Normal Linear Regression Model
A. Introductory Comments, B. Nonnormality of errors, C. Nonzero mean of errors, D.
Generalized Regression Model, E. Heteroskedasticity, F. Autocorrelation, G. Panel Data, H.
Stochastic X’s, I. Measurement Error, J. Specification Error
A. Introductory Comments
The Classical Normal Linear Regression Model is defined by:
y = Xβ + ε
where (A.1) ε is distributed normally
(A.2) E(ε_t) = 0 for all t
(A.3) Var(ε_t) = σ^2 for all t
(A.4) Cov(ε_t, ε_s) = 0 for t ≠ s
(A.5) The X's are nonstochastic and $\lim_{n\to\infty}\left(\dfrac{X'X}{n}\right) = \Sigma_X$ is nonsingular.
Recall that assumptions (A.1)-(A.4) can be written more compactly as
ε ~ N[0, Σ = σ^2 I].
In section (II') we demonstrated that under assumptions (A.1)-(A.5) the least squares estimator ($\hat{\beta}$), the maximum likelihood estimator ($\overset{\Delta}{\beta}$), and the best linear unbiased estimator ($\tilde{\beta}$) are identical, i.e.,
$$\hat{\beta} = \tilde{\beta} = \overset{\Delta}{\beta} = (X'X)^{-1}X'y \quad\text{and}\quad \hat{\beta} \sim N[\beta;\ \sigma^2 (X'X)^{-1}].$$
Additionally, we proved that the least squares estimator $\hat{\beta}$ (hence $\tilde{\beta}$ and $\overset{\Delta}{\beta}$) are
• unbiased estimators
• minimum variance of all unbiased estimators
• consistent
• asymptotically efficient.
In this section we will demonstrate that the statistical properties of $\hat{\beta}$ are crucially dependent upon the validity of assumptions (A.1)-(A.5). The associated discussion will proceed by dropping one assumption at a time and considering the consequences. First, we will drop (A.1) and then (A.2). This will be followed by considering the generalized regression model, which can be viewed as a generalized model which includes heteroskedasticity (violation of (A.3)), autocorrelation (violation of (A.4)), and the classical normal linear regression model as special cases. In Sections G, H, and I we will consider the implications of violating (A.5), the existence of measurement error, and the presence of specification error (guessing the wrong model).
B. The Random Disturbances are not distributed normally, but (A.2)-(A.5) are valid.
An inspection of the derivation of the least squares estimator $\hat{\beta}$ reveals that the deduction is independent of any of the assumptions (A.1)-(A.5); hence,
$$\hat{\beta} = (X'X)^{-1}X'y$$
is still the correct formula for the least squares estimator of β in the model
y = Xβ + ε
regardless of the assumptions about the distribution of ε. However, it should be mentioned that the statistical properties of $\hat{\beta}$ are very sensitive to the assumptions about the distribution of ε.
Similarly, we note that the BLUE of β is invariant with respect to the assumptions about the underlying probability density function of ε as long as (A.2)-(A.5) are valid. In this case we can conclude that
$$\hat{\beta} = \tilde{\beta} = (X'X)^{-1}X'y$$
and both $\hat{\beta}$ and $\tilde{\beta}$ will be
• unbiased
• minimum variance of all linear unbiased estimators (not necessarily of all unbiased estimators, since the Cramer-Rao lower bound depends upon the density of the residuals)
• consistent.
However, standard t and F tests and confidence intervals are not necessarily valid for nonnormally distributed residuals.
The distribution of $\hat{\beta}$ will depend on the distribution of ε, which determines the distribution of y (y = Xβ + ε) and hence the distribution of $\hat{\beta}$ and $\tilde{\beta}$ ($\hat{\beta} = \tilde{\beta} = (X'X)^{-1}X'y$).
Let's consider the MLE of β. Recall that the first step in the derivation of the MLE of β is to define the likelihood function, for independent and identically distributed observations,
$$L = f(y_1; \beta) \cdots f(y_n; \beta)$$
which requires a knowledge of the distribution of the random disturbances and could not be defined otherwise. MLE are generally efficient. Least squares estimators will be efficient if f(y; β) is normal. However, least squares need not be efficient if the residuals are not distributed normally. For example, if ε is distributed as a Laplace with (A.2)-(A.5) holding, OLS will be consistent and BLUE, but not efficient.
Consider the case in which the density function of the random disturbances is the Laplace or double exponential defined by
$$f(\varepsilon; \lambda) = \frac{e^{-|\varepsilon|/\lambda}}{2\lambda}, \qquad -\infty < \varepsilon < \infty$$
which can be graphically depicted as

[Figure: the Laplace density f(ε_t).]

This density has thicker tails than the normal and is more peaked at 0. The associated likelihood function is defined by
$$L = f(y_1; \beta, \lambda) \cdots f(y_n; \beta, \lambda) = \frac{e^{-|y_1 - X_1\beta|/\lambda}}{2\lambda} \cdots \frac{e^{-|y_n - X_n\beta|/\lambda}}{2\lambda}$$
where $X_t = (1, x_{t2}, \dots, x_{tk})$ and $\beta' = (\beta_1, \dots, \beta_k)$. The log likelihood function is given by
$$\ell = \ln L = -\sum_{t=1}^{n} |y_t - X_t\beta|/\lambda - n\ln(2\lambda).$$
The MLE of β in this case will minimize the sum of the absolute values of the errors
$$\sum_t |y_t - X_t\beta|$$
and is sometimes called the "least lines," minimum absolute deviations (MAD), least absolute deviation (LAD), or least absolute error (LAE) estimator; whereas, the least squares estimator of β minimizes the sum of squared errors
$$\sum_t (y_t - X_t\beta)^2$$
and will not be the MLE estimator $\overset{\Delta}{\beta}$ in this case. For the linear regression model with Laplace error terms $\overset{\Delta}{\beta}$ (LAD) will be unbiased, consistent, and asymptotically efficient. The following table compares and contrasts the relative performance of OLS and LAD estimators for the two different error distributions, the normal and Laplace.

Variance-covariance matrices of the OLS and LAD estimators (asymptotic for LAD)

Estimator \ error distribution    Normal                   Laplace
OLS                               σ^2 (X'X)^{-1}           σ^2 (X'X)^{-1}
LAD                               (π/2) σ^2 (X'X)^{-1}     (σ^2/2) (X'X)^{-1}

From this table we can see that the variance of LAD estimators is π/2 (roughly 1.57) times that of the corresponding OLS estimators for normal errors, but is half the OLS variance for Laplace errors. Recall that the Laplace pdf has thicker tails than the normal; hence, in the presence of outliers LAD may be preferred to OLS. LAD estimators can be obtained using the Stata command
qreg y X’s
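The relative performance of OLS and LAD can be illustrated in the simplest regression, a model with only an intercept, where OLS is the sample mean and LAD is the sample median. The Monte Carlo sketch below is an illustration under assumed settings (sample size 101, 2000 replications, unit scale); the function names are hypothetical.

```python
import random
import statistics

def laplace_draw(rng, lam=1.0):
    """Laplace(0, lam) draw: difference of two independent exponentials with mean lam."""
    return rng.expovariate(1.0 / lam) - rng.expovariate(1.0 / lam)

def normal_draw(rng):
    return rng.gauss(0.0, 1.0)

def lad_to_ols_variance_ratio(draw, n=101, reps=2000, seed=0):
    """Monte Carlo estimate of Var(median) / Var(mean) for samples of size n."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        means.append(sum(sample) / n)               # OLS estimate of the intercept
        medians.append(statistics.median(sample))   # LAD estimate of the intercept
    return statistics.pvariance(medians) / statistics.pvariance(means)

ratio_normal = lad_to_ols_variance_ratio(normal_draw)
ratio_laplace = lad_to_ols_variance_ratio(laplace_draw)
print(ratio_normal, ratio_laplace)
```

With normal errors the ratio should be near π/2, and with Laplace errors it should be well below one, approaching one half as the sample size grows.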
The exercise set considers a generalized error (GED) distribution which includes both the normal and double exponential or Laplace as special cases. Consequently, least squares and LAD estimators are special cases of MLE of the GED distribution.
In the past, the functional form of the distribution of the residuals has rarely been investigated. This is changing and could be investigated by comparing the distribution of ε_t with the normal.
Various tests have been proposed to investigate the validity of the normality assumption. These tests take different forms. One class of tests is based on examining the skewness or kurtosis of the distribution of the estimated residuals.
The skewness coefficient
$$\gamma_1 = \frac{E(\varepsilon^3)}{(\sigma^2)^{3/2}}$$
which can be estimated by
$$\hat{\gamma}_1 = \frac{\sum_{t=1}^{n} e_t^3 / n}{\left(\sum_{t=1}^{n} e_t^2 / n\right)^{3/2}}$$
has an asymptotic distribution
N(0, 6/n).
Similarly, the excess kurtosis coefficient
$$\gamma_2 = \frac{E(\varepsilon^4)}{(\sigma^2)^2} - 3$$
can be estimated by
$$\hat{\gamma}_2 = \frac{\sum_t e_t^4 / n}{\left(\sum_t e_t^2 / n\right)^2} - 3$$
and has an asymptotic distribution
N(0, 24/n)
for normally distributed residuals. Here e_t denotes the estimated residuals. These two results provide the basis for constructing "t-type" tests to test whether the sample skewness or kurtosis are consistent with the assumption of normally distributed residuals.
The Jarque-Bera test provides a joint test of a symmetric distribution for the residual with kurtosis of three. The test statistic is defined by
$$JB = n\left[\frac{\text{skewness}^2}{6} + \frac{(\text{excess kurtosis})^2}{24}\right]$$
and has an asymptotic Chi-square distribution with two degrees of freedom. The distribution of JB follows from its being equal to the sum of squares of two asymptotically independent standard normal variables.
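The skewness, excess kurtosis and JB statistics can be computed directly from a list of residuals. The sketch below is a minimal illustration; the function name is hypothetical, and the residuals are centered at their mean before the moments are formed (an assumption that changes nothing for OLS residuals from a model with an intercept, whose mean is already zero).

```python
def jarque_bera(resid):
    """Return (skewness, excess kurtosis, JB) for a list of residuals."""
    n = len(resid)
    mean = sum(resid) / n
    d = [e - mean for e in resid]           # centered residuals
    m2 = sum(v ** 2 for v in d) / n         # second moment
    m3 = sum(v ** 3 for v in d) / n         # third moment
    m4 = sum(v ** 4 for v in d) / n         # fourth moment
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    jb = n * (skewness ** 2 / 6.0 + excess_kurtosis ** 2 / 24.0)
    return skewness, excess_kurtosis, jb

# Small symmetric example (illustrative values, not data from the notes).
skew, exkurt, jb = jarque_bera([-2.0, -1.0, 0.0, 1.0, 2.0])
print(skew, exkurt, jb)
```

JB is compared with a Chi-square critical value with two degrees of freedom (5.99 at the 5 percent level); with only five observations the asymptotic approximation is of course unreliable, and the example serves only to show the arithmetic.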
Chi-square goodness of fit tests have also been proposed which are based upon comparing the histogram of estimated residuals with the normal distribution.
These test statistics and others are available as output from such programs as Stata, SAS, or SHAZAM. The Stata commands are given below.
To test for statistically significant departures of skewness and kurtosis from the normal, the commands are:
reg y X’s
predict resid, res
sum resid, detail
sktest resid
The output from sktest resid also includes the calculation of a Jarque-Bera-like test, along with the associated p-values. The exact test statistics differ from those outlined above, but are similar in structure and test the same hypotheses.
(D’Agostino, Belanger, and D’Agostino, American Statistician, 1990, pp. 316-321)
To perform a Chi-square test in Stata, you must first install the “csgof” command by typing
findit csgof
and then installing the command and help files.
The Kolmogorov-Smirnov test is based upon the distribution of the maximum vertical distance between the cumulative histogram and the cumulative distribution of the hypothesized distribution. James Ramsey's program SEA (Specification Error Analysis) enables one to perform such a test. This can also be performed in Stata using the command “ksmirnov”.
An alternative approach is to consider general distribution functions which include many of the common alternative specifications such as the normal as special cases. The first problem in the problem set illustrates this approach. Five other distributions which might also be considered are the generalized t, skewed generalized t, t, EGB2, and Inverse Hyperbolic Sine distributions. Estimation procedures exist which perform well for non-normal distributions. Some of these are referred to as robust, M, semiparametric, or partially adaptive estimators, which accommodate very flexible underlying distributions. Kernel estimators provide another approach to this problem; they are nonparametric in that they are independent of a distributional assumption. Using some of these alternative estimators, the hypothesis of normally distributed residuals can also be tested using the LR, Wald, or Rao (Lagrangian multiplier) tests.
C. ε ~ N(µ, Σ = σ^2 I), i.e., drop (A.2)
The least squares estimator of β is given by
$$\hat{\beta} = (X'X)^{-1}X'y$$
The expected value of $\hat{\beta}$ is given as follows, where $\boldsymbol{\mu} = E(\varepsilon)$ denotes the n x 1 vector of error means:
$$E(\hat{\beta}) = (X'X)^{-1}X'E(y) = (X'X)^{-1}X'(X\beta + E(\varepsilon)) = (X'X)^{-1}X'X\beta + (X'X)^{-1}X'\boldsymbol{\mu} = \beta + (X'X)^{-1}X'\boldsymbol{\mu}$$
with the second term representing the bias, which appears to suggest that all of the least squares estimators in the vector $\hat{\beta}$ are biased.
However, if E(ε_t) = µ for all t, then
$$\boldsymbol{\mu} = \begin{pmatrix}\mu\\ \vdots\\ \mu\end{pmatrix} = \mu\begin{pmatrix}1\\ \vdots\\ 1\end{pmatrix}$$
and it can be shown that
$$(X'X)^{-1}X'\boldsymbol{\mu} = \mu\,(X'X)^{-1}X'\begin{pmatrix}1\\ \vdots\\ 1\end{pmatrix} = \begin{pmatrix}\mu\\ 0\\ \vdots\\ 0\end{pmatrix}$$
because the column of ones is the first column of X, so that $(X'X)^{-1}X'(1, \dots, 1)'$ is the first column of $(X'X)^{-1}X'X = I$. Hence only the estimator of the intercept is biased. If an error distribution has a nonzero mean, this gets included in the intercept term and separate estimates of β_1 and µ can't be obtained.
More general violations of (A.2) such as a non-zero, non-constant mean can lead to biased estimators of the intercept and slope coefficients.

[Figure: plot showing the line β_1 + β_2 X_t and the error mean µ.]
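The claim that a constant nonzero error mean affects only the intercept can be verified with a two-variable regression. In the sketch below the data, the error values, and the shift µ = 1.5 are made up for illustration, and `ols_line` is a hypothetical helper implementing the usual closed-form simple regression estimates.

```python
def ols_line(x, y):
    """Closed-form OLS estimates (intercept, slope) for y = b1 + b2*x + e."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

x = [1.0, 2.0, 3.0, 4.0, 5.0]
e = [0.3, -0.1, 0.2, -0.4, 0.0]          # illustrative errors (sum to zero)
mu = 1.5                                  # constant added to every error

y_zero_mean = [2.0 + 0.5 * xi + ei for xi, ei in zip(x, e)]
y_shifted = [yi + mu for yi in y_zero_mean]

b1, b2 = ols_line(x, y_zero_mean)
c1, c2 = ols_line(x, y_shifted)
print(b1, b2)   # estimates with zero-mean errors
print(c1, c2)   # intercept rises by mu, slope is unchanged
```

Shifting every error by µ moves the fitted intercept by exactly µ and leaves the slope untouched, which is why β_1 and µ cannot be separately estimated.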
D. Generalized Normal Linear Regression Model
1. Introduction
In many economic applications either (A.3) or (A.4) is violated, i.e.,
Heteroskedasticity: Var(ε_t) ≠ σ^2 for all t
Autocorrelation: Cov(ε_t, ε_s) ≠ 0 for t ≠ s
For situations in which either or both autocorrelation and heteroskedasticity exist,
Var(ε) = Σ ≠ σ^2 I,
the model can be written more generally as
y = Xβ + ε
(A.1)-(A.4) ε ~ N(0, Σ)
(A.5) Same as before
This model is referred to as the generalized normal linear regression model and includes the classical normal linear regression model as a special case, i.e., when Σ = σ^2 I.
The unknown parameters in the generalized regression model are the β's = (β_1, ..., β_k) and the $n(n+1)/2 = n + \dfrac{n(n-1)}{2}$ independent parameters in the symmetric matrix Σ. In general it is not possible to estimate Σ unless some simplifying assumptions are made. For example, with the case of heteroskedasticity alone
$$\Sigma = \begin{pmatrix}\mathrm{Var}(\varepsilon_1) & & 0\\ & \ddots & \\ 0 & & \mathrm{Var}(\varepsilon_n)\end{pmatrix}$$
or for autocorrelation alone
$$\Sigma = \begin{pmatrix}\sigma^2 & \mathrm{Cov}(\varepsilon_1, \varepsilon_2) & \dots & \mathrm{Cov}(\varepsilon_1, \varepsilon_n)\\ \vdots & \ddots & & \vdots\\ & & & \sigma^2\end{pmatrix}$$
and for the classical normal linear regression model
$$\Sigma = \begin{pmatrix}\sigma^2 & & 0\\ & \ddots & \\ 0 & & \sigma^2\end{pmatrix}$$
2. Estimators of β
a. Least squares estimation
$$SSE = (y - X\beta)'(y - X\beta) = y'y - 2\beta'X'y + \beta'X'X\beta$$
$$\frac{\partial SSE}{\partial \beta} = -2X'y + 2X'X\beta$$
Setting this derivative equal to zero and solving yields:
$$2X'X\hat{\beta} = 2X'y$$
$$\hat{\beta} = (X'X)^{-1}X'y$$
b. Maximum likelihood estimation
$$L(y; \beta, \Sigma) = \frac{e^{-(1/2)(y - X\beta)'\Sigma^{-1}(y - X\beta)}}{(2\pi)^{n/2}|\Sigma|^{1/2}}$$
$$\ell = \ln L = (-n/2)\ln(2\pi) - \tfrac{1}{2}\ln|\Sigma| - \tfrac{1}{2}(y - X\beta)'\Sigma^{-1}(y - X\beta)$$
$$\;\; = (-n/2)\ln(2\pi) - \tfrac{1}{2}\ln|\Sigma| - \tfrac{1}{2}\left(y'\Sigma^{-1}y - 2\beta'X'\Sigma^{-1}y + \beta'X'\Sigma^{-1}X\beta\right)$$
$$\frac{d\ell}{d\beta} = -\tfrac{1}{2}\left(-2X'\Sigma^{-1}y + 2X'\Sigma^{-1}X\beta\right)$$
Setting this derivative equal to 0 and solving implies
$$(X'\Sigma^{-1}X)\,\overset{\Delta}{\beta} = X'\Sigma^{-1}y$$
which are referred to as the modified normal equations. The solution of these equations
$$\overset{\Delta}{\beta} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y$$
is the maximum likelihood estimator of β.
c. Best linear unbiased estimator
Linearity condition: $\tilde{\beta} = Ay$ where A is a k x n matrix of unknown constants.
Unbiased condition: Select A so that $E(\tilde{\beta}) = \beta$, which requires $E(\tilde{\beta}) = AE(y) = AX\beta$ => AX = I.
Minimum variance condition: Select A so that $E(\tilde{\beta}) = \beta$ and $\mathrm{Var}(\tilde{\beta})$ is a minimum. Let $\mathrm{Var}(\tilde{\beta}_k) = a_k'\Sigma a_k$ where $a_k'$ is the kth row of the matrix A. The minimization problem is to min $a_k'\Sigma a_k$ s.t. $X'a_k = i_k$ (where $i_k$ is the kth column of the identity matrix). Treating all k rows at once and writing a for the matrix whose columns are the a_k (so a = A'), the Lagrangian is
$$\ell = a'\Sigma a + \lambda'(X'a - I)$$
$$\frac{\partial \ell}{\partial a} = 2\Sigma a + X\lambda = 0$$
$$\frac{\partial \ell}{\partial \lambda} = X'a - I = 0, \text{ so } X'A' = I$$
The first condition gives
$$a = -\tfrac{1}{2}\Sigma^{-1}X\lambda.$$
Now from X'a = I, we substitute for a and have:
$$-\tfrac{1}{2}X'\Sigma^{-1}X\lambda = I$$
$$\lambda = -2(X'\Sigma^{-1}X)^{-1}I$$
$$\Rightarrow a = \Sigma^{-1}X(X'\Sigma^{-1}X)^{-1}I$$
$$a' = I'(X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}$$
so $A = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}$ and
$$\tilde{\beta} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y.$$
We observe that the BLUE and MLE of β are identical, but different from the least squares estimator of β.
3. Distribution of β
ˆ
, β
~
, and
∆
β .
For t he Cl assi cal Nor mal Li near Regr essi on Model
( ε ~ N ( 0, σ
2
I ) )
β
ˆ
= β
~
=
∆
β = ( X' X)
 1
X' y ~ N( β; σ
2
( X' X)
 1
)
For t he Gener al i zed Regr essi on Model ( ε ~ N( 0, Σ) ) we have
β
ˆ
= ( X' X)
 1
X' y = A
1
y
and
β
~
=
∆
β = ( X' Σ
 1
X)
 1
X' Σ
 1
y = A
2
y
Maki ng use of t he usef ul t heor em
I f y ~ N[ µ
y
; Σ
y
] , t hen
z = Ay ~ N [ µ
z
= Aµ
y
; Σ
z
= AΣ
y
A' ] ,
we obt ai n
β
ˆ
~ N [ A
1
Xβ; A
1
Σ A'
1
]
~ N [ β; ( X' X)
 1
X' Σ X( X' X)
 1
]
β
~
=
∆
β ~ N [ A
2
Xβ; A
2
Σ A'
2
]
~ N [ β; ( X' Σ
 1
X)
 1
] .
Not e t hat t he β
ˆ
, β
~
, and
∆
β ar e unbi ased est i mat or s of β, but
Var ( β
ˆ
i
) > Var ( β
~
i
) = Var (
∆
β
i
) .
Al so not e t hat f or t he case Σ = σ
2
I , t hese r esul t s i ncl ude t he
f ol l owi ng as a speci al case
β
ˆ
= β
~
=
∆
β ~ N [ β, σ
2
( X' X)
 1
] .
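The two covariance formulas above can be checked numerically. The following is a minimal sketch, assuming NumPy and a made-up diagonal Σ (the design matrix, the variances, and all variable names are illustrative, not taken from the notes). It verifies that both estimators satisfy AX = I and that the OLS "sandwich" covariance exceeds the GLS covariance in the positive semidefinite sense.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 30, 3
# Hypothetical design matrix with an intercept column
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
# Hypothetical error covariance: diagonal with unequal variances
Sigma = np.diag(rng.uniform(0.5, 4.0, size=n))
Sigma_inv = np.linalg.inv(Sigma)

A1 = np.linalg.inv(X.T @ X) @ X.T                           # OLS: beta_hat = A1 y
A2 = np.linalg.inv(X.T @ Sigma_inv @ X) @ X.T @ Sigma_inv   # GLS: beta_tilde = A2 y

var_ols = A1 @ Sigma @ A1.T   # (X'X)^-1 X' Sigma X (X'X)^-1
var_gls = A2 @ Sigma @ A2.T   # reduces to (X' Sigma^-1 X)^-1

# Eigenvalues of the (symmetrized) difference; all should be >= 0
D = var_ols - var_gls
diff_eigs = np.linalg.eigvalsh((D + D.T) / 2)
print(np.diag(var_ols), np.diag(var_gls), diff_eigs.min())
```

Each diagonal element of `var_ols` is at least as large as the matching element of `var_gls`, which is the inequality stated in the text.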
4. Consequences of using least squares formulas when Var(ε) = Σ ≠ σ²I
\[ \hat{\beta} = (X'X)^{-1}X'y \quad \text{and} \quad Var(\hat{\beta}) = (X'X)^{-1}X'\Sigma X(X'X)^{-1} \neq \sigma^2 (X'X)^{-1} \]
a. $\hat{\beta}$ is an unbiased and consistent estimator of β.
b. $\hat{\beta}$ is not efficient, $Var(\hat{\beta}_i) \geq Var(\tilde{\beta}_i)$.
c. The use of $\sigma^2 (X'X)^{-1}$ will frequently result in serious underestimates of $Var(\hat{\beta})$. *Associated forms of t and F statistics are no longer valid. However, robust measures of the actual standard errors can be used to construct "t-statistics" which are asymptotically valid.
d. Predictions of $y_t$ based on OLS will yield larger sampling variation than could be obtained using alternative techniques. See the next section for more detail.
5. Predictions in the generalized regression model:
Goldberger (JASA, 1962) demonstrated that the best unbiased prediction of $y_t$ in period n + h, h periods in the future, is given by
\[ y_n(h) = \hat{y}_{n+h} = X_{n+h}\overset{\Delta}{\beta} + W'\Sigma^{-1}e \]
where
\[ \overset{\Delta}{\beta} = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y \]
\[ e = y - X\overset{\Delta}{\beta} \]
\[ W = E(\varepsilon_{n+h}\,\varepsilon) . \]
Therefore the predictions for OLS or MLE may have sampling variances which are larger than could be obtained using the Goldberger technique.
Note:
a. If the ε's are uncorrelated then
\[ W = E\left[\varepsilon_{n+h}\begin{pmatrix}\varepsilon_1 \\ \vdots \\ \varepsilon_n\end{pmatrix}\right] = \begin{pmatrix}E(\varepsilon_{n+h}\varepsilon_1) \\ \vdots \\ E(\varepsilon_{n+h}\varepsilon_n)\end{pmatrix} = 0 \]
and the best linear unbiased predictor of $y_t$ in period n+h is
\[ \hat{y}_{n+h} = X_{n+h}\overset{\Delta}{\beta} \]
b. If there is correlation between the random disturbances, then the best linear unbiased predictor may differ from our BLUE of the deterministic component $X_{n+h}\beta$. The adjustment, $W'\Sigma^{-1}e$, would "correct" for the existence of correlation between the random disturbances.
6. Alternative methods of obtaining BLUE or MLE of β by transforming data or
using Generalized Least Squares (GLS).
The discussion in this section provides motivation for the way MLE can be performed in regression programs. Consider the generalized regression model:
\[ y = X\beta + \varepsilon, \qquad \varepsilon \sim N(0, \Sigma) \]
Transform the model (and data) by premultiplying by a transformation matrix T, i.e.,
\[ [Ty] = [TX]\beta + [T\varepsilon] \]
If we select a transformation matrix T such that
\[ T\varepsilon \sim N(0,\ T\Sigma T' = \sigma^2 I), \]
then it follows that
\[ T\Sigma T' = \sigma^2 I \quad \text{(transformed error terms } T\varepsilon \text{ satisfy (A.1)-(A.4))} \]
\[ \Sigma = \sigma^2 T^{-1}(T')^{-1} \quad \text{or} \quad \Sigma^{-1} = \sigma^{-2}T'T . \]
Applying least squares to the transformed data, we obtain
\[ \hat{\beta}_T = [(TX)'TX]^{-1}[X'T'Ty] = (X'T'TX)^{-1}X'T'Ty \]
which yields the maximum likelihood estimator of β, i.e.,
\[ \hat{\beta}_T = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y \]
In other words, applying least squares to an appropriately transformed
regression model will yield MLE of β. These estimators are sometimes
referred to as generalized least squares (GLS) estimators of β .
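The equivalence between OLS on transformed data and the GLS formula can be illustrated numerically. This is a minimal sketch, assuming NumPy; the covariance matrix, the coefficient values, and the choice of T as the inverse Cholesky factor of Σ (one of many matrices satisfying TΣT' = σ²I, here with σ² = 1) are illustrative assumptions rather than the notes' own construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# Hypothetical known covariance matrix with correlated errors
rho = 0.6
idx = np.arange(n)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])

# Draw errors with covariance Sigma and build y (true beta is made up)
L = np.linalg.cholesky(Sigma)          # Sigma = L L'
y = X @ np.array([1.0, 2.0]) + L @ rng.normal(size=n)

# One valid transformation: T = L^-1, so that T Sigma T' = I
T = np.linalg.inv(L)

# OLS applied to the transformed data (Ty, TX)
beta_T, *_ = np.linalg.lstsq(T @ X, T @ y, rcond=None)

# Direct GLS formula (X' Sigma^-1 X)^-1 X' Sigma^-1 y
Sigma_inv = np.linalg.inv(Sigma)
beta_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
print(beta_T, beta_gls)
```

The two coefficient vectors agree up to rounding, which is the content of the statement that least squares on an appropriately transformed model yields the GLS estimator.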
7. Robust estimates of the standard errors of the OLS estimator
As we noted earlier, if $\Sigma \neq \sigma^2 I$,
\[ Var(\hat{\beta}_{OLS}) = (X'X)^{-1}X'\Sigma X(X'X)^{-1} \neq \sigma^2 (X'X)^{-1} \]
and OLS "standard errors" reported by most computer programs, $s^2 (X'X)^{-1}$, will be inappropriate for constructing t-statistics. White (1980, Econometrica, pp. 817-838) and Newey-West (1987, Econometrica, 703-708) outline how to obtain consistent estimators of the correct $Var(\hat{\beta}_{OLS})$ for the cases of heteroskedasticity and autocorrelation. These procedures are programmed into many econometric packages.
In Stata
. for heteroskedasticity: reg dep_var rhs_vars, robust
or
. for autocorrelation: newey dep_var rhs_vars, lag(#)
where (#) is the maximum number of lags to consider in the autocorrelation structure. Typing "lag(0)" is the same as using the "reg …, robust" command above.
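To make the heteroskedasticity-robust calculation concrete, here is a minimal sketch, assuming NumPy, of the basic White estimator in which $X'\Sigma X$ is replaced by $\sum_t e_t^2 x_t x_t'$. The function name `ols_with_robust_se` and the simulated data are hypothetical; packaged routines typically add a small-sample scaling factor, so their reported numbers may differ slightly from this unscaled version.

```python
import numpy as np

def ols_with_robust_se(X, y):
    """OLS coefficients with classical and White (unscaled) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    e = y - X @ beta
    # Classical formula s^2 (X'X)^-1
    s2 = e @ e / (n - k)
    se_classical = np.sqrt(np.diag(s2 * XtX_inv))
    # White: (X'X)^-1 [sum e_t^2 x_t x_t'] (X'X)^-1
    meat = X.T @ (X * (e ** 2)[:, None])
    se_white = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_classical, se_white

# Simulated example: error standard deviation proportional to x
rng = np.random.default_rng(2)
n = 500
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 0.5 * x + rng.normal(size=n) * x
beta, se_c, se_w = ols_with_robust_se(X, y)
print(beta, se_c, se_w)
```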
E. Heteroskedasticity (Violation of (A.3))
1. Introduction
In certain applications the researcher may find that the assumption
\[ Var(y_t) = Var(\varepsilon_t) = \sigma^2 \quad \text{for all } t \]
appears to be inconsistent with the data and model under consideration. This problem can arise in a number of contexts. For example, if the data are obtained by combining cross sectional and time series data where different sample sizes are involved, one might expect the averages (or totals) associated with the largest sample size to have a different variance than observations associated with the smallest sample size. Another example of heteroskedasticity might arise in an analysis of expenditure patterns ($C_t$) corresponding to different income levels ($Y_t$) in budget studies.
[Figure: consumption plotted against income around a line with intercept β1 and slope β2.]
In this example we note that there appears to be greater variation in consumption levels associated with higher income levels than for lower levels. This might arise because individuals with higher incomes can make more discretionary purchases than those with lower incomes who spend most of their income on necessities. This situation could be modeled as
\[ C_t = \beta_1 + \beta_2 Y_t + \varepsilon_t \]
(A.1), (A.2), (A.3)': $\varepsilon_t \sim N(0, \sigma_t^2)$
(A.4) $Cov(\varepsilon_t, \varepsilon_s) = 0$, t ≠ s
(A.5) Same as before.
More generally the heteroskedastic model can be modeled as
\[ y = X\beta + \varepsilon \]
(A.1)' $\varepsilon \sim N[0, \Sigma]$
(A.5) The X's are nonstochastic and $\lim_{n \to \infty} \frac{1}{n}(X'X)$ is nonsingular
where
\[ \Sigma = \begin{pmatrix} \sigma_1^2 & 0 & \cdots & 0 \\ 0 & \sigma_2^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_n^2 \end{pmatrix} . \]
As noted in the previous section, if Σ ≠ σ²I (any of the variances are unequal), least squares estimators will not be equal to the MLE or BLUE of β. Least squares estimators will still be unbiased and consistent, but will be neither minimum variance nor asymptotically efficient, and the standard statistical tests based on least squares are invalid. For this reason it is important to test for the existence of heteroskedasticity.
2. Test for Heteroskedasticity
The basic idea behind all of these tests is to determine whether there appears to be any systematic behavior of the variances of the errors. The first test, the Goldfeld-Quandt test, groups the data and tests for equality of the variances of the different groups. Many of the other tests use the squared OLS residual $e_t^2$ as a proxy for $\sigma_t^2$ and search for systematic relationships between $e_t^2$ and other variables.
a. Goldfeld-Quandt Test
The null hypothesis to be investigated is
\[ H_0: \sigma_1^2 = \sigma_2^2 = \dots = \sigma_n^2 \]
A common test for heteroskedasticity is the Goldfeld-Quandt test.
(1) Divide the data into three groups (roughly equal sizes, $n_1 + n_2 + n_3 = n$).
(2) Run separate regressions on groups I and III. Let $s_I^2$ and $s_{III}^2$ represent the corresponding estimators of σ².
(3) Under the null hypothesis of homoskedasticity,
\[ \frac{s_{III}^2}{s_I^2} \sim F(n_3 - k,\ n_1 - k) \]
*place the larger s² in the numerator.
Under the null hypothesis one would expect $s_{III}^2 / s_I^2$ to be fairly close to one, and large differences from one would provide the basis for rejecting the null hypothesis. This is an exact test. A disadvantage of the test arises in cases in which many regressors are involved and a natural ordering may not be obvious to form the three groups.
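The three steps of the Goldfeld-Quandt procedure can be sketched as follows. This is an illustration only, assuming NumPy; the function name `goldfeld_quandt`, the equal-thirds split, and the simulated data are assumptions, and the function returns $s_{III}^2/s_I^2$ without swapping, so the caller should order the data so that the larger variance is expected in group III.

```python
import numpy as np

def goldfeld_quandt(X, y, order_by):
    """Return (F statistic, (df_numerator, df_denominator))."""
    n, k = X.shape
    order = np.argsort(order_by)          # sort by the suspected variance driver
    Xs, ys = X[order], y[order]
    n1 = n3 = n // 3                      # outer groups; the middle group is dropped

    def s2(Xg, yg):
        b, *_ = np.linalg.lstsq(Xg, yg, rcond=None)
        e = yg - Xg @ b
        return e @ e / (len(yg) - k)      # estimator of sigma^2 within the group

    s2_I = s2(Xs[:n1], ys[:n1])
    s2_III = s2(Xs[n - n3:], ys[n - n3:])
    return s2_III / s2_I, (n3 - k, n1 - k)

# Simulated data with error standard deviation proportional to x
rng = np.random.default_rng(3)
n = 300
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 0.5 * x + rng.normal(size=n) * x
F, df = goldfeld_quandt(X, y, order_by=x)
print(F, df)
```

The statistic would then be compared with the upper critical value of the F distribution with the reported degrees of freedom.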
[Figure: density of F(n3 - k, n1 - k) for the Goldfeld-Quandt test, showing the "Fail to Reject H0" region and the upper-tail "Reject H0" region.]
b. The Park test (Glejser test) can be thought of as being based upon using $e_t$ as a proxy for $\sigma_t$ and then investigating relationships of the form
\[ e_t = f(X_t) \quad \text{or} \quad e_t^2 = g(X_t) . \]
Various forms for the functions f( ) and g( ) have been considered. The null hypothesis of homoskedasticity is tested by investigating whether the X's in $f(X_t)$ or $g(X_t)$ have any collective explanatory power. Statistically significant explanatory power of the $X_t$ would provide the basis for rejecting the assumption of homoskedasticity. The exact validity of F tests is questionable, with their use being based on asymptotic considerations. Recall that the $e_t$'s are correlated even if the $\varepsilon_t$'s are uncorrelated.
c. The White test [Econometrica, 1980, pp. 817-38]. Hal White suggests regressing $e_t^2$ on all of the explanatory variables, their squares, and cross products and then testing for the collective explanatory power of the regressors. The rationale for this test is that the hypothesis $\sigma_t^2 = f(X_t)$ is being investigated with $e_t^2$ as a proxy for $\sigma_t^2$ and using a second order Taylor Series approximation for the function $f(X_t)$. The null hypothesis of homoskedasticity would be consistent with a lack of statistical significance in this test. White mentions the use of a Rao or Lagrangian multiplier test
\[ LM = NR^2 \]
which is asymptotically Chi-square with degrees of freedom equal to the number of slope coefficients, $\frac{(k+2)(k-1)}{2}$, in the "$e_t^2$ auxiliary" regression equation.
Note: The R² in the LM test is the R² from the previously described "$e_t^2$ regression" equation. The White test can be performed by retrieving the estimated errors, squaring them, and regressing the squares on the variables, their squares, and cross products.
Alternatively, the Stata command reg y x's, followed by whitetst on the next line, will automatically perform the White Test.
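The auxiliary regression behind the White test can be sketched by hand. This is a minimal illustration, assuming NumPy; the function name `white_lm` and the simulated data are hypothetical, and the sketch assumes continuous regressors so that levels, squares, and cross products are not collinear. It returns LM = NR² and the degrees of freedom (k+2)(k-1)/2.

```python
import numpy as np
from itertools import combinations_with_replacement

def white_lm(X, y):
    """White test statistic N*R^2; X must contain a leading column of ones."""
    n, k = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e2 = (y - X @ b) ** 2                       # squared OLS residuals
    Z = X[:, 1:]                                # the k-1 non-constant regressors
    cols = [np.ones(n)] + [Z[:, j] for j in range(k - 1)]
    # squares and cross products of the non-constant regressors
    cols += [Z[:, i] * Z[:, j]
             for i, j in combinations_with_replacement(range(k - 1), 2)]
    W = np.column_stack(cols)
    g, *_ = np.linalg.lstsq(W, e2, rcond=None)
    resid = e2 - W @ g
    r2 = 1.0 - resid @ resid / np.sum((e2 - e2.mean()) ** 2)
    return n * r2, W.shape[1] - 1               # LM statistic, slope count

# Simulated data: k = 3, variance depends on the first regressor
rng = np.random.default_rng(5)
n = 400
x1 = rng.uniform(1, 10, size=n)
x2 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2])
y = 1.0 + 0.5 * x1 - 0.2 * x2 + rng.normal(size=n) * x1
lm, df = white_lm(X, y)
print(lm, df)
```

With k = 3 the auxiliary regression has 5 slope coefficients, and the statistic would be compared with a chi-square critical value with 5 degrees of freedom (11.07 at the 5% level).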
d. The modified White test. For large k, the White test involves many regressors with large degrees of freedom. To circumvent this problem, White proposed an alternative test based on estimating the model:
\[ e_t^2 = \delta_0 + \delta_1 \hat{y}_t + \delta_2 \hat{y}_t^2 + \eta_t \]
where $\hat{y}_t$ denotes the predicted y's from an initial OLS estimation of the original model. The corresponding LM test ($NR^2$) is asymptotically distributed as a $\chi^2(2)$.
e. Breusch-Pagan Test. This test is included in Stata. It is performed by regressing the squares of the estimated errors on the X's or other variables and testing for the collective explanatory power using an LM test or an F test. The Stata commands are:
reg y x
estat hettest (performs the regression $e_t^2 = \delta_0 + \delta_1 \hat{y}_t + \eta_t$), iid (reports LM test statistic) or fstat (reports the F statistic)
Alternatives or variations
estat hettest x's, iid or normal or fstat
estat hettest, rhs
estat hettest x's, x^2's, crossproducts, iid or fstat
estat hettest yhat yhat^2, fstat or iid
where the LM or F tests can be used to test $H_0: \sigma_t^2 = \sigma^2$ (homoskedasticity).
3. Estimation
a. Viewed as applying OLS to an appropriately transformed model (Stata)
For applications in which the random disturbances are characterized by heteroskedasticity, BLUE and MLE of β will be unbiased, consistent, and have smaller variances than least squares estimators. In section (IV.D.6) we demonstrated that if a matrix T can be found such that
\[ Var(T\varepsilon) = \sigma^2 I \quad (\text{or } \Sigma^{-1} = \sigma^{-2}T'T), \]
the MLE (and BLUE) of β can be obtained by transforming the data (model) from
\[ y = X\beta + \varepsilon \]
to
\[ Ty = TX\beta + T\varepsilon \]
and applying least squares to the transformed model.
Consider the model
\[ y_t = X_t\beta + \varepsilon_t = \beta_1 + \beta_2 x_{t2} + \dots + \beta_k x_{tk} + \varepsilon_t \]
where $\varepsilon_t \sim N(0, \sigma_t^2)$.
We will consider the transformation from a slightly different perspective. The original model can be transformed to a form characterized by homoskedasticity by premultiplying the original formulation by $\sigma/\sigma_t$ (where σ is an unknown constant), i.e.,
\[ \frac{\sigma}{\sigma_t}y_t = \beta_1\frac{\sigma}{\sigma_t} + \beta_2\frac{\sigma}{\sigma_t}x_{t2} + \dots + \beta_k\frac{\sigma}{\sigma_t}x_{tk} + \frac{\sigma}{\sigma_t}\varepsilon_t . \]
Note that the variance of the transformed random disturbance is given by
\[ Var\left(\frac{\sigma}{\sigma_t}\varepsilon_t\right) = \frac{\sigma^2}{\sigma_t^2}Var(\varepsilon_t) = \frac{\sigma^2}{\sigma_t^2}\sigma_t^2 = \sigma^2 \]
and the errors in the transformed regression, $\sigma\varepsilon_t/\sigma_t$, satisfy the assumptions (A.1)-(A.4).
The corresponding transformation matrix is given by
\[ T = \sigma\begin{pmatrix} 1/\sigma_1 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_n \end{pmatrix} \]
Note that:
\[ T\Sigma T' = \sigma\begin{pmatrix} 1/\sigma_1 & & 0 \\ & \ddots & \\ 0 & & 1/\sigma_n \end{pmatrix}\begin{pmatrix} \sigma_1^2 & & 0 \\ & \ddots & \\ 0 & & \sigma_n^2 \end{pmatrix}\begin{pmatrix} 1/\sigma_1 & & 0 \\ & \ddots & \\ 0 & & 1/\sigma_n \end{pmatrix}\sigma = \sigma^2 I \]
and the transformed data matrices are given by:
\[ y^* = Ty = \sigma\begin{pmatrix} 1/\sigma_1 & & 0 \\ & \ddots & \\ 0 & & 1/\sigma_n \end{pmatrix}\begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix} = \sigma\begin{pmatrix} y_1/\sigma_1 \\ \vdots \\ y_n/\sigma_n \end{pmatrix}, \]
\[ X^* = TX = \sigma\begin{pmatrix} 1/\sigma_1 & x_{12}/\sigma_1 & \cdots & x_{1k}/\sigma_1 \\ \vdots & \vdots & & \vdots \\ 1/\sigma_n & x_{n2}/\sigma_n & \cdots & x_{nk}/\sigma_n \end{pmatrix} . \]
An application of least squares to the transformed data will yield MLE and BLUE of β. It can be verified that $T'T = \sigma^2\Sigma^{-1}$.
Note:
In the GLS estimator the multiplicative constant in the transformation matrix is arbitrary and will cancel out. In summary, if the original model is $y = X\beta + \varepsilon$, and we apply OLS to the transformed model, we obtain
\[ \hat{\beta}_T = (X'T'TX)^{-1}X'T'Ty \]
\[ = (X'\sigma^2\Sigma^{-1}X)^{-1}X'\sigma^2\Sigma^{-1}y \]
\[ = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}y \]
\[ = \overset{\Delta}{\beta} = \tilde{\beta} . \]
Thus when choosing a T matrix for data transformation, the unknown constant σ need not be specified.
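The claim that the constant σ cancels can be checked directly. Below is a minimal sketch, assuming NumPy, in which the helper name `wls`, the `scale` argument (playing the role of σ), and the simulated data are all illustrative assumptions.

```python
import numpy as np

def wls(X, y, sd, scale=1.0):
    """OLS on data premultiplied by scale/sd_t (scale plays the role of sigma)."""
    w = scale / sd
    b, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return b

rng = np.random.default_rng(6)
n = 200
x = rng.uniform(1, 5, size=n)
X = np.column_stack([np.ones(n), x])
sd = 0.5 * x                                   # hypothetical known sigma_t
y = 3.0 + 1.5 * x + rng.normal(size=n) * sd

b1 = wls(X, y, sd, scale=1.0)
b2 = wls(X, y, sd, scale=7.3)                  # a different arbitrary constant

# Direct GLS formula with Sigma = diag(sigma_t^2)
Sigma_inv = np.diag(1.0 / sd ** 2)
b_gls = np.linalg.solve(X.T @ Sigma_inv @ X, X.T @ Sigma_inv @ y)
print(b1, b2, b_gls)
```

All three coefficient vectors coincide, so only the relative sizes of the $\sigma_t$ matter for the point estimates.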
b. Estimation using Stata:
The command
vwls y X's, sd($\sigma_t$)
will perform the previously described estimation and yield MLE. The main problem is to determine what the $\sigma_t$ should be.
4. Nature of Heteroskedasticity ($\sigma_t$'s) and estimation
The problem of estimating the $\sigma_t$ still remains and there is not a general solution which will work in all cases.
a. Sometimes $\sigma_t$ can be deduced from the model
(1) $y_t = at + \eta_t$
t = number of tosses of a coin
$y_t$ = number of heads in t tosses
$E(y_t) = at$
$Var(\eta_t) = npq = t(1/2)(1 - 1/2) = t/4 = \sigma_t^2$
Stata Commands for MLE are:
gen sig =t^.5
vwls y t,sd(sig)
The least squares estimator of a is given by $\hat{a} = \sum t\,y_t / \sum t^2$ and the MLE of a is $\sum y_t / \sum t$ = total number of heads/total number of tosses.
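A small simulation makes the coin-toss example concrete. This is a sketch under the stated model with no intercept, assuming NumPy; the number of experiments and the seed are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(4)
t = np.arange(1, 51)                 # experiment t uses t tosses
y = rng.binomial(t, 0.5)             # heads observed in t tosses of a fair coin

# Least squares estimator of a (no intercept): sum(t*y) / sum(t^2)
a_ols = np.sum(t * y) / np.sum(t ** 2)

# MLE: total heads / total tosses
a_mle = y.sum() / t.sum()

# Weighted least squares: divide y and t by sigma_t = sqrt(t)/2, then run OLS
w = 2.0 / np.sqrt(t)
a_wls = np.sum((w * t) * (w * y)) / np.sum((w * t) ** 2)
print(a_ols, a_mle, a_wls)
```

The weighted estimate reproduces total heads over total tosses exactly, while the unweighted least squares estimate generally differs from it.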
(2) Combination of time series and cross sectional data
($y_t$, $X_t$) time series obtained by taking averages of cross sectional samples of size $n_t$
Let $y_t = a + bx_t + \varepsilon_t$ be the model, then an assumption which might be "reasonable" is
\[ Var(y_t) = Var(\varepsilon_t) = \sigma^2 / n_t \]
The corresponding Stata commands for MLE are
gen sig = 1/n_t^.5
vwls y x, sd(sig)
b. Sometimes the researcher can analyze the behavior of the residuals and look for trends
Try $\sigma_t^2 = \sigma^2 x_t$ or $\sigma_t^2 = \sigma^2 x_t^2$.
If $\sigma_t^2 = \sigma^2 x_t$ then use the Stata commands
gen sig=x^.5
vwls y x, sd(sig)
Similarly if $\sigma_t^2 = \sigma^2 x_t^2$, then use the Stata commands
gen sig=x
vwls y x, sd(sig)
c. An example of Feasible GLS with multiple regressors (Wooldridge).
Consider the model $y_t = X_t\beta + \varepsilon_t$ with
\[ Var(\varepsilon_t \mid X_t) = \sigma_t^2 = e^{X_t\delta} . \]
Estimated or feasible GLS (BLUE) of the unknown coefficients in the original regression model can be obtained as follows:
(1) Regress y on the X's to obtain the estimated residuals (e)
reg y X's
(2) Regress the natural logarithm of the squared OLS residuals on the X's and save the predicted values ($X_t\hat{\delta}$).
predict e, resid
gen Le2=ln(e*e)
reg Le2 X's
predict xdelta,xb
gen sig=(exp(xdelta))^.5
(3) Use the calculated weights ($\hat{\sigma}_t = (e^{X_t\hat{\delta}})^{.5}$) to perform a weighted least squares
vwls y X's,sd(sig)
Alternative assumptions about the nature of heteroskedasticity could be used in this procedure.
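The feasible GLS steps above can be sketched outside Stata as well. This is a minimal illustration, assuming NumPy; the function name `fgls_exponential` and the simulated data-generating process are hypothetical, and the weights are estimated only up to a multiplicative constant, which does not affect the coefficient estimates.

```python
import numpy as np

def fgls_exponential(X, y):
    """Feasible GLS assuming Var(eps_t | X_t) = exp(X_t delta)."""
    # Step 1: OLS residuals
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b_ols
    # Step 2: regress ln(e^2) on X and form sigma_hat_t = exp(X_t delta_hat)^0.5
    delta, *_ = np.linalg.lstsq(X, np.log(e ** 2), rcond=None)
    sig = np.sqrt(np.exp(X @ delta))
    # Step 3: weighted least squares, i.e. OLS on data divided by sigma_hat_t
    b_fgls, *_ = np.linalg.lstsq(X / sig[:, None], y / sig, rcond=None)
    return b_ols, b_fgls, sig

# Simulated data with Var(eps_t) = exp(0.5 * x_t)
rng = np.random.default_rng(8)
n = 1000
x = rng.uniform(0, 4, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n) * np.exp(0.25 * x)
b_ols, b_fgls, sig = fgls_exponential(X, y)
print(b_ols, b_fgls)
```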
5. Predictions
The best linear unbiased predictors will be given by
\[ \hat{Y}_{n+h} = \hat{Y}_n(h) = X_{n+h}\overset{\Delta}{\beta} \]
(see notes (section D.5)).
F. Autocorrelation (Violation of A.4)
1. Introduction
One of the most common violations of (A.1)-(A.5) with time series data is the presence of autocorrelated random disturbances in regression models. Autocorrelated random disturbances refers to the problem in which the error terms are not statistically independent. When working with time series data, you should be aware of the possibility of what is known as the spurious regression problem. This problem can arise when the dependent variable (y) and one or more of the explanatory variables (say X) both exhibit a trending behavior. In this situation, regressing y on X may suggest a statistically significant relationship between y and X, when they are unrelated (a spurious regression) and only appear related because of a shared trending behavior. One approach to circumventing this situation is to include "t" in the set of regressors, e.g., $y_t = \beta_1 + \beta_2 X_t + \beta_3 t + \varepsilon_t$. If this is the correct model and the variable t is deleted from the equation, the resultant estimators of $\beta_1$ and $\beta_2$ will be biased. The OLS estimate for $\beta_2$ is the same as would arise from regressing the residuals from a regression of y on t on the residuals obtained from regressing x on t.
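The detrending equivalence in the last sentence can be verified numerically. This is a minimal sketch, assuming NumPy and assuming that both auxiliary regressions on t include an intercept; the simulated series and coefficient values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
t = np.arange(1, n + 1, dtype=float)
x = 0.5 * t + rng.normal(size=n)              # trending regressor
y = 1.0 + 2.0 * x + 0.3 * t + rng.normal(size=n)

def residuals(v, Z):
    """Residuals from an OLS regression of v on the columns of Z."""
    g, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ g

# Full regression of y on a constant, x, and t
b_full, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), x, t]), y, rcond=None)

# Detrend y and x separately (constant and t), then regress residual on residual
Z = np.column_stack([np.ones(n), t])
ry, rx = residuals(y, Z), residuals(x, Z)
b2_detrended = (rx @ ry) / (rx @ rx)
print(b_full[1], b2_detrended)
```

The coefficient on x from the full regression matches the slope from the residual-on-residual regression.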
Time series regressions in Stata require the user to designate that the series is a time series by including a command of the form tsset t where t is a time variable which indexes the data. This can be created with the command gen t=_n.
The case of positive autocorrelation might be depicted as follows:
[Figure: observations scattered around the line β1 + β2 X_t, with runs of positive and negative disturbances.]
Note that positive random disturbances tend to be followed by positive random disturbances and negative random disturbances tend to be followed by negative random disturbances. Thus, we are faced with a situation in which the non-diagonal elements of
\[ \Sigma = \begin{pmatrix} Var(\varepsilon_1) & Cov(\varepsilon_1, \varepsilon_2) & \cdots & Cov(\varepsilon_1, \varepsilon_n) \\ Cov(\varepsilon_2, \varepsilon_1) & Var(\varepsilon_2) & & \vdots \\ \vdots & & \ddots & \\ Cov(\varepsilon_n, \varepsilon_1) & \cdots & & Var(\varepsilon_n) \end{pmatrix} \]
are nonzero; therefore Σ ≠ σ²I and the least squares estimators of β again will not equal the MLE or BLUE of β and are therefore not minimum variance estimators.
Possible causes of autocorrelated random disturbances include deleting a relevant variable or selecting the incorrect functional form; alternatively, the model may be correctly specified, but the error terms are correlated.
The matrix Σ contains $\frac{n(n+1)}{2} = \frac{n(n-1)}{2} + n$ distinct elements. In the context of the generalized regression model, we lack sufficient data to obtain separate independent estimates for each of the $Cov(\varepsilon_i, \varepsilon_j)$. In order to circumvent this problem we frequently assume that the $\varepsilon_t$'s are related in such a manner that fewer parameters describe the process. One such model which provides an accurate approximation in many cases is the first order autoregressive process
\[ \varepsilon_t = \rho\,\varepsilon_{t-1} + u_t \]
where the $u_t$ are assumed to be independently and identically distributed as $N(0, \sigma_u^2)$. Note that the $u_t$ satisfy assumptions (A.1)-(A.4). Based upon this formulation it can be shown that $E(\varepsilon_t) = 0$
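• $Var(\varepsilon_t) = \sigma_\varepsilon^2 = \dfrac{\sigma_u^2}{1 - \rho^2}$
• $Cov(\varepsilon_t, \varepsilon_{t-s}) = \rho^s\sigma_\varepsilon^2 = 0 \iff \rho = 0$
• $Corr(\varepsilon_t, \varepsilon_{t-s}) = \rho^s$
Note:
\[ \varepsilon_t = \rho(\varepsilon_{t-1}) + u_t = \rho(\rho\varepsilon_{t-2} + u_{t-1}) + u_t = \rho^2\varepsilon_{t-2} + \rho u_{t-1} + u_t \]
\[ = u_t + \rho u_{t-1} + \rho^2 u_{t-2} + \dots = \sum_{r=0}^{\infty}\rho^r u_{t-r} \]
=> $E(\varepsilon_t) = 0$ since $E(u_{t-r}) = 0$ for all t and r
\[ E(\varepsilon_t^2) = E(u_t^2) + \rho^2 E(u_{t-1}^2) + \rho^4 E(u_{t-2}^2) + \dots \]
\[ = \sigma_u^2(1 + \rho^2 + \rho^4 + \dots) \]
\[ = \sigma_u^2 / (1 - \rho^2) \]
\[ E(\varepsilon_t\varepsilon_{t-s}) = E[(u_t + \rho u_{t-1} + \rho^2 u_{t-2} + \dots)(u_{t-s} + \rho u_{t-s-1} + \rho^2 u_{t-s-2} + \dots)] \]
\[ = E\{[u_t + \rho u_{t-1} + \dots + \rho^s(u_{t-s} + \rho u_{t-s-1} + \dots)](u_{t-s} + \rho u_{t-s-1} + \dots)\} \]
\[ = \rho^s E[(u_{t-s} + \rho u_{t-s-1} + \dots)^2] \]
\[ = \rho^s E(\varepsilon_{t-s}^2) \]
\[ = \rho^s\sigma_\varepsilon^2 = \rho^s\sigma_u^2 / (1 - \rho^2) . \]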
We observe that the random disturbances $\varepsilon_t$ are characterized by constant variance (homoskedasticity) but are uncorrelated if and only if ρ = 0, in which case $\varepsilon_t = u_t$ and assumptions (A.1)-(A.4) are satisfied. We also note that since
\[ Cov(\varepsilon_t, \varepsilon_{t-1}) = E(\varepsilon_t\varepsilon_{t-1}) = \rho\sigma_\varepsilon^2 , \]
we expect a general pattern of positive random disturbances to be followed by positive random disturbances and negative values to be followed by negative values if ρ > 0. However, if ρ < 0, we would generally expect the signs of the random disturbances to alternate.
Based upon the assumption that the process $\varepsilon_t$ is a first order process, we can write the associated variance covariance matrix as
\[ \Sigma = \frac{\sigma_u^2}{1 - \rho^2}\begin{pmatrix} 1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\ \rho & 1 & \rho & \cdots & \rho^{n-2} \\ \rho^2 & \rho & 1 & \cdots & \rho^{n-3} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1 \end{pmatrix} \]
Σ is now completely characterized by the two parameters ρ and $\sigma_\varepsilon^2 = \dfrac{\sigma_u^2}{1 - \rho^2}$ and the estimation problem is considerably simplified.
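The two-parameter structure of Σ can be built and checked against the moving-average representation $\varepsilon_t = \sum_r \rho^r u_{t-r}$. This is a minimal sketch, assuming NumPy; the function name `ar1_covariance`, the parameter values, and the truncation at 500 terms are illustrative choices.

```python
import numpy as np

def ar1_covariance(n, rho, sigma_u2=1.0):
    """n x n covariance matrix with (i, j) element rho^|i-j| * sigma_u2/(1-rho^2)."""
    idx = np.arange(n)
    lags = np.abs(idx[:, None] - idx[None, :])
    return sigma_u2 / (1.0 - rho ** 2) * rho ** lags

rho, sigma_u2, n = 0.7, 2.0, 6
Sigma = ar1_covariance(n, rho, sigma_u2)

# Check against the truncated infinite sum:
# Cov(eps_t, eps_{t-s}) = sigma_u^2 * sum_r rho^r * rho^(r+s)
r = np.arange(500)
cov_by_lag = np.array([sigma_u2 * np.sum(rho ** r * rho ** (r + s))
                       for s in range(n)])
print(Sigma[0], cov_by_lag)
```

The first row of `Sigma` lists the variance followed by the covariances at lags 1 through n-1, and it matches the sums computed from the moving-average form.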
A plot of $corr(\varepsilon_t, \varepsilon_{t-s})$ for different values of s is referred to as the correlogram of the process $\varepsilon_t$. If the sample correlogram (graph of estimated correlation coefficients) appears as
[Figure: correlogram declining geometrically, with heights ρ at s = 1 and ρ² at s = 2; horizontal axis s = 0, 1, 2, ...]
we would interpret this evidence as being consistent with the assumption of a first-order autoregressive process with a positive ρ. The sample correlogram can be generated with the Stata commands:
reg y x's
predict e, res
ac e, lags(# of lags)
We have shown that within the context of a first-order autoregressive model Σ = σ²I if and only if ρ = 0. It becomes important to test the hypothesis that ρ = 0.
A more general model for the disturbances is an autoregressive moving average (ARMA(p, q)) defined by
\[ \varepsilon_t - \phi_1\varepsilon_{t-1} - \dots - \phi_p\varepsilon_{t-p} = u_t - \theta_1 u_{t-1} - \dots - \theta_q u_{t-q} . \]
This model will be studied in more detail in another section. Note that this specification includes the first order autoregressive process as the following special case
\[ ARMA(p = 1, q = 0): \quad \varepsilon_t - \phi_1\varepsilon_{t-1} = u_t . \]
2. Tests for autocorrelation.
a. The right hand side variables are exogenous
There are numerous tests for the presence of autocorrelation where the right hand side variables are exogenous. Among these are (1) the Durbin-Watson test, (2) tests structured in terms of an estimator of the correlation between $\varepsilon_t$ and $\varepsilon_{t-1}$, (3) Theil-Nagar test, (4) the Von Neumann ratio, (5) the Breusch-Godfrey test, (6) the Ljung-Box test, and (7) a test for the number of sign changes in the estimated random disturbances (Runs test). Of these tests, the Durbin-Watson test statistic is probably the most widely used.
(1) Durbin-Watson test
The Durbin-Watson test statistic is defined by
\[ D.W. = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n}e_t^2} \]
where $e_t$ denotes the least squares estimator of the random disturbance $\varepsilon_t$. This expression can be written in a useful alternative form by noting that
\[ \sum_{t=2}^{n}(e_t - e_{t-1})^2 = \sum_{t=2}^{n}e_t^2 - 2\sum_{t=2}^{n}e_t e_{t-1} + \sum_{t=2}^{n}e_{t-1}^2 \]
\[ = \sum_{t=1}^{n}e_t^2 + \sum_{t=1}^{n}e_t^2 - 2\sum_{t=2}^{n}e_t e_{t-1} - e_1^2 - e_n^2 \]
\[ = 2\left(\sum_{t=1}^{n}e_t^2 - \sum_{t=2}^{n}e_t e_{t-1}\right) - e_1^2 - e_n^2 \]
hence,
\[ D.W. = \frac{2\left(\sum_{t=1}^{n}e_t^2 - \sum_{t=2}^{n}e_t e_{t-1}\right) - e_1^2 - e_n^2}{\sum_{t=1}^{n}e_t^2} = 2(1 - \hat{\rho}) - \frac{(e_1^2 + e_n^2)/n}{\sum_{t=1}^{n}e_t^2/n} \quad \text{where} \quad \hat{\rho} = \frac{\sum_{t=2}^{n}e_t e_{t-1}/n}{\sum_{t=1}^{n}e_t^2/n} \]
so that $D.W. \approx 2(1 - \hat{\rho})$ with $\hat{\rho}$ denoting an estimator of ρ, the correlation between $\varepsilon_{t-1}$ and $\varepsilon_t$.
From this expression we note that if ρ = 0, we would expect to have $\hat{\rho}$ "close" to zero and the value of D.W. close to two. Since D.W. depends upon the data, associated confidence intervals would be data dependent. Some econometric programs use the data and calculate exact p-values. To circumvent this problem, Durbin and Watson derived the distribution of two statistics L and U which are independent of the data and bound D.W., L < D.W. < U. Tabulated critical values for the D.W. are based on L and U; hence, the reported confidence intervals for the hypothesis ρ = 0 for D.W. (derived from confidence intervals for the bounds) may appear somewhat peculiar as illustrated by the following figure.
The values of $d_L$ and $d_U$ define the critical region and are tabulated in many texts according to the critical level (α level), sample size (n), and number of nonintercept (slope) coefficients in the model (k'). The tables have been extended to cover additional sample sizes and number of explanatory variables by Savin and White [Econometrica, 1977].
The null hypothesis $H_0$: ρ = 0 is rejected if
\[ D.W. < d_L \quad \text{or} \quad D.W. > 4 - d_L . \]
We fail to reject the hypothesis if
\[ d_U < D.W. < 4 - d_U , \]
and the test is inconclusive if
\[ d_L < D.W. < d_U \quad \text{or} \quad 4 - d_U < D.W. < 4 - d_L . \]
This test is not strictly appropriate for models with lagged dependent variables included (see Durbin, Econometrica, 1970). The D.W. test does not take account of the explanatory variables, which results in the existence of an "uncertain region." The Stata commands to calculate the D.W. statistic are:
o reg lhs_var rhs_vars
o estat dwatson (performs a Durbin Watson test for
serial correlation)
o estat bgodfrey or
o estat bgodfrey, lags(1/4)
An exact D.W. test which takes account of the X's and does not involve an "uncertain" region is available in some computer programs. The Shazam command to calculate the exact D.W. is OLS y x's, DWPVALUE .
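The Durbin-Watson statistic and its relation to $\hat{\rho}$ can be computed directly from OLS residuals. This is a minimal sketch, assuming NumPy; the function names and the simulated AR(1) errors (ρ = 0.8) are illustrative assumptions, and no critical values or p-values are computed here.

```python
import numpy as np

def durbin_watson(e):
    """Sum of squared first differences of e divided by the sum of squares of e."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def rho_hat(e):
    """Estimator of rho: sum of e_t * e_{t-1} divided by the sum of e_t^2."""
    e = np.asarray(e, dtype=float)
    return np.sum(e[1:] * e[:-1]) / np.sum(e ** 2)

# Simulate a regression with AR(1) disturbances
rng = np.random.default_rng(9)
n, rho = 200, 0.8
u = rng.normal(size=n)
eps = np.zeros(n)
eps[0] = u[0] / np.sqrt(1 - rho ** 2)      # draw from the stationary distribution
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + u[t]
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + eps

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
dw, r = durbin_watson(e), rho_hat(e)
# Exact identity from the text: DW = 2(1 - rho_hat) - (e_1^2 + e_n^2) / sum(e^2)
dw_identity = 2 * (1 - r) - (e[0] ** 2 + e[-1] ** 2) / np.sum(e ** 2)
print(dw, r, dw_identity)
```

With strongly positive autocorrelation the statistic falls well below 2, in line with the approximation D.W. ≈ 2(1 - ρ̂).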
(2) Wooldridge's t-test
Wooldridge's test of $H_0: \rho = 0$, no autocorrelation, is based on testing whether lagged OLS errors have statistically significant explanatory power for current errors. Thus, the regression commands could be
reg y x's
predict e, resid
reg e l.e
and a t or F statistic is used to test for statistical significance, recognizing that their validity is based on asymptotic distributions. This approach would not be valid for the hypothesis $H_0: \rho = 1$ because the corresponding t-statistic is not distributed as a t-statistic. A Dickey-Fuller test could be used for this hypothesis.
b. Tests in the presence of lagged dependent variables
(1) Durbin's h test, defined by
\[ h = \left(1 - \frac{DW}{2}\right)\sqrt{\frac{n}{1 - n\,s^2_{\text{lagged } y \text{ coefficient}}}} \sim N[0, 1] \]
can be used to test for the presence of autocorrelation in an autoregressive model with one lagged dependent variable. In Stata, the closely related Durbin's alternative test, which is also appropriate with lagged dependent variables, can be performed with the following command after the "reg" command
. estat durbinalt
(2) The Breusch-Godfrey and Ljung-Box tests can be modified to apply to autoregressive models. For example, the Breusch-Godfrey test can be applied by regressing the OLS $e_t$'s on the lagged y's and the lagged $e_t$'s implied by the model (autoregressive and number of autoregression or moving average errors) and testing for the collective explanatory power of the coefficients of the lagged errors using an F test.
3. Estimation
For applications in which the hypothesis of no autocorrelation is rejected, we may want to obtain maximum likelihood estimators of the vector β. These can be obtained by proceeding in the same manner as in the case of heteroskedasticity, i.e., we will attempt to transform the model so that the transformed random disturbances satisfy (A.1)-(A.4) and then apply least squares.
Consider the model
\[ y_t = X_t\beta + \varepsilon_t = \beta_1 + \beta_2 x_{t2} + \dots + \beta_k x_{tk} + \varepsilon_t \]
where
\[ \varepsilon_t = \rho\varepsilon_{t-1} + u_t, \qquad t = 1, 2, \dots, n. \]
Replacing the t in the expression for $y_t$ by t-1 and multiplying by ρ we obtain
\[ \rho y_{t-1} = \rho X_{t-1}\beta + \rho\varepsilon_{t-1} = \beta_1\rho + \beta_2\rho x_{t-1,2} + \dots + \beta_k\rho x_{t-1,k} + \rho\varepsilon_{t-1} \]
Subtracting $\rho y_{t-1}$ from $y_t$ yields
\[ y_t - \rho y_{t-1} = \beta_1(1 - \rho) + \beta_2(x_{t2} - \rho x_{t-1,2}) + \dots + \beta_k(x_{tk} - \rho x_{t-1,k}) + \varepsilon_t - \rho\varepsilon_{t-1} \]
or
\[ y_t^* = \beta_1(1 - \rho) + \beta_2 x_{t2}^* + \dots + \beta_k x_{tk}^* + u_t, \qquad t = 2, \dots, n \]
where
\[ y_t^* = y_t - \rho y_{t-1}, \qquad x_{ti}^* = x_{ti} - \rho x_{t-1,i}, \qquad t = 2, \dots, n,\ i = 2, \dots, k. \]
Note that we have (n - 1) observations on $y_t^*$, $x_{ti}^*$. The random disturbance term associated with the transformed equation satisfies (A.1)-(A.4). The transformed data matrices are given by
\[ y^* = \begin{pmatrix} y_2 - \rho y_1 \\ y_3 - \rho y_2 \\ \vdots \\ y_n - \rho y_{n-1} \end{pmatrix} = \begin{pmatrix} -\rho & 1 & 0 & \cdots & 0 & 0 \\ 0 & -\rho & 1 & \cdots & 0 & 0 \\ \vdots & & \ddots & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & -\rho & 1 \end{pmatrix}\begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ \vdots \\ y_n \end{pmatrix} = T_1 y \]
with dimensions (n-1) x 1, (n-1) x n, and n x 1, and
\[ X^* = \begin{pmatrix} 1 - \rho & x_{2,2} - \rho x_{1,2} & \cdots & x_{2,k} - \rho x_{1,k} \\ 1 - \rho & x_{3,2} - \rho x_{2,2} & \cdots & x_{3,k} - \rho x_{2,k} \\ \vdots & \vdots & & \vdots \\ 1 - \rho & x_{n,2} - \rho x_{n-1,2} & \cdots & x_{n,k} - \rho x_{n-1,k} \end{pmatrix} = T_1 X \]
A common technique of estimation is then based upon applying least squares to
\[ y^* = X^*\beta + u \]
or
\[ y_t - \rho y_{t-1} = \beta_1(1 - \rho) + \beta_2 x_{t2}^* + \dots + \beta_k x_{tk}^* + u_t, \qquad t = 2, \dots, n \]
Several comments need to be made about this approach. First, ρ is generally not known and estimates of ρ will need to be used. Also note that the intercept in the transformed equation is $\beta_1(1 - \rho)$, rather than $\beta_1$; hence, the final estimate of the intercept must be divided by 1 - ρ in order to recover an estimate of $\beta_1$. Finally, we need to mention that even if ρ is known this estimator of β will not be identically equal to the MLE of β because n-1 observations are used rather than n observations, i.e., we are not using all of the sample information in the estimation. This last problem can be corrected and MLE of β can be obtained by noting that
\[ \sqrt{1 - \rho^2}\,y_1 = \sqrt{1 - \rho^2}\,X_1\beta + \sqrt{1 - \rho^2}\,\varepsilon_1 \]
\[ = \beta_1\sqrt{1 - \rho^2} + \beta_2\sqrt{1 - \rho^2}\,x_{12} + \dots + \beta_k\sqrt{1 - \rho^2}\,x_{1k} + \sqrt{1 - \rho^2}\,\varepsilon_1 \]
where
\[ \sqrt{1 - \rho^2}\,\varepsilon_1 \sim N[0,\ (1 - \rho^2)\sigma_\varepsilon^2 = \sigma_u^2] \]
and then applying least squares to the transformed equation
\[ y^{**} = X^{**}\beta + \varepsilon^* \]
where
\[ y^{**} = \begin{pmatrix} \sqrt{1 - \rho^2}\,y_1 \\ y_2 - \rho y_1 \\ y_3 - \rho y_2 \\ \vdots \\ y_n - \rho y_{n-1} \end{pmatrix} = T_2 y \]
\[ X^{**} = \begin{pmatrix} \sqrt{1 - \rho^2} & \sqrt{1 - \rho^2}\,x_{12} & \cdots & \sqrt{1 - \rho^2}\,x_{1k} \\ 1 - \rho & x_{22} - \rho x_{12} & \cdots & x_{2k} - \rho x_{1k} \\ 1 - \rho & x_{32} - \rho x_{22} & \cdots & x_{3k} - \rho x_{2k} \\ \vdots & \vdots & & \vdots \\ 1 - \rho & x_{n2} - \rho x_{n-1,2} & \cdots & x_{nk} - \rho x_{n-1,k} \end{pmatrix} = T_2 X = \begin{pmatrix} \sqrt{1 - \rho^2} & 0 & 0 & \cdots & 0 \\ -\rho & 1 & 0 & \cdots & 0 \\ 0 & -\rho & 1 & \cdots & 0 \\ \vdots & & \ddots & \ddots & \vdots \\ 0 & 0 & \cdots & -\rho & 1 \end{pmatrix} X. \]
The transformation matrices $T_1$ and $T_2$ are related by
\[ T_2 = \begin{pmatrix} \sqrt{1 - \rho^2} \quad 0 \quad \cdots \quad 0 \\ T_1 \end{pmatrix} \]
Note: (1) T_2 is n x n whereas T_1 is (n-1) x n; hence, y** is n x 1 and y* is (n-1) x 1.
(2) If all n observations are used, then a program must be used which suppresses estimation of an intercept. This is because the first column of X** contains different elements.
(3) If only the last n-1 observations are used, then a regression program which estimates an intercept can be used and the estimate of β_1 can be recovered by dividing the estimated intercept by 1-ρ.
(4) In cases in which ρ is known the above procedures are relatively straightforward. When ρ is not known alternative techniques have been developed. A common technique can be outlined as follows:
(a) Estimate y = Xβ + ε using OLS to obtain y = Xβ̂ + e. Obtain an estimate of ρ using the e vector:

$$\hat\rho = \frac{\sum_{t=2}^{n} e_t e_{t-1}}{\sum_{t=1}^{n} e_t^2}$$

(b) Transform the data using ρ̂ instead of ρ. T_1 or T_2 can be used. Stata allows the use of T_1 or T_2.
(c) Apply least squares to the transformed data. The associated estimators are referred to as two-stage estimators. (Don't confuse these estimators with two-stage least squares, which will be discussed later.)
(d) Maximum likelihood estimators can be obtained by using the estimate of β determined in the last step, β*; calculate the associated error terms e* = y - Xβ*; calculate a new estimate of ρ in terms of e*; transform the data (y, X); re-estimate β; repeat this process until convergence is achieved.

This process, while conceptually simple, would be tedious to perform by hand. The Stata, TSP, SAS and SHAZAM programs have been written to automatically perform this iterative estimation procedure.

The Stata "MLE" estimation can be performed as follows:
• tsset time_var (time_var is the name of the "time" variable)
• prais depvar rhs_vars (performs iterative "MLE" using T_2 assuming an AR(1) model)
• prais depvar rhs_vars, corc (performs iterative "MLE" using T_1 assuming an AR(1) model)
• prais depvar rhs_vars, twostep (stops the prais estimation after the first step)
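Steps (a)-(d) can be sketched in a few lines of Python for the case of a single regressor plus an intercept. This is only an illustration under stated assumptions: the function names, the data, and the convergence tolerance are hypothetical, the T_1 transformation (dropping the first observation) is used, and it is not a reproduction of what Stata's prais does internally.

```python
def ols_simple(x, y):
    """OLS of y on a constant and one regressor; returns (intercept, slope)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def estimate_rho(e):
    """rho-hat = sum_{t=2}^n e_t e_{t-1} / sum_{t=1}^n e_t^2, as in step (a)."""
    num = sum(e[t] * e[t - 1] for t in range(1, len(e)))
    den = sum(et ** 2 for et in e)
    return num / den


def cochrane_orcutt(x, y, tol=1e-8, max_iter=100):
    """Iterate steps (a)-(d) using the T_1 transformation."""
    b1, b2 = ols_simple(x, y)  # step (a): OLS on the original data
    rho = 0.0
    for _ in range(max_iter):
        e = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
        new_rho = estimate_rho(e)
        # step (b): quasi-difference the data (observations 2, ..., n)
        ys = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        xs = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        # step (c): OLS on the transformed data; its intercept is b1 * (1 - rho)
        a_star, b2 = ols_simple(xs, ys)
        b1 = a_star / (1.0 - new_rho)
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:  # step (d): stop once rho-hat settles
            break
    return b1, b2, rho


# Hypothetical data: y = 1 + 2x plus a smooth (positively autocorrelated) error.
x = [float(t) for t in range(1, 11)]
noise = [0.5, 0.4, 0.1, -0.2, -0.4, -0.3, 0.0, 0.3, 0.5, 0.2]
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, noise)]
b1, b2, rho = cochrane_orcutt(x, y)
print(b1, b2, rho)
```

The intercept is recovered by dividing the transformed-equation intercept by 1 - ρ̂, exactly as in note (3).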
4. Unit roots and the Dickey-Fuller test
In our discussion of estimating regression models with autocorrelated disturbances we noted that the transformed regression model with an AR(1) error,

$$y_t - \rho y_{t-1} = (X_t - \rho X_{t-1})\beta + u_t,$$

was characterized by uncorrelated errors. Note that this model simplifies to the regular regression model where ρ = 0, with OLS yielding efficient estimators. In the previous section we discussed several tests of the hypothesis H_0: ρ = 0 and how MLE can be obtained when the null hypothesis is rejected.
Another hypothesis of interest is H_0: ρ = 1, to check for what are referred to as unit roots. Note that in this case the transformed equation becomes

$$y_t - y_{t-1} = (X_t - X_{t-1})\beta + u_t,$$

with the corresponding estimation involving regressing changes in y on changes in x. Regular t-tests can't be used to test for unit roots. The Dickey-Fuller test is designed for this case. Simple Dickey-Fuller tests can be performed by estimating the following equations and testing for statistical significance of the estimated θ:

$$y_t - y_{t-1} = \alpha + (\rho - 1) y_{t-1} + u_t = \alpha + \theta y_{t-1} + u_t$$

or

$$y_t - y_{t-1} = \alpha + \delta t + \theta y_{t-1} + u_t$$

The null hypothesis H_0: ρ = 1 is rejected if θ's t-statistic is less than the critical values reported in the following tables, respectively.

First equation (constant, no trend):
Significance level    1%      2.5%    5%      10%
Critical value        -3.43   -3.12   -2.86   -2.57

Second equation (constant and trend):
Significance level    1%      2.5%    5%      10%
Critical value        -3.96   -3.66   -3.41   -3.12
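The first Dickey-Fuller regression above (constant, no trend) can be sketched as follows. The function name and the data are hypothetical; the code only computes the ordinary t-ratio for θ, which is then compared with the -2.86 (5%) critical value from the first table rather than with a standard t table.

```python
import math


def dickey_fuller_t(y):
    """t-ratio of theta in  y_t - y_{t-1} = alpha + theta * y_{t-1} + u_t."""
    lag = y[:-1]
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]
    n = len(dy)
    lbar = sum(lag) / n
    dbar = sum(dy) / n
    sxx = sum((l - lbar) ** 2 for l in lag)
    theta = sum((l - lbar) * (d - dbar) for l, d in zip(lag, dy)) / sxx
    alpha = dbar - theta * lbar
    resid = [d - alpha - theta * l for l, d in zip(lag, dy)]
    s2 = sum(r ** 2 for r in resid) / (n - 2)  # residual variance estimate
    return theta / math.sqrt(s2 / sxx)


# Hypothetical series that oscillates around zero (clearly not a random walk).
series = [1.0, -0.8, 1.1, -1.0, 0.9, -1.2, 1.0, -0.9, 1.1, -1.0, 0.8, -1.1]
t_theta = dickey_fuller_t(series)
reject_unit_root = t_theta < -2.86  # 5% critical value, no-trend case
print(t_theta, reject_unit_root)
```

For a series like this one, which reverts to its mean every period, the statistic falls far below the critical value and the unit-root null is rejected.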
5. Predictions
The expression obtained by Goldberger for the best linear unbiased predictors in the case of AR(1) error terms is

$$\hat{y}_{n+h} = X_{n+h}\tilde\beta + W'\Sigma^{-1}e$$

where

$$W = E(\varepsilon_{n+h}\,\varepsilon) = \frac{\sigma_u^2}{1-\rho^2}\begin{pmatrix} \rho^{n+h-1} \\ \vdots \\ \rho^{h} \end{pmatrix}$$

$$\Sigma^{-1} = \frac{1}{\sigma_u^2}\begin{pmatrix} 1 & -\rho & 0 & \cdots & 0 & 0 \\ -\rho & 1+\rho^2 & -\rho & \cdots & 0 & 0 \\ \vdots & & \ddots & & & \vdots \\ 0 & 0 & \cdots & -\rho & 1+\rho^2 & -\rho \\ 0 & 0 & \cdots & 0 & -\rho & 1 \end{pmatrix}$$

Therefore,

$$\hat{y}_{n+h} = X_{n+h}\tilde\beta + \rho^h e_n$$

This might graphically be depicted as:

[Figure: predicted value plotted against time t, showing the fitted line X_t β̃, the last residual e_n at X_n, and the adjusted forecast at X_{n+1} with ê_{n+1} = ρ̂ e_n.]

Note that as we attempt to forecast further into the future, the adjustment factor, ρ^h e_n, approaches zero and ŷ_{n+h} approaches X_{n+h} β̃ as h → ∞.
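The predictor ŷ_{n+h} = X_{n+h} β̃ + ρ^h e_n is simple enough to compute directly. The sketch below uses hypothetical coefficient values and a hypothetical function name; it only illustrates how the residual adjustment decays with the forecast horizon h.

```python
def forecast_ar1(x_future, beta, rho, e_n, h):
    """Predictor X_{n+h} * beta + rho**h * e_n for a regression with AR(1) errors."""
    systematic = sum(xj * bj for xj, bj in zip(x_future, beta))
    return systematic + (rho ** h) * e_n


# Hypothetical values: intercept 0.5, slope 1.5, rho = 0.5, last residual e_n = 2.
beta = [0.5, 1.5]
x_future = [1.0, 2.0]  # leading 1.0 is the intercept column
one_step = forecast_ar1(x_future, beta, rho=0.5, e_n=2.0, h=1)
two_step = forecast_ar1(x_future, beta, rho=0.5, e_n=2.0, h=2)
print(one_step, two_step)  # the adjustment halves with each extra period
```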
V. G. Panel Data: an introduction
Panel data refers to observational data on individuals (i, i = 1, 2, ..., m) over time (t = 1, 2, ..., T_i) (two dimensions) and might be denoted as (Y_it). The panel data set is referred to as balanced if every individual is observed for every point of time, T_1 = T_2 = ... = T_m = T. Otherwise, the panel data set is referred to as unbalanced. Observations for a given individual over time are time series, whereas cross-sectional data are observations for different individuals at a given point in time. In many applications, the data are for short periods of time, but include many individuals.
1. OLS and GLS (generalized least squares)
Models for panel data take a number of different forms. Perhaps the simplest representation is given by

$$Y_{it} = X_{it}\beta + \varepsilon_{it} \qquad (1)$$

where X_it denotes a 1 x k vector of observations on k exogenous variables for the i-th individual at the t-th time period and where the marginal impact of the X's on Y is assumed constant over individuals and time (including the intercept). This specification is sometimes called the pooled model. Let the model be rewritten in matrix form as

$$\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_m \end{pmatrix}\beta + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_m \end{pmatrix}$$

or

$$Y = X\beta + \varepsilon$$
OLS estimates of β, $\hat\beta = (X'X)^{-1}X'Y$, can be obtained with the command
reg y x’s or
reg y x’s, vce(robust, bootstrap, or jackknife)
Recall that in the presence of heteroskedasticity and/or autocorrelation, GLS (generalized least squares) estimators can provide more efficient estimators than OLS. The formulas for the GLS estimators and the corresponding variance-covariance matrix are given by

$$\tilde\beta = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y$$

$$Var(\tilde\beta) = (X'\Omega^{-1}X)^{-1}$$

where $Var(\varepsilon) = \Omega$, $\Omega = \Sigma_{m \times m} \otimes I_{T_i \times T_i}$, $T_i \ge m$.
In order to obtain GLS (generalized least squares) estimators, simplifying assumptions about the variance of ε, Ω, need to be made and the nature of the longitudinal/panel data must be provided to Stata with the "xtset" command as follows:
xtset panel_var or
xtset panel_var time_var
to indicate that panel data are being used, where panel_var denotes the individual identification code or group variable and time_var is an index which represents the time variable which defines the panels being used. This is similar to using "tsset time_variable" to alert Stata that time series are being used.
The "xtgls" command can be used to obtain various generalized least squares estimators of β, depending on the form of the variance-covariance matrix of the error term.
If there is heteroskedasticity across panels,

$$\Omega = \begin{pmatrix} \sigma_1^2 I & 0 & \cdots & 0 \\ 0 & \sigma_2^2 I & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_m^2 I \end{pmatrix},$$

corresponding GLS estimators can be obtained using the command
xtgls y x’s, panels(hetero)
If there is correlation across panels (cross-sectional correlation) of the form

$$\Omega = \begin{pmatrix} \sigma_1^2 I & \sigma_{1,2} I & \cdots & \sigma_{1,m} I \\ \sigma_{2,1} I & \sigma_2^2 I & \cdots & \sigma_{2,m} I \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{m,1} I & \sigma_{m,2} I & \cdots & \sigma_m^2 I \end{pmatrix},$$

the GLS estimator is obtained with the command (this can only be applied to balanced panels)
xtgls y x’s, panels(correlated)
The command
xtgls y x’s, igls
iterates the generalized least squares procedure until convergence is obtained.
Stata allows for autocorrelation within the panels. The Stata manual (Longitudinal/Panel Data, version 10, p. 150) states that three options are allowed: "corr(independent) or no autocorrelation, corr(ar1) (serial correlation where the correlation parameter is common for all panels), or corr(psar1) (serial correlation where the correlation parameter is unique for each panel)." A couple of observations are in order: (1) xtgls y x’s, panels(iid) corr(independent) is equivalent to regress y x’s; (2) when corr(ar1) or corr(psar1) are specified the iterated GLS estimator does not converge to the MLE.
Some examples and variations include:
xtgls y x’s, panels(hetero)
xtgls y x’s, panels(correlated)
xtgls y x’s, panels(correlated) igls
xtgls y x’s, panels(hetero) corr(ar1)
xtgls y x’s, panels(iid) corr(psar1)
Testing for heteroskedasticity.
A likelihood ratio test for heteroskedasticity across panels can be performed by comparing the log-likelihood values of the MLE of the regression model with and without heteroskedasticity as follows:
xtgls y x’s, igls panels(hetero)
estimates store hetero
xtgls y x’s
local df = e(N_g) - 1      (the number of panels or groups - 1)
lrtest hetero ., df(`df')
Testing for autocorrelation.
Wooldridge (Econometric Analysis of Cross Section and Panel Data, 2002, 282-283) outlines a test for autocorrelation in panel-data models. David Drukker has written a downloadable program to perform this test.
findit xtserial
net sj 3-2 st0039 (or click on st0039)
net install st0039 (or click on click here to install)
xtserial y x’s
The underlying null hypothesis is no autocorrelation, so a significant value of the test statistic provides evidence of autocorrelation.
2. Fixed and random effects specifications
The fixed and random effects representations are a little different than the form just considered in that they allow panels to have different intercepts. In particular, they can be represented as:

$$Y_{it} = X_{it}\beta + \alpha_i + \varepsilon_{it} \qquad (2)$$

where the marginal impact of changes in the X's is still assumed to be constant across individuals, i.e. the β's are the same for each individual. The only difference in the relationship across individuals is in the intercept term. In fixed effects (fe) models the α_i are unknown constants and in random effects (re) models the α_i are random. OLS can be used to estimate the unknown parameters in the fixed effects form with binary variables being added to the set of exogenous variables to denote the individual.
Stata uses a slight variation on this formulation in estimation,

$$\alpha_i = \alpha + \nu_i$$

where the ν_i are estimated such that $\sum_i \nu_i = 0$; hence,

$$Y_{it} = \alpha + X_{it}\beta + \nu_i + \varepsilon_{it}. \qquad (3)$$
Consider taking the following averages of (3):

$$\bar{y}_i = \alpha + \bar{x}_i\beta + \nu_i + \bar\varepsilon_i \qquad (4) \text{ (average over } t \text{ for individual } i)$$

$$\bar{y} = \alpha + \bar{x}\beta + \bar\nu + \bar\varepsilon \qquad (5) \text{ (average over } i \text{ and } t)$$

Combining equations (3) and (4), and then (3), (4) and (5), respectively, enables us to write

$$Y_{it} - \bar{y}_i = (X_{it} - \bar{x}_i)\beta + (\varepsilon_{it} - \bar\varepsilon_i) \qquad (6)$$

$$Y_{it} - \bar{y}_i + \bar{y} = \alpha + (X_{it} - \bar{x}_i + \bar{x})\beta + (\varepsilon_{it} - \bar\varepsilon_i + \bar\nu + \bar\varepsilon) \qquad (7)$$

Stata's fixed effects (within) estimation procedure, xtreg y x’s, fe, corresponds to estimating β in equation (6) or equation (7), as adding in the overall mean of y has no impact on the estimates of β.
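The mean-deviation regression in equation (6) can be illustrated with a short Python sketch for one regressor. The data, the dictionary layout, and the function names are hypothetical; the point is only that demeaning within each individual removes the individual-specific intercepts, whereas a pooled regression that ignores them can be badly misleading.

```python
def within_slope(panels):
    """Slope from regressing (y_it - ybar_i) on (x_it - xbar_i), equation (6)."""
    num = 0.0
    den = 0.0
    for obs in panels.values():
        xbar = sum(x for x, _ in obs) / len(obs)
        ybar = sum(y for _, y in obs) / len(obs)
        for x, y in obs:
            num += (x - xbar) * (y - ybar)
            den += (x - xbar) ** 2
    return num / den


def pooled_slope(panels):
    """Slope from a pooled regression of y on x that ignores the panel structure."""
    pairs = [pair for obs in panels.values() for pair in obs]
    xbar = sum(x for x, _ in pairs) / len(pairs)
    ybar = sum(y for _, y in pairs) / len(pairs)
    num = sum((x - xbar) * (y - ybar) for x, y in pairs)
    den = sum((x - xbar) ** 2 for x, _ in pairs)
    return num / den


# Hypothetical data generated as y_it = alpha_i + 2 * x_it with alpha_a = 10, alpha_b = -5.
panels = {
    "a": [(1.0, 12.0), (2.0, 14.0), (3.0, 16.0)],
    "b": [(4.0, 3.0), (5.0, 5.0), (7.0, 9.0)],
}
fe_beta = within_slope(panels)
pooled_beta = pooled_slope(panels)
print(fe_beta, pooled_beta)
```

In this constructed example the within estimator recovers the common slope of 2, while the pooled slope even has the wrong sign because the individual with the larger x values has the smaller intercept.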
Three R²'s are reported:
Within: R² from the mean deviation regression, equation (6)
Between: $R^2_{Between} = corr^2(\bar{x}_i\hat\beta, \bar{y}_i)$, R² from regressing $\bar{y}_i$ on $\bar{x}_i$
Overall: $R^2_{Overall} = corr^2(x_{it}\hat\beta, y_{it})$, R² from regressing $y_{it}$ on $X_{it}$, pooled regression
Least squares estimation with a dummy variable (LSDV) for the different intercepts is equivalent to running a fixed effects regression. The hypothesis that there is no heterogeneity in the fixed effects or that the grouped effects are all the same (ν_i = 0, for all i) can be tested using a Chow test by comparing the pooled and LSDV regressions as follows:

$$F(m-1, \; mT-m-K) = \frac{(R^2_{LSDV} - R^2_{Pooled})/(m-1)}{(1 - R^2_{LSDV})/(mT-m-K)}$$

where m = number of groups and T = length of time series.
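As a small numerical illustration of the Chow statistic above, the following Python sketch evaluates the formula for hypothetical R² values. The function name and the inputs are assumptions made for the example, with K taken to be the number of slope coefficients appearing in the degrees-of-freedom term.

```python
def chow_f(r2_lsdv, r2_pooled, m, T, K):
    """F statistic comparing the pooled and LSDV regressions, with its degrees of freedom."""
    df1 = m - 1
    df2 = m * T - m - K
    f_stat = ((r2_lsdv - r2_pooled) / df1) / ((1.0 - r2_lsdv) / df2)
    return f_stat, df1, df2


# Hypothetical values: 5 groups, 10 periods, 2 regressors.
f_stat, df1, df2 = chow_f(r2_lsdv=0.9, r2_pooled=0.8, m=5, T=10, K=2)
print(f_stat, df1, df2)  # compare f_stat with an F(4, 43) critical value
```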
Stata's between effects estimators can be obtained by estimating equation (4) using the Stata command, xtreg y x’s, be. The same three R²'s reported with fixed effects estimation are reported for the between effects, with $R^2_{Between}$ corresponding to the fitted model with this estimation procedure.
In the random effects model the ν_i in the regression model

$$y_{it} = \alpha + X_{it}\beta + \nu_i + \varepsilon_{it}$$

are assumed to be distributed identically and independently with mean zero and constant variance. The term (ν_i + ε_it) can be thought of as a composite error term with

$$Var(\nu_i i_T + \varepsilon_i) = \sigma_\varepsilon^2 I_T + \sigma_u^2\, i_T i_T' = \Sigma \quad \text{and} \quad Var(\nu + \varepsilon) = \Omega = I_m \otimes \Sigma,$$

where σ_u² denotes the variance of the individual effects and i_T is a T x 1 vector of ones. GLS is then applied to obtain the desired estimators using the command, xtreg y x’s, re.
If the ν_i are uncorrelated with the explanatory variables, then random effects estimators will be efficient; otherwise they will be inconsistent.
The fixed effects estimator is appropriate whether the data are generated by a fixed effects model or a random effects model; it is merely less efficient than the random effects estimator if the data generating process is a random effects model. However, if the data generating process is a fixed effects model, random effects estimators will be inconsistent. A Hausman test can be used to choose between the two: under the null hypothesis that the ν_i are uncorrelated with the explanatory variables both estimators are consistent, so a significant difference between the fixed and random effects estimates is evidence in favor of the fixed effects model.
In summary, the Stata commands for estimating fixed (within), between, and random effects models, respectively, are given by
xtset panel_var or xtset panel_var time_var
xtreg y x’s, fe
xtreg y x’s, be
xtreg y x’s, re
A Hausman test of the null hypothesis of fixed vs. random effects can be
performed using the commands:
xtreg y x’s, fe
est store fixed
xtreg y x’s, re
est store random
hausman fixed random
Some comments:
(1) The command "xtregar y x’s, re or fe" can be used to estimate random or fixed effects models when the error term is characterized by a first-order autoregressive process.
(2) Numerous variations are possible, e.g., consider

$$Y_{it} = \alpha + X_{it}\beta + \nu_i + \gamma_t + \varepsilon_{it}$$

which allows for cross-sectional effects and time contrasts.
(3) xtsum [varlist] [if] [, i(varname_i)]: xtsum, a generalization of summarize, reports means and standard deviations for cross-sectional time-series (xt) data; it differs from summarize in that it decomposes the standard deviation into between and within components.
(4) A special edition of the Journal of Econometrics (edited by Baltagi, Kelejian, and Prucha, 140, 2007) focuses on the analysis of spatially dependent data and discusses related issues of identification, estimation, and testing.
V. H. Stochastic Independent Variables
1. Introductory Remarks:
While this assumption is listed last, it may be the most important of the underlying assumptions because OLS estimators will be both biased and inconsistent if the explanatory variables are correlated with the error terms. Furthermore, this assumption will generally be violated if the specified model includes a right-hand-side dependent variable (endogenous regressor), which is quite common in economic modeling. In this section we will consider a simple macro model which includes an endogenous regressor, illustrate how consistent estimators can be obtained, and finally formally outline why a correlation between the X's and the errors leads to biased and inconsistent estimators.
2. A simple example
The case of endogenous regressors is a common example of stochastic regressors in economic models. For example, consider the simple macroeconomic structural model consisting of a consumption function and an accounting identity:

$$C_t = \alpha + \beta Y_t + \varepsilon_t$$
$$Y_t = C_t + Z_t$$

In this model, the two dependent variables are C and Y; thus Y is an endogenous regressor in the consumption function. The OLS estimators of the unknown parameters in the consumption function are given as follows:

$$\hat\alpha = \bar{C} - \hat\beta\,\bar{Y}$$

$$\hat\beta = \frac{\sum (Y_t - \bar{Y})(C_t - \bar{C})}{\sum (Y_t - \bar{Y})^2} = \frac{Cov(Y, C)}{Var(Y)}$$

Solving the structural model for the reduced form gives

$$C_t = \frac{\alpha}{1-\beta} + \frac{\beta}{1-\beta} Z_t + \frac{\varepsilon_t}{1-\beta}$$

$$Y_t = \frac{\alpha}{1-\beta} + \frac{1}{1-\beta} Z_t + \frac{\varepsilon_t}{1-\beta}$$

Note: Y_t and ε_t are not independent since $cov(Y_t, \varepsilon_t) = \dfrac{\sigma^2}{1-\beta}$, as can be seen by noting

$$E\big((Y_t - E(Y_t))(\varepsilon_t - E(\varepsilon_t))\big) = E\left[\left(\frac{\varepsilon_t}{1-\beta}\right)\varepsilon_t\right] = \frac{E(\varepsilon_t^2)}{1-\beta} = \frac{\sigma^2}{1-\beta} \neq 0.$$

Furthermore, we can show that

$$\operatorname{plim} \hat\beta_{OLS} = \beta + \frac{(1-\beta)\sigma^2}{\sigma_Z^2 + \sigma^2}$$

This is an example of the simultaneous equation problem where least squares estimators are biased and inconsistent.
3. Estimation, tests, and statistical inference
Several estimation approaches to circumventing this problem are available and will be discussed in more detail in another section. Two common estimators which yield consistent estimators are two-stage least squares and instrumental variables. The Stata format for the two-stage least squares estimator is
ivregress 2sls lhs_dep_var (rhs_dep_vars=instruments) rhs_ind_vars
where lhs_dep_var denotes the left-hand-side dependent variable, rhs_dep_vars the right-hand-side dependent variables or endogenous regressors, and rhs_ind_vars represents the right-hand-side independent variables. The instrumental variables, or instruments, are variables which are assumed to be (1) correlated with the endogenous regressor(s) and (2) independent of the error term. There need to be at least as many instruments as endogenous regressors.
An F or t-test can be applied to a regression of the endogenous regressor(s) on the independent variables and instruments to test whether the instrumental variables are significantly correlated with the endogenous regressor. This can be performed with Stata's reg command as
reg rhs_dep_var instruments rhs_ind_vars
or by adding the option first to the ivregress command as
ivregress 2sls lhs_dep_var (rhs_dep_vars=instruments) rhs_ind_vars, first
A comparison of the instrumental variables (2SLS) and OLS estimates obtained from the command
reg lhs_dep_var rhs_vars
provides the basis for testing whether the right-hand-side endogenous variable is correlated with the error term. These tests can be implemented using either a Hausman or Wooldridge test as follows:
Hausman test: Estimate the equation using OLS and 2SLS (alternatives can be used). Then check for statistical differences between the two estimators using a Hausman test. Stored estimate names must begin with a letter, so the 2SLS results are stored as iv.
reg lhs_dep_var rhs_vars
est store ols
ivregress 2sls lhs_dep_var (rhs_dep_vars=instruments) rhs_ind_vars
est store iv
hausman iv ols
Wooldridge test: Regress the right-hand-side endogenous variables in a regression model on all of the exogenous variables (those in the regression model and the instrumental variables) and save the corresponding residuals. Estimate the original regression model with the estimated residuals included as regressors. Test the statistical significance of the coefficients of the residuals. The estimated coefficients of the original variables should be identical to the 2SLS estimates.
Applying these methods to the simple consumption function can be accomplished with the Stata commands
reg c y                    OLS estimates of the consumption function
est store ols
ivregress 2sls c (y=z)     2SLS estimates of the consumption function
est store iv
hausman iv ols             Performs a Hausman test
reg y z                    Regresses the endogenous regressor on the instrument
predict e, resid           Saves the residuals from that regression
reg c y e                  Performs a Wooldridge test
The statistical significance of the coefficients of the residuals would be tested using a chi-square, F or t-test. Note that these distributions are asymptotic and would not be expected to be exact for finite samples.
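The contrast between OLS and instrumental variables in the simple consumption model can be checked numerically. The Python sketch below uses a tiny hypothetical data set constructed so that the sample covariance between Z and ε is exactly zero; the parameter values (α = 1, β = 0.5) and the function names are assumptions made for illustration only.

```python
def slope(x, y):
    """OLS slope from regressing y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


def iv_slope(z, x, y):
    """Simple IV slope: sample cov(z, y) / sample cov(z, x)."""
    n = len(z)
    zbar, xbar, ybar = sum(z) / n, sum(x) / n, sum(y) / n
    szy = sum((a - zbar) * (b - ybar) for a, b in zip(z, y))
    szx = sum((a - zbar) * (b - xbar) for a, b in zip(z, x))
    return szy / szx


alpha, beta = 1.0, 0.5            # hypothetical structural parameters
z = [1.0, 2.0, 3.0, 4.0]          # exogenous spending Z_t
eps = [1.0, -1.0, -1.0, 1.0]      # errors with zero sample covariance with z
# Reduced form: Y_t = (alpha + Z_t + eps_t) / (1 - beta), and C_t = Y_t - Z_t.
y = [(alpha + zi + ei) / (1.0 - beta) for zi, ei in zip(z, eps)]
c = [yi - zi for yi, zi in zip(y, z)]

beta_ols = slope(y, c)            # regress C on Y
beta_iv = iv_slope(z, y, c)       # instrument Y with Z
print(beta_ols, beta_iv)
```

Here the IV estimate equals the true β of 0.5, while the OLS estimate equals β + (1-β)σ²/(σ_Z² + σ²) evaluated at the sample moments (σ² = 1, σ_Z² = 1.25), which is about 0.72.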
4. Formal analysis
Assumption A.5 in the standard model states:
(a) X_t is nonstochastic.
(b) Values of X are fixed in repeated samples.
(c) $\lim_{n\to\infty} \frac{1}{n}(X'X) = \Sigma_{XX}$ is finite and nonsingular.
Assumptions (a)-(b) are primarily of theoretical interest since, at least with economic data, we can rarely "draw" the same set of X's or select a predetermined value for X. These assumptions, (A.5 a-c), provide a relatively simple basis to begin our analysis of regression theory. Assumption (c) is useful in proving consistency of least squares estimators.
a. Case 1 of relaxing (A.5)
(A.5)' (a) X_t is stochastic.
(b) X_t and ε_t are stochastically independent.
(c) $\lim_{n\to\infty} \frac{1}{n}(X'X) = \Sigma_{XX}$ is finite and nonsingular.

$$\hat\beta = (X'X)^{-1}X'y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon$$

$$E(\hat\beta) = \beta + E\left[(X'X)^{-1}X'\right]E(\varepsilon) = \beta,$$

therefore β̂ is unbiased.

$$Var(\hat\beta) = E(\hat\beta - \beta)(\hat\beta - \beta)' = E\{(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}\} = \sigma^2 E(X'X)^{-1}$$

Relaxing the assumption that X is nonstochastic and replacing it with the assumption that X is stochastic and independent of ε does not alter the desirable unbiasedness and consistency properties of OLS.
b. Case 2 of relaxing (A.5)
(A.5)'' (a) X_t is stochastic.
(b) X_t and ε_t are stochastically dependent and cov(X_t, ε_t) ≠ 0.
(c) $\lim_{n\to\infty} \frac{1}{n}(X'X) = \Sigma_{XX}$ is finite and nonsingular.

$$E(\hat\beta) = \beta + E\{(X'X)^{-1}X'\varepsilon\} \neq \beta;$$

therefore, the least squares estimator is biased.

$$\operatorname{plim}(\hat\beta) = \beta + \operatorname{plim}\,(X'X)^{-1}X'\varepsilon = \beta + \Sigma_{XX}^{-1}\,Cov(X, \varepsilon) \neq \beta,$$

therefore inconsistent.
Thus, it is the correlation between the regressors and errors which leads to estimator bias and inconsistency.
IV. I. Errors of Measurement
An assumption which has been made in the development to this point is that the independent and dependent variables contained in our hypothesized formulations are measured without error. In many cases this is extremely unrealistic. If the independent and dependent variables are measured with error, then the least squares estimators need not possess the desirable statistical properties discussed earlier.
1. Theoretical Development
Assume that the relationship
(1) y = Xβ + ε, where ε ~ N(0, Σ = σ²I),
is hypothesized to hold where y and X represent "true" values. Also assume that y and X are measured with error as y* and X*, respectively, where
(2.a) y* = y + u, u ~ N(0, Σ_u)
(2.b) X* = X + V, V ~ N(0, Σ_v)
and the measurement errors u and V are independent.
Making use of (2) we can rewrite (1) in terms of observed variables, y*, X*:
y* - u = (X* - V)β + ε
(3) y* = X*β + ε + u - Vβ
(3)' y* = X*β + η
where η = ε + u - Vβ.
Applying least squares techniques to (3)' yields
(4) $\hat\beta = (X^{*\prime}X^*)^{-1}X^{*\prime}y^*$
or
(4)' $\hat\beta = [X'X + V'X + X'V + V'V]^{-1}[X'y + X'u + V'y + V'u]$
Is this estimator unbiased and consistent?
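From (4)' we can write

$$\hat\beta = \left[\frac{X'X}{n} + \frac{V'X}{n} + \frac{X'V}{n} + \frac{V'V}{n}\right]^{-1}\left[\frac{X'y}{n} + \frac{X'u}{n} + \frac{V'y}{n} + \frac{V'u}{n}\right]$$

and since X'y = X'Xβ + X'ε we can use Slutsky's theorem to obtain

$$\operatorname{plim}_{n\to\infty}\hat\beta = \left(\Sigma_{XX} + \Sigma_{VX} + \Sigma_{XV} + \Sigma_{VV}\right)^{-1}\left(\Sigma_{XX}\beta + \Sigma_{X\varepsilon} + \Sigma_{Xu} + \Sigma_{Vy} + \Sigma_{Vu}\right)$$

where Σ_XV = Σ_VX = 0, Σ_Vy = Σ_Vu = 0, Σ_Xε = 0, and Σ_Xu = 0.
Note: (1) As long as the independent variables are measured with error (Σ_VV ≠ 0), the least squares estimator of β is inconsistent:

$$\operatorname{plim}\hat\beta = (\Sigma_{XX} + \Sigma_{VV})^{-1}\Sigma_{XX}\beta = (I + \Sigma_{XX}^{-1}\Sigma_{VV})^{-1}\beta$$

[Figure: sampling density f(β̂_i) of the least squares estimator.]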
(2) If the dependent variable is measured with error, but the independent variables are "error free" (Σ_VV = 0), (3) can be rewritten as

$$y^* = X^*\beta + \underbrace{\varepsilon + u}_{\eta}$$

where η will satisfy (A.1)-(A.4) as long as ε and u do.
Note: In this case $\operatorname{plim}_{n\to\infty}\hat\beta = \beta$ and, given the X's, the associated β̂ will be unbiased, minimum variance, efficient, asymptotically unbiased, consistent, and asymptotically efficient. It should be noted that the variance of η_t, $\sigma_\eta^2 = \sigma_\varepsilon^2 + \sigma_u^2$, will be larger than if the dependent variable was measured without error.
2. An Example. M. Friedman suggested that consumption and income can be partitioned into "permanent" and "transitory" components as follows:

$$c = c_p + c_T$$
$$y = y_p + y_T$$

He also suggests that the "permanent" consumption function is of the form

$$c_p = k y_p + \varepsilon_T$$

If the "permanent" marginal propensity to consume, k, is estimated using least squares applied to c and y data, we have an example of an errors-in-variables model and our resultant estimate of k will, in the limit as n→∞, provide an underestimate of the "true" k.

$$\hat{k} = \frac{\sum cy}{\sum y^2} = \frac{\sum (c_p + c_T)(y_p + y_T)}{\sum (y_p + y_T)^2} = \frac{\sum (k y_p + \varepsilon_T + c_T)(y_p + y_T)}{\sum (y_p + y_T)^2}$$

$$= \frac{k\sum y_p^2 + k\sum y_p y_T + \sum \varepsilon_T y_p + \sum \varepsilon_T y_T + \sum c_T y_p + \sum c_T y_T}{\sum y_p^2 + 2\sum y_p y_T + \sum y_T^2}$$

$$\operatorname{plim}_{n\to\infty}\hat{k} = k\,\frac{\sigma^2_{y_p}}{\sigma^2_{y_p} + \sigma^2_{y_T}}$$

where $\sigma^2_{y_p}$ and $\sigma^2_{y_T}$, respectively, denote Var(y_p) and Var(y_T).
3. Estimation. (Σ_VV ≠ 0)
a. Method of instrumental variables.
Select z_ti's which are uncorrelated with the measurement errors and are correlated with the x_ti's. For the model

$$y = X\beta + \varepsilon,$$

$$\hat\beta_Z = \left(X'Z(Z'Z)^{-1}Z'X\right)^{-1}X'Z(Z'Z)^{-1}Z'Y$$

will be a consistent estimate of β.
IV. J. Specification Error
A specification error is said to have occurred whenever a regression equation or underlying assumption is incorrect. Specification errors can take many forms:
(1) deleting a "relevant" variable,
(2) including an "irrelevant" variable,
(3) using an incorrect functional form, or
(4) specifying an incorrect description of the population from which the disturbance was drawn.
For someone to claim that a specification error has been made carries with it some suggestion that the individual knows what the "true" model is like. Specification errors involving questions about functional form or the error distribution have already been discussed. We now consider the consequence of (1) deleting a relevant variable and (2) including an irrelevant variable.
1. Example. Deletion of "relevant" variables
True Model:

$$y_t = \beta_1 + \beta_2 x_{t2} + \dots + \beta_{k_1} x_{tk_1} + \dots + \beta_k x_{tk} + \varepsilon_t$$

(1) $y = X_I\beta_I + X_{II}\beta_{II} + \varepsilon$

Hypothesized Model:
(2) $y = X_I\beta_I + \eta$ [Note: $\eta = \varepsilon + X_{II}\beta_{II}$]

An application of least squares to (2) yields
(3) $\hat\beta_I = [X_I'X_I]^{-1}X_I'y$

Replacing y in (3) by (1) results in the following expression for β̂_I:

(4) $\hat\beta_I = [X_I'X_I]^{-1}X_I'[X_I\beta_I + X_{II}\beta_{II} + \varepsilon]$

$= [X_I'X_I]^{-1}X_I'X_I\beta_I + [X_I'X_I]^{-1}X_I'X_{II}\beta_{II} + [X_I'X_I]^{-1}X_I'\varepsilon$

$= \beta_I + [X_I'X_I]^{-1}X_I'X_{II}\beta_{II} + [X_I'X_I]^{-1}X_I'\varepsilon$

Analysis of the properties of the least squares estimator β̂_I in the misspecified model:

a. $E(\hat\beta_I) = \beta_I + E\{[X_I'X_I]^{-1}X_I'X_{II}\beta_{II}\} + E\{[X_I'X_I]^{-1}X_I'\varepsilon\}$

If X_I and ε are independent, then

$E(\hat\beta_I) = \beta_I + E\{[X_I'X_I]^{-1}X_I'X_{II}\}\beta_{II}$

i.e., β̂_I is a biased estimator of β_I if $X_I'X_{II}\beta_{II} \neq 0$.

b. $\operatorname{plim}\hat\beta_I = \beta_I + \operatorname{plim}\left(\dfrac{X_I'X_I}{n}\right)^{-1}\operatorname{plim}\left(\dfrac{X_I'X_{II}}{n}\right)\beta_{II} + \operatorname{plim}\left(\dfrac{X_I'X_I}{n}\right)^{-1}\operatorname{plim}\left(\dfrac{X_I'\varepsilon}{n}\right)$

(E.5) $\quad = \beta_I + \Sigma_{X_I X_I}^{-1}\Sigma_{X_I X_{II}}\beta_{II} + \Sigma_{X_I X_I}^{-1}\Sigma_{X_I\varepsilon}$
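The bias term in (4) can be verified with a small numerical sketch. The data below are hypothetical and noise-free (ε = 0), with one included regressor x1 and one omitted regressor x2, so the short-regression slope should equal β_I plus β_II times the slope from regressing x2 on x1. The variable and function names are assumptions made for the example.

```python
def slope(x, y):
    """OLS slope from regressing y on a constant and x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    sxx = sum((a - xbar) ** 2 for a in x)
    return sxy / sxx


# Hypothetical "true" model: y = 1 + 2*x1 + 3*x2 with no disturbance.
beta_1, beta_2 = 2.0, 3.0
x1 = [1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 1.0, 2.0, 4.0]          # correlated with x1
y = [1.0 + beta_1 * a + beta_2 * b for a, b in zip(x1, x2)]

short_slope = slope(x1, y)          # regression that omits x2
delta = slope(x1, x2)               # (X_I'X_I)^{-1} X_I'X_II for this simple case
print(short_slope, beta_1 + beta_2 * delta)
```

With these numbers the auxiliary slope is 1, so the misspecified regression attributes the full effect of x2 to x1 and reports 5 instead of 2.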
To summarize, deleting a relevant variable results in an inconsistent estimator of β_I unless
a) $\Sigma_{X_I\varepsilon} = 0$ (ε and X_I are independent)
and
b) $\Sigma_{X_I X_{II}} = 0$ (X_I and X_II are orthogonal).

2. Example. Including an irrelevant variable.
True Model: $y = X_I\beta_I + \eta$
Hypothesized Model: $y = X_I\beta_I + X_{II}\beta_{II} + \varepsilon = X\beta + \varepsilon$
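The least squares estimator of $\beta = \begin{pmatrix} \beta_I \\ \beta_{II} \end{pmatrix}$ is then given by

$$\hat\beta = \begin{pmatrix} \hat\beta_I \\ \hat\beta_{II} \end{pmatrix} = (X'X)^{-1}X'y = \begin{pmatrix} X_I'X_I & X_I'X_{II} \\ X_{II}'X_I & X_{II}'X_{II} \end{pmatrix}^{-1}\begin{pmatrix} X_I'y \\ X_{II}'y \end{pmatrix}$$

Taking expected values gives

$$E\begin{pmatrix} \hat\beta_I \\ \hat\beta_{II} \end{pmatrix} = \begin{pmatrix} X_I'X_I & X_I'X_{II} \\ X_{II}'X_I & X_{II}'X_{II} \end{pmatrix}^{-1}\begin{pmatrix} X_I' \\ X_{II}' \end{pmatrix}E(y)$$

$$= \begin{pmatrix} X_I'X_I & X_I'X_{II} \\ X_{II}'X_I & X_{II}'X_{II} \end{pmatrix}^{-1}\begin{pmatrix} X_I' \\ X_{II}' \end{pmatrix}\begin{pmatrix} X_I & X_{II} \end{pmatrix}\begin{pmatrix} \beta_I \\ 0 \end{pmatrix}$$

$$= \begin{pmatrix} X_I'X_I & X_I'X_{II} \\ X_{II}'X_I & X_{II}'X_{II} \end{pmatrix}^{-1}\begin{pmatrix} X_I'X_I & X_I'X_{II} \\ X_{II}'X_I & X_{II}'X_{II} \end{pmatrix}\begin{pmatrix} \beta_I \\ 0 \end{pmatrix} = \begin{pmatrix} \beta_I \\ 0 \end{pmatrix}.$$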
Therefore, including irrelevant variables in a linear regression does not affect the unbiasedness or the consistency of the least squares estimators.
The reason for the asymmetry of the results for the two cases of specification error just considered is that the hypothesized model includes the "true" model as a special case in the second example, but does not in the first example. It would then appear that it would be better to err in the direction of including too many variables than to delete a relevant variable.
It should be mentioned that while the least squares estimator of $\beta_I$ in the second example is unbiased and consistent, the corresponding variance may be larger than that associated with estimating the "true" model using least squares.
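The two specification errors can be illustrated numerically. The sketch below is only an illustration under stated assumptions: the data vectors x1 and x2, the coefficients beta1 and beta2, and the helper names dot and ols_two are hypothetical, each block of regressors is a single column, there is no intercept, and the disturbance is set to zero so that the algebra is visible exactly. Omitting x2 shifts the coefficient on x1 by (x1'x2 / x1'x1) * beta2, while including an irrelevant x2 leaves the coefficient on x1 unchanged.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def ols_two(y, x1, x2):
    """Least squares of y on x1 and x2 (no intercept) via the 2x2 normal equations."""
    a, b, d = dot(x1, x1), dot(x1, x2), dot(x2, x2)
    r1, r2 = dot(x1, y), dot(x2, y)
    det = a * d - b * b
    return (d * r1 - b * r2) / det, (a * r2 - b * r1) / det


# Hypothetical data and coefficients, chosen only for illustration.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0]
beta1, beta2 = 2.0, 0.5

# Case 1: x2 belongs in the model but is omitted from the fitted regression.
y_full = [beta1 * a + beta2 * b for a, b in zip(x1, x2)]
b1_short = dot(x1, y_full) / dot(x1, x1)
b1_predicted = beta1 + dot(x1, x2) / dot(x1, x1) * beta2

# Case 2: x2 is irrelevant but is included in the fitted regression.
y_true = [beta1 * a for a in x1]
b1_long, b2_long = ols_two(y_true, x1, x2)

print(b1_short, b1_predicted)  # short-regression slope and the bias formula
print(b1_long, b2_long)        # long-regression slopes on x1 and x2
```

With a nonzero disturbance the same comparison holds in expectation rather than exactly.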
V. K. PROBLEM SET 5
Violations of the Basic Assumptions
Theory
1. Distributional assumptions
a. Assume that the probability density function of the random disturbances $\varepsilon_t$ in a regression equation $Y_t = X_t\beta + \varepsilon_t$ is given by the generalized error distribution (GED):
$GED(\varepsilon_t; \sigma, p) = \dfrac{e^{-(|\varepsilon_t|/\sigma)^p}}{2\sigma\Gamma(1 + 1/p)}$
where $\Gamma(\cdot)$ is the gamma function.
(1) Obtain an expression for the likelihood function and also for the log likelihood function corresponding to the regression model with a GED error distribution.
(2) What would the MLE of $\beta$ be if p in the GED is
(a) p = 1
(b) p = 2
Hint: You don't have to derive an equation for $\hat\beta$; however, in maximizing the log likelihood function over $\hat\beta$ for a given value of p you should get $\hat\beta$'s you have seen before. What are they?
(3) Bonus: How could the parameter "p" be estimated?
b. For the data, HBJ.dat, estimate the model $Y_t = \alpha + \beta X_t + \varepsilon_t$ using OLS and LAE, and test the distributional assumption of normality. In particular:
(1) report the estimated intercept and slope using OLS and LAE;
(2) test the normality assumption using the estimated skewness and kurtosis with a "Z statistic;" and
(3) test the normality assumption using the JB test. (Hint: You can use the Stata command sktest for (2) and (3).)
2. It was shown in (IV.C) that
$E(\hat\beta) = \beta + (X'X)^{-1}X'\mu$
where $\mu = E(\varepsilon)$. It was also mentioned that if $E(\varepsilon_t) = \mu$ for all t, then $E(\hat\beta_1) = \mu + \beta_1$ and $E(\hat\beta_i) = \beta_i$ for i = 2, 3, ..., K. Verify that this is true for the case K = 2. Hint:
$\text{Bias}(\hat\beta) = (X'X)^{-1}X'\begin{pmatrix}1\\ \vdots\\ 1\end{pmatrix}\mu$
and
$(X'X)^{-1} = \dfrac{1}{N\sum X_t^2 - N^2\bar X^2}\begin{pmatrix}\sum X_t^2 & -N\bar X\\ -N\bar X & N\end{pmatrix}$,
$X'\begin{pmatrix}1\\ \vdots\\ 1\end{pmatrix} = \begin{pmatrix}N\\ \sum X_t\end{pmatrix} = \begin{pmatrix}N\\ N\bar X\end{pmatrix}$.
3. Consider the special case of the generalized regression model where $\Sigma = \sigma^2 I$. For this case, demonstrate that
a. $\tilde\beta = \beta_\Delta = (X'\Sigma^{-1}X)^{-1}X'\Sigma^{-1}Y$ simplifies to $\hat\beta = (X'X)^{-1}X'Y$,
b. $Var(\hat\beta) = (X'X)^{-1}X'\Sigma X(X'X)^{-1} = \sigma^2(X'X)^{-1}$, and
c. $Var(\tilde\beta) = Var(\beta_\Delta) = (X'\Sigma^{-1}X)^{-1} = \sigma^2(X'X)^{-1}$.
4. Heteroskedasticity
a. Using the HBJ data and the market model $Y_t = \alpha + \beta X_t + \varepsilon_t$:
(1) Test for heteroskedasticity using the following Stata commands:
. whitetst
. estat hettest x, iid or estat hettest rhs, iid
. estat hettest x, fstat
(2) With the weights discussed in class, use variance weighted least squares (vwls) to estimate $\alpha$ and $\beta$. Turn in your computer commands and output along with your discussion of the results.
b. For the heteroskedastic case verify that $T'T = \Sigma^{-1}$.
5. For the case of first order autocorrelation it can be shown that
$\Sigma^{-1} = \dfrac{1}{\sigma_u^2}\begin{pmatrix}1 & -\rho & 0 & \cdots & 0 & 0\\ -\rho & 1+\rho^2 & -\rho & \cdots & 0 & 0\\ 0 & -\rho & 1+\rho^2 & \ddots & \vdots & \vdots\\ \vdots & \vdots & \ddots & \ddots & -\rho & 0\\ 0 & 0 & \cdots & -\rho & 1+\rho^2 & -\rho\\ 0 & 0 & \cdots & 0 & -\rho & 1\end{pmatrix}$
Evaluate $T_1'T_1$ and $T_2'T_2$ and compare each result with $\Sigma^{-1}$, commenting on the relationship and explaining any differences. Refer to the class notes for the definitions of the transformation matrices $T_1$ and $T_2$. The Cochrane-Orcutt estimator corresponds to deleting the first observation whereas the Prais-Winsten (PW) estimator uses all observations.
Applied
6. Use the data in PHILLIPS.RAW to answer these questions.
a. Using the entire data set, estimate the static Phillips curve equation $unem_t = \beta_0 + \beta_1 inf_t + \varepsilon_t$ by OLS and report the results in the usual form.
b. Obtain the OLS residuals from part (a) and obtain $\hat\rho$ from regressing $e_t$ on $e_{t-1}$. Is there strong evidence of autocorrelation? Also test for the presence of autocorrelation using the DW test statistic.
c. Now estimate the static Phillips curve model by iterative Prais-Winsten. Compare the estimate of $\beta_1$ with that obtained in Table 12.2.
d. Rather than using Prais-Winsten, use iterative Cochrane-Orcutt. How similar are the final estimates of $\rho$? How similar are the PW and CO estimates of $\beta_1$? (Wooldridge, C.12.10)
7. Costs of Production
The following data correspond to output (Q) and total costs (C) of production.
Output    Total Costs ($)
1         193
2         226
3         240
4         244
5         257
6         260
7         274
8         297
9         350
10        420
a. Use OLS to estimate the parameters in the relationship $C_t = \beta_1 + \beta_2 Q_t + \varepsilon_t$.
b. Perform a test to see if the error terms are "correlated."
c. Indicate how you can obtain more appropriate estimators than OLS estimators of the linear equation in (a). Show your work and provide motivation for your approach. (Be careful!)
8. Panel data exercise
Consider the following data:
t    code   x    y    d1   d2   d3   d4
1    1      0    -5   1    0    0    0
2    1      8    23   1    0    0    0
3    1      14   44   1    0    0    0
4    2      10   29   0    1    0    0
5    2      16   26   0    1    0    0
6    3      4    17   0    0    1    0
7    3      11   17   0    0    1    0
8    3      5    31   0    0    1    0
9    4      18   50   0    0    0    1
10   4      5    26   0    0    0    1
11   4      2    17   0    0    0    1
Perform the following Stata commands and briefly explain the corresponding outputs.
xtset code
reg y x
reg y x d1 d2 d3
xtreg y x, fe
xtreg y x, be
xtreg y x, re
9. Consider the following model:
$\ln(wage_i) = \beta_1 + \beta_2 educ_i + \varepsilon_i$
where wage and educ, respectively, denote the wage and education level (years) for the ith individual.
a. Under what conditions would you expect the OLS estimators of the $\beta_i$'s to be unbiased and consistent? Defend your answer.
b. If you think that the wage rate has an impact on education as well as education impacting wages, will the OLS estimators be unbiased and consistent? Defend your answer.
c. If there is an endogeneity problem in the model, explain how you could obtain consistent coefficient estimators.
d. Using the mroz data (mroz.dta), estimate the given model using OLS and instrumental variables estimators (with mother's education as an instrument). Which estimate would you recommend? Use a Hausman test to support your answer.
James B. McDonald
Brigham Young University
2/8/2010
VI. SIMULTANEOUS EQUATION MODELS
INTRODUCTION
There are several problems encountered with simultaneous equations models that are
not generally associated with single equation models. These include (1) the identification
problem, (2) inconsistency of ordinary least squares (OLS) estimators, (3) questions about the
interpretation of structural parameters, and (4) the validity of the OLS "t statistics" associated
with structural coefficients.
To introduce these problems, we review two important papers. The paper on identification
by E. J. Working [1927, QJE] is considered in the first section. The work of Haavelmo [1947,
JASA] dealing with alternative methods of estimating the marginal propensity to consume is
described in the second section. The third section contains a brief summary.
1. STRUCTURAL AND REDUCED FORM REPRESENTATIONS,
IDENTIFICATION, AND INTERPRETATIONS OF COEFFICIENTS
Consider the problem of estimating the impact of an increase in the price of crude oil upon
the equilibrium price and quantity of gasoline. The corresponding increase in the price of
gasoline will depend upon several factors including the slope of the demand curve.
This is illustrated by the following figure:
Figure 1
Assume that $(Q_0, P_0)$ denotes the original equilibrium. Assume that the increase in the price of
crude oil results in the supply curve shifting from $S_1$ to $S_2$. The associated change in P depends
upon the relevant demand schedule, with the more inelastic schedule being associated with the
larger price increases. This example clearly indicates the importance of estimating the slope of
the demand schedule to make predictions about the impact of changes in factor price upon the
equilibrium price.
Estimation of the slope of the demand curve might begin by collecting observations on (P,
Q), which might appear as in Figure 2.
Figure 2 (scatter plot of observed points; vertical axis P, horizontal axis Q)
The reader would probably be tempted to draw a line through the points or perform a least
squares estimation on $P = \beta_1 - \beta_2 Q$ in order to estimate the demand schedule. But how would we
estimate the demand curve if a plot of P and Q appeared as in Figure 3 rather than as in Figure 2?
Figure 3 (scatter plot of observed points; vertical axis P, horizontal axis Q)
The data in Figure 3 appear to define a supply curve rather than a demand curve.
Alternatively, how could we estimate a demand curve if the data appeared as in Figure 4?
Figure 4 (scatter plot of observed points; vertical axis P, horizontal axis Q)
To answer this question, we need to recall that equilibrium price and quantity are
determined by supply and demand factors and not supply or demand alone. The observations
depicted in Figure 2 could have been generated by either of the following scenarios:
Figure 5 (two panels, each with vertical axis P and horizontal axis Q)
If the demand curve is stable and the supply curve shifts, then the demand curve is "traced
out." If both curves shift, fitting a relationship to the observed (P,Q) would not correspond to the
underlying demand curve(s). Similarly, Figure 3 could correspond to a relatively stable supply
curve and a shifting demand curve or both curves shifting. Figure 4 would appear to correspond
to both curves shifting.
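Consider the following model:
(1.1) Demand: $Q = \gamma_{11} - \beta_{12}P + \gamma_{12}Y + \varepsilon_{1t}$
(1.2) Supply: $Q = \gamma_{21} + \beta_{22}P - \gamma_{23}FC + \varepsilon_{2t}$
or equivalently,
$\begin{pmatrix}-1 & -\beta_{12}\\ -1 & \beta_{22}\end{pmatrix}\begin{pmatrix}Q_t\\ P_t\end{pmatrix} + \begin{pmatrix}\gamma_{11} & \gamma_{12} & 0\\ \gamma_{21} & 0 & -\gamma_{23}\end{pmatrix}\begin{pmatrix}1\\ Y_t\\ FC_t\end{pmatrix} + \begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{pmatrix} = 0$.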
Equations (1.1) and (1.2) will be referred to as the structural model with Q and P as endogenous
(dependent) variables and income (Y) and factor costs (crude oil, FC) as exogenous
(independent) variables. In order to draw a demand curve or supply curve using (Q, P) as
coordinates, Y and FC must be fixed at some arbitrary level.
Figure 6 (supply curve S drawn for FC = 125 and demand curve D drawn for Y = 100; vertical axis P, horizontal axis Q)
A change in factor costs (income fixed) will shift the supply curve and “trace” the depicted
demand curve and a change in income (factor costs fixed) will shift the demand curve and “trace”
the depicted supply curve, ceteris paribus. It is interesting to observe that by including factor
costs (FC) in the supply equation and not the demand equation we are able to "identify" the
demand equation. Similarly, by including income (Y) in the demand equation and not in the
supply equation we are able to "identify" the supply equation. Hence, one way of "identifying" a
structural equation is by excluding variables from the equation we want to estimate that are
included in other structural equations. This is the general approach to the identification problem
developed by E. J. Working [1927]. A more formal development will be considered later.
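We note from Figure 6 that for each level of factor costs and income there is a
corresponding equilibrium price and quantity determined by the intersection of the supply and
demand curves. If we solve the structural model for the explicit relationship between (P, Q) and
FC and Y we obtain
(1.3a-c)
$\begin{pmatrix}Q_t\\ P_t\end{pmatrix} = -\begin{pmatrix}-1 & -\beta_{12}\\ -1 & \beta_{22}\end{pmatrix}^{-1}\left\{\begin{pmatrix}\gamma_{11} & \gamma_{12} & 0\\ \gamma_{21} & 0 & -\gamma_{23}\end{pmatrix}\begin{pmatrix}1\\ Y_t\\ FC_t\end{pmatrix} + \begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{pmatrix}\right\}$
$= \dfrac{1}{\beta_{12}+\beta_{22}}\begin{pmatrix}\beta_{22} & \beta_{12}\\ 1 & -1\end{pmatrix}\left\{\begin{pmatrix}\gamma_{11} & \gamma_{12} & 0\\ \gamma_{21} & 0 & -\gamma_{23}\end{pmatrix}\begin{pmatrix}1\\ Y_t\\ FC_t\end{pmatrix} + \begin{pmatrix}\varepsilon_{1t}\\ \varepsilon_{2t}\end{pmatrix}\right\}$
$= \dfrac{1}{\beta_{12}+\beta_{22}}\begin{pmatrix}\beta_{22}\gamma_{11}+\beta_{12}\gamma_{21} & \beta_{22}\gamma_{12} & -\beta_{12}\gamma_{23}\\ \gamma_{11}-\gamma_{21} & \gamma_{12} & \gamma_{23}\end{pmatrix}\begin{pmatrix}1\\ Y_t\\ FC_t\end{pmatrix} + \begin{pmatrix}\dfrac{\beta_{22}\varepsilon_{1t}+\beta_{12}\varepsilon_{2t}}{\beta_{12}+\beta_{22}}\\ \dfrac{\varepsilon_{1t}-\varepsilon_{2t}}{\beta_{12}+\beta_{22}}\end{pmatrix}$
$= \begin{pmatrix}\pi_{11} & \pi_{12} & \pi_{13}\\ \pi_{21} & \pi_{22} & \pi_{23}\end{pmatrix}\begin{pmatrix}1\\ Y_t\\ FC_t\end{pmatrix} + \begin{pmatrix}\eta_{1t}\\ \eta_{2t}\end{pmatrix}$
Note:
$\dfrac{\partial Q}{\partial Y} = \dfrac{\beta_{22}\gamma_{12}}{\beta_{12}+\beta_{22}} = \pi_{12} > 0$, $\dfrac{\partial Q}{\partial FC} = \dfrac{-\beta_{12}\gamma_{23}}{\beta_{12}+\beta_{22}} = \pi_{13} < 0$,
$\dfrac{\partial P}{\partial Y} = \dfrac{\gamma_{12}}{\beta_{12}+\beta_{22}} = \pi_{22} > 0$, $\dfrac{\partial P}{\partial FC} = \dfrac{\gamma_{23}}{\beta_{12}+\beta_{22}} = \pi_{23} > 0$.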
Equations (1.3a-c) are referred to as the reduced form equations for Q and P corresponding to the
structural model defined by (1.1) and (1.2). Note that each reduced form equation expresses the
equilibrium value (P or Q) as a function of the exogenous variables FC and Y.
To determine the impact of an increase in the price of crude oil upon the price of gasoline,
we employ the reduced form representation, i.e.,
$\dfrac{\partial P}{\partial FC} = \dfrac{\gamma_{23}}{\beta_{12}+\beta_{22}} = \pi_{23} > 0$
which takes into account the slopes of the supply and demand curves as well as how far the
supply curve would shift in response to an increase in the price of crude oil. The
equilibrium quantity would also change according to
$\dfrac{\partial Q}{\partial FC} = \dfrac{-\beta_{12}\gamma_{23}}{\beta_{12}+\beta_{22}} = \pi_{13} < 0$.
The reader might wonder why
$\dfrac{\partial Q^s}{\partial FC} = -\gamma_{23} < 0$
doesn't characterize the change in equilibrium quantity.
The following figure will illustrate why the reduced form provides the necessary information.
(Figure: the supply curve shifts horizontally by $\gamma_{23}\Delta FC$; vertical axis P, horizontal axis Q.)
Taking the partial derivative of the supply equation with respect to FC assumes that P is
fixed and hence merely represents the horizontal shift of the supply curve and not the change in
equilibrium quantity. The reduced form equation for Q expresses the equilibrium quantity as a
function of FC and Y and takes account of the increase in equilibrium price associated with an
increase in factor costs.
To summarize, the reduced form coefficients represent the change in equilibrium values
corresponding to changes in the predetermined or exogenous variables, i.e., the reduced form
coefficients are the multipliers. The structural coefficients represent slopes or shifts of structural
schedules in response to changes in predetermined or exogenous variables.
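(In the figure, the change in equilibrium quantity is $\Delta Q = \dfrac{-\beta_{12}\gamma_{23}}{\beta_{12}+\beta_{22}}\Delta FC$.)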
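The distinction between structural coefficients and reduced form multipliers can be checked numerically. The following is a minimal sketch, assuming hypothetical parameter values for (1.1) and (1.2) and ignoring the error terms; the function names reduced_form and equilibrium are illustrative and not part of the notes. It compares the reduced form prediction with the equilibrium found by equating supply and demand directly.

```python
def reduced_form(g11, b12, g12, g21, b22, g23):
    """Reduced form coefficients (pi's) implied by the structural model (1.1)-(1.2)."""
    d = b12 + b22
    q_row = ((b22 * g11 + b12 * g21) / d, b22 * g12 / d, -b12 * g23 / d)
    p_row = ((g11 - g21) / d, g12 / d, g23 / d)
    return q_row, p_row


def equilibrium(g11, b12, g12, g21, b22, g23, y, fc):
    """Solve demand = supply for (Q, P) at given income y and factor cost fc."""
    p = (g11 - g21 + g12 * y + g23 * fc) / (b12 + b22)
    q = g11 - b12 * p + g12 * y
    return q, p


# Hypothetical structural parameters, for illustration only.
params = dict(g11=100.0, b12=2.0, g12=0.5, g21=10.0, b22=3.0, g23=4.0)
(pi11, pi12, pi13), (pi21, pi22, pi23) = reduced_form(**params)

q0, p0 = equilibrium(y=100.0, fc=10.0, **params)
q1, p1 = equilibrium(y=100.0, fc=11.0, **params)

print(q1 - q0, pi13)    # change in equilibrium Q per unit of FC
print(p1 - p0, pi23)    # change in equilibrium P per unit of FC
print(-params["g23"])   # horizontal shift of the supply curve, holding P fixed
```

In this example the supply curve shifts left by 4 units at a given price, but the equilibrium quantity falls by less because the equilibrium price rises along the demand curve.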
OPTIONAL EXERCISES:
1. The asymptotic bias of the OLS estimator of the slope for the demand curve is given by
$\dfrac{(\beta_{12}+\beta_{22})\sigma^2_{\varepsilon_1}}{\sigma^2_{\varepsilon_1} + \sigma^2_{\varepsilon_2} + \gamma_{23}^2\left(1 - COR^2(Y, FC)\right)}$
where COR(Y, FC) = correlation between Y and FC.
(a) Mathematically analyze the impact of increases in $\sigma^2_{\varepsilon_2}$, $\gamma_{23}^2$, and COR(Y, FC) upon
the asymptotic bias of $\hat\beta_{12}$.
(b) Graphically analyze the impact of increases in $\sigma^2_{\varepsilon_2}$, $\gamma_{23}^2$, and COR(Y, FC) upon the
"identifiability of $\beta_{12}$."
2. INCONSISTENCY OF STRUCTURAL ORDINARY LEAST SQUARES
ESTIMATORS, ALTERNATIVE ESTIMATORS, AND STATISTICAL
INFERENCE
Haavelmo [1947] considered the following simple macro model:
(2.1) $C_t = \alpha + \beta Y_t + \varepsilon_t$
(2.2) $Y_t = C_t + Z_t$
where $Y_t$, $C_t$, and $Z_t$ ($Z \equiv Y - C$) respectively denote income, consumption, and nonconsumption
expenditure.
The reduced form representation corresponding to (2.1) and (2.2) is given by
(2.3) $C_t = \pi_{11} + \pi_{12}Z_t + \eta_t$
(2.4) $Y_t = \pi_{21} + \pi_{22}Z_t + \eta_t$
where
(2.5a-e) $\eta_t = \varepsilon_t/(1-\beta)$
$\pi_{11} = \alpha/(1-\beta)$
$\pi_{12} = \beta/(1-\beta)$
$\pi_{21} = \alpha/(1-\beta)$
$\pi_{22} = 1/(1-\beta)$
Note that $\pi_{12}$ and $\pi_{22}$ correspond to the multipliers discussed in simple macroeconomics
models. Haavelmo's analysis of the simple model defined by (2.1) and (2.2) pointed out many
problems which are also associated with larger econometric models. For this reason we will
consider this model in detail.
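Estimation. Past experience might suggest that the OLS estimator of $\beta$ would have
desirable statistical properties if $\varepsilon_t$ in (2.1) is not characterized by autocorrelation or
heteroskedasticity. The OLS estimator of $\beta$ in (2.1) is defined by
(2.6) $\hat\beta = \dfrac{\sum(Y-\bar Y)(C-\bar C)}{\sum(Y-\bar Y)^2} = \dfrac{Cov(Y, C)}{Var(Y)}$
but from (2.3) and (2.4), we see that
(2.7) $C - \bar C = \pi_{12}(Z-\bar Z) + \dfrac{\varepsilon-\bar\varepsilon}{1-\beta} = \dfrac{\beta}{1-\beta}(Z-\bar Z) + \dfrac{\varepsilon-\bar\varepsilon}{1-\beta}$
and
(2.8) $Y - \bar Y = \pi_{22}(Z-\bar Z) + \dfrac{\varepsilon-\bar\varepsilon}{1-\beta} = \dfrac{1}{1-\beta}(Z-\bar Z) + \dfrac{\varepsilon-\bar\varepsilon}{1-\beta}$;
hence, after substituting (2.7) and (2.8) into (2.6), we can write
(2.9) $\hat\beta = \dfrac{\sum\left\{\dfrac{Z-\bar Z}{1-\beta} + \dfrac{\varepsilon-\bar\varepsilon}{1-\beta}\right\}\left\{\dfrac{\beta(Z-\bar Z)}{1-\beta} + \dfrac{\varepsilon-\bar\varepsilon}{1-\beta}\right\}}{\sum\left\{\dfrac{Z-\bar Z}{1-\beta} + \dfrac{\varepsilon-\bar\varepsilon}{1-\beta}\right\}^2}$
$= \dfrac{\sum\left\{\dfrac{\beta(Z-\bar Z)^2}{(1-\beta)^2} + \dfrac{(1+\beta)(Z-\bar Z)(\varepsilon-\bar\varepsilon)}{(1-\beta)^2} + \dfrac{(\varepsilon-\bar\varepsilon)^2}{(1-\beta)^2}\right\}}{\sum\left\{\dfrac{(Z-\bar Z)^2}{(1-\beta)^2} + 2\dfrac{(\varepsilon-\bar\varepsilon)(Z-\bar Z)}{(1-\beta)^2} + \dfrac{(\varepsilon-\bar\varepsilon)^2}{(1-\beta)^2}\right\}}$
$= \dfrac{\beta\sum(Z-\bar Z)^2/N + (1+\beta)\sum(Z-\bar Z)(\varepsilon-\bar\varepsilon)/N + \sum(\varepsilon-\bar\varepsilon)^2/N}{\sum(Z-\bar Z)^2/N + 2\sum(\varepsilon-\bar\varepsilon)(Z-\bar Z)/N + \sum(\varepsilon-\bar\varepsilon)^2/N}$.
Assuming that:
$\sum_{t=1}^{N}(Z_t-\bar Z)^2/N \to \sigma_Z^2$ as N → ∞,
$\sum_{t=1}^{N}(Z_t-\bar Z)(\varepsilon_t-\bar\varepsilon)/N \to 0$ as N → ∞, and
$\sum_{t=1}^{N}(\varepsilon_t-\bar\varepsilon)^2/N \to \sigma^2$ as N → ∞,
gives us:
(2.10) $\hat\beta \to \dfrac{\beta\sigma_Z^2 + \sigma^2}{\sigma_Z^2 + \sigma^2} = \beta + \dfrac{(1-\beta)\sigma^2}{\sigma_Z^2 + \sigma^2}$ as N → ∞.
Hence, we see from (2.10) that $\hat\beta$ is an inconsistent estimator of $\beta$ with asymptotic bias equal
to the second term in (2.10),
$\dfrac{(1-\beta)\sigma^2}{\sigma_Z^2 + \sigma^2}$.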
This may seem like a surprising result in light of the apparent simplicity of the consumption
function. It may not be obvious which of the assumptions
(A.1) $\varepsilon_t$ distributed normally
(A.2) $E(\varepsilon_t) = 0$ for all t
(A.3) $Var(\varepsilon_t) = \sigma^2$ for all t
(A.4) $E(\varepsilon_t\varepsilon_s) = 0$ for t ≠ s
(A.5) $Y_t$ and $\varepsilon_t$ are independent
is violated. But upon closer inspection (hint: see (2.4)) we note that
$E(\varepsilon_t Y_t) = E\left[\varepsilon_t\left(\pi_{21} + \pi_{22}Z_t + \dfrac{\varepsilon_t}{1-\beta}\right)\right]$
$= E(\varepsilon_t^2)/(1-\beta)$
$= \sigma^2/(1-\beta) \neq 0$;
hence, (A.5) is violated and OLS estimators of the structural parameters $\alpha$ and $\beta$ are biased and
inconsistent. In fact, this is typically the case when OLS is used to estimate structural
relationships which include endogenous variables on the right hand side of the structural
equation. Right hand side endogenous variables are commonly referred to as endogenous
regressors.
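The inconsistency result in (2.10) can be illustrated by simulation. The sketch below is not part of Haavelmo's analysis: the parameter values, the seed, the mean of Z, and the function names slope and simulate are all hypothetical choices made for illustration. Data are generated from the reduced form (2.4) and the identity (2.2), and the OLS slope from (2.6) is compared with its probability limit in (2.10) and with the ratio estimator that uses Z.

```python
import random


def slope(x, y):
    """Sample covariance of (x, y) divided by the sample variance of x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate(n, alpha, beta, sigma_z, sigma_e, seed=12345):
    """Draw (Y, C, Z) from the Haavelmo model with Z independent of the error."""
    rng = random.Random(seed)
    z = [rng.gauss(50.0, sigma_z) for _ in range(n)]
    e = [rng.gauss(0.0, sigma_e) for _ in range(n)]
    y = [(alpha + zi + ei) / (1.0 - beta) for zi, ei in zip(z, e)]  # reduced form (2.4)
    c = [yi - zi for yi, zi in zip(y, z)]                           # identity (2.2)
    return y, c, z


# Hypothetical parameter values, for illustration only.
alpha, beta, sigma_z, sigma_e = 80.0, 0.7, 15.0, 8.0
y, c, z = simulate(20000, alpha, beta, sigma_z, sigma_e)

b_ols = slope(y, c)               # OLS estimator (2.6)
b_iv = slope(z, c) / slope(z, y)  # ratio of reduced form slopes, uses Z as instrument
plim_ols = beta + (1 - beta) * sigma_e ** 2 / (sigma_z ** 2 + sigma_e ** 2)

print(b_ols, plim_ols)  # OLS settles near beta plus the asymptotic bias
print(b_iv, beta)       # the estimator based on Z settles near beta
```

Increasing sigma_e relative to sigma_z in this sketch makes the gap between the OLS slope and beta larger, as (2.10) suggests.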
As another example, the asymptotic bias of the OLS estimator of $\beta_{12}$ in (1.1) is given by
(2.11) $\dfrac{(\beta_{12}+\beta_{22})\sigma^2_{\varepsilon_1}}{\sigma^2_{\varepsilon_1} + \sigma^2_{\varepsilon_2} + \gamma_{23}^2\left(1 - Corr^2(Y, FC)\right)}$.
How can we obtain consistent estimators of the unknown structural
parameters?
Two stage least squares or an appropriate application of instrumental variables estimation
provides a solution. It is instructive to consider an alternative estimator first. Recall that the
ordinary least squares estimators of the reduced form equations (referred to as least squares no
restrictions, LSNR) will yield unbiased and consistent estimators of the $\pi_{ij}$'s, which will be
denoted by $\hat\pi_{ij}$. This observation provides the basis for obtaining consistent estimators of $\alpha$ and
$\beta$ in the Haavelmo model. From (2.5c,e) we note that
$\beta = \pi_{12}/\pi_{22}$;
hence, a consistent estimator of $\beta$ can be obtained from
(2.12) $\beta^* = \hat\pi_{12}/\hat\pi_{22}$
where
$\hat\pi_{12} = \dfrac{\sum(C-\bar C)(Z-\bar Z)}{\sum(Z-\bar Z)^2}$, $\hat\pi_{22} = \dfrac{\sum(Y-\bar Y)(Z-\bar Z)}{\sum(Z-\bar Z)^2}$
or
(2.13) $\beta^* = \dfrac{\sum(C-\bar C)(Z-\bar Z)}{\sum(Y-\bar Y)(Z-\bar Z)}$
In order to verify the consistency of $\beta^*$ in (2.13) we replace $(C-\bar C)$ and $(Y-\bar Y)$ in (2.13) by (2.7)
and (2.8) to obtain
(2.14) $\beta^* = \dfrac{\sum\left\{\left[\dfrac{\beta}{1-\beta}(Z-\bar Z) + \dfrac{\varepsilon-\bar\varepsilon}{1-\beta}\right](Z-\bar Z)\right\}}{\sum\left\{\left[\dfrac{1}{1-\beta}(Z-\bar Z) + \dfrac{\varepsilon-\bar\varepsilon}{1-\beta}\right](Z-\bar Z)\right\}}$
$= \dfrac{\beta\sum(Z-\bar Z)^2/N + \sum(\varepsilon-\bar\varepsilon)(Z-\bar Z)/N}{\sum(Z-\bar Z)^2/N + \sum(\varepsilon-\bar\varepsilon)(Z-\bar Z)/N}$
Now as N → ∞
$\beta^* \to \beta$;
hence, $\beta^*$ is a consistent estimator and is obtained by obtaining consistent estimators of the
reduced form (LSNR) and then deducing corresponding estimates of structural coefficients. This
general method is referred to as indirect least squares (ILS), but it is not applicable for all
structural models.
The consistent estimator $\beta^*$ can also be obtained by replacing the dependent variable on the
right hand side of (2.1) by its predicted value (from the reduced form)
$\hat Y = \hat\pi_{21} + \hat\pi_{22}Z$
or
$\hat Y - \bar Y = \hat\pi_{22}(Z-\bar Z)$
and then applying least squares to the resultant expression. More explicitly,
(2.15a-e) $\beta^* = \dfrac{\sum(\hat Y-\bar Y)(C-\bar C)}{\sum(\hat Y-\bar Y)^2}$
$= \dfrac{\hat\pi_{22}\sum(Z-\bar Z)(C-\bar C)}{\hat\pi_{22}^2\sum(Z-\bar Z)^2}$
$= \dfrac{1}{\hat\pi_{22}}\dfrac{\sum(Z-\bar Z)(C-\bar C)}{\sum(Z-\bar Z)^2}$
$= \left\{\dfrac{\sum(Z-\bar Z)^2}{\sum(Y-\bar Y)(Z-\bar Z)}\right\}\left\{\dfrac{\sum(Z-\bar Z)(C-\bar C)}{\sum(Z-\bar Z)^2}\right\}$
$= \dfrac{\sum(Z-\bar Z)(C-\bar C)}{\sum(Y-\bar Y)(Z-\bar Z)}$
which corresponds to (2.13). Compare (2.15a) with (2.6) and note that the only difference is that
$\hat Y$ (predicted value) replaces Y in (2.6). The structural estimator obtained by applying least
squares to the structural equation, after the right hand dependent
variables have been replaced by their reduced form predictions, is referred to as two stage least squares (2SLS).
2SLS yields consistent estimators, and is applicable even when indirect least squares is not.
Another way of looking at the alternative estimator is obtained by comparing (2.6) and (2.15e).
Here we see that the difference is that the right hand side dependent variable Y in (2.6) is
replaced by Z (an instrumental variable), which is correlated with Y but not with $\varepsilon$; hence, these
estimators are sometimes referred to as instrumental variables estimators.
A numerical example: the Haavelmo data set (Haavelmo.dat).
Using the data provided by Haavelmo, the regular OLS estimates of the consumption
function are given by
OLS: $\hat C = 84.01 + .732Y$
$s(\hat\beta)$: (14.55) (.030)
$R^2 = .971$
$s^2 = 58.21$.
The corresponding 2SLS estimates of the consumption function are given by
2SLS: $\hat C = 113.1 + .672Y$
(17.8) (.037)
$s^2 = 71.29$.
The LSNR estimates of the reduced form equations are given by
$\hat C = 344.70 + 2.048Z$
(16.48) (.341)
$R^2 = .668$
$\hat Y = 344.70 + 3.048Z$
(16.48) (.341)
$R^2 = .668$
The reader should verify that the indirect least squares estimators are equal to the 2SLS.
However, except for pedagogical examples, the reader will apply 2SLS or instrumental variables
estimation directly and not use the two step procedure. Also, the two step procedure yields
incorrect standard errors.
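The point estimates above can be reproduced from the data listed at the end of this section. The following sketch is only an illustration in Python rather than Stata; the helper names mean and slope are hypothetical. It computes the OLS slope (2.6), the LSNR reduced form slopes, the indirect least squares ratio (2.12), and the two step version of 2SLS (2.15a). As noted above, the two step calculation is useful for point estimates only, so no standard errors are computed.

```python
# (Y, C, Z) observations as listed in the Haavelmo data table of these notes.
DATA = [
    (433, 394, 39), (483, 423, 60), (479, 437, 42), (486, 434, 52),
    (494, 447, 47), (498, 447, 51), (511, 466, 45), (534, 474, 60),
    (478, 439, 39), (440, 399, 41), (372, 350, 22), (381, 364, 17),
    (419, 392, 27), (449, 416, 33), (511, 463, 48), (520, 469, 51),
    (477, 444, 33), (517, 471, 46), (548, 494, 54), (629, 529, 100),
]


def mean(v):
    return sum(v) / len(v)


def slope(x, y):
    """Least squares slope from regressing y on x with an intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


y = [float(row[0]) for row in DATA]
c = [float(row[1]) for row in DATA]
z = [float(row[2]) for row in DATA]

# OLS on the structural equation (2.1).
b_ols = slope(y, c)
a_ols = mean(c) - b_ols * mean(y)

# LSNR: OLS on the reduced form equations (2.3) and (2.4).
pi12 = slope(z, c)
pi22 = slope(z, y)
pi21 = mean(y) - pi22 * mean(z)

# Indirect least squares (2.12) and two step 2SLS (2.15a).
b_ils = pi12 / pi22
y_hat = [pi21 + pi22 * zi for zi in z]
b_2sls = slope(y_hat, c)
a_2sls = mean(c) - b_2sls * mean(y_hat)

print(round(a_ols, 2), round(b_ols, 3))
print(round(pi12, 3), round(pi22, 3))
print(round(a_2sls, 1), round(b_ils, 3), round(b_2sls, 3))
```

The indirect least squares and two step 2SLS slopes coincide here because the consumption function is exactly identified.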
CONFIDENCE INTERVALS. In determining confidence intervals for structural
parameters, the reader might be inclined to use the results associated with the OLS or 2SLS
estimates of the structural equation under consideration. As an example of this we compute
"95% confidence intervals for β (the MPC)."
(a) Based upon OLS (t = 2.101):
$\hat\beta_{OLS} \pm t\,s_{\hat\beta} = (.732 \pm 2.101(.0299)) = (.669, .795)$
(b) Based upon 2SLS:
$\hat\beta_{2SLS} \pm t\,s_{\hat\beta} = (.672 \pm 2.101(.0368)) = (.594, .748)$
These confidence intervals are very different and one might ask which, if either, is appropriate. As
it turns out, neither is completely satisfactory since
$\dfrac{\hat\beta - \beta}{s_{\hat\beta}}$
is not exactly distributed as a t-statistic where $\hat\beta$ is obtained from the technique of OLS or 2SLS.
One way in which we can determine which (if either) of the previous confidence intervals is
closest is to note that
$\dfrac{\hat\pi_{ij} - \pi_{ij}}{s_{\hat\pi_{ij}}} \sim t(n-2)$;
hence,
$1 - \alpha = \Pr\left[-t_{\alpha/2} \le \dfrac{\hat\pi_{22} - \pi_{22}}{s_{\hat\pi_{22}}} \le t_{\alpha/2}\right]$
$= \Pr\left[\hat\pi_{22} - t_{\alpha/2}s_{\hat\pi_{22}} \le \pi_{22} \le \hat\pi_{22} + t_{\alpha/2}s_{\hat\pi_{22}}\right]$
$= \Pr\left[\hat\pi_{22} - t_{\alpha/2}s_{\hat\pi_{22}} \le \dfrac{1}{1-\beta} \le \hat\pi_{22} + t_{\alpha/2}s_{\hat\pi_{22}}\right]$
$= \Pr\left[1 - \dfrac{1}{\hat\pi_{22} - t_{\alpha/2}s_{\hat\pi_{22}}} \le \beta \le 1 - \dfrac{1}{\hat\pi_{22} + t_{\alpha/2}s_{\hat\pi_{22}}}\right]$.
Making the appropriate substitutions we obtain
(.57, .73)
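The substitution can be sketched as follows, using the LSNR estimate of the slope in the reduced form equation for Y (3.048), its standard error (.341), and the critical value t = 2.101 quoted above. The function name is hypothetical, and the guard against a nonpositive lower bound is an added assumption needed for the inversion to preserve the ordering of the inequalities.

```python
def beta_interval_from_pi22(pi22_hat, se_pi22, t_crit):
    """Map a confidence interval for pi22 = 1/(1 - beta) into one for beta."""
    lo_pi = pi22_hat - t_crit * se_pi22
    hi_pi = pi22_hat + t_crit * se_pi22
    if lo_pi <= 0:
        raise ValueError("interval for pi22 must be strictly positive")
    # beta = 1 - 1/pi22 is increasing in pi22 for pi22 > 0.
    return 1.0 - 1.0 / lo_pi, 1.0 - 1.0 / hi_pi


lower, upper = beta_interval_from_pi22(3.048, 0.341, 2.101)
print(round(lower, 2), round(upper, 2))
```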
which is much closer to the results obtained using two stage least squares than from OLS. One might
be inclined to conjecture that a reason for the poor performance of OLS confidence intervals is
the asymptotic bias of the OLS estimator,
$\dfrac{(1-\beta)\sigma^2}{\sigma_Z^2 + \sigma^2}$.
It might be instructive to estimate the asymptotic bias. Using the OLS estimates of
$\sigma^2$ ($s^2 = 58.2$) and $\beta$ ($\hat\beta = .732$), together with $\sigma_Z^2$ (285.55), we obtain asymptotic bias($\hat\beta_{OLS}$) = .0454; using the 2SLS estimates
of $\sigma^2$ ($s^2 = 71.29$) and $\beta$ ($\hat\beta = .672$), together with $\sigma_Z^2$ (285.55), we obtain asymptotic bias($\hat\beta_{OLS}$) = .0655. Note that the
difference between the OLS and 2SLS estimates is (.732 - .672 = .06).
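These two estimates of the asymptotic bias follow directly from the second term in (2.10). A minimal sketch, with a hypothetical function name and the values quoted in the text as inputs:

```python
def asymptotic_bias(beta, sigma2, sigma2_z):
    """Second term of (2.10): (1 - beta) * sigma^2 / (sigma_Z^2 + sigma^2)."""
    return (1.0 - beta) * sigma2 / (sigma2_z + sigma2)


bias_at_ols = asymptotic_bias(0.732, 58.2, 285.55)     # OLS estimates plugged in
bias_at_2sls = asymptotic_bias(0.672, 71.29, 285.55)   # 2SLS estimates plugged in
print(round(bias_at_ols, 4), round(bias_at_2sls, 4))
```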
PREDICTIONS. In order to make predictions, one should use the reduced form
representation.
3. A BRIEF OVERVIEW
The mathematical formulation of an economic model is generally referred to as the
structural representation. The structural equations in the structural representation will often
include endogenous regressors (endogenous variables on the right hand side) as well as
exogenous variables.
The reduced form representation corresponding to the structural representation is
characterized by separate equations expressing each dependent variable as a function of the
exogenous variables. The reduced form provides explicit expressions for the equilibrium for the
model, conditional on an arbitrary, but given, set of values for the exogenous variables. The
reduced form coefficients can be interpreted as "multipliers" and yield comparative static results.
The reduced form representation is usually the form used for obtaining forecasts from
econometric models.
After the econometrician is satisfied that a given econometric model is consistent with
relevant economic theory, it is important that each structural equation be identified.
Identification should be checked even before attempting to estimate the model. A necessary
condition (order condition) for a structural equation to be identified is that the number of
exogenous (predetermined) variables excluded ($K_2$) from a structural equation is at least as large
as the number of endogenous regressors (one less than the number of endogenous variables in the
equation being checked ($G_\Delta$)),
$K_2 \ge G_\Delta - 1$.
If $K_2$ is thought of as referring to instrumental variables, then the necessary condition for
identification is that there must be at least as many instrumental variables as endogenous
regressors. This condition must be satisfied for each structural equation. The values for $K_2$ and
$G_\Delta$ may vary from one equation to another. Identities do not contain unknown parameters and
need not be checked for identification.
OLS estimates of parameters in structural models are typically biased and inconsistent
with unreliable t-statistics. This is due to the correlation between the error and endogenous
regressor on the right hand side of the equation. Two stage least squares estimators (2SLS)
provide biased, but consistent estimators. They can also be viewed as instrumental variables
estimators.
The Stata command for 2SLS is
ivregress 2sls y1 X1 (y2 y3 = X2)
where Y = endogenous variables (y1 on lhs, y2 and y3 on the rhs),
X1 = exogenous variables in structural equation being estimated,
X2 = Z = exogenous variables in the model, but excluded from the equation being
estimated. The variables in X2 are often called instruments. An alternative form for the two
stage estimators is given by
ivregress 2sls y1 X1 (y2 y3 = X1 X2)
Example 1: See the problem set for some sample data
Demand: $Q = \gamma_{11} - \beta_{12}P + \gamma_{12}Y + \varepsilon_{1t}$
Supply: $Q = \gamma_{21} + \beta_{22}P - \gamma_{23}FC + \varepsilon_{2t}$
ENDOGENOUS VARIABLES: Q, P
EXOGENOUS VARIABLES: Y, FC
(a) Identification
(1) Demand
$K_2 = 1$: FC is in the supply equation, but not in the demand equation.
$G_\Delta - 1 = 2 - 1 = 1$: one endogenous regressor (P) in the demand equation.
(2) Supply
$K_2 = 1$: Y is in the demand equation, but not in the supply equation.
$G_\Delta - 1 = 2 - 1 = 1$: one endogenous regressor (P) in the supply equation.
Therefore $K_2 \ge G_\Delta - 1$ is satisfied for the supply and demand equations.
(b) Estimation of the structural parameters (Stata commands)
(1) Demand
ivregress 2sls Q Y (P = FC) or ivregress 2sls Q Y (P=Y FC)
(2) Supply
ivregress 2sls Q FC (P = Y) or ivregress 2sls Q FC (P=Y FC)
(c) Estimation of the reduced form (Stata commands)
(1) Q Equation
reg Q Y FC
(2) P Equation
reg P Y FC
Example 2. Consider the Haavelmo model and data:
C
t
= α + βY
t
+ ε
t
Y
t
= C
t
+ Z
t
(a) Identification
The exogenous variable Z is not included in the consumption function, but it is in the
identity.
(b) Estimation of the structural parameters (Stata commands)
ivregress 2sls c (y = z)
(c) Estimation of the reduced form parameters (Stata commands)
reg c z
reg y z
The data used by Haavelmo are given below:
Y C Z
433 394 39
483 423 60
479 437 42
486 434 52
494 447 47
498 447 51
511 466 45
534 474 60
478 439 39
440 399 41
372 350 22
381 364 17
419 392 27
449 416 33
511 463 48
520 469 51
477 444 33
517 471 46
548 494 54
629 529 100
References
Haavelmo, T. "Methods of Measuring the Marginal Propensity to Consume," Journal of the
American Statistical Association, 42 (1947): 105-122.
Working, E. J. "What Do Statistical Demand Curves Show?," Quarterly Journal of Economics,
41 (1927): 212-235.
4. PROBLEM SET 6: Simultaneous Equations
Consider the following Supply and Demand Model:
Demand: $Q_t = \gamma_{11} + \beta_{12}P_t + \gamma_{12}Y_t + e_{t1}$
Supply: $Q_t = \gamma_{21} + \beta_{22}P_t + \gamma_{23}FC_t + e_{t2}$
where $Q_t$, $P_t$, $Y_t$ and $FC_t$ denote quantity, price, income and factor costs.
Observations on these variables are given by:
P_t    185  215  275  279  310  330  400  360  450  515
Q_t    320  360  460  460  480  540  600  570  680  780
Y_t    100  120  160  164  180  200  240  220  280  320
FC_t   10   12   14   15   20   16   24   20   28   30
1. Express the reduced form representation in terms of the structural coefficients.
2. Determine which of the structural coefficients can be expressed in terms of the reduced
form coefficients and make this relationship explicit where possible.
3. Determine whether the supply and demand equations are identified. Check the order
(necessary) condition in your analysis.
4. Estimate the reduced form equations for P and Q using the technique of Least Squares
(LSNR). (Hint: In Stata, type reg q Y FC and reg p Y FC)
a) Test for the presence of autocorrelation.
b) Test for heteroskedasticity using the results from the "whitetst" or "hettest" commands
in Stata.
5. Estimate the supply and demand equations using OLS.
6. Estimate the supply and demand equations using 2SLS (“ivregress” in Stata).
7. Comment on the properties of the estimators associated with questions (5) and (6).
8. Indicate how you could test the following hypotheses and discuss any related problems.
a) $\beta_{12} = 2$
b) $\gamma_{12} = 0$
c) $\pi_{12} = 2.5$
d) $\pi_{12} = 0$
9. What implication does $\pi_{22} = 0$, the coefficient of FC in the reduced form equation for P, have
with respect to the identification of any of the structural equations?