
Chapter 2: Introduction to Simultaneous Equation Models

So far we have been concerned exclusively with single-equation models, i.e., models in which there
is a single dependent variable Y and one or more explanatory variables, the X's. In such
models the emphasis is on estimating and/or predicting the average value of Y conditional on
fixed values of the X variables. The cause-and-effect relationship in such models therefore runs
from the X's to Y.
But in many situations such a one-way or unidirectional cause-and-effect relationship may not
hold. This occurs if Y is determined by the X's and some of the X's are, in turn, determined by Y.
Under such a circumstance, there is a two-way or simultaneous relationship between Y and
(some of) the X's, which makes the distinction between dependent and independent variables
dubious.
A model is said to constitute a system of simultaneous equations if all of the relationships
involved are required for determining the values of at least one of the endogenous variables
included in the model. A simultaneous-equation model contains more than one equation: one
for each of the mutually (jointly) dependent, or endogenous, variables. The variables that are
truly non-stochastic, or are regarded as non-stochastic, are called exogenous variables. Exogenous
variables, together with lagged endogenous variables used as regressors, are called predetermined variables.
Unlike in a single-equation model, in a simultaneous-equation model one may not estimate the
parameters of a single equation without taking into account the information provided by the other
equations in the system.
2.1 OLS Estimation Techniques and the Simultaneity Bias
What happens if the parameters of each equation are estimated by applying, say, the method of
OLS, disregarding other equations in the system?
OLS is not appropriate for estimating a single equation embedded in a system of simultaneous
equations, because one or more of the explanatory variables are correlated with the disturbance
term of that equation, which violates one of the assumptions of the classical linear regression
model. If OLS is applied nevertheless, the resulting estimators are biased and inconsistent.
To illustrate this, consider the simple Keynesian model of income determination given below:

Consumption function:
$$C_t = \beta_0 + \beta_1 Y_t + u_t, \qquad 0 < \beta_1 < 1 \qquad \text{(EQ 1)}$$

Income identity:
$$Y_t = C_t + I_t \quad (I_t = S_t) \qquad \text{(EQ 2)}$$

where
$Y_t$ - national income
$C_t$ - consumption expenditure
$I_t$ - investment expenditure
$S_t$ - saving
$u_t$ - stochastic disturbance term
EQ 1 and EQ 2 are known as structural, or behavioural, equations because they portray the
structure of an economy (or of an economic model) or the behaviour of an economic agent
(e.g., a consumer or producer); their parameters are called structural (or behavioural) parameters.
Moreover, EQ 1 and EQ 2 show that the variables entering a simultaneous-equation model are of
two types: endogenous, that is, variables whose values are determined within the model;
and predetermined, that is, variables whose values are determined outside the model. The
endogenous variables are regarded as stochastic, whereas the predetermined variables are treated
as non-stochastic. The predetermined variables fall into two categories: exogenous variables,
current as well as lagged, and lagged endogenous variables.
For instance, consider the following system of simultaneous equations model:
$$Y_{1t} = \beta_0 + \beta_1 X_t + \beta_2 Y_{1(t-1)} + \beta_3 X_{t-1} + \beta_4 Y_{2t} + u_t$$
$$Y_{2t} = \alpha_0 + \alpha_1 X_t + \alpha_2 Y_{2(t-1)} + \alpha_3 X_{t-1} + \alpha_4 Y_{1t} + \varepsilon_t$$
Thus, Xt is a current (present-time) exogenous variable, whereas Xt−1 is a lagged exogenous
variable, with a lag of one time period. Y1(t−1) is a lagged endogenous variable with a lag of one
time period, but since the value of Y1(t−1) is known at the current time t, it is regarded as non-
stochastic, hence, a predetermined variable. In short, current exogenous, lagged exogenous, and
lagged endogenous variables are deemed predetermined; their values are not determined by the
model in the current time period.
Suppose that we want to estimate the parameters of the consumption function. Assuming that
$E(u_t) = 0$, $E(u_t^2) = \sigma^2$, $E(u_t u_{t+j}) = 0$ (for $j \neq 0$), and $\operatorname{cov}(I_t, u_t) = 0$, which are the assumptions of the
classical linear regression model, we first show that $Y_t$ (the explanatory variable) and $u_t$ in EQ 1 are
correlated and then prove that the OLS estimator $\hat{\beta}_1$ is an inconsistent estimator of $\beta_1$. To prove this,
we proceed as follows:
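Substituting EQ 1 into EQ 2 gives the following equation:
$$Y_t = \beta_0 + \beta_1 Y_t + u_t + I_t$$
$$\Rightarrow\; Y_t = \frac{\beta_0}{1-\beta_1} + \frac{1}{1-\beta_1} I_t + \frac{u_t}{1-\beta_1}$$
Now, since $E(u_t) = 0$ and $I_t$ is treated as non-stochastic,
$$E(Y_t) = \frac{\beta_0}{1-\beta_1} + \frac{1}{1-\beta_1} I_t$$
which implies
$$Y_t - E(Y_t) = \frac{u_t}{1-\beta_1}$$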

Hence, the covariance between $Y_t$ (the explanatory variable) and $u_t$ is given by:

$$\operatorname{Cov}(Y_t, u_t) = E\big[(Y_t - E(Y_t))(u_t - E(u_t))\big] = \frac{E(u_t^2)}{1-\beta_1} = \frac{\sigma^2}{1-\beta_1}$$

Since σ2 is positive by assumption, the covariance between Y and u is different from zero, which
violates the assumption of the classical linear regression model that the disturbance terms are
independent or at least uncorrelated with the explanatory variables.
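To show that the OLS estimator $\hat{\beta}_1$ is a biased and inconsistent estimator of $\beta_1$ because of the
correlation between $Y_t$ and $u_t$, we proceed as follows:
$$\hat{\beta}_1 = \frac{\sum (C_t - \bar{C})(Y_t - \bar{Y})}{\sum (Y_t - \bar{Y})^2} = \frac{\sum c_t y_t}{\sum y_t^2} = \frac{\sum C_t y_t}{\sum y_t^2}$$
where lowercase letters denote deviations from sample means, $c_t = C_t - \bar{C}$ and $y_t = Y_t - \bar{Y}$.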
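Substituting for $C_t$ from EQ 1, and using $\sum y_t = 0$ and $\sum Y_t y_t = \sum y_t^2$, we obtain:

$$\hat{\beta}_1 = \frac{\sum (\beta_0 + \beta_1 Y_t + u_t)\, y_t}{\sum y_t^2} = \beta_1 + \frac{\sum y_t u_t}{\sum y_t^2} \qquad \text{(EQ 3)}$$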

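Taking expectations on both sides of EQ 3, we obtain:

$$E(\hat{\beta}_1) = \beta_1 + E\left[\frac{\sum y_t u_t}{\sum y_t^2}\right]$$

Because $y_t$ and $u_t$ are correlated, the second term is not zero in general, so $\hat{\beta}_1$ is a biased
estimator of the true parameter. It is not only biased but also inconsistent, since the bias does
not disappear as the sample size increases.

Unfortunately, we cannot evaluate $E\left[\sum y_t u_t / \sum y_t^2\right]$ directly: the expectation operator is
linear, so the expectation of a ratio is not, in general, the ratio of the expectations. We therefore
work with probability limits instead.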
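An estimator is said to be consistent if its probability limit (plim) is equal to its true population
value. Therefore, to show that $\hat{\beta}_1$ in EQ 3 is an inconsistent estimator of $\beta_1$, we must show that its
plim is not equal to the true $\beta_1$. Applying the rules of probability limits to EQ 3, we obtain:

$$
\begin{aligned}
\underset{n\to\infty}{\operatorname{plim}}\,(\hat{\beta}_1)
&= \underset{n\to\infty}{\operatorname{plim}}\,(\beta_1) + \underset{n\to\infty}{\operatorname{plim}}\left[\frac{\sum y_t u_t}{\sum y_t^2}\right] \\
&= \beta_1 + \underset{n\to\infty}{\operatorname{plim}}\left[\frac{\sum y_t u_t / n}{\sum y_t^2 / n}\right] \\
&= \beta_1 + \frac{\underset{n\to\infty}{\operatorname{plim}}\left(\sum y_t u_t / n\right)}{\underset{n\to\infty}{\operatorname{plim}}\left(\sum y_t^2 / n\right)} \\
&= \beta_1 + \frac{\sigma^2 / (1-\beta_1)}{\sigma_Y^2} \qquad \text{(EQ 4)}
\end{aligned}
$$

where $\sigma_Y^2$ is the variance of $Y_t$.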

Given that $0 < \beta_1 < 1$ and that $\sigma^2$ and $\sigma_Y^2$ are both positive, it is obvious from EQ 4 that plim($\hat{\beta}_1$)
will always be greater than $\beta_1$; that is, $\hat{\beta}_1$ will overstate the true $\beta_1$. In other words, $\hat{\beta}_1$ is a biased
estimator, and the bias will not disappear no matter how large the sample size is.
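The result in EQ 4 can be checked numerically. The following is a minimal simulation sketch and not part of the original derivation: the parameter values, the normal distributions assumed for $I_t$ and $u_t$, and the helper names `simulate_keynesian` and `ols_slope` are all illustrative assumptions. When investment is independent of the disturbance, $\sigma_Y^2 = (\sigma_I^2 + \sigma^2)/(1-\beta_1)^2$, so EQ 4 reduces to plim($\hat{\beta}_1$) $= \beta_1 + (1-\beta_1)\sigma^2/(\sigma_I^2 + \sigma^2)$.

```python
import random

def simulate_keynesian(n, beta0=2.0, beta1=0.6, sigma_u=1.0, sigma_i=1.0, seed=0):
    """Draw n observations (C_t, Y_t) from EQ 1 and EQ 2 (illustrative values)."""
    rng = random.Random(seed)
    C, Y = [], []
    for _ in range(n):
        inv = 5.0 + rng.gauss(0.0, sigma_i)     # exogenous investment I_t
        u = rng.gauss(0.0, sigma_u)             # disturbance, independent of I_t
        y = (beta0 + inv + u) / (1.0 - beta1)   # income solved from EQ 1 and EQ 2
        C.append(beta0 + beta1 * y + u)         # consumption function, EQ 1
        Y.append(y)
    return C, Y

def ols_slope(x, y):
    """Slope of the OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

beta1, sigma_u, sigma_i = 0.6, 1.0, 1.0
C, Y = simulate_keynesian(100_000, beta1=beta1, sigma_u=sigma_u, sigma_i=sigma_i)

b_ols = ols_slope(Y, C)  # OLS applied directly to the consumption function
plim_eq4 = beta1 + (1 - beta1) * sigma_u**2 / (sigma_i**2 + sigma_u**2)

print(f"true beta1 = {beta1}, OLS estimate = {b_ols:.3f}, plim from EQ 4 = {plim_eq4:.3f}")
```

Under these assumed values the OLS estimate settles near the plim implied by EQ 4 rather than near the true $\beta_1$, and increasing the sample size does not close the gap.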

In summary, for a single-equation regression model we require that E(u|X) = E(u) = 0, that is,
u must be (mean-)independent of each X.
• If this condition is not satisfied, the OLS estimators will be biased and inconsistent
(as shown above for the simultaneous-equation case).
• There are several ways in which this assumption might be violated:
– endogenous regressors: the equation is part of a simultaneous-equation model and one
of the regressors is endogenous (jointly determined), which gives rise to the
simultaneous-equation bias discussed above;
– an omitted variable that is correlated with one of the included variables;
– systematic measurement error in one or more of the X variables, such that the
observed values are not independent of the disturbance.
2.2 Rank and Order Conditions for Identification
Notations and Definitions: The general model of G equations in G endogenous, or jointly
dependent, variables may be written as follows:
$$
\begin{aligned}
\gamma_{11} Y_{1t} + \gamma_{12} Y_{2t} + \gamma_{13} Y_{3t} + \cdots + \gamma_{1G} Y_{Gt} + \beta_{11} X_{1t} + \beta_{12} X_{2t} + \cdots + \beta_{1K} X_{Kt} &= u_{1t} \\
\gamma_{21} Y_{1t} + \gamma_{22} Y_{2t} + \gamma_{23} Y_{3t} + \cdots + \gamma_{2G} Y_{Gt} + \beta_{21} X_{1t} + \beta_{22} X_{2t} + \cdots + \beta_{2K} X_{Kt} &= u_{2t} \\
\gamma_{31} Y_{1t} + \gamma_{32} Y_{2t} + \gamma_{33} Y_{3t} + \cdots + \gamma_{3G} Y_{Gt} + \beta_{31} X_{1t} + \beta_{32} X_{2t} + \cdots + \beta_{3K} X_{Kt} &= u_{3t} \\
&\;\;\vdots \\
\gamma_{G1} Y_{1t} + \gamma_{G2} Y_{2t} + \gamma_{G3} Y_{3t} + \cdots + \gamma_{GG} Y_{Gt} + \beta_{G1} X_{1t} + \beta_{G2} X_{2t} + \cdots + \beta_{GK} X_{Kt} &= u_{Gt}
\end{aligned}
$$
where
$Y_1, Y_2, \ldots, Y_G$ - the G endogenous, or jointly dependent, variables
$X_1, X_2, \ldots, X_K$ - the K predetermined variables (one of these X variables may take a value of unity to allow for the intercept in each equation)
$u_1, u_2, \ldots, u_G$ - the G stochastic disturbances
$t = 1, 2, \ldots, T$ - observation index, where T is the total number of observations
γ's - coefficients of the endogenous variables
β's - coefficients of the predetermined variables

The variables entering the above system of equations are classified into two groups:
endogenous variables (variables that are determined within the system or the model)
and predetermined variables (variables that are determined outside the system or the
model). Moreover, the equations of such a system are known as structural, or
behavioural, equations, as they may portray the structure of an economy (or of an economic model)
or the behaviour of an economic agent. The γ's and β's are known as the structural parameters or
coefficients of the model.

From the structural equations one can solve for the G endogenous variables as the functions of
predetermined variables and error terms and derive the reduced-form equations and the
associated reduced-form coefficients. A reduced-form equation is one that expresses an
endogenous variable solely in terms of the predetermined variables and the stochastic
disturbance terms.
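Example 1:
$$C_t = \beta_0 + \beta_1 Y_t + u_t, \qquad 0 < \beta_1 < 1 \qquad \text{(EQ4)}$$
$$Y_t = C_t + I_t \qquad \text{(EQ5)}$$
Now substitute EQ5 into EQ4:
$$
\begin{aligned}
C_t &= \beta_0 + \beta_1 (C_t + I_t) + u_t \\
C_t &= \beta_0 + \beta_1 C_t + \beta_1 I_t + u_t \\
C_t (1 - \beta_1) &= \beta_0 + \beta_1 I_t + u_t \\
C_t &= \frac{\beta_0}{1-\beta_1} + \frac{\beta_1}{1-\beta_1} I_t + \frac{u_t}{1-\beta_1}
\end{aligned}
$$
$$C_t = \pi_{10} + \pi_{11} I_t + v_t \qquad \text{(EQ6)}$$
where
$$\pi_{10} = \frac{\beta_0}{1-\beta_1}, \qquad \pi_{11} = \frac{\beta_1}{1-\beta_1}, \qquad v_t = \frac{u_t}{1-\beta_1}$$

Now substituting the expression for $C_t$ given in EQ6 into EQ5:
$$
\begin{aligned}
Y_t &= \frac{\beta_0}{1-\beta_1} + \frac{\beta_1}{1-\beta_1} I_t + \frac{u_t}{1-\beta_1} + I_t \\
Y_t &= \frac{\beta_0}{1-\beta_1} + \frac{1}{1-\beta_1} I_t + \frac{u_t}{1-\beta_1}
\end{aligned}
$$
$$Y_t = \pi_{20} + \pi_{21} I_t + v_t \qquad \text{(EQ7)}$$
where
$$\pi_{20} = \frac{\beta_0}{1-\beta_1}, \qquad \pi_{21} = \frac{1}{1-\beta_1}, \qquad v_t = \frac{u_t}{1-\beta_1}$$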

EQ6 and EQ7 are known as the reduced-form equations of the structural equations. Since only
the predetermined variables and the stochastic disturbance terms appear on the right-hand
side of the reduced-form equations, and since the predetermined variables are assumed to
be uncorrelated with the disturbance terms, OLS can be applied to estimate the coefficients
of the reduced-form equations (the π's). From the estimated reduced-form coefficients, one
may then recover estimates of the structural coefficients (the β's). This estimation procedure
is known as indirect least squares (ILS), and the estimated structural coefficients are called
ILS estimates because they are obtained indirectly from the estimates of the reduced-form
parameters. ILS estimators are consistent, i.e., they converge to the true parameter values as
the sample size grows, although they are generally biased in small samples. However, the
method does not directly provide standard errors for the structural parameter estimates, so
the precision of the estimates cannot readily be assessed and hypothesis testing is not straightforward.
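The ILS procedure for Example 1 can be sketched in a few lines. This is an illustrative simulation under assumed parameter values and normally distributed $I_t$ and $u_t$; the helper `ols` is a hypothetical name introduced here. The reduced form EQ6 is estimated by OLS, and the structural parameters are then recovered by inverting $\pi_{11} = \beta_1/(1-\beta_1)$ and $\pi_{10} = \beta_0/(1-\beta_1)$.

```python
import random

def ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Assumed "true" structural parameters, used only to generate data.
beta0, beta1 = 2.0, 0.6
rng = random.Random(1)
C, I = [], []
for _ in range(100_000):
    inv = 5.0 + rng.gauss(0.0, 2.0)          # exogenous investment
    u = rng.gauss(0.0, 1.0)                  # structural disturbance
    y = (beta0 + inv + u) / (1.0 - beta1)    # reduced form for income, EQ7
    C.append(y - inv)                        # identity EQ5: C_t = Y_t - I_t
    I.append(inv)

# Step 1: estimate the reduced form EQ6 (C on I) by OLS.
pi10_hat, pi11_hat = ols(I, C)

# Step 2: solve for the structural parameters.
beta1_ils = pi11_hat / (1.0 + pi11_hat)      # from pi11 = beta1 / (1 - beta1)
beta0_ils = pi10_hat * (1.0 - beta1_ils)     # from pi10 = beta0 / (1 - beta1)

print(f"ILS estimates: beta0 = {beta0_ils:.3f}, beta1 = {beta1_ils:.3f}")
```

In a large sample the ILS estimates should lie close to the values used to generate the data, which is what consistency means in practice.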
Example 2:
Consider the following demand and supply model of a simple market:

$$Q = a_0 + a_1 P + a_2 Y + u \qquad \text{(EQ8) Demand function}$$
$$Q = b_0 + b_1 P + v \qquad \text{(EQ9) Supply function}$$
where Q is the quantity demanded and supplied in the market, P is a measure of the price of the good,
Y is a measure of consumers' income, and u and v are unobservable stochastic disturbance terms.

This system is a two-equation model with two endogenous variables (Q, P) and one
exogenous variable (Y). OLS estimation of the supply function would yield biased and
inconsistent estimators, because P is correlated with v. Nor is it possible to obtain any sensible
estimators for the demand function, because it is not identified. The reduced form of the system is:

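$$Q = \pi_{10} + \pi_{11} Y + w_1$$
$$P = \pi_{20} + \pi_{21} Y + w_2$$
where:
$$\pi_{10} = \frac{a_0 b_1 - b_0 a_1}{b_1 - a_1}, \qquad \pi_{11} = \frac{a_2 b_1}{b_1 - a_1}, \qquad \pi_{20} = \frac{a_0 - b_0}{b_1 - a_1}, \qquad \pi_{21} = \frac{a_2}{b_1 - a_1}$$
Solving for the structural parameter $b_1$:
$$\frac{\pi_{11}}{\pi_{21}} = \frac{a_2 b_1 / (b_1 - a_1)}{a_2 / (b_1 - a_1)} = b_1$$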

The method of indirect least squares (which is suitable for an exactly identified equation such as
the supply equation here) can be used to obtain the structural parameter estimates indirectly: first
estimate the reduced-form equations by OLS, then solve for the structural parameters
mathematically. Thus, the slope coefficient of the supply equation, $b_1$, can be
consistently estimated. However, the slope parameters of the demand equation, $a_1$ and $a_2$,
cannot be consistently estimated by any estimation method, because the demand equation is not identified.
The Identification Problem
In a simultaneous-equation model, the identification problem is the question of whether numerical
estimates of the parameters of a structural equation can be obtained from the estimated reduced-form
coefficients. If this can be done, we say that the particular equation is identified. If it cannot be
done, we say that the equation under consideration is not identified, or under-identified.
Specifically,
1. An equation is said to be unidentified/not identified iff we cannot get the structural
coefficients from the reduced form estimates.
2. An equation is said to be exactly identified iff we can get unique structural/behavioural
form coefficient estimates from the reduced form estimates.
3. An equation is said to be over-identified iff more than one set of structural coefficients
could be obtained from the reduced form estimates.
Thus, an identified equation may be either exactly (or fully or just) identified or over identified.
An equation is said to be exactly or just identified if unique numerical values of the structural
parameters can be obtained. An equation is said to be over identified if more than one numerical
value can be obtained for some of the parameters of the structural equations.

Consider the above demand-supply model, where one can substitute the reduced-form equations
into the structural equations.
Demand function:
$$\pi_{10} + \pi_{11} Y = a_0 + a_1 (\pi_{20} + \pi_{21} Y) + a_2 Y$$
$$\pi_{10} + \pi_{11} Y = (a_0 + a_1 \pi_{20}) + (a_1 \pi_{21} + a_2) Y$$

Supply function: $Q = b_0 + b_1 P$, but $P = \pi_{20} + \pi_{21} Y$, so
$$\pi_{10} + \pi_{11} Y = (b_0 + b_1 \pi_{20}) + b_1 \pi_{21} Y$$

From these two equations, we can equate the intercept terms as well as the slope coefficients on
the right- and left-hand sides:

DD: $\pi_{10} = a_0 + a_1 \pi_{20}$ and $\pi_{11} = a_1 \pi_{21} + a_2$

SS: $\pi_{10} = b_0 + b_1 \pi_{20}$ and $\pi_{11} = b_1 \pi_{21}$

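The demand function is unidentified because the number of equations (two) is less than the number
of unknowns ($a_0$, $a_1$, and $a_2$). However, the supply function is exactly identified, since:
$$b_1 = \frac{\pi_{11}}{\pi_{21}} \qquad \text{and} \qquad b_0 = \pi_{10} - \frac{\pi_{11}}{\pi_{21}}\, \pi_{20}$$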
Rules for Identification:
There are two rules to test whether a given structural equation in a system of simultaneous
equations is identified or not. These are the order and rank conditions for identification. To
illustrate these conditions, we introduce the following notations:
G - number of endogenous variables in the model
g - number of endogenous variables in a given equation
K - number of predetermined variables in the model
k - number of predetermined variables in a given equation

The Order Condition of Identifiability:

A necessary (but not sufficient) condition for identification, known as the order
condition, refers to the order of a matrix, i.e., the number of rows and columns present in a
matrix. It is defined as follows:
Definition: In a model of G simultaneous equations, in order for an equation to be identified, the
number of predetermined variables excluded from the equation must be greater than or equal to
the number of endogenous variables included in that equation less one, that is,

K-k ≥ g-1 EQ 5
If K-k = g-1, the equation is just identified, but if K-k > g-1, it is over-identified.

The Rank Condition of Identifiability:

In a model containing G equations in G endogenous variables, an equation is identified
if and only if at least one non-zero determinant of order (G-1)x(G-1) can be constructed from the
coefficients of the variables (both endogenous and predetermined) excluded from that particular
equation but included in the other equations of the model. This is a necessary and sufficient
condition for the identifiability of an equation.

Stated in terms of the total number of excluded variables, the order condition requires that, in a
system of G equations (containing G endogenous variables), at least G-1 variables (endogenous
or predetermined) be excluded from a given equation in order for the parameters of that equation
to be identified and consistently estimated.
• Let G denote the number of structural equations. An equation is just identified if the
number of variables excluded from it is exactly G-1.
• If more than G-1 variables are absent, it is over-identified. If fewer than G-1 variables are
absent, it is not identified.

Example 1:
• In the following system of equations, the Y's are endogenous, while the X's are
exogenous. Determine whether each equation is over-, under-, or just-identified.

$$Y_1 = \alpha_0 + \alpha_1 Y_2 + \alpha_3 Y_3 + \alpha_4 X_1 + \alpha_5 X_2 + u_1 \qquad (14)$$
$$Y_2 = \beta_0 + \beta_1 Y_3 + \beta_2 X_1 + u_2 \qquad (15)$$
$$Y_3 = \gamma_0 + \gamma_1 Y_2 + u_3 \qquad (16)$$
Solution:
G = 3, so G-1 = 2:
If # excluded variables = 2, the equation is just identified
If # excluded variables > 2, the equation is over-identified
If # excluded variables < 2, the equation is not identified

First equation: no variables are excluded. Not identified
Second equation: Y1 and X2 are excluded. Just identified
Third equation: Y1, X1 and X2 are excluded. Over-identified
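The counting in this example is mechanical and can be sketched as follows. The function name `order_condition` and the set-based representation of each equation are assumptions made for illustration. The sketch also reports K-k and g-1, since excluding at least G-1 variables in total is equivalent to K-k ≥ g-1. Keep in mind that the order condition is only necessary, so the labels describe what the order condition suggests, not a final verdict.

```python
# Variable lists for Example 1 (intercepts are not counted as variables here).
ENDOGENOUS = {"Y1", "Y2", "Y3"}
EXOGENOUS = {"X1", "X2"}

# Variables appearing in each equation, including the left-hand-side variable.
equations = {
    "first": {"Y1", "Y2", "Y3", "X1", "X2"},
    "second": {"Y2", "Y3", "X1"},
    "third": {"Y3", "Y2"},
}

def order_condition(included):
    """Classify one equation by the order condition.

    Returns (status, number excluded, K - k, g - 1).
    """
    G = len(ENDOGENOUS)
    n_excluded = len((ENDOGENOUS | EXOGENOUS) - included)
    if n_excluded < G - 1:
        status = "not identified"
    elif n_excluded == G - 1:
        status = "just identified"
    else:
        status = "over-identified"
    K_minus_k = len(EXOGENOUS - included)          # excluded predetermined variables
    g_minus_1 = len(ENDOGENOUS & included) - 1     # included endogenous variables less one
    return status, n_excluded, K_minus_k, g_minus_1

results = {name: order_condition(vars_in_eq) for name, vars_in_eq in equations.items()}
for name, (status, n_excl, K_k, g_1) in results.items():
    print(f"{name} equation: excluded = {n_excl}, K-k = {K_k}, g-1 = {g_1} -> {status}")
```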

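Example 2: Consider the following hypothetical system of simultaneous equations in which the
Y variables are endogenous and the X variables are predetermined ($X_{1t} = 1$ for all t, so that
$\beta_{i1}$ is the intercept of equation i).

$$Y_{1t} + \gamma_{12} Y_{2t} + \gamma_{13} Y_{3t} + \beta_{11} + \beta_{12} X_{2t} = u_{1t} \qquad \text{(EQ 10)}$$
$$Y_{2t} + \gamma_{23} Y_{3t} + \beta_{21} + \beta_{22} X_{2t} + \beta_{23} X_{3t} = u_{2t} \qquad \text{(EQ 11)}$$
$$Y_{3t} + \gamma_{31} Y_{1t} + \beta_{31} + \beta_{32} X_{2t} + \beta_{33} X_{3t} = u_{3t} \qquad \text{(EQ 12)}$$
$$Y_{4t} + \gamma_{41} Y_{1t} + \gamma_{42} Y_{2t} + \beta_{41} + \beta_{44} X_{4t} = u_{4t} \qquad \text{(EQ 13)}$$
Assignment: Check the order condition of identifiability of each of the above structural
equations.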

Identifiability of the structural equations can be checked with the rank condition as follows.

Table 1: Table of structural coefficients (coefficients of the variables)

| Equation Number | Y1 | Y2 | Y3 | Y4 | X1 | X2 | X3 | X4 |
|---|---|---|---|---|---|---|---|---|
| EQ 10 | 1 | γ12 | γ13 | 0 | β11 | β12 | 0 | 0 |
| EQ 11 | 0 | 1 | γ23 | 0 | β21 | β22 | β23 | 0 |
| EQ 12 | γ31 | 0 | 1 | 0 | β31 | β32 | β33 | 0 |
| EQ 13 | γ41 | γ42 | 0 | 1 | β41 | 0 | 0 | β44 |

To apply the order condition for identification to each of the above four equations, we may
proceed as follows:

Table 2: Identification of equations using the order condition

| Equation Number | No. of predetermined variables excluded (K-k) | No. of endogenous variables included less one (g-1) | Identified? |
|---|---|---|---|
| EQ 10 | 2 | 2 | Exactly |
| EQ 11 | 1 | 1 | Exactly |
| EQ 12 | 1 | 1 | Exactly |
| EQ 13 | 2 | 2 | Exactly |
To apply the rank condition one may proceed as follows:
1. Write down the system in a tabular form as shown in the above table.
2. Strike out the coefficients of the row in which the equation under consideration appears.
3. Also strike out the columns corresponding to those coefficients in step 2 which are
nonzero.
4. The entries left in the table will then give only the coefficients of the variables included in
the system but not in the equation under consideration. From these entries form all
possible matrices of order G-1 and obtain the corresponding determinant. If at least one
nonzero determinant can be found, the equation is (just or over) identified. If all the
possible (G-1)x(G-1) determinants are zero, the rank of the matrix is less than G-1 and the
equation under investigation is not identified.

For the above example, let us check with the rank condition. Consider, say, the first equation,
which excludes variables Y4, X3 and X4 . For this equation to be identified, we must obtain at
least one nonzero determinant of order 3x3 from the coefficients of the variables excluded from
this equation but included in other equations. To obtain the determinant, from the coefficient
matrix given in table 1, delete row1, columns 1,2, 3, 5, and 6 to obtain the relevant coefficient
matrix of excluded variables. In the present case there is only one such matrix, call it A, defined
as follows:

$$A = \begin{bmatrix} 0 & \beta_{23} & 0 \\ 0 & \beta_{33} & 0 \\ 1 & 0 & \beta_{44} \end{bmatrix}$$

It can be seen that the determinant of this matrix is zero and hence rank (A) < 3. Therefore,
(EQ10) does not satisfy the rank condition and hence is not identified.
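The strike-out procedure can also be carried out programmatically. The sketch below is illustrative only: the unknown coefficients of Table 1 are replaced by arbitrary nonzero integers (assumed values, chosen so that determinants are computed exactly), and the function names `det` and `satisfies_rank_condition` are hypothetical. For each equation it forms every (G-1)x(G-1) matrix from the columns of excluded variables and the rows of the other equations, and looks for a nonzero determinant.

```python
from itertools import combinations

# Table 1 with arbitrary nonzero integers standing in for the unknown
# coefficients (assumed values); zeros mark variables excluded from an equation.
# Columns: Y1, Y2, Y3, Y4, X1, X2, X3, X4
TABLE = [
    [1, 2, 3, 0, 5, 7, 0, 0],        # EQ 10
    [0, 1, 11, 0, 13, 17, 19, 0],    # EQ 11
    [23, 0, 1, 0, 29, 31, 37, 0],    # EQ 12
    [41, 43, 0, 1, 47, 0, 0, 53],    # EQ 13
]

def det(m):
    """Determinant by Laplace expansion along the first row (fine for small matrices)."""
    if len(m) == 1:
        return m[0][0]
    total = 0
    for j, a in enumerate(m[0]):
        if a != 0:
            minor = [row[:j] + row[j + 1:] for row in m[1:]]
            total += (-1) ** j * a * det(minor)
    return total

def satisfies_rank_condition(table, eq):
    """True if some (G-1)x(G-1) determinant of excluded-variable coefficients is nonzero."""
    G = len(table)
    excluded = [j for j, coef in enumerate(table[eq]) if coef == 0]   # struck-out columns removed
    other_rows = [row for i, row in enumerate(table) if i != eq]      # strike out the equation's row
    for cols in combinations(excluded, G - 1):
        candidate = [[row[j] for j in cols] for row in other_rows]
        if det(candidate) != 0:
            return True
    return False

rank_ok = [satisfies_rank_condition(TABLE, eq) for eq in range(len(TABLE))]
for label, ok in zip(["EQ 10", "EQ 11", "EQ 12", "EQ 13"], rank_ok):
    print(label, "satisfies the rank condition" if ok else "fails the rank condition")
```

With these placeholder values the check fails for EQ 10, in line with the determinant of A being zero. It also fails for EQ 11 and EQ 12, because the corresponding matrices contain a column of zeros whatever the coefficient values are, which illustrates that satisfying the order condition does not guarantee identification.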

Note that the rank condition is both a necessary and sufficient condition for identification.

2.3 Testing for Exogeneity

• How do we tell whether variables really need to be treated as endogenous or not?
• Consider again equations (14)-(16). Equation (14) contains Y2 and Y3, but do we really
need equations for them?
• We can formally test this using a Hausman test, which is calculated as follows:
1. Obtain the reduced-form equations corresponding to (14)-(16). The reduced forms turn out to
be:
$$Y_1 = \pi_{10} + \pi_{11} X_1 + \pi_{12} X_2 + v_1$$
$$Y_2 = \pi_{20} + \pi_{21} X_1 + v_2$$
$$Y_3 = \pi_{30} + \pi_{31} X_1 + v_3$$
Estimate the above reduced-form equations using OLS, and obtain the fitted values
$\hat{Y}_1, \hat{Y}_2, \hat{Y}_3$.
2. Run the regression corresponding to the first equation in the structural model, (14).
3. Run the regression (14) again, but now also including the fitted values $\hat{Y}_2$ and $\hat{Y}_3$
as additional regressors:

$$Y_1 = \alpha_0 + \alpha_1 Y_2 + \alpha_3 Y_3 + \alpha_4 X_1 + \alpha_5 X_2 + \lambda_2 \hat{Y}_2 + \lambda_3 \hat{Y}_3 + u_1 \qquad (20)$$

4. Use an F-test to test the joint restriction that $\lambda_2 = 0$ and $\lambda_3 = 0$. If the null hypothesis is
rejected, Y2 and Y3 should be treated as endogenous.
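The four steps can be sketched on simulated data. To keep the example small, the sketch below applies the same augmented-regression idea to the two-equation demand and supply model (EQ8 and EQ9) rather than to (14)-(16), testing whether P needs to be treated as endogenous in the supply function. All parameter values, the normal distributions, and the helper name `ols_fit` are assumptions made for illustration. With a single restriction, the F statistic is compared with the large-sample 5% critical value of roughly 3.84.

```python
import random

def ols_fit(columns, y):
    """OLS of y on the given regressor columns; returns (coefficients, RSS)."""
    k = len(columns)
    # Augmented normal equations [X'X | X'y].
    A = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns]
         + [sum(a * b for a, b in zip(ci, y))] for ci in columns]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            factor = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    coefs = [0.0] * k
    for r in range(k - 1, -1, -1):
        tail = sum(A[r][c] * coefs[c] for c in range(r + 1, k))
        coefs[r] = (A[r][k] - tail) / A[r][r]
    rss = sum((yt - sum(b * col[t] for b, col in zip(coefs, columns))) ** 2
              for t, yt in enumerate(y))
    return coefs, rss

# Simulate the market model with assumed parameter values.
a0, a1, a2, b0, b1 = 10.0, -1.0, 0.5, 2.0, 1.0
rng = random.Random(2)
n = 5000
ones, income, P, Q = [1.0] * n, [], [], []
for _ in range(n):
    inc = 10.0 + rng.gauss(0.0, 2.0)
    u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    p = (a0 - b0 + a2 * inc + u - v) / (b1 - a1)   # reduced form for price
    income.append(inc)
    P.append(p)
    Q.append(b0 + b1 * p + v)                      # supply function, EQ9

# Step 1: reduced form for P, estimated by OLS, and its fitted values.
(pi20_hat, pi21_hat), _ = ols_fit([ones, income], P)
P_hat = [pi20_hat + pi21_hat * inc for inc in income]

# Step 2: structural (restricted) regression of Q on P.
_, rss_restricted = ols_fit([ones, P], Q)

# Step 3: the same regression with the fitted values added.
coefs_aug, rss_unrestricted = ols_fit([ones, P, P_hat], Q)

# Step 4: F statistic for the single restriction lambda = 0.
F = (rss_restricted - rss_unrestricted) / (rss_unrestricted / (n - 3))
print(f"lambda_hat = {coefs_aug[2]:.3f}, F = {F:.1f}")
```

Because P is generated as an endogenous variable in this simulation, the F statistic should be far above the critical value, so the null hypothesis that P can be treated as exogenous is rejected.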

2.4 Estimation techniques: Indirect least squares, two-stage least squares and
instrumental variables (IV)
Indirect least squares: discussed above.
IV and 2SLS
If a regressor (such as P in the supply function, EQ9) is not independent of the disturbance term,
we may nevertheless be able to replace it by another variable (an instrumental variable) that is
• highly correlated with P
• but uncorrelated with the disturbance term
If such a variable is available, we can estimate the supply equation using
Instrumental Variables (IV) estimation. Instrumental variables estimators, although
biased in finite samples, are consistent.
Finding a suitable Instrumental Variable
• How do we find an instrumental variable?
• There are two methods:
– Arbitrary search and test.
– Two-stage least squares.
• Two-Stage Least Squares (2SLS) offers an excellent direct estimation method in the case
of exactly or over-identified equations.
• While it is still a single-equation estimation technique, 2SLS uses the information
available from the specification of the entire equation system.
• In doing so, it is able to provide unique estimates of each structural parameter in an
over-identified equation.
• The first stage involves the creation of an instrument: use the reduced-form equation for
P to get its fitted value, Phat.
• The second stage involves a variant of instrumental variables estimation: replace P by
Phat in the supply equation and apply OLS.
• 2SLS is therefore a special, and perhaps less arbitrary, way of doing instrumental
variables estimation.
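The two stages can be sketched for the supply function EQ9, using consumers' income Y as the source of the instrument. As before, this is a simulation under assumed parameter values and normal disturbances, and the helper `ols` is a hypothetical name. OLS applied directly to the supply function is included for comparison.

```python
import random

def ols(x, y):
    """Return (intercept, slope) of the OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

# Assumed structural parameters of the demand (a) and supply (b) functions.
a0, a1, a2, b0, b1 = 10.0, -1.0, 0.5, 2.0, 1.0
rng = random.Random(3)
income, P, Q = [], [], []
for _ in range(50_000):
    inc = 10.0 + rng.gauss(0.0, 2.0)
    u, v = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    p = (a0 - b0 + a2 * inc + u - v) / (b1 - a1)   # market-clearing price
    income.append(inc)
    P.append(p)
    Q.append(b0 + b1 * p + v)                      # supply function, EQ9

# For comparison: OLS applied directly to the supply function.
_, b1_ols = ols(P, Q)

# Stage 1: regress P on the predetermined variable Y and form the fitted values Phat.
pi20_hat, pi21_hat = ols(income, P)
P_hat = [pi20_hat + pi21_hat * inc for inc in income]

# Stage 2: replace P by Phat in the supply function and apply OLS.
b0_2sls, b1_2sls = ols(P_hat, Q)

print(f"true b1 = {b1}, OLS = {b1_ols:.3f}, 2SLS = {b1_2sls:.3f}")
```

Since the supply function is exactly identified, the 2SLS slope coincides numerically with the ILS ratio of the two reduced-form slopes. Note that the standard errors printed by a naive second-stage OLS routine would not be valid, because they are based on residuals computed with Phat rather than P.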
