I. The Nature and Scope of Econometrics
Introduction
(Joke) E.E. Leamer “There are two things you don’t want to see in the
making – sausage and econometric research.”
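Keynes, in the General Theory, argued that a $1 increase in income leads to an
increase in overall consumption of less than $1.
Keynes didn’t specify the exact nature of the relationship, but it might
suggest a simple linear relationship between consumption (C) and disposable
income (DI), where the slope is the marginal propensity to consume (MPC):
\[ C = \beta_0 + \beta_1 DI, \qquad 0 < \beta_1 < 1 \]
Because the relationship will not hold exactly, we add a disturbance term:
\[ C = \beta_0 + \beta_1 DI + \varepsilon \]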
The only way to estimate the parameters of interest in this model is to obtain the
necessary data. The data could be time series, cross-sectional or panel
data.
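Time series data are collected over time for the same country or other single
aggregate economic unit (e.g., aggregate C and DI could be obtained for
Singapore from 1950 to 2000). In this case, we’d normally rewrite the equation
with a ‘t’ subscript on the variables and disturbance term to denote ‘time’.
\[ C_t = \beta_0 + \beta_1 DI_t + \varepsilon_t \]
Cross-sectional data are collected across different units at a single point in
time. In this case, we’d normally use an ‘i’ subscript to denote the unit.
\[ C_i = \beta_0 + \beta_1 DI_i + \varepsilon_i \]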
Finally, panel data contain elements of both time series and cross-sectional
data (e.g., C and DI could be obtained for all countries in the OECD during the
period 1950-2000). Note that we have variation across countries at any single
point in time, as well as variation across time. In this case, we’d normally
rewrite the equation with both an ‘i’ and a ‘t’ subscript on the variables and
disturbance term to denote ‘country’ and ‘time’.
\[ C_{it} = \beta_0 + \beta_1 DI_{it} + \varepsilon_{it} \]
Time series or cross-sectional data can be plotted as a ‘scatter diagram’.
Now it’s time to estimate the coefficients in the model. The basic idea is to
come up with a ‘line’ that best ‘fits’ the data points. Imagine that this
‘regression analysis’ yields the following consumption function.
\[ \hat{C} = 336.9 + 0.820\,DI \]
These are the estimates of the two coefficients. The ‘hat’ on C indicates that this
is an ‘estimated’ consumption function or regression model.
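The idea of a best-fitting line can be sketched in a few lines of Python. This is only an illustration: it assumes that ‘best fit’ means ordinary least squares (the usual choice, though the notes have not named a method yet), the data are made up for the example and are not the data behind the estimates above, and the function name ols_line is hypothetical.

```python
import numpy as np

# Hypothetical data (NOT from the notes): disposable income and consumption.
di = np.array([10_000.0, 20_000.0, 30_000.0, 40_000.0, 50_000.0])
c = np.array([8_600.0, 16_700.0, 25_000.0, 33_100.0, 41_400.0])


def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    x_bar, y_bar = x.mean(), y.mean()
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    b1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # Intercept: forces the line through the point of means (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


b0_hat, b1_hat = ols_line(di, c)
print(f"C_hat = {b0_hat:.1f} + {b1_hat:.3f} * DI")
```

The slope estimate plays the role of the MPC in the consumption function, and the intercept is consumption when DI is zero.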
Recall that we wanted to test Keynes’ hypothesis that the MPC is between
zero and 1. The estimate of 0.820 looks reasonable, but we are unsure whether
there is any ‘statistical’ evidence that the MPC is below 1.
One of the other uses of this model is forecasting or predicting future
economic behaviour. To predict C, however, we need to know future values of DI.
Suppose you know that DI is going to be $65,000 (millions). Predicted consumption is then
\[ \hat{C} = 336.9 + 0.820(65{,}000) = 53{,}636.9 \]
This also allows you to predict savings of $11,363.1 (millions). This is just the
difference between DI and C.
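The forecast arithmetic can be checked with a short Python sketch. It uses the estimated coefficients 336.9 and 0.820 and the assumed future DI of 65,000 (millions) from the notes; the function and variable names are only illustrative.

```python
B0_HAT = 336.9  # estimated intercept from the notes (millions of dollars)
B1_HAT = 0.820  # estimated slope, i.e. the estimated MPC


def predict_consumption(di):
    """Predicted consumption from the estimated consumption function."""
    return B0_HAT + B1_HAT * di


di_future = 65_000.0                    # assumed future DI, in millions
c_hat = predict_consumption(di_future)  # predicted consumption
s_hat = di_future - c_hat               # predicted savings = DI - C

print(f"Predicted C: {c_hat:,.1f}")
print(f"Predicted S: {s_hat:,.1f}")
```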
The model can also be used for ‘control’ purposes. Suppose that C of $53.6 billion is
insufficient to maintain full employment: there is not enough spending by households.
The government could consider increasing DI through tax cuts to achieve a higher
target. Suppose $62 billion is needed. Solving the estimated consumption function
for DI (in millions) gives
\[ DI = 75{,}198.9 \]
Thus, taxes need to be cut by just over $10 billion from forecasted levels.
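The required DI comes from inverting the estimated consumption function at the target level of consumption. A worked version, in millions of dollars and using the forecast DI of 65,000 from the forecasting example:

```latex
\begin{align*}
62{,}000 &= 336.9 + 0.820\,DI \\
DI &= \frac{62{,}000 - 336.9}{0.820} \approx 75{,}198.9 \\
\text{required tax cut} &\approx 75{,}198.9 - 65{,}000 = 10{,}198.9
\end{align*}
```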
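In the linear regression model (or true regression line or population regression
function)
\[ Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_K X_{Ki} + \varepsilon_i \]
the corresponding estimated regression is
\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \cdots + \hat{\beta}_K X_{Ki} \]
When K = 1, the regression model is the Simple Linear Regression (SLR) model.
When K > 1, the regression model is the Multiple Linear Regression (MLR) model.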
Although regression analysis deals with the dependence of one variable on other
variables, it doesn’t necessarily imply causation. Evidence of a causal
relationship must come from outside of statistics. Economic theory is supposed
to provide the compelling evidence of causation.
VII. The True (or Population) Regression Function (PRF)
The 12 families can be grouped into four income groups. Each family within a
group has the same disposable income. This is the entire population, not a
sample.
Plot these data points on the following diagram. This is often known as a
Scatter Diagram. The ‘solid’ dots are the actual observations. Now the
Conditional Mean or Conditional Expectation is
\[ E(Y \mid X = X_i) \]
The ‘circles’ are the conditional means. Clearly, food expenditures ‘on average’
increase with disposable income.
This can be seen even more clearly by ‘connecting’ these conditional means
with a straight line. This is the True (or Population) Regression Line. Note that
it could also be a True (or Population) Regression Curve.
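Computing the conditional means is just a matter of averaging Y within each income group. The sketch below uses a made-up population of 12 families in four income groups, because the original table is not reproduced here; the numbers and the function name conditional_means are hypothetical.

```python
from collections import defaultdict

# Hypothetical population (NOT the table from the notes):
# (disposable income X, food expenditure Y) for 12 families in 4 income groups.
population = [
    (100, 30), (100, 35), (100, 40),
    (200, 50), (200, 55), (200, 60),
    (300, 70), (300, 75), (300, 80),
    (400, 90), (400, 95), (400, 100),
]


def conditional_means(pairs):
    """Return {x: E(Y | X = x)} by averaging y within each value of x."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: sum(ys) / len(ys) for x, ys in sorted(groups.items())}


cond_means = conditional_means(population)
for x, mean_y in cond_means.items():
    print(f"E(Y | X = {x}) = {mean_y:.1f}")
```

In this made-up population the conditional means rise by the same amount from one income group to the next, so connecting them gives a straight line rather than a curve.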
\[ E(Y \mid X_i) = f(X_i) \]
\[ E(Y \mid X_i) = \beta_0 + \beta_1 X_i \]
What do we mean when we say that our regression model is linear? One
possibility is that the model is nonlinear in terms of the variables.
\[ E(Y \mid X_i) = \beta_0 + \beta_1 X_i^2 \]
The second possibility is that the PRF is nonlinear in terms of the coefficients.
\[ E(Y \mid X_i) = \beta_0 + \beta_1^2 X_i \]
Regression functions that are nonlinear in the coefficients will not be considered
in this paper, but those that are nonlinear only in the variables will be. From now
on, ‘linear regression models’ should be read as linear in terms of the parameters.
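A model that is nonlinear in the variables but linear in the parameters can still be fitted with the linear machinery: the squared variable is simply treated as the regressor. The sketch below assumes least squares as the fitting method and uses noise-free, made-up data generated with an intercept of 2 and a slope of 3, so the fit should recover those values.

```python
import numpy as np

# Hypothetical, noise-free data generated from Y = 2 + 3 * X^2.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x**2

# Design matrix with a column of ones (intercept) and X^2 as the regressor.
# The model is nonlinear in X but linear in the coefficients.
Z = np.column_stack([np.ones_like(x), x**2])

# Least-squares solution of Z @ coef = y.
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
b0_hat, b1_hat = coef

print(f"b0_hat = {b0_hat:.3f}, b1_hat = {b1_hat:.3f}")
```

The same trick does not work when a coefficient itself enters nonlinearly, which is why such models fall outside the linear regression framework.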
The PRF tells us the 'average' food expenditures for a given level of household
income. But we know that any 'particular' household is unlikely to be on this
function. For this reason we rewrite the PRF as
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
• Measurement Error on Y or X.
Thus far, we've dealt with the entire population and the PRF, and have avoided any
consideration of sampling. In most cases, we will never observe the entire
population. We have to infer from a sample or samples what the PRF might
look like. Note that we're unlikely to know just how close we get to the truth.
From a sample we obtain the estimated regression line, which gives the fitted value:
\[ \hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i \]
Of course, we can replace the fitted value ( \( \hat{Y}_i \) ) with the actual
value of the dependent variable ( \( Y_i \) ).
The LHS is then no longer an estimate, it’s the actual value. The RHS now includes
the residual term \( e_i \).
\[ Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + e_i \]
This means that the actual dependent variable can be decomposed into its fitted
value and the residual.
\[ Y_i = \hat{Y}_i + e_i \]
This residual, like the disturbance, can be either positive or negative. We can
either overestimate:
\[ Y_i - \hat{Y}_i = e_i < 0 \quad \text{if } Y_i < \hat{Y}_i \]
or underestimate:
\[ Y_i - \hat{Y}_i = e_i > 0 \quad \text{if } Y_i > \hat{Y}_i \]
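The decomposition of each observation into a fitted value and a residual, and the sign convention for the residual, can be illustrated as follows. The data are made up for the example, and the line is fitted by least squares using NumPy's polyfit, which is an assumption about the estimation method.

```python
import numpy as np

# Hypothetical sample (NOT from the notes).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# np.polyfit with deg=1 returns the least-squares [slope, intercept].
b1_hat, b0_hat = np.polyfit(x, y, deg=1)

y_hat = b0_hat + b1_hat * x  # fitted values
e = y - y_hat                # residuals: actual minus fitted

for yi, yhi, ei in zip(y, y_hat, e):
    # Negative residual: fitted value exceeds the actual value (overestimate).
    label = "overestimate" if ei < 0 else "underestimate"
    print(f"Y = {yi:5.2f}  Y_hat = {yhi:5.2f}  e = {ei:+.2f}  ({label})")
```

Adding the residual back to the fitted value recovers each actual observation exactly, which is the decomposition described above.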
XI. Run the height regression (Section 1.4) using the data file
provided. Do further exploration according to Q1.4 and Q1.5