
Introduction

I. The Nature and Scope of Econometrics.

There are many definitions of econometrics.

• Nobel Prize Committee

• Paul Samuelson, et al.: “Econometrics may be defined as quantitative analysis of actual economic phenomena.”

• Goldberger: “... application of economic theory, mathematics and statistical inference to the analysis of economic phenomena.”

• (Joke) E.E. Leamer: “There are two things you don’t want to see in the making – sausage and econometric research.”

II. Major Uses of Econometrics.

1. Describing economic reality
2. Testing hypotheses about economic theory
3. Forecasting future economic activity

III. Econometric Methodology – Regression Analysis

An important methodology in econometrics is regression analysis, which typically follows the steps below. We use a famous example to illustrate.

1. State the hypotheses.

Keynes in the General Theory said a $1 increase in income will lead to less than
a $1 increase in overall consumption.

We want to test this hypothesis, namely that the marginal propensity to consume (MPC) is less than 1.

2. Specify the mathematical model of the theory.


Page2

Keynes didn’t specify the exact nature of the relationship, but we might suggest a simple linear one:

$$C = \beta_0 + \beta_1 DI, \qquad 0 < \beta_1 < 1$$

where C = aggregate consumption and DI = aggregate disposable income. The slope $\beta_1$ is the MPC.

3. Specify the econometric model.

This purely mathematical model is uninteresting to the econometrician. It assumes an exact or deterministic relationship between C and DI. We re-write the equation with a disturbance or error term:

$$C = \beta_0 + \beta_1 DI + \varepsilon$$

This is now an econometric model, or more precisely a linear regression model.

4. Obtain the Data.


Page3

The only way to estimate the parameters of interest in this model is to obtain the necessary data. The data source could involve time series, cross-sectional or panel data.

Time series data are collected over time for the same country or other single aggregate economic unit (e.g., aggregate C and DI could be obtained for Singapore from 1950 to 2000). In this case, we’d normally re-write the equation with a ‘t’ subscript on the variables and disturbance term to denote ‘time’.

$$C_t = \beta_0 + \beta_1 DI_t + \varepsilon_t$$

Cross-sectional data are collected for a sample of individuals, households, firms or other disaggregate economic entities at a point in time (e.g., C and DI could be obtained for a sample of 1,000 Singapore families during 2000). In this case, we’d normally re-write the equation with an ‘i’ subscript on the variables and disturbance term to denote ‘individual’.

$$C_i = \beta_0 + \beta_1 DI_i + \varepsilon_i$$

Finally, panel data contain elements of both time series and cross-sectional data (e.g., C and DI could be obtained for all countries in the OECD during the period 1950-2000). Note that we have variation across countries at any single point in time, as well as variation across time. In this case, we’d normally re-write the equation with both an ‘i’ and a ‘t’ subscript on the variables and disturbance term to denote ‘country’ and ‘time’.

$$C_{it} = \beta_0 + \beta_1 DI_{it} + \varepsilon_{it}$$

Time series or cross-sectional data can be plotted as a ‘scatter diagram’.

[Figure: scatter diagram of the data]
Page4

5. Estimate the parameters in the econometric model.

Now it’s time to estimate the coefficients in the model. The basic idea is to
come up with a ‘line’ that best ‘fits’ the data points. Imagine that this
‘regression analysis’ yields the following consumption function.

Ĉ = 336.9 + 0.820DI

These are the estimates of the 2 coefficients. The ‘hat’ on C indicates that this
is an ‘estimated’ consumption function or regression model.
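
The ‘line of best fit’ idea can be sketched with ordinary least squares (OLS), the usual estimator for a model of this kind, although these notes do not name the method at this point. The income and consumption figures below are hypothetical numbers made up for illustration (they are not the data behind 336.9 and 0.820), and the function name `ols_simple` is an assumption.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: co-variation of x and y divided by the variation of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    # Intercept: makes the line pass through the point of means.
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical disposable income (DI) and consumption (C) observations.
di = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
c = [1150.0, 1980.0, 2800.0, 3650.0, 4420.0]

b0_hat, b1_hat = ols_simple(di, c)
print(f"C_hat = {b0_hat:.1f} + {b1_hat:.3f} DI")
```

The slope estimate plays the role of the MPC; the intercept is consumption predicted at zero income.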

6. Test the hypothesis.

Recall that we wanted to test Keynes’ hypothesis that the MPC is between zero and 1. The estimate of 0.820 looks reasonable, but we are unsure whether there is any ‘statistical’ evidence that the true MPC is below 1.
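
One common way to look for such evidence is a one-sided t-test of H0: MPC = 1 against the alternative MPC < 1, using t = (estimated slope - 1) / (standard error of the slope). These notes do not report a standard error for 0.820, so the sketch below runs on hypothetical data; the function name `slope_t_stat` is an assumption.

```python
import math


def slope_t_stat(x, y, null_value=1.0):
    """t-statistic for H0: slope = null_value in a simple OLS regression.

    Returns (t, degrees_of_freedom).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Residual sum of squares; two coefficients were estimated, so n - 2 d.o.f.
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    dof = n - 2
    se_b1 = math.sqrt(rss / dof / s_xx)
    return (b1 - null_value) / se_b1, dof


# Hypothetical disposable income (DI) and consumption (C) observations.
di = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
c = [1150.0, 1980.0, 2800.0, 3650.0, 4420.0]

t_stat, dof = slope_t_stat(di, c)
print(f"t = {t_stat:.2f} with {dof} degrees of freedom")
```

A large negative t, compared with the critical value from a t table for the same degrees of freedom, would count as evidence that the MPC is below 1.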
Page5

7. Forecast or predict economic behaviour.

One of the other uses of this model is for forecasting or predicting future economic behaviour. To predict C, however, we need to know future values of DI. Suppose you know that DI is going to be $65,000 (millions).

Ĉ = 336.9 + 0.820(65,000) = 53,636.9

This also allows you to predict savings of $11,363.1 (millions). This is just the difference between DI and C.
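
The forecast arithmetic can be checked with a few lines of code. The coefficients and the income figure come from the notes; the function and variable names are assumptions made for this sketch.

```python
B0_HAT = 336.9  # estimated intercept from the notes
B1_HAT = 0.820  # estimated slope (MPC) from the notes


def predict_consumption(di):
    """Predicted consumption (in $ millions) for disposable income di."""
    return B0_HAT + B1_HAT * di


di_future = 65_000.0                   # assumed future DI, in $ millions
c_hat = predict_consumption(di_future)
savings_hat = di_future - c_hat        # savings = DI - C

print(round(c_hat, 1), round(savings_hat, 1))
```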

8. Use the model for policy purposes.

The model can also be used for ‘control’ purposes. Suppose that the forecast C of $53.6 billion is insufficient to maintain full employment because households are not spending enough. The government could consider increasing DI through tax cuts to achieve a higher target. Suppose $62 billion is needed.

62,000 = 336.9 + 0.820DI

DI = 75,198.9

Thus, taxes need to be cut by just over $10 billion from forecasted levels (75,198.9 - 65,000 = 10,198.9 million).
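
The policy calculation simply inverts the estimated consumption function. A minimal sketch, assuming (as the notes implicitly do) that a tax cut raises DI one-for-one; the function name `required_income` is hypothetical.

```python
def required_income(target_c, b0=336.9, b1=0.820):
    """Solve target_c = b0 + b1 * DI for DI (all figures in $ millions)."""
    return (target_c - b0) / b1


di_forecast = 65_000.0                    # forecast DI from the previous step
di_needed = required_income(62_000.0)     # DI that delivers C = 62,000
tax_cut = di_needed - di_forecast         # extra DI required

print(round(di_needed, 1), round(tax_cut, 1))
```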

IV. Types of Econometrics and Names of Variables in Regression

Econometrics is split into ‘theoretical’ and ‘applied’ fields. We end up ‘straddling’ these two approaches. Theoretical econometrics concerns the development of basic estimation approaches, properties of estimators, etc. It is more closely related to mathematical statistics (e.g., proofs, axioms, ...).

Applied econometrics is built on this theoretical foundation. It applies estimation techniques to various areas of economic enquiry. Examples: Where to open a new restaurant? How much to spend on advertising? Should we fix the target interest rate? How many hours to spend studying for Econ107? Academics and the private and government sectors have increasingly used econometrics.
Page6

Regression analysis is the study of the relationship between a ‘Dependent Variable’ and one or more ‘Independent’ or ‘Explanatory Variables’.

In the linear regression model (or true regression line or population regression function)

$$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_K X_{Ki} + \varepsilon_i$$

• $Y_i$ is called the dependent or left-hand-side variable, or regressand, and is random.

• $X_{ki}$ ($k = 1, \ldots, K$) are called the independent, explanatory or right-hand-side variables, or regressors; they can be fixed or random.

• $\varepsilon_i$ is called the error or disturbance term and is random.

• The $\beta$’s are called regression coefficients; they are unknown and fixed. $\beta_0$ is the intercept coefficient and $\beta_k$ ($k = 1, \ldots, K$) are the slope coefficients.

The meaning of $\beta_1$ is the impact of a one-unit increase in $X_1$ on $Y$, holding constant the other independent variables.

The estimated regression line (or sample regression function) is written as

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \cdots + \hat{\beta}_K X_{Ki}$$

$\hat{Y}_i$ is called the ‘estimated’ or fitted value of $Y_i$; $\hat{\beta}_k$ ($k = 0, \ldots, K$) are called the estimated regression coefficients. Define $e_i = Y_i - \hat{Y}_i$ and call $e_i$ the residual.

When K=1, the regression model is the Simple Linear Regression (SLR) model.
When K>1, the regression model is the Multiple Linear Regression (MLR) model.
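
The definitions of the fitted value and the residual translate directly into code for any K. The coefficient and data values below are hypothetical, chosen only so the arithmetic is easy to follow; the function names are assumptions.

```python
def fitted_value(beta_hat, x_row):
    """Y_hat = b0 + b1*X1 + ... + bK*XK for one observation.

    beta_hat holds K + 1 estimates (intercept first); x_row holds the K regressors.
    """
    intercept, slopes = beta_hat[0], beta_hat[1:]
    if len(slopes) != len(x_row):
        raise ValueError("need exactly one slope per explanatory variable")
    return intercept + sum(b * x for b, x in zip(slopes, x_row))


def residual(y, beta_hat, x_row):
    """e = Y - Y_hat."""
    return y - fitted_value(beta_hat, x_row)


# Hypothetical MLR example with K = 2.
beta_hat = [10.0, 2.0, -0.5]   # b0, b1, b2
x_i = [3.0, 4.0]               # X1, X2 for observation i
y_i = 15.0                     # observed Y for observation i

y_hat_i = fitted_value(beta_hat, x_i)   # 10 + 2*3 - 0.5*4
e_i = residual(y_i, beta_hat, x_i)
print(y_hat_i, e_i)
```

With a single regressor (K = 1) the same functions cover the SLR case.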

V. Statistical vs. Deterministic Relationships

Regression analysis is concerned with a Statistical, not a Functional or Deterministic, dependence among variables. In statistical relationships, the variables are Random or Stochastic.

VI. Regression vs. Causation

Although regression analysis deals with the dependence of one variable on other variables, it doesn’t necessarily imply causation. Evidence of a causal relationship must come from outside of statistics. Economic theory is supposed to provide the compelling evidence of causation.
VII. The True (or Population) Regression Function (PRF)
Page7

Suppose we have a small community of 12 families. We’re interested in studying the relationship between their weekly disposable income (X) and expenditure on food (Y). We want to predict the population mean of food expenditures, given some level of family income.

The 12 families can be grouped into four income groups. Each family within a
group has the same disposable income. This is the entire population, not a
sample.

Disposable     Individual Food                 Average Food
Income (X)     Expenditures (Y)                Expenditures
250            78.00, 88.50, 96.00             87.50
300            77.50, 89.00, 96.50, 109.00     93.00
350            90.50, 106.50                   98.50
400            99.00, 103.00, 110.00           104.00

These data points can be plotted on a diagram, often known as a Scatter Diagram. The ‘solid’ dots are the actual observations. The Conditional Mean or Conditional Expectation of Y given X is

$$E(Y \mid X = X_i)$$

[Figure: scatter diagram of food expenditures (Y) against disposable income (X), with the conditional means marked as circles]

The ‘circles’ are the conditional means. Clearly, food expenditures ‘on average’ increase with disposable income.

This can be seen even more clearly by ‘connecting’ these conditional means with a straight line. This is the True (or Population) Regression Line. Note that in general it could also be a True (or Population) Regression Curve.
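
The conditional means in the table can be reproduced directly from the 12 individual observations. A minimal sketch; the data are taken from the table, while the dictionary layout and function name are assumptions.

```python
# Population from the table: income level -> food expenditures of each family.
population = {
    250: [78.00, 88.50, 96.00],
    300: [77.50, 89.00, 96.50, 109.00],
    350: [90.50, 106.50],
    400: [99.00, 103.00, 110.00],
}


def conditional_means(groups):
    """E(Y | X = x) for every income level x in the population."""
    return {x: sum(ys) / len(ys) for x, ys in groups.items()}


cond_mean = conditional_means(population)
for x, mean_y in cond_mean.items():
    print(f"E(Y | X = {x}) = {mean_y:.2f}")
```

The printed values match the ‘Average Food Expenditures’ column and rise by the same amount at each step in income.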
Page8

Geometrically, a population regression line or curve is simply the locus of the conditional means or expectations of the dependent variable for fixed values of the explanatory variable(s).

In general, we could write the Population Regression Function (PRF) as:

$$E(Y \mid X_i) = f(X_i)$$

where f is some function of the explanatory variable.

We might anticipate that food consumption will be linearly related to disposable income. This is an initial assumption of our estimation. We could narrow this functional form to:

$$E(Y \mid X_i) = \beta_0 + \beta_1 X_i$$

This is known as the linear PRF (or PR Line).
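
For the 12-family population, the conditional means (87.50, 93.00, 98.50, 104.00) rise by 5.50 for every 50 of extra income, so they lie exactly on a straight line. The sketch below backs out the implied intercept and slope; these values are derived here from the table, not stated in the notes, and the names are assumptions.

```python
# Conditional means E(Y | X) taken from the table of 12 families.
cond_mean = {250: 87.5, 300: 93.0, 350: 98.5, 400: 104.0}

xs = sorted(cond_mean)
# Slope from the lowest and highest income levels; intercept from the lowest.
beta1 = (cond_mean[xs[-1]] - cond_mean[xs[0]]) / (xs[-1] - xs[0])
beta0 = cond_mean[xs[0]] - beta1 * xs[0]


def prf(x):
    """Linear PRF: E(Y | X = x) = beta0 + beta1 * x."""
    return beta0 + beta1 * x


# Every conditional mean should sit on the line if the PRF really is linear.
on_line = all(abs(prf(x) - m) < 1e-9 for x, m in cond_mean.items())
print(f"E(Y | X) = {beta0:.2f} + {beta1:.2f} X, all means on the line: {on_line}")
```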


Page9

VIII. ‘Linearity’ in Regression Analysis

What do we mean when we say that our regression model is linear? There are two senses in which a model can fail to be linear. One possibility is that the model is nonlinear in terms of the variables.

$$E(Y \mid X_i) = \beta_0 + \beta_1 X_i^2$$

The second possibility is that the PRF is nonlinear in terms of the coefficients.

$$E(Y \mid X_i) = \beta_0 + \sqrt{\beta_1}\, X_i$$

Regression functions of this second kind will not be considered in this paper, but the first kind (nonlinear in the variables, linear in the coefficients) will be. From now on, ‘linear regression models’ should be read as linear in terms of the parameters.
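
A model that is nonlinear in the variables but linear in the coefficients can still be handled by the linear regression machinery: treat the squared variable as the regressor. The sketch below uses hypothetical data generated without any disturbance, so least squares should recover the coefficients exactly; `ols_simple` is an assumed helper name.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


# Hypothetical data from Y = 2 + 3 * X^2 (no disturbance term).
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 + 3.0 * xi ** 2 for xi in x]

# Transform the variable, then run an ordinary linear regression of Y on Z = X^2.
z = [xi ** 2 for xi in x]
b0_hat, b1_hat = ols_simple(z, y)
print(b0_hat, b1_hat)
```

No such transformation of the variables removes nonlinearity in the coefficients, which is why that case is set aside.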

IX. Adding the Disturbance Term to Our PRF

The PRF tells us the 'average' food expenditures for a given level of household income. But we know that any 'particular' household is unlikely to be on this function. For this reason we rewrite the PRF as

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where $\varepsilon_i$ is a random variable with mean 0. There are many reasons why $\varepsilon_i$ might exist.

• Minor influences on Y are omitted.

• The underlying theoretical equation might have a different functional form than the one chosen for the regression.

• Some purely random variation is always present.

• There may be measurement error in Y or X.
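
For the 12-family population the disturbances can be computed explicitly as each family's food expenditure minus the conditional mean of its income group. A minimal sketch using the table's data (variable names are assumptions); it also checks that the disturbances average to zero within every group.

```python
# Population from the table: income level -> food expenditures of each family.
population = {
    250: [78.00, 88.50, 96.00],
    300: [77.50, 89.00, 96.50, 109.00],
    350: [90.50, 106.50],
    400: [99.00, 103.00, 110.00],
}

disturbances = {}
for x, ys in population.items():
    mean_y = sum(ys) / len(ys)                   # E(Y | X = x)
    disturbances[x] = [y - mean_y for y in ys]   # epsilon_i = Y_i - E(Y | X_i)

# Mean of the disturbances within each income group.
group_mean_eps = {x: sum(eps) / len(eps) for x, eps in disturbances.items()}

for x in population:
    print(x, disturbances[x], group_mean_eps[x])
```

Individual families sit above or below the PRF, but on average the deviations cancel at each income level.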
Page10

X. The Sample (Estimated) Regression Function

Thus far, we've dealt with the entire population and the PRF, and avoided any consideration of sampling. In most cases, we will never observe the entire population. We have to infer from a sample or samples what the PRF might look like. Note that we're unlikely to know just how close we get to the truth.

Each sample we draw can be used to produce a Sample (Estimated) Regression Function (SRF), that is, the estimated regression function:

$$\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$$

Of course, we can replace the fitted value ($\hat{Y}_i$) on the left-hand side with the actual value of the dependent variable ($Y_i$). The LHS is then no longer an estimate; it is the actual value. The RHS now includes the residual term $e_i$.

$$Y_i = \hat{\beta}_0 + \hat{\beta}_1 X_i + e_i$$

This means that the actual dependent variable can be decomposed into its fitted value and the residual.

$$Y_i = \hat{Y}_i + e_i$$

This residual, like the disturbance, can be either positive or negative. We can either overestimate the true value of $Y_i$:

$$Y_i - \hat{Y}_i = e_i < 0 \quad \text{if } Y_i < \hat{Y}_i$$

or underestimate it:

$$Y_i - \hat{Y}_i = e_i > 0 \quad \text{if } Y_i > \hat{Y}_i$$
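
To see an SRF and its residuals in action, the sketch below takes one family from each income group of the 12-family population (an arbitrary choice made for illustration, not a sample used in the notes) and fits a line by ordinary least squares, which is assumed here as the estimation method. The helper name `ols_simple` is also an assumption.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


# Illustrative sample: one family from each income group in the table.
sample_x = [250.0, 300.0, 350.0, 400.0]
sample_y = [78.00, 96.50, 90.50, 110.00]

b0_hat, b1_hat = ols_simple(sample_x, sample_y)
fitted = [b0_hat + b1_hat * x for x in sample_x]
residuals = [y - y_hat for y, y_hat in zip(sample_y, fitted)]

for y, y_hat, e in zip(sample_y, fitted, residuals):
    # e < 0 means the fitted value is too high (overestimate), e > 0 too low.
    label = "overestimate" if e < 0 else "underestimate"
    print(f"Y = {y:.2f}  Y_hat = {y_hat:.2f}  e = {e:+.2f}  ({label})")
```

Each actual value equals its fitted value plus its residual, and a different sample from the same population would generally give a different SRF.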

XI. Questions for discussion: Q1.10

XII. Run the height regression (Section 1.4) using the data file provided. Explore further according to Q1.4 and Q1.5.
