
BASIC ECONOMETRICS

Christophe Croux
christophe.croux@econ.kuleuven.ac.be

Session 1: Basic Statistical Concepts

I Why use statistics?

By economic theory or by intuitive thinking one
deduces relationships between variables. Statistics (or econometrics) can help us to
1. Measure these relationships
2. Test the theory behind these relationships
3. Use these relationships for quantitative predictions/forecasts.

For this, collected/experimental or observed/historical data will be used.

Typically, only a sample of the whole population of interest is available, inducing uncertainty. Statistics helps us to quantify this uncertainty.
Example: demand for money

M = demand for money; Y = (real) national income; r = interest rate; P = general price level

Theory says that M depends on Y and r only.


* How to specify and estimate this function?
* (How to measure M, Y, P and r?)

Possible specifications:
1. Linear form: M = α + βY + γr
Note that ∂M/∂Y = β (marginal effect).
[If Y increases by one unit, then M increases by β units, ceteris paribus.]

2. Linear after log-transformation:


log(M ) = α + β log(Y ) + γr
Note that (∂M/M)/(∂Y/Y) = β (elasticity).
[If Y increases by 1%, then M increases by β%, ceteris paribus.]

3. Non-linear forms, e.g.
log(M) = α + β(log(Y))^δ + γr
[Non-constant elasticities]

After specification of the form, we still need to


estimate the unknown parameters (e.g. by Least
Squares or Maximum Likelihood).

We could also be interested in testing whether P


has a significant effect on M or not. According
to theory, it should not. Specify:
M = α + βY + γr + δP
Test the null hypothesis H0 : δ = 0.

If we reject H0, then we say that P is a significant


variable in the model. If we do not reject H0,
then P is said to have no significant effect.
Remark on the disturbance term:
The relationships above are not exact/deterministic.
Otherwise M would be perfectly predictable. We
need to add a disturbance term ε:
log(M ) = α + β log(Y ) + γr + ε (1)
This disturbance term captures
(i) measurement errors
(ii) (small) influences of omitted variables
(iii) unpredictable behavior/events
(iv) deviation from long run equilibrium
(v) ...

Equation (1) is supposed to hold at any time t.


We may write
log(Mt) = α + β log(Yt) + γrt + εt
for t = 1, 2, 3, . . . .

II Probability Distributions
In statistics and econometrics we deal with variables whose values

are (partly) determined by some chance mechanism. Such variables

are called stochastic or random variables. By convention, we denote stochastic variables by X, Y, Z, ... and their outcomes/realizations by x, y, z, . . . Although we cannot predict which values a random variable will take, we are often able to attach certain probabilities to these values, that is, to derive its (probability) distribution.

II.1 Discrete stochastic variables


X takes values in a finite set {0, 1, 2, . . . , N} or in {0, 1, 2, 3, . . .}. Examples: passing or failing an exam, the number of matches won in a championship, the number of car accidents of a person during one year, ...

The probability distribution of X is given by list-


ing all
pk = P (X = k) for k = 0, 1, 2, . . .
The expected value of X is defined as
E[X] = Σ_k k pk
It is also called the (population) mean of X. Similarly, the expected value of Z = g(X) is defined as
E[Z] = Σ_k g(k) pk.

Example 1: Let X be the number of boys in a


family with 3 children. Then

k 0 1 2 3
pk 0.125 0.375 0.375 0.125

It follows that E[X] = 0 · 0.125 + 1 · 0.375 + 2 · 0.375 + 3 · 0.125 = 1.5.
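As a quick illustration, here is a minimal Python sketch (plain Python, no libraries) computing this expected value from the probability table above; it simply forms the probability-weighted sum of the outcomes.

pk = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}  # P(X = k), number of boys
expected_value = sum(k * p for k, p in pk.items())
print(expected_value)  # 1.5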

Example 2: Let X be the number of children in


a family. Here we do not know the exact probabilities and we need to estimate them. For this we collect a random sample of n = 60 families and record the number of children in each family.
The results are pictured in the barplot below:
[Barplot: frequency (vertical axis, 0 to 20) of the number of children (horizontal axis, 0 to 6) in the 60 sampled families]

How could you obtain an estimate for the expected value of X?
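A hedged sketch of one answer: replace the unknown probabilities pk by the relative frequencies from the sample, so the estimate is just the sample average. The counts below are hypothetical, since the exact numbers behind the barplot are not given.

freq = {0: 8, 1: 18, 2: 20, 3: 9, 4: 3, 5: 1, 6: 1}  # hypothetical counts, n = 60
n = sum(freq.values())
e_hat = sum(k * f for k, f in freq.items()) / n  # sample average estimates E[X]
print(e_hat)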

II.2 Continuous stochastic variables

If X can take any value in a certain interval of


the real line, then we say that X is a continu-
ous stochastic variable. The probability distribu-
tion is characterized by a density function f (x),
which has the property that
f(x) ≥ 0 and ∫_{−∞}^{+∞} f(x) dx = 1.
The density function allows us to compute prob-
abilities:
P(a ≤ X ≤ b) = ∫_a^b f(x) dx.

Alternatively, we can characterize the probability


distribution by the cumulative distribution func-
tion F , defined as
F (x) = P (X ≤ x).

We see that F(x) = ∫_{−∞}^x f(y) dy and that the derivative of F equals f.

The expected value of X is defined as


E[X] = ∫ x f(x) dx.

It is also called the (population) mean of X. The


expected value of Z = g(X) is defined as
E[Z] = ∫ g(x) f(x) dx.

The variance of a stochastic variable X (discrete


or continuous) is defined as
Var(X) = E[(X − E[X])²],
and the standard deviation of X equals
SD(X) = √Var(X).
Note that the standard deviation has the same units of measurement as X.
Example: The standard normal distribution. It
has density function f(x) = exp(−x²/2)/√(2π):
[Figure: the standard normal density function f(x), plotted for x from −4 to 4]

Its cumulative distribution function looks like


[Figure: the standard normal cumulative distribution function F(x), plotted for x from −4 to 4]

For X a standard normal random variable, we
have E[X] = 0 and SD(X) = 1. Furthermore:
P (−1 ≤ X ≤ 1) ≈ 0.68, P (−2 ≤ X ≤ 2) ≈
0.95, P (−3 ≤ X ≤ 3) ≈ 0.997.

A random variable Y is said to follow a normal


distribution with parameters µ and σ if
(Y − µ)/σ
has a standard normal distribution. Notation: Y ∼ N(µ, σ). We have E[Y] = µ and SD(Y) = σ.

Note that the interval
[mean − 2·SD, mean + 2·SD]
contains about 95% of the possible outcomes of a normally distributed variable. It is called the 2σ-interval.
(Example: If the average IQ of the Belgian population is 100 with SD=15, then

about 95% of the Belgians have an IQ in the interval [70;130], under normality)
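These normal probabilities are easy to verify numerically. A short sketch with Python's scipy (assumed available):

from scipy.stats import norm

for k in (1, 2, 3):
    print(k, norm.cdf(k) - norm.cdf(-k))  # 0.68..., 0.95..., 0.997...

# IQ example: N(100, 15); the 2-sigma interval [70, 130] covers about 95%
print(norm.cdf(130, loc=100, scale=15) - norm.cdf(70, loc=100, scale=15))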

III Estimators of means and variances

The (population) mean µ = E[X] and standard


deviation σ = SD(X) are (almost always) un-
known. We call them population parameters. We
can only compute them exactly when the proba-
bility distribution is completely known, which is
rarely the case. However, it is possible to estimate
them. If we have a random sample X1, . . . , Xn
then the sample average is
µ̂ = (1/n) Σ_{i=1}^n Xi
and the sample standard deviation
σ̂ = √( (1/n) Σ_{i=1}^n (Xi − µ̂)² ).

When estimating µ by µ̂ (or σ by σ̂) we will make


an estimation error. This error is quantified by
the standard error (SE). It is the magnitude of a
“typical” error. An (approximate) rule to construct a confidence interval for an estimated parameter is given by
[estimator-2* SE, estimator + 2 * SE]
This rule is valid for “most” estimators and based
on the fact that the distribution of “most” esti-
mators is close to a normal distribution for large
sample sizes (Central Limit Theorem).
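A minimal numpy sketch of these estimators and the 2-SE interval, on simulated data (the standard error of the sample average, σ̂/√n, is a known fact but is not derived above):

import numpy as np

x = np.random.default_rng(0).normal(loc=1200, scale=112, size=300)  # simulated incomes

mu_hat = x.mean()
sigma_hat = x.std()               # divides by n, matching the formula above
se = sigma_hat / np.sqrt(len(x))  # standard error of the sample average
print(mu_hat - 2 * se, mu_hat + 2 * se)  # approximate 95% confidence interval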

Example Let X be the monthly income of a factory worker (in


Euro). We want to estimate the population mean E[X]. We take a random sample of n = 300 workers, giving the histogram below:
[Histogram: frequencies of the monthly incomes, ranging from roughly 1000 to 1400 Euro]

The sample average equals 1201, with standard error SE=6.47. The

sample standard deviation is 112. We conclude that a 95% confidence

interval for the population mean is given by [1189 ; 1215]. Moreover,

since the distribution of X seems to be close to a normal distribution,

we may say that about 95% of the monthly incomes are in the interval

[978; 1427].

Property Let a and b be two real numbers, then


E[aX + b] = aE[X] + b
and
SD(aX + b) = |a|SD(X).

IV Joint and conditional distributions

Let X and Y be two stochastic variables which


possibly interact. To simplify the formulas, we
consider them as discrete.

The joint distribution of (X, Y ) is given by the
probabilities of the form
P (X = k, Y = l).
The conditional distribution of Y given X = k
is given by probabilities of the form
P(Y = l | X = k) = P(Y = l and X = k) / P(X = k),
for each possible outcome k of X.
The marginal distributions of X and Y are noth-
ing else but the distributions of X and Y sepa-
rately.

Definition When the conditional distribution of


Y given X equals the marginal distribution of
Y , then X and Y are statistically independent.

Property
• E[X + Y] = E[X] + E[Y]
• If X and Y are statistically independent, then
Var(X + Y ) = Var(X) + Var(Y )

Example Let X be the gender of the first child (0=girl, 1=boy)


and Y the gender of the second child of a family. Joint distribution of
(X, Y ):

P ((X, Y ) = (0, 0)) =0.25


P ((X, Y ) = (0, 1)) =0.25
P ((X, Y ) = (1, 0)) =0.25
P ((X, Y ) = (1, 1)) =0.25

Marginal distribution of Y :
P (Y = 0) = 0.5 and P (Y = 1) = 0.5.
Conditional distribution of Y given X = 1:
P (Y = 1|X = 1) = 0.5 and P (Y = 0|X = 1) = 0.5.
Conditional distribution of Y given X = 0:
P (Y = 1|X = 0) = 0.5 and P (Y = 0|X = 0) = 0.5.

We see that X and Y are statistically independent.
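Equivalently, independence means that the joint probabilities factorize into the product of the marginals. A small Python sketch checking this numerically for the example above:

joint = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

px = {k: sum(p for (i, j), p in joint.items() if i == k) for k in (0, 1)}
py = {l: sum(p for (i, j), p in joint.items() if j == l) for l in (0, 1)}

# True: P(X=k, Y=l) = P(X=k) * P(Y=l) for every cell
print(all(abs(joint[k, l] - px[k] * py[l]) < 1e-12 for (k, l) in joint))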

V Some Exercises
1. Below we see a graphical representation of the total number of
bankruptcies in Belgium over the last 4 months.

[Line graph: the number of bankruptcies, ranging from roughly 1510 to 1560, over the 4 months]

We clearly see a significant increase, indicating the start of a re-


cession. Comment on this.

2. Let X be the total amount of money (cash) that a family keeps at


home. From a sample of size n = 100 we obtain a sample mean
of 800 Euro with standard error 50 Euro. The sample standard
deviation is given by 500 Euro.
(i) Construct a 95% confidence interval for E[X].
(ii) Construct a 99% confidence interval for E[X].
(iii) Construct an interval that will contain approximately 95% of
the outcomes of X
(iv) Do you think that the distribution of X is normal?
(v) Would it be a good idea to construct my sample by interviewing 100 persons waiting at the railway station? Do you have a better idea?

3. Let X be the return of a stock A, and suppose that it follows a


distribution N(2,1). Let Y be the return of a stock B, and suppose
that it follows a distribution N(3,2). Suppose that X and Y are
independent. I buy 10 shares of stock A and 10 of stock B. Compute the expected return and the standard deviation of the total return. How can I increase the expected return? Comment on this.

4. The next 4 histograms are all based on 100 observations. Comment


on their forms.
[Four histograms of the samples x1, x2, x3 and x4]

As an alternative to histograms, kernel density estimates can be
computed. The latter can be considered as a kind of smoothed
histograms. Compare these kernel density estimates with the pre-
vious histograms.
[Four kernel density estimates, for x1, x2, x3 and x4]

5. Let Y be the hourly wage of a person and X the number of years of


schooling. Compare the marginal distribution of Y with the con-
ditional distribution of Y given X = 16 and with the conditional
distribution of Y given X = 12.

Session 2: Econometrics in Practice: introduction to the use of Eviews

I Hands on with Eviews


The following steps will give you a first impression of the power of
Eviews:
1. We will analyse data in the file demo.xls. Let us first have a look
at this file. The variables or series have been put into columns
and represent aggregate money demand (M 1), income (GDP ),
the price level (P R), and the short term interest rates (RS). We
see that the data are quarterly. Record the number of variables and
the time span. Close the .xls file and start Eviews.

2. Create a workfile (/File/new/workfile/). Import the data with


/Procs/Import/Read Text-Lotus-Excel/. Do not forget to specify
the number of series or the names of the series. Have a look at the
objects in the workfile.

3. Open the object “GDP” by clicking on it. Using the View menu,
try out: (1) /graph/line, (2) /descriptive statistics/histogram and
stats/.

4. Note that “GDP” is a non-stationary series. Construct the se-


ries in differences by using the menu /Genr/. Use the expression
”dGDP=d(GDP).” Is this series stationary?
5. Generate now the series in log-difference using “growth=dlog(GDP)”.
Is this series stationary? Using the View menu, try out /dis-
tribution/ (1) /quantile-quantile graph/, and (2) /kernel density
graph/.

[Let Xt be the time series associated to X. Eviews computes
• The lagged series X(−1) ≡ Xt−1
• The series in differences d(X) = X − X(−1) ≡ Xt − Xt−1
• The series in log-differences dlog(X) = d(log(X)) ≡ log(Xt) − log(Xt−1).
Note that
log(Xt) − log(Xt−1) = log(Xt/Xt−1) ≈ (Xt − Xt−1)/Xt−1.]
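A one-line numeric check of this approximation in Python (the log-difference is close to the growth rate when growth is small):

import numpy as np

x_prev, x_now = 100.0, 102.0           # 2% growth
print(np.log(x_now) - np.log(x_prev))  # 0.0198...
print((x_now - x_prev) / x_prev)       # 0.0200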
6. Select now in the workfile the series M1, GDP, PR, and RS.
By clicking on the right mouse button, you can open them as
a group. Try /View/Graphs/Lines/ and /View/Multiple Graph
/Lines/. Compute the correlation matrix of these variables, by
/View/Correlations. Are these correlations spurious?

[Time series showing trends over time will always be highly cor-
related. The reason is that they are both driven by time. The
high correlation does not imply a causal relationship, it may be
spurious.]

7. Use /Quick/Estimate Equation/ to estimate the equation:

log(M 1) = α + β log(GDP ) + γRS + ε.

The equation specification in the dialog window is simply “log(M1)


c log(GDP) RS.” Have a look at the estimates of the unknown
parameters of the equation.

8. Select now the menu /estimate/ to estimate the equation

log(M 1) = α + β log(GDP ) + γRS + δ log(P R) + ε.

Have a look at the output. Construct a confidence interval for δ.


Do you think that PR is a significant variable?

[The test statistic for testing H0 : δ = 0 is given by


T = (δ̂ − 0) / SE(δ̂).
The statistic T follows approximately a standard normal distribu-
tion (under H0). The value that it takes is called the “t-stat.”
The P-value is the probability that the test statistic takes values
more extreme than the observed one (under H0).
The following rule applies:
P-value > α ⇒ do not reject H0 at significance level α
P-value < α ⇒ reject H0 at significance level α

The default choice for the significance level is α = 0.05. This level
gives the type I error, i.e. the probability of rejecting H0 when it
holds. The smaller the choice of α, the more conservative we are
towards H0.
If P-value< 0.05, then the corresponding variable is said to be
significant (for explaining Y ). If P-value< 0.01, then it is highly
significant.
It is often better to interpret the P-value on a continuous scale
(e.g. P=0.049 and P=0.051 is almost identical). The smaller the
P-value, the more evidence in the data against the null hypothesis.

Some authors prefer to report only the t-stats. A variable is sig-


nificant if the t-stat is larger than 2 in absolute value.]
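A small Python sketch of this computation with hypothetical numbers (the estimate and standard error are made up for illustration), using the normal approximation described above:

from scipy.stats import norm

delta_hat, se = 0.15, 0.09                 # hypothetical estimate and SE
t_stat = (delta_hat - 0) / se
p_value = 2 * (1 - norm.cdf(abs(t_stat)))  # two-sided P-value under H0
print(t_stat, p_value)                     # 1.67, 0.096 -> do not reject at 5%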

9. There is a problem with the regression model estimated above. The


error term still contains too much “structure.” Save the residuals,
which can be considered as estimates of the error terms, by se-
lecting /Procs/make residual series/ within the equation window.
Make then the correlogram of the residuals and comment.

10. Take the model in differences:

∆ log(M 1) = α0 + β∆ log(GDP ) + γ∆RS + δ∆ log(P R) + ε0,

where ∆ is the mathematical notation for the difference operator.


In Eviews, you can specify “dlog(m1) c dlog(gdp) d(rs) dlog(pr).”
(1) Is the constant term significant? (2) Test H0 : δ = 0 (3) Make
the correlogram of the residuals.

II Basic Principles of Eviews
• Eviews is a windows-oriented econometric software package.
• For every new data set, a workfile needs to be
created.
• Workfiles are characterized by a frequency and
a range.
• A workfile contains different objects.
• Objects may be of different types like series,
groups, equations, graphs, ...
• The available toolbars/menus of an object window depend on the type of the object.
• The same EVIEWS instruction can be given
in several ways.
• It is possible to write programs in EVIEWS.

III Descriptive Univariate Statistics
Given a univariate sample x1, . . . , xn, then we
can compute
• location measures: mean x̄, median, ..
• spread/dispersion measures: standard devia-
tion σ̂, range=maximum-minimum, ...
• measure of asymmetry: skewness coefficient
Sk = (1/n) Σ_{i=1}^n ((xi − x̄)/σ̂)³
Positive skewness means a long right tail.
• measure of “heavy tails”: kurtosis coefficient
κ = (1/n) Σ_{i=1}^n ((xi − x̄)/σ̂)⁴
At normal distributions κ ≈ 3. If κ > 3, the distribution is said to be peaked or heavy tailed (leptokurtic) w.r.t. the normal. If κ < 3, it is said to be flat or light tailed (platykurtic).
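A short numpy sketch computing both coefficients directly from their definitions:

import numpy as np

def skewness(x):
    z = (x - x.mean()) / x.std()  # x.std() divides by n, like σ̂ above
    return np.mean(z ** 3)

def kurtosis(x):
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4)        # close to 3 for normal data

x = np.random.default_rng(1).normal(size=10_000)
print(skewness(x), kurtosis(x))   # close to 0 and 3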
The distribution of the data can be pictured by a histogram or a kernel density plot.

A quantile-quantile plot (QQ-plot) can be used


as a visual check for normality of the data. The
points in the QQ-plot should closely follow a lin-
ear pattern, if the distribution is normal.

A formal test of normality is the Jarque-Bera test,


which is based on skewness and kurtosis. If the
associated P-value is smaller than 5%, than we
do reject the null hypothesis of normality.

Before starting an econometric analysis, it is


important to screen the data for outliers. Making
boxplots is helpful.

IV Correlation coefficients
Let X and Y be two stochastic variables. The covariance between X and Y is defined as
Cov(X, Y) = E[(X − E(X))(Y − E(Y))].
The correlation between X and Y is defined as
Corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)).
We have that
• −1 ≤ Corr(X, Y ) ≤ 1
• |Corr(aX + b, cY + d)| = |Corr(X, Y )|
• Corr(X, Y ) = 1 (respectively =-1) if and only
if there exist a > 0 (resp. a < 0) and b such
that Y = aX + b.
• If Corr(X, Y ) = 0 then we say that X and Y
are uncorrelated. If (X, Y ) follows a normal
distribution, then uncorrelatedness implies in-
dependency.
From a random sample (X1, Y1), . . . , (Xn, Yn), we can estimate ρ = Corr(X, Y) by the correlation coefficient
ρ̂ = Σ_{i=1}^n (xi − x̄)(yi − ȳ) / √( Σ_{i=1}^n (xi − x̄)² · Σ_{i=1}^n (yi − ȳ)² ).
The correlation coefficient is used as a measure of the strength of the linear association between two variables. It tells us to what extent two variables “move together”, and has nothing to say about causal relations.
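A numpy sketch of this estimator, checked against numpy's built-in version on simulated data:

import numpy as np

def corr(x, y):
    xd, yd = x - x.mean(), y - y.mean()
    return np.sum(xd * yd) / np.sqrt(np.sum(xd ** 2) * np.sum(yd ** 2))

rng = np.random.default_rng(2)
x = rng.normal(size=200)
y = 0.8 * x + 0.6 * rng.normal(size=200)
print(corr(x, y), np.corrcoef(x, y)[0, 1])  # the two values agree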

Exercise
For 6 datasets, visualized by their scatterplots
on the next page, we computed correlation coef-
ficients and obtained:
-0.70 0.01 0.74 0.74 0.95 0.99
Match the correlations with the data sets.

[Six scatterplots of y versus x, with x ranging from 1.0 to 4.0]

Serial correlation:
If the data are a random sample X1, X2, . . . , Xn,
then Corr(Xi, Xj) = 0 for 1 ≤ i ≠ j ≤ n. But
if the data form a time series, then presence of
autocorrelation or serial correlation may occur.
If Xt is a stationary time series, then this auto-
correlation can be quantified.
Definition: Xt is a stationary series if
1. E[Xt] = µ for all t
2. Var(Xt) = σ² for all t
3. Corr(Xt, Xt−k ) = ρk for all t, and for k =
1, 2, . . ..

We call ρk the autocorrelation at lag k. It can be


estimated as
ρ̂k = Σ_{t=k+1}^n (xt − x̄)(xt−k − x̄) / Σ_{t=1}^n (xt − x̄)².

The graph of the function k → ρ̂k is called the
correlogram.
A correlogram also indicates critical bounds. If ρ̂k falls outside these bounds, then it is significantly different from zero (H0 : ρk = 0 is rejected at 5%).
[Figure: series 1 with its correlogram (ACF up to lag 20), and series 2 with its correlogram]

The correlograms above show that there is much more serial correlation in the first than in the second series. One says that there is more persistence in the first series.
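A numpy sketch of the lag-k autocorrelation estimate and the usual ±2/√n critical bounds, applied to a persistent (random walk) series:

import numpy as np

def acf(x, k):
    xd = x - x.mean()
    return np.sum(xd[k:] * xd[:-k]) / np.sum(xd ** 2)

x = np.cumsum(np.random.default_rng(3).normal(size=100))  # persistent series
print([round(acf(x, k), 2) for k in range(1, 6)])  # slowly decaying ACF
print(2 / np.sqrt(len(x)))                         # approximate critical bound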

V Exercise
In the file spurious.txt you find annual data from
1971 to 1990 for the variables
X: average disposable income per capita (1980
prices)
Y : average consumption per capita (1980 prices)
Z: number of professional football players
We are interested in the correlations between these variables. Note that we have time series data here, not cross-sectional data.
1. Create a new workfile and import the data (/File/Import/Text/)

2. Open X and Y as a group (select them and use the right mouse button)

3. Make a graph of the 2 series (View/Graph/Line).

4. Make a scatterplot of Y versus X (View/Graph/Scatter/). Besides


a simple scatter plot, you can also add a fitted line (scatter with
regression) or a fitted curve (scatter with kernel fit).

5. Compute the covariance and the correlation between X and Y


(view/correlations and view/covariances)

6. Use Quick/Estimate equation to estimate the coefficients of the
regression equation Y = α + βX + ε. (Type in “Y c X” where c represents the constant term α).

7. Open now X, Y and Z as a group. (i) make a graph of the 3 series


together (ii) make a scattermatrix (/View/Multiple Graphs/Scatter/)
(iii) Compute the Correlation Matrix.

8. Create by using /Genr/ the 3 series in differences. (Use d(X),


d(Y), d(Z)). Compute now the correlation matrix. Comment on
the results.

Session 3: The Linear Regression Model

I The model

In a regression problem a dependent/response variable Y is explained using explanatory/explicative variables X1, . . . , Xp. If p = 1, then we have simple regression, otherwise multiple regression.
The regression model is given by
Y = α + β1X1 + . . . + βpXp + ε,
with α the intercept parameter and β1, . . . , βp
slope parameters. The disturbance term is ε.

We observe a sample {(Xi, Yi)|1 ≤ i ≤ n} of size


n, and suppose that every observation follows the
model:
Yi = α + β1Xi1 + . . . + βpXip + εi,
for 1 ≤ i ≤ n.

Conditions on the error terms εi are


H1. The explicative variables are independent of
the error terms
H2. E(εi) = 0
H3. Cov(εi, εj) = 0 for all i ≠ j (uncorrelated error terms)
H4. Var(εi) = σ² for all i (homoscedasticity)
H5. εi follow a normal distribution

Conditions H1 and H2 are crucial and always


needed. The condition of uncorrelatedness of the
error terms (H3) is often violated for time series
data. Condition (H4) says that the error terms
need to be homoscedastic and not heteroscedas-
tic.
Conditions H1 and H2 imply that the conditional
mean function or regression function is given by
E[Y |X1, . . . , Xp] = α + β1X1 + . . . + βpXp.
This is the function of main interest in linear regression: it allows us to estimate conditional expectations.

Note that
∂E[Y |X1, . . . , Xp] / ∂Xj = βj
for any 1 ≤ j ≤ p, → interpretation of βj:

“If Xj changes by one unit, then Y changes by βj units, on average and with all other variables kept constant.”

If Xj and Y are measured in logarithms:
“If Xj changes by one percent, then Y changes by βj percent, on average and with all other variables kept constant.”
If also H4 holds then the conditional variance
function is given by
Var[Y |X1, . . . , Xp] = σ²

II The ordinary least squares estimator

Let α̂ and β̂1, . . . , β̂p be estimators of the regres-


sion parameters. The fitted values of the depen-
dent variable are then
Ŷi = α̂ + β̂1Xi1 + . . . + β̂pXip.

The residuals are defined as


ri = Yi − Ŷi.

The ordinary least squares (OLS) estimators of


α and β1, . . . , βp are such that
Σ_{i=1}^n ri²
is minimized.

The estimated regression function is Ŷ = α̂ +


β̂1X1 + . . . + β̂pXp. If we have a new observa-
tion with values X01, . . . , X0p for the explicative
variables, then we predict the associated Y0 by
Ŷ0 = α̂ + β̂1X01 + . . . + β̂pX0p.

While the error terms are not observable, the residuals are, and they can be used as estimates of the εi. The residuals will be used later on to check the conditions H3 up to H5.

The parameter σ² is estimated by
σ̂² = (1/(n − k)) Σ_{i=1}^n ri²,
where k = (p + 1) is the number of estimated
regression parameters, and ri = Yi − Ŷi.
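A self-contained numpy sketch of OLS on simulated data: build the design matrix with an intercept column, minimize the sum of squared residuals, and estimate σ² with the n − k divisor.

import numpy as np

rng = np.random.default_rng(4)
n = 100
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)  # true α=1, β1=2, β2=-0.5

X = np.column_stack([np.ones(n), x1, x2])   # design matrix with intercept
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ coef
sigma2_hat = np.sum(residuals ** 2) / (n - X.shape[1])  # divide by n - k
print(coef, sigma2_hat)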

III Properties of OLS

Let β stand for one of the regression parameters

• Under H1+H2, the OLS-estimators are unbi-


ased, that is
E[β̂] = β and E[α̂] = α.

• Expressions for the standard errors of α̂ and β̂


are known and valid under H1+H2+H3+H4. If
also H5 is true, then
α̂ ∼ N (α, SE(α̂)) and β̂ ∼ N (β, SE(β̂))
If H5 is not true, then the OLS estimators are
only asymptotically normal.

• A test statistic for testing H0 : β = 0 is given


by
T = β̂ / SE(β̂).
Under H1–H5, this test statistic follows a Student t-distribution with n − (p + 1) degrees of freedom. The value that T takes is called the “t-stat.”

It is convenient to consider the P-value, defined as P-value = P(|T| ≥ |t|), where t is the “t-stat”.
We can apply the rule
P > δ ⇒ do not reject H0 at significance level δ
P < δ ⇒ reject H0 at significance level δ
The default choice for the significance level is
0.05. If we reject H0, then we say that X is a
significant variable. If P-value< 0.01, then X
is a highly significant variable. Recall that the
P-value needs to be interpreted on a continuous
scale. The smaller the P-value, the more evidence
in the data against the null hypothesis.

IV Summary Statistics in Regression Analysis

R-squared
The R-squared statistic is the fraction of the vari-
ance of the dependent variable explained by the
independent variables:
R² = Var(Ŷ)/Var(Y) = 1 − Var(residuals)/Var(Y).
It measures the predictive power of the regression
equation.
• R2 = 1 if and only if Yi = Ŷi for all i
• R2 = 0 if and only if Ŷi = Ȳ for all i
We also call R2 the squared multiple correlation
coefficient.

Do not use R2 to compare models with different


dependent variables.

Adjusted R-squared
A problem with using R2 as a measure of good-
ness of fit is that it never decreases if you add
more regressors. The adjusted R2 penalizes for
the addition of regressors which do not contribute
to the explanatory power of the model:

Adjusted R² = 1 − (1 − R²) · (n − 1)/(n − k)

F-statistic
The F-statistic tests the hypothesis that all of
the slope coefficients (excluding the intercept) in
a regression are zero:
H0 : β1 = . . . = βp = 0.
An accompanying P-value is given by the soft-
ware. The F-test is a joint test, keeping the joint
type I error under control. Note that even if all
the t-statistics are insignificant, it is not excluded
that the F-statistic is highly significant.

Durbin-Watson Statistic
The Durbin-Watson (DW) statistic measures the
serial correlation (of order one) in the residuals.
The statistic is computed as
DW = Σ_{t=2}^n (rt − rt−1)² / Σ_{t=1}^n rt²
• 0 ≤ DW ≤ 4
• H3 → DW ≈ 2
• DW << 2 → positive autocorrelation
(There are better tests for serial correlation in the
error terms.)
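A numpy sketch computing these summary statistics from a fitted regression (y, the fitted values, and k the number of estimated parameters):

import numpy as np

def summary_stats(y, fitted, k):
    r = y - fitted                      # residuals
    n = len(y)
    r2 = 1 - np.var(r) / np.var(y)      # R-squared
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k)
    dw = np.sum(np.diff(r) ** 2) / np.sum(r ** 2)  # Durbin-Watson
    return r2, adj_r2, dw

# e.g. with the OLS sketch above: print(summary_stats(y, X @ coef, X.shape[1]))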

V Residual Plots

Residual plots help to check the model assump-


tions. The residuals ri or standardized residuals
ri/σ̂ are plotted versus their index (best for time
series data) or versus Ŷi (best for cross-sectional
data). These plots should
• Show no particular structure (to check linear-
ity assumption)
• Have homogeneous dispersion (to check for ho-
moscedasticity)
• If residuals have the tendency to remain “close”
to each other → indication of positive serial
correlation. Also when sequences of residuals
with the same sign are observed. Additionally,
a correlogram of the residuals can be made.
• Values larger than 3 for ri/σ̂ → possible out-
liers
• To check for normality of error terms: use QQ-
plots. If the error terms deviate strongly from
normality, and the sample size is not too large,
then results are in doubt.
Exercise Comment on the residual plots below:

[Six residual-versus-time plots]

VI Using Dummy Variables

Y needs to be a continuous variable. The ex-


plicative variables may be continuous or categor-
ical. In the latter case, one should replace X by
(K −1) dummy variables in the model equation,
where K is the number of categories.

Example Let Y be the income of a person. We


want to relate it to its work experience X (in
years), sex (male/female) and educational level (primary/secondary/higher education). For this we consider the model
Y = α + β1X + β2M + β3E1 + β4E2 + ε,
where
M = 1 if person is male and M = 0 if not
E1 = 1 if primary school and E1 = 0 if not
E2 = 1 if secondary school and E2 = 0 if not
(we call higher education the “reference level”).

The expected income of a female with secondary school education and 10 years of work experience is therefore α + 10β1 + β4.
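A pandas sketch (library assumed available) of building such dummy variables, with higher education as the reference level; the column and category names are made up for illustration:

import pandas as pd

df = pd.DataFrame({
    "experience": [10, 3, 7],
    "sex": ["female", "male", "female"],
    "education": ["secondary", "higher", "primary"],
})
dummies = pd.get_dummies(df[["sex", "education"]])
# keep K-1 dummies per categorical variable (drop the reference columns)
X = pd.concat([df[["experience"]],
               dummies[["sex_male", "education_primary", "education_secondary"]]],
              axis=1)
print(X)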

Exercise

During 7 years we measure every season the total


number of cars sold in a certain country (denoted
by Yt, in units of 10000). Let Xt be the price of
oil in real terms. We want to regress Yt on Xt +
time trend + “seasonal dummies.” So the model
is
Yt = α + βXt + γt + δ1St1 + δ2St2 + δ3St3 + εt.
(a) Estimate and interpret the regression parameters. How does the interpretation change if X and Y are taken in logarithms?
(b) Comment on the values of R2, DW, and the
F-statistic
(c) Make the residual plot and comment
The data are
Season Yt Xt
Spring 1990 16.30 39.04
Summer 1990 15.20 31.22
Autumn 1990 19.40 34.47
Winter 1990 15.60 31.44
Spring 1991 17.60 28.95
.. 15.98 29.59
.. 19.10 29.15
.. 16.53 22.99
.. 17.66 25.50
.. 16.17 28.71
.. 20.05 29.77
.. 16.12 25.86
.. 17.23 32.33
.. 16.41 27.88
.. 19.67 31.07
.. 15.35 27.80
.. 18.81 27.28
.. 18.64 28.32
.. 18.74 30.71
.. 15.24 27.28
.. 17.44 37.83
.. 17.30 30.74
.. 20.79 25.11
.. 16.03 28.52
.. 18.91 25.28
.. 16.52 30.01
.. 19.07 31.37
.. 12.83 37.65

VII Exercises

1. We have annual data for the UK economy, for


the years 1953-1964, on the percentage change
in wages W and the percentage of the labor
force unemployed U :
W 4.4 5.4 7.1 6.2 4.2 3.1 2.6 3.3 3.8 3.6 4.1 4.4
U 1.5 1.3 1.1 1.2 1.4 2.1 2.2 1.6 1.5 2.0 2.1 1.6

(a) Create a workfile and enter the data (use


/quick/empty group (edit series)/).
(b) Make a scatterplot of W versus U. Does the relation between them look linear? What is their correlation?
(c) Estimate the regression equation W = α + βU + ε. Interpret the estimated value of β. Is U a significant variable?
(d) Have a look at the residual series. Do the assumptions H1-H5 seem to be plausible?
Make a QQ-plot of the residual series to
check for normality.
2. For 25 households we have data (in “house-
holds.wf1”) on their total consumption expen-
diture (X) and on their food expenditure (Y ).
(a) Estimate the regression equation Y = α +
βX +ε. Predict the value of Y for X = 200
and for X = 1000.
(b) Estimate the regression equation log(Y ) =
α + β log(X) + ε. Predict the value of Y
for X = 200 and for X = 1000.
(c) Which of the 2 models do you prefer? Make
a scatterplot of Y versus X.

3. Ten experts make a prediction for the eco-
nomic growth in the EU and ten other experts
in the US for next year:
EU US
2.1 2.6
2.5 2.4
2.3 3.2
1.4 0.8
1.5 1.3
1.5 2.1
2.4 1.6
2.7 3.2
2.8 3.1
1.1 1.4

(a) Test whether the predictions for US and Eu-


rope are on average the same.
(b) Test for normality of the error terms, given
the small sample size.
(c) How does your answer change if the 10 experts making the predictions are the same?
Session 4: Model Specification

Model Specification: which model to use? which


variables to include? which kind of disturbance
terms? How to specify a model is a difficult task.

Diagnostic tests help us to check for the validity


of the model specification. Also residual plots can
serve as diagnostics for the model assumptions.
If the diagnostic tests reject the validity of the
model, then it is misspecified and another model
needs to be proposed.

I Running Example

Demand for food in the USA, yearly data (1963-1992, file: “food.wmf”).
Q: the demand for food in constant prices
X: total expenditure in current prices
P : a price index for food
G: a general price index
Economic Theory suggests Q = f (X, P, G).
1. Make line graphs and some descriptive statistics of the series Q, X, P and G.

2. Estimate the model:

log(Q) = α + β1 log(X) + β2 log(P ) + β3 log(G) + ε

3. Interpret the signs and the magnitude of the estimated regression parameters. Which variables are significant?

4. Interpret the values of R2 and adjusted R2, the Durbin-Watson


and the F-statistic.

5. Make a graph of the actual and the fitted series log(Q). Make a
residual plot. (use /View/Actual, fitted,residuals). Make a QQ-
plot and a correlogram of the residuals. Comment.

II Coefficient tests

The Wald test is the one most often used for testing restrictions on the coefficients. If k restrictions are tested, then the Wald test statistic follows asymptotically a chi-squared distribution with k degrees of freedom.

If we test for “H0: g(parameters)=0”, then the
Wald test rejects the null hypothesis if “g(estimated
parameters)” is too far from 0.

(In case that all restrictions are linear the F -


statistic can be used.)

1. Use /View/representations/ to know how Eviews labels the coeffi-


cients

2. Test H0 : β2 = 0. Are you surprised by this outcome? (Use


/View/coefficient tests/)

3. Test (i) H0 : β2 = β3 = 0 (ii) H0 : β1 + β2 + β3 = 0 (iii) H0 : β2² = β3β4.

4. Do you propose to drop log(P ) from the model?

III Omitted and Redundant variables

Models are preferably compact or parsimonious. Clearly non-significant variables should not be included in the model. But it is dangerous to delete
all variables having P-value larger than, for exam-
ple, 5%. Indeed, we have
• Omitting important variables from the regres-
sion equation yields biased estimators.
• Adding non-significant variables increases the variability of the estimator (larger standard errors), but the estimates remain valid (i.e. unbiased and consistent).
If variables are important for economic reasons,
then they should remain in the model.

Tests for redundant/omitted variables are based


on a comparison of the original model and the
model without/with the redundant/omitted vari-
ables. They compare
* the values of R2 for the 2 models (F-test)
* the value of the log-likelihood of the 2 models
(Likelihood-ratio test)
1. Use Eviews to test whether log(P) is a redundant variable. Are you surprised by this result? Delete log(P) from the equation and estimate the model again using /Estimate/.

2. Generate a variable representing a time trend (use “@trend”)

3. Test whether this trend is an omitted variable.

4. Test whether trend and log(P ) together are omitted variables.

Remark: If two (or more) explicative variables


are highly correlated, then we have the problem
of multicollinearity. In this case, the estimates
remain valid, but we have increased variability.
A simple solution to avoid multicollinearity is to
drop one of the two highly correlated variables.
This is even mandatory in case of perfect corre-
lation between the two variables.

IV Residual Tests

There exist many tests for misspecification. Most of them are based on the residuals.

• White heteroscedasticity test: with r denoting


the residuals, we estimate the equation
r² = c + γ1X1 + γ2X1² + γ3X2 + γ4X2² + . . .
Under the null hypothesis of no heteroscedastic-
ity, none of these estimated gamma’s should be
significantly different from zero. White uses the
value of nR2 of the model, which should not be
too big. It needs to be compared with the critical
value of a chi-square distribution with as many
degrees of freedom as slope parameters in the
above test equation.
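A sketch of this test via statsmodels (assumed available), which runs this kind of auxiliary regression (including cross terms), on simulated heteroscedastic data:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(5)
x = rng.normal(size=200)
y = 1 + 2 * x + rng.normal(size=200) * (1 + 0.5 * np.abs(x))  # heteroscedastic

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid
lm_stat, lm_pvalue, _, _ = het_white(resid, X)
print(lm_stat, lm_pvalue)  # small P-value -> reject homoscedasticity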

• Breusch-Godfrey LM test for serial correlation:


With r denoting the residuals, we estimate the
equation
r = c + γ1X1 + . . . + γpXp + γp+1r(−1) + . . . + γp+kr(−k) + ε,
with r(−1), . . . , r(−k) lagged versions of the residuals.
Under the null hypothesis of no serial correlation,
none of these estimated gamma’s should be sig-
nificantly different from zero. The test uses the
value of nR2 of the model, which should not be
too big. It needs to be compared with the critical
value of a chi-square distribution with k degrees
of freedom.
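A sketch with statsmodels (assumed available), applied to a regression whose errors are serially correlated by construction:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(6)
x = rng.normal(size=200)
e = np.zeros(200)
for t in range(1, 200):            # AR(1) disturbances
    e[t] = 0.7 * e[t - 1] + rng.normal()
y = 1 + 2 * x + e

fit = sm.OLS(y, sm.add_constant(x)).fit()
lm_stat, lm_pvalue, _, _ = acorr_breusch_godfrey(fit, nlags=2)
print(lm_stat, lm_pvalue)          # small P-value -> serial correlation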

V Consistent estimation of Covariance

The Least Squares estimator is still consistent


(meaning that it converges to the true value of
the parameter if the sample size tends to infin-
ity) under mild forms of serial correlation or heteroscedasticity. However, the theoretical expression of the covariance matrix of this estimator is derived under the hypotheses H1–H5, and be-
comes invalid if H3 or H4 no longer hold. In this
case, one could use
• White formula for heteroscedasticity consis-
tent covariance estimation
• Newey-West formula for heteroscedasticity and
autocorrelation consistent (HAC) covariance
estimation
Remarks:
* It is also possible to correct for heteroscedasticity by specifying the form of the conditional variance and using weighted least squares.
* By specifying a dynamic model for the disturbance terms, serial correlation can be corrected for. In some cases adding lagged versions of the
dependent (and independent) variables can solve
the problem of serial correlation.
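A statsmodels sketch (assumed available) of both corrected covariance estimators, via the cov_type option:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
x = rng.normal(size=200)
y = 1 + 2 * x + rng.normal(size=200) * (1 + np.abs(x))

X = sm.add_constant(x)
print(sm.OLS(y, X).fit(cov_type="HC1").bse)   # White standard errors
print(sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4}).bse)  # Newey-West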

1. Apply the White heteroscedasticity test with and without cross-terms. What is the difference between both tests? Do you need to reject the assumption of homoscedasticity?

2. (i) Apply the serial correlation LM test. (ii) What is an ARCH


LM test doing?

3. Look at how the standard errors of your estimates change when you use the White estimator for the covariance matrix of the estimator (use the options when estimating the model equation in EVIEWS). Same question for Newey-West.
