# ECONOMETRICS: an introduction

Edward Omey
HUB - Stormstraat 2 - 1000 Brussel
e-mail: edward.omey@hubrussel.be
web: www.edwardomey.com
1. What is econometrics?
2. The method of least squares
3. Selection of variables
4. The basic assumptions
5. Checking the basic assumptions
6. Making Predictions
7. Some references
Together with this text there is an example file called
'EconometricsExampleText.xls'.
There is also a file called
'BBA3EconometricsAndExcel'
and there is a table with critical values of the Kolmogorov-Smirnov test.
Both are available in "Teaching material BBA3" on www.edwardomey.com.
1 What is econometrics?
1.1 Introduction
Econometrics can be seen as scientific research in which we complete the
results of economic assumptions and theories with quantitative information
that is based on real data.
In economic theory, usually there is a qualitative analysis based on some
reasonable assumptions. As an example, in economic theory one argues that
an increase in the price of a product will result in a decrease of the demand
of that product. In econometrics we want to quantify this theoretical result
to know what will be the resulting demand if we increase the price by, for
example, 1 euro.
In mathematical economics one tries to present economic theory in a
mathematical, formal way. In econometrics we want to verify these mathematical
models by using real data. As a result we can eventually decide whether
the theory is a good or a bad representation of the real world.
Economic statisticians usually collect data and present them in various
ways. Usually these data form the basis of further statistical analysis.
1.2 Econometric approach
In general, the econometric approach can be represented in the following
table. The table consists of 5 levels and 3 main types of input.

| Level | I (economics) | II (data) | III (methodology) |
|---|---|---|---|
| 1 | economic theory | facts | mathematics & probability |
| 2 | economic model | data | statistics |
| 3 | econometric model | worked data | econometric methods |
| 4 | operational econometric model | | |
| 5 | verification | prediction | evaluation |

We briefly discuss the entries of this table.
1.3 Input from economics
In economics, people develop "rules" about how people, companies and states
behave. Usually this analysis is qualitative and presents the influence of one
group of variables on other groups of variables, or the causal relations
between them.
Sometimes one uses economic models which give a "simple" representation
of real life.
Example. Keynes examined the relations between consumption and savings
and he found that "people tend to spend more when income goes up".
A simple model is the following: $C = a + bY$, $0 < b < 1$, where
$C$ = consumption and $Y$ = income.
This formula is an example of a mathematical model. It consists of 2
variables and 1 relation. On a graph the model corresponds to a straight line
with slope $b$. The graph intersects the vertical axis at $a$.
In econometric models it is important to answer the following questions:
- how many variables are we going to include in a model?
- which variables are we going to include in a model?
- how many relations and which relations are we going to include in the
model?
Example
In studying the income $Y$ of individual workers, economic theory leads to
a formal relation of the form
$$Y = f(\text{age, sex, level of education, years of experience, knowledge of languages}, \ldots)$$
To formulate an econometric model it is necessary to know how many
variables and which variables to include, and to specify the relation $Y = f(\ldots)$.
1.4 Input from data
In all econometric analysis, we need data. In some cases we have to collect
raw data and in other cases we can use data that have been collected by
others - statistical institutions.
1.4.1 Different types of data
Usually one considers the following 4 types of data:
a) Qualitative - the data cannot be represented by meaningful numbers
a1) Nominal data: we use groups or categories depending on a qualitative
property
Examples: the colour of hair; the sex of people; the type of car,...
a2) Ordinal data: these are nominal data which can be sorted in a meaningful
way
Examples: the stars of hotels; the size of shoes; the highest diploma;...
b) Quantitative - the data can be represented by meaningful numbers
b1) Interval data
b2) Ratio data
Ratio data are based on real variables that have a natural zero. The
number zero means that the characteristic is absent.
Example: income (my income is zero); points (I obtained zero points);
revenue (I sold zero products);...
When we deal with ratio data, then the words "double" or "halve" make
sense.
When dealing with interval data we have a variable without a natural zero
and it doesn't make sense to use the words "double" or "halve".
Typical examples are: degrees in °C or in °F; all IQ scales.
1.4.2 Dummy variables
In many econometric models we have to deal with qualitative variables. In
order to include such variables in a mathematical model it is useful to
quantify them. To this end we use 1 or
more dummy variables. A dummy variable is a variable that can take on
only 2 values: 1 or 0.
Example. When we have a qualitative variable with 2 categories, we use
1 dummy variable. Suppose that we have to deal with the sex of people. We
can define $D = 1$ if the person is female, and $D = 0$ if the person is
male.
Example. When we have a qualitative variable with 3 categories, we use
2 dummies. Suppose that we want to model the colours of a traffic light. We
can define
$D(1) = 1$ if the light is red, $D(1) = 0$ otherwise
$D(2) = 1$ if the light is green, $D(2) = 0$ otherwise
In this case we find the following possibilities:

| $D(1)$ | $D(2)$ | result |
|---|---|---|
| 1 | 0 | red |
| 0 | 1 | green |
| 0 | 0 | orange |
| 1 | 1 | impossible |
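The traffic-light coding can be sketched in a few lines of Python. This is only an illustration: the function name `dummies` and the choice of the last category as the reference (all zeros) are assumptions made here, not part of the course material.

```python
def dummies(value, categories):
    """Encode one qualitative value with len(categories) - 1 dummy variables.

    The last category is used as the reference: it is coded as all zeros.
    """
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == cat else 0 for cat in categories[:-1]]


# Traffic light: 3 categories, hence 2 dummies (first for red, second for green).
lights = ["red", "green", "orange"]
for colour in lights:
    print(colour, dummies(colour, lights))
```

Because each observation belongs to exactly one category, the combination (1, 1) can never be produced, in line with the "impossible" row of the table.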
1.4.3 Quality of data
Many companies and government organisations collect data and construct
(huge) databases. Some of these are freely available on the internet.
Using these data one has to be careful about the quality of the data. In
many cases the published data suggest a higher precision than in reality. One
often publishes rounded numbers or (weekly, monthly,...) averages. Some
always round down and others round in another way.
In collecting data one often has problems with validity: do the data
measure what we really want to measure? For one variable it is often the
case that people use several proxies. To measure - for example - poverty in
a country, one can use several variables:
- the Gini coefficient
- the proportion of people living below the poverty line
- the ratio Q(3)/Q(1)
etc.
1.4.4 Reworking data
Sometimes we have to modify the data before using them in a model. Examples
of modifications are
- replace nominal values by real values
- calculate weekly mean values from daily values
- calculate relative changes and/or proportions
- transform variables to increase correlation coefficients
etc.
1.4.5 Time-series versus cross-sections
In general we can distinguish two types of analysis in econometrics.
In a time series we measure variables at different moments in time (daily,
weekly, yearly,...).
In a cross section we measure variables at the same moment.
Example
- Time series: we study the monthly unemployment rate in Belgium and use (for
example) the monthly inflation, the monthly interest rate, the monthly relative
change in the working population,...
- Cross section: we study the 2010 unemployment rate in the different countries of
the EU and we use (for example) the mean income, the interest rate, the
population density, the ratio new companies/failed companies, etc.
1.5 Input from methodology
In our analysis we are going to need some techniques from mathematics,
probability theory and statistics.
1.5.1 Stochastic models
In econometrics we often deal with relations of the form
$$Y = f(X, Z, U, V, \ldots)$$
where $Y$ is the dependent variable and $X, Z, U, V, \ldots$ are independent or
explanatory variables. Because it is very plausible that we "forgot" 1 or
more variables, one usually adds an error term $\varepsilon$ to the formula:
$$Y = f(X, Z, U, V, \ldots, \varepsilon).$$
This error term summarizes all errors that can occur:
- measurement errors
- observation errors
- rounding of data
- variables that we "forgot" in our analysis (because we didn’t take them
into account or because we didn’t want to take them into account)
- irrelevant variables that we included in our analysis
- stochastic behaviour: in all social, biological, ... environments there is
some indeterminism present. Under the same conditions, the sales can be
different from day to day because of random decisions of people
- etc.
1.5.2 Linear models
In this course we consider only linear models. These are mathematical
relations that are linear in the parameters.
Examples
* $Y = a + bX$
* $Y = a + bX + cX^2$
* $Y = a + bX + cZ + d\exp(X)$
* $Y = a + b\log(X) + c\log(Z)$
etc.
1.5.3 "Good" models
We will use the KISS principle in econometrics! We consider a model a "good"
model if
* it is simple: a model is stronger if it explains a lot with only a few variables
* it is theoretically consistent: if from the theoretical point of view we
expect that a variable has a positive influence on $Y$, we expect that the
model also reflects this positive effect!
* it has predictive power: we wish that the predictions made by the model
are "close" to reality!
* it has explanatory power: we wish that the results of the model are close to
the data that we used to obtain the model!
1.5.4 Econometric methods
Model. In general we are going to use models of the form
$$Y = a + bX + cZ + dU + eV + \ldots + \varepsilon = f(X, Z, U, V, \ldots, a, b, c, \ldots, \varepsilon).$$
Here
* $Y$ is the variable that we want to explain;
* $X, Z, U, V, \ldots$ are the explanatory variables, the variables that we use to
explain $Y$;
* $a, b, c, d, \ldots$ are the parameters in the model;
* $\varepsilon$ is the (stochastic) error term.
Data. To estimate the parameters, we need data about $Y, X, Z, U, V, \ldots$ We
suppose that we have $n$ data points:
* $Y_1, X_1, Z_1, U_1, V_1$
* $Y_2, X_2, Z_2, U_2, V_2$
...
* $Y_n, X_n, Z_n, U_n, V_n$.
By definition, we assume that each of these data points satisfies the "formula":
$$Y_i = f(X_i, Z_i, U_i, V_i, \ldots, a, b, c, \ldots, \varepsilon_i), \quad \text{for } i = 1, 2, \ldots, n.$$
We assume that the parameters are independent of the index $i$ but that the
error term may be different for different values of the index.
Method
In the formula
$$Y_i = f(X_i, Z_i, U_i, V_i, \ldots, a, b, c, \ldots, \varepsilon_i), \quad \text{for } i = 1, 2, \ldots, n,$$
we know the values of $Y_i, X_i, Z_i, \ldots$ but not the values of $a, b, c, \ldots$ and $\varepsilon_i$.
Hoping that $\varepsilon_i$ is "small", we delete $\varepsilon_i$ and we approximate $Y_i$ by $\hat{Y}_i$:
$$\hat{Y}_i = f(X_i, Z_i, U_i, V_i, \ldots, a, b, c, \ldots), \quad \text{for } i = 1, 2, \ldots, n.$$
The approximation error that we make will be denoted by $e_i$:
$$e_i = Y_i - \hat{Y}_i, \quad \text{for } i = 1, 2, \ldots, n.$$
We want the errors to be "as small as possible". In order to do this, we can
define the "global" error in different ways. We can take for example
$$\text{mean error} = \bar{e} = \frac{1}{n} \sum_{i=1}^{n} e_i$$
or
$$AD = \text{absolute deviation} = \sum_{i=1}^{n} |e_i|,$$
or
$$SSE = \text{sum of squared errors} = \sum_{i=1}^{n} e_i^2.$$
1.5.5 Least squares method
A popular and attractive method in econometrics is the least squares
method (LSE). In this method we estimate the parameters by solving the
following mathematical problem:
$$\min_{a, b, c, \ldots} SSE.$$
We consider a model to be the best model if its sum of squared
errors is smaller than that of any other model.
Example. Let us consider the following data:

| X | Y |
|---|---|
| 3 | 10 |
| 6 | 15 |
| 9 | 30 |
| 12 | 35 |
| 15 | 25 |
| 18 | 30 |
| 21 | 45 |
| 24 | 45 |

We consider the following 2 models:
Model 1: $\hat{Y} = 8 + 1.6 \times X$
Model 2: $\hat{Y} = 8.5 + 1.5 \times X$
Using these formulas, we find the following results:

| X | Y | Model 1 | errors model 1 | Model 2 | errors model 2 |
|---|---|---|---|---|---|
| 3 | 10 | 12.8 | -2.8 | 13 | -3 |
| 6 | 15 | 17.6 | -2.6 | 17.5 | -2.5 |
| 9 | 30 | 22.4 | 7.6 | 22 | 8 |
| 12 | 35 | 27.2 | 7.8 | 26.5 | 8.5 |
| 15 | 25 | 32 | -7 | 31 | -6 |
| 18 | 30 | 36.8 | -6.8 | 35.5 | -5.5 |
| 21 | 45 | 41.6 | 3.4 | 40 | 5 |
| 24 | 45 | 46.4 | -1.4 | 44.5 | 0.5 |

Simple calculations show the following results:

| | Model 1 | Model 2 |
|---|---|---|
| $\bar{e}$ | -0.225 | 0.625 |
| AD | 39.4 | 39 |
| SSE | 241.96 | 243 |

If we use AD, then model 2 performs better than model 1. If we use SSE,
then model 1 performs better than model 2.
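The three global error measures for the two candidate models can be recomputed with a short Python sketch. The function name `error_measures` is an illustrative choice; the data and the two models are the ones given in the example.

```python
X = [3, 6, 9, 12, 15, 18, 21, 24]
Y = [10, 15, 30, 35, 25, 30, 45, 45]


def error_measures(a, b, xs, ys):
    """Return (mean error, absolute deviation AD, SSE) for Y_hat = a + b * X."""
    errors = [y - (a + b * x) for x, y in zip(xs, ys)]
    mean_error = sum(errors) / len(errors)
    ad = sum(abs(e) for e in errors)
    sse = sum(e * e for e in errors)
    return mean_error, ad, sse


models = {"Model 1": (8.0, 1.6), "Model 2": (8.5, 1.5)}
for name, (a, b) in models.items():
    mean_error, ad, sse = error_measures(a, b, X, Y)
    print(f"{name}: mean error = {mean_error:.3f}, AD = {ad:.1f}, SSE = {sse:.2f}")
```

The ranking of the two models depends on the measure chosen, which is why the rest of the text fixes one criterion (SSE) once and for all.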
2 The method of least squares
2.1 Introduction
Recall from the previous chapter that we are going to use the method of least
squares. To this end we need a model and data!
Model
In general we are going to use models of the form
$$Y = a + bX + cZ + dU + eV + \ldots + \varepsilon.$$
Data
To estimate the parameters, we need data about $Y, X, Z, U, V, \ldots$ We suppose
that we have $n$ data points:
* $Y_1, X_1, Z_1, U_1, V_1$
* $Y_2, X_2, Z_2, U_2, V_2$
...
* $Y_n, X_n, Z_n, U_n, V_n$.
By definition, we have
$$Y_i = a + bX_i + cZ_i + dU_i + \ldots + \varepsilon_i, \quad \text{for } i = 1, 2, \ldots, n.$$
We assume that the parameters are independent of the index $i$ but that the
error term may be different for different values of the index.
Method
We delete $\varepsilon_i$ and find the following approximation:
$$\hat{Y}_i = a + bX_i + cZ_i + dU_i + \ldots, \quad \text{for } i = 1, 2, \ldots, n.$$
The approximation error that we make is $e_i$:
$$e_i = Y_i - \hat{Y}_i, \quad \text{for } i = 1, 2, \ldots, n.$$
The "global" error is given by $SSE$:
$$SSE = \text{sum of squared errors} = \sum_{i=1}^{n} e_i^2.$$
The least squares method (LSE): we estimate the parameters by solving
the following mathematical problem:
$$\min_{a, b, c, \ldots} SSE.$$
We consider a model to be the best model if its sum of squared
errors is smaller than that of any other model.
2.2 Example 1: the simple model $Y = a + bX + \varepsilon$
2.2.1 The least squares method
In the simplest model we have only 1 explanatory variable ($X$) and 2 parameters
($a, b$). Now we have
Model: $Y = a + bX + \varepsilon$;
Data: $Y_1, X_1, Y_2, X_2, \ldots, Y_n, X_n$ and $Y_i = a + bX_i + \varepsilon_i$;
Approximation: $\hat{Y}_i = a + bX_i$;
Approximation error: $e_i = Y_i - \hat{Y}_i = Y_i - a - bX_i$;
Global error: $SSE = \sum e_i^2 = \sum (Y_i - a - bX_i)^2$;
LSE: we have to solve the problem
$$\min_{a, b} SSE = \min_{a, b} \sum (Y_i - a - bX_i)^2.$$
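Solution (details are here for completeness)
To solve the problem, we rewrite $SSE$ as follows: we insert the means $\bar{X}$ and $\bar{Y}$ in
the expression for $SSE$ and we find
$$
\begin{aligned}
SSE &= \sum \left[ (Y_i - \bar{Y}) - b(X_i - \bar{X}) + (\bar{Y} - a - b\bar{X}) \right]^2 \\
&= \sum (Y_i - \bar{Y})^2 + b^2 \sum (X_i - \bar{X})^2 + \sum (\bar{Y} - a - b\bar{X})^2 \\
&\quad - 2b \sum (Y_i - \bar{Y})(X_i - \bar{X}) + 2 \sum (Y_i - \bar{Y})(\bar{Y} - a - b\bar{X}) \\
&\quad - 2b \sum (X_i - \bar{X})(\bar{Y} - a - b\bar{X}).
\end{aligned}
$$
For the last two terms, we find
$$\sum (Y_i - \bar{Y})(\bar{Y} - a - b\bar{X}) = (\bar{Y} - a - b\bar{X}) \sum (Y_i - \bar{Y}) = 0,$$
$$\sum (X_i - \bar{X})(\bar{Y} - a - b\bar{X}) = (\bar{Y} - a - b\bar{X}) \sum (X_i - \bar{X}) = 0.$$
Simplifying, we find that
$$SSE = \sum (Y_i - \bar{Y})^2 + b^2 \sum (X_i - \bar{X})^2 + n(\bar{Y} - a - b\bar{X})^2 - 2b \sum (Y_i - \bar{Y})(X_i - \bar{X}).$$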
For the other terms, we use the following notations:
$$V(Y) = \sum (Y_i - \bar{Y})^2 \quad \text{and} \quad s^2(Y) = \frac{V(Y)}{n - 1},$$
$$V(X) = \sum (X_i - \bar{X})^2 \quad \text{and} \quad s^2(X) = \frac{V(X)}{n - 1},$$
$$V(X, Y) = \sum (Y_i - \bar{Y})(X_i - \bar{X}) \quad \text{and} \quad Cov(X, Y) = \frac{V(X, Y)}{n - 1}.$$
We call $V(Y)$ the variation of $Y$ and $V(X, Y)$ the covariation of $X$ and
$Y$. It is clear that this is closely related to covariance and correlation:
$$Cov(X, Y) = \frac{1}{n - 1} \sum (Y_i - \bar{Y})(X_i - \bar{X}),$$
$$r(X, Y) = \frac{Cov(X, Y)}{s(X)\, s(Y)} = \frac{V(X, Y)}{\sqrt{V(X)\, V(Y)}}.$$
Using these new notations, we have calculated that
$$SSE = V(Y) + b^2 V(X) - 2b V(X, Y) + n(\bar{Y} - a - b\bar{X})^2.$$
Note that the last term in this expression is always zero or positive for
all values of $a, b$.
To omit any contribution of this term, we impose the condition that this
term is zero:
$$\bar{Y} - a - b\bar{X} = 0$$
or equivalently
$$\bar{Y} = a + b\bar{X}.$$
We can simplify $SSE$ and find
$$SSE = V(Y) + b^2 V(X) - 2b V(X, Y).$$
This is a quadratic relation in $b$ and from mathematics we know that
the graph of $SSE$ as a function of $b$ is a convex parabola.
(See e.g. the website: http://en.wikipedia.org/wiki/Parabola)
This parabola reaches a minimum at the value
$$b = \frac{V(X, Y)}{V(X)},$$
and the value at this minimum is given by
$$SSE = V(Y) - \frac{V^2(X, Y)}{V(X)}.$$
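The location and value of the minimum can also be checked by completing the square. The short derivation below is added as an illustration; it uses only the variation $V(X)$, the variation $V(Y)$ and the covariation $V(X, Y)$, and assumes $V(X) > 0$.

```latex
\begin{aligned}
SSE(b) &= V(X)\, b^2 - 2 V(X, Y)\, b + V(Y) \\
       &= V(X) \left( b - \frac{V(X, Y)}{V(X)} \right)^2 + V(Y) - \frac{V^2(X, Y)}{V(X)} .
\end{aligned}
```

The squared term is never negative and vanishes exactly when $b = V(X, Y)/V(X)$, so what remains at that point is $V(Y) - V^2(X, Y)/V(X)$.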
As a conclusion, we have solved the mathematical problem. The optimal
values of $a$ and $b$ will be denoted by $\hat{a}$ and $\hat{b}$. We find
$$\hat{b} = \frac{V(X, Y)}{V(X)} = \frac{Cov(X, Y)}{s^2(X)},$$
$$\hat{a} = \bar{Y} - \hat{b}\bar{X},$$
$$\hat{Y} = \hat{a} + \hat{b}X.$$
The expressions for $\hat{a}$ and $\hat{b}$ are called the least squares estimates for $a$
and $b$.
The final result $\hat{Y} = \hat{a} + \hat{b}X$ is called the regression (line) of $Y$ on $X$.
2.2.2 Some important remarks
* In order to make the calculations, we have to assume that $s^2(X) \neq 0$!
If $s^2(X) = 0$, it means that we use a variable $X$ that is a constant.
* The regression line is given by $\hat{Y} = \hat{a} + \hat{b}X$. Since $\bar{Y} = \hat{a} + \hat{b}\bar{X}$, it
follows that $(\bar{X}, \bar{Y})$ is a point on the regression line.
* For the errors we find
$$e_i = Y_i - \hat{Y}_i = Y_i - \hat{a} - \hat{b}X_i.$$
It follows that the mean error is zero:
$$\bar{e} = \overline{Y - \hat{a} - \hat{b}X} = \bar{Y} - \hat{a} - \hat{b}\bar{X} = 0.$$
* For the variation of the errors $V(e)$, we find
$$V(e) = \sum (e_i - \bar{e})^2 = \sum e_i^2 = SSE.$$
Recall that
$$V(e) = SSE = V(Y) - \frac{V^2(X, Y)}{V(X)}.$$
* For the variation of $\hat{Y}$, we find that
$$V(\hat{Y}) = V(\hat{a} + \hat{b}X) = \hat{b}^2 V(X) = \frac{V^2(X, Y)}{V(X)}.$$
* Combining these two formulas, we find that
$$V(e) = V(Y) - V(\hat{Y}),$$
or
$$V(Y) = V(\hat{Y}) + V(e).$$
For this simple model the variation of $Y$ can be divided into two parts:
the total variation is the sum of the variation explained by the model
($V(\hat{Y})$) and the variation that cannot be explained by the model
($V(e)$).
* Variations: we use the following notations:
$SST = V(Y)$ = the variation of $Y$
$SSR = V(\hat{Y})$ = the variation of $\hat{Y}$
$SSE = V(e)$ = the variation of $e$.
The previous remark shows that $SST = SSR + SSE$.
2.2.3 Numerical example
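We take the example of the previous section. We have the following data:

| X | Y |
|---|---|
| 3 | 10 |
| 6 | 15 |
| 9 | 30 |
| 12 | 35 |
| 15 | 25 |
| 18 | 30 |
| 21 | 45 |
| 24 | 45 |

and we consider the following model: $Y = a + bX + \varepsilon$.
In this example we find:
$\bar{X} = 13.5$, $V(X) = 378$, $s^2(X) = 378/8$
$\bar{Y} = 29.375$, $V(Y) \approx 1121.9$, $s^2(Y) \approx 1121.9/8$
$V(X, Y) = 577.5$, $Cov(X, Y) = 577.5/8$
(Here the variations are divided by $n = 8$ rather than by $n - 1 = 7$. The common divisor cancels in $\hat{b}$, so the estimates below are the same either way.)
The least squares estimators are given by
$$\hat{b} = \frac{Cov(X, Y)}{s^2(X)} = 1.5278$$
$$\hat{a} = \bar{Y} - \hat{b}\bar{X} = 8.75$$
and the regression line of $Y$ on $X$ is given by
$$\hat{Y} = 8.75 + 1.5278X.$$
For the model we find the following errors and estimates (we rounded the
numbers):

| X | Y | $\hat{Y}$ | $e$ | $e^2$ |
|---|---|---|---|---|
| 3 | 10 | 13.3334 | -3.3334 | 11.11 |
| 6 | 15 | 17.9168 | -2.9168 | 8.51 |
| 9 | 30 | 22.5002 | 7.4998 | 56.25 |
| 12 | 35 | 27.0836 | 7.9164 | 62.67 |
| 15 | 25 | 31.667 | -6.667 | 44.45 |
| 18 | 30 | 36.2504 | -6.2504 | 39.07 |
| 21 | 45 | 40.8338 | 4.1662 | 17.36 |
| 24 | 45 | 45.4172 | -0.4172 | 0.17 |

One can verify that $\bar{e} = 0$ and that $V(\hat{Y}) \approx 882.3$, $V(e) \approx 239.6$.
Note that $V(Y) \approx 1121.9 = 882.3 + 239.6$.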
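The estimates of this numerical example can be reproduced directly from the formulas of section 2.2.1. The sketch below uses only the Python standard library; the helper names `mean`, `covariation` and `fit_line` are illustrative choices and not part of the course material (which uses EXCEL).

```python
X = [3, 6, 9, 12, 15, 18, 21, 24]
Y = [10, 15, 30, 35, 25, 30, 45, 45]


def mean(values):
    return sum(values) / len(values)


def covariation(us, ws):
    """V(U, W): sum of products of deviations from the means; V(U, U) is the variation."""
    mu, mw = mean(us), mean(ws)
    return sum((u - mu) * (w - mw) for u, w in zip(us, ws))


def fit_line(xs, ys):
    """Least squares estimates (a_hat, b_hat) for the model Y = a + b X + error."""
    v_x = covariation(xs, xs)
    if v_x == 0:
        raise ValueError("X is constant, so the slope is not defined")
    b_hat = covariation(xs, ys) / v_x
    a_hat = mean(ys) - b_hat * mean(xs)
    return a_hat, b_hat


a_hat, b_hat = fit_line(X, Y)
errors = [y - (a_hat + b_hat * x) for x, y in zip(X, Y)]
sse = sum(e * e for e in errors)
print(f"a_hat = {a_hat:.2f}, b_hat = {b_hat:.4f}, SSE = {sse:.1f}")
```

The resulting SSE is smaller than the SSE of both hand-picked models in section 1.5.5, as it must be for the least squares solution.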
2.2.4 The quality of the model
From the mathematical point of view, if $s^2(X) \neq 0$, we can always calculate
the regression of $Y$ on $X$. In this section we briefly discuss how to measure
the quality of the model. We want to measure how "good" the model fits
the data.
Correlation coefficient. As a first indicator we can calculate the correlation
coefficient $r(Y, \hat{Y})$. For a perfect model we find $r(Y, \hat{Y}) = \pm 1$. In
econometrics it is common to use $r^2(Y, \hat{Y})$. We call $r^2(Y, \hat{Y})$ the $R^2$-value of
the model. Alternatively $R^2$ is also called the determination coefficient:
to what extent is $Y$ determined by $\hat{Y}$? We always have $0 \leq R^2 \leq 1$ and the
ideal model has $R^2 = 1$ or $R^2 = 100\%$.
In our example we find $R^2 = 78.6\%$.
Scatter plot. As a second indicator, we can make a scatter plot of $Y$ against $\hat{Y}$.
In the ideal situation, all points are on the first diagonal.
ANOVA. As a third indicator we can analyse the variations or variances
in the model. Recall that for our model we have
$$V(Y) = V(\hat{Y}) + V(e).$$
The third indicator is the ratio of what we can explain divided by what
we have to explain:
$$R^2 = \frac{V(\hat{Y})}{V(Y)}.$$
It is not a mistake to use the same notation $R^2$: one can prove the following
property.
Property
For linear models with a constant term, we have
$$R^2 = \frac{V(\hat{Y})}{V(Y)} = r^2(Y, \hat{Y}).$$
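The property can be checked numerically on the example data. The following sketch (helper names are illustrative) fits the simple model and then computes $R^2$ both as a squared correlation and as a ratio of variations.

```python
from math import sqrt

X = [3, 6, 9, 12, 15, 18, 21, 24]
Y = [10, 15, 30, 35, 25, 30, 45, 45]


def mean(values):
    return sum(values) / len(values)


def covariation(us, ws):
    """V(U, W): sum of products of deviations from the means."""
    mu, mw = mean(us), mean(ws)
    return sum((u - mu) * (w - mw) for u, w in zip(us, ws))


# Least squares fit of the simple model Y = a + b X + error.
b_hat = covariation(X, Y) / covariation(X, X)
a_hat = mean(Y) - b_hat * mean(X)
Y_hat = [a_hat + b_hat * x for x in X]

# First indicator: squared correlation between Y and Y_hat.
r = covariation(Y, Y_hat) / sqrt(covariation(Y, Y) * covariation(Y_hat, Y_hat))
r2_correlation = r ** 2

# Third indicator: explained variation divided by total variation.
r2_anova = covariation(Y_hat, Y_hat) / covariation(Y, Y)

print(f"r^2(Y, Y_hat)   = {r2_correlation:.4f}")
print(f"V(Y_hat) / V(Y) = {r2_anova:.4f}")
```

Both numbers agree up to floating-point rounding and match the 78.6% reported for the example.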
2.3 Example 2: The model $Y = a + bX + cZ + \varepsilon$
2.3.1 The least squares method
In this model we have 2 explanatory variables ($X$ and $Z$) and 3 parameters
($a, b, c$). Now we have
Model: $Y = a + bX + cZ + \varepsilon$;
Data: $Y_1, X_1, Z_1, Y_2, X_2, Z_2, \ldots, Y_n, X_n, Z_n$ and $Y_i = a + bX_i + cZ_i + \varepsilon_i$;
Approximation: $\hat{Y}_i = a + bX_i + cZ_i$;
Approximation error: $e_i = Y_i - \hat{Y}_i = Y_i - a - bX_i - cZ_i$;
Global error: $SSE = \sum e_i^2 = \sum (Y_i - a - bX_i - cZ_i)^2$;
LSE: we have to solve the problem
$$\min_{a, b, c} SSE = \min_{a, b, c} \sum (Y_i - a - bX_i - cZ_i)^2.$$
Solution
From Example 1 recall that the optimal values of $a$ and $b$ were given by
$$Cov(X, Y) = b\, s^2(X),$$
$$\bar{Y} = a + b\bar{X}.$$
In the new 3-parameter model one can show that the optimal values of
$a, b, c$ are the solutions (if they exist) of the following system of equations:
$$Cov(X, Y) = b\, s^2(X) + c\, Cov(X, Z),$$
$$Cov(Z, Y) = b\, Cov(Z, X) + c\, s^2(Z),$$
$$\bar{Y} = a + b\bar{X} + c\bar{Z}.$$
We are going to use EXCEL to solve this mathematical problem.
2.3.2 Multicollinearity and quasi-multicollinearity
Examples. From the mathematical point of view, the system of equations
doesn't always have a unique solution!
Example 1
Let us consider the system
$$5 = 6b + 10c$$
$$7 = 12b + 20c$$
It is clear that the second equation can be simplified and we find
$$5 = 6b + 10c$$
$$3.5 = 6b + 10c$$
and there is no solution!
Note that if we look at the coefficients of $b$ and $c$ in the system, we have
$$\begin{pmatrix} 6 & 10 \\ 12 & 20 \end{pmatrix}$$
and we see that the second row is a multiple of the first row!
Example 2
Let us consider the system
$$5 = 6b + 10c$$
$$10 = 12b + 20c$$
In this case, we can simplify again and we find
$$5 = 6b + 10c$$
$$5 = 6b + 10c$$
or just one equation: $6b + 10c = 5$. It is clear that we find many
solutions!
Note that if we look at the coefficients of $b$ and $c$ in the system, we have
$$\begin{pmatrix} 6 & 10 \\ 12 & 20 \end{pmatrix}$$
and we see that the second row is a multiple of the first row!
Example 3
Let us consider the system
$$5 = 6b + 10c$$
$$10 = b + c$$
In this case the second equation gives $b = 10 - c$ and as a result, for equation
1 we find that
$$5 = 6(10 - c) + 10c$$
or $-55 = 4c$, or $c = -55/4$ and then $b = 10 - c = 95/4$.
Note that if we look at the coefficients of $b$ and $c$ in the system, we have
$$\begin{pmatrix} 6 & 10 \\ 1 & 1 \end{pmatrix}$$
and we see that the second row is not a multiple of the first row: the system has exactly one solution.
Definition of MC and QMC. Looking at the system of equations that
we have to solve, we have the following coefficients of $b$ and $c$:
$$\begin{pmatrix} s^2(X) & Cov(X, Z) \\ Cov(X, Z) & s^2(Z) \end{pmatrix}$$
We have a problem of multicollinearity (MC) if the second row is a multiple
of the first row! This means
$$u\,\big(s^2(X),\ Cov(X, Z)\big) = \big(Cov(X, Z),\ s^2(Z)\big),$$
where $u \neq 0$ is a constant. In this case we find that
$$u \times s^2(X) = Cov(X, Z)$$
$$u \times Cov(X, Z) = s^2(Z)$$
From these equations, we can find $u$. We have
$$u = \frac{Cov(X, Z)}{s^2(X)}, \qquad u = \frac{s^2(Z)}{Cov(X, Z)}$$
and we find that
$$\frac{Cov(X, Z)}{s^2(X)} = \frac{s^2(Z)}{Cov(X, Z)}$$
or
$$\frac{Cov^2(X, Z)}{s^2(X)\, s^2(Z)} = 1$$
or
$$r^2(X, Z) = 1.$$
Recall that when $r^2(X, Z) = 1$, then $r(X, Z) = \pm 1$ and then $X$ and $Z$
are perfectly linearly related: there is a linear relationship between $X$ and
$Z$.
Definition
In the model $Y = a + bX + cZ + \varepsilon$, we have MC (multicollinearity) if
$X$ and $Z$ are linearly related, i.e. if $r^2(X, Z) = 1$.
When the variables $X$ and $Z$ are not linearly related, but almost linearly
related, we say that we have a problem of quasi-multicollinearity (QMC).
This happens when $r^2(X, Z)$ is "close" to 1.
In econometrics, we want to avoid QMC!
We make the following agreement:
If $r^2(X, Z) \leq 0.36$, i.e. if $-0.60 \leq r(X, Z) \leq 0.60$, we don't have
problems of QMC;
If $r^2(X, Z) > 0.36$, i.e. if $r(X, Z) > 0.60$ or $r(X, Z) < -0.60$, we have
a problem of QMC.
For models with 3 or more explanatory variables, we are also going to
discuss QMC. We have a problem of QMC if one of the variables is highly
linearly related with one or more other explanatory variables.
Example. We consider the model $Y = a + bX + cZ + \varepsilon$, where
$Y$ = the half-year sales volume
$X$ = the time
$Z$ = a dummy ($Z = 1$ in the first half year and $Z = 0$ in the second
half year)

| Y | X | Z |
|---|---|---|
| 4 | 1 | 1 |
| 1 | 2 | 0 |
| 6 | 3 | 1 |
| 2 | 4 | 0 |
| 11 | 5 | 1 |
| 5 | 6 | 0 |
| 11 | 7 | 1 |
| 7 | 8 | 0 |
| 15 | 9 | 1 |
| 9 | 10 | 0 |

We find (cf. Computer seminar): $s^2(X) = 8.25$, $s^2(Z) = 0.25$ and $r(X, Z) = -0.17$.
For this model we don't have problems with QMC and we can solve
the system of equations. We find
$$\hat{a} = -2.4; \quad \hat{b} = 1.2 \quad \text{and} \quad \hat{c} = 5.8;$$
$$\hat{Y} = -2.4 + 1.2X + 5.8Z.$$
We call $\hat{Y}$ the regression of $Y$ on $X$ and $Z$. In the next table we calculate
$\hat{Y}$ and the errors $e$.

| Y | X | Z | $\hat{Y}$ | $e$ |
|---|---|---|---|---|
| 4 | 1 | 1 | 4.6 | -0.6 |
| 1 | 2 | 0 | 0 | 1 |
| 6 | 3 | 1 | 7 | -1 |
| 2 | 4 | 0 | 2.4 | -0.4 |
| 11 | 5 | 1 | 9.4 | 1.6 |
| 5 | 6 | 0 | 4.8 | 0.2 |
| 11 | 7 | 1 | 11.8 | -0.8 |
| 7 | 8 | 0 | 7.2 | -0.2 |
| 15 | 9 | 1 | 14.2 | 0.8 |
| 9 | 10 | 0 | 9.6 | -0.6 |

The $R^2$ of the model is given by $R^2 = r^2(Y, \hat{Y}) = 96\%$, which is rather
close to 100%.
For the variations, we find:
$SST = V(Y) = 174.9$; $SSR = V(\hat{Y}) = 168.1$; $SSE = V(e) = 6.8$.
Clearly $SST = SSR + SSE$ and $SSR/SST = 0.96 = R^2$.
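The text solves the system of equations with EXCEL. As an alternative illustration, the 2x2 system for $b$ and $c$ can be solved by hand with Cramer's rule, as in the sketch below. It works with covariations instead of covariances, which is an assumption of convenience: both sides of each equation are divided by the same constant, so the solution is unchanged. The function names are illustrative.

```python
Y = [4, 1, 6, 2, 11, 5, 11, 7, 15, 9]
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Z = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]


def mean(values):
    return sum(values) / len(values)


def covariation(us, ws):
    """V(U, W): sum of products of deviations from the means."""
    mu, mw = mean(us), mean(ws)
    return sum((u - mu) * (w - mw) for u, w in zip(us, ws))


def fit_two_variables(xs, zs, ys):
    """Least squares estimates (a, b, c) for Y = a + b X + c Z + error."""
    v_xx, v_zz, v_xz = covariation(xs, xs), covariation(zs, zs), covariation(xs, zs)
    v_xy, v_zy = covariation(xs, ys), covariation(zs, ys)
    # The determinant is zero exactly when r^2(X, Z) = 1 (multicollinearity).
    # With non-integer data a tolerance would be safer than an exact comparison.
    det = v_xx * v_zz - v_xz ** 2
    if det == 0:
        raise ValueError("multicollinearity: the system has no unique solution")
    b = (v_xy * v_zz - v_zy * v_xz) / det
    c = (v_zy * v_xx - v_xy * v_xz) / det
    a = mean(ys) - b * mean(xs) - c * mean(zs)
    return a, b, c


a_hat, b_hat, c_hat = fit_two_variables(X, Z, Y)
print(f"a_hat = {a_hat:.1f}, b_hat = {b_hat:.1f}, c_hat = {c_hat:.1f}")
```

When $r^2(X, Z)$ is close to 1 the determinant is close to zero, so small changes in the data produce large changes in the estimates; this is one way to see why QMC is to be avoided.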
3 Selection of variables
3.1 Introduction
When studying a variable $Y$, we usually start with many explanatory
variables. In this section we show how to select the variables in a suitable
way. We are going to study the so-called "forward selection".
The basic ideas are the following:
* if we have to choose between 2 variables, we choose the best variable;
* it is desirable to choose variables with a high contribution;
* we want to avoid QMC;
* we want the variables with the highest marginal contribution.
3.2 Selection
We are going to explain the steps by an example.
The data and the different parts of the calculations are available
from the website: www.edwardomey.com (teaching - econometrics).
The problem is the following. We want to find a model for the price $Y$ of
cars. To this end we collected data about cars.
The variables that we intend to use are the following:
Y: the price of a car
X(1): Power (pk, i.e. horsepower)
X(2): Length (mm)
X(3): Weight (kg)
X(4): Fuel amount (l)
X(5): Size of the luggage space (l)
X(6): # doors
X(7): Diesel? (yes = 1)
X(8): Computer? (yes = 1)
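3.2.1 Step 1: correlation coefficients
As a starting point, we calculate all correlations between $Y$ and all variables.
In our example we find the following table:

| | Y | X(1) | X(2) | X(3) | X(4) | X(5) | X(6) | X(7) | X(8) |
|---|---|---|---|---|---|---|---|---|---|
| Y | 1 | | | | | | | | |
| X(1) | 0.80 | 1 | | | | | | | |
| X(2) | 0.68 | 0.58 | 1 | | | | | | |
| X(3) | 0.86 | 0.62 | 0.76 | 1 | | | | | |
| X(4) | 0.62 | 0.42 | 0.51 | 0.80 | 1 | | | | |
| X(5) | 0.60 | 0.50 | 0.81 | 0.69 | 0.45 | 1 | | | |
| X(6) | 0.24 | 0.25 | 0.70 | 0.23 | 0.13 | 0.58 | 1 | | |
| X(7) | 0.21 | -0.16 | 0.14 | 0.39 | 0.33 | 0.17 | -0.06 | 1 | |
| X(8) | 0.60 | 0.54 | 0.49 | 0.52 | 0.31 | 0.45 | 0.29 | 0.06 | 1 |

We can use this correlation table for several purposes.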
Sign of the correlation coefficients. From the theoretical point of view,
we expect that $X(1)$ and $Y$ have a positive correlation. In the table, we see
that the data confirm our theoretical expectations. We can check all the
signs to see whether or not they confirm the theoretical expectations.
If we notice that a sign is wrong, it is possible that our theoretical motivation
is wrong. It is also possible that the correlation coefficient has the wrong
sign, but that it is statistically not significant. For the t-test, see below.
Finally it is also possible that we have a problem with outliers.
Sort the variables w.r.t. $Y$. In the table we can see that $X(3)$ has the
highest correlation with $Y$. If we select variables, $X(3)$ will be our natural
first choice.
Sorting the correlations (in absolute value) w.r.t. $Y$ we find the following
table:

| Variable $X$ | $r(Y, X)$ |
|---|---|
| X(3) | 0.86 |
| X(1) | 0.80 |
| X(2) | 0.68 |
| X(4) | 0.62 |
| X(5) | 0.60 |
| X(8) | 0.60 |
| X(6) | 0.24 |
| X(7) | 0.21 |

The table shows us the order in which variables are going to enter the
model.
Small correlations. Before studying the problem we hope that all variables
will be important in the model. Sometimes some correlation coefficients are
very small and maybe the variable is not as important as we thought.
When we see small correlation coefficients, several cases can occur:
* the correlation measures a linear relationship. Maybe the relationship
in our case is not linear but some other nonlinear relationship. To check this,
we make an $X$-$Y$ scatter plot and decide whether or not to transform the variable(s).
* it is indeed possible that the importance of the variable is small, and perhaps
we can delete the variable from the model.
t-test. To test whether or not a calculated correlation coefficient is "small",
we use a t-test. We have the following test:
$$H_0: \rho = 0 \quad \text{vs} \quad H_a: \rho \neq 0$$
Using the sample correlation coefficient $r$ we calculate the t-statistic:
$$t(r) = r \sqrt{\frac{n - 2}{1 - r^2}}$$
and under suitable conditions, we have $t(r) \sim t_{n-2}$, the Student t-distribution
with parameter $n - 2$. Using prob-values we can decide to reject $H_0$ or not.
In our example we perform the t-test for $r(X(6), Y)$ and $r(X(7), Y)$. We
find $n = 106$ and
$r(X(6), Y) = 0.24$; $t(r) = 2.53$; prob-value (two-sided): 0.012
$r(X(7), Y) = 0.21$; $t(r) = 2.24$; prob-value (two-sided): 0.027
If we choose $\alpha = 5\%$, then we can reject $H_0$ in both cases.
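The t-statistic itself is easy to compute by hand or in code. The sketch below (function name illustrative) evaluates the formula for the rounded correlation 0.24 with $n = 106$; the prob-value is not computed here because it needs the Student t-distribution, which is not in the Python standard library (EXCEL or a statistics package can supply it).

```python
from math import sqrt


def t_statistic(r, n):
    """t(r) = r * sqrt((n - 2) / (1 - r^2)) for a sample correlation r and sample size n."""
    if n <= 2:
        raise ValueError("need more than 2 observations")
    if abs(r) >= 1:
        raise ValueError("r must lie strictly between -1 and 1")
    return r * sqrt((n - 2) / (1 - r * r))


print(f"t = {t_statistic(0.24, 106):.2f}")
```

With the rounded input $r = 0.24$ the result is about 2.52, slightly below the 2.53 in the text, which was presumably computed from the unrounded correlation.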
3.2.2 Choice of the first variable
It is clear that the first variable in the model will be $X(3)$ because $X(3)$ has
the highest correlation with $Y$. Now we construct a first model:
model 1: $Y = a + bX(3) + \varepsilon$.
Using EXCEL, we find the following result:
$$\hat{Y} = -7406.5 + 24.23X(3)$$
$$R^2 = 0.73$$
Note that the coefficient of $X(3)$ is positive. This corresponds to the
theoretical expectations and to the positive correlation coefficient that we
found earlier.
To see if this model 1 is statistically significant, we perform an F-test.
We have the following hypotheses:
$H_0$: the model is statistically not significant
$H_a$: the model is statistically significant
To choose between $H_0$ and $H_a$, we use the $R^2$ and calculate the following
F-value:
$$F(R^2) = F = \frac{R^2 / (p - 1)}{(1 - R^2) / (n - p)}$$
Here $n$ = the sample size, and $p$ = the number of parameters in the
model.
If $R^2$ is small (resp. not small), we find an F-value that is small (resp. not
small). In order to decide whether or not the calculated F-value is sufficiently
large, we calculate its prob-value by using a Fisher F-distribution
$F(p - 1, n - p)$.
For our example, we have
$R^2 = 0.73$
$F = 283.6$
$n = 106$ and $p = 2$
Prob-value is $1.76 \times 10^{-31}$
We conclude that our model 1 is a statistically relevant model.
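The F-value is a simple function of $R^2$, $n$ and $p$. A minimal sketch follows (function name illustrative); as with the t-test, the prob-value requires the Fisher F-distribution and is left to EXCEL or a statistics package.

```python
def f_statistic(r2, n, p):
    """F = (R^2 / (p - 1)) / ((1 - R^2) / (n - p)) for a model with p parameters."""
    if not 0 <= r2 < 1:
        raise ValueError("R^2 must satisfy 0 <= R^2 < 1")
    if p < 2 or n <= p:
        raise ValueError("need p >= 2 parameters and n > p observations")
    return (r2 / (p - 1)) / ((1 - r2) / (n - p))


# Model 1: rounded R^2 = 0.73, n = 106 observations, p = 2 parameters.
print(f"F = {f_statistic(0.73, 106, 2):.1f}")
```

With the rounded $R^2 = 0.73$ this gives roughly 281, close to the 283.6 in the text; the difference presumably comes from rounding $R^2$ to two decimals.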
3.2.3 Choice of a second variable
Taking into account QMC
* As a first candidate, we can choose $X(1)$ as a second variable.
To see if it is allowed to choose $X(1)$, we have to check for QMC.
In our case we find that $r(X(1), X(3)) = 0.62$. Earlier we have decided
that this value indicates problems with QMC.
* The next candidate is $X(2)$ and now we find $r(X(2), X(3)) = 0.76$ and
again QMC.
* The next candidate is $X(4)$ and we have $r(X(4), X(3)) = 0.80$ and
again QMC.
* For the same reason $X(5)$ has to be kicked out of the model.
* For $X(8)$ we find that $r(X(8), X(3)) = 0.52$ and this value is acceptable
from the QMC point of view.
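The screening of candidates described above can be written as a small loop. The sketch below hard-codes the rounded correlations from the table in Step 1; the dictionary and function names are illustrative, and only the first-order check against the single selected variable $X(3)$ is implemented.

```python
# Correlations copied from the correlation table (rounded to 2 decimals).
# Keys are the indices k of the variables X(k).
R_WITH_Y = {1: 0.80, 2: 0.68, 3: 0.86, 4: 0.62, 5: 0.60, 6: 0.24, 7: 0.21, 8: 0.60}
R_WITH_X3 = {1: 0.62, 2: 0.76, 4: 0.80, 5: 0.69, 6: 0.23, 7: 0.39, 8: 0.52}

QMC_LIMIT = 0.60  # agreement from section 2.3.2: |r| > 0.60 signals QMC


def next_candidate(r_with_y, r_with_selected, limit=QMC_LIMIT):
    """Return the remaining variable with the highest |r| with Y that passes the QMC check."""
    ranked = sorted(r_with_selected, key=lambda k: abs(r_with_y[k]), reverse=True)
    for k in ranked:
        if abs(r_with_selected[k]) <= limit:
            return k
    return None  # every remaining variable is too strongly related to the selected one


print("second variable: X(%d)" % next_candidate(R_WITH_Y, R_WITH_X3))
```

The loop rejects $X(1)$, $X(2)$, $X(4)$ and $X(5)$ in turn and stops at $X(8)$, as in the text. Whether the accepted candidate stays in the model still depends on its marginal contribution.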
Model 2. Now we construct a second model:
model 2: $Y = a + bX(3) + cX(8) + \varepsilon$.
Using EXCEL, we find the following result:
$$\hat{Y} = -3170.6 + 21.12X(3) + 4929.33X(8)$$
$$R^2 = 0.764$$
Note that the coefficients of $X(3)$ and $X(8)$ are positive. This corresponds
to the theoretical expectations and to the positive correlation coefficients
that we found earlier.
To see if the model is relevant, we perform the F-test as before. We find
$R^2 = 0.76$
$F = 166.6$
$n = 106$ and $p = 3$
Prob-value is $5.17 \times 10^{-33}$
We conclude that our model 2 is a statistically relevant model.
Marginal contribution of $X(8)$. Before we decide that we are going to
keep $X(8)$ as a second variable, we have to check whether or not the contribution
of $X(8)$ to the model is statistically significant.
We define $MC(X(8))$ = the marginal contribution of $X(8)$ as the
change in the value of $R^2$ (here MC stands for marginal contribution, not for multicollinearity).
Model 1: $R^2 = 0.73$
Model 2: $R^2 = 0.764$
$MC(X(8)) = 0.764 - 0.73 = 0.034$.
To evaluate the marginal contribution, we perform an F-test as follows:
F-value of $MC(X(8))$:
$$F = \frac{MC(X(8))}{(1 - R^2_{\text{model 2}}) / (n - p_{\text{model 2}})}$$
We calculate the prob-value by using an $F(1, n - p_{\text{model 2}})$-distribution.
In our example we find $n = 106$ and $p = 3$ so that
$F = 14.05$
Prob-value = 0.0003
Since the prob-value is small ($\leq 5\%$ or $\leq 1\%$), the marginal contribution
of $X(8)$ is statistically significant or relevant.
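The F-value of a marginal contribution can be sketched in the same way as the overall F-value. The function name below is illustrative, and the inputs are the rounded $R^2$ values quoted in the text.

```python
def marginal_f(r2_small, r2_large, n, p_large):
    """F-value for the marginal contribution of one added variable.

    r2_small: R^2 of the model without the variable
    r2_large: R^2 of the model with the variable (p_large parameters)
    """
    if not 0 <= r2_small <= r2_large < 1:
        raise ValueError("expected 0 <= r2_small <= r2_large < 1")
    if n <= p_large:
        raise ValueError("need more observations than parameters")
    contribution = r2_large - r2_small
    return contribution / ((1 - r2_large) / (n - p_large))


# Adding X(8) to model 1: R^2 goes from 0.73 to 0.764, n = 106, p = 3.
print(f"F = {marginal_f(0.73, 0.764, 106, 3):.2f}")
```

With these rounded inputs the result is about 14.8 rather than the 14.05 reported in the text; the statistic is sensitive to the third decimal of the $R^2$ values, so the gap is presumably a rounding effect. A marginal contribution of zero gives $F = 0$.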
3.2.4 Choice of a 3rd variable: candidate 1
Taking into account QMC. The first candidate is variable $X(6)$.
We check for first-order QMC.
We find $r(X(6), X(3)) = 0.23$ and $r(X(6), X(8)) = 0.29$.
These values are acceptable.
To check for higher-order QMC, we proceed as follows: we try to explain
$X(6)$ by $X(3)$ and $X(8)$. If $X(6)$ can be explained well, we have a problem
of QMC. If not, we can proceed to model 3. We estimate the parameters in
the following model:
$$X(6) = u + vX(3) + wX(8) + \varepsilon.$$
We find the following result:
$$\hat{X}(6) = 3.85 + 0.00029X(3) + 0.5629X(8)$$
$$R^2 = 0.094.$$
It turns out that we can explain 9.4% of the variation in $X(6)$ by $X(3), X(8)$.
This small number indicates that there is no problem with QMC.
We make the following agreement: if we check for QMC, we calculate
$R^2$ and then:
If $R^2 \leq 0.36$, we don't have problems of QMC;
If $R^2 > 0.36$, we have a problem of QMC and have to reject the
candidate variable.
Model 3. Now we construct a new model:
model 3: $Y = a + bX(3) + cX(8) + dX(6) + \varepsilon$.
Using EXCEL, we find the following result:
$$\hat{Y} = -3538.25 + 21.09X(3) + 4875.57X(8) + 95.48X(6)$$
$$R^2 = 0.764$$
Note that the coefficients of $X(3)$, $X(8)$ and $X(6)$ are positive. This
corresponds to the theoretical expectations and to the positive
correlation coefficients that we found earlier.
To see if the model is relevant, we perform the F-test as before. We find
$R^2 = 0.764$
$F = 110$
$n = 106$ and $p = 4$
Prob-value is $7.43 \times 10^{-32}$
We conclude that our model 3 is a statistically relevant model.
Marginal contribution of X(6). Before we decide to keep X(6) as a third
variable, we have to check whether or not the contribution of X(6) to the
model is statistically significant.
We find:
Model 2: $R^2 = 0.764$
Model 3: $R^2 = 0.764$
$MC(X(6)) = 0$.
To evaluate the marginal contribution, we perform an $F$-test as follows.
$F$-value of MC(X(6)):
$$F = \frac{MC(X(6))}{(1 - R^2_{\text{model 3}})/(n - p_{\text{model 3}})}$$
We calculate the prob-value by using an $F(1, n - p_{\text{model 3}})$-distribution.
In our example we find $n = 106$ and $p = 4$ so that
$F = 0$
Prob-value = 100%
Since the prob-value is large, the marginal contribution of X(6) is
statistically irrelevant!
3.2.5 Choice of a 3rd variable: candidate 2
Taking into account QMC. The next candidate is variable X(7).
We check for first order QMC.
We find $r(X(7), X(3)) = 0.39$ and $r(X(7), X(8)) = 0.06$.
These values are acceptable.
To check for higher order QMC, we estimate the parameters in the
following model:
$$X(7) = u + vX(3) + wX(8) + \varepsilon$$
We find the following result:
$$\hat{X}(7) = -0.727 + 0.00079X(3) - 0.266X(8)$$
$$R^2 = 0.184.$$
It turns out that we can explain 18.4% of the variation in X(7) by
X(3) and X(8). This small number indicates that there is no problem with
QMC.
New model 3. Now we construct a new model:
$$\text{model 3: } Y = a + bX(3) + cX(8) + dX(7) + \varepsilon$$
Using EXCEL, we find the following result:
$$\hat{Y} = -4611.34 + 22.68X(3) + 4401.99X(8) - 1982.21X(7)$$
$$R^2 = 0.774$$
Note that the coefficient of X(7) is negative. From the correlation table
we expected a positive contribution.
To see if the model is relevant, we perform the $F$-test as before. We find
$R^2 = 0.774$
$F = 116$
$n = 106$ and $p = 4$
Prob-value is $8.55 \times 10^{-33}$
We conclude that our new model 3 is a statistically relevant model.
Marginal contribution of X(7). Before we decide to keep X(7) as a third
variable, we have to check whether or not the contribution of X(7) to the
model is statistically significant.
We find:
Model 2: $R^2 = 0.764$
Model 3: $R^2 = 0.774$
$MC(X(7)) = 0.01$.
To evaluate the marginal contribution, we perform an $F$-test as follows.
$F$-value of MC(X(7)):
$$F = \frac{MC(X(7))}{(1 - R^2_{\text{model 3}})/(n - p_{\text{model 3}})}$$
We calculate the prob-value by using an $F(1, n - p_{\text{model 3}})$-distribution.
In our example we find $n = 106$ and $p = 4$ so that
$F = 4.51$
Prob-value = 3.6%
Since this prob-value is smaller than 5%, the marginal contribution of
X(7) is statistically relevant at the 95% level!
4 The basic assumptions
4.1 Introduction
Up to now, we have devoted attention to the technique of selecting variables
and estimating parameters in linear models. As usual in statistics, we want
to obtain confidence statements about the parameters. To this end we have
to formulate some basic assumptions.
Recall that we consider models of the form
$$Y = a + bX + cZ + \varepsilon.$$
Taking into account our data, we have
$$Y_i = a + bX_i + cZ_i + \varepsilon_i, \quad \text{for } i = 1, 2, \ldots, n.$$
The basic assumptions are assumptions about $\varepsilon_i$.
4.2 The basic assumptions
4.2.1 BA1: $E(\varepsilon_i) = 0, \ \forall i$
This basic assumption indicates that we make no systematic errors and that
we didn’t forget important and relevant variables.
The assumption implies that we don’t have outliers in the data and that
we don’t have clusters in the data.
4.2.2 BA2: $Var(\varepsilon_i) = \sigma^2, \ \forall i$
This assumption states that the variance of the model error term is a
constant: it is independent of $Y, X, Z$ and independent of the index $i$. If the
assumption holds we have homogeneity of the variance and we call the model
a homoscedastic model. If the assumption doesn’t hold, we have a problem
of heteroscedasticity.
4.2.3 BA3: $Cov(\varepsilon_i, \varepsilon_j) = 0, \ \forall i \neq j$
This assumption states that the error terms should show no correlation and
they should have no influence on each other. The assumption is automatically
satisfied if the error terms are independent.
4.2.4 BA4: $\varepsilon_i \sim N(0, \sigma^2), \ \forall i$
The normality assumption allows us to obtain confidence statements for the
parameters. Also, we are allowed to perform the $F$-tests (cf. selection of
variables) if this basic assumption holds.
4.2.5 BA5: Assumptions about the explanatory variables
Earlier we have discussed QMC. In our models we are going to avoid
QMC-problems.
Also we assume that in our models $Y = a + bX + cZ + \varepsilon$ all randomness
is in the $\varepsilon$-term and not in the variables $Y, X, Z$.
4.3 Statistical properties of the LS-estimates
Recall that for the simple linear model $Y = a + bX + \varepsilon$ and the approximation
$\hat{Y} = \hat{a} + \hat{b}X$, we have found that
$$\hat{b} = \frac{V(X, Y)}{V(X)}, \qquad \hat{a} = \overline{Y} - \hat{b}\,\overline{X}.$$
The parameters of the model are $a$, $b$ and $\sigma^2$. The estimates for $a$ and $b$
are $\hat{a}$ and $\hat{b}$. We have the following important result.
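Theorem 1 Suppose that all basic assumptions hold. Then we have
$$\hat{a} \sim N(a, \sigma^2_{\hat{a}}), \qquad \hat{b} \sim N(b, \sigma^2_{\hat{b}}),$$
where
$$\sigma^2_{\hat{a}} = Var(\hat{a}) = \sigma^2 \, \frac{\overline{X^2}}{n s^2(X)},$$
$$\sigma^2_{\hat{b}} = Var(\hat{b}) = \sigma^2 \, \frac{1}{n s^2(X)},$$
$$Cov(\hat{a}, \hat{b}) = \sigma^2 \, \frac{-\overline{X}}{n s^2(X)}.$$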
I would like to stress that we are making all calculations in EXCEL. These
formulas are important to obtain some information about the quality of the
estimators.
We see that the variance of $\hat{b}$ depends on 3 elements:
• dependence on $\sigma^2$: if $\sigma^2 = Var(\varepsilon)$ is larger, then $Var(\hat{b})$ is larger. This
means that if we allow larger fluctuations in the error term, there will
be larger fluctuations in the parameter estimates.
• dependence on $n$: if $n$ increases, then $Var(\hat{b})$ decreases. This means
that we will have better results if the sample size is bigger.
• dependence on $s^2(X)$: if $s^2(X)$ increases, then $Var(\hat{b})$ decreases. This
means that if we are far away from MC and QMC, then the estimates
are better.
The previous result can be used to obtain confidence intervals for $a$ and
for $b$. We find the following 95% c.i.:
$$a = \hat{a} \pm z_{2.5\%} \, \sigma_{\hat{a}}, \qquad b = \hat{b} \pm z_{2.5\%} \, \sigma_{\hat{b}}.$$
Although these formulas are correct, they are of no practical use! In order to use
the formulas, we need $\sigma^2_{\hat{a}}$ and $\sigma^2_{\hat{b}}$, and to find these, we need $\sigma^2$!
Theorem 2 Suppose that all basic assumptions hold. Then we have that $\sigma^2$
can be estimated by $s^2_e$, where
$$s^2_e = \frac{SSE}{n - p}.$$
Moreover, $\sigma^2_{\hat{a}}$ and $\sigma^2_{\hat{b}}$ can be estimated by $s^2_{\hat{a}}$ and $s^2_{\hat{b}}$, where
$$s^2_{\hat{a}} = s^2_e \, \frac{\overline{X^2}}{n s^2(X)}, \qquad s^2_{\hat{b}} = s^2_e \, \frac{1}{n s^2(X)}.$$
As a compensation, in the confidence statements we have to replace the
normal distribution by a $t$-distribution. We find the following 95% c.i.:
$$a = \hat{a} \pm t_{n-p;\,2.5\%} \, s_{\hat{a}}, \qquad b = \hat{b} \pm t_{n-p;\,2.5\%} \, s_{\hat{b}}.$$
In practice, EXCEL calculates everything that we need.
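To see what EXCEL does behind the scenes, the formulas of Theorem 2 can be coded directly for the simple linear model. The sketch below rests on several assumptions: the data are invented, the function names are hypothetical, $s^2(X)$ is taken with the $1/n$ convention, and the critical value of the $t$-distribution is read from a table rather than computed.

```python
import math

def simple_regression(x, y):
    """Least-squares estimates and standard errors for Y = a + bX + error."""
    n = len(x)
    p = 2  # number of parameters: a and b
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n  # s^2(X), 1/n convention
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / n
    b_hat = cov_xy / var_x
    a_hat = mean_y - b_hat * mean_x
    sse = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in zip(x, y))
    s2_e = sse / (n - p)                       # estimate of sigma^2
    mean_x2 = sum(xi * xi for xi in x) / n     # mean of X^2
    s_a = math.sqrt(s2_e * mean_x2 / (n * var_x))
    s_b = math.sqrt(s2_e / (n * var_x))
    return a_hat, b_hat, s2_e, s_a, s_b

def confidence_interval(estimate, std_error, t_critical):
    """Interval estimate +/- t_critical * std_error."""
    half_width = t_critical * std_error
    return estimate - half_width, estimate + half_width

# Invented data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
a_hat, b_hat, s2_e, s_a, s_b = simple_regression(x, y)

# 3.182 is the tabulated t-value for n - p = 3 degrees of freedom and 2.5% in each tail.
print(confidence_interval(a_hat, s_a, 3.182))
print(confidence_interval(b_hat, s_b, 3.182))
```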
4.4 Example
We have a look at Model 3B of the previous section. The model was the
following:
$$\text{model: } Y = a + bX(3) + cX(8) + dX(7) + \varepsilon$$
The full Excel output is given by the following 3 parts:

SUMMARY OUTPUT

Regression Statistics
Multiple R        0,879
R square          0,774
Standard Error    4356,7
Observations      106

ANOVA
             df    SS       MS        F        Significance F
Regression   3     6,6E09   2,21E09   116,32   8,55E-33
Residual     102   1,9E09   1,9E07
Total        105   8,6E09

             Coeff.    Stand.Err   t-stat   P-value   Lower 95%   Upper 95%
Intercept    -4611,4   2558        -1,78    0,077     -9745       522
X(3)         22,68     1,723       13,11    1,2E-23   19,24       26,11
X(8)         4401,99   1317        3,34     0,001     1789        7014
X(7)         -1982,2   937,7       -2,11    0,037     -3842       -122
Part 1
In part 1 we get the regression statistics. Important for us is the $R^2$-value.
In our case it is $R^2 = 77.4\%$.
The standard error of the model is given by $s_e = 4356.7$. It means that
with the model we make errors that fluctuate around 0 and the size
of the fluctuations is around $s_e = 4356.7$.
It is wise to compare the size of the errors with the size of $Y$. The mean
price of the cars in our example is given by $\overline{Y} = 32473.6$. The relative error
in our model is given by
$$\frac{s_e}{\overline{Y}} = \frac{4356.7}{32473.6} \approx 0.13.$$
With the model we make relative errors around 13%. As a rule of thumb,
we are glad if the relative error is less than 10%.
Part 2
Part 2 is devoted to the analysis of the variance.
In our notations, we have
(regression) SSR = 6623984149
(residuals) SSE = 1936113261
(total) SST = SSR + SSE = 8560097410
The abbreviation "df" means degrees of freedom. When we calculate the
mean square, we find for example that
$$\frac{SSE}{n - p} = \frac{1936113261}{106 - 4} = 18981502.56$$
This is $s^2_e = 18981502.56$. Calculating the square root gives $s_e = 4356.7$
which we got in part 1.
The $F$-value is the $F$-statistic that we use to see if $R^2$ is sufficiently
large. The prob-value of the $F$-value is given by "Significance F".
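The entries of the ANOVA part are tied together by simple arithmetic, which the following sketch recomputes from the two sums of squares quoted above. The variable names are illustrative; the total sum of squares is taken as SSR + SSE, which agrees with the "Total" row of the Excel output.

```python
import math

ssr = 6623984149.0  # regression sum of squares (from the text)
sse = 1936113261.0  # residual sum of squares (from the text)
n, p = 106, 4       # observations and number of parameters

sst = ssr + sse               # total sum of squares
msr = ssr / (p - 1)           # mean square of the regression (df = 3)
mse = sse / (n - p)           # mean square of the residuals (df = 102), this is s_e^2
f_value = msr / mse           # F-statistic of the model
r_squared = ssr / sst         # share of the variation that is explained
s_e = math.sqrt(mse)          # standard error of the model

print(round(mse, 2), round(f_value, 2), round(r_squared, 3))
```

The recomputed values agree with the Excel output: an F-value of about 116.32 and an $R^2$ of about 0.774.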
Part 3
Part 3 gives the detailed statistical analysis of the parameter estimates.
We consider $b$, the coefficient of X(3).
From the Excel output, we see that
$\hat{b} = 22.68$
$s_{\hat{b}} = 1.723$
Using a 95% c.i. we obtain that
$$b = \hat{b} \pm t_{n-p;\,2.5\%} \, s_{\hat{b}} = 22.68 \pm 1.98 \times 1.723 = [19.24;\ 26.11]$$
which is given in the last columns of the table in Part 3.
The results of Part 3 also allow us to test the following type of hypotheses.
Let us consider
$$H_0: b = 0 \quad \text{versus} \quad H_a: b \neq 0$$
and let us use $\alpha = 5\%$.
The first method to choose between $H_0$ and $H_a$ is based on confidence
intervals. Since the 95% c.i. is given by $[19.24;\ 26.11]$ and does not contain 0,
we reject $H_0$.
A second method is based on calculating prob-values. The $t$-value of the
sample result is given by
$$t\text{-value} = \frac{\hat{b} - b_0}{s_{\hat{b}}} = \frac{\hat{b}}{s_{\hat{b}}} = \frac{22.68}{1.723} \approx 13.11.$$
Using the $t_{102}$-distribution, we find that
$$P(|t_{102}| > 13.11) = 1.261 \times 10^{-23}$$
This small value allows us to reject $H_0$.
Remark. Excel always calculates a two-sided prob-value.
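Both methods can be reproduced from the rounded numbers in the Excel table. The sketch below uses a hypothetical helper; the critical value 1.98 is the one used in the text. Because the inputs are rounded, the interval and the $t$-value differ slightly from Excel's (about [19.27; 26.09] and 13.16 instead of [19.24; 26.11] and 13.11), but the conclusion is the same.

```python
def two_sided_test(estimate, std_error, t_critical, null_value=0.0):
    """Confidence interval, t-value and decision for H0: parameter = null_value."""
    half_width = t_critical * std_error
    interval = (estimate - half_width, estimate + half_width)
    t_value = (estimate - null_value) / std_error
    # H0 is rejected when the null value lies outside the confidence interval.
    reject = not (interval[0] <= null_value <= interval[1])
    return interval, t_value, reject

# Coefficient of X(3): estimate and standard error from the Excel output.
interval, t_value, reject = two_sided_test(22.68, 1.723, t_critical=1.98)
print(interval, round(t_value, 2), reject)
```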
5 Checking the basic assumptions
5.1 Introduction
In view of the previous section, the basic assumptions are crucial for all
statistical properties of the estimators and the models in econometrics. In
this section we discuss how to check whether or not the basic assumptions
hold. If the basic assumptions do not hold, we have a problem with the
statistical properties and this may contaminate any conclusions we make.
5.2 Basic assumption 1
If basic assumption 1 doesn’t hold, then our estimates are biased and most
of the regression output is not reliable any more.
We check this basic assumption by making a scatter plot of $Y$ and the
calculated residuals or errors $e$.
We hope to see a graph without outliers and without clusters.
If we see outliers, we have to check the data again: did we make an
input-error? is the data correct? do we have a special data-point?
It is possible that we decide to delete the data line that generates the
outlier.
If we see clusters, we forgot a variable in the model. Carefully checking the
data in the cluster(s) may lead to the introduction of a new variable or a
new dummy.
In our example we …nd the following graph:
5.3 Basic assumption 2
If basic assumption 2 doesn’t hold, then all estimates related to variances
($s^2_e$, $s^2_{\hat{a}}$, $R^2$, ...) are no longer valid and most of the statistical analysis
(confidence intervals, $F$-value) is not reliable any more.
We check this basic assumption in several ways.
5.3.1 Scatter plots
We make a scatter plot of $e^2$ and each of the variables $Y, X, Z, \ldots$. In a
time series analysis, we also make a scatter plot $(i, e^2_i)$. In each plot we hope to see
a horizontal box without any systematic pattern or trend.
[Figure "BA 1": scatter plot of the residuals e against Y]
[Figure "BA 2": scatter plot of the squared residuals e² against Y]
In our example, we find the following graphs $(Y, e^2)$ and $(X(3), e^2)$. Graphs
of the dummies are not useful.
The graphs show a more or less horizontal box. There is possibly one
outlier that we missed earlier.
5.3.2 Correlation coefficients
As a second tool we calculate the correlation coefficient between $e^2$ and the
variables $Y, X, Z, \ldots$. For a time series we also calculate the correlation
coefficient $r(e^2_i, i)$. Under ideal circumstances, all these correlation coefficients
equal zero. We hope to find "small" correlation coefficients.
In our example we find:
$r(e^2, Y) = 0.02269$; $r(e^2, X(3)) = -0.044$;
$r(e^2, X(8)) = -0.115$; $r(e^2, X(7)) = 0.072$
These correlation coefficients seem to be small.
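The check above amounts to squaring the residuals and correlating them with each variable in turn. A minimal sketch, with invented residuals and data and hypothetical function names:

```python
import math

def correlation(u, v):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def squared_residual_correlations(residuals, variables):
    """Correlation between e^2 and each variable in the dictionary."""
    e2 = [e * e for e in residuals]
    return {name: correlation(e2, values) for name, values in variables.items()}

# Invented residuals and variables, for illustration only.
residuals = [1.0, -2.0, 0.5, -0.5, 3.0, -1.5]
variables = {"X": [1, 2, 3, 4, 5, 6], "Y": [2.0, 2.5, 4.1, 4.4, 6.8, 6.1]}
print(squared_residual_correlations(residuals, variables))
```

Values close to zero support homoscedasticity; a clearly positive or negative value suggests that the size of the errors grows or shrinks with that variable.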
[Figure "BA2": scatter plot of the squared residuals e² against X(3)]
5.3.3 Bartlett’s test
In this test, first we sort the data with respect to one of the variables and
then divide the data into two equal or almost equal parts. For each of these
parts, we calculate the regression as in the final model and for each part we
calculate $s^2_e(I)$ and $s^2_e(II)$.
If we have homoscedasticity, then we expect $s^2_e(I) \approx s^2_e(II)$. If on the
other hand these values are "far away" from each other, then we have
heteroscedasticity.
As an example, we sort our data with respect to $Y$ and then divide the
data into two parts.
Part I contains 53 data lines. In part I, the variable X(8) takes only the
value 0 and has to be excluded from the regression analysis.
For part I, we find:
number of parameters: 3
$R^2 = 0.103$
$s_e(I) = 3396.46$
Observations: 53
Part II also contains 53 data points and all variables vary. For part II,
we find:
number of parameters: 4
$R^2 = 0.629$
$s_e(II) = 3641.08$
Observations: 53
Bartlett’s test allows us to choose between the following set of hypotheses:
$$H_0: \sigma^2(I) = \sigma^2(II) \quad \text{vs} \quad H_a: \sigma^2(I) \neq \sigma^2(II).$$
As a statistic, we use $M$, where
$$M = \frac{s^2_e(I)}{s^2_e(II)}$$
Under $H_0$ (and if all other basic assumptions hold) we have
$M \sim F(n(I) - p(I),\ n(II) - p(II))$, where $F$ denotes the $F$-distribution, $n(I), n(II)$ are the
sample sizes and $p(I), p(II)$ the number of parameters.
In our example we have $M = (3396.46/3641.08)^2 = 0.87$. The $F$-distribution
that we need is the $F(53 - 3,\ 53 - 4) = F(50, 49)$.
Using this $F$-distribution, we find that the prob-value of $M$ is given by
$$P(F(50, 49) \leq 0.87) \approx 0.31.$$
If we use $\alpha = 5\%$, we don’t reject $H_0$.
We should perform Bartlett’s test by sorting with respect to each of the variables.
If we study a time series, we also have to divide the data into two parts by
sorting with respect to time $i$.
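The numbers of this example can be checked with a short script. It is only a sketch: the function names are hypothetical, and the distribution function of the $F$-distribution is obtained by numerically integrating its density with Simpson's rule, which works here because the density is bounded when the first degrees of freedom exceed 2.

```python
import math

def f_density(x, d1, d2):
    """Density of the F(d1, d2)-distribution at x."""
    if x <= 0:
        return 0.0
    log_beta = (math.lgamma(d1 / 2) + math.lgamma(d2 / 2)
                - math.lgamma((d1 + d2) / 2))
    log_pdf = ((d1 / 2) * math.log(d1 / d2) + (d1 / 2 - 1) * math.log(x)
               - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2) - log_beta)
    return math.exp(log_pdf)

def f_cdf(x, d1, d2, steps=2000):
    """P(F(d1, d2) <= x) by Simpson's rule; assumes d1 > 2 and an even number of steps."""
    h = x / steps
    total = f_density(0.0, d1, d2) + f_density(x, d1, d2)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f_density(k * h, d1, d2)
    return total * h / 3

# Standard errors, sample sizes and numbers of parameters of the two parts (from the text).
s_e_1, n_1, p_1 = 3396.46, 53, 3
s_e_2, n_2, p_2 = 3641.08, 53, 4

ratio = (s_e_1 / s_e_2) ** 2
prob_value = f_cdf(ratio, n_1 - p_1, n_2 - p_2)
print(round(ratio, 2), round(prob_value, 2))
```

The ratio is about 0.87 and the prob-value about 0.31, as found above.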
5.4 Basic assumption 3
This is not a part of the study material this year
5.5 Basic assumption 4
If basic assumption 4 doesn’t hold, then all confidence statements are no longer
valid. Most of the statistical analysis (confidence intervals, $F$-value)
is not reliable any more. We check this basic assumption in several ways.
5.5.1 Histogram of the residuals
We make a histogram of the residuals and hope it has the normal, bell-shaped
form. First we have to construct a frequency table of the errors $e_i$. We use
around 10 classes of equal length.
In our example we find:
[Figure "histogram": relative frequencies of the residuals, classes from -8500 to 11500]
We find a normally shaped histogram.
5.5.2 The test of Kolmogorov and Smirnov
Kolmogorov and Smirnov compared the empirical distribution function (EDF)
and the theoretical distribution function (TDF).
The TDF is, as we assume in BA 4, given by $\varepsilon \sim N(0, \sigma^2)$. We estimate
$\sigma^2$ by using $s^2_e$ and use $X \sim N(0, s^2_e)$ as the TDF.
For each of the residuals $e_i$, we calculate
$$TDF(e_i) = P(X \leq e_i).$$
The EDF is the empirical distribution of the errors, that is, for each of the
residuals, we calculate
$$EDF(e_i) = \frac{1}{n} \#(\text{errors} \leq e_i).$$
If BA4 holds, then we expect $EDF \approx TDF$. If, on the other hand, there
is a major difference between EDF and TDF, we don’t believe that BA4
holds. As a test statistic, Kolmogorov and Smirnov use the largest difference
between the EDF and the TDF:
$$KS = \max |EDF - TDF|$$
In our example we calculate all ingredients.
Part of the table with EDF and TDF is given here:
       e               EDF           TDF
1      -8459,737709    0,009433962   0,026084149
2      -8067,756171    0,018867925   0,032029298
3      -7942,166053    0,028301887   0,034155828
4      -7703,271272    0,037735849   0,038521048
9      -5471,484999    0,08490566    0,104584053
....
104    8479,141356     0,981132075   0,974184401
105    8994,578009     0,990566038   0,980514971
106    10710,65694     1             0,993021928
The graph of EDF and TDF is given in the following picture:
[Figure "EDF and TDF": the two distribution functions plotted against the residuals e]
Clearly the two distribution functions are rather close to each other. For
the KS-statistic we find
$$KS = \max |EDF - TDF| = 0.054.$$
For KS, we don’t calculate the prob-value, but for small samples, we use
tables with critical values, available from
www.york.ac.uk/depts/maths/tables/pdf.htm
or from
www.edwardomey.com
For large samples, we use the critical values given by:
$$KS(\alpha) = \sqrt{\frac{-\ln(\alpha)}{2n}}.$$
In our case we find $KS(5\%) = 1.22/\sqrt{106} = 0.118$.
Since the calculated value (0.054) is less than the critical value, we don’t
reject BA4.
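The whole procedure fits in a few lines of Python, using the normal distribution from the standard library. The function names are hypothetical; the critical value follows the large-sample formula exactly as it is written in this text.

```python
import math
from bisect import bisect_right
from statistics import NormalDist

def ks_statistic(residuals, s_e):
    """Largest difference between the EDF of the residuals and the N(0, s_e^2) TDF."""
    tdf = NormalDist(mu=0.0, sigma=s_e)
    ordered = sorted(residuals)
    n = len(ordered)
    largest = 0.0
    for e in ordered:
        edf_value = bisect_right(ordered, e) / n  # share of errors <= e
        largest = max(largest, abs(edf_value - tdf.cdf(e)))
    return largest

def ks_critical_value(alpha, n):
    """Large-sample critical value sqrt(-ln(alpha) / (2n)) as used in the text."""
    return math.sqrt(-math.log(alpha) / (2 * n))

# TDF of the smallest residual in the table, with s_e = 4356.7.
print(NormalDist(0.0, 4356.7).cdf(-8459.737709))

# Decision for the example: KS = 0.054 against the 5% critical value for n = 106.
critical = ks_critical_value(0.05, 106)
print(round(critical, 3), 0.054 < critical)
```

The first printed value is close to the TDF entry 0.026 in the first row of the table.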
6 Making predictions
After constructing a model, it is interesting to find information about the
quality of the predictions made by the model. To this end we have to use
NEW data. For our example, we obtained 10 new data lines:
Y X(3) X(8) X(7)
22990 1530 0 0
29215 1219 0 0
35980 1860 0 1
23800 1220 0 0
17699 1180 0 0
40750 1995 0 1
31500 1708 1 0
31500 1815 0 1
24995 1360 0 0
36450 1870 1 1
For the first data line, the real price is given by $Y(1) = 22990$. On the
other hand, we have $X(3) = 1530$, $X(8) = X(7) = 0$. Using the formula
from Model 3B, we have the following prediction from our model:
$$\hat{Y} = -4611.34 + 22.68X(3) + 4401.99X(8) - 1982.21X(7)$$
$$= -4611.34 + 22.68 \times 1530 + 0 = 30089.06.$$
We can make similar calculations for the other data entries, and we …nd
the following table:
Y P
22990 30089,06
29215 23035,58
35980 35591,25
23800 23058,26
17699 22151,06
40750 38653,05
31500 38528,09
31500 34570,65
24995 26233,46
36450 40220,04
At first view, some of the predictions look good and other predictions look
bad. We are going to evaluate the quality of the predictions by using several
methods.
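The table of predictions can be reproduced by applying the formula of Model 3B to each new data line. The sketch below uses the coefficients and the 10 data lines given above; the names `COEFFICIENTS`, `predict` and `new_data` are illustrative.

```python
# Estimated coefficients of Model 3B (from the text).
COEFFICIENTS = {"intercept": -4611.34, "X3": 22.68, "X8": 4401.99, "X7": -1982.21}

def predict(x3, x8, x7):
    """Predicted price for one data line."""
    return (COEFFICIENTS["intercept"] + COEFFICIENTS["X3"] * x3
            + COEFFICIENTS["X8"] * x8 + COEFFICIENTS["X7"] * x7)

# New data lines as (Y, X(3), X(8), X(7)).
new_data = [
    (22990, 1530, 0, 0), (29215, 1219, 0, 0), (35980, 1860, 0, 1),
    (23800, 1220, 0, 0), (17699, 1180, 0, 0), (40750, 1995, 0, 1),
    (31500, 1708, 1, 0), (31500, 1815, 0, 1), (24995, 1360, 0, 0),
    (36450, 1870, 1, 1),
]

predictions = [round(predict(x3, x8, x7), 2) for _, x3, x8, x7 in new_data]
for (y, _, _, _), p in zip(new_data, predictions):
    print(y, p)
```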
6.1 Correlation coefficient
Under ideal circumstances, we have that $P = Y$. It means that there is a
perfect linear relationship and moreover that the linear relationship corresponds
to the first diagonal.
We calculate the correlation coefficient $r(Y, P)$ and hope that $r(Y, P)$ is
"close" to the number 1.
In our case, we find $r(Y, P) = 0.828$.
It means that $Y$ and $P$ show a rather strong linear relationship.
6.2 Scatter
To see if the points $(Y, P)$ are close to the first diagonal, we make an XY-scatter plot.
In our example, we find that
[Figure: XY-scatter plot of the predictions P against the real values Y]
From the graph we see that the points are spread around the first diagonal.
6.3 Mean absolute deviations
In this section we calculate the mean absolute deviation and the mean
absolute relative deviation, with $N$ the number of predictions:
$$MAD = \frac{1}{N} \sum_i |Y_i - P_i|,$$
$$MARD = \frac{1}{N} \sum_i \left| \frac{Y_i - P_i}{Y_i} \right|.$$
In the example, we find the following numbers:
$\overline{Y} = 29487.9$
$\overline{P} = 31213.05$
$MAD = 3606.5$
$MARD = 0.134$.
Our data fluctuate around the central value $\overline{Y} = 29487.9$ and we have
$MAD = 3606.5$. Our predictions are predictions with an absolute error of
around 3606. Related to $\overline{Y}$ this is an error of around 12%.
Looking at the relative deviations, we find a relative error of around
13.4%.
6.4 Mean squared errors
Instead of taking absolute deviations, one can also look at the squared
errors:
$$MSE = \frac{1}{N} \sum_i (Y_i - P_i)^2$$
$$RMSE = \sqrt{MSE}$$
In our example, we find
$MSE = 18807115$
$RMSE = 4336.72$
We can also use squared relative errors and we find:
$$MSRE = \frac{1}{N} \sum_i \left( \frac{Y_i - P_i}{Y_i} \right)^2 = 0.0278$$
$$RMSRE = \sqrt{MSRE} = 0.167$$
These measures are used when we want to compare the quality of
predictions of several models.
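All the measures of this section can be computed from the two columns Y and P of the prediction table. The sketch below uses hypothetical function names and takes the relative errors with respect to the real values $Y_i$; small differences in the last digit compared with the numbers quoted above can come from rounding.

```python
import math

# Real prices Y and predictions P from the table of 10 new data lines.
y = [22990, 29215, 35980, 23800, 17699, 40750, 31500, 31500, 24995, 36450]
p = [30089.06, 23035.58, 35591.25, 23058.26, 22151.06,
     38653.05, 38528.09, 34570.65, 26233.46, 40220.04]

def correlation(u, v):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)

def accuracy_measures(actual, predicted):
    """MAD, MARD, MSE, RMSE, MSRE and RMSRE of a set of predictions."""
    n = len(actual)
    errors = [a - f for a, f in zip(actual, predicted)]
    relative = [e / a for e, a in zip(errors, actual)]  # relative to the real value
    mse = sum(e * e for e in errors) / n
    msre = sum(r * r for r in relative) / n
    return {
        "MAD": sum(abs(e) for e in errors) / n,
        "MARD": sum(abs(r) for r in relative) / n,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MSRE": msre,
        "RMSRE": math.sqrt(msre),
    }

measures = accuracy_measures(y, p)
print(round(correlation(y, p), 3))
for name, value in measures.items():
    print(name, round(value, 4))
```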
6.5 More
More measures can be found on the following website:
http://en.wikipedia.org/wiki/Forecasting#Forecasting_accuracy
This website refers also to some interesting examples.
7 Some references
7.1 Internet
1. http://en.wikipedia.org/wiki/Econometrics
2. Econometric Theory on Wikibooks:
http://en.wikibooks.org/wiki/Econometric_Theory
3. B.E. Hansen, Econometrics:
http://www.ssc.wisc.edu/~bhansen/econometrics/
4. Econometrics Resources on internet:
http://www.oswego.edu/~kane/econometrics/
5. Empirics and Econometrics:
http://homepage.newschool.edu/het//schools/metric.htm
6. Links to Online Texts and Notes in Econometrics:
http://www.economicsnetwork.ac.uk/teaching/text/econometrics.htm/
7. Startpagina econometrie:
http://econometrie.startpagina.nl/
7.2 Books
1. A.P. Barten, Econometrische lessen. Academic Service, Economie en
Bedrijfskunde, Schoonhoven, 1989.
2. W.S. Brown, Introducing econometrics. West Publishing Company,
1991.
3. P. Kennedy, A guide to econometrics, 3rd edition. MIT Press, Cambridge,
Mass., 1992.
4. D.N. Gujarati, Essentials of Econometrics. McGraw-Hill International
Edition, New York, 2006.
5. D.N. Gujarati, Basic Econometrics, 4th edition. McGraw-Hill International
Edition, New York, 2003.
6. G.S. Maddala, Econometrics. McGraw-Hill Ltd., New York, 1977.
7. E. Omey, Inleiding tot de econometrie. Den Arend, Bonheiden, 2003.
8. J. Schmidt, Econometrics. McGraw-Hill, New York, 2005.