developing countries
LONDON AND NEW YORK
First published 1998 by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
Simultaneously published in the USA and Canada
by Routledge
270 Madison Ave, New York NY 10016
Routledge is an imprint of the Taylor & Francis Group, an informa business
© 1998 Chandan Mukherjee, Howard White and Marc Wuyts
Typeset in Times by Florencetype Ltd, Stoodleigh, Devon
All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.
British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library.
Library of Congress Cataloging in Publication Data
A catalog record for this book has been requested.
List of figures ix
List of tables xiii
List of boxes xvi
Preface xvii
Introduction 1
1 The purpose of this book 1
2 The approach of this book: an example 3
This book grew out of our frustration as teachers of data analysis and
econometrics to post-graduate students in development economics and in
population and development at the Centre of Development Studies (CDS,
Trivandrum, India) and the Institute of Social Studies (ISS, The Hague,
The Netherlands). Our main aim in both institutions was to develop a
course which puts the emphasis squarely on data analysis and economet-
rics as a research tool in the analysis of development issues. But while
many good texts exist on statistical and econometric theory, only a few
of them deal explicitly with the practice of data analysis in research, and
hardly any do so with data relating to the specific problems of developing
countries. The purpose of this book is to fill this gap.
This book would not have come about but for the successive cohorts
of students at both CDS and ISS who sat through lectures and computer-
assisted workshops based upon the successive drafts which accompanied
its development. They provided us with both encouragement and invalu-
able feedback needed to develop a book of this nature. This feedback
was particularly important to improve the design of the exercises with
real data which constitute an important element of this book. Our sincere
thanks to all these students. The development of this book also benefited
from the involvement of two of its authors (during 1991-2) in the design
and write up of the courses in econometrics and in research methods
for the MSc in Financial Economics, the external programme in economics of the School of Oriental and African Studies. Some of the
materials and exercises found in this book were initially developed for
this programme. Our thanks go to SOAS (and to SIDA, the funding
agency) for giving us this opportunity to participate in the develop-
ment of long-distance courses which included both conventional study
materials and computer-assisted exercises. The feedback from course
readers, tutors and students was of great help in the subsequent devel-
opment of this book.
Its writing was made possible by the close collaboration between the
Population and Development programmes and CDS and ISS within
the framework of the Global Planning Programme in Population and
xviii Preface
Development of the UNFPA. Our thanks to UNFPA for creating the
opportunity and facility for this collaboration between the two institutions.
We are grateful to Lucia Hanmer and Niek de Jong, ISS colleagues,
and to PhD students Hari Kurup, Suresh Babu and Saikat Sinha at CDS
and Philomen Harrison and Alemayehu Geda Fole at ISS for their valu-
able comments on various drafts of the book and their willingness to check
for errors and inconsistencies. We thank Philomen Harrison also for her
assistance in setting up some of the data sets used here. Furthermore, the
comments of three anonymous reviewers and of Professor N. Krishnaji
(Centre for Economic and Social Studies, Hyderabad, India) were much
appreciated as they greatly helped us to improve the final version of the
book. Thanks also to Paul Mosley, the series editor, and to Alison Kirk, the economics editor of Routledge, for their patience and advice during the gestation period. Finally, we would like to express our special appreciation and thanks to Annamarie Voorvelt at ISS who worked tirelessly to turn our various drafts and notes into a finished manuscript.
Centre for Development Studies, Trivandrum CHANDAN MUKHERJEE
Institute of Social Studies, The Hague HOWARD WHITE
MARC WUYTS
Introduction
where Birth is the birth rate, Y income per capita, IMR the infant mortality rate, and i = 1, ..., n, where n is the sample size. The error terms, the ε_i's in the model, are assumed to be each normally distributed with zero mean and constant variance, and to have zero covariances. Following our discussion above, our a priori expectations as to the signs of the coefficients are as follows: α₂ < 0 and α₃ > 0. Hence, the slope coefficient of the income variable is expected to be negative and that of infant mortality to be positive.
Having specified our statistical model, we can now proceed with its
estimation. To do this, we use a sample of observations for 109 countries
in the year 1985, taken from the World Bank tables. The data can be
found in the file BIRTH on the data diskette which accompanies this
book. The least squares estimators of the regression model yield the
following results (t-statistics in brackets):

Birth_i = 18.8 - 0.00039 Y_i + 0.22 IMR_i    R² = 0.79
         (11.65)  (-2.23)      (14.04)       n = 109
At first sight, these results look good. The coefficient of determination,
R², tells us that the regression explains 79 per cent of the total variation
in the crude birth rate. This is a good result given that we are working
with cross-section data and a relatively large sample size. Moreover,
both slope coefficients have the expected sign and are statistically signif-
icant at 5 per cent significance level. The near zero value of the slope
coefficient of GNP per capita should not worry you. Why? The reason
that this coefficient turns out to be so small is due to the fact that GNP per capita, measured in dollars, varies over a much wider range than the crude birth rate, measured as the number of births per 1,000 population. Consequently, the slope coefficient which measures the impact of a $1
change in GNP per capita on the crude birth rate is bound to be very
small, but its overall effect is nevertheless substantive because of the large
variations in GNP per capita across countries.
Given that the evidence obtained from the regression confirms our initial hypothesis, many researchers may be inclined to stop the data analysis
at this point. This is unwise, however, since the statistical properties of
the regression results depend on the assumptions of the regression model
being reasonably satisfied in practice. This calls for diagnostic testing of the
validity of the assumptions of the regression model. In our example, of par-
ticular importance is the normality assumption of the error term which
underpins statistical inference (e.g. the t- or F-statistics), and, since we
are dealing with cross-section data, the assumption that the error term has
a constant variance.
To test whether the normality assumption is reasonably satisfied in prac-
tice we use Jarque-Bera's skewness-kurtosis test (which will be explained
in detail in Chapter 3). The basic idea behind this test is to verify whether
the higher moments of an empirical distribution conform with those that
would be obtained if the distribution were normal. A normal distribution
has zero skew and a kurtosis equal to 3. The Jarque-Bera statistic, there-
fore, tests whether we can accept the joint null-hypothesis that a given
empirical distribution (of, say, the residuals of a regression) has zero skew
and kurtosis equal to 3. This implies the use of a chi-square test with two
degrees of freedom. In this example, the probability value of this test
applied to the residuals of our multiple regression yields a value of 0.0574,
or 5.74 per cent. At 5 per cent significance level, therefore, we would
accept the hypothesis that the residuals are drawn from a normal distri-
bution, although we should not overlook the fact that the probability value
is only just above the cut-off point of 5 per cent.
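A rough sketch of this check, using SciPy's implementation of the Jarque-Bera test on stand-in residuals (the actual residuals come from the regression on the BIRTH data), also makes the chi-square link explicit:

```python
import numpy as np
from scipy import stats

# Stand-in residuals: in practice these are the residuals of the
# fitted multiple regression.
rng = np.random.default_rng(1)
resid = rng.normal(0.0, 1.0, 109)

jb, p = stats.jarque_bera(resid)
# Under the null of normality JB is chi-square with 2 degrees of freedom,
# so the probability value is the chi-square(2) tail beyond the statistic.
p_chi2 = stats.chi2.sf(jb, df=2)
print(jb, p)   # reject normality when p falls below 0.05
```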
To test whether the error terms have a constant variance we shall use
the Goldfeld-Quandt test (which will be explained in detail in Chapter 7).
The idea behind this test is fairly simple. The practice of data analysis shows
that the variance of the error term often tends to increase (decrease) with
the level of one (or more) of the explanatory variables. That is, for exam-
ple, in a regression of Y on X, the variance of the error term increases
(decreases) as X increases. This is particularly common with cross-section
data. The test involves sorting the data in ascending order of each explana-
tory variable in turn, and running two separate regressions with equal
sample size for the lower and upper parts of the data, while deleting about
a quarter of the observations in the middle. If the error variance is
homoscedastic (that is, equal variances prevail), the sums of squared resid-
uals of both sets of residuals will be roughly equal in size. The relevant test
features an F-statistic which involves the ratio of the larger to the lower
sums of squared residuals of both regressions. If we do this test for our
example by first sorting the data by GNP per capita and subsequently by
infant mortality, we find that we can accept the null-hypothesis that the
error variance is homoscedastic in both cases.
This preliminary analysis may lead us to conclude that the available
evidence supports our initial model. Indeed, the coefficients of the model
have the expected signs and are statistically significant, and the coefficient
of determination indicates that the regression explains 79 per cent of the
total variation in the birth rate across countries. Moreover, subsequent
diagnostic testing shows that the normality assumption is acceptable (at the
5 per cent significance level) and that the error terms are homoscedastic.
Our results, therefore, give strong evidence for our hypothesis. But how
sound are they really?
[Scatter plot matrix: per capita GNP, infant mortality (age<1 year) and the crude birth rate per 1,000 population]

[Scatter plot matrix: log GNP per capita, the square root of infant mortality and the crude birth rate per 1,000 population]
The estimation with least squares of this model yields the following results
(t-statistics in brackets):

Birth_i = -2.59 + 0.63 log(Y_i) + 4.06 √IMR_i    R² = 0.85
         (-0.38)  (0.925)        (13.78)         n = 109
The first thing to note about this new regression is that its coefficient
of determination is now about 85 per cent, as against 79 per cent in our
earlier regression. Since both regressions feature the same dependent variable, the crude birth rate, their R²s are comparable because they are both
ratios of the same total sum of squares (i.e. the total sums of squares
of the crude birth rate). Note, however, that the slope coefficient of the
logarithm of GNP per capita no longer has the expected sign, nor is it
statistically significant at the 5 per cent significance level.
This lack of significance suggests that the income variable should be
dropped from the equation altogether. It is possible to check this propo-
sition graphically using a partial regression plot (as we shall show in
Chapter 5). Here we just report the regression results obtained by drop-
ping the income variable (t-statistic in brackets):
Birth_i = 3.61 + 3.83 √IMR_i    R² = 0.85
         (2.75)  (24.17)        n = 109
This simple regression confirms that dropping the income variable from
the equation hardly affects the coefficient of determination. This regres-
sion, therefore, yields a better result than the multiple regression of the
birth rate on both GNP per capita and infant mortality.
But perhaps you may be inclined to think that the loss of importance
of the income variable, GNP per capita, is solely due to the use of the
logarithmic transformation which may have been inappropriate in this
case. But the results of the following regression of the birth rate on GNP
per capita and on the square root of infant mortality (t-statistics in brackets) shows that this is not the case:
Birth_i = 3.31 + 0.00003 Y_i + 3.86 √IMR_i    R² = 0.85
         (1.56)  (0.185)       (17.61)        n = 109
Clearly, GNP per capita is not statistically significant and, hence, can
be dropped from the equation without any significant change in the coef-
ficient of determination. The simple regression of the birth rate on the
square root of infant mortality, therefore, is superior to any regression
model which also includes GNP per capita or its logarithm.
As can be seen from Figure 3, the distributions of the birth rate and
the square root of infant mortality are quite similar: both tend to be rectan-
gular in shape. Furthermore, Figure 4 shows that the scatter plot of the
crude birth rate against the square root of infant mortality (third row,
second column), unlike the other scatter plots in this matrix, does not
feature any outliers.
Further testing also shows that the residuals of the simple regression
of the birth rate on the square root of infant mortality lead us to accept
the hypothesis that they are drawn from a normal distribution. In this
case, the probability value of the Jarque-Bera skewness-kurtosis test
equals 0.165, or 16.5 per cent, which is well above the cut-off point of
5 per cent. The scatter plot of the birth rate against the root of infant
mortality reveals a very slight tendency towards heteroscedasticity, but
this is unlikely to be very significant. In fact, the application of the
Goldfeld-Quandt test leads us to accept the null-hypothesis of homoge-
neous error variances.
Figure 5 Scatter plot of birth against infant mortality with regression curve
(regression of birth against square root of infant mortality)
Figure 5 shows the scatter plot of the birth rate against infant mortality
(untransformed) along with the predicted regression curve obtained by
regressing the birth rate on the square root of infant mortality. As can
be seen from this figure, the slope of this regression curve declines as
infant mortality increases. That is, at higher levels of infant mortality it
requires a much greater reduction in infant mortality to reduce the birth
rate by a given amount than it does at lower levels of infant mortality.
Intuitively, this makes sense. If parents' decisions on fertility are deter-
mined by their concern with the number of children who survive into
adulthood, changes in infant mortality when its level is already low is
likely to have a much bigger impact on the birth rate than similar changes
when the level of infant mortality is high. Indeed, in the latter case, infant
mortality still remains high (even if it declines somewhat) and, hence, the
risk of children not surviving still remains considerable. This might explain
why the square root of infant mortality performs better as an explana-
tory variable than infant mortality itself.
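The declining slope follows directly from the square-root functional form. Writing the fitted model generically, with b the coefficient on the square root of infant mortality (about 3.83 in the regression above), the marginal effect of infant mortality on the birth rate is

```latex
\widehat{Birth} = a + b\sqrt{IMR}
\qquad\Longrightarrow\qquad
\frac{d\,\widehat{Birth}}{d\,IMR} = \frac{b}{2\sqrt{IMR}}
```

which shrinks as infant mortality rises: with b = 3.83 the slope is roughly 0.38 births per 1,000 population at IMR = 25, but only about 0.19 at IMR = 100.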
This example showed that the transformation of the explanatory variables led us to adopt a simpler model which no longer features GNP per
capita as an explanatory variable. This may have come as a bit of a surprise
since, clearly, the level of income would appear to be an important factor
in explaining the variations in the birth rate. Our results, however, do not
imply that the level of income does not matter at all. What they say is that
GNP per capita has nothing to add in terms of explaining the variation
in the birth rate once we have already taken account of the influence of
infant mortality on the birth rate. But clearly the health of a nation in part
depends on its wealth in general, and on its average income in particular.
A quick glance back at the scatter plot of infant mortality against GNP per
capita (or, better still, of the square root of infant mortality against the
logarithm of GNP per capita) tells us that both are clearly related. But these
plots also reveal that some countries (in particular, China and Sri Lanka) have low infant mortality despite their low GNP per capita, while other countries (such as some of the richer oil-producing countries) have a high
infant mortality despite their high GNP per capita. In these exceptional
cases, the variation in the birth rate tallies with the variation in infant
mortality, and not with that in GNP per capita. Hence, health appears to
matter more than wealth in explaining fertility, yet clearly the health of a
nation depends to a great extent on its wealth.
Exercise
The data set SOCECON (available on the data diskette) which features
a set of socioeconomic data for a sample of countries for the year 1990,
contains observations for the birth rate, GNP per capita, and infant
mortality. Use these data to repeat the analysis carried out in this section
(which was done with data for the year 1985) and verify whether you
obtain similar results for 1990.
Conclusion
The example in this section shows that we cannot always take our initial
model and the numerical results obtained from it at face value. Results
which look good at first sight may be riddled with problems if we care to
look at our data more carefully. Many of the problems which emerge at
the level of multivariate analysis can often be traced back to particulari-
ties of the data we encounter in studying their univariate distributions and
their pairwise scatter plots. And, as shown in this book, even when we
move to multivariate analysis it is still possible to combine diagnostic
testing with the use of various simple yet powerful graphical methods
(such as, for example, partial regression plots) which allow us to look in
depth at the results of multiple regressions. Graphical methods of data
analysis and careful diagnostic testing are indeed the principal tools which
allow data to play an active part in model specification and evaluation,
and as such are invaluable instruments in applied research.
1.1 INTRODUCTION
[Diagram: the real population (theoretical abstractions) set against the model population, linked by the question 'correct specification?', with the real sample (observed data) set against the model sample below]
Summary
In this section we reviewed three distinct sets of ideas on the role of data
in model specification. These three methodologies share in common that
they assign an active role for data in model choice, development or selection. Obviously, as shown above, each approach has its distinctive flavour
which corresponds to a particular methodological outlook. But we would
also argue that each approach has its own strengths and weaknesses which
differ depending on the particular context of research. If, for example,
your particular interest is to show that public investment crowds out
private investment, it is not very sensible to settle on one specification
which happens to produce the required negative coefficient if minor alterations to this preferred specification render this result insignificant or even
reverse its sign (we shall see an example of this in Chapter 6). If you are
dealing with a problem of model choice in which economic theory provides
you with forceful handles that allow you to nest rival models inside a
more general specification, hypothesis testing in the context of general to
specific modelling seems to be a logical choice of modelling strategy. But
if, in contrast, your research question is still rather vague and theory can
do no more than indicate plausible avenues of inquiry, a researcher may
well have to rely on extensive data exploration to arrive at a firmer hypoth-
esis. In actual practice, from the start of a piece of applied analysis to
its conclusion you may well find yourself drawing on a combination of
the three approaches.
Hence, while each of these approaches is rooted in distinctive method-
ological outlooks, it seems fair to say that each approach will prove its
strengths or reveal its weaknesses, depending also on the specific context
of the research. Therefore, in this book, we do not seek to rally your
support in favour of one of these approaches, but instead our aim is to
draw upon each of these methodologies and show their usefulness in
different research contexts so as to enhance your own ability to employ
data actively in the process of model specification. Indeed, the three
approaches are complements rather than substitutes, all rooted in the same
basic philosophy that data have a role to play in model specification.
2.1 INTRODUCTION
This chapter and Chapter 3 deal with the problem of modelling a simple
average of a single variable. But why bother with univariate analysis if,
in development research, our main interest is to study empirical relations
between two or more variables? Why not jump straight to regression
analysis? We can think of three reasons why it is best to start with uni-
variate analysis.
First, we should always be aware that specific features we may come
across in univariate analysis, such as the presence of an outlier or of pro-
nounced skewness in the distribution of a variable, invariably have multi-
variate implications. Unexpected or puzzling results in regression analysis
can often only be properly understood if we look at the distributions of
its variables. Failure to do this often leads to nonsense regressions. For
this reason, it is best to proceed from the ground up: study each variable in
turn befare embarking on investigating relations between them (Hamilton,
1992: 1-2). Starting in this way also gives you an excellent opportunity to
become familiar with basic techniques of EDA (exploratory data analysis)
which teach you how to look carefully at a batch of data. Second, residuals
play a key role in the process of modelling data with regression analysis, par-
ticularly in the context of modern modelling strategies. To verify whether
the assumptions of the models we use are valid in practice, it is important
to look carefully for hidden messages in the residuals. To do this, we treat
the residuals of a regression as an observed variable in its own right.
Univariate analysis helps us to look for patterns within residuals or to test
the distributional assumptions we make about the random error term in a
regression. Finally, regression analysis involves averaging of a complex
nature. In empirical analysis, when we say that Y is a function of X, we
mean to say that the average value of Y is a function of X (Goldberger, 1991:
5). In other words, in regression analysis we deal with conditional means of
Y for given values of X. Consequently, common errors we make when
dealing with a simple average often crop up again in more complex forms
when we subsequently move from univariate to multivariate analysis.
Data analysis embraces both the problem of finding an appropriate
model (model specification), on the one hand, and model estimation and
testing, on the other. This chapter only deals with the latter aspect: esti-
mation and hypothesis testing within the confines of a model which we
assume to be correct. It further assumes that a univariate sample is drawn
from a normal distribution. Why do we make this assumption? One reason
could be that most data we encounter in practice are approximately
normal. Hence, if normality is the rule, it makes good sense to start with
this assumption. Unfortunately, while in some sciences data often behave
in this way, most social or economic data are not (approximately) normally
distributed. As we shall see, it is hard to find examples of social or
economic data which display the typical bell-shaped, normal distribution.
More often than not, social and economic data are skewed. Another
reason could be that the normal distribution is ideal for obtaining mean-
ingful averages and, hence, serves as a useful example for the problem of
averaging. This is indeed the case and explains why we take the normality
assumption as a point of our departure. A final reason is that it is often
possible to find an appropriate mathematical transformation which elim-
inates skewness in the distribution of a variable, and makes the normality
assumption acceptable.
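A small illustration of this last point, using lognormal draws as a stand-in for a typical right-skewed economic variable such as income:

```python
import numpy as np
from scipy import stats

# Lognormal data: a common model for right-skewed economic variables.
rng = np.random.default_rng(5)
income = rng.lognormal(mean=7.0, sigma=1.0, size=500)

print(stats.skew(income))           # strongly positive: long right tail
print(stats.skew(np.log(income)))   # near zero: the log transform
                                    # restores symmetry
```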
In section 2.2, we show intuitively that normality in data renders it
easier to make sense of averages. Section 2.3 then reviews the assump-
tions of the classical model for estimating the mean of a univariate
distribution. Section 2.4 shows that, given these assumptions, the arith-
metic mean of the sample data is the best, linear, unbiased estimator of
the population mean. Subsequently, section 2.5 introduces the principle
of maximum likelihood and shows that the sample mean is also a
maximum likelihood estimator if the population distribution is normal.
Section 2.6 then deals with estimation and hypothesis testing with respect
to the population mean. Finally, section 2.7 summarises the main points
of this chapter. What to do if the normality assumption is not valid in
practice will be dealt with in Chapter 3.
[Two histograms: demand for labour, day shift, and recruitment of labour, day shift (fraction against number of workers, 0-1,100)]
Figure 2.1 The demand for and recruitment of casual labour, Maputo harbour
Table 2.1 lists the means, medians and modes, along with the standard
deviations, for both sets of data. Since the mode is the midpoint of the
group with the highest frequency (fraction) of the data, computation of its
location is sensitive to the number of groupings used to construct the
histogram. As we can see, in each case the mean, median and mode are
virtually equal to each other. Hence, for descriptive purposes, it does not
matter much which one we use. They all tell the same story.
In sum, this type of bell-shaped distribution has the following charac-
teristics:
1 The average (mean, median or mode) is unambiguously located in the
centre of the distribution. Symmetry assures that the two halves left
and right of the average are mirror images.
2 The greater the distance from the average, the lower the frequency:
the mass of the distribution is concentrated in the neighbourhood of
the average.
3 The smaller the variance, the more representative the average becomes
for the data as a whole.
Table 2.1 Averaging the demand for and recruitment of labour at Maputo harbour

              Mean   Median   Mode      Standard deviation
Demand         574      571   Similar                  155
Recruitment    500      503   Similar                  109
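For a bell-shaped batch of data the three averages can be computed directly; the sketch below uses synthetic normal draws with roughly the mean and standard deviation reported for demand in Table 2.1, and, as in the text, takes the mode to be the midpoint of the histogram group with the highest frequency:

```python
import numpy as np

# Synthetic bell-shaped 'demand for labour' series (mean 574, sd 155).
rng = np.random.default_rng(3)
demand = rng.normal(574.0, 155.0, 400)

mean = demand.mean()
median = np.median(demand)
# Mode: midpoint of the highest histogram bin, so its location is
# sensitive to the number of groupings chosen.
counts, edges = np.histogram(demand, bins=20)
k = counts.argmax()
mode = 0.5 * (edges[k] + edges[k + 1])
print(mean, median, mode)   # all three land close together
```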
In our example, average recruitment is considerably lower than the
average demand for labour. Furthermore, the variation in recruitment
clearly does not match the much greater variation in the demand for
labour. In this case, therefore, the supply of labour appears to have been
insufficient and relatively inflexible with respect to the larger variations
in demand. Both examples show us that the average of a bell-shaped distri-
bution is easy to interpret. This ease of interpretation is due to the
essential symmetry of the data.
Note, however, that symmetrically distributed data are not always bell-
shaped. For example, the distribution of rounding errors (say, when we
round up aggregate data to the nearest million) has a typical rectangular
shape - a distribution with a body but no tails. We can best describe this
distribution by its range (the difference between the two extremes) since
its average (mean, median or mode) is nothing but the middle value of
this range.
Now, take a look at Figure 2.2 which depicts a skewed empirical distri-
bution. It is the distribution of weekly overtime payments for casual labour
on the day-shift in Maputo harbour in the period from March 1980 to
June 1981. During this period, wage rates were constant, but obviously
weekly earnings will differ due to variations in recruitment and in access
to overtime work. The data were obtained by taking a random sample
of the weekly earnings of 45 workers over 13 weeks randomly selected
within this period, and subsequently selecting those observations (368 in
total) which pertained to the day-shift (as distinct from the night-shift).
The vertical lines in the graph show, from left to right, the locations
of the mode, median and mean of the distribution. Table 2.2 lists their
[Histogram: weekly overtime payments, day shift (fraction against payments, 0-3,500)]
Figure 2.2 Weekly overtime payments
Table 2.2 Averaging overtime payments

                    Mean   Median   Mode   Standard deviation
Overtime payments    674      525    ±80                  629
numerical values, along with the value of the standard deviation. As can
be seen from Figure 2.2, these three kinds of averages do not tell the
same story. In this case, therefore, the usefulness of an average is much
more ambiguous.
In fact, the mean, median and, particularly, the mode are far apart.
Each of these 'averages' tells a different story. The mode is the peak of
the distribution, the median its middle value, and the mean its balance
point. This example shows that, for a unimodal distribution which is
skewed to the right (i.e. its tail is on the right), the mode will be smaller
than the median which, in turn, will be smaller than the mean. Conversely,
the mode of a unimodal distribution skewed to the left will be greater
than its median which, in turn, is greater than the mean. The lack of
symmetry, therefore, results in no clear centre of the distribution. In fact,
in this example, the most distinctive feature of the distribution is its virtual
exponential decline from left to right, ending up in a long tail.
In sum, when an empirical distribution is unimodal, symmetric and
(preferably) bell-shaped, the concept of an average is fairly straightfor-
ward. The peak of the distribution, its middle value and its balance point
all coincide. Symmetry ensures that both halves left and right of the
average are mirror images. The bell-shaped distribution implies that the
frequency declines as the distance from the average increases. By contrast,
the average of a unimodal skewed distribution is far more ambiguous: the
location of the mean, the median and the mode depends on the distrib-
ution of the data between its peak and its tail.
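The separation of the three averages in a right-skewed distribution is easy to reproduce by simulation. The sketch below (in Python, rather than the Stata used for the book's figures) draws an exponentially declining sample loosely resembling the overtime payments and approximates the mode by the midpoint of the fullest histogram bin; the sample size and scale are hypothetical.

```python
import random
import statistics

# Hypothetical right-skewed sample: an exponential decline with a long
# upper tail, loosely mimicking the overtime-payments distribution.
random.seed(42)
payments = [random.expovariate(1 / 600) for _ in range(5000)]

mean = statistics.mean(payments)
median = statistics.median(payments)

# Approximate the mode by the midpoint of the fullest histogram bin.
bin_width = 100
counts = {}
for y in payments:
    b = int(y // bin_width)
    counts[b] = counts.get(b, 0) + 1
mode = (max(counts, key=counts.get) + 0.5) * bin_width
```

For a distribution skewed to the right, the three averages come out in the order the text describes: the mode below the median, and the median below the mean.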
Let us now return to the question of modelling data. The lesson we can
learn from these examples with real data is that modelling an average
invariably requires us to make assumptions about the shape of the distri-
bution. In this chapter we shall assume that the population distribution is
symmetrical and bell-shaped. More precisely, we shall assume that the
relevant model is the normal distribution. In this case, the population
mean (the first uncentred moment of the distribution), the median and
the mode all coincide in one unambiguous 'average'. Furthermore, the
normal distribution is characterised by its thin tails. In fact, there is only
a 5 per cent chance that an observation drawn from a normal distribu-
tion is more than 2 standard deviations away from its mean; the probability
of encountering an observation which is more than 3 standard deviations
distant from the mean is as low as 0.3 per cent. Finally, as we shall see,
if data are distributed (approximately) normally, the mean and the stan-
dard deviation tell us all we need to know about the data.
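These tail probabilities follow directly from the standard normal distribution function, which can be computed from the error function. A small sketch:

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function via erf."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def tail_prob(k):
    """Probability that a normal observation lies more than k
    standard deviations away from its mean (both tails)."""
    return 2 * (1 - normal_cdf(k))

p2 = tail_prob(2)   # about 4.6 per cent ("about 5 per cent")
p3 = tail_prob(3)   # about 0.27 per cent ("as low as 0.3 per cent")
```

The exact 5 per cent figure corresponds to 1.96 standard deviations, which is why 1.96 appears in the confidence intervals later in the chapter.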
50 Econometrics for developing countries
Exercise 2.1
Using the data file SOCECON (with world socioeconomic data for 1990)
on the diskette, make histograms and compute means, medians and modes
for the following variables:
1 GNP (gross national product) per capita;
2 HDI (human development index);
3 FERT (fertility rate);
4 LEXPM and LEXPF (male and female life expectancy);
5 POPGRWTH (population growth rate).
In each case, discuss the different averages in the light of the shape of
the empirical distribution. Would you say that any of the distributions is
reasonably symmetrical and bell-shaped? (If you do not know how to
compute a median, jump ahead to Box 3.1 in Chapter 3.)
[Figure: time-series plot against year, 1964–74; vertical axis 0–900]
Exercise 2.2
Can you think of a few concrete examples with cross-section data where
the assumption of a constant population mean for all observations in the
sample is clearly inappropriate?
In fact, we have come across one example already. If women are paid
less than men for equal levels of education or skills, equation (2.1) would
be inappropriate to model the fluctuations of income of both men and
women with similar education and skills. In this case, it would be more
correct to apply the model separately to incomes of, respectively, men
and women. Similarly, mortality levels may differ between urban and rural
populations, or among social classes. Averaging across these categories
may well give us misleading results since we assume a single population
when, in fact, several distinct populations should be considered.
But let us assume that we are dealing with a situation where the assump-
tion of a constant mean is valid in practice. The model as specified in
equation (2.1), however, is still incomplete. The reason is that we need
to specify the stochastic nature of the error term. In classical statistics, at
least three assumptions are made with respect to the behaviour of the
error term:
E(ε_i) = 0    (2.5)

V(ε_i) = E(ε_i²) = σ²    (2.6)

E(ε_i ε_j) = 0  for i ≠ j    (2.7)
Assumption (2.6) states that the error term is homoscedastic. That is,
it has a constant variance. When we say that the error term has a constant
variance, we do not mean to say that all error terms will have the same
size. What it means is that each error is drawn from a population with
the same variance. Since the probability distributions of Y_i and ε_i are
identical but for their respective means, it follows that Y_i also has the same
constant variance:
E(Y_i − µ)² = E(ε_i²) = σ²    (2.9)
The systematic component, µ, does not explain the variation in Y, but
only its average level. Consequently, the total variation in Y; equals the
variation in the error term.
The assumption in equation (2.7) states that the various error terms
(ε_i; i = 1, ..., n) are statistically independent of one another. For example,
the fact that the error term of observation i was large should not influence
the size of any prior or successive error terms. We assume, therefore,
that the data have been generated through random sampling. This assumption
is not always valid in practice. For example, if our sample is a time
series, the error terms may well be autocorrelated; that is:

E(ε_i ε_j) ≠ 0  for some i ≠ j    (2.10)
Consequently, the data generating process does not conform to our
assumption that the data were randomly sampled.
As yet, we have made no assumption about the shape of the population
distribution. However, as we have seen in the previous section, shape matters
when assessing the usefulness of different kinds of averages. So, do we
not make any assumption about the shape of the distribution of the error
term? In fact, in classical statistics, we certainly do. More specifically, we
add the assumption that the error term derives from a normal distribution.
This assumption, together with equations (2.5) and (2.6), can be written
as follows:
ε_i ~ N(0, σ²)    (2.11)

which states that the error terms are normally distributed with mean 0
and a constant variance.
This is a strong assumption which in practice we should never take for
granted without scrutinising the data first. For example, the normality
assumption appears to be quite reasonable for the data on the demand
for and recruitment of casual labour in Maputo harbour as shown in
Figure 2.1. (Recall that we can judge the variance of ε_i from the graph of
Y_i since, provided our model of the mean is correct, the two variables
have the same variance.) It would be far-fetched, however, to assume that
overtime payments to manual workers in the harbour are also distributed
normally. Figure 2.2 throws serious doubt on such an assumption.
Let us now look at the properties of the sample mean as an estimator
of the population mean, subject to assumptions (2.1) and (2.5)-(2.7).
Thereafter, we shall add the normality assumption which sets the stage
for statistical inference about the population mean.
Ȳ = (1/n) Σ Y_i    (2.12)
[Figure 2.4: histograms of the population distribution (top) and of the sampling distribution of the sample mean (bottom); fraction (0–0.3) against values 0–1,100]
in this case. Figure 2.4 compares this histogram of sample means of 1,000
samples (bottom panel) with the histogram of its population distribution.
As shown in Table 2.3, this ( approximate) sampling distribution has exactly
the same mean as the population distribution, which illustrates the fact
that the sample mean is an unbiased estimator. Note, however, that the
standard deviation of the sampling distribution is much smaller than that
of the population distribution, for reasons we shall now explain.
V(Ȳ) = (1/n²) [ Σ V(Y_i) + Σ_{i≠j} cov(Y_i, Y_j) ]    (2.15)
But the assumption that the error term is not autocorrelated means that
the covariances of each pair of Y_i and Y_j will be zero. Thus equation (2.15)
can be reduced to the following expression:
V(Ȳ) = σ²/n    (2.16)
Take another look at Table 2.3. It shows that the standard deviation of
the sampling distribution is significantly smaller than that of the popula-
tion distribution. Equation (2.16) tells us why this is the case. The
difference between the sample and population variance depends on the
sample size. It is easy to verify that the calculated standard deviation of
the sampling distribution, 40, approximately equals the standard devia-
tion of the population distribution, 155, divided by the square root of
the sample size, 15. Other things being equal, the larger the sample, the
smaller the margin of error of our estimates.
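The σ/√n result can be checked by simulating the sampling distribution with the text's illustrative numbers (population mean 574, standard deviation 155, samples of size 15); a normal population is assumed here for convenience.

```python
import random
import statistics

random.seed(0)
mu, sigma, n = 574, 155, 15

# Draw 1,000 samples of size 15 and record each sample mean.
sample_means = [
    statistics.mean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(1000)
]

sd_of_means = statistics.pstdev(sample_means)
theoretical = sigma / n ** 0.5   # 155 / sqrt(15), roughly 40
```

The simulated standard deviation of the 1,000 sample means comes out close to 155/√15 ≈ 40, just as equation (2.16) predicts, while their average stays close to the population mean (unbiasedness).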
It can be shown that the variance of any linear estimator of the popu-
lation mean, say U, will be greater than or equal to the variance of the
sample mean. That is,
V(U) ≥ σ²/n = V(Ȳ)    (2.17)
The sample mean, therefore, has the property of least variance among all
linear estimators of the population mean.
Hence, the sample mean is the best linear unbiased estimator (BLUE)
of the population mean. This result depends on the assumptions of the
model. It is useful to reflect carefully on how each assumption was used
to prove that the sample mean is BLUE:
1 We assumed that all sample units Y_i come from the same population -
i.e. they have the same mean and variance. The assumption of an equal
population mean is critical to proving that the sample mean is unbiased.
2 In addition, we assumed that all the sample units are independent of
each other - i.e. our sample is an independent random sample. This
assumption is crucial for the sample mean to have the minimum vari-
ance property. If the sample units are not independent of each other,
the sample mean will not necessarily have the minimum variance prop-
erty, though it will still be unbiased. As a result, the precision of the
estimator will be in doubt.
Perhaps we can best end with a final word of warning. We have shown
that to prove that the sample mean is BLUE, no assumption was needed
with respect to the shape of the population distribution. But this should
not lead us to believe that shape does not matter. In section 2.2 we saw
that a mean is not always that meaningful. Indeed, if a distribution is
symmetric and preferably bell-shaped, the mean is at its centre. But if a
distribution is strongly skewed the mean is no more than a balance point
with little further interpretative value. As we shall see in Chapter 3, the
sample mean loses much of its power if the distribution of the variable
in question is strongly skewed or riddled with outliers. The validity of the
normality assumption, therefore, is not a luxury, but quite essential to
ensure the power of the sample mean as an estimator.
Exercise 2.3
This exercise can best be done in the context of a classroom workshop.
The aim is to get a better grip on the concept of a sampling distribution.
To do this, take a particular set of data on a variable such as one of the
variables listed in Exercise 2.1. Consider the data as the real population
for illustrative purposes, and draw a number of random samples of equal
size (say, n = 10, or 15, or 25) from this population. (If doing the exercise
in the classroom each student can draw his or her own sample.) Now:
1 draw a histogram of the population distribution and calculate its mean
and standard deviation;
2 calculate the mean and standard deviation for each sample drawn from
this population;
3 draw a histogram of all sample means and calculate its mean and stan-
dard deviation;
4 check the relation between the mean and standard deviation of the
population and those of the distribution of sample means;
5 comment on the respective shapes of the distribution.
The latter point is particularly instructive if the population distribution
from which samples are taken is strongly skewed. The histogram of sample
means (an approximation of the shape of the sampling distribution) will
tend to be bell-shaped. This tendency derives from what is called the
central limit theorem in statistics. In short, the sampling distribution of
the sample mean will tend to the normal distribution as the sample size
increases, regardless of the shape of the population distribution from
which the data were sampled.
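The central limit theorem is easy to watch at work by simulation. The sketch below draws sample means from a strongly right-skewed (exponential) population and checks that their distribution is far more symmetric than the population itself; all numbers are illustrative.

```python
import random
import statistics

random.seed(1)

def sample_mean(n):
    """Mean of one sample of size n from an exponential population."""
    return statistics.mean(random.expovariate(1.0) for _ in range(n))

def skewness(data):
    """Moment-based coefficient of skewness, a3."""
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    return sum((x - m) ** 3 for x in data) / (len(data) * s ** 3)

# One large sample from the skewed population, versus 2,000 sample
# means each computed from samples of size 30.
single = [random.expovariate(1.0) for _ in range(2000)]
means_n30 = [sample_mean(30) for _ in range(2000)]
```

The raw exponential draws are strongly skewed, while the distribution of sample means is already close to symmetric at n = 30.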
(A note on sampling: To do this exercise well, it is advisable to use a
reasonably large number of samples. To prevent the exercise becoming
tedious, especially if doing it on an individual basis, it is best to use a
software package which allows you to (a) generate a random variable; (b)
sort the data base by ordering any variable; (c) calculate summary statis-
tics (means and standard deviations) for any subset of the data; and (d)
draw histograms. If one is available, proceed as follows, assuming Y is the
variable which defines the population distribution: (a) generate a random
variable, R; (b) sort the database with respect to R; (c) select the first
n (= sample size) observations of Y, which will be a random sample of
the wider population; (d) calculate the mean and standard deviation
of this sample of Y values; (e) delete R and start again at (a) to generate
the next sample.)
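The note's steps (a) to (e) amount to sampling without replacement by random sorting. A sketch in Python rather than a statistical package, with a hypothetical population of 100 values:

```python
import random

random.seed(7)
population = list(range(1, 101))   # stand-in for the Y values

def draw_sample(y_values, n):
    """One random sample of size n via the random-sort procedure."""
    tagged = [(random.random(), y) for y in y_values]   # step (a)
    tagged.sort()                                       # step (b)
    return [y for _, y in tagged[:n]]                   # step (c)

sample = draw_sample(population, 15)
# Steps (d) and (e): summarise the sample, discard the tags, repeat.
```

Because each observation receives a fresh random tag, sorting by the tag and taking the first n rows yields a random sample without replacement.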
[Two histograms: sampling distribution of the mean demand for labour (top) and of the median demand for labour (bottom); fraction (0–0.3) against values 350–750]
Figure 2.5 Comparing the sampling distributions of mean and median: demand
for labour
Exercise 2.4
This exercise is an extension of Exercise 2.3. As before, calculate the
mean of each sample, but now obtain its median as well. Compare the
histograms of the sample means and the sample medians and comment on
their relative efficiency. If the population distribution is not approximately
normal, you should find that the mean does not necessarily perform better
than the median.
z = (Ȳ − µ) / (σ/√n)    (2.21)

E[ (Ȳ − µ) / (σ/√n) ] = (√n/σ) E(Ȳ − µ) = 0    (2.22)

V[ (Ȳ − µ) / (σ/√n) ] = (n/σ²) V(Ȳ − µ) = (n/σ²) V(Ȳ) = 1    (2.23)

P( −1.96 ≤ (Ȳ − µ)/(σ/√n) ≤ 1.96 ) = 0.95    (2.24)

P( µ − 1.96 σ/√n ≤ Ȳ ≤ µ + 1.96 σ/√n ) = 0.95    (2.25)
Confidence intervals
Our main interest, however, is to make inferences about the population
mean based on the sample mean. Now, if the sample mean is, with 95 per
cent probability, within a certain distance of the unknown population
mean, it follows that the unknown population mean is within a certain
distance of the sample mean. That is, inequality (2.25) can be rearranged
as follows:

P( Ȳ − 1.96 σ/√n ≤ µ ≤ Ȳ + 1.96 σ/√n ) = 0.95    (2.26)
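Equation (2.26) translates directly into code. The sketch below assumes a known population standard deviation; the inputs echo the Maputo numbers used in the text (mean 574, σ = 155, n = 15), though any values can be substituted.

```python
# 95 per cent confidence interval for the population mean when the
# population standard deviation sigma is known (equation (2.26)).
def confidence_interval(sample_mean, sigma, n, z=1.96):
    half_width = z * sigma / n ** 0.5
    return (sample_mean - half_width, sample_mean + half_width)

lo, hi = confidence_interval(sample_mean=574, sigma=155, n=15)
# The interval 574 +/- 1.96 * 155 / sqrt(15), roughly (496, 652).
```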
[Chart: 20 sample means with their 95% confidence intervals, scattered around the population mean of 574; vertical scale 350–700]
Figure 2.6 Confidence intervals of sample means: demand for labour, Maputo
harbour
(1/n) Σ (Y_i − Ȳ)²,  where Ȳ = (1/n) Σ Y_i    (2.27)
But it can be shown that this sample variance turns out to be a biased
estimator of the population variance. The ML sample variance tends to
underestimate the population variance since the bias involves a factor,
(n−1)/n, which is less than 1. Consequently, an unbiased estimator of the
population variance is obtained by multiplying the ML sample variance
by n/(n−1), as follows:
s² = [1/(n−1)] Σ (Y_i − Ȳ)²    (2.28)
The t-distribution
Substituting s for a in equation (2.21) yields a new variable t, as follows:
t = (Ȳ − µ) / (s/√n)    (2.29)
where t0.95 is the critical value of the relevant t-distribution with n-1
degrees of freedom. Note, however, that, as sample size grows, the t-distri-
bution converges to the standard normal distribution; hence, for larger
samples (say, with n > 100) we can safely resort to the standard normal
distribution, even when the population variance is unknown.
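When σ is replaced by s, the interval uses a t critical value instead of 1.96. A sketch with a hypothetical sample of 15 observations; the critical value 2.145 is the 95 per cent value of the t-distribution with 14 degrees of freedom.

```python
import statistics

# t-based 95 per cent interval: replace sigma by the unbiased s of
# (2.28), and the normal 1.96 by the t critical value with n - 1
# degrees of freedom.
def t_interval(data, t_crit):
    n = len(data)
    ybar = statistics.mean(data)
    s = statistics.stdev(data)          # divisor n - 1, as in (2.28)
    half_width = t_crit * s / n ** 0.5
    return (ybar - half_width, ybar + half_width)

# Hypothetical sample of 15 daily recruitment figures.
data = [520, 610, 540, 580, 605, 570, 555, 590,
        560, 600, 545, 585, 575, 565, 595]
lo, hi = t_interval(data, t_crit=2.145)   # t with 14 d.o.f.
```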
Hypothesis testing
A statistical hypothesis is a statement about the value of a parameter of
the statistical model (in this case, about the population mean). The
difference between a confidence interval and a test of a hypothesis is that in
the former we are trying to get an idea about the range of likely values of
the unknown population mean, given a particular sample, while in the
latter we are trying to see how likely the sample is for a given
hypothesised value of the population mean. In hypothesis testing, we always
consider two complementary hypotheses which do not overlap but,
between them, exhaust all possible values that the relevant parameter can
take. Why do we need two hypotheses? The reason is that hypothesis
testing involves making a decision on whether or not to reject a particular
hypothesis which we denote by H0, the null hypothesis. If H0 is
rejected, it means that we effectively accept the alternative hypothesis,
called H1, which contains all other possible values apart from the particular
value specified in H0. Hence, H1 is always very vague in contrast with
the precise nature of H0: H1 simply tells us that any other value is possible
apart from the one hypothesised in H0.
For example, the summary statistics in Table 2.1 tell us that daily recruit-
ment of casual labour in Maputo harbour fluctuated around 500 in the
early 1980s. Suppose that, in a subsequent period, we draw a sample of
observations on daily recruitment and seek to test whether a recruitment
level of 500 continues to be the average. Our two complementary
hypotheses will then look as follows:
H0: µ = 500    H1: µ ≠ 500    (2.31)
The basic idea then is to test the null hypothesis that the sample is
drawn from a population with mean 500. In this case, the alternative
hypothesis specifies that the population mean can be either greater than
or less than 500. This is what is called a two-tailed test.
But suppose we know that transport activity in Maputo harbour
dropped significantly after the early 1980s and, hence, we do not expect
recruitment levels to average as much as 500 per day. In this case, we
specify the complementary hypotheses as follows:
H0: µ = 500    H1: µ < 500    (2.32)
Note that H 0 and H 1 continue to exhaust all possible values for the popu-
lation mean, since the possibility of µ > 500 has been ruled out. This is
a one-tailed test. If we now replace µ by 500 in the expression for t in
(2.29), it follows that, if the null hypothesis is true, t has a t-distribution
with n − 1 degrees of freedom, i.e.:

t = (Ȳ − 500) / (s/√n) ~ t(n−1)  if H0 is true    (2.33)
since H0 specifies that µ = 500. How then do we carry out the test? We have
already noted that hypothesis testing involves making a decision as to
whether or not to accept the null hypothesis. This decision will never be
foolproof. The reason is that we make such a decision under uncertainty due to
the random nature of the error term. Our decisions, therefore, will involve
probability statements, not absolute certainties. Table 2.4 lists the two types
of errors which we may encounter when making this type of decision.
A type I error involves rejecting the null hypothesis when in fact it is
true. The probability of making a type I error, α, is called the level of
significance of a test. We always specify this level of significance clearly
before we do the test. It is customary to allow for a 5 per cent probability
Exercise 2.5
The population distribution of the demand for labour in Maputo harbour
is assumed to be (approximately) normal, with population mean 574. Five
random samples of the daily demand for labour were drawn, with 15
observations each. The resulting sample means and standard deviations
E(aX) = aE(X)
V(aX) = a²V(X)    (A.2.6)
3.1 INTRODUCTION
Exercise 3.1
Table 3.1 lists GNP per capita (in $) for a small sample of seven African
countries in 1990. Using these data:
1 calculate the mean and median of the sample;
2 calculate the residuals of the sample mean and check whether they
sum to zero.
The least squares property tells us that the sum of squared deviations
of the sample values from the sample mean will always be less than the
sum of squared deviations from any other arbitrarily chosen value, say c.
Mathematically stated, this means:
Σ (Y_i − Ȳ)² ≤ Σ (Y_i − c)²  for any c    (3.3)

In other words:

c_minimum = Ȳ    (3.4)
Figure 3.1 illustrates this least squares property in the case of our sample
on GNP per capita for seven African countries (Table 3.1). As you can
see, the sum of squared deviations of sample values from different values
for c reaches a minimum when c equals the sample mean.
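The least squares property in (3.3) and (3.4) can be verified numerically. The GNP per capita figures below are hypothetical stand-ins for Table 3.1, which is not reproduced here.

```python
# Least squares property: the sum of squared deviations is minimised
# when deviations are taken from the sample mean.
y = [310, 120, 420, 650, 80, 370, 290]   # hypothetical GNP per capita

def ssd(values, c):
    """Sum of squared deviations of the values from centre c."""
    return sum((v - c) ** 2 for v in values)

ybar = sum(y) / len(y)
at_mean = ssd(y, ybar)

# Sweeping c over a grid never beats the sample mean.
```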
Exercise 3.2
Calculate the sum of squared residuals for the sample in Table 3.1.
[Figure 3.1: sum of squared deviations plotted against c (250–450), reaching its minimum of 230,571 when c equals the sample mean]
Exercise 3.3
Repeat exercises 3.1 and 3.2 with the data for sample 2 in Table 3.2:
1 check how big the residual of Botswana is in relation to the other
residuals;
2 check the relative size of the squared residual of Botswana in the total
sum of squared residuals.
What do you conclude from this exercise?
(3.12)

(3.13)

a4 = (1/n) Σ (Y_i − Ȳ)⁴ / s⁴    (3.15)
Outliers, skewness and data transformations 83
In practice, the main purpose of calculating the sample kurtosis is to check
whether a symmetric distribution behaves approximately as a normal
distribution. There is not much point, therefore, in calculating a kurtosis
of a skewed distribution. A unimodal bell-shaped empirical distribution
with skewness close to 0 and a sample kurtosis close to 3 can be taken
to behave similarly to a normal distribution. For example, in Chapter 2 we
made frequent use of the empirical distribution of the demand for casual
labour on the day-shift in Maputo harbour during the early 1980s. The
skewness and kurtosis of this empirical distribution are, respectively,
a3 = −0.013 and a4 = 3.007. Not surprisingly, therefore, this distribution
behaves very similarly to a normal distribution. An empirical distribution
with a3 roughly equal to zero and a4 > 3 has heavier tails than a normal
distribution would have, while a4 < 3 indicates thinner tails than normal.
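The moment-based coefficients a3 and a4 are easily computed from their definitions. The sketch below also shows how a single outlier inflates both measures; the samples are hypothetical.

```python
def moments(data):
    """Moment-based skewness a3 and kurtosis a4 of a sample."""
    n = len(data)
    m = sum(data) / n
    s = (sum((x - m) ** 2 for x in data) / n) ** 0.5
    a3 = sum((x - m) ** 3 for x in data) / (n * s ** 3)
    a4 = sum((x - m) ** 4 for x in data) / (n * s ** 4)
    return a3, a4

# A symmetric sample has a3 of zero; adding one large outlier pushes
# both a3 and a4 up sharply.
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]
with_outlier = symmetric + [100]

a3_sym, a4_sym = moments(symmetric)
a3_out, a4_out = moments(with_outlier)
```

This sensitivity of a3 and a4 to a single extreme value is exactly why the chapter later turns to resistant, order-based alternatives.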
Exercise 3.4
For samples 1 and 2 in Table 3.2 compute, respectively, the sample coef-
ficients of skewness and of kurtosis. How does the presence of an outlier
in sample 2 affect both measures?
Exercise 3.5
In exercise 2.1 you were asked to compute means, medians and modes
for a set of socioeconomic variables. For each of these variables:
1 compute the coefficients of skewness and kurtosis;
2 comment whether the distribution can be taken to be symmetrical or
not;
3 if so, whether it has normal tails or not.
What do your results tell you about whether the mean is a good summary
of the data?
Order-based statistics
The order statistics of a sample of observations Y_i (i = 1, 2, ..., n) are
obtained by rearranging the observations in order of increasing magnitude.
We denote the resulting ordered sample as Y(i) (i = 1, 2, ..., n) such
that Y(1) < Y(2) < ... < Y(n), where the bracketed subscripts refer to the
position of each observation in the ordered list. Suppose, for example,
that we have the following sample of five observations: 4, 0, −3, 5 and −2.
These are the Y_i values, i = 1, ..., 5. The ordered sample is obtained as
follows: −3, −2, 0, 4, and 5. This ordered list contains the Y(i) values. Hence,
while Y_2 = 0, Y(2) = −2, since −2 is the second value in the ordered list.
The median, Md, is the middle value of the ordered sample, Y(i) (i =
1, ..., n), and, hence, splits the ordered list into two halves; that is, half the
observations Y(i) lie below the median, and the other half above it. Box 3.1
explains how to obtain the median of a sample. For a theoretical probability
distribution the median is the centre of probability of the distribution:
50 per cent of the probability mass lies below it, and 50 per cent above it.
Median
The location of the median depends on whether the sample size, n,
is even or odd. First, compute the median depth = (n + 1)/2:
1 If the resulting value is an integer and, hence, the sample size is
odd, the position of the median is given by (n + 1)/2. For example,
if n = 5 and, hence, (n + 1)/2 = 3, the median is Y(3), the third
value in the ordered list.
2 If the resulting value contains a fraction (0.5) and, hence, the sam-
ple size is even, the median is obtained by calculating the arith-
metic mean of Y(n/2) and Y(n/2 + 1). For example, if n = 6, (n + 1)/2 =
3.5, the median is obtained by taking the mean of Y(3) and Y(4). Note
that positions 3 and 4 are the nearest integers on either side of 3.5.
Note that the median depth indicates how far we have to count
inwards to encounter the median. This concept of depth is also used
to pinpoint the position of an observation in an ordered list with ref-
erence to the median. Hence, the depth of Y(i) is obtained by counting
its position upwards from the lowest value if Y(i) lies below the median,
and downwards from the highest value if it lies above the median.
with Y(1), the lowest value in the sample. To obtain the upper
quartile, count downwards starting with Y(n), the highest value in
the sample. In our example, the lower quartile will be the third
value from the bottom of the ordered list; the upper quartile is
the third value from the top.
2 If the resulting value contains a fraction (0.5), the quartiles can
be obtained by averaging the two values whose depths are the
integers adjacent to the computed quartile depth. To find the
lower quartile, start from the bottom of the list; to find the upper
quartile, start from the top. For example, if n = 7, the median
depth will be 4 (which equals the truncated median depth
because it does not contain a fraction). The quartile depth then
equals (truncated median depth + 1)/2 = (4 + 1)/2 = 2.5. The
lower quartile, therefore, is obtained by taking the average of
the second and third values in the ordered list: QL = (Y(2) + Y(3))/2.
Similarly, the upper quartile is found by averaging the second
and third values from the top of the list: Qu = (Y(5) + Y(6))/2.
Source: Hoaglin (1983)
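Box 3.1's depth rules can be written out directly. The sketch below implements both the median depth rule and the quartile depth rule quoted above; the function names are ours, not the book's.

```python
def median_by_depth(values):
    """Median located by the depth rule of Box 3.1."""
    y = sorted(values)
    n = len(y)
    depth = (n + 1) / 2
    if depth == int(depth):              # n odd: a single middle value
        return y[int(depth) - 1]
    lo = y[int(depth) - 1]               # n even: average the two
    hi = y[int(depth)]                   # positions either side
    return (lo + hi) / 2

def quartiles(values):
    """Lower and upper quartiles by the quartile depth rule."""
    y = sorted(values)
    n = len(y)
    trunc_md = int((n + 1) / 2)          # truncated median depth
    qd = (trunc_md + 1) / 2              # quartile depth
    if qd == int(qd):
        ql = y[int(qd) - 1]              # count up from the bottom
        qu = y[n - int(qd)]              # count down from the top
    else:
        k = int(qd)                      # average adjacent depths
        ql = (y[k - 1] + y[k]) / 2
        qu = (y[n - k] + y[n - k - 1]) / 2
    return ql, qu

# The n = 7 example of the box: quartile depth (4 + 1)/2 = 2.5.
ql, qu = quartiles([4, 0, -3, 5, -2, 7, 9])
```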
Quartiles divide an ordered sample into quarters. Box 3.1 explains how
to do this. The median itself is the second quartile. Apart from the median,
we have the lower, QL, and upper, Qu, quartiles. The interquartile range,
IQR = Qu − QL, is the most commonly used measure of spread for order
statistics. It gives us the range of the middle 50 per cent of the data. The
range of the data is another order-based measure of spread and involves
computing the difference between the extreme values, respectively Xu and
XL, of the sample, where Xu = Y(n) and XL = Y(1). Unlike the IQR,
however, the range (Xu − XL) is not a resistant summary of spread.
Exercise 3.6
Using Table 3.2, find the median and the upper and lower quartiles for
the following samples:
1 sample 1;
2 sample 1 without Zimbabwe;
3 sample 2;
4 the sample of GNP per capita for all eight countries listed.
(Between them, these four cases cover all possibilities listed in Box 3.1.)
Exercise 3.7
Using the variables selected in exercises 2.1 and 3.5, for each variable find
the median and the upper and lower quartiles.
To look at the shape of an empirical distribution EDA starts from what
is called the five-number summary of the data: the median, Md, the lower
and upper quartiles, QL and Qu, and the two extreme values, XL and Xu.
The box plot, a graphical display of this five-number summary, shows us
the basic structure of the data: more specifically, it shows the location,
spread, skewness, tail length and outliers of a batch of data (Emerson and
Strenio, 1983: 58). To see how to construct a box plot it is best to look
at an example. Figure 3.2 compares the two box plots corresponding to
the two seven-country samples of GNP per capita of African countries
listed in Table 3.2.
First, take a look at the box plot of sample 1. The plot is constructed
with the aid of the five-number summary. A box plot consists of a box
and two tails. The box gives the variation of the middle 50 per cent of
the data: its upper and lower boundaries are, respectively, the upper and
lower quartiles. The horizontal line inside the box indicates the position
of the median. The tails (line segments up and down the box) run to the
most extreme data values which do not qualify as outliers. Box 3.2 gives
a simple procedure to determine which data points can be considered as
outliers. In sample 1 there are no outliers and, hence, the tails run up to
both extreme values of the sample. In contrast, sample 2 features an
outlier, the high GNP per capita of Botswana. To highlight its status as
an outlier, we plot it as an isolated point.
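A box plot rests on the five-number summary plus a rule for flagging outliers. Box 3.2 is not reproduced here, so the sketch below uses the common convention of fences at 1.5 × IQR beyond the quartiles, and simple linear interpolation for the quartiles (which can differ slightly from the depth rule of Box 3.1).

```python
def five_number_summary(values):
    """(XL, QL, Md, Qu, Xu) using linearly interpolated quantiles."""
    y = sorted(values)
    n = len(y)
    def quantile(q):
        pos = q * (n - 1)
        lo = int(pos)
        frac = pos - lo
        return y[lo] + frac * (y[min(lo + 1, n - 1)] - y[lo])
    return y[0], quantile(0.25), quantile(0.5), quantile(0.75), y[-1]

def outliers(values):
    """Points beyond the 1.5 x IQR fences (a common convention)."""
    _, ql, _, qu, _ = five_number_summary(values)
    iqr = qu - ql
    lo_fence, hi_fence = ql - 1.5 * iqr, qu + 1.5 * iqr
    return [v for v in values if v < lo_fence or v > hi_fence]

# Hypothetical GNP per capita sample with one Botswana-like outlier.
gnp = [300, 350, 400, 420, 450, 500, 2000]
flagged = outliers(gnp)
```

Here the fences single out the one extreme value, which a box plot would draw as an isolated point beyond the tails.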
[Box plots, GNP per capita 0–2,100: sample 1 with no outliers; sample 2 with Botswana plotted as an isolated outlier near 1,900]
Figure 3.2 Box plots of GNP per capita (two samples of seven African
countries)
[Comparative box plots of female–male life expectancy differences (−4 to 9) for low, lower-middle and upper-middle income countries; low-income outliers labelled India/Nepal and Bhutan]
Figure 3.3 Comparative box plots of gender differences in life expectancy
Exercise 3.8
Take a careful look at both Table 3.3 and Figure 3.3. Write down your
general comments. More specifically, comment on the following questions:
1 Is there a relation between gender differences in life expectancy and
the wealth of nations?
2 How do these three distributions differ with respect to spread?
3 Are the distributions similar in shape, or not?
4 What does the little cluster of outliers in the distribution of low income
countries tell you?
5 What did you learn from this exercise in terms of the relative useful-
ness of mean-based versus order-based statistics?
Figure 3.5 Symmetric but unusual tail: female-male life expectancy
Figure 3.5 reproduces the box plot along with the position of the mean.
The reasonably significant discrepancy between the mean and the median
may lead us to decide to transform the data. But if we do this, we over-
look the fact that the middle body of the data is symmetric, as indicated
by the position of the median which cuts the box in halves. As discussed
above, the problem here is not skewness, but the presence of a cluster of
outliers in the lower tail.
To avoid failing to distinguish between genuine skewness in data and
unusual behaviour (outliers) in one of the tails (both of which lead to a
divergence between the mean and the median), always observe whether
the middle 50 per cent of the data also manifest skewness. If they do, a
transformation is called for; if not, our attention should go to the unusual
behaviour in the tail. To make this distinction you can use a box plot to
check the location of the median in relation to the quartiles. Symmetry
requires that the median (approximately) divides the box into halves.
Alternatively, you may use Bowley's coefficient of skewness, bs, which is
a resistant measure of skewness defined as follows:
bs = (Qu + QL − 2 Md) / IQR    (3.18)
where bs lies within the range −1 to +1. You can easily verify that if
the median is situated midway between the upper and lower quartiles, bs
equals 0, indicating symmetry in the middle body of the data. If bs < 0,
the middle 50 per cent of the data is skewed to the left (negative skewness),
and if bs > 0, it is skewed to the right (positive skewness).
To sum up, step 1 checks whether or not the distribution is skewed. To
do this, compare the mean with the median to see whether they diverge
significantly. Compute bs, Bowley's resistant measure of skewness, to
verify whether the middle 50 per cent of the data confirm your conclu-
sion. If the distribution is genuinely skewed, you may decide to try out a
transformation of the data to see whether it is possible to eliminate skew-
ness. If the data are approximately symmetrical (or manifest only slight
skewness), you can proceed with step 2.
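Bowley's coefficient in (3.18) is a one-line computation. In the sketch below the quartile and median values are hypothetical inputs.

```python
# Bowley's resistant coefficient of skewness, equation (3.18).
def bowley(ql, md, qu):
    return (qu + ql - 2 * md) / (qu - ql)

b_sym = bowley(ql=10, md=20, qu=30)     # median midway: bs = 0
b_right = bowley(ql=10, md=12, qu=30)   # median near QL: bs > 0
b_left = bowley(ql=10, md=28, qu=30)    # median near Qu: bs < 0
```

Because bs uses only the quartiles and the median, a single outlier in a tail leaves it unchanged, unlike the moment-based a3.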
where the critical value of the chi-square statistic with two degrees of
freedom equals 5.99 at the 5 per cent level of significance. Strictly
speaking, this test is only valid for (very) large sample sizes: n > 1,000.
The reason is that Z3 and Z4 cannot be taken to be independent if the
sample size is smaller than 1,000 and, therefore, the use of the chi-square
distribution as the sampling distribution of (Z3² + Z4²) is not strictly valid.
However, the test is still worthwhile as an indicative exercise if our sample
size happens to be smaller, which will often be the case in applied work.
More specifically, if the test rejects the normality assumption, you can
be reasonably assured that the data are not derived from a normal distri-
bution. But if the test does not reject the null hypothesis, you still need
to be careful about its interpretation. It is possible that the test has missed
out one or another indication of non-normality. This apparent ambiguity
should not trouble you too much. The mechanical application of tech-
niques or tests by themselves is never fruitful. Applied data analysis
invariably involves informed judgement along with technical expertise. It
is always useful to look at data in different ways to assess the reason-
ableness of the assumptions involved in modelling. In this case, use the
skewness-kurtosis test in conjunction with the more informal exploratory
checks obtained by comparing resistant and non-resistant measures of
level and spread to judge whether the normality assumption is likely to
be valid in practice.
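The test is easy to sketch in code. The standardisation below uses the common large-sample standard errors sqrt(6/n) for a3 and sqrt(24/n) for (a4 - 3); the book's exact formulae for Z3 and Z4 may differ slightly, so treat this as an indicative version only:

```python
import math

def skewness_kurtosis_test(data):
    """Large-sample skewness-kurtosis test: Z3^2 + Z4^2 ~ chi-square(2).

    Uses the common asymptotic standard errors sqrt(6/n) for a3 and
    sqrt(24/n) for (a4 - 3); reject normality at 5% if the statistic
    exceeds 5.99.
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    a3 = m3 / m2 ** 1.5                  # coefficient of skewness
    a4 = m4 / m2 ** 2                    # coefficient of kurtosis (normal: 3)
    z3 = a3 / math.sqrt(6 / n)
    z4 = (a4 - 3) / math.sqrt(24 / n)
    stat = z3 ** 2 + z4 ** 2
    return stat, stat > 5.99             # reject at the 5 per cent level?

# Flat (platykurtic) data in a large sample are far from normal:
stat, reject = skewness_kurtosis_test([i / 1000 for i in range(2000)])
print(reject)   # True
```

Note how the verdict rests on both pieces: symmetric data can still be rejected on kurtosis alone, as in this uniform example.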
Exercise 3.9
Using the variables selected in exercises 2.1, 3.5 and 3.7, for each variable:
1 summarise mean-based and order-based statistics;
2 find the outliers, if any, in the sample;
3 produce the relevant box plot;
4 test the normality assumption by (a) using the two-step exploratory
check and (b) using the skewness-kurtosis test.
96 Econometrics for developing countries
Heavy tails: mean versus median revisited
So what do we do if our data are skewed or have heavy tails? If the data
are skewed, a transformation of the data may help. We discuss how to do
this in section 3.5. But what to do if the data are reasonably symmetric
but have heavier than normal tails? Should we continue to use the sample
mean as the preferred estimator of the population mean?
In section 2.5 we showed that the mean is the superior estimator if the
normality assumption prevails. In those circumstances the mean clearly
outperforms the median: the standard deviation of the sampling distrib-
ution of the median will be about 25 per cent larger than that of the mean.
But what if the underlying conditions of the parent distribution are
unknown and, in particular, if it is likely to have heavy tails? In such
cases, an estimator which performs well under very restrictive circum-
stances but does not do so well under a variety of different conditions is
not much use. It is then preferable to use a robust estimator such as the
median. Indeed:
In non-normal distributions with long tails, the relative efficiency of
the median to the mean rises and may become larger than 1. We can
now define more specifically what is meant by a robust estimate -
namely, one whose efficiency relative to competitors is high (i.e. seldom
much less than 1) over a wide range of parent populations. The median
is more robust as well as more resistant to erratic extreme observa-
tions, although it is inferior to the mean with symmetrical distributions
not far from normal.
(Snedecor and Cochran, 1989: 136)
Hence, in cases where the empirical distribution is reasonably symmet-
rical but sp is significantly larger than s (or a4 is well above 3), it is
preferable by far to use the median as the estimator of the population
mean (which, for a symmetrical distribution, is also equal to the popula-
tion median).
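This relative-efficiency argument is easy to see in a small Monte Carlo sketch (the function and the parent distributions below are our own illustrative choices, not the book's):

```python
import math
import random
import statistics

def sampling_sd(sampler, n=25, reps=2000, seed=42):
    """Monte Carlo sketch of the sampling standard deviations of the
    sample mean and the sample median for a given parent distribution."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        s = [sampler(rng) for _ in range(n)]
        means.append(sum(s) / n)
        medians.append(statistics.median(s))
    return statistics.stdev(means), statistics.stdev(medians)

# Normal parent: the mean wins; the median's s.d. is roughly 25% larger.
sd_mean, sd_med = sampling_sd(lambda r: r.gauss(0, 1))
print(sd_med > sd_mean)        # True

# Heavy-tailed (Cauchy) parent: the median wins decisively.
sd_mean2, sd_med2 = sampling_sd(lambda r: math.tan(math.pi * (r.random() - 0.5)))
print(sd_mean2 > sd_med2)      # True
```

The same estimator that loses by a quarter under normality becomes vastly superior once the tails are heavy enough, which is the sense in which the median is robust.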
If we rely on the sample median as our estimator, it is possible to
construct a conservative confidence interval for the population median
that is valid for any continuous distribution, as follows (ibid.: 136-7). Two
values in the ordered list of sample values serve as the lower and upper
confidence limits. To obtain a 95 per cent confidence interval, we obtain
the positions for the confidence interval by first calculating the values of,
respectively, [(n + 1)/2 - √n] and [(n + 1)/2 + √n], subsequently rounding
down the lower value and rounding up the upper value to the nearest
integers.
To illustrate this procedure, suppose we have an ordered sample of the
following 15 observations:
Sample: 1, 2, 3, 6, 8, 10, 12, 14, 16, 18, 19, 24, 28, 32, 34
Median: 14
Outliers, skewness and data transformations 97
To obtain a 95 per cent confidence interval, we calculate the positions of the
approximate limits as follows:
(n + 1)/2 - √n = (15 + 1)/2 - √15 = 4.13, which rounds down to 4
(n + 1)/2 + √n = (15 + 1)/2 + √15 = 11.87, which rounds up to 12
giving us the positions of the lower and upper confidence limits. Hence,
the lower limit of the 95 per cent confidence interval is 6, the fourth value
in the ordered list, and the upper limit is 24, the twelfth value in the list.
Note that the distance of the lower limit of this confidence interval to the
median is not necessarily equal to the distance from the median to the
upper limit. This will only be approximately the case if the data are fairly
symmetrically distributed. Indeed, since the confidence interval is based
on order statistics, its limits depend on the shape of the distribution.
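The procedure codes up directly; below is a sketch (the function name is ours), applied to an ordered sample like the one in the text:

```python
import math

def median_ci(data):
    """Approximate 95% confidence interval for the population median from
    order statistics, valid for any continuous distribution.

    Positions (n+1)/2 - sqrt(n) and (n+1)/2 + sqrt(n), rounded outwards,
    following Snedecor and Cochran (1989: 136-7).
    """
    xs = sorted(data)
    n = len(xs)
    lo = math.floor((n + 1) / 2 - math.sqrt(n))   # round down
    hi = math.ceil((n + 1) / 2 + math.sqrt(n))    # round up
    return xs[lo - 1], xs[hi - 1]                 # 1-based positions

sample = [1, 2, 3, 6, 8, 10, 12, 14, 16, 18, 19, 24, 28, 32, 34]
print(median_ci(sample))   # (6, 24): the 4th and 12th ordered values
```

Because the limits are order statistics rather than a symmetric margin around the median, the interval need not be centred on the sample median.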
Exercise 3.10
Table 3.5 gives you selected summary statistics for both distributions
depicted in Figures 3.6 and 3.7: respectively, per capita household income
and the logarithm thereof. Check whether either or both distributions
approximately satisfy the normality assumption, using: (a) the two-step
procedure involving the comparison of mean-based and order-based statis-
tics; and (b) the skewness-kurtosis test for normality.
Figure 3.6 Per capita household income
Figure 3.7 Log household income
Table 3.5 Summary statistics of per capita household income

Summary statistics      Per capita          Logarithm of per capita
                        household income    household income
Median                  153.33              5.03
Mean                    198.25              4.95
IQR                     166.67              1.10
Standard deviation      170.65              0.89
Skewness (a3)           2.25                -0.48
Kurtosis (a4)           10.88               3.17
Sample size             197                 197
You will have noted that the logarithmic transformation somewhat over-
corrected the skewness in the original data. Indeed, the transformed data
show a slight but significant skew to the left. Strong positive skewness in
the raw data, therefore, has been turned into mild negative skewness in
the transformed data. This is the main reason why the skewness-kurtosis
test, at the 5 per cent significance level, rejects the null hypothesis that the
data are derived from a normal distribution.
As far as the problem of fat tails is concerned, our log transformation
did a splendid job. You will have found that the resulting pseudo stan-
dard deviation is slightly less than the standard deviation, while the
coefficient of kurtosis is slightly above 3. For all practical purposes, the
log transformed data have thin tails. Given these results - slight but signif-
icant negative skew and thin tails - you might rightly be inclined to decide
that the log transformation did a satisfactory job. The parent distribution
may not be wholly normal, but this does not mean that the classical model
based on the sample mean as estimator is likely to perform badly. In other
words, the underlying distribution of the log transformed data is unlikely
to be so far away from the normal distribution as to affect seriously the
relative efficiency of the sample mean. Furthermore, for reasons which
will become more apparent throughout this book, the logarithmic trans-
formation is very attractive and, if it is likely to perform reasonably well,
you may feel inclined to stick with it. But, alternatively, you may want
to search for a better transformation which corrects for positive skewness
in the data without overdoing it. Which other types of transformations
can we use?
In general, the power transformation is of the form Y^p, where p is a non-
zero real number. The choice of the power p depends on the nature of
skewness in the original distribution. The higher the value of p above unity,
the greater is the impact of the transformation in reducing negative skewness.
Similarly, the lower the value of p below unity, the greater its impact in
reducing positive skewness, except when p is exactly zero. If p = 0, Y^p = 1
for any non-zero value of Y. Obviously, this type of transformation does
not serve any purpose since all the information about Y would be lost. But
it so happens that the logarithmic transformation fits nicely into
this position of the ladder of transformations and so, by convention, p = 0
is the log transformation. The reason is that it reduces positive skewness
less than a negative power (p < 0) would do, but more than any power p
such that 0 < p < 1. Thus we obtain a hierarchy of powers, p, in terms of
their effects on reducing skewness. This hierarchy is depicted in the so-called
ladder of powers as shown in Table 3.6.
The power used in the transformation need not be an integer but can
contain fractions as well. Hence, for example, it is possible to use a square
root (p = 0.5), which corrects for milder positive skewness in the data.
However, the idea is not that you should try to find the exact power,
correct to so many decimal places, like 0.627 or some such number, to
get perfect symmetry. In practice, a suitable rounded power such as 0.25
or 0.75 will suffice as long as reasonable symmetry is achieved.
Let us now go back to our example of per capita incomes of Chinese
households, depicted in Figures 3.6 and 3.7. The effect of the logarithmic
transformation was somewhat stronger than necessary, resulting in a mild
negative skewness in the transformed data. To avoid over-correcting skew-
ness in the original data, we need to move up the ladder a bit: some power
between 0 and 1 should do the trick. For example, we could try a square
root, p = 0.5, or a fourth root, p = 0.25. With a bit of trial and error we
settle on the fourth root: p = 0.25. Table 3.7 gives us the summary statis-
tics and Figure 3.8 shows the histogram.
Figure 3.8 Fourth root transformation
Exercise 3.11
As in exercise 3.10, check whether the underlying distribution of the fourth
root transformation of the data on household income per capita can be
taken to be approximately normal in shape.
You should find that the fourth root transformation brings the data in
line with the normality assumption. In this case, the fourth root is a superior
transformation to the logarithmic one. But for reasons which will
become clear in subsequent chapters, economists often find it easier to
work with a variable log(Y) in an equation rather than with Y^0.25. Hence,
the choice of an appropriate transformation often involves a trade-off
between one which is ideal for the purposes of data analysis and one
which performs reasonably well on this count but also has the advantage
that it lends itself to a more straightforward interpretation (in substantive
terms) of the results.
Exercise 3.12
Figure 3.4 shows the distribution of GNP per capita (in the file
SOCECON), a distribution which is skewed to the right. Using the data
in the data file:
1 take the logarithms of the GNP per capita data;
2 calculate the mean-based and order-based statistics of GNP per capita
and of log(GNP per capita);
3 check whether log(GNP per capita) is reasonably symmetric and thin
tailed;
4 compute the antilogarithm of the sample mean of the log(GNP per
capita) and compare it with the mean and median of the original data
(GNP per capita).
What do you conclude?
The same argument can be applied to the whole family of power trans-
formations since all these transformations preserve the order of the data.
The inverse transformation of the sample mean of the transformed
data gives us an estimate of the population median of the original raw
data. For example, the median per capita household income of the data
on China is 153.3. The mean of the fourth root of the data is 3.526.
The fourth power of 3.526, which is the appropriate inverse transforma-
tion in this case, is 154.5, which is very close to the median of the original
data.
What applies to the point estimate also applies to the interval estimate
(the confidence interval). Hence, after estimating the sample mean of the
transformed data and calculating its confidence limits, we can take
the inverse transformation of this sample mean and of its confidence limits
to obtain an interval estimate of the population median of the original
data. It is not valid, however, to calculate the inverse transformation of
the standard error of the sample mean of the transformed data, because
the reverse transformation changes the nature of the sampling distribu-
tion of the estimator obtained by reverse transformation.
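As a sketch of this back-transformation (the helper name is ours):

```python
def median_estimate_via_power(data, p):
    """Estimate the population median of the raw data: transform by Y**p,
    take the sample mean, and apply the inverse transformation.

    This works because power transformations preserve the order of the
    data, so the mean of (near-)symmetric transformed data estimates the
    median on both scales.
    """
    mean_t = sum(y ** p for y in data) / len(data)
    return mean_t ** (1 / p)   # inverse transformation

# Fourth roots of 1, 16 and 81 are 1, 2, 3: symmetric, with mean 2, so the
# back-transformed estimate 2**4 = 16 coincides with the sample median.
print(round(median_estimate_via_power([1.0, 16.0, 81.0], 0.25), 6))   # 16.0
```

The same inversion applied to the confidence limits of the transformed mean yields the interval estimate; applying it to the standard error itself would not be valid, as noted above.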
Note: The first moment is given as the moment around zero, whereas the remaining
moments are taken around the mean, or derived from the latter.
of the estimator to perform well (relative to competitors) over a range
of different underlying conditions. While the sample mean is superior
when the data are drawn from a normal distribution, the sample median
is the more robust estimator and, hence, preferable when the underly-
ing conditions are unknown.
5 If the data are unimodal but skewed, a data transformation is called
for to correct for the skewness in the data. To do this we rely on the
ladder of power transformations which enable us to correct for differ-
ences in the direction of skewness (positive or negative) and in its
strength. Often, but not always, a transformation renders the trans-
formed data symmetric and, hopefully, also more normal in shape. If
so, the classical model of inference about the population mean using
the sample mean as estimator can again be used.
6 After analysis with transformed data it is necessary to translate the
results back to the original data. If the transformation was successful
and, hence, the transformed data are near symmetrical, the inverse
transformation of the sample mean (and of its confidence interval)
yields a point (and interval) estimate of the population median of the
original data, and not of its population mean. Yet the estimate is
obtained by applying the classical model based on the superiority of
the sample mean as an estimator when the normality assumption is
satisfied, to the transformed data.
ADDITIONAL EXERCISES
Exercise 3.13
Demonstrate algebraically that adding observation X(n+1) to a sample of
n observations will: (a) leave the sample mean unchanged when X(n+1)
equals the sample mean for the first n observations; and (b)
increase/decrease the sample mean when X(n+1) is greater/less than the
sample mean for the first n observations.
Exercise 3.14
Using your results from Exercise 3.9, choose appropriate transformations
for each of your selected variables. Test for normality in each of the trans-
formed data series and comment on your results.
Part II
Regression and data analysis
4 Data analysis and simple regression
4.1 INTRODUCTION
Standard errors
Given the assumptions of the classical linear regression model, the vari-
ances of the least squares estimators are given by:
var(b1) = σ² [1/n + X̄²/Σ(Xi - X̄)²] (4.24)
var(b2) = σ²/Σ(Xi - X̄)² (4.25)
t = (b2 - H0(β2))/se(b2) ~ t(n-2) (4.28)
using (4.25) and (4.26). The statistic t(n-2) denotes the Student's t-distri-
bution with (n - 2) degrees of freedom. The reason we now have only
(n - 2) degrees of freedom is that, in simple regression, we use the sample
data to estimate two coefficients: the slope and the intercept of the line.
In the case of the sample mean, in contrast, we only estimated one para-
meter (the mean itself) from the sample.
Similarly, for b 1, we get:
b1 - Ho(/31)
t= se ( bl) - t(n-2) ( 4.29)
[~ + "L(X~ X )]1
12
se(b 1 ) = s 2 (4.30)
b1 ± t ( n-2,~) se(b 1)
(4.32)
b1 + b 2X 0
while its (1 - a) per cent confidence interval can be obtained as follows:
(4.36)
where
- [ 1 (Xo - X)2 ]112
se(Y0 ) - s 1 + -;,, + L:(X; _ X) 2 (4.38)
Hypothesis testing
The sampling distributions given in (4.27) and in (4.29) can be used for
tests of hypotheses regarding the intercept and the slope of the popula-
tion regression in much the same way as we did in the case of hypothesis
testing concerning the population mean of a univariate normal distribu-
tion. Remember, however, that in this case the t-distribution has (n - 2)
degrees of freedom instead of (n - 1). We shall illustrate the use of the
t-test in the exercises with section 4.5.
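These formulae translate directly into code. The sketch below (the function name is ours; the book's own computations use Stata) computes the slope, its standard error as in (4.25)-(4.26), and the t-statistic of (4.28) for a given null value of the slope:

```python
import math

def ols_with_t(x, y, beta0=0.0):
    """Simple OLS of y on x, with the slope's standard error and the
    t-statistic for H0: slope = beta0, on (n - 2) degrees of freedom."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    resid = [yi - b1 - b2 * xi for xi, yi in zip(x, y)]
    s2 = sum(e ** 2 for e in resid) / (n - 2)   # two coefficients estimated
    se_b2 = math.sqrt(s2 / sxx)                  # from var(b2) = s2/Sxx
    t = (b2 - beta0) / se_b2
    return b1, b2, se_b2, t

# A small made-up sample lying close to the line y = 1 + 2x:
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.1, 4.9, 7.2, 9.0, 10.8]
b1, b2, se, t = ols_with_t(x, y)
print(round(b2, 3), round(se, 3), round(t, 1))   # 1.95 0.05 39.0
```

Comparing t with the critical value of the t-distribution on (n - 2) degrees of freedom then completes the test, exactly as with the sample mean but with one fewer degree of freedom.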
What if X is stochastic?
The inferences we make from the sample regression about the parame-
ters of the population regression are contingent upon the assumptions of
the classical normal linear regression model. One of these assumptions
states that X is non-stochastic but a given set of values. As we pointed
out above, this is perhaps a plausible assumption when regression analysis
is done in the context of experiments in which the researcher has control
over the values of X. But in development research our data derive from
observational programmes where neither Y nor X is subject to control.
Consequently, in most cases, both Y and X have to be considered as
stochastic variables.
124 Econometrics for developing countries
How does this influence the validity of our inferences? If X is stochastic,
the critical question turns out to be whether Xi and εi are statistically
independent from one another, and consequently do not covary. If:
E(Xi εi) = 0, for i = 1, ..., n (4.40)
it can be shown that the least squares estimators retain their property
of unbiasedness and, furthermore, the variances of the estimators, the
confidence intervals and the tests of hypotheses that we derived in this
section remain valid conditional upon the realised values of X. In other
words, provided X is independent of the error term, once a sample is
drawn (and hence the observed values of X are known), our inferences
are valid contingent upon these 'given' X values.
This may appear rather restrictive. However, if we make the additional
assumption that Y and X are jointly distributed as a bivariate normal
distribution, all formulae derived above with regard to the estimators of
the population regression coefficients, their standard errors, the confidence
intervals and the tests of hypotheses are all valid. In this book, we shall
not go into detail about the bivariate (or, for that matter, multivariate)
normal distribution. Suffice it to say that if Y and X jointly follow a
bivariate normal distribution, both the marginal and conditional distribu-
tions will be normal, and, importantly, the regression of Y on X is linear.
However, the converse is not true. If we find that Y and X each follow
a normal distribution, this does not imply that they are jointly normally
distributed (see, for example, Maddala, 1992: 104-5; Goldberger, 1991:
68-79).
This completes our brief review of statistical inference in the classical
normal linear regression model. You will undoubtedly have noted the simi-
larities with the problem of inference about the population mean of a
univariate normal distribution. The latter is in fact a special case of the
general regression model. As with the sample mean, the least squares
regression line turns out to be a powerful tool of analysis if the assump-
tions of the classical normal linear regression model are approximately
valid in practice or if Y and X are jointly normally distributed. But if the
assumptions of the model are likely to be invalid in practice, the least
squares line, like the sample mean, rapidly loses its superiority as an esti-
mator of the population regression. Hence, before embarking on statistical
inferences based on the least squares line, it is important to check care-
fully the validity of the assumptions of the model in practice. This is the
issue to which we now turn.
Figure 4.2 Exploratory band regression: D on R
Exercise 4.2
Using the data file TANZANIA, compute the growth rates of govern-
ment recurrent expenditures and revenues and then answer the following
questions:
Data analysis and simple regression 129
1 regress RE on RR, computing the regression coefficients, their stan-
dard errors, the R², the residuals, and the fitted values of RE;
2 graph RE against RR with the regression line;
3 graph the residuals against the predicted values of RE;
4 plot the residuals against time, t;
5 fit a three-band exploratory band regression;
6 check whether the residuals behave as a sample of a normal distribution.
Is this a satisfactory regression? Explain your answer.
[Figures: scatter and residual plots for the regression of RE on RR, including the box plot of the residuals (Figure 4.6) and the plot of the residuals against time (Figure 4.7).]
When plotting residuals against time, it
is useful (a) to draw a horizontal line indicating the zero mean, and (b)
to connect successive points with line segments. Figure 4.7 uses our
example to show you how to do this. If errors in successive years tend to
move in the same direction, this type of residual plot will show the pres-
ence of runs of positive or of negative residuals (respectively, strings of
points above or below the line). If successive residuals are relatively uncor-
related, the curve of connected residuals will cross the line in a similar
way as flipping a coin switches from a run of heads to a run of tails. If,
however, successive residuals are negatively correlated, a positive residual
will probably be followed by a negative residual and, hence, the curve
will be very jagged (crossing the zero line almost continuously). In our
case, the curve connecting successive residuals shows a series of runs up
and down the zero mean, not unlike the type of runs you get by flipping
a coin. Hence, this quick check seems to indicate that the residuals show
little evidence of autocorrelation.
It appears, therefore, that in our example the assumptions of the clas-
sical normal linear simple regression model are reasonably satisfied. This
sets the stage for statistical inference based on the sample regression line.
Exercise 4.3 suggests some questions of statistical inference applied to this
simple example.
Exercise 4.3
Using the regression results listed in 4.40:
1 What economic interpretation would you give to the slope coefficient
of this regression?
2 What does its intercept tell you?
3 Formally test the hypothesis that the slope coefficient equals 1.
4 Formally test the hypothesis that the intercept term equals O.
5 Construct confidence intervals for (a) the conditional mean of RE, and
(b) the predicted value of RE for, respectively, RR = -0.10, RR = 0.0,
and RR = 0.15.
6 Compute the hat statistics and the standard errors for each of the
residuals of the regression.
This regression through the origin, unlike the general model with slope
and intercept, does not adjust for the sample means. In other words, all
the formulae for the relevant statistics in this case can be derived simply
by taking the corresponding formulae for simple regression with a slope
and intercept and replacing all sample means by zero. The test statistics
and confidence intervals can also be derived in this way.
The R² statistic of a regression through the origin, however, loses much
of its usefulness as a measure of goodness of fit. It is not comparable with
the R² of the corresponding regression with intercept and slope. It is
furthermore possible to come across a negative R² in a regression through
the origin. This can occur when the intercept should not have been
dropped and the resulting residual sum of squares turns out to be higher
than the total sum of squares.
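The slope formula itself illustrates the mean-replacement rule; a minimal sketch (the function name is ours):

```python
def slope_through_origin(x, y):
    """OLS slope for regression through the origin: the general slope
    formula with the sample means replaced by zero, i.e.
    b2 = sum(x*y) / sum(x**2)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# For data lying exactly on y = 2x the no-intercept slope recovers 2:
print(slope_through_origin([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))   # 2.0
```

The residuals of this fit need not sum to zero, which is one way to see why the usual R² decomposition breaks down without an intercept.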
Our example yields the following regression through the origin:
Exercise 4.4
Using the regression results in equation 4.45:
1 Formally test the hypothesis β2 = 1.
2 What economic interpretation can you give to this unitary slope coef-
ficient? Why might we think it valid on theoretical grounds to exclude
the intercept from this regression?
Exercise 4.5
Look carefully at the four plots in Figure 4.9 (overleaf). For each plot
write down whether any of the points is: an outlier, a point of high
leverage, an influential point, or some combination of these.
You will have noticed that each plot contains only one point which qual-
ifies for discussion. Obviously, with real data the situation can be more
complex. Table 4.3 shows our quick summary. In general we note:
1 outliers are not necessarily influential (Plot 4)
2 but they can be so (depending on leverage) (Plot 3)
3 yet high leverage points are not always influential (Plot 1)
4 and influential points are not necessarily outliers (Plot 2)
In terms of visual displays, outliers can best be spotted with residual
plots (but they are also visible in a scatter plot, especially if the fitted line
is shown), while influential points require us to look at scatter plots since
they do not show themselves on residual plots if they do not have large
residuals. Just running a regression without looking at any plots leads you
to ignore influential points and makes spotting outliers more tedious. This
is the simple but powerful lesson of this exercise.
As shown above, graphical displays can be of great help in spotting
outliers and influential points. Apart from these graphical methods,
[Figure 4.9: Plots 1 to 4, scatter plots of observed Y against observed X.]
Studentised residuals
In order to make the outliers conspicuous in relation to the rest of the resid-
uals, it is useful to consider them in the context of the overall residual
variation. One way in which this can be done is to calculate the stan-
dardised residual, which is simply the residual divided by the standard
error of the estimate (i.e. standardised residual = e/s). However, the
problem with this measure is that if there is an outlier in the data set it
will inflate the standard error of the regression. This problem is catered
for by using instead s(i), where the (i) subscript denotes a statistic calcu-
lated having dropped the ith observation from the sample. Be careful not
to be confused by this notation; for example, hi is the hat statistic for
observation i (see below), whereas b2(i) is the slope coefficient having
dropped the ith observation from the sample.
Making this adjustment, we define the studentised residuals (t;):
e.
(. = ' (4.46)
' s(i)Y(l - h;)
where s(i) is the standard error estímate of the regression (defined in equa-
tion (4.26)) fitted after deleting the ith observation, and h; is a measure
of leverage as defined in equation (4.39). The additional term in the
numerator, '1(1 - h;), is necessary since the variance of the residuals is not
constant. With this adjustment, we get a t-statistic which tests whether the
ith residual is significantly different from 0 and, hence, signals an outlier
which does not really fit the overall pattern. (In fact the studentised
residual may be interpreted as a test of influence on the intercept, but we
pursue this interpretation in Chapter 6.) It is possible to obtain formally
derived critical values (which are larger than those from the usual t-table)
against which to compare the calculated value, but we recommend that
the studentised residual be used as an exploratory tool, as in the example
below.
Take another look at Figure 4.6 which depicts the box plot of the resid-
uals of the regression of RE on RR. This plot does not reveal the presence
of outliers, although its upper tail is somewhat more prolonged than its
lower tail. The box plot of the studentised residuals shown in Figure
4.10, however, tells a different story: the data point for the year 1980
now appears as an outlier. Studentised residuals, therefore, are much
better than the usual residuals at spotting outliers. The reason is that
each studentised residual is obtained by dividing the least squares residual
by its standard error (hence, a t-value) where the standard error of the
Figure 4.10 Studentised residual RE
regression is estimated by deleting the ith data point. If the ith data point
is an outlier, the standard error of the regression after deleting the ith
observation will be significantly lower than the standard error of the
regression with all points included. This explains why the studentised
residual is better at bringing out outlying points.
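The deletion logic of s(i) can be made explicit with a brute-force Python sketch (our own helper; statistical packages compute these directly and more efficiently):

```python
import math

def studentised_residuals(x, y):
    """Studentised residuals t_i = e_i / (s(i) * sqrt(1 - h_i)), where s(i)
    is the regression standard error with observation i deleted and h_i is
    the hat statistic; computed by brute-force deletion."""
    n = len(x)

    def fit(xs, ys):
        m = len(xs)
        xb, yb = sum(xs) / m, sum(ys) / m
        sxx = sum((xi - xb) ** 2 for xi in xs)
        b2 = sum((xi - xb) * (yi - yb) for xi, yi in zip(xs, ys)) / sxx
        return yb - b2 * xb, b2

    b1, b2 = fit(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    out = []
    for i in range(n):
        e_i = y[i] - b1 - b2 * x[i]
        h_i = 1 / n + (x[i] - xbar) ** 2 / sxx        # hat statistic
        xs, ys = x[:i] + x[i + 1:], y[:i] + y[i + 1:]  # delete point i
        a1, a2 = fit(xs, ys)
        s2_i = sum((yi - a1 - a2 * xi) ** 2
                   for xi, yi in zip(xs, ys)) / (len(xs) - 2)
        out.append(e_i / (math.sqrt(s2_i) * math.sqrt(1 - h_i)))
    return out

# A gross outlier at x = 5 stands out far more clearly once studentised:
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 15.0, 6.1]
t = studentised_residuals(x, y)
print(max(range(6), key=lambda i: abs(t[i])))   # 4: the outlying point
```

Deleting the suspect point shrinks s(i) dramatically, which is exactly the mechanism described above for why the studentised residual flags 1980 while the raw residual box plot does not.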
Figure 4.11 Hat statistics versus RR
linear functions of the observed Y values. The hat statistic hi is nothing
other than the coefficient of the observed Yi in the linear equation which
expresses the predicted value Ŷi as a linear function of all the observed Yi's.
This explains why they are called hat statistics.
DFBETA statistics
The DFBETA statistic is defined as:
DFBETAi = (b2 - b2(i)) / se(b2)(i) (4.48)
where b2(i) and se(b2)(i) are the slope coefficient and the standard error of the
estimate of the slope from the regression estimated having dropped the ith
data point from the sample. The DFBETAs measure the sensitivity of the
slope coefficient to the deletion of the ith data point. If the deletion of
this point leads to a drastic change in the slope coefficient, it follows that
the ith data point is influential. In other words, a large value of the
DFBETA statistic for a given data point indicates that this observation
has a sizeable impact on the slope coefficient of the regression. DFBETA
may also be calculated for the intercept, although this is not a common
practice. (In Figure 4.9, Plot 4, the outlier will influence the intercept,
though not the slope coefficient.)
But how big does a DFBETA statistic have to be to be considered
large? The DFBETA is not a formal test statistic like, say, the t-test.
Therefore we do not have critical values derived from statistical theory.
But we can use sorne rules of thumb. In general, if DFBETA > 2, the
corresponding data point is unquestionably an influential point. This is a
general criterion. It is also possible to relate the cut-off value of the
DFBETAs to the sample size, n. If DFBETA > 21-..Jn, the corresponding
data point may be deemed infiuential (Myers, 1990: 261; though sorne
sources suggest 31-..Jn). As a rule, it is useful to make a box plot of the
DFBETA statistics and check whether there are any outliers. Figure 4.12
shows the box plot of the DFBETA statistics of the regression of RE
against RR. As we can see, none of the data points has a DFBETA statistic
which exceeds 2, while only the data point 80 (corresponding to the growth
rate from 1979 to 1980) exceeds 2/√20 = 0.447. This confirms our earlier
hunch that this point exerted a slight pull on the regression line without,
however, causing any major distortion of the results.
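The leave-one-out computation behind the DFBETA statistic is easy to reproduce outside a statistical package. The following Python sketch (numpy only; the simulated data and variable names are illustrative, not the RE/RR series discussed above) recomputes the slope after deleting each observation in turn and applies the 2/√n rule of thumb:

```python
import numpy as np

def slope_and_se(x, y):
    """OLS slope b2 and its standard error for a simple regression of y on x."""
    n = len(x)
    xd = x - x.mean()
    b2 = (xd * (y - y.mean())).sum() / (xd ** 2).sum()
    b1 = y.mean() - b2 * x.mean()
    resid = y - b1 - b2 * x
    s2 = (resid ** 2).sum() / (n - 2)          # estimated error variance
    se = np.sqrt(s2 / (xd ** 2).sum())
    return b2, se

def dfbetas(x, y):
    """DFBETA_i = (b2 - b2(i)) / se(b2)(i), deleting each data point in turn."""
    b2_full, _ = slope_and_se(x, y)
    out = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        b2_i, se_i = slope_and_se(x[mask], y[mask])
        out.append((b2_full - b2_i) / se_i)
    return np.array(out)

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 20)
y = 1 + 2 * x + rng.normal(0, 0.5, 20)
y[0] += 5                                      # contaminate one observation
d = dfbetas(x, y)
flagged = np.abs(d) > 2 / np.sqrt(len(x))      # rule-of-thumb cut-off 2/sqrt(n)
```

A box plot of `d`, as suggested above, is then a quick visual check for influential points.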
A final point: DFBETA statistics are obtained by deleting each data
point in turn and checking the effect of doing so on the slope coefficient of
the regression line. It is possible, however, that a number of influential
points may cluster together and jointly pull the regression line in their
direction. In such a case, deleting data points one by one may not reveal
the pull exerted by this influential cluster of points. This is why you should
never rely solely on DFBETA statistics to check for influence, but also take
146 Econometrics for developing countries
[Figure 4.12 Box plot of the DFBETA statistics of the regression of RE against RR; data points 74, 78 and 80 are marked]
a good look at the scatter plot of Y against X to see whether there are any
clusters of influential points. The formula for DFBETA may be just as eas-
ily applied by dropping two or three points from the regression. At times,
several clusters exist which may pull the regression line in similar or oppo-
site directions. But if you find you have many 'influential points' you can
be sure that the problem is one of model misspecification.
Exercise 4.6
In section 3.3 of Chapter 3 we investigated the relation between the differ-
ence between female and male life expectancy, L, of developing countries,
on the one hand, and GNP per capita, Y, on the other. In doing so, we
grouped the data into three income categories: low, lower-middle and
upper-middle income countries. Here we look at the same relation without
prior grouping of countries in income categories. Using the data file
SOCECON:
1 regress L on Y;
2 check whether this regression is likely to satisfy the model assump-
tions;
3 try an exploratory band regression;
4 check graphically whether there are outliers, points of high leverage,
or influential points;
Table 4.4 Summary measures of outliers, influence and leverage

Statistic          Formula                                  Use        Critical value
Hat statistic (h)  h_i = 1/n + (X_i - X̄)² / Σ(X_i - X̄)²    Leverage   Bounded by 1/n (no leverage) and 1
                                                                       (extreme leverage); values above 0.5
                                                                       indicate excessive leverage and values over
                                                                       0.2 indicate that the observation may give
                                                                       problems.
DFBETA             DFBETA_i = (b2 - b2(i)) / se(b2)(i)      Influence  Under 2/√n the point has no influence;
                                                                       over 3/√n the point is influential, and
                                                                       strongly so if DFBETA exceeds 2.

Note: n is the sample size; k is the number of regressors; the subscript (i) (i.e. with parentheses) indicates an estimate from the sample
excluding observation i. In each case you should use the absolute value of the calculated statistic.
5 compute the studentised residuals, the hat statistics, and the DFBETAs
for the data points. Is this regression a good summary of the data?
Note that the computation of studentised residuals, hat statistics and
DFBETAs is cumbersome unless you have access to a statistical pack-
age which routinely provides these diagnostic statistics. Unfortunately,
most econometric packages do not provide these statistics. For this
reason, it is useful to familiarise yourself with a statistical package (for
example, STATA) which incorporates residual analysis and influence
diagnostics in its statistical routines.
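If your package does not report hat statistics, the formula in Table 4.4 for the simple regression case is a one-liner. A small Python sketch (the data are illustrative, not the SOCECON file used in the exercise):

```python
import numpy as np

def hat_values(x):
    """Leverage h_i = 1/n + (x_i - xbar)^2 / sum((x_j - xbar)^2), simple regression."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xd = x - x.mean()
    return 1.0 / n + xd ** 2 / (xd ** 2).sum()

x = np.array([1., 2., 3., 4., 5., 20.])        # last point lies far from the rest
h = hat_values(x)
high_leverage = h > 0.5                         # rule of thumb from Table 4.4
```

Note that the hat values always sum to the number of estimated coefficients (here 2), so a point with leverage near 1 dominates the fit.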
[Scatter plots: energy consumption per capita against GNP per capita (top panel) and log(energy consumption per capita) against log(GNP/capita) (bottom panel)]
Exercise 4.7
Use the data on energy consumption per capita, E, and GNP per capita,
Y, from the data file SOCECON far all countries with a GNP per capita
of less than $10,000. These are the low, lower-middle and upper-middle
income countries. Answer the following questions:
1 Regress E on Y, and log(E) on log(Y).
2 In each case, check whether the normality assumption is reasonably
satisfied, whether the residuals are homoscedastic, and whether there
are any outliers or influential points.
3 Which, do you think, is the better regression, and why?
Figure 4.14 E on Y versus log(E) on log(Y)
roughly equal weight to all data points, unlike the linear regression, the
location of which is mainly determined by the influential points on the
right-hand side of the scatter.
More formally, there is a procedure which may be followed to compare
the R 2s for regressions with transformed dependent variables. The method
is as follows:
1 Carry out the regression with the transformed dependent variable and
calculate the fitted values.
2 Convert these fitted values back to the original data units (for example,
if you have made a log transformation, then take the exponential of
the fitted values).
3 Calculate the correlation coefficient between the converted fitted
values from step 2 and the actual values of the dependent variable.
The square of this correlation coefficient may be directly compared
with the R 2 from the regression with the untransformed dependent
variable.
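The three steps above can be sketched in Python (numpy only; the simulated log-linear data and the function name `ols_fit` are illustrative):

```python
import numpy as np

def ols_fit(X, y):
    """Least squares coefficients and fitted values; X given without a constant column."""
    A = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    return b, A @ b

rng = np.random.default_rng(1)
x = rng.uniform(1, 10, 100)
y = np.exp(0.5 + 0.3 * x + rng.normal(0, 0.2, 100))   # log-linear relationship

# Step 1: regress the transformed (log) dependent variable and get fitted values
_, fitted_log = ols_fit(x, np.log(y))
# Step 2: convert the fitted values back to the original units
fitted_y = np.exp(fitted_log)
# Step 3: the squared correlation with the actual y is comparable with the
# R-squared of the untransformed regression of y on x
r2_log_model = np.corrcoef(fitted_y, y)[0, 1] ** 2
_, fitted_lin = ols_fit(x, y)
r2_lin_model = np.corrcoef(fitted_lin, y)[0, 1] ** 2
```

The two R-squared values are now measured in the same units and can be compared directly.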
Semi-logarithmic transformations
Semi-logarithmic transformations involve equations which can be
linearised by a logarithmic transformation of either the dependent or the
154 Econometrics for developing countries
independent variable. There are two variants. The first is given by:
Y = A e^(β2 X) e^ε    (4.57)
This specification, like the double-log specification, involves a multiplica-
tive error term. Taking logarithms of both sides yields:
ln Y = β1 + β2 X + ε    (4.58)
where
β1 = ln A    (4.59)
The slope coefficient of this specification can be expressed as follows:
β2 = d ln Y / dX = (1/Y)(dY/dX)    (4.60)
which shows that the slope coefficient depicts the relative change in Y per
unit change in X.
To illustrate this semi-log model, let us take another example with data
from the file SOCECON. The top panel of Figure 4.15 shows the scatter
plot of energy consumption per capita, E, against the urban population
as a percentage of the total population, U, for low, lower-middle and
upper-middle income countries. As you can see, the scatter plot reveals
that the underlying relationship is non-linear. Note, furthermore, that the
distribution of the regressor, U, is fairly symmetric in shape, while that
of the regressand, as we know already, is skewed to the right. The lower
panel of Figure 4.15 plots log(E) against U. Once more, the transforma-
tion solved more than one problem: the scatter shows a linear pattern and
is no longer heteroscedastic. Consequently, the regression of log(E)
against U, unlike that of E against U, is likely to satisfy the assumptions
of the classical linear model. We leave it to you as an exercise to verify
this. Figure 4.16 compares the linear regression line with the non-linear
curve estimated by the semi-log model.
An interesting case of this semi-log model arises if its regressor is a
variable denoting time, t. For example, if t denotes continuous time the
semi-log model depicts an exponential trend with a constant (instanta-
neous) rate of growth given by its slope coefficient.
It is more common, however, to measure time in discrete intervals (say,
a year). In this case, we can best modify the specification as follows:
Y_t = Y_0 (1 + r)^t e^ε    (4.61)
where r is the constant (yearly) growth rate implied by the trend. Taking
logarithms of both sides of the equation yields:
ln Y_t = β1 + β2 t + ε    (4.62)
where
Figure 4.15 Energy consumption versus urban population as percentage of total population
156 Econometrics far developing countries
Figure 4.16 E on U versus log(E) on U
β1 = ln Y_0    β2 = ln(1 + r)    (4.63)
In summary, the slope coefficient obtained by regressing log(Y) against
t gives us an estimate of either the instantaneous rate of growth or, after
taking the anti-logarithm and subtracting 1, the period rate of growth,
depending on whether time is seen as a continuous or a discrete variable.
Note, however, that ln(1 + r) ≈ r for small r, so that the growth rate will
be approximately the same under either interpretation in these circum-
stances.
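A quick numerical check of this near-equivalence, in Python (the simulated series, the 5 per cent growth rate and the sample size are all arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(1, 31)                           # discrete time, e.g. years 1..30
r_true = 0.05                                  # true yearly growth rate
y = 100 * (1 + r_true) ** t * np.exp(rng.normal(0, 0.02, 30))

# Regress log(y) on t: the slope estimates ln(1 + r)
b2, b1 = np.polyfit(t, np.log(y), 1)
r_discrete = np.exp(b2) - 1                    # period (yearly) growth rate
r_instant = b2                                 # instantaneous rate; close for small r
```

For growth rates of a few per cent the two interpretations differ only in the third decimal place.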
The second variant of the semi-logarithmic model only involves the log
transformation of the regressor, as follows:
Y = β1 + β2 ln X + ε    (4.64)
Note that this model does not involve a transformation of the depen-
dent variable and hence the error term enters the equation in its usual
additive fashion. The slope coefficient of the model is given by:

β2 = dY / d ln X = dY / (dX/X)    (4.65)

which, less formally, depicts the absolute change in Y per unit relative
change in X.
Figure 4.18 L on Y versus L on log(Y)
Data analysis and simple regression 159
specification would have been most appropriate to model the data in Table
4.1 on food expenditures as a percentage of total household expenditures
as a function of total household income in Tanzania in 1969.
Exercise 4.8
Do exercise 4.6 again, regressing L (the difference between female and
male life expectancy) on Log(Y) rather than on Y (GNP per capita). Are
there any outliers or (clusters of) influential points left? If so, how do you
suggest tackling the problem?
ADDITIONAL EXERCISES
Exercise 4.9
Use the data in data files INDONA and SRINA to estimate a consump-
tion function for Indonesia and Sri Lanka respectively. Repeat the
estimation excluding the intercept in each case, and comment on your
findings.
Exercise 4.10
Demonstrate algebraically that adding a point (Xn+1, Yn+1) to a sample
of n observations will (a) not influence the slope coefficient of the regres-
sion of Y on X if Xn+1 is equal to the sample mean for X of the n
observations; (b) that the intercept from the regression will probably
change even if Yn+1 is equal to the sample mean for Y of the n obser-
vations; and (c) that if (Xn+1, Yn+1) lies at the point of means of the sample
of observations, then the regression line is unchanged. Generate a numer-
ical example to illustrate your findings.
Exercise 4.11
Use the data in the data file TOT to regress the terms of trade on a
constant and a time trend where the trend is defined as (a) t = 1, 2, ...
39; and (b) t = 1950, 1951 ... 1988. Compare the estimated coefficients.
Derive algebraically the general result which is verified by your terms-of-
trade regression.
Exercise 4.12
Using the data in the data file TOT, regress the terms-of-trade index and
its log on a time trend. How do you interpret these two sets of results,
and which regression model has the more appropriate specification?
Exercise 4.13
Prove that the least squares estimate of the slope coefficient in the simple
regression model is an unbiased estimator. State clearly any assumptions
you make.
Exercise 4.14
Draw a scatter plot of consumption against income using the data in data
file SRINA. Plot on this graph (a) the fitted values of consumption
from the consumption function; and (b) the upper and lower limits of the
confidence interval for the fitted values. Comment on the shape of each
of the curves you have plotted.
Exercise 4.15
Figure 4.19 shows a data set with a clear point of high leverage. Also
shown are the regression line with and without this observation included
in the sample. When the observation is excluded, the regression line seems
to fit nicely through the points. However, for the full sample the regression
line not only misses the point of high leverage but seems also to fit less
well to the other points. However, the R² from the full sample regression
is 0.85, compared to 0.78 when the point of high leverage is excluded.
How can this result be explained and what important lessons can you
draw from this example?
5 Partial regression: interpreting
multiple regression coefficients
5.1 INTRODUCTION
M = A Y^β2 P_f^β3 P_m^β4 e^ε    (5.1)
where ε is the error term subject to the usual set of assumptions of the
classical normal linear regression model.
Taking logarithms of both sides yields:

log M = β1 + β2 log Y + β3 log P_f + β4 log P_m + ε    (5.2)

where β1 = log A.
Economic theory suggests that β2 > 0 and β4 < 0: that is, the income
elasticity is positive while the own-price elasticity is negative. As to the
expected sign of β3, the coefficient of the price of food, Krishnaji (1992:
106) argues that this coefficient will be negative for the following reason:
if, as seems to be the case, food consumption levels are either inade-
quate for survival (as among the poor) or unsatiated (whether in
quantity or quality, as among some above the poverty line), what deter-
mines the allocation process for the majority of the population is not
the total but the 'residual income': that part of income which is left
over after food articles have been bought.
Consequently, if Krishnaji's hypothesis is correct, we expect that β3 < 0:
the rise in the price of food will adversely affect the demand for manu-
factured goods.
Furthermore, unlike Krishnaji, we assume there is no 'money illusion':
that is, if nominal income and both price indices rise proportionally, the
demand for manufactured goods remains unchanged. Hence, our assump-
tion (which we shall not formally verify at this juncture) requires that
the slope coefficients in equation (5.2) add up to zero (see Box 5.1). That
is, we assume that equation (5.2) is homogeneous of degree zero, hence:

β2 + β3 + β4 = 0    (5.3)

or

β4 = -(β2 + β3)    (5.4)

or, alternatively:
(0.36) (0.195)
R2 = 0.09, TSS = 0.4158, RSS = 0.3788, ESS = 0.0037
Simple regression (double-logarithmic)
[Partial regression plot against the relative price of food]
went up, but so did money incomes, the partial regression plot would
remove this covariation between both explanatory variables and only
depict movements in the price variables over and above changes in the
income variable. In the process, nothing is held constant, but neverthe-
less the linear influence of the income variable is removed from the stage
to allow us to look deeper into the patterns of covariation.
Sweeping out
This section has introduced you to a powerful concept, partial regression,
which allows you to look deep into the structure of data so as to bring
to the surface patterns within the data which are not immediately obvious
but indicative of deeper relations among variables. Partial regression uses
residuals of prior regressions along with the raw data. For this reason we used
terms such as 'accounting for the influence of other variables' or 'removing
Partial regression 173
the linear influence of other variables'. Or, equivalently, we say that we
'control for other variables' while looking at the relation between any two
variables. These expressions are quite cumbersome to use. Perhaps you
will agree with us that EDA's more colourful expression of sweeping out
makes the point more vividly (Emerson and Hoaglin, 1985). Hence, when
we return to these important concepts of partial regression, partial corre-
lation and partial regression plot in section 5.3, we shall frequently say
that we look at the relation between Y and X, while sweeping out Z,
meaning that we control for the linear influence of Z on both Y and X.
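The claim that sweeping out reproduces the corresponding multiple regression coefficient (the Frisch-Waugh result) can be verified numerically. A Python sketch with simulated data (the variable names and coefficients are illustrative):

```python
import numpy as np

def resid(v, Z):
    """Residuals from the least squares regression of v on Z (with a constant)."""
    B = np.column_stack([np.ones(len(v)), Z])
    c, *_ = np.linalg.lstsq(B, v, rcond=None)
    return v - B @ c

rng = np.random.default_rng(3)
z = rng.normal(0, 1, 200)
x = 0.6 * z + rng.normal(0, 1, 200)            # x and z covary
y = 1 + 2 * x - 1.5 * z + rng.normal(0, 1, 200)

# Multiple regression of y on x and z
A = np.column_stack([np.ones(200), x, z])
b_multi, *_ = np.linalg.lstsq(A, y, rcond=None)

# Partial regression: sweep z out of both y and x, then regress residual on residual
ey, ex = resid(y, z), resid(x, z)
b_partial = (ex * ey).sum() / (ex ** 2).sum()
```

The slope of the partial regression equals the multiple regression coefficient of x exactly, not just approximately.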
Exercise 5.1
From SOCECON, select the variables Birth (B, the birth rate), the loga-
rithm of GNP per capita (log Y), and Urban (U, the urban as percentage
of total population). Compare the partial regression of B on log Y,
sweeping out U, with the simple regression of B on log Y. Next, look at
the partial regression of B on U, sweeping out log Y, and compare it with
the simple regression of B on U. What do you conclude?
Σ X_2i e_i = 0,    Σ X_3i e_i = 0    (5.14a)
which show that the least squares residuals are uncorrelated with each of
the regressors, a result which is again familiar from simple regression
analysis.
Fourth, the least squares regression is a plane in the three-dimensional
space, though it is often referred to as the 'regression line' by analogy
with the case of simple regression. The resulting regression line which
yields the average relation of Y for given X2 and X3 is written as:

Ŷ = b1 + b2 X2 + b3 X3    (5.15)
Now, equations (5.13a) and (5.14a) imply that the predicted (or fitted) Y
values are uncorrelated with the residuals. This can be shown as follows:
Σ Ŷ_i e_i = b1 Σ e_i + b2 Σ X_2i e_i + b3 Σ X_3i e_i = 0    (5.16)
Hence, the mathematical properties of the least squares regression line
in multiple regression are a simple extension of the properties of the
simple regression line.
b2 = (b_Y2 - b_Y3 b_32) / (1 - b_23 b_32)    (5.18)
Expressions (5.18) and (5.18a) reveal that b2 depends not only on the
slope coefficient of the simple regression of Y on X2, but also on the
slopes of the simple regressions of, respectively, Y on X3 and X2 on X3
(the auxiliary regression). A similar argument can be made for b3. Hence,
in general, simple and multiple regression do not yield the same estimates
of the slope coefficients. There are, however, two exceptions to this
general rule, about both of which it is instructive to know. To see what
these exceptions are, we suggest you attempt exercise 5.2 before reading on.
Exercise 5.2
Using equations (5.18) or (5.18a) and (5.19) or (5.19a), show that the slope
coefficient of X2 will be the same in the simple regression of Y on X2 and
in the multiple regression of Y on X2 and X3, if (a) X2 and X3 are uncor-
related with each other; and (b) b3, the multiple regression coefficient of
X3, equals 0. Generate a data set to illustrate both of these special cases.
The proofs are simple and straightforward. Each case, however, gives
us some interesting insights into the question as to how multiple regres-
sion seeks to disentangle the separate effects of different explanatory
variables on the dependent variable. We discuss each case in turn.
Orthogonality and perfect collinearity of regressors
If X2 and X3 are uncorrelated, b23, b32 and r23 all equal 0 and, hence, the
simple regression coefficient of Y on X2 equals the slope coefficient of X2
in the multiple regression of Y on X2 and X3: that is, b2 = b_Y2. In this case
we say that the regressors X2 and X3 are orthogonal.
Intuitively, this result makes sense. If the explanatory variables in a
multiple regression do not covary linearly with one another, it follows that
multiple regression analysis allows us to distinguish clearly between their
separate effects on the dependent variable. Multiple regression, therefore,
becomes the simple addition of the constituent simple regressions. Sweeping
out is unnecessary because the explanatory variables do not overlap.
The opposite of orthogonality occurs when a perfect linear relation
exists between X2 and X3, that is, X2 = a + bX3, where a and b are non-
zero constants. In this case, r23² = 1: we say that both regressors are
perfectly collinear. From equation (5.19) it follows that b2 will be inde-
terminate because the denominator in the formula will be zero: we can
either regress Y on X2 or Y on X3, but not Y on both X2 and X3 together.
In many regression packages you may get a message such as 'singular' or
'near singular matrix' when attempting to perform a regression. This
message means that some of your regressors have a strong linear rela-
tionship with one another and so, since it involves a division by zero, the
computer cannot complete the calculation.
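The breakdown under perfect collinearity is easy to reproduce. In the Python sketch below (illustrative data), the cross-product matrix X'X behind the normal equations becomes singular, which is exactly what a 'singular matrix' message signals:

```python
import numpy as np

rng = np.random.default_rng(4)
x2 = rng.normal(0, 1, 50)
x3 = 2.0 + 3.0 * x2                            # perfectly collinear with x2
y = 1 + x2 + rng.normal(0, 1, 50)

A = np.column_stack([np.ones(50), x2, x3])     # design matrix with a constant
det = np.linalg.det(A.T @ A)                   # (numerically) zero determinant
rank = np.linalg.matrix_rank(A)                # rank 2 instead of 3
```

Because the rank is deficient, the normal equations have no unique solution: the separate coefficients of x2 and x3 are indeterminate.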
If there exist constants λ1, λ2, . . ., λk, not all zero, such that:

λ1 X_1i + λ2 X_2i + · · · + λk X_ki = 0, for all i    (5.20)
then X1 (i.e. the constant term: X_1i = 1, for all i), X2, X3, . . ., Xk are said
to be perfectly multicollinear because, in this case, any of the Xjs can be
expressed as a linear function of the remaining regressors. As in the three-
variable case, it will then not be possible to solve the system of normal
equations and, hence, multiple regression breaks down.
The question of superfluous variables extends equally to the k-variable
case in multiple regression. If we find that bj = 0, dropping Xj from the
regression model will not affect the other coefficients in the model. But
this does not mean that this variable Xj will yield zero coefficients in
regressions (simple and multiple) which involve subsets of the initial
broader model. In other words, if regressing Y on X, Z and W yields a
zero coefficient for X, it does not necessarily imply that the slope coeffi-
cients for X will also be zero in the simple regression of Y on X, in the
regression of Y on X and Z, or in the regression of Y on X and W.
Orthogonality or perfect collinearity as well as slope coefficients which
yield exact zero values are in fact extreme situations. With respect to
collinearity, as far as regression is concerned, life is simplest when regres-
sors are orthogonal and it is impossible when perfect collinearity prevails.
Most socioeconomic data, however, fall somewhere in between these two
extremes: regressors derived from non-experimental data tend to covary
with one another but not in an exact linear fashion. However, more often
than not the strength of the correlation between regressors tends to be very
high, which makes it difficult to distinguish clearly between their separate
effects on the dependent variable. Moreover, due to the presence of
collinearity, a variable may perform well in either simple or multiple regres-
sions until some other variable takes the stage and renders it superfluous.
Interpreting multiple regressions, therefore, is a complex problem since
the presence of imperfectly collinear regressors blurs the picture when
dealing with non-experimental data. The main lesson we learn from
looking at the algebra of least squares is that the inclusion or deletion of
one or more explanatory variable(s) from a regression will generally not
leave the regression coefficients of the other regressors included in the
model unchanged. In other words, when interpreting the slope coefficient
of a particular regressor, Xj, it usually matters which other regressors are
also included in the model. That is, the estimated coefficient of any
regressor can change quite dramatically across a range of specifications
which differ depending on which other variables are included in or
excluded from the model. We shall return to this issue in section 5.7.
(5.24)
(5.25)
(5.26)
Exercise 5.3
In section 5.2, we studied the effect of the price of faod on the demand
far manufacturing goods in India in the context of a three-variable model
arrived by imposing the assumption of 'no money illusion' on the general
model given by equation (5.2). In this exercise, you are requested to drop
this assumption and to use model specification (5.2) instead to work out
the fallowing questions:
1 Regress log(M) on log(Y), log(P1) and log(Pm).
2 Compute the three partial regressions and their coefficients of partial
correlation.
3 Compare the partial regression plots with the corresponding simple
scatter plots.
4 How would you interpret, respectively, the simple, partial and multiple
coefficients of determination?
5 Show how the multiple R 2 can be derived from the ESSs of a hier-
archy of simple regressions involving the raw data as well as residuals
from prior regressions.
6 Compare the results of the four-variable model with those of the three-
variable model based on the assumption of 'no money illusion'.
7 In your opinion, do you think the assumption of 'no money illusion'
is warranted? Why?
Model assumptions
The classical model is subject to the following usual assumptions:
1 the population regression is adequately represented by a linear func-
tion of the k variables included in the model: E(Yi) = β1 + β2 X_2i +
β3 X_3i + · · · + βk X_ki;
2 the error terms have zero mean: E(εi) = 0;
3 constant variances: V(εi) = σ², for all i;
4 and zero covariances: εi and εj are uncorrelated for i ≠ j;
5 the error term and the Xjs have zero covariances: E(εi X_ji) = 0, for all
i and j;
6 there is no perfect collinearity between regressors: no exact linear rela-
tion exists between the X_jis, j = 1, 2, 3, . . ., k, where X_1i = 1 for all i
(i.e. the constant term).
Apart from the last one, all assumptions are exactly the same as with
simple regression. An added assumption is needed to ensure that least
squares will yield a solution for the coefficients of the model. The clas-
sical normal multiple regression model requires that we add the familiar
normality assumption:
7 the error terms have identical normal distributions: εi ~ N(0, σ²),
i = 1 . . . n.
Statistical properties
Given the assumptions of classical multiple linear regression, the Gauss-
Markov theorem can be extended to the k-variable case to prove that, as
in the simple regression model, the least squares estimators of the βj are
BLUE: best linear unbiased estimators. Furthermore, if the
normality assumption is valid, the least squares estimators will also be ML
(maximum likelihood) estimators and, hence, have the property that they
have minimum variance among all unbiased estimators. As with the sample mean
and the least squares simple regression line, the least squares estimators
are unbeatable if the assumptions of classical normal multiple regression
are satisfied in practice. As before, the normality assumption lays the basis
for statistical inference in the linear regression model.
where
for the constant term, and, similarly, for the slope coefficients, we get:
V(βj) = σ² / (S_jj (1 - R_j²));  j = 2, 3, . . ., k    (5.30)

or

V(βj) = (σ² / S_jj) · VIF_j ;  j = 2, 3, . . ., k    (5.32)

where

VIF_j = 1 / (1 - R_j²);  j = 2, 3, . . ., k    (5.33)

are called the variance inflation factors of the coefficients of the regres-
sion model. The VIF_js measure the degree of multicollinearity among
regressors with reference to the ideal situation where all explanatory vari-
ables are uncorrelated (R_j² = 0 implies VIF_j = 1, for all j = 2, . . ., k). If,
however, R_j² is positive but smaller than 1, VIF_j > 1, approaching infinity
as R_j² approaches 1. If we obtain high values of the R_j²s, however, we
should not conclude that our regression coefficients will also have high
variances. Much depends on other factors, namely, the sampling variances
of the regressors and the error variance. For example, a small error vari-
ance coupled with a relatively high degree of multicollinearity can still
produce acceptable sampling variances of our estimates.
Conversely, the prevalence of large estimated standard errors for the
coefficients of the regression model cannot always be ascribed to the
problem of multicollinearity. It is indeed equally possible that our model
was badly specified and, hence, does not capture the real determinants of
the dependent variable in question. Consequently, our error variance will
tend to be large. Alternatively, our X variables may vary too little to get
meaningful precision in estimation.
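The variance inflation factors of (5.33) can be computed directly from the auxiliary regressions. A Python sketch (the simulated regressors are illustrative; x4 is built to be nearly collinear with x2):

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on the remaining columns (plus a constant)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        e = X[:, j] - others @ b
        r2 = 1 - (e ** 2).sum() / ((X[:, j] - X[:, j].mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(5)
x2 = rng.normal(0, 1, 100)
x3 = rng.normal(0, 1, 100)                     # roughly orthogonal to x2
x4 = x2 + rng.normal(0, 0.1, 100)              # strongly collinear with x2
vifs = vif(np.column_stack([x2, x3, x4]))
```

The near-orthogonal regressor gets a VIF close to 1, while the two collinear ones get VIFs in the tens or hundreds, inflating their coefficient variances accordingly.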
[Scatter plot matrix of GNPc, infant mortality, HDI, the birth rate, log(GNPc), urban population and squared HDI]
[Partial regression plot panels; annotations: (c) coef = -19.32, se = 7.90, t = -2.45; (d) coef = 0.0090, se = 0.0401, t = 0.23]
Figure 5.5 Partial regression plots of regression of birth rate: (a) on log(GNPc); (b) on Infmort^0.5; (c) on HDI²; and (d) on urban
derivation remains the same. Since the least squares estimators are linear
functions of the Yi s, it follows that the predicted Yi s are also linear func-
tions of the Yi s. The hat statistic h_ii is the coefficient of Y_i in the function
which expresses the predicted Y_i as a linear combination of the observed
Yi s. It can be shown that this statistic measures how far the particular
configuration of X values corresponding to data point i lies from
the main body of the X data. Hence, h_ii measures leverage as a result of
the position of the various Xjs in combination. The studentised residuals as
well as the DFBETAs are calculated analogously to the case of simple
regression. Note, however, that in multiple regression we have a set of k
DFBETAs, one for each coefficient in the regression. In a three-variable
case, for example, a particular data point may exert influence on the slope
coefficient of X2 without affecting the slope of X3.
Obviously, a critica! assumption of the classical linear regression model
is that all relevant variables have been included in the model. Much of
our concern in the remainder of this chapter as well as in Chapter 6 is
to come to terms with this assumption. How do we know whether our
model is adequate inasmuch as it includes the main variables? As we shall
see, there is no easy answer to this. At this juncture, suffice it to say that
if our model is reasonably adequate, our error terms should behave as
noise (i.e. a normally distributed random variable with mean zero and a
constant variance ). If, however, the residuals of the estimated model
show signs that something has been left out, we should conclude that our
model is misspecified, although we may be unaware of the exact nature
of the problem. In fact, partial regression is built on this principle inas-
much as it uses the residuals of a regression which include explanatory
variables deemed relevant to check whether another variable adds
anything further in terms of making a significant contribution towards
explaining the variation in Y.
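The principle of sweeping out can be verified numerically. The following sketch (Python with NumPy, simulated data; all names are illustrative, not those of the book's data files) recovers the multiple regression coefficient of X2 from a regression of residuals on residuals:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100
x3 = rng.normal(size=n)
x2 = 0.5 * x3 + rng.normal(size=n)          # X2 and X3 collinear by design
y = 1.0 + 2.0 * x2 - 1.5 * x3 + rng.normal(size=n)

def ols(y, *regressors):
    """Least squares with an intercept; returns (coefficients, residuals)."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, y - X @ b

# Multiple regression of Y on X2 and X3
b_multi, _ = ols(y, x2, x3)

# 'Sweeping out' X3: residuals of Y on X3 and of X2 on X3 ...
_, e_y = ols(y, x3)
_, e_x2 = ols(x2, x3)

# ... and the partial (residual-on-residual) regression
b_partial, _ = ols(e_y, e_x2)

# The partial slope equals the multiple-regression coefficient of X2
print(b_multi[1], b_partial[1])
```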
or

r_Yj^2 = t_j^2 / (t_j^2 + degrees of freedom)    (5.37)

where r_Yj is the partial coefficient of correlation of regressor X_j with the dependent variable Y, and t_j is its calculated t-value under the hypothesis H_0: β_j = 0. Consequently, it is always possible to calculate the partial correlation coefficients corresponding to each regressor from the results of a multiple regression analysis.
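Equation (5.37) can likewise be checked numerically. The sketch below (simulated data; the regression of Y on X2 and X3 is our own illustrative setup) computes the partial correlation by sweeping out and compares it with the value implied by the t-statistic:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 80
x2 = rng.normal(size=n)
x3 = 0.6 * x2 + rng.normal(size=n)
y = 2.0 + 1.0 * x2 + 0.5 * x3 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x2, x3])
k = X.shape[1]
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b

# Standard errors and t-values from the usual OLS formulae
s2 = e @ e / (n - k)
se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
t = b / se

# Partial correlation of Y with X3, sweeping out X2 from both ...
def resid(v, w):
    Z = np.column_stack([np.ones(n), w])
    c, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ c

r_partial = np.corrcoef(resid(y, x2), resid(x3, x2))[0, 1]

# ... agrees with (5.37): r = t / sqrt(t^2 + degrees of freedom)
r_from_t = t[2] / np.sqrt(t[2] ** 2 + (n - k))
print(r_partial, r_from_t)
```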
The implication of (5.36) (or (5.37)) is that the multiple regression line contains all relevant information about the partial regressions for each of its explanatory variables. The partial regression coefficient equals the corresponding slope coefficient of the multiple regression line, and the coefficient of partial correlation can be obtained directly from the calculated t-statistic of the corresponding slope coefficient in the multiple regression. There is no need, therefore, to sweep out the other regressors in order to arrive at a partial regression. However, the concept of sweeping out is important since it teaches us how a multiple regression coefficient is arrived at by controlling for the linear influence of the other regressors
also included in the model. Moreover, to construct a partial regression
plot we need to be familiar with the method of partial regression. Let us
now illustrate the various uses of the t-test in multiple regression. Many
researchers only use the t-test to check the hypothesis whether a partic-
ular regressor has coefficient zero and, hence, should be dropped from
the equation. Obviously, this is an important application of the t-test.
But the t-statistic is more versatile in its use. For this reason, let us illus-
trate varied applications of the t-test using our example of section 5.2
concerning the demand for manufactured consumer goods in India.
Exercise 5.4
Using model specification (5.3) which you estimated in exercise 5.2, test
the following hypotheses:
1 H_0: β_2 = 1;
2 H_0: β_3 = 0;
3 H_0: β_4 = −1.
In each case, specify clearly what you consider to be the relevant alternative
hypothesis. Explain the economic meaning of each of these hypotheses.
log M = β_1 + β_2 log(Y/P_m) + β_3 log(P_f/P_m) + (β_2 + β_3 + β_4) log P_m + ε    (5.45)
which yields a specification which now features income deflated by industrial prices, the relative price of food, and the price of manufactured goods, all in logarithms, as explanatory variables. Importantly, the slope coefficient of the latter variable equals the sum of the coefficients β_2, β_3 and β_4. This enables us to test formally the homogeneity condition based on the assumption of 'no money illusion'.
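The algebra of this reparameterisation can be verified with artificial data. In the sketch below (Python/NumPy; the series are simulated stand-ins, not the Krishnaji data), the slope on log P_m in the reparameterised model equals the sum b2 + b3 + b4 from the original specification:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 60
# Simulated logged series standing in for income and the two price indices
ly, lpf, lpm = rng.normal(size=(3, n))
lm = 0.5 + 0.8 * ly - 0.3 * lpf - 0.4 * lpm + 0.1 * rng.normal(size=n)

def ols(y, *regs):
    X = np.column_stack([np.ones(len(y)), *regs])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

# Original specification: log M on log Y, log Pf, log Pm
b = ols(lm, ly, lpf, lpm)

# Reparameterised specification (5.45): log M on log(Y/Pm), log(Pf/Pm), log Pm
c = ols(lm, ly - lpm, lpf - lpm, lpm)

# The slope on log Pm in the reparameterised model is b2 + b3 + b4,
# so its t-test is a direct test of the homogeneity restriction.
print(c[3], b[1] + b[2] + b[3])
```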
Exercise 5.5
Using specification (5.45), formally test H_0: β_2 + β_3 + β_4 = 0.
If you did the test, you will find that the data reject the null hypothesis. This
result may come as a surprise since the assumption that no money illusion
prevails appears quite reasonable in the light of demand theory. But, as
explained by Krishnaji (1992), our data are in fact highly aggregated time
series. As such, serious problems may emerge. For example, the effect of
rising average income per capita on the demand for manufactured goods
depends on how the increase in average income is distributed across house-
holds. If, as Krishnaji argues, rising average income goes hand in hand with
a worsening income distribution, the growth of the demand for manufac-
tured goods may conceivably be depressed accordingly. Similarly, a rise in
the price index does not mean that all prices go up at the same rate.
Differential price rises can have different consequences for incomes
and demand patterns. With aggregate data, therefore, the homogeneity
assumption is not as straightforward as it appears at first.
In exercise 5.4, you tested the hypothesis H_0: β_2 = 1 in model specification (5.2) and found that the data reject the null hypothesis that the income variable has a unitary elasticity. In other words, the demand for manufactured goods grows less than proportionally with income. Krishnaji's
explanation for this low income elasticity is that the worsening distribution
of income limits the expansion of the home market for manufactured goods.
But in fact Krishnaji did not use a double-log specification to estimate
Partial regression 197
the demand curve. Instead, in his specification he tried explicitly to take
account of the dampening effect of rising incomes with worsening distrib-
ution on the demand for manufactured goods. He modelled the demand
for manufactured goods as a linear function of the price variables, P_f and P_m, and a quadratic function of income.
Exercise 5.6
Using model specification (5.48):
1 Estimate the regression coefficients of the model.
2 Estimate the partial regression of M on Y^2.
3 Construct the partial regression plot of M on Y^2.
4 Estimate the partial regression of M on Y.
5 Construct the partial regression plot of M on Y.
6 Formally test the hypothesis H_0: β_5 = 0.
7 Formally test the hypothesis H_0: β_3 = 0.
8 Explain the economic significance of each of these hypotheses.
9 What does each partial regression plot (respectively, M on Y and M on Y^2) tell you?
Figure 5.6 Variation in regression coefficients across specifications
Note: From left to right: FP; log(GNPc); FL; and CM
more fragile with respect to alternative specifications, occasionally even
producing the wrong sign. The coefficients of child mortality and female
literacy vary significantly across specifications without, however, producing
the wrong sign in any of them. Obviously, there is a fair amount of multi-
collinearity among regressors. To see this, take a good look at the last
column in Table 5.1. None of the multiple regressions has an R² which is anywhere near the sum of the R²s of the simple regressions featuring their corresponding regressors. The table also reveals some cases of the presence of superfluous variables. In specification 15, for example, the coefficient of child mortality dwindled almost to zero and, hence, dropping this variable from the regression hardly affects the regression
coefficients of the other regressors, as can be seen from specification 11.
Similarly, in specification 8, the coefficient of the income variable is excep-
tionally close to zero; dropping it from the regression hardly affects
the regression coefficient of the other regressor, FL, as can be seen from
specification 3.
regressions which include the family planning variable. Table 5.3 gives us
the conditional bounds of the coefficients of FL and of CM in regressions
which include the family planning variable. As you can see, the bounds
have now become considerably closer. Undoubtedly, the female literacy
variable is less fragile in the sense that its point estimate is much less sensitive to the range of specifications under consideration. The income variable retains a relatively large fluctuation in its coefficient, but the bound is also narrowed and all values have the 'right sign'.
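The bounds reported in tables such as Table 5.3 can be computed mechanically. The following sketch (simulated data and illustrative variable names, not the book's data set) runs every specification that includes a focus variable and records the range of its estimated coefficient:

```python
import itertools
import numpy as np

rng = np.random.default_rng(4)
n = 50
# Correlated candidate regressors; x_focus is the variable of interest
common = rng.normal(size=n)
regressors = {name: common + rng.normal(size=n) for name in ["z1", "z2", "z3"]}
x_focus = common + rng.normal(size=n)
y = 1.0 + 0.7 * x_focus + 0.5 * regressors["z1"] + rng.normal(size=n)

def slope_of_focus(extra_names):
    cols = [np.ones(n), x_focus] + [regressors[m] for m in extra_names]
    X = np.column_stack(cols)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b[1]   # coefficient of the focus variable

# Run every specification that includes the focus variable and record
# the bounds of its estimated coefficient across specifications.
names = list(regressors)
slopes = [slope_of_focus(combo)
          for r in range(len(names) + 1)
          for combo in itertools.combinations(names, r)]
print(min(slopes), max(slopes))   # the bounds across all 8 specifications
```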
Lessons
This example shows that sensitivity analysis can be a great help in guarding against making fragile inferences. Looking carefully at the behaviour of regression coefficients across alternative specifications often gives us useful insights. When one or more coefficients prove to be highly robust with respect to alternative specifications, we may feel more confident about
the inferences we make about the influence exerted by such variables. But
this does not mean that we should look at other variables with suspicion.
Indeed, quite often one or another coefficient of a regressor only becomes
reasonably stable once one or more other regressors are already included
in the model. In fact, in any regression there are usually some variables we
feel confident should always feature in the regression model. However, it
is always important to check how stable such coefficients are once we
include or exclude other variables which we consider to be doubtful, superfluous or of minor importance. Finally, our example also taught us that some of the regressors we take into consideration when searching for an
appropriate specification may in fact be different proxy variables of a
deeper variable which may be hard to measure directly. The joint presence
of such related proxy variables often renders their coefficients unstable or
insignificant, or both. Sensitivity analysis may help to spot such problems
and help us to select the proxy which seems most appropriate.
Exercise 5.7
Model specification (5.2) is, in fact, a restricted version of a more elaborate model which includes, apart from the income variable, Y, and the price variables, P_f (the price of cereals) and P_m, two more price variables: namely, P_of, a price index of other food products, and P_s, a price index of consumer services. These data are in the data file KRISHNAJI.
Including the last two variables into the double-log model specification
yields a six-variable regression.
1 Construct a table with the results of all possible regressions which at least include the income variable (why?).
2 Construct comparative box plots of the variation in the slope coeffi-
cient of each regressor in the model.
3 Judging from your table, check whether there is much evidence of
multicollinearity.
4 Check whether any variables in any of the specifications appear superfluous.
5 How robust is the income elasticity across alternative specifications?
6 In your opinion, which price variables appear to be most relevant in
the model?
5.8 SUMMARY OF MAIN POINTS
1 Multiple regression is nothing more than a hierarchy of simple regres-
sions involving regressions between the dependent variable and each
of the regressors, auxiliary regressions between regressors, and regressions featuring as their variables the residuals of prior regressions.
2 The regression coefficient of an explanatory variable in a multiple
regression equals that of the partial regression arrived at by sweeping
out the linear influence of the other regressors included in the model
from both the dependent variable and the explanatory variable in ques-
tion. This procedure yields a partial regression which allows us to
control for the influence of other variables while investigating the rela-
tion between the dependent variable and an explanatory variable in a
context where other things are not equal.
3 The coefficient of partial correlation is the square root of the coeffi-
cient of determination of the partial regression. It is the coefficient of
correlation between the dependent variable and the added regressor
after sweeping out the influence of the other regressors in the model.
4 A partial regression plot (or added-variable plot) is a scatter plot
between two sets of residuals obtained by removing the linear influ-
ence of the other regressors from both the dependent variable and the
added regressor. A partial regression plot allows us to look at a
multiple regression coefficient by means of a two-dimensional scatter
plot. It is a powerful diagnostic tool to detect deviations from model
assumptions.
5 The extension of the least squares principle to multiple regression is quite straightforward and yields mathematical properties which are
similar to those of simple regression. The main difference is that there
is now more than one explanatory variable, and with non-experimental
data, these variables often display a fair amount of multicollinearity.
Perfect collinearity implies that an exact linear relation exists between
the regressors (including the constant term), in which case linear
regression breaks down. Orthogonality of regressors implies that they
are uncorrelated with one another, a situation which is ideal for regres-
sion but seldom satisfied in practice.
6 Due to the presence of collinearity, the slope coefficient of a given
regressor in relation to the dependent variable depends on the other
regressors included in the model. Consequently, the slope coefficient
varies with model specification as a result of the inclusion or exclusion of other regressors. Only if all regressors are orthogonal to each other will simple regression yield the same slope coefficients as
multiple regressions.
7 A superfluous variable in a regression is one which adds nothing to the explained variation once the effects of other regressors have been taken into account. Strictly speaking, its slope coefficient in the multiple
regression will be zero, but it may well be non-zero in subset regressions drawn from this broader model. Dropping a superfluous variable
from the regression does not alter the slope coefficients of the other
regressors.
8 Given the usual assumptions of classical linear regression jointly with
the added assumption that regressors are not exactly collinear, the
least squares estimators turn out to be BLUE. If we add the normality assumption, the least squares estimators are also ML estimators and, therefore, have minimum variance among all unbiased estimators. Given these
assumptions, the least square line is unbeatable as an estimator of the
population regression.
9 The t-statistic allows us not only to test hypotheses concerning indi-
vidual values of the coefficients, but also hypotheses which involve
linear combinations of coefficients provided we can suitably repara-
meterise the model.
10 It is a useful exercise to investigate the sensitivity of regression coeffi-
cients across plausible neighbouring specifications to check the fragility
of the inferences we make on the basis of any one specification with
respect to specification uncertainty as to which variables to include.
11 An important conclusion of this chapter is that we should not too readily be led to assume that a slope coefficient in a multiple regression
measures the marginal impact of its regressor on the dependent vari-
able, other things being equal. With non-experimental data, other
things are never equal and hence our estimation and hypothesis testing
always takes place in a context where the covariation between regressors is part of the picture. Regression only allows us to remove the linear influence of other regressors when looking at the bivariate relation between Y and a particular regressor, but this is by no means the
same as holding the other regressors constant. Whether or not such
an inference can be made requires careful reflection, not a leap of faith
or blind trust.
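Point 6 of the summary can be illustrated with artificial data: when a second regressor is constructed to be exactly orthogonal to the first (and demeaned), the simple and multiple slope coefficients coincide. A minimal sketch in Python/NumPy, with simulated data:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40
x2 = rng.normal(size=n)

# Construct x3 exactly orthogonal to x2 and to the constant, i.e.
# uncorrelated with x2, by taking residuals from a regression on x2.
raw = rng.normal(size=n)
Z = np.column_stack([np.ones(n), x2])
c, *_ = np.linalg.lstsq(Z, raw, rcond=None)
x3 = raw - Z @ c

y = 1.0 + 2.0 * x2 - 1.0 * x3 + rng.normal(size=n)

def slopes(y, *regs):
    X = np.column_stack([np.ones(len(y)), *regs])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

# With orthogonal regressors, simple and multiple slopes on x2 agree
b_simple = slopes(y, x2)[1]
b_multi = slopes(y, x2, x3)[1]
print(b_simple, b_multi)
```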
6 Model selection and misspecification
in multiple regression
6.1 INTRODUCTION
In the last chapter we saw that any particular slope coefficient of a regres-
sor generally depends on the other regressors included in the model. Hence,
even if we are interested in the impact of only one of the regressors, it is
important that we include all relevant variables in the regression equation.
In this chapter we take this point further and show that if we omit a rele-
vant variable from the model, least squares will no longer give us unbiased
estimators of the coefficients of the population regression. The problem
is misspecification due to the omitted variable bias. That is, unless all
relevant variables are included in the regression equation, then none of the
estimated parameters will be unbiased (except in a special case shown
below). However, this result should not lead to a strategy of including every conceivable variable one can think of in a regression since, with collinear data, regression results are likely to yield foggy messages due to inflated standard errors of the coefficients. Variable selection in model specification
is, therefore, a challenging task in applied research and has serious conse-
quences for the validity of the inferences we make from our data.
This chapter deals with hypothesis testing in the context of model selection and hence introduces you to some of the basic principles of general-to-specific modelling. In section 6.2, we begin this chapter with an example of
misspecification: Griffin's well-known argument that aid displaces savings.
We show how, according to his own theory, Griffin's equation was mis-
specified. Next, in section 6.3, we examine the theory behind omitted vari-
able bias as well as the implications of omitting relevant variables or adding
irrelevant variables for the standard errors of regression coefficients, and
relate this discussion to our examination of Griffin's model. Excluding vari-
ables from a model is a form of restriction being imposed on a more gen-
eral model which includes these variables. This point is explained in section
6.4, where the use of F- and t-tests in specification searches is explored.
The F-test can be used to test any linear restrictions we place on a
model: not just testing for the exclusion of certain variables (zero restrictions), but also testing for particular linear relations between regression
Model selection and misspecification 209
coefficients (non-zero restrictions) and for pooling data from different
samples (including different time periods). These issues are discussed in
sections 6.5 and 6.6. Even if it is not valid to pool the data in a partic-
ular sample, it may still be possible to estimate a single equation by the
use of dummy variables to allow some variation in coefficients between sub-samples; this application is explored in section 6.7. Section 6.8
summarises the main points from this chapter.
S/Y = β_1 + β_2 (A/Y) + ε    (6.4)

Ŝ/Y = 19.14 − 0.87 (A/Y)    R² = 0.33    (6.5)
      (2.06)   (0.16)        RSS = 10,358
The coefficient of -0.87, which is not very different from Griffin's results,
is significant at the 1 per cent level and appears to confirm the argument
that aid will displace savings. But we saw above that the estimated model
should not be equation (6.4) at all, since to get it Griffin ignored the intercept term in equation (6.3). As the estimated equation is divided through by income, the 'true model' should also include the reciprocal of income on the right-hand side, that is:

S/Y = β_1 + β_2 (A/Y) + β_3 (1/Y) + ε    (6.6)
Estimation of equation (6.6) yields:
Ŝ/Y = 20.90 − 0.40 (A/Y) − 17,375 (1/Y)    R² = 0.57    (6.7)
      (1.69)   (0.15)       (2,923)         RSS = 6,637
The magnitude of the negative relationship between aid and savings is
halved by estimating the correct equation derived from Griffin's model,
vividly illustrating the point made in the last chapter that the value of an
estimated coefficient depends on the other regressors included in the
model. Before we go on to look at the theory behind this omitted variable bias, it is useful to pause and consider what we have done above.
Exercise 6.1
Use the data in the file MALTA (Maltese exports demand and supply)
to estimate the following demand equation for exports, X:
ln(X_t) = β_1 + β_2 ln(WY_t) + ε_t
b_Y2 = Σ_i (X_2i − X̄_2) Y_i / Σ_i (X_2i − X̄_2)²    (6.10)

If we work out the product in the numerator, we shall see that the first term disappears, since the sum of deviations from the mean of X_2 equals zero. The second term will reduce to β_2, as its numerator and denominator are equal.
Taking expectations of both sides of equation (6.11) then yields:

E(b_2) = β_2 + β_3 [Σ_i (X_2i − X̄_2) X_3i / Σ_i (X_2i − X̄_2)²]    (6.12)
since the X-variables are given and, hence, non-stochastic, and E(ε_i) = 0. Equation (6.12) shows that b_Y2 yields a biased estimator of β_2. This bias is the product of two terms. The first term, β_3, is the population slope coefficient which measures the impact of the wrongly omitted variable on the regressand. The second term is the estimate, c_2, of the slope coefficient of the following auxiliary regression:

X_3i = γ_1 + γ_2 X_2i + v_i    (6.13)

Hence, equation (6.12) can also be rewritten as:

E(b_2) = β_2 + β_3 c_2    (6.14)
Equation (6.14) allows us to determine the direction of the bias. If
the relationship between the omitted variable and the regressand (X3 and
Y) and the correlation between the omitted and included variables
(X3 and X 2) have the same sign (i.e. both are positive or both negative),
then their product is positive and there is an upward bias. If the two
expressions have different signs, their product is negative and there is a
downward bias.
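A small Monte Carlo sketch (simulated data; the parameter values are our own, not drawn from the text) makes equation (6.14) concrete: the short-regression slope averages out to β_2 + β_3·c_2.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200
beta2, beta3 = 1.0, 2.0

# Fixed X's across replications (the classical non-stochastic regressors)
x2 = rng.normal(size=n)
x3 = 0.8 * x2 + rng.normal(size=n)    # omitted variable, correlated with x2

# c2: slope of the auxiliary regression of x3 on x2
d2 = x2 - x2.mean()
c2 = d2 @ x3 / (d2 @ d2)

# Average the short-regression slope b2 over many replications of the error
b2_draws = []
for _ in range(2000):
    y = beta2 * x2 + beta3 * x3 + rng.normal(size=n)
    b2_draws.append(d2 @ y / (d2 @ d2))

# The Monte Carlo mean is close to beta2 + beta3 * c2: the bias is beta3 * c2
print(np.mean(b2_draws), beta2 + beta3 * c2)
```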
Exercise 6.3
This exercise requires that you generate a set of artificial data to check
on the implications of adding irrelevant variables to a model. Start with
the assumption that the correct specification involves a simple regression
model: Y= 10 + 0.6 X 2 + E. Set your sample size equal to 30, and generate,
respectively, X 2 as a normally distributed variable with mean 10 and
variance 25, and E as a normally distributed error term with mean O
and variance 16. The Y values are then obtained from the postulated
regression model. N ext, genera te two more variables, X 3 and X 4 , which,
by construction, bear no relation to Y whatsoever. To do this, generate
X 3 as a normally distributed random variable with mean 5 and variance
16, while X 4 has mean 15 and variance 36. Given these artificial data,
regress (a) Y on X 2 , and (b) Y on X 2 , X 3 and X 4 • You know that,
by design, the latter regression carries a lot of extra baggage due to the
inclusion of two irrelevant variables. Carefully check how the introduc-
tion of irrelevant variables affects the standard errors and coefficient
estimates of the regression coefficient of the relevant variable. What do
you conclude about the effect of including irrelevant variables on the
precision of the estimates of the relevant variable?
Conclusion
This section discussed the implications of omitting relevant variables or
adding irrelevant ones in terms of bias and precision of regression results.
The resulting message is quite complicated. On the one hand, it is impor-
tant that we take seriously the implications of omitting a relevant variable
because it can lead to misleading conclusions as a result of the bias it introduces. But, on the other hand, with collinear variables, adding regressors to a model has a cost in terms of the precision of our estimates due to the variance inflation factor of the collinear regressor. If a regressor matters a
great deal, the gain obtained by adding it far outweighs the cost of omit-
ting it from the regression. The question of including regressors which are
relevant but less vital in terms of their effect on the dependent variable is
more tricky: the gain in the reduction in bias needs to be balanced against
the loss in precision, particularly when the sample size is relatively small.
The presence of rival proxy variables often amplifies the problem.
Irrelevant variables merely blur the picture, but we do not always know
which variables are irrelevant. The task of variable selection, therefore, is
quite daunting. Let us now see how hypothesis testing can help once we
have decided upon a set of variables to include in our general model.
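The bias-precision trade-off can be made concrete with artificial data. The sketch below (illustrative names and parameter values of our own) compares the standard error of a coefficient when an added irrelevant regressor is unrelated to it and when it is highly collinear, together with the variance inflation factor 1/(1 − r²):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
x2 = rng.normal(size=n)
x3_ortho = rng.normal(size=n)                       # unrelated to x2
x3_collin = 0.95 * x2 + 0.3 * rng.normal(size=n)    # highly collinear

def se_of_x2(x3):
    # x3 is irrelevant by construction: y depends on x2 only
    y = 1.0 + 0.6 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x2, x3])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    s2 = e @ e / (n - X.shape[1])
    return np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

def vif(x3):
    # Variance inflation factor from the auxiliary correlation of x2 and x3
    r = np.corrcoef(x2, x3)[0, 1]
    return 1.0 / (1.0 - r ** 2)

print(vif(x3_ortho), se_of_x2(x3_ortho))      # VIF near 1
print(vif(x3_collin), se_of_x2(x3_collin))    # VIF large, inflated s.e.
```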
The F-test
The F-test is a powerful tool in specification testing since it enables us to
test a whole range of linear restrictions. Its versatility, compared with the t-test, stems from the fact that it does not rely on the standard errors of individual coefficients, but operates on the residual sums of squares of the regression as a whole. In other words, the F-test checks whether the
imposition of a linear restriction on a model significantly increases its
residual sums of squares. To do this, the F-statistic takes account of the
degrees of freedom of, respectively, the unrestricted (i.e. without imposing
the restriction) and restricted versions of the model. The general formula
for the F-test is as follows:
F(m, n−k_U) = [(RSS_R − RSS_U) / RSS_U] × [(n − k_U) / m]    (6.21)

where RSS_U and RSS_R are the residual sums of squares of, respectively, the unrestricted and the restricted model estimations; n = sample size; k_U = number of estimated coefficients in the unrestricted model; m = number of linear restrictions imposed on the model.
The different parts of this formula need some explanation. Note that if
we add one or more regressors to an equation, the RSS cannot possibly
increase. If the added variable has any explanatory power at all, it will
reduce the RSS. Consequently, dropping one or more variables from a
model will increase the RSS (unless the estimated slope coefficients of the deleted variables are exactly equal to zero). More generally, the imposition
of a linear restriction on the model can never reduce the RSS when the
restricted model is estimated. The numerator of the first term in the formula
shows the difference between these two residual sums of squares, and divid-
ing this by RSSu gives the proportional increase in the RSS from imposing
the restriction. How big does an increase have to be to make it significant?
F = (W/v_1) / (Z/v_2)

follows an F-distribution with v_1 and v_2 degrees of freedom (of, respectively, the numerator and the denominator).
Exercise 6.3
If the restrictions we impose on a model concern dropping one or more
explanatory variables, both the restricted and unrestricted versions of the
model will feature the same dependent variable. Now, given that R² = ESS/TSS = 1 − RSS/TSS, show how the F-test in equation (6.21) can be re-expressed using R²s rather than RSSs of the restricted and unrestricted regressions. To do this, express RSS as a function of TSS and the R², and substitute this solution into equation (6.21).
If you completed exercise 6.3, you will have found that the F-test can
also be formulated as:
F(m, n−k_U) = [(R_U² − R_R²) / (1 − R_U²)] × [(n − k_U) / m]    (6.23)

Note that in the R² version of the F-test, it is the statistic of the unrestricted model which appears first in the numerator. This is what we would expect, since R_U² will be greater than R_R². Note that expression (6.21) is generally applicable, whatever the form of linear restrictions we impose on the model, while (6.23) can only be used if both the restricted and unrestricted versions of the model feature the same dependent variable.
Dropping a regressor from the model: the F-test and t-test compared
But let us now return to our example in section 6.2 and test formally
whether the variable 1/Y can be dropped from the model. We already have
all the information we require to apply the F-test to Griffin's restriction
imposed on the 'correct' savings model. Griffin imposed the restriction
that β_3 = 0, and, hence, k_U = 3 and k_R = 2, so that m = 1. The sample size, n, is 66 and we have RSS_R = 10,358 and RSS_U = 6,637. The restricted RSS
is indeed considerably greater than the unrestricted. But is the difference
significant? We test this with the F-test:
F(1, 63) = [(10,358 − 6,637) / 6,637] × [(66 − 3) / 1] = 35.3
This calculated figure compares with a critical value of just under 4.0 at the 5 per cent level. Since the calculated value is greater than the critical value, we can reject the null that β_3 = 0 in favour of the alternative hypothesis that it is non-zero: Griffin's restriction is invalid. We leave it
to you to verify that the R² version of the F-test will yield the same result.
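The arithmetic of both versions of the test can be replicated directly from the figures reported in equations (6.5) and (6.7); the short sketch below also computes the R² version, which exercise 6.3 asks you to derive:

```python
# Griffin example: F-test of dropping 1/Y, using the RSS and R-squared
# figures reported in equations (6.5) and (6.7).
rss_r, rss_u = 10358.0, 6637.0
r2_r, r2_u = 0.33, 0.57
n, k_u, m = 66, 3, 1

f_rss = (rss_r - rss_u) / rss_u * (n - k_u) / m
f_r2 = (r2_u - r2_r) / (1.0 - r2_u) * (n - k_u) / m

print(round(f_rss, 1))   # 35.3, as in the text
print(round(f_r2, 1))    # the same up to rounding of the reported R-squareds
```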
You may think that all this is much ado about nothing. If we wish to
test the null hypothesis that β_3 = 0, why not just look at the t-statistic?
As can be calculated from the results in equation (6.7), t = 5.92, and,
hence, we reject the null hypothesis at the 5 per cent significance level.
In this case, the t-test is indeed easier, but it is only valid as a test of a
single zero restriction. If we want to impose two or more zero restric-
tions, we can no longer rely on separate t-tests, but we must use an F-test.
In other words, we cannot test the hypothesis that two or more coefficients in a multiple regression are jointly zero by looking at their respective
t-statistics; we must do a joint test with the F-test. The reason that
combining individual t-tests is insufficient to test whether we can drop two
or more variables from a model is because, in general, the sampling distri-
butions of these coefficients are not independent of one another: in the
three-variable case, for example, as shown in equation (5.29), the covariance of the least squares slope coefficients is generally not zero, unless
the corresponding regressors are orthogonal. The knowledge that one
coefficient is zero, therefore, will generally affect the probability of the
other being zero as well, and hence a joint test is required.
If our restriction involves dropping only one variable, it does not matter
whether we use the t- or F-tests: both tests are equivalent. In fact, it can
be shown that t² = F. In our example, t² = 5.92² = 35.1, which, allowing for rounding errors, is acceptably close to the calculated F-statistic of 35.3.
But, to repeat, if we wish to restrict more than one variable, we must
apply the F-test.
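That t² = F for a single zero restriction can also be confirmed with artificial data; a minimal sketch (simulated series and illustrative parameter values of our own):

```python
import numpy as np

rng = np.random.default_rng(8)
n = 66
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
y = 1.0 + 0.5 * x2 + 0.8 * x3 + rng.normal(size=n)

def fit(y, X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return b, e @ e

# Unrestricted model and the model with x3 dropped (restriction beta3 = 0)
Xu = np.column_stack([np.ones(n), x2, x3])
Xr = np.column_stack([np.ones(n), x2])
bu, rss_u = fit(y, Xu)
_, rss_r = fit(y, Xr)

# F-test of the single restriction ...
k_u, m = 3, 1
F = (rss_r - rss_u) / rss_u * (n - k_u) / m

# ... equals the square of the t-statistic on x3 in the unrestricted model
s2 = rss_u / (n - k_u)
t3 = bu[2] / np.sqrt(s2 * np.linalg.inv(Xu.T @ Xu)[2, 2])
print(F, t3 ** 2)
```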
[Figure: scatter plot of private investment ('000) against public investment ('000)]
which compares with a critical value of 3.63 at the 5 per cent level.
Therefore, we must reject the null hypothesis that the simple regression
of private investment on public investment is a valid model of the deter-
minants of private investment. We conclude that there appears to be
neither crowding in nor crowding out in the Sri Lankan case. We shall
return to this example, however, in section 6.7.
At this juncture, we should stress an important point. In both examples
discussed above, Griffin's aid-savings relation and the crowding-in hypoth-
esis, we started with a simpler model and then showed how the results
changed dramatically when additional variables were introduced to the
regression. This order of presentation was a pedagogical device to illustrate the importance of not jumping to conclusions on the basis of
oversimplified regressions. If we had been presenting these regressions as
results from research there would have been no call to report the simple
regression results as they are statistically meaningless (though we may
choose to report that we tested whether the simpler model were a valid
restriction of the data and found it not to be so).
Exercise 6.4
With the results of exercise 6.1, formally test whether (a) both explana-
tory variables and (b) the price variable only can be dropped from the
three-variable model. In the latter case, use both the t- and the F-tests
and check the relation between them.
Figure 6.3 Partial regression: S/Y on 1/Y
Exercise 6.5
Rework the example of Griffin's hypothesis and systematically check the
assumptions of the model using diagnostic graphs, normality checks,
studentised residuals and DFBETAs.
If you tried out the studentised residuals and DFBETAs, you will have seen that they also confirm the presence of outliers and influence in the regression: the studentised residuals for data points 15 and 26 are, respectively, 2.3 and −6.73, and data point 26 gives a DFBETA value of −2.97 for the slope coefficient of 1/Y. These statistics, therefore, also confirm that outliers and influential points are prevalent and are likely to distort the
inferences we make.
An important lesson we learn from this example is that we should never forget that the presence of serious outliers and influential points is a sign of misspecification. It tells us that the error term of the model still contains a lot of meaningful information which we have not as yet grasped. Jumping
ahead into hypothesis testing without carefully checking the assumptions
of the model is a poor strategy in data analysis. Superficially, it may look
good on paper. But many of these inferences may prove worthless if you
Model selection and misspecification 229
care to check your assumptions. Befare drawing conclusions from a model
make sure its foundations are sound so that you can assert them with
sorne confidence.
In this section we have seen how to use an F-test as a means of testing
zero restrictions, so we can test a specific model against a more general
one. But the example we have just seen shows that you must also check
the specification of the general model. If the 'general model' still has
omitted variables then this problem will quite possibly show up in the
residuals - possibly as influential points as in the example just discussed.
But omitted variables can also produce problems of heteroscedasticity
(see Chapter 7) and serial correlation (see Chapter 11). Hence a key
message of this text is borne out here: always look at the residuals for
clues as to possible misspecification.
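The mechanics of the F-test for zero restrictions are simple enough to sketch in a few lines. The Python fragment below (with simulated data, not one of the book's data files) computes the statistic from the restricted and unrestricted residual sums of squares:

```python
import numpy as np
from scipy import stats

def ols_rss(X, y):
    """Residual sum of squares from an OLS fit."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return float(e @ e)

def f_test(rss_r, rss_u, q, df_u):
    """F-statistic and p-value for q zero restrictions."""
    F = ((rss_r - rss_u) / q) / (rss_u / df_u)
    return F, float(stats.f.sf(F, q, df_u))

rng = np.random.default_rng(1)
n = 50
x1, x2 = rng.normal(size=(2, n))
y = 1.0 + 0.5 * x1 + 2.0 * x2 + rng.normal(size=n)
X_u = np.column_stack([np.ones(n), x1, x2])  # unrestricted model
X_r = np.column_stack([np.ones(n), x1])      # restriction: coefficient on x2 is zero
F, p = f_test(ols_rss(X_r, y), ols_rss(X_u, y), q=1, df_u=n - 3)
```

Because x2 genuinely belongs in the model here, the restriction is rejected; dropping a relevant variable shows up as a large jump in the residual sum of squares.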
Exercise 6.6
The data file PRODFUN contains data from a developing country business survey covering two manufacturing sectors. Estimate specifications (6.29) and (6.31) using the data in this file, and formally test the hypothesis of constant returns to scale.
Exercise 6.7
Using the data set KRISHNAJI (food price and manufacturing demand in India), estimate equations (5.5) and (5.41), which differ inasmuch as the latter imposes the restriction of a unitary income elasticity on the demand for manufacturing goods: that is, β2 = 1, which is a non-zero linear restriction. Note that the unrestricted (equation 5.5) and restricted (equation 5.41) versions of the model do not feature the same dependent variable.
1 Use (6.31) to calculate the F-statistic to test the restriction of a unitary income elasticity.
2 Check what would happen if you had calculated the F-statistic with
formula (6.23).
Another common example in economics which involves non-zero restric-
tions is the assumption of no money illusion in demand equations (i.e.
demand depends on real, not nominal, income). We came across such a case
in the previous chapter when discussing the demand for manufacturing
goods in India. A further example involves testing for parameter stability
across different samples. This is the issue we shall turn to in the next section.
Exercise 6.8
Khan and Reinhart (1990) argue that the productivities of public and private capital are different, the latter being more productive. To test this they first regressed growth (y) on aggregate investment (I), growth of the labour force (L), growth of exports (X) and a constant. They then repeated the regression with investment disaggregated into public (Ig) and private (Ip). Using cross-section data for 24 countries they obtained the following results:
y = 1.085 + 0.119 I + 0.427 L + 0.212 X     R2 = 0.660
    (0.81)  (2.36)    (1.33)    (4.97)
Pooling data
In Chapter 4, we discussed the relation between life expectancy and per capita GNP using the data file SOCECON. Our model, therefore, is as follows:
LEi = β1 + β2 log(Yi) + εi     (6.32)
This model assumes, however, that the values of β1 and β2 are the same
232 Econometrics for developing countries
[Figure 6.4 Scatter plot of life expectancy against log(GNP per capita)]
for all observations. Figure 6.4, which depicts the scatter plot of the data
using all available countries, shows there are actually two sets of rela-
tionships: the scatter is steeper on the left-hand side (left of the vertical line indicating incomes below $2,500), and flatter for countries with higher
incomes. This result is not very surprising, since there is a physiological
limit on life expectancy.
The result of these underlying data patterns is that if we fit the regres-
sion line to all observations, it does not capture the relationship at all
well, as is clear from Figure 6.4. The resulting regression line is as follows:
LEi = 20.96 + 5.80 log(Yi)     R2 = 0.75     (6.33)
      (2.41)  (0.32)           RSS = 3125.8
Looking at the scatter suggests that the relationship between income
and life expectancy is different for high and low income countries. That
is, we should have two separate regressions:
For Yi ≤ $2,500:  LEi = α1 + α2 log(Yi) + ε1i     (6.34)
This calculated value is much greater than the critical value of a little over 3.07 at the 5 per cent level, so we reject the null hypothesis that it is valid to pool our data.
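The Chow test implicit in this calculation can be computed directly from the residual sums of squares, as F = [(RSSP − RSS1 − RSS2)/k] / [(RSS1 + RSS2)/(n1 + n2 − 2k)]. The sketch below (Python; the two-regime data are simulated to mimic the steep/flat pattern, not taken from the SOCECON file):

```python
import numpy as np
from scipy import stats

def rss(x, y):
    """RSS from a simple regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return float(e @ e)

def chow(rss_pooled, rss_1, rss_2, n1, n2, k):
    """Chow test for pooling two sub-samples, k coefficients per regression."""
    return ((rss_pooled - rss_1 - rss_2) / k) / ((rss_1 + rss_2) / (n1 + n2 - 2 * k))

rng = np.random.default_rng(2)
n1 = n2 = 40
x1 = rng.uniform(4.0, 7.8, n1)    # 'low-income' range of log(GNP per capita)
x2 = rng.uniform(7.9, 11.0, n2)   # 'high-income' range
y1 = 10.0 + 6.0 * x1 + rng.normal(scale=2.0, size=n1)   # steep segment
y2 = 55.0 + 1.5 * x2 + rng.normal(scale=2.0, size=n2)   # flat segment
x, y = np.concatenate([x1, x2]), np.concatenate([y1, y2])

F = chow(rss(x, y), rss(x1, y1), rss(x2, y2), n1, n2, k=2)
crit = stats.f.ppf(0.95, 2, n1 + n2 - 4)   # about 3.1, as in the text
```

With a genuine break built into the simulated data, the computed F far exceeds the critical value and pooling is rejected.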
Testing for the validity of pooling is often not done in published research. Yet a glance at the scatter plot in this case shows that it is improbable that the data can be pooled for a linear regression. This is thus a further example where preliminary data analysis - such as looking at the scatter plots - can yield useful information about specification (and warnings about possible misspecification) of your model. What we can do once the data have rejected pooling is discussed in section 6.7
below. First we show how a similar test may be used to test parameter
stability across time.
Exercise 6.9
White (1992) presents results listed in Table 6.1 for the regression of real GDP growth on a constant, savings, exports, grants and other capital inflows for three developing regions: Africa, Asia, and Latin America and
Table 6.1 Results of growth regressions
             Savings   Exports   Grants   Capital inflows   RSS
Africa        0.08     -0.02      0.15     0.08             26,401
             (0.02)    (0.02)    (0.03)   (0.08)
Asia          0.09      0.03      0.07    -0.17              3,150
             (0.03)    (0.01)    (0.07)   (0.08)
LAC           0.18     -0.09      0.08    -0.10             10,290
             (0.03)    (0.02)    (0.04)   (0.11)
All regions   0.11     -0.03      0.13     0.01             41,717
             (0.11)    (0.01)    (0.02)   (0.05)
Note: Absolute values of t-statistics are listed within brackets.
Caribbean. He also gives the results from pooling the data across the three
regions. What assumption is being made in running the pooled regres-
sion? Test this assumption by means of an F-test, stating clearly your null
hypothesis. (The size of the pooled sample is 1,334 observations.)
[Figure omitted in extraction: developing country terms of trade series, 1950-1985]
Exercise 6.10
Repeat the test for a structural break in the developing country terms of
trade putting the break at 1974, rather than at 1973 as is done in the text.
Which is the more appropriate break point?
Exercise 6.11
Population figures for Kenya for the period 1968-88 were regressed on
a constant and a trend. The residual sum of squares (RSS) from this
regression was 8.77. The same equation was re-estimated for the sub-
sample 1968-85, from which the RSS was 3.89. Use a Chow test to
determine the stability of the parameters.
Intercept dummies
An intercept dummy is a variable which takes the value one for a specific sub-sample and zero for the rest of the sample. In our example on the decline in the terms of trade, we shall distinguish the periods 1950-72, for which years the variable DUM takes the value zero, and the remaining years (1973-86) for which it takes the value 1. The regression equation becomes:
ln(TOT)t = β1 + β2 DUMt + β3 t + εt     (6.43)
Note that DUM must also be given a time subscript since it is a variable.
Now, when DUM = 0, the estimated regression line is given by:
ln(TOT)t = b1 + b3 t     (6.44)
But when DUM = 1, the line is:
ln(TOT)t = (b1 + b2) + b3 t     (6.45)
The coefficient b2 is the differential intercept - that is, the shift in the intercept for those observations for which DUM = 1. Do not interpret b2 as the intercept for the second sub-sample - this is not what it is. The intercept for the second sub-sample is given by b1 + b2.
Estimating equation (6.43) using the terms of trade data set yields:
ln(TOT)t = 0.254 + 0.262 DUMt - 0.017 t     R2 = 0.57     (6.46)
           (0.038)  (0.059)     (0.003)     RSS = 0.299
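Constructing the intercept dummy and refitting is mechanical. The sketch below is in Python; since the terms of trade file is not reproduced here, a series is simulated using the coefficients reported in (6.46) as the truth, and the regression then recovers them:

```python
import numpy as np

years = np.arange(1950, 1987)
t = (years - 1950).astype(float)
dum = (years >= 1973).astype(float)     # 0 for 1950-72, 1 for 1973-86

rng = np.random.default_rng(3)
# simulate a log terms-of-trade series with the level shift reported in (6.46)
ln_tot = 0.254 + 0.262 * dum - 0.017 * t + rng.normal(scale=0.05, size=t.size)

X = np.column_stack([np.ones_like(t), dum, t])
b, *_ = np.linalg.lstsq(X, ln_tot, rcond=None)
intercept_second_period = b[0] + b[1]   # b1 + b2, not b2 alone
```

Note the last line: as stressed in the text, the intercept for the second sub-sample is b1 + b2, while b2 itself is only the differential intercept.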
[Figure 6.6 omitted in extraction: terms of trade with fitted lines from equation (6.46)]
of the true slope coefficient. Figure 6.6 shows the fitted lines from equation
(6.46). You will have observed that we have here assumed that the slope is
constant between sub-samples, but that the intercept can vary. So we are still
imposing a restriction on the unrestricted (two-equation) model. It is left to
you as an exercise to test this restriction. You will find that it is not valid.
[Figure 6.7 Regressions of RE on RR with and without data point 1980. Axes: growth in recurrent expenditure against growth in recurrent revenue]
Table 6.2 Regressions with and without intercept dummy for 1980 (t-statistics in brackets)

                       RR        D80      RSS      R2
1 All observations     0.934              0.0667   0.788
                      (8.405)
2 All except 1980      1.047              0.0347   0.890
                      (12.05)
3 All observations     1.047     0.189    0.0347   0.890
                      (12.05)   (4.077)
Slope dummies
An intercept dummy allows us to vary the intercept across sub-samples
while keeping the slopes of the regressors constant. But at times we want
to be able to vary the slopes. This requires the use of slope dummies. To
see how we can construct these, let us continue for a while with this example of the simple regression of Tanzanian recurrent expenditure
against recurrent revenues. Recall that, in Chapter 4, the exploratory band
regression of RE on RR noted a mild non-linearity of the regression curve.
More particularly, with the exception of the outlier 1980, it appeared as
if the slope coefficient was lower when the growth rates in recurrent
revenues were positive than when they were negative. Let us see how we
can use a slope dummy to check whether this is true.
To obtain a slope dummy, we first construct an intercept dummy, DNEG, such that DNEG = 1 if RR < 0, and 0 otherwise. The dummy DNEG, therefore, picks out all observations for which the growth in recurrent revenue was negative. The slope dummy, DRR, is then obtained as follows: DRR = DNEG.RR. Hence, DRR = RR when RR < 0; otherwise, DRR = 0. We now formulate our regression model as follows:
REt = β2 RRt + β3 DRRt + εt     (6.47)
As before, the regression model features no constant term. Now, if DRR = 0, the slope of the regression will be β2, but if DRR = RR (when RR < 0), the slope coefficient will be (β2 + β3).
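In code the construction is a one-liner. The sketch below uses simulated growth rates (the values β2 = 0.8 and β3 = 0.4 are our invention for illustration, not the Tanzanian estimates):

```python
import numpy as np

rng = np.random.default_rng(4)
rr = rng.uniform(-0.2, 0.25, 60)          # growth of recurrent revenue
dneg = (rr < 0).astype(float)             # intercept dummy: 1 when RR < 0
drr = dneg * rr                           # slope dummy: RR when RR < 0, else 0

# simulate: slope 0.8 when RR >= 0, slope 0.8 + 0.4 = 1.2 when RR < 0
re = 0.8 * rr + 0.4 * drr + rng.normal(scale=0.02, size=rr.size)

X = np.column_stack([rr, drr])            # no constant term, as in (6.47)
b, *_ = np.linalg.lstsq(X, re, rcond=None)
slope_pos, slope_neg = b[0], b[0] + b[1]  # slopes for positive and negative RR
```

The fitted coefficients recover the two slopes: b[0] estimates the slope for positive revenue growth, and b[0] + b[1] the slope when growth is negative.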
[Figure 6.9 Sri Lankan investment function with intercept dummy only. Axes: private investment against public investment ('000)]
[Figure 6.10 Sri Lankan investment function with slope dummy only. Axes: private investment against public investment ('000)]
Combining intercept and slope dummies: the Sri Lankan investment
function
Let us now return to the Sri Lankan investment function and see how
intercept and slope dummies can be combined to throw more light on the
crowding-in or crowding-out issue. Take another look at Figure 6.3:
government investment appears to fall into two distinct sub-samples - one
set to the right in excess of Rs 16,000 million and another of lower value
in the bottom-left of the graph. Looking at the data set (or adding data labels to our graph if the software has this facility) we can see that the former, higher values all correspond to the years 1979-89 and the latter to 1970-8. The Sri Lankan government embarked on a liberalisation programme late in 1977 and on an investment boom in 1978-9. This suggests that we could use dummy variables to distinguish between periods with different policy regimes. The question, then, is whether there was a structural break in our investment function between these two periods. The scatter plot suggests
there was such a break: with crowding in during the earlier period, but
crowding out later on - so the coefficient on government investment
should be positive in the first period and negative later on.
For pedagogical reasons, let us compare the effects of three different
uses of dummies: Figures 6.9, 6.10 and 6.11 show the fitted lines from
introducing, respectively, an intercept dummy, a slope dummy and both
slope and intercept dummies into the simple regression.
[Figure 6.11 Sri Lankan investment function with intercept and slope dummies. Axes: private investment against public investment ('000)]
If we use an intercept dummy only, there is neither crowding in nor crowding out in either period
(Figure 6.9). The problem is that, in this figure, we are constraining the
slope coefficient to be the same in each period. In the case in which we
allow the slope to vary (Figure 6.10), we find crowding in to be stronger
in the second period than the first. Here we are imposing the constraint
that the intercept must be the same for the two sub-samples. The result
is that both regression lines look awkward.
As shown in Figure 6.11, it is only when we allow both slope and inter-
cept to vary that we get a clearer picture: the fitted lines now conform to the patterns apparent in the scatter. This example allows us to draw an
important conclusion: the difference in the slope coefficient between the
two sub-samples will only reveal itself if we also introduce an intercept
dummy. The intercept dummy is clearly necessary since it is not possible
to have a negatively sloped line through the 1979-89 sub-sample of data
points that has the same intercept as a line with a positive slope through
the 1970-8 data points. The intercept dummy, therefore, is necessary to
accommodate the difference in slope. This is, in fact, nothing more than
a specific example of omitted variable bias.
When we include both slope and intercept dummies then both coeffi-
cients vary between sample periods. That is, including a dummy for all
regressors yields the same coefficients as estimating separate sub-sample
regressions. We may therefore test the validity of pooling data either by
the Chow test presented earlier or by an F-test of the restriction that the
coefficients on all dummy variables are jointly zero. These two tests are
equivalent, i.e. they yield exactly the same result. You are asked to verify
this equivalence in exercise 6.13.
This should not lead us, however, to interpret the intercept dummy as reflective of a higher level of 'autonomous' private investment in the later period. In general, in the presence of a slope dummy, an intercept dummy should be interpreted as accommodating the change in slope.
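The equivalence between separate sub-sample regressions and a single regression with a full set of dummies is exact, and easy to verify numerically. The sketch below (Python, simulated data) fits both ways; the implied coefficients agree to machine precision:

```python
import numpy as np

def ols(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b

rng = np.random.default_rng(5)
n = 40
x = rng.normal(size=n)
d = (np.arange(n) >= 20).astype(float)    # indicator for the second sub-sample
y = 1.0 + 2.0 * x + d * (0.5 - 1.5 * x) + rng.normal(scale=0.3, size=n)

# (a) separate sub-sample regressions
b_a = ols(np.column_stack([np.ones(20), x[:20]]), y[:20])
b_b = ols(np.column_stack([np.ones(20), x[20:]]), y[20:])

# (b) one regression with intercept and slope dummies
bd = ols(np.column_stack([np.ones(n), d, x, d * x]), y)
# first sub-sample: (bd[0], bd[2]); second: (bd[0] + bd[1], bd[2] + bd[3])
```

This is exactly the verification asked for in exercise 6.13, here on invented data rather than the SRINA file.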
Exercise 6.12
Using the example of the regression of life expectancy on the logarithm
of GNP per capita, re-estimate the model using intercept and slope dummies combined. Compare your results with those obtained earlier with
two separate regressions. Test formally whether either one or both of the
dummies can be dropped from the model. Do your results confirm the
conclusions reached earlier?
Exercise 6.13
Test whether it is valid to pool the data for the regression of Ip on Ig using the data in the file SRINA by (a) running separate sub-sample regressions for 1970-8 and 1979-89; and (b) using dummy variables. Compare the estimated coefficients and restricted and unrestricted RSSs from the two methods, and comment on your results.
ADDITIONAL EXERCISES
Exercise 6.14
Using the data in data files INDONA and SRINA test whether it is valid
to estimate a pooled consumption function for the two countries and
comment on your findings.
Exercise 6.15
In section 6.7 we found a structural break in the simple private invest-
ment function for Sri Lanka. But in section 6.4 we found the simple
regression to be misspecified owing to omitted variables. Using the data
in the file SRINA:
1 Construct partial scatter plots of Ip against each of Ig, M and r.
2 Use your scatter plots to judge if there is a structural break in any of these relationships.
3 Hence, define a general investment function which regresses Ip on Ig, M and r and any appropriate dummy variables.
4 Try to obtain a more specific model by testing restrictions on the coefficients of the general model.
How do you interpret your findings?
Exercise 6.16
Mosley et al. (1991) are concerned to examine the impact of adjustment
policies on a range of macroeconomic performance variables (such as
growth of GDP and exports). They estimate a number of equations in
which the main regressors are current and lagged adjustment-related financial flows and measures of compliance with conditionality. A small number
of other variables are included (all unlagged). The authors argue that:
The equations represent somewhat crude hypotheses regarding the
determinants of the five dependent variables. There are many other
independent variables which could have been included as explanatory
variables in the equations. In addition, lags could have been introduced
to more of the independent variables ... However, since it is specifically
the impact of Bank finance and policy conditions which we wish to quan-
tify, we have refrained from more complex specification of the equations.
(Mosley et al. 1991: 210)
Comment on their argument.
Exercise 6.17
Given the following estimates of the consumption function, both calculated from the same data set of 20 cross-country observations, what would you expect to be the sign of the correlation coefficient between income and the real interest rate:
C = 0.14 + 0.82 Y - 0.04 r
C = 0.05 + 0.83 Y
where C is consumption, Y income and r the real interest rate? Explain
your answer.
Exercise 6.18
Suppose that the true model of private investment (Ip) is:
Ip,t = β1 + β2 Ig,t + β3 Mt + εt
but that you estimate
Ip,t = β1 + β2 Ig,t + ε't
What would you expect to be the direction of the bias in the coefficient on Ig in the simple regression of Ip on Ig? Use the data for Sri Lanka (SRINA) to verify your answer.
Exercise 6.19
Using the data in data file PRODFUN estimate a separate production function for each sector. Is it valid to pool the data from the two sub-sectors? How do your results affect your answer to exercise 6.6?
Part III
Analysing cross-section data
7 Dealing with heteroscedasticity
7.1 INTRODUCTION
Real data do not conform to the idealised conditions of the classical linear
regression model. As we pointed out in Chapter 4, you are likely to
encounter heteroscedasticity frequently in economic data, particularly with
cross-section data. The reason is that the variation in the dependent vari-
able seldom remains constant when the level of one (or more) explanatory
variable(s) increases or decreases. For example, not only is the level of
consumption of the rich much higher than that of the poor, but it is also
more varied. The poor have few options but to spend their income on
the basic essentials of life; the rich enjoy the privilege of making choices.
Similarly, there tends to be much less variation in output or expenditure
levels among small enterprises than among large firms. The implication
for statistical analysis is that you will not be able to apply the regression
model to the data straight away. But fortunately, the techniques of trans-
formation make the application of the model possible in very many
situations in practice. In Chapter 4, we showed how a well-chosen trans-
formation can help to convert a non-linear relationship into a linear one.
Like non-linearity, heteroscedasticity is also often due to the skewness in
the distribution of the variables under study. As a result, a suitable trans-
formation can make the heteroscedasticity disappear while making the
average relationship linear at the same time. However, you may not always
be able to do this. There are also cases where the relationship will look
clearly linear but the scatter plot indicates heteroscedastic errors.
If all the other assumptions of the regression model, i.e. the assumptions
of linearity of the regression, independence and zero expectation of the
error term, are valid, then it can be shown that heteroscedastic errors do
not affect the unbiasedness of the least squares estimates of the regression
coefficients. But the least squares estimators are no longer the most precise. In other words, they are not best linear unbiased estimators (BLUE) but only linear unbiased estimators. Further,
the standard formulae for the standard errors will not be valid, since they
are based on the assumption of homoscedasticity. Consequently, it is not
possible to perform the t-tests and F-tests under heteroscedasticity. Thus,
in order to make reliable inferences from the heteroscedastic data on the
basis of the linear regression model, we seek to eliminate heteroscedasti-
city by means of a suitable transformation. Incorrect functional form is the
type of model misspecification most likely to account for heteroscedasticity
in the residuals; but it may also be a symptom of omitted variables.
There is a very important point here, one on which we depart from
many traditional textbooks. Heteroscedasticity (and autocorrelation, dealt
with in Chapter 11) is a violation of our assumptions about the error term,
which has adverse implications for least squares estimation. But we do
not know the errors, but proxy them with the residuals. The residuals are
a function of our model specifications. Hence 'problems' which appear in
the residuals, such as heteroscedasticity, are just as likely to be a result
of a misspecified model as they are of a genuinely heteroscedastic error
in the true model. When coming across a problem in the residuals the
first course of action must always be to check the model specification.
Only once you are sure that the model is correctly specified should you
turn to one of the traditional 'cures' for residual heteroscedasticity.
In Chapter 4, we showed how the residual versus predicted plot can be
used to check for heteroscedasticity. In this chapter, in section 7.2 we
discuss two other plots which are useful in the visual examination for
heteroscedasticity. Next, in section 7.3, we introduce a selection of statis-
tical tests for heteroscedasticity. Section 7.4 discusses how to explore
suitable transformations for elimination of heteroscedasticity in order to
obtain the best estimators. Section 7.5 shows how weighted least squares
regression can sometimes be used to make inferences when the error term
is heteroscedastic. We show that weighted least squares can also be done
by a linear regression with suitably transformed variables. Finally, section
7.6 gives a summary of the main points.
[Figure omitted in extraction: residuals versus predicted wage income]
[Figure 7.2 Squared residuals versus predicted wage income]
[Figure 7.3 Absolute residuals versus predicted wage income]
Exercise 7.1
In Chapter 4, while discussing transformations towards linearity, we used
three examples with data taken from the data file SOCECON: respec-
tively, the relation between energy consumption (E) and GNP (Y), both
per capita; between energy consumption (E) and the degree of urbanisa-
tion as measured by the percentage of the population living in urban areas
(U); and, finally, between life expectancy (L) and GNP per capita (Y).
For each of these simple regressions between the raw data, compare the
plots of raw, absolute and squared residuals against the predicted values
of the dependent variable or against the regressor. In each case, check
which plot is most revealing in terms of detecting heteroscedasticity.
Exercise 7.2
Use the INDIA data set to estimate the regression line between the logarithm of wage income and the age of the worker, compute the residuals,
and plot the raw, absolute and squared residuals against the predicted
values of wage income and against the age of workers. What do you
conclude about the presence or absence of heteroscedasticity?
Bartlett's test
Bartlett's test can be applied to check for the equality of the variances of
the dependent variable across groups defined by an explanatory variable.
Dealing with heteroscedasticity 257
The conditional variance of Y given X is the same as the conditional variance of the error term, σx². Indeed, using equation (7.1), we get:
V(Y|X) = V(β1 + β2 X + εx) = V(εx) = σx²
Now let Yij = jth Y value in the ith class; ni = the number of observations in the ith class; fi = (ni - 1); and f = Σ fi. The test is then performed as follows:
B = [f ln(s²) - Σi fi ln(si²)] / {1 + [1/(3(k-1))] [Σi (1/fi) - 1/f]}
which is distributed as χ² with (k - 1) degrees of freedom under the null hypothesis of equal variances, where
Ȳi = (1/ni) Σj Yij ,  si² = [1/(ni - 1)] Σj (Yij - Ȳi)² ,  s² = (1/f) Σi fi si²
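In practice the statistic need not be computed by hand: SciPy provides `scipy.stats.bartlett`. The sketch below applies it to simulated wage groups whose spread rises across age classes; the groups and numbers are invented for illustration, not the Indian wage data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
# four hypothetical age classes; the standard deviation rises with the class
groups = [rng.normal(loc=100.0, scale=10.0 + 15.0 * i, size=60) for i in range(4)]

# Bartlett's statistic is chi-squared with k - 1 = 3 df under the null
stat, p = stats.bartlett(*groups)
```

With variances this unequal the null of homogeneous variances is rejected decisively. Bartlett's test is known to be sensitive to non-normality, which is why the text stresses checking the distribution of the dependent variable first.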
Goldfeld-Quandt test
Bartlett's test was used to check for homogeneity in the conditional vari-
ances of Y, the dependent variable. The Goldfeld-Quandt test checks for
homogeneity in the conditional error variances. Hence, with Bartlett's test,
you group Y with reference to the ascending order of one of the X vari-
ables, but you do not run groupwise regressions. The test is performed using
the computed conditional variances of Y. As we shall see, the Goldfeld-
Quandt test also implies that you group the data with reference to the order
of one of the X variables, but in this case you run groupwise regressions
to obtain sets of within-group residuals. The test is commonly used
when the heteroscedastic variance σx² is suspected to vary monotonically (i.e. consistently increasing or decreasing) with one of the explanatory variables in the regression model. The procedure is based on dividing the sample into three groups in ascending order of one of the explanatory variables, and testing for the difference in the error variance between the
bottom and the top groups. Hence, the middle group is not considered in
the test. Its only function is to prevent the extreme groups from bordering on each other.
The test involves the following steps:
1 Arrange the data in ascending order of the explanatory variable suspected to be related to the error variance.
2 Drop a number of the middle observations, say c, so that (n - c) is divisible by 2, hence n' = (n - c)/2 is the subsample size. A rule of thumb is to drop about 1/4 of the total observations from the middle.
3 Estimate two separate regressions for the bottom and the top group
of observations, and compute the corresponding residual sums of
squares - respectively, RSS1 and RSS2.
4 Compute the ratio of the higher to the lower residual sums of squares. This ratio has an F-distribution with [d, d] degrees of freedom, where
d = (n - c)/2 - k = n' - k
and k is the number of estimated coefficients,
under the hypothesis that the error distribution within each group
is normal, and that the error variances are the same. The higher the
computed ratio, the less likely it is for the hypothesis to be true.
5 Compare the computed ratio with the critical value of the relevant F-distribution. If the computed ratio exceeds the critical value, then the hypothesis of homoscedasticity is rejected.
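The five steps translate directly into code. The sketch below (Python, with simulated data in place of the Indian wage file) drops roughly the middle quarter of the ordered sample and compares the two outer residual sums of squares:

```python
import numpy as np
from scipy import stats

def goldfeld_quandt(x, y, drop_frac=0.25):
    """Goldfeld-Quandt test for a simple regression, ordering the sample by x."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    n = x.size
    c = int(round(n * drop_frac))
    if (n - c) % 2:
        c += 1                       # make (n - c) divisible by 2
    m = (n - c) // 2                 # subsample size n'

    def rss(xs, ys):
        X = np.column_stack([np.ones_like(xs), xs])
        b, *_ = np.linalg.lstsq(X, ys, rcond=None)
        e = ys - X @ b
        return float(e @ e)

    k = 2                            # estimated coefficients per group regression
    d = m - k
    rss1, rss2 = rss(x[:m], y[:m]), rss(x[-m:], y[-m:])
    F = max(rss1, rss2) / min(rss1, rss2)
    return F, float(stats.f.sf(F, d, d))

rng = np.random.default_rng(7)
x = rng.uniform(0.0, 1.0, 120)
y = 1.0 + 2.0 * x + rng.normal(scale=0.2 + 2.0 * x)   # error sd rising with x
F, p = goldfeld_quandt(x, y)
```

Because the simulated error standard deviation grows with x, the ratio of the two residual sums of squares is large and homoscedasticity is rejected.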
Let us use our sample data of 261 workers again to illustrate this test step
by step:
1 Since there is only one explanatory variable, age, we arrange the data in ascending order of age.
2 The total number of observations is 261. One-fourth of 261 is 65.25.
Now (261 - 65) is an even number and, hence, we drop the middle
65 observations. This leaves us with 98 (i.e. (261 - 65)/2) observa-
tions each in the bottom and top groups. The bottom group corre-
sponds to lower values of the age variable and the top group to the
higher values.
3 Bottom group (observations 1 to 98): RSS1 = 382,302
Top group (observations 164 to 261): RSS2 = 2,207,120
4 F(calculated) = RSS2/RSS1 = 5.63.
5 The computed value of 5.63 is higher than the critical value in the F-distribution with [96, 96] degrees of freedom at the 5 per cent level of significance. Hence, we reject the null hypothesis of homoscedasticity.
Again, this test is not strictly valid because of the non-normality of the
distribution of wage income. If we repeat this test with log income, the test is then valid, and the corresponding calculated value is much lower (1.96), though not insignificant.
It is not necessarily the case that the different tests for heteroscedas-
ticity will lead to the same conclusion. These tests depend on the way in
which we divide the data into groups, hence, different groupings may yield
different results. We shall come back to this point later.
White's test
The basis for this test is to check whether there is any systematic relation
between the squared residuals and the explanatory variables. This is
achieved by regressing the squared residuals e;2 on all the explanatory
variables and on their squares and cross products. Thus, if X 1 and X 2 are
the explanatory variables, then White's test involves regressing e2 on X 1,
X 2 , X 12 , Xz2 and X 1 .X2 , and using the overall F-test to check if the regres-
sion is significant or not. This test (and others like it) is in fact a general
regression specification error test (RESET) and not solely a test for
heteroscedastic errors (see Box 7.1).
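A sketch of White's test in Python (with simulated heteroscedastic data; in a real application you would feed in the residuals from your own fitted model):

```python
import numpy as np
from scipy import stats

def white_test(X, y):
    """Regress squared OLS residuals on regressors, squares and cross-products."""
    n, k = X.shape
    X1 = np.column_stack([np.ones(n), X])
    b, *_ = np.linalg.lstsq(X1, y, rcond=None)
    e2 = (y - X1 @ b) ** 2

    cols = [X[:, i] for i in range(k)]
    cols += [X[:, i] * X[:, j] for i in range(k) for j in range(i, k)]
    Z = np.column_stack([np.ones(n)] + cols)

    g, *_ = np.linalg.lstsq(Z, e2, rcond=None)
    u = e2 - Z @ g
    rss_u = float(u @ u)
    rss_r = float(((e2 - e2.mean()) ** 2).sum())  # intercept-only regression
    q = Z.shape[1] - 1
    df = n - Z.shape[1]
    F = ((rss_r - rss_u) / q) / (rss_u / df)      # overall F-test of the auxiliary regression
    return F, float(stats.f.sf(F, q, df))

rng = np.random.default_rng(8)
x = rng.uniform(1.0, 5.0, 200)
y = 2.0 + 1.5 * x + rng.normal(scale=0.2 * x)     # heteroscedastic errors
F, p = white_test(x[:, None], y)
```

With a single regressor the auxiliary regression is simply e² on X and X², exactly as in the wage income example.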
In our example of the regression of wage income on the age of the
worker, the only explanatory variable, we regress the squared residuals
e2 from the fitted regression in (7.2) on AGE and AGE2, which yields an
F-statistic of 6.58. This calculated value should be compared with the crit-
ica! value F(z,zss)· The calculated F-statistic is significant at the 1 per cent
level and, hence, we reject the hypothesis that there is no relationship
between squared residuals and age and age squared. By implication, we
reject the hypothesis of homoscedasticity.
Glejser's test
Like White's test, Glejser's test also checks whether a systematic relation
exists between the residuals and the explanatory variables. However,
Glejser approaches the problem in a different way. The test involves regressing absolute residuals separately on X, 1/X and √X, and uses t-tests for the slope coefficients to be zero. If there is more than one
explanatory variable then this exercise is to be repeated for each of the
explanatory variables. The hypothesis of homoscedasticity is rejected
if any of the slope coefficients turns out to be significantly different
from zero. The difference with White's test, therefore, is that Glejser's
Table 7.2 Tests for heteroscedasticity

Test              Form of statistic                              Income      Log income   Log income
                                                                                          (multiple)ª
Bartlett's        [f ln(s²) - Σi fi ln(si²)] /                   47.81       1.15         Age: 1.15
                  {1 + [1/(3(k-1))](Σi 1/fi - 1/f)}   (χ²)                                Edu: 1.77
Goldfeld-Quandt   RSS2/RSS1   (F)                                5.63        1.96         1.42
Glejserᵇ          Test of significance of slope coefficient     Age: 6.5    Age: 1.9     Age: 1.8
                  from separate regressions of absolute         1/Age: 6.4  1/Age: 1.4   1/Age: 0.8
                  residual on regressor, its inverse and        √Age: 4.5   √Age: 1.9    √Age: 1.6
                  its square root

Notes:
ª Sex squared omitted from White's test for multiple regression owing to multicollinearity in the test equation.
ᵇ t-statistics for Glejser test are absolute values.
|e| = 163.97 - 2159.0 (1/AGE)
               (335.5)
The two-sided t-tests for the slope coefficients equal to zero are signifi-
cant at the 1 per cent level in all three regressions above (see Table 7.2).
Thus, Glejser's test also rejects the hypothesis of homoscedasticity.
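Glejser's procedure, three separate regressions of the absolute residuals each judged by the t-ratio on its slope, can be sketched as follows (Python with numpy; simulated data stand in for the INDIA file):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 261
age = rng.uniform(18, 60, n)
wage = 50 + 3 * age + rng.normal(0, 0.5 * age)  # heteroscedastic errors

# absolute residuals from the original fit of wage on age
X = np.column_stack([np.ones(n), age])
b, *_ = np.linalg.lstsq(X, wage, rcond=None)
abs_e = np.abs(wage - X @ b)

def slope_t(x, z):
    """t-statistic for the slope in a simple OLS regression of z on x."""
    Xs = np.column_stack([np.ones(len(z)), x])
    bs, *_ = np.linalg.lstsq(Xs, z, rcond=None)
    u = z - Xs @ bs
    s2 = (u @ u) / (len(z) - 2)
    return bs[1] / np.sqrt(s2 / ((x - x.mean()) ** 2).sum())

# regress |e| on the regressor, its inverse and its square root in turn
for x in (age, 1 / age, np.sqrt(age)):
    print(round(slope_t(x, abs_e), 2))
```

A significant slope in any of the three regressions rejects homoscedasticity, mirroring the decision rule in the text.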
It is quite possible that not all four tests - Bartlett's, Goldfeld-
Quandt, White's and Glejser's - produce the same results in terms of
either rejecting or accepting the hypothesis of homoscedasticity. If all the
tests fail to reject homoscedasticity then we are on firm ground to proceed
with the initial model. If at least one test rejects homoscedasticity then
we should examine carefully the nature of heteroscedasticity by means of
the graphs discussed earlier and proceed according to the principles which
we shall discuss in the next section.
Exercise 7.3
Using the cases listed in exercise 7.1, try out all four tests discussed in
this section to test for heteroscedasticity.
Exercise 7.4
Continuing with exercise 7.2, do the tests for heteroscedasticity with
the model featuring the logarithms of income versus the age of Indian
workers.
[Figure 7.4 Log of absolute residuals plotted against log of predicted wage income, with regression line (Stata)]
For the income-age data the corresponding scatter plot with regression line
is depicted in Figure 7.4.
The plot indicates a linear relationship between ln(|e|) and ln(Yp),
where Yp is the predicted Y. The corresponding fitted regression is given
as follows:

ln(|e|) = −2.0930 + 1.2108 ln(Yp)    (7.7)
                    (0.1691)

which reveals that the slope coefficient is reasonably close to unity.
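Whether this slope differs significantly from unity is a one-line computation, using the estimate and standard error reported in equation (7.7):

```python
b2, se = 1.2108, 0.1691   # slope and standard error from equation (7.7)
t = (b2 - 1) / se         # t-ratio for H0: slope = 1
print(round(t, 3))        # 1.247, below the 5 per cent critical value (about 1.97)
```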
Exercise 7.5
Test formally whether the slope coefficient in equation (7.7) is significantly
different from one, using the 5 per cent significance level. What do you
conclude from this test?
You will have found from the two-sided t-test that the slope coefficient is
insignificantly different from one at the 5 per cent significance level.
Hence, it seems a good idea to try out the logarithmic transformation to
see whether it helps us to eliminate heteroscedasticity. The regression of
ln(Y) on AGE yields the following results (t-statistics in brackets):
Dealing with heteroscedasticity 267
[Figure: residuals plotted against predicted log wage income (Stata)]
Exercise 7.6
Using the data in INDIA, perform Bartlett's, Goldfeld-Quandt, Glejser's
and White's test for the regression of log income on age. Comment on
your results.
[Figure: food expenditure plotted against total expenditure (Stata)]

[Figure: log food expenditure plotted against log total expenditure (Stata)]
Exercise 7.7
Using the data in INDIA, regress log income on age, education and sex.
Test the hypothesis that education and sex may be dropped from the equa-
tion. Perform Bartlett's, Goldfeld-Quandt, Glejser and White's tests.
Comment on your results.
The results you should achieve for exercise 7.7 are shown in Table 7.2.
All the calculated test statistics, except Bartlett's, have fallen still further
and the null hypothesis may now be accepted in the case of the Goldfeld-
Quandt test. This example shows that residual heteroscedasticity may well
result from omitted variable bias, and the inclusion of incorrectly omitted
variables will reduce evidence of heteroscedasticity.
The statistic for Bartlett's test did not change since it is calculated with
reference to the dependent variable. It is therefore unaffected by changes
in the regressors, though it does respond to changes in the functional form
of the regressand. Except for White's test, all the tests are calculated with
respect to a single regressor
(age in the example here). If there are more regressors, then the data
should be sorted by each regressor in turn and the test recalculated for
Bartlett's (as shown in Table 7.2) and the Goldfeld-Quandt tests, and the
absolute residual regressed on each regressor (and transformations
thereof) in turn for the Glejser test. These steps are not necessary if there
is some good reason to believe that the heteroscedasticity is related to
one specific regressor.
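The sorting step matters for the Goldfeld-Quandt test in particular: the sample is ordered by the suspect regressor, the middle observations are set aside, and the two end samples are fitted separately. A sketch (Python with numpy; the data are simulated and the middle-third split is a common convention, not taken from the text):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 261
age = np.sort(rng.uniform(18, 60, n))          # sort the data by the regressor
wage = 50 + 3 * age + rng.normal(0, 0.5 * age)

def rss(x, y):
    """Residual sum of squares from an OLS fit of y on x (with intercept)."""
    X = np.column_stack([np.ones(len(y)), x])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e

m = n // 3                                     # set aside the middle third
rss1 = rss(age[:m], wage[:m])                  # low values of the regressor
rss2 = rss(age[-m:], wage[-m:])                # high values of the regressor
F = rss2 / rss1                                # F(m-2, m-2) under homoscedasticity
print(F)
```

With more than one regressor, this whole computation is repeated after re-sorting by each regressor in turn, as described above.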
and, hence, the model for grouped data has heteroscedastic error terms.
But note that the error variance is of the form specified in (7.9) with
wi² = 1/ni:

V(εi) = wi² σ²

so that dividing the model through by wi yields a homoscedastic error:

V(εi/wi) = (1/wi²) V(εi) = (1/wi²) wi² σ² = σ²    (7.15)
Exercise 7.8
Using the data in the file TPEASANT (farm size and household size in
Tanzania), estimate the regression of landholding size on household size
with weighted least squares. Do you think that the resulting regression
satisfies the assumptions of classical linear regression?
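For grouped data, weighted least squares simply divides every variable in the model, the constant included, by wi = 1/√ni, after which ordinary least squares on the transformed data applies, as in (7.15). A sketch with hypothetical grouped data (Python with numpy; the TPEASANT file itself is not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(3)
G = 40                                   # number of groups
n_i = rng.integers(2, 30, G)             # observations per group
x = rng.uniform(1, 10, G)                # group-level regressor
# group means: the error on a group mean has variance sigma^2 / n_i
y = 2 + 0.5 * x + rng.normal(0, 1 / np.sqrt(n_i))

w = 1 / np.sqrt(n_i)                     # w_i, so that w_i^2 = 1/n_i
# divide every term of the model (constant included) by w_i ...
Xw = np.column_stack([1 / w, x / w])
# ... and run OLS on the transformed data
bw, *_ = np.linalg.lstsq(Xw, y / w, rcond=None)
print(bw)                                # WLS estimates of intercept and slope
```

The transformed errors have constant variance by construction, so the classical formulae for standard errors apply to the weighted regression.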
[Figure: residuals plotted against predicted food expenditure (Stata)]

[Figure: absolute residuals plotted against total expenditure (Stata)]

[Figure: scatter plot against inverse total expenditure (Stata)]

[Figure: food expenditure plotted against total expenditure (Stata)]
where the formula is simplified using the assumption that E(εi²) = σ² for
all i. Where this assumption is not valid (i.e. the errors are heteroscedastic)
then:

Var(b2) = [Σi (Xi − X̄)² σi²] / [Σi (Xi − X̄)²]²    (7.22)
White (1980) showed that substituting the squared residuals (ei²) into
equation (7.22) yields a consistent estimate of the standard errors (this
result generalises to the k-variable case). However, unlike with weighted
least squares, these are not the minimum-variance estimates.
Inspection of equation (7.22) shows that if the errors are homoscedastic
then the expression simplifies to that in equation (7.21). That is, the het-
eroscedastic consistent standard errors and those usually reported will be
the same if there is no heteroscedasticity. A divergence between these
two sets of standard errors is thus a rough test for the presence of
heteroscedasticity.
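White's heteroscedasticity-consistent standard error for the slope in the two-variable model substitutes ei² for σi² in equation (7.22); comparing it with the usual formula gives the rough test just described. A sketch (Python with numpy, simulated data):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 261
x = rng.uniform(18, 60, n)
y = 50 + 3 * x + rng.normal(0, 0.5 * x)   # heteroscedastic errors

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
d = x - x.mean()

s2 = (e @ e) / (n - 2)
se_usual = np.sqrt(s2 / (d @ d))                                # homoscedastic formula
se_white = np.sqrt(((d ** 2) * (e ** 2)).sum() / (d @ d) ** 2)  # (7.22) with e_i^2
print(se_usual, se_white)   # a divergence between the two signals heteroscedasticity
```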
ADDITIONAL EXERCISES
Exercise 7.9
Use the data in data file INDFOOD to test for heteroscedasticity in the
regression of household food expenditure on total expenditure. Repeat
the tests using the log of both variables. Comment on your findings.
Exercise 7.10
Using the data in data file LEACCESS, regress life expectancy on (a)
income per capita; (b) logged income per capita; and (c) logged income
per capita and access to health. Test for heteroscedasticity in each regres-
sion equation. Comment on your results.
8 Categories, counts and
measurements
8.1 INTRODUCTION
Categories matter a great deal in empirical analysis. The reason is that
the average level of a numerical variable or relations between variables
may differ quite markedly across different categories. For example, rural
or urban location affects consumption and production patterns of house-
holds. Similarly, wage and salary earnings may differ between men and
women, even for the same level of education and years of experience.
Occupational status affects both the health (for example, mortality rates)
and the wealth of people. Categorical variables allow us to classify our
data into a set of mutually exclusive categories with respect to some
qualitative criterion: for example, men/women; rural/urban; occupation; region,
countries or continents; policy regimes. In practice, this type of variable
is inevitably discrete in nature inasmuch as we only consider a definite
(usually limited) number of categories. For example, the gender variable
has only two categories (male/female), while a variable on occupational
status usually distinguishes among eight to ten categories. The distinctive
nature of these variables, therefore, is that they do not measure anything,
but assign a quality to our data (i.e. they are qualitative, not quantitative).
Hence, we cannot compute an average for this type of variable, but we
can count (frequencies) how many observations in our data set fall in a
group defined by a qualitative categorical variable. This chapter deals with
ways in which these variables can be employed in empirical analysis to
look deeper into the structure of our data. In particular, this chapter shows
that the use of categorical variables helps us to guard against making
unwarranted generalisations based on the assumption that homogeneity
prevails when, in fact, we are lumping things together which should be
kept separate. We have already come across categorical variables in
Chapter 6, when we introduced dummy variables. In this chapter, we take
this analysis further and look at the categories behind the dummies.
Throughout this chapter we shall use one extended example concerning
the effects of education and gender on weekly wage earnings for the sam-
ple of 261 workers in an industrial town of southern India (data set INDIA)
to illustrate our argument on categorical variables. Section 8.2 deals with
the analysis of the relation between a numerical dependent variable and a
categorical explanatory variable, which involves comparing averages
between categories and, hence, is a natural extension of the principle of
regression. We show that dummy variables can be used to depict categories
of a categorical variable, a technique which allows us to extend the reach
of regression analysis to deal with qualitative explanatory variables. Section
8.3 shows how to analyse the association between two (or more) categori-
cal variables in the context of a contingency table. This technique allows us
to test whether two (or more) categorical variables are statistically inde-
pendent or not, using a chi-square statistic, thus laying the groundwork for
discussing the regression between a quantitative dependent variable and
two (or more) categorical variables in section 8.4. Here we meet again the
by now familiar concept of partial association which we came across in
Chapter 5. But we also introduce you to interaction effects which result
when two or more categorical variables interact in unison to affect the out-
come of the dependent variable. As usual, the last section 8.5 summarises
the main points of this chapter. Perhaps you are wondering whether cer-
tain problems involve the use of a dependent qualitative (categorical) vari-
able. They do, but they will be dealt with in the next chapter.
For example, we may group households into a few income categories such as low, middle and high
income earners. This gives rise to income as an ordinal categorical vari-
able. We use the term 'ordinal' because the three categories involve an
ordering. In a similar way, a rainfall variable classified by low, medium
and high rainfall is also an ordinal variable. Another such variable, and
one we shall use in this example, is educational achievement: below
primary, primary, secondary and higher education. The latter example
shows that an ordinal variable is not necessarily derived from a measure-
ment variable. Years of schooling is a quantitative variable but educational
achievement is not exactly the same: it indicates that certain standards
have been reached and successfully accomplished.
Exercise 8.1
Using the data set SOCECON, construct an ordinal income variable by
grouping developing countries into low, lower-middle and upper-middle
income countries as measured by GNP per capita. To do so, use the
following cut-off points: low income countries, $600 or below; lower-
middle income countries: above $600 and up to $2,500; upper-middle
income countries: above $2,500 and up to $9,000. (Exercise 8.4 will require
you to analyse life expectancy by these categories, so be sure to sort these
data by the income categories when doing this exercise.)
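The grouping asked for in exercise 8.1 is a simple threshold classification; a minimal sketch of the cut-offs (Python; the function name is ours, not from the text):

```python
def income_group(gnp_per_capita):
    """Ordinal income category from GNP per capita (US$), exercise 8.1 cut-offs."""
    if gnp_per_capita <= 600:
        return "low"
    if gnp_per_capita <= 2500:
        return "lower-middle"
    if gnp_per_capita <= 9000:
        return "upper-middle"
    return None  # outside the range considered in the exercise

print(income_group(450), income_group(1200), income_group(5000))
# low lower-middle upper-middle
```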
Do men earn more than women? To answer this question with our data
we need to analyse the association between weekly earnings, a quantita-
tive variable, and a dichotomous qualitative variable. To do this, we need
to tabulate earnings by gender. This is done in Table 8.2. The income
differences between male and female workers come out quite sharply from
these two columns: almost 78 per cent of the female workers earned less
than Rs 150 while the corresponding proportion among their male coun-
terparts was about 56 per cent. These differences are reflected in the
average earnings of male and female workers, shown in the last row of
Table 8.2. Gender as a category, in this case, thus identifies heterogeneity
with respect to wage earnings in the population: women earn less than
men; hence, averaging across male and female workers is incorrect.
Table 8.2 Distribution of weekly wage earnings (per cent by gender in each
income group)

Weekly wage earnings          Gender
(Indian Rs)                   Male      Female    Total
(1)                           (2)       (3)       (4)
Up to 70                      16.5      54.6      24.5
71-150                        39.3      23.6      36.0
151-300                       22.8      12.7      20.7
300+                          21.4       9.1      18.8
Total                        100.0     100.0     100.0
Average weekly income        182.9     102.0     165.9
Ŷ = b1 + b2 · 1 = b1 + b2
In other words, b1 is the average wage income of the male workers, i.e.
b1 = 182.94, and b2 is the difference between average female and male
wage incomes: b2 = 101.99 − 182.94 = −80.95. This result may be proved
more formally (see exercise 8.8). Note that we are talking about sample
averages (or, more precisely, sample means). The regression model,
therefore, can be written as follows:

W = β1 + β2 D + ε    (8.3)

where β1 and β2 are the population parameters of the model: respectively,
the mean earnings of male workers and the mean difference in earnings
between female and male workers.
However, not surprisingly, the distribution of weekly wage earnings of
the 261 workers is skewed, as can be seen from Table 8.2. The skew in
the distribution is clearly discernible: while the average weekly income
is about Rs 166 (given in the last row), more than 60 per cent of the
workers earned below Rs 150. The standard assumptions about ε in the
classical regression model are obviously not applicable here. Therefore, a
transformation of the dependent variable is called for: once more, the
logarithmic transformation of W does the trick. Hence, we rewrite the
model (8.3) as:

ln(W) = β1 + β2 D + ε    (8.4)
As explained earlier, in regression model (8.4) the constant term, β1, gives
us mean log earnings of male workers while the slope coefficient, β2, states
the mean difference in log earnings between female and male workers.
This is typical of modelling categorical variables with dummy variables:
one of the categories is used as the benchmark represented by the constant
term, and the other categories are then compared (through differencing)
with this reference category by means of their corresponding slopes. Box
8.1 tells you why this is the best way to use dummy variables in regression.
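The benchmark interpretation can be verified numerically: in a regression on a female dummy, the constant reproduces the male mean and the slope the female-male difference in means, exactly. A sketch (Python with numpy; the earnings are simulated around the means of Table 8.2, not the actual INDIA data):

```python
import numpy as np

rng = np.random.default_rng(5)
n_m, n_f = 206, 55                      # counts as in the INDIA sample
w_m = rng.normal(182.9, 60, n_m)        # hypothetical male weekly earnings
w_f = rng.normal(102.0, 40, n_f)        # hypothetical female weekly earnings
w = np.concatenate([w_m, w_f])
D = np.concatenate([np.zeros(n_m), np.ones(n_f)])  # D = 1 for female workers

X = np.column_stack([np.ones(len(w)), D])
b, *_ = np.linalg.lstsq(X, w, rcond=None)

print(b[0] - w_m.mean())                # ~0: b1 is the male mean
print(b[1] - (w_f.mean() - w_m.mean())) # ~0: b2 is the difference in means
```

The two differences printed are zero up to floating-point error, which is the identity proved formally in Appendix 8.1.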
Exercise 8.2
Using Table 8.3, formally test the hypothesis β2 = 0. Does your conclusion
confirm that gender matters in terms of explaining earning differences?
Exercise 8.3
Using Table 8.5, formally test the hypotheses βj = 0, for j = 2, 3, 4. What
do you conclude in terms of the importance of educational level on weekly
earnings?
You should have found that there does not appear to be a significant
difference between average earnings of workers with primary and below
primary education, but both secondary and post-secondary education
clearly matter.
Categories, counts and measurements 287
Exercise 8.4
In exercise 8.1 you were requested to construct a categorical variable to
depict income per capita categories for different countries. Select life
expectancy as an additional (numerical) variable from the data set
SOCECON and:
1 compare mean- and order-based statistics of life expectancy for each
category of the income variable and construct the corresponding
comparative box plots;
2 construct dummy variables to represent this categorical income variable;
3 regress life expectancy as dependent variable on the dummies thus
constructed.
How does your analysis under (1) compare with your regression results
obtained in (3)?
Contingency tables
First, we take up the issue of association between gender and education
which, if it exists, will play a role in the partial associations with income
(recall Chapter 5). To do this we use a contingency table. A two-way con-
tingency table presents a joint frequency distribution of two categorical
variables. Table 8.6 is the contingency table of gender and education,
listing frequencies (or counts) of 261 workers jointly by gender and by edu-
cation. A contingency table such as is given in Table 8.6 is used to examine
whether or not two categorical variables are stochastically (statistically)
independent (see Box 8.2). We need a table of cross-tabulations because,
with categorical variables, we cannot compute numerical summaries like a
covariance or a coefficient of correlation straight away. But we can use the
frequencies to compute fractions of counts or proportions which, as we shall
see, serve as estimates of the corresponding probabilities.
To say that two categorical variables are stochastically dependent does
not imply any statement about causation. It merely states that they
are associated with one another. Hence, when investigating stochastic
workers and the number of 'below primary' workers. Since the sample
proportions are good estimators of the corresponding binomial probabil-
ities, we can use the corresponding sample proportions of male workers
(206/261) and 'below primary' workers (147/261) as estimators of these
probabilities. Hence, the estimated expected cell frequency for ( G = male
and E= below primary), under the hypothesis of stochastic independence,
is equal to 261 · (206/261) · (147/261) = 116.02, which compares with an
observed frequency of 111. The same steps can be repeated for the rest
of the cells of the contingency table to obtain the estimated expected
frequencies under the hypothesis of independence. The third column in
Table 8.7 lists all estimated expected cell frequencies.
If the hypothesis of independence is correct, then these expected
frequencies should be very close to the actually observed frequencies in
the sample, i.e. the first and the third column of Table 8.7 will be very
close to each other. The more the third column deviates from the first
column, the less likely it is that both variables are indeed stochastically
independent. What we need, therefore, is a numerical tool to summarise
the deviations of the expected frequencies (EXP) from the observed
frequencies (OBS). This is done as follows:

χ² = Σ [(OBS − EXP)² / EXP]    (8.9)
where the summation is over all the cells in the contingency table.
The resulting measure is called the contingency chi-square statistic. It
is easy to see that the more the expected frequencies deviate from the
observed frequencies, the larger will be the value of the statistic since it
is based on the squared differences between the two sets of frequencies.
The larger the value of the statistic, therefore, the more the hypothesis
of independence is suspected. But how large a value of this statistic is
large enough to reject the hypothesis of independence? Such a judgement
is based on how unlikely (in the probability sense) is the large value of
the statistic. Now, the sampling distribution of this statistic is a chi-square
distribution with degrees of freedom equal to d = (e - l)(r - 1), where e
is the number of columns and r the number of rows in the contingency
table. This explains the name of the summary measure in equation (8.9).
So all we need next is to find the cut-off points (critical values) in the cor-
responding chi-square distribution at which there is only 5 per cent or
1 per cent probability (level of significance) for the statistic to have a larger
value. If the computed value of the statistic is larger than the critical value,
then we reject the hypothesis of independence at the corresponding level
of significance. Note that computer software often provides the upper-tail
probability of the computed value, i.e. the probability of the statistic being
larger than the computed value. In that case, we reject the hypothesis of
independence if the upper-tail probability is less than 5 per cent or 1 per
cent. This procedure to test the hypothesis of independence of two cate-
gorical variables is called the contingency chi-square test of independence.
In our present example, the value of the test statistic computed at
the bottom of the fourth column is 6.54. The degrees of freedom are
(4 − 1)·(2 − 1) = 3. The critical value of a chi-square distribution with
three degrees of freedom is 7.81 at the 5 per cent level of significance. Since
the computed test statistic of 6.54 is below the critical value we cannot
reject the hypothesis of independence between gender and education of
workers at the 5 per cent level of significance.
The test of independence described above checks for the existence of
association. The larger the value of the test statistic, the greater is the
evidence of association. The same chi-square statistic can also be used to
develop a measure of association. One such measure is given by Cramer's
V, defined as follows:
V = √( Σ[(OBS − EXP)² / EXP] · 1/(n(k − 1)) ) = √( χ² / (n(k − 1)) )    (8.10)

where k is the smaller of the number of rows and columns of the table.
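Both the contingency chi-square statistic (8.9) and Cramér's V can be computed directly from a table of counts. In the sketch below (Python with numpy) the first row and all margins match the gender-education figures quoted above (206 men, 55 women, 147 'below primary', observed cell 111); the remaining cells are hypothetical, so the resulting χ² is illustrative only:

```python
import numpy as np

# rows: education levels, columns: gender (male, female)
# first row and the margins follow the text; the other cells are made up
obs = np.array([[111., 36.],
                [ 30.,  8.],
                [ 40.,  7.],
                [ 25.,  4.]])

n = obs.sum()
# expected counts under independence: row total x column total / n
exp = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / n
chi2 = ((obs - exp) ** 2 / exp).sum()            # equation (8.9)
df = (obs.shape[0] - 1) * (obs.shape[1] - 1)     # (r - 1)(c - 1)
k = min(obs.shape)                               # smaller of rows and columns
V = np.sqrt(chi2 / (n * (k - 1)))                # Cramer's V, equation (8.10)
print(chi2, df, V)
```

The expected count in the male, below-primary cell comes out as 116.02, reproducing the computation in the text.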
Exercise 8.5
Using exercises 8.1 and 8.4 and the data set SOCECON (world socio-
economic data), define a categorical variable which picks out the countries
of Sub-Saharan Africa from among the developing countries. Investigate
whether this variable and the categorical income variable are statistically
independent or not, using both the contingency chi-square test of inde-
pendence and Cramer's V statistic.
8.4 PARTIAL ASSOCIATION AND INTERACTION
In our example we found that both gender and education are related to
wage income. To distinguish between their separate effects on wage income,
we shall make use of the familiar concept of partial association. But we also
need another concept which, initially, is a little difficult to understand but
matters a great deal when working with categorical variables. This is the
concept of interaction between two (or more) categorical variables in their
effects on a measurement variable: for example, the differences in partial
associations of income with education across the gender categories.
Let us start with the question of the partial associations of income with
gender and education. For convenience of exposition, we collapse the four
categories of education into two categories - 'below secondary', and 'sec-
ondary and above'. The partial association between weekly wage income
and education is the association between these two variables, keeping the
third variable, gender, fixed at a specific category. For example, consider
the contingency table of income groups and educational categories for
male workers in Table 8.8(a). The degree of association in this contingency
table is the association between income and education for male workers.
Therefore, it is a partial association between income and education.
Similarly, the degree of association between income and education in
the contingency table for female workers, Table 8.8(b), is also a partial
association. The chi-square statistics indicate that the hypothesis of
independence between income and education can be rejected at the 1 per cent
level.
Exercise 8.6
Using the data set SOCECON, investigate the partial associations and the
interaction effects of the categorical income variable and the variable
denoting Sub-Saharan African countries on life expectancy. In each case,
explain carefully what each of these concepts measures in this concrete
example.
Exercise 8.7
Show how regression analysis can be used to investigate the relation
between life expectancy as dependent variable and the categorical income
per capita variable and the Sub-Saharan Africa identification variable as
explanatory variables. Check whether your results conform with those
obtained in exercise 8.6.
ADDITIONAL EXERCISES
Exercise 8.8
Consider the regression model
Y = β1 + β2 D + ε

where Y is income and D a dummy variable for gender (D = 1 for female
and D = 0 for male). Use the formulae for the least squares estimators
to show that
b1 = Ȳm and b2 = Ȳf − Ȳm
where the m and f subscripts denote male and female respectively. (The
solution to this exercise is given as Appendix 8.1.)
Exercise 8.9
Repeat your analysis of life expectancy carried out in this chapter but
now including dummy variables to separately identify Sub-Saharan Africa,
Asia, North Africa and the Middle East, Eastern Europe and developed
countries. Compare your results with those obtained earlier.
Exercise 8.10
Using the data in the PRODFUN file, calculate labour productivity and
classify firms as having low, medium and high productivity. Is the level of
labour productivity independent from the sector in which a firm operates?
Appendix 8.1

∴ D − D̄ = nm/n   if D = 1
        = −nf/n  if D = 0    (A.8.3)

where nf and nm are the number of women and men respectively. The
denominator in equation (A.8.2) is therefore given by:

Σ (Di − D̄)² = nm (nf/n)² + nf (nm/n)² = nf nm / n    (A.8.4)

The numerator is:

Σi (Di − D̄) Yi = Σmales (Di − D̄) Yi + Σfemales (Di − D̄) Yi    (A.8.5)

= −(nf/n) Σmales Yi + (nm/n) Σfemales Yi    (A.8.6)

= (nf nm / n) (Ȳf − Ȳm)    (A.8.7)

Dividing (A.8.7) by (A.8.4) gives b2 = Ȳf − Ȳm, and b1 = Ȳ − b2 D̄ = Ȳm
then follows.
9 Logit transformation, modelling
and regression
9.1 INTRODUCTION
Up to now, the dependent variables in the economic models we discussed
were all measurement variables. But what if the dependent variable we
are interested in is categorical in nature? For example, we may be inter-
ested in investigating the main determinants of home ownership in an
urban setting. Alternatively, our interest may be to find out why some
rural labourers succeed in obtaining permanent jobs while others have to
depend on temporary work or casual jobs. In both examples the depen-
dent variables are categorical in nature. In fact, they are both dichotomous
variables. This chapter looks at dependent categorical variables which are
dichotomous in nature and, hence, can be represented by dummy vari-
ables. Our focus, therefore, is on regressions in which the dependent
variable is a dummy variable.
When dealing with a dichotomous dependent variable our main interest
is to assess the probability that one or another characteristic is present.
Does a labourer have a permanent job, or not? Does the household own
its home, or not? What determines the probability that the answer is yes,
or no? It is the latter question which we try to address when dealing with
logit regression. This is what makes logit regression, despite many simi-
larities, essentially different from the linear regression models we have
discussed so far. In multiple regression, for example, we try to predict the
average value of Y for given values of the independent variables with the
use of a regression line. In logit regression, however, our interest is to
predict the probability that a particular characteristic is present. Hence,
we do not predict whether Y equals 1 or O; what we predict is the prob-
ability that Y = 1 given the values of the independent variables.
But what are logits anyway? We start this chapter by showing that
logits provide a convenient means to transform data consisting of counts
of a dichotomous variable. Hence we can show the connection between
proportions (or percentages), odds and logits as alternative summaries
of observed counts of a dichotomous variable. We argue that the logit
transformation is a handy and user-friendly tool for data analysis,
Logit transformation, modelling, regression 303
independently from its further usefulness in the context of logit model-
ling and regression.
Next we turn to logit modelling in the context of multiway contingency
tables. You have already come across the concepts of contingency tables
and dependence or independence of categorical variables in the previous
chapter. In this chapter, we show how this analysis can be extended to
investigate the effects of one or more categorical variables (the indepen-
dent variables) on a dichotomous response variable (the dependent
variable). From here it is a small step to reach logit regression as a flex-
ible model to deal with regressions featuring a dummy variable as
dependent variable, which is done in the closing sections of this chapter.
The final section summarises the main points.
O = N / (n − N)    (9.2)

O = p / (1 − p)    (9.3)
For example, the odds favouring high profit enterprises in the urban small-
scale manufacturing sector in Tanzania are:

O_man,Dar = 0.0739 / (1 − 0.0739) = 0.0798

In this case, the odds are very similar in value to the corresponding propor-
tion: 0.0798 as against 0.0739. The reason is easy to understand once we
take another look at equation (9.3). For small values of p, the denomi-
nator (1 − p) will be approximately 1 and, hence, the odds are approx-
imately equal to the corresponding proportion.
A logit, L, is obtained by taking the logarithm of the odds. Hence:

L = log (O)    (9.4)
  = log [p / (1 − p)]    (9.5)
  = log p − log (1 − p)    (9.6)

In our example, the logit of high profit enterprises in urban informal manu-
facturing equals:

L_man,Dar = ln 0.0798 = −2.528
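The chain from proportion to odds to logit is two lines of code; the sketch below (Python, standard library only) reproduces the Dar es Salaam manufacturing figures:

```python
import math

def odds(p):
    """Odds favouring the characteristic, O = p / (1 - p)  (equation 9.3)."""
    return p / (1 - p)

def logit(p):
    """Logit, the natural log of the odds  (equation 9.4)."""
    return math.log(odds(p))

p = 0.0739                   # proportion of high profit enterprises
print(round(odds(p), 4))     # 0.0798
print(round(logit(p), 3))    # -2.528
```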
The difficulty some students have with logits is not to do with the way
they are calculated but with the question as to why we should bother computing
the logarithm of the odds in the first place. But, as we intend to show,
the logit transformation is a convenient way to deal with the analysis of
dichotomous categorical variables. Why?
Many people prefer to work with proportions (or, better still, percent-
ages). However, the problem with proportions is that they have clear
boundaries - a floor of zero and a ceiling of one - which can cause trouble
when working with either very small or very large proportions. In the
[Figure 9.1 Comparing proportions: box plots for rural and urban sectors, with urban transport ('tran') as far-outlier (Stata)]
case of our Tanzanian data, for example, most proportions of high profit
enterprises in the total number of enterprises turn out to be very small,
particularly in rural-based sectors. This can lead us astray when comparing
the variations in these proportions between urban and rural enterprises,
as shown in Figure 9.1. The comparative box plots show the variation
across sectors in the proportion of high profit enterprises in rural and
urban enterprises respectively. With the exception of urban transport as
a far-outlier, this figure conveys the impression that the variation across
sectors is fairly similar for rural and for urban enterprises. But, in fact,
the ratio of the highest to the smallest proportions in rural-based sectors
is far in excess of the same ratio for urban-based sectors, even if we include
the outlier. You can easily verify that this ratio equals 4.6 for urban-based
sectors as against 21.6 for rural-based sectors. The comparative box plots
shown in Figure 9.1 effectively hide this greater internal variability among
rural sectors since all its proportions are small so that the box plot is
squashed against the floor.
By contrast, Figure 9.2 shows the comparative box plots of the logits.
This plot shows much greater internal variability among rural-based
sectors as against urban-based sectors. Why is this? The answer lies in the
effects of the logit transformation on proportions. A proportion can range
from O to 1, that is:
306 Econometrics for developing countries
Figure 9.2 Comparing logits
0 ≤ p ≤ 1 (9.7)
but the logit transformation stretches the tails of this distribution, and,
hence:
-∞ ≤ L ≤ +∞ (9.8)
where L = 0 (i.e. O = 1) corresponds to a proportion equal to 0.50. Hence,
logits bring out significant variation among small or large proportions far
better than can be seen from merely looking at the proportions them-
selves. Indeed, in our example, the box plot of the logits tells us that, with
the exception of urban transport, urban-based sectors are far more homo-
geneous with respect to the prevalence of high profit enterprise than
rural-based sectors.
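This stretching effect is easy to see numerically. The sketch below uses illustrative proportions (not the actual Tanzanian sectoral figures), chosen so that the highest/lowest ratios match the 21.6 and 4.6 quoted above:

```python
import math

def logit(p):
    """The log of the odds: L = ln(p / (1 - p))."""
    return math.log(p / (1 - p))

# Illustrative sectoral proportions of high profit enterprises,
# chosen to mimic the highest/lowest ratios quoted in the text
rural = [0.005, 0.02, 0.06, 0.108]   # squashed against the floor of zero
urban = [0.10, 0.18, 0.30, 0.46]

print(max(rural) / min(rural))       # ratio of highest to lowest proportion
print(max(urban) / min(urban))

# On the logit scale the rural spread is no longer hidden
rural_spread = logit(max(rural)) - logit(min(rural))
urban_spread = logit(max(urban)) - logit(min(urban))
print(rural_spread > urban_spread)   # True
```

The same proportions that look uniformly 'small' near the floor are spread out once transformed, which is exactly what the comparative box plots of logits reveal.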
The logit transformation, therefore, helps us to detect patterns in counts
or proportions of dichotomous categorical variables, particularly in cases
where the characteristic we are interested in is either very rare or highly
prevalent. Fortunately, in the middle range (corresponding to proportions
around 0.50), proportions and logits tell a similar story. Consequently, as
Tukey (1977: 509) puts it, logits are 'first-aid bandages' when working with
counted fractions since, on balance, they allow us to bring out patterns
in the data which may be obscured when using proportions or percent-
ages. For this reason, logits are useful exploratory choices for data analysis
with counted fractions of categorical variables. But, as we shall see below,
logits also improve our capacity to model data when the dependent vari-
able is a dichotomous categorical variable.
Exercise 9.1
Use Table 9.2 to compute the proportion, odds and logits of informal
sector enterprises which had existed less than five years by 1991 in both
rural and urban areas in Tanzania. Make comparative box plots for each
of these cases.
Table 9.2 Tanzanian informal sector enterprises by sector, location and age
Urban Rural
                              Less than    5 or more    Less than    5 or more
                              5 years      years        5 years      years
Agriculture and fishing 39,788 24,687 34,471 43,163
Mining and quarrying 6,070 7,117 1,802 2,150
Manufacture 51,459 33,448 137,642 212,883
Construction 11,399 17,451 38,613 48,526
Trade, restaurants and hotels 308,312 96,881 341,017 179,995
Transport 5,411 2,460 31,674 9,181
C&P services 20,087 19,648 23,857 38,031
Source: Tanzania: The Informal Sector 1991, Planning Commission and Ministry of
Labour and Youth Development, ENT 5 (adapted).
The dichotomous categorical variable of the age of the enterprise (more or less than five years
old) divides up the total number of enterprises by the type of policy regime
in operation when they were started: roughly, before and after structural
adjustment policies were initiated.
Clearly, the data in this table can be used to explore questions about
structural changes in the spread of informal sector activities across sectors
before and after the implementation of structural adjustment policies
and across the rural/urban divide. Note, however, that the table cannot
tell you much about the growth in the number of enterprises over time
(before and after the era of structural adjustment policies) since the
mortality rates of informal sector enterprises at different points in time
are unknown. But since there are no clear a priori reasons to believe that
mortality rates differ markedly across either sectors or the rural/urban
divide, a table like this may allow us to make meaningful inferences with
respect to structural changes in informal sector activities. One question
which springs to mind, among others, is whether there was any inherent
bias in favour of trade among informal sector activities as a result of struc-
tural adjustment.
Logit modelling
How do logits come into our analysis? As we have seen, a logit is the
logarithm of the odds. In this example, we consider the odds in favour of
trading activities since the dichotomous 'trade/non-trade' variable is our
dependent variable. We therefore simplify Table 9.3 by introducing the
logarithms of the odds in favour of trade explicitly as the dependent vari-
able (Table 9.4).
How do we interpret this table? First start with the logits themselves.
For example, the logit in favour of trade for urban enterprises initiated
before structural adjustment equals -0.0787. To interpret this value it is
useful to work your way back to the corresponding odds and proportion.
In general, if L is the logit, the corresponding odds, O, is obtained as
follows:
Table 9.3 Informal sector enterprises by trade/non trade, age and location
Trade Non-trade Total
Age Urban Rural Urban Rural Urban Rural
Less than 5 years 308,312 341,017 134,214 268,059 442,526 609,076
5 or more years 96,881 179,995 104,811 353,934 201,692 533,929
Total 405,193 521,012 239,025 621,993 644,218 1,043,005
O = e^L (9.9)
which in this example yields:
O(≥5, urban) = e^-0.0787 = 0.924
To obtain the corresponding proportion, p, we proceed as follows. Since
O = p / (1 - p)
it follows that
p = O / (1 + O) (9.10)
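Equations (9.9) and (9.10) can be sketched directly in code; the value -0.0787 is the logit in favour of trade for urban enterprises five years old or more, as computed above:

```python
import math

def odds_from_logit(L):
    """Equation (9.9): O = e^L."""
    return math.exp(L)

def proportion_from_odds(O):
    """Equation (9.10): p = O / (1 + O)."""
    return O / (1 + O)

L = -0.0787                       # logit in favour of trade, urban, 5 or more years
O = odds_from_logit(L)
p = proportion_from_odds(O)
print(round(O, 3), round(p, 3))   # 0.924 0.48
```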
Exercise 9.3
Calculate the odds ratios in favour of trade:
1 with respect to urban and rural location of enterprises initiated after
structural adjustment policies;
2 with respect to policy regime (before/after structural adjustment poli-
cies) for urban areas; and
3 with respect to policy regime for rural areas.
Now, the important thing to note is that simple differences of logits are
in fact logarithms of odds ratios. This result is very convenient since it
greatly enhances our ability to make sense of cross-tabulations of logits,
as is done, for example, in Table 9.4. To see this, we start with the odds
ratio defined as follows:
Ω = O1 / O0 (9.12)
Taking logarithms:
ln Ω = ln(O1 / O0) = ln O1 - ln O0 = L1 - L0 (9.13)
Conversely, if we computed both L1 and L0, the odds ratio can be obtained
by calculating the anti-logarithm of their difference:
Ω = e^(L1 - L0) (9.14)
This simple result now explains why, in Table 9.4, we added a final row
and a final column of differences of the logits tabulated across dichoto-
mous explanatory variables. Take, for example, the difference 0.9104
obtained by subtracting the logit in favour of trade for urban enterprises
of five years old or more from the logit of similar enterprises less than
five years old. The odds ratio for urban areas in favour of trade as a result
of the shift in policy regime is then obtained as follows:
Ω(<5/≥5, urban) = e^0.9104 = 2.485
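The full chain - logits from the counts in Table 9.3, differences of logits as log odds ratios, and the difference of differences as the interaction effect - can be checked with a short sketch (underscores in the number literals are only digit separators):

```python
import math

def logit(n_trade, n_nontrade):
    """Logit in favour of trade: the log of the odds."""
    return math.log(n_trade / n_nontrade)

# Counts from Table 9.3 (trade, non-trade)
L_urban_young = logit(308_312, 134_214)   # urban, less than 5 years
L_urban_old   = logit(96_881, 104_811)    # urban, 5 or more years
L_rural_young = logit(341_017, 268_059)
L_rural_old   = logit(179_995, 353_934)

# A difference of logits is the log of an odds ratio, (9.13)-(9.14)
d_urban = L_urban_young - L_urban_old     # about 0.9104
d_rural = L_rural_young - L_rural_old     # about 0.9169
odds_ratio_urban = math.exp(d_urban)      # about 2.485

# Differencing once more gives the log of the ratio of odds ratios:
# the interaction effect, here very close to 1 (no interaction)
interaction = math.exp(d_rural - d_urban)
```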
Exercise 9.4
Using Table 9.4, show that the odds ratios between rural and urban loca-
tion in favour of trade were very similar for enterprises started before
and after structural adjustment.
Interaction effect
But what about the computed value in the bottom-right corner of Table
9.4? This value is obtained by taking the difference of differences of logits.
Therefore, it involves differencing twice. Note, however, that it does not
matter whether we take the difference between row-differences or between
column-differences: both yield the same result. But how do we interpret
such a difference of differences? The interpretation is quite straightforward
notwithstanding the apparent complexity of differencing twice. The differ-
ence between two logits is the logarithm of an odds ratio; differencing
one more time yields the logarithm of the ratio of odds ratios. What this
measures is the interaction effect of both explanatory variables on the
dependent variable.
In Table 9.4 we already noted that the column differences (respectively,
0.9104 and 0.9169) are very similar in magnitude. Consequently, the odds
ratio in favour of trade as a result of a change in policy regime is very
similar for both urban and rural areas. In other words, rural and urban
location does not interact with the change in policy regime (measured by
the age variable) as far as their impact on the prevalence of trading activ-
ities within the Tanzanian informal sector is concerned. The interaction
effect is measured by the logarithm of the ratio of the odds ratios. To
obtain the latter ratio, we only need to take the antilogarithm of the differ-
ence of differences between the logits in the table, as follows:
e^(0.9169 - 0.9104) = e^0.0065 = 1.007
Exercise 9.5
Can you explain why the R 2 of the regression without constant term turns
out to be much higher than that of the regression with constant term,
although (in fact) the residual sums of squares of both regressions turned
out to be very close together? (Hint: Keep in mind that in a regression
through the origin the total sum of squares of the regression is not the
sum of squared deviations from the mean of the dependent variable (as
is the case when the regression features a constant term), but the sum of
squared values of the dependent variable, which in this case is a dummy
variable.)
The scatter plot with this regression line is shown in Figure 9.3. This plot
looks quite different from a scatter plot where both variables are measure-
ment variables, since all data points have a Y value 1 or O. The problem
with this plot, however, is that with a large sample it is quite possible that
different data points end up being superimposed on one another because
many of the workers concerned are likely to have the same age (measured
in years). To be able to see the distribution of the data points, therefore,
it is advisable to jitter the points a bit so that the thickness of the scatter
comes to the fore, as in Figure 9.3. (If your computer package will not
do this automatically then add a small random number to your dummy
variable for the purpose of drawing the graph.)
The regression line predicts the probabilities that the Fi's equal 1 for a
given age. Hence, at age 15, our regression line predicts that the proba-
bility of encountering a worker with a permanent job is about 0.19, while
at age 55 the corresponding probability has risen to 0.70. This is why this
type of regression is called the linear probability model: it expresses the
probability that the dependent variable equals 1 for different values of the
explanatory variable(s). In this case, age clearly matters for whether a worker
has a permanent job.
Figure 9.3 Scatter plot of the permanent-work dummy against age (jittered), with the linear probability regression line
Yi = β1 + β2 Xi + εi (9.18)
Yi             0          1
Probability    1 - Pi     Pi          (9.20)
Figure 9.5 Comparative box plot of the age of worker by type of work (manual workers and operators)
The predicted probabilities for any given age within the age range from 10 to 70 years were all less
than or equal to 1, and greater than or equal to 0. But this does not always happen, as can
be shown with a similar example taken from another part of the world
and in a different context.
This example concerns the selection of machine operators from a pool
of casual manual workers in Maputo harbour at the time of independence.
The problem emerged when, in Mozambique, many Portuguese settlers
who previously occupied most of the skilled jobs left the country in large
numbers after independence in 1975. The result was a grave shortage of
skilled labour in Maputo harbour, mainly of machine operators. A solu-
tion was sought by accelerated training and upgrading of workers selected
from a large pool of casual manual labourers. The age of the worker
appears to have been an important criterion for selecting trainee opera-
tors from among the casual workforce. To verify this hypothesis a stratified
random sample was taken of these two groups of workers: newly trained
operators and the remaining casual labour force.
As in the previous example, we have a dependent dummy variable, O,
which equals 1 if the worker is an operator and O otherwise (indicating
that the worker is a manual labourer). As before, the explanatory vari-
able in the model is the age of the worker in years. The hypothesis we
seek to test is whether selection favoured younger workers. In fact, the
comparative box plot of the age of worker by type of work reveals that
Figure 9.6 Linear probability model and logit regression: Maputo worker data
age matters, as shown in Figure 9.5. However, since age clearly is the
explanatory variable in this case we prefer a model which features the
type of worker as qualitative dependent variable. Estimating the linear
probability model with ordinary least squares yields the following results:
Oi = 1.158 - 0.0204 Ai
     (0.187)  (0.0045)                (9.22)
R2 = 0.14; No. of observations = 129
a regression which confirms our hypothesis that the selection procedure
for upgrading workers favoured younger workers. But this regression is
in fact quite problematic, as can be seen from Figure 9.6 which depicts
the scatter plot with the corresponding regression line.
The problem is that our regression predicts negative probabilities from
the age of 57 years and above. A worker who was about 25 years old at
the time of independence had an estimated probability of 0.65 while a
worker of 56.5 years old would not be selected (zero probability). Above
the latter age, the probabilities predicted by this regression turn negative.
A further problem is that we can no longer apply the two-step procedure
of weighted regression to correct for heteroscedasticity either, since some
of our weights turn out to be negative.
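A quick sketch of the fitted equation (9.22) reproduces these out-of-range predictions:

```python
def lpm_prob(age):
    """Fitted linear probability model (9.22): P(operator) = 1.158 - 0.0204*age."""
    return 1.158 - 0.0204 * age

print(lpm_prob(25))     # about 0.65 for a 25-year-old
print(lpm_prob(56.5))   # essentially zero
print(lpm_prob(60))     # negative: not a valid probability
```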
Figure 9.7 Linear probability model and logit regression: Indian worker data
Exercise 9.6
Using the data set INDIA (Indian worker data) estimate the linear prob-
ability model between F and the age of worker, respectively, for all data
points, for male workers and for female workers. Construct the corres-
ponding scatter plots with regression line.
Pi^Yi (1 - Pi)^(1-Yi) (9.24)
and, in the case of random sampling where all observations are sampled
independently, the likelihood function will simply be the product of the
individual contributions, as follows:
L = ∏(i=1 to n) Pi^Yi (1 - Pi)^(1-Yi) (9.25)
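As a minimal sketch (with made-up data, and no attempt at maximisation), the log of the likelihood in (9.25) for a two-parameter logit model can be evaluated as follows; a maximum likelihood routine would search for the parameter values that maximise this quantity:

```python
import math

def log_likelihood(b1, b2, xs, ys):
    """Log of (9.25): sum of Y*ln(P) + (1 - Y)*ln(1 - P), with logistic P_i."""
    ll = 0.0
    for x, y in zip(xs, ys):
        p = 1.0 / (1.0 + math.exp(-(b1 + b2 * x)))
        ll += y * math.log(p) + (1 - y) * math.log(1 - p)
    return ll

# Made-up sample: ages and a 0/1 dependent dummy
ages = [18, 25, 33, 41, 50, 62]
ys   = [0, 0, 1, 0, 1, 1]

# With b1 = b2 = 0 every P_i = 0.5, so the log likelihood is n*ln(0.5)
print(log_likelihood(0.0, 0.0, ages, ys))   # 6 * ln(0.5) ≈ -4.159
```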
Exercise 9.7
Show that workers with post-secondary schooling are in an even more
advantaged position. To do this, compute the relevant odds ratio.
What is the odds ratio in favour of fixed work between workers with post-
secondary schooling and those with secondary schooling as the highest
educational attainment? This ratio can be computed as follows:
Ω(D4=1/D3=1) = [O(D4=1)/O(no schooling)] / [O(D3=1)/O(no schooling)]
             = e^3.563 / e^1.124 = 11.46
p = 1 / (1 + 1/e^-0.885) = 0.29
= -2 (-151.10 + 150.900)
= 0.41 (9.29)
a value which leads us to accept the null hypothesis that the restricted
version of the model can be maintained.
As shown in Tables 9.5 and 9.6, the computer output of a logit regres-
sion will generally report a chi-square statistic which tests the hypothesis
whether all variables (apart from the constant term) can be dropped from
the regression. Consequently, the corresponding number of degrees of
freedom equals the number of variables included in the model (respec-
tively, five in the unrestricted version and three in the restricted variant
of the model). In both cases the null hypothesis that all variables can be
dropped is rejected by the data. Some computer programs may also report
a pseudo R 2 for logit regression. Following Aldrich and Nelson (1984,
quoted in Hamilton, 1992: 233) this pseudo R 2 is computed as:
(9.33)
(9.34)
wj Lj = β1 wj + β2 wj X2j + ... + βm wj Xmj (9.36)
In doing so, make sure not to introduce an explicit constant term when
carrying out regression (9.36).
Figure 9.8 Conditional effects plots
conditional effect plot, meaning that the plot is conditional upon both D3
and D4 being equal to O. Similarly, we could construct a conditional effect
plot for D3 = 1 and D4 = O and another one for D3 = O and D4 = l.
Figure 9.8 shows these three conditional effect plots all on one graph.
Each curve plots the conditional probabilities (given the level of educa-
tional attainment) against the age of the worker. Distances between the
curves at a given age show the importance of different levels of educa-
tional attainment. In this case, a higher level of educational attainment
increases the predicted probability of getting a permanent job. Notice, in
particular, that post-secondary education strongly boosts the probabilities
of getting permanent work.
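The mechanics of such a plot can be sketched as follows; the coefficient values are invented for illustration and are not the estimates reported in Table 9.6:

```python
import math

# Hypothetical logit coefficients: intercept, age, D3 (secondary),
# D4 (post-secondary). NOT the estimates from Table 9.6.
b0, b_age, b_D3, b_D4 = -2.5, 0.04, 1.1, 3.6

def p_permanent(age, D3=0, D4=0):
    """Conditional probability of permanent work, given age and schooling dummies."""
    z = b0 + b_age * age + b_D3 * D3 + b_D4 * D4
    return 1.0 / (1.0 + math.exp(-z))

# One curve per education level, evaluated over the age range;
# plotting each list against age reproduces a conditional effects plot
ages = range(10, 81, 5)
curve_none      = [p_permanent(a) for a in ages]        # D3 = D4 = 0
curve_secondary = [p_permanent(a, D3=1) for a in ages]
curve_post_sec  = [p_permanent(a, D4=1) for a in ages]
```

Because the post-secondary coefficient exceeds the secondary one, the post-secondary curve lies above the other two at every age, which is the pattern described for Figure 9.8.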
Logit regressions, like linear regressions, can provide us with useful sum-
maries which give us insight into the data. But they can also mislead us
in much the same way as linear regressions do. Problems like multi-
collinearity, non-linearity, leverage and influence also arise in logit regres-
sion. In general, it is a good idea not to attempt logit regression when one
or more of the explanatory measurement variables is strongly asymmetric.
It is best to transform this variable to avoid the problem of leverage and
influence. Furthermore, we should always supplement our logit regressions
with residual analysis, for which we may use Pearson residuals.
The Pearson residuals (Hamilton, 1992: 235-8) in logit regression are
not calculated with respect to each individual case, but with respect to
identical combinations of X values (or X patterns). If J denotes the
number of unique X patterns (J ≤ n) and mj denotes the number of cases
with the jth X pattern, the Pearson residual is defined as:
rj = (Yj - mj P̂j) / √[mj P̂j (1 - P̂j)] (9.37)
where P̂j is the predicted probability of cases with the jth X pattern and
Yj is the sum of observed Y values for cases with the jth X pattern.
At first sight, the Pearson residual may appear forbidding, but its
rationale is easy to grasp. In the numerator each group j (which may
include only one observation or more) reflects observations on Y, the
dependent dummy variable, for identical values on the explanatory
variables, X. Hence, mj P̂j = the predicted number of successes in this
group, and Yj = the actual (observed) number of successes, such that the
difference is the unexplained number of successes. The denominator is
just the standard deviation of the binomial distribution of the number of
successes.
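A sketch of equation (9.37) for a single, hypothetical X pattern (the counts are invented):

```python
import math

def pearson_residual(Y_j, m_j, P_j):
    """Equation (9.37): (Y_j - m_j*P_j) / sqrt(m_j*P_j*(1 - P_j))."""
    return (Y_j - m_j * P_j) / math.sqrt(m_j * P_j * (1 - P_j))

# Hypothetical X pattern: 25 cases, 9 observed successes,
# and the model predicts a probability of 0.3 for this pattern
r = pearson_residual(9, 25, 0.3)     # (9 - 7.5) / sqrt(5.25)
contribution = r ** 2                # this pattern's term in (9.38)
```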
The Pearson chi-square statistic (ibid.: 236) is the sum of squared
Pearson residuals:
X²P = Σ(j=1 to J) rj² (9.38)
Figure 9.9 Poorness of fit plot for logit regression
The closer Pearson's chi-square is to zero, the better the model fits the
data. In fact, in a saturated model Pearson's chi-square statistic will be
zero since all residuals equal O.
We can now obtain a measure of 'poorness-of-fit' of a particular X
pattern by calculating the change in Pearson's chi-square that results from
deleting all cases with this particular X pattern. Hence:
Ll X2p(j) = the change in Pearson's chi-square as a result
of dropping all cases with the jth X pattern (9.39)
A 'poorness-of-fit' plot can be obtained by plotting the change in Pearson's
chi-square as a result of deleting all cases with X pattern j against the
predicted probabilities, P̂j. Figure 9.9 provides such a plot for the logit
regression in Table 9.6.
When data points are situated near the horizontal axis in this type of plot
it means that the predicted probability does not differ much from the
observed probability and, hence, Pearson's chi-square will hardly change as
a result of deleting those cases which correspond to this particular X
pattern. A point which lies far above the horizontal axis fits the model
poorly. If it is situated on the right-hand side where the predicted proba-
bility equals one, it means that the actual data show that the particular char-
acteristic (Y = 1) is not (predominantly) present among cases with this
configuration of X values. An outlier on the left-hand side would indicate
that the data show that the characteristic is present while the model
predicts that it is not.
In fact, the outlier in Figure 9.9 corresponds to only one individual.
This was a man, 47 years old, with post-secondary education, who did not
have a permanent job. Our model predicts that the corresponding prob-
ability of being in a permanent job is very high, which explains the large
change in Pearson's chi-square provoked by deleting this observation from
the sample. Diagnostic plots such as that in Figure 9.9 allow us to spot
the unusual in a large data set. Once more, this illustrates the advantage
of using graphical tools in analysis.
Exercise 9.8
Use the data set INDIA (Indian worker data) to redo the estimation of
the logit model discussed above along with the conditional effect plots
and poorness-of-fit plot.
10.1 INTRODUCTION
In Part III we discussed problems specific to cross-section analysis. We
now turn our attention to time-series data. In such data the assumption
that the error terms from successive observations are uncorrelated is
frequently invalid; that is, investigation will find the residuals to be auto-
correlated. A likely cause of autocorrelation (which is itself discussed more
fully in Chapter 11) is that the series are non-stationary - a concept which
is defined in section 10.2.
Non-stationarity is a very serious matter: regression of one non-
stationary variable on another is very likely to yield impressive-seeming
regression results which are wholly spurious. Section 10.3 illustrates how
spurious regression may arise and section 10.4 presents formal tests for
stationarity. One possible means of avoiding spurious regression is to
transform the data so as to make them stationary - such transformations
are discussed in section 10.5. (The other solution to the problem is the
application of cointegration techniques which allow the estimation of non-
spurious regressions with non-stationary data; these techniques are
discussed in Chapter 12.) Section 10.6 summarises the main points from
the chapter.
Figure 10.1 Consumer price index (1987 = 100), Tanzania
Figure 10.2 Real export growth (per cent), 1970-88
If the moments are not time invariant then the variable is non-stationary.
The fact that many socioeconomic data are non-stationary has the very
serious consequence that regression results calculated with such series may
be spurious.
Figure 10.3 Inflation and money supply: scatter plot of CPI against M2,
1966-92
Figure 10.5 AR(1) process: β1 = 0; β2 = 0.5
Trends, spurious regressions and stationarity 341
Figure 10.6 shows a simulation of the random walk. Despite the fact that a random walk is gener-
ated by a succession of unrelated error terms the series displays apparent
trends. In this example the disproportionate number of negative errors
for observations 30-60 creates an apparent strong downward trend over
this period. Even a quite short run of error terms of the same sign will
send a random walk off on a seeming trend.
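Such series are easy to simulate. The sketch below generates the AR(1) process Xt = b1 + b2*X(t-1) + e_t with standard normal errors; with b2 = 1 it produces a random walk whose long runs of same-signed errors look like trends:

```python
import random

def ar1(n, b1=0.0, b2=1.0, x0=0.0, seed=1):
    """Simulate X_t = b1 + b2*X_{t-1} + e_t with standard normal errors."""
    rng = random.Random(seed)
    x, series = x0, []
    for _ in range(n):
        x = b1 + b2 * x + rng.gauss(0.0, 1.0)
        series.append(x)
    return series

walk = ar1(60)                 # b2 = 1: a pure random walk
stationary = ar1(60, b2=0.5)   # |b2| < 1: fluctuates around zero
# Driven by the same error draws, the random walk typically wanders
# much further from zero than the damped, stationary series
print(max(abs(x) for x in walk), max(abs(x) for x in stationary))
```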
If the β2 coefficient exceeds unity then this 'trend' factor dominates the
series, since the variable in each period is its own past value times some
multiple. As may be seen from Figure 10.7, in which β2 = 1.1, the resulting
escalation in the X value soon drowns out the error term. The similarity
between this graph and that of the CPI for Tanzania (Figure 10.1) should
be readily apparent. It seems that the AR(1) model with a coefficient of
greater than unity would be a good representation of this data series.
A negative value of β2 results in oscillations, since the negative coeffi-
cient will reverse the sign of X in each successive period - though it will
sometimes happen that the error term will cancel out this reversal so that
two or more X values of the same sign may be observed. This cancelling
out effect is more likely if β2 is relatively small, resulting in a pattern of
oscillations which will appear more muted. In Figure 10.8 β2 = -0.5 and
the 'jagged' nature of the time series is obvious.
What difference is made by non-zero values of β1? In the case when
β2 = 0 then X will display random fluctuations around a mean of β1. If
Figure 10.6 AR(1) process: β1 = 0; β2 = 1 - random walk
Figure 10.7 AR(1) process: β1 = 0; β2 = 1.1
Figure 10.8 AR(1) process: β1 = 0; β2 = -0.5
β2 = 1: Xt = X0 + β1 t + Σ(i=1 to t) εi (10.6)
From this expression we can see that the value of X at time t depends on
three factors: (a) the initial value, X 0 ; (b) the amount of drift; and (c) the
sum of the current and all past error terms. Hence an error in one period
affects the value of X in all future periods: the series has a perfect memory.
This feature of a random walk may be contrasted with modelling X as
following a deterministic trend, that is:
Xt = X0 + β1 t + εt (10.7)
Superficially equation (10.7) looks very like equation (10.6), but in fact
the structure of the error term is wholly different. In the trend model, X
follows the trend with random fluctuations about that trend - but it always
returns to the trend as an error in one period is immediately 'forgotten'
in the subsequent period. By contrast, in a random walk the error is
embodied in all future values of X so that the series may stray a long way
from its underlying 'trend' value, and is not 'drawn back' to the trend as
Table 10.1 Series resulting from different values of β1 and β2 in AR(1) model

              β1 = 0                           β1 ≠ 0
β2 = 0        X is just random error in        X fluctuates in random manner
              each period; no pattern          around mean of β1
              will be discernible
0 < β2 < 1    X fluctuates around 0 with       X fluctuates around mean of
              'some memory', resulting         β1/(1 - β2) with some patterns
              in short patterns
0 > β2 > -1   X fluctuates around 0 in an      X fluctuates around mean of
              oscillatory manner               β1/(1 - β2) in an oscillatory
                                               manner
β2 = 1        Random walk                      Random walk with drift
β2 > 1        Explosive (exponential) growth   Explosive (exponential) growth
β2 < -1       Ever larger oscillations         Ever larger oscillations
That is, the expected (squared) difference between X and its expected
value increases with time, which is to be expected as the 'history of errors'
will have a greater cumulative impact, allowing the series to wander away
from its underlying trend as time goes by.
Now consider the case of a variable with |β2| < 1. In equation (10.5),
as t tends to infinity X can be written as:
|β2| < 1: Xt → β1/(1 - β2) + Σ(i=0 to ∞) β2^i ε(t-i) as t → ∞ (10.10)
The initial value is no longer relevant as the series forgets about events
too far in the past. Similarly the impact of a specific error term decreases
as it moves further into the past. The second component from equation
(10.5) is in this case equal to the expected value of β1/(1 - β2). The vari-
ance, as t → ∞, is given by:
E[Xt - E(Xt)]² = Σ(i=0 to t-1) Σ(j=0 to t-1) β2^i β2^j E[ε(t-i) ε(t-j)]
Exercise 10.2
Use a spreadsheet to construct an AR(1) model and examine each of the
ten possible parameter combinations shown in Table 10.1.
Exercise 10.3
Using the graphs you drew for exercise 10.1, make an assessment of prob-
able parameter values for each series on the assumption that they may
be described by an AR(1) process.
Exercise 10.4
Use a spreadsheet to construct two random walks (generated by indepen-
dent error terms) and regress one on the other. Note down the resulting t-
statistic. Repeat a further 49 times. In what percentage of cases do you find
a significant result? What are the implications of your analysis for OLS
regressions using time-series data? (You may also do this exercise using an
econometric package with the facility to generate random numbers. If you
do use such a package you should also check your regression results to note
in which case R 2 > DW.)
Exercise 10.5
Can the problem of spurious regression arise in cross-section data?
Explain.
The fact that many economic series appear to follow a random walk
has two important implications. First, there do not exist long-run trend
values of these variables, from which they may depart but to which they
eventually return - a series of shocks can send the variable off on a wholly
different path for the rest of time. Second, as already mentioned, OLS
cannot be applied to non-stationary series.
The inapplicability of OLS is equally relevant to trend stationary
processes as to difference stationary processes. But if a variable were to
be a trend stationary process then OLS would be applicable to the
detrended series for that variable. But detrending a difference stationary
process, such as a random walk, does not make it stationary - so OLS is
still invalid. To apply regression techniques to a DSP, differencing must
be applied to make the series stationary. Differencing a TSP will also
result in a stationary series. That is, we can treat a TSP as if it were a
DSP and achieve valid results. But if we treat a DSP as a TSP we would
be led to the invalid application of OLS.
As mentioned, the first difference of a non-stationary variable may well
be stationary. Figure 10.13 shows the first differences for the log of the
Tanzanian CPI and M2 (as we discuss on p. 360 these variables approxi-
mate the rate of inflation and monetary growth respectively). Whereas
Figure 10.1 showed the logged CPI to be clearly non-stationary the same
is not so obvious for its first differences.
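First differencing is trivial to implement, and a simulated random walk with drift shows why it works: the differenced series is just the drift parameter plus white noise, which is stationary.

```python
import random

def first_difference(series):
    """d_t = x_t - x_{t-1}: the first difference of a series."""
    return [b - a for a, b in zip(series, series[1:])]

# Simulate a random walk with drift (non-stationary)
rng = random.Random(2)
drift = 0.1
walk = [0.0]
for _ in range(200):
    walk.append(walk[-1] + drift + rng.gauss(0.0, 1.0))

d = first_difference(walk)
# The mean of the differences estimates the drift parameter
print(sum(d) / len(d))
```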
Figure 10.13 First differences in logged Tanzanian CPI and M2, 1967-92
Xt = β1 + β2 t + β3 X(t-1) + β4 ΔX(t-1) + ε(1,t) (10.15)
from which the RSS is the unrestricted sum of squares RSSu, shown for
each of the three series as the top line of Table 10.3. To examine if the
series is a random walk without drift we impose the restrictions β1 = β2
= 0 and β3 = 1, so that we estimate:
ΔXt = β4 ΔX(t-1) + ε(2,t) (10.16)
and use the resulting RSS (RSSR1 in Table 10.3) to calculate the usual
F-test (see Chapter 6). The appropriate critical value is given by the test-
statistic Φ2.
In the case of ln(CPI) and ln(M2) the critical value at the 5 per cent
level is 5.68, and is a bit larger than this for XGROW. (The 10 per cent
critical value for a sample size of 25 is 4.67.) Hence we accept the null
hypothesis in the case of ln(CPI) and reject in the case of ln(M2). The
result for XGROW is close to the critical value. Given the weakness of
the test in smaller samples it is probably best to reject the null and proceed
to the next stage.
Accepting the null hypothesis indicates the variable to be a random
walk without drift. Not shown in the decision tree is the suggestion of
Holden and Pearson that we may now verify this result by imposing the
restriction that β2 = 0 in equation (10.15). The RSS from this equation is
then used to test the joint restriction β1 = 0 and β3 = 1 (for which the
appropriate RSS has already been derived, i.e. RSSR1) using the critical
values given by Φ1. In the case of ln(CPI), the RSS, once β2 is dropped
from equation (10.15), is 0.082, resulting in a calculated F-statistic of 1.77.
This value is below the critical value of 5.18 at the 5 per cent level, so
we accept the null hypothesis, thus confirming that ln(CPI) is a random
walk without drift.
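The mechanics behind Φ₂ are the familiar restricted-versus-unrestricted comparison of sums of squares. A minimal sketch of the arithmetic (the RSS values below are hypothetical placeholders, not the figures of Table 10.3):

```python
def f_statistic(rss_restricted, rss_unrestricted, n_restrictions, n_obs, n_params):
    """Standard F-test comparing a restricted and an unrestricted regression."""
    numerator = (rss_restricted - rss_unrestricted) / n_restrictions
    denominator = rss_unrestricted / (n_obs - n_params)
    return numerator / denominator

# Hypothetical RSS values, for illustration only:
F = f_statistic(rss_restricted=0.095, rss_unrestricted=0.060,
                n_restrictions=3, n_obs=25, n_params=4)
# Compare F with the Phi_2 critical value (5.68 at the 5 per cent level
# for this sample size), not with the standard F tables.
```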
Step 2 If the null from Step 1 is rejected we test instead whether the series
may be a random walk with drift. That is, β₂ = 0 and β₃ = 1, so we estimate:
ΔX_t = β₁ + β₄ΔX_{t-1} + ε_{3,t}    (10.17)
The resulting RSS is shown as RSSR2 in Table 10.3 (not for ln(CPI), which
has been established to be a random walk without drift). Relaxing the
restriction that the intercept term is zero greatly reduces the RSS for the
ln(M2) series. Again the F-test is used, with the appropriate critical value
given by Φ₃, which is 7.24 at the 5 per cent level for a sample size of 25.
Clearly we can accept the null for ln(M2), but have to reject for XGROW.
Logged money supply is non-stationary, that is, it follows a random walk
with drift.
If the null is accepted we may wish to verify the result by calculating
the t-test for the hypothesis that β₃ = 1, using the critical value τ₃.
Estimation of equation (10.15) for ln(M2) results in an estimate for β₃ of
0.9088 with a standard error of 0.1074. The resulting t-statistic is -0.849,
compared to the critical value at 5 per cent of -0.80. This result casts
some doubt on the conclusion that ln(M2) is a random walk, but given
the marginal nature of the result (the null hypothesis is easily acceptable
at the 10 per cent level) and the serious consequences of treating a non-
stationary series as a stationary one, we had best proceed on the assumption
that ln(M2) is indeed a random walk with drift.
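The verification is a single line of arithmetic; as a check, using the estimate and standard error quoted above:

```python
# beta_3 estimate and standard error for ln(M2), as reported in the text
beta3_hat, se = 0.9088, 0.1074
t_stat = (beta3_hat - 1) / se
print(round(t_stat, 3))  # -0.849, as reported
```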
Step 3 If the null from Step 2 is rejected, the null hypothesis that
β₃ = 1 should now be tested with a t-statistic compared to a critical value
given by the standard normal tables. The estimated coefficient in the case
of XGROW is -0.4626, so the calculated t-statistic is -4.10. This value is,
in absolute terms, far greater than the N(0,1) critical value of 1.96 at the
5 per cent level, so we reject the null.
Step 4 If the null from Step 3 is rejected we know that the variable does
not have a unit root. We can test if it is stationary around a constant mean
or around a deterministic trend (i.e. is a trend stationary process, TSP).
Since there is no unit root, the usual t-tests are valid and we may use the
t-statistic, compared to the critical value from the usual t-tables, to test the
null hypothesis that β₂ = 0. The estimated coefficient on the time trend
in equation (10.15) for the XGROW series is 0.225, so that the SE of
0.834 results in a calculated t-statistic of 0.27. This value is far below the
critical value of 3.16 at the 5 per cent level with 13 degrees of freedom,
so we accept the null, i.e. there is no deterministic trend.
356 Econometrics for developing countries
It therefore seems that XGROW is stationary (confirming our initial
visual impression), but it should be recalled that the application of these
techniques to a sample of less than 25 observations is not strictly valid.
The problem of small sample size is one that applied work, especially with
developing-country data, continually runs up against. Data quality is
poor, and where there are data, there are not many. Modern time-series
techniques require large samples, which we do not have, but traditional
techniques are simply invalid. There is no ready solution to this dilemma.
An important caveat to this procedure is that the variable may be I(2)
(most likely with price series), in which case the restricted model will
also be rejected at Steps 1 and 2. However, in this case the t-statistic in
Step 3 will be positive rather than negative (i.e. the estimated coefficient
is greater than one), so the variable is non-stationary. We may then check
the stationarity of the first differences.
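The whole decision tree rests on repeated OLS estimation of the unrestricted regression. A sketch of how equation (10.15) can be estimated (the variable handling is ours; remember that inference requires the non-standard Φ and τ critical values, not the usual tables):

```python
import numpy as np

def adf_regression(x):
    """OLS estimates of the unrestricted regression (10.15): x_t on a
    constant, a time trend, the lagged level and the lagged difference."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)                      # dx[i] = x[i+1] - x[i]
    y = x[2:]                            # dependent variable x_t
    X = np.column_stack([
        np.ones(len(y)),                 # beta_1: intercept
        np.arange(2, len(x)),            # beta_2: time trend
        x[1:-1],                         # beta_3: lagged level x_{t-1}
        dx[:-1],                         # beta_4: lagged difference
    ])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# A generated random walk should give beta_3 close to (but biased below) one:
rng = np.random.default_rng(0)
rw = np.cumsum(rng.normal(size=200))
print(adf_regression(rw)[2])
```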
Exercise 10.5
Replicate the results shown in Table 10.3.
Exercise 10.6
Test the stationarity of the data series you compiled for exercise 10.1.
Exercise 10.7
Use a spreadsheet to create two series, one following a trend with a
stochastic component (a TSP series) and the other a random walk with
drift (a DSP series). Alternatively you may use the data given in Appendix
10.1 for this exercise. Carry out an ADF test on: (a) each of the two series;
(b) the residuals from regressing each series on a constant and time;
and (c) the first differences of each series. Comment on your findings.
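The two artificial series of exercise 10.7 can equally be generated in a few lines rather than a spreadsheet; a sketch, with arbitrary intercept, drift and error variance:

```python
import numpy as np

rng = np.random.default_rng(123)
n = 50
shocks = rng.normal(size=n)

# TSP: stationary fluctuations around a deterministic trend
tsp = 10 + 0.5 * np.arange(n) + shocks

# DSP: random walk with drift, so the shocks accumulate
dsp = 10 + np.cumsum(0.5 + shocks)
```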
Figure 10.15 Box plot for successive sub-samples of untransformed CPI data
We could proceed by trial and error using the ladder of transforma-
tions, but section 7.4 above illustrated an easier way to determine which
transformation is likely to do the trick, which may be explained as follows.
Equation (7.4) was:

σ_x = c μ_x^k    (10.18)
where σ_x is the standard deviation of variable x, μ_x its mean and k a
constant indicating the data transformation required to stabilise the vari-
ance of the series. This equation suggests a practical method to check
which transformation is likely to be best. First, plot the logarithm of a
measure for spread against the logarithm of a measure for level (average)
for successive slices of a time series. Usually we use the interquartile range
and the median instead of the standard deviation and the mean because
the former are more robust. If the successive points (plotted according
to this method) roughly approximate a straight line, the pattern of
covariation between level and spread is stable through time: the log-linear
model given by equation (10.18) is the appropriate one. Having plotted
this line, computing its slope indicates which transformation is likely to
do the trick, since this slope corresponds to the constant k.
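The calculation is easily automated; a sketch (the split into three equal slices and the use of np.polyfit for the slope are our choices):

```python
import numpy as np

def spread_level_slope(series, n_slices=3):
    """Slope of log(IQR) on log(median) across successive slices of a
    series; the slope estimates k and so suggests the transformation."""
    slices = np.array_split(np.asarray(series, dtype=float), n_slices)
    medians = [np.median(s) for s in slices]
    iqrs = [np.percentile(s, 75) - np.percentile(s, 25) for s in slices]
    return np.polyfit(np.log(medians), np.log(iqrs), 1)[0]

# An exponentially growing series gives a slope of one,
# pointing to the log transformation:
x = np.exp(0.2 * np.arange(30))
print(round(spread_level_slope(x), 2))  # 1.0
```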
Figure 10.16 plots the logarithms of the interquartile range for each
slice against the logarithm of its corresponding median (see Table 10.4).
The scatter of the three points shows a steady increase which can be
approximated by a straight line.
Figure 10.16 Logged IQR against logged median for Tanzanian CPI
Table 10.4 Medians and interquartile ranges

            Median            Interquartile range
            Value     Log     Value     Log
1966-74     4.40      1.48    1.85      0.62
1975-83     11.90     2.48    13.00     2.56
1984-92     131.10    4.88    152.00    5.02
Although the t-statistic for the slope coefficient is high, the R² of -0.26
clearly indicates that we have invalidly omitted the intercept, which will
bias the estimate of the slope coefficient. This fact alone should cast doubt
on whether the regression of ln(CPI) on ln(M2) is the true model.
Including an intercept gives (standard error in parentheses):

Δln(CPI) = 0.13 + 0.25 Δln(M2)    R² = 0.06
          (0.05)  (0.21)          DW = 0.74    (10.23)
The R² remains pitifully low. More importantly, the slope coefficient is now
only 0.25, which is far less than the 0.91 obtained in equation (10.3). The
growth of the money supply now appears to have no significant effect on
the rate of inflation. What happened? The regression between the levels of
both variables looks good, but as soon as we switch to regressing changes
in these levels on one another the results are disappointing. It appears,
therefore, that first differencing took the wind out of our sails. Why?
The problem lies in the regression between the levels of both time
series. In regression analysis, the total variation of a dependent variable,
Y, around its mean is in part explained by the variation in the indepen-
dent variable, X, around its mean, leaving the residual variation as the
remainder. If, as is the case in our example, both Y and X are stochastic
variables, we assume that they are jointly sampled from a bivariate normal
distribution. This implies that we assume that each variable (as well as
the error term) is distributed normally with a constant mean and a
constant variance. The regression line then explains the variation in Y
around its constant mean in terms of the variation in X around its constant
mean. Hence, each Y_t (or X_t) is assumed to be sampled from the same
distribution with a constant mean and variance. However, in our example
this clearly is not the case. The problem is that neither the dependent nor
the independent variable has a constant mean through time. Successive
observations of either of these variables appear to be drawn from different
distributions with progressively higher means.
Take another look at Figure 10.1. It seems far-fetched to assume that
this is a sample of observations drawn from a distribution with a constant
mean or variance. The log transformation we subsequently applied to this
variable in regression allows us to stabilise the variance, but obviously not
the level. The reason is that logarithms preserve order although they alter
the shape of the data.
Hence, the results we obtained by regressing the levels of both log trans-
formed variables may be as much due to the fact that each variable is
constantly on the move in terms of its level as to any real relationship
between them. This is what makes regressions with time series so prone
to the danger of spurious correlations. In Chapter 12, on cointegration,
this issue will be taken up again. There we shall show that in some
instances it is possible to derive meaningful results by regressing time
series which are not stationary over time.
Why did differencing alter the picture so dramatically? The reason is
that taking first differences often yields a new time series which no longer
manifests a changing mean through time. Consequently, the assumptions
of ordinary least squares are more likely to be valid in practice. We can
now relate the variation in Y around its constant mean to the variation
in X around its constant mean. The regression, therefore, is much less
likely to produce spurious results. It follows that with time series it is
often useful to transform the original variables in a manner which stabilises
their means and variances through time. Regressions with such trans-
formed variables are less likely to fall prey to spurious correlations.
Indeed, we can use the fact that the regression in differences yields
different results to levels regression as a test of whether or not a regres-
sion is spurious.
Exercise 10.9
Calculate the series of first differences for each of your series from exer-
cise 10.1, having first applied the appropriate power transformation as
determined in exercise 10.8. Conduct the ADF on each of the resulting
series. How do your results compare with those obtained in exercise 10.6?
ADDITIONAL EXERCISE
Exercise 10.10
Generate 61 observations for a variable following a random walk without
drift. Regress the variable on its own lag 200 times, each time noting the
value of the estimated slope coefficient and the value of the t-statistic for
the test of the null that the slope is unity. Plot a histogram of the slope
coefficients. How often is the null rejected at the 5 and 10 per cent levels?
Why are these results a surprise? What do they tell you about testing for
stationarity?
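Exercise 10.10 can be sketched as a small Monte Carlo experiment (design choices such as the seed are ours):

```python
import numpy as np

rng = np.random.default_rng(42)
n_obs, n_reps = 61, 200
slopes, t_stats = [], []

for _ in range(n_reps):
    x = np.cumsum(rng.normal(size=n_obs))   # random walk without drift
    y, ylag = x[1:], x[:-1]
    b = (ylag @ y) / (ylag @ ylag)          # OLS slope, no intercept
    resid = y - b * ylag
    se = np.sqrt((resid @ resid) / (len(y) - 1) / (ylag @ ylag))
    slopes.append(b)
    t_stats.append((b - 1) / se)

# The slope estimates bunch just below one (the downward bias under a
# unit root), and the t-statistic does not follow the usual t-distribution.
print(np.mean(slopes))
```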
APPENDIX 10.1:
GENERATED DSP AND TSP SERIES FOR EXERCISES
NOTES
1 This specification is not appropriate, if only because the relationship between
the CPI and time is clearly not a linear one. More importantly, the series may
be a difference stationary process rather than a trend stationary process - this
distinction is discussed on pp. 344-5 and 349-50.
2 The trend is given by the expected value, which is:
11 Misspecification and autocorrelation

11.1 INTRODUCTION
Autocorrelation (also called serial correlation) is the violation of the
assumption that E(ε_iε_j) = 0. When the error in one period is related to
the error in another period then OLS is no longer BLUE. Moreover, the
R² may be overestimated, standard errors underestimated and t-statistics
overestimated. If the regressors include a lagged dependent variable then
OLS estimates are biased.
The presence of autocorrelation in the residuals of the estimated model
is, however, often a result of model misspecification, rather than 'genuine'
autocorrelation of the model error term. Recall that formal tests of the
property of the error term are carried out on the residuals. But, whilst
the error is a part of the data generation process, the residuals are a
product of our model specification. Hence testing for autocorrelation
should in the first instance be interpreted as a test for misspecification. A
range of techniques is available to detect autocorrelation. Here we will
present graphical methods, the runs test and the Durbin-Watson statistic;
we also define Durbin's h, which should be used in the presence of a lagged
dependent variable.
This chapter is organised as follows. Section 11.2 explains in more detail
what autocorrelation is and why it is a problem, and section 11.3 considers
the various reasons why autocorrelation may be present. Formal tests for
autocorrelation are presented in section 11.4, and section 11.5 discusses
how to deal with autocorrelation. Section 11.6 summarises the chapter.
(11.1)
Misspecification and autocorrelation 367
we are concerned here with the assumption that E(ε_t ε_{t-s}) = 0. (We are
now using a t subscript as autocorrelation is a time series problem.) Put
more simply, just because the error for one observation is large this does
not mean that the next one will be. Indeed, the fact that an error term
is positive should have no implications for whether the next term is
positive or negative.
To understand the implications of autocorrelation for residual plots and
the basis for tests for autocorrelation it is useful to spend some time
looking at the autoregressive (AR) model, introduced in Chapter 10. The
AR model is serially correlated by construction, so it is a good device for
seeing what such an autocorrelated error will look like in practice. But
you are not required to construct these AR models as a part of testing
for autocorrelation in the normal course of events.
Suppose that the error term, ε, in the model of equation (11.1) is generated
by an AR(1) process with a white noise error (v):

ε_t = ρε_{t-1} + v_t,    v_t ~ N(0, σ_v²) ∀t    (11.2)
Figure 11.1 shows the case in which ρ = 0, so that ε is just equal to that
period's error term, v.1 We know here, because we have generated the
data, that there is no serial correlation - each error term is indeed inde-
pendent of the others. How can this fact be seen in the residual plot? If
the different terms are independent then we should not see any patterns
in the data - for example, there should not be long runs of positives
or negatives.
Figure 11.1 Generated error terms from the AR(1) process, ρ = 0

Figure 11.2 Generated error terms from the AR(1) process, ρ = 0.7

Figure 11.3 Generated error terms from the AR(1) process, ρ = -0.7
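Series like those in Figures 11.1-11.3 are easy to generate for yourself; a sketch (seed and scale are arbitrary):

```python
import numpy as np

def ar1_errors(rho, n=50, seed=0):
    """AR(1) errors e_t = rho * e_{t-1} + v_t with white-noise v,
    starting from e_0 = 0."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=n)
    e = np.zeros(n)
    for t in range(1, n):
        e[t] = rho * e[t - 1] + v[t]
    return e

no_autocorr = ar1_errors(0.0)   # as in Figure 11.1
positive = ar1_errors(0.7)      # long runs of one sign
negative = ar1_errors(-0.7)     # frequent switches of sign
```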
E(ε_t ε_{t-s}) = ρ^s σ_v² / (1 - ρ²)    (11.4)

Equation (11.4) shows that the εs generated by the AR(1) process are
indeed autocorrelated, since the covariance of any pair of εs is not zero
(though it tends to zero as the εs get further apart). However, equation
(11.3) shows that the variable is homoscedastic.2
Why is autocorrelation a problem?
The proof that LS estimators are BLUE uses the assumption that the error
term in the model is not autocorrelated. Violation of this assumption means
that the proof is no longer valid, and it is indeed the case that OLS is
no longer BLUE. 3 We do not, however, need this assumption of no auto-
correlation to prove unbiasedness, so the estimates remain unbiased
(except when the model contains a lagged dependent variable, as discussed
below).
In addition to the loss of efficiency, the residual variance no longer
provides an unbiased estimate of the error variance. Hence the attempt
to construct confidence intervals or to test hypotheses about the coeffi-
cients is made invalid, since the standard errors are no longer applicable.
When there is positive autocorrelation - which is the more common sort
in practice - then the estimates of the error variance have a downward
bias. Hence our confidence intervals are narrower than they should be
and the calculated t-statistics inflated, so that there is a danger that we
shall incorrectly reject the null that the variable has no significant impact.
Likewise the R² and related F-statistic are likely to be over-estimated.
(11.8)
Figure 11.4 Residuals from the crop production function regression, 1960-88
Exercise 11.1
Use the data given in Table 11.1 to regress output on the price index and
fertilizer input. Draw the residual plot and count the number of runs.
Figure 11.5 Population of Belize and fitted line from linear regression

Figure 11.5 shows the population of Belize 1970-92 and the fitted
line from the linear regression of population on
time. Clearly the attempt to fit a straight line to these data is inappropriate
- and the result can be clearly seen: the residuals are all positive
at first, negative in the middle years, and then positive in the last years.
This very clear pattern in the residuals is also displayed in Figure 11.6.
This case is one in which the autocorrelation results from the incorrect
functional form: the linear regression is not the right one, and this fact
affects the residuals so as to induce a pattern of autocorrelation. The auto-
correlation is a product of the incorrect functional form and not a property
of the error in the true model.
From looking at the graph we might suspect that the true model is to
regress the log of population on time.4 As may be seen from Table 11.2,
this data transformation does improve the results. Unfortunately, as Figure
11.6 shows, the residuals from such a specification are a bit lower but still
display marked autocorrelation; there are still only three runs and the
results don't pass the rule of thumb for spurious correlation (R² > DW).
What is going on here?
If we look at the data it is possible to detect that the rate of increase
in population is lower in the earlier period (up to about 1977) than in the
later, whereas so far we have assumed the slope coefficient to be constant
throughout. To allow for this break in the data we regressed logged popu-
lation on time and an intercept and slope dummy with the break point
Figure 11.6 Residuals from the regressions of Belize's population on time
in 1977. The residuals from this regression are also shown in Figure 11.6.
There are now ten runs (compared to the previous three runs) and the
danger of having to reject the null hypothesis of no serial correlation is
considerably reduced.
A word of caution must be inserted here. The econometric interpreta-
tion of our results is that there is a structural break in the regression of
Belize's population on time, with a higher growth rate (slope coefficient)
in the later years than the earlier. But in fact we will often find such
results with population data. Typically a country's population is enumer-
ated once every ten years in a census; population figures for non-census
years are estimates based on the most recent census and observed intra-
census trends. When the data from a new census become available there
will be a break in the intercept as the actual figure will not equal the
estimate made on the basis of the previous census, and a new popula-
tion growth rate will be used for future estimates as this figure is also
revised. The structural break in this case is therefore a product of the way
in which the data are produced rather than any sudden change in the
proclivity of the people of Belize. None the less, the example serves
to illustrate how an inappropriate functional form may result in residual
autocorrelation and, also, how adding a further variable to a regression
may remove autocorrelation. It is to this latter possibility that we now
turn our attention.
Figure 11. 7 Residuals from crop production function from expanded regression
Figure 11.8 Residuals from regression of life expectancy on income per capita,
alphabetical ordering
Figure 11.9 Residuals from regression of life expectancy on income per capita,
income ordering
Figure 11.10 Scatter plot and fitted line for regression of life expectancy on
income per capita
Exercise 11.2
Compile time-series data for the population of the country of your choice.
(The data for Belize are in data file BELPOP and for Peru in PERU.)
Regress both population and logged population on time and graph the
residuals in each case. How many runs are there in each case? Comment.
Can you respecify the equation to increase the number of runs?
Exercise 11.3
Repeat exercise 11.2 using time series data for the country of your choice
(or from data file PERU) for: (a) real manufacturing value added; (b)
infant mortality rate; and (c) terms of trade. Comment on your results.
Exercise 11.4
Using the results in Table 11.3, use an F-test to test the hypothesis that
the two rainfall variables may be excluded from the regression. In the
light of your results, comment on the apparent problem of autocorrela-
tion in the regression of output on the price index and fertilizer input.
Exercise 11.5
Using data given in data file SOCECON, regress the infant mortality rate
on income per capita. Plot the residuals with the observations ordered:
(a) alphabetically; and (b) by income per capita. Count the number of
runs in each case. Comment on your results.
Correlograms
The correlogram is the plot of the residual covariances standardised by
the residual variance. By plotting the theoretical correlograms from
different error generating processes we can learn to spot these processes
when confronted with these plots calculated using the residuals from actual
regressions. Here we will first consider the correlograms produced by the
error term generated using the AR(l) process, presented in Figures
11.1-11.3 above, and then plot those for the residuals from the crop
production function data.
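Since the correlogram is just the set of residual autocovariances standardised by the residual variance, it can be computed directly; a sketch:

```python
import numpy as np

def correlogram(e, max_lag=6):
    """Autocovariances of e at lags 0..max_lag, standardised by the
    variance, so the lag-0 term is one by construction."""
    e = np.asarray(e, dtype=float)
    e = e - e.mean()
    var = (e @ e) / len(e)
    return [((e[s:] @ e[:len(e) - s]) / len(e)) / var
            for s in range(max_lag + 1)]

# For white noise the terms beyond lag 0 should be comparatively small:
r = correlogram(np.random.default_rng(1).normal(size=100))
print(r[0])  # 1.0
```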
Figures 11.11-11.13 show the correlograms for the data generated with
the AR(1) process for ρ = 0, ρ = 0.7 and ρ = -0.7. When ρ = 0 we expect
the lag-0 term (i.e. the ratio of the error variance to itself) to be unity, as it
must always be, and all the others to be zero. In practice, the other terms
are not zero, but as Figure 11.11 shows they are comparatively small.
The lack of a pattern in Figure 11.11 stands in stark contrast to that shown
in Figure 11.12 where, as we would expect from equation (11.4) above,
there is a reduction in the covariances as we take errors which are further
apart. Equation (11.4) suggests that when ρ < 0 then the covariances
should alternate between negative (for odd differences) and positive (for
even differences) - and this pattern can be clearly seen from Figure 11.13
(for ρ = -0.7).
Turning from errors generated by a known model, we now plot the
residual correlogram from estimation of the crop production function. Figure
11.14 shows the correlogram for the residuals from the regression of output
on the price index and fertilizer input. There is not such a marked pattern
as in Figure 11.12, but the high covariance between the residual and its
first lag is apparent.
Figure 11.11 Correlogram for the generated errors, ρ = 0

Figure 11.12 Correlogram for the generated errors, ρ = 0.7

Figure 11.13 Correlogram for the generated errors, ρ = -0.7

Figure 11.14 Correlogram of the residuals from the crop production function
Exercise 11.6
Using the results from your estimation of population against time (exer-
cise 11.2) plot the correlogram for the regression of: (a) population on
time; (b) logged population on time; and (c) your improved specification.
Runs test
So far we have been counting the number of runs as an indication of the
presence of autocorrelation: if there is positive autocorrelation then there
will be rather fewer runs than we would expect from a series with no
autocorrelation. On the other hand, if there is negative autocorrelation
then there are more runs than with no autocorrelation. But so far we have
not said how many runs are 'too many'. In fact it is possible to calculate
the expected number of runs from our residuals under the null hypoth-
esis of no autocorrelation and to construct a confidence interval around
this number. If the actual number of runs is less than the lower bound
we reject the null in favour of positive autocorrelation and if the number
of runs is above the upper bound we reject the null in favour of negative
autocorrelation.
It can be shown that the expected number of runs is:

E(R) = 2N₁N₂/n + 1

with variance:

s²_R = 2N₁N₂(2N₁N₂ - n) / (n²(n - 1))    (11.11)

where N₁ and N₂ are the numbers of positive and negative residuals and
n = N₁ + N₂.
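These formulas translate directly into code; a sketch (residuals of exactly zero are ignored in the counts, a simplification of ours):

```python
import numpy as np

def runs_test(residuals):
    """Actual number of runs, plus the expected number and variance
    under the null of no autocorrelation."""
    signs = np.sign(np.asarray(residuals, dtype=float))
    n1 = int(np.sum(signs > 0))   # positive residuals
    n2 = int(np.sum(signs < 0))   # negative residuals
    n = n1 + n2
    runs = 1 + int(np.sum(signs[1:] != signs[:-1]))
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return runs, expected, variance

# Perfectly alternating residuals: the maximum possible number of runs
runs, expected, variance = runs_test([1, -1, 1, -1, 1, -1])
print(runs, expected)  # 6 4.0
```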
Exercise 11.7
Use your population regression results to perform the runs test for auto-
correlation for the simple log regression and your preferred specification.
Exercise 11.8
Carry out the runs test on the errors shown in Figures 11.1, 11.2 and 11.3
and comment on your findings.
The Durbin-Watson statistic is defined as:

d = Σ(e_t - e_{t-1})² / Σe_t²    (11.14)

from which it follows that:

d ≈ 2(1 - ρ̂)  where  ρ̂ = Σ e_t e_{t-1} / Σ e²_{t-1}    (11.16)

that is, ρ̂ is the slope coefficient from regressing the residuals on their first
lag. The Durbin-Watson statistic has thus been shown to be related to
the estimation of the AR(1) model, and it is indeed a test for first-order
correlation - that is, a non-zero covariance between the error and its first
lag. Alternative specifications of the test are required to test for higher
orders of autocorrelation, but first order is what we meet most commonly
in practice.6
From equation (11.16) we can see that if there is no autocorrelation
(so that ρ̂ = 0) then d = 2. As ρ̂ tends to one (positive autocorrelation)
d tends to zero, and as ρ̂ tends to -1 (negative autocorrelation) d tends
to 4. The Durbin-Watson statistic thus falls in the range 0 to 4, with a
value of 2 indicating no autocorrelation; values significantly less than 2
mean we reject the null of no autocorrelation in favour of positive auto-
correlation and those significantly above 2 lead us to reject the null in
favour of negative autocorrelation.
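The statistic itself is one line of arithmetic; a sketch for any residual series e:

```python
import numpy as np

def durbin_watson(e):
    """d = sum of squared first differences of the residuals divided by
    the residual sum of squares; lies between 0 and 4."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

# Alternating residuals (negative autocorrelation) push d well above 2:
print(durbin_watson([1.0, -1.0, 1.0, -1.0, 1.0, -1.0]))  # about 3.33
```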
The critical values for the DW test depend not only on sample size and
the number of regressors, but also the particular values of the regressors.
Consequently it is not possible to give a single critical value; instead two
values are given: an upper and a lower bound. The use of these bounds
is shown in Figure 11.16.
If the calculated value of d is less than the lower boundary (d_L)
then reject the null hypothesis of no autocorrelation in favour of
positive autocorrelation. If it lies above 4 - d_L then reject the null in
favour of negative autocorrelation. (Since we more commonly find positive
Figure 11.16 Bounds for the Durbin-Watson test
h = (1 - d/2) √( n / (1 - n·Var(b₁)) )    (11.17)
where Var(b₁) is the square of the standard error of the coefficient on the
lagged dependent variable and n is the number of observations. The test
may not be used if n·Var(b₁) is greater than one.
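Equation (11.17) in code, with the applicability check; the numbers in the example are hypothetical:

```python
import math

def durbins_h(d, n, var_b1):
    """Durbin's h from the DW statistic d, the sample size n, and the
    squared standard error of the lagged dependent variable's coefficient."""
    if n * var_b1 >= 1:
        raise ValueError("h-test not applicable: n * Var(b1) >= 1")
    return (1 - d / 2) * math.sqrt(n / (1 - n * var_b1))

# Hypothetical values for illustration:
print(round(durbins_h(d=1.5, n=30, var_b1=0.01), 3))  # 1.637
```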
The regression results are given in Table 11.5 (which repeats also those
for OLS estimation). Calculate the estimate of the intercept b₁ = b₁*/(1
- ρ̂), which equals -19.98.
5 Comparing the two regressions, we see that the DW statistic is now
1.50. This value falls towards the upper end of the zone of indecision,
so the evidence for autocorrelation is much weaker than in the OLS
regression, though it may be thought worthwhile to repeat the procedure
(using a new ρ̂ of 0.25, calculated from the new DW).
Comparison of the slope coefficients from the two regressions shows
price to be relatively unaffected. With C-0 estimation, the fertiliser vari-
able produces the expected positive sign, though it remains insignificant.
The unexpected insignificance of fertiliser is a further indication that we
should have treated the initial autocorrelation as a sign of misspecifica-
tion. In this case, the C-0 procedure has suppressed the symptom of
misspecification, but cannot provide the cure - which is to include the
omitted variables.
Exercise 11.9
Using the Sri Lankan macroeconomic data set (SRINA), perform the
simple regression of IP on Ig and plot the residuals. Use both runs and
DW tests to check for autocorrelation. Add variables to the equation to
improve the model specification (see Chapter 6 where this data set was
used previously) and repeat the tests for autocorrelation. Use the
Cochrane-Orcutt estimation procedure if you feel it is appropriate.
Comment on your findings.
ε_t = Σ_{i=0}^{∞} ρ^i v_{t-i}    (A.11.2)

Using equation (A.11.2) we may get:

ε_t ε_{t-s} = ρ^s Σ_{i=0}^{∞} ρ^{2i} v²_{t-s-i} + cross products    (A.11.3)

Therefore:

E(ε_t ε_{t-s}) = ρ^s σ_v² / (1 - ρ²)    (A.11.4)

since the expected value of all the cross products is zero (as v_t is not seri-
ally correlated). From which it follows that:

E(ε_t²) = σ_ε² = σ_v² / (1 - ρ²)    (A.11.5)
NOTES
1 The data are generated using the Lotus @RAND command. These variables
follow a rectangular distribution, but are made approximately normal by aver-
aging over 20 such random numbers. (By the central limit theorem the resulting
numbers are approximately normal.)
2 Equation (11.3) is just the special case of equation (11.4) in which s = 0.
3 The generalised least squares estimator (GLS) is BLUE, but further discus-
sion is beyond the scope of this text.
4 Such a specification is also desirable because of the interpretation of the slope
coefficient as the population growth rate.
5 However, the sample size is not that large and the value is near the lower end
of the interval, so we need to be cautious, perhaps by seeking verification from
another test. As we see below, the Durbin-Watson statistic suggests that these
residuals do show autocorrelation.
6 The exception worth noting is that quarterly time series data may well have
fourth-order autocorrelation.
7 And hence is equivalent to GLS, see note 3.
8 This fact means that we cannot readily use the t-statistic to test the signifi-
cance of the intercept when applying the Cochrane-Orcutt procedure. The
appropriate test is beyond the scope of this book.
9 The C-0 procedure is not equivalent to GLS, and therefore not BLUE, unless
the Prais-Winsten transformation is applied.
12 Cointegration and the error
correction model
12.1 INTRODUCTION
Thus far in Part IV we have discussed problems encountered in time-
series analysis: the danger of spurious regression in Chapter 10, and the
problem of autocorrelation more generally in Chapter 11. We have empha-
sised that regression with non-stationary series is generally biased and
inconsistent. Transformations to stationarity, notably differencing, create
their own problems. In this chapter we will present valid procedures for
obtaining regression estimates with non-stationary series. Least squares
estimates can be used if two non-stationary series are cointegrated, a
concept we explain in section 12.2. The test for cointegration, of which
examples are given in section 12.3, is to test whether the residuals from
the levels regression are stationary. If these residuals are stationary then
the series are cointegrated. The levels regression will then provide consis-
tent estimates of the long-run relationship. The full dynamic model is
estimated as an error correction model, which is presented in section 12.4.
Section 12.5 concludes.
(12.2)
[Figures: time plots of X with Y1 and fitted Y1, and of X with Y2 and fitted Y2, followed by the corresponding residual plots against time.]
Exercise 12.1
Using either the data in data file PERU or data for the country or coun-
tries of your choice (be sure to use real macroeconomic data, not nominal),
plot (a) the dependent variable, independent variable and fitted values;
(b) the residual plot; and (c) the dependent variable and independent
variable both expressed in differences, for the following relationships:
Cointegration and the error correction model 399
1 consumption as a function of income;
2 imports as a function of income;
3 the infant mortality rate as a function of income;
4 real growth as a function of export growth.
In the cases where a significant relationship is found, which of these rela-
tionships do you believe to be spurious?
(12.6)
This equation is called the cointegrating regression. We analyse here only
the bivariate case. The analysis changes if more variables need be included
on the right-hand side, but the technique is beyond the scope of this book.
If the residuals from the cointegrating regression are stationary then the
variables are said to be cointegrated. We know that the residuals will have
a zero mean and no trend by construction, so we do not need to apply
the full decision tree presented in Chapter 10. Rather we can proceed
directly to the augmented Dickey-Fuller test without a constant or a trend
or use the DW-based test. However, when testing for the stationarity of
residuals the critical values used in Chapter 10 are no longer valid. The
correct ones, given in the statistical tables on p. 480, are
slightly larger in absolute value.
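The two-step procedure just described can be sketched with simulated data. The example below is not from the book: the series, parameter values and hand-rolled Dickey-Fuller regression are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Step 1: generate an I(1) regressor (a random walk) and a series that
# is cointegrated with it: y differs from x only by stationary noise.
x = np.cumsum(rng.normal(size=n))
y = 2.0 + 0.5 * x + rng.normal(scale=0.3, size=n)

# Step 2: the cointegrating (levels) regression by OLS.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Step 3: Dickey-Fuller regression on the residuals -- regress the
# change in the residual on its own lag, with no constant or trend.
de = np.diff(resid)
lag = resid[:-1]
rho = (lag @ de) / (lag @ lag)
se = np.sqrt(((de - rho * lag) ** 2).sum() / (len(de) - 1) / (lag @ lag))
t_df = rho / se

# Compare t_df with the Engle-Granger 5 per cent critical value (-3.37):
# a t-statistic below it rejects the unit root, i.e. the residuals are
# stationary and the series are cointegrated.
cointegrated = t_df < -3.37
```

Because the noise added to y is stationary by construction, the test should (and here does) find cointegration.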
Some examples
[Figure: time plot of LX (logged exports, left axis) and LRER (logged real exchange rate, right axis).]
Figure 12.6 Scatter plot of exports and RER
are laid out similarly to Table 10.3, where we first met the decision tree
for testing for stationarity (except that here we only require the top two
rows of the table). Both variables are found to be I(1). Thus there is a
danger that OLS estimates using these variables may be spurious. To
examine whether or not our results are indeed spurious we must examine
the stationarity of the residuals from equation (12.7).
Since the residuals must have zero mean and no time trend we may
use the augmented Dickey-Fuller test in which the change in the variable
is regressed upon its own lag and a lagged difference term (plus additional
lagged differences if they are necessary to remove residual autocorrelation).
The null hypothesis that the variable has a unit root is tested through the
significance of the lagged term, for which the normal t-statistic is calculated.
We first regress the change in the residual on the lagged residual and check the
DW (this is just the Dickey-Fuller test); if there is autocorrelation we add
the lagged difference and re-estimate. In fact the DW without the lag
is 2.03, so there is no need to proceed to the augmented version of the test.
The t-value in this equation is -2.54 (shown in Table 12.1). The critical
402 Econometrics for developing countries
Table 12.1 Decision tree applied to exports, RER and residuals from cointegrating regression

            Exports       RER           Residuals
RSS_U       0.352         0.374         -
RSS_R1      0.608         0.527         -
(F-stat)    (5.33)        (3.00)        -
t-stat      -             -             -2.54
Result      Random walk   Random walk   Random walk

Note: Exports and RER are mean deviations of logged values.
values for residuals are not the standard Dickey-Fuller ones but are slightly
larger. At the 5 per cent level the critical value given by Engle and Granger
(1987) is -3.37 for the DF test (and -3.17 for the ADF); Phillips and Ouliaris
(1990) give a slightly lower value of -2.76 (though this value is derived from
much larger samples). 3 Using either value, the null hypothesis is accepted;
that is, the residuals have a unit root so the two series are not cointegrated.
We mentioned in Chapter 11 that an alternative test for non-stationarity
is to regress the series on a constant and look at the DW statistic. When
applying the DW test to a series the result tells whether the variable is sta-
tionary or not, but yields no additional information as to the appropriate
dynamic specification. But, as already stated, in the case of residuals we
know there is a zero mean and no time trend and we are, in any case, only
interested to know whether they are stationary or not. The cointegrating
regression DW (CRDW) is simply the DW statistic from the levels regression;
in this case the regression of ln(X) on ln(RER). Recall that DW = 2(1 - ρ).
Hence the null hypothesis that ρ = 1 (i.e. there is a unit root) corresponds
to the null DW = 0. Engle and Granger (1987) give the appropriate
critical value at the 5 per cent level to test this hypothesis as being 0.386.
In the example given here the calculated value is 0.824. As the calculated
value is greater than the critical value we should reject the null of a unit
root in the residuals and thus conclude that the series are cointegrated.
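The CRDW calculation itself is easy to verify directly. The sketch below is not from the book; it computes DW = Σ(e_t − e_{t−1})²/Σe_t² for two illustrative simulated residual series and compares each with the 0.386 critical value.

```python
import numpy as np

def crdw(resid):
    """Cointegrating regression Durbin-Watson statistic:
    DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    e = np.asarray(resid, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(0)
n = 200

# Stationary (white-noise) residuals: DW should be close to 2,
# well above the 5 per cent critical value of 0.386.
e_stat = rng.normal(size=n)

# Random-walk residuals: DW should be close to 0, below 0.386.
e_rw = np.cumsum(rng.normal(size=n))

dw_stat = crdw(e_stat)   # near 2 -> reject the unit root (cointegrated)
dw_rw = crdw(e_rw)       # near 0 -> accept the unit root (not cointegrated)
```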
The two tests, ADF and CRDW, thus give different results. Engle and
Granger say that the ADF is the recommended test. The critica! values
Granger say that the ADF is the recommended test. The critical values
CRDW may be a quick and easy test to perform, the ADF (which does
not take much longer anyway) is preferable. Hence we conclude that the
results reported in equation (12.7) are spurious. It would be wrong to
conclude on the basis of these results that exchange rate policy has been
the driving force behind Pakistan's recent export growth.
ignore this line for the moment). Even from just looking at this graph we
should at least suspect two things. First, regression of consumption on
income is going to yield a high R2 (almost certainly in excess of 0.9).
Second, this seemingly good result will probably be spurious, as closer
examination shows that, although both series 'trend' upwards, the year-
on-year changes in consumption do not in fact match particularly well
with the changes in income. That the residuals will be autocorrelated is
already clear from this plot, especially for the 20 years to 1982, during
which there are only three runs.
Our suspicions should be further aroused by the scatter plot (Figure
12.8), from which the autocorrelation is shown by the pattern of points
around the fitted line. This indication of autocorrelation is supported by
the regression results, which are (t-statistics in parentheses):
ln(C) = -1.53 + 1.23 ln(Y)    R2 = 0.98
       (-10.42)  (42.15)      DW = 0.36    (12.8)
Investigation reveals both logged consumption and income to be
random walks without drift. Hence, the estimated consumption function
is indeed spurious unless the residuals from the levels regression turn out
to be stationary. We see from equation (12.8) that the regression fails the
CRDW as the DW is less than the critical value of 0.38. This result is
confirmed if we carry out the augmented Dickey-Fuller test. As before
the first stage is the Dickey-Fuller test, but the DW statistic from this
Figure 12.8 Scatter plot for Costa Rican consumption against income
[Figure: time plot of LC, LY and LCFIT, 1960-1990.]
Exercise 12.2
Use the cointegration test to determine if the relationships listed in exer-
cise 12.1 are spurious for your data.
[Figure: simulated error correction adjustment - X and Y plotted against time.]
Figure 12.11 shows the same simulation, but now with β3 = -0.25, so
that only one-quarter of the adjustment process occurs in each period. It
now takes rather longer for equilibrium to be restored. If the impact effect
is larger than the long-run effect then the model simulations will demon-
strate 'overshooting'.
In these simulations we have ignored the intercept, β1. We should allow
the value of this coefficient to be determined by the data. If β1 is non-zero,
then β1 and β3 become involved in the equilibrium condition,
suggesting that the dependent variable is subject to some drift in addition
to the equilibrium relationship and opening a question as to what we mean
by equilibrium. Equilibrium as used in the context of the cointegrating
regression and the ECM means a statistically observed relationship
between the variables over the sample period. This concept does not
necessarily correspond to economic equilibrium.
Turning to the estimated values for the consumption function shown in
equation (12.11), we can first note that it was valid to exclude the intercept,
and that the estimates of β2 and β3 have the expected sign. The
model converges quickly to equilibrium, with over 90 per cent of the
discrepancy corrected in each period. What can we say about the marginal
and average propensities to consume?
[Figure 12.11: ECM simulation with β3 = -0.25 - X and Y plotted against time.]
As stated above, a unitary elasticity yields a constant APC, and the value
of that APC is 0.88. But, in fact, we did not get a unitary elasticity, but
a value slightly less than one - suggesting that the APC will fall as income
rises (as the percentage increase in consumption is a bit less than that in
income). Specifically:
C*/Y* = e^(-0.11 - 0.03 ln(Y))    (12.15)
So the APC depends on the level of Y. In fact, the variation is slight. As
shown in Table 12.2, at the mean level of ln(Y) the APC is 0.764, and it
Table 12.2 Average and marginal propensities for consumption function

           Y     ln(Y)   APC     MPC           C*      C* + dC   MPC
                                 (long-run)                      (impact)
Minimum    124   4.82    0.775   0.752         96.1    96.8      0.682
Average    198   5.29    0.764   0.741         151.5   152.2     0.672
Maximum    269   5.59    0.757   0.735         203.4   204.1     0.666

Note: Calculated from data and estimates for consumption function; details given in text.
ranges from 0.775 for the lowest value of Y to 0.757 for the highest. The
constant APC is therefore not a bad approximation, but to calculate its level
we do need to take account of the level of Y (i.e. the APC is not 0.88 as it
would appear to be if we ignored the intercept in the calculation).
The marginal propensity to consume will also depend on the value of
Y and is best calculated by rearranging the formula:
elasticity = MPC/APC    (12.16)
since we know the elasticity to be 0.97. The results of this calculation are
shown in Table 12.2. The MPC also varies over a small range: from 0.752
(for the lowest incomes) to 0.735 (for the highest). This MPC is the long-run
propensity - that is, an increase in income of 100 units will increase
equilibrium consumption by approximately 75 units.
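The arithmetic behind the APC and long-run MPC columns of Table 12.2 can be checked directly from equations (12.15) and (12.16), taking the estimated values -0.11 and -0.03 and the elasticity of 0.97 quoted in the text. This check is our own sketch, not from the book:

```python
import math

# APC from equation (12.15): C*/Y* = exp(-0.11 - 0.03*ln(Y)); long-run
# MPC from equation (12.16): MPC = elasticity * APC, elasticity = 0.97.
elasticity = 0.97
incomes = {"minimum": 124, "average": 198, "maximum": 269}

apc = {k: math.exp(-0.11 - 0.03 * math.log(y)) for k, y in incomes.items()}
mpc = {k: elasticity * a for k, a in apc.items()}
# apc reproduces 0.775, 0.764, 0.757 and mpc reproduces 0.752, 0.741,
# 0.735 -- the Table 12.2 values to three decimal places.
```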
The impact effect of a change in income has to be calculated from the
coefficient on the impact term. The percentage change in consumption
can be calculated as:
C_t/C_{t-1} = e^(b2 ln(Y_t/Y_{t-1}))    (12.17)
Table 12.2 shows the application of this formula. For each of the minimum,
mean and maximum values of Y the corresponding equilibrium C was
calculated (= APC x Y). The percentage change given by equation (12.17),
when income is increased by one unit, is calculated, and this percentage
used to calculate the absolute increment in consumption. This increment
is the impact MPC. As expected, it is a bit, but not that much, less than
the long-run MPC. At the average level of income our results show that
an increase in income of one unit will increase consumption by 0.67 units
in the year of the rise in income. In the long run, consumption will rise
by 0.74 units, with most of the additional 0.07 units coming in the year
after the increase in income.
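The impact calculation can be reproduced the same way. Note one assumption: the impact coefficient b2 = 0.88 used below is inferred from the reported results, since equation (12.11) is not reproduced here.

```python
import math

# Impact effect via equation (12.17): C_t/C_{t-1} = exp(b2*ln(Y_t/Y_{t-1})).
b2 = 0.88  # assumed impact coefficient (equation (12.11) not shown here)
table = {"minimum": (124, 0.775), "average": (198, 0.764), "maximum": (269, 0.757)}

impact_mpc = {}
for k, (y, apc) in table.items():
    c_star = apc * y                                   # equilibrium C = APC x Y
    growth = math.exp(b2 * math.log((y + 1) / y)) - 1  # proportional rise in C
    impact_mpc[k] = c_star * growth                    # absolute rise for dY = 1
# impact_mpc reproduces roughly 0.682, 0.672, 0.666, as in Table 12.2.
```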
Exercise 12.3
Use a spreadsheet to construct a simulation of the ECM. Experiment with
different parameter values.
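As an alternative to a spreadsheet, the ECM of exercise 12.3 can be simulated in a few lines of code. The parameter values below are illustrative, and β1 is set to zero as in the simulations discussed in the text:

```python
# Simple ECM simulation: dY_t = b2*dX_t + b3*(Y_{t-1} - X_{t-1}).
# X steps from 0 to 1 at t = 5; with b3 = -0.25 one-quarter of the
# remaining disequilibrium is removed in each period.
b2, b3 = 0.5, -0.25
T = 40
x = [0.0] * 5 + [1.0] * (T - 5)
y = [0.0]
for t in range(1, T):
    dy = b2 * (x[t] - x[t - 1]) + b3 * (y[t - 1] - x[t - 1])
    y.append(y[t - 1] + dy)
# y jumps by b2 on impact, then converges gradually to the new
# equilibrium y = x = 1; try b2 > 1 to see 'overshooting'.
```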
Exercise 12.4
Using the data of your choice, find a non-spurious regression between two
I(1) variables and estimate and interpret an ECM.
NOTES
1 This statement is a specific form of cointegration which, more generally
defined, encompasses higher orders of integration.
2 As written here we are ignoring the intercept. The addition of a constant to
a stationary series will not alter the fact that the series is stationary, so the
omission makes no difference to the argument.
3 The critical value depends on the number of regressors in the levels regres-
sion, but we are restricting our attention to the bivariate case.
PartV
Simultaneous equation models
This page intentionally left blank
13 Misspecification bias from single
equation estimation
13.1 INTRODUCTION
(13.3)
where Qd is quantity demanded, Q is the market-clearing equilibrium
quantity and Y is consumers' income. There are four equations (equation
13.3 is really two equations, Q = Qs and Qs = Qd) and four endogenous
variables (Q, Qs, Qd and P). Let us see what difference it makes to esti-
mate equation (13.1) first as a single equation and then taking into account
the whole model. The data used below are contained in data file SIMSIM. 1
Transforming all variables to logs and estimating by OLS yields:
Exercise 13.1
Use the Indonesian national account data contained in data file INDONA
to estimate a consumption function for Indonesia using (a) OLS and (b)
IV, with investment and the trade balance as a single instrument (i.e.
define a new composite exogenous variable, equal to I + X - M). Use
Hausman's test to see if there is a significant difference between the two
estimators. How do you interpret this result?
Exercise 13.2
Use the data in the data file MALTA to replicate Table 13.1. Test whether
price is exogenous in the supply equation.
Table 13.1 Estimation of supply and demand model for Maltese exports
Demand Supply
OLS TSLS Hausman (OLS) OLS TSLS
Constant -2.41 -2.93 -2.93 3.26 3.50
(-1.51) (-1.49) (-3.01) (3.48) (3.65)
p -2.88 -5.07 -0.02 1.91 2.21
(-4.44) (-4.81) (-0.03) (5.80) (6.12)
pw 3.81 5.62 5.62
(6.87) (6.32) (12.76)
y 0.63 1.01 1.01
(1.83) (2.29) (4.62)
E 2.70 3.05
(5.20) (5.53)
CPI -1.12 -1.38
(-2.90) (-3.35)
I 0.38 0.32
(2.65) (2.16)
Fitted P -5.05
(-6.37)
R2 0.96 0.94 0.99 0.97 0.97
Note: - indicates excluded from regression. See text for explanation of symbols.
Exercise 13.3
Suppose that a fourth equation is added to the model of Maltese exports:
(13.16)
Test the exogeneity of P and CPI in the supply and demand equations in
this expanded model.
plim(b2) = β2 - [1/(β2 - γ2)] (σε²/σP²)    (13.21)

E(b2^IV) = β2 + E[Σ yi εi / Σ yi Pi]    (13.22)
We cannot evaluate the final expectation for the same reason as before.
But if probability limits are taken, then, since income is independent of
the error term, we know that plim (1/N) Σ yi εi = 0. Hence plim(b2^IV) = β2,
showing that instrumental variables gives a consistent estimator. Note that
the estimator is consistent, but not unbiased, as the final expression in
equation (13.22) cannot be reduced to zero. Instrumental variable estimation
and TSLS are for this reason 'large sample techniques': the distribution
of the estimate converges on the population value as the sample size
increases, so estimation from a small sample may plausibly give an estimate
quite far removed from this population value. 'Large sample' should
ideally be taken as at least 50 observations, though we often do not have
so many observations for developing countries (notably with time-series
data). But the point is an important one, and it would be unwise to apply
these techniques to samples of fewer than 25 observations.
The bias in OLS estimation of the supply equation arose because price
is not exogenous as OLS assumes. The Hausman test is based on testing
whether or not OLS (which will be inconsistent if price is not exogenous)
gives a significantly different estimate to a technique which is consistent
when there is a problem of simultaneity (we used IV, but, as we shall
see in Chapter 14, there are other techniques). If we find that, contrary
to our theoretical expectation, there is no significant simultaneity bias,
then the OLS estimates may be used. Whilst the IV results are the same as
(insignificantly different from) those obtained by OLS (both methods
are consistent when there is no simultaneity bias), the former are less
efficient.
Hausman's test is a test of exogeneity that is directly related to the
problem in hand: is OLS estimation appropriate or not? In the next section
we discuss other definitions and tests for exogeneity that are common in
the literature but which do not have the same intuitive appeal.
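The regression-based form of the Hausman test (the 'fitted value' form reported in Table 13.1) can be sketched on simulated data. Everything below - the parameter values and the simulated supply-and-demand system - is illustrative, not taken from the book:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 500

# Hypothetical system in the spirit of equations (13.1)-(13.3):
# supply Q = b1 + b2*P + e1, demand Q = g1 + g2*P + g3*Y + e2,
# with income Y exogenous and the market clearing.
b1, b2 = 2.0, 1.5
g1, g2, g3 = 5.0, -1.0, 0.8
y = rng.normal(10, 2, n)
e1 = rng.normal(0, 0.5, n)
e2 = rng.normal(0, 0.5, n)

# Reduced forms (solve the two structural equations for P and Q).
p = (g1 - b1 + g3 * y + e2 - e1) / (b2 - g2)
q = b1 + b2 * p + e1

def ols(X, t):
    coef, *_ = np.linalg.lstsq(X, t, rcond=None)
    return coef

# Hausman test for the supply equation: regress P on the exogenous
# variable, then add fitted P to the OLS supply regression and t-test
# its coefficient. A significant coefficient means P is not exogenous,
# so OLS is inconsistent and an IV/TSLS estimator should be used.
Z = np.column_stack([np.ones(n), y])
p_fit = Z @ ols(Z, p)

X = np.column_stack([np.ones(n), p, p_fit])
coefs = ols(X, q)
resid = q - X @ coefs
s2 = resid @ resid / (n - X.shape[1])
se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[2, 2])
t_fitted = coefs[2] / se
significant = abs(t_fitted) > 1.96   # simultaneity bias detected
```

In this augmented regression the coefficients on P and fitted P sum to the consistent (IV) estimate of the supply slope.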
Exercise 13.4
The consumption function is often estimated as:
Ct = β1 + β2Yt + εt    (13.23)
Yet income also depends on consumption through the accounting identity:
Yt = Ct + It + (Xt - Mt)    (13.24)
where I is investment and (X - M) the balance of trade in goods and
services, both of which are taken to be exogenous. 7 Show that single-equa-
tion estimation of the consumption function gives a biased estimator of
the MPC, with an upward bias of:
Simultaneity bias 425
plim(b2) - β2 = [1/(1 - β2)] (σε²/σY²)    (13.25)
Did your results from exercise 13.1 conform with this expression?
Granger causality
The Granger test that X does not Granger cause Y is the F-test that the
Xs may be excluded from the equation:
Yt = β0 + Σ(i=1 to k) βi Yt-i + Σ(i=1 to k) γi Xt-i + εt    (13.26)
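The mechanics of the test - compare the restricted and unrestricted residual sums of squares with an F-test - can be sketched on simulated data. The series and parameter values below are illustrative, with k = 1 lag for simplicity:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200

# Simulate a pair of series in which X Granger-causes Y:
# Y depends on its own lag and on lagged X; X is autoregressive.
x = np.zeros(n)
y = np.zeros(n)
for t in range(1, n):
    x[t] = 0.5 * x[t - 1] + rng.normal()
    y[t] = 0.4 * y[t - 1] + 0.8 * x[t - 1] + rng.normal()

def rss(X, t):
    coef, *_ = np.linalg.lstsq(X, t, rcond=None)
    r = t - X @ coef
    return r @ r

# Equation (13.26) with k = 1: the unrestricted regression includes
# lagged Y and lagged X; the restricted one drops lagged X.
dep = y[1:]
ones = np.ones(n - 1)
unrestricted = np.column_stack([ones, y[:-1], x[:-1]])
restricted = np.column_stack([ones, y[:-1]])

rss_u = rss(unrestricted, dep)
rss_r = rss(restricted, dep)
q, k_params = 1, 3               # restrictions; parameters, unrestricted
F = ((rss_r - rss_u) / q) / (rss_u / (n - 1 - k_params))
# F far exceeds the 5 per cent critical value (about 3.9), so we reject
# the null that X does not Granger-cause Y.
```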
Table 13.2 Granger and Sims test for Maltese export quantity and prices
Granger Sims
P P X X P P X X
e 0.41 0.14 0.22 0.42 1.37 1.43 -1.93 -1.65
P-2 -0.65 -0.55 -0.13 0.31 1.06
P-1 1.31 1.52 0.27 0.72 -1.22
P -0.53 1.77
P+1 0.15
P+2 0.99
X-2 -0.11 -0.10 -0.04 0.11 0.12
X-1 1.31 0.96 0.98 0.30 0.28
X 0.04 0.16
X+1 0.11
X+2 0.01
n 25 25 25 25 23 23 23 23
R2 0.98 0.97 0.98 0.98 0.93 0.93 0.98 0.93
RSS 0.083 0.154 0.253 0.269 0.301 0.305 0.270 0.782
Notes: - indicates excluded from the regression. See text for symbols and explanation.
The Sims test
The test proposed by Sims is differently specified, but has the same intu-
itive interpretation as the Granger test. The unrestricted equation for the
Sims test is:

Yt = β0 + Σ(j=k1 to k2) βj Xt+j + ε    (13.29)
that is, Y is regressed on lagged, current and future values of X (the length
of the lag and the lead need not be equal). The restricted equation
excludes the future (lead) values of X:
Yt = β0 + Σ(j=k1 to 0) βj Xt+j + ε    (13.30)
Be sure to use the same sample size! So we are testing the null that the
coefficients on the lead terms are jointly zero: i.e. future values of X do
not affect Y. The null hypothesis is that Y does not cause X. Note the
difference between the Sims and Granger tests. Here Y is the dependent
variable, but we are checking if Y causes X, which is the opposite of the
Granger test. If future X is significant, then Y cannot cause X as it does
not precede it.
Table 13.2 reports the results for the Sims test using the price and quantity
data. First, price is regressed on past, present and future quantities. The
omission of the lead terms barely changes the RSS, so that the calculated
F-statistic is only 0.11. Hence we accept the null hypothesis that price
does not cause quantity. By contrast, the calculated F-statistic for the
hypothesis that quantity does not cause price is 16.12, so that the null is
rejected. We find that quantity does cause price. These findings are, of
course, the same as those obtained with the Granger test.
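These F-statistics can be reproduced from the RSS values reported in Table 13.2, on the reading that n = 23, two lead terms are dropped, and the unrestricted equation has six parameters (constant, two lags, current value and two leads):

```python
# F = ((RSS_R - RSS_U)/q) / (RSS_U/(n - k)), with q restrictions and
# k parameters in the unrestricted equation.
def f_stat(rss_u, rss_r, n, k, q):
    return ((rss_r - rss_u) / q) / (rss_u / (n - k))

f_p = f_stat(0.301, 0.305, 23, 6, 2)   # price equation: about 0.11
f_x = f_stat(0.270, 0.782, 23, 6, 2)   # quantity equation: about 16.1
```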
Exercise 13.5
Use the data on money and prices in Tanzania (data file TANMON) to
test for Granger causality between the two variables. Carry out also the
Sims test and compare your results.
Exogeneity
Three concepts of exogeneity may be identified: weak, strong and super. 8
A variable, X, is weakly exogenous if there is no loss of information by
analysing the distribution of Y conditional upon X (which is what OLS
regression does) and ignoring the stochastic behaviour of X itself. If our
concern is to establish the appropriate estimation technique, then weak
exogeneity is all that need concern us. We have already seen that the
Hausman test can provide us with this information. Granger causality, on
the other hand, is neither necessary nor sufficient to establish weak
exogeneity.
Granger causality enters the picture because the definition for X to be
strongly exogenous in a model containing X and Y is that X should be
weakly exogenous and X should not be Granger-caused by Y. Super-exogeneity
is related to the Lucas critique (Lucas, 1976), which is the
notion that behavioural relationships can change in the face of policy
changes (i.e. adjustment in 'exogenous' policy variables). A variable, X,
is said to be super-exogenous if (a) X is weakly exogenous; and (b) model
parameters are invariant to changes in X (formally speaking, the marginal
distribution of X). See, for example, Maddala (1992) for a more extended
discussion of these concepts. Our main point is to emphasise that the
Hausman test comes closest to the conception of exogeneity required in
the context of estimating simultaneous equations.
Figure 13.1 The identification of the supply curve by the demand curve
Figure 13.2 The identification of the demand curve by the supply curve
generalises and is formally embodied in the rank and order conditions for
identification. These conditions are the subject of the next section.
(13.32)
Y = C + I + X - M    (13.33)

C = β1 + β2Y + ε1    (13.34)

I = δ1 + δ2r + ε3    (13.37)

Md = Ms = L/P    (13.38)
and the εi's are error terms. We combine equations (13.36) and (13.38)
so that there are five equations and five endogenous variables (Y, C, I, M
and r) and three exogenous variables (L/P, X and the intercept). Table 13.4
gives the coefficient matrix. The final columns also give the information
necessary for the order condition, with the last column summarising the
identification status of each equation. A table such as this provides a
systematic means of working through the identification process.
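The order condition reduces to a simple counting rule. The sketch below is our own illustration, not a table from the book, and the variable counts used for the consumption function are our reading of the model, to be checked against Table 13.4:

```python
# Order condition: with K predetermined (exogenous) variables in the
# system, an equation including k of them and m endogenous variables is
# identified only if the number it excludes, K - k, is at least m - 1.
def order_condition(K, k, m):
    excluded = K - k
    if excluded < m - 1:
        return "unidentified"
    if excluded == m - 1:
        return "just identified"
    return "over-identified"

# Consumption function C = b1 + b2*Y: of the three exogenous variables
# (intercept, X, L/P) it includes only the intercept (k = 1), and it
# contains two endogenous variables, C and Y (m = 2).
status = order_condition(K=3, k=1, m=2)   # over-identified by the order condition
```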
We do not need to discuss the identification of the national accounting
identity - we already know all the coefficients are exactly one (or minus
one, as the case may be). Let us consider the consumption function. The
matrix of coefficients of variables excluded from this equation is the 4 x
5 matrix:
(13.39)
(13.40)
Exercise 13.6
Replacing equation (13.1) by equation (13.31) (i.e. use the supply equa-
tion with weather), test for identification in the supply and demand model.
ADDITIONAL EXERCISES
Exercise 13.7
Use the Indonesian national accounts data (data file INDONA) to test
whether income is exogenous in the consumption and import functions in
the simple Keynesian model:
Ct = β1 + β2Yt + ε1,t

Mt = γ1 + γ2Yt + ε2,t

Y = C + I + X - M
where M is imports, C consumption, Y income, I investment and X exports.
Exercise 13.8
Using equations (13.1) and (13.2) illustrate that the OLS estimate of
the coefficient on price suffers from upward bias. (The algebra for this
question is a bit messy.) Does this finding agree with the results shown
in Table 13.1?
Exercise 13.9
In the demand and supply model in equations (13.1)-(13.3) the demand
equation is unidentified, so no simultaneous technique may be applied to
estimation of the price elasticity of demand. But of course single-equa-
tion estimation by OLS is still possible. Carry out this regression and
comment on the information given by the coefficient.
Exercise 13.10
Comment on the estimation procedure for the parameters in the following
model:
NOTES
1 The data have been generated by a simulation of the model specified in equa-
tions (13.1)-(13.3).
2 The supply equation actually has two regressors - price and the intercept term.
Some packages, e.g. Microfit, require you to specify as many instruments as
regressors including the intercept term. If this is the case simply include the
intercept in your list of instruments - it will act as its own instrument.
3 Subject to comments made in the next section.
4 A version of the omitted-variable Hausman test was used in Chapter 10 as an
equivalent form of the Plosser-Schwert-White differencing test.
5 An alternative closure would be quantity adjustment in which the actual level
of exports is constrained to the minimum of supply and demand, i.e. X =
min{Xs, Xd}.
6 If we were testing the exogeneity of more than one variable then the fitted
value for each of these variables would be included in the expanded regres-
sion.
7 The consumption function is the 'standard text book' example of simultaneity
bias. We did not use it as the accounting identity is usually presented
for the case of a closed economy. We have included the trade balance since
we use actual data; there are no real economies for which it is true that Y =
C + I.
8 The reader is warned that exogeneity is a term for which differing definitions
abound. We present here those suggested by Engle et al. (1983) which have
the widest currency.
9 It is perfectly possible to check first the order condition and then the rank.
There is no strong argument either way. An equation that fails the order condi-
tion must fail the rank condition (why?) but not vice versa. Doing the rank
condition first may therefore save a small amount of time. In Chapter 14 it
will also be seen that it is possible to apply the technique of two- (or three-)
stage least squares to both just and over-identified equations: if this is the
intention then the order condition is superfluous.
14 Estimating simultaneous
equation models
14.1 INTRODUCTION
The previous chapters have shown that OLS estimation may be inappro-
priate for an equation which is part of a system of equations and that it
may not be possible to estimate the parameters of sorne equations at all
(if the equation is unidentified). We also saw that an equation may be
just or over-identified and said that this affects how the equation may be
estimated. This chapter discusses the appropriate estimation techniques
for these different cases.
We begin, however, in section 14.2 with a presentation of recursive
systems as a special class of multi-equation model. Section 14.3 discusses
the method of indirect least squares (ILS) which may only be used for
equations which are just identified. Section 14.4 presents instrumental vari-
ables and the related method of two-stage least squares. These techniques
are called limited information estimation as they estimate the parameters
of one equation at a time. Section 14.5 employs the various techniques to
estimate a consumption function for Indonesia.
In section 14.6 we discuss seemingly unrelated regressions (SUR) and
three-stage least squares (3SLS), which is a full information technique -
all the model's (estimable) parameters are estimated simultaneously.
Section 14.7 provides a summary of the chapter's main points.
(14.1)
where the Ys are endogenous and X, which may be a vector, exogenous.
The problem of simultaneity bias arises since one of the regressors is cor-
related with the error term. This problem will not affect X as it is exoge-
nous. But in a simultaneous model - in which, say, Y1 depends on Y2 and Y2
depends on Y1 - a variable which is the dependent variable in one
equation will be the regressor in another and simultaneity bias will be
present. But this problem does not appear in the recursive model. The
expression for Y1 is equation (14.1) in a reduced form, so clearly OLS can
be used. Since Y1 is a function of only X and its own error ε1, it will not be
related to any of the other error terms appearing in the model. The reduced
form for Y2 shows it to be a function of X, ε1 and ε2. So Y2 will not be
correlated with, for example, ε3 when it appears as a regressor in the equation
for Y3, nor ε4 and so on. The same argument carries through for each
of the endogenous variables. As there is no simultaneity bias the structural
equations from a recursive model apparently may be estimated by OLS.
In fact there is a problem here, which we shall state rather than demonstrate.
In a recursive model the order condition is satisfied for only the
first equation. However, the other equations may be shown to be identified,
provided there is no cross-equation correlation between the error
terms, i.e. E(εiεj) = 0 for all i ≠ j. However, OLS is consistent rather than
unbiased, and so only valid for large samples.
There are two important exceptions to the above argument. The first
should be apparent from the preceding paragraph, i.e. what happens when
the error terms are cross-correlated, e.g. E(ε1ε2) ≠ 0. In this case multiplying
the expression for Y1 through by ε2 shows that E(Y1ε2) is not zero,
so that OLS estimation of the equation for Y2 is now biased and inconsistent.
In fact, cross-correlation of the error terms leads us away from
OLS even when the equations do not apparently form part of a multi-equation
model - that is, they are what is known as seemingly unrelated
regressions. The appropriate estimation technique under these circumstances
is discussed in section 14.6.
We have discussed here a general form of the recursive model. In
economics such models are most likely to arise on account of lags in
behavioural relationships. Examples of this fact are provided in the exercises.
But this case is the second exception to our initial description of
using OLS for the recursive model. If the first equation contains a lagged
endogenous variable, then estimation of this equation by OLS is consistent
but not unbiased. This result follows since simple substitution can
obtain a reduced form in which Y1 is a function of its own lag, under
which circumstances we know OLS to be biased but consistent.
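The claim that OLS is valid for a recursive system with uncorrelated errors can be checked by simulation. The two-equation system and parameter values below are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 4000

# A two-equation recursive system: Y1 depends only on exogenous X, and
# Y2 depends on Y1. With uncorrelated errors, Y1 is uncorrelated with
# e2, so OLS on the second structural equation is consistent.
x = rng.normal(size=n)
e1 = rng.normal(size=n)
e2 = rng.normal(size=n)
y1 = 1.0 + 2.0 * x + e1
y2 = 0.5 + 1.5 * y1 + e2

X2 = np.column_stack([np.ones(n), y1])
coef, *_ = np.linalg.lstsq(X2, y2, rcond=None)
# coef recovers the structural intercept 0.5 and slope 1.5 closely.
```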
Exercise 14.1
In a simple two-equation Keynesian model of the closed economy, the
accounting identity:
Estimating simultaneous equation models 439
(14.2)
may be combined either with a contemporaneous consumption function:
(14.3)
or one in which the effect of income on consumption operates with a lag
(14.4)
Show that the model with the lagged consumption function is recursive,
whereas that with the contemporaneous function is not. Discuss the impli-
cations of this finding far estimation of the marginal propensity to
consume.
Exercise 14.2
Consider the five-equation IS-LM model given in equations (13.33)-
(13.37). Show that a lagged consumption function does not make the
model recursive. Show that the model is recursive if the income term is
lagged in both the import and consumption functions and that investment
depends on the lagged interest rate. In the latter case explain the order
of substitution to obtain a recursive solution to the model.
Qˢ = Qᵈ = Q (14.7)
where P and Q are price and quantity, the s and d superscripts denote
supply and demand, and Y is income.
This model may be solved to get the following reduced-form expres-
sions for price and quantity:
Pₜ = (γ₁ − β₁)/(β₂ − γ₂) + [γ₃/(β₂ − γ₂)]Yₜ + (ε₂ − ε₁)/(β₂ − γ₂) (14.8)
Writing the reduced forms as Pₜ = π₁ + π₂Yₜ + v₁,ₜ and Qₜ = π₃ + π₄Yₜ + v₂,ₜ, the quantity-equation intercept is, for example:
π₃ = (β₂γ₁ − β₁γ₂)/(β₂ − γ₂)
so that the ILS estimates of the supply parameters may be recovered as:
β₂ = π₄/π₂ (14.12)
β₁ = π₃ − β₂π₁ (14.13)
Exercise 14.3
In the following model (where W is an index of weather conditions):
Qˢ = β₁ + β₂P + β₃W + ε₁ (14.14)
Qᵈ = γ₁ + γ₂P + ε₂ (14.15)
Qˢ = Qᵈ = Q (14.16)
which parameters may be estimated using ILS? Derive the algebraic
expressions to calculate these parameters. Discuss your answer in the light
of the identification of the equations.
To apply ILS to our supply and demand example (equations (14.5)-
(14.7)) we first estimate the reduced-form equations. We do this using the
data set in data file SIMSIM, where all series have been subject to the
log transformation:
If you refer back to Chapter 13, you will see that these are not the
same as the results which were obtained by OLS (by which the price elas-
ticity of supply is estimated as 0.11). The latter are biased and inconsistent,
whereas the ILS estimates are consistent; but they are not unbiased. 2 That
is, in common with all techniques for estimating equations in simultaneous
systems, ILS is a large-sample technique. Small-sample bias may be quite
serious. Developing-country data sets often do not permit very large
samples, but a sample size of less than 20 would cast serious doubt on the
validity of your results, and it is really preferable to work with at least 25
observations.
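The mechanics of ILS on a just-identified supply equation can be sketched with simulated data. Everything below is illustrative - the parameter values are invented and the series are not the SIMSIM data - but the sketch also shows numerically that the ILS ratio of reduced-form slopes coincides with the IV estimate using income as the instrument.

```python
# Illustrative ILS sketch for the supply/demand model of the text:
# supply Q = b1 + b2*P, demand Q = g1 + g2*P + g3*Y (parameters invented).
import numpy as np

rng = np.random.default_rng(1)
b1, b2 = 2.0, 0.5             # supply parameters (b2 is what ILS targets)
g1, g2, g3 = 10.0, -0.8, 1.0  # demand parameters

n = 5000
Y = rng.normal(5, 1, n)                         # exogenous income
e1, e2 = rng.normal(0, 0.5, n), rng.normal(0, 0.5, n)
P = (g1 - b1 + g3 * Y + e2 - e1) / (b2 - g2)    # reduced form for price
Q = b1 + b2 * P + e1                            # implied quantity

slope = lambda x, y: np.cov(x, y)[0, 1] / np.var(x, ddof=1)
pi2 = slope(Y, P)      # reduced-form slope of P on Y
pi4 = slope(Y, Q)      # reduced-form slope of Q on Y
b2_ils = pi4 / pi2     # ILS estimate of the supply slope

# IV with Y as instrument: algebraically the same ratio
b2_iv = np.cov(Y, Q)[0, 1] / np.cov(Y, P)[0, 1]
```

The two estimates agree to machine precision, which is the algebraic identity between ILS and IV referred to later in the chapter.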
The above example shows how ILS may be applied to the estimation
of an equation which is just identified. What about when an equation is
over-identified? We have stated that ILS is then inappropriate, and it
is easy to show why this is so. Suppose that our demand equation has an
additional exogenous variable, A, which is the real value of advertising
expenditure per capita:
Qₜᵈ = γ₁ + γ₂Pₜ + γ₃Yₜ + γ₄Aₜ + ε₂ (14.20)
Checking the identification of the behavioural relationships in this model
(which is left as an exercise) shows that, whilst the demand equation
remains unidentified, the supply schedule is now over-identified.
The consequences of this over-identification are apparent when we write
out the reduced-form equations:
Pₜ = π₁ + π₂Yₜ + π₃Aₜ + v₁,ₜ (14.21)
Qₜ = π₄ + π₅Yₜ + π₆Aₜ + v₂,ₜ (14.22)
We will now get estimates of six πs, yet the slope of the supply curve can
now be recovered in two different ways (as π₅/π₂ or as π₆/π₃). We have
too many equations, so there will be no unique solution. There will be
more than one solution for the parameters of the supply curve (the βs)
and no means of choosing between them as to the 'right one'. ILS
therefore cannot be used when an equation is over-identified.
442 Econometrics for developing countries
In practice, ILS is little used even if an equation is just identified. This
neglect is because of the algebra (which rapidly becomes tedious in large
models) and subsequent substitution required to get the structural coef-
ficients from the reduced-form estimates. The method mainly survives as
an introduction to the identification problem in many texts. The preferred
method is instrumental variable estimation. In the next section we deal
with this technique first for just identified equations and then in its gener-
alised form (for equations which are either just or over-identified), which
is equivalent to two-stage least squares.
Exercise 14.4
The Human Development Index may be modelled as a function of income
per capita and population growth. But population growth itself is a function
of the HDI. Which parameters in this model may be estimated? Derive
the algebraic expressions to estimate them by ILS. Calculate these esti-
mates using the data in the data file SOCECON.
And like ILS, IV estimates are consistent but not unbiased; IV is also
a large-sample estimator. This must obviously be so, since it has just been
demonstrated that the two estimators are algebraically identical.
The R² obtained by OLS will always be greater than that resulting from
IV estimation. Indeed, the latter can be negative, which is symptomatic
of poor model specification, and perhaps a sign that the equation is not
identified at all.
What if we have a choice of instruments? Which should we choose? The
answer is that this problem will not arise: if we have a choice of instru-
ments then the equation will be over-, not just, identified and the use of
IV with a single instrument is inappropriate. Rather, we should apply
2SLS which utilises the full range of instruments.
The supply and demand model, with advertising included in the demand
schedule, is an example of this situation. As before, the demand equation
is unidentified. The supply equation is over-identified, and we have a
choice of instruments: either income or advertising. Estimating the model
using advertising as the instrument gives:
Q̂ₜ = 7.32 + 0.39Pₜ
(1.19) (0.35) (14.25)
which is a rather different answer from that provided by using income
(here price is not significant), given in equation (14.23). There is no basis
for saying that either equation (14.23) or (14.25) is the 'right answer'. This
situation is analogous to that in ILS estimation, where we had more reduced-
form coefficients than structural ones, so there was no unique solution
for the latter.
So how can we estimate the supply function when it is over-identified?
Since we have no basis for deciding between the exogenous variables in
choosing our instrument why not choose both of them? Or, more precisely,
a linear combination of them. This sounds promising, but what linear
combination should we take (what weight should we give each variable)?
Why not use the coefficients given by regressing the variable we wish to
proxy on the instruments (the instruments being all the exogenous vari-
ables in the model)? That is, proxy the endogenous regressors by their
fitted values from the estimates of their reduced forms. This method is
perfectly acceptable, and it will yield identical results to the closely related
method of two-stage least squares. But to carry out 2SLS, rather than
using the fitted values of the endogenous variables in the formula for the
IV estimator, we use the fitted values, in place of the actual values, as
the regressors in the original structural equation we are estimating.
Thus, the procedure is called two-stage least squares since it involves
doing two sets of regressions. First we estimate the reduced form of each
endogenous regressor in the structural equation we wish to estimate,
regressing it on all the exogenous variables in the model. Then we estimate
the structural equation by OLS, but replacing the endogenous regressors
by the fitted values given by the first stage. 5 We
now apply the technique to the supply and demand model.
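Worked 'by hand', the two stages can be sketched as follows. The data are simulated and all parameter values invented; the point is only the mechanics: a first-stage regression of the endogenous regressor on all the exogenous variables, then OLS with the fitted values substituted.

```python
# Illustrative 2SLS "by hand" for the over-identified supply equation:
# supply Q = b1 + b2*P, demand Q = 10 - 0.8*P + 1.0*Y + 0.7*A (invented).
import numpy as np

rng = np.random.default_rng(2)
n = 4000
Y = rng.normal(5, 1, n)   # income (exogenous)
A = rng.normal(2, 1, n)   # advertising (exogenous)
e1, e2 = rng.normal(0, 0.5, n), rng.normal(0, 0.5, n)
b1, b2 = 2.0, 0.5
P = (10 - b1 + 1.0 * Y + 0.7 * A + e2 - e1) / (b2 + 0.8)  # equilibrium price
Q = b1 + b2 * P + e1                                      # equilibrium quantity

# Stage 1: regress the endogenous regressor P on ALL exogenous variables
X1 = np.column_stack([np.ones(n), Y, A])
P_hat = X1 @ np.linalg.lstsq(X1, P, rcond=None)[0]

# Stage 2: OLS of Q on the FITTED values of P
X2 = np.column_stack([np.ones(n), P_hat])
b_2sls = np.linalg.lstsq(X2, Q, rcond=None)[0]  # [intercept, slope]
```

The slope recovered in the second stage is close to the true supply slope of 0.5; as the text notes below, the standard errors from this second regression would still need correcting.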
The first stage is to estimate the reduced form for the endogenous
regressor; this gives:
P̂ₜ = -1.14 + 1.19Yₜ - 0.03Aₜ (14.26)
which is used to calculate the fitted values of P, which are then used to
estimate the second-stage equation:
Q̂ₜ = 8.36 + 0.20P̂ₜ
(39.69) (5.38) (14.27)
The standard errors given by this second regression are not the appro-
priate ones. These are, however, not difficult to calculate, and are given
by statistics and econometrics packages. 6 Also it is not necessary to carry
out the two stages yourself in practice as this will be done by the package
(though it is useful for understanding the technique to do it yourself a
few times and, as we shall see, it can yield useful information). The esti-
mated equation given by selecting 2SLS is:
(14.35)
(14.38)
There are only two structural coefficients to be estimated (β₁ and β₂) but
four reduced-form estimates. It might appear that we have too many equa-
tions, so that the consumption function is over-identified. This is not the
case. Applying the order condition shows (subject to the rank condition
being satisfied) that the consumption function is just identified. When an
equation is just identified but there appear to be multiple solutions (as
here), then, in fact, the solutions arrived at by the different channels will
turn out to be the same. You should be able to satisfy yourself that this
is the case by estimating the πs from equations (14.36) and (14.37) and
substituting the results into equation (14.38).
The reduced form for income (where IXMₜ = Iₜ + Xₜ − Mₜ) is:
Ŷₜ = 4,011 + 2.888IXMₜ (14.39)
which gives:
β̂₂ = (2.888 − 1)/2.888 = 0.654 and β̂₁ = 4,011/2.888 = 1,389 (14.40)
The same answers are arrived at by estimating the reduced form for
consumption:
Ĉₜ = 4,011 + 1.888IXMₜ (14.41)
so:
β̂₂ = 1.888/2.888 = 0.654 and β̂₁ = 4,011/2.888 = 1,389 (14.42)
Exercise 14.5
Use the Indonesian data (data file INDONA) to estimate the model:
Yₜ = Cₜ + Iₜ + Xₜ − Mₜ (14.44)
Cₜ = β₁ + β₂Yₜ + ε₁,ₜ (14.45)
Mₜ = α₁ + α₂Yₜ + ε₂,ₜ (14.46)
Exercise 14.6
Repeat the procedures followed in the preceding section using the national
accounts data for Sri Lanka (data file SRINA) or the country of your
choice.
b₂ˢᵁᴿ = {[σ₂²Σp₁q₁ − σ₁₂Σp₁q₂]σ₁₂Σp₁p₂ + [σ₁²Σp₂q₂ − σ₁₂Σp₂q₁]σ₂²Σp₁²} / {σ₁²σ₂²Σp₁²Σp₂² − σ₁₂²(Σp₁p₂)²} (14.50)
Examination of equations (14.49) and (14.50) shows that if there is no
cross-equation error correlation then the estimators are equivalent to
OLS. SUR estimators are also equivalent to OLS when the two regres-
sors (p₁ and p₂) are identical.
In order to apply these equations, estimates of the error variance and
covariance are required, which are calculated from the residuals:
σ̂ᵢⱼ = (1/n)Σeᵢeⱼ (14.51)
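A sketch of these formulae in use, with residual-based estimates of the error variances and covariance as in (14.51). The two-equation system and its parameter values are invented for illustration, and the equations are written without intercepts so that the Σpq sums apply directly.

```python
# Illustrative SUR sketch for two one-regressor equations with
# cross-correlated errors (all parameter values invented).
import numpy as np

rng = np.random.default_rng(3)
n = 500
p1, p2 = rng.normal(size=n), rng.normal(size=n)
u = rng.multivariate_normal([0, 0], [[1.0, 0.6], [0.6, 1.0]], size=n)
q1 = 1.5 * p1 + u[:, 0]     # first equation, slope 1.5
q2 = -0.7 * p2 + u[:, 1]    # second equation, slope -0.7

# Step 1: OLS residuals give the error variance/covariance estimates (14.51)
b1_ols = (p1 @ q1) / (p1 @ p1)
b2_ols = (p2 @ q2) / (p2 @ p2)
r1, r2 = q1 - b1_ols * p1, q2 - b2_ols * p2
s11, s22, s12 = (r1 @ r1) / n, (r2 @ r2) / n, (r1 @ r2) / n

# Step 2: plug into the SUR formula for the second slope, as in (14.50)
num = ((s22 * (p1 @ q1) - s12 * (p1 @ q2)) * s12 * (p1 @ p2)
       + (s11 * (p2 @ q2) - s12 * (p2 @ q1)) * s22 * (p1 @ p1))
den = s11 * s22 * (p1 @ p1) * (p2 @ p2) - s12 ** 2 * (p1 @ p2) ** 2
b2_sur = num / den
```

Setting s12 to zero in the formula collapses it to the OLS slope, which is the equivalence noted in the text.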
Exercise 14.7
Using the data in SOCECON estimate the regressions of the birth rate
on the square root of infant mortality, and of life expectancy on logged
income per capita using (a) OLS and (b) SUR. Comment on your results.
Exercise 14.8
Replicate Table 14.3 using the INDONA data. Reproduce the table also
using the data in SRINA or national accounts data for the country of your
choice. Comment on your results.
NOTES
1 Standard errors are not reported as there is no simple way of obtaining them
from the reduced-form standard errors (which were not reported for the same
reason - their significance does not necessarily imply the significance of the
structural coefficients). The inability to judge significance from ILS estimation
is an additional reason against its use to that given later in the text.
2 The reason being that an attempt to prove unbiasedness runs into the need
to take expectations of a ratio of random variables. This is not possible, and
we must resort to probability limits.
3 Many packages, for example Microfit, treat the intercept as a variable. Since
to estimate an equation by IV we need as many instruments as we have vari-
ables, estimation of the supply schedule requires two instruments. But the
intercept (or any other exogenous variable appearing in the equation) simply
acts as its own instrument.
4 We gave the formula for the slope coefficient (but not the intercept and stan-
dard errors) in Chapter 13.
5 We can actually replace all the endogenous variables - including that on the
left-hand side of the equation - by their fitted values, as using fitted rather
than actual values on the left-hand side does not affect the parameter esti-
mates. (It does change the standard errors, but these are, as we shall see, the
wrong ones anyway.)
6 To get the correct standard errors from the ones given by the second stage it
is necessary to multiply by a correction factor. This correction factor is the
ratio of the estimated variance of the error in the structural equation being
estimated to the variance of the error in the second-stage regression (the struc-
tural equation with endogenous regressors replaced by their fitted values from
the first stage regression). See Gujarati (1988: 620-1) or Maddala (1988:
311-13) for a derivation.
7 In which case, as stated in Chapter 13, it is unnecessary to check the order
condition, since the rank condition is necessary and sufficient.
8 In Chapter 13 we used the IV method to provide a consistent estimator when
the regressor is endogenous. It is possible to use 2SLS instead.
9 Note that the DW is substantially less than the R², suggesting that there is
likely to be spurious correlation here and that cointegration analysis might be
appropriate. As stated in Chapter 11, this latter approach should be adopted
- using simultaneous techniques does not eliminate the inconsistency that
results from regressing series that are I(1) on one another.
10 However, the bias is very small. Some suggest simultaneity is never too great
a problem - and almost certainly less than the measurement error in the data.
We should also note with concern that R² > DW, suggesting that the regres-
sion may well be spurious. It is more important to test for stationarity, and
proceed accordingly if the variables prove non-stationary, than to worry about
simultaneous estimation.
11 If different results are obtained this tells you there is a problem in the data.
This is why we could not use the textbook example of Y = C + I, since there
is no economy which fits this model. If data are used for which this identity
does not hold then different results will be obtained from the two reduced
forms. The Y here is GDP at market prices: GNP might be a more appro-
priate measure to use in the consumption function, in which case net factor
payments from abroad should be added as an additional exogenous term in
the identity.
12 Endogenising imports through an import function, which may appear an
obvious improvement to the model, is left as an exercise.
13 We have not given the formulae for the intercept and standard errors. These
estimates were obtained from TSP.
Appendix A
The data sets used in this book
Example
Pr (0 ≤ z ≤ 1.96) = 0.4750
Pr (z ≥ 1.96) = 0.5 − 0.4750 = 0.025
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4990 0.4990 0.4990
Note: This table gives the area under the standard normal curve between 0 and z (i.e., for z ≥ 0). But since the normal distribution is
symmetrical about z = 0, the area in the left-hand tail is the same as the area in the corresponding right-hand tail. For
example, P(−1.96 ≤ z ≤ 0) = 0.4750. Therefore, P(−1.96 ≤ z ≤ 1.96) = 2(0.4750) = 0.95.
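The tabulated areas can be reproduced from the error function, since Pr(0 ≤ z ≤ z₀) = erf(z₀/√2)/2; a minimal sketch:

```python
# Reproduce the normal-table areas via the error function:
# Pr(0 <= z <= z0) = erf(z0 / sqrt(2)) / 2.
import math

def normal_area(z0):
    """Area under the standard normal curve between 0 and z0."""
    return 0.5 * math.erf(z0 / math.sqrt(2))

print(round(normal_area(1.96), 4))  # 0.475, as in the example above
```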
Appendix B: Statistical tables 465
Pr (one tail) 0.25 0.10 0.05 0.025 0.01 0.005 0.001
Pr (two tails) 0.50 0.20 0.10 0.05 0.02 0.010 0.002
Example
Pr (F > 1.59) = 0.25
Pr (F > 2.42) = 0.10 for df N 1 = 10
Pr (F > 3.14) = 0.05 and N 2 = 9
Pr (F > 5.26) = 0.01
df for denominator N₂; df for numerator N₁ across the columns
Pr 1 2 3 4 5 6 7 8 9 10 11 12
0.25 5.83 7.50 8.20 8.58 8.82 8.98 9.10 9.19 9.26 9.32 9.36 9.41
1 0.10 39.9 49.5 53.6 55.8 57.2 58.2 58.9 59.4 59.9 60.2 60.5 60.7
0.05 161 200 216 225 230 234 237 239 241 242 243 244
0.25 2.57 3.00 3.15 3.23 3.28 3.31 3.34 3.35 3.37 3.38 3.39 3.39
2 0.10 8.53 9.00 9.16 9.24 9.29 9.33 9.35 9.37 9.38 9.39 9.40 9.41
0.05 18.5 19.0 19.2 19.2 19.3 19.3 19.4 19.4 19.4 19.4 19.4 19.4
0.01 98.5 99.0 99.2 99.2 99.3 99.3 99.4 99.4 99.4 99.4 99.4 99.4
0.25 2.02 2.28 2.36 2.39 2.41 2.42 2.43 2.44 2.44 2.44 2.45 2.45
3 0.10 5.54 5.46 5.39 5.34 5.31 5.28 5.27 5.25 5.24 5.23 5.22 5.22
0.05 10.1 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 8.79 8.76 8.74
0.01 34.1 30.8 29.5 28.7 28.2 27.9 27.7 27.5 27.3 27.2 27.1 27.0
0.25 1.81 2.00 2.05 2.06 2.07 2.08 2.08 2.08 2.08 2.08 2.08 2.08
4 0.10 4.54 4.32 4.19 4.11 4.05 4.01 3.98 3.95 3.94 3.92 3.91 3.90
0.05 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96 5.94 5.91
0.01 21.2 18.0 16.7 16.0 15.5 15.2 15.0 14.8 14.7 14.5 14.4 14.4
0.25 1.69 1.85 1.88 1.89 1.89 1.89 1.89 1.89 1.89 1.89 1.89 1.89
5 0.10 4.06 3.78 3.62 3.52 3.45 3.40 3.37 3.34 3.32 3.30 3.28 3.27
0.05 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 4.74 4.71 4.68
0.01 16.3 13.3 12.1 11.4 11.0 10.7 10.5 10.3 10.2 10.1 9.96 9.89
0.25 1.62 1.76 1.78 1.79 1.79 1.78 1.78 1.78 1.77 1.77 1.77 1.77
6 0.10 3.78 3.46 3.29 3.18 3.11 3.05 3.01 2.98 2.96 2.94 2.92 2.90
0.05 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06 4.03 4.00
0.01 13.7 10.9 9.78 9.15 8.75 8.47 8.26 8.10 7.98 7.87 7.79 7.72
0.25 1.57 1.70 1.72 1.72 1.71 1.71 1.70 1.70 1.69 1.69 1.69 1.68
7 0.10 3.59 3.26 3.07 2.96 2.88 2.83 2.78 2.75 2.72 2.70 2.68 2.67
0.05 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.64 3.60 3.57
0.01 12.2 9.55 8.45 7.85 7.46 7.19 6.99 6.84 6.72 6.62 6.54 6.47
0.25 1.54 1.66 1.67 1.66 1.66 1.65 1.64 1.64 1.63 1.63 1.63 1.62
8 0.10 3.46 3.11 2.92 2.81 2.73 2.67 2.62 2.59 2.56 2.54 2.52 2.50
0.05 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.35 3.31 3.28
0.01 11.3 8.65 7.59 7.01 6.63 6.37 6.18 6.03 5.91 5.81 5.73 5.67
0.25 1.51 1.62 1.63 1.63 1.62 1.61 1.60 1.60 1.59 1.59 1.58 1.58
9 0.10 3.36 3.01 2.81 2.69 2.61 2.55 2.51 2.47 2.44 2.42 2.40 2.38
0.05 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.14 3.10 3.07
0.01 10.6 8.02 6.99 6.42 6.06 5.80 5.61 5.47 5.35 5.26 5.18 5.11
Table B.3 - Continued
df for numerator N₁ (continued); Pr and df for denominator N₂ in the right-hand columns
15 20 24 30 40 50 60 100 120 200 500 ∞ Pr N₂
9.49 9.58 9.63 9.67 9.71 9.74 9.76 9.78 9.80 9.82 9.84 9.85 0.25
61.2 61.7 62.0 62.3 62.5 62.7 62.8 63.0 63.1 63.2 63.3 63.3 0.10 1
246 248 249 250 251 252 252 253 253 254 254 254 0.05
3.41 3.43 3.43 3.44 3.45 3.45 3.46 3.47 3.47 3.48 3.48 3.48 0.25
9.42 9.44 9.45 9.46 9.47 9.47 9.47 9.48 9.48 9.49 9.49 9.49 0.10 2
19.4 19.4 19.5 19.5 19.5 19.5 19.5 19.5 19.5 19.5 19.5 19.5 0.05
99.4 99.4 99.5 99.5 99.5 99.5 99.5 99.5 99.5 99.5 99.5 99.5 0.01
2.46 2.46 2.46 2.47 2.47 2.47 2.47 2.47 2.47 2.47 2.47 2.47 0.25
5.20 5.18 5.18 5.17 5.16 5.15 5.15 5.14 5.14 5.14 5.14 5.13 0.10 3
8.70 8.66 8.64 8.62 8.59 8.58 8.57 8.55 8.55 8.54 8.53 8.53 0.05
26.9 26.7 26.6 26.5 26.4 26.4 26.3 26.2 26.2 26.2 26.1 26.1 0.01
2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08 2.08 0.25
3.87 3.84 3.83 3.82 3.80 3.80 3.79 3.78 3.78 3.77 3.76 3.76 0.10 4
5.86 5.80 5.77 5.75 5.72 5.70 5.69 5.66 5.66 5.65 5.64 5.63 0.05
14.2 14.0 13.9 13.8 13.7 13.7 13.7 13.6 13.6 13.5 13.5 13.5 0.01
1.89 1.88 1.88 1.88 1.88 1.88 1.87 1.87 1.87 1.87 1.87 1.87 0.25
3.24 3.21 3.19 3.17 3.16 3.15 3.14 3.13 3.12 3.12 3.11 3.10 0.10 5
4.62 4.56 4.53 4.50 4.46 4.44 4.43 4.41 4.40 4.39 4.37 4.36 0.05
9.72 9.55 9.47 9.38 9.29 9.24 9.20 9.13 9.11 9.08 9.04 9.02 0.01
1.76 1.76 1.75 1.75 1.75 1.75 1.74 1.74 1.74 1.74 1.74 1.74 0.25
2.87 2.84 2.82 2.80 2.78 2.77 2.76 2.75 2.74 2.73 2.73 2.72 0.10 6
3.94 3.87 3.84 3.81 3.77 3.75 3.74 3.71 3.70 3.69 3.68 3.67 0.05
7.56 7.40 7.31 7.23 7.14 7.09 7.06 6.99 6.97 6.93 6.90 6.88 0.01
1.68 1.67 1.67 1.66 1.66 1.66 1.65 1.65 1.65 1.65 1.65 1.65 0.25
2.63 2.59 2.58 2.56 2.54 2.52 2.51 2.50 2.49 2.48 2.48 2.47 0.10 7
3.51 3.44 3.41 3.38 3.34 3.32 3.30 3.27 3.27 3.25 3.24 3.23 0.05
6.31 6.16 6.07 5.99 5.91 5.86 5.82 5.75 5.74 5.70 5.67 5.65 0.01
1.62 1.61 1.60 1.60 1.59 1.59 1.59 1.58 1.58 1.58 1.58 1.58 0.25
2.46 2.42 2.40 2.38 2.36 2.35 2.34 2.32 2.32 2.31 2.30 2.29 0.10 8
3.22 3.15 3.12 3.08 3.04 3.02 3.01 2.97 2.97 2.95 2.94 2.93 0.05
5.52 5.36 5.28 5.20 5.12 5.07 5.03 4.96 4.95 4.91 4.88 4.86 0.01
1.57 1.56 1.56 1.55 1.55 1.54 1.54 1.53 1.53 1.53 1.53 1.53 0.25
2.34 2.30 2.28 2.25 2.23 2.22 2.21 2.19 2.18 2.17 2.17 2.16 0.10 9
3.01 2.94 2.90 2.86 2.83 2.80 2.79 2.76 2.75 2.73 2.72 2.71 0.05
4.96 4.81 4.73 4.65 4.57 4.52 4.48 4.42 4.40 4.36 4.33 4.31 0.01
Source: As Table B.2: table 18
Table B.3 Upper percentage points of the F distribution
df for denominator N₂; df for numerator N₁ across the columns
N₂ Pr 1 2 3 4 5 6 7 8 9 10 11 12
0.25 1.49 1.60 1.60 1.59 1.59 1.58 1.57 1.56 1.56 1.55 1.55 1.54
10 0.10 3.29 2.92 2.73 2.61 2.52 2.46 2.41 2.38 2.35 2.32 2.30 2.28
0.05 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.98 2.94 2.91
0.01 10.0 7.56 6.55 5.99 5.64 5.39 5.20 5.06 4.94 4.85 4.77 4.71
0.25 1.47 1.58 1.58 1.57 1.56 1.55 1.54 1.53 1.53 1.52 1.52 1.51
11 0.10 3.23 2.86 2.66 2.54 2.45 2.39 2.34 2.30 2.27 2.25 2.23 2.21
0.05 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.85 2.82 2.79
0.01 9.65 7.21 6.22 5.67 5.32 5.07 4.89 4.74 4.63 4.54 4.46 4.40
0.25 1.46 1.56 1.56 1.55 1.54 1.53 1.52 1.51 1.51 1.50 1.50 1.49
12 0.10 3.18 2.81 2.61 2.48 2.39 2.33 2.28 2.24 2.21 2.19 2.17 2.15
0.05 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 2.75 2.72 2.69
0.01 9.33 6.93 5.95 5.41 5.06 4.82 4.64 4.50 4.39 4.30 4.22 4.16
0.25 1.45 1.55 1.55 1.53 1.52 1.51 1.50 1.49 1.49 1.48 1.47 1.47
13 0.10 3.14 2.76 2.56 2.43 2.35 2.28 2.23 2.20 2.16 2.14 2.12 2.10
0.05 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 2.67 2.63 2.60
0.01 9.07 6.70 5.74 5.21 4.86 4.62 4.44 4.30 4.19 4.10 4.02 3.96
0.25 1.44 1.53 1.53 1.52 1.51 1.50 1.49 1.48 1.47 1.46 1.46 1.45
14 0.10 3.10 2.73 2.52 2.39 2.31 2.24 2.19 2.15 2.12 2.10 2.08 2.05
0.05 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 2.60 2.57 2.53
0.01 8.86 6.51 5.56 5.04 4.69 4.46 4.28 4.14 4.03 3.94 3.86 3.80
0.25 1.43 1.52 1.52 1.51 1.49 1.48 1.47 1.46 1.46 1.45 1.44 1.44
15 0.10 3.07 2.70 2.49 2.36 2.27 2.21 2.16 2.12 2.09 2.06 2.04 2.02
0.05 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 2.54 2.51 2.48
0.01 8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 3.89 3.80 3.73 3.67
0.25 1.42 1.51 1.51 1.50 1.48 1.47 1.46 1.45 1.44 1.44 1.44 1.43
16 0.10 3.05 2.67 2.46 2.33 2.24 2.18 2.13 2.09 2.06 2.03 2.01 1.99
0.05 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49 2.46 2.42
0.01 8.53 6.23 5.29 4.77 4.44 4.20 4.03 3.89 3.78 3.69 3.62 3.55
0.25 1.42 1.51 1.50 1.49 1.47 1.46 1.45 1.44 1.43 1.43 1.42 1.41
17 0.10 3.03 2.64 2.44 2.31 2.22 2.15 2.10 2.06 2.03 2.00 1.98 1.96
0.05 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49 2.45 2.41 2.38
0.01 8.40 6.11 5.18 4.67 4.34 4.10 3.93 3.79 3.68 3.59 3.52 3.46
0.25 1.41 1.50 1.49 1.48 1.46 1.45 1.44 1.43 1.42 1.42 1.41 1.40
18 0.10 3.01 2.62 2.42 2.29 2.20 2.13 2.08 2.04 2.00 1.98 1.96 1.93
0.05 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41 2.37 2.34
0.01 8.29 6.01 5.09 4.58 4.25 4.01 3.84 3.71 3.60 3.51 3.43 3.37
0.25 1.41 1.49 1.49 1.47 1.46 1.44 1.43 1.42 1.41 1.41 1.40 1.40
19 0.10 2.99 2.61 2.40 2.27 2.18 2.11 2.06 2.02 1.98 1.96 1.94 1.91
0.05 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42 2.38 2.34 2.31
0.01 8.18 5.93 5.01 4.50 4.17 3.94 3.77 3.63 3.52 3.43 3.36 3.30
0.25 1.40 1.49 1.48 1.46 1.45 1.44 1.43 1.42 1.41 1.40 1.39 1.39
20 0.10 2.97 2.59 2.38 2.25 2.16 2.09 2.04 2.00 1.96 1.94 1.92 1.89
0.05 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 2.35 2.31 2.28
0.01 8.10 5.85 4.94 4.43 4.10 3.87 3.70 3.56 3.46 3.37 3.29 3.23
Table B.3 - Continued
df for numerator N₁ (continued); Pr and df for denominator N₂ in the right-hand columns
15 20 24 30 40 50 60 100 120 200 500 ∞ Pr N₂
1.53 1.52 1.52 1.51 1.51 1.50 1.50 1.49 1.49 1.49 1.48 1.48 0.25
2.24 2.20 2.18 2.16 2.13 2.12 2.11 2.09 2.08 2.07 2.06 2.06 0.10 10
2.85 2.77 2.74 2.70 2.66 2.64 2.62 2.59 2.58 2.56 2.55 2.54 0.05
4.56 4.41 4.33 4.25 4.17 4.12 4.08 4.01 4.00 3.96 3.93 3.91 0.01
1.50 1.49 1.49 1.48 1.47 1.47 1.47 1.46 1.46 1.46 1.45 1.45 0.25
2.17 2.12 2.10 2.08 2.05 2.04 2.03 2.00 2.00 1.99 1.98 1.97 0.10 11
2.72 2.65 2.61 2.57 2.53 2.51 2.49 2.46 2.45 2.43 2.42 2.40 0.05
4.25 4.10 4.02 3.94 3.86 3.81 3.78 3.71 3.69 3.66 3.62 3.60 0.01
1.48 1.47 1.46 1.45 1.45 1.44 1.44 1.43 1.43 1.43 1.42 1.42 0.25
2.10 2.06 2.04 2.01 1.99 1.97 1.96 1.94 1.93 1.92 1.91 1.90 0.10 12
2.62 2.54 2.51 2.47 2.43 2.40 2.38 2.35 2.34 2.32 2.31 2.30 0.05
4.01 3.86 3.78 3.70 3.62 3.57 3.54 3.47 3.45 3.41 3.38 3.36 0.01
1.46 1.45 1.44 1.43 1.42 1.42 1.42 1.41 1.41 1.40 1.40 1.40 0.25
2.05 2.01 1.98 1.96 1.93 1.92 1.90 1.88 1.88 1.86 1.85 1.85 0.10 13
2.53 2.46 2.42 2.38 2.34 2.31 2.30 2.26 2.25 2.23 2.22 2.21 0.05
3.82 3.66 3.59 3.51 3.43 3.38 3.34 3.27 3.25 3.22 3.19 3.17 0.01
1.44 1.43 1.42 1.41 1.41 1.40 1.40 1.39 1.39 1.39 1.38 1.38 0.25
2.01 1.96 1.94 1.91 1.89 1.87 1.86 1.83 1.83 1.82 1.80 1.80 0.10 14
2.46 2.39 2.35 2.31 2.27 2.24 2.22 2.19 2.18 2.16 2.14 2.13 0.05
3.66 3.51 3.43 3.35 3.27 3.22 3.18 3.11 3.09 3.06 3.03 3.00 0.01
1.43 1.41 1.41 1.40 1.39 1.39 1.38 1.38 1.37 1.37 1.36 1.36 0.25
1.97 1.92 1.90 1.87 1.85 1.83 1.82 1.79 1.79 1.77 1.76 1.76 0.10 15
2.40 2.33 2.29 2.25 2.20 2.18 2.16 2.12 2.11 2.10 2.08 2.07 0.05
3.52 3.37 3.29 3.21 3.13 3.08 3.05 2.98 2.96 2.92 2.89 2.87 0.01
1.41 1.40 1.39 1.38 1.37 1.37 1.36 1.36 1.35 1.35 1.34 1.34 0.25
1.94 1.89 1.87 1.84 1.81 1.79 1.78 1.76 1.75 1.74 1.73 1.72 0.10 16
2.35 2.28 2.24 2.19 2.15 2.12 2.11 2.07 2.06 2.04 2.02 2.01 0.05
3.41 3.26 3.18 3.10 3.02 2.97 2.93 2.86 2.84 2.81 2.78 2.75 0.01
1.40 1.39 1.38 1.37 1.36 1.35 1.35 1.34 1.34 1.34 1.33 1.33 0.25
1.91 1.86 1.84 1.81 1.78 1.76 1.75 1.73 1.72 1.71 1.69 1.69 0.10 17
2.31 2.23 2.19 2.15 2.10 2.08 2.06 2.02 2.01 1.99 1.97 1.96 0.05
3.31 3.16 3.08 3.00 2.92 2.87 2.83 2.76 2.75 2.71 2.68 2.65 0.01
1.39 1.38 1.37 1.36 1.35 1.34 1.34 1.33 1.33 1.32 1.32 1.32 0.25
1.89 1.84 1.81 1.78 1.75 1.74 1.72 1.70 1.69 1.68 1.67 1.66 0.10 18
2.27 2.19 2.15 2.11 2.06 2.04 2.02 1.98 1.97 1.95 1.93 1.92 0.05
3.23 3.08 3.00 2.92 2.84 2.78 2.75 2.68 2.66 2.62 2.59 2.57 0.01
1.38 1.37 1.36 1.35 1.34 1.33 1.33 1.32 1.32 1.31 1.31 1.30 0.25
1.86 1.81 1.79 1.76 1.73 1.71 1.70 1.67 1.67 1.65 1.64 1.63 0.10 19
2.23 2.16 2.11 2.07 2.03 2.00 1.98 1.94 1.93 1.91 1.89 1.88 0.05
3.15 3.00 2.92 2.84 2.76 2.71 2.67 2.60 2.58 2.55 2.51 2.49 0.01
1.37 1.36 1.35 1.34 1.33 1.33 1.32 1.31 1.31 1.30 1.30 1.29 0.25
1.84 1.79 1.77 1.74 1.71 1.69 1.68 1.65 1.64 1.63 1.62 1.61 0.10 20
2.20 2.12 2.08 2.04 1.99 1.97 1.95 1.91 1.90 1.88 1.86 1.84 0.05
3.09 2.94 2.86 2.78 2.69 2.64 2.61 2.54 2.52 2.48 2.44 2.42 0.01
Table B.3 - Continued
df for denominator N₂; df for numerator N₁ across the columns
N₂ Pr 1 2 3 4 5 6 7 8 9 10 11 12
0.25 1.40 1.48 1.47 1.45 1.44 1.42 1.41 1.40 1.39 1.39 1.38 1.37
22 0.10 2.95 2.56 2.35 2.22 2.13 2.06 2.01 1.97 1.93 1.90 1.88 1.86
0.05 4.30 3.44 3.05 2.82 2.66 2.55 2.46 2.40 2.34 2.30 2.26 2.23
0.01 7.95 5.72 4.82 4.31 3.99 3.76 3.59 3.45 3.35 3.26 3.18 3.12
0.25 1.39 1.47 1.46 1.44 1.43 1.41 1.40 1.39 1.38 1.38 1.37 1.36
24 0.10 2.93 2.54 2.33 2.19 2.10 2.04 1.98 1.94 1.91 1.88 1.85 1.83
0.05 4.26 3.40 3.01 2.78 2.62 2.51 2.42 2.36 2.30 2.25 2.21 2.18
0.01 7.82 5.61 4.72 4.22 3.90 3.67 3.50 3.36 3.26 3.17 3.09 3.03
0.25 1.38 1.46 1.45 1.44 1.42 1.41 1.39 1.38 1.37 1.37 1.36 1.35
26 0.10 2.91 2.52 2.31 2.17 2.08 2.01 1.96 1.92 1.88 1.86 1.84 1.81
0.05 4.23 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.22 2.18 2.15
0.01 7.72 5.53 4.64 4.14 3.82 3.59 3.42 3.29 3.18 3.09 3.02 2.96
0.25 1.38 1.46 1.45 1.43 1.41 1.40 1.39 1.38 1.37 1.36 1.35 1.34
28 0.10 2.89 2.50 2.29 2.16 2.06 2.00 1.94 1.90 1.87 1.84 1.81 1.79
0.05 4.20 3.34 2.95 2.71 2.56 2.45 2.36 2.29 2.24 2.19 2.15 2.12
0.01 7.64 5.45 4.57 4.07 3.75 3.53 3.36 3.23 3.12 3.03 2.96 2.90
0.25 1.38 1.45 1.44 1.42 1.41 1.39 1.38 1.37 1.36 1.35 1.35 1.34
30 0.10 2.88 2.49 2.28 2.14 2.05 1.98 1.93 1.88 1.85 1.82 1.79 1.77
0.05 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 2.16 2.13 2.09
0.01 7.56 5.39 4.51 4.02 3.70 3.47 3.30 3.17 3.07 2.98 2.91 2.84
0.25 1.36 1.44 1.42 1.40 1.39 1.37 1.36 1.35 1.34 1.33 1.32 1.31
40 0.10 2.84 2.44 2.23 2.09 2.00 1.93 1.87 1.83 1.79 1.76 1.73 1.71
0.05 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.08 2.04 2.00
0.01 7.31 5.18 4.31 3.83 3.51 3.29 3.12 2.99 2.89 2.80 2.73 2.66
0.25 1.35 1.42 1.41 1.38 1.37 1.35 1.33 1.32 1.31 1.30 1.29 1.29
60 0.10 2.79 2.39 2.18 2.04 1.95 1.87 1.82 1.77 1.74 1.71 1.68 1.66
0.05 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04 1.99 1.95 1.92
0.01 7.08 4.98 4.13 3.65 3.34 3.12 2.95 2.82 2.72 2.63 2.56 2.50
0.25 1.34 1.40 1.39 1.37 1.35 1.33 1.31 1.30 1.29 1.28 1.27 1.26
120 0.10 2.75 2.35 2.13 1.99 1.90 1.82 1.77 1.72 1.68 1.65 1.62 1.60
0.05 3.92 3.07 2.68 2.45 2.29 2.17 2.09 2.02 1.96 1.91 1.87 1.83
0.01 6.85 4.79 3.95 3.48 3.17 2.96 2.79 2.66 2.56 2.47 2.40 2.34
0.25 1.33 1.39 1.38 1.36 1.34 1.32 1.31 1.29 1.28 1.27 1.26 1.25
200 0.10 2.73 2.33 2.11 1.97 1.88 1.80 1.75 1.70 1.66 1.63 1.60 1.57
0.05 3.89 3.04 2.65 2.42 2.26 2.14 2.06 1.98 1.93 1.88 1.84 1.80
0.01 6.76 4.71 3.88 3.41 3.11 2.89 2.73 2.60 2.50 2.41 2.34 2.27
0.25 1.32 1.39 1.37 1.35 1.33 1.31 1.29 1.28 1.27 1.25 1.24 1.24
00 0.10 2.71 2.30 2.08 1.94 1.85 1.77 1.72 1.67 1.63 1.60 1.57 1.55
0.05 3.84 3.00 2.60 2.37 2.21 2.10 2.01 1.94 1.88 1.83 1.79 1.75
0.01 6.63 4.61 3.78 3.32 3.02 2.80 2.64 2.51 2.41 2.32 2.25 2.18
Table B.3 - Continued
df for numerator N₁ (continued); Pr and df for denominator N₂ in the right-hand columns
15 20 24 30 40 50 60 100 120 200 500 ∞ Pr N₂
1.36 1.34 1.33 1.32 1.31 1.31 1.30 1.30 1.30 1.29 1.29 1.28 0.25
1.81 1.76 1.73 1.70 1.67 1.65 1.64 1.61 1.60 1.59 1.58 1.57 0.10 22
2.15 2.07 2.03 1.98 1.94 1.91 1.89 1.85 1.84 1.82 1.80 1.78 0.05
2.98 2.83 2.75 2.67 2.58 2.53 2.50 2.42 2.40 2.36 2.33 2.31 0.01
1.35 1.33 1.32 1.31 1.30 1.29 1.29 1.28 1.28 1.27 1.27 1.26 0.25
1.78 1.73 1.70 1.67 1.64 1.62 1.61 1.58 1.57 1.56 1.54 1.53 0.10 24
2.11 2.03 1.98 1.94 1.89 1.86 1.84 1.80 1.79 1.77 1.75 1.73 0.05
2.89 2.74 2.66 2.58 2.49 2.44 2.40 2.33 2.31 2.27 2.24 2.21 0.01
1.34 1.32 1.31 1.30 1.29 1.28 1.28 1.26 1.26 1.26 1.25 1.25 0.25
1.76 1.71 1.68 1.65 1.61 1.59 1.58 1.55 1.54 1.53 1.51 1.50 0.10 26
2.07 1.99 1.95 1.90 1.85 1.82 1.80 1.76 1.75 1.73 1.71 1.69 0.05
2.81 2.66 2.58 2.50 2.42 2.36 2.33 2.25 2.23 2.19 2.16 2.13 0.01
1.33 1.31 1.30 1.29 1.28 1.27 1.27 1.26 1.25 1.25 1.24 1.24 0.25
1.74 1.69 1.66 1.63 1.59 1.57 1.56 1.53 1.52 1.50 1.49 1.48 0.10 28
2.04 1.96 1.91 1.87 1.82 1.79 1.77 1.73 1.71 1.69 1.67 1.65 0.05
2.75 2.60 2.52 2.44 2.35 2.30 2.26 2.19 2.17 2.13 2.09 2.06 0.01
1.32 1.30 1.29 1.28 1.27 1.26 1.26 1.25 1.24 1.24 1.23 1.23 0.25
1.72 1.67 1.64 1.61 1.57 1.55 1.54 1.51 1.50 1.48 1.47 1.46 0.10 30
2.01 1.93 1.89 1.84 1.79 1.76 1.74 1.70 1.68 1.66 1.64 1.62 0.05
2.70 2.55 2.47 2.39 2.30 2.25 2.21 2.13 2.11 2.07 2.03 2.01 0.01
1.30 1.28 1.26 1.25 1.24 1.23 1.22 1.21 1.21 1.20 1.19 1.19 0.25
1.66 1.61 1.57 1.54 1.51 1.48 1.47 1.43 1.42 1.41 1.39 1.38 0.10 40
1.92 1.84 1.79 1.74 1.69 1.66 1.64 1.59 1.58 1.55 1.53 1.51 0.05
2.52 2.37 2.29 2.20 2.11 2.06 2.02 1.94 1.92 1.87 1.83 1.80 0.01
1.27 1.25 1.24 1.22 1.21 1.20 1.19 1.17 1.17 1.16 1.15 1.15 0.25
1.60 1.54 1.51 1.48 1.44 1.41 1.40 1.36 1.35 1.33 1.31 1.29 0.10 60
1.84 1.75 1.70 1.65 1.59 1.56 1.53 1.48 1.47 1.44 1.41 1.39 0.05
2.35 2.20 2.12 2.03 1.94 1.88 1.84 1.75 1.73 1.68 1.63 1.60 0.01
1.24 1.22 1.21 1.19 1.18 1.17 1.16 1.14 1.13 1.12 1.11 1.10 0.25
1.55 1.48 1.45 1.41 1.37 1.34 1.32 1.27 1.26 1.24 1.21 1.19 0.10 120
1.75 1.66 1.61 1.55 1.50 1.46 1.43 1.37 1.35 1.32 1.28 1.25 0.05
2.19 2.03 1.95 1.86 1.76 1.70 1.66 1.56 1.53 1.48 1.42 1.38 0.01
1.23 1.21 1.20 1.18 1.16 1.14 1.12 1.11 1.10 1.09 1.08 1.06 0.25
1.52 1.46 1.42 1.38 1.34 1.31 1.28 1.24 1.22 1.20 1.17 1.14 0.10 200
1.72 1.62 1.57 1.52 1.46 1.41 1.39 1.32 1.29 1.26 1.22 1.19 0.05
2.13 1.97 1.89 1.79 1.69 1.63 1.58 1.48 1.44 1.39 1.33 1.28 0.01
1.22 1.19 1.18 1.16 1.14 1.13 1.12 1.09 1.08 1.07 1.04 1.00 0.25
1.49 1.42 1.38 1.34 1.30 1.26 1.24 1.18 1.17 1.13 1.08 1.00 0.10 ∞
1.67 1.57 1.52 1.46 1.39 1.35 1.32 1.24 1.22 1.17 1.11 1.00 0.05
2.04 1.88 1.79 1.70 1.59 1.52 1.47 1.36 1.32 1.25 1.15 1.00 0.01
Degrees of     Pr
freedom        0.750  0.500  0.250  0.100  0.050  0.025  0.010  0.005
Notes: For df greater than 100, √(2χ²) − √(2k − 1) = Z follows the standardised normal distribution,
where k represents the degrees of freedom.
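The large-df approximation in the note can be checked numerically. The sketch below (plain Python, illustrative only) inverts √(2χ²) − √(2k − 1) = Z to approximate an upper-tail chi-square critical value; the comparison value of about 234.0 is the tabulated 5% point for 200 df.

```python
import math

def chi2_crit_approx(k, z):
    """Approximate upper-tail chi-square critical value for large df k
    by inverting sqrt(2*chi2) - sqrt(2k - 1) = Z from the note above."""
    return (z + math.sqrt(2 * k - 1)) ** 2 / 2

# 5% upper-tail point for 200 df; exact tables give roughly 234.0
approx = chi2_crit_approx(200, 1.645)
print(round(approx, 1))  # → 233.7
```

The approximation is accurate to within a few tenths for df of this order, which is why printed tables usually stop around 100 df.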
474 Econometrics for developing countries
Table B.5 Durbin-Watson d statistic: significance points of dL and dU at 0.05 level of significance
Example
If n = 40 and k' = 4, dL = 1.285 and dU = 1.721. If a computed d value is less than 1.285, there is
evidence of positive first-order serial correlation; if it is greater than 1.721, there is no evidence of
positive first-order serial correlation; but if d lies between the lower and the upper limit, there is
inconclusive evidence regarding the presence or absence of positive first-order serial correlation.
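The three-way decision rule in the example can be written out directly. A minimal sketch (function name and messages are illustrative, not from the text):

```python
def dw_decision(d, dl, du):
    """Durbin-Watson test of positive first-order serial correlation,
    using the lower and upper bounds dL and dU from Table B.5."""
    if d < dl:
        return "reject: evidence of positive serial correlation"
    if d > du:
        return "do not reject: no evidence of positive serial correlation"
    return "inconclusive"

# Bounds for n = 40, k' = 4 read from the table
print(dw_decision(1.10, 1.285, 1.721))  # below dL: reject
print(dw_decision(1.50, 1.285, 1.721))  # between the bounds: inconclusive
```

The inconclusive region is the price paid for tabulating bounds that do not depend on the particular regressor matrix.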
        k' = 1          k' = 2          k' = 3          k' = 4          k' = 5
n     dL    dU        dL    dU        dL    dU        dL    dU        dL    dU
6   0.610 1.400
7   0.700 1.356   0.467 1.896
8   0.763 1.332   0.559 1.777   0.368 2.287
9   0.824 1.320   0.629 1.699   0.455 2.128   0.296 2.588
10  0.879 1.320   0.697 1.641   0.525 2.016   0.376 2.414   0.243 2.822
11  0.927 1.324   0.758 1.604   0.595 1.928   0.444 2.283   0.316 2.645
12  0.971 1.331   0.812 1.579   0.658 1.864   0.512 2.177   0.379 2.506
13  1.010 1.340   0.861 1.562   0.715 1.816   0.574 2.094   0.445 2.390
14  1.045 1.350   0.905 1.551   0.767 1.779   0.632 2.030   0.505 2.296
15  1.077 1.361   0.946 1.543   0.814 1.750   0.685 1.977   0.562 2.220
16  1.106 1.371   0.982 1.539   0.857 1.728   0.734 1.935   0.615 2.157
17  1.133 1.381   1.015 1.536   0.897 1.710   0.779 1.900   0.664 2.104
18  1.158 1.391   1.046 1.535   0.933 1.696   0.820 1.872   0.710 2.060
19  1.180 1.401   1.074 1.536   0.967 1.685   0.859 1.848   0.752 2.023
20  1.201 1.411   1.100 1.537   0.998 1.676   0.894 1.828   0.792 1.991
21  1.221 1.420   1.125 1.538   1.026 1.669   0.927 1.812   0.829 1.964
22  1.239 1.429   1.147 1.541   1.053 1.664   0.958 1.797   0.863 1.940
23  1.257 1.437   1.168 1.543   1.078 1.660   0.986 1.785   0.895 1.920
24  1.273 1.446   1.188 1.546   1.101 1.656   1.013 1.775   0.925 1.902
25  1.288 1.454   1.206 1.550   1.123 1.654   1.038 1.767   0.953 1.886
26  1.302 1.461   1.224 1.553   1.143 1.652   1.062 1.759   0.979 1.873
27  1.316 1.469   1.240 1.556   1.162 1.651   1.084 1.753   1.004 1.861
28  1.328 1.476   1.255 1.560   1.181 1.650   1.104 1.747   1.028 1.850
29  1.341 1.483   1.270 1.563   1.198 1.650   1.124 1.743   1.050 1.841
30  1.351 1.489   1.284 1.567   1.214 1.650   1.143 1.739   1.071 1.833
31  1.363 1.496   1.297 1.570   1.229 1.650   1.160 1.735   1.090 1.825
32  1.373 1.502   1.309 1.574   1.244 1.650   1.177 1.732   1.109 1.819
33  1.383 1.508   1.321 1.577   1.258 1.651   1.193 1.730   1.127 1.813
34  1.393 1.514   1.333 1.580   1.271 1.652   1.208 1.728   1.144 1.808
35  1.402 1.519   1.343 1.584   1.283 1.653   1.222 1.726   1.160 1.803
36  1.411 1.525   1.354 1.587   1.295 1.654   1.236 1.724   1.175 1.799
37  1.419 1.530   1.364 1.590   1.307 1.655   1.249 1.723   1.190 1.795
38  1.427 1.535   1.373 1.594   1.318 1.656   1.261 1.722   1.204 1.792
39  1.435 1.540   1.382 1.597   1.328 1.658   1.273 1.722   1.218 1.789
40  1.442 1.544   1.391 1.600   1.338 1.659   1.285 1.721   1.230 1.786
45  1.475 1.566   1.430 1.615   1.383 1.666   1.336 1.720   1.287 1.776
50  1.503 1.585   1.462 1.628   1.421 1.674   1.378 1.721   1.335 1.771
55  1.528 1.601   1.490 1.641   1.452 1.681   1.414 1.724   1.374 1.768
60  1.549 1.616   1.514 1.652   1.480 1.689   1.444 1.727   1.408 1.767
65  1.567 1.629   1.536 1.662   1.503 1.696   1.471 1.731   1.438 1.767
70  1.583 1.641   1.556 1.672   1.525 1.703   1.494 1.735   1.464 1.768
75  1.598 1.652   1.571 1.680   1.543 1.709   1.515 1.739   1.487 1.770
80  1.611 1.662   1.586 1.688   1.560 1.715   1.534 1.743   1.507 1.772
85  1.624 1.671   1.600 1.696   1.575 1.721   1.550 1.747   1.525 1.774
90  1.635 1.679   1.612 1.703   1.589 1.726   1.566 1.751   1.542 1.776
95  1.645 1.687   1.623 1.709   1.602 1.732   1.579 1.755   1.557 1.778
100 1.654 1.694   1.634 1.715   1.613 1.736   1.592 1.758   1.571 1.780
150 1.720 1.746   1.706 1.760   1.693 1.774   1.679 1.788   1.665 1.802
200 1.758 1.778   1.748 1.789   1.738 1.799   1.728 1.810   1.718 1.820
Appendix B: Statistical tables 475
Table B.5 - Continued
        k' = 6          k' = 7          k' = 8          k' = 9          k' = 10
n     dL    dU        dL    dU        dL    dU        dL    dU        dL    dU
6
7
8
9
10
11 0.203 3.005
12 0.268 2.835 0.171 3.149
13 0.328 2.692 0.230 2.985 0.147 3.266
14 0.389 2.572 0.286 2.848 0.200 3.111 0.127 3.360
15 0.447 2.472 0.343 2.727 0.251 2.979 0.175 3.216 0.111 3.438
16 0.502 2.388 0.398 2.624 0.304 2.860 0.222 3.090 0.155 3.304
17 0.554 2.318 0.451 2.537 0.356 2.757 0.272 2.975 0.198 3.184
18 0.603 2.257 0.502 2.461 0.407 2.667 0.321 2.873 0.244 3.073
19 0.649 2.206 0.549 2.396 0.456 2.589 0.369 2.783 0.290 2.974
20 0.692 2.162 0.595 2.339 0.502 2.521 0.416 2.704 0.336 2.885
21 0.732 2.124 0.637 2.290 0.547 2.460 0.461 2.633 0.380 2.806
22 0.769 2.090 0.677 2.246 0.588 2.407 0.504 2.571 0.424 2.734
23 0.804 2.061 0.715 2.208 0.628 2.360 0.545 2.514 0.465 2.670
24 0.837 2.035 0.751 2.174 0.666 2.318 0.584 2.464 0.506 2.613
25 0.868 2.012 0.784 2.144 0.702 2.280 0.621 2.419 0.544 2.560
26 0.897 1.992 0.816 2.117 0.735 2.246 0.657 2.379 0.581 2.513
27 0.925 1.974 0.845 2.093 0.767 2.216 0.691 2.342 0.616 2.470
28 0.951 1.958 0.874 2.071 0.798 2.188 0.723 2.309 0.650 2.431
29 0.975 1.944 0.900 2.052 0.826 2.164 0.753 2.278 0.682 2.396
30 0.998 1.931 0.926 2.034 0.854 2.141 0.782 2.251 0.712 2.363
31 1.020 1.920 0.950 2.018 0.879 2.120 0.810 2.226 0.741 2.333
32 1.041 1.909 0.972 2.004 0.904 2.102 0.836 2.203 0.769 2.306
33 1.061 1.900 0.994 1.991 0.927 2.085 0.861 2.181 0.795 2.281
34 1.080 1.891 1.015 1.979 0.950 2.069 0.885 2.162 0.821 2.257
35 1.097 1.884 1.034 1.967 0.971 2.054 0.908 2.144 0.845 2.236
36 1.114 1.877 1.053 1.957 0.991 2.041 0.930 2.127 0.868 2.216
37 1.131 1.870 1.071 1.948 1.011 2.029 0.951 2.112 0.891 2.198
38 1.146 1.864 1.088 1.939 1.029 2.017 0.970 2.098 0.912 2.180
39 1.161 1.859 1.104 1.932 1.047 2.007 0.990 2.085 0.932 2.164
40 1.175 1.854 1.120 1.924 1.064 1.997 1.008 2.072 0.952 2.149
45 1.238 1.835 1.189 1.895 1.139 1.958 1.089 2.022 1.038 2.088
50 1.291 1.822 1.246 1.875 1.201 1.930 1.156 1.986 1.110 2.044
55 1.334 1.814 1.294 1.861 1.253 1.909 1.212 1.959 1.170 2.010
60 1.372 1.808 1.335 1.850 1.298 1.894 1.260 1.939 1.222 1.984
65 1.404 1.805 1.370 1.843 1.336 1.882 1.301 1.923 1.266 1.964
70 1.433 1.802 1.401 1.837 1.369 1.873 1.337 1.910 1.305 1.948
75 1.458 1.801 1.428 1.834 1.399 1.867 1.369 1.901 1.339 1.935
80 1.480 1.801 1.453 1.831 1.425 1.861 1.397 1.893 1.369 1.925
85 1.500 1.801 1.474 1.829 1.448 1.857 1.422 1.886 1.396 1.916
90 1.518 1.801 1.494 1.827 1.469 1.854 1.445 1.881 1.420 1.909
95 1.535 1.802 1.512 1.827 1.489 1.852 1.465 1.877 1.442 1.903
100 1.550 1.803 1.528 1.826 1.506 1.850 1.484 1.874 1.462 1.898
150 1.651 1.817 1.637 1.832 1.622 1.847 1.608 1.862 1.594 1.877
200 1.707 1.831 1.697 1.841 1.686 1.852 1.675 1.863 1.665 1.874
Table B.5 - Continued
        k' = 11         k' = 12         k' = 13         k' = 14         k' = 15
n     dL    dU        dL    dU        dL    dU        dL    dU        dL    dU
6
7
8
9
10
11
12
13
14
15
16 0.098 3.503
17 0.138 3.378 0.087 3.557
18 0.177 3.265 0.123 3.441 0.078 3.603
19 0.220 3.159 0.160 3.335 0.111 3.496 0.070 3.642
20 0.263 3.063 0.200 3.234 0.145 3.395 0.100 3.542 0.063 3.676
21 0.307 2.976 0.240 3.141 0.182 3.300 0.132 3.448 0.091 3.583
22 0.349 2.897 0.281 3.057 0.220 3.211 0.166 3.358 0.120 3.495
23 0.391 2.826 0.322 2.979 0.259 3.128 0.202 3.272 0.153 3.409
24 0.431 2.761 0.362 2.908 0.297 3.053 0.239 3.193 0.186 3.327
25 0.470 2.702 0.400 2.844 0.335 2.983 0.275 3.119 0.221 3.251
26 0.508 2.649 0.438 2.784 0.373 2.919 0.312 3.051 0.256 3.179
27 0.544 2.600 0.475 2.730 0.409 2.859 0.348 2.987 0.291 3.112
28 0.578 2.555 0.510 2.680 0.445 2.805 0.383 2.928 0.325 3.050
29 0.612 2.515 0.544 2.634 0.479 2.755 0.418 2.874 0.359 2.992
30 0.643 2.477 0.577 2.592 0.512 2.708 0.451 2.823 0.392 2.937
31 0.674 2.443 0.608 2.553 0.545 2.665 0.484 2.776 0.425 2.887
32 0.703 2.411 0.638 2.517 0.576 2.625 0.515 2.733 0.457 2.840
33 0.731 2.382 0.668 2.484 0.606 2.588 0.546 2.692 0.488 2.796
34 0.758 2.355 0.695 2.454 0.634 2.554 0.575 2.654 0.518 2.754
35 0.783 2.330 0.722 2.425 0.662 2.521 0.604 2.619 0.547 2.716
36 0.808 2.306 0.748 2.398 0.689 2.492 0.631 2.586 0.575 2.680
37 0.831 2.285 0.772 2.374 0.714 2.464 0.657 2.555 0.602 2.646
38 0.854 2.265 0.796 2.351 0.739 2.438 0.683 2.526 0.628 2.614
39 0.875 2.246 0.819 2.329 0.763 2.413 0.707 2.499 0.653 2.585
40 0.896 2.228 0.840 2.309 0.785 2.391 0.731 2.473 0.678 2.557
45 0.988 2.156 0.938 2.225 0.887 2.296 0.838 2.367 0.788 2.439
50 1.064 2.103 1.019 2.163 0.973 2.225 0.927 2.287 0.882 2.350
55 1.129 2.062 1.087 2.116 1.045 2.170 1.003 2.225 0.961 2.281
60 1.184 2.031 1.145 2.079 1.106 2.127 1.068 2.177 1.029 2.227
65 1.231 2.006 1.195 2.049 1.160 2.093 1.124 2.138 1.088 2.183
70 1.272 1.986 1.239 2.026 1.206 2.066 1.172 2.106 1.139 2.148
75 1.308 1.970 1.277 2.006 1.247 2.043 1.215 2.080 1.184 2.118
80 1.340 1.957 1.311 1.991 1.283 2.024 1.253 2.059 1.224 2.093
85 1.369 1.946 1.342 1.977 1.315 2.009 1.287 2.040 1.260 2.073
90 1.395 1.937 1.369 1.966 1.344 1.995 1.318 2.025 1.292 2.055
95 1.418 1.929 1.394 1.956 1.370 1.984 1.345 2.012 1.321 2.040
100 1.439 1.923 1.416 1.948 1.393 1.974 1.371 2.000 1.347 2.026
150 1.579 1.892 1.564 1.908 1.550 1.924 1.535 1.940 1.519 1.956
200 1.654 1.885 1.643 1.896 1.632 1.908 1.621 1.919 1.610 1.931
Table B.5 - Continued
        k' = 16         k' = 17         k' = 18         k' = 19         k' = 20
n     dL    dU        dL    dU        dL    dU        dL    dU        dL    dU
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21  0.058 3.705
22  0.083 3.619   0.052 3.731
23  0.110 3.535   0.076 3.650   0.048 3.753
24  0.141 3.454   0.101 3.572   0.070 3.678   0.044 3.773
25  0.172 3.376   0.130 3.494   0.094 3.604   0.065 3.702   0.041 3.790
26  0.205 3.303   0.160 3.420   0.120 3.531   0.087 3.632   0.060 3.724
27  0.238 3.233   0.191 3.349   0.149 3.460   0.112 3.563   0.081 3.658
28  0.271 3.168   0.222 3.283   0.178 3.392   0.138 3.495   0.104 3.592
29  0.305 3.107   0.254 3.219   0.208 3.327   0.166 3.431   0.129 3.528
30  0.337 3.050   0.286 3.160   0.238 3.266   0.195 3.368   0.156 3.465
31  0.370 2.996   0.317 3.103   0.269 3.208   0.224 3.309   0.183 3.406
32  0.401 2.946   0.349 3.050   0.299 3.153   0.253 3.252   0.211 3.348
33  0.432 2.899   0.379 3.000   0.329 3.100   0.283 3.198   0.239 3.293
34  0.462 2.854   0.409 2.954   0.359 3.051   0.312 3.147   0.267 3.240
35  0.492 2.813   0.439 2.910   0.388 3.005   0.340 3.099   0.295 3.190
36  0.520 2.774   0.467 2.868   0.417 2.961   0.369 3.053   0.323 3.142
37  0.548 2.738   0.495 2.829   0.445 2.920   0.397 3.009   0.351 3.097
38  0.575 2.703   0.522 2.792   0.472 2.880   0.424 2.968   0.378 3.054
39  0.600 2.671   0.549 2.757   0.499 2.843   0.451 2.929   0.404 3.013
40  0.626 2.641   0.575 2.724   0.525 2.808   0.477 2.892   0.430 2.974
45  0.740 2.512   0.692 2.586   0.644 2.659   0.598 2.733   0.553 2.807
50  0.836 2.414   0.792 2.479   0.747 2.544   0.703 2.610   0.660 2.675
55  0.919 2.338   0.877 2.396   0.836 2.454   0.795 2.512   0.754 2.571
60  0.990 2.278   0.951 2.330   0.913 2.382   0.874 2.434   0.836 2.487
65  1.052 2.229   1.016 2.276   0.980 2.323   0.944 2.371   0.908 2.419
70  1.105 2.189   1.072 2.232   1.038 2.275   1.005 2.318   0.971 2.362
75  1.153 2.156   1.121 2.195   1.090 2.235   1.058 2.275   1.027 2.317
80  1.195 2.129   1.165 2.165   1.136 2.201   1.106 2.238   1.076 2.275
85  1.232 2.105   1.205 2.139   1.177 2.172   1.149 2.206   1.121 2.241
90  1.266 2.085   1.240 2.116   1.213 2.148   1.187 2.179   1.160 2.211
95  1.296 2.068   1.271 2.097   1.247 2.126   1.222 2.156   1.197 2.186
100 1.324 2.053   1.301 2.080   1.277 2.108   1.253 2.135   1.229 2.164
150 1.504 1.972   1.489 1.989   1.474 2.006   1.458 2.023   1.443 2.040
200 1.599 1.943   1.588 1.955   1.576 1.967   1.565 1.979   1.554 1.991
Source: This table is an extension of the original Durbin-Watson table and is reproduced from N.E. Savin and K.J. White, 'The
Durbin-Watson Test for Serial Correlation with Extreme Small Samples or Many Regressors', Econometrica, 45, November 1977,
1989-96, and as corrected by R.W. Farebrother, Econometrica, 48, September 1980, 1554.
Notes:
n = number of observations
k' = number of explanatory variables excluding the constant term
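Table B.5 tabulates bounds for the d statistic, which is computed from the fitted residuals as d = Σ(e_t − e_{t−1})² / Σe_t². A minimal sketch of that computation (illustrative only; any regression package reports the same number):

```python
def durbin_watson(residuals):
    """Durbin-Watson d = sum of squared successive residual differences,
    divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Alternating residuals (strong negative autocorrelation) push d towards 4;
# a smooth run of similar residuals pushes d towards 0.
print(round(durbin_watson([1, -1, 1, -1, 1, -1]), 2))  # → 3.33
```

A value near 2 indicates no first-order autocorrelation, which is why the tabulated dU bounds cluster below 2.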
Table B.6 Critical values of runs in the runs test
Example
In a sequence of 30 observations consisting of 20 + signs (= N1) and 10 − signs (= N2), the
critical values of runs at the 0.05 level of significance are 9 and 20, as shown by Tables B.6(a)
and (b), respectively. Therefore, if in an application it is found that the number of runs is
equal to or less than 9, or equal to or greater than 20, one can reject (at the 0.05 level of signif-
icance) the hypothesis that the observed sequence is random.
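Counting runs is mechanical; a short sketch (function name illustrative) that reproduces the run count used in the example:

```python
def count_runs(signs):
    """Number of runs: maximal blocks of identical symbols in a sequence."""
    if not signs:
        return 0
    runs = 1
    for a, b in zip(signs, signs[1:]):
        if a != b:
            runs += 1
    return runs

# With N1 = 20 '+' and N2 = 10 '-', the 0.05 critical values are 9 and 20
# (Tables B.6(a) and (b)): reject randomness if runs <= 9 or runs >= 20.
seq = "+" * 20 + "-" * 10   # a single block of each sign: only 2 runs
print(count_runs(seq))       # → 2
```

Two runs is far below the lower critical value of 9, so this sequence would be judged non-random, as intuition suggests.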
(a)
N2
N1  2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2 2 2 2 2 2 2 2 2 2
3 2 2 2 2 2 2 2 2 2 3 3 3 3 3
4 2 2 2 3 3 3 3 3 3 3 3 4 4 4 4 4
5 2 2 3 3 3 3 3 4 4 4 4 4 4 4 5 5 5
6 2 2 3 3 3 3 4 4 4 4 5 5 5 5 5 5 6 6
7 2 2 3 3 3 4 4 5 5 5 5 5 6 6 6 6 6 6
8 2 3 3 3 4 4 5 5 5 6 6 6 6 6 7 7 7 7
9 2 3 3 4 4 5 5 5 6 6 6 7 7 7 7 8 8 8
10 2 3 3 4 5 5 5 6 6 7 7 7 7 8 8 8 8 9
11 2 3 4 4 5 5 6 6 7 7 7 8 8 8 9 9 9 9
12 2 2 3 4 4 5 6 6 7 7 7 8 8 8 9 9 9 10 10
13 2 2 3 4 5 5 6 6 7 7 8 8 9 9 9 10 10 10 10
14 2 2 3 4 5 5 6 7 7 8 8 9 9 9 10 10 10 11 11
15 2 3 3 4 5 6 6 7 7 8 8 9 9 10 10 11 11 11 12
16 2 3 4 4 5 6 6 7 8 8 9 9 10 10 11 11 11 12 12
17 2 3 4 4 5 6 7 7 8 9 9 10 10 11 11 11 12 12 13
18 2 3 4 5 5 6 7 8 8 9 9 10 10 11 11 12 12 13 13
19 2 3 4 5 6 6 7 8 8 9 10 10 11 11 12 12 13 13 13
20 2 3 4 5 6 6 7 8 9 9 10 10 11 12 12 13 13 14 14
(b)
N2
N1  2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
2
3
4 9 9
5 9 10 10 11 11
6 9 10 11 12 12 13 13 13 13
7 11 12 13 13 14 14 14 14 15 15 15
8 11 12 13 14 14 15 15 16 16 16 16 17 17 17 17
9 13 14 14 15 16 16 16 17 17 18 18 18 18 18 18
10 13 14 15 16 16 17 17 18 18 18 19 19 19 20 20
11 13 14 15 16 17 17 18 19 19 19 20 20 20 21 21
12 13 14 16 16 17 18 19 19 20 20 21 21 21 22 22
13 15 16 17 18 19 19 20 20 21 21 22 22 23 23
14 15 16 17 18 19 20 20 21 22 22 23 23 23 24
15 15 16 18 18 19 20 21 22 22 23 23 24 24 25
16 17 18 19 20 21 21 22 23 23 24 25 25 25
17 17 18 19 20 21 22 23 23 24 25 25 26 26
18 17 18 19 20 21 22 23 24 25 25 26 26 27
19 17 18 20 21 22 23 23 24 25 26 26 27 27
20 17 18 20 21 22 23 24 25 25 26 27 27 28
Source: Sidney Siegel, Nonparametric Statistics for the Behavioral Sciences, New York:
McGraw-Hill, 1956, table F, pp. 252-3. The table has been adapted by Siegel from the
original source: Frieda S. Swed and C. Eisenhart, 'Tables for Testing Randomness of
Grouping in a Sequence of Alternatives', Annals of Mathematical Statistics, 14, 1943.
Notes: Tables B.6(a) and (b) give the critical values of runs n for various values of N1 (+
symbol) and N2 (− symbol). For the one-sample runs test, any value of n which is equal to
or smaller than that shown in Table B.6(a), or equal to or larger than that shown in Table
B.6(b), is significant at the 0.05 level.
Table B. 7 Critical values for Dickey-Fuller test
Sample size Probability of a larger value
10% 5% 1%
References

Aldrich, John H. and Nelson, Forrest D. (1984) Linear Probability, Logit, and
Probit Models, Beverly Hills, CA: Sage.
Banerjee, Anindya, Dolado, Juan J., Galbraith, John W. and Hendry, David F.
(1993) Cointegration, Error Correction, and the Econometric Analysis of Non-
stationary Data, Oxford: Oxford University Press.
Barten, A.P. (1985) 'Het voorgeborchte der econometrie: het parelsnoer van
Engel', Tijdschrift voor economie en management 30 (3-4): 453-74.
Box, G.E.P. and Jenkins G.M. (1970) Time Series Analysis, Forecasting and
Control, San Francisco: Holden-Day.
Carr, E.H. ([1961], 1990) What is History?, ed. R.W. Davies, Harmondsworth:
Penguin Books.
Chambers, John M., Cleveland, William S., Kleiner, Beat and Tukey, Paul A.
(1983) Graphical Methods for Data Analysis, Pacific Grove, CA: Wadsworth &
Brooks/Cole Publishing Company Advanced Books & Software.
Charemza, Wojciech W. and Deadman Derek F. (1992) New Directions in Econo-
metric Practice: General to Specific Modelling, Cointegration, Aldershot, Hants.:
Edward Elgar.
Chenery, Hollis B. and Strout, Alan M. (1966) 'Foreign Assistance and Economic
Development', American Economic Review 56: 679-733.
Collier, P., Radwan, S. and Wangwe, S. (with Wagner, A.) (1986) Labour and
Poverty in Rural Tanzania: Ujamaa and Rural Development in the United
Republic of Tanzania, Oxford: Clarendon Press.
Dasgupta, P. (1993) An Inquiry into Well-Being and Destitution, Oxford: Oxford
University Press.
Davies, Richard B. (1994) 'From Cross-sectional to Longitudinal Analysis', in
Analysing Social and Political Change: A Casebook of Methods, ed. Angela Dale
and Richard B. Davies, London: Sage Publications.
Demaris, Alfred (1992) Logit Modelling: Practical Applications, Beverly Hills, CA:
Sage Publications.
Diaconis, P. (1985) 'Theories of Data Analysis: From Magical Thinking Through
Classical Statistics', in David C. Hoaglin, F. Mosteller and J. Tukey (eds),
Exploring Data Tables, Trends and Shapes, New York: John Wiley.
Dickey, D.A. and Fuller, W.A. (1981) 'Likelihood Ratio Statistics for Autore-
gressive Time Series With a Unit Root', Econometrica 49: 1057-72.
Emerson, John D. and Hoaglin, David C. (1985) 'Resistant Multiple Regression,
One Variable at a Time' in David C. Hoaglin, Frederick Mosteller and John W.
Tukey (eds), Exploring Data Tables, Trends, and Shapes, New York: John Wiley.
Emerson, John D. and Strenio, Judith (1983) 'Boxplots and Batch Comparison',
in David C. Hoaglin Frederick Mosteller and John W. Tukey (eds ), Under-
standing Robust and Exploratory Data Analysis, New York: John Wiley.
Engle, Robert F. and Granger, C.W.J. (1987) 'Cointegration and Error Correction:
Representation, Estimation and Testing', Econometrica 55 (2): 251-76.
Engle, Robert F., Hendry, D.F. and Richard, J.F. (1983) 'Exogeneity', Econo-
metrica 51: 277-304.
Friedman, Milton (1970) The Counter-Revolution in Monetary Theory: The First
Wincott Memorial Lecture, 16 September, IEA Occasional Paper no. 33, London:
Institute of Economic Affairs.
Fry, M.J. (1988) Money, Interest, and Banking in Economic Development,
Baltimore, MD: Johns Hopkins University Press.
Fuller, W.A. (1976) Introduction to Statistical Time Series, New York: John Wiley.
Gatt, J. (1995) An Econometric Analysis of the Determinants of the Maltese Exports
of Manufacturers, ISS Working Paper no. 198, The Hague: Institute of Social
Studies.
Giere, Ronald N. (1991) Understanding Scientific Reasoning, Fort Worth, TX: Holt,
Rinehart & Winston.
Gilbert, Christopher (1990) 'Professor Hendry's Econometric Methodology', in
C.W.J. Granger (ed.), Modelling Economic Series, Oxford: Clarendon Press.
Goldberger, Arthur S. (1964) Econometric Theory, New York: John Wiley.
Goldberger, Arthur S. (1991) A Course in Econometrics, Cambridge, MA: Harvard
University Press.
Gould, S. J. (1996) Full House: The Spread of Excellence from Plato to Darwin,
New York: Harmony Books.
Granger, C.W.J. (ed.) (1990) Modelling Economic Series, Oxford: Clarendon Press.
Granger, C.W.J. and Newbold, P. (1974) 'Spurious Regressions in Econometrics'
Journal of Econometrics 2: 111-20.
Granger, C.W.J. and Newbold, P. (1977) Forecasting Economic Time Series, New
York: Academic Press; 2nd edition 1986.
Gregory, C.A. and Altman, J.C. (1989) Observing the Economy, London: Routledge.
Griffin, Keith (1970) 'Foreign Capital, Domestic Savings and Economic
Development', Bulletin of the Oxford University Institute of Economics and
Statistics 32: 99-112.
Griffin, Keith (1971) 'Reply', Bulletin of the Oxford University Institute of
Economics and Statistics 33: 156-61.
Griffin, Keith and Enos, John (1971) 'Foreign Assistance: Objectives and
Consequences', Economic Development and Cultural Change 18: 313-27.
Grilli, E. and Yang, M.C. (1988) 'Primary Commodity Prices, Manufactured Goods
Prices and the Terms of Trade of Developing Countries: What the Long Run
Shows', World Bank Economic Review 2: 1-47.
Gujarati, D. (1988), Basic Econometrics, 2nd edn, New York: McGraw-Hill.
Hamilton, James D. (1994) Time Series Analysis, Princeton, NJ: Princeton
University Press.
Hamilton, Lawrence (1990) Modern Data Analysis: A First Course in Applied
Statistics, Pacific Grove, CA: Brooks Cole.
Hamilton, Lawrence C. (1992) Regression with Graphics: A Second Course in
Applied Statistics, Pacific Grove, CA: Brooks Cole.
Harriss, B. (1990) 'The Intrafamily Distribution of Hunger in South Asia', in Jean
Dreze and Amaryta Sen (eds), The Political Economy of Hunger, vol. 1,
Entitlement and Well-being, Oxford: Clarendon Press.
Heckman, J.J. (1992) 'Haavelmo and the Birth of Modern Econometrics: A
Review of The History of Econometric Ideas by Mary Morgan', Journal of
Economic Literature 30: 876-86.
Helmers, F.L.C.H. (1988) 'Real Exchange Rate Indexes', in R. Dornbusch and
F.L.C.H. Helmers (eds), The Open Economy: Tools for Policy Makers in
Developing Countries, Oxford: Oxford University Press.
References 483
Hoaglin, David C. (1983) 'Letter Values: a Set of Selected Order Statistics', in
David C. Hoaglin, F. Mosteller and J. Tukey Understanding Robust and
Exploratory Data Analysis, New York: John Wiley, pp. 33-57.
Hoaglin, David C. (1985) 'Using Quantiles to Study Shapes', in David C. Hoaglin,
F. Mosteller and J. Tukey, Exploring Data Tables, Trends and Shapes, New York:
John Wiley.
Hoaglin, David C., Mosteller, F. and Tukey, J. (1983) Understanding Robust and
Exploratory Data Analysis, New York: John Wiley.
Hoaglin, David C., Mosteller F., Tukey J. (1985) Exploring Data Tables, Trends
and Shapes, New York: John Wiley.
Holden, Darryl and Perman, Roger (1994) 'Unit Roots and Cointegration for the
Applied Economist', in B. Bhaskara Rao (ed.) Cointegration for the Applied
Economist, Oxford: Basil Blackwell.
Hopwood, A. (1984) 'Accounting and the Pursuit of Efficiency', in A. Hopwood
and C. Tomkins, Issues in Public Sector Accounting, Oxford: Philip Allan,
167-87.
Huber, Peter J. (1981) Robust Statistics, New York: John Wiley.
Kennedy, Peter (1992) A Guide to Econometrics, Oxford: Blackwell.
Khan, M.S. and Reinhart, C.M. (1990) 'Private Investment and Economic Growth
in Developing Countries', World Development 18: 19-28.
Kmenta, Jan (1986) Elements of Econometrics, New York: Macmillan.
Krishnaji, N. (1992) 'The Demand Constraint: A Note on the Role of Foodgrain
Prices and Income Inequality', in N. Krishnaji, Pauperising Agriculture: Studies
in Agrarian Change and Demographic Structure, Sameeska Trust, Bombay:
Oxford University Press.
Leamer, E.E. (1978) Specification Searches: Ad hoc Inference with Non-experi-
mental Data, New York: John Wiley.
Leamer, E.E. (1983) 'Let's Take the Con Out of Econometrics', American
Economic Review 73 (1): 31-43.
Levine, J.H. (1993) Exceptions are the Rule: An Inquiry into Methods in Social
Sciences, Boulder, CO: Westview Press.
Lucas, R.E. (1976) 'Econometric Policy Evaluation: A Critique', in K. Brunner
and A.H. Meltzer (eds), The Phillips Curve and Labour Markets, supplement
to Journal of Monetary Economics 1: 19-46.
MacKie-Mason, J.K. (1992) 'Econometric Software: A User's View', Journal of
Economic Perspectives 6 (4): 165-87.
McKinnon, R.I. (1973) Money and Capital in a Developing Economy, Washington,
DC: Brookings Institution.
Maddala, G.S. (1988) Introduction to Econometrics, Englewood Cliffs: Prentice
Hall.
Maddala, G.S. (1992) Introduction to Econometrics, New York: Macmillan.
Miller, R.W. (1987) Fact and Method: Explanation, Confirmation and Reality in
the Natural and the Social Sciences, Princeton, NJ: Princeton University Press.
Moore, D.S. and McCabe, G.P. (1989) Introduction to the Practice of Statistics,
New York: Freeman.
Morgan, M.S. (1990) The History of Econometric Ideas, Cambridge: Cambridge
University Press.
Mosley, Paul, Harrigan, Jane and Toye, John (1991) Aid and Power: The World
Bank and Policy-based Lending, 2 vols, London: Routledge.
Mosteller, Frederick and Tukey, John W. (1977) Data Analysis and Regression: A
Second Course in Statistics, Reading, MA: Addison-Wesley.
Myers R.H. (1990) Classical and Modern Regression with Applications, 2nd edn,
Boston, MA: PWS-Kent.
Pelto, Pertti J. and Pelto, Gretel H. (1978) Anthropological Research: The Structure
of Inquiry, Cambridge: Cambridge University Press.
Phillips, P.C.B. and Ouliaris S. (1990) 'Asymptotic Properties of Residual Based
Tests for Cointegration', Econometrica 58 (1): 165-93.
Rao, B. Bhaskara (ed.) (1994) Cointegration for the Applied Economist, London:
Macmillan.
Rawlings, John O. (1988) Applied Regression Analysis: A Research Tool, Pacific
Grove, CA: Wadsworth & Brooks/Cole.
Riddell, Roger (1987) Foreign Aid Reconsidered, London: James Currey.
Rosenberger, James L. and Gasko, Miriam (1983) 'Comparing Location
Estimators: Trimmed Means, Medians, and Trimean', in David C. Hoaglin, F.
Mosteller and J. Tukey Understanding Robust and Exploratory Data Analysis,
New York: John Wiley, pp. 297-338.
Ross, J.A., Rich, M., Molzan, J.P. and Pensak, M. (1988) Family Planning and
Child Survival, 100 Developing Countries, Centre for Population and Family
Health, New York: Columbia University.
Sapsford, David (1985) 'The Statistical Debate on the Net Barter Terms of Trade
Between Primary Commodities and Manufactures: A Comment and Some
Additional Evidence', Economic Journal 95: 781-8.
Seers, D. (1976) 'The Political Economy of National Accounting', in A. Cairncross
and M. Puri (eds), Employment, Income Distribution and Development Strategy,
London: Macmillan.
Sen, A.K. (1985) 'Women, Technology and Sexual Divisions', Trade and
Development (UNCTAD), 6.
Sen, A.K. and Sengupta, S. (1983) 'Malnutrition of Rural Indian Children and the
Sex Bias', Economic and Political Weekly 18.
Sen, Gita (1993) 'Paths of Fertility Decline: A Cross-country Analysis', in Pranab
Bardhan, Mrinal Datta-Chauduri and T.N. Krishnan, Development and Change,
Bombay: Oxford University Press.
Shaw, E. (1973) Financial Deepening in Economic Development, New York:
Oxford University Press.
Sims, C. (1980) 'Macroeconomics and Reality', Econometrica 48: 1-48.
Snedecor, George W. and Cochran, William G. (1989) Statistical Methods, New
Delhi: Affiliated East-West Press.
Spanos, Aris (1986) Statistical Foundations of Econometric Modelling, Cambridge:
Cambridge University Press.
Spanos, A. (1990) 'Towards a Unifying Methodological Framework for Econo-
metric Modelling', in C.W.J. Granger (ed.), Modelling Economic Series, Oxford:
Clarendon Press.
Spraos, J. (1980) 'The Statistical Debate on the Net Barter Terms of Trade
Between Primary Commodities and Manufactures', Economic Journal 90:
107-28.
Stigler, S.M. (1986) The History of Statistics: The Measurement of Uncertainty
before 1900, Cambridge, MA: Belknap Press.
Tukey, J.W. (1977) Exploratory Data Analysis, Reading, MA: Addison-Wesley.
Wheeler, E.F. (1984) 'Intra Household Food Allocation: a Review of Evidence',
paper presented at meeting on 'The Sharing of Food', Bad Homburg, London:
London School of Hygiene and Tropical Medicine. mimeo.
White, Halbert (1980) 'A Heteroscedasticity Consistent Covariance Matrix
Estimator and a Direct Test for Heteroscedasticity', Econometrica 48: 817-38.
White, Howard (1992) 'What do we Know About Aid's Macroeconomic Impact?',
Journal of International Development 4: 121-37.
Working, H. (1943) 'Statistical Laws of Family Expenditure', Journal of the
American Statistical Association 38: 43-56.
Index
absolute residuals (plots) 254-5, 267, average economic regression 24, 29-32
273-6 average propensity to consume 406,
actual values 403, 405, 444, 445 407, 409-11
added-variable plot 169-70, 183-4 averages (kinds of) 45-50
ad hoc modifications (model
specification) 31-2, 33, 36, 276 Banerjee, Anindya 41, 352
African economies (GNP per capita) Barten, A. P. 112, 158
77-8, 79, 80, 86, 98 Bartlett's test 256-9, 263-4, 267, 270,
age of workers, wages and 277
(heteroscedasticity) 253-6, 261-6 bell-shaped curve 27, 28, 46-8, 49, 65
aid 455; Griffin's model 38, 208, Belize (population data) 372-5, 384,
209-12, 214, 219, 222, 225, 227-8 458
Aldrich, John H. 325 benchmarks, dummy variables and 284
algebra of stationarity 343-6 best linear unbiased estimator
Altman, J. C. 29 (BLUE) 418; autocorrelation and
analysing cross-section data: see cross- misspecification 366, 370, 387; least
section data squares estimators 75, 78, 111, 119,
antilogarithm (inverse transformation) 164, 185, 251, 269, 272, 370;
103-5, 195, 285, 311-12, 323 modelling an average 45, 53-8, 61
applied research 2-3; model bias: assessing direction of 214-15;
specification and 23-43 omitted variable 208, 209-19,
AR(1) process 338-48, 351-2, 367-9, 269-70
380-1, 385, 387, 391, 398 binomial distribution 289-90
arithmetic progression 158 birth rate 4-5, 7-16, 19, 187-91, 456
association: between categorical bivariate analysis 51, 125, 313-16, 399,
variable 287-92; causality and 406
118-19 bivariate normal distribution 124, 126,
augmented Dickey-Fuller test 352, 361
399, 401, 402-3, 406 see bivariate relations 9
Dickey-Fuller test BLUE see best linear unbiased
autocorrelation 132--4, 229, 335, 348, estimator (BLUE)
352; misspecification and (time- Bowley's resistant coefficient 93, 106,
series data) 366-92 132
autoregressive distributed lag (ADL) Box, G. E. P. 40
408 box plots: data analysis (simple
autoregressive process 338-48, 351-2, regression) 132, 142-3, 146, 150;
367-9, 380-1, 385, 387, 391, 398 data transformation (outliers and
auxiliary regressions 169, 176, 182, skewness) 86-93; logit
185, 213, 216 transformation 305-6;
average (modelling) 44-74 transformations to stationarity 357
486 Index
business cycle 345 collinearity 198, 218-19; perfect (of
regressors) 177, 178-9
categorical data (counts and competing proxy variables (regression
measurements) (cross-section data) coefficients) 203-5, 218, 219
279-301 computing the median/quartiles 84-7
categorical variable: association conditional bounds on regression
between (contingency tables) coefficients 202-3
287-92; dependent 302-32; multiple conditional effects plot 328-9
regression on 295-8; regression on conditional mean 122, 138, 265
(using dummy variables) 280-7 confidence intervals 62-6, 96-7, 105,
causality: association versus 118-19; 121-3, 383-4
econometric tests of 415, 425-8, 434 constant mean 361-2
Chambers, John M. 28 constant returns to scale 229, 230
Charemza, Wojciech W. 34 constant term 327; dummies and
checking: assumptions of regression 284-5, 314
model 187-92; model assumptions consumption, income and 406--7
(testing zero restrictions) 226-9; for consumption function 433;
normality in data 91-5 cointegrating 405-6, 407, 409-11,
Chenery, Hollis B. 224 456; Costa Rica 402-5, 456;
chi-square distribution 74, 95, 258 estimating (in simultaneous system)
chi-square test 6, 324-6, 330--1; 445-8; Indonesia 445-8
contingency (of independence) contingency chi square test of
290--2 independence 290-2
child mortality 202-3, 204 contingency tables: association
China (household size/income data) between categorical variables
98-100, 101, 456 287-92; logit modelling with 307-13
Chow test 233, 245, 246-7; second test correlation coefficient 118, 176, 180-4,
236-7 214
classical linear regression model correlograms 379-83
115-16, 120-4 Costa Rican consumption function
Cobb-Douglas production function 402-5, 456
229 counts (cross-section data) 279-301
Cochran, William G. 96 covariance, derivation of (AR(l)
Cochrane-Orcutt procedure 387-90, model) 391
391 Cramer's V statistic 292-4, 299
coefficient of determination 5, 6-7, 14, critical values 292, 352, 354-5, 385,
118, 171-2, 179-80, 185 386, 399, 402, 474-5, 480
coefficient of kurtosis 81-3, 87 crop production function 371-2,
coefficient matrix (rank and arder 375-6, 380, 382-4, 386, 388-90
conditions) 433 cross-correlation 448, 449
coefficient matrix (supply/demand cross-section data: autocorrelation in
model) 431-2 376--9; categorical data 279-301;
coefficient of partial correlation 181-2, heteroscedasticity 251-78; logit
193-4 transformation/modelling 302-32;
coefficient of skewness 81-3, 87, 93, time-series data and 34, 39-42
106 crowding-in (investment) 224, 225,
coefficient of variation 81, 89-90 244-5
cointegrating consumption function crowding-out (investment) 225, 244-5
405-6, 407, 409-11, 456 cubic powers 82
cointegrating regression DW (CRDW)
402, 403, 406 Dasgupta, P. 4-5
cointegration 41, 335; error correction data: exploratory check for normality
model and 393-412 91-5; exploring (graphical methods)
Collier, P. 462 7-13; mining 3, 32; modelling with
transformed 102-5; non-normality in (detecting) 90-7; pooling (tests of parameter stability) 231-4; role (in model specification) 29-39; sets 18-20, 455-62; table (collapsing) 308; time dimension (in model specification) 39-42
data analysis (aims/approaches of study) 1-20
data analysis (foundations of): model specification and applied research 23-43; modelling an average 44-74; outliers and skewness (data transformations) 75-107
data analysis, regression and: model selection and misspecification 208-48; partial regression 163-207; simple regression 111-62
data generation process (DGP) 212, 366, 368-9, 394, 405
data transformations: to eliminate skewness 97-105; outliers and skewness 75-107
Deadman, Derek F. 34
decision tree (stationarity testing) 352-4, 399, 401
degrees of freedom 121, 258, 325; model selection 216-17; modelling an average 66, 69, 74; outliers and skewness 82, 95; partial regression 186, 194
demand: function (money illusion in) 166-7; for labour (scatter plot) 126-8; for manufactured goods (India) 165-73, 194-7
demand and supply model 441, 460; identification in 428-30; simultaneity bias in 417-22
Demaris, Alfred 307
density function 58, 59
dependent variables 7-9, 219; categorical (logit transformation) 302-32; dichotomous 302-3, 306-8, 310, 312, 313; lagged 366, 370, 386; numerical 280-7; partial regression 163-5, 168, 171-2, 179, 189-90; simple regression 111, 118, 153-4, 156, 168
determination, coefficient of 5, 6-7, 14, 118, 171-2, 179-80, 185
deterministic time trend 344, 347, 352, 355, 360
detrending 360
devaluation, response of Pakistani exports to 399-402
DFBETA statistic 145-8, 190, 192, 228
Diaconis, P. 32
diagnostic graphs (in logit regression) 327-31
diagnostic plots (heteroscedasticity) 252-6
diagnostic testing (initial model specification/estimation) 4-7
dichotomous dependent variables 302-3, 306-8, 310, 312, 313
Dickey-Fuller test 352, 399, 401-4, 406, 412, 480
difference stationary process (DSP) 349-51, 365
differences/differencing 360, 362-3, 388
Direction of Trade Statistics (Pakistan) 459
domestic savings 38, 208, 209-12, 233-4
double-log transformation 149-53, 265, 273, 275, 276
Ducpetiaux, E. 112
dummy variables 313-14, 317, 322-3; parameter instability 237-46; regression on categorical variable 280-7, 296, 298
Durbin-Watson statistic 19, 338, 348, 366, 379, 384-91, 398-404, 406, 412, 470, 472
econometrics/econometric modelling 2, 23-4, 29-32; tests of causality 415, 425-8, 434; time-series data and 39-42
Eisenhart, C. 475
elasticities 134, 149, 159, 410-11; constant elasticity function 165; income 166, 196, 205, 239; price 166, 415, 418, 423, 441; supply 418, 423, 441, 444
Emerson, John D. 86, 173
empirical research: encompassing approach 211-12; exploratory data analysis 35-7
encompassing approach (in empirical research) 211-12
endogenous variables 415-18, 420-1, 425, 428, 430, 444
energy consumption 150-6
Engel's law 24-5, 112, 158
Engle, Robert F. 402, 404
Enos, John 209
error correction model, cointegration and 41, 387, 393-412
error correction term 407-8
error terms 76, 252; logit transform 315-16, 327; misspecification and autocorrelation 366-7, 379-80; modelling an average 52-3, 67; simple regression 111, 114-16, 121, 150; simultaneous equation models 417, 438; spurious regression 338, 340-1, 344, 346-7
error variance 6, 25, 186-7
errors: in hypothesis testing 67-8; standard see standard errors
estimation: in logit regression 320-7; of population variance 65; of simultaneous equation models 437-53
estimator of error variance 186-7
exchange rates (Pakistan) 399-402
exogeneity 416, 425, 427-8
exogenous variables 415-21, 424, 425, 428-9, 431, 442-4
expenditure, recurrent 142-3, 144, 145; Tanzania 128-37, 239-42
explained sum of squares (ESS) 117-18, 168-9, 171-2, 179-80
explanatory variables 111; categorical 280-7; logit modelling 310-11, 312, 317; partial regression 163-4, 168-72, 176-7, 198
exploratory band regression 113, 125-8, 131, 241
exploratory check for normality in data 91-5
exploratory data analysis (EDA) 44, 125; model specification 35-7, 39; outliers and skewness 75, 80, 86-7, 90, 100; partial regression plot 164, 169-70, 183-4, 201
exports: Malta 212, 420-1, 423, 426, 458; Pakistan 399-402, 459
extreme bounds analysis (fragility) 37-8, 198-205
F-statistic 6, 225-6, 261; distribution 221, 466-7, 476-9; t-test and (comparison) 208-9, 222-3; tests 165, 192, 195, 198, 228-9, 233, 235-6, 245, 355, 363, 420, 425-7; zero restrictions 208-9, 220-3
family planning variable 202-3
Farebrother, R. W. 470, 472
fat tails 100
female literacy 202-3, 204
fertility data 4-5, 15, 198-9, 200, 457-8
fitted values 403, 405, 444, 445
five-number summary 86, 87
flow variables 39
focus variable 38
food: expenditure 112-13, 158-9, 271, 273-6; prices (India) 165-73, 194-7, 230-1, 458
foreign aid 38, 208, 209-12
fourth root transformation (per capita household income) 101-2
fragility analysis 37-8; regression coefficients 198-205
Friedman, Milton 338
Fry, M. J. 224
full information estimation techniques (simultaneous equation) 445, 448-51
Fuller, W. A. 480; see also Dickey-Fuller test
Gasko, Miriam 45
Gatt, J. 420, 458
Gauss, Karl Friedrich 78
Gauss-Markov theorem 119, 163, 185
gender differences (life expectancy) 87-90, 91-3, 94
general-to-specific modelling 33-5, 136, 165, 198
generalised least squares (GLS) estimation 448, 449
generated DSP series 365
generated TSP series 365
'genuine' autocorrelation 387-90
geometric progression 158
Giere, Ronald N. 26
Gilbert, Christopher 34
Glejser's test 256, 261-4, 267, 270, 277
GNP per capita 4-5, 6, 14-16, 187-91; African economies 77-8, 79, 80, 86, 98; energy consumption and 150-3; graphical methods of data analysis 7-13; life expectancy and 87-90, 91-3, 94, 157-8, 232
Goldberger, Arthur S. 44, 124, 316
Goldfeld-Quandt test 6, 14, 250, 259-61, 263-4, 267, 270, 277
Gould, Stephen Jay 3
Granger, C. W. J. 7, 23, 32, 37, 41, 402, 404
Granger causality test 415, 425-6, 427-8, 434
graphical methods 3, 4, 7-13
graphics: in logit regression 327-31; regression with (checking model assumptions) 124-35
Gregory, C. A. 29
Griffin, Keith (aid model) 38, 208, 209-12, 214, 219, 222, 225, 227-8
Grilli, E. 234, 462
grouped data 271-2, 326-7
growth rates, logged differences and 360
Gujarati, D. 463
Gupta, K. 211
Haavelmo-Cowles research programme 30
Hamilton, Lawrence C. 7, 36, 44, 125, 126, 144, 148, 190, 325, 327, 329, 406
Harriss, B. 90
Harrod-Domar equation 209, 210
Hartley, H. O. 465
hat statistic 115, 123, 142, 143-5, 147, 190, 192
Hausman specification test 418-21, 424, 428, 434, 445
heavy tails 80, 81, 82-3, 105; checking for (normality assumption) 93-4; mean/median (estimators) 96-7
Heckman, J. J. 30, 35
Hendry, D. 32, 34
heteroscedastic standard errors (HCSEs) 123, 251, 270-7
heteroscedasticity 14, 123, 126, 129, 131, 149, 150, 152, 190, 229, 327; dealing with 251-78; in linear probability model 315-16
histograms 7-9, 11
Hoaglin, David C. 85, 87, 94, 125, 173
Holden, Darryl 352, 354, 406
homoscedasticity 6-7, 115, 119, 123; transformations towards 261, 264-70
household: expenditure 112-13, 158-9, 190, 271, 273-6; income (China) 98-100, 101-2, 103; size (China) 98-100, 101, 456
Huber, Peter J. 144
Human Development Index 187-91, 442
Human Development Report (UNDP) 458, 460
hypothesis testing 27, 39; classical normal linear regression model 123; in logit regression 320-7; model specification 208-48; modelling an average 44-74; testing downwards 33-5
identification: problem (simultaneity bias) 428-34; in supply and demand model 428-30
income: consumption and 406-7; data 98-101, 258-9, 261-7, 456; household 98-103; life expectancy and 377-8; variable 167-9
incorrect functional form 372-5
incremental capital-output ratio (ICOR) 210
independence: contingency chi-square test 290-2; stochastic 287-8, 289, 291
independent variables 118, 154, 190, 303
India: food/total expenditure 268, 269, 457; food prices/manufacturing demand 165-73, 194-7, 230-1, 458; wages/age of workers 253-6, 457; wages and education/gender 269-70, 279-98, 456-7
indirect least squares (ILS) 439-42, 443, 444-7
individual case data, logit regression with 321-6
Indonesia: consumption function 445-8; national accounts data 451, 457
infant mortality 4-5, 6, 7-16, 187-91
inflation, money supply and 338-9, 348-9, 361
influence, outliers and leverage 137-48
influential points (DFBETA statistics) 145-6
initial model specification/estimation (with diagnostic testing) 4-7
instrumental variable estimation 417-19, 420, 423-4, 442-6, 447
integrated variables 351
interaction effect: logit modelling with contingency tables 311-12; partial association 293-4, 298
intercept dummies 237-41, 243, 244-6, 298
interest rate 415
International Financial Statistics (IMF) 456, 457, 459, 461-2
interpreting (error correction model) 407-12
interquartile range (IQR) 85, 87, 89, 90, 91, 93-4, 358-9
intuitive approach (cointegration) 394-9
invariance of series variance 337-8
inverse transformation (antilogarithm) 103-5, 195, 285, 311-12, 323
investment: crowding-in 224, 225, 244-5; crowding-out 225, 244-5; Sri Lankan investment function 223-5, 243-6
irrelevant variables (omitted variable bias) 213-14, 218-19
IS curve 415
Jarque-Bera test (skewness-kurtosis) 6, 14, 94-5, 100, 132, 259, 276, 296
Jenkins, G. M. 40
Johansen method (cointegration test) 399, 406
k-variable case 178-9, 182-3, 185
Kennedy, Peter 32, 34, 37
Keynesian macro model 432-3
Keynesian model 415, 434-5, 438-9, 451
Khan, M. S. 231
Kmenta, Jan 322, 327
Krishnaji, N. 165-73, 196-7, 205, 230-1, 458
kurtosis 81-3, 87, 90; Jarque-Bera test 6, 14, 94-5, 100, 132, 259, 276, 296
labour demand 46-8, 50, 53, 55-6, 60, 62, 458-9
ladder of power transformation 9, 100-2, 105, 259, 356-8
lagged dependent variable, autocorrelation with 366, 370, 386
Landau, L. 211
Leamer, E. E. 27, 32, 34, 37-8, 198, 425
least squares estimation 13, 116-17, 120, 124-5, 163; as BLUE 75, 78, 111, 119, 164, 185, 251, 269, 272, 370
least squares line 170, 173-80
least squares principle 137; concept of resistance and 76-80; indirect least squares 439-42, 443, 444-7; linear regression and 111, 114-20, 174-5; three-stage least squares 448, 450-1; two-stage least squares 420-1, 424, 442-5, 447, 451
Legendre, Adrien Marie 78
leverage, outliers and influence 137-48
Levine, J. H. 3
life expectancy 157-8, 232, 377-8, 458; gender and 87-90, 91-3, 94
limited information estimation 445, 448
linear combination between regression coefficients, testing for 195-7
linear influence of income variable, removing (partial regression) 168-9
linear probability model 313-20
linear regression 372-4; least squares principle and 111, 114-20; model (inference from) 120-4; model (partial regression) 184-92
linear restrictions, non-zero 229-31
linearity, transformation towards 148-59
LM curve 415
logarithmic transformation 14, 358, 362; antilogarithm 103-5, 195, 285, 311-12, 323; double-log transformation 149-53, 265, 273, 275, 276; to eliminate skewness 98-104; semi-logarithmic transformation 153-9; versatility of (heteroscedasticity) 259, 266, 268-9, 273
logged differences and rates of growth 360
logged real exchange rates (Pakistan) 399-400
logit modelling with contingency tables 289, 307-13
logit regression: estimation and hypothesis testing in 320-7; graphics and residual analysis 327-31; linear probability model and 313-20; with saturated model 312-13
logit transformation (cross-section data) 302-32
lognormal variable 150
Lucas, R. E. (Lucas critique) 428
McCabe, G. P. 138
McKinnon, R. I. 224, 352
Madarassay, A. 461
Maddala, G. S. 124, 428
Malinowski, Bronislaw 29
Malta (exports) 212, 420-1, 423, 426, 458
manufactured goods, demand for (India) 165-73, 194-7, 230-1, 458
Maputo harbour: casual labour 46-8, 50, 55, 60, 62-4, 67-70, 83, 94, 126, 149, 317-20, 458-9; overtime payments 48-9, 53
marginal propensity to consume 407, 409-11
mathematical properties: of least squares regression line 117-18, 174-5; of sample mean 76-8
maximum bounds (regression coefficient) 201-3
maximum likelihood principle 321-2, 324-5; normality and 58-61, 65; normality assumption 111, 119-20
mean 73; based statistics 78-91, 133; conditional 122, 138, 265; estimator (heavy tails) 96-7; median versus 60-1, 79-80, 91-3, 96-7, 105; in modelling an average 45-50; stabilising 360-2; see also sample mean; zero mean
measurements (cross-section data) 279-301
median: computing 84-7; estimator (heavy tails) 96-7; interquartile range and 358-9; mean versus 60-1, 79-80, 91-3, 96-7, 105; in modelling an average 45-50
Miller, R. W. 30, 31
minimum bounds (regression coefficient) 201-3
minimum variance property of sample mean 56-8
misspecification: autocorrelation and (time-series data) 366-92; bias (single equation estimation) 415-36; model selection and (in multiple regression) 208-48
mode 45-50
model assumptions (regression with graphics) 124-35
model closure 420
model estimation/testing: modelling an average 44-74; see also statistical inference
model selection 208-48
model specification 4-7; and applied research 23-43; role of data 29-39; statistical inference and 24-9
modelling: an average 44-74; cross-section data 302-32; a random walk 338-43; simple regression 112-14; with transformed data 102-5
modern approaches to model specification (role of data) 32-9
moments of distribution (skewness and kurtosis) 106
money illusion (in demand function) 166-7, 196, 231
money supply 363, 415; inflation and 338-9, 348-9, 361
Moore, D. S. 138
Morgan, Mary S. 30
Mosley, Paul 247-8
Mosteller, Frederick 35, 37, 60, 79, 112, 204
multi-equation model 437-8
multicollinearity 182, 186, 198, 202, 216, 217
multiple regression 6, 14, 16, 111-12; on categorical variables 295-8; coefficients (interpreting) 163-207; model selection and misspecification in 208-48; partial regression and 170-1, 180-1; t-test in 192-8, 219, 239-42
multivariate analysis 16, 399, 406; see also multiple regression
multiway contingency tables 303, 307
Nelson, Forrest D. 325
Newbold, P. 41
nominal categorical variable 282
non-linearity 127, 131, 149, 150, 152; eliminating skewness 97-105; heteroscedasticity 126, 261, 262
non-normality in data (detecting) 90-7
non-stationarity 335-8, 345, 348-50; cointegration and 393, 396, 404, 406-7; testing for (unit roots) 351-6, 386-7, 402
non-trade variables 308-9
non-zero determinants (rank and order conditions) 431, 433
non-zero linear restrictions (testing) 229-31
normal distribution, inference from sample of 61-71
normality: in data (exploratory check) 91-5; maximum likelihood principle 58-61; skewness-kurtosis test 6, 14, 94-5
normality assumption 6; linear regression model 120-4; maximum likelihood and 111, 119-20; modelling an average 17, 45, 58-61; validity (detection procedures) 90-7
null hypothesis 6, 95, 135, 325, 354-5; cointegration 401-2, 404, 406; heteroscedasticity 14, 257-8; misspecification and autocorrelation 369, 383-6; misspecification bias
418-19, 421, 426-7; modelling an average 66-9, 70; multiple regression 222, 225, 229, 233, 236; partial regression 194, 196
numerical dependent variable 280-7
numerical summaries 125, 126, 132-3
odds ratio 310-12, 323-4
omitted variables 262; bias 208-19, 269-70; as cause of autocorrelation 375-6; Hausman test 419-22; heteroscedastic residuals due to 262; Plosser-Schwert-White test 362-3
one-tailed test 68, 69
order-based sample statistics 80-90, 91, 133
order condition 430-4
order of integration (integrated variables) 351
ordinal categorical variable 281
ordinary least squares 148; autocorrelation 366, 386, 388-9; cointegration 394, 400-1, 407; logit transformation 316, 319, 326-7; misspecification bias 415-24, 428-9; multiple regression 212, 229; simultaneous equation models 437-8, 440-1, 443-6, 449-51; spurious regressions 345, 347-8, 350, 362
origin, regression through 136-7
orthogonality of regressors 177, 178-9, 198, 213-14
Ouliaris, S. 402, 404
outliers 13, 14; definition 87; detecting 87-90; leverage and influence 137-48; skewness and data transformations 75-107
over-identification (indirect least squares) 441-3, 446-7
overshooting 409
overtime payments (Maputo harbour) 48-9
Pelto, Gretel H. 31
Pelto, Pertti J. 31
perfect collinearity of regressors 177, 178-9
Perman, Roger 352, 406
permanent income hypothesis 405
Peru (database) 459
Pfefferman, G. 461
Phillips, P. C. B. 402, 404
Plosser-Schwert-White differencing test 362-3
pooling data (tests of parameter stability) 231-4
population: data (Belize) 372-5, 384, 458; distributions 55-6, 60-1; mean 49, 53-5, 56, 57, 59, 60-4, 67, 70, 102-3; regression 112, 114, 115, 125, 208, 270-1, 372-5, 384, 455; variance (estimating) 65, 66
power transformation 152, 159, 265, 360; ladder of 9, 100-2, 105, 259, 356-8
Prais-Winsten transformation 388
Prebisch, Raul 234
predicted values (plot) 253-5, 267, 273-4
predictive failure 236-7
prices of food (India) 165-73, 194-7, 230-1, 458
private investment (Sri Lanka) 223-5, 243-6
probability distributions 45
probability limits (plims) 423-4
probability theory 25, 61-3
production function 230, 460; Cobb-Douglas 229; crops 371-2, 375-6, 380, 382-4, 386, 388-90
proxy variables, competing (regression coefficients) 203-5, 218, 219
pseudo-R2 325-6
pseudo standard deviation 94, 132
public investment (Sri Lanka) 223-5, 243-6