333–50
R-squared 0.89
Standard error of regression 2.07
F-statistic 291.58
Probability (F-statistic) 0.00
* We would like to thank David Moreton for helpful suggestions and the Quality of Teaching Committee of the Faculty of
Economics and Commerce at the University of Melbourne for financial support.
© 2005 The University of Melbourne, Melbourne Institute of Applied Economic and Social Research
Published by Blackwell Publishing Asia Pty Ltd
334 The Australian Economic Review September 2005
Figure 1 A Scatter Plot of the Data for Example 1

subgroups indicating the presence of a 'dumbbell' plot as shown in Figure 2. Note that if the data in each subgroup are fitted as separate models, the R² values obtained from the separate regressions are very close to 0.
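The dumbbell effect can be reproduced with a small constructed dataset. The sketch below is our own illustration, not the data behind Figure 1: two hypothetical 3 × 3 grids of points, each with zero internal correlation, are placed far apart, and the pooled correlation is nevertheless close to 1. The helper names `pearson_r` and `split` are illustrative only.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split(points):
    """Turn a list of (x, y) pairs into separate x and y lists."""
    return [p[0] for p in points], [p[1] for p in points]

# Hypothetical subgroup: every (x, y) pair on a 3 x 3 grid, so x and y are unrelated.
grid = [(i, j) for i in range(3) for j in range(3)]
group_a = grid
group_b = [(i + 10, j + 10) for i, j in grid]  # same pattern shifted far away

r_a = pearson_r(*split(group_a))
r_b = pearson_r(*split(group_b))
r_pooled = pearson_r(*split(group_a + group_b))

print(f"within A: r = {r_a:.3f}, within B: r = {r_b:.3f}")  # both 0.000
print(f"pooled:   r = {r_pooled:.3f}, R^2 = {r_pooled ** 2:.3f}")  # 0.974, 0.949
```

Fitting each subgroup separately gives a correlation of zero, as the text describes for Figure 1, while the pooled R² of about 0.95 arises purely from the gap between the two clusters.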
The regression results for the second example are presented in Table 2. These results appear to show that there is no significant relationship between y and x, and the regression has a very low R². With results such as these, one would often assume that the two variables are unrelated.
Figure 2 The Implied Dumbbell for the Data for Example 1

However, a scatter plot of x and y in Figure 3 indicates that there is a unique, well-defined relationship between them. Estimating the parameters of this relationship would require a model that does not assume linearity.
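A perfect but non-linear relationship can likewise produce a correlation (and hence a simple-regression R²) of zero. The sketch below uses the hypothetical relationship y = x² on points placed symmetrically about zero; it illustrates the point and is not the data used for Example 2.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = list(range(-5, 6))    # -5, -4, ..., 5: symmetric about zero
y = [v ** 2 for v in x]   # y is an exact (non-linear) function of x

r = pearson_r(x, y)
print(f"r = {r:.3f}, R^2 = {r ** 2:.3f}")  # r = 0.000, R^2 = 0.000
```

A scatter plot of these points would show the parabola immediately, even though the linear summary statistics suggest no relationship at all.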
Many other authors have found examples similar to those presented here. Two worth mentioning are by Anscombe (1973) and Leamer (1994). In Anscombe (1973), four datasets, each consisting of 11 data points on y and x, are examined. For each of the four datasets an ordinary least squares regression yields the same results, including the estimated coefficients and R². However, when the data are plotted for each of the four datasets, differences between the datasets such as outliers and non-linearities can be seen.

Figure 3 A Scatter Plot of the Data for Example 2

Leamer (1994, p. xiii) presents an example in which a consumption function is estimated using hypothetical expenditure and income data. The results of the ordinary least squares regression look very good, with a high R² and a coefficient of the right sign. However, a scatter plot of the data reveals that the data spell out the letters H E L P.
R-squared 0.00
Standard error of regression 8.51
F-statistic 0.02
Probability (F-statistic) 0.89
Hirschberg, Lu and Lye: Descriptive Methods for Cross-Section Data 335
The inspection of the data prior to running a regression is strongly encouraged in numerous econometric textbooks; for example, see Kennedy (2003, p. 408), Koop (2000, pp. 12–20), Pindyck and Rubinfeld (1998, p. 45), Greene (2003, pp. 878–81) and Griffiths, Carter Hill and Judge (1993, p. 22). However, in many cases the discussion is limited to the construction of histograms and scatter plots. The aim of this article is to provide a detailed student guide to useful methods of summarising and examining raw cross-section data before attempting to apply sophisticated econometric techniques.2

To illustrate the techniques presented we use detailed data for Dominick's Finer Foods supermarkets. A research project at the University of Chicago has made available a set of detailed data for the Dominick's supermarkets located in metropolitan Chicago.3 We use these data to create average daily sales by department for a set of 84 stores for which the data are relatively complete: at least three years of data are used for each store. In addition to the sales for each store, the data also contain information obtained from the US Census describing the population in the neighbourhood in which the store is located, and some additional marketing information concerning the nature of the store's customers.

Various univariate methods for investigating the individual series are discussed first. In Section 2 we define a number of descriptive statistics that numerically summarise the properties of a single variable. In Section 3 we demonstrate how graphic displays of the distribution of the series can be generated. The first of these distribution plots is the box plot, a very useful graphic summary of the overall shape of the distribution of the data. The next method is the histogram, which provides more detail. Here we introduce a method for comparing the distribution of the series with the normal distribution using a simple overlay. Following the histogram we introduce the kernel density estimate, or smoothed histogram, which provides a more accurate estimate of the distribution of a continuous variable.

Estimating the density of a continuous random variable is often not sufficient. Because the normal distribution acts as a model on which many statistical tests rely, it is important to determine how the distribution of the series compares with a known distribution such as the normal. Q-Q plots and P-P plots provide a means of comparing series distributions with a standard distribution while also identifying the observations that do not conform; these techniques are presented in Section 4.

Sections 5 and 6 discuss graphical techniques for analysing multivariate data. The relationship between two series is the focus of Section 5, where the use of the correlation coefficient and the scatter plot is examined. Whereas the correlation coefficient only indicates the presence of a linear association, the scatter plot can highlight the presence of a non-linear relationship between the variables or the presence of outliers. Section 6 discusses two graphical techniques that can be used to look at a group of variables at a time. Side by side box plots simultaneously display the distributions of a group of variables so that the distributions and their properties can be compared across the variables. The matrix scatter plot is a graphic analogue to the correlation matrix and is a useful method of tracking particular observations across a group of variables.

An important use of graphs is to find patterns in the data. Therefore graphs need to be clear and well presented, which often requires changing the default options of the graphics routines in computer programs. In Section 7 we illustrate the steps involved in changing the visual impact of a graph obtained using the default options in Microsoft Excel. Section 8 discusses statistical packages, with particular emphasis on widely used software packages for graphics in econometrics. Section 9 concludes the article.

2. Descriptive Statistics

A number of descriptive statistics are used to summarise the properties of a single variable x with n observations in terms of its location, dispersion and shape. The most common of these
is the mean. The mean is a statistical term for the average and is defined as:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i \qquad (1)$$

where the x_i are the observations and n is the number of observations. Another measure of central tendency is the median. Ordering the observations from smallest to largest and writing x_[k] for the kth ordered value, the median when n is an even number is defined as:

$$\text{median} = \frac{x_{[n/2]} + x_{[(n/2)+1]}}{2} \qquad (2)$$

When n is odd, the median is simply the middle ordered value, x_[(n+1)/2].

Table 3 Descriptive Statistics for Produce Sales

Measure               Statistic
Mean                  5573.97
Median                5575.83
Variance              3321618.56
Standard deviation    1822.531
Minimum               2083.79
Maximum               12660.69
Skewness              1.070
Kurtosis              5.508
Jarque-Bera           38.03
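Equations (1) and (2) translate directly into code. The sketch below also computes moment-based skewness and kurtosis and the Jarque-Bera statistic, JB = (n/6)[S² + (K − 3)²/4]. Those last formulas are not given in this excerpt; we assume the common definitions with central moments computed using divisor n, which may differ from a particular package's small-sample conventions. The toy data are hypothetical.

```python
def mean(x):
    """Equation (1): the arithmetic average."""
    return sum(x) / len(x)

def median(x):
    """Equation (2) for even n; the middle ordered value for odd n."""
    s = sorted(x)
    n = len(s)
    if n % 2 == 1:
        return s[n // 2]                    # x_[(n+1)/2] in 1-based notation
    return (s[n // 2 - 1] + s[n // 2]) / 2  # (x_[n/2] + x_[n/2+1]) / 2

def central_moment(x, k):
    """kth central moment, using divisor n (an assumption, see lead-in)."""
    m = mean(x)
    return sum((v - m) ** k for v in x) / len(x)

def skewness(x):
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5

def kurtosis(x):
    # Not excess kurtosis: a normal distribution has kurtosis 3 on this scale.
    return central_moment(x, 4) / central_moment(x, 2) ** 2

def jarque_bera(n, s, k):
    """JB statistic from sample size n, skewness s and kurtosis k."""
    return n / 6 * (s ** 2 + (k - 3) ** 2 / 4)

data = [2.0, 3.0, 3.0, 4.0, 5.0, 9.0]    # hypothetical toy data
print(round(mean(data), 3), median(data))  # 4.333 3.5
print(round(skewness(data), 3), round(kurtosis(data), 3))
print(round(jarque_bera(84, 1.070, 5.508), 2))  # 38.04
```

As a consistency check, plugging the rounded Table 3 values (n = 84, S = 1.070, K = 5.508) into the assumed JB formula gives about 38.04, in line with the reported 38.03.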
provide a rough shape to the density form, and the density plot, which is a smoothed version of the histogram.

3.1 Box Plots

The box plot provides a summary display of the distribution of the data (Chambers et al. 1983) by graphically showing the quartiles of the data. It shows the centre of the distribution (the median or 50th percentile, Q0.50), the spread of the bulk of the data (the length of the box, which is the distance from the 25th percentile, Q0.25, to the 75th percentile, Q0.75), and how stretched the tails of the distribution are (the length of the lines relative to the box). Values beyond the ends of these lines are plotted individually as well. In addition, some box plot programs have options to locate the mean of the data and the 95 per cent confidence bounds of the mean.

In Figure 4 a simple box plot is illustrated for Produce Sales. The top and bottom of the rectangle represent the upper and lower quartiles of the data, and the centre line within the rectangle is the median, which is equal to $5576. The upper and lower quartiles are found by ordering the data and finding the values that bound the highest 25 per cent and the lowest 25 per cent of the observations. The interquartile range (IQR) is the range between the upper and lower quartiles, IQR = Q0.75 − Q0.25; for Produce Sales, IQR = $2529.

The lines that extend from the ends of the box (sometimes referred to as whiskers) go to what are referred to as the 'upper adjacent value' and the 'lower adjacent value'. The upper adjacent value (x_ua) is the largest observation that is less than or equal to the upper quartile plus 1.5 times the interquartile range, Q0.75 + 1.5IQR; if no observation exceeds this limit, x_ua is simply the largest observation, max(x). For Produce Sales, x_ua = $8359. The lower adjacent value (x_la) is the smallest observation that is greater than or equal to the lower quartile minus 1.5 times the IQR, Q0.25 − 1.5IQR; if no observation falls below this limit, x_la = min(x). Thus for Produce Sales it is the smallest value, $2084.

Any data points that fall outside the range of the two adjacent values are referred to as outside or outlier values and are plotted as individual points. In the case of Produce Sales there are two outside values, observations 62 and 84, which are respectively the sales value of store 109, $11895, and the sales value of store 137, $12661. If there are outside values it may be necessary to go back to the source of the data to verify that these values are valid. In this case the two extreme values are for two large stores, which both have more than $94000 in total average daily sales, while the mean of total average daily sales is $56046.

The width of the box plot is arbitrary, which means that multiple box plots can be placed side by side to allow comparisons between groups of data. We will examine this use of the box plot further in Subsection 6.1 when we discuss multivariate plotting techniques.

Figure 4 Box Plots of Produce Sales (N = 84)

3.2 Histograms

Another way to summarise a data distribution is the histogram, or density plot for data that take discrete values. The range of the data is partitioned into several intervals of equal length, the number of points in each interval is counted, and these counts are plotted as bar lengths in a histogram (Chambers et al. 1983). The vertical axis shows the proportion of the observations in each bar, and the relative heights of the bars represent the relative density of cases in the intervals. For Produce Sales, there is one store that sells around $2000 and two stores that sell more than $9000, as shown in Figure 5. The Produce Sales of most stores are between $3000 and $9000.
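The box-plot quantities described above (quartiles, IQR, the 1.5 × IQR limits, the adjacent values and the outside values) can be computed as in the sketch below. It relies on Python's standard `statistics.quantiles`; using its default 'exclusive' quartile method is our assumption, and statistical packages differ in how they interpolate quartiles, so results may not match Eviews or SPSS exactly in small samples. The function name `box_plot_summary` and the sample values are hypothetical.

```python
import statistics

def box_plot_summary(x):
    """Quartiles, IQR, adjacent values and outside values for a box plot."""
    q1, q2, q3 = statistics.quantiles(x, n=4)  # default method='exclusive'
    iqr = q3 - q1
    upper_limit = q3 + 1.5 * iqr
    lower_limit = q1 - 1.5 * iqr
    # Adjacent values: the most extreme observations still inside the limits.
    upper_adjacent = max(v for v in x if v <= upper_limit)
    lower_adjacent = min(v for v in x if v >= lower_limit)
    # Outside values: everything beyond the adjacent values.
    outside = sorted(v for v in x if v < lower_adjacent or v > upper_adjacent)
    return {
        "median": q2, "q1": q1, "q3": q3, "iqr": iqr,
        "lower_adjacent": lower_adjacent, "upper_adjacent": upper_adjacent,
        "outside": outside,
    }

# Hypothetical sales figures with one unusually large value.
sales = [1, 2, 3, 4, 5, 6, 7, 8, 100]
print(box_plot_summary(sales))
```

For these toy values the quartiles are 2.5 and 7.5, so the IQR is 5 and the upper limit is 15; the upper adjacent value is therefore 8, and 100 is flagged as an outside value, just as stores 109 and 137 are flagged for Produce Sales.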
Figure 5 Histogram of Produce Sales

1822.53) as defined by the formula for the estimated normal density given below:

$$\hat{f}_N(z) = \frac{1}{\sqrt{2\pi s^2}}\exp\left(-\frac{(z-\bar{x})^2}{2s^2}\right) \qquad (8)$$
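Equation (8) is the normal density evaluated with the sample mean and standard deviation. To overlay it on a histogram whose bars show proportions, one can multiply the density by the interval width; this scaling step is our own addition, not a description of how the figures were produced. The sketch uses the Table 3 values for Produce Sales, and the interval width of 1000 is an assumption for illustration.

```python
import math

def normal_density(z, xbar, s):
    """Equation (8): normal density with mean xbar and standard deviation s."""
    return math.exp(-((z - xbar) ** 2) / (2 * s ** 2)) / math.sqrt(2 * math.pi * s ** 2)

xbar, s = 5573.97, 1822.53  # mean and standard deviation from Table 3
width = 1000.0              # assumed histogram interval width

for z in (3000, 5573.97, 9000, 12000):
    f = normal_density(z, xbar, s)
    # Approximate proportion of a normal sample expected in an interval
    # of length `width` centred on z: density times interval width.
    print(f"z = {z:>8}: density = {f:.3e}, expected proportion = {f * width:.3f}")
```

At the mean the density equals 1/(s√(2π)), roughly 0.00022 for Produce Sales, so with $1000 intervals a normal distribution would place about 22 per cent of stores in the central bar.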
of problems arise. First, histograms, unlike the data-generating process of a continuous variable, are not continuous: they assume that all values within an interval have the same probability of occurring. Second, it is quite possible to obtain very different-looking histograms for the same data by changing the length of the intervals and their locations.

One commonly employed alternative approach is to use the kernel density estimator. The kernel is a weighting function which is defined for each point at which the density is evaluated. The kernel density estimator is defined as:

$$\hat{f}_K(z) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{z - x_i}{h}\right) \qquad (9)$$

where x_1, …, x_n are the observations in the sample, f̂_K(z) denotes the estimated density function, z is the value at which the density is being evaluated, h is the bandwidth (also known as the smoothing parameter or window width) and K(•) is the kernel function. The value z usually lies within the span of the sample values but could lie outside it; recall that the normal density in (8) is defined over all values from −∞ to ∞. The bandwidth h plays a role similar to the width of the intervals in the histogram and determines the smoothness of the density estimate. The kernel function is a weighting function and is generally chosen so that less weight is placed on observations that are further from z than on those that are near. That is, for a symmetric kernel the weight depends only on the distance |z − x_i| (the absolute value of the difference between z and x_i): it takes larger values the closer x_i is to z and smaller values as they diverge from each other. When we use (8), the normal distribution, to approximate the density, we downweight the density for observations that are further from the sample mean. We also divide the squared distance from the mean by twice the variance, so a larger variance makes the estimated density flatter, in much the same way that larger values of the bandwidth imply a smoother density.

To implement the kernel estimator a smoothness parameter h and kernel function K(•) must be selected. Commonly the choice of K(•) will be a function that defines a symmetric unimodal probability density function; it then follows that f̂_K(z) will itself be a probability density function and will inherit all the continuity and differentiability properties of the kernel K(•).

One can think of the histogram as a form of kernel density estimate where the density is only evaluated at the midpoints of the intervals and the bandwidth is half the interval size. We define the kernel as K(u_i) = hI(|u_i| ≤ 1), where u_i = (1/h)(x_i − z) and I(•) is an indicator function that takes a value of 1 if its argument is true and 0 otherwise. This means that whenever an observation satisfies the inequality |x_i − z| ≤ h, it is included in the interval with a midpoint at z. The estimated density for the histogram defined in the form of a kernel density estimator is given as:

$$\hat{f}_H(z) = \frac{1}{nh}\sum_{i=1}^{n} h\,I(|u_i| \le 1) = \frac{1}{n}\sum_{i=1}^{n} I(|x_i - z| \le h) \qquad (10)$$

that is, the proportion of observations lying in the interval of width 2h centred on z.

Figure 7 The Epanechnikov Kernel

A widely employed kernel function is the Epanechnikov function (Epanechnikov 1969), which has the form:

$$K(u_i) = \frac{3}{4}\left(1 - u_i^2\right) I(|u_i| \le 1) \qquad (11)$$

where again we use u_i = (1/h)(x_i − z). Note that only those observations which satisfy the
criterion |u_i| ≤ 1 are included in the computation of the density estimate for z. A graph of the Epanechnikov function is given in Figure 7. In Figure 7, as z → x_i, u_i → 0 and K(u_i) reaches its maximum value.

There are numerous other kernels that have been suggested; for example, the biweight kernel:

$$K(u_i) = \frac{15}{16}\left(1 - u_i^2\right)^2 I(|u_i| \le 1) \qquad (12)$$

and the uniform kernel:

$$K(u_i) = \frac{1}{2} I(|u_i| \le 1) \qquad (13)$$

We can also base the kernel density estimate on the normal density functional form by using the normal kernel function defined as:

$$K(u_i) = \frac{1}{\sqrt{2\pi}}\exp\left(-0.5u_i^2\right) \qquad (14)$$

Although the normal density has a well-known form, it has the potential disadvantage that outliers in the sample may influence the density estimate at a given value even when they are a great distance away. It also has the property that f̂_K(z) ≠ 0 for all z ∈ ℝ; that is, the estimated density is never equal to zero no matter where it is evaluated on the real line.

Silverman's (1986) monograph is an important reference for a more detailed discussion of these functions. Silverman comments that on the basis of efficiency measures there is very little to choose between the various kernels, and it is quite appropriate to base the choice of kernel on which is easiest to compute.

Bandwidth (h) selection is crucial in density estimation as it controls the smoothness of the density estimate: the larger the value of h, the smoother f̂_K(z). There are various ways of choosing h. One possibility is to plot f̂_K(z) for several different values of h and then choose the estimate that best meets prior expectations about the density. As with the choice of interval size or number for histograms, the bandwidth should be chosen so that the complexity of the data is not masked by too large a value of h, without turning artefacts of the particular sample into prominent features. Silverman (1986) provides a general formula, which he claims works well for a variety of cases, based on the interquartile range (IQR), the estimated standard deviation (s) and the sample size (n):

$$h = 0.9\,n^{-1/5}\min\left(s, \frac{IQR}{1.34}\right) \qquad (15)$$

In Figure 8 a histogram of Produce Sales is presented. The width of the intervals in this case is $1000. Given the location and size of these bins, we have an indication of a bimodal probability function and a small probability of quite large values.

In Figure 9 the kernel density estimate is plotted by the Eviews computer program using the Epanechnikov kernel and a bandwidth (h) of 1000. A comparison shows that the kernel density estimate is much smoother than the histogram, although they both show similar
Figure 8 Histogram of Produce Sales
Figure 9 Kernel Density Estimate of Produce Sales (panel title: Kernel Density (Epanechnikov, h = 1000.0); horizontal axis: Produce Sales (Fruit and Veg))
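The kernel density estimator (9), the kernels (11) to (14), the histogram-style estimate (10) and Silverman's rule (15) can be sketched as follows. This is a minimal illustration in plain Python, not the Eviews implementation used for Figure 9; the function names and the toy sample are hypothetical. Because all four kernels are symmetric, using (z − x_i)/h as in (9) or u_i = (x_i − z)/h gives the same result.

```python
import math

# Kernels (11)-(14), written as functions of u.
def epanechnikov(u):
    return 0.75 * (1 - u * u) if abs(u) <= 1 else 0.0

def biweight(u):
    return 15 / 16 * (1 - u * u) ** 2 if abs(u) <= 1 else 0.0

def uniform(u):
    return 0.5 if abs(u) <= 1 else 0.0

def normal(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(data, z, h, kernel=epanechnikov):
    """Equation (9): kernel density estimate at z with bandwidth h."""
    n = len(data)
    return sum(kernel((z - x) / h) for x in data) / (n * h)

def histogram_estimate(data, z, h):
    """Equation (10): proportion of observations within h of the midpoint z."""
    return sum(1 for x in data if abs(x - z) <= h) / len(data)

def silverman_bandwidth(n, s, iqr):
    """Equation (15): Silverman's rule-of-thumb bandwidth."""
    return 0.9 * n ** (-1 / 5) * min(s, iqr / 1.34)

data = [0.0, 1.0, 2.0]                      # hypothetical toy sample
print(kde(data, 1.0, 1.0))                  # 0.25: only x = 1 gets non-zero weight
print(histogram_estimate(data, 1.0, 1.0))   # 1.0: all three lie within h of z

# Produce Sales inputs from the text: n = 84, s = 1822.53, IQR = 2529.
print(round(silverman_bandwidth(84, 1822.53, 2529)))  # 676
```

With the Table 3 standard deviation and the IQR reported for the box plot, rule (15) suggests h of about 676, smaller than the h = 1000 used for Figure 9, so it would produce a somewhat less smooth estimate.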
$$\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}$$
Figure 14 Scatter Plot of Produce Sales and Total Average Daily Sales

may potentially be outliers. In particular, two of these are the observations highlighted in the Q-Q and box plots for Produce Sales. Thus the two stores corresponding to these observations have both high Produce Sales and high Total Average Daily Sales. This may indicate that for these two stores Produce Sales are in fact a large component of total sales. Note that the type of relationship shown in Figure 1 is not present in this case, so we can conclude that although the correlation coefficient is high, it is not due to the presence of a 'dumbbell' effect.

To demonstrate why scatter plots provide additional information, we examine two more scatter plots in Figure 15. In both of these cases the correlation coefficient between the x and y variables is around 0.7, yet the scatter plots show quite different relationships between the variables. Thus the relationship between the variables is not as well established by the correlation coefficient as by the scatter plot.

In the introduction of this article we presented two examples of simple regressions where the scatter plot told a story that the regression estimates did not. In the first example we found that the correlation was quite high but that this was due to a 'dumbbell'-type relationship. In the second example the regression indicated that the correlation was very low but, just as in the second plot in Figure 15, the scatter plot revealed a prominent relationship between the variables.

6. Multivariate Techniques

The examination of the relationships among a group of variables is frequently the objective of our analysis. Even so, it is still useful to begin by looking at each variable individually, paying attention to such things as skewness, kurtosis, outliers, distributional assumptions and so on. The techniques described above in Sections 3 and 4 are useful for this purpose. However, there are other graphical techniques that can be used to look at a group of variables at a time. In this section two such techniques will be described. The first is side by side box plots and the second is the graphic
analogue to the correlation matrix: the matrix scatter plot.

Figure 16 Side by Side Box Plots
Table 5 The Correlation Matrix Corresponding to the Different Variables Plotted in Figure 17

                                Grocery   Meat    Produce Sales     Total Average
                                Sales     Sales   (Fruit and Veg)   Daily Sales
Grocery Sales                   1.00      0.91    0.87              0.97
Meat Sales                      0.91      1.00    0.76              0.90
Produce Sales (Fruit and Veg)   0.87      0.76    1.00              0.91
Total Average Daily Sales       0.97      0.90    0.91              1.00
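A correlation matrix such as Table 5 is simply the Pearson correlation applied to every pair of variables. The sketch below builds one from a dictionary of columns; the variable names and values are hypothetical toy data, not the Dominick's series.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return (names, matrix) with matrix[i][j] = corr(column i, column j)."""
    names = list(columns)
    matrix = [[pearson_r(columns[a], columns[b]) for b in names] for a in names]
    return names, matrix

# Hypothetical daily sales for five stores.
columns = {
    "grocery": [30.0, 42.0, 35.0, 50.0, 28.0],
    "meat":    [8.0, 11.0, 9.0, 12.0, 7.5],
    "produce": [4.0, 6.5, 5.0, 7.0, 4.5],
}
names, m = correlation_matrix(columns)
print(" " * 8 + "".join(f"{nm:>9}" for nm in names))
for name, row in zip(names, m):
    print(f"{name:<8}" + "".join(f"{v:9.2f}" for v in row))
```

Like Table 5, the result is symmetric with ones on the diagonal; the matrix scatter plot replaces each off-diagonal number with the underlying scatter plot, which is what allows non-linearities and outliers to be seen.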
Figure 17 A Matrix Scatter Plot of Four Variables

allows a lot of variation in the design of the elements of the plots. It is possible to change the shape, size, font, colour, darkness, orientation and location along an axis of the graph to maximise its visual impact. In this section we present an example of a scatter plot using the default options in Excel and demonstrate the steps involved in improving the visual impact of this graph.
Figure 22 The Multiple Density Plots for Supermarket Sales

available for data analysis in the 1960s. The earliest programs included rudimentary graphics routines that produced what are often referred to as 'printer plots', and line graphs were only available if one had access to purpose-built plotters. These printer plots are still available in a number of programs and can provide a convenient method for scanning large amounts of data, in that they can be produced very efficiently and scaled with a single selection command. With the widespread availability of graphics-capable computers (an offshoot of the demand for computers capable of playing sophisticated games), more and more statistics packages have been written with graphics plotting software. Microsoft Excel is a widely used program for generating plots. Two widely used software packages for graphics in econometrics are SPSS and Eviews (here we refer to version 11.5 and version 5.0 respectively). Both of these programs employ the point-and-click, menu-driven editing of plot characteristics found in the most widely used PC software such as
Microsoft Excel. Below we list the capabilities of Eviews and SPSS.

Eviews and SPSS have graphics editors that allow changes to almost all aspects of the plot. Once the initial plot has been created it can be brought into the graphics editor. Both Eviews and SPSS allow you to modify the font, colour and scale of the axis; however, SPSS also allows you to identify particular observations, annotate the graph and add reference lines wherever needed inside the graphic area (see Figures 12, 14, 16 and 17, where particular observations are identified). In addition, with SPSS multiple graphic images can be copied simultaneously and inserted into an MS-Word document. SPSS can also record the particular modifications made to a plot in a file, so that new plots can be made with the same format using what is referred to as a 'template' file (an option also available in Eviews). A further feature of SPSS, available after every command, is that it creates an exact 'journal file' which allows all the point-and-click commands completed in a session to be recorded to a 'batch commands' file that can be read into the syntax window of the program. In this window the file can be edited with a text editor so that multiple identical runs can be made with the same data.

Unlike SPSS, Eviews will compute nonparametric density estimates. In addition, in Eviews multiple graphs can be produced and put into a single graphic file, with the capability of overlaying one plot on another. This is particularly useful for summarising a number of plots on one page. Figure 22 is an example of how the distribution plots of the sales types listed in Figure 16 can be summarised in a series of kernel density plots. It is also possible to request that the estimated density be saved to a file that can be exported, so that densities can be compared on the same plot. In Figure 23 we have rescaled the density plots for both Health and Beauty Sales and Bakery Sales so that they can be compared directly. Note that the area under each density curve is scaled so that it is equal to unity.

Figure 23 The Overlay of the Density Plot for Health and Beauty Sales (Dashed Line) and Bakery Sales (Solid Line)

9. Conclusion

The message to be drawn from this article is that graphic representation of data can help to improve the understanding of the observational information used in statistical analysis. There are a number of methods for summarising data in a visual way. These methods can be used on individual series of values, on paired data or on multiple series. The distributional assumptions of the data can be examined, and the interrelationships between two or more variables can be explored. In addition, we have demonstrated how a standard software-generated graphic image can be improved to enhance the message it conveys about the data.

The emphasis in this article is on cross-section observations, in that we have not discussed the time-series aspects of the data. Implicitly we have assumed that the data under examination are identically and independently distributed. The second assumption often does not hold when a sample is measured over time. Unfortunately, when a sample is not independent, correlating it with other dependent data may produce spurious results. In addition, if the data are not identically distributed, the estimation of the density function may be confounded by the fact that the data may be generated by multiple processes; trying to identify a single process may then be akin to using the data from a dumbbell plot to estimate a correlation coefficient. Graphical methods can be used with
time-series data to identify the nature of the data. This will be the topic of a future paper.

November 2004

Endnotes

1. This article assumes the knowledge gained in a first-year introductory statistics subject.

2. Cross-section data are data on one or more variables collected at the same point in time, such as survey data. Although the methods described here can be applied to data collected over time (time series), time-series data require specialised methods which are not discussed in this article.

3. The Dominick's database covers store-level scanner data collected at Dominick's Finer Foods over a period of more than seven years. The data are the property of the Marketing group at the University of Chicago Graduate School of Business and are intended for academic use only. For more detail on any other parts of this dataset consult the web site at <http://gsbwww.uchicago.edu/research/mkt/Databases/DFF/DFF.html>.

References

Anscombe, F. 1973, 'Graphs in statistical analysis', American Statistician, vol. 27, pp. 17–21.
Chambers, J. M., Cleveland, W. S., Kleiner, B. and Tukey, P. A. 1983, Graphical Methods for Data Analysis, Chapman & Hall, United States.
Epanechnikov, V. A. 1969, 'Nonparametric estimates of multivariate probability density', Theory of Probability and Applications, vol. 14, pp. 153–8.
Greene, W. 2003, Econometric Analysis, 5th edn, Prentice Hall, New Jersey.
Griffiths, W., Carter Hill, R. and Judge, G. 1993, Learning and Practicing Econometrics, John Wiley & Sons Ltd, New York.
Henry, G. T. 1995, Graphing Data: Techniques for Display and Analysis, Sage Publications, Thousand Oaks, California.
Kennedy, P. 2003, A Guide to Econometrics, 5th edn, Blackwell Publishing, United Kingdom.
Koop, G. 2000, Analysis of Economic Data, John Wiley & Sons Ltd, New York.
Leamer, E. 1994, Sturdy Econometrics, Edward Elgar Publishing Company, Great Britain.
Pindyck, R. and Rubinfeld, D. 1998, Econometric Models and Economic Forecasts, 4th edn, International edn, Irwin/McGraw Hill, Boston, Massachusetts.
Silverman, B. W. 1986, Density Estimation for Statistics and Data Analysis, Chapman & Hall, London.
Tufte, E. R. 1983, The Visual Display of Quantitative Information, Graphics Press, Cheshire, Connecticut.