To cite this article: Dennis A. Ahlburg (1995) Simple versus complex models: Evaluation,
accuracy, and combining, Mathematical Population Studies: An International Journal of
Mathematical Demography, 5:3, 281-290, DOI: 10.1080/08898489509525406
Mathematical Population Studies, 1995, Vol. 5(3), pp. 281-290
© 1995 OPA (Overseas Publishers Association) Amsterdam B.V. Published under license by Gordon and Breach Science Publishers SA.
Reprints available directly from the publisher. Photocopying permitted by license only. Printed in Malaysia.

SIMPLE VERSUS COMPLEX MODELS: EVALUATION, ACCURACY, AND COMBINING
DENNIS A. AHLBURG
Industrial Relations Center and Center for Population Analysis and Policy,
University of Minnesota
and
Program on Population, East-West Center, 1777 East-West Road,
Honolulu, HI 96848, USA
This paper argues that it is premature to decide whether simple forecasting models in demography are
more (or less) accurate than complex models and whether causal models are more (or less) accurate than
noncausal models. It is also too early to say under what conditions one type of model can outperform
another. The paper also questions the wisdom of searching for a single best model or approach. It
suggests that combining forecasts may improve accuracy.
assumed paths of fertility, mortality, and migration, then they are complex/causal
models, according to Smith. This seems to have been the case until fairly recently.
Now that the Bureau uses statistical time series analysis for the future age-specific
fertility rates (Long, this collection), the position is a little less clear. Economic-
demographic models, represented in this collection by Warren Sanderson's paper,
are integrated (or linked) models with both demographic and economic detail. They
are distinct from purely demographic models (trend, time series, cohort-component)
that have no causal economic inputs, and from economic models in which the
demographic component is exogenous. They are thus causal and (generally) complex. Structural
economic models with exogenous demographics are complex and causal in their
economic structure but not in their demographic structure.
What this typology means is that when addressing the question of whether a sim-
ple model is more accurate than a complex model, one should do so within a class
Carbone and Armstrong (1982): users are concerned about the cost and time re-
quired to make the forecast, the ease of use and implementation, and the ease
of interpretation. While these attributes are certainly desirable, they should not be
overemphasized. As pointed out by Newbold and Bos (1994: 523), it is important
that the forecasts be generated through some intellectually plausible mechanism and
it is legitimate to remain skeptical when apparently good forecasts are generated by
an implausible mechanism (such as Andrei Rogers's pundit monkeys). However,
there may be a tradeoff between accuracy and these criteria. I think "assumption
drag" (see John Long's paper) can result from an undue focus on face validity, legitimacy,
and transparency and has contributed to the inaccuracy of the US Bureau
of the Census forecasts and its missing turning points in demographic series. A bal-
ance needs to be struck between the cost of generating and using a forecast and
the benefits of more accurate forecasts. As Newbold and Bos (1994: 523) note: "one
can derive benefit from driving an automobile without fully understanding exactly
how it works, though some general understanding is certainly useful." If we adopt
the use of a wider set of criteria in evaluating a population forecast, then we are
concerned with whether the marginal benefits of a change in methodology in terms
of increased accuracy outweigh the increased marginal cost from any decrease in
face validity, internal consistency, parsimony, or other criteria.
3 Avoiding statistical bias is also a desirable property. See Holden, Peel, and Thompson 1990.
example, the cost of errors is linear, then an absolute error cost function prevails
and an error measure such as mean absolute error (MAE) is appropriate. If the loss
function (cost of errors) is linear in percentages (rather than in absolute errors),
then MAPE is an appropriate error measure. If big errors are very costly, then a
quadratic loss function, that is, one that weights larger errors more heavily, should
be used. Mean square error (MSE) is one candidate; another is root mean square
error (RMSE), but, despite being the most widely used error measure, RMSE is
inappropriate for comparisons across models and time horizons (Chatfield, 1988;
Fildes and Makridakis, 1988; Armstrong and Collopy, 1992; Fildes, 1992) because
it is highly unreliable. For example, Chatfield (1988) showed that a comparison of
methods over 1001 series was dominated by the RMSE of about three series. Note
that one of the most famous results in population forecasting, that of Keyfitz (1981)
on the size of forecast errors from cohort-component models, is based on the use of RMSE across forecasts and
forecast horizons.4
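To keep the correspondence between loss functions and error measures concrete, the measures just discussed can be sketched as follows (my own illustration; the series and forecast are invented, only the definitions come from the text):

```python
# Illustrative sketch: error measures matching different loss functions.
import math

def mae(actual, forecast):
    # Mean absolute error: matches a loss that is linear in absolute errors.
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    # Mean absolute percentage error: matches a loss linear in percentages.
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    # Root mean square error: quadratic loss, weighting big errors heavily.
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

# Hypothetical population series (millions) and a forecast of it.
actual = [100.0, 103.0, 107.0, 112.0]
forecast = [101.0, 102.0, 109.0, 108.0]

print(mae(actual, forecast))   # average miss in absolute units: 2.0
print(mape(actual, forecast))  # average miss in percent
print(rmse(actual, forecast))  # larger than MAE here: the 4-unit miss dominates
```

Because the single 4-unit error is squared, RMSE exceeds MAE on these data, which is exactly the sense in which a quadratic loss "weights larger errors more heavily."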
As Granger (1980) and Newbold and Bos (1994) note, it is difficult if not im-
possible to get users to specify a loss function. For the Bureau of the Census with
thousands, if not millions of users, it is impossible. Consequently most forecast-
ers who consciously choose a loss function assume a quadratic loss function. Such
an assumption also implies that costs are symmetric which may not be the case.
Asymmetric loss functions can be specified but require special treatment in fore-
cast evaluation and also some modification of the usual forecasting methodologies
(Newbold and Bos 1994: 525).
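To make the symmetry point concrete, here is a small sketch contrasting quadratic loss with one standard asymmetric form, the linex loss; the paper does not name a specific asymmetric function, so the choice of linex is mine:

```python
# Illustrative sketch: quadratic loss treats over- and under-forecasts
# symmetrically; an asymmetric (linex) loss does not.
# Convention here: error = forecast - actual.
import math

def quadratic_loss(error):
    return error ** 2

def linex_loss(error, a=1.0):
    # Linex loss: exp(a*e) - a*e - 1, minimized at e = 0.
    # For a > 0, over-forecasts are penalized more heavily than
    # under-forecasts of the same size.
    return math.exp(a * error) - a * error - 1.0

print(quadratic_loss(2.0), quadratic_loss(-2.0))  # equal costs
print(linex_loss(2.0), linex_loss(-2.0))          # over-forecast costs more
```

A forecaster minimizing such a loss will deliberately shade forecasts downward, which is why asymmetric losses require "some modification of the usual forecasting methodologies."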
There are cases though where an exclusive focus on loss functions is not war-
ranted. The most common case is that where turning points are of particular im-
portance. Here a method that correctly predicts the turning point is preferable to
a method that has a lower average error across the whole forecast period. Turning
points are important in demography but, at least at the national level, occur in-
frequently. However, because of assumption drag, such missed turning points also
influence the projections made after the turning points. Thus it is not clear how
much weight should be given to the ability to predict turning points at the cost of
inferior average accuracy. Andrei Rogers calls for some measure of "degree of diffi-
culty" by which to weight performance. This is a good idea. An alternative involving
the use of relative accuracy measures is discussed below.
Another case is where the outcome is not independent of the forecast. Here a
forecast may be ex-ante accurate but ex-post inaccurate because the ex-ante forecast
leads to action that changes the outcome. This is less likely to be the case in national
and state forecasts but may be relevant in certain small area forecasts.
4 Fildes and Makridakis (1988: 549) show that mean square error loss is sensible where a homogeneous
set of series is being evaluated and where the squared forecast errors have associated with them a
distribution of costs, specifically, constant unit costs independent of the variability of the series.
RMSE performs poorly on two other criteria for choosing error measures: reliability and validity. MAPE
scores reasonably well on these. Reliability is concerned with the extent to which an error measure
produces the same accuracy rankings for a set of methods when it is applied to different samples from a set
of time series. Construct validity is concerned with whether measures measure what they should be measuring,
that is, accuracy. Rankings of methods were found to differ by accuracy measure used (Armstrong and
Collopy, 1992: 73).
Finally, an accuracy measure may be consistent with the user's loss function but
may have properties that make it undesirable to use. This is illustrated in some re-
cent work on the choice of a measure of forecast accuracy when comparing models
across series. This is the typical procedure in sub-national forecast comparisons,
as well as in Keyfitz's work using population forecasts from different countries and
across forecast horizons. A key problem that arises is that accuracy measures should
be unit-free measures, otherwise series with large numbers might dominate compar-
isons. RMSE, often used in demography, is not unit-free. MAPE, another common
measure, is unit-free. Sometimes comparisons are made over series with different
amounts of change (discussed at length by Rogers). Using a relative measure that
compares the forecast from the model against those of another model is appropri-
ate. Such a measure is the Relative Absolute Error (RAE) which divides the abso-
lute forecast error by the corresponding error for the random walk. To summarize
across series the geometric mean of the RAE (the GMRAE) is taken.
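The relative measures just described can be sketched as follows; the three series and the random-walk benchmark are invented for illustration, and only the arithmetic follows the text:

```python
# Illustrative sketch: Relative Absolute Error (RAE) against a random-walk
# benchmark, summarized by its geometric mean (GMRAE) or median (MdRAE).
import math
import statistics

def rae(actual, forecast, last_observed):
    # The random walk forecasts "no change", so its absolute error is
    # |actual - last observed value|.
    return abs(actual - forecast) / abs(actual - last_observed)

# Three hypothetical series: (actual, model forecast, last observed value).
cases = [(110.0, 108.0, 100.0), (55.0, 54.0, 50.0), (22.0, 23.5, 20.0)]
raes = [rae(a, f, base) for a, f, base in cases]

gmrae = math.exp(sum(math.log(r) for r in raes) / len(raes))
mdrae = statistics.median(raes)

print(raes)    # per-series relative errors: below 1 beats the random walk
print(gmrae)   # geometric-mean summary across series
print(mdrae)   # median summary, robust to outlying series
```

Because each RAE is a ratio, the summary is unit-free, which is the property the text argues RMSE lacks for cross-series comparisons.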
If outliers are a concern they can be guarded against by the use of medians (Arm-
strong and Collopy 1992). To select among forecasting methods, they suggest using
the median RAE (MdRAE) when using a small number of time series and the
median absolute percentage error (MdAPE) when comparing across many series.
Fildes (1992) recommends the use of relative geometric RMSE because it has de-
sirable statistical properties and a simple interpretation as measuring the average
error of one method compared to another. Theil's U is another attractive measure.
The gist of these papers is to use an error measure with desirable properties and
if there is a clear loss function underlying the application, use an error measure
consistent with it. If the loss function is not known, the forecaster may want to
consider a number of error measures within the appropriate set discussed by Arm-
strong and Collopy and Fildes. Unfortunately, the error measures generally used in
demography are not those that have been shown to have desirable properties nor is
the match between loss function and error measure widespread.
5 Bodkin et al. (1991: 531) also recommend combining forecasts for high-frequency intervals (e.g. monthly)
from time-series models with forecasts of lower frequency (quarterly) from structural macroeconometric
models.
has observed "[forecasters] usually cannot anticipate the likely occasions that are in
prospect." Consider the following information from economic forecasting: the rank
correlation between the 1971 and 1972 accuracy of forecasts from 12 US econometric
models that varied substantially in complexity was -0.3
(McLaughlin, 1973). The average rank correlations for the relative accuracy of short-range
forecasts for five British macromodels on seven variables were even smaller
(Armstrong, 1985: 22). Wolf (1987) ranked 15 leading US macroeconomic forecast-
ers on the accuracy of four variables (real GNP growth, unemployment, the three-
month Treasury Bill rate, and inflation) for 1983 through 1986. The rank correlation
of accuracy over the three years was 0.168 and not significant. That is, a model's rel-
ative accuracy in one period is not a good guide to its accuracy in a future period.6
In a review of demographic comparisons McMillen and Long (1987) found that
time series models dominate economic-demographic models for short horizons but
not for longer horizons. Research on the accuracy of small-area forecasts has failed
to find a method that is superior to others for places with different rates of growth.
It is doubtful, therefore, whether a forecasting strategy that searches for the best
single model is a good strategy for forecasting even if one model is thought likely to
outperform others.
A large volume of empirical evidence in the general forecasting literature, includ-
ing that from demographic series (but not forecast by demographers) suggests that
the combination of forecasts leads to smaller forecast errors in practice (Newbold
and Bos 1994:495; Holden, Peel, and Thompson 1990: Chapter 3; Armstrong 1985).
To the best of my knowledge there has been only one study carried out in demogra-
phy on combined forecasts. In a study of forecasting the population of census tracts,
Smith and Shahidullah (1993) found that a forecast based on the simple average of
the forecasts from all extrapolation techniques was about as accurate as the single
most accurate method (which was not known ex-ante), although it was not as accu-
rate as "a combination of forecasts using only those techniques found to perform
particularly well for each type of place". As Smith and Shahidullah are aware, such
weighting schemes should be tested by using data not used in the construction of the
weights. Ex-ante forecast comparisons are best but ex-post comparisons for other
areas or time periods can also be useful.
A number of considerations have been suggested to help in deciding whether to
combine forecasts. First, combining forecasts is most likely to yield gains in accu-
racy when the forecasts combined are from different methodologies because they
are capturing different information sets or different specifications (Holden, Peel,
and Thompson 1990: 42, 96-97). Second, the series to be included can be chosen
on the basis of past ex-ante forecast errors. These should be unbiased, random,
and small. Experience in other fields suggests that the total number of forecasts
to be included is likely to be small, most likely between three and six. Third, even
though it can be shown theoretically that an optimal weighting scheme exists, such
an optimal weighting scheme changes over time if the covariances of the forecast
errors vary over time. Consequently, an empirically based scheme may be useful.
Fortunately, research on the combination of forecasts shows that a simple weighted
6 I am setting aside here the problems mentioned above with using certain measures of accuracy across
series and forecast horizons.
average of forecasts often works well relative to more complex combinations (see
Clemen, 1989, for an exhaustive survey).7
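The combining schemes discussed above can be sketched minimally as follows. The two "methods", their past errors, and the current forecasts are all invented; the inverse-squared-error weighting is one empirically based scheme of the kind the text mentions:

```python
# Illustrative sketch: combining two forecasts by a simple average and by
# weights based on the inverse of each method's past sum of squared errors.

def simple_average(forecasts):
    # Equal-weight combination: often hard to beat in practice.
    return sum(forecasts) / len(forecasts)

def inverse_sse_weights(past_errors_by_method):
    # Weight each method by 1 / (sum of its past squared errors), normalized.
    inv = [1.0 / sum(e * e for e in errs) for errs in past_errors_by_method]
    total = sum(inv)
    return [w / total for w in inv]

# Two hypothetical methods' past forecast errors and current forecasts.
past_errors = [[1.0, -1.0, 2.0], [3.0, -2.0, 4.0]]   # method A, method B
current = [105.0, 111.0]

weights = inverse_sse_weights(past_errors)
combined = sum(w * f for w, f in zip(weights, current))
print(simple_average(current))  # equal-weight combination: 108.0
print(weights)                  # method A, with smaller past errors, gets more weight
print(combined)                 # weighted combination, pulled toward method A
```

If the error covariances shift over time, these weights shift with the estimation window, which is why the text notes that an "optimal" scheme is a moving target.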
Most of the "rules of thumb" we have on combining forecasts have come from
a narrow set of extrapolative techniques. Thus, it is possible that new approaches
to combining may yield more efficient strategies for combining forecasts (see Arm-
strong and Collopy, 1993; Diebold, 1989; and Holden, Peel, and Thompson 1990).
The Armstrong-Collopy approach, which uses causal knowledge about the series
to be forecast to determine weights for combining extrapolative forecasts, shows
promise for application in demography, especially in sub-state forecasts where ex-
trapolative models are common. For example, Isserman (1977) and Smith (1987)
attempt to reduce forecast error by using domain knowledge in a structured way,
although not as formally and completely as Armstrong and Collopy. There is no
reason to believe that the approach could not be used to combine these extrapola-
tive models with more complex noncausal models or with causal models.
CONCLUSION
In my view, it is too early to say whether simple models are in general more accurate
than complex models or whether causal models are more accurate than noncausal
models. Research to date in demography has not established the clear superiority
of one type of approach over another. Furthermore, the papers by Rogers and by
McNown, Rogers, and Little show that some of the accepted results of population
forecasting have nothing to do with the accuracy of complex models relative to that
of simple models. The contradictory findings of Stoto and Keyfitz on simple models
versus complex models probably reflects the different lengths of the base periods
they used to construct the simple growth rate extrapolation (Rogers). McNown,
Rogers, and Little show that a comparison of the accuracy of simple and complex
extrapolation models (time series models) would most likely be determined by the
choice of the sample period for computing the historical rates of change of fertil-
ity. Forecasts based on information from the past five years would show persistent
increases in fertility, whereas those based on the last thirty years show dramatic
declines in fertility. They show that both the point estimates and the confidence
intervals are affected. In addition, most of the comparisons we have in demogra-
phy are based on fit to the historical data or ex-post errors. It has been shown that
ex-ante not ex-post comparisons are the appropriate basis for comparing forecast
accuracy (Pant and Starbuck 1990, and Armstrong 1985: 241-242).
I think that it is also too early to say under what conditions simple models can
outperform complex models. While it is quite possible that different models may be
more accurate for different periods of change, forecast horizons, and for different
types of series, there have been too few careful comparisons for any general conclu-
sions to be drawn. I disagree with Andrei Rogers's conclusion that "complex mod-
7 Newbold and Bos (1994: 510-511) found that weights based on the inverse of the sum of squared forecast
errors perform well compared with regression-based weights. Regression-based weights, such as
those mentioned by Smith and Shahidullah (1993: 13-14), perform well when a small number of the
forecasts are clearly inferior. However, if this is the case it may be best to exclude these forecasts.
els have outperformed simple models in times with relatively stable demographic
trends, when the degree of difficulty has been relatively low, and have been outper-
formed by simple models in times of significant unexpected shifts in such trends,
when the degree of difficulty has been relatively high." The RMSEs provided by
John Long (Table 1) do not support these conclusions. If we take the period of
the late-1950s and early-1960s and the period 1975-1985 as being relatively stable,8
then the simple constant growth model outperforms the cohort-component model
in the latter period. While it is true that the cohort-component model outperforms
the simple model in the earlier period for five-year-ahead forecasts, it does not do
so for both 20-year forecasts. If we were to include the 1955 forecast, the picture
is even murkier. Rogers also concludes that "simple models have outperformed
complex models at major turning points in U.S. demographic trends." Again Long's
error statistics do not support this conclusion. Assuming the major turning points to
be those in fertility in 1957 and 1976, the cohort component model outperforms the
simple model just after the 1957 peak (for the five year but not the 20 year forecast)
but not after the 1976 trough in fertility. It is of course possible that Rogers's con-
clusions may hold under different simple models. One area where we may have a
"general rule" is in forecasting age-specific populations. Here it does seem that the
cohort component model's use of information on the age structure helps as claimed
by McNown, Rogers, and Little (this collection). But note that the results in Long's
Table 2 show that this is not universally true: although all of the Census Bureau's
cohort component projections are more accurate than a simple constant growth
model for the population aged 15-19, in two of the seven forecasts for the popula-
tion 60-64 years, the simple model outperforms the cohort component model.
It is not clear that our search for the single best model or approach makes sense
because, as I have argued above and is illustrated in the papers here: 1) the fore-
caster rarely has enough information to assert with any great conviction that a par-
ticular model is superior to all others, and 2) even when a particular forecast ap-
pears to be superior, it does not necessarily follow that the other forecasts contain
no useful information.9 The extensive literature on combining forecasts suggests that
forecast accuracy can be improved, often greatly, by combining the forecasts from
different models. The one study in demography that combines forecasts finds this to
be the case.
Where, then, do we need further research? I think the following areas would
repay further work:
1. I agree with Andrei Rogers that we need forecasting competitions of alter-
native models. These have been very helpful in economic forecasting and in busi-
8 These periods were identified by Andrei Rogers (personal communication). They accord well with periods
of relatively lower decade change in the growth rate of population and with lower annual net change
in population per thousand.
9 The usefulness of the other forecasts can be established by regressing the actual value of the forecast
variable on a constant and the predicted values of the variable from the different models. If a model's
forecast contains all the information in another model's forecast and some additional information, then its
forecast should be significant in this regression, and the other models' forecasts should not. If both forecasts
contain independent information, then both should be significant. If neither contains useful information, then
neither should be significant. See Fair and Shiller (1990) and Fair (1993). One assumes that "usefulness"
established in this fashion is robust over time.
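As a sketch of this encompassing regression, one can simulate a case where the first forecast is informative and the second is pure noise (my simulation, not an example from Fair and Shiller; NumPy is assumed available):

```python
# Illustrative sketch of the encompassing regression described above:
# regress the actual values on a constant and two competing forecasts.
# Data are simulated; forecast f1 tracks the truth, f2 is noise.
import numpy as np

rng = np.random.default_rng(0)
n = 200
truth = np.cumsum(rng.normal(1.0, 1.0, n))   # a drifting target series
f1 = truth + rng.normal(0.0, 0.5, n)         # informative forecast
f2 = rng.normal(0.0, 5.0, n)                 # uninformative forecast

X = np.column_stack([np.ones(n), f1, f2])    # constant + both forecasts
coef, *_ = np.linalg.lstsq(X, truth, rcond=None)
print(coef)  # weight on f1 should be near 1, on f2 near 0
```

In a real application one would also examine significance, not just point estimates, but the pattern of coefficients already shows which forecast carries the information.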
and the potential replacement of mortality experts by the time series approaches
of Lee and Carter and Rogers and McNown. Recent advances in forecasting have
suggested a procedure for structuring the domain knowledge of experts to enhance
forecast accuracy. The conditions under which this approach is useful seem to fit
nicely with the traditions of the cohort component method. They are a) the fore-
caster has expert knowledge, b) the trend of the target series is affected by more
than one important causal force, c) the series can be decomposed such that the
separate causal forces can be specified for at least one of the components, and d)
it is expected that the components can be forecast more accurately than the target
values (Collopy and Armstrong 1994).
4. Uncertainty. I think the papers in this collection have made a great contri-
bution to our understanding of uncertainty in population forecasts. One thing that
needs to be done is to educate the users of population forecasts on measures of
uncertainty beyond "high" and "low" scenarios. In particular, we need to consider
the implications of Keyfitz's (1981) observation, re-emphasized in the paper by Mc-
Nown, Rogers, and Little, that demographic forecasts beyond 20 to 30 years ahead
convey little information.
REFERENCES
Ahlburg, D. A. (1982) How accurate are the U.S. Bureau of the Census' projections of total live births?
Journal of Forecasting 1: 365-374.
Ahlburg, D. A. (1987) Population forecasting. In S. Makridakis and S. Wheelwright (eds.), The Handbook
of Forecasting: A Manager's Guide, second edition, 135-149. New York: Wiley.
Ahlburg, D. A. (1990) A Comparison of the Ex-ante Forecasts of U.S. Births From an Economic-Demo-
graphic Model and the Bureau of the Census, paper presented at the Annual Meeting of the Population
Association of America, Toronto, Ontario.
Ahlburg, D. A., McPherson, M., and Schapiro, M. O. (1993) Incorporating enrollment forecasts into
projections of Pell Program Costs: A study of feasibility and effectiveness. Report for the Department
of Education (Washington, DC), September.
Armstrong, J. S. (1985) Long-Range Forecasting, second edition. New York: Wiley.
Armstrong, J. S., and Collopy, F. (1992) Error measures for generalizing about forecasting methods:
Empirical comparisons. International Journal of Forecasting 8: 69-80.
Armstrong, J. S., and Collopy, F. (1993) Causal forces: Structuring knowledge for time-series extrapola-
tion. Journal of Forecasting 12: 103-115.
Beaumont, P., and Isserman, A. (1987) Comment. Journal of the American Statistical Association 82:
1004-1009.
Bodkin, R., Klein, L. R., and Marwah, K. (1991) A History of Macroeconometric Modelbuilding. Brook-
field, VT: E. Elgar.
Carbone, R., and Armstrong, J. S. (1982) Evaluation of extrapolative forecasting methods: Results of a
survey of academicians and practitioners. Journal of Forecasting 1: 215-217.
Chatfield, C. (1988) Apples, oranges, and mean square error. International Journal of Forecasting 4: 515-
518.
Clemen, R. T. (1989) Combining forecasts: A review and annotated bibliography. International Journal
of Forecasting 5: 559-584.
Collopy, F., and Armstrong, J. S. (1992) Rule-based forecasting: Development and validation of an expert
systems approach to combining time series extrapolations. Management Science 38.
Collopy, F., and Armstrong, J. S. (1994) Decomposition of time series by causal forces: A decision
process for structuring forecasting problems. Working paper.
Diebold, F. X. (1989) Forecast combining and encompassing: Reconciling two divergent literatures. Inter-
national Journal of Forecasting 5: 589-592.
Fair, R. C. (1993) Testing macroeconometric models. American Economic Review 83: 287-293.
Fair, R. C., and Shiller, R. J. (1990) Comparing information in forecasts from econometric models.