
International Journal of Forecasting 12 (1996) 91-118

Judgemental and statistical time series forecasting: a review of the literature

Richard Webby*, Marcus O'Connor


School of Information Systems, University of New South Wales, Kensington, N.S.W. 2052, Australia
* Corresponding author.

Abstract

This paper reviews the literature on the contributions of judgemental methods to the forecasting process. Using a contingent approach, it first reviews the empirical studies comparing the performance of judgemental and statistical methods and finds support for the importance of judgement in providing contextual information for the final forecasts. It then examines four methods of integrating contextual information with the output of statistical models. Although judgemental adjustment of statistical forecasts is a viable alternative, simple combination of forecasts may offer superior benefits. Promising gains may also come from the use of decomposition principles in the integration process.

Keywords: Time series forecasting; Judgemental forecasting; Statistical forecasting; Forecast combination; Judgemental
adjustment; Judgemental decomposition

1. Introduction

In the past decade, opinion has been divided as to the role of judgement in time series forecasting. Some (e.g. Makridakis, 1988) suggested that judgement cannot be trusted. Others (e.g. Dalrymple, 1987) provided evidence that judgement was strongly preferred as the most important method of practical sales forecasting. In addition, studies of judgemental forecast accuracy have shown the credibility of simple eyeballing (e.g. Lawrence et al., 1985) and the need for judgement in situations involving contextual information (e.g. Edmundson et al., 1988).

This paper seeks to focus the debate by reviewing the empirical evidence regarding the role of judgement in time series forecasting. We examine studies that compared judgemental and statistical forecast accuracy, as well as studies where judgement was integrated with statistical or structured forecasting methods. To do this in a unified manner, we develop a contingent framework within which studies can be compared and contrasted along a number of task, human and environmental dimensions.

Section 2 presents the studies that have compared judgemental and statistical forecast accuracy. We begin by establishing the practical validity of this comparison (i.e. we ascertain that both judgement and statistical methods are used in practice) and then proceed with a factor-by-factor review of the literature along the contingent dimensions referred to above.


Section 3 then extends the results from Section 2 by examining statistics and judgement as complementary approaches rather than competing approaches. We suggest that statistical (objective) and judgemental (subjective) methods could perhaps be synthesised to gain the dual benefits of objective mechanical precision and subjective human interpretative abilities. Four main approaches to integrating subjective and objective forecasting are classified: model specification, forecast combination, judgemental adjustment and judgemental decomposition. The literature is reviewed within those four categories and the pros and cons of the approaches are debated with particular reference to their accuracy, efficiency, practicality and validity.

2. Subjective versus objective forecasting methods

This section begins by briefly discussing the survey research documenting the practical use of subjective forecasting methods compared to objective forecasting methods. The aim of this exercise is to confirm that both subjective and objective methods are prevalent in practice, and hence establish the validity of their empirical comparison. The section proceeds by using a contingent methodology to assess the accuracy of subjective versus objective time series forecasting in the empirical literature. This evaluation attempts to determine whether the effectiveness of subjective versus objective methods is dependent upon task, human and environmental differences (such as the availability or otherwise of contextual information).

2.1. Practical use of subjective versus objective methods

Many surveys have examined the relative use of subjective and objective methods in practice (Cerullo and Avila, 1975; Lawrence, 1983; Mentzer and Cox, 1984; Sparkes and McHugh, 1984; Dalrymple, 1987; Taranto, 1989; Batchelor and Dua, 1990; Sanders and Manrodt, 1994). The evidence documented in those surveys indicates a remarkably consistent finding: subjective techniques generally represent about 40-50% of the total techniques used in time series forecasting. When comparing early and recent surveys, there appears to have been little diminution in the role of judgement, contrary to what one might expect given the emergence of desk-top computing and forecasting software. In considering this evidence, it is important to note that the majority of the respondents in those surveys were either forecasting experts or came from large organisations. Given that subjective methods are used more frequently in small firms (Dalrymple, 1987; Taranto, 1989), it seems the general prevalence of judgement in forecasting may even be somewhat greater than indicated by the above surveys.

It seems therefore that subjective and objective methods are both commonly used in practice, and we can proceed with a comparison of the empirical accuracy of subjective and objective forecasting methods, knowing that it is practically relevant as well as academically interesting.

2.2. Contingent analysis methodology

The evaluation of the empirical studies comparing the accuracy of subjective and objective methods of time series forecasting uses a contingent framework. The strategy adopted is to attempt to provide a comprehensive multi-factor review of the literature, rather than focus on a particular factor in isolation, since comparisons made across studies cannot control for multiple factors. Although the provision of contextual information is considered to be an important factor, it would be deficient to analyse its effects across a variety of empirical studies without considering the effects of other mitigating factors.

The factors considered are as follows. First, the effects of time series characteristics and presentation on the relative accuracy of the human judge are examined. Characteristics of the time series include the periodicity of the data, trend, seasonality, noise, instability, number of historicals and number of forecasts required. Aspects of the presentation of the time series include whether the series were presented in graphical or tabular form and the nature of the feedback provided to the judges. Second, the effects of subject/environment characteristics are considered. The situational characteristics of the judges that are deemed important to the task include the experience of the forecasters, the extent to which the judges made the forecasts with an understanding of the context or nature of the series and an assessment of the motivation of the judges in the task.

The following sections discuss the empirical evidence for each of those contingent factors. To supplement and support this discussion, a set of reference tables is included in an Appendix to the paper, which provides further detail about the studies. Details of the empirical studies discussed in the following sections can be found in the reference tables.

2.3. Empirical evidence

2.3.1. Series/task characteristics

2.3.1.1. Trend

In extrapolating trends, the human judge may revert to the mean, continue or enhance the trend, or apply an intermediate strategy and dampen the trend. The tendency to continue a trend may be enhanced by its length and consistency (Andreassen and Kraus, 1990) and by the presence of confirming news (De Bondt, 1990). Empirical work on line fitting by Mosteller et al. (1981) indicates that judgemental fitting is highly accurate with both upward and downward series, compared with linear regression. In that study, subjects may have tended to overstate slopes slightly, but this was not significant, and noise had no influence on trend identification. Judgemental modelling of trend in real series is also highly accurate (Edmundson, 1987; Winton and Edmundson, 1993), and it seems that judges may dampen these trends. Studies of the extrapolation of exponential growth (Wagenaar and Sagaria, 1975; Timmers and Wagenaar, 1977) have shown that people greatly underestimate exponential processes (although note the criticism of their normative model by Jones, 1984), but exponential growth is rare in economic series because many are affected by multiple causal forces (see Collopy and Armstrong, 1993).

Thus, based on the related literature, judgement can be expected to be reasonably accurate with trending series in comparison to statistical methods. However, the outcome of the comparison will be dependent on the type of statistical technique used to model trend: some techniques are highly accurate, others are likely to be particularly weak in handling trend.

Studies of subjective versus objective time series forecasting, as opposed to line fitting, show that there is a distinct tendency for subjects to dampen upward and downward trends (Eggleton, 1982; Lawrence and Makridakis, 1989; Sanders, 1992). Lawrence and Makridakis (1989) found that forecasts of downward series were dampened significantly more than forecasts of upward series. O'Connor (1993) also observed that the forecasting accuracy of downward sloping series was significantly inferior to upward sloping series. Sanders (1992) showed that judgemental forecasts of upward sloping series were relatively less accurate than forecasts of flat series, although statistical forecasts of trending series were also worse.

Thus, there may be a weak association between the presence of trend and comparative judgemental inaccuracy, but all the above studies were conducted using artificial or manipulated data, and may not apply in real-life situations, which often require trends to be dampened (Gardner and McKenzie, 1985). The assumption of constancy made by researchers may not be valid, and the supposed 'bias' of trend dampening may actually be appropriate behaviour.

2.3.1.2. Seasonality

A substantial body of research suggests that humans possess good pattern recognition skills, in fields such as music, object perception and speech recognition (e.g. Simon and Sumner, 1968). A time series pattern is arguably much less complex than the patterns in those tasks, but nevertheless Edmundson (1987) found that humans could perform as well as automatic mechanisms in seasonal identification.

The empirical evidence suggests that, in studies involving artificial series, judgemental performance deteriorates with high seasonality (Adam and Ebert, 1976; Sanders, 1992) and high signal (Eggleton, 1982; Lawrence and O'Connor, 1992). However, the accuracy of the statistical techniques also declines, and it is difficult to tell from these studies whether or not judgement is comparatively worse when the series contain high seasonality.

Lawrence et al. (1985) showed that, with real time series, judgemental forecasts of seasonal series were more accurate than judgemental forecasts of non-seasonal series. However, there may have been differences other than seasonality involved, e.g. trend, noise and instability. The study compared judgemental forecasts with statistical forecasts, finding that the best statistical method was generally more accurate than judgement, except in the case of tabular seasonal series forecasted by the authors.

Edmundson (1987) conducted a discriminant analysis in order to determine the time series characteristics that allow us to differentiate between judgemental and statistical forecasts in monthly series. He found that a purely seasonal metric was not statistically important, but a 'signal to noise' metric was significant in discriminating between judgemental and deseasonalised single exponential smoothing (DSE) forecasts over short (1-6 month) horizons. Edmundson concluded that judgement had relatively greater difficulty with a strong stable signal, counter to expectation. However, the 'signal to noise' metric was not a good discriminant between judgemental forecasts and Box-Jenkins forecasts. This implies that this metric may be associated more with DSE accuracy than judgemental inaccuracy. In other words, it may simply identify series where DSE performance differs, and hence those series where there is room for judgement to be better.

Thus, the presence of seasonality is associated with higher forecast error for judgemental and statistical methods. It is relatively difficult to determine whether, with different levels of seasonality, subjective methods are comparatively better than objective methods.

2.3.1.3. Noise

Noise refers to the random or white noise in a series, as distinct from instability, which is related to the presence of discontinuities. The findings from studies in related fields suggest that high random noise leads to comparatively poor human performance (Beach and Peterson, 1967; Laestadius, 1970).

The evidence in the time series forecasting literature tends to confirm that random noise has a decremental effect on the forecast accuracy of judgemental techniques (Adam and Ebert, 1976; Eggleton, 1982; Sanders, 1992; O'Connor et al., 1993). However, statistical methods also fare poorly. Determining which method is best for series with different noise levels is again not straightforward. Lawrence et al. (1985) showed that students with graphs were more accurate than statistical methods for 'macro' series, which would be expected to have relatively low noise, but not for 'micro' series, which would probably have high noise. However, the micro series may have also exhibited higher instability (see next section). The forecasts of the researchers themselves in that study were more accurate with both types of series when they were presented in graphical form. Perhaps one reason for the problem that novices had with noisy series was that they may have read too much signal into the random perturbations (O'Connor et al., 1993).

Thus, it is unclear whether subjective methods perform better or worse than objective methods when random noise is high.

2.3.1.4. Instability

Instability refers to the presence of discontinuities or temporal disturbances in the series. The literature on 'bootstrapping' suggests that models outperform man, provided the data are relatively stable (Kleinmuntz, 1990). However, when 'broken-leg cues'¹ are available (which are not able to be incorporated into the models), human judgement is most effective. This implies that judges will outperform models when they have contextual information to help them comprehend discontinuities in series. In the absence of these explanatory cues, human pattern recognition skills may still allow the judge to notice discontinuities in the series, and hence discount them when making the forecast.

¹ A broken-leg cue refers to an unusual important piece of information whose presence would dramatically alter judgement compared to a model of that judgement (Kleinmuntz, 1990).

In forecasting artificial series containing discontinuities, judges generally perform worse than statistical methods (Sanders, 1992; O'Connor et al., 1993). Only in one case was judgement superior: when, in Sanders (1992), subjects forecast a low-noise series that contained an immediate level change ('step') mid-way through the series history. However, Sanders did not test judgement against the adaptive response rate (ARR), which was found to be consistently better than single exponential smoothing (SES), and judgement, in O'Connor et al. (1993).
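To make the smoothing methods referred to above concrete, the sketch below implements simple exponential smoothing and a Trigg-and-Leach-style adaptive response rate variant, in which the smoothing weight rises when the tracking signal indicates a sustained error (for example after a step change). It is an illustrative reconstruction only, not the exact implementations used in the studies cited; the smoothing constants and the simulated step series are arbitrary choices.

```python
import numpy as np

def ses(y, alpha=0.2):
    """One-step-ahead simple exponential smoothing forecasts."""
    f = np.empty(len(y))
    f[0] = y[0]                      # initialise with the first observation
    for t in range(1, len(y)):
        f[t] = f[t - 1] + alpha * (y[t - 1] - f[t - 1])
    return f

def arr(y, phi=0.1):
    """Adaptive response rate smoothing (Trigg-and-Leach style):
    alpha_t = |smoothed error / smoothed absolute error|."""
    f = np.empty(len(y))
    f[0] = y[0]
    e_s, m_s = 0.0, 1e-8             # smoothed error and smoothed |error|
    for t in range(1, len(y)):
        e = y[t - 1] - f[t - 1]      # last one-step-ahead error
        e_s = phi * e + (1 - phi) * e_s
        m_s = phi * abs(e) + (1 - phi) * m_s
        alpha_t = min(abs(e_s / m_s), 1.0)
        f[t] = f[t - 1] + alpha_t * (y[t - 1] - f[t - 1])
    return f

# A low-noise series with a step change mid-way, loosely mirroring the
# Sanders (1992) case described above (simulated, illustrative values).
rng = np.random.default_rng(0)
y = np.concatenate([100 + rng.normal(0, 2, 30), 130 + rng.normal(0, 2, 30)])
mape = lambda a, f: np.mean(np.abs((a - f) / a)) * 100
print(f"SES MAPE: {mape(y[1:], ses(y)[1:]):.1f}%  ARR MAPE: {mape(y[1:], arr(y)[1:]):.1f}%")
```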
One problem with both these studies is that the forecasts were made in the absence of contextual information. In a realistic setting, such information should help judges identify, explain and anticipate discontinuities. For example, Sanders and Ritzman (1992) determined that expert contextual judgement was most effective for the highly variable series: they found a significant interaction effect between contextual knowledge and data variability. Edmundson et al. (1988) also found that for forecasts for products with sporadic promotions in supermarkets (high instability), contextual knowledge was vitally important.

Thus, it appears that judgement generally does not deal well with instability, unless contextual information is available to explain discontinuities.

2.3.1.5. Number of historical data points

Human information processing theory (Schroder et al., 1967) suggests that there is an inverted bell-shaped relationship between information load and decision quality. The empirical work (e.g. Miller, 1956; Phelps and Shanteau, 1978) on information overload suggests that anywhere between three and ten pieces of information (cues) represent the limits of human information processing capacities. The source of this variation lies not only in task and subject differences, but also in the different definitions of what constitutes a 'cue' (see Wood, 1986, for a definition). For example, a time series may be considered to be a single 'configural' cue (Bruner et al., 1956), a set of cues indicative of its observable characteristics, or it may contain cues for every historical data point.

There is some indication that as little as 40 data points may lead to overload (Lawrence and O'Connor, 1992), although this result was interpreted more as a failure in the assumption of constancy and hence its implication is unclear. It has been suggested that complex statistical techniques such as Box-Jenkins may require as many as 50 observations in the modelling process (Box and Jenkins, 1970). Parsimonious techniques require significantly fewer observations² (e.g. the naive requires just one historical value), but it is possible that judgement (especially contextual judgement) would be better than even these parsimonious methods when there is little historical data (e.g. new product forecasting, Wind et al., 1981).

² See Makridakis et al. (1983, p. 556) for guidelines as to the minimum data requirements of various objective methods.

2.3.1.6. Length of forecast horizon

The empirical evidence suggests that short-term forecasts are generally more accurate than long-term forecasts, in line with expectations (Lawrence et al., 1985; Brown et al., 1987; Lawrence and Makridakis, 1989; Hopwood and McKeown, 1990). This is true for both subjective and objective methods, but again it seems unclear which method has the greater advantage for different horizon lengths.

However, recent research by Lawrence et al. (1994) suggests that judgemental forecast accuracy sometimes deteriorates with shorter forecast horizons, despite the availability of more time series history and improved contextual information. This contrasts with the findings in the earnings forecasting literature that analyst timing advantage (i.e. proximity to the earnings announcement) plays an important role in the superiority of subjective forecasts over objective methods (e.g. Brown et al., 1987).

Thus, the majority of the evidence indicates that judgement performs worse than objective methods over long forecast horizons.

2.3.1.7. Feedback

In general, feedback has been found to have a beneficial effect on task performance in most estimation tasks (O'Connor, 1989). The type of feedback also appears to have an important influence on task performance (Balzer et al., 1992). Outcome feedback may be less effective (Fischer, 1982) than task performance feedback (Benson and Önkal, 1992) in probability learning tasks (i.e. presenting an actual value may be less effective than presenting a summative error measure). In time series extrapolation, there has been relatively little work done on this issue. One study, by Mackinnon and Wearing (1991), suggested that simple outcome feedback may be just as effective. However, the deterministic nature of their series might have made outcome feedback appear more effective than it would be in a normal probabilistic forecasting task (Goodwin and Wright, 1993).

It is difficult to report on the effect of feedback on judgemental time series forecasting. None of the studies we found systematically varied the feedback dimension, and it was also difficult to determine the quality of feedback in these studies. There is a clear need for research on the effect of different types of feedback on judgemental forecasting performance.

2.3.1.8. Graphical versus tabular data presentation

There is mixed evidence in the related literature about the relative effectiveness of graphical versus tabular presentation. Some representative studies are mentioned here (see also DeSanctis, 1984). Benbasat and Dexter (1985) found no performance differences between graphical and tabular reports, while DeSanctis and Jarvenpaa (1989) concluded that the value of graphics only emerged after practice. Remus (1984, 1987) found that tables were superior to graphics in environments of low complexity, but graphics were superior with intermediate complexity (Remus, 1987). Dickson et al. (1986) also found a task-related effect. They concluded that "for a task activity that involves seeing time dependent patterns in a large amount of data, graphs are a good choice of format" (p. 46).

It is difficult to tell whether a graphical or tabular presentation mode is best in time series forecasting. Lawrence et al. (1985) found no significant difference between the accuracy of judgemental forecasts made with graphical versus tabular displays, although they suggested that forecasts made by the table method were more 'robust' (i.e. had lower standard deviation). There was also some weak support for the hypothesis that graphs were better with short horizons and tables were better in the longer term.

2.3.2. Judge/environmental characteristics

2.3.2.1. Experience

This factor involves an assessment of the experience possessed by the judges in the time series forecasting task. This is an attempt to codify the overall competence and knowledge of the human judge. There appear to be two major aspects to forecaster experience:
• Technical knowledge (Sanders and Ritzman, 1992), which is knowledge about data analysis and formal forecasting procedures, and
• Causal knowledge, which pertains to an understanding of the cause-effect relationships involved (i.e. the structure, as opposed to the content), and is typically gained from general industry forecasting experience (Edmundson et al., 1988). This is one aspect of 'non-time series information', which also includes 'product knowledge', which is explained under the heading "Context" in the next section.

The above two aspects are not analysed individually in this review because it is generally impossible to separate them in the studies (cf. Sanders and Ritzman, 1992).

Armstrong (1985) provides a review of the value of expertise (pp. 91-96). The evidence from a variety of fields (e.g. psychology, finance, medicine) generally indicates that expertise beyond a certain minimal level has little incremental value (cf. the value of the expert weather forecaster: Murphy and Brown, 1984). Perhaps this also holds true in time series forecasting: maybe total novices (e.g. undergraduates) will be inept at the task, but subjects with a limited amount of training (e.g. graduates) will be just as good as experts, e.g. managers (Remus, 1990).

The evidence from the time series forecasting literature is, unfortunately, rather difficult to interpret because of the confounding effects of different contextual and motivational levels in the studies. From the overview table, it is difficult to say whether experience is related to comparative forecast accuracy. At the intra-study level, it appears that experience (industry or technical) has no effect on forecast accuracy (Edmundson et al., 1988; Sanders and Ritzman, 1992). However, those two studies varied experience within a time-series-only task.

2.3.2.2. Context

The 'context' factor represents an assessment of the 'contextual information' available to the forecaster. Several terms have been used in the literature that are similar to the meaning intended here by 'contextual information', e.g. contextual knowledge (Sanders and Ritzman, 1992), product knowledge (Edmundson et al., 1988) and extra-model knowledge (Pankratz, 1989). The existing definitions of forecasting 'context' unfortunately do not distinguish clearly between contextual information and forecaster experience. Thus 'contextual information' is defined here as information, other than the time series and general experience, which helps in the explanation, interpretation and anticipation of time series behaviour. The exclusion of 'general experience' from this definition relates to the forecaster's knowledge of causal relationships among variables; this knowledge was defined as part of the experience dimension. Note that this definition of contextual information encompasses the labels used to describe series (cf. Goodwin and Wright's, 1993, definition), as labels provide some context within which to interpret time series behaviour. Whether or not this context is appropriate or valid is another issue.

The definition of contextual information used in this analysis can be related to similar notions in literatures other than time series forecasting:
• Broken-leg cues (Meehl, 1957). As mentioned, this term is used in the psychology and bootstrapping literatures (Kleinmuntz, 1990) to describe any dramatic changes or events which cannot be captured in a formal model. The broken-leg analogy was made to emphasise the type of unusual event that invalidates the output of a clinical model.
• Soft information (Mintzberg, 1975). In his studies of managerial work, Mintzberg observed that much of the information processed by managers is of an informal, verbal, qualitative or 'soft' nature. Soft information is often in the form of rumour and hearsay and is particularly cherished by managers over the 'hard' data contained in MIS reports. Managers even strive to encourage the flow of soft information, to the extent of leaving their door open to invite interruptions. Mintzberg's results have been replicated and confirmed by Kurke and Aldrich (1983).

According to the definition given for 'contextual information', not all of it will consist of 'soft' information, but an important portion of it will.

Thus, on the basis of the prior related literature, subjective methods are expected to perform significantly better than objective methods when contextual information is available.

Table 1, which examines when objective and subjective methods are best across different levels of contextual knowledge, provides strong support for the above expectation. It appears that subjective methods generally perform better in the presence of contextual information. However, as Appendix reference Table A1(a)-(e) indicates, it is also possible that the improved performance is related to other factors, such as experience, motivation or differences in group versus individual forecasting.

Edmundson et al. (1988), and later Sanders and Ritzman (1992), confirmed that contextual information is critical in sales forecasting, much more so than expertise. Edmundson et al. showed that forecasts made with product knowledge were significantly more accurate (around 10% in terms of MAPE) than the techniques of DSE and judgemental extrapolations made by both novices and experts. Sanders and Ritzman (1992) also concluded that "contextual knowledge is particularly important in making good judgemental forecasts, while technical knowledge has little value" (p. 39).

Table 1
Observations showing best method (objective versus subjective) by level of contextual information (None, Low, Moderate, High, Very High). [The cell counts are not legible in this reproduction.]

In conclusion, one possible advantage of contextual information is that it explains past discontinuities and anticipates future discontinuities in time series. Without it, forecasters generally do not deal competently with rapid level changes (Sanders, 1992; O'Connor et al., 1993). Both Edmundson et al. (1988) and Sanders and Ritzman (1992) found that the effectiveness of contextual information was contingent upon the instability in the series (although different motivational levels may again play a role).

2.3.2.3. Motivation

Motivation refers to the environmental characteristics that induce rigour in the application of a forecaster's selected strategy. Beach et al. (1986a) described four motivating characteristics of the forecasting environment:
• the extrinsic benefit of making an accurate forecast,
• whether the forecast will be revisable in the light of further information before accuracy is assessed,
• whether the forecast is perceived to be within the judge's area of competence, i.e. whether the judge's reputation is on the line, and
• the adequacy, reliability and understandability of the information: motivation will be higher with quality information, because there will be "no obvious excuse for inaccuracy".

There have been a number of empirical studies that generally support a link between incentives and performance. A recent study (Henry and Sniezek, 1993) related to forecasting performance found that judgements of future performance in an 'almanac questions' task were the most accurate when individuals perceived high levels of internal control and when monetary rewards were present.

High levels of motivation are expected to have a beneficial effect on human performance, particularly when contextual information is available. Unfortunately, in the empirical research we reviewed, it is practically impossible to separate motivation from contextual information: the situations where contextual information is available are also the situations where managers are making the forecasts as an important part of their job.

Edmundson et al. (1988) compared the forecasts made for key versus non-key products (managers would probably be more motivated for the important, key products), but again other factors interfere in the analysis, such as greater instability and perhaps more contextual information with the key products, from the increased number of supermarket promotions.

2.4. Summary

Table 2 summarises the findings made for each of the contingent factors. It shows:
• the expected relationships, based on a priori reasoning and the related literature, between high levels of each factor and judgemental accuracy in comparison to the accuracy of objective forecasting methods;
• the overall findings made for each relationship, and whether they were in agreement with the hypotheses; and
• the effect of the interactions between factors that were investigated in the studies.

Table 2
Summary of the comparative accuracy of subjective methods for high levels of each contingent factor. Columns: Category, Factor, Hypothesis, Finding, Agreement. The factors covered are the task factors (trend, seasonality, noise, instability, historicals, forecast horizon, feedback, graphical presentation), the subject factor (experience), the environment factors (contextual information, motivation) and the interactions (high context with high instability; high seasonality with low instability; tabular presentation with high experience). Entries are rated on a four-level scale (strong positive, weak positive, strong negative, weak negative). [The individual hypothesis and finding entries are not legible in this reproduction.]
ᵃ In the absence of contextual information.

In summary, therefore, contextual information appears to be the prime determinant of judgemental superiority over statistical models. When time series are unstable, contextual information is particularly advantageous, presumably because of the greater number of discontinuities that can be explained by human judgement. However, contextual information is by no means the only factor to consider, as the above table shows. Factors such as trend, instability, historicals, the length of the forecast horizon and high seasonality in the presence of low instability may all have detrimental effects on judgemental accuracy, but perhaps only in the absence of contextual information. There is a need for research to investigate the effects of different time series characteristics when contextual information is available.

The next section examines the role of judgement in conjunction with objective methods of forecasting, rather than in competition with such methods. As we have seen, judgemental approaches are useful in interpreting contextual information, yet lack the mechanical precision and rigour of statistical approaches. A composite approach may enable the advantages of judgemental interpretation of 'soft' contextual information to be coupled with the benefits of mechanical approaches, such as defensibility, objectivity and precision. In the next section, we explore the alternatives to integrating objective and subjective forecasting. We attempt to determine whether the effectiveness of the different approaches to integrating subjective and objective forecasting is dependent on contingent factors, such as contextual information. For example, judgemental adjustment may or may not be better than forecast combination when contextual information is high. We build upon the literature reviewed earlier in this paper by using the same contingent methodology to review the approaches to integrating subjective and objective forecasting.

3. Integrating objective and subjective forecasting

This section explores four approaches which may facilitate the interaction of judgement with structured forecasting methods: (a) model building, (b) forecast combination, (c) judgemental adjustment and (d) judgemental decomposition. The literature in each category is reviewed, and then the paper concludes with a debate of the pros and cons of the four approaches.

Fig. 1 presents a diagrammatic classification of the four approaches. It summarises how the subjective and objective processes may interact in each approach, and in so doing, provides a background for the discussion that follows. In the diagram, note that the input of soft 'contextual factors' has been separated from the hard quantitative information (represented by the time series Y_t and the causal variables X_1 to X_n).

Fig. 1. Integrating subjective and objective forecasting. [Diagram not reproduced: its panels depict (1) model building, (2) combination of objective and subjective forecasts, (3) judgemental adjustment of an objective forecast and (4) judgemental decomposition, showing how contextual factors, the time series and causal variables feed into each approach.]

Model building is an approach in which judgement is used to select variables, specify model structure and set parameters based on contextual factors. From this, an econometric or statistical model is constructed which uses a time series, and perhaps other causal variables, to produce the forecast (Y_t+k).

In the combination approach, an 'objective' forecast is combined with a 'subjective' forecast. The subjective forecast may have been produced from the time series only, or with the added information provided by contextual factors and other causal variables. The combination process, represented by the rounded box, may be either objective (e.g. simple averaging) or subjective (perhaps supplemented by further contextual knowledge).

Judgemental adjustment simply takes the objective forecast and adjusts it for any contextual factors to produce the forecast.

Judgemental decomposition is a three-step process; judgement and statistical methods may operate at any of these steps. First, the time series is decomposed for any historical data (soft or hard), then the decomposed time series is forecast, and the forecast is recomposed with any future-oriented information.

Note also that the four approaches are by no means mutually exclusive; they can and do interact. For example, model building is precursory to forecast combination and judgemental adjustment.

3.1. Model building

The formulation of a statistical forecast requires the input of many aspects of judgement (Bunn and Wright, 1991). At the very least, a human judge is required in the first instance to select a forecasting model (or a set of models for combination, or from which the most accurate can be selected). The purpose of this section is to identify the areas where judgement is important in model building. It does not seek to describe a step-by-step approach to model building, nor does it provide a treatise on different forms of objective models.

Bunn and Wright (1991) identified four areas where judgement may play a role in statistical model building: variable selection, model specification, parameter estimation and data analysis. These will be discussed with particular reference to the role of judgement in interpreting 'soft' contextual information.

Variable selection and data analysis. There are often many variables that may be causally related to the forecast series. Statistical diagnostics (data analysis) can help the forecaster find important causal variables, but, as Bunn and Wright state, "conventional wisdom is that variable selection should be essentially judgemental" (p. 509). The tasks of model specification and parameter estimation (described below) can be automated using techniques such as multiple regression or 'bootstrapping' (Dawes et al., 1989), but judgement is still required to identify the variables. Bootstrapping is successful when the relationships among variables are linear and when the variables are well-identified, but is vulnerable when there are unaccounted 'broken-leg' cues affecting the data (Kleinmuntz, 1990).

Recent research by Lawrence and O'Connor (1992, 1993) has shown that bootstrapping the time series forecaster was effective for stable artificial series, but not for real-world series.
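To illustrate what 'bootstrapping' the forecaster involves, the sketch below fits a linear model to a judge's past forecasts as a function of the cues the judge saw, and then uses that model in place of the judge. The cue set and the linear form are assumptions made for the example; they are not the designs used in the studies cited above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60

# Cues the judge is assumed to have seen for each past case (hypothetical):
# last observed value, recent trend, and a promotion flag.
cues = np.column_stack([
    rng.normal(100, 10, n),          # last value
    rng.normal(0, 3, n),             # recent trend
    rng.integers(0, 2, n),           # promotion planned (0/1)
])

# The judge's own past forecasts (here simulated as a noisy linear policy).
judge_forecast = 0.9 * cues[:, 0] + 2.0 * cues[:, 1] + 8.0 * cues[:, 2] + rng.normal(0, 4, n)

# 'Bootstrap' the judge: regress the judge's forecasts on the cues.
X = np.column_stack([np.ones(n), cues])
weights, *_ = np.linalg.lstsq(X, judge_forecast, rcond=None)

# The fitted model now applies the judge's implicit policy consistently,
# although it cannot react to broken-leg cues that lie outside the cue set.
new_case = np.array([1.0, 105.0, 1.5, 0.0])     # intercept, last value, trend, no promotion
print("bootstrapped forecast:", new_case @ weights)
```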
Model specification. When causal information is quantitative, it can be incorporated fairly easily into a formal model, either through the use of regression as indicated above, or through building an econometric model. Causal variables can also be incorporated into time series analysis, using the technique of multivariate ARIMA. For example, the influence of advertising expenditure on a sales series could be accounted for by 'prewhitening'³ the advertising series and applying this transformation to the sales series before making the forecast (Makridakis et al., 1983, Chapter 10).

³ Prewhitening involves the use of ARIMA techniques to remove the signal from a series, so that only white noise remains.
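The sketch below illustrates the prewhitening idea on simulated data: an AR(1) filter is estimated for the advertising series, the same filter is applied to both advertising and sales, and the cross-correlations of the filtered series are then inspected to identify the lag at which advertising leads sales. It is a minimal single-filter illustration under assumed data, not the full multivariate ARIMA procedure described by Makridakis et al. (1983).

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200

# Simulated advertising (AR(1)) and sales that respond to advertising two periods later.
adv = np.empty(n)
adv[0] = rng.normal()
for t in range(1, n):
    adv[t] = 0.7 * adv[t - 1] + rng.normal()
sales = 50 + 3.0 * np.roll(adv, 2) + rng.normal(0, 1, n)
adv, sales = adv[2:], sales[2:]                  # drop the wrapped-around values

# Estimate the AR(1) coefficient of the advertising series by least squares.
phi = np.linalg.lstsq(adv[:-1, None], adv[1:], rcond=None)[0][0]

# Apply the same (1 - phi*B) filter to both series ('prewhitening').
adv_w = adv[1:] - phi * adv[:-1]
sales_w = sales[1:] - phi * sales[:-1]

# Cross-correlation of the filtered series reveals the advertising-to-sales lag.
def xcorr(x, y, lag):
    x, y = x - x.mean(), y - y.mean()
    return np.corrcoef(x[:-lag], y[lag:])[0, 1] if lag > 0 else np.corrcoef(x, y)[0, 1]

for lag in range(5):
    print(f"lag {lag}: cross-correlation {xcorr(adv_w, sales_w, lag):+.2f}")
```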
Empirical research, summarised by Armstrong (1985, pp. 404-412), indicates that causal and extrapolative methods are roughly equivalent in accuracy, but that causal methods are more successful than extrapolative methods when the level of environmental change is high. This suggests that causal methods would work best for long-range forecasts. However, recent research by Clemen and Guerard (1989) indicates otherwise: the gains from econometric forecasts tended to decrease rapidly as the forecast horizon increased.

Turner (1990) and Donihue (1993) indicate that, in practice, judgemental adjustments are frequently and successfully made to econometric models to incorporate extra-model information. However, instead of adjusting a model's output to incorporate the extra-model information, the model itself may be changed. There appear to be two main ways of incorporating soft information into the model. First, the information may be subjectively encoded in the form of quantitative variables in a detailed econometric modelling approach (e.g. Abramson and Finizza's, 1991, application of belief networks in forecasting oil prices). However, this approach is costly, and may only be warranted for very important series. Moreover, it may still not accommodate all the 'broken-leg' cues. The second approach, model intervention (see, for example, West and Harrison, 1989), allows the forecaster to identify 'broken-leg' cues without needing to specify the influence that the cues may have had on the time series. The influence is determined within the modelling process. Harvey and Durbin (1986) illustrated how model intervention could be used to incorporate the effect of a once-only event (e.g. the introduction of special legislation) into a structural time series model.

Parameter estimation. Harvey and Durbin (1986) also showed how the parameters associated with a special event could be set so that the effect of the event diminished over time. However, setting parameters may be a cognitively difficult task. Model intervention allows judges to identify special events and it automatically assesses the effects of these events on the series, but forecasters find it difficult to encode their beliefs as to the temporal effects of these events. Parameter estimation may be difficult because it requires understanding not only the influence of the event, but also the underlying model.
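A minimal way to see what setting the parameters of a special event can involve is sketched below: a once-only event at a known date is entered as an intervention variable whose effect decays geometrically, and its initial impact is estimated by least squares. The geometric decay rate stands in for the forecaster's belief about how quickly the effect fades; this is an illustrative regression under assumed data, not the structural time series model of Harvey and Durbin (1986) or the model intervention of West and Harrison (1989).

```python
import numpy as np

rng = np.random.default_rng(3)
n, t0, decay = 120, 60, 0.8         # series length, event date, assumed decay per period

# Simulated series: trend plus a one-off shock at t0 whose effect fades away.
t = np.arange(n)
effect = np.where(t >= t0, decay ** (t - t0), 0.0)
y = 200 + 0.5 * t + 25 * effect + rng.normal(0, 3, n)

# Regress the series on a trend and the decaying intervention variable.
X = np.column_stack([np.ones(n), t, effect])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(f"estimated initial impact of the event: {coef[2]:.1f} (true value 25)")

# Forecasts then extrapolate the trend plus the decaying remnant of the event.
h = np.arange(n, n + 12)
forecast = coef[0] + coef[1] * h + coef[2] * decay ** (h - t0)
```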
The following sections examine alternative ways of introducing judgemental interpretation of contextual information into time series forecasting. These approaches may not possess the objectivity and credibility of pure model-building, yet they appear simpler and (arguably) more practical.

3.2. Combination of objective and subjective forecasts

The combination of a judgemental forecast with a statistical forecast is a pragmatic approach to introducing subjective interpretation of contextual factors. There is a great deal of evidence that combining two or more independent forecasts leads to significant improvements in accuracy (Clemen, 1989). The past literature has focused predominantly on the combination of objective forecasts, whereas this section explicitly examines the empirical evidence on combination of objective and subjective forecasts.

Only a few studies have investigated the influence of situational differences on the effectiveness of combining objective and subjective forecasts. Lawrence et al. (1986) examined the effectiveness of combination for series having different levels of forecasting difficulty and seasonality. Contrary to expectation, they found that combination was more effective for the low-MAPE series. Seasonality, on the other hand, was found to have no influence on the benefit gained from combination. The effect of forecast horizon was also somewhat surprising, as the greatest improvement in accuracy was found in the short run (cf. Conroy and Harris, 1987).

Sanders and Ritzman (1990) further investigated the effect of series difficulty, operationalising it in terms of a coefficient of variation rather than MAPE. They also found that combination was best for 'easier' series. For the 'harder' series, where judgemental forecasts were significantly more accurate than statistical forecasts, the combined forecast was less accurate than the judgemental forecast. This finding is consistent with the argument that combination would be ineffective when one forecast is substantially superior to another, since combining a very good forecast with a very poor forecast would generally be expected to reduce accuracy (in comparison to the very good forecast).

The evidence presented in Appendix reference Table A2(a)-(c) confirms the result of Section 2 that judgement was particularly accurate when contextual information was available. However, it also indicates that combination was usually effective in these circumstances (cf. Conroy and Harris, 1987; Bohara et al., 1987). Hence, with context, judgemental forecasts were superior to statistical forecasts, but the combination of the two improved accuracy even further.

That finding can be related to the notion of 'conditional efficiency' (Granger and Newbold, 1973). A forecast is conditionally efficient if its error is not significantly greater than that of the composite forecast. In a study of the conditional efficiency of judgemental forecasts, Moriarty and Adams (1984) compared management forecasts with those based on the Box-Jenkins method. They found that the management forecasts were conditionally efficient with respect to the statistical forecasts, even though the composite forecast was slightly (but not significantly) more accurate than the management forecast. The management forecasts could not be significantly improved by combining them with statistical forecasts.

However, most of the studies show greater improvement from combination than Moriarty and Adams (1984). Although tests of conditional efficiency were not made in these studies, it appears that, in many cases, judgemental forecasters may not use information efficiently. Judges have an information advantage, but they are not fully exploiting it.

Several studies have focused on the influence of different combination schemes on accuracy. There is general support for the notion that the greater the number of forecasts, the better (Lawrence et al., 1986; Edmundson et al., 1988; Guerard and Beidleman, 1987). There is also some indication that mechanical combination is better than subjective combination (Angus-Leppan and Fatseas, 1986; Lawrence et al., 1986). There is recent evidence that a rule-based combination strategy employing judgemental assessment of 'net causal forces' can aid in generating the weights for combining objective forecasts (Collopy and Armstrong, 1992b). This method is still mechanical, but uses judgemental input based on 'eyeballing' the series. Hence, it combines objectivity and subjectivity at the combination weighting level, rather than at the level of the forecasts.

There are mixed results regarding the best type of mechanical combination. The related literature on combining objective methods suggests that simple averaging is often just as good as weighted averages based on ex ante correlations (e.g. Newbold and Granger, 1974). Some studies of the combination of objective and subjective forecasts support this finding (Conroy and Harris, 1987; Bohara et al., 1987; Blattberg and Hoch, 1990), while others indicate that regression-based weighting is more accurate (Guerard and Beidleman, 1987; Newbold et al., 1987; Lobo, 1991). The use of the contingent methodology is needed to facilitate the interpretation of these sorts of mixed results, but, unfortunately, most papers do not report the necessary information about task differences.

For example, instability in series may cause problems for regression-based weighting⁴. Also somewhat disappointing among the empirical research is the lack of testing for statistical significance (cf. Blattberg and Hoch, 1990). This makes it difficult to draw any conclusions with a great deal of confidence, apart from just generally stating that forecast combination improves accuracy. Clearly, more contingency-based research is needed in this area, especially to determine the role of contextual information in the combination of objective and subjective forecasts.

⁴ Kang (1986) found, in combining objective forecasts, that simple averaging is best with unstable series, because instability tends to invalidate the ex ante weights derived from past series behaviour.
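Before turning to judgemental adjustment, the sketch below contrasts the two mechanical combination schemes discussed above on simulated data: a simple average of a statistical and a judgemental forecast, and weights estimated by regressing past actuals on the two past forecast records. The simulated accuracy levels of the two forecasts are arbitrary assumptions; the sketch illustrates only the mechanics, not the empirical results of the studies cited.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 48
actual = 100 + np.cumsum(rng.normal(0, 2, n))

# Two past forecast records (both simulated assumptions): a statistical forecast
# and a judgemental one that uses contextual information but is noisier.
stat_fc = actual + rng.normal(3, 4, n)          # slightly biased, moderate error
judge_fc = actual + rng.normal(0, 6, n)         # unbiased, larger error

mape = lambda a, f: np.mean(np.abs((a - f) / a)) * 100

# (a) simple average
avg_fc = 0.5 * (stat_fc + judge_fc)

# (b) regression-based weights estimated on the first 36 periods, applied to the rest
train, test = slice(0, 36), slice(36, n)
X = np.column_stack([np.ones(36), stat_fc[train], judge_fc[train]])
w, *_ = np.linalg.lstsq(X, actual[train], rcond=None)
reg_fc = w[0] + w[1] * stat_fc[test] + w[2] * judge_fc[test]

print(f"statistical MAPE:         {mape(actual[test], stat_fc[test]):.1f}%")
print(f"judgemental MAPE:         {mape(actual[test], judge_fc[test]):.1f}%")
print(f"simple average MAPE:      {mape(actual[test], avg_fc[test]):.1f}%")
print(f"regression-weighted MAPE: {mape(actual[test], reg_fc):.1f}%")
```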
3.3. Judgemental adjustment of objective forecasts

Judgemental adjustment of an objective forecast appears to be quite a common practice (see Turner, 1990, for instance). In terms of use, it seems to be the major competing alternative to combination for integrating subjective and objective forecasting. Yet, strangely, there appears to be a lack of studies comparing these two common approaches (cf. Lim, 1993, who found adjustment to be less accurate than combination, albeit in an extrapolation task without contextual information).

Based on the literature on anchoring and adjustment (Tversky and Kahneman, 1974), judgemental adjustment might be expected to have a detrimental effect on accuracy. However, the presence of contextual information, which we saw in Section 2 to be important in judgemental accuracy, may have a mediating role. With context, one would expect, a priori, that adjustment may be beneficial because the judge would bring information to the forecast that the model could not. However, without contextual information, judgemental adjustment might worsen accuracy because of the action of the anchoring and adjustment heuristic. The effectiveness of adjustment may also depend on the initial accuracy of the base statistical forecast; this may in turn be related to series characteristics (see also Section 2).

Studies of non-contextual adjustment. This form of adjustment may be attempted because the judge feels that the statistical forecast is inaccurate, even though the judge cannot associate any contextual information with the series. There is mixed evidence regarding the effectiveness of non-contextual adjustment. It appears it may be beneficial when there is 'excess error', or room for improvement, in the statistical forecast (Willemain, 1989, 1991; Lim and O'Connor, 1993), but not when the statistical forecast is accurate. Judgemental adjustment may also be more effective with seasonal series (Willemain, 1989), especially those with low noise (Sanders, 1992), but again this is related to the quality of the statistical forecast (Lim and O'Connor, 1993; Carbone et al., 1983).

Contextual adjustment. When contextual information is available, the judge may wish to adjust the model-based forecast to incorporate the effects of this extra knowledge. This type of adjustment is generally effective (cf. Mathews and Diamantopolous, 1986). However, as in Section 2, it is again difficult to attribute the accuracy improvement solely to context: experience and motivation may also play a part. There is a lack of studies which isolate these factors, as Edmundson et al. (1988) and Sanders and Ritzman (1992) did in comparing objective and subjective methods.

Structured adjustment. The practice of judgemental adjustment has been criticised because of its informal, ad hoc nature (Bunn and Wright, 1991). Four studies applied structured methods to generate judgemental adjustments. Reinmuth and Guerts (1972) used Bayesian techniques to incorporate customer survey information into the adjustment made by the marketing staff. Cook et al. (1984) and Wolfe and Flores (1990) used the analytic hierarchy process (AHP) to generate adjustments. In Wolfe and Flores' study, subjects were given contextual information in the form of a brief economic commentary, accounting information, and historical EPS for one real undisguised company.

The resultant adjustments significantly improved the accuracy of ARIMA-generated forecasts, especially when the ARIMA forecasts were initially inaccurate due to series volatility. Wolfe and Flores also examined the effect of expertise and forecast horizon, but neither affected accuracy. The external validity of this research may be limited because it appears that subjects did not see the base forecasts, or the resultant adjustments. The AHP is also quite an involved approach, although the complexity problem was partially addressed in a later study by Flores et al. (1992).

Bunn and Wright (1991) point out that even the use of a structured adjustment process is still ad hoc, lacks interaction with the model and is "mostly qualitative". They proposed the need for an "interactive decomposition structure": interactive in the sense that judgement should be part of an overall coherent modelling effort. The topic of judgemental decomposition is examined in the next section.

3.4. Judgemental decomposition

The principle behind decomposition is the notion of 'divide and conquer' (Raiffa, 1968). Task decomposition encourages a person to concentrate selectively on one element at a time by breaking a task into its component parts.

The basic difference between judgemental adjustment and judgemental decomposition of time series is that adjustment tries to account for past and future contextual factors and model misspecifications after the forecast is made, while decomposition tries to remove the effects of past contextual factors from the series history before extrapolating and accounting for future-oriented factors in the forecast. In essence, decomposition considers the assessment of current status to be separate from the assessment of change (Armstrong, 1985).

This section considers the empirical studies that have investigated decomposition in general, and then focuses on the use of decomposition in time series forecasting.

Armstrong et al. (1975) asked subjects a set of general knowledge questions, and found that people made better judgements when using a decomposed approach. Decomposition was especially useful when the subject knew relatively little about the topic. Phelps and Shanteau (1978) found that expert livestock judges considered fewer than three pieces of information in picture form, in comparison to 9-11 cues when the information was decomposed to a verbal form. They interpreted this result as implying that there are inter-correlations present in real stimuli that tend to reduce the number of dimensions processed, but it may also imply that decomposition enhanced the amount of information processed and hence decision quality (or that decomposition simply made hard-to-see cues obvious). In work related to decomposition, Fischhoff and Bar-Hillel (1984) found that focusing techniques such as isolation analysis can improve judgement in inferential problem solving. Isolation analysis requires subjects to consider what judgement they would make if each cue was the only one available, prior to making a summary judgement based on all the information.

Lyness and Cornelius (1982) compared holistic and decomposed judgement in a performance rating experiment. Students were required to evaluate hypothetical instructors based on written information. Lyness and Cornelius found that an algorithmic decomposition strategy (where the ratings were derived from dimensional evaluations and importance weights) was generally more effective than holistic judgement and clinical decomposition (where dimensions were considered separately, but the subjects then made overall evaluations). Unfortunately, because of the nature of the task, their analysis considered only reliability, convergence and agreement measures; it did not compare the predicted values against actual or optimal values. Moreover, the result varied according to the measure used: with mean absolute deviation between raters, decomposition was superior; with correlations, holistic and decomposed judgement were equally effective.

Lyness and Cornelius speculated that decomposition would be most advantageous in complex circumstances, and tested the effect of information load (three, six or nine performance dimensions) on the effectiveness of decomposition. The hypothesised interaction between judgement strategy (holistic, decomposition) and information load was not found for the reliability and convergence scores. High inter-rater agreement was found with algorithmic decomposition at high information levels. However, the significance of this result was not statistically tested. Hence, the impact of decomposition on the relationship between environmental complexity (information level) and decision quality has not been fully investigated.

Three recent papers by MacGregor and associates have further explored the role of decomposition in answering almanac-type questions. MacGregor et al. (1988) showed that accuracy increased with greater structure in the decision aid. Subjects performed best when provided with the full algorithmic decomposition strategy: they were given the complete algorithm, asked to estimate its components, and then shown how to combine the parts.

MacGregor and Lichtenstein (1991) showed that extending algorithmic decomposition to include alternative problem solving approaches led to further improvement in performance. However, they found that providing training on decomposition, so that people could create their own algorithms, was of little benefit. They also cautioned that "erroneous knowledge incorporated into an estimation aid may lead one astray without a warning that it is doing so" (p. 115); subjects with misinformation tended to perform particularly badly.

One issue raised in the MacGregor and Lichtenstein paper was that decomposition seemed to work best with estimates of large numerical values. MacGregor and Armstrong (1993) confirmed that decomposition improved accuracy for problems involving extreme and uncertain values, but was less accurate for problems with target values that were not extreme and uncertain. It was not clear whether the contingency of decomposition performance was due to the large magnitude of the target values, or their uncertainty, or both. However, these papers indicate that decomposition may be a risky strategy in certain circumstances and should be applied with caution. This issue will be discussed further.

In the context of time series forecasting, Edmundson (1990) developed a decision aid, called GRAFFECT, that encouraged judgemental forecasting along the lines of classical time series analysis. Users were able to judgementally extract the trend and seasonality from a time series, leaving a residual component. Edmundson found that the extrapolative accuracy of both novice and expert forecasters using GRAFFECT was better than that of unaided forecasters and deseasonalised single exponential smoothing for the shorter forecast horizons. Considering that the subjects were not provided with a contextual information advantage, this seems to be a somewhat surprising finding. Why should decomposed judgement be superior to one of the most accurate statistical techniques (see Makridakis et al., 1982) in a purely numerical task? One possible explanation is that the subjects were able to recognise and discount discontinuities in the series that were naively incorporated by the statistical model. The results of O'Connor et al. (1993) and, to a lesser extent, Sanders and Ritzman (1992) suggest otherwise, but their series were artificially generated and their subjects were given different instructions and lacked Edmundson's decision aid. It is, of course, open to speculation as to how well the subjects in both those studies might have performed if the events causing the discontinuities had been identified.
Collopy and Armstrong (1993) showed that decomposition of series by 'causal forces' can reduce forecast error. The effect of traffic volume was removed from three series (highway deaths, injuries and accidents) to create component series that were forecast separately and then recomposed with the forecast for traffic volume. All forecasts were made using Holt's exponential smoothing. Collopy and Armstrong found that decomposition was effective for the deaths and injuries series, but not the accidents series. They cautioned that multiplicative recomposition of decomposed components can be risky because it has the potential for explosive errors, echoing the sentiments of MacGregor and Lichtenstein (1991) and MacGregor and Armstrong (1993). They also stressed that research attention should focus on the conditions where decomposition is effective.
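The arithmetic of this causal-force decomposition can be sketched as below. The figures and the smoothing parameters are invented, and the bare-bones Holt implementation is a stand-in for whatever software Collopy and Armstrong actually used.

```python
import numpy as np

def holt_forecast(y, alpha=0.3, beta=0.1, horizon=1):
    """Bare-bones Holt's linear exponential smoothing, h steps ahead."""
    level, trend = y[0], y[1] - y[0]
    for obs in y[1:]:
        previous_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + horizon * trend

# Invented annual figures: highway deaths and traffic volume (billion miles).
deaths = np.array([51000.0, 50300.0, 49500.0, 47900.0, 46400.0, 45500.0])
volume = np.array([1500.0, 1545.0, 1590.0, 1640.0, 1695.0, 1750.0])

# Decompose by the causal force: forecast the death rate per unit of traffic
# and the traffic volume separately, then recompose multiplicatively.
rate_forecast = holt_forecast(deaths / volume)
volume_forecast = holt_forecast(volume)
recomposed = rate_forecast * volume_forecast

direct = holt_forecast(deaths)  # forecasting the raw series, for comparison
print(round(float(recomposed)), round(float(direct)))

# Multiplicative recomposition can be risky: errors in the two component
# forecasts multiply, which is what makes 'explosive' errors possible.
```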
Goodwin and Wright (1993, p. 154) suggested that decomposition would be ineffective where:
• the decomposition is mechanical and/or the judge is sceptical about the decomposition technique that is being employed;
• there is unfamiliarity with the technique used for decomposition and the type of judgements required by it;
• the judgements required by decomposition are actually more complex psychologically than holistic judgements;
• the judge experiences boredom or fatigue because the duration of the task is lengthened and the number of judgements increased.

Goodwin and Wright (1993) also emphasised the need for decomposition research to focus on "analysing and integrating judgements based on time series information with contextual information" (p. 155). Very little work has examined judgemental decomposition of contextual factors from time series. There are software packages that facilitate additive decomposition of special events (e.g. FORSYS by Lewandowski, 1982) and multiplicative decomposition of events (e.g. FUTURCAST by Carbone and Makridakis⁵), but it seems that only anecdotal evidence (e.g. Gorr, 1986a,b) has been reported about the effect of event decomposition on forecast accuracy. Lee et al. (1990) developed a prototype expert system for forecasting the demand of oil products for a single company. The system was designed to detect historical events, relate analogous events, and to decompose events from the time series. However, the system was not empirically tested; the authors simply reported on its use in an oil company.

3.5. Summary

Four approaches to integrating subjective and objective forecasting have been examined. All have shown the potential to reduce forecast error, but they may differ in ease-of-use, credibility, cost, and other factors identified as important by practitioners (see Collopy and Armstrong, 1992a).

The incorporation of contextual factors in model building involves either quantifying those factors in an extensive econometric model or using the technique of model intervention. Both approaches would produce credible forecasts, but they suffer from high set-up costs. Model intervention is a complex approach, requiring an understanding of the underlying model in order to codify beliefs about contextual factors by parameter setting.
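As a minimal illustration of quantifying a contextual factor in a model (with invented data), an indicator variable can mark the periods affected by a known event, and its fitted coefficient codifies the judge's belief about the event's effect. Full model intervention instead adjusts the parameters of the underlying statistical model directly, but the aim of building the event into the model, rather than adjusting its output afterwards, is the same.

```python
import numpy as np

# Invented monthly sales with a promotion in months 18-20.
rng = np.random.default_rng(1)
t = np.arange(36)
promo = ((t >= 18) & (t <= 20)).astype(float)
sales = 200 + 2.0 * t + 30 * promo + rng.normal(0, 5, size=36)

# Least-squares fit of level, trend and the promotion effect: the fitted
# coefficient on `promo` quantifies the contextual factor inside the model.
X = np.column_stack([np.ones_like(t, dtype=float), t.astype(float), promo])
(level, slope, promo_effect), *_ = np.linalg.lstsq(X, sales, rcond=None)

# The event can now be included or excluded explicitly in a forecast,
# rather than being added judgementally after the statistical forecast.
next_month = 36
base_forecast = level + slope * next_month
with_promo = base_forecast + promo_effect
print(round(promo_effect, 1), round(base_forecast, 1), round(with_promo, 1))
```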
Forecast combination is a relatively practical and simple approach which generally improves accuracy. However, it suffers from duplication of effort and redundancy of information (see Clemen and Winkler, 1985), loses resolution when the dependent variable (e.g. sales) is unstable (see Moriarty, 1990), and does not facilitate backward inference⁶ (see Einhorn and Hogarth, 1982). Some believe that combination should be replaced by an all-'encompassing' model (Diebold, 1989). The counter-argument is that combining is more cost-effective than building a single comprehensive model (see Bunn, 1989).
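A minimal sketch of simple combination with invented figures: an equal-weighted average of a judgemental and a statistical forecast, scored here by mean absolute percentage error.

```python
import numpy as np

# Invented one-step-ahead forecasts for six periods, plus the outcomes.
judgemental = np.array([102.0, 98.0, 110.0, 105.0, 99.0, 108.0])
statistical = np.array([96.0, 101.0, 104.0, 100.0, 103.0, 101.0])
actual = np.array([100.0, 100.0, 107.0, 101.0, 102.0, 105.0])

combined = 0.5 * judgemental + 0.5 * statistical  # simple equal weighting

def mape(forecast, outcome):
    """Mean absolute percentage error."""
    return 100 * np.mean(np.abs((outcome - forecast) / outcome))

for name, f in [("judgemental", judgemental), ("statistical", statistical),
                ("combined", combined)]:
    print(name, round(float(mape(f, actual)), 1))

# Because the two sources make partly offsetting errors in this example, the
# simple average beats both single forecasts; weights could instead be
# estimated, for instance by regressing the outcomes on the two forecasts.
```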
Judgemental adjustment of a statistical forecast is perhaps the most easy-to-use and cost-effective of the four approaches, but it introduces the possibility of bias from the use of the anchor and adjustment heuristic. As Bunn and Wright (1991) emphasised, judgemental adjustment is ad hoc, informal and open to adversarial criticism.

Judgemental decomposition is arguably more complex than combination or adjustment, but brings structure, defensibility and backward inference to the task of forecasting with contextual information. It is possible that decomposition can reduce cognitive load, but it may also be risky and ineffective in certain circumstances (Goodwin and Wright, 1993).

⁵ Reported by Gorr (1986a).
⁶ Backward inference refers to understanding causality from past data. For example, it may be possible to infer that a high sales value was because of an advertising campaign. This would have benefits not only in forecasting, but also in assessing the effectiveness of marketing.
Research is needed on the accuracy of judgemental decomposition of special events. Such an approach may facilitate the interpretation of 'soft' event information by the judgemental forecaster, and have broader implications for the integration of soft and hard information in general decision making.

4. Conclusion

This paper has reviewed the literature on the importance of human-interaction factors in a comparison of judgemental and statistical approaches to time series forecasting. It demonstrated the crucial role that knowledge of the context of the time series plays in accuracy. The accuracy of judgemental eyeballing may be roughly equivalent to that of the statistical methods, but the major contribution of judgemental approaches lies in the ability to integrate this non-time series information into the forecasts. Accordingly, Section 3 examined four ways of integrating judgemental and statistical forecasts: model building, combination, judgemental adjustment and judgemental decomposition. Simple combination provides considerable advantages, while judgemental adjustment may be subject to bias. Promising advances can be made by improving the process of decomposition within any integration process.

Appendix. Reference tables

This appendix presents the tables that were constructed to provide the detailed support for the conclusions made in the main body of the paper. They are included here because we feel they are a useful reference source for future research in judgemental forecasting. It should be possible for a researcher doing work on the influence of seasonality, for example, to scan the table and quickly find the studies that examined seasonality.

The tables do, however, require some interpretive effort on behalf of the reader. A graphical presentation format was chosen to display the results succinctly and attractively. Fig. A1 shows a key to the symbols used in the reference tables.

Fig. A1. Key to abbreviations and symbols used in the Appendix reference tables: period (Y yearly, Q quarterly, M monthly, 4W four-weekly, W weekly, D daily); a five-level scale (none or very little, low, moderate, high, very high); trend (flat, up, down); accuracy (better, worse, same, not significant); ✓ yes; X no; n/a not appropriate; [blank] information not available, or information is repeated from above; a dashed rule indicates that the group of information above it is duplicated below it.

It is hoped that future empirical studies would provide greater information about their data, methods and environmental characteristics, because in many instances it was difficult to adequately classify past studies within this contingent framework. We also hope that future literature reviews could utilise the framework and build and improve upon this approach.

A brief comment about the classification rules for certain factors:

Trend - The trend factor reports the direction of slope in the series (up, flat, down). Ideally, it would be useful to present the mean gradient (or equivalent) in the series, or at least the proportion of ups versus downs, but these measures were rarely reported in the studies.

Seasonality - The seasonality factor indicates the presence of signal or pattern in the series. This measure should be interpreted in conjunction with the data presented in the "Period" column to determine the type of seasonality (e.g. monthly versus quarterly).

Historicals - This refers to the number of historical data points used. In the studies of 'real world' forecasting, it is often impossible to determine how many data points were used by the judges, so the table presents the number of data points used by the objective methods in those cases.

Forecasts - This refers to the number of forecasts made by the human judge. This measure should be interpreted in conjunction with the 'feedback' dimension described in the next section.
If feedback was not provided, then the number represents the length of the forecast horizon, and the error measures have been calculated across the entire horizon. Where the forecast is indicated as a specific number (e.g. #16) or a range (#16-#18), the errors represent those specific step-ahead periods.

Feedback - The assessment of feedback is a binary measure showing whether feedback was provided to subjects. Unfortunately, it was difficult to determine whether outcome or performance feedback was provided in most cases.

Experience - The following decision rules were applied, from least to most experienced:
• undergraduate students with no experience in forecasting;
• graduate students or equivalent, who may be good surrogates for managers (Remus, 1990) but generally have little experience in forecasting;
• subjects with some training and evidence of some practical experience in forecasting;
• subjects with considerable experience in forecasting;
• forecasting experts or practitioners whose job involves the regular preparation of forecasts (it is not known if they are really experts, but it is beyond the scope of the review to assess this).

Context - The following rules were used in making assessments of context, from least to most contextual information:
• no contextual information supplied; only the time series;
• the series was labelled (e.g. "quarterly earnings"), but subjects were unable to place the series in a broader context, through lack of detail;
• an incomplete level of contextual information (e.g. a production department forecasting without knowledge of the plans of the sales department);
• an external level of contextual information, e.g. security analysts, who may possess some inside information and have some influence on the outcome, but this is generally limited in comparison to managers (Brown, 1988);
• an internal level of contextual information, e.g. managers who not only have inside information, but also significant control over the outcome of the forecast variable, including the ability to change a firm's accounting practices to make the reported outcome consistent with the forecast (Brown, 1988).

Table A1(a)
Objective versus subjective methods of time series forecasting (studies ordered by amount of contextual information)

a Andreassen investigated the effect of causal prediction versus extrapolation. Subjects were significantly worse at causal prediction (P < 0.0001), but only his results involving time series extrapolation are presented here.
Motivation - The following decision rules were utilised, from least to most motivated:
• no incentives, accountability or reason for accurate performance;
• rewards or course credit offered to students;
• the judges were reasonably well motivated, as forecasting was part of their job, but they were neither accountable nor of the belief that the series were critical;
• forecasting represented a major responsibility of the job and judges were accountable for accuracy;
• judges recognised that these series were very important and were accountable for forecast accuracy.

Accuracy - In the case of Appendix reference Tables A2(a)-(c) (Combination), the "Accuracy" column compares the error of the combined forecast with that of the best single method used. For Appendix reference Tables A3(a)-(c) (Adjustment), accuracy is compared with the original (base) forecast.
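As a concrete illustration of the comparison behind the Accuracy column, the sketch below scores an invented judgementally adjusted forecast against its statistical base forecast using MAPE; the studies themselves report a variety of error measures, as the table notes indicate.

```python
import numpy as np

def mape(forecast, outcome):
    """Mean absolute percentage error."""
    return 100 * np.mean(np.abs((outcome - forecast) / outcome))

# Invented statistical (base) forecasts, judgementally adjusted forecasts and
# outcomes over an eight-period horizon.
actual = np.array([120.0, 118.0, 125.0, 130.0, 128.0, 135.0, 140.0, 138.0])
base = np.array([116.0, 121.0, 122.0, 126.0, 131.0, 132.0, 136.0, 141.0])
adjusted = np.array([119.0, 120.0, 124.0, 128.0, 130.0, 134.0, 137.0, 140.0])

# The entry in the Accuracy column would reflect this kind of difference:
improvement = mape(base, actual) - mape(adjusted, actual)
print(round(float(improvement), 2))  # positive, so adjustment helped here
```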

Table A1(b)
Objective versus subjective methods of time series forecasting

b The effect of a vertical repositioning of the graph was also investigated. With a low position, judgemental performance was worse than regression (P < 0.01).
c The error measure is only the deviation from the regression forecast (i.e. judgemental estimate minus least squares). Hence, the effect on accuracy cannot be reported.
d Only the results for monthly data are presented here. The results for annual and quarterly series and between seasonal and non-seasonal series also supported the finding that the bootstrapping model was less accurate than the human judge.
e Their analyses were based on each forecaster's last 12 forecasts. They found that "terminal performance was significantly better than initial performance (P < 0.001)".
f Adam and Ebert also investigated the effect of telling subjects that the series represented demand for medical supplies, but found no differences in accuracy.
Table A1(c)
Objective versus subjective methods of time series forecasting

g This data represents their initial 'eyeball' extrapolation. A second group of students were involved, but they did not make an eyeball extrapolation.
h The statistical testing of these results has been criticised (Andreassen, 1991).

Table A1(d)
Objective versus subjective methods of time series forecasting

i Students were only provided with 3 months of daily actuals and 2 years of monthly actuals (both in table form).
j The forecasts by the sales department were reported to be subject to political and organisational biases.
k Most, but not all, experts were using objective methods (some judgementally adjusted) to make their forecasts. Most also used contextual information.
Table A1(e)
Objective versus subjective methods of time series forecasting

Table A2(a)
The combination of objective and subjective methods of time series forecasting

a The effect of seasonality was also investigated, and found not to influence the benefit of combining forecasts.
b Errors are for the 1981 sample. The results were similar, yet more marked, in 1982.
Table A2(b)
The combination of objective and subjective methods of time series forecasting

Quarterly data were used in the extrapolative models, and aggregated.
Other models were used but not combined.
A measure of percentage error was also presented, but had to be winsorised to remove outliers.
This was the best technique for generating combination weights in the study, superior to equal weighting and other linear models.
The most accurate method (based on the lowest APE for last year's forecast) was selected from the random walk, single exponential smoothing, double exponential smoothing, triple exponential smoothing, and the average EPS over the past 5 years.

Table A2(c)
The combination of objective and subjective methods of time series forecasting

Only the results for the cattle prices are presented here. Combination was also effective with hog and broiler prices.
Only the results for one representative series (corporate profits) are presented here.
k The MSE for the first 20 points of the forecast horizon were regressed to determine weights. The remaining eight points were used in the out-of-sample comparison. A 'restricted' regression combination was also presented.
l The weights were determined by fitting OLS regression to 24 of the forecast points; the remaining 12 points were used in the ex ante comparison.
m Box-Jenkins was less accurate than judgement over the full 36 points (MAPE of 19.2 versus 11.1).
n Adaptive exponential smoothing (1/3), single exponential smoothing (1/6), linear regression (1/6), adaptive estimation procedure (1/6) and Holt's two-parameter model (1/6).
Table A3(a)
Judgemental adjustments of statistical forecasts

a Willemain also investigated the effect of autocorrelation, finding that improvement was greatest for positively autocorrelated seasonal series.
b SmartForecasts II selected one of five parsimonious extrapolative techniques.
c The objective forecast was provided after subjects had made a judgemental extrapolation.

Table A3(b)
Judgemental adjustments of statistical forecasts

d Another group of students was used, but they were not required to revise their base forecasts.
e Revision was made after considering all three base forecasts.
f Students made eyeball extrapolations (graph and table), combined these with an objective forecast, and then made adjustments after being told the nature of the series.
g Wolfe and Flores also found that improvement was greatest for series where the ARIMA forecasts were inaccurate.
h Two subjects (and hence series) were excluded from the analysis in each case.
Table A3(c)
Judgemental adjustments of statistical forecasts

i The marketing staff made probabilistic assessments of the effect of promotions on the forecast, and these assessments were modified by Bayesian incorporation of sample information from the customers.
j Approximately "one third to one half" of the 900 forecasts were revised. M&D (1990) found that the selection of forecasts for revision was effective.
k Results varied by error measure and period, but adjustment generally improved accuracy.
l Improvement was only significant for the one-quarter-ahead forecast.

References

Abramson, B. and A. Finizza, 1991, Using belief networks to forecast oil prices, International Journal of Forecasting, 7, 299-315.
Adam, E.E. and R.J. Ebert, 1976, A comparison of human and statistical forecasting, AIIE Transactions, 8(1), 120-127.
Andreassen, P.B., 1991, Causal prediction versus extrapolation: effects of information source on judgemental forecasting accuracy, Working paper, Sloan School of Management.
Andreassen, P.B. and S.J. Kraus, 1990, Judgemental extrapolation and the salience of change, Journal of Forecasting, 9, 347-372.
Angus-Leppan, P. and V. Fatseas, 1986, The forecasting accuracy of trainee accountants using judgemental and statistical techniques, Accounting and Business Research, 16, 179-188.
Armstrong, J.S., 1983, Relative accuracy of judgemental and extrapolative methods in forecasting annual earnings, Journal of Forecasting, 2, 437-447.
Armstrong, J.S., 1985, Long-Range Forecasting: From Crystal Ball to Computer, 2nd edn. (Wiley-Interscience, New York).
Armstrong, J.S. and F. Collopy, 1992, Error measures for generalising about forecasting methods: empirical comparisons, International Journal of Forecasting, 8, 69-80.
Armstrong, J.S., W.B. Denniston and M.M. Gordon, 1975, The use of the decomposition principle in making judgements, Organisational Behaviour and Human Performance, 14, 257-263.
Balzer, W.K., L.M. Sulsky, L.B. Hammer and K.E. Sumner, 1992, Task information, cognitive information, or functional validity information: which components of cognitive feedback affect performance? Organisational Behaviour and Human Decision Processes, 53, 35-54.
Batchelor, R. and P. Dua, 1990, Forecaster ideology, forecasting technique, and the accuracy of economic forecasts, International Journal of Forecasting, 6, 3-10.
Beach, L.R. and C.R. Peterson, 1967, Man as an intuitive statistician, Psychological Bulletin, 68(7), 29-46.
Beach, L.R., V.E. Barnes and J.J.J. Christensen-Szalanski, 1986a, Beyond heuristics and biases: a contingency model of judgemental forecasting, Journal of Forecasting, 4, 143-157.
Beach, L.R., J.J.J. Christensen-Szalanski and V.E. Barnes, 1986b, Assessing human judgement: has it been done, can it be done, should it be done?, in: G. Wright and P. Ayton, eds., Judgemental Forecasting (Wiley, New York), 49-62.
Benbasat, I. and A.S. Dexter, 1985, An experimental evaluation of graphical and colour-enhanced information presentation, Management Science, 31(11), 1348-1364.
Benson, P.G. and D. Onkal, 1992, The effects of feedback and training on the performance of probability forecasters, International Journal of Forecasting, 8, 559-573.
Bessler, D.A. and J.A. Brandt, 1981, Forecasting livestock prices with individual and composite methods, Applied Economics, 13, 513-522.
Blattberg, R.C. and S.J. Hoch, 1990, Database models and managerial intuition: 50% model + 50% manager, Management Science, 36(8), 887-899.
Bohara, A., R. McNown and J.T. Butts, 1987, A re-evaluation of the combination and adjustment of forecasts, Applied Economics, 19, 437-445.
Box, G.E.P. and G.M. Jenkins, 1970, Time Series Analysis: Forecasting and Control (Holden-Day).
Brown, L.D., 1988, Comparing judgmental to extrapolative forecasts: it's time to ask why and when, International Journal of Forecasting, 4, 171-173.
Brown, P., G. Foster and E. Noreen, 1985, Security Analyst Multi-Year Earnings Forecasts and the Capital Market (American Accounting Association).
Brown, L.D., R.L. Hagerman, P.A. Griffin and M.E. Zmijewski, 1987, Security analyst superiority relative to univariate time-series models in forecasting quarterly earnings, Journal of Accounting and Economics, 9, 61-87.
Bruner, J.S., J.J. Goodnow and G.A. Austin, 1956, A Study of Thinking (Wiley, New York).
Bunn, D., 1989, Forecasting with more than one model, Journal of Forecasting, 8, 161-166.
Bunn, D. and G. Wright, 1991, Interaction of judgemental and statistical forecasting methods: issues and analysis, Management Science, 37(5), 501-518.
Carbone, R. and W.L. Gorr, 1985, Accuracy of judgemental forecasting of time series, Decision Sciences, 16, 153-160.
Carbone, R., A. Andersen, Y. Corriveau and P.P. Corson, 1983, Comparing for different time series methods the value of technical expertise, individualized analysis and judgemental adjustment, Management Science, 29(5), 559-566.
Cerullo, M.J. and A. Avila, 1975, Sales forecasting practices: a survey, Managerial Planning, September/October, 33-39.
Clemen, R.T., 1989, Combining forecasts: a review and annotated bibliography, International Journal of Forecasting, 5, 559-583.
Clemen, R.T. and J.B. Guerard, 1989, Econometric GNP forecasts: incremental information relative to naive extrapolation, International Journal of Forecasting, 5, 417-426.
Clemen, R.T. and R.L. Winkler, 1985, Limits for the prediction and value of information from dependent sources, Operations Research, 33(March-April), 427-442.
Collopy, F. and J.S. Armstrong, 1992a, Expert opinions about extrapolation and the mystery of the overlooked discontinuities, International Journal of Forecasting, 8, 575-582.
Collopy, F. and J.S. Armstrong, 1992b, Rule-based forecasting: development and validation of an expert systems approach to combining time series extrapolations, Management Science, 38, 1394-1414.
Collopy, F. and J.S. Armstrong, 1993, Decomposition by causal forces: using domain knowledge to forecast highway deaths, Working paper.
Conroy, R. and R. Harris, 1987, Consensus forecasts of corporate earnings: analysts' forecasts and time series methods, Management Science, 33(6), 725-738.
Cook, T., P. Falchi and R. Mariano, 1984, An urban allocation model combining time series and analytic hierarchical methods, Management Science, 20(2), 198-208.
Dalrymple, D.J., 1987, Sales forecasting practices: results from a United States survey, International Journal of Forecasting, 3, 379-392.
Dawes, R.M., D. Faust and P.E. Meehl, 1989, Clinical versus actuarial judgement, Science, 243(March), 1668-1674.
De Bondt, W.F.M., 1990, Risk and return in the stock market as most people (who don't know) see it, Working paper, University of Wisconsin-Madison.
DeSanctis, G., 1984, Computer graphics as decision aids: directions for research, Decision Sciences, 15(4), 463-487.
DeSanctis, G. and S.L. Jarvenpaa, 1989, Graphical presentation of accounting data for financial forecasting: an experimental investigation, Accounting, Organisations and Society, 14(5/6), 509-525.
Diamantopolous, A. and B.P. Mathews, 1989, Factors affecting the nature and effectiveness of subjective revision in sales forecasting: an empirical study, Managerial and Decision Economics, 10, 51-59.
Dickson, G.W., G. DeSanctis and D.J. McBride, 1986, Understanding the effectiveness of computer graphics for decision support: a cumulative experimental approach, Communications of the ACM, 29(1), 40-47.
Diebold, F.X., 1989, Forecast combination and encompassing: reconciling two divergent literatures, International Journal of Forecasting, 5, 589-592.
Donihue, M.R., 1993, Evaluating the role judgment plays in forecast accuracy, Journal of Forecasting, 12, 81-92.
Edmundson, R.H., 1987, A computer based decision aid for forecasting monthly time series, Ph.D dissertation, University of New South Wales.
Edmundson, R.H., 1990, Decomposition: a strategy for judgemental forecasting, Journal of Forecasting, 9, 301-314.
Edmundson, R.H., M.J. Lawrence and M.J. O'Connor, 1988, The use of non-time series information in sales forecasting: a case study, Journal of Forecasting, 7, 201-211.
Eggleton, I.R.C., 1982, Intuitive time-series extrapolation, Journal of Accounting Research, 20(1), 68-102.
Einhorn, H.J. and R.M. Hogarth, 1982, Prediction, diagnosis, and causal thinking in forecasting, Journal of Forecasting, 1, 23-36.
Fischer, G.W., 1982, Scoring-rule feedback and the overconfidence syndrome in subjective probability forecasting, Organisational Behaviour and Human Performance, 29, 352-369.
Fischhoff, B. and M. Bar-Hillel, 1984, Focusing techniques: a shortcut to improving probability judgements? Organisational Behaviour and Human Performance, 34, 174-194.
Flores, B.E., D.L. Olson and C. Wolfe, 1992, Judgemental adjustment of forecasts: a comparison of methods, International Journal of Forecasting, 7, 421-433.
Gardner, E.S. and E. McKenzie, 1985, Forecasting trends in time series, Management Science, 31(10), 1237-1246.
Goodwin, P. and G. Wright, 1993, Improving judgemental time series forecasting: a review of the guidance provided by research, International Journal of Forecasting, 9, 147-161.
Gorr, W.L., 1986a, Use of special event data in government information systems, Public Administrative Review, Special Issue, 532-539.
Gorr, W.L., 1986b, Special event data in shared databases, MIS Quarterly, 10(September), 239-255.
Granger, C.W.J. and P. Newbold, 1973, Some comments on the evaluation of economic forecasts, Applied Economics, 5, 35-47.
Guerard, J.B., 1987, Linear constraints, robust-weighting and efficient composite modelling, Journal of Forecasting, 6(3), 193-199.
Guerard, J.B. and C.R. Beidleman, 1987, Composite earnings forecasting efficiency, Interfaces, 17(5), 103-113.
Harvey, A.C. and J. Durbin, 1986, The effects of seat belt legislation on British road casualties: a case study in structural time series modelling, Journal of the Royal Statistical Society, Series A, 149(3), 187-227.
Henry, R.A. and J.A. Sniezek, 1993, Situational factors affecting judgments of future performance, Organisational Behaviour and Human Decision Processes, 54, 104-132.
Hopwood, W.S. and J.C. McKeown, 1990, Evidence on surrogates for earnings expectations within a capital market context, Journal of Accounting, Auditing and Finance, 5(Summer), 339-368.
Jones, G.V., 1984, Perception of inflation: polynomial not exponential, Perception and Psychophysics, 36, 485-487.
Kang, H., 1986, Unstable weights in the combination of forecasts, Management Science, 32(6), 683-695.
Kleinmuntz, B., 1990, Why we still use our heads instead of formulas: toward an integrative approach, Psychological Bulletin, 107, 296-310.
Kurke, L.B. and H.E. Aldrich, 1983, Mintzberg was right!: a replication and extension of The Nature of Managerial Work, Management Science, 29(8), 975-984.
Laestadius, J.E., Jr., 1970, Tolerance for errors in intuitive mean estimations, Organisational Behaviour and Human Performance, 5, 121-124.
Lawrence, M.J., 1983, An exploration of some practical issues in the use of quantitative forecasting models, Journal of Forecasting, 1, 169-179.
Lawrence, M.J. and S. Makridakis, 1989, Factors affecting judgemental forecasts and confidence intervals, Organisational Behaviour and Human Decision Processes, 42, 172-189.
Lawrence, M.J. and M.J. O'Connor, 1992, Exploring judgemental forecasting, International Journal of Forecasting, 8, 15-26.
Lawrence, M.J. and M.J. O'Connor, 1993, Heads or models: the importance of task differences, Working paper, University of N.S.W.
Lawrence, M.J., R.H. Edmundson and M.J. O'Connor, 1985, An examination of the accuracy of judgemental extrapolation of time series, International Journal of Forecasting, 1, 25-35.
Lawrence, M.J., R.H. Edmundson and M.J. O'Connor, 1986, The accuracy of combining judgemental and statistical forecasts, Management Science, 32(12), 1521-1532.
Lawrence, M.J., R.H. Edmundson and M.J. O'Connor, 1994, A field study of forecasting accuracy, Working paper, University of N.S.W.
Lee, J.K., S.B. Oh and J.C. Shin, 1990, UNIK-FCST: knowledge-assisted adjustment of statistical forecasts, Expert Systems with Applications, 1, 39-49.
Lewandowski, R., 1982, Sales forecasting by FORSYS, Journal of Forecasting, 1, 205-214.
Lim, J.S., 1993, An empirical investigation of the effectiveness of time series judgemental adjustment using forecasting support systems, Ph.D dissertation, University of N.S.W.
Lim, J.S. and M.J. O'Connor, 1993, Graphical adjustment of initial forecasts: how good are people at the task? Working paper, University of N.S.W.
Lobo, G.J., 1991, Alternative methods of combining security analysts' and statistical forecasts of annual corporate earnings, International Journal of Forecasting, 7, 57-63.
Lobo, G.J. and R.D. Nair, 1991, Analysts utilisation of historical earnings information, Managerial and Decision Economics, 12, 383-393.
Lyness, K.S. and E.T. Cornelius, 1982, A comparison of holistic and decomposed judgment strategies in a performance rating simulation, Organisational Behaviour and Human Performance, 29, 21-38.
MacGregor, D. and J.S. Armstrong, 1993, Judgemental decomposition: when does it work? Working paper.
MacGregor, D.G. and S. Lichtenstein, 1991, Problem structuring aids for quantitative estimation, Journal of Behavioural Decision Making, 4, 101-116.
MacGregor, D.G., S. Lichtenstein and P. Slovic, 1988, Structuring knowledge retrieval: an analysis of decomposed quantitative judgments, Organisational Behaviour and Human Decision Processes, 42, 303-323.
Mackinnon, A.J. and A.J. Wearing, 1991, Feedback and the forecasting of exponential change, Acta Psychologica, 76, 177-191.
Makridakis, S., 1988, Metaforecasting: ways of improving forecasting accuracy and usefulness, International Journal of Forecasting, 4, 467-491.
Makridakis, S., A. Andersen, R. Carbone, R. Fildes, M. Hibon, R. Lewandowski, J. Newton, E. Parzen and R. Winkler, 1982, The accuracy of extrapolation (time series) methods: results of a forecasting competition, Journal of Forecasting, 1, 111-153.
Makridakis, S., S.C. Wheelwright and V.E. McGee, 1983, Forecasting: Methods and Applications, 2nd edn. (Wiley, New York).
Makridakis, S., C. Chatfield, M. Hibon, M. Lawrence, T. Mills, K. Ord and L.F. Simmons, 1993, The M2-Competition: a real-time judgmentally based forecasting study, International Journal of Forecasting, 9, 5-22.
Mathews, B.P. and A. Diamantopolous, 1986, Managerial intervention in forecasting: an empirical investigation of forecast manipulation, International Journal of Research in Marketing, 3, 3-10.
Mathews, B.P. and A. Diamantopolous, 1989, Judgemental revision of sales forecasts: a longitudinal extension, Journal of Forecasting, 8, 129-140.
Mathews, B.P. and A. Diamantopolous, 1990, Judgemental revision of sales forecasts: effectiveness of forecast selection, Journal of Forecasting, 9, 407-415.
Mathews, B.P. and A. Diamantopolous, 1992, Judgemental revision of sales forecasts: the relative performance of judgementally revised versus non-revised forecasts, Journal of Forecasting, 11, 569-576.
McNees, S.K., 1990, The role of judgment in macroeconomic forecasting accuracy, International Journal of Forecasting, 6, 287-299.
Meehl, P.E., 1957, When shall we use our heads instead of the formula? Journal of Counseling Psychology, 4, 268-273.
Mentzer, J.T. and J.E. Cox, 1984, Familiarity, application, and performance of sales forecasting techniques, Journal of Forecasting, 3, 27-36.
Miller, G.A., 1956, The magic number seven plus or minus two: some limits on our capacity for processing information, Psychological Review, 63, 81-97.
Mintzberg, H., 1975, The manager's job: folklore and fact, Harvard Business Review, July/August, 49-61.
Moriarty, M.M., 1990, Boundary value models for the combination of forecasts, Journal of Marketing Research, 27(November), 402-417.
Moriarty, M.M. and A.J. Adams, 1984, Management judgment forecasts, composite forecasting methods, and conditional efficiency, Journal of Marketing Research, 21(August), 239-250.
Mosteller, F., A.F. Siegal, E. Trapido and C. Youtz, 1981, Eye fitting straight lines, The American Statistician, 35(3), 150-152.
Murphy, A.H. and B.G. Brown, 1984, A comparative evaluation of objective and subjective weather forecasts in the United States, Journal of Forecasting, 3, 369-393.
Newbold, P. and C.W.J. Granger, 1974, Experience with forecasting univariate time series and the combination of forecasts (with discussion), Journal of the Royal Statistical Society, Series A, 137, 131-149.
Newbold, P., J.K. Zumwalt and S. Kannan, 1987, Combining forecasts to improve earnings per share prediction, International Journal of Forecasting, 3, 229-238.
O'Connor, M., 1989, Models of human behaviour and confidence in judgement, International Journal of Forecasting, 5, 159-169.
O'Connor, M., W. Remus and K. Griggs, 1993, Judgemental forecasting in times of change, International Journal of Forecasting, 9, 163-172.
Pankratz, A., 1989, Time series forecasts and extra-model information, Journal of Forecasting, 8(2), 75-84.
Phelps, R.H. and J. Shanteau, 1978, Livestock judges: how much information can an expert use? Organisational Behaviour and Human Performance, 21, 209-219.
Raiffa, H., 1968, Decision Analysis (Addison-Wesley, Reading, MA).
Reinmuth, J.E. and M.D. Guerts, 1972, A Bayesian approach to forecasting effects of atypical situations, Journal of Marketing Research, 9(August), 292-297.
Remus, W.E., 1984, An empirical investigation of the impact of graphical and tabular data presentations on decision making, Management Science, 30(5), 533-542.
Remus, W.E., 1987, A study of graphical and tabular displays and their interaction with environmental complexity, Management Science, 33(9), 1200-1204.
Remus, W.E., 1990, Will information systems research generalize to managers? Working paper.
Sanders, N.R., 1992, Accuracy of judgemental forecasts: a comparison, Omega International Journal of Management Science, 20(3), 353-364.
Sanders, N.R. and K.B. Manrodt, 1994, Forecasting practices in US corporations: survey results, Interfaces, 24(2), 92-100.
Sanders, N.R. and L.P. Ritzman, 1990, Improving short-term forecasts, Omega International Journal of Management Science, 18(4), 365-373.
Sanders, N.R. and L.P. Ritzman, 1992, The need for contextual and technical knowledge in judgemental forecasting, Journal of Behavioural Decision Making, 5, 39-52.
Schnaars, S.P. and I. Mohr, 1988, The accuracy of Business Week's industry outlook survey, Interfaces, 18(5), 31-38.
Schroder, H.M., M.J. Driver and S. Streufert, 1967, Human Information Processing (Holt, New York).
Silverman, B.G., 1992, Judgment error and expert critics in forecasting tasks, Decision Sciences, 23(5), 1199-1219.
Simon, H.A. and R.K. Sumner, 1968, Pattern in music, in: Kleinmuntz, ed., Formal Representation of Human Judgment (Wiley, New York), 219-250.
Sparkes, J.R. and A.K. McHugh, 1984, Awareness and use of forecasting techniques in British industry, Journal of Forecasting, 3, 37-42.
Taranto, G.M., 1989, Sales forecasting practices: results from an Australian survey, Unpublished thesis, University of N.S.W.
Timmers, H. and W.A. Wagenaar, 1977, Inverse statistics and misperception of exponential growth, Perception and Psychophysics, 21(6), 558-562.
Turner, D., 1990, The role of judgement in macroeconomic forecasting, Journal of Forecasting, 9, 315-346.
Tversky, A. and D. Kahneman, 1974, Judgement under uncertainty: heuristics and biases, Science, 185, 1124-1131.
Wagenaar, W.A. and S.D. Sagaria, 1975, Misperception of exponential growth, Perception and Psychophysics, 18(6), 416-422.
Walker, K.B. and L.A. McClelland, 1991, Management forecasts and statistical prediction model forecasts in corporate budgeting, Journal of Accounting Research, 29(2), 371-381.
West, M. and J. Harrison, 1989, Subjective intervention in formal models, Journal of Forecasting, 8, 33-53.
Willemain, T.R., 1989, Graphical adjustment of statistical forecasts, International Journal of Forecasting, 5, 179-185.
Willemain, T.R., 1991, The effect of graphical adjustment on forecast accuracy, International Journal of Forecasting, 7, 151-154.
Wind, Y., V. Mahajan and R.N. Cardozo, 1981, New Product Forecasting (Lexington Books, Lexington, MA).
Winton, E. and R.H. Edmundson, 1993, Trend identification and extrapolation in judgemental forecasting, Working paper, University of N.S.W.
Wolfe, C. and B. Flores, 1990, Judgemental adjustment of earnings forecasts, Journal of Forecasting, 9, 389-405.
Wood, R.E., 1986, Task complexity: definition of the construct, Organisational Behaviour and Human Decision Processes, 37, 60-82.

Biographies: Richard WEBBY is a lecturer in the School of Information Systems at the University of New South Wales. His research focuses on the development and empirical validation of software designed to support human judgement. His doctoral work examined the application of a forecasting support system to aid judgemental forecasters with both time series data and extra-model information.

Marcus O'CONNOR is a member of the academic staff of the School of Information Systems at the University of New South Wales. His research interests centre on the way people use information in making judgements. He typically uses the forecasting task as a vehicle to examine the ability of people to combine both causal and non-causal information in making forecasts.
