Omega: Çerağ Pinçe, Laura Turrini, Joern Meissner

Omega 105 (2021) 102513
Review
Intermittent demand forecasting for spare parts: A critical review ✩

Çerağ Pinçe a,∗, Laura Turrini b, Joern Meissner c
a Quinlan School of Business, Loyola University Chicago, 16 E. Pearson St., Chicago, IL 60611, USA
b EBS Business School, Burgstr. 5, 65375 Oestrich Winkel, Germany
c Kuehne Logistics University, Großer Grasbrook 17, 20457 Hamburg, Germany
Article info
Article history: Received 8 October 2019; Accepted 23 June 2021; Available online 3 July 2021
Keywords: Spare parts; Intermittent demand; Forecasting; Inventory control

Abstract
Spare parts demand forecasting has received considerable attention over the last fifty years as it is a challenging problem for many companies. This paper provides a critical review and quantitative analysis of the current literature on spare parts demand forecasting methods. First, we describe how different research streams in the literature have developed over time and review each stream extensively. Then, by gleaning information from the available studies, we carry out a quantitative analysis to provide granular insights into why and when a particular forecasting method should be preferred.

© 2021 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Contents
1. Introduction
2. Literature review framework
3. Time-series forecasting
  3.1. Parametric approaches
    3.1.1. Croston and its modifications
    3.1.2. Methods taking demand obsolescence into account
    3.1.3. Other parametric time-series methods
    3.1.4. Parametric bootstrapping
  3.2. Nonparametric approaches
    3.2.1. Nonparametric bootstrapping
    3.2.2. Other nonparametric methods
  3.3. Forecast improvement strategies
    3.3.1. Demand classification
    3.3.2. Data aggregation
4. Contextual forecasting
  4.1. Judgmental forecasting
  4.2. Installed base forecasting
5. Comparative studies
  5.1. Performance measures
    5.1.1. Forecast accuracy measures
    5.1.2. Inventory performance measures
  5.2. Performance comparisons
6. Quantitative literature analysis
  6.1. Methodology
  6.2. Results
    6.2.1. Comparison of Croston and SBA
    6.2.2. Comparison of Croston and SBA with traditional methods
    6.2.3. Comparison of Croston and SBA with new parametric methods
    6.2.4. Comparison of Willemain with new nonparametric methods
    6.2.5. Comparison of nonparametric and parametric methods
    6.2.6. Comparison of methods using installed base information
    6.2.7. Comparison of methods using data aggregation
7. Conclusion
Appendix
References

✩ This manuscript was processed by Associate Editor Prof. Ben Lev.
∗ Corresponding author.
E-mail addresses: cpince@luc.edu (Ç. Pinçe), laura.turrini@ebs.edu (L. Turrini), joern.meissner@the-klu.org (J. Meissner).
https://doi.org/10.1016/j.omega.2021.102513
0305-0483/© 2021 The Authors. Published by Elsevier Ltd.
1. Introduction

Many companies and after-sales service providers carry spare parts stocks to minimize downtime risks and assure equipment availability. The main challenge faced by these companies is to find the right balance between the inventory holding costs incurred by large inventories of slow-moving and intermittent spare part stocks, and the equipment downtime costs. For example, a two-hour delay for an aircraft can cost an airline up to $150,000 [1]; not surprisingly, the commercial aviation industry’s total spending on spare part stocks exceeds $10 billion annually [2].

At the heart of the cost-availability tradeoff challenge lies another critical problem: spare parts demand forecasting. Spare parts exhibit erratic, lumpy, or intermittent demand patterns involving long series of zero-demand periods. Traditional time-series methods, such as exponential smoothing or moving averages, often fail to provide accurate estimates for such demand patterns because they place weight on the most recent data points. Consequently, these methods generate fluctuating demand forecasts that gradually decrease to their lowest level before a demand occurs and jump to their highest level immediately after.

One of the earliest studies pointing out this difficulty and proposing an alternative approach is Croston [3]. In this seminal paper, Croston suggests forecasting the inter-demand time and demand size separately through exponential smoothing to overcome the issue of putting too much emphasis on the most recent observations. Since then, spare parts demand forecasting has received significant academic attention, and numerous methodologies and research streams have evolved to form a rich body of literature (Fig. 1).

In this paper, we provide a critical review and quantitative analysis of the current literature on spare parts demand forecasting methods. We describe how different research streams in the literature have developed over time and review each stream extensively. We then carry out a quantitative analysis by gleaning information from the available empirical studies to shed light on the performance of spare parts demand forecasting methods. We focus on the following questions: What is the overall performance of standard spare parts demand forecasting methods? How do traditional or newer methods perform compared to these standard methods? What are the factors explaining the performance differences between these methods? Why and when should a particular forecasting method be preferred?

To provide a rigorous review of the field, we concentrate only on spare parts demand forecasting papers. We do not review standard forecasting methods commonly used in supply chain contexts, such as exponential smoothing or the autoregressive integrated moving average (ARIMA) models. However, we occasionally mention some of these methods, as they also appear in the spare parts demand forecasting literature. We refer the reader to [4] for an excellent review of these methods. Similarly, while we extensively discuss the link between spare parts demand forecasting and stock control performance, we exclude spare parts inventory management methods from our review as they are beyond our paper’s scope. For an extensive review of these methods, we refer the reader to Hu et al. [5] and Basten and van Houtum [6]. In a similar vein, Topan et al. [7] give an overview of papers on operational interventions in spare parts planning.

Our paper’s scope is similar to that of Boylan and Syntetos [8], which is spare parts demand forecasting. Bacchetti and Saccani [9] also provide a literature overview on this theme; however, their focus is mainly on investigating the gap between research and practice through case studies. While our review may overlap with these and some other reviews mentioned above, our contribution differs from these earlier papers in two critical ways. First, we provide a current and in-depth examination of spare parts demand forecasting literature. Except for Boylan and Syntetos [8] and Bacchetti and Saccani [9], none of the other reviews specifically focus on spare parts demand forecasting, which has been rapidly growing over the last two decades with many new methods and research streams emerging. Second, we synthesize the literature through a quantitative analysis. To the best of our knowledge, our review is the first to carry out such an analysis of performance measures and methods developed for spare parts demand forecasting. As a result, we offer more precise insights into when a particular forecasting method should be used and identify interesting future research avenues.

2. Literature review framework

The spare parts demand forecasting literature can be subsumed under three major categories and several subcategories that closely follow the literature’s evolution over time. The framework of our literature review is summarized in Fig. 2.

The review framework is a natural classification of papers on spare parts demand forecasting literature and shares some similarities with the classification schemes offered in earlier reviews [e.g., 5,9]. Different from these previous schemes, we provide a separate section on comparative studies and treat demand classification and data aggregation methods as forecast improvement strategies. We also consider installed base and judgmental forecasting methods under the bigger category of contextual forecasting. Below we explain each category in more detail.

The first major category consists of papers on time-series forecasting methods (Section 3). Most time-series methods rely on historical data to generate demand forecasts and do not incorporate contextual information; however, they differ depending on whether they take a parametric or nonparametric approach. Parametric approaches often assume that the lead-time demand follows a known probability distribution (e.g., normal, Poisson). In contrast, nonparametric approaches derive the lead-time demand distribution from the data. Parametric approaches can also be divided into various categories depending on whether they propose a modification of Croston, incorporate demand obsolescence,
use statistical bootstrapping, or take a different approach, whereas the nonparametric approaches can be classified into bootstrapping methods and other nonparametric methods, such as the empirical method [10] and its extensions, and neural network models applied to spare parts demand forecasting. We review parametric and nonparametric time-series approaches and their subcategories in Sections 3.1 and 3.2.

Fig. 1. Evolution of spare parts demand forecasting literature.
Fig. 2. Literature review framework.

Two research streams that branched out of time-series forecasting methods focus on forecast improvements by demand classification or data aggregation. Demand classification schemes categorize data as smooth, erratic, lumpy, or intermittent based on various demand characteristics and recommend the best performing forecasting method for each demand type. Data aggregation methods, on the other hand, aim to reduce demand variability by aggregating data at various levels and combining different forecasting methods. We refer to these techniques collectively as forecast improvement strategies and review them in Section 3.3.

The second major category involves methods that systematically combine contextual information with statistical forecasting techniques to rapidly respond to unexpected or structural changes in demand (Section 4). Contextual information can be broadly categorized as the additional information derived from an expert’s opinion or the installed base’s condition that can be potentially helpful in estimating spare parts demand. The papers focusing on expert judgment investigate whether judgmental adjustments improve statistical forecasts. In contrast, the papers considering installed base information study how contextual factors such as maintenance schedules, equipment age, or operating conditions influence spare parts demand. We review both research streams in Sections 4.1 and 4.2.

The third major category consists of comparative studies providing performance benchmarks for traditional and alternative spare parts demand forecasting methods (Section 5). They carry out extensive empirical tests by using industrial or simulated data sets and measure the methods’ performances based on forecast accuracy or inventory performance. In these comparative studies and in the rest of the literature, the tests often use several measures to give a sense of each particular method’s overall performance. However, there seems to be no clear consensus on the set of best performance measures for spare parts demand forecasting methods. While most studies use forecast accuracy measures, others show that inventory performance measures yield more realistic benchmarks [e.g., 11]. Thus, to investigate whether there are commonly used performance measures in the literature, we carry out a quantitative analysis of the forecast accuracy and inventory performance metrics reported and briefly explain their main characteristics (Section 5.1). We then review the comparative studies in detail (Section 5.2).

The remainder of this paper is organized as follows: Section 3 reviews the time-series forecasting methods. Section 4 reviews the contextual forecasting methods. Section 5 provides an overview of the performance measures
commonly used in the literature and reviews the comparative studies. Section 6 discusses the insights obtained from the literature analysis. Section 7 concludes with an outlook on research opportunities.

3. Time-series forecasting

Inventory management for any item begins with estimating the mean and variance of the lead-time demand and then fitting a demand distribution to these two parameters. This information is typically used to calculate the optimal safety stock levels to achieve a specific availability target (e.g., fill rate, cycle service level, customer waiting time) or minimize the total system cost. The forecasting step in this two-step procedure is usually carried out with a time-series forecasting method. Time-series forecasting methods are prevalent in practice due to their simplicity and ease of implementation. They rely mainly on historical data and do not attempt to incorporate contextual information (e.g., expert judgments, product characteristics, maintenance information) to identify the drivers of spare parts demand. Consequently, they require less effort for data collection and can be easily automated by using data readily available in ERP systems.

Because of their prevalence and practicality, there is a large body of literature on time-series methods that have been developed to estimate spare parts demand. These techniques can be broadly categorized as parametric, nonparametric, and forecast improvement approaches. In the following subsections, we systematically review these methods by further categorizing them into more focused research substreams to present the state of the art in time-series-based spare parts demand forecasting techniques.

3.1. Parametric approaches

In parametric approaches, demand is assumed to follow a hypothesized probability distribution, and the mean and variance of the demand are estimated with a forecasting method. Once demand is characterized by a probability distribution, inventory policy metrics (e.g., reorder points, safety stocks) are calculated by extrapolating this characterization to lead-time demand. While in principle, this approach can be used with any type of data and forecasting method, classic techniques, such as simple exponential smoothing (SES), generate inaccurate estimates for intermittent demand. This is because the updating mechanisms in these classic methods are not tailored for long streaks of zero-demand periods, as are typically observed in intermittent demand patterns. Croston [3] was the first paper to investigate this problem and develop a new forecasting method for intermittent demand, which has since become known as Croston’s method or, shortly, Croston. Over the years, many scholars have proposed modifications of Croston, and it has become one of the standard performance benchmarks for spare parts demand forecasting methods. In Section 3.1.1, we provide a detailed explanation of Croston and review its modifications.

Another critical characteristic of spare parts demand is obsolescence. Spare parts demand can vanish, and due to the prevalence of zero-demand periods, it may take a long time before a forecaster detects obsolescence. Such a delay in obsolescence detection, however, is costly, as stocks of nonmoving, expensive spare parts incur high holding costs (due to tied capital, warehousing, handling, etc.) without contributing to the service level [e.g., 12,13]. Due to the significance of this problem, there is a growing research stream on intermittent demand forecasting methods that take demand obsolescence risk into account; we review these papers in Section 3.1.2. In Sections 3.1.3 and 3.1.4, we review other parametric methods focusing on different aspects such as cross-correlations between demand intervals and sizes, random shifts in demand distribution’s mean, or estimating the hypothesized demand distribution’s parameters through bootstrapping.

3.1.1. Croston and its modifications

Although exponential smoothing is widely used in practice, its application to intermittent demand has a significant drawback, as pointed out by Croston [3]. Exponential smoothing is biased since forecasts are generated during positive-demand periods, referred to as issue points, and the bias depends on the average interarrival time. Consequently, it produces a sawtooth forecast pattern when used for estimating intermittent demand (Fig. 3a). That is, exponentially smoothed forecasts generated at issue points always overestimate actual demand, and this error increases during the inter-demand interval. Croston [3] shows this behavior analytically and, as a solution, suggests decomposing the demand process into two components, inter-demand time and demand size, and forecasting each component separately with exponential smoothing. The forecasts are updated only during periods with positive demand and are combined to estimate the demand per unit of time.

More specifically, if $s_t$ denotes the inter-demand time and $z_t$ denotes the demand size observed in period $t$, Croston can be given as

if $z_t = 0$: $\hat{s}_t = \hat{s}_{t-1}$, $\hat{z}_t = \hat{z}_{t-1}$
if $z_t \neq 0$: $\hat{s}_t = \alpha s_t + (1 - \alpha)\hat{s}_{t-1}$, $\hat{z}_t = \alpha z_t + (1 - \alpha)\hat{z}_{t-1}$,

where $\hat{s}_t$ and $\hat{z}_t$ are the one-step-ahead forecasts of inter-demand time and demand size generated at the end of period $t$. Then, the demand forecast is equal to $\hat{Y}_t = \hat{z}_t / \hat{s}_t$. Croston compresses zero-demand periods and updates forecasts only when there is positive demand. Consequently, it generates smoother estimates with less error variation, as shown in Fig. 3a, and thereby leads to lower safety stocks for the same service level. We illustrate this point with the following constructed example.

Example. Consider the generated intermittent demand data presented in Fig. 3b. Demand during the replenishment lead time ($L$) is assumed to be normally distributed, and the inventory system is controlled by an order-up-to policy. For a given cycle service level $\rho$, the optimal order-up-to level can be given as $S = \mu(L) + \sigma(L)\,\Phi^{-1}(\rho)$, where $\mu(L)$ and $\sigma(L)$ are the mean and standard deviation of the lead-time demand, and $\Phi$ is the standard normal distribution function. The mean and standard deviation of lead-time demand are computed as $\mu(L) = \hat{Y}_t L$ and $\sigma(L) = \mathrm{RMSE}_t \sqrt{L}$, in which $\hat{Y}_t$ denotes the demand forecast generated by SES or Croston, and $\mathrm{RMSE}_t$ denotes the root mean squared error. The results of the inventory system simulation with the forecasts generated by the two methods and an 85% target cycle service level are summarized in Table 1. As Table 1 shows, Croston achieves a lower on-hand inventory than SES at the same cycle service level due to its smaller forecast errors calculated by the mean absolute error (MAE) and RMSE.

Fig. 3. Exponential smoothing and Croston.

Table 1
Comparison of inventory performance between SES and Croston.

                             SES      Croston
Average on-hand inventory    477      456
MAE                          74.26    65.73
RMSE                         133.9    126.6

Syntetos and Boylan [14] point out the bias in Croston and propose a modified procedure in which the demand forecast is given as $\hat{Y}_t = \hat{z}_t / (\hat{s}_t\, c^{\hat{s}_t - 1})$ for a given $c$. For the method to be theoretically unbiased, $c$ should be infinite, but a good approximation is obtained when $c$ is set to approximately 100. Syntetos and Boylan [15] provide an analytical approximation of the bias in Croston and propose another forecasting method to correct this bias. In the literature, this method is often referred to as the Syntetos-Boylan approximation (SBA). According to SBA, the demand forecast is given by $\hat{Y}_t = (1 - \alpha/2)\,\hat{z}_t / \hat{s}_t$, where $(1 - \alpha/2)$ is the bias correction coefficient with smoothing constant $\alpha$. Using a dataset from the automotive industry, Syntetos and Boylan [15] show that SBA gives more accurate results than Croston, SES, and the simple moving average for fast intermittent demand.

Our quantitative literature analysis (Section 6) shows that SBA consistently generates more accurate forecasts than Croston for spare parts demand data. However, in inventory measures, Croston outperforms or ties with SBA depending on the data type. For highly intermittent and decreasing demand patterns, Croston outperforms SBA, whereas for demand patterns with moderate to low intermittence and variability, SBA leads to slightly better inventory performance than Croston. We discuss the insights behind these observations in more detail in Section 6.

Levén and Segerstedt [16] extend Croston for both slow- and fast-moving items by introducing a somewhat simpler procedure in which the ratio between the last observed demand quantity and the demand interval is smoothed with the previous period’s demand rate forecast. As in Croston, the forecasts are updated only at the end of those periods in which demand occurs. The authors claim that their method has better inventory performance than SES. However, Boylan and Syntetos [17] show that this method has higher bias than Croston or SES. Teunter and Sani [18] conduct a more general investigation of the bias in Croston and its modifications proposed by Syntetos [19], Levén and Segerstedt [16], and Syntetos and Boylan [15]. Their numerical study shows that the method of [16] has the largest bias, whereas the method of [19] has the lowest bias. The study also reveals that the biases in different methods are mainly determined by the smoothing parameter and demand probability but are little affected by the demand type.

3.1.2. Methods taking demand obsolescence into account

In the context of spare parts forecasting and inventory control, demand obsolescence means that a spare part is not needed or demanded anymore. This can occur gradually or suddenly, depending on the operating environment of the spare parts. For example, sudden obsolescence happens if the equipment for which the part is needed is relocated or taken out of operation somewhat unexpectedly, or if a newer version of a part replaces the old version due to an upgrade. In contrast, a more gradual decrease in a part’s demand happens as the equipment enters its end-of-life phase and new generation equipment begins to dominate the market. In either case, demand obsolescence is an important problem for many companies carrying spare parts inventories as it often leads to expensive dead stocks that are difficult to dispose of due to their specificity [e.g., 12,13]. Thus, responding to demand obsolescence, preferably before or quickly after it occurs, is crucial for efficient spare parts management.

Teunter et al. [21] note that Croston and its modifications are slow to adjust to new demand levels when demand gradually decreases or suddenly becomes zero. To address these issues, they introduce a new method, referred to as TSB, that combines demand size forecasts with demand probability forecasts instead of demand interval forecasts. Another difference between the new method and Croston and SBA is that it updates the demand probability estimate in every period. In contrast, the latter methods update their estimates only after a demand occurrence. Consequently, forecasts generated by TSB are adjusted downward when there are no demand occurrences and can react to obsolescence more quickly. Through a simulation study, Teunter et al. [21] show that TSB is more accurate than SES, Croston, and SBA in terms of the mean error (ME) and the mean squared error (MSE) for both stationary and nonstationary demand patterns. However, they also note that further research is needed to empirically test the performance of TSB against the other methods. Indeed, in a follow-up empirical paper, which uses the UK Royal Air Force (RAF)¹ and automotive datasets, Babai et al. [26] show that TSB does not lead to significantly more accurate forecasts than the other methods. Such performance differences between experiments carried out with industrial and simulated datasets are not uncommon in the literature. As we discuss in detail in Section 6.2, simulation studies are typically tailored to specific purposes and cannot capture many of the factors affecting real-life demand patterns, such as maintenance regimes, differences between Original Equipment Manufacturers (OEMs) and the end-users of spare parts, or unplanned batch ordering of parts by the end-users.

Building on these earlier works, Babai et al. [27] propose a new method, referred to as modified SBA, that addresses TSB’s accuracy shortcomings with industrial datasets. The modified SBA’s forecast updates are similar to those of SBA during positive demand periods, but when the risk of obsolescence increases, the

¹ This is the single most commonly used dataset in the spare parts demand forecasting literature. The original dataset consists of monthly demand observations of 5000 spare parts (SKUs) over seven years. As this original dataset has been used in many other studies [e.g., 11,22,23], in the rest of our review, we will refer to it simply as the RAF dataset. Two exceptions are Eaves and Kingsman [24] and Mohammadipour and Boylan [25]. These papers also report using a dataset from the UK RAF, but these datasets involve a larger number of SKUs than the original RAF dataset.
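As an illustration of the updating rules discussed in Sections 3.1.1 and 3.1.2, the following Python sketch implements SES, Croston, SBA, and TSB for a short demand series. It is a minimal sketch and not the implementation used in any of the reviewed papers: the function names, the initialisation choices, and the demand series are assumptions made here for illustration. The TSB recursions (a demand probability smoothed in every period with a constant `beta`, and a demand size smoothed only at demand occurrences with `alpha`) follow the form commonly given for that method; the text above describes them only in words.

```python
from typing import List


def ses(demand: List[float], alpha: float) -> List[float]:
    """Simple exponential smoothing, updated in every period."""
    forecast = 0.0  # assumption: start from a zero forecast
    out = []
    for y in demand:
        forecast = alpha * y + (1 - alpha) * forecast
        out.append(forecast)
    return out


def croston(demand: List[float], alpha: float, sba: bool = False) -> List[float]:
    """Croston forecasts z_hat / s_hat; sba=True applies the factor (1 - alpha / 2)."""
    z_hat = s_hat = None   # smoothed demand size and inter-demand time
    periods_since = 1      # periods elapsed since the last positive demand
    factor = 1 - alpha / 2 if sba else 1.0
    out = []
    for y in demand:
        if y != 0:
            if z_hat is None:
                # assumption: initialise both components at the first issue point
                z_hat, s_hat = float(y), float(periods_since)
            else:
                z_hat = alpha * y + (1 - alpha) * z_hat
                s_hat = alpha * periods_since + (1 - alpha) * s_hat
            periods_since = 1
        else:
            periods_since += 1  # zero-demand period: estimates stay unchanged
        out.append(0.0 if z_hat is None else factor * z_hat / s_hat)
    return out


def tsb(demand: List[float], alpha: float, beta: float, p0: float, z0: float) -> List[float]:
    """TSB forecasts p_hat * z_hat; p0 and z0 are assumed starting values."""
    p_hat, z_hat = p0, z0
    out = []
    for y in demand:
        occurred = 1.0 if y != 0 else 0.0
        p_hat = p_hat + beta * (occurred - p_hat)  # updated in every period
        if y != 0:
            z_hat = z_hat + alpha * (y - z_hat)    # updated at demand occurrences only
        out.append(p_hat * z_hat)
    return out


demand = [0, 0, 6, 0, 0, 0, 4, 0]  # made-up intermittent series
f_ses = ses(demand, alpha=0.5)
f_croston = croston(demand, alpha=0.5)
f_sba = croston(demand, alpha=0.5, sba=True)
f_tsb = tsb(demand, alpha=0.5, beta=0.5, p0=0.5, z0=6.0)
```

On a series like this one, the Croston and SBA forecasts stay constant between demand occurrences, whereas the SES and TSB forecasts decay through each run of zero-demand periods, which is the behavior the text attributes to each method.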
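The order-up-to calculation in the constructed example of Section 3.1.1, $S = \mu(L) + \sigma(L)\,\Phi^{-1}(\rho)$ with $\mu(L) = \hat{Y}_t L$ and $\sigma(L) = \mathrm{RMSE}_t \sqrt{L}$, can be sketched in a few lines of Python. The RMSE values and the 85% service level are taken from the example and Table 1; the point forecast and lead time are hypothetical, since the text does not state them, and the function name is an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist


def order_up_to_level(forecast: float, rmse: float, lead_time: float,
                      service_level: float) -> float:
    """Return S = mu(L) + sigma(L) * Phi^{-1}(rho) under normal lead-time demand."""
    mu_l = forecast * lead_time              # mu(L) = Y_hat * L
    sigma_l = rmse * sqrt(lead_time)         # sigma(L) = RMSE * sqrt(L)
    z = NormalDist().inv_cdf(service_level)  # standard normal quantile
    return mu_l + sigma_l * z


# RMSE values are those reported in Table 1; the forecast (20.0) and the
# lead time (4 periods) are hypothetical placeholders.
s_ses = order_up_to_level(forecast=20.0, rmse=133.9, lead_time=4, service_level=0.85)
s_croston = order_up_to_level(forecast=20.0, rmse=126.6, lead_time=4, service_level=0.85)
print(round(s_ses, 1), round(s_croston, 1))
```

Holding the point forecast fixed isolates the effect of the error term: the method with the smaller RMSE needs a lower order-up-to level for the same cycle service level, which is the mechanism behind the lower on-hand inventory reported for Croston.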
updates become similar to those of TSB. Using the RAF and automotive datasets, they show that the modified SBA outperforms all other methods (SES, Croston, SBA, TSB) in terms of bias (ME) and MSE. Especially for spare parts associated with decreasing demand, the modified SBA yields higher benefits due to increased accuracy. Their study also highlights that the implementation of the Croston and SBA methods in enterprise resource planning systems should be carried out carefully, as these methods cannot deal well with obsolescence or decreasing demand patterns.

Following Teunter et al. [21], Prestwich et al. [28] propose an alternative method that incorporates sudden obsolescence risk, which is referred to as hyperbolic exponential smoothing (HES). HES is similar to TSB, but the forecasts decay hyperbolically instead of exponentially. Using simulated data, the authors show that TSB yields the most accurate estimates of demand patterns with sudden obsolescence. This is because the hyperbolic decay in HES during zero demand periods captures obsolescence more slowly than the exponential decay in TSB. On the other hand, HES is more robust to changes in smoothing factors than TSB, making it a more straightforward method to calibrate in practice, as there is often limited history for intermittent demand.

In some practical situations, it is also important to identify parts with obsolescence risk. Such capability would help to reduce inventory costs significantly, especially for expensive and slow-moving spare parts. van Jaarsveld and Dekker [12] study this problem by using the spare parts demand data of a complex technological product. They model the part obsolescence process by a two-state absorbing Markov chain in which the states represent healthy and obsolete demand cases. The parts are grouped based on their demand over a specific period, and their obsolescence risk is estimated using the past behavior of similar groups. They conclude that for slow-moving items with high risk of obsolescence the company should carry lower stocks.

3.1.3. Other parametric time-series methods

Pennings et al. [29] present a method that incorporates positive cross-correlation between interarrival times and demand sizes to anticipate incoming spare parts demand. They also introduce a variant of Willemain et al.’s [30] bootstrapping method in which the probability of a demand occurrence is approximated by using the empirical distribution of demand occurrences during the lead time. By using five different industrial datasets, they benchmark their method against standard methods (SES, Croston, SBA, [21,30]) according to forecast accuracy and inventory performance. In the forecast accuracy measures, SBA is found to be the overall best performer; in inventory measures, the proposed method outperforms the other methods if there is a strong positive cross-correlation between the demand size and the interarrival time.

Snyder et al. [31] develop a new method incorporating possible random shifts in the mean of the demand distribution. The method models the spare parts demand distribution with various counting distributions (Poisson, negative binomial, and hurdle shifted Poisson) for which the mean can change over time through a recurrence relationship. The recurrence relations update the mean by smoothing the previous period’s mean and the demand realizations. These dynamic shifts in the mean reflect the effects of possible structural changes in the underlying demand distribution (e.g., phases of the product life cycle, obsolescence, seasonal demand). Successive future demand realizations are sampled from these prediction distributions to obtain an estimate of future demand series. This new method is compared to static demand distribution models and traditional forecasting methods by using a database on car parts demand.

The results show that these dynamic models can yield substantial accuracy gains over the alternatives. However, they are technically involved (require advanced knowledge of statistics and simulation) and less intuitive than traditional methods such as Croston or SBA. Thus, it might be challenging to explain these models to spare parts demand management professionals. On the other hand, the paper also shows that if a simpler static distribution is preferred for modeling spare parts demand, the best option is the negative binomial distribution, as it allows for greater variability. In contrast, the Poisson distribution is too restrictive and should be avoided.

Following Snyder et al. [31], Jiang et al. [32] propose another distribution-fitting forecasting method that reflects the changes in the underlying demand process using a mixed zero-truncated Poisson hurdle model. The empirical analysis conducted with data from an electric power company shows that the proposed method outperforms Croston, the Poisson model, the hurdle Poisson model, and the hurdle shifted Poisson model in all forecast accuracy measures. While both Snyder et al. [31] and Jiang et al. [32] offer interesting and promising parametric approaches to forecasting intermittent demand, more research is needed to better assess the performance of these prediction-distribution methods in comparison to other parametric and nonparametric methods.

3.1.4. Parametric bootstrapping

Parametric bootstrapping is a simulation technique that uses estimated distribution parameters to generate lead-time demand data. In general, bootstrapping techniques are useful when direct estimation of the policy parameters is difficult due to a lack of historical data or extreme irregularity in the shape of the demand distribution, which makes it hard to directly describe the demand distribution with a parametric distribution. Parametric bootstrapping, however, is not entirely distribution-free; a certain family of demand distributions (e.g., normal, log-normal) must be assumed to conduct parametric bootstrapping. Nonparametric bootstrapping methods, which do not assume a specific demand distribution, are discussed in Section 3.2.1. Below and in Section 3.2.1, we briefly review the bootstrapping methods developed for spare parts demand forecasting. A more detailed review of these methods can be found in Hasni et al. [33].

Snyder [34] introduces two parametric bootstrapping methods based on Croston, referred to as the log-space adaptation and the adaptive variance version. These methods use demand history to produce least-squares estimates of the mean and standard deviation of demand and the SES’s smoothing parameter. The empirical distribution of lead-time demand is constructed through repeated simulations and used to compute order-up-to levels. Numerical tests conducted with a dataset from the automotive industry show that the adaptive variance method performs best by achieving the fill-rate target with the lowest inventory. However, the paper acknowledges that the proposed parametric bootstrap approaches ignore the effects of estimation error. Thus, they may have a tendency to underestimate the variability of lead-time demand and to suggest lower safety stocks than necessary.

Varghese and Rossetti [35] propose another parametric bootstrapping algorithm in which demand occurrences are generated by a two-state Markov chain and demand sizes are generated by using a Poisson, negative binomial, or mixed distribution. The authors compare their method with Croston and SBA using the accuracy measures MSE, MAE, and the mean absolute percentage error (MAPE). They conclude that there is no outperformer among the methods, but SBA performs slightly better than the others. Hasni et al. [36] take an inventory performance perspective and focus on the service levels achieved by parametric and bootstrapping methods. They modify the bootstrapping methods proposed in Willemain et al. [30] and Zhou and Viswanathan [37] by adjusting the way those algorithms sample lead-time demand. Using the RAF dataset and a jewelry item dataset, they show that their modified
methods lead to considerable service-level improvements, but the underestimation problem for high fill-rate targets remains.

The quantitative analysis presented in Section 6 further synthesizes the performance results of the newer parametric methods discussed above (Sections 3.1.2–3.1.4). Our analysis shows that in forecast accuracy measures, the newer parametric methods perform better than Croston for almost all data types. For automotive and electronics datasets, SBA has an accuracy advantage over newer methods. However, for military, military aviation, and electrical parts, the results are less conclusive and depend on the specific accuracy measures used.

In terms of inventory performance, both SBA and Croston seem to be better choices than new or classic parametric methods. In particular, for industries with high target service levels and moderate to high demand intermittency (e.g., railway, military, and military aviation), Croston outperforms newer methods due to its tendency to yield higher service levels. On the other hand, for automotive datasets, which show moderate intermittency and downward trends, the inventory performances of SBA, Croston, and the newer methods are similar.

3.2. Nonparametric approaches

All of the parametric methods discussed so far assume that lead-time demand follows a particular probability distribution. Methods following the nonparametric approach, on the other hand, do not assume a specific lead-time demand distribution but reconstruct their empirical distributions, typically through a bootstrapping procedure. Consequently, nonparametric methods are more flexible and can be used with any kind of demand distribution, including those that are difficult to describe with a parametric distribution family [38].

In practice, complex demand patterns often occur as a result of maintenance policies. For example, car tires are replaced in pairs to have the same wear profile. Similarly, all seals connected to a pump are usually replaced together to have a lower maintenance cost and downtime risk. Nonparametric methods can accommodate such nonsmooth demand patterns relatively easily. However, for homogeneous demand sizes, bootstrapping methods should be applied with caution because simple sampling methods can generate empirical distributions leading to poor inventory performance [11].

In this section, we first review the nonparametric bootstrapping approaches and present their advantages over the classic parametric approaches. We then continue by reviewing other distribution-free techniques, such as Porras and Dekker’s [10] empirical method and neural network models, and identify their advantages over bootstrapping and the well-known parametric methods.

3.2.1. Nonparametric bootstrapping

The classic bootstrapping method, which was introduced by Efron [39] and applied to inventory management by Bookbinder and Lordahl [40], has frequently been used in the intermittent demand context. These bootstrapping methods sample from past demand realizations until $M$ bootstrapped statistics of the lead-time demand are obtained and then order them to calculate the empirical distribution, denoted by $\hat{F}$. That is, if $1 - \alpha$ is the target service level, the appropriate reorder level is calculated as the $(1 - \alpha)$-quantile of the empirical distribution, $\hat{F}^{-1}(1 - \alpha)$. Fricker and Goodhart [41] propose a variation of the classic bootstrapping method to address the problem of scarce historical data, a common issue with intermittent demand. This method assumes a stochastic lead time and produces $\hat{F}$, as in classic bootstrapping. Each lead-time demand observation is created for a different lead-time value, sampled from its distribution, and the reorder point is set to minimize the Euclidean distance between the empirical stockout risk ($\hat{\bar{F}}(z)$) and the target stockout risk ($\alpha$). They compare their method with Bookbinder and Lordahl’s [40] classic bootstrapping method and another approach, referred to as the normal approximation technique. They find that the two bootstrapping methods result in higher service levels than the normal approximation technique, but the performance difference between their method and the classic bootstrapping methods is insignificant.

Willemain et al. [30] modify the classic bootstrapping method to better model intermittent inventory data with autocorrelation, frequently repeated values, and relatively short time series. To capture the autocorrelation, the authors use a two-state Markov process to model zero and nonzero demands and evaluate the transition probabilities directly from the data using the counting techniques presented in Mosteller and Tukey [42]. The transition probabilities are used to generate a sequence of zeros and ones in which the ones indicate positive demand realizations. Demand sizes are then generated through a jittering procedure in which the sampled demand values are modified by introducing random variation. Through an empirical study based on multiple industrial datasets, they show that their bootstrapping method generates more accurate results than SES or Croston.

Zhou and Viswanathan [37] introduce another version of bootstrapping that generates demand intervals through bootstrapping and demand sizes through sampling. They compare this method with SBA in terms of inventory performance using simulated and empirical data. They find that the proposed method outperforms SBA for the simulated data; however, SBA outperforms the proposed method if the comparisons are carried out with the empirical data. In a follow-up paper, Hasni et al. [23] conduct a more extensive comparative study and confirm that SBA has better inventory performance than Zhou and Viswanathan’s [37] bootstrapping method for highly intermittent demand and short lead times. On the other hand, the latter outperforms SBA and Willemain for moderately intermittent demand and long lead times.

Kocer [43] generalizes Willemain et al.’s [30] method by using higher-order Markov chains. They show that their proposed method decreases the MAPE of the lead-time demand forecast, but it performs similarly to Croston if the accuracy measure is the mean absolute scaled error (MASE) of the period demand.

3.2.2. Other nonparametric methods

Porras and Dekker [10] introduce a new nonparametric method in which the empirical distribution of the lead-time demand is directly constructed from the data without sampling. This method is simpler than bootstrapping methods and is often referred to as the empirical method. An empirical study carried out with a dataset supplied by a petrochemical processing company shows that the empirical method performs better than Willemain in terms of inventory cost, as the latter estimates slightly higher reorder points to achieve the same service level. However, they also find that the simple parametric normal-distribution method outperforms both nonparametric methods.

Van Wingerden et al. [44] extend the empirical method by incorporating randomness into lead times. An empirical study based on three different industrial datasets reveals that the proposed method’s inventory performance regarding the holding cost and fill rate tradeoff is slightly better than that of the empirical method. However, the proposed method often performs worse than SBA, except in the case of expensive spare parts with very slow-moving and highly variable demand. In a similar vein, Zhu et al. [45] propose a modification of the empirical method by applying extreme value theory to model the tail of the lead-time demand distribution. Through simulations and an empirical study, they show that the modification leads to shorter expected waiting times, higher cycle service levels, and better target service level achievement than the empirical method. On the other hand, comparisons of this
method’s service level performance with that of other alternatives (Willemain, Croston, SBA) are inconclusive.

Another developing category of nonparametric methods involves papers applying machine learning techniques to spare parts demand forecasting. The general idea behind these papers is to learn demand patterns directly from the data through a supervised learning algorithm, such as neural networks. Neural networks are considered a versatile tool that can capture nonlinear patterns in the data, such as intermittence and lumpiness, better than most time-series methods. One of the earliest studies in this stream is by Gutierrez et al. [46]. They compare their neural network model with SES, Croston, and SBA using an electronic products distributor’s data and show that it generally produces more accurate forecasts than the other methods unless the training and test datasets have significantly different average demand sizes. By using the same dataset, Mukhopadhyay et al. [47] compare a modified version of Gutierrez et al.’s [46] method with traditional methods (SES, Croston, SBA, weighted moving average). The comparisons show that the modified neural network method outperforms the other methods according to the MAPE and the median relative absolute error (MdRAE).

Kourentzes [48] generalizes Gutierrez et al.’s [46] method by incorporating three different network settings and the Levenberg-Marquardt algorithm to improve the training speed. In contrast to Gutierrez et al. [46], they find that the neural network method performs worse in terms of accuracy measures (ME and MAE) and better in terms of achieved service levels than the Croston-type methods. They partially attribute this contradiction to the differences between the datasets used by both studies (a short monthly intermittent demand series vs. a long lumpy demand series). They also argue that forecast accuracy metrics lead to misleading findings for intermittent demand data.

Lolli et al. [49] propose a neural network with a simpler and faster learning algorithm, referred to as an extreme learning machine. They benchmark its performance against Gutierrez et al.’s [46] and Mukhopadhyay et al.’s [47] methods using an automotive dataset. They show that the neural networks using back-propagation learning [46,47] perform better than the extreme learning machine in terms of MAPE, but the latter is easier and faster to implement. Guo et al. [50] combine a genetic neural network model with three exponential smoothing variants and a hierarchical forecasting method. By using aircraft spare parts data, the authors show that the combined method generates more accurate forecasts than each method’s direct forecasts.

To generate more insights into the value of nonparametric methods, we analyze the results of the performance comparisons between the parametric and nonparametric methods, as described in Section 6. In summary, our analysis reveals that, on average, Willemain’s bootstrapping method is more accurate than the newer nonparametric methods; however, it can perform poorly depending on the data type and the method with which it is compared. For lumpy demand series with large spikes, Willemain tends to selectively pick these large demand values to forecast the period’s demand, thereby leading to higher service levels. Thus, for datasets characterized by a high degree of lumpiness (e.g., military aviation and automotive datasets), Willemain tends to yield better inventory performance. On the other hand, for low or moderately lumpy demand series, newer nonparametric methods should be preferred for better inventory performance, as Willemain often fails to achieve the target service level.

We also find that nonparametric methods generally have better forecast accuracy and inventory performance than parametric methods. However, the comparison results are scattered, and some studies report divergent findings for the same datasets. Thus, the advantage of using these relatively more complicated methods is not apparent due to their potentially higher implementation cost.

3.3. Forecast improvement strategies

Spare parts have different underlying demand characteristics requiring different estimation approaches. Understanding these characteristics can be useful because a method’s performance may depend on them. In this section, we begin by reviewing papers that focus on demand classification and distribution fitting methods. The primary purpose of these papers is to identify the best-performing forecasting method for given demand characteristics to facilitate stock control. We then continue by reviewing studies that focus on forecast improvements through data aggregation. These papers aim to reduce the variability in spare parts demand (emerging from many periods of zero demand and highly variable demand sizes) by grouping the data by time, demand volume, item characteristics, the forecasting methods used, or a combination of these alternatives. Below, we review these forecast improvement strategies in two separate sections.

3.3.1. Demand classification

The objective of demand classification methods is to match the demand characteristics with the most appropriate estimation procedure to improve forecasting and stock control. One of the earliest studies on this topic is Williams [51], in which product demand is categorized as sporadic, slow-moving, or smooth by decomposing lead-time demand variance into causal elements. Each demand category is then fitted with a specific distribution to calculate reorder levels using data on a public utility. Numerical comparisons show that the proposed demand classification scheme leads to a substantial reduction in inventory cost compared to an assumption of continuous demand for all products.

Willemain et al. [52] conjecture that for data with too much or too little intermittency, Croston should not have any significant accuracy improvement over SES. Johnston and Boylan [53] provide quantitative evidence for this conjecture by proposing a forecasting method based on estimates of the mean and variance of demand size and the average inter-demand interval. Their numerical study shows that if the average inter-demand interval exceeds 1.25 review periods, the proposed method generates more accurate forecasts than the exponentially weighted moving average (EWMA). They also define the average inter-demand interval ($p$) as a demand classification parameter.

Syntetos et al. [54] extend Johnston and Boylan’s [53] work by suggesting an additional demand classification parameter, the squared coefficient of variation of demand ($CV^2$). They analytically compare the MSEs of Croston, SBA, and EWMA to derive cutoff values for $CV^2$ and $p$. These cutoff values are then used as a two-dimensional demand classification scheme to find the most accurate forecasting method. The classification scheme’s validity is confirmed using a large number of demand series from the automotive industry. Kostenko and Hyndman [55] criticize this study by claiming that SBA yields smaller MSEs than Croston in some of the instances reported by Syntetos et al. [54] and propose a more accurate cutoff value for determining when SBA should be preferred over the other methods.

Building on these earlier works, Boylan et al. [56] examine the demand classification scheme’s stock control implications. Using data provided by a company developing demand planning software, they show that zero-demand-period counts can be an alternative classification parameter. However, the forecasting methods suggested under this new classification scheme fail to achieve the target service levels for lumpy demand. Syntetos et al. [57] extend this line of research by empirically investigating the link between the classification parameters ($p$ and $CV^2$) and the fit between theoretical demand distributions and industrial datasets. They recommend heuristic rules for the best theoretical demand distribution for inventory control based on these empirical observations. In addition to these works, other papers test Syntetos et al.’s [54] classification scheme in other industrial contexts, such as the automotive [58] and commercial aviation [59–62] industries.

Several other papers offer alternative approaches to the demand classification problem. Lengu et al. [63] provide an empirical analysis to assess whether compound Poisson distributions provide a good fit for spare parts demand and propose a demand classification scheme categorizing SKUs based on the mode and variability of the observed demand sizes. Empirical tests using industrial datasets (household appliances, commercial aviation) show that the compound Poisson distributions considered by the classification scheme generally provide a good fit for intermittent demand items. In a similar vein, [64] investigate the best-fitting distributions for spare parts demand. They combine SES, Croston, and SBA with various lead-time demand distributions to create prediction distributions, which are then used in an empirical inventory simulation with the RAF dataset. They show that SBA has the best overall inventory performance and that distributions fitted to high demand percentiles yield the largest inventory savings.

Petropoulos et al. [65] investigate the main determinants of forecast accuracy based on eight characteristics (seasonality, trend, cycle, randomness, number of observations, $p$, $CV^2$, forecasting horizon) and propose a practical selection procedure that matches these data characteristics to the most appropriate forecasting method. Through an extensive simulation study, they find that when $p$ is long, TSB and a four-period moving average outperform the other standard methods (naïve², SES, Croston, SBA) in terms of accuracy. On the other hand, for high values of $CV^2$, either Croston or SBA should be preferred.

² The naïve method produces forecasts equal to the last observed demand value.

A common assumption in demand classification methods is that the interarrival times follow a Bernoulli or Poisson distribution. However, these distributions cannot account for increasing failure rates due to their memoryless property and therefore fail to capture actual intermittent demand patterns [66]. Syntetos et al. [67] address this issue by investigating the performance of EWMA, Croston, and SBA for Erlang-distributed interarrival times and providing alternative demand classification cutoff values. They conclude that the average inter-demand interval is a useful classification criterion for the Erlang distribution assumption, but $CV^2$ has less explanatory power than suggested in earlier studies.

3.3.2. Data aggregation

Data aggregation is another way to improve the performance of spare parts demand forecasting methods. The main idea is to aggregate data with similar demand patterns, either temporally or across time series, to reduce the number of zero-demand periods and make forecasts more accurate. Below, we review the studies that propose different aggregation strategies for spare parts demand forecasting methods and discuss how they improve these methods’ performance.

Willemain et al. [52] is one of the earliest studies investigating the effects of temporal data aggregation on the performance of Croston’s method. Their results show that Croston performs more accurately with aggregated weekly data than with daily data. Nikolopoulos et al. [22] introduce a temporal aggregation framework, referred to as ADIDA (aggregate-disaggregate intermittent demand approach), in which forecasts are allocated back into the original (disaggregated) time series after they have been generated with temporally aggregated data. Using the RAF dataset, they show that ADIDA can significantly improve the performance of the naïve and SBA methods.

It is important to note that the temporal aggregation level should be chosen carefully as a level significantly higher or lower than the lead time duration may lead to poor performance. For example, if demand is aggregated to monthly values while lead times are expressed in days, the benefits of temporal aggregation will most likely be lower. Nikolopoulos et al. [22] suggest that a potential remedy for this issue is to set the aggregation level equal to the lead time length plus one review period. They claim that this heuristic would also be practically valuable since in periodic review inventory systems forecasts are generated to determine the safety stocks for this time period.

Babai et al. [68] extend Nikolopoulos et al.’s [22] study by investigating the inventory performance of ADIDA for three forecasting methods (Croston, SBA, and SES) with the same (RAF) dataset. They conclude that for these forecasting methods, ADIDA results in higher service levels than those achieved with the disaggregated data. In a similar vein, Mohammadipour and Boylan [25] propose a temporal aggregation scheme for the integer auto-regressive moving average (INARMA) process. The authors analytically show that aggregation of an INARMA process over a time horizon results in an INARMA process. Moreover, using a dataset from UK RAF³ and an automotive dataset, they demonstrate that the forecasts generated by the aggregation method have, in most cases, lower MSEs than the cumulative forecasts obtained by summing up the $h$-step ahead estimates.

³ The dataset used in this study seems to be different from the original RAF dataset used in other studies, as it involves monthly demand observations of 16,000 SKUs over six years.

Following Nikolopoulos et al. [22], Petropoulos et al. [69] develop a new aggregation framework, referred to as iADIDA (inverse ADIDA), that performs aggregation over demand volumes (instead of time) to reduce demand variability. Their empirical examination shows that iADIDA improves forecast accuracy and works better for datasets with high levels of data-volume variance. Boylan and Babai [70] analytically compare the statistical performance of the nonoverlapping and overlapping temporal aggregation approaches. Using a dataset from the jewelry industry, they show that the approach aggregating time series by using overlapping time buckets outperforms the nonoverlapping approach unless the demand history is short or the demand is very slow-moving.

An alternative forecast improvement strategy is to group items with the same characteristics and then forecast their aggregate demand. This forecasting strategy is referred to as cross-sectional or hierarchical forecasting. Moon et al. [71] compare direct and hierarchical forecasting methods using a military dataset from the South Korean Navy. In their study, direct forecasting methods based on SES are generated by combining different temporal aggregation levels (monthly, quarterly, yearly) and trend/seasonality adjustments. Hierarchical forecasts are then generated by using item- and group-level direct forecasts. They find that a simple combination of SES models (quarterly data aggregated at the group level and monthly data at the item level) minimizes forecasting errors and inventory costs.

In a more recent study, Li and Lim [72] demonstrate the value of hierarchical forecasting for intermittent demand forecasting. They propose a greedy aggregation-decomposition method involving a new hierarchical structure that utilizes aggregated and disaggregated forecasts. This proposed method is tested with an array of forecast accuracy measures using intermittent demand data on fashion products. The results show that the authors’ proposed method performs better than other widely used techniques, such as Croston and SBA, and temporal aggregation approaches, such as ADIDA and iADIDA.

Combining forecasts derived from alternative methods is another well-known forecast improvement strategy. Using the RAF dataset, Petropoulos and Kourentzes [73] compare the performances of three different combination strategies. The first strategy combines the forecasts of classic methods (naïve, moving average, SES, Croston, SBA) using the original sampling frequency of the time series. The second strategy uses a single forecasting method at different temporal aggregation levels with ADIDA. The third strategy combines different methods for different temporal aggregation levels. They conclude that combining forecasts from transformed frequencies from the same or multiple methods improves forecasting performance. They also show that forecasting accuracy can be further improved by using classification schemes [54,55].

Another forecast improvement strategy is outlier detection. Quite often, there is planned-maintenance demand in industrial datasets. While there are forecasting methods that take such maintenance tasks into account [e.g., [45,74,75]], an alternative way to tackle these demand spikes is to treat them as outliers. As most of these planned maintenance tasks can be foreseen accurately, inventory should be planned ahead of their occurrence. Thus, removing such large demands from the dataset through outlier detection can improve the forecasting methods’ performance.

Overall, we conclude that data aggregation methods positively influence forecast accuracy and inventory performance. As we discuss in more detail in Section 6, the comparative studies show that temporal aggregation increases forecast accuracy for datasets characterized by highly variable inter-demand intervals, such as the military aviation data. On the other hand, for datasets characterized by erratic patterns and high demand-volume variance, such

different market types and conditions faced by various products, that could not be factored in with a statistical model [e.g., [79–81]]. Other studies provide contradictory empirical evidence suggesting that judgmentally adjusted forecasts can lead to worse performance than statistical forecasts because forecasters may have limited access to quantifiable information [82], put too much weight on their subjective contributions [82,83], or make repeated bold adjustments based on inaccurate information [84]. More recent articles examine the role that intentional biases play in group forecasts [85] or the impact of different time horizons on the accuracy and characteristics of judgmental adjustments [86].

Although the literature suggests that quantitatively derived forecasts are frequently adjusted by human forecasters, judgmental demand forecasting is an under-researched area in the operations and supply chain management fields [87,88]. Not surprisingly, the literature investigating the link between judgmental interventions and intermittent demand forecasting is even more limited.

Syntetos et al. [89] is one of the few studies that looks into this question. By analyzing monthly demand and forecast data provided by a pharmaceutical company, they show that judgmentally adjusted forecasts, which take into account market intelligence gathered by company forecasters, are more accurate than those generated by statistical methods. However, they also show that adjusted forecasts do not improve over time, whereas statistical forecasts
as automotive data, demand-volume aggregation performs better. do, and that negative adjustments generally perform better than
Furthermore, the combination of forecasts generated across differ- positive adjustments. They conclude that the increased accuracy of
ent group levels and aggregation frequencies also improves fore- adjusted forecasts leads to better stock control.
casting accuracy for spare parts embedded in hierarchical struc- In a similar vein, Boutselis and McNaught [90] describe an
tures such as weapons systems. In terms of inventory perfor- interesting single-period forecasting problem: demand for spare
mance, the general takeaway from the literature is that aggregation parts for military equipment is to be forecasted over a limited fu-
yields higher service levels at a lower cost for erratic and lumpy ture time period (e.g., the duration of a military operation), during
demand. which the actual spare parts demand can change significantly due
to changes in the operational context. The authors propose three
4. Contextual forecasting different Bayesian network models that incorporate observational
data, expert opinions, or both to generate single-period spare parts
Spare parts demand is often nonstationary and follows the life- demand forecasts. They find that the Bayesian network approaches
cycle pattern of the equipment in which the components are in- outperform the expert-adjusted SES and logistics regression meth-
stalled. It is also dynamically influenced by other contextual fac- ods in terms of accuracy.
tors, such as maintenance schedules, equipment age, and operat- Both of these papers hint that systematic judgmental adjust-
ing conditions. Time-series forecasting methods take into account ments can be beneficial for intermittent demand forecasting. How-
this additional contextual information only through historical data ever, the benefits most likely depend on the characteristics of the
and, therefore, can be slow to adapt to demand changes driven by demand series and how these adjustments are incorporated into
these external factors. The idea behind contextual forecasting is to the forecasts. Given the practical relevance of judgmental forecast-
overcome these difficulties by systematically combining all avail- ing in the context of intermittent demand, more research is needed
able information, broadly categorized as expert judgment and in- to generalize these initial findings.
stalled base information, with statistical methods to improve fore- Next, we briefly review the supply chain literature on judgmen-
casting performance. tal forecasting that takes a broader perspective than spare parts
Because of their potential to improve spare parts demand fore- demand management. Syntetos et al. [91] extend the investigation
casts, contextual forecasting methods have received increasing at- of Syntetos et al. [89] to non-intermittent demand items and show
tention from researchers over the last two decades. Depending on that judgmental adjustments can lead to better stock control than
the type of contextual information, the papers published in this statistical methods by achieving higher service levels with a lower
area can be categorized into two research streams. The papers in on-hand inventory. Fildes et al. [92] empirically show that judg-
the first category investigate how judgmental interventions impact mental adjustments are most effective if they are large, consistent,
the performance of statistical forecasting methods. The papers in and negative. In a theoretical study, Syntetos et al. [93] develop
the second category focus on developing new forecasting methods a system dynamics model to evaluate the effects of forecast and
that incorporate installed base information. Below, we review these order adjustments at various stages in a supply chain. They show
research streams in two separate sections. that the impact of forecast and order adjustments on factory stock
amplification decreases as the intervention point moves upstream
4.1. Judgmental forecasting (from retailer to factory) in the supply chain.
Another study that focuses on the impact of judgmental fore-
Statistical forecasting methods are seldom used in practice casting within the supply chain is Eksoz et al. [94]. Relying on
without expert interventions and updates [76–78]. Early empiri- survey data, the authors show that sharing judgmentally adjusted
cal studies suggest that judgmental adjustments improve statisti- forecasts facilitates group forecasting efforts between manufactur-
cal forecasts by taking into account relevant information, such as ers and retailers. For further discussions and reviews of judgmental
10
forecasting in the contexts of supply chain and operations management, we refer the reader to Goodwin et al. [95], Arvan et al. [88], and Perera et al. [96].

4.2. Installed base forecasting

The term installed base is defined by Dekker et al. [97] as “the whole set of systems or products for which an organization provides after-sales services,” and the term installed base information typically refers to information on such equipment’s quantity, location, operational environment, failure rate, and condition derived from maintenance and remote monitoring activities. Installed base forecasting methods use this information to predict potential future equipment failures and generate more accurate spare parts demand forecasts.

The idea of using installed base information in spare parts demand forecasting is not new. Cohen et al. [98] suggest incorporating part failure rates and the number of installed machines into exponential smoothing to improve spare parts demand forecasts at IBM. Petrović and Petrović [99] develop logistics software, referred to as SPARTA II, that combines a Bayesian algorithm and fuzzy set theory with installed base information to estimate the probability of satisfying spare parts demand and to calculate stock levels. Aronis et al. [100] present a case study in which part failure rates for telecommunication systems are estimated using the Bayesian approach. These estimates are then used to forecast demand for new parts that do not have a failure history. In another case study, Ghobbar and Friend (2002) investigate the link between demand lumpiness and installed base information for aircraft spare parts. Using a general linear model and data from an airline operator, they show that maintenance (condition-based) and utilization information (flying hours and number of landings) for aircraft are the two major sources of the lumpiness observed in the parts’ demand patterns.

However, none of these papers benchmark their proposed method’s performance against well-known spare parts demand forecasting methods. One of the earlier works taking this avenue is Hua et al. [101]. They estimate the probability of a demand occurrence in each period as a function of contextual variables (plant and equipment overhaul) with a logistic regression. Using a petrochemical company dataset, they show that their method outperforms SES, Croston, and Willemain according to two new forecast accuracy measures that they introduce.

A series of papers following Hua et al. [101] introduce methods using maintenance information to improve spare parts forecasting. Wang and Syntetos [102] propose a method based on planned and corrective maintenance information (block- and age-based inspection) using the delay time modeling concept, which defines a two-stage failure process. Using simulations, they show that their proposed method generates more accurate forecasts than SBA in almost all cases. Romeijnders et al. [75] develop a two-step forecasting method that uses information on component repairs to more quickly detect changes in the underlying demand rate. Through a case study, they show that their method performs similarly to SES, moving average, and TSB, whereas it outperforms Croston in terms of MSE, MAE, and ME. They also show that if planned maintenance tasks are taken into account, the forecasting errors can be reduced by 20%. In a follow-up study, Zhu et al. [74] propose a joint forecasting and inventory control method that uses planned maintenance tasks as a source of advance demand information. Using datasets from an aircraft service provider and Netherlands Railways, they show that the proposed method yields a significant reduction in total inventory costs over the traditional benchmark methods (e.g., SES, Croston, SBA, TSB).

All of the studies above assume B2B settings in which companies have somewhat advanced capabilities to monitor detailed information (e.g., maintenance, location, service contract information) about their installed base. In contrast, B2C companies rarely have such granular data but rather a vague idea about their installed base size and obsolescence time. However, these companies still need to forecast end-of-life (EOL) spare parts demand for their products to honor warranty agreements. Kim et al. [103] focus on these settings and propose a forecasting method that requires relatively limited installed base information (sales and returns). Using a database of consumer product components, they show that their installed base forecasting method generates significantly more accurate estimates than a simple autoregressive model.

In the same vein, Dombi et al. [104] also develop an installed base forecasting model for consumer electronics products. They combine analytic and fuzzy clustering techniques to estimate demand curves for consumer electronics spare parts. Using a dataset supplied by an electronics service provider, they show that their method has the best accuracy performance when compared to SES, ARIMA, and two soft computational forecasting methods. Building on Wang and Syntetos [102] and Kim et al. [103], Van der Auweraer and Boute [105] offer a more versatile method that incorporates information on maintenance policy, age, installed base size, and part reliability. They compare the proposed method with SES and SBA through a simulation study. Their findings suggest that their proposed method can achieve the same cycle service levels with lower inventories in the product’s maturity and EOL phases.

As we discuss in more detail in Section 6, using installed base information substantially increases forecast accuracy, especially for highly intermittent demand (e.g., parts demand for electric power equipment or petrochemical machinery components). However, for hard-to-forecast demand patterns, such as those characterized by sudden drops or a high level of part commonality along with high intermittence, the impact of installed base forecasting on accuracy is limited. Although there is some evidence indicating that installed base information can improve cycle service levels relative to other well-known methods, more research is needed in this direction to draw robust conclusions.

Overall, installed base information generates significant visibility for service supply chains and can yield significant cost savings by effectively addressing issues such as demand intermittence and obsolescence. On the other hand, exploiting this information is difficult due to the high costs associated with the collection, standardization, and storage of big data [97,106]. Thus, more case studies are needed to outline implementation guidelines and best practices. For further discussion on the use of installed base information for spare parts demand forecasting, we refer the reader to Van der Auweraer et al. [107] and Hellingrath and Cordes [108].

5. Comparative studies

Comparisons between different forecasting methods have been reported in the literature. These comparative studies provide performance benchmarks for the traditional and alternative methods used in practice to forecast spare parts demand. Often, they use industrial data sets and quantify the methods’ performance according to several forecast accuracy or inventory performance measures. In this section, we begin by explaining these performance measures and provide a quantitative overview of how frequently they are used in the spare parts demand forecasting literature. We then continue by reviewing the comparative studies and identifying their contributions.

5.1. Performance measures

There are two main categories of performance measures: forecast accuracy measures and inventory performance measures. Forecast accuracy measures quantify the gap between historical demand observations and forecasts, whereas inventory performance measures consider how a particular forecasting method performs in terms of service levels, on-hand inventory, stockouts, or total inventory costs. Below, we briefly review these two categories and highlight their differences from each other.

5.1.1. Forecast accuracy measures

Most accuracy measures are simple functions of the forecast errors ($e_s := Y_s - \hat{Y}_s$) generated by a particular forecasting method. These measures are referred to as absolute accuracy measures because they quantify a single forecasting method’s performance for a given time series [15]. Common absolute accuracy measures are shown in Table 2. Most of these standard measures, however, depend on the scale of the data, making them impractical for comparisons across time series of different items. The exceptions are the MAPE and MASE, both of which provide a scale-free measurement of forecast accuracy.

Table 2
Common absolute accuracy measures.
Mean error (bias)                    $\mathrm{ME}_t = \frac{1}{t}\sum_{s=1}^{t} e_s$
Mean absolute error                  $\mathrm{MAE}_t = \frac{1}{t}\sum_{s=1}^{t} |e_s|$
Mean squared error                   $\mathrm{MSE}_t = \frac{1}{t}\sum_{s=1}^{t} e_s^2$
Root mean squared error              $\mathrm{RMSE}_t = \sqrt{\frac{1}{t}\sum_{s=1}^{t} e_s^2}$
Mean absolute percentage error       $\mathrm{MAPE}_t = \frac{\sum_{s=1}^{t} |e_s|}{\sum_{s=1}^{t} Y_s}$
Mean absolute scaled error           $\mathrm{MASE}_t = \frac{1}{t}\sum_{s=1}^{t} \frac{|e_s|}{\frac{1}{t-1}\sum_{i=2}^{t} |Y_i - Y_{i-1}|}$
Geometric mean absolute error        $\mathrm{GMAE}_t = \left(\prod_{s=1}^{t} |e_s|\right)^{1/t}$
Geometric root mean squared error    $\mathrm{GRMSE}_t = \left(\prod_{s=1}^{t} e_s^2\right)^{1/(2t)}$

When an accuracy measure quantifies various forecasting methods’ relative performances, it is referred to as a relative accuracy measure. These measures compute one method’s accuracy relative to that of other methods, typically by using an absolute measure as a baseline performance metric. Examples of relative accuracy measures are the relative geometric root mean squared error (RGRMSE), relative root mean squared error (RRMSE), and relative arithmetic mean absolute error (RAMAE). Two other common relative accuracy measures are the percentage better (PB) or percentage best (PBt), which rank the performance of different methods based on the percentage of time they perform better or best according to an underlying measure.

There are also other less common forecast accuracy measures that have been proposed for particular forecasting methods or data types. For example, Willemain et al. [30] introduce an accuracy measure assessing the proximity of the estimated and hypothetical lead-time distributions. Similarly, Hua et al. [101] develop an accuracy measure counting correct demand-occurrence predictions for situations in which forecasting a demand occurrence is more critical than forecasting the demand size. Some studies suggest differentiating between a method’s performance evaluated at all points in time or only at the issue points because zero-demand periods can influence the variance measurements [e.g., 17,53].

OEMs often complete a final production run to cover the spare parts demand during the remaining life of a product before ceasing its production. The size of this final order should be based on a forecast of the total demand over the remaining service life. To assess the quality of this forecast more accurately, Kim et al. [103] propose an accuracy measure that quantifies the difference between the summed forecasts and the summed demand over the end-of-life (EOL) phase. They conclude that this alternative measure should be used as the main benchmark for EOL products as it yields lower forecast errors than the MAPE and the root mean squared percentage error (RMSPE).

Because there is no single forecast accuracy measure that is generally accepted in the literature, most papers use several measures to give a sense of a particular method’s overall performance. Among all the papers reviewed in this study, a total of 44 papers use at least one forecast accuracy measure in their benchmarking studies. Fig. 4 presents the distribution of the accuracy measures among these papers. The reported percentages are based on the number of papers in which a particular measure or one of its close derivatives (e.g., ME or ME scaled) is used.

Fig. 4b shows that a large majority of the papers use absolute accuracy measures. This is to be expected because absolute measures are simple and intuitive, and there are more options for absolute measures than for relative measures or other types of measures. PB and PBt are the least preferred measures, even though relatively early studies consider them to be meaningful measures for intermittent demand [15,46], whereas newly developed or less common measures represent a significant percentage (9.4%). Fig. 4a provides a more detailed view of the distribution of the absolute accuracy measures. Interestingly, classic measures such as MAE, MSE, and MAPE remain the most commonly used accuracy measures, despite criticism about their suitability for intermittent demand over the years [e.g., [30,65,109,110]]. For example, Hyndman and Koehler [109] claim that MASE is the best forecast accuracy measure for intermittent data, yet it is only the fifth most commonly used absolute accuracy measure in the spare parts demand forecasting literature. For a more detailed discussion of forecast accuracy measures, we refer the reader to Hyndman and Koehler [109], Wallström and Segerstedt [110], and Prestwich et al. [111].

5.1.2. Inventory performance measures

Despite forecast accuracy measures’ widespread use, high forecast accuracy does not necessarily imply high inventory performance for spare parts [e.g., [11,91,112]]. Moreover, although the implicit function of a spare parts demand forecasting method is to serve as a demand planning tool for stock control, the number of studies measuring a forecasting method’s impact on inventory performance is surprisingly low. Among all the papers considered in our review, we identified 29 papers using at least one inventory performance measure in their benchmarking studies. Fig. 5 shows the distribution of the various inventory performance measures used in these papers.

We observe from Fig. 5 that the two most common inventory performance measures are the service level and the tradeoff curve. Almost all papers use the achieved cycle service level or fill rate as a service level measure, whereas tradeoff curves are used to show the tradeoff between inventory costs or volumes and achieved service levels or backorder volumes. Syntetos et al. [113] claim that tradeoff curves are practical and realistic inventory performance measures compared to those involving a target service level or less common metrics such as implied stock holdings or average regret. Other standard inventory performance measures are the average total cost (consisting of inventory holding and stockout costs) and average on-hand inventory or stockout volumes. Similar to their use of accuracy measures, most studies use a combination of these measures to assess a forecasting method’s impact on spare parts inventory [e.g., [2,23,37]].

A few other studies define their own alternative metrics to benchmark inventory performance. For example, Snyder [34] measures a method’s inventory performance with the order-up-to level necessary to achieve a 95% service level. Eaves and Kingsman [24] propose a measure called implied stock holdings, which calculates the safety margin for zero stockouts, and Sani and Kingsman [114] introduce a relative inventory performance measure, referred to as average percentage regret, which gives the opportunity cost of using a specific forecasting method.
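To make the absolute accuracy measures of Section 5.1.1 concrete, the following is a minimal Python sketch that evaluates them for a single demand series. The function name `accuracy_measures`, the toy demand series, and the two toy forecasts are illustrative assumptions and do not come from any of the reviewed studies. MAPE is computed here as the ratio of summed absolute errors to summed demand, which is an assumption about the intended form, chosen because a per-period percentage error is undefined in zero-demand periods.

```python
import math


def accuracy_measures(actual, forecast):
    """Absolute accuracy measures for one series, with errors e_s = Y_s - Yhat_s."""
    if len(actual) != len(forecast) or len(actual) < 2:
        raise ValueError("need two equally long series with at least two periods")
    t = len(actual)
    errors = [y - f for y, f in zip(actual, forecast)]
    abs_errors = [abs(e) for e in errors]

    mae = sum(abs_errors) / t
    mse = sum(e * e for e in errors) / t
    total_demand = sum(actual)
    # In-sample mean absolute error of the naive (last-value) forecast, used to scale MASE.
    naive_mae = sum(abs(actual[i] - actual[i - 1]) for i in range(1, t)) / (t - 1)

    return {
        "ME": sum(errors) / t,
        "MAE": mae,
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        # Assumed ratio-of-sums form; undefined when the series has no demand at all.
        "MAPE": sum(abs_errors) / total_demand if total_demand else math.nan,
        "MASE": mae / naive_mae if naive_mae else math.nan,
        # Geometric measures collapse to zero as soon as one error is exactly zero.
        "GMAE": math.prod(abs_errors) ** (1 / t),
        "GRMSE": math.prod(e * e for e in errors) ** (1 / (2 * t)),
    }


# Hypothetical intermittent series: demand occurs in two of six periods.
toy_demand = [0, 3, 0, 0, 2, 0]
constant_forecast = [1] * 6   # illustrative flat forecast of one unit per period
zero_forecast = [0] * 6       # forecast of zero demand in every period

constant_scores = accuracy_measures(toy_demand, constant_forecast)
zero_scores = accuracy_measures(toy_demand, zero_forecast)
print(constant_scores["MAE"], zero_scores["MAE"])
```

On this toy series, the all-zero forecast attains a lower MAE than the flat forecast of one unit, even though it would leave every demand unserved. This is only an illustration, but it is in the spirit of the argument that high forecast accuracy does not necessarily imply high inventory performance.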
Fig. 4. Forecast accuracy measures used in spare parts demand forecasting.

Fig. 5. Inventory performance measures used in spare parts demand forecasting.

5.2. Performance comparisons

Willemain et al. [52] perform a comparison of Croston and SES using both industrial and simulated data, which is generated to violate Croston’s assumptions. For both data types, they conclude that Croston outperforms SES according to most standard forecast accuracy measures and has broader applicability than claimed by Croston [3]. Sani and Kingsman [114] is one of the first papers to present a comparative study of various spare parts forecasting methods. They compare five different forecasting methods with standard and ad hoc inventory policies based on annual inventory cost and fill-rate measures. They show that the one-year moving average has the best overall inventory performance, and it is closely followed by Croston. They also show that SES provides the highest fill rate but at the price of high inventory holding costs.

Along the same lines, Strijbosch et al. [115] compare two different (s, Q) models with a simulation study. Both models use a Croston-type forecasting method but differ mainly in their lead-time demand model. The authors find that a compound Bernoulli lead-time demand with mixed-Erlang demand sizes yields better fill-rate performance than normally distributed lead-time demand. Building upon these studies, Syntetos and Boylan [112] compare the inventory performances of simple moving average, SES, Croston, and SBA under a periodic review order-up-to policy. They conclude that SBA has the best inventory performance overall, but the moving average yields results very close to those of SBA.

Because a typical application of intermittent demand forecasting methods is for aircraft spare parts, a series of papers conducts comparative studies by using data sets from the commercial and military aviation industries. Ghobbar and Friend [60] compare thirteen well-known forecasting methods (e.g., SES, EWMA, Croston, Holt, Winters) according to their MAPEs with data obtained from an airline operator. They conclude that the weighted moving average is superior to the other methods, and its performance increases with the length of the temporal aggregation. Similarly, Regattieri et al. [61] show that higher demand lumpiness consistently leads to higher MAE, and the best-performing methods for commercial aircraft spare parts demand are the weighted moving average, Croston, EWMA, and trend-adjusted exponential smoothing methods. By using a dataset from the UK RAF⁴, Eaves and Kingsman [24] compare the performances of the SES, moving average, naïve, Croston, and SBA methods based on standard forecast accuracy measures and an inventory measure referred to as implied stock holdings. Through a periodic review order-up-to policy, the authors show that SBA yields the best implied stock holdings performance. In terms of forecast accuracy, they conclude that there is no best method overall, and traditional accuracy measures are not ideal for making comparisons in the presence of slow-moving intermittent demand.

⁴ The dataset used in this study seems to be different from the original RAF dataset used in other studies, as it involves demand observations of 18,750 SKUs over six years.

To better understand the impact of trends on forecasting performance, Altay et al. [2] compare SBA with the modification of Holt’s method proposed in [116]. They find that SBA consistently yields lower stock levels at the expense of lower service levels, whereas the method of [116] leads to higher service levels than SBA without significantly increasing the total cost. Similar to earlier articles, Altay et al. [2] observe that comparisons based on standard forecast accuracy measures yield inconclusive results. According to Teunter and Duncan [11], the reason why many comparative studies generate inconclusive results is partly due to inappropriate choices of performance measures. The authors illustrate this point by empirically showing that the zero-forecast method (predicting zero demand in each period) performs better than SES, moving average, Croston, SBA, and a parametric bootstrapping method based on standard forecast accuracy measures. They repeat these comparisons using inventory- and service-level metrics and find that Croston, SBA, and the bootstrapping method perform consistently better than SES and the moving average. The zero-forecast method has the worst inventory and service level performance, as expected.

The majority of comparative studies exclude bootstrapping methods and focus on comparisons between standard parametric methods and Croston and its modifications. One of the few articles investigating the value of bootstrapping methods is by Syntetos et al. [113]. By using data from the jewelry and electronics sectors, they compare the inventory performances of the bootstrapping method of Willemain et al. [30], SES, Croston, and SBA. The authors conclude that
Table 3
Example calculation of percentage better scores.
Better Performance Scores

Number of
Experiment Experiment Type SBA Traditional Inconclusive Comparisons
1 I1 -ME 1 1 1 3
2 I1 -MSE 2 0 1 3
3 I2 -ME 3 0 0 3
4 I2 -MSE 0 2 1 3
Total 6 3 3 12
Percentage Better 50% 25% 25% -
while [30] method usually performs better than the other meth-
ods, the performance differences are not very large, and parametric
techniques may be preferred due to their simplicity. Another com-
parative study involving a bootstrapping method is by do Rego and
de Mesquita [58]. Through a large-scale simulation study based on
automotive parts data, they show that the best performing method
in terms of the total cost and achieved fill rate is the modification
of [37] bootstrapping method for lumpy demand and SBA for er-
ratic demand aggregated to the monthly level.
6. Quantitative literature analysis
To provide a clearer synthesis of the state of the art in spare Fig. 6. Croston vs. SBA.
parts demand forecasting research, we conduct a quantitative anal-
ysis of the findings in the literature. We first analyze the results for
the performances of the two standard benchmark methods: Cros- periments based on the data and performance measure type. This
ton and SBA. Our objective is to understand the settings in which categorization enables us to summarize the better performance
one method outperforms the other by analyzing the results of the scores for each paper into a percentage better score. For example,
papers conducting such a comparison. We then extend this anal- if a paper compares SBA and the traditional methods based on
ysis to performance comparisons between these two benchmark two forecast accuracy measures (e.g., MSE and ME) using two in-
methods and other traditional or newer forecasting methods. We dustrial data sets (e.g., I1 and I2 ), the percentage better scores for
continue our analysis by synthesizing the extant literature’s find- those experiments can be computed by dividing the total better
ings on the performance of nonparametric and parametric methods performance scores by the total number of comparisons, as shown
along with how incorporating of installed base information, human in Table 3.
judgments, demand classification, or data aggregation affects fore- We average these percentage better scores across the papers re-
casting performance. Below, we first explain the methodology un- porting similar experiments to obtain an average percentage better
derlying the literature analysis and then continue with a discussion (APB) score based on the data or performance measure type used,
of our findings. as shown, for example, in Table 4. Thus, the average percentage
better score gives the overall performance of a particular method–
6.1. Methodology

Among all the papers reviewed in this study, we identified 56 papers suitable for our literature analysis, which report comparisons of various spare parts demand forecasting methods. Papers that are not in line with the objectives outlined above are excluded from the analysis. A complete list of papers included in each analysis can be found in Table A.1 in the Appendix. For each paper, we count the number of times that the methods of interest outperform or tie with each other. For example, if an experiment in a paper benchmarks the performances of SBA and three other traditional methods (SES, zero-forecast, and the naïve approach), we count the outcomes of three possible comparisons (SBA vs. SES, SBA vs. zero-forecast, SBA vs. naïve) to obtain better performance scores for these methods. Ties can happen if the reported performance results are the same or the tradeoff curves do not clearly indicate a better performer. In our example, if we assume that in this particular experiment, SBA outperforms the zero-forecast method, is outperformed by the naïve method, and ties with SES due to inconclusive results, the better performance scores would be SBA 1, traditional methods 1, and inconclusive 1.

Because the experiments are carried out with different data sets, performance measures, parameter choices, and distributional assumptions, it is difficult to define a common experiment type across the literature. To overcome this difficulty, we categorize ex-

or a group of methods–across the papers carrying out similar comparisons with different experimental setups and data.

A more detailed analysis based on another overall-performance measure that takes actual experimental measurements into account, however, is not feasible. This is because each paper presents its empirical study results at different levels of detail and conducts experiments based on different setups. Consequently, it is not possible to carry out a more detailed analysis by directly using the reported results. On the other hand, the two-step approach described above–in which we first calculate the percentage better scores per paper and then average these scores over all relevant papers–enables us to synthesize a diverse set of findings and generate more granular insights into the state of the literature.

6.2. Results

6.2.1. Comparison of Croston and SBA

We begin our discussion with comparisons between Croston and SBA, the two most common benchmark methods for spare parts forecasting. Fig. 6 presents each method’s average percentage better score based on forecast accuracy and inventory performance measures. In all figures and tables presented in this section, N represents the number of relevant papers over which the average percentage better score is calculated.
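The counting scheme and the two-step averaging described in Section 6.1 can be illustrated with a short sketch. The function names and the outcome labels below are illustrative assumptions, not part of the original analysis, and the sketch ignores the grouping of experiments into categories. The first paper reproduces the SBA example given above (one win, one loss, one tie); the second paper is hypothetical.

```python
from collections import Counter


def percentage_better(outcomes):
    """Step 1: percentage better scores for a single paper.

    `outcomes` holds one label per pairwise comparison reported in the
    paper, e.g. "SBA", "traditional", or "inconclusive" for a tie.
    """
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


def average_percentage_better(papers):
    """Step 2: average the per-paper scores over all relevant papers (N)."""
    per_paper = [percentage_better(outcomes) for outcomes in papers]
    labels = {label for scores in per_paper for label in scores}
    n_papers = len(per_paper)
    return {
        label: sum(scores.get(label, 0.0) for scores in per_paper) / n_papers
        for label in labels
    }


# Paper 1: SBA beats zero-forecast, loses to naive, ties with SES.
paper_1 = ["SBA", "traditional", "inconclusive"]
# Paper 2 (hypothetical): SBA wins both reported comparisons.
paper_2 = ["SBA", "SBA"]

print(percentage_better(paper_1))
print(average_percentage_better([paper_1, paper_2]))
```

Averaging per-paper percentages, rather than pooling all comparisons, keeps a paper that reports many comparisons from dominating the overall score.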
Fig. 7. Croston vs. SBA–Granular results by data type†.
† In the figures, y-axes give the percentage of comparisons in which Croston outperforms SBA in a given paper (i.e., percentage better score for Croston).

Fig. 6 shows that overall, SBA performs better than Croston 87% of the time when comparing the methods in terms of accuracy measures. Fig. 7 provides more granular insights into these comparisons by breaking down the percentage better (PB) scores for Croston according to the data types used by each paper comparing Croston and SBA. We observe from Fig. 7a that in terms of accuracy measures, SBA consistently outperforms Croston for the different data types. The most significant exception is observed in Li and Lim [72], in which the largest PB score for Croston in terms of accuracy measures is reported. The dataset in this study comes from a retailer of women’s fashion products (e.g., shoes, sunglasses, bags) and exhibits highly seasonal patterns. Other exceptions are observed in Kocer [43], Varghese and Rossetti [35], and Kourentzes [48]; however, none of these studies show that Croston offers a significant improvement over SBA. Thus, SBA can be preferred over Croston for industrial spare parts datasets if forecast accuracy is the primary concern.

Fig. 6 also shows that SBA outperforms Croston on average 21% of the time in inventory measures, whereas Croston outperforms SBA on average 45% of the time. However, many comparisons are
inconclusive (34%), and most studies indicate rather small performance differences between the two methods. The difference in performance in terms of inventory control measures can be explained by the bias in Croston. Because Croston is positively biased, it leads to consistently higher average inventory and service levels than the SBA method [e.g., [26,45,67]]. Moreover, our analysis shows that service level is the most common inventory performance measure used in the literature (see Fig. 5). Thus, the combination of these two factors could lead to the slightly higher average inventory performance observed in the comparative studies.

Especially for highly intermittent or decreasing demand, Croston’s service level advantage becomes more pronounced, as SBA tends to underestimate such demand patterns. Fig. 7b shows that for automotive, aviation, and military aviation datasets, several studies report better inventory performance by Croston as they contain highly intermittent or downward-trending spare parts demand data. On the other hand, for datasets that are neither particularly intermittent nor erratic, such as jewelry data, the inventory performance of SBA is reported to be generally better than but very close to that of Croston. Thus, Croston may be preferred for the automotive, aviation, and military aviation industries in which the service level is often the primary performance measure.

There are also other factors that may drive the observed differences in performance in terms of inventory control and forecast accuracy among the various studies. Measuring the inventory performance of a method requires various input parameters, such as the inventory policy, safety factors, lead time, the target service level, the initial stock level and the price of the spare part. Forecast accuracy measures, on the other hand, are more standardized and treat all products the same. Aggregating data based on time, demand volume, or product category also influences the performance of a method. Especially when the two methods perform relatively similarly to each other, as in the case of Croston and SBA, all of these factors may contribute to the differences between the methods’ inventory control and forecast accuracy performances across different studies, even if those studies use the same dataset.

6.2.2. Comparison of Croston and SBA with traditional methods

Table 4 summarizes the performance comparison of the two benchmark methods (Croston and SBA) with traditional methods. Traditional methods are parametric methods that were not developed for spare parts demand forecasting but are commonly used in comparative studies as benchmarks and in practice due to their simplicity (e.g., SES, moving averages, naïve forecasting, zero forecasting). In terms of accuracy measures, the overall performances of Croston and of the traditional methods are very similar (45.8% vs. 50.4%). In terms of inventory measures, however, Croston outperforms the traditional methods (56.2% vs. 18.8%), which can be attributed to Croston’s positive bias.

Fig. 8 shows the distribution of the PB scores for Croston–in comparison to traditional methods–with respect to data types and performance measures. For datasets from the aviation, military, petrochemical, and retail industries, Croston’s accuracy performance is generally better than that of the classic methods (Fig. 8a). On the other hand, for electronics, electrical parts, manufacturing, and marine engine datasets, the classic methods seem to have higher accuracy. For the automotive and military aviation datasets, however, comparisons of the accuracy performance between Croston and the traditional methods are less conclusive.

The variation in the accuracy performances of the methods across industrial datasets can be explained by the intermittency and demand size characteristics of the data. As suggested in Willemain et al. [52], there might be an optimal degree of intermittency that justifies switching from a traditional method such as exponential smoothing to Croston. That is, “too many zeros make it essentially impossible to forecast well using any statistical method while too few zero values make it unnecessary to abandon exponential smoothing.” Indeed, the electronics and electrical parts datasets for which the traditional methods generate more accurate forecasts than Croston are characterized by low intermittency and large demand sizes [29,30]. Similarly, the marine engine dataset consists almost entirely of zero demand instances [30]. On the other hand, the aviation and petrochemical datasets are characterized by high intermittency and low demand sizes [29,30,101], to which the underlying assumptions of Croston seem better suited and which therefore lead to better accuracy performance.

In terms of inventory performance, Croston does not seem to offer any advantages over the traditional methods for the automotive, electronics, and jewelry datasets (Fig. 8b). For the military and military aviation datasets, however, Croston significantly outperforms the traditional methods. Croston’s performance improvement can be explained by the fact that military operations require high service levels and that both data types are characterized by moderate to high intermittency. We obtain qualitatively similar insights from the comparisons between SBA and the classic methods in terms of performance and data type. The main difference is that the PB scores for SBA are slightly higher than those for Croston due to its bias correction. We refer the reader to Fig. A.2 in the Appendix for detailed results for SBA performance by data type.
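The bias argument used throughout Section 6.2 can be made concrete with a minimal sketch of the two benchmark methods. It assumes the standard formulations: Croston smooths the nonzero demand sizes and the inter-demand intervals separately and forecasts their ratio, while SBA scales that ratio by (1 - alpha/2). The function names, the initialization from the first observed demand, and the use of a single smoothing constant alpha are simplifying assumptions of this sketch.

```python
def croston(demand, alpha=0.1):
    """Per-period demand forecast after the last observation (Croston)."""
    size = None        # smoothed nonzero demand size
    interval = None    # smoothed inter-demand interval
    periods_since = 1  # periods elapsed since the last nonzero demand
    for d in demand:
        if d > 0:
            if size is None:
                # Assumption: initialize from the first observed demand.
                size, interval = float(d), float(periods_since)
            else:
                # Estimates are updated only in periods with demand.
                size += alpha * (d - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            periods_since += 1
    if size is None:
        return 0.0  # no demand observed at all
    return size / interval


def sba(demand, alpha=0.1):
    """Syntetos-Boylan Approximation: Croston deflated by (1 - alpha/2)."""
    return (1 - alpha / 2) * croston(demand, alpha)


history = [0, 0, 3, 0, 0, 3, 0, 0, 3]
print(croston(history), sba(history))
```

For any series with at least one demand, the SBA forecast is strictly lower than the Croston forecast, which is consistent with the higher inventory and service levels attributed to Croston above.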
Table 4
Croston and SBA vs. traditional methods. Entries are Average Percentage Better (APB) scores.

                                      Croston   Traditional   Inconclusive   Number of Papers (N)
Accuracy measures    Industrial data  43.8%     52.1%         4.1%           18
                     Simulated data   54.8%     42.9%         2.4%           7
                     Overall†         45.8%     50.4%         3.8%           23
Inventory measures   Industrial data  57.7%     11.5%         30.8%          9
                     Simulated data   50.0%     50.0%         -              3
                     Overall          56.2%     18.8%         25.0%          12

                                      SBA       Traditional   Inconclusive   Number of Papers (N)
Accuracy measures    Simulated data   52.8%     44.4%         2.8%           6
                     Overall          56.4%     41.7%         1.9%           19
Inventory measures   Simulated data   8.3%      50.0%         41.7%          3
                     Overall          50.4%     27.2%         22.4%          14

† The number of papers (N) of the ‘Overall’ scores row can be different from the sum of the number of papers reported in the above two rows as some papers use both industrial and simulated datasets.
Fig. 8. Croston vs. traditional methods—Granular results by data type.
Table 4 also shows that for simulated data, the inventory performances of Croston and SBA are less conclusive. That is, Croston and the traditional methods tie (50% vs. 50%), whereas SBA is outperformed by the traditional methods (8.3% vs. 50%), and there are many inconclusive results (41.7%). Although the small number of papers using simulated data (N = 3) casts doubt on the significance of these observations, it is important to highlight that the industrial datasets might potentially have very different patterns from the simulated datasets to which a spare part forecasting method such as Croston or SBA is better suited. For example, the spare parts demand generated by maintenance and inspection regimes may create demand spikes that are somewhat difficult to mimic in a simulation [45,74]. OEMs selling spare parts may face demand from several users and be less impacted by maintenance policies. On the other hand, companies or organizations using spare parts might have more slow-moving and uniquely installed parts.

Moreover, many simulation studies assume i.i.d. demand intervals and demand sizes, whereas industrial data can have positively correlated inter-demand intervals and highly skewed demand sizes. Willemain et al. [52] show that even under these less-than-ideal conditions, Croston performs better than exponential smoothing. Similarly, some simulation studies generate relatively fast-moving intermittent demand data for which the added value from abandoning traditional methods is low [e.g., [2,27,48]]. All of
Table 5
Croston and SBA vs. new parametric methods†.

                                      Croston   New       Inconclusive   Number of Papers (N)
Accuracy measures    Simulated data   19.5%     79.3%     1.2%           8
                     Overall          18.6%     80.8%     0.6%           15
Inventory measures   Overall          64.8%     19.6%     15.6%          5

                                      SBA       New       Inconclusive   Number of Papers (N)
Accuracy measures    Simulated data   27.5%     68.4%     4.1%           6
                     Overall          47.0%     46.2%     6.8%           12
Inventory measures   Overall          32.0%     38.0%     30.0%          3

† These are the methods reviewed in Section 3.1 (e.g., TSB, modified SBA, HES, parametric bootstrapping and distribution-fitting methods).
these factors might contribute to the performance difference between the industrial and simulated datasets.

6.2.3. Comparison of Croston and SBA with new parametric methods

Table 5 presents the results of the comparisons between the two benchmark methods and the newer parametric spare parts forecasting methods that were developed after Croston and reviewed in Section 3.1 [e.g., [21,27,29,32]]. We observe from Table 5 that the newer methods have higher average accuracy and lower average inventory performance than Croston.

Similar to the comparison with SBA, Croston’s inventory performance advantage over the newer methods is likely to be driven by its tendency to yield higher service levels for intermittent demand. Indeed, for the railway, military, and military aviation datasets, which are characterized by moderate to high intermittency and high target service levels, Croston outperforms newer methods in terms of inventory performance (Fig. 9b). On the other hand, for the automotive datasets, Babai et al. [26] and Babai et al. [27] show that the inventory performances of Croston, SBA, and the two newer methods (TSB and modified Croston) are very similar, but the newer methods perform slightly better (Fig. 9b). Thus, in terms of inventory performance, both SBA and Croston seem to be better choices than the new and classic parametric methods, with the latter being more suitable for industries in which a high service level is important.

Fig. 9a shows that in terms of accuracy measures, Croston performs worse than the newer methods for the majority of data types except for the retail data used by Li and Lim [72]. For this dataset, Croston outperforms both TSB and SBA, which may be explained by highly seasonal and potentially autocorrelated demand patterns for fashion products, as Croston seems to perform well when stationarity assumptions are violated [52].

Table 5 shows that the overall accuracy performance of SBA is better than that of the new parametric methods (47% vs. 46.2%). In particular, for automotive and electronics datasets, SBA seems to generate more accurate forecasts than either Croston or the newer methods (Fig. A.3a). For military, military aviation, and electrical parts datasets, SBA and the newer methods compete depending on the accuracy measures used. However, for spare parts data from the commercial aviation industry, Romeijnders et al. [75] report that SBA generates significantly less accurate results than the two newer methods (TSB and the two-step method). This is because the many slow-moving parts in this dataset have sudden demand obsolescence (due to equipment upgrades or removals), which Croston and SBA cannot deal with well, as discussed in Section 3.1.2.

6.2.4. Comparison of Willemain with new nonparametric methods

To investigate the performance of the nonparametric methods, we first compare Willemain’s method with the nonparametric spare parts demand forecasting methods developed after it. These newer methods include the empirical method, neural network methods, and other forms of bootstrapping methods [e.g., [10,37,46]].

Fig. 10a shows that the overall performance of Willemain is somewhat better than that of the newer nonparametric methods (57% vs. 43% according to accuracy measures, 63% vs. 28% according to inventory measures). However, this conclusion is based on a small number of papers (N = 3, 6), and as shown in Fig. 11, Willemain’s accuracy and inventory performance can vary depending on the data type. For example, Pennings et al. [29] report that Willemain generates more accurate results than their own nonparametric bootstrapping method (referred to as DLP) for all considered datasets except the auto parts dataset. However, the specific reasons behind Willemain’s superior performance are not clear. On the other hand, Hua et al. [101] show that their method significantly outperforms Willemain for datasets of spare parts from a petrochemical company. This finding is likely to be the result of the fact that the method proposed by Hua et al. [101] uses more information (demand autocorrelation and contextual information) than Willemain.

In terms of inventory performance, Willemain tends to yield higher service levels for datasets that are characterized by high lumpiness, such as those found in military aviation and automotive industries (Fig. 11b). This is because when there are large spikes in demand, Willemain may selectively pick these values to forecast the next period’s demand. Additionally, its jittering process tends to increase variation around large demand values [10,45]. On the other hand, for low or moderately lumpy demand series, such as those found in the jewelry and commercial aviation spare parts datasets used by Zhu et al. [45] and Hasni et al. [36], Willemain often fails to achieve the target service level. Thus, for such datasets, the newer nonparametric methods can be preferred due to their better inventory performance.
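The resampling and jittering mechanisms mentioned above can be sketched as follows. This is a simplified illustration in the spirit of Willemain's bootstrap, not the authors' exact procedure: it assumes a two-state (demand / no demand) Markov chain estimated from the history, resampling of nonzero demand sizes with replacement, and a jitter of the form 1 + INT(X + Z·sqrt(X)) that falls back to X when the result is not positive. The function name, the fallback probability, and all parameter names are hypothetical.

```python
import math
import random


def willemain_style_ltd(history, lead_time, n_boot=1000, seed=0):
    """Bootstrap a sorted sample of lead-time demand (LTD) totals."""
    rng = random.Random(seed)
    nonzero = [d for d in history if d > 0]
    if not nonzero:
        return [0] * n_boot  # nothing to resample from

    # Transition counts between state 0 (no demand) and state 1 (demand).
    states = [1 if d > 0 else 0 for d in history]
    counts = {0: [0, 0], 1: [0, 0]}
    for prev, nxt in zip(states, states[1:]):
        counts[prev][nxt] += 1

    def p_demand(state):
        total = sum(counts[state])
        # Assumption: use the overall demand frequency if no transition
        # out of this state was observed.
        return counts[state][1] / total if total else len(nonzero) / len(history)

    samples = []
    for _ in range(n_boot):
        state, ltd = states[-1], 0
        for _ in range(lead_time):
            state = 1 if rng.random() < p_demand(state) else 0
            if state:
                x = rng.choice(nonzero)  # large spikes can be drawn repeatedly
                jittered = 1 + int(x + rng.gauss(0.0, 1.0) * math.sqrt(x))
                ltd += jittered if jittered > 0 else x
        samples.append(ltd)
    return sorted(samples)


ltd = willemain_style_ltd([0, 0, 5, 0, 0, 0, 12, 0, 1, 0], lead_time=3, n_boot=500)
print(ltd[int(0.95 * len(ltd))])  # empirical 95th percentile of LTD
```

Because the jitter scales with the square root of the sampled value, its spread is larger around large demand sizes, which matches the explanation given above for the higher service levels on lumpy series.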
Fig. 9. Croston vs. newer methods—Granular results by data type.
Fig. 10. Performance of Willemain, nonparametric, and parametric methods.
Fig. 11. Willemain vs. newer nonparametric methods—Granular results by data type.
6.2.5. Comparison of nonparametric and parametric methods

We continue investigating the performance of the nonparametric methods by comparing them with the parametric methods. Fig. 10 shows that in general, the nonparametric methods outperform the parametric methods in terms of both types of performance measures (64% vs. 35% according to accuracy measures and 60% vs. 33% according to inventory performance measures). In terms of accuracy measures, each paper yields somewhat clear results in favor of or against the parametric methods (Fig. 12a). The parametric methods generate more accurate results than the nonparametric methods for the electric parts, military, and military aviation datasets. For the automotive industry, two different datasets and methods used by Pennings et al. [29] and Lolli et al. [49] lead to two divergent conclusions. Similarly, for the electronics data, Pennings et al. [29] report more accurate results for the parametric methods, whereas several other papers point out that the nonparametric methods, which are based on neural networks or bootstrapping, generate significantly more accurate results than the parametric methods. For all other data types, the most accurate method is a nonparametric one.

In inventory measures, the nonparametric methods generally perform better, but each study has some comparisons in which a
Fig. 12. Parametric vs. nonparametric methods—Granular results by data type.
parametric method outperforms a nonparametric method for some datasets (Fig. 12b). The only exception is the petrochemical refinery data, for which Porras and Dekker [10] show that a simple parametric model, which assumes normally distributed spare parts demand, achieves the best overall performance compared with two nonparametric methods (Willemain and the empirical method). Similarly, Syntetos et al. [113] report that for erratic data supplied by an electronics manufacturer, SES yields a better tradeoff between service level and inventory investment than does Willemain. Their study also shows that for jewelry data, which are neither particularly intermittent nor erratic, the parametric and nonparametric methods generate similar tradeoff curves. On the other hand, for the same datasets, Hasni et al. [36] report mixed findings about the service level performance of the parametric and nonparametric methods.

Thus, based on these and the other studies shown in Fig. 12b, it is difficult to provide clear-cut recommendations between parametric and nonparametric methods in terms of inventory performance and data type. Furthermore, some nonparametric methods, such as neural network-based models, can be very hard to implement in business contexts. Consequently, the tradeoff between implementation costs and forecast improvements should be carefully considered before adopting these technically involved methods.
Fig. 13. Performance of methods using installed base information or data aggregation.
Fig. 14. Methods including installed base information—Granular results by data type (accuracy measures).
6.2.6. Comparison of methods using installed base information

We also investigate the impact of contextual information and data aggregation on the methods’ performances. Fig. 13a shows comparisons between the methods that use installed base information and those that do not (e.g., Croston, SES, TSB, Willemain). The average percentage better scores suggest that using installed base information substantially increases forecast accuracy (85% vs. 8%) for industrial and simulated datasets. Fig. 14 shows that especially for spare parts characterized by highly intermittent demand, such as those supporting electric power equipment or petrochemical machinery, incorporating contextual information into forecasts seems to yield significant accuracy improvements over black-box methods [32,101]. Similarly, installed base information can help estimate end-of-life demand for spare parts for consumer products [103]. On the other hand, the added value of contextual information can be limited for spare parts that are contained in many different components and have a highly intermittent demand with sudden drops. The commercial aviation parts analyzed in Romeijnders et al. [75] are an example of such data (Fig. 14). They report that SES performs as well as their proposed method, although the latter takes additional (repair) information into account. They partially attribute this counterintuitive result to the demand patterns being very hard to forecast regardless of the method used.

For inventory performance, Van der Auweraer et al. [107] show through a simulation study that combining maintenance, installed base, and reliability information can lead to improved cycle service levels with lower inventory than those generated by SBA, SES, and their variants. However, this is the only paper in the literature investigating the inventory performance of installed base forecasting, and therefore, it is difficult to draw robust conclusions.

6.2.7. Comparison of methods using data aggregation

Fig. 13a presents the overall results of the comparisons between the methods that use data aggregation and other techniques that do not use any aggregation strategy (e.g., SES, Croston, SBA, TSB). The results indicate that using an aggregation strategy generates more accurate forecasts than not using such a strategy in 66% of the comparisons. Moreover, Fig. 15 shows that aggregation over time performs particularly well for the military aviation
Fig. 15. Methods including data aggregation—Granular results by data type (accuracy measures).
data, which are characterized by high inter-demand interval variance [22]. Aggregation over demand volume is found to perform better for automotive data, which are characterized by erratic patterns and high demand-volume variance [69].

Petropoulos and Kourentzes [73] also report that combinations across forecasts derived from transformed frequencies using the same single method or multiple methods are efficient and improve forecasting performance. For spare parts with a strong hierarchical structure and lumpy demand patterns, such as those found in complex weapons systems, a simple combination of forecasts with quarterly aggregated data is found to generate the highest accuracy in most cases [71].

Similar conclusions can be drawn regarding the inventory performance of different aggregation strategies, although the results are less clear. Fig. 13a shows that aggregation strategies result in higher inventory performance in 46% of the comparisons. Various studies report that aggregation leads to higher service levels at a lower cost for erratic and lumpy spare parts demand and improves the inventory performance of simple methods such as SES [e.g., 68]. We refer the reader to Fig. A.4 in the Appendix for more granular results on inventory measure comparisons.

The value of combining forecasts from different methods has long been studied in the forecasting literature [e.g., [117–119]]. In a recent paper summarizing the findings of a well-known forecasting competition, Makridakis et al. [120] state that “The most important finding of the M4 Competition was that all of the top-performing methods, in terms of both point forecasts and prediction intervals, were combinations of mostly statistical methods, with such combinations being more accurate numerically than either pure statistical or pure machine learning methods.” Similar statements might be valid for spare parts demand forecasting as hinted by the papers reviewed above. Thus, more research is needed to better understand when and how to use a portfolio of methods while forecasting spare parts demand.

We have carried out similar analyses for the judgmental forecasting and demand classification techniques; however, we skip the discussion of these analyses because they are not very informative due to an insufficient number of papers.

7. Conclusion

Extensive research has been conducted on spare parts demand forecasting since the publication of Croston’s seminal work [3]. This paper provides a critical review of this literature and a synthesis of these research streams’ findings. We structured the literature into three main categories (time-series forecasting methods, contextual forecasting methods, and comparative studies) and systematically reviewed the papers in each category by organizing them into smaller research streams. We then synthesized each research stream’s findings on the standard and new forecasting methods by carrying out a quantitative analysis of the performance results.

Our analysis shows that the methods’ performances often vary from one industrial dataset to another. Thus, there is no simple answer to the question of when or why a particular intermittent demand forecasting method should be preferred. Furthermore, most previous research uses forecast accuracy measures as their performance benchmarks despite inventory performance measures being more practically relevant. This lack of standardization renders a systematic comparison of methods difficult and widens the gap between research and practice. More effort is needed to develop a more coherent research perspective that aims to close this gap.

We find that data-intensive methods, such as neural networks and installed base forecasting, have gained momentum in recent years. This impulse, fueled by data analytics and business intelligence trends, might evolve into the adoption of algorithmic approaches that harness big data and advanced computational techniques, such as deep learning and artificial intelligence. These methods have been shown to work well in other supply chain
management contexts [e.g., 121,122], and testing them in the challenging terrain of spare parts demand forecasting could be a fertile research field.

An underresearched topic is the role of judgmental adjustments in spare parts demand management. Earlier work suggests that practitioners are more likely to use judgmental methods in contexts characterized by high uncertainty [82]. Moreover, a limited number of papers indicate that systematic judgmental adjustments can improve forecast accuracy and stock control of spare parts. Given the highly uncertain nature of spare parts demand, a more in-depth investigation into how these benefits depend on demand characteristics and the adjustment type seems to be a fruitful and relevant research avenue.

Another overlooked topic is the link between installed base forecasting and judgmental adjustments. Many B2C companies have a vague idea about the number of end-of-life products for which they are obligated to provide spare parts after these products become obsolete. While sales, returns, or warranty information can be used to estimate the installed base’s size, human forecasters can intervene to adjust these estimates to incorporate additional contextual information. Similar interventions are also possible in B2B contexts in which unstructured or unforeseen contextual information about the installed base requires forecasters to intervene with quantitatively generated forecasts. Thus, a potential future research direction would be to examine when and how human forecasters should intervene to incorporate such information into installed base forecasting.

Interactions between supply chain partners in the spare parts management context are another future research avenue. Forecast collaboration schemes between suppliers and retailers have been shown to improve forecast accuracy for retail supply chains [123]; however, there are no studies in the literature that investigate the value of information sharing in spare parts supply chains. The research on demand classification can also be extended by bringing in new methodologies from the unsupervised learning discipline. For example, clustering and anomaly detection algorithms can help to classify data patterns into more granular clusters and detect outliers. Detecting demand outliers caused by planned maintenance tasks in industrial datasets is a difficult problem. These newer methods can help to better match the data type with the most appropriate forecasting method. Also, more research is needed to better understand the value of forecast combinations for spare parts demand forecasting. As evidenced in the forecasting literature, such combinations can consistently outperform the forecasts generated by pure methods [120].

Finally, we believe a few insights would be useful to streamline the reporting of future comparative spare parts demand forecasting papers. Based on our analysis, we conclude that inventory and accuracy measures should be used together to provide a complete picture of the performances of various forecasting methods. As testified by many studies, accuracy and inventory measures can lead to different performance results with the same methods and datasets. Also, standard performance measures (as reported in Figs. 4 and 5) should be preferred. If new measures are introduced, they should be used along with the standard measures. The industrial datasets should be presented with an adequate amount of information such as the types of industry and spare parts, number of SKUs, number of demand periods, and descriptive statistics. Statistical significance tests should be carried out for all performance comparisons. For example, a common method of comparing inventory performances of different methods is to use tradeoff curves; however, the statistical significance of the differences among these curves is usually missing in comparative studies. Considering that multiple methods often yield close or crossing tradeoff curves, a more objective comparison procedure is needed.

Appendix

Figs. A1–A4
Fig. A.1. Year-by-year evolution of the spare parts demand forecasting literature.
Table A.1
Summary of the papers included in the quantitative literature analysis.
Performance comparison† # Datasets Ind. dataset type‡
Paper A B C D E F G H I J L Ind. Sim. Spare Other
Altay et al (2008) x x 1 16 x
Babai et al (2012) x x x x 1 0 x
Babai et al (2014) x x x x x 2 0 x
Babai et al (2019) x x x x x 2 8 x
Boutselis and McNaught (2019) x 0 18
Boylan and Syntetos (2007) x x 0 4
Boylan et al (2008) x 3 0 x
Croston (1972) x 0 1
Dombi et al (2018) x 1 0 x
do Rego and de Mesquita (2015) x x x 1 0 x
Eaves and Kingsman (2004) x x x 1 0 x
Fricker and Goodhart (2000) x x 1 0 x
Ghobbar and Friend (2003) x 1 0 x
Gutierrez et al (2008) x x x x 1 0 x
Hasni et al (2019a) x x 2 0 x x
Hasni et al (2019c) x x 1 24 x
Hellingrath and Cordes (2014) x 1 0 x
Hua et al (2007) x x x 1 0 x
Jiang et al (2020) x x 1 0 x
Kim et al (2017) x 1 0 x
Kocer (2013) x x x 1 0 x
Kourentzes(2013) x x x x x x 0 1
Li and Lim (2018) x x x x x x 1 0 x
Lolli et al (2017) x 1 0 x
Moon et al (2012) x 1 0 x
Mukhopadhyay et al (2012) x x x 1 0 x
Nikolopoulos et al (2011) x x 1 0 x
Pennings et al (2017) x x x x x x x 5 0 x
Petropoulos and Kourentzes (2015) x x x x x 1 0 x
Petropoulos et al (2014) x x x x x 0 1
Petropoulos et al (2016) x x x x x 2 2 x
Porras and Dekker (2008) x x 1 0 x
Prestwich et al (2014) x x x 0 4
Regattieri et al (2005) x 1 0 x
Romeijnders et al (2012) x x x x x x 1 81 x
Sani and Kingsman (1997) x 0 1
Snyder (2002) x x 1 0 x
Snyder et al (2012) x 1 0 x
Syntetos and Boylan (2001) x 0 20
Syntetos and Boylan (2005) x x x 1 0 x
Syntetos and Boylan (2006) x x x 1 0 x
Syntetos et al (2009) x 1 0 x
Syntetos et al (2015b) x x x x 2 0 x x
Teunter and Duncan (2009) x x x x 1 0 x
Teunter and Sani (2009) x x x 0 48
Teunter et al (2011) x x x x x 0 12
Turrini and Meissner (2019) x x x 1 0 x
Van der Auweraer and Boute (2019) x x 0 4
van Wingerden et al (2014) x x x x x x x 2 0 x
Varghese and Rossetti (2008) x x x 1 0 x
Wang and Syntetos (2011) x 0 30
Willemain et al (1994) x 4 12 x x
Willemain et al (2004) x x 9 0 x
Zhou and Viswanathan (2011) x 1 81 x
Zhu et al (2017) x x x 2 3 x
Zhu et al (2020) x x x x x x 2 0 x
† A: Croston vs. SBA. B: Croston vs. traditional methods. C: SBA vs. traditional methods. D: Croston vs. newer parametric methods. E: SBA vs. newer parametric methods. F: Willemain vs. newer nonparametric methods. G: parametric vs. nonparametric methods. H: installed base information included. I: data aggregation. J: judgemental forecasting. L: demand classification.
‡ This column indicates whether the industrial datasets used in comparative studies include spare parts or another type of data.
Fig. A.2. SBA vs. classic methods—Granular results by data type.
Fig. A.3. SBA vs. newer methods—Granular results by data type.
Fig. A.4. Methods including data aggregation—Granular results by data type (inventory measures).
References
[1] Air Transport World. Forecast – 2013 another challenging year. http://docslide.us/documents/air-transport-world-january-2013.html; 2013.
[2] Altay N, Rudisill F, Litteral LA. Adapting Wright’s modification of Holt’s method to forecasting intermittent demand. Int J Prod Econ 2008;111:389–408.
[3] Croston JD. Forecasting and stock control for intermittent demand. Oper Res Q 1972;23:289–303.
[4] Syntetos AA, Babai Z, Boylan JE, Kolassa S, Nikolopoulos K. Supply chain forecasting: theory, practice, their gap and the future. Eur J Oper Res 2016;252(1):1–26.
[5] Hu Q, Boylan JE, Chen H, Labib A. OR in spare parts management: a review. Eur J Oper Res 2018;266(2):395–414.
[6] Basten R, van Houtum G. System-oriented inventory models for spare parts. Surveys in Operations Research and Management Science 2014;19:34–55.
[7] Topan E, Eruguz A, Ma W, van der Heijden M, Dekker R. A review of operational spare parts service logistics in service control towers. Eur J Oper Res 2020;282(2):401–14.
[8] Boylan JE, Syntetos AA. Spare parts management: a review of forecasting research and extensions. IMA J Manage Math 2010;21:227–37.
[9] Bacchetti A, Saccani N. Spare parts classification and demand forecasting for stock control: investigating the gap between research and practice. Omega (Westport) 2012;40(6):722–37.
[10] Porras E, Dekker R. An inventory control system for spare parts at a refinery: an empirical comparison of different re-order point methods. Eur J Oper Res 2008;184:101–32.
[11] Teunter R, Duncan L. Forecasting intermittent demand: a comparative study. Journal of the Operational Research Society 2009;60:321–9.
[12] van Jaarsveld W, Dekker R. Estimating obsolescence risk from demand data to enhance inventory control - a case study. Int J Prod Econ 2011;133:423–31.
[13] Pinçe Ç, Frenk J, Dekker R. The role of contract expirations in service parts management. Production and Operations Management 2015;24(10):1580–97.
[14] Syntetos A, Boylan J. On the bias of intermittent demand estimates. Int J Prod Econ 2001;71:457–66.
[15] Syntetos AA, Boylan JE. The accuracy of intermittent demand estimates. Int J Forecast 2005;21:303–14.
[16] Levén E, Segerstedt A. Inventory control with a modified Croston procedure and erlang distribution. Int J Prod Econ 2004;90:361–7.
[17] Boylan JE, Syntetos AA. The accuracy of a modified croston procedure. Int J Prod Econ 2007;107:511–17.
[18] Teunter R, Sani B. On the bias of Croston’s forecasting method. Eur J Oper Res 2009;194:177–83.
[19] Syntetos AA. Forecasting of intermittent demand. Brunel University-Buckinghamshire New University, UK; 2001.
[21] Teunter RH, Syntetos AA, Babai MZ. Intermittent demand: linking forecasting to inventory obsolescence. Eur J Oper Res 2011;214(3):606–15.
[22] Nikolopoulos K, Syntetos AA, Boylan JE, Petropoulos F, Assimakopoulos V. An aggregate-disaggregate intermittent demand approach (ADIDA) to forecasting: an empirical proposition and analysis. Journal of the Operational Research Society 2011;62:544–54.
[23] Hasni M, Babai MZ, Aguir MS, Jemai Z. An investigation on bootstrapping forecasting methods for intermittent demands. Int J Prod Econ 2019;209:20–9.
[24] Eaves AHC, Kingsman BG. Forecasting for the ordering and stock-holding of spare parts. Journal of the Operational Research Society 2004;55(4):431–7.
[25] Mohammadipour M, Boylan JE. Forecast horizon aggregation in integer autoregressive moving average (INARMA) models. Omega (Westport) 2012;40:703–12.
[26] Babai MZ, Syntetos A, Teunter R. Intermittent demand forecasting: an empirical study on accuracy and the risk of obsolescence. Int J Prod Econ 2014;157(0):212–19. The International Society for Inventory Research, 2012.
[27] Babai MZ, Dallery Y, Boubaker S, Kalai R. A new method to forecast intermittent demand in the presence of inventory obsolescence. Int J Prod Econ 2019;209:30–41.
[28] Prestwich SD, Tarim SA, Rossi R, Hnich B. Forecasting intermittent demand by hyperbolic-exponential smoothing. Int J Forecast 2014;30(4):928–33.
[29] Pennings CLP, van Dalen J, van der Laan EA. Exploiting elapsed time for managing intermittent demand for spare parts. Eur J Oper Res 2017;258(3):958–69.
[30] Willemain TR, Smart CN, Schwarz HF. A new approach to forecasting intermittent demand for service parts inventories. Int J Forecast 2004;20:375–87.
[31] Snyder RD, Ord JK, Beaumont A. Forecasting the intermittent demand for slow-moving inventories: a modelling approach. Int J Forecast 2012;28(2):485–96.
[32] Jiang A, Tam KL, Guo X, Zhang Y. A new approach to forecasting intermittent demand based on the mixed zero-truncated poisson model. J Forecast 2020;39(1):69–83.
[33] Hasni M, Aguir MS, Babai MZ, Jemai Z. Spare parts demand forecasting: a review on bootstrapping methods. Int J Prod Res 2019;57(15–16):4791–804.
[34] Snyder RD. Forecasting sales of slow and fast moving inventories. Eur J Oper Res 2002;140:684–99.
[35] Varghese V, Rossetti M. A parametric bootstrapping approach to forecast intermittent demand. In: Fowler J, Mason S, editors. Proceedings of the 2008 Industrial Engineering Research Conference; 2008.
[36] Hasni M, Aguir MS, Babai MZ, Jemai Z. On the performance of adjusted bootstrapping methods for intermittent demand forecasting. Int J Prod Econ 2019;216:145–53.
[37] Zhou C, Viswanathan S. Comparison of a new bootstrapping method with parametric approaches for safety stock determination in service parts inventory systems. Int J Prod Econ 2011;133(1):481–5.
[38] Smith M, Babai MZ. A review of bootstrapping for spare parts forecasting. Service parts management: demand forecasting and inventory control. Altay N, Litteral LA, editors. Springer; 2011.
[39] Efron B. Bootstrap methods: another look at the jackknife. Ann Stat 1979;7:1–26.
[40] Bookbinder JH, Lordahl AE. Estimation of inventory re-order levels using the bootstrap statistical procedure. IIE Trans 1989;21(4):302–12.
[41] Fricker Jr RD, Goodhart CCA. Applying a bootstrap approach for setting reorder points in military supply systems. Nav Res Logist 2000;47:459–78.
[42] Mosteller F, Tukey JW. Data analysis and regression: a second course in statistics. Reading, MA: Addison-Wesley; 1977.
[43] Kocer UU. Forecasting intermittent demand by Markov chain model. International Journal of Innovative Computing, Information and Control 2013;9(8):3307–18.
[44] Van Wingerden E, Basten R, Dekker R, Rustenburg W. More grip on inventory control through improved forecasting: a comparative study at three companies. Int J Prod Econ 2014;157:220–37.
[45] Zhu S, Dekker R, van Jaarsveld W, Renjie RW, Koning AJ. An improved method for forecasting spare parts demand using extreme value theory. Eur J Oper Res 2017;261(1):169–81.
[46] Gutierrez RS, Solis AO, Mukhopadhyay S. Lumpy demand forecasting using neural networks. Int J Prod Econ 2008;111(2):409–20. Special section on Sustainable Supply Chains.
[47] Mukhopadhyay S, Solis AO, Gutierrez RS. The accuracy of non-traditional versus traditional methods of forecasting lumpy demand: accuracy of lumpy demand forecasting methods. J Forecast 2012;31(8):721–35.
[48] Kourentzes N. Intermittent demand forecasts with neural networks. Int J Prod Econ 2013;143(1):198–206.
[49] Lolli F, Gamberini R, Regattieri A, Balugani E, Gatos T, Gucci S. Single-hidden layer neural networks for forecasting intermittent demand. Int J Prod Econ 2017;183:116–28.
[50] Guo F, Diao J, Zhao Q, Wang D, Sun Q. A double-level combination approach for demand forecasting of repairable airplane spare parts based on turnover data. Computers & Industrial Engineering 2017;110:92–108.
[51] Williams TM. Stock control with sporadic and slow-moving demand. J Oper Res Soc 1984;35:939–48.
[52] Willemain TR, Smart CN, Shockor JH, DeSautels PA. Forecasting intermittent demand in manufacturing: a comparative evaluation of croston’s method. Int J Forecast 1994;10:529–38.
[53] Johnston FR, Boylan JE. Forecasting for items with intermittent demand. J Oper Res Soc 1996;47:113–21.
[54] Syntetos A, Boylan J, Croston J. On the categorization of demand patterns. Journal of the Operational Research Society 2005;56:495–503.
[55] Kostenko A, Hyndman R. A note on the categorization of demand patterns. J Oper Res Soc 2006;57(10).
[56] Boylan JE, Syntetos AA, Karakostas GC. Classification for forecasting and stock control: a case study. J Oper Res Soc 2008;59(4):473–81.
[57] Syntetos AA, Babai MZ, Lengu D, Altay N. Distributional assumptions for parametric forecasting of intermittent demand. Service parts management: demand forecasting and inventory control. Altay N, Litteral LA, editors. Springer; 2011.
[58] do Rego JR, de Mesquita MA. Demand forecasting and inventory control: a simulation study on automotive spare parts. Int J Prod Econ 2015;161(0):1–16.
[59] Ghobbar AA, Friend CH. Sources of intermittent demand for aircraft spare parts within airline operations. Journal of Air Transport Management 2002;8:221–31.
[60] Ghobbar AA, Friend CH. Evaluation of forecasting methods for intermittent parts demand in the field of aviation: a predictive model. Computer and Operations Research 2003;30:2097–114.
[61] Regattieri A, Gamberi M, Gamberini R, Manzini R. Managing lumpy demand for aircraft spare parts. Journal of Air Transport Management 2005;11:426–31.
[62] Costantino F, Gravio GD, Patriarca R, Petrella L. Spare parts management for irregular demand items. Omega (Westport) 2018;81:57–66.
[63] Lengu D, Syntetos AA, Babai MZ. Spare parts management: linking distributional assumptions to demand classification. Eur J Oper Res 2014;235(3):624–35.
[64] Turrini L, Meissner J. Spare parts inventory management: new evidence from distribution fitting. Eur J Oper Res 2019;273(1):118–30.
[65] Petropoulos F, Makridakis S, Assimakopoulos V, Nikolopoulos K. ‘Horses for courses’ in demand forecasting. Eur J Oper Res 2014;237(1):152–63.
[66] Smith M, Dekker R. On the (s-1,s) stock model for renewal demand processes. Probab Eng Inf Sci 1997;11(3):375–86.
[67] Syntetos AA, Babai MZ, Luo S. Forecasting of compound erlang demand. J Oper Res Soc 2015;66(12):2061–74.
[68] Babai MZ, Ali MM, Nikolopoulos K. Impact of temporal aggregation on stock control performance of intermittent demand estimators: empirical analysis. Omega (Westport) 2012;40(6):713–21. Special Issue on Forecasting in Management Science.
[69] Petropoulos F, Kourentzes N, Nikolopoulos K. Another look at estimators for intermittent demand. Int J Prod Econ 2016;181:154–61.
[70] Boylan JE, Babai MZ. On the performance of overlapping and non-overlapping temporal demand aggregation approaches. Int J Prod Econ 2016;181:136–44.
[71] Moon S, Hicks C, Simpson A. The development of a hierarchical forecasting method for predicting spare parts demand in the South Korean Navy - a case study. Int J Prod Econ 2012;140:794–802.
[72] Li C, Lim A. A greedy aggregation–decomposition method for intermittent demand forecasting in fashion retailing. Eur J Oper Res 2018;269(3):860–9.
[73] Petropoulos F, Kourentzes N. Forecast combinations for intermittent demand. Journal of the Operational Research Society 2015;66(6):914–24.
[74] Zhu S, van Jaarsveld W, Dekker R. Spare parts inventory control based on maintenance planning. Reliability Engineering & System Safety 2020;193:1–10.
[75] Romeijnders W, Teunter R, Jaarsveld WV. A two-step method for forecasting spare parts demand using information on component repairs. Eur J Oper Res 2012;220:386–93.
[76] Makridakis S, Wheelwright SC, Hyndman RJ. Forecasting: methods and applications. 3rd ed. John Wiley and Sons; 1998.
[77] Goodwin P. Integrating management judgment and statistical methods to improve short-term forecasts. Omega (Westport) 2002;30:127–35.
[78] Fildes R, Goodwin P. Against your better judgment? how organizations can improve their use of management judgment in forecasting. INFORMS Journal on Applied Analytics 2007;37(6):570–6.
[79] Turner DS. The role of judgment in macroeconomic forecasting. J Forecast 1990;9:315–45.
[80] Mathews BP, Diamantopoulos A. Managerial intervention in forecasting: an empirical investigation of forecast manipulation. International Journal of Research in Marketing 1986;3(1):3–10.
[81] Mathews BP, Diamantopoulos A. Judgemental revision of sales forecasts: the relative performance of judgementally revised versus non-revised forecasts. J Forecast 1992;11(6):569.
[82] Sanders NR, Manrodt KB. The efficacy of using judgmental versus quantitative forecasting methods in practice. Omega (Westport) 2003;31:511–22.
[83] Franses PH, Legerstee R. Do experts’ adjustments on model-based SKU-level forecasts improve forecast quality? J Forecast 2009;36.
[84] Petropoulos F, Fildes R, Goodwin P. Do ‘big losses’ in judgmental adjustments to statistical forecasts affect experts’ behaviour? Eur J Oper Res 2016;249(3):842–52.
[85] Pennings CLP, van Dalen J, Rook L. Coordinating judgmental forecasting: coping with intentional biases. Omega (Westport) 2019;87:46–56.
[86] Van den Broeke M, De Baets S, Vereecke A, Baecke P, Vanderheyden K. Judgmental forecast adjustments over different time horizons. Omega (Westport) 2019;87:34–45.
[87] Croson R, Schultz K, Siemsen E, Yeo ML. Behavioral operations: the state of the field. J Oper Manage 2013;31(1–2):1–5.
[88] Arvan M, Fahimnia B, Reisi M, Siemsen E. Integrating human judgement into quantitative forecasting methods: a review. Omega (Westport) 2019;86:237–52.
[89] Syntetos AA, Nikolopoulos K, Boylan JE, Fildes R, Goodwin P. The effects of integrating management judgement into intermittent demand forecast. Int J Prod Econ 2009;118:72–81.
[90] Boutselis P, McNaught K. Using bayesian networks to forecast spares demand from equipment failures in a changing service logistics context. Int J Prod Econ 2019;209:325–33.
[91] Syntetos AA, Nikolopoulos K, Boylan JE. Judging the judges through accuracy-implication metrics: the case of inventory forecasting. Int J Forecast 2010;26(1):134–43.
[92] Fildes R, Goodwin P, Lawrence M, Nikolopoulos K. Effective forecasting and judgemental adjustments: an empirical evaluation and strategies for improvement in supply chain planning. Int J Forecast 2009;25:3–23.
[93] Syntetos AA, Georgantzas NC, Boylan JE, Dangerfield BC. Judgement and supply chain dynamics. J Oper Res Soc 2011;62(6):1138–58.
[94] Eksoz C, Mansouri A, Bourlakis M, Önkal D. Judgmental adjustments through supply integration for strategic partnerships in food chains. Omega (Westport) 2019;87:20–33.
[95] Goodwin P, Moritz B, Siemsen E. Forecast decisions. In: The Handbook of Behavioral Operations. Hoboken, NJ, USA: John Wiley & Sons, Inc.; 2018. p. 433–58.
[96] Perera HN, Hurley J, Fahimnia B, Reisi M. The human factor in supply chain forecasting: a systematic review. Eur J Oper Res 2019;274(2):574–600.
[97] Dekker R, Pinçe C, Zuidwijk R, Jalil MN. On the use of installed base information for spare parts logistics: a review of ideas and industry practice. Int J Prod Econ 2013;143:536–45.
[98] Cohen M, Kamesam PV, Kleindorfer P, Lee H, Tekerian A. Optimizer: IBM’s multi-echelon inventory system for managing service logistics. Interfaces (Providence) 1990;20(1):65–82.
[99] Petrović D, Petrović R. SPARTA II: Further development in an expert system for advising on stocks of spare parts. Int J Prod Econ 1992;24:291–300.
[100] Aronis K, Magou I, Dekker R, Tagaras G. Inventory control of spare parts using a bayesian approach: a case study. Eur J Oper Res 2004;154(3):730–9.
[101] Hua Z, Zhang B, Yang J, Tan D. A new approach of forecasting intermittent demand for spare parts inventories in the process industries. Journal of the Operational Research Society 2007;58(1):52–61.
[102] Wang W, Syntetos AA. Spare parts demand: linking forecasting to equipment maintenance. Transportation Research Part E: Logistics and Transportation Review 2011;47(6):1194–209.
[103] Kim TY, Dekker R, Heij C. Spare part demand forecasting for consumer goods using installed base information. Computers & Industrial Engineering 2017;103:201–15.
[104] Dombi J, Jónás T, Tóth ZE. Modeling and long-term forecasting demand in spare parts logistics businesses. Int J Prod Econ 2018;201:1–17.
[105] Van der Auweraer S, Boute R. Forecasting spare part demand using service maintenance information. Int J Prod Econ 2019;213:138–49.
[106] Andersson J, Jonsson P. Big data in spare parts supply chains: the potential of using product-in-use data in aftermarket demand planning. International Journal of Physical Distribution & Logistics Management 2018;48(5):524–44.
[107] Van der Auweraer S, Boute RN, Syntetos AA. Forecasting spare part demand with installed base information: a review. Int J Forecast 2019;35(1):181–96.
[108] Hellingrath B, Cordes A-K. Conceptual approach for integrating condition monitoring information and spare parts forecasting methods. Production & Manufacturing Research 2014;2(1):725–37.
[109] Hyndman RJ, Koehler AB. Another look at measures of forecast accuracy. Int J Forecast 2006;22(4):679–88.
[110] Wallström P, Segerstedt A. Evaluation of forecasting error measurements and techniques for intermittent demand. Int J Prod Econ 2010;128(2):625–36.
[111] Prestwich S, Rossi R, Armagan Tarim S, Hnich B. Mean-based error measures for intermittent demand forecasting. Int J Prod Res 2014;52(22):6782–91.
[112] Syntetos AA, Boylan JE. On the stock control performance of intermittent demand estimators. Int J Prod Econ 2006;103:36–47.
[113] Syntetos AA, Zied Babai M, Gardner ES. Forecasting intermittent inventory demands: simple parametric methods vs. bootstrapping. J Bus Res 2015;68(8):1746–52.
[114] Sani B, Kingsman BG. Selecting the best periodic inventory control and demand forecasting methods for low demand items. J Oper Res Soc 1997;48(7):700–13.
[115] Strijbosch L, Heuts R, van der Schoot E. A combined forecast-inventory control procedure for spare parts. Journal of the Operational Research Society 2000;51:1184–92.
[116] Wright DJ. Forecasting data published at irregular time intervals using an extension of holt’s method. Manage Sci 1986;32(4):499–510.
[117] Bates JM, Granger CWJ. The combination of forecasts. J Oper Res Soc 1969;20(4):451–68.
[118] Makridakis S, Winkler RL. Averages of forecasts: some empirical results. Manage Sci 1983;29(9):987–96.
[119] Claeskens G, Magnus JR, Vasnev AL, others. The forecast combination puzzle: a simple theoretical explanation. Int J Forecast 2016;32(3):754–62.
[120] Makridakis S, Spiliotis E, Assimakopoulos V. The M4 competition: 100,000 time series and 61 forecasting methods. Int J Forecast 2020;36(1):54–74.
[121] Baryannis G, Validi S, Dani S, Antoniou G. Supply chain risk management and artificial intelligence: state of the art and future research directions. Int J Prod Res 2019;57(7):2179–202.
[122] Kraus M, Feuerriegel S, Oztekin A. Deep learning in business analytics and operations research: models, applications and managerial implications. Eur J Oper Res 2020;281(3):628–41.
[123] Trapero JR, Kourentzes N, Fildes R. Impact of information exchange on supplier forecasting performance. Omega (Westport) 2012;40:738–47.