A New Method Based On Machine Learning To Forecast Fruit Yield Using Spectrometric Data: Analysis in A Fruit Supply Chain Context

Precision Agriculture
https://doi.org/10.1007/s11119-022-09947-7
Javier E. Gómez-Lagos · Marcela C. González-Araya · Rodrigo Ortega Blu · Luis G. Acosta Espejo
Accepted: 25 July 2022

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022
Abstract
The fruit supply chain (FSC) involves different stages that need to be planned at least two
months in advance. Having a good fruit yield forecast well in advance therefore allows
timely and correct decisions on resources, transport, and cold storage contracts, among
others, whereas over- or underestimation of fruit yield can cause important inefficiencies
in the FSC. Because of the relevance of this forecast, a method based on machine learning
(ML) techniques that uses spectrometric vegetation data is proposed. This method, known as
Spectrometry Based Method for Fruit Production Forecast
(SBM-Fruit), allows exploring the georeferenced Normalized Difference Vegetation Index
(NDVI), collected in different phenological stages, aiming to capture spatial and temporal
dependency in the fruit yield forecast. In the first step of SBM-Fruit, several clusters are
obtained in a clustering process using the georeferenced NDVI in all phenological stages
as input, while, in the second step, two validation functions are used for determining the
best clustering. Finally, in the third step, the predictor variables of the best clustering are
incorporated into an artificial neural network (ANN) for predicting the fruit yield. The
SBM-Fruit was applied to forecast table grape yield of an orchard located in the Valparaíso
Region, Chile. The results show fruit yield estimations with mean errors around 0.013 per-
cent for every spatial zone of the orchard, forecasted at least two months in advance. The
use of the SBM-Fruit would allow FSC stakeholders to make better decisions, improving
the coordination of the FSC stages, and reducing costs and fruit losses.
Keywords Fruit yield forecast · Normalized Difference Vegetation Index · Machine learning · Artificial neural network · Spatial Fuzzy c-Means algorithm
* Marcela C. González‑Araya
mgonzalez@utalca.cl
Introduction
The fruit supply chain (FSC) involves different stages, such as planting, harvesting, processing
and packaging, handling and storage, distribution, and retail (Soto-Silva et al., 2016). One
of the first stages that is usually considered for making tactical decisions is harvesting,
where the mature fresh fruit is collected. This process involves different decisions about
resources and activities to be carried out to obtain a fresh produce of good quality. For
this reason, temporal, spatial and individual data of the orchards needs to be analyzed and
combined for estimating the quantity and variability of fresh fruit production, aiming to
have the proper resources during the whole harvest season. This objective is aligned with
the ones stated in the definition of Precision Agriculture (International Society of Precision
Agriculture, 2022). Furthermore, the estimation of fresh fruit production allows managers
to coordinate other activities in the following stages of the FSC, such as the number of cold
chambers to be used and their refrigeration technology, plant capacity, number of shifts to
work in the processing plants, and number of refrigerated transports, among others. Despite the
importance of this estimation, according to Anderson et al. (2019), the forecast error for
the fruit production estimation is around 10% when traditional methods are used, i.e., the
simple random sample method. However, at the field level, estimation errors can be higher
(> 30%), depending on the crop. This error could increase resource costs, fruit losses,
packaging waste, and decrease fruit quality. One of the difficulties in forecasting fruit yield is
its dependence on uncertain factors, such as weather, soil characteristics, and orchard
management. For this reason, different methods have been proposed in the literature to improve the
fruit yield forecast. These methods have used diverse techniques, such as linear regression
(Mihai & Florin, 2016; Ye et al., 2007, 2008), computer vision (Anderson et al., 2019;
Farooque et al., 2013; Koirala et al., 2019; Stein et al., 2016), and machine learning (ML)
(Bose et al., 2016; Koller & Upadhyaya, 2005; Pantazi et al., 2016). The studies that use
regression analysis seek to represent a linear relationship among predictor variables and
the fruit yield. In computer vision, images of the fruit in the field are collected
and processed with an ML method, aiming to predict the fruit yield. On the other hand, ML
methods use spectrometric data, collected through satellites or drones equipped with
sensors, which are analyzed to forecast fruit yield.
Previous studies that used spectrometric data with ML methods for estimating
fresh produce yield have only considered the temporal correlation of these data. However,
no published studies also consider their spatial correlation. Therefore,
in this study, this spatial dependence is incorporated in a new method, called Spectrom-
etry Based Method for Fruit Production Forecast (SBM-Fruit), aiming to improve the fruit
yield prediction at least two months in advance. The SBM-Fruit is based on ML techniques
for analyzing a georeferenced vegetation index (NDVI), collected at different phenological
stages (periods). The method uses clustering techniques on the NDVI
to obtain predictor variables that capture the spatial and temporal NDVI dependency. The
obtained variables are used in a prediction algorithm, aiming to forecast fruit yield.
Therefore, the SBM-Fruit allows estimating the fruit yield to be harvested at each georeferenced
tree in an orchard minimizing the forecast error. In this way, better tactical and operational
harvest planning decisions could be made by producers, improving the coordination with
following stages of the FSC, and reducing costs and fruit losses. The SBM-Fruit was
applied to estimate table grape yield in an orchard located in the Valparaíso Region, Chile.
This article is structured as follows. In “Literature review” Section, a literature review
about methods that use spectrometric data for estimating fresh produce is presented. This
section also describes the impacts of the fruit yield forecast considering different decisions
in the FSC. “Materials and methods” Section presents the steps of the SBM-Fruit for esti-
mating fruit yield. In “Case study” Section, the results obtained by the SBM-Fruit in a
case study of table grape are shown. “Managerial insights” Section discusses the manage-
rial insights of a good fruit yield forecast. Finally, “Conclusions and future research” Sec-
tion presents the main conclusions and the needs for future research.
Literature review
This section reviews studies, including both foundational and recent papers, that use
spectrometric data for forecasting fresh produce yield, and describes the impacts on
fruit supply chain coordination when this forecast deviates from the actual yield.
Review of studies that use spectrometric data for estimating fresh produce yield
Spectrometric data collected from the crop canopy have been used in the literature for esti-
mating crop yield (wheat, maize, grassland, tomato, citrus, among others), because this
information is necessary for making planning decisions in the supply chain. These data
correspond to the canopy’s reflectance at different wavelengths, usually from the visible
range to the near infrared (NIR). From these data, several vegetation indices, such as NDVI, are
estimated. In this sub-section, a literature review of techniques that use spectrometric data
for forecasting fresh produce yield is presented. In order to facilitate the classification of
the reviewed literature, the nomenclature used for the spectrometric indices, forecasting
techniques and error measures is shown in Table 1.
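As an illustration of how a vegetation index is derived from canopy reflectance, the following minimal sketch computes the NDVI with its standard definition, (NIR − Red) / (NIR + Red). The function name and the reflectance values are hypothetical and are not taken from the reviewed studies or from the case study.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index from NIR and red reflectance."""
    total = nir + red
    if total == 0:
        raise ValueError("NIR + red reflectance must be non-zero")
    return (nir - red) / total


# Hypothetical canopy reflectances (fractions between 0 and 1), for illustration only.
samples = {"dense canopy": (0.50, 0.08), "sparse canopy": (0.30, 0.20)}
for label, (nir, red) in samples.items():
    print(f"{label}: NDVI = {ndvi(nir, red):.3f}")
```

Values close to 1 indicate a vigorous canopy, while values near 0 indicate sparse vegetation or bare soil.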
In Table 2, a summary of the reviewed articles is presented.
In the reviewed literature (Table 2), it is possible to observe a large diversity in the pre-
diction methods used, showing that yield prediction is an evolving research line. One of the
first studies that used spectrometric data for estimating a fresh produce yield was Koller
& Upadhyaya (2005). In addition, further research is still required for forecasting
different crops, since every study has focused on the use of one ancillary variable that
is suitable only for the analyzed species. For example, for the estimation of citrus fruit
yield (Ye et al., 2008), the variable “canopy size” was used. Therefore, when other species
need to be predicted, “canopy size” is not necessarily an appropriate variable, and other
variables need to be collected. In Table 3, spectrometric indices and techniques used in the
analyzed literature review are presented.
As can be seen in Table 3, the most used spectrometric index to predict crop yield is
NDVI (12 articles). This could be explained by the large availability of the NDVI from sat-
ellite data as well as the fact that it can be calibrated to changing conditions such as topog-
raphy, cloudiness, shadows, and atmospheric condition (Huete et al., 1999). In this way, it
could intrinsically integrate external factors as predictor variables. Moreover, it can be used
to construct temporal and seasonal profiles of an orchard's vegetation (Huete et al., 1999).
On the other hand, the most used technique is ANN, which is well-suited for fitting nonlinear
relationships and has been successfully used for crop yield prediction (Fernandes et al.,
2017). In this literature review, it can be observed that the methods used with spectrometric
data to predict crop yield have not addressed the spatial dependence of the data. The hypothesis
of this study is that the crop yield of a given area in a fruit orchard can be estimated not
only by using the temporal correlation of the observed spectrometric data, but also their
Table 1 Nomenclature for classifying the reviewed literature

| Spectrometric index | Abbreviation | Forecasting technique | Abbreviation | Error measure | Abbreviation |
|---|---|---|---|---|---|
| Enhanced vegetation index 2 | EVI2 | Adaptive-neuro fuzzy inference systems | ANFIS | Absolute error | AE |
| Land surface temperature | LST | Artificial neural network | ANN | Goodness of fit for regression analysis | R² |
| Landsat modified soil adjusted vegetation index | MSAVI | Back propagation neural network | BPNN | Mean error | ME |
| Leaf area index | LAI | Correlation | COR | Mean square error | MSE |
| Near infrared spectroscopy | NIR | Multi linear regression | MLR | Relative root mean square error | RRMSE |
| NIR-Red-Green | RGB | Principal component analysis | PCA | Root mean square error | RMSE |
| Normalized difference burn ratio | NDBR | Regression model | RM | | |
| Normalized difference vegetation index | NDVI | Spiking neural network | SNN | | |
| Normalized difference water index | NDWI | Supervised Kohonen networks | SKNs | | |
| Optimized soil adjusted vegetation index | OSAVI | Support vector machine | SVM | | |
| Photochemical reflectance index | PRI | XY-fused networks | XY-F | | |
| Simple ratio | SR | | | | |
| Soil adjusted vegetation index | SAVI | | | | |
| Two-band vegetation index | TBVI | | | | |
Table 2 Key aspects of the reviewed articles about spectrometry data for estimating fresh produce yield

| Identifier | References | Contribution | Fresh produce | Error measure* |
|---|---|---|---|---|
| 1 | Koller & Upadhyaya (2005) | The LAI index, jointly with external data such as environmental parameters, soil, and crop characteristics, was used for estimating tomato production | Tomato | MSE = 6.3% |
| 2 | Ye et al. (2007) | A demonstration about the correlation between canopy features and the yield of citrus trees was presented using different spectrometric data. In addition, PLS models were proposed to explore the potential of predicting citrus yield from airborne hyperspectral imagery and illustrated their good performance | Citrus fruits | RMSE = 124.3–233.8; RRMSE = 0.5354–1.136 |
| 3 | Ye et al. (2008) | An evaluation of different spectrometric indices for estimating citrus fruits production was carried out, where the TBVI combined with the canopy size presented the best prediction | Citrus fruits | RRMSE = 0.6071 |
| 4 | Panda et al. (2010) | A comparison between spectrometric indices was carried out to estimate crop yield using BPNN | Corn | R² = 0.72 |
| 5 | Ortega et al. (2012) | Use of cluster regression for yield prediction using NDVI | Wine grape | R² = 0.55–0.93 |
| 6 | Mihai and Florin (2016) | A comparison between the NDVI and NDBR indices was presented, where the most accurate prediction was obtained using the NDBR index | Maize | RMSE = 1.446–12.178 |
| 7 | Pantazi et al. (2016) | A comparison of three artificial intelligence algorithms for predicting wheat yield was presented, where the best results were obtained by SKN for the low category of yield | Wheat | AE = 9–19.08% |
| 8 | Bose et al. (2016) | An analysis of the SNN algorithm for forecasting winter wheat production was carried out, concluding its suitability for this estimation | Winter wheat | AE = 3–26% |
| 9 | Ali et al. (2017) | A comparison of ANFIS with two methods for predicting grassland production was presented. In addition, the potential of using long time series in the analyzed prediction methods was discussed | Grassland | RMSE = 11.07 |
| 10 | Sun et al. (2017) | NDVI and LAI were used for estimating wine grape yield through correlation techniques | Wine grape | RE = 5.9%–14–8% |
| 11 | Fernandes et al. (2017) | ANN and NDVI index were used in order to forecast sugarcane yield three months before the harvest | Sugarcane | RRMSE = 6.8%; RMSE = 5.7 t ha⁻¹ |
| 12 | Ahmad et al. (2018) | The PCA was used to estimate maize production with a RMSE of 255 kg h⁻¹ | Maize | RMSE = 255 kg h⁻¹ |
| 13 | Uribeetxebarria et al. (2019) | A comparison of RSS and SRS shows that RSS improves sampling in fruit growing | Peach | AE = 10% |
| 14 | Bai et al. (2019) | ANN and ML methods were used for jujube yield prediction. In addition, different spectrometric indices collected in different growing stages were analyzed | Jujube | R² = 0.67–0.85 |
| 15 | Stateras and Kalivas (2020) | The NDVI index and tree information, such as volume of tree crown, were used for estimating olive yield through MLR | Olive | ME = 0.27 kg/tree; RMSE = 8.21 kg/tree |
Table 3 Spectrometry indices and methods used for estimating fresh produce yield
Identifier | Spectrometric indices: EVI2 GVI LAI LST MSAVI NDBR NDVI NDWI NIR OSAVI PRI PVI RGB SAVI SR TBVI | Forecasting techniques: ANFIS ANN BPNN COR MLR PCA RM SKNs SNN SVM XY-F
1 X X
2 X X X X X
3 X X X
4 X X X X X
5 X X
6 X X X
7 X X X X
8 X X
9 X X X X X X X X
10 X X X
11 X X
12 X X X
13 X
14 X X X X X X
15 X X
Total 2 1 2 1 1 1 12 1 2 1 1 1 1 3 1 1 1 5 1 1 4 1 2 1 1 1 1
spatial correlation. The incorporation of this spatial dependence could help improve crop
yield prediction with more anticipation. Moreover, eventually it could be possible to reduce
the data collection periods since the behavior of a given area within an orchard would be
represented at the same time by its historical and spatial performance. On the other hand,
in order to include the spatial correlation (dependence) in an ML method, it is necessary to
increase the closeness of the collected data and, consequently, the sampling intensity.
Sensor data, collected from different platforms, comply very well with this requirement.
In the following sub-section, the impacts of fruit yield over or under-estimation in the
FSC stages are described.
Deviation impacts of the fruit yield forecast in the fruit supply chain coordination
The FSC involves different stages or echelons to reach the consumer markets, such as
agricultural practices and harvesting, consolidation and cold-chain entry, processing and/
or packing, transportation and logistics, and final distribution logistics (Soto-Silva et al., 2016;
Villalobos et al., 2019). In the agricultural practices and harvesting stage, the fruit yield
needs to be estimated by the orchard managers and/or exporting companies many months
in advance of the harvest season. With this information, it will be possible to coordinate
activities in the next stages of the FSC and, therefore, maximize fruit packaging and
improve efficiency. An over or under-estimation of fruit yield will have different conse-
quences in every stage of the FSC as described in Table 4.
As mentioned in Table 4, the under-estimation of fruit yield would cause fruit losses
in almost all the stages of the FSC, because of the lack of available resources for the
collected fruit (Catalá et al., 2016; Negi et al., 2015; van Dyk & Maspero, n.d.). On the other
hand, an over-estimation of fruit yield would increase the costs associated with the available
resources.
In the agricultural practices and harvesting stage, tactical and operational harvest plans
must be carried out, being necessary to estimate the number of machine hours, workers,
and harvest materials. Different mathematical programming models to support tactical fruit
harvest plan decisions have been proposed in the literature (Bohle et al., 2010; Caixeta-
Filho, 2006; Ferrer et al., 2008; Gómez-Lagos et al., 2021; González-Araya et al., 2015;
Herrera-Cáceres et al., 2017; Soto-Silva et al., 2017; Varas et al., 2020). In most of these
models, the available fruit quantity in each block corresponds to a parameter. In this way,
deviations from this quantity estimation would generate errors in the necessary machine
hours, materials, and/or number of workers to be hired, and assigned to the blocks, having
as consequences fruit losses and cost increases. These losses would occur because the fruit
could be left unharvested, harvested without the required maturity, or harvested and stored
for a long time in a bin waiting to be picked up from a block, exposed to high temperatures
that could deteriorate its quality.
In the transportation and logistics stage, the managers schedule the truck arrivals at the
plants and/or cold storage facilities. Consequently, deviations from the fruit yield would
impact mainly on the required number of truck trips, less-than-truckload trips, and congestion
in the receiving areas and docks of cold storages or processing plants. These situations
could generate fruit losses because the fruit could travel and wait for long periods of time
before being stored in a cold chamber. The truck arrival scheduling problem for agricul-
tural products has been previously described (Lamsal et al., 2016).
In the consolidation and cold-chain entry stage, decisions related to storing the fruit in
cold chambers need to be made. These decisions involve where to store the fresh fruit,
according to the required refrigeration technology and process destination (fresh, frozen,
juice, canned, among others). Deviations from the fruit yield would lead to fruit
misallocation, lack of cold chambers, and idle cold storage capacity. The fruit losses in this stage
are generated by storing fruit in cold chambers without the required refrigeration technology,
or by not having the required number of cold chambers during a processing season. More
details of this stage are described by Soto-Silva et al. (2017).
In the processing and/or packing stage, a material requirement plan must be carried out,
which is based on the fruit yield and size forecasts. It is important to notice that the fruit
size estimation is based on the fruit yield forecast, because they are correlated (de Salva-
dor et al., 2006). For this reason, deviations from the fruit yield could lead to deviations
from the fruit size, having impact on the required packaging materials. Every packaging
container is designed for keeping a specific fruit size (Moreda et al., 2009). Therefore,
deviations from the fruit yield and, thus, from fruit size, would cause the scheduling of
an unsuitable purchase of packing materials. Moreover, these materials cannot be stored
for the following season because they deteriorate quickly (for example, paperboard boxes),
becoming waste material.
As described previously, the estimation of fruit yield impacts every stage of the FSC.
However, it is difficult to estimate yield with an adequate accuracy, at least two months
in advance. Usually, the contracts that secure the resources for a harvest season
need to be signed at least two months before this period. In all these contracts, the
fruit yield forecast becomes relevant information for defining the required quantities (see
Table 4). Therefore, obtaining an accurate fruit yield forecast, at least two months before
the beginning of the harvest season, will allow better planning of the required number
of machine hours, workers, harvest materials, trucks for fruit transport, and cold
chambers and their refrigeration technology, among others. In addition, fruit yield
could vary greatly from season to season because of controllable and uncontrollable
variables. The controllable variables refer to the management practices, while the
uncontrollable variables refer to weather changes. For the reasons explained previously, the next
section proposes a method for estimating fruit production more accurately, months in advance
of the harvest.
Materials and methods
In this section, the method for forecasting fruit yield, called Spectrometry Based Method
for Fruit Production Forecast (SBM-Fruit), is described. This method has three distinc-
tive steps. In the first step, a Spatial Fuzzy c-Means algorithm developed by Chuang et al.
(2006) is applied for clustering the spectrometric (NDVI) data. These data must be col-
lected from different phenological stages of the crop. In the second step, the obtained clus-
ters are evaluated using two validation functions to select the best cluster configuration.
In the third step, the information of the selected clustering is used for estimating the fruit
yield in the orchard using an ANN. This method is based on the three-step method pro-
posed by Gómez-Lagos et al. (2019), which was used to estimate future NDVI maps from
previous ones during a growing season. The main differences between these methods are
the inclusion of spatial information for carrying out the clustering step, the use of a new
validation function for selecting a cluster, and the fruit yield as the output variable.
Table 4 Consequences of yield prediction inaccuracies regarding FSC decisions

| FSC stage | Decisions | Consequences of over-estimation | Consequences of under-estimation |
|---|---|---|---|
| Agricultural practices and harvesting | Definition of the required machine hours | Idle machine hours | Extra hours of machine drivers and fruit losses |
| Agricultural practices and harvesting | Definition of the number of workers to hire | Idle workers | Lack of workers and fruit losses |
| Transportation and logistics | Definition of the number and kind of trucks to hire | Idle trucks and less-than-truckload | Lack of trucks for fruit transport, increase of truck trips, and fruit losses |
| Transportation and logistics | Scheduling truck arrival to the processing plants or cold storages | Underutilization of plant docks | Congestion at the receiving areas of processing plants or cold storages and fruit losses |
| Consolidation and cold-chain entry | Definition of the number and type of cold chambers to lease | Idle cold chambers’ capacity | Lack of cold chambers and fruit losses |
| Processing and/or packing | Schedule of the fruit process | Idle plant capacity | Lateness of fruit delivery and fruit losses |
| Processing and/or packing | Definition of the number of workers to hire | Idle workers | Lack of workers and fruit losses |
| Processing and/or packing | Materials requirement plan | Depending on the fruit size, some stock shortages and overstock of different packaging materials will be observed. This situation will cause inefficiencies that can increase costs (applies to both over- and under-estimation) | |
In the following sub-sections, a detailed description of every step of the proposed method is presented.
Clustering using the Spatial Fuzzy c‑Means algorithm
The purpose of a clustering problem is to find groups with similar characteristics within a
data set (Ghaemi et al., 2009). For performing the fruit yield forecast, it is necessary to find
the zones of an orchard that have similar characteristics. This can be done by using
the Spatial Fuzzy c-Means algorithm (Chuang et al., 2006), which is an extension of the
Fuzzy c-Means algorithm (Bezdek, 1984). This algorithm seeks to find a feasible solution
for the Fuzzy c-Means model (Bezdek, 1984), using spatial information as well, and it is
applied because the spatial dependence of spectrometric data is considered. Therefore, in
this section, the mathematical formulation of the Fuzzy c-means model is presented, and
then, the Spatial Fuzzy c-Means algorithm is described.
The Fuzzy c-Means model obtains the degree of membership of every orchard zone to a
cluster, where each orchard zone could belong to more than one cluster. The mathematical
formulation of this model is presented as follows.
Definition of parameters
N: number of orchard zones,
V: number of characteristics (variables) considered in every orchard zone,
K: number of clusters,
$x_{vi}$: characteristic v of the orchard zone i, v = 1, …, V, i = 1, …, N.
Definition of decision variables
$u_{ik}$: degree of membership of the orchard zone i to a cluster k, i = 1, …, N, k = 1, …, K,
$c_{vk}$: coordinate of characteristic v in the centroid of the cluster k, v = 1, …, V, k = 1, …, K.
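Mathematical formulation:
$$\text{Minimize} \quad \sum_{i=1}^{N} \sum_{v=1}^{V} \sum_{k=1}^{K} d(x_{vi}, c_{vk})\, u_{ik} \tag{1}$$
Subject to
$$\sum_{k=1}^{K} u_{ik} = 1, \quad i = 1, \ldots, N, \tag{2}$$
$$u_{ik} \geq 0, \quad i = 1, \ldots, N, \; k = 1, \ldots, K. \tag{3}$$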
The objective function (1) minimizes, for every characteristic v, the distance between
the centroid of cluster k ($c_{vk}$) and each orchard zone i ($x_{vi}$), weighted by the degree of
membership $u_{ik}$. Constraint (2) establishes that, for every orchard zone i, the membership
degrees over all clusters sum to one. Finally, constraint (3) states the non-negativity of the
decision variables. It is important to notice that, for each characteristic v, $d(x_{vi}, c_{vk})$
corresponds to the distance between an orchard zone i and the centroid of a cluster k. This
distance can be obtained through different formulas, such as the Euclidean distance or the
city block distance, among others.
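The formulation in Eqs. (1) to (3) can be checked numerically with a short sketch. It assumes that the characteristic distance d is the absolute difference between two values, which is only one of the admissible choices mentioned above; the function names and the toy data are hypothetical. Note that the arrays are indexed as `x[i][v]`, `c[k][v]` and `u[i][k]`.

```python
def characteristic_distance(x_vi: float, c_vk: float) -> float:
    """Assumed distance for one characteristic: absolute difference."""
    return abs(x_vi - c_vk)


def fcm_objective(x, c, u):
    """Objective (1): sum over zones i, characteristics v and clusters k
    of d(x_vi, c_vk) * u_ik."""
    total = 0.0
    for i, zone in enumerate(x):
        for v, x_vi in enumerate(zone):
            for k, centroid in enumerate(c):
                total += characteristic_distance(x_vi, centroid[v]) * u[i][k]
    return total


def is_feasible(u, tol=1e-9):
    """Constraints (2) and (3): rows sum to one and entries are non-negative."""
    return all(
        all(u_ik >= 0 for u_ik in row) and abs(sum(row) - 1.0) <= tol
        for row in u
    )


# Toy data: two zones, one characteristic (e.g. one NDVI reading), two clusters.
x = [[0.2], [0.8]]
c = [[0.2], [0.8]]
u = [[0.5, 0.5], [0.5, 0.5]]
print(is_feasible(u), fcm_objective(x, c, u))
```

With crisp memberships that assign each zone to the centroid it coincides with, the objective drops to zero, which is the best value attainable for these toy data.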
The Fuzzy c-Means model formulated previously (Eqs. 1–3) is solved by using the Spatial
Fuzzy c-Means algorithm, which is described as follows.
Firstly, the initial values of $u_{ik}$ can be assigned using any arbitrary feasible solution.
Then, every value of $c_{vk}$ is calculated using Eq. (4), where m corresponds to a fuzzifier
parameter. It is important to mention that m takes values greater than one. Moreover,
increasing the value of m drives the degrees of membership $u_{ik}$ towards the fuzziest
state (Bezdek, 1984).
$$c_{vk} = \frac{\sum_{i=1}^{N} (u_{ik})^{m}\, x_{vi}}{\sum_{i=1}^{N} (u_{ik})^{m}}, \quad k = 1, \ldots, K, \; v = 1, \ldots, V, \tag{4}$$
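The centroid update of Eq. (4) is a membership-weighted mean, as the following minimal sketch shows. The function name and the toy data are hypothetical, and the sketch assumes that every cluster has at least one zone with non-zero membership (otherwise the denominator is zero).

```python
def update_centroids(x, u, m=2.0):
    """Eq. (4): c_vk = sum_i (u_ik)^m * x_vi / sum_i (u_ik)^m.

    x[i][v]: characteristic v of zone i; u[i][k]: membership of zone i in cluster k.
    Returns centroids[k][v].
    """
    n_zones, n_vars, n_clusters = len(x), len(x[0]), len(u[0])
    centroids = []
    for k in range(n_clusters):
        weights = [u[i][k] ** m for i in range(n_zones)]
        denom = sum(weights)  # assumed to be non-zero
        centroids.append([
            sum(w * x[i][v] for i, w in enumerate(weights)) / denom
            for v in range(n_vars)
        ])
    return centroids


# Toy data: two zones described by NDVI at two hypothetical phenological stages.
x = [[0.0, 2.0], [1.0, 4.0]]
u = [[0.5, 0.5], [0.5, 0.5]]
print(update_centroids(x, u))
```

When all memberships are equal, every centroid collapses to the plain average of the zones, which is a quick sanity check for an implementation.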
Once the centroid value cvk is obtained, the values of uik must be recalculated according
to Eq. (5).
$$u_{ik} = \frac{1}{\sum_{r=1}^{K} \sum_{v=1}^{V} \left( \frac{d(x_{vi}, c_{vk})}{d(x_{vi}, c_{vr})} \right)^{\frac{2}{m-1}}}, \quad i = 1, \ldots, N, \; k = 1, \ldots, K, \tag{5}$$
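The membership update can be sketched as follows. This sketch uses the conventional Fuzzy c-Means form (Bezdek, 1984), in which the V characteristics are first aggregated into a single Euclidean distance between a zone and a centroid; this aggregation is an assumption of the sketch, and it coincides with Eq. (5) when V = 1. The function names and the zero-distance handling are likewise illustrative choices.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two vectors of characteristics."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def update_memberships(x, centroids, m=2.0):
    """Conventional FCM update: u_ik = 1 / sum_r (D_ik / D_ir)^(2 / (m - 1)),
    where D_ik is the distance between zone i and centroid k (requires m > 1)."""
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for zone in x:
        dists = [euclidean(zone, c) for c in centroids]
        zero_hits = [k for k, d in enumerate(dists) if d == 0.0]
        if zero_hits:
            # Zone coincides with one or more centroids: share full membership among them.
            row = [1.0 / len(zero_hits) if k in zero_hits else 0.0
                   for k in range(len(centroids))]
        else:
            row = [1.0 / sum((d_k / d_r) ** exponent for d_r in dists)
                   for d_k in dists]
        memberships.append(row)
    return memberships


# One zone with NDVI 0.25 and two centroids at 0.0 and 1.0 (hypothetical values).
print(update_memberships([[0.25]], [[0.0], [1.0]], m=2.0))
```

In this form each row of memberships sums to one, so constraint (2) holds after every update.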
Using the new value of uik, the probability that an orchard zone i belongs to a cluster k
(hik) must be estimated with Eq. (6). In this equation, NBi represents the subset of orchard
zones that can influence orchard zone i or are correlated with it. Therefore, this equation
allows orchard zones, which are correlated, to belong to the same cluster. NBi can be estab-
lished using a variogram for the NDVI in order to determine the range of spatial depend-
ence among orchard zones.
$$h_{ik} = \sum_{\tau \in NB_i} u_{\tau k}, \quad i = 1, \ldots, N, \; k = 1, \ldots, K, \tag{6}$$
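A minimal sketch of Eq. (6) is given below. It assumes that the neighbourhood NB_i contains every zone whose distance to zone i is within a given spatial range (for instance, the range suggested by a variogram) and that zone i itself belongs to NB_i; both choices, the function names, and the coordinates are assumptions made for illustration.

```python
import math


def neighbourhoods(coords, spatial_range):
    """NB_i: indices of the zones located within `spatial_range` of zone i.
    Zone i is included in its own neighbourhood (an assumption of this sketch)."""
    return [
        [j for j, q in enumerate(coords) if math.dist(p, q) <= spatial_range]
        for p in coords
    ]


def spatial_function(u, nb):
    """Eq. (6): h_ik = sum of u_tau,k over the zones tau in NB_i."""
    n_clusters = len(u[0])
    return [
        [sum(u[t][k] for t in nb_i) for k in range(n_clusters)]
        for nb_i in nb
    ]


# Three zones on a line; the first two are neighbours, the third is isolated.
coords = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
u = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
nb = neighbourhoods(coords, spatial_range=1.5)
print(nb, spatial_function(u, nb))
```

Zones surrounded by neighbours with high membership in a cluster obtain a large h value for that cluster, which is how the spatial dependence enters the clustering.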
For obtaining the degree of membership of every orchard zone i to a cluster k, a spatial
function is calculated by Eq. (7). In this equation, p and q are parameters that weight the
relative importance between the degree of membership of an orchard zone i ($u_{ik}$) and the
probability that an orchard zone i belongs to a cluster k ($h_{ik}$). When p = 1 and q = 0, the
procedure reduces to the conventional Fuzzy c-Means algorithm. Using the values of $u'_{ik}$
obtained by Eq. (7), Eqs. (4) to (7) are recalculated. This procedure is repeated until some
stop-criterion is achieved. In this study, the stop-criterion of the algorithm was a
predetermined maximum number of iterations.
$$u'_{ik} = \frac{(u_{ik})^{p} (h_{ik})^{q}}{\sum_{t=1}^{K} (u_{it})^{p} (h_{it})^{q}}, \quad i = 1, \ldots, N, \; k = 1, \ldots, K. \tag{7}$$
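The spatially adjusted membership of Eq. (7) can be sketched in a few lines. The function name and the numerical values are hypothetical, and the sketch assumes that at least one cluster has a non-zero weight for every zone.

```python
def spatial_memberships(u, h, p=1.0, q=1.0):
    """Eq. (7): u'_ik = (u_ik)^p (h_ik)^q / sum_t (u_it)^p (h_it)^q."""
    result = []
    for u_i, h_i in zip(u, h):
        weights = [(u_ik ** p) * (h_ik ** q) for u_ik, h_ik in zip(u_i, h_i)]
        total = sum(weights)  # assumed to be non-zero
        result.append([w / total for w in weights])
    return result


# Hypothetical memberships and spatial-function values for one zone and two clusters.
u = [[0.9, 0.1]]
h = [[1.5, 0.5]]
print(spatial_memberships(u, h, p=1, q=1))  # neighbours reinforce the first cluster
print(spatial_memberships(u, h, p=1, q=0))  # q = 0 leaves the memberships unchanged
```

The second call illustrates the special case discussed above: with p = 1 and q = 0 the spatial term has no effect.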
Cluster validation
In this step, parameters K, m, p and q of the Spatial Fuzzy c-Means algorithm are analyzed
based on how well clustered the data become. For this purpose, cluster validation functions
are used. Bezdek (1984) and Chuang et al. (2006) used the Silhouette (Rousseeuw, 1987) and
the S (Xie & Beni, 1991) validation functions, respectively. Inspired by these studies, both
validation functions are applied and compared in this study.
The validation function Silhouette, SI, (Rousseeuw, 1987) compares the clusters regard-
ing their tightness and separation. For obtaining the value of this function, the Eqs. (8), (9)
and (10) must be calculated.
$$s(i) = \frac{b(i) - a(i)}{\max\{a(i), b(i)\}}, \quad i = 1, \ldots, N. \tag{8}$$
In Eq. (8), a(i) corresponds to the average distance between an orchard zone i that belongs
to a cluster A, and every orchard zone that belongs to the cluster A. This parameter measures
the average disparity of orchard zone i with respect to all the other orchard zones of the clus-
ter. On the other hand, b(i) is obtained as follows:
$$b(i) = \min_{C = 1, \ldots, K} \{ d(i, C) \}, \tag{9}$$
where d(i, C) is the average distance between the orchard zone i that belongs to a cluster A,
and every orchard zone that belongs to cluster C and C ≠ A . This parameter measures the
average dissimilarity of orchard zone i with respect to all the other orchard zones of cluster
C.
The value s(i) varies between –1 and 1. When s(i) is approximately 1, it means that a(i) is
much less than b(i). In this way, it is possible to affirm that the orchard zone i is well clustered,
because a(i) is approximately 0. In other words, the orchard zone i is very close to the other
orchard zones in the same cluster. On the other hand, when s(i) is approximately –1,
it means that b(i) is approximately 0. Therefore, the orchard zone i is very close to
the orchard zones of another cluster C, which suggests that it has been assigned to the wrong
cluster. It is important to notice that values of s(i) closer to one are desirable.
The value of the Silhouette function (SI) is calculated by using Eq. (10). As it was men-
tioned previously, values of SI closer to one are desirable.
$$SI = \frac{\sum_{i=1}^{N} s(i)}{N}. \tag{10}$$
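Equations (8) to (10) can be sketched as follows. Because the Silhouette function works on crisp clusters, the sketch assumes that every orchard zone has already been assigned to a single cluster (for example, the one with the highest degree of membership) and that distances are Euclidean; these assumptions, the function name, and the toy data are illustrative. Zones in singleton clusters receive s(i) = 0, following the convention of Rousseeuw (1987).

```python
import math


def silhouette(points, labels):
    """Return (SI, [s(i)]) for crisp cluster labels, following Eqs. (8)-(10)."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        # Average distance from zone i to the zones of each cluster (zone i excluded).
        avg = {}
        for c in clusters:
            members = [j for j, lab in enumerate(labels) if lab == c and j != i]
            if members:
                avg[c] = sum(math.dist(p, points[j]) for j in members) / len(members)
        own = labels[i]
        others = [d for c, d in avg.items() if c != own]
        if own not in avg or not others:
            scores.append(0.0)  # singleton cluster, or only one cluster in total
            continue
        a, b = avg[own], min(others)  # a(i) and b(i) of Eqs. (8) and (9)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores), scores


# Two well-separated groups of zones described by a single characteristic.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
si, s = silhouette(points, labels)
print(round(si, 4), [round(v, 4) for v in s])
```

For these toy data every s(i) is close to one, which reflects tight and well-separated clusters.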
The validation function proposed by Xie and Beni (1991) measures the quality of a fuzzy
clustering, which is presented in Eq. (11). This validation function considers the distance
between every zone i in the orchard (xvi) and its corresponding centroid to cluster k (cvk), mul-
tiplied by the degree of membership of zone i to cluster k (uik). This value is divided by the
minimum sum of distances between every pair of cluster centroids ($c_{vh}$ and $c_{vk}$),
considering each characteristic v. In addition, the denominator is multiplied by the number of
orchard zones N. The S value is always positive, and the lower the value of S, the better
the clustering. This occurs because every orchard zone is close to its cluster centroid and,
at the same time, the clusters are far apart.
$$S = \frac{\sum_{i=1}^{N} \sum_{k=1}^{K} \sum_{v=1}^{V} (u_{ik})^{m} \, \lVert x_{vi} - c_{vk} \rVert^{2}}{N \left( \min_{\substack{h, k = 1, \ldots, K \\ h \neq k}} \sum_{v=1}^{V} \lVert c_{vh} - c_{vk} \rVert^{2} \right)}. \tag{11}$$
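A minimal sketch of Eq. (11) is shown below, reading the norm of a single characteristic as an absolute difference, so that its square is a squared difference. The function name and the toy data are hypothetical, and at least two distinct centroids are assumed.

```python
def xie_beni(x, centroids, u, m=2.0):
    """Eq. (11): fuzzy compactness divided by N times the minimum centroid separation.

    x[i][v]: characteristic v of zone i; centroids[k][v]; u[i][k]: memberships.
    """
    n_zones, n_clusters = len(x), len(centroids)
    compactness = sum(
        (u[i][k] ** m) * sum((x_v - c_v) ** 2 for x_v, c_v in zip(x[i], centroids[k]))
        for i in range(n_zones)
        for k in range(n_clusters)
    )
    separation = min(
        sum((a - b) ** 2 for a, b in zip(centroids[h], centroids[k]))
        for h in range(n_clusters)
        for k in range(n_clusters)
        if h != k
    )
    return compactness / (n_zones * separation)


# Three zones, one characteristic, two clusters (hypothetical values).
x = [[0.0], [0.2], [1.0]]
centroids = [[0.1], [1.0]]
u = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(xie_beni(x, centroids, u))
```

Compact clusters keep the numerator small and distant centroids keep the denominator large, so lower values indicate a better partition.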
Fruit yield forecast through an ANN
Once the clusters are defined, an ANN algorithm is used to forecast the fruit yield in an
orchard. For this purpose, the following predictor variables are used:
$M_{rk}$: coordinate r of the sampled orchard zone of cluster k, where r = 1, 2 (1 = abscissa, 2 = ordinate) and k = 1, …, K,
$V_{k}$: NDVI value of the sampled orchard zone of cluster k, where k = 1, …, K,
$d_{ik} = \sum_{r=1}^{2} \lVert x_{ri}, M_{rk} \rVert$: distance between orchard zone i and the sampled orchard zone of cluster k, where i = 1, …, N and k = 1, …, K,
$p_{ik} = V_{k} u_{ik}$: weighted NDVI of cluster k, where i = 1, …, N and k = 1, …, K.
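The two derived predictor variables can be illustrated with a short Python sketch. It assumes that the per-coordinate distance in the definition of d_ik is the absolute difference between coordinates, so that d_ik is the sum of the differences in abscissa and ordinate; this reading, the function name, and the toy values are assumptions made for illustration only.

```python
def predictor_variables(zone_xy, sampled_xy, sampled_ndvi, u):
    """Return (d, p), where d[i][k] = d_ik and p[i][k] = p_ik.

    zone_xy: coordinates (x_1i, x_2i) of the N orchard zones
    sampled_xy: coordinates (M_1k, M_2k) of the sampled zone of each cluster
    sampled_ndvi: NDVI value V_k of the sampled zone of each cluster
    u: N x K membership degrees u_ik
    """
    # d_ik: sum over both coordinates of the per-coordinate distance (assumed absolute difference).
    d = [
        [sum(abs(xr - mr) for xr, mr in zip(zone, sampled)) for sampled in sampled_xy]
        for zone in zone_xy
    ]
    # p_ik: NDVI of the sampled zone of cluster k weighted by the membership of zone i.
    p = [
        [sampled_ndvi[k] * u[i][k] for k in range(len(sampled_ndvi))]
        for i in range(len(zone_xy))
    ]
    return d, p

# Toy example with two zones and two clusters (hypothetical data).
zone_xy = [(0.0, 0.0), (3.0, 0.0)]
sampled_xy = [(0.0, 3.0), (6.0, 0.0)]
sampled_ndvi = [0.8, 0.5]
u = [[0.75, 0.25], [0.5, 0.5]]
d, p = predictor_variables(zone_xy, sampled_xy, sampled_ndvi, u)
print(d)
print(p)
```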
ANNs are ML techniques that simulate the mechanism of learning in biological
organisms (Aggarwal, 2018). The first authors to propose an ANN were McCulloch
and Pitts (1943). The biological learning mechanism simulated by an ANN is built from
computation units known as neurons. An example of an ANN is shown in Fig. 1. The
input layer of the ANN receives the input values (blue circles), which
are connected to the neurons (green circles) through a propagation function. The
neurons form the hidden layer. Usually one hidden layer is used, although
more than one could be designed (Aggarwal, 2018). The propagation function triggers
an activation function in the neurons, whose results are sent to the output layer through
another propagation function. In Fig. 1, only one output is represented (red circle);
however, several outputs could be obtained from the ANN learning process.
In this article, the predictor variables described previously are the inputs of the
proposed ANN and the fruit yield of the orchard studied corresponds to the output. For
more details about ANNs see Aggarwal (2018).
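The flow of information through an ANN with one hidden layer and one output can be sketched as follows. The article does not state which propagation and activation functions were used, so this sketch assumes a weighted sum plus bias as propagation function, a sigmoid activation in the hidden layer, and a linear output; all names and weights are hypothetical.

```python
import math

def forward(inputs, hidden_weights, hidden_bias, output_weights, output_bias):
    """Forward pass of an ANN with one hidden layer and a single output."""
    hidden = []
    for weights, bias in zip(hidden_weights, hidden_bias):
        # Propagation function (assumed): weighted sum of the inputs plus a bias.
        z = sum(w * x for w, x in zip(weights, inputs)) + bias
        # Activation function (assumed): sigmoid.
        hidden.append(1.0 / (1.0 + math.exp(-z)))
    # Second propagation function: weighted sum of the hidden activations.
    return sum(w * h for w, h in zip(output_weights, hidden)) + output_bias

# Two normalized predictor values feeding two hidden neurons (hypothetical weights).
prediction = forward(
    inputs=[0.2, 0.7],
    hidden_weights=[[0.5, -0.3], [0.1, 0.8]],
    hidden_bias=[0.0, 0.1],
    output_weights=[0.6, 0.4],
    output_bias=0.0,
)
print(prediction)
```

Training consists of adjusting the weights and biases so that the output approaches the observed fruit yield; that step is omitted here.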
In the following section, the case study is presented, aiming to explain the methodology
developed.
Case study
In this case study, an orchard of table grape (Vitis vinifera L., Red Globe variety) located
in the Valparaíso Region of Chile is analyzed using the SBM–Fruit. This orchard has 6.3
hectares divided into two blocks. The data obtained from this orchard corresponds to the
NDVI measured during the 2014–2015 season. This information is used because the NDVI
is correlated with plant canopy, which in turn is a good indicator of the tree health, and,
therefore, an indicator of the possibility to obtain a good fruit yield (Hall et al., 2011).
Spectral data were collected continuously, on five different dates, using active ground
sensors. Maps of NDVI at each date were produced using a 3 × 3 m cell (9 m²). A cell
is called an “orchard zone”, representing an area of 9 m². Thus, a total of 6990 NDVI
values were collected for the whole 6.3-ha field at each collection time, which
corresponded to a phenological stage of the crop. The data were collected by using two
OptRx active canopy sensors (AgLeader Technologies, USA), mounted facing upward on an ATV
and looking at the canopy from beneath at a distance of approximately 0.8 m. Each sensor
operated independently through the SMS Mobile software (AgLeader Technologies, USA),
and was connected to robust field equipment with a differential global positioning
system (DGPS). The five phenological stages are represented by the following dates:
October 9th, 2014 (around 4 days after full bloom); October 30th, 2014 (around 25 days after
full bloom); November 11th, 2014 (around 37 days after full bloom); December 3rd, 2014
(around 59 days after bloom); and January 12th, 2015 (around 102 days after bloom). It is
important to notice that the table grape harvest occurs around 162 days after full bloom.
In Fig. 2, it is possible to observe the orchard map with the NDVI collected on every date
associated with a phenological stage. In this figure, the green color represents an NDVI value
close to one (more vegetative development), while the red color represents an NDVI value
closer to zero (less vegetative development).
Moreover, the observed fruit yield in every orchard zone was collected at the end of
the 2014–2015 season. These data are useful for training the developed ANN and for
assessing the SBM-Fruit performance. In Fig. 3, the orchard map with the observed fruit yield
of every orchard zone at the end of the season is represented. The green color represents
a fruit yield closer to 50 ton/ha (large fruit yield), while the red color represents a fruit yield
closer to 32 ton/ha (low fruit yield).
Fig. 1 Example of an ANN
Results and discussion
Using the data described previously, the SBM-Fruit is applied to estimate the table grape
yield. In this way, the results obtained in every step of this method are presented.
Results for clustering using the spatial fuzzy c‑means algorithm
The first step of the SBM-Fruit corresponds to the clustering using the Spatial Fuzzy
c-Means algorithm. For the execution of this algorithm, the coordinate of each orchard
zone (cell) is used, where the NDVI data are collected in each phenological stage, that is,
on different dates (see Fig. 2). In this way, every orchard zone has a georeferenced centroid,
separated 3 m from an adjacent one. The NDVI data across all stages are used as inputs for
the clustering process. Therefore, every coordinate of each orchard zone and every NDVI
data collected in each phenological stage are used as a characteristic v in the Spatial Fuzzy
c-Means. Furthermore, the subset of orchard zones that can influence every orchard zone
i ( NBi ) must be calculated. For this purpose, variograms for all NDVI samples are made
(Fig. 4). Thus, it is possible to analyze if an orchard zone influences another orchard zone,
according to their distances. To obtain the variograms, the geographical coordinates of the
orchard zones are used for calculating the Euclidean distance among them. In the variograms
depicted in Fig. 4, it is necessary to identify the range, that is, the distance at which
the variance becomes constant. For all dates, this occurs at a distance of approximately
100 m, meaning that spatial dependence among the orchard zones can be observed up to
this distance. For this reason, in this case study, NBi corresponds to the subset of all
orchard zones at a distance of less than 100 m from orchard zone i. It
is important to notice that when the spatial dependence disappears, the variograms present
a random behavior. In Fig. 4, this behavior is observed for distances greater than 100 m.
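The construction of NBi from the variogram range can be sketched as follows. The sketch assumes planar coordinates in meters and excludes zone i from its own neighborhood, which is an assumption; the function name and the brute-force pairwise loop are illustrative only.

```python
import math

def neighborhoods(coords, max_dist=100.0):
    """For each zone i, list the indices of the zones closer than max_dist (NB_i)."""
    nb = []
    for i, (xi, yi) in enumerate(coords):
        nb.append([
            j
            for j, (xj, yj) in enumerate(coords)
            if j != i and math.hypot(xi - xj, yi - yj) < max_dist
        ])
    return nb

# Four zones along a row, with coordinates in meters (hypothetical data).
coords = [(0.0, 0.0), (3.0, 0.0), (99.0, 0.0), (150.0, 0.0)]
print(neighborhoods(coords))
```

In this toy row, the first zone is influenced by the zones at 3 m and 99 m but not by the one at 150 m, which lies beyond the 100 m range.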
The Spatial Fuzzy c-Means algorithm was implemented in Java. For its execution,
the parameters K, m, p, and q needed to be calibrated. As mentioned previously,
these parameters correspond, respectively, to the number of clusters, the fuzzifier
parameter, and the weights of the relative importance related to the degree
of membership of an orchard zone i. The values analyzed for each
Fig. 2 Map with the NDVI measured in different phenological stages of the season
parameter were: K = {2, 3, 4, … , 70}, m = {1.1, 1.2, 1.3}, p = {1, 1.5, 2}, q = {0, 0.5, 1} . Therefore,
69 × 3 × 3 × 3 = 1,863 configurations were evaluated.
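The size of this calibration grid can be checked by enumerating it. The following Python sketch only builds the list of (K, m, p, q) tuples; the variable names are illustrative, and running the clustering for each tuple is outside its scope.

```python
from itertools import product

K_values = range(2, 71)        # K = 2, 3, ..., 70
m_values = (1.1, 1.2, 1.3)     # fuzzifier parameter
p_values = (1, 1.5, 2)
q_values = (0, 0.5, 1)

# Every combination of the four parameters is one configuration to evaluate.
configurations = list(product(K_values, m_values, p_values, q_values))
print(len(configurations))  # 1863
```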
Results for cluster validation
Aiming to calibrate the parameters K, m, p, and q, the cluster validation proposed in the
second step is carried out. The results obtained with Silhouette (Eq. 10) and S (Eq. 11)
are shown in Figs. 5 and 6 respectively. It is important to remember that, for Silhouette,
the greatest value is the best, while, for S, the lowest value is the best. Moreover, in both
figures, max represents the maximum value obtained for each number of clusters, by ana-
lyzing all configurations (K is constant). On the other hand, min represents the minimum
value obtained for each number of clusters, by analyzing all configurations (K is constant).
Finally, best corresponds to the best configuration obtained, which is evaluated in all the
remaining number of clusters (K is variable and m, p and q are constant).
The configuration with the best silhouette value considers four clusters (K = 4) and
the associated parameters are m = 1.1, p = 1.5, q = 1. The silhouette function obtained
with these values of m, p, and q for each number of clusters is represented by
the blue line in Fig. 5. As can be observed, this configuration usually obtains the best
silhouette value for different numbers of clusters. Furthermore, it can be noted that all
the silhouette functions presented in Fig. 5 remain stable once the number of clusters
reaches five. From that point on, the silhouette values become practically independent
of the number of clusters.
In Fig. 6, it is possible to observe the inverse relationship between the number of clus-
ters and the S value. The minimum value of S is obtained by the configuration considering
69 clusters (K = 69) and the associated parameters are m = 1.1, p = 1, q = 1. The S function
obtained with these values of m, p, and q for each number of clusters is represented
by the blue line in Fig. 6. In addition, it can be observed that this configuration usually
obtains the best S value for different numbers of clusters. Unlike the silhouette
function shown in Fig. 5, the S function depends greatly on the number of clusters
(K) and is only slightly affected by the values of m, p, and q. This can be
observed by analyzing the ranges over which the S values vary. For example, when two clusters
are considered, the S values range from 3,844 (worst) to 3,589 (best). On the
Fig. 3 Map of fruit yield in the orchard at the end of 2014–2015 season
Fig. 4 Variograms for every date in the sample
other hand, when 69 clusters are considered, the S values range from 151.8 (worst) to
98.2 (best). Therefore, the influence of m, p, and q decreases as the number of
clusters increases.
It is important to highlight that the incorporation of spatial dependence through the p and
q parameters helps to improve the silhouette and S values, because the best
configurations in both cases have a q value different from zero. This fact reflects that the
probability that an orchard zone i belongs to a cluster k (hik) is considered for defining the best
clustering.
Due to the great difference between the numbers of clusters considered in the best
configurations of the silhouette and S functions, the clusterings obtained by both configurations
are analyzed to establish the most suitable one to be used in the ANN algorithm. The best
Silhouette configuration (with four clusters) is presented in Fig. 7, while six clusters from
the best S configuration (with 69 clusters) are presented in Fig. 8 as an example.
All clusters of this configuration are
presented in the Appendix (Online Resource). In Figs. 7 and 8, the green color represents a
higher degree of cluster membership, and the red color a lower one.
As can be observed in Figs. 7 and 8, the clusters are discontinuous, showing that the
closeness of the orchard zones is not the only relevant attribute for clustering. A similar
Fig. 5 Silhouette values for each configuration
Fig. 6 S value for each configuration
behavior was observed regarding the NDVI during the five dates where the data were
collected.
Once the membership degrees are obtained, the predictor variables defined in “Fruit
yield forecast through an ANN” section are calculated and normalized between zero and
one. These predictor variables are used in the designed ANN with one hidden layer.
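The normalization between zero and one mentioned above is commonly done with min-max scaling. The following sketch assumes that each predictor variable is scaled independently with its own minimum and maximum; the article does not give the exact formula, so this is an assumption, and the function name is hypothetical.

```python
def min_max_normalize(values):
    """Scale a list of numbers linearly so that the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant variable carries no information; map it to 0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([32.0, 41.0, 50.0]))  # [0.0, 0.5, 1.0]
```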
Results for fruit yield forecast through an ANN
The predictor variables have been obtained with the best configurations
of the Silhouette and S functions. The data of the orchard zones associated with the
predictor variables have been normalized between 0 and 1. Furthermore, they were divided
into three different random samples, which correspond to training (70%), validation (15%),
and testing (15%). The same samples have been used for the best configurations of the Silhouette
and S functions. In addition, the number of neurons in the hidden layer has been varied
between 5 and 150 to establish the neuron configuration that minimizes the following error
measures for the validation sample: ME and MSE (see nomenclature in Table 1).
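The sampling scheme and the two error measures can be sketched as follows. The definitions of ME and MSE are given in Table 1 of the article; this sketch assumes the usual forms, namely the mean of (forecast − observed) for ME and the mean of (forecast − observed)² for MSE. The function names and the fixed random seed are illustrative.

```python
import random

def split_indices(n, train=0.70, validation=0.15, seed=0):
    """Randomly split the indices 0..n-1 into training, validation, and test samples."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train)
    n_val = round(n * validation)
    # The test sample receives whatever remains (about 15%).
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def mean_error(observed, forecast):
    """ME (assumed definition): positive values indicate overestimation on average."""
    return sum(f - o for o, f in zip(observed, forecast)) / len(observed)

def mean_squared_error(observed, forecast):
    """MSE (assumed definition): average of the squared forecast errors."""
    return sum((f - o) ** 2 for o, f in zip(observed, forecast)) / len(observed)

train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))  # 70 15 15
```

Because over- and underestimations cancel out in ME but not in MSE, the two measures are complementary: ME reveals bias, while MSE reflects the overall size of the errors.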
On the other hand, in order to obtain the confidence interval of the fruit yield
estimation, the bootstrap method is used (Efron, 1979). Using this method, five executions with
different random samples for training, validation, and testing are generated. Every training
sample is used for training the ANN and then for forecasting the fruit yield and
calculating its respective errors. Later, by using the respective random validation sample, the
deviation of the errors is analyzed to obtain the confidence interval.
Figure 9 presents the MSE obtained according to different numbers of neurons, for the
random validation samples in the five executions, using the predictor variables of the
best configuration obtained by the Silhouette function (configuration with K = 4, m = 1.1,
p = 1.5, q = 1). It is possible to observe that, between 5 and 30 neurons, the error decreases:
for the random validation samples in the five executions, the MSE average
is 0.0042 when five neurons are used and 0.0001 when 30 neurons are used. For
the range between 30 and 100 neurons, the MSE becomes steady (around 0.0001), and
its standard deviation is very low (around 0.000031). For the range between 100 and 150
neurons, the ANN becomes overfitted, that is, it fits the particularities of the training
sample and, therefore, the error on the validation sample increases. Finally, for the
validation samples in the five executions, the lowest error
is obtained using 40 neurons, where the MSE average is 0.000083. Consequently, the best
configuration obtained by the Silhouette function is reliable across the random samples of
the five executions.
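The selection of the number of neurons described above amounts to choosing the value with the lowest average validation MSE over the executions. A minimal sketch, with a hypothetical function name and toy MSE values that are not the results of the article:

```python
from statistics import mean

def best_neuron_count(validation_mse):
    """validation_mse maps a number of neurons to the validation MSE of each execution.

    Returns the number of neurons with the lowest average MSE and all the averages.
    """
    averages = {n: mean(errors) for n, errors in validation_mse.items()}
    return min(averages, key=averages.get), averages

# Toy values for illustration only (two executions per neuron count).
toy_mse = {
    5: [0.004, 0.005],
    40: [0.0001, 0.0002],
    150: [0.001, 0.002],
}
best, averages = best_neuron_count(toy_mse)
print(best)  # 40
```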
Fig. 7 Four clusters obtained by the best Silhouette function configuration
Fig. 8 Six clusters obtained by the best S function configuration
Figure 10 shows the MSE obtained according to different number of neurons, for the
random validation samples in the five executions, and using the predictor variables of the
best configuration obtained by the S function (configuration with K = 69, m = 1.1, p = 1,
q = 1). As can be observed in Fig. 11, for the random validation samples in the five
executions, the minimum MSE is obtained using five neurons, and varies between 0.0009 and
0.0047. Using more than five neurons, the MSE varies between 0.0005 and 0.0045, as
observed in Fig. 10. Moreover, the standard deviation obtained by the best configuration of
the S function is greater than that obtained by the best configuration of
the Silhouette function: for the S function, it varies from 0.001919 to 0.000115, while, for
the Silhouette function, it varies from 0.000326 to 0.000013 (see Table A.1 in the
Appendix, Online Resource).
Figure 12 presents the MSE comparison for the validation sample in the first execution
(execution 1), using the predictor variables of the best configurations of the Silhouette and S
functions. In this figure, it is possible to observe that, for the Silhouette function, the
greater the number of neurons, the lower the MSE. On the other hand, the MSE of the S
function does not show a trend. In addition, for almost all numbers of neurons used, the
MSE of the Silhouette function is lower than that of the S function. This MSE performance may
be due to the number of clusters used in each configuration: 69 clusters for the S function
and 4 clusters for the Silhouette function. Each predictor variable depends on the
number of clusters, and the greater the number of clusters, the lower the standard
deviation of the predictor variables. This situation means that most of the orchard zones do not
belong to a cluster, increasing the MSE of the forecast. For this reason, in this study, the
predictor variables obtained by the best configuration of the Silhouette function, together with 40
neurons, are used in the ANN for forecasting the fruit yield. As mentioned previously, the
lowest MSE is obtained when 40 neurons are used with the best configuration of the
Silhouette function.
The test sample is used for validating the best configuration of the Silhouette function with
40 neurons. Figure 13 presents the dispersion of ME and MSE obtained for this
configuration. The ME makes it possible to analyze whether fruit yield is over- or underestimated in every
orchard zone. As observed in Fig. 13a, the over- and underestimations obtained with
the best Silhouette configuration offset each other, giving an ME average of 0.00013. This
means that the estimated fruit production in an orchard would have little bias. On the other
hand, it is important to analyze the consistency between the MSE obtained in the
validation sample and in the test sample. The MSE average is 0.000083 for the validation sample
and 0.000080 for the test sample. These results show the consistency of the fruit yield
forecast, meaning that there is no overfitting. In Fig. 13b, it is possible to observe that the
MSE deviation of the forecasted fruit yield for every orchard zone is very low. In
addition, there are few outliers, and they do not have a great
impact on the MSE average.
Finally, in order to estimate the goodness of fit, Fig. 14 illustrates the correlation
between the observed and the forecasted fruit yield. The random test sample of the first
execution (execution 1) was used for carrying out this forecast. Each point of the figure
represents an orchard zone of 9 m². In this analysis, the correlation coefficient is 0.95
(R² = 0.90). In this way, the considered predictor variables explain 90 percent of
the variability in fruit yield.
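The link between the correlation coefficient and R² can be made explicit with a short sketch. It computes the Pearson correlation between observed and forecasted yields and squares it; the function name and the toy yields are hypothetical, and only the last line uses a figure from the article.

```python
import math

def pearson_r(observed, forecast):
    """Pearson correlation coefficient between observed and forecasted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_f = sum(forecast) / n
    cov = sum((o - mean_o) * (f - mean_f) for o, f in zip(observed, forecast))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    sd_f = math.sqrt(sum((f - mean_f) ** 2 for f in forecast))
    return cov / (sd_o * sd_f)

# Toy yields in ton/ha (hypothetical data).
observed = [32.0, 38.0, 44.0, 50.0]
forecast = [33.0, 37.0, 45.0, 49.0]
r = pearson_r(observed, forecast)
print(r, r ** 2)

# With the correlation reported in the text: 0.95 squared is about 0.90.
print(round(0.95 ** 2, 2))
```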
The results show that the proposed SBM-Fruit performs very well
in estimating fruit yield. Furthermore, the fruit yield forecast for every orchard zone is also
Fig. 9 MSE for the validation samples according to different number of neurons, using the best configuration of the Silhouette function
Fig. 10 MSE for the validation samples according to different number of neurons, using the best configuration of the S function
Fig. 11 MSE ranges from 0 to 0.00055 for the validation samples according to different number of neurons, using the best configuration of the S function
Fig. 12 MSE comparison of the first validation sample according to different number of neurons, using the best configurations of the S and Silhouette functions
well estimated. A more detailed fruit forecast in an orchard would allow making better
operative and/or tactical planning and improving the FSC coordination.
Fig. 13 Box-plots of ME and MSE obtained in the test sample
Fig. 14 Correlation between observed fruit yield and its forecast
Managerial insights
The SBM-Fruit uses spectrometric data (NDVI) with a cell size of 9 m², which can be
obtained with satellite, drone, or ground sensors. In addition, these data were obtained at
different phenological stages of a season in order to estimate the total fruit yield (Fig. 2).
Thus, to collect these data, sensors mounted on the agricultural machinery can be
used while the agricultural activities are carried out (Mistele & Schmidhalter, 2010). In
this way, the data collection could be permanent and inexpensive. Once the data are
collected, it is necessary to process them using the configuration and algorithm proposed in
this study.
The potential users of the SBM-Fruit could be farmers, orchard managers, and/or
professionals of fresh fruit companies. The obtained fruit yield forecast could be used for
tactical and/or operational harvest planning or for improving the coordination among
different stages of the FSC, involving different stakeholders such as farmers, carriers, and cold
chamber and processing plant managers. Furthermore, the SBM-Fruit is simple and user-friendly
because it only requires that one user uploads, once every season, the spectrometric data
and related geographical coordinates to obtain the fruit yield forecast for every
coordinate. Moreover, this method can be executed on a personal computer with
around 8 cores and 8 GB if the analyzed orchard has less than 10 hectares.
It is important to highlight that the SBM-Fruit is better than traditional methods such as
simple random sampling because a lower MSE is obtained. This error reduction is
due to the incorporation of temporal and spatial correlations. Furthermore, the
spectrometry data of all trees in an orchard are used. Moreover, the SBM-Fruit makes it possible
to estimate the yield of any major fruit, although it needs to be trained again using the associated
spectrometric data. On the other hand, as more data from different major fruits are used
for training the SBM-Fruit, it becomes more general. In this way, the SBM-Fruit could be
applied for forecasting the yield of any kind of major fruit without retraining.
Conclusions and future research
Fruit yield forecast in the orchards is a key piece of information for making decisions
along the FSC. Fruit estimation errors negatively affect the coordination of the FSC,
increasing costs, reducing fruit quality, increasing fruit losses and the waste of packaging
materials, and affecting other activities in the FSC, such as fruit transport and cold storage
availability. For this reason, good fruit yield forecast methods help to reduce these
impacts. In this study, a new three-step method, SBM-Fruit, is proposed for estimating the fruit
yield in an orchard by incorporating NDVI data. The method uses georeferenced
NDVI data collected in different phenological stages. In the first step of SBM-Fruit,
different fuzzy clusters are obtained by applying the spatial Fuzzy c-Means (Chuang et al., 2006),
considering the spatial and temporal dependence. In the second step, the S and Silhouette
functions are used to find the best clustering obtained in the first step. Finally, in the third step,
using the predictor variables of the best clustering identified in the second step, the fruit
yield forecast for every orchard zone is calculated with an ANN algorithm.
As a result of the second step, it is possible to observe that the spatial and temporal
information is required to improve the clustering because this dependence exists. In this
way, the hypothesis that the incorporation of the spatial dependence can improve the crop
yield prediction is validated. This fact is reflected in the value q (q = 1) of the best con-
figuration obtained by using the Silhouette function, where a value of q different from zero
means that the spatial dependence is considered in the spatial Fuzzy c-Means.
Using the best configuration of p and q in the ANN (third step), the lowest MSE
average is obtained (0.000083) with 40 neurons. In this analysis, the best configurations of
the S and Silhouette functions were compared, and those based on the Silhouette function
performed better. The results showed that the ANN trained with the NDVI sample used is not
overfitted, that is, the sample is representative. Consequently, a confidence interval is not necessary. This
conclusion can also be observed from the test sample results, where very low errors are
observed (MSE = 0.00008 and ME = 0.00013). In this way, the SBM-Fruit can obtain good
fruit yield forecasts for every orchard zone. Furthermore, it is important to highlight that the
fruit yield forecast with the SBM-Fruit could be obtained around two months in advance of
the harvest, adding flexibility and anticipation to the decision-making process, especially
when suppliers’ contracts need to be arranged in advance (for example, refrigerated
warehouse and transport leasing). This forecast must be done once during the season, and
other estimations (executions with the SBM-Fruit) will depend on the available temporal
NDVI data. For this reason, a suggestion for farmers is to use sensors mounted on the
machinery to collect data while carrying out any field work.
For future research, computational experiments for determining suitable dates to
collect the NDVI data could be carried out. Furthermore, in order to improve the fruit yield
forecast, different spectrometric data of any major fruit could be analyzed, and/or new
predictor variables, such as soil quality or meteorological events, could be incorporated into
the ANN.
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/s11119-022-09947-7.
Acknowledgments DSc. Marcela C. González-Araya would like to thank FONDECYT project 1191764
(Chile) for their financial support. MSc. Javier Gómez is grateful for the research funding provided under
the CONICYT PFCHA/DOCTORADO BECAS CHILE 2019–21191364 (Chile).
References
Aggarwal, C. C. (2018). Neural networks and deep learning. New York, USA: Springer Nature.
Ahmad, I., Saeed, U., Fahad, M., Ullah, A., Habib ur Rahman, M., Ahmad, A., & Judge, J. (2018). Yield
Forecasting of Spring Maize Using Remote Sensing and Crop Modeling in Faisalabad-Punjab Paki-
stan. Journal of the Indian Society of Remote Sensing, 46(10), 1701–1711. https://doi.org/10.1007/
s12524-018-0825-8
Ali, I., Cawkwell, F., Dwyer, E., & Green, S. (2017). Modeling managed grassland biomass estimation by
using multitemporal remote sensing data-a machine learning approach. IEEE Journal of Selected Top-
ics in Applied Earth Observations and Remote Sensing, 10(7), 3254–3264. https://doi.org/10.1109/
JSTARS.2016.2561618
Anderson, N. T., Underwood, J. P., Rahman, M. M., Robson, A., & Walsh, K. B. (2019). Estimation of fruit
load in mango orchards: Tree sampling considerations and use of machine vision and satellite imagery.
Precision Agriculture, 20(4), 823–839. https://doi.org/10.1007/s11119-018-9614-1
Bai, T., Zhang, N., Mercatoris, B., & Chen, Y. (2019). Jujube yield prediction method combining Landsat 8
Vegetation Index and the phenological length. Computers and Electronics in Agriculture, 162, 1011–
1027. https://doi.org/10.1016/j.compag.2019.05.035
Bezdek, J. C. (1984). FCM: The Fuzzy c-Means Clustering Algorithm. In Computers & Geosciences (Vol.
10, Issue 3).
Bohle, C., Maturana, S., & Vera, J. (2010). A robust optimization approach to wine grape harvesting sched-
uling. European Journal of Operational Research, 200(1), 245–252. https://doi.org/10.1016/j.ejor.
2008.12.003
Bose, P., Kasabov, N. K., Bruzzone, L., & Hartono, R. N. (2016). Spiking Neural Networks for Crop Yield
Estimation Based on Spatiotemporal Analysis of Image Time Series. IEEE Transactions on Geosci-
ence and Remote Sensing, 54(11), 6563–6573. https://doi.org/10.1109/TGRS.2016.2586602
Caixeta-Filho, J. V. (2006). Orange harvesting scheduling management: A case study. Journal of the Opera-
tional Research Society, 57(6), 637–642. https://doi.org/10.1057/palgrave.jors.2602041
Catalá, L. P., Moreno, M. S., Blanco, A. M., & Bandoni, J. A. (2016). A bi-objective optimization model for
tactical planning in the pome fruit industry supply chain. Computers and Electronics in Agriculture,
130, 128–141. https://doi.org/10.1016/j.compag.2016.10.008
Chuang, K. S., Tzeng, H. L., Chen, S., Wu, J., & Chen, T. J. (2006). Fuzzy c-means clustering with spa-
tial information for image segmentation. Computerized Medical Imaging and Graphics, 30(1), 9–15.
https://doi.org/10.1016/j.compmedimag.2005.10.001
de Salvador, F. R., Fisichella, M., & Fontanari, M. (2006). Correlations between fruit size and fruit qual-
ity in apple trees with high and standard crop load levels. Journal of Fruit and Ornamental Plant
Research, 14(2), 113–122.
Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 7(1), 1–26.
https://doi.org/10.1214/aos/1176344552
Farooque, A. A., Chang, Y. K., Zaman, Q. U., Groulx, D., Schumann, A. W., & Esau, T. J. (2013). Perfor-
mance evaluation of multiple ground based sensors mounted on a commercial wild blueberry harvester
to sense plant height, fruit yield and topographic features in real-time. Computers and Electronics in
Agriculture, 91, 135–144. https://doi.org/10.1016/j.compag.2012.12.006
Fernandes, J. L., Ebecken, N. F. F., & Esquerdo, J. C. D. M. (2017). Sugarcane yield prediction in Bra-
zil using NDVI time series and neural networks ensemble. International Journal of Remote Sensing,
38(16), 4631–4644. https://doi.org/10.1080/01431161.2017.1325531
Ferrer, J. C., Mac Cawley, A., Maturana, S., Toloza, S., & Vera, J. (2008). An optimization approach for
scheduling wine grape harvest operations. International Journal of Production Economics, 112(2),
985–999. https://doi.org/10.1016/j.ijpe.2007.05.020
Ghaemi, R., Nasir Sulaiman, M., & Ibrahim, H. (2009). A Survey: Clustering Ensembles Techniques.
https://www.researchgate.net/publication/232700836
Gómez-Lagos, J. E., González-Araya, M. C., Blu, R. O., & Acosta Espejo, L. G. (2019). Using data mining
techniques to forecast the Normalized Difference Vegetation Index (NDVI) in table grape. In ICORES
2019—Proceedings of the 8th International Conference on Operations Research and Enterprise Sys-
tems. https://doi.org/10.5220/0007570101890194
Gómez-Lagos, J. E., González-Araya, M. C., Soto-Silva, W. E., & Rivera-Moraga, M. M. (2021). Opti-
mizing tactical harvest planning for multiple fruit orchards using a metaheuristic modeling approach.
European Journal of Operational Research, 290(1), 297–312. https://doi.org/10.1016/j.ejor.2020.08.
015
González-Araya, M. C., Soto-Silva, W. E., & Espejo, L. G. A. (2015). Harvest planning in apple orchards
using an optimization model. In L. M. Plà-Aragonés (Ed.), Handbook of operations research in
agriculture and the agri-food industry (pp. 79–105). New York: Springer. https://doi.org/10.1007/
978-1-4939-2483-7_4
Hall, A., Lamb, D. W., Holzapfel, B. P., & Louis, J. P. (2011). Within-season temporal variation in correla-
tions between vineyard canopy and winegrape composition and yield. Precision Agriculture, 12(1),
103–117. https://doi.org/10.1007/s11119-010-9159-4
Herrera-Cáceres, C., Pérez-Galarce, F., Álvarez-Miranda, E., & Candia-Véjar, A. (2017). Optimization of
the harvest planning in the olive oil production: A case study in Chile. Computers and Electronics in
Huete, A., Jd, W., & Leeuwen, V. (1999). MODIS vegetation index (MOD13).
https://www.researchgate.net/publication/268745810
International Society of Precision Agriculture. (2022). https://www.ispag.org/about/definition
Koirala, A., Walsh, K. B., Wang, Z., & McCarthy, C. (2019). Deep learning – Method overview and
review of use for fruit detection and yield estimation. In Computers and Electronics in Agriculture
(Vol. 162, pp. 219–234). Elsevier B.V. https://doi.org/10.1016/j.compag.2019.04.017
Koller, M., & Upadhyaya, S. K. (2005). Prediction of processing tomato yield using a crop growth model
and remotely sensed aerial images. Transactions of the ASAE, 48(6), 2335–2341. https://doi.org/10.
13031/2013.20072
Lamsal, K., Jones, P. C., & Thomas, B. W. (2016). Harvest logistics in agricultural systems with mul-
tiple, independent producers and no on-farm storage. Computers and Industrial Engineering, 91,
129–138. https://doi.org/10.1016/j.cie.2015.10.018
McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in
nervous activity. In Bulletin of Mathematical Biophysics (Vol. 5).
Mihai, H., & Florin, S. (2016). Biomass prediction model in maize based on satellite images. AIP Con-
ference Proceedings. https://doi.org/10.1063/1.4952132
Mistele, B., & Schmidhalter, U. (2010). Tractor-based quadrilateral spectral reflectance measurements
to detect biomass and total aerial nitrogen in winter wheat. Agronomy Journal, 102(2), 499–506.
https://doi.org/10.2134/agronj2009.0282
Moreda, G. P., Ortiz-Cañavate, J., García-Ramos, F. J., & Ruiz-Altisent, M. (2009). Non-destructive
technologies for fruit and vegetable size determination—A review. Journal of Food Engineering,
92(2), 119–136. https://doi.org/10.1016/j.jfoodeng.2008.11.004
Negi, S., Anand, N., & Lscm, H. (2015). Cold chain: A weak link in the fruits and vegetables
supply chain in India. The IUP Journal of Supply Chain Management, XII(1).
https://www.researchgate.net/publication/279866746
Ortega, R. A., Acosta, L. E., & Jara, L. A. (2012). Use of cluster regression for yield prediction in wine
grape. Proceedings of the International Conference on Precision Agriculture, 1–8.
https://ispag.org/proceedings/?action=abstract&id=1260&title=Use+of+Cluster+Regression+for+Yield+Prediction+in+Wine+Grape
Panda, S. S., Ames, D. P., & Panigrahi, S. (2010). Application of vegetation indices for agricultural crop
yield prediction using neural network techniques. Remote Sensing, 2(3), 673–696.
https://doi.org/10.3390/rs2030673
Pantazi, X. E., Moshou, D., Alexandridis, T., Whetton, R. L., & Mouazen, A. M. (2016). Wheat yield
prediction using machine learning and advanced sensing techniques. Computers and Electronics in
Agriculture.
Rousseeuw, P. J. (1987). Silhouettes: A graphical aid to the interpretation and validation of cluster
analysis. Journal of Computational and Applied Mathematics, 20, 53–65.
https://doi.org/10.1016/0377-0427(87)90125-7
Soto-Silva, W. E., González-Araya, M. C., Oliva-Fernández, M. A., & Plà-Aragonés, L. M. (2017). Opti-
mizing fresh food logistics for processing: Application for a large Chilean apple supply chain. Com-
puters and Electronics in Agriculture, 136, 42–57. https://doi.org/10.1016/j.compag.2017.02.020
Soto-Silva, W. E., Nadal-Roig, E., González-Araya, M. C., & Pla-Aragones, L. M. (2016). Operational
research models applied to the fresh fruit supply chain. European Journal of Operational
Research, 251(2), 345–355. https://doi.org/10.1016/j.ejor.2015.08.046
Stateras, D., & Kalivas, D. (2020). Assessment of olive tree canopy characteristics and yield forecast
model using high resolution UAV imagery. Agriculture (Switzerland), 10(9), 1–13.
https://doi.org/10.3390/agriculture10090385
Stein, M., Bargoti, S., & Underwood, J. (2016). Image based mango fruit detection, localisation and
yield estimation using multiple view geometry. Sensors (Switzerland).
https://doi.org/10.3390/s16111915
Sun, L., Gao, F., Anderson, M. C., Kustas, W. P., Alsina, M. M., Sanchez, L., Sams, B., McKee, L.,
Dulaney, W., White, W. A., Alfieri, J. G., Prueger, J. H., Melton, F., & Post, K. (2017). Daily map-
ping of 30 m LAI and NDVI for grape yield prediction in California vineyards. Remote Sensing,
9(4), 1–18. https://doi.org/10.3390/rs9040317
Uribeetxebarria, A., Martínez-Casasnovas, J. A., Tisseyre, B., Guillaume, S., Escolà, A., Rosell-Polo,
J. R., & Arnó, J. (2019). Assessing ranked set sampling and ancillary data to improve fruit load
estimates in peach orchards. Computers and Electronics in Agriculture.
https://doi.org/10.1016/j.compag.2019.104931
van Dyk, F. E., & Maspero, E. (n.d.). An analysis of the South African fruit logistics infrastructure (Vol.
20, Issue 1). http://www.orssa.org.za
Varas, M., Basso, F., Maturana, S., Osorio, D., & Pezoa, R. (2020). A multi-objective approach for support-
ing wine grape harvest operations. Computers and Industrial Engineering, 145, 106497.
https://doi.org/10.1016/j.cie.2020.106497
Villalobos, J. R., Soto-Silva, W. E., González-Araya, M. C., & González-Ramirez, R. G. (2019). Research
directions in technology development to support real-time decisions of fresh produce logistics: A
review and research agenda. Computers and Electronics in Agriculture, 167.
https://doi.org/10.1016/j.compag.2019.105092
Xie, X. L., & Beni, G. (1991). A validity measure for fuzzy clustering. IEEE Transactions on Pattern Analy-
sis and Machine Intelligence, 13(8), 841–847. https://doi.org/10.1109/34.85677
Ye, X., Sakai, K., Asada, S. I., & Sasao, A. (2008). Application of narrow-band TBVI in estimating fruit
yield in citrus. Biosystems Engineering, 99(2), 179–189.
https://doi.org/10.1016/j.biosystemseng.2007.09.016
Ye, X., Sakai, K., Manago, M., Asada, S. I., & Sasao, A. (2007). Prediction of citrus yield from air-
borne hyperspectral imagery. Precision Agriculture, 8(3), 111–125.
https://doi.org/10.1007/s11119-007-9032-2
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the
author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is
solely governed by the terms of such publishing agreement and applicable law.
Authors and Affiliations
Javier E. Gómez‑Lagos1 · Marcela C. González‑Araya2 · Rodrigo Ortega Blu3 · Luis G. Acosta Espejo3
Javier E. Gómez‑Lagos
javier.gomez@utalca.cl
Rodrigo Ortega Blu
rodrigo.ortega@usm.cl
Luis G. Acosta Espejo
luis.acosta@usm.cl
1 Doctorado en Sistemas de Ingeniería, Faculty of Engineering, Universidad de Talca, Campus
Curicó, Camino a Los Niches, km 1, Curicó, Chile
2 Department of Industrial Engineering, Faculty of Engineering, Universidad de Talca, Campus
Curicó, Camino a Los Niches, km 1, Curicó, Chile
3 Departamento de Ingeniería Comercial, Universidad Técnica Federico Santa María, Avenida Santa
María 6400, Vitacura, Santiago, Chile