Authors’ list
Rodrigo Pereira Ramos (corresponding author) ORCID: 0000-0002-9243-3887, Federal University of São
Francisco Valley (Univasf) – College of Electrical Engineering, Juazeiro - BA, Brazil, e-mail:
rodrigo.ramos@univasf.edu.br
Jéssica Santana Gomes, Juazeiro - BA, Brazil, e-mail: jessicasgomes.3007@gmail.com
Ricardo Menezes Prates ORCID: 0000-0002-1580-9828, Federal University of São Francisco Valley
(Univasf) – College of Electrical Engineering, Juazeiro - BA, Brazil, e-mail: ricardo.prates@univasf.edu.br
Eduardo F. Simas Filho ORCID: 0000-0001-8707-785X, Federal University of Bahia (UFBA) – Department
of Electrical Engineering, Salvador - BA, Brazil, e-mail: eduardo.simas@ufba.br
Barbara Janet Teruel Mederos ORCID: 0000-0002-5102-6716, State University of Campinas (UNICAMP)
– College of Agricultural Engineering, Campinas - SP, Brazil, e-mail: barbara.teruel@feagri.unicamp.br
Daniel dos Santos Costa ORCID: 0000-0001-7703-3183, Federal University of São Francisco Valley
(Univasf) – College of Agricultural and Environmental Engineering, Juazeiro - BA, Brazil, e-mail:
daniel.costa@univasf.edu.br
Abstract
Background
The São Francisco Valley region of Brazil is known worldwide for its fruit production and exports, especially grapes and wines. The grapes are of high quality due not only to their excellent morphological characteristics but also to their pleasant taste. These features result from the climatic conditions of the region. In addition to the favorable climate for grape cultivation, harvesting at the right time also affects fruit properties.
Results
This work aims to determine the grape maturation stage of the Syrah and Cabernet Sauvignon cultivars with the aid of deep learning models. These algorithms were chosen because the techniques commonly used to find the ideal harvesting point are invasive, expensive, and slow to deliver results. In this work,
Convolutional Neural Networks (CNNs) were used in an image classification system in which grape images were acquired, pre-processed and classified according to their maturation stage. Images were acquired under varying illuminants, which were considered as parameters of the classification models, as were the different post-harvest weeks. The best models achieved maturation classification accuracies of 93.41% and 72.66% for Syrah and Cabernet Sauvignon, respectively.
Conclusions
Wine grapes were correctly classified with high accuracy with respect to harvesting time using computational intelligence algorithms, corroborating chemometric results.
Keywords
Deep learning, Grape maturation, Image processing, Post-harvest
Declarations
Funding – This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior -
Brasil (CAPES) - Finance Code 001.
Conflicts of interest/Competing interests – The authors declare that they have no conflict of interest.
Availability of data and material – The datasets generated during and/or analysed during the current study are
available from the corresponding author on reasonable request.
Code availability – Not applicable
This article has been accepted for publication and undergone full peer review but has not been
through the copyediting, typesetting, pagination and proofreading process which may lead to
differences between this version and the Version of Record. Please cite this article as doi:
10.1002/jsfa.10824
This article is protected by copyright. All rights reserved.
Non-Invasive Setup for Grape Maturation Classification using Deep Learning
Introduction
The current worldwide demand for food faces challenges regarding sustainability and the production of high-quality supplies. In particular, the viticulture sector (wines and juices) has turned its attention to quality as a way of meeting market requirements. Thus, determining the right harvesting time with good precision is crucial to obtain grapes, and consequently wines, of high quality and increased added value.
Grapes for wine production must be harvested at the optimum point of ripeness.1 The optimal ripening of grapes involves different events and is usually accompanied by a change in the color of the epicarp due to a modification in the concentration of pigments in surface tissues.2 Such color change, resulting from biochemical reactions inside the fruits, may be related to maturity.3 For example, in red grapes, fruits that are still green have a higher proportion of chlorophyll, which gradually decreases with maturity while the concentrations of anthocyanins and flavonoids increase.4,5,6
Monitoring of maturity and quality attributes often involves analytical techniques that are time-consuming and destructive and can sometimes require sophisticated equipment. Analyses of total soluble solids (TSS), total anthocyanins and flavonoids are examples of such techniques. With the purpose of developing non-destructive techniques, the use of electronic instrumentation and machine learning has been under consideration in recent years for determining grape maturation.7,8 The change in skin color has been correlated to grape maturation using hyperspectral images and Vis-NIR spectroscopy, compared with physiological and biochemical events.9,10 However, such methods require very costly equipment for image and spectrum acquisition to determine maturity.
A simpler and yet powerful technique for determining the optimal harvesting time is the use of images acquired with conventional optical cameras together with artificial intelligence systems. This can be done by combining automatic image acquisition systems, image processing techniques and machine learning tools. Specifically, deep learning (DL) algorithms such as the Convolutional Neural Network (CNN), a class of algorithms mainly used for image classification, generate higher-level abstractions of the input data through convolutional layers.11 CNNs have been widely employed in several application areas.
The present paper aims at developing a framework for non-invasive grape classification based on maturation stage using deep learning algorithms. The grapes were collected in a vineyard in the SFV region of Brazil. An apparatus for image acquisition was used in combination with a computational CNN system for classification, considering standard RGB images taken under different illumination conditions. Several harvesting times were tested, and the system generated outputs related to the maturation stages of the fruits. It is important to state that CNN algorithms are related not to image colors but to image textures, which is why color variations produced by the different illuminants were treated as additional model parameters.
The present paper is structured as follows. Section Related Works explores existing methods for estimating grape
maturation levels. Section Materials and Methods details the data acquisition process as well as the convolutional neural
network architectures employed in this study. Section Results and Discussions presents the experimental analysis and
results. Finally, we conclude the article and discuss future lines of work in Section Conclusions.
Related Works
Some works have analyzed fruit attributes using machine learning and images. Zuñiga et al.12 analyzed grapevine
maturity using seed images in a pattern recognition procedure with traditional neural networks. Seed images were
acquired in a controlled environment and an invariant color model was used to segment the images and collect several
feature descriptors that were fed to the neural network. An accuracy of 86% was reported on the test set.
Seng et al.13 developed an image database of different varieties in several maturation stages. They employed some
traditional machine learning algorithms – support vector machines (SVM), k nearest neighbor (k-NN), logistic
regression, classification tree, boosted tree, and social adaptive ensemble (SAE) – to detect grape varieties using
different image color spaces, such as Red-Green-Blue (RGB), YCbCr, HSV and Lab. The best classification rates, using
SVM and Lab, yielded 84.4% and 89.1%, for white and red cultivars, respectively.
Deep learning with CNN algorithms was used to detect, segment and track grapes in natural images taken with low-cost cameras14. With a dataset comprising 408 grapes, the authors reached an F1-score of 0.91 in the
segmentation process, using ResNet 101 architecture for feature extraction followed by a Mask R-CNN for
segmentation. For detection tasks, three CNN networks were employed, with the best scores being reached with Mask
R-CNN. However, the authors did not consider maturation analysis in the work.
A robotic vision system based on a CNN framework named FasterRCNN was used to estimate ripeness of sweet
pepper15. Two approaches were employed: a multi-class task, in which each ripeness level is detected as a separate class, and a parallel-layer approach, with one layer used for detection and another for ripeness estimation. In both cases, a VGG-16 architecture was used with transfer learning, reaching F1-scores of 0.77 and 0.72 for the parallel and multi-class approaches, respectively.
Hang et al.16 proposed a CNN model to classify apple tree trunk diseases. They generated a dataset composed of 607
images divided into three classes and used five CNN architectures for classification comparison. The best results were
reached using a VGGNet architecture with modified loss function, with which accuracy of 94.5% was obtained.
Deep learning was also used for grapevine phenotyping17. The authors developed a sensing structure based on an Intel RGB-D sensor to automatically estimate canopy volume and to detect and count bunches using four CNN architectures.
Other computational systems, either based on traditional classification algorithms or deep learning techniques, have
been widely employed for various agricultural and food challenges, extensively reported on the survey work of
Kamilaris et al.18. However, to the best of the authors' knowledge, no approach using convolutional neural networks combined with an image dataset of different illuminant spectra has yet been proposed to classify grape ripeness.
Materials and Methods
This section provides detailed information regarding the experimental procedures and materials used in this work. Both hardware and software arrangements were made to provide reliable image acquisition in a controlled fashion and to achieve trustworthy data analysis. Figure 1 shows a flowchart representing a compact diagram of the methodology used in this work. Briefly, the images were first acquired and subsequently adjusted in size and pixel-normalized. After that, the two separate datasets were split according to the harvesting week and fed to both deep learning models (VGG-19 and the proposed one). Data were split into training, validation, and test subsets, from which the classification performance was extracted, and the determination of total soluble solids, anthocyanins and flavonoids was performed.
Image acquisition
Samples of wine grapes (Vitis vinifera L.) were collected from a vineyard located in the city of Lagoa Grande - PE (9.05363; -40.19868), in the SFV region. Totals of 432 and 576 berries were harvested from the Syrah and Cabernet Sauvignon cultivars, respectively. The harvests were spread over six dates for Syrah and eight dates for Cabernet Sauvignon, with 72 berries collected on each date. The Syrah cultivar is precocious under the climatic conditions of the SFV region, hence its smaller number of harvests. Fruit collection was carried out on a weekly basis from May to August 2017, covering various stages of fruit maturity. These stages were classified according to the harvesting week, with each week, or set of weeks, considered as a different class representing a given stage. To obtain representative samples, a stratified sampling method was adopted. A middle row of the vineyard was chosen and, from it, six plants at the center were selected, with two bunches marked per plant. The selected bunches were divided into three positions: top, middle and bottom. From each position, two berries were removed, making a total of six berries per bunch. These samples were taken as representative of the bunch. It should be noted that it is impracticable to measure TSS, total anthocyanins and yellow flavonoids in the entire bunch.
The individual berries were packed in waterproof packaging and stored in ice for transportation, immediate processing
and acquisition of the spectra. Before each measurement, it was necessary to wait for the samples to stabilize at room
temperature of 25 °C. The individual berries were then submitted to the reference measurements, namely total soluble solids, total anthocyanins and yellow flavonoids.
Individual grape images were obtained with a Canon EOS REBEL T5i camera, with ISO-100 speed, f/5.6 aperture, 48 mm focal length and 0.8 s exposure time, mounted on top of a box painted matte black inside. At the top of this dark chamber there were fifteen 3 W LED illuminants, equally distributed among red, green, blue, warm white and cool white, at an angular distance of 120°, as shown in Figure 2b. For each fruit, five RGB (Red, Green, Blue) images were acquired (as shown in Figure 3), one for each illuminant turned on at a different time, resulting in a database of 360 images for each harvest week. Different illuminants were used to take advantage of the various interactions between visible light sources and the fruits, and of how these interactions could improve the classification models. The hardware setup for the image acquisition step is shown in Figure 2a. The above-mentioned dataset, containing all acquired images, was then used as input to the classification models.
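The sampling and acquisition counts described above can be cross-checked arithmetically (all figures taken from the text):

```python
# berries per harvest date: 6 plants x 2 bunches per plant x
# 3 bunch positions x 2 berries per position
berries_per_date = 6 * 2 * 3 * 2        # -> 72 berries

# total berries per cultivar (6 harvest dates for Syrah, 8 for Cabernet)
syrah_total = berries_per_date * 6      # -> 432
cabernet_total = berries_per_date * 8   # -> 576

# images per harvest week: one image per berry under each of the 5 LEDs
images_per_week = berries_per_date * 5  # -> 360
```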
Fig. 2 Image acquisition setup: (a) Dark chamber outside view; (b) LED positioning inside the chamber.
Fig. 3 Grapes illuminated by different light spectra: (a) red; (b) green; (c) blue; (d) cool white; (e) warm white.
In the present work, two CNN architectures were employed, as described in detail in the following subsections, with the number of outputs equal to the number of classes (harvesting weeks). All computational experiments were implemented in Python using the Keras library.19
VGG-19
Introduced in the work Very Deep Convolutional Networks for Large-Scale Visual Recognition,20 VGG-19 is a CNN model developed by researchers from the University of Oxford. As shown in Figure 4, the network has 19 layers: 16 convolutional layers and 3 fully connected layers. It can classify images into different categories and, as it is trained on the ImageNet database, its input image size is 224 x 224 pixels. VGG-19 is considered a simple network because its convolutional layers use kernels of dimensions 3 x 3. Although its convolution operations are smaller than those of other architectures, it still has 144 million parameters, a quantity that imposes a high computational cost. To reduce this volume, maxpooling layers are used to decrease the number of weights to be learned and avoid overfitting.
A pre-trained network model was used, with transfer learning to accelerate the learning process and improve
classification performance. For this work, only the last two layers were fine-tuned with the grape image dataset, namely the FC1 layer and the output layer, representing a total of 102,780,932 trainable parameters.
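The trainable-parameter figure can be checked arithmetically. With the standard VGG-19 dimensions (the last convolutional feature map flattens to 7 x 7 x 512, and FC1 has 4096 units), the quoted total corresponds to a four-class output layer; that class count is an assumption of this sketch, inferred from the figure itself:

```python
# standard VGG-19 dimensions: last conv feature map flattens to 7*7*512
flat = 7 * 7 * 512                        # 25088 inputs to FC1
fc1_params = flat * 4096 + 4096           # FC1 weights + biases
out_params = 4096 * 4 + 4                 # assumed 4-class softmax output
total_trainable = fc1_params + out_params # -> 102,780,932
```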
Proposed architecture
Whereas VGG-19 is a robust, powerful CNN architecture, even with the use of transfer learning its training stage is very time-consuming, with high computational effort needed to implement it. We propose in this article a simplified CNN architecture aimed at decreasing training computational costs, with reduced complexity compared to VGG-19.
The proposed CNN architecture is shown in Figure 5. It is composed of 10 convolutional layers interlaced with maxpooling layers, a fully connected (FC) layer with 512 units at the end, and an output layer with softmax activation function. The CNN outputs the probability of each input image belonging to a given class (harvesting week). The total number of trainable parameters is 704,643, far fewer (0.68%) than in the VGG-19 model.
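The softmax output described above can be sketched with a minimal numpy implementation; the class scores below are hypothetical, for illustration only:

```python
import numpy as np

def softmax(logits):
    # subtract the maximum for numerical stability; the result sums to 1
    e = np.exp(logits - np.max(logits))
    return e / e.sum()

# hypothetical raw class scores, one per harvesting week
p = softmax(np.array([2.0, 1.0, 0.1]))
```

The entry with the largest score receives the largest probability, so the predicted class is simply the argmax of the output vector.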
Image adjustment
The original images were obtained at high dimensions (3456 x 5184 pixels on 3 RGB channels with 8-bit resolution each), in which the grapes were almost imperceptible, filling only a small area of the image. Therefore, a resizing step was performed so that the fruit stayed in the center of the image and the image completely contained the berry. The shape of ImageNet input images is 224 x 224 pixels per RGB channel, and those dimensions are used by the VGG-19 network. To match these dimensions, it was therefore necessary to crop the images, keeping the fruit in the center while preserving pixel resolution. The resizing step returned 8-bit images with dimensions of 224 x 224 pixels.
After cropping and resizing, the images were normalized to the interval [0,1] and separated according to their classes, with each class representing a different week. A one-hot encoding matrix was used to provide class labels, and the data were saved in tensor format. These data were then divided into training, validation, and testing sets in the proportion of
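A minimal sketch of this preprocessing, assuming a centered square crop and nearest-neighbour resampling (the exact crop coordinates and interpolation method are not specified above); `preprocess` is a hypothetical helper name:

```python
import numpy as np

def preprocess(img, n_classes, label, out=224):
    """Centre-crop the largest square, resample to out x out,
    scale pixels to [0, 1] and one-hot encode the class label."""
    h, w = img.shape[:2]
    s = min(h, w)
    top, left = (h - s) // 2, (w - s) // 2
    square = img[top:top + s, left:left + s]
    idx = np.arange(out) * s // out          # nearest-neighbour indices
    resized = square[idx][:, idx]
    x = resized.astype(np.float32) / 255.0   # normalise to [0, 1]
    y = np.eye(n_classes, dtype=np.float32)[label]
    return x, y
```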
Both CNN architectures were trained iteratively for 50 epochs with a batch size of 32. The loss function used was the well-known categorical cross-entropy, and CNN weight optimization was conducted with the Adam optimizer21 at a learning rate of 10⁻³.
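The categorical cross-entropy loss mentioned above can be written as a minimal numpy sketch (the actual training used the framework's built-in implementation):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    # y_true: one-hot targets (N, C); y_pred: softmax outputs (N, C)
    # eps guards against log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred + eps), axis=1)))
```

For perfect predictions the loss approaches zero; for uniform predictions over C classes it equals ln C.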
To evaluate both CNN models, confusion matrix, accuracy, and receiver operating characteristic (ROC) curves were
used. A confusion matrix shows the actual and predicted classification numbers for each class in the model. The
accuracy rate is a measure of the model prediction capacity on a given data set. It gives the rate of correct outputs of the
model. Mathematically, the accuracy can be defined by Acc = (TP + TN)/Total, where Total represents the sum of all confusion matrix elements, TP the number of true positives and TN the number of true negatives. A ROC curve is a
graphical representation of the performance of a binary classifier system with varying classification thresholds. It is a
plot of TP rates (sensitivity) on the vertical axis versus FP rates (1 - specificity) on the horizontal axis. The perfect model corresponds to the upper-left corner of this plot, with a TP rate of 1 and an FP rate of 0.
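For the multi-class case used here, the accuracy definition above reduces to the trace of the confusion matrix divided by the sum of its elements; the matrix below is hypothetical, for illustration only:

```python
import numpy as np

# hypothetical 3-class confusion matrix (rows: actual, columns: predicted)
cm = np.array([[50,  3,  1],
               [ 4, 45,  5],
               [ 2,  6, 46]])

# correct predictions lie on the diagonal
accuracy = np.trace(cm) / cm.sum()
```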
The determination of TSS was performed from direct reading on a digital refractometer (HI 96804, Hanna Instruments,
USA), with a measurement range between 0 and 85 °Brix and accuracy of ±0.2 °Brix, using two drops of the sample.
The anthocyanin (Anth) and flavonoid (Flav) contents in the grape skin were determined according to Francis22, whose single-pH method shows no difference from the AOAC method. A mass of 0.5 g of skin and pulp was weighed, and 25 mL of 95% ethanol extraction solution acidified with HCl (1.5 N) in a proportion of 85:15 was added. The samples
were macerated for one minute and next transferred to containers sheltered from light and stored in a refrigerator. After
24 hours, the supernatant was collected for reading on a spectrophotometer (Spectronic BioMate 5 UV-Vis, Thermo
Electron, UK) at 535 nm for anthocyanins and 374 nm for flavonoids. The results were expressed in mg.100 g⁻¹, considering a dilution factor of 5000 and an extinction coefficient (E1%) of 98.2 for anthocyanins; the corresponding coefficient was used for flavonoids.
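A minimal sketch of the pigment quantification, assuming the standard Francis single-pH expression, content = absorbance × dilution factor / E1% (the explicit expression is not shown above, so this form is an assumption):

```python
def pigment_content(absorbance, dilution_factor=5000, e1pct=98.2):
    # assumed Francis single-pH relation: content (mg per 100 g) =
    # absorbance * dilution factor / extinction coefficient (E1%)
    return absorbance * dilution_factor / e1pct
```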
Results and Discussions
Syrah cultivar
Two scenarios were considered to evaluate the capacity of the CNN models to predict maturation stages. The first one considered six classes, one for each of the six harvesting weeks. In the second scenario, the data set was modified to work with three classes, each encompassing grape samples from two consecutive weeks, i.e., the first class is composed of grape images from the first and second harvesting weeks, the second class from the third and fourth weeks, and the last class from the fifth and sixth weeks. The motivation for this modified dataset is explained later.
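The week-to-class mapping of the second scenario can be sketched as follows (`merged_class` is a hypothetical helper name, not from the original code):

```python
def merged_class(week):
    # weeks 1..6 -> classes 0..2: each class spans two consecutive weeks
    return (week - 1) // 2

labels = [merged_class(w) for w in range(1, 7)]   # -> [0, 0, 1, 1, 2, 2]
```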
For the first case, there was a total of 2160 grape images, with 360 for each of the six classes or weeks (72 samples per
week x 5 LEDs). For graphical illustration purposes, Figures 6a and 6b show the VGG-19 accuracy and loss evolution for the training and validation steps in the six-class scenario. Each curve legend shows the best values obtained for each case. An accuracy of 85.15% can be noticed in the validation stage.
Figure 6c illustrates the normalized confusion matrix for VGG-19 model considering the test dataset. Actual values are
presented on the vertical axis and predicted values on the horizontal axis. The harvesting weeks are represented as
classes {0, 1, ..., 5}. It can be seen that the classifier performed better in some classes (weeks), as it was the case for
weeks 1, 3, and 4 (classes 0, 2, and 3, respectively), and had decreased performance for weeks 2, 5, and 6 (classes 1, 4,
and 5), resulting in 86.33% of average accuracy for the test dataset.
Fig. 6 VGG-19 (a) accuracy, (b) loss and (c) normalized confusion matrix for the training and validation stages on the
six-class scenario for the Syrah cultivar. The values shown in the legends represent the best models within 50 epochs.
The performance of the proposed CNN, in terms of accuracy and loss curves, is shown in Figures 7a and 7b for the six-class scenario. The best model achieved a validation accuracy of 88.10%, better than that of VGG-19 with transfer learning. Figure 7c shows the normalized confusion matrix of the proposed model. A test accuracy of 89.62% was reached, outperforming the VGG-19 results. A behavior similar to that of the VGG-19 model was observed for the proposed CNN, with some weeks leading to more correct predictions than others.
Fig. 7 Proposed CNN (a) accuracy, (b) loss and (c) normalized confusion matrix for the training and validation stages
on the six-class scenario for the Syrah cultivar. The values shown in the legends represent the best models within 50 epochs.
Confusion matrix results, for both VGG-19 and the proposed model, showed an interesting pattern when considering a one-to-one relation between class labels and harvesting weeks. Clearly, grape images from some weeks produce outcomes with rather poor performance. One possible explanation for this behavior is that two consecutive weeks may present little change in the visible characteristics of the fruits, leading to less discriminative activation maps in the CNN models. Figure 8 illustrates selected examples of the most representative fruit images for each class. As can be seen, images from neighboring classes have more similar patterns than others, which suggests a possible fusion of classes. Therefore, a new scenario was considered, gathering the neighboring classes into a three-class dataset.
Fig. 8 Examples of grape images taken from each class, representing the most common patterns for each class: (a) First
week; (b) second week; (c) third week; (d) fourth week; (e) fifth week; (f) sixth week.
The confusion matrices for both VGG-19 and the proposed models for the three classes scenario are shown in Figure 9.
ROC curves for both deep learning models are plotted in Figure 10, together with the area under the ROC curve (AUC). The proposed model slightly outperforms the VGG-19 model, as can be perceived from how close its curves are to the unit step function.
Fig. 9 Normalized confusion matrices for Syrah cultivar and three classes scenario: (a) VGG-19 model; (b) Proposed
CNN model.
Fig. 10 ROC curves for Syrah cultivar and three classes scenario: (a) VGG-19 model; (b) Proposed CNN model.
As a matter of illustration, Figure 11 shows some image samples that were incorrectly classified by the two deep learning models. As can be noticed, these images from one harvesting week closely resemble samples from the following week, leading to misclassification.
Fig. 11 Examples of incorrectly classified Syrah grape images: (a) sample from the first week classified as second
week; (b) sample from the second week classified as third week; (c) sample from the third week classified as fourth
week.
The average accuracy results for the two considered scenarios are presented in Table 1 for both the VGG-19 and the proposed CNN models. As can be noticed, the fusion of classes resulted in better accuracy for both models, as anticipated. Moreover, VGG-19 was once again outperformed by the simpler proposed CNN model. It is worth adding that, although the VGG-19 model is a very robust and efficient CNN, it is a highly hardware-consuming architecture and, besides that, did not fit the present dataset well.
Table 1 Validation and test average accuracy results for the two considered scenarios of the Syrah cultivar.

Scenario          VGG-19                   Proposed CNN
                  Val. Acc.   Test Acc.    Val. Acc.   Test Acc.
Six Classes       85.15%      86.33%       88.10%      89.62%
Three Classes     93.93%      91.30%       92.28%      93.41%
Chemometric results corroborate these findings, as can be observed in Figure 12, which shows boxplots of the mean and standard deviation for TSS, anthocyanins and flavonoids, and the linear correlation for each maturation class. There is an increase in quality attributes during ripening, with accumulation of sugars (TSS) and pigments (anthocyanins and flavonoids) in Syrah grapes, as expected. Moreover, decreasing the number of ripening classes (harvesting weeks) increased the correlation for TSS and flavonoids, in agreement with what was observed in the CNN models for the Syrah cultivar.
Fig. 12 Boxplot of mean, standard deviation, and correlation of ripeness classes for quality attributes TSS (a) and (d),
anthocyanins (b) and (e), and flavonoids (c) and (f) of the Syrah cultivar.
Cabernet Sauvignon cultivar
The same approaches applied to the Syrah cultivar were used for Cabernet Sauvignon, with some adaptations. For this
cultivar, eight harvesting weeks were employed because of the grape's different physicochemical characteristics. In the
first scenario, eight class labels were used in the CNN models to represent each of the harvesting weeks. In the second
scenario, the same strategy of combining neighboring weeks was conducted, leading to a four-class scenario.
The results for the CNN models applied to the Cabernet Sauvignon dataset are summarized in Table 2, which shows average validation and test accuracy values for both models. Gathering neighboring classes only modestly improves the VGG-19 test performance but substantially enhances the test average accuracy of the proposed CNN model. It can be inferred that the proposed model is more sensitive to the fusion of harvesting weeks than the VGG-19 model, a behavior similar to that seen for the Syrah cultivar. ROC curves are shown in Figure 13 and confirm the results stated above. The better results of VGG-19 compared with the proposed CNN model may be related to the higher number of layers of the VGG-19 model.
Table 2 Validation and test average accuracy results for the two considered scenarios of Cabernet Sauvignon cultivar.
Scenario          VGG-19                   Proposed CNN
                  Val. Acc.   Test Acc.    Val. Acc.   Test Acc.
Eight Classes     80.69%      76.23%       64.52%      65.30%
Four Classes      81.63%      80.97%       79.28%      72.66%
Fig. 13 ROC curves for Cabernet Sauvignon cultivar and four classes scenario: (a) VGG-19 model; (b) Proposed CNN
model.
As shown in Tables 1 and 2, both network models performed better on the Syrah dataset than on the Cabernet Sauvignon images. This can be explained by looking at images of Cabernet Sauvignon grapes representing the average characteristic of each week, as illustrated in Figure 14. After the initial maturation stages, the samples are very similar, making the classes harder to distinguish.
Fig. 14 Examples of grape images taken from each class, representing the most common patterns for each class: (a) first
week; (b) second week; (c) third week; (d) fourth week; (e) fifth week; (f) sixth week; (g) seventh week; (h) eighth week.
Figure 15 shows the boxplots of mean and standard deviation for TSS, anthocyanins and flavonoids and the linear
correlation for each maturation class. There is an increase in quality attributes during ripening with accumulation of
sugars (TSS) and pigments (anthocyanins and flavonoids) for Cabernet Sauvignon grapes. Also, it is observed that the decrease in the number of ripening classes did not interfere with the correlation of the quality attributes, different from the behavior observed for the Syrah cultivar.
Fig. 15 Boxplot of mean, standard deviation, and correlation of ripeness classes for quality attributes TSS (a) and (d),
anthocyanins (b) and (e), and flavonoids (c) and (f) of the Cabernet Sauvignon cultivar.
Although it is an important research topic with direct application to grape production, there are very few works in the specialized literature on applying machine learning to classify grapes by maturation stage. Visible and near-infrared (Vis-NIR) spectroscopy was used to classify grapes of a local Chinese cultivar using linear discriminant analysis (LDA), artificial neural networks (ANN) and support vector machines (SVM).23 Precision results reached 100% when combining LDA with principal component analysis (PCA), but no accuracy information was provided by the authors. Despite the good results, the cited work employed a very costly technique that is difficult to apply in practice.
Kangune et al.24 used CNN and SVM algorithms to classify the ripeness of grapes of a local cultivar from India. Two classes were considered, ripe and unripe, and classifier outcomes were obtained from the colors and morphological shape features of the fruits. The authors reported validation accuracies of 79.49% for CNN and 69% for SVM; compared with those values, the results presented here are better for the Syrah cultivar but worse for Cabernet Sauvignon.
Conclusions
This study presented a procedure for classifying the maturation stage of wine grapes from two cultivars, namely Syrah
and Cabernet Sauvignon, using convolutional neural networks. Images of grapes were acquired with five different
illuminants and used as inputs to two CNN models for classification based on their harvesting week, representing
different maturation stages. Two scenarios were considered for both cultivars: one comprising all harvesting weeks and
another gathering samples from neighboring weeks. The idea was to merge similar samples in each class to improve
classification rates.
A simple CNN model was proposed, and its performance was compared with that of a pre-trained VGG-19 model. The proposed network outperformed the VGG-19 model in almost all cases considered, showing an average test accuracy of 93.41% for the Syrah dataset in a three-class scenario and 72.66% for the Cabernet Sauvignon dataset in a four-class scenario. One may infer that harvests interspersed by two weeks are sufficient, and indeed better, for training the model, a behavior corroborated by the chemometric results. In addition to reducing the time spent on harvesting and imaging, the monetary expenditure would be lower, since the personnel responsible for collecting grape samples would make fewer trips to the field.
This work suggests the possibility of developing an apparatus, comprising software and hardware stages, to be used for non-invasive assessment of grape maturation.
References
1. Jackson, D., Lombard, P., 1993. Environmental and management practices affecting grape composition and
wine quality – A review. American Journal of Enology and Viticulture, 44, 409–430.
2. Agati, G., Pinelli, P., Ebner, S.C., Romani, A., Cartelat, A., Cerovic, Z.G., 2005. Nondestructive evaluation of anthocyanins in olive (Olea europaea) fruits by in situ chlorophyll fluorescence spectroscopy. Journal of Agricultural and Food Chemistry.
3. Choong, T.S.Y., Abbas, S., Shariff, A.R., Halim, R., Ismail, M.H.S., Yunus, R., Ali, S., Ahmadun, F.R, 2006.
Digital Image Processing of Palm Oil Fruits. International Journal of Food Engineering, 1556-1560.
4. Agati, G., Meyer, S., Matteini, P., Cerovic, Z.G., 2007. Assessment of anthocyanins in grape (Vitis vinifera L)
berries using a noninvasive chlorophyll fluorescence method. Journal of Agricultural and Food Chemistry, 55,
1053–1061.
5. Agati, G., Traversi, M.L., Cerovic, Z.G., 2008. Chlorophyll fluorescence imaging for the noninvasive
assessment of anthocyanins in whole grape (Vitis vinifera L.) bunches. Photochemistry and Photobiology, 84,
1431–1434.
6. Hagen, S.F., Solhaug, K.A., Bengtsson, G.B., Borge, G.I.A., Bilger, W., 2006. Chlorophyll fluorescence as a
tool for non-destructive estimation of anthocyanins and total flavonoids in apples. Postharvest Biology and Technology.
7. Ribera-Fonseca, A., Noferini, M., Jorquera-Fontena, E., Rombolà, A.D., 2016. Assessment of technological
maturity parameters and anthocyanins in berries of Cv. Sangiovese (Vitis vinifera L.) by a portable VIS-NIR
8. ElMasry, G.M., Nakauchi, S., 2016. Image analysis operations applied to hyperspectral images for non-
invasive sensing of food quality – A comprehensive review. Biosystems engineering, 142, 53–82.
9. Nogales-Bueno, J., Hernández-Hierro, J.M., Rodrı́guez-Pulido, F.J., Heredia, F.J., 2014. Determination of
technological maturity of grapes and total phenolic compounds of grape skins in red and white cultivars during
ripening by near infrared hyperspectral image: A preliminary approach. Food chemistry, 152, 586–591.
10. dos Santos Costa, D., Mesa, N.F.O., Freire, M.S., Ramos, R.P., Mederos, B.J.T., 2019. Development of
predictive models for quality and maturation stage attributes of wine grapes using VIS-NIR reflectance spectroscopy.
11. Sze, V., Chen, Y.H., Yang, T.J., Emer, J.S., 2017. Efficient processing of deep neural networks: A tutorial and survey. Proceedings of the IEEE.
12. Zuñiga, A., Mora, M., Oyarce, M., Fredes, C., 2014. Grape maturity estimation based on seed images and neural networks.
13. Seng, K.P., Ang, L.M., Schmidtke, L.M., Rogiers, S.Y., 2018. Computer vision and machine learning for viticulture technology. IEEE Access.
14. Santos, T.T., de Souza, L.L., Santos, A.A.d., Avila, S., 2019. Grape detection, segmentation and tracking using
deep neural networks and three-dimensional association. arXiv preprint arXiv:1907.11819, 1–36.
15. Halstead, M., McCool, C., Denman, S., Perez, T., Fookes, C., 2018. Fruit quantity and ripeness estimation
using a robotic vision system. IEEE Robotics and Automation Letters, 3, 2995–3002.
16. Hang, J., Zhang, D., Chen, P., Zhang, J., Wang, B., 2019. Identification of apple tree trunk diseases based on
improved convolutional neural network with fused loss functions, in: International Conference on Intelligent
17. Milella, A., Marani, R., Petitti, A., Reina, G., 2019. In-field high throughput grapevine phenotyping with a
18. Kamilaris, A., Prenafeta-Boldú, F.X., 2018. Deep learning in agriculture: A survey. Computers and Electronics in Agriculture.
19. Chollet, F., et al., 2015. Keras. https://keras.io. Accessed February, 2020.
20. Simonyan, K., Zisserman, A., 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint.
21. Kingma, D.P., Ba, J., 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 1–
15.
22. Francis, F.J., 1982. Analysis of Anthocyanins. In: Markakis, P. (Ed.), Anthocyanins as Food Colors. Academic Press.
23. Lv, G., Yang, H., Xu, N., Mouazen, A.M., 2012. Identification of less-ripen, ripen, and over-ripen grapes
during harvest time based on visible and near-infrared (vis-nir) spectroscopy, in: 2012 2nd International
Conference on Consumer Electronics, Communications and Networks (CECNet), IEEE. pp. 1067–1070.
24. Kangune, K., Kulkarni, V., Kosamkar, P., 2019. Grapes ripeness estimation using convolutional neural network
and support vector machine, in: 2019 Global Conference for Advancement in Technology (GCAT), IEEE. pp.
1–5.