
Fruit Quality Classification Model

Deep Learning Neural Networks


Post-Graduation in Enterprise Data Science & Analytics
GROUP 1
Bruno Teles
Carolina Golding
Manuel Fernandes
Ricardo Costa
Thiago Aguiar

Supervisors
Mafalda Sá Velho
Illya Bakurov
ABSTRACT
The aim of this study was to apply Convolutional Neural Networks (CNNs) to identify edible
fruits based on their visual characteristics. To conduct the tests, we identified a dataset [1]
composed of images of fruits of 3 different kinds: apples, oranges and bananas.
The fruits were also divided into 2 states of conservation (fresh and rotten). Thus, we trained
and tested the models with a total of 6 categories.

The methodology adopted was to implement an initial baseline model and then attempt to
improve its performance by applying different techniques. To visualize the performance and
compare models, we employed the TensorBoard app and the Matplotlib library.

The results obtained show strong potential for using CNNs in large-scale food quality
control, given the models' high precision and recall rates.

KEYWORDS
Convolutional Neural Networks (CNN), classes, regularization, batch normalization, data
augmentation, confusion matrix

STUDY OBJECTIVES
The goal of this study was twofold: first, to gain an understanding of the basic components and
typical architectures of convolutional neural networks; and second, to build a neural network
that can recognize different classes of fresh and rotten fruit. The study was carried out
using Python and its libraries, with Keras and TensorFlow used to build the deep learning
models.

INTRODUCTION
Consumers have always been demanding, and there are no signs that they will become less
so. When it comes to fruit and vegetables, there is a clear preference for big, juicy, crunchy
and plump pieces. As such, the work in the Quality Department of a large food retail company
is incredibly demanding, due not only to the high quality standards but also to the speed at
which the work must be done.

Workers need to be able to sort through tons of produce, separating what is fit for
consumption from what is not. Luckily, it is reasonably easy to distinguish edible produce
from rotten produce thanks to visual cues such as scrapes and cuts to the skin, mold, or
discoloration.

In this project, we explore these visual cues to tag photos of fruit as fit for consumption or
not, using a CNN, a type of neural network architecture suited to image classification and
object recognition tasks that involve processing pixel data. This technology can be useful for
improving the efficiency and accuracy of food quality inspection processes, reducing food
waste, and ensuring food safety for consumers.

It is important to note that the use of CNNs for food spoilage detection is still in its early
stages, and further research and development is needed to fully realize its potential. Some
startups and research organizations are already working on such solutions.

METHODOLOGY
This project made use of two tools: Jupyter Notebook for the exploration of the dataset and
the creation and analysis of the models, and TensorBoard for the visualization and analysis
of the models.

The first step of the project was the exploration of the dataset. We started by importing the
images and their respective labels. The images were then split to build the training and
validation datasets. The test dataset was available separately from the data source on
Kaggle, thus not requiring any split from the original, full dataset.
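As a concrete illustration, the sketch below shows one way this loading and splitting could be done with Keras' image_dataset_from_directory utility. The folder names, image size, validation fraction and seed are assumptions, not values taken from the project notebook.

import tensorflow as tf

IMG_SIZE = (100, 100)   # matches the input shape used by the models below
BATCH_SIZE = 128        # placeholder; the batch size is tuned later

# Hypothetical folder layout: one sub-folder per class inside train/ and test/
train_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/train",
    validation_split=0.2,       # assumed validation fraction
    subset="training",
    seed=42,
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    label_mode="categorical",   # one-hot labels for categorical crossentropy
)
val_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/train",
    validation_split=0.2,
    subset="validation",
    seed=42,
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    label_mode="categorical",
)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "dataset/test",
    image_size=IMG_SIZE,
    batch_size=BATCH_SIZE,
    label_mode="categorical",
    shuffle=False,              # keep order so predictions align with labels
)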

Once the dataset was prepared and explored, we aimed to investigate four CNN
(Convolutional Neural Network) models. The first was a basic CNN model without any
regularization, few layers, and a standard learning rate, intended to give us an indication of
which parameters we should tune and which further steps would be required to improve the
model's performance. The resulting combination of best parameters was carried forward to
the subsequent models, which increased in complexity: the second model was subjected to
regularization, specifically early stopping, batch normalization and dropout; the third model
added L2 regularization; and the fourth and final model was subjected to data augmentation,
with the purpose of improving model performance.

At the end of each model run, a confusion matrix was used to evaluate the model's accuracy
for each class and to determine the overall accuracy, by evaluating the model's ability to
correctly identify the classes in the test dataset.
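One way to compute this evaluation is sketched below, using scikit-learn's confusion_matrix; the names model and test_ds are hypothetical placeholders for a trained Keras model and a non-shuffled test dataset with one-hot labels.

import numpy as np
from sklearn.metrics import confusion_matrix

# True labels, recovered from the one-hot encoded test dataset
y_true = np.concatenate([np.argmax(labels, axis=1) for _, labels in test_ds])
# Predicted labels, taken as the most probable class for each image
y_pred = np.argmax(model.predict(test_ds), axis=1)

cm = confusion_matrix(y_true, y_pred)
print(cm)
# Overall accuracy is the trace (correct predictions) divided by the total
print("Test accuracy:", np.trace(cm) / cm.sum())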

DATASET EXPLORATION
The dataset identified for this work is divided into two distinct folders: one for training and one
exclusively for testing. The training set is composed of 10 901 images and the test set of 2 689.

Both have pictures belonging to 6 categories. These categories are characterized by 3 distinct
types of fruits (apples, bananas, and oranges), each one in two states of preservation (fresh
and rotten).

The number of items available per category is shown in the following table.

Table 1 Number of objects for each category

The distribution of categories appears to be representative across all classes. The number
of objects in each class is large enough to allow splitting the training data for the creation
of the validation set.

From observing some random objects from the dataset, we found that the images in general
have a good level of detail and are varied within each class. This variety includes backgrounds
of different colors and fruits of different colors and sizes. A single photo can contain one or
multiple objects, and fruits may be whole or peeled.

We also found that some of the images appear to have been augmented already (e.g. rotations).
Some examples can be observed in the following figure:

Figure 2 Random examples of objects from the dataset

Broadly speaking, the images are of good quality and are fit for purpose.

MODEL DEFINITION
Baseline – CNN v1

The baseline is a Sequential model created using TensorFlow's ‘tf.keras’ library. It consists of a
sequence of layers: two Conv2D layers with ReLU activation, two MaxPooling2D
layers, a Flatten layer, three Dense layers with ReLU activation, and a final Dense layer with
Softmax activation. The input shape of the first Conv2D layer is specified as (100, 100, 3),
indicating that the input tensor has 100x100 pixels and 3 color channels [7]. The final
Dense layer has 6 units, meaning that the model outputs predictions for 6 different
classes (rotten and fresh apples, oranges, and bananas) [8].

The baseline model was compiled using the Adam optimizer (learning rate of 0.01), the
categorical crossentropy loss function, and the accuracy metric. With these steps, the
compiled model was ready to be trained on the dataset.
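A minimal sketch of such a baseline is shown below. The filter counts and the sizes of the Dense layers are assumptions, since the report only fixes the layer types, their order, the input shape, the output layer and the compile settings.

from tensorflow import keras
from tensorflow.keras import layers

baseline = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(100, 100, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(64, activation="relu"),
    layers.Dense(32, activation="relu"),
    layers.Dense(6, activation="softmax"),   # one unit per fruit/conservation class
])

baseline.compile(
    optimizer=keras.optimizers.Adam(learning_rate=0.01),
    loss="categorical_crossentropy",
    metrics=["acc"],
)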

To prevent overfitting, we applied ‘EarlyStopping’, interrupting the training process when the
performance on the validation set starts to degrade [1, 6]. In this case, we set the patience
to 5, i.e., training would stop if the validation loss did not improve for 5 epochs.
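In Keras this corresponds to an EarlyStopping callback passed to fit(); the sketch below reuses the baseline model and datasets from the earlier sketches, and monitoring val_loss with restore_best_weights is an assumption consistent with the description above.

early_stop = keras.callbacks.EarlyStopping(
    monitor="val_loss",        # stop when the validation loss stops improving
    patience=5,                # wait 5 epochs, as described in the report
    restore_best_weights=True, # assumed: keep the weights of the best epoch
)

history = baseline.fit(
    train_ds,
    validation_data=val_ds,
    epochs=30,                 # the training logs below show 30-epoch runs
    callbacks=[early_stop],
)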

Hyper tuning – CNN v1

This stage aims to improve the model obtained previously by changing key parameters. To
avoid long running times, a few isolated and independent tests of some configurations were
performed. The validation accuracy and respective loss were then evaluated with the
purpose of comparing and assessing the models.

The parameters chosen to be changed were the batch size (64, 128, 256), the learning
rate (0.01, 0.001) and the number of layers (2, 3, 4).
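The sketch below illustrates one way such tests could be looped over, reusing train_ds and val_ds from the earlier sketch. The build_model helper and the full grid of combinations are illustrative only: the report ran 10 isolated tests, and "number of layers" is interpreted here as the number of convolutional blocks.

import itertools
from tensorflow import keras
from tensorflow.keras import layers

def build_model(n_conv_blocks, learning_rate):
    # Hypothetical helper: stacks n_conv_blocks Conv2D/MaxPooling2D blocks
    model = keras.Sequential([layers.Input(shape=(100, 100, 3))])
    for i in range(n_conv_blocks):
        model.add(layers.Conv2D(32 * 2 ** i, (3, 3), activation="relu"))
        model.add(layers.MaxPooling2D((2, 2)))
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(6, activation="softmax"))
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=learning_rate),
                  loss="categorical_crossentropy", metrics=["acc"])
    return model

results = {}
for batch_size, lr, n_blocks in itertools.product([64, 128, 256], [0.01, 0.001], [2, 3, 4]):
    model = build_model(n_blocks, lr)
    # Re-batch the datasets, since the batch size was fixed when they were built
    history = model.fit(train_ds.unbatch().batch(batch_size),
                        validation_data=val_ds.unbatch().batch(batch_size),
                        epochs=10, verbose=0)
    results[(batch_size, lr, n_blocks)] = max(history.history["val_acc"])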

CNN v2

This variation of the model included ‘BatchNormalization’ and dropout regularization.

BatchNormalization normalizes the activations of the previous layer for each batch, improving
the training speed and stability of deep neural networks [5]. For this model we chose not to
define the parameters for ‘BatchNormalization’, opting instead for the default parameters.

Dropout regularization may improve the model's generalization ability by setting a random
subset of neurons to zero during training [2].
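One possible way to insert these two layers into the architecture is sketched below; the placement of the layers and the dropout rates are assumptions, since the report only states that both techniques were used and that BatchNormalization kept its default parameters.

from tensorflow import keras
from tensorflow.keras import layers

cnn_v2 = keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(100, 100, 3)),
    layers.BatchNormalization(),   # default parameters, normalizes per batch
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),          # assumed rate: zero out 25% of activations
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),           # assumed heavier dropout before the output
    layers.Dense(6, activation="softmax"),
])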

CNN v3

The third model uses the second one as its foundation, adding a new regularizer, ‘L2
regularization’.

This regularization method adds a penalty to the model that keeps its weights small [9]. This
prevents the model from giving too much importance to any one feature [9].
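In Keras this can be done by attaching a kernel_regularizer to the layers whose weights should be penalized; the regularization factor of 0.01 below is an assumption, as the report does not state the value used.

from tensorflow.keras import layers, regularizers

# L2 adds factor * sum(w^2) over the layer's weights to the training loss
dense_l2 = layers.Dense(128, activation="relu",
                        kernel_regularizer=regularizers.l2(0.01))
conv_l2 = layers.Conv2D(64, (3, 3), activation="relu",
                        kernel_regularizer=regularizers.l2(0.01))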

CNN v4

Finally, for the fourth model, we defined an image data augmentation pipeline using a Keras
Sequential model. Augmentation was added to a model with parameters similar to the
previously tested models.

The augmentation pipeline is composed of a series of data augmentation operations, such as
RandomFlip, RandomRotation, RandomZoom and RandomTranslation, applied one after the
other.
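A sketch of such a pipeline is given below; the flip mode and the rotation, zoom and translation factors are assumptions, since the report only names the operations.

from tensorflow import keras
from tensorflow.keras import layers

data_augmentation = keras.Sequential([
    layers.RandomFlip("horizontal_and_vertical"),
    layers.RandomRotation(0.2),          # rotate up to +/- 20% of a full turn
    layers.RandomZoom(0.1),
    layers.RandomTranslation(0.1, 0.1),  # shift up to 10% in height and width
])

# The pipeline is applied as the first step of the model, so the random
# transformations are only active during training, not at inference time.
inputs = keras.Input(shape=(100, 100, 3))
x = data_augmentation(inputs)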

RESULTS AND DISCUSSION


Baseline – CNN v1

The output from training the model shows that the best epoch was the 13th, with a training
accuracy of 0.9999 and a validation accuracy of 0.8963.

Epoch 13/30
69/69 [==============================] - 37s 523ms/step - loss: 0.0021 - acc: 0.9999 - val_loss: 0.4959 - val_acc: 0.8963
The large difference observed between training and validation accuracy may indicate overfitting
of the model to the training data. It also suggests that there is potential for improvement
through parameter tuning.

Evaluating the model on the test dataset, the results can be seen in the following confusion
matrix:

Figure 3 Confusion Matrix of CNNv1 on test dataset

CNNv1 registered a test accuracy of 89.25%, accurately predicting the classes on 2 118
occasions (as shown in the blue squares). It performed worse when trying to classify Class 3
and was inconsistent in its classification of Class 5.

Hyper tuning – CNN v1

In total, 10 tests were performed by changing the initially set parameters. A summary of the
values obtained can be viewed in the following table. The order in which the tests were
performed is as suggested by the test names; the tests are listed by validation result.

Table 2 Model tuning

One of the main effects that impacted validation performance was the number of layers: the
worst tests had only 2 layers. The configuration with the best performance used a batch size
of 128, 4 layers and a learning rate of 0.01.

The tuning phase finished with a validation accuracy of 97.8%, far better than the previously
recorded 89.6%.

CNNv2

Running the second model with the indicated modifications to the first, the training and
validation result for the best epoch is as follows.

Epoch 14/30
69/69 [==============================] - 97s 1s/step - loss: 0.0427 - acc: 0.9865 - val_loss: 0.0500 - val_acc: 0.9858
The validation accuracy achieved was 98.58%, better than the previous result of 97.8% after
tuning.

Putting the new model to the test on the test dataset, the confusion matrix and its accuracy
can be seen in the following figure.

Figure 4 Confusion Matrix of CNNv2 on test dataset

The test accuracy of 98.26% shows a great improvement over the previous model before the
tuning phase (89.25%). This shows that the changes made were effective on the test dataset
and that there was a large margin for improvement.

CNNv3

The results for model 3 at the best validation epoch are described below:

Epoch 10/30
69/69 [==============================] - 119s 2s/step - loss: 0.0965 - acc: 0.9764 - val_loss: 0.0644 - val_acc: 0.9858

Figure 5 Confusion Matrix of CNNv3 on test dataset

The results were very similar to those obtained previously, and slightly better on the test set.
This may be due to a smaller margin for improvement, or to this regularizer being ineffective
at increasing performance at this point.

CNNv4

The first step was to perform image augmentation. The figure below shows examples of the
transformations applied to random images from the dataset.

Figure 6 Output of augmentation pipeline on a sample batch dataset.

After running the model, the training and validation results for the best epoch were as
follows:

Epoch 10/30
69/69 [==============================] - 81s 1s/step - loss: 0.1561 - acc: 0.9463 - val_loss: 0.1413 - val_acc: 0.9509

Figure 7 Table of results for each epoch during training of the model CNNv4

The performance of this model (94.3%) was below that of the previous ones. There are several
possible reasons for this: the presence of already augmented images in the dataset may
cause too many features to be repeated in training, and some transformations may also
introduce noise or hide certain features from the model.

CONCLUSION
Convolutional Neural Networks are suitable for image classification problems, as
demonstrated by this study. The use of a clean dataset with quality images may have
contributed to the high performance values obtained. It is very important to define a set of
initial parameters so that the model has adequate complexity for the problem it is meant to
solve.

The results from the four models tested, with their varying degrees of complexity, did not give
us enough information to reach a clear conclusion regarding optimal performance.
Despite that, the model that appears to be most promising is CNNv3. For future study, we
highly recommend that this model be tested for robustness through multiple runs.

REFERENCES
1. Brownlee, J. (2020) Use early stopping to halt the training of neural networks at the
right time, MachineLearningMastery.com. Available at:
https://machinelearningmastery.com/how-to-stop-training-deep-neural-networks-at-the-right-time-using-early-stopping/.
2. Marimuthu, P. (2022) Dropout regularization in deep learning, Analytics Vidhya.
Available at: https://www.analyticsvidhya.com/blog/2022/08/dropout-regularization-in-deep-learning/.
3. Narkhede, S. (2021) Understanding confusion matrix, Towards Data Science.
Available at: https://towardsdatascience.com/understanding-confusion-matrix-a9ad42dcfd62.
4. Sharma, P. (2022) Basic introduction to convolutional neural network in deep
learning, Analytics Vidhya. Available at:
https://www.analyticsvidhya.com/blog/2022/03/basic-introduction-to-convolutional-neural-network-in-deep-learning/.
5. Singla, S. (2020) Why is batch normalization useful in deep neural networks?,
Towards Data Science. Available at:
https://towardsdatascience.com/batch-normalisation-in-deep-neural-network-ce65dd9e8dbf (Accessed: February 4, 2023).
6. tf.keras.callbacks.EarlyStopping, TensorFlow v2.11.0 documentation (no date).
Available at: https://www.tensorflow.org/api_docs/python/tf/keras/callbacks/EarlyStopping.
7. tf.keras.layers.Conv2D, TensorFlow v2.11.0 documentation (no date). Available at:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Conv2D.
8. tf.keras.layers.Dense, TensorFlow v2.11.0 documentation (no date). Available at:
https://www.tensorflow.org/api_docs/python/tf/keras/layers/Dense.
9. L1 and L2 regularization methods, explained (no date), Built In. Available at:
https://builtin.com/data-science/l2-regularization.
10. The growing demand for food quality: Implications for agricultural and ... (1990).
Available at:
https://kb.osu.edu/bitstream/handle/1811/66280/CFAES_ESO_1771.pdf;sequence=1.
11. Zink, D.L. (1997) The impact of consumer demands and trends on food processing.
Emerging Infectious Diseases, 3(4), pp. 467-469. doi: 10.3201/eid0304.970408.
PMID: 9366598; PMCID: PMC2640073.
12. Grunert, K.G. (2005) Food quality and safety: consumer perception and demand.
European Review of Agricultural Economics, 32(3), pp. 369-391.
https://doi.org/10.1093/eurrag/jbi011.

ANNEXES

TensorBoard

Figure 8 Train and validation accuracy by epoch for the multiple models tested

Figure 9 Train and validation loss by epoch for the multiple models tested

