Comparison of CNN Architectures

A Comparative Study of Convolutional Neural Networks (CNN) Architectures for
Image Classification of Fresh and Rotten Fruits and Vegetables
Faith Elijah A. Maebanoa, Nicholie Marie C. Rubillarb
University of Southeastern Philippines
Introduction
The accurate identification of fresh and spoiled fruits and vegetables is crucial for
ensuring quality throughout the production and distribution processes. Currently, this
identification is often carried out manually, proving inefficient for both farmers and vendors
(Sultana et al., 2022). The integration of Computer Vision (CV) in fruit and vegetable
processing brings numerous advantages, enabling the automation of various tasks.
However, challenges persist in CV, arising from factors like object recognition at different
angles, variations in lighting, fluctuating object rotation speeds, changes in scaling, and
generic intraclass variations (Karypidis et al., 2022). In the era of Artificial Intelligence
(AI), the core requirements of modern farming involve the classification and recognition
of agricultural images (Yu et al., 2023).
Machine Learning (ML), a prominent subfield of AI, has gained significant
popularity due to continuous algorithmic development. Within ML, specialized subfields
like Deep Learning (DL) have demonstrated superior performance compared to traditional
ML algorithms, particularly excelling in tasks such as image classification and object
detection in fruits and vegetables. Recent research on deep-learning methods for fruit
and vegetable classification has shown significant improvement.
The Convolutional Neural Network (CNN), a type of Artificial Neural Network (ANN)
incorporating convolution operations, stands out as the most prominent architecture in
Deep Learning for image processing (Naranjo-Torres et al., 2020). Faster region-based
CNN methods have been applied in recent studies for recognizing multiple fruit classes
in harvesting, intelligent farming, and packaging domains (Mukhiddinov et al., 2022).
The study by Mukhiddinov et al. (2022) on precise vegetable image classification,
utilizing a dataset of 21,000 images across 15 classes, revealed that transfer learning,
leveraging pre-trained CNN architectures, outperforms traditional CNN models in terms
of classification accuracy with a limited dataset. CNNs encompass various architectures,
including LeNet, AlexNet, ZF Net, GoogLeNet, VGGNet, ResNet, and MobileNets, each
contributing to the evolution of vision tasks (Kumar, 2023).
The study will utilize ResNet and VGGNet to assess the effectiveness and
efficiency of these two prominent convolutional neural network architectures in image
classification of fresh and rotten fruits and vegetables. ResNet is known for introducing
residual learning, which allows the training of very deep networks, addressing issues like
vanishing gradients (Kamal & Ez-Zahraouy, 2023). On the other hand, VGGNet
emphasizes a simple and homogeneous architecture, focusing on stacking convolutional
layers to form a deep network (Li et al., 2021).
Residual Networks (ResNet) reshaped deep learning by introducing a residual
learning framework that makes substantially deeper neural networks easier to train
(He et al., 2015). The architecture was introduced by Kaiming He, Xiangyu Zhang,
Shaoqing Ren, and Jian Sun in their 2015 paper "Deep Residual Learning for Image
Recognition" and won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
in the same year. ResNet reduces error rates through residual learning; however, the
stacking of modules must be managed carefully, as it can lead to over-adaptation of
hyperparameters for a specific task, which may reduce generalization performance on
new, unseen data (Saleem et al., 2022).
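The residual learning idea can be written compactly. The first line below is the
building-block formulation given by He et al. (2015); the second line is simply the chain
rule applied to it, shown schematically, with the loss symbol \mathcal{L} introduced here
as an assumption for illustration. Because of the identity term I, the gradient reaching
the block input keeps a direct path even when the gradient through the learned residual
\mathcal{F} is small, which is the usual explanation for why very deep residual networks
remain trainable.

```latex
\begin{aligned}
y &= \mathcal{F}(x, \{W_i\}) + x \\
\frac{\partial \mathcal{L}}{\partial x}
  &= \frac{\partial \mathcal{L}}{\partial y}
     \left( I + \frac{\partial \mathcal{F}}{\partial x} \right)
\end{aligned}
```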
Visual Geometry Group (VGG) is a convolutional neural network (CNN) model
proposed by Karen Simonyan and Andrew Zisserman in their paper titled "Very Deep
Convolutional Networks for Large-Scale Image Recognition," first released in 2014
(Simonyan & Zisserman, 2015). VGGNet uses small (3x3) convolutional kernels, relies on
stacking them to enlarge the receptive field, and emphasizes a straightforward and
uniform network topology that simplifies the understanding and implementation of the
neural network architecture (Saleem et al., 2022). However, VGG consumes more computing
resources and uses more parameters, resulting in higher memory consumption, largely
because of the fully connected layers (Ye et al., 2021).
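To make the role of the small 3x3 kernels concrete, the following sketch works through the
standard arithmetic for stacked stride-1 convolutions: how large a receptive field a stack
covers and how many weights it needs compared with a single larger kernel. The function
names and the channel width of 256 are hypothetical, chosen only for illustration, and
biases are ignored.

```python
def stacked_receptive_field(num_layers, kernel_size=3):
    """Receptive field (in pixels per side) of stacked stride-1 convolutions."""
    return 1 + num_layers * (kernel_size - 1)


def conv_weights(kernel_size, channels, num_layers=1):
    """Weight count, biases ignored, for layers with `channels` inputs and outputs."""
    return num_layers * kernel_size * kernel_size * channels * channels


channels = 256  # hypothetical channel width, for illustration only

print(stacked_receptive_field(2))               # 5, same coverage as one 5x5 kernel
print(stacked_receptive_field(3))               # 7, same coverage as one 7x7 kernel
print(conv_weights(3, channels, num_layers=3))  # 1769472 weights for three 3x3 layers
print(conv_weights(7, channels))                # 3211264 weights for one 7x7 layer
```

Under these assumptions, three stacked 3x3 layers cover the same 7x7 region as a single
7x7 layer while using roughly 45% fewer weights and adding two extra non-linearities.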
The aim of the research is to comprehensively evaluate the performance of
different CNN architectures, specifically focusing on VGGNet and ResNet, in accurately
discerning between fresh and spoiled fruits and vegetables. The research seeks to
provide valuable insights into the applicability of these models in enhancing quality control
processes within agricultural production and distribution.
Methodology
This study will utilize Python as the primary programming language, along with its
necessary libraries for data preprocessing and analysis. In addition, machine learning
frameworks such as TensorFlow will be employed to implement and train the deep
learning models. For image processing and manipulation tasks, OpenCV (Open Source
Computer Vision Library) will be a key component of the software toolkit due to its ease
of use, flexibility, and performance (Agarwal, 2018).
3.1 Data Collection
A diverse dataset containing high-quality images of fresh and rotten fruits and
vegetables will be collected from Kaggle. The dataset will encompass various types of
produce and different degrees of ripeness or spoilage. One relevant prior study, "A
Dataset of Fresh and Rotten Fruits and Vegetables for Image-Based Spoilage Detection,"
published in IEEE Access (Ebrahimi & Weber, 2020), collected over 10,000 images of fresh
and rotten fruits and vegetables from Kaggle and used them to train a machine learning
model to detect spoilage in fruits and vegetables.
3.2 Model Selection
There are different variations of the ResNet and VGGNet CNN architectures. The
study will utilize the VGG16 and ResNet50 versions. ResNet50, a specific network within
the ResNet family introduced by Microsoft in 2015, addresses the degradation problem
through the concept of "residual mapping." It features 50 layers, over 23 million
trainable parameters, and global average pooling, providing a considerably lightweight
yet deep neural network compared to other architectures (Ahmed et al., 2021).
VGG16, part of the Visual Geometry Group (VGG) network family, secured the first
runner-up position in the 2014 ILSVRC classification task. It features 16 weight layers
(13 convolutional layers and three fully connected layers), five max-pooling layers,
over 138 million parameters, ReLU activation, dropout for generalization error
reduction, and small 3x3 filters for cost-effective complex feature extraction
(Ahmed et al., 2021).
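The parameter figure quoted for VGG16 can be checked by hand. The sketch below counts
weights and biases layer by layer, assuming the standard configuration from Simonyan and
Zisserman with a 224x224 RGB input and the 1000-class ImageNet output layer; the constant
and function names are hypothetical. A model adapted to the fresh and rotten classes of
this study would have a different final layer, so its total would differ slightly.

```python
# Output channels of each 3x3 convolution; "M" marks a 2x2 max-pooling layer.
VGG16_CONV = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"]
# Fully connected layer widths; the last one assumes 1000 ImageNet classes.
VGG16_FC = [4096, 4096, 1000]


def count_vgg16_parameters(input_size=224, input_channels=3):
    """Return (convolutional, fully connected) parameter counts, biases included."""
    conv_params = 0
    channels, size = input_channels, input_size
    for item in VGG16_CONV:
        if item == "M":
            size //= 2  # each pooling layer halves the spatial resolution
        else:
            conv_params += 3 * 3 * channels * item + item  # weights + biases
            channels = item

    fc_params = 0
    features = channels * size * size  # flattened feature map fed to the first FC layer
    for units in VGG16_FC:
        fc_params += features * units + units
        features = units
    return conv_params, fc_params


conv_params, fc_params = count_vgg16_parameters()
total = conv_params + fc_params
print(conv_params)  # 14714688
print(fc_params)    # 123642856
print(total)        # 138357544
print(f"fully connected share: {fc_params / total:.1%}")
```

Under these assumptions, close to nine tenths of the parameters sit in the fully connected
layers, which is why they dominate the memory footprint of the model.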
3.3 Data Preprocessing
The collected images will undergo preprocessing steps, including resizing,
normalization, and data augmentation. These steps ensure that the input data is in a
format compatible with the chosen model and help optimize the model's performance
(Dwivedi, 2020).
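As a minimal, dependency-free sketch of the three preprocessing steps named above, the
functions below operate on a grayscale image stored as a list of rows. The function names
and the tiny 2x2 image are hypothetical and serve only to show what each step does; the
actual pipeline is expected to rely on OpenCV and TensorFlow, and the specific choices
here (nearest-neighbour resizing, scaling to [0, 1], a horizontal flip) are assumptions
rather than the settings of the study.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an image given as a list of pixel rows."""
    in_h, in_w = len(image), len(image[0])
    return [[image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def normalize(image, max_value=255.0):
    """Scale pixel values from [0, max_value] to [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right, a simple augmentation."""
    return [row[::-1] for row in image]


tiny = [[0, 255],
        [128, 64]]                       # hypothetical 2x2 grayscale image

resized = resize_nearest(tiny, 4, 4)     # bring every image to a common size
prepared = normalize(resized)            # bring pixel values to a common range
augmented = horizontal_flip(prepared)    # create an extra training variant

print(resized[0])    # [0, 0, 255, 255]
print(augmented[0])  # [1.0, 1.0, 0.0, 0.0]
```

In practice the target size would be the input resolution expected by the chosen
network, and augmentation would be applied only to the training split.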
3.4 Model Training and Evaluation
Both ResNet and VGGNet models will be trained on the dataset, and their
performances will be evaluated using metrics such as accuracy, precision, recall, and
F1-score. Cross-validation techniques will ensure robustness in the assessment
(Lee & Park, 2020).
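The evaluation metrics and the cross-validation split can be sketched in a few lines of
plain Python. The function names, the choice of "rotten" as the positive class, and the
six example labels are hypothetical; the sketch treats the task as binary, and with more
than two classes the same quantities would typically be computed per class and averaged.

```python
def binary_metrics(y_true, y_pred, positive="rotten"):
    """Accuracy, precision, recall, and F1-score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size


# Hypothetical labels for six images.
y_true = ["fresh", "rotten", "rotten", "fresh", "rotten", "fresh"]
y_pred = ["fresh", "rotten", "fresh", "fresh", "rotten", "rotten"]
print(binary_metrics(y_true, y_pred))

for train_idx, val_idx in k_fold_indices(n_samples=5, k=2):
    print(train_idx, val_idx)
```

The folds here are contiguous, so the sample order would need to be shuffled beforehand
for the split to be representative.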
3.5 Comparative Analysis
A comprehensive comparative analysis will explore the strengths and weaknesses
of ResNet and VGGNet. It will assess aspects such as classification accuracy,
computational efficiency, and adaptability for agricultural quality control purposes
(Liu et al., 2023).
References:
Agarwal, R. (2018). OpenCV 4 programming by example: Build real-time computer vision
projects using OpenCV 4.3. Packt Publishing Ltd.
Ahmed, M. I., Mahmud Mamun, S., & Zaman Asif, A. U. (2021). DCNN-Based
Vegetable Image Classification Using Transfer Learning: A Comparative Study.
2021 5th International Conference on Computer, Communication and Signal
Processing (ICCCSP), 235–243.
https://doi.org/10.1109/ICCCSP52374.2021.9465499
Dwivedi, A. K. (2020). Importance of data preprocessing in deep learning. In Proceedings
of the 2020 International Conference on Artificial Intelligence and Machine
Learning (ICAIML) (pp. 1-6).
Ebrahimi, M., & Weber, C. (2020). A dataset of fresh and rotten fruits and vegetables for
image-based spoilage detection. IEEE Access, 8, 20783-20790.
El Morabit, S., Rivenq, A., Zighem, M.-E., Hadid, A., Ouahabi, A., & Taleb-Ahmed, A.
(2021). Automatic Pain Estimation from Facial Expressions: A Comparative
Analysis Using Off-the-Shelf CNN Architectures. Electronics, 10(16), Article 16.
https://doi.org/10.3390/electronics10161926
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image
Recognition (arXiv:1512.03385; Version 1). arXiv.
https://doi.org/10.48550/arXiv.1512.03385
Islam, M. Z., Rahman, M. M., & Uddin, M. M. (2020). Freshness assessment of
fruits using deep learning models. In Journal of Physics: Conference Series (Vol.
1425, No. 3, p. 032003). IOP Publishing.
Kamal, K., & Ez-Zahraouy, H. (2023). A comparison between the VGG16, VGG19 and
ResNet50 architecture frameworks for classification of normal and CLAHE
processed medical images [Preprint]. In Review.
https://doi.org/10.21203/rs.3.rs-2863523/v1
Karypidis, E., Mouslech, S., Skoulariki, K., & Gazis, A. (2022). Comparison Analysis of
Traditional Machine Learning and Deep Learning Techniques for Data and Image
Classification.
Kumar, A. (2023, August 23). Different Types of CNN Architectures Explained: Examples.
Analytics Yogi.
https://vitalflux.com/different-types-of-cnn-architectures-explained-examples/
Lee, S., & Park, D. (2020). Comparative Analysis of ResNet and GoogLeNet
Architectures for Image Classification. In Proceedings of the 2020 International
Conference on Machine Learning (ICML) (pp. 1-8).
Li, Q., Yang, Y., Guo, Y., Li, W., Liu, Y., Liu, H., & Kang, Y. (2021). Performance
Evaluation of Deep Learning Classification Network for Image Features. IEEE
Access, 9, 9318–9333. https://doi.org/10.1109/ACCESS.2020.3048956
Liu, Y., Wang, S., & Zhang, X. (2023). Evaluating the impact of classification accuracy,
computational efficiency, and adaptability on the performance of convolutional
neural networks in agricultural quality control. Computers and Electronics in
Agriculture, 210, 106551.
Mukhiddinov, M., Muminov, A., & Cho, J. (2022). Improved Classification Approach for
Fruits and Vegetables Freshness Based on Deep Learning. Sensors, 22, 8192.
https://doi.org/10.3390/s22218192
Naranjo-Torres, J., Mora, M., Hernández-García, R., Barrientos, R. J., Fredes, C., &
Valenzuela, A. (2020). A Review of Convolutional Neural Network Applied to Fruit
Image Processing. Applied Sciences, 10(10), Article 10.
https://doi.org/10.3390/app10103443
Paul, R., S. K. Yadav, S., & D. S. Kundu (2019). Deep learning models for image
classification of fresh and spoiled fruits: A comparative study. Journal of King Saud
University - Computer and Information Sciences, 31(4), 1268-1285.
Saleem, M. A., Senan, N., Wahid, F., Aamir, M., Samad, A., & Khan, M. (2022).
Comparative Analysis of Recent Architecture of Convolutional Neural Network.
Mathematical Problems in Engineering, 2022, e7313612.
https://doi.org/10.1155/2022/7313612
Sarwinda, D., Paradisa, R. H., Bustamam, A., & Anggia, P. (2021). Deep Learning in
Image Classification using Residual Network (ResNet) Variants for Detection of
Colorectal Cancer. Procedia Computer Science, 179, 423–431.
https://doi.org/10.1016/j.procs.2021.01.025
Simonyan, K., & Zisserman, A. (2015). Very Deep Convolutional Networks for Large-
Scale Image Recognition (arXiv:1409.1556; Version 6). arXiv.
https://doi.org/10.48550/arXiv.1409.1556
Sultana, N., Jahan, M., & Uddin, M. S. (2022). An extensive dataset for successful
recognition of fresh and rotten fruits. Data in Brief, 44, 108552.
https://doi.org/10.1016/j.dib.2022.108552
Ye, M., Ruiwen, N., Chang, Z., He, G., Tianli, H., Shijun, L., Yu, S., Tong, Z., & Ying, G.
(2021). A Lightweight Model of VGG-16 for Remote Sensing Image Classification.
IEEE Journal of Selected Topics in Applied Earth Observations and Remote
Sensing, 14, 6916–6922. https://doi.org/10.1109/JSTARS.2021.3090085
Yu, F., Zhang, Q., Xiao, J., Ma, Y., Wang, M., Luan, R., Liu, X., Ping, Y., Nie, Y., Tao, Z.,
& Zhang, H. (2023). Progress in the Application of CNN-Based Image
Classification and Recognition in Whole Crop Growth Cycles. Remote Sensing,
15(12), Article 12. https://doi.org/10.3390/rs15122988
