Research Paper

Exploring the Effectiveness of MobileNetV2, VGG19, and a Simple CNN Model for Image Classification

Amritanshu Bathre, Amit Kumar, Naman Pasari, Shrikant Suryawanshi
(Mathematics and Computing)
MITS Gwalior
Abstract: In the dynamic field of computer vision and image classification, the choice of a deep learning model plays a crucial role in achieving optimal performance across various applications. This study conducts a comparative analysis of three distinct convolutional neural network (CNN) architectures: MobileNetV2, VGG19, and a simplified CNN model. The objective is to assess their effectiveness in image classification tasks, considering factors such as accuracy, computational efficiency, and model complexity.

To ensure a thorough evaluation, a diverse dataset covering multiple categories is utilized, enabling a robust assessment of the models' generalization capabilities. The training process involves hyperparameter optimization for fair comparisons, and the models undergo rigorous testing on a separate validation set to measure their performance under real-world conditions.

The experimental results provide nuanced insights into the strengths and weaknesses of each architecture. MobileNetV2, recognized for its lightweight design, demonstrates notable efficiency in computational resource usage, making it well-suited for deployment on resource-constrained devices. VGG19, characterized by its deep and intricate structure, exhibits a strong ability to capture complex hierarchical features, albeit with increased computational demands. The simplified CNN model, designed to strike a balance between complexity and performance, emerges as a practical alternative in scenarios where a compromise between accuracy and resource efficiency is sought.

This research contributes to the ongoing discourse surrounding the selection of CNN architectures for image classification tasks. By offering empirical evidence, it equips practitioners and researchers with valuable insights to inform their decisions based on the specific requirements of their applications.

Keywords: Image Classification, Convolutional Neural Network (CNN), Comparative Analysis, Accuracy, Computational Efficiency, Model Complexity, MobileNetV2, VGG19.

I. INTRODUCTION

In the rapidly evolving digital era, image classification has become a pivotal task across various applications, including autonomous vehicles, healthcare, security systems, and social media platforms. The accurate classification of images is fundamental in these sectors, serving as the foundation for subsequent analysis and decision-making processes. Convolutional Neural Networks (CNNs) have significantly enhanced the performance of image classification tasks by autonomously learning spatial hierarchies of features. This research paper explores the effectiveness of three CNN architectures, MobileNetV2, VGG19, and a simple CNN model, in image classification.

MobileNetV2, developed by Google, is an efficient and lightweight model tailored for mobile and embedded vision applications. It utilizes inverted residuals and linear bottlenecks to balance computational efficiency and model accuracy, which is crucial in resource-limited mobile and embedded scenarios. This research investigates the effectiveness of MobileNetV2 in such applications.

In contrast, VGG19, originating from the Visual Geometry Group at Oxford, is a deeper and more complex model renowned for its outstanding performance on the ImageNet dataset. Its depth and complexity, coupled with high performance, make it a popular choice for image classification. However, VGG19 comes with significantly higher computational requirements than MobileNetV2. This study provides a detailed analysis of VGG19's performance and computational demands, offering comprehensive insights into its suitability for image classification tasks.

In contrast to these pretrained models, a simple CNN model, typically comprising a few convolutional layers followed by max-pooling and fully connected layers, offers a more straightforward and customizable approach to image classification. While it may not attain the same accuracy levels as its more intricate counterparts, its simplicity and lower computational requirements render it an appealing option for specific applications. This research conducts an in-depth analysis of a simple CNN model, exploring its strengths, weaknesses, and suitability for various image classification tasks.

The primary objective is to provide a comprehensive comparison of these three models concerning accuracy, computational efficiency, and ease of implementation. This comparative analysis is grounded in a series of experiments utilizing a diverse dataset of images. The outcomes of this study aim to furnish valuable insights for researchers and practitioners in the selection of the most appropriate model for their specific image classification tasks. Additionally, the paper strives to contribute to the ongoing discourse within the machine learning community regarding the inherent trade-off between model complexity and performance, an essential consideration in model selection with profound implications for the efficiency and effectiveness of image classification tasks.

Subsequent sections delve into the intricacies of each model, outline the methodology employed for the comparison, present the results, and draw conclusions based on the findings. The overarching goal is to provide a thorough comprehension of the strengths and weaknesses of each model, aiding in the decision-making process for future image classification tasks. This research is anticipated to be a valuable augmentation to the existing knowledge in the realm of image classification using CNNs, with the hope that the insights gained will inform and guide future advancements in this swiftly evolving field.
II. LITERATURE REVIEW

In the field of deep learning applications, several studies have made significant contributions to image classification and feature extraction techniques. In a study by Yasin Kaya and Ercan Gürsoy (2023), a MobileNet-based CNN model with a unique fine-tuning mechanism is proposed for COVID-19 infection detection. This innovative approach enhances the model's adaptability and accuracy in identifying COVID-19 infections, addressing a critical need in healthcare. SP Godlin Jasil and V. Ulagamuthalvi (2021) focus on skin lesion classification, utilizing a deep learning architecture based on transfer learning; this demonstrates the versatility of deep learning in medical imaging, offering a potential tool for early and accurate diagnosis of skin conditions. Monika Bansal et al. (2021) explore transfer learning using VGG19 for image classification, highlighting the effectiveness of pre-trained models on the challenging Caltech-101 image dataset. Atul Sharma and Gurbakash Phonsa (2021) contribute to image classification using CNNs, as presented at the International Conference on Innovative Computing & Communication. Yonis Gulzar (2023) introduces a fruit image classification model based on MobileNetV2, employing a deep transfer learning technique and showcasing the applicability of deep learning in diverse domains beyond healthcare. Foundational works such as "MobileNetV2: Inverted Residuals and Linear Bottlenecks" (Sandler et al., 2018), "Very Deep Convolutional Networks for Large-Scale Image Recognition" (Simonyan and Zisserman, 2014), and "EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks" (Tan and Le, 2019) provide essential insights into the architectural advancements shaping modern convolutional neural networks. Verma and Chandra's comprehensive review (2020) synthesizes the landscape of CNN applications, serving as a valuable resource for researchers and practitioners. Lane, Bhattacharya, and Georgiev's "Practical Deep Learning for Cloud, Mobile, and Edge" (2019) provides practical insights into deploying deep learning models across various platforms. Works by Goodfellow et al. (2016), Stanford University's CS231n course materials, and Rosebrock's "Deep Learning for Computer Vision with Python" (2018) contribute essential knowledge to the deep learning landscape.

Despite the valuable insights provided by these works on MobileNetV2, VGG19, and simple CNN models for image classification, a comprehensive comparison of these three models in terms of accuracy, computational efficiency, and ease of implementation is still lacking, which this research aims to address.

III. TOOLS AND METHODOLOGY

A. TOOLS AND DATASETS

1. CIFAR-10 Dataset: The CIFAR-10 dataset, created by the Canadian Institute for Advanced Research, is a widely utilized collection of 60,000 32x32 color images categorized into 10 classes: airplanes, cars, birds, cats, deer, dogs, frogs, horses, ships, and trucks. With 6,000 images per class, it serves as a key resource for training machine learning and computer vision algorithms and is frequently employed as a benchmark for evaluating their performance.

Each image in the CIFAR-10 dataset is a 32x32 color image, with the RGB values stored in row-major order. The dataset is divided into five training batches and one test batch, each containing 10,000 images. While the test batch comprises 1,000 randomly selected images per class, the training batches may have varying class distributions.

Despite its popularity, the CIFAR-10 dataset presents challenges: the small image size makes it difficult for algorithms to recognize subtle patterns, and the limited number of images per class poses difficulties in distinguishing between similar-looking classes. However, these very challenges make CIFAR-10 an excellent tool for testing the robustness of machine learning algorithms, and success on CIFAR-10 suggests potential efficacy on larger, more intricate datasets.

In summary, the CIFAR-10 dataset is a valuable asset for machine learning and computer vision practitioners. Its compact size, complexity, and widespread use make it an effective benchmark for assessing the performance of machine learning algorithms, catering to both beginners and seasoned researchers in the field.
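For reference, a minimal sketch of how the dataset can be loaded and normalized with the Keras API (part of the TensorFlow stack this study uses); the variable names are illustrative:

import tensorflow as tf

# Load CIFAR-10: 50,000 training and 10,000 test 32x32 RGB images.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Scale pixel values from [0, 255] to [0, 1] before training.
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0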
2. Google Colab: Google Colaboratory, commonly referred to as Google Colab, is a free cloud-based service provided by Google that allows users to write and execute Python code in an interactive environment via a web browser. Built on the Jupyter notebook framework, it combines code, text, and multimedia in a single document, making it popular in the data science community.

A notable advantage of Google Colab is its built-in access to hardware accelerators, including Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs). These accelerators significantly enhance computational speed, which is particularly beneficial for training deep learning models.

Additionally, Google Colab integrates seamlessly with other Google services such as Google Drive and Google Sheets, facilitating easy data import, model storage, and result export. It also supports widely used machine learning libraries such as TensorFlow, PyTorch, and Keras, commonly employed for implementing and training Convolutional Neural Network (CNN) models.

In a research context, Google Colab proves invaluable, especially when dealing with computationally intensive tasks like training and comparing multiple models. Leveraging hardware accelerators and integrating with Google Drive enhances efficiency and promotes reproducibility and collaboration. Moreover, the support for popular machine learning libraries enables researchers to implement and train models using the latest tools and techniques.

In summary, Google Colab stands out as a potent tool for machine learning research. Its features, including free access to hardware accelerators, smooth integration with Google services, and compatibility with popular machine learning libraries, establish it as an excellent platform for both beginners and seasoned researchers.
B. MODELS USED:

1. Simple CNN:

In the described Convolutional Neural Network (CNN) architecture, a strategic design is employed to enhance feature extraction from input data, particularly in the context of images. The architecture consists of three distinct blocks, each comprising convolutional layers and max pooling layers. This design facilitates effective hierarchical representation learning, allowing the network to discern intricate patterns within the data.

The operations within each block follow a specific sequence:

1. Convolutional Layer: This layer utilizes filters to convolve over the input data, capturing spatial hierarchies and extracting meaningful features. The convolutional operation enables the network to recognize local patterns and learn feature representations.

2. Max Pooling Layer: Subsequent to convolution, the max pooling layer is applied to reduce the spatial dimensions of the feature maps. Max pooling involves selecting the maximum value from a group of neighboring values, effectively downsampling the information while retaining the most salient features. This contributes to controlling the computational complexity of the network and enhances its robustness.

This process is iterated three times, constituting the three convolutional and max pooling blocks. After these blocks, the feature maps are flattened into a one-dimensional vector. This flattening operation is pivotal, as it transforms the spatial information from the convolutional and pooling layers into a format suitable for input into densely connected layers.

Following flattening, the architecture integrates two dense (fully connected) layers. These layers accept the flattened vector as input and handle the classification task. Dense layers are particularly well-suited for learning complex relationships in the data, as they establish connections between every neuron in one layer and every neuron in the subsequent layer.

In summation, the overall sequence of operations in this CNN architecture involves three successive blocks of convolutional and max pooling layers, followed by flattening and processing through two dense layers. This approach enables effective feature extraction and complex relationship learning within the network.

This architecture demonstrates notable effectiveness in applications such as image classification, leveraging a hierarchical feature extraction process through convolution and pooling. This proven approach, combining multiple convolutional and pooling blocks, has yielded success in accurately predicting outcomes by capturing intricate patterns. The sequential stacking of these blocks empowers the network to acquire progressively sophisticated representations, enhancing its proficiency in identifying patterns across diverse scales and complexities inherent in the input data.
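As a concrete illustration, the following is a minimal Keras sketch of such a three-block architecture. The filter counts and dense-layer width are illustrative assumptions, not the exact values used in the experiments:

import tensorflow as tf
from tensorflow.keras import layers, models

# Three conv + max-pooling blocks, then flatten and two dense layers.
# Filter counts (32/64/64) and the 64-unit dense layer are assumed values.
model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),           # CIFAR-10 image size
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                           # spatial maps -> 1-D vector
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),     # one unit per CIFAR-10 class
])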
2. MobileNetV2:

In the dynamic landscape of deep learning architectures, the presented model offers a compelling synthesis of efficiency and performance. The incorporation of MobileNetV2 as the foundational layer, combined with global average pooling and a dense layer, signifies a thoughtful design strategy that addresses both computational efficiency and nuanced feature representation.

A. An Efficient Backbone:

MobileNetV2, an evolution of its predecessor MobileNet, is recognized for its exceptional efficiency in resource-constrained environments such as mobile devices and edge computing platforms. Key to its innovation is the integration of depthwise separable convolutions, which significantly reduce parameters and computations compared to traditional convolutional layers.

A depthwise separable convolution consists of a depthwise convolution followed by a pointwise convolution, striking a balance between model size and performance that makes it ideal for applications with limited computational resources.

B. Global Average Pooling for Compact Representation:

The introduction of a global average pooling layer after MobileNetV2 is a strategic choice for achieving efficiency and regularization. Global average pooling computes the average value of each feature map across its spatial dimensions, reducing each channel to a single value.

Beyond dimensionality reduction, global average pooling introduces spatial regularization by summarizing features across the entire spatial extent, enhancing generalization by focusing on salient features and discarding spatially redundant information.

C. Dense Layer for Classification:

The final dense layer acts as the ultimate classifier, transforming the compacted feature representation into predicted output classes. In this fully connected layer, each neuron connects to every neuron in the preceding layer, facilitating the learning of complex relationships within the data.

The dense layer leverages features extracted by MobileNetV2 and condensed by global average pooling to make nuanced predictions. Its width corresponds to the number of output classes, with an activation function, often softmax, converting raw outputs into probability distributions.

D. Synergy and Harmonization:

The collaboration among MobileNetV2, global average pooling, and the dense layer is seamless. MobileNetV2's proficiency in feature extraction lays a sturdy foundation for subsequent layers. Global average pooling complements this process by consolidating features into a concise representation, offering regularization benefits and mitigating overfitting concerns. Strategically positioned at the end, the dense layer translates the learned features into actionable predictions. The overall harmonization of these components signifies a deliberate effort to balance computational efficiency, model compactness, and classification accuracy.

E. Advantages and Applicability:

The model's design provides several benefits. MobileNetV2's efficiency suits resource-constrained environments, enabling deployment on devices with limited computational capabilities. Global average pooling streamlines the architecture, fostering spatial regularization that enhances generalization. The use of a dense layer ensures adaptability to diverse tasks, making the architecture valuable for image classification, object detection, and various other computer vision applications.

F. Considerations and Future Directions:

While the model excels in efficiency and performance, future exploration is warranted. Fine-tuning hyperparameters such as learning rates and dropout rates could further optimize performance, and evaluating the model across different datasets and task complexities would assess its versatility. Future iterations may explore transfer learning techniques, leveraging MobileNetV2 weights pre-trained on larger datasets to enhance feature extraction and expedite convergence, especially in scenarios with limited labeled data.

G. Conclusion:

In summary, the model architecture featuring MobileNetV2, global average pooling, and a dense layer represents a thoughtful and effective approach to deep learning. Its deployment in computationally demanding scenarios, such as mobile and edge computing, aligns with the need to optimize performance in constrained environments. The interplay between these architectural elements, each contributing unique strengths, results in a holistic model that strikes a delicate balance between efficiency and accuracy. This model not only showcases the ongoing evolution of deep learning architectures but also serves as a pragmatic and versatile solution for various image-related tasks.
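A minimal Keras sketch of this design follows. The input resizing target and the use of ImageNet pre-trained weights are assumptions made for illustration:

import tensorflow as tf
from tensorflow.keras import layers, models

# MobileNetV2 backbone without its classification head.
# ImageNet weights are an assumption; weights=None would train from scratch.
base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights="imagenet")

model = models.Sequential([
    layers.Input(shape=(32, 32, 3)),
    layers.Resizing(96, 96),                 # upsample CIFAR-10 images for the backbone
    base,
    layers.GlobalAveragePooling2D(),         # one value per feature channel
    layers.Dense(10, activation="softmax"),  # CIFAR-10 class probabilities
])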
3. VGG19:

The model architecture described here uses VGG19 as the foundational layer, followed by a global average pooling layer, and concludes with a dense layer. This configuration reflects a deliberate design strategy: harness the representational power of VGG19, optimize computational efficiency through global average pooling, and leverage a dense layer for final classification. The details of each component are as follows.

A. VGG19 as the Foundational Layer:

VGG19, a variant of the VGG (Visual Geometry Group) architecture, is known for its simplicity and effectiveness. It consists of 19 layers, primarily composed of 3x3 convolutional filters and max-pooling layers. The repeated stacking of these layers allows VGG19 to capture intricate hierarchical features in the input data, making it a formidable choice for image-related tasks.

As the foundational layer, VGG19 serves as a feature extractor, hierarchically learning and representing complex patterns and structures within the input data. The inherent depth of VGG19 makes it adept at discerning abstract features and enables the subsequent layers to learn increasingly nuanced representations.

B. Global Average Pooling Layer:

Following the VGG19 base layer, a global average pooling layer is introduced. Global average pooling computes the average value of each feature map across its entire spatial dimensions, condensing the spatial information within each feature map into a single value per channel and resulting in a compact representation of the learned features.

Global average pooling contributes to regularization by reducing spatial redundancies and promoting a form of spatial summarization. This can be advantageous in mitigating overfitting and enhancing the model's ability to generalize to unseen data. Additionally, the reduction in spatial dimensions achieved by global average pooling facilitates computational efficiency in subsequent layers.

C. Dense Layer for Final Classification:

The architecture concludes with a dense layer, serving as the final stage for classification. This fully connected layer takes the condensed features obtained from the global average pooling layer and maps them to the desired output classes. Each neuron in the dense layer is connected to every neuron in the preceding layer, enabling the model to capture intricate relationships within the learned features.

The number of neurons in the dense layer corresponds to the number of output classes, and an activation function, often softmax, converts the model's raw output into probability distributions across these classes. The dense layer acts as the decision-making hub, leveraging the hierarchical features extracted by VGG19 and compacted by global average pooling for precise classification.

D. Synergistic Design and Considerations:

The model's design reflects a synergistic approach to deep learning, where each component plays a crucial role in enhancing overall effectiveness. VGG19's feature extraction capabilities, coupled with the spatial summarization achieved by global average pooling, contribute to a refined and nuanced representation that facilitates accurate classification.

Fine-tuning hyperparameters, such as learning rates and dropout rates, may be pertinent to further optimize the model's performance. Additionally, assessing the model's robustness across different datasets and task complexities would provide insights into its generalization capabilities.

E. Advantages and Applicability:

The utilization of VGG19 as the base layer imparts a robust feature extraction capability to the model, making it suitable for a variety of image-related tasks. The modular and scalable nature of VGG architectures allows for adaptability to different scenarios, and the subsequent incorporation of global average pooling and a dense layer aligns with the broader goals of regularization and classification.

This model configuration is versatile and can be applied to diverse tasks such as image classification, object detection, and feature extraction. Its proficiency in learning hierarchical features makes it well-suited for scenarios where intricate patterns and structures in the data need to be discerned.

F. Conclusion:

In conclusion, the model architecture featuring VGG19 as the base layer, global average pooling for spatial summarization, and a dense layer for final classification embodies a strategic and effective approach to deep learning. Its application in image-related tasks benefits from the depth and simplicity of VGG19, coupled with the spatial regularization and classification prowess provided by global average pooling and the dense layer, respectively. This model stands as a testament to the adaptability and versatility of deep learning architectures in addressing complex tasks and scenarios.
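A minimal Keras sketch of this configuration, analogous to the MobileNetV2 variant above; the input size and pre-trained weights are again assumptions for illustration:

import tensorflow as tf
from tensorflow.keras import layers, models

# VGG19 backbone as a feature extractor, without its fully connected head.
base = tf.keras.applications.VGG19(
    input_shape=(32, 32, 3), include_top=False, weights="imagenet")

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),         # spatial summarization per channel
    layers.Dense(10, activation="softmax"),  # final classification layer
])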
IV. RESULTS

The primary objective of this study is to assess and contrast the efficacy of three distinct convolutional neural network (CNN) architectures: MobileNetV2, VGG19, and a simplified CNN model. The models were trained and evaluated on the CIFAR-10 dataset, a prominent and extensively utilized benchmark dataset specifically designed for image classification.

TABLE I: COMPARISON OF NETWORK VARIABLES OF VARIOUS CNNS

Model         Parameters    Data size    Parameters/data
CNN           68,000        10,000       6.8
MobileNetV2   2,270,794     10,000       227.0794
VGG19         20,034,644    10,000       2003.4644

Precision and recall, used alongside accuracy in the evaluation, are given below for reference.
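Since the original figure with these formulas does not survive in this version, the standard definitions are reproduced here, with TP, FP, and FN denoting true positives, false positives, and false negatives:

\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}
\]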

A. AUC-ROC Comparison

The AUC-ROC (Area Under the Receiver Operating Characteristic curve) score is a widely employed metric for assessing the performance of models in binary classification scenarios. It gauges a model's ability to differentiate between positive and negative classes under varying probability thresholds. Breaking down its constituents:

1. ROC Curve (Receiver Operating Characteristic Curve): This graphical representation plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) at different threshold settings. Each point on the ROC curve signifies a sensitivity/specificity pair linked to a specific decision threshold.

2. AUC (Area Under the Curve): Representing the area beneath the ROC curve, the AUC spans from 0 to 1, with a higher value signaling superior performance. An AUC of 0.5 denotes a classifier no better than random, while an AUC of 1.0 signifies perfect discrimination between positive and negative classes:

- AUC = 0.5: random classifier
- AUC > 0.5: better than random
- AUC = 1.0: perfect classifier

In essence, the AUC-ROC score offers a consolidated measure of a classifier's performance across diverse decision thresholds, furnishing a single value that indicates the model's adeptness at distinguishing between positive and negative instances. The following table lists the AUC-ROC scores of the models, offering a comprehensive assessment of their proficiency in discriminating between positive and negative classes.

TABLE II: AUC-ROC SCORES OF VARIOUS CNNS

Model         AUC-ROC Score
CNN           0.9421106
MobileNetV2   0.7725232888888888
VGG19         0.9116561666666667
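For orientation, such scores can be produced with scikit-learn's one-vs-rest multiclass AUC. This is a hedged sketch, not the paper's exact evaluation code; y_true is assumed to hold integer labels and y_prob the models' softmax outputs:

from sklearn.metrics import roc_auc_score

# y_true: shape (n_samples,), integer class labels 0..9
# y_prob: shape (n_samples, 10), predicted class probabilities (softmax output)
# One-vs-rest averaging extends the binary AUC-ROC to CIFAR-10's ten classes.
auc = roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro")
print(f"AUC-ROC (macro, one-vs-rest): {auc:.4f}")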

B. Accuracy Comparison

The evaluation of the three distinct models (Simple CNN, MobileNetV2, and VGG19) on the CIFAR-10 dataset holds pivotal significance in our study. CIFAR-10, comprising 60,000 32x32 color images distributed across 10 classes, presents a formidable benchmark for image classification. This section presents a comparative examination of the accuracy attained by each model, providing insights into their respective capabilities in navigating the intricacies embedded within the CIFAR-10 dataset.
1. Model Descriptions

1. Simple CNN: The Simple CNN features a direct convolutional neural network structure, incorporating convolutional layers followed by pooling layers. Functioning as our baseline model, it establishes a foundation for comparison.

2. MobileNetV2: MobileNetV2, known for its efficiency and suitability for mobile and edge devices, features depthwise separable convolutions. Its lightweight architecture aims to balance computational efficiency and model accuracy.

3. VGG19: VGG19, an architecture known for its depth and computational intensity, comprises 19 layers featuring small 3x3 convolutional filters. Acknowledged for its robust representation capabilities, VGG19 is utilized as a benchmark, offering insight into the delicate balance between model complexity and accuracy.

2. Visualization

The following figure presents the accuracy results for the three models on the CIFAR-10 dataset.

[Figure: accuracy comparison of the three models on the CIFAR-10 dataset]

C. Loss Comparison

The comprehensive evaluation of deep learning models is crucial for discerning their efficacy in tackling specific tasks. This section presents an in-depth comparison of the training and validation losses incurred by the three models (Simple CNN, MobileNetV2, and VGG19) when applied to the CIFAR-10 dataset. The analysis of these loss metrics offers valuable insights into the convergence and generalization capabilities of each model.

To ensure a fair and standardized comparison, we opted for the CIFAR-10 dataset, a widely recognized benchmark extensively employed for assessing image classification performance. Comprising 60,000 32x32 color images distributed across 10 classes, it facilitates a rigorous evaluation. The dataset was partitioned into training and validation sets, with the models trained on the former and hyperparameters fine-tuned using the latter. Consistency was maintained by conducting training for an equal number of epochs across all experiments.

The findings are visually depicted in the subsequent graphs, offering a comprehensive overview of each model's performance throughout the training process. The initial set of graphs presents summarized loss in both normal and log scales, providing insights into the overall convergence trend. Following this, a more detailed analysis unfolds with additional graphs showcasing individual training and validation losses. These visualizations contribute to a nuanced comprehension of each model's behavior, illuminating its training dynamics and generalization capabilities.

[Figures: A. Comparative Analysis (summarized loss, normal and log scales); B. CNN - Our Model; C. MobileNetV2 Model; D. VGG19 Model (individual training and validation losses)]
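For orientation, the kind of training loop that produces these loss curves is sketched below, continuing from the model and data sketches above. The optimizer, loss, epoch count, and validation fraction are illustrative assumptions; the paper fixes an equal number of epochs across experiments but does not state the value here:

# Compile and train; history.history records per-epoch training/validation loss.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

history = model.fit(x_train, y_train,
                    validation_split=0.1,   # hold out part of the training data
                    epochs=20,              # assumed; kept equal across models
                    batch_size=64)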
D. Confusion Matrix

To gauge the classification prowess of the proposed models, confusion matrices are employed as a fundamental tool in the evaluation of machine learning classifiers. These matrices offer a detailed breakdown of each model's predictions, providing valuable insights into its capacity to accurately classify instances across diverse classes.

Within the confusion matrix, the main diagonal elements signify true positives and true negatives, indicating instances correctly classified by the model. Conversely, off-diagonal elements, encompassing false positives and false negatives, expose instances where the model deviates from the ground truth.

The assessment focused on three distinct convolutional neural network (CNN) architectures: a custom-designed Simple CNN, MobileNetV2, and VGG19. These evaluations were carried out on the CIFAR-10 dataset, housing 60,000 32x32 color images distributed across ten classes. The central emphasis of this analysis rests on the accuracy and confusion matrices of each model.

[Figures: confusion matrices for A. CNN - Our Model; B. MobileNetV2 Model; C. VGG19 Model]
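A typical way to compute such a matrix, sketched with scikit-learn; the names reuse the earlier illustrative sketches and are assumptions, not the paper's exact code:

import numpy as np
from sklearn.metrics import confusion_matrix

# Convert softmax probabilities to hard class predictions.
y_pred = np.argmax(model.predict(x_test), axis=1)

# Rows: true classes; columns: predicted classes.
# Diagonal entries count correctly classified instances.
cm = confusion_matrix(y_test.ravel(), y_pred)
print(cm)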

The study findings reveal that among the three models assessed, the simple CNN model attains the highest accuracy. This implies that a streamlined and adaptable architecture can be highly effective for image classification tasks. The added advantage of the simple CNN model lies in its lower computational cost, emphasizing its practical utility. Although the MobileNetV2 and VGG19 models exhibit comparatively lower accuracies in this study, it is worth noting that their pre-trained weights and transfer learning capabilities may confer advantages for particular image classification tasks.

V. CONCLUSION

The investigation into the effectiveness of MobileNetV2, VGG19, and a Simple CNN model for image classification has revealed a nuanced interplay of factors crucial for selecting an appropriate model across diverse applications. MobileNetV2, prioritizing computational efficiency and compact model size, emerges as an ideal option for scenarios requiring lightweight architectures. Its capacity to maintain competitive accuracy while minimizing resource demands positions it as a leading choice for deployment on mobile and edge devices, particularly in settings with pronounced computational constraints. However, this advantage comes with a potential trade-off, as its streamlined architecture may compromise nuanced feature representation.

The quest for effective image classification models involves navigating a dynamic landscape characterized by the intricate balance of accuracy, computational efficiency, and model complexity. In this exploration, we conducted a comparative analysis of three distinct models: MobileNetV2, recognized for its lightweight architecture tailored for resource-constrained environments; VGG19, a deep and intricate model designed to capture nuanced features within images; and a Simple CNN model, a streamlined alternative suited for scenarios prioritizing simplicity and efficiency.

VGG19 proves adept at capturing intricate features within images, resulting in commendable accuracy. However, its considerable computational demands may limit its practicality in resource-constrained environments. The delicate balance between computational efficiency and accuracy emerges as a pivotal factor in the decision-making process, underscoring the importance of choosing models judiciously based on the specific requirements of the image classification task.

MobileNetV2, prioritizing computational efficiency and a compact architecture, presents itself as a compelling solution for applications that require a careful equilibrium between accuracy and resource constraints. Its success lies in achieving a thoughtful compromise, maintaining competitive accuracy levels while minimizing computational resource demands. The model's suitability for mobile and edge devices positions it as a frontrunner in the dynamic field of image classification, where real-time inference and responsiveness are paramount. Nonetheless, the inherent trade-off entails potential limitations in capturing highly intricate features due to its simplified structure.

The Simple CNN model, despite its lower complexity compared to MobileNetV2 and VGG19, emerges as a viable option for specific image classification scenarios. Its simplicity makes it well-suited for tasks with straightforward requirements and limited computational resources.

In conclusion, the exploration of the effectiveness of MobileNetV2, VGG19, and a Simple CNN in image classification underscores the multifaceted nature of model selection. It transcends a singular pursuit of accuracy, emphasizing the delicate balance of considerations that empower practitioners to navigate the evolving landscape of image classification, tailoring models to the unique requirements of each application. As technology advances, the pursuit of optimal models remains dynamic, with each stride forward revealing new possibilities and refining our comprehension of the intricate interplay between complexity and efficiency in the domain of image classification.

REFERENCES
[1] Kaya, Yasin, and Ercan Gürsoy. "A MobileNet-based CNN model with a novel fine-tuning mechanism for COVID-19 infection detection." Soft Computing 27.9
(2023): 5521-5535.

[2] Jasil, SP Godlin, and V. Ulagamuthalvi. "Deep learning architecture using transfer learning for classification of skin lesions." Journal of Ambient Intelligence and
Humanized Computing (2021): 1-8.

[3] Bansal, Monika, et al. "Transfer learning for image classification using VGG19: Caltech-101 image data set." Journal of Ambient Intelligence and Humanized Computing (2021): 1-12.

[4] Sharma, Atul, and Gurbakash Phonsa. "Image classification using CNN." Proceedings of the International Conference on Innovative Computing & Communication
(ICICC). 2021.

[5] Gulzar, Yonis. "Fruit image classification model based on MobileNetV2 with deep transfer learning technique." Sustainability 15.3 (2023): 1906.

[6] Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L. C. (2018). MobileNetV2: Inverted Residuals and Linear Bottlenecks. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Simonyan, K., & Zisserman, A. (2014). Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv preprint arXiv:1409.1556.

[8] Farabet, C., Couprie, C., Najman, L., & LeCun, Y. (2013). Learning Hierarchical Features for Scene Labeling. IEEE Transactions on Pattern Analysis and Machine
Intelligence (TPAMI), 35(8), 1915-1929.

[9] Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet Classification with Deep Convolutional Neural Networks. In Advances in Neural Information
Processing Systems (NIPS).

[10] Tan, M., & Le, Q. V. (2019). EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In Proceedings of the International Conference on Machine
Learning (ICML).

[11] Verma, A., & Chandra, D. (2020). A Comprehensive Review on Convolutional Neural Network with Various Applications. Artificial Intelligence Review, 53(8), 5455-5505.

[12] Lane, N. D., Bhattacharya, S., & Georgiev, P. (2019). Practical Deep Learning for Cloud, Mobile, and Edge. O'Reilly Media.

[13] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press, Cambridge, MA.

[14] Stanford University. CS231n: Convolutional Neural Networks for Visual Recognition. Course materials and lecture notes. Available online: http://cs231n.stanford.edu/

[15] Rosebrock, A. (2018). Deep Learning for Computer Vision with Python. PyImageSearch.

[16] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).