
A Comprehensive Study of Camouflaged Object Detection Using Deep Learning


Khalak Bin Khair, Saqib Jahir, Mohammed Ibrahim, Fahad Bin and Debajyoti Karmaker

Abstract: Object detection is a computer technology that deals with searching through digital images and videos for occurrences of semantic elements of a particular class. It is associated with image processing and computer vision. In addition to object detection, we detect camouflaged objects within an image using deep learning techniques. Deep learning is a subset of machine learning that is essentially a neural network with three or more layers. Over 6500 images that possess camouflage properties were gathered from various internet sources and divided into four categories to compare the results. Those images were labeled and then trained and tested using the VGG16 architecture in a Jupyter notebook on the TensorFlow platform. The architecture is further customized using transfer learning, which provides methods for transferring information from one or more source tasks to improve learning in a related target task. The purpose of these transfer learning methodologies is to help machine learning evolve to the point where it is as efficient as human learning.

Keywords: Deep Learning, Transfer Learning, TensorFlow, Camouflage, Object Detection, Architecture, Accuracy, Model, VGG16.
I. Introduction

1.1 Overview
Object detection is a computer vision approach for detecting objects in images and videos. To obtain relevant results, object detection algorithms often use machine learning or deep learning. When we glance at photos or video, we can recognize and locate objects of interest within a handful of seconds; the purpose of object detection is to use a computer to imitate this intelligence. But what is object detection for? Its main purpose is to identify and locate one or more practical targets in still-image or video data. It covers a wide range of techniques, including image processing, pattern recognition, AI, and machine learning. Traditionally, object detection has been accomplished by manually extracting feature models, with popular features described by HOG (histogram of oriented gradients), SIFT (scale-invariant feature transform), Haar (Haar-like features), and other grayscale-based approaches. As can be seen in Fig. 1, with the help of object detection we are able to spot the military soldiers and differentiate them from the tanks.

Figure 1: Military Object Detection

Object detection can be done with two families of methods: machine learning approaches or deep learning approaches. We discuss briefly some of the machine learning methods in the papers we reviewed, such as Haar and HOG. For this research, we focus on deep learning methods because they have become the state-of-the-art approaches to object detection. We propose a Camouflage Detection System using deep learning. Our purpose is to create a robust model that is able to detect camouflaged objects.

1.2 Background
Handcrafted features were used to produce the bulk of the first object detection algorithms. Owing to the dearth of effective image representations at the time, people had no choice but to build complicated feature representations and a range of speed-up techniques to make the most of limited computing resources.

• Viola-Jones Detectors
P. Viola and M. Jones achieved real-time face detection without any constraints for the first time 18 years ago [1], [2]. The detector, which ran on a 700 MHz Pentium III CPU, was tens or even hundreds of times faster than other methods of the time, with comparable detection accuracy. The detection system was ultimately named the "Viola-Jones (VJ) detector" in honor of the authors' substantial contributions. The VJ detector uses a simple detection strategy called sliding windows, which involves going through all possible locations and scales in a picture to see whether any window contains a human face (Fig. 2). Although it appears to be a straightforward process, the
computations needed were well beyond the capabilities of the computers of the time.
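To make the sliding-window strategy concrete, here is a minimal illustrative sketch in Python; it is not the VJ implementation itself, and classify_window stands in for any trained face/non-face classifier:

```python
def sliding_window_detect(image, classify_window, win=24, stride=4,
                          scales=(1.0, 1.5, 2.0)):
    """Scan every location and scale; keep windows the classifier accepts."""
    detections = []
    h, w = image.shape[:2]  # image: a 2-D (grayscale) array
    for scale in scales:
        size = int(win * scale)
        for y in range(0, h - size + 1, stride):
            for x in range(0, w - size + 1, stride):
                window = image[y:y + size, x:x + size]
                if classify_window(window):            # hypothetical trained classifier
                    detections.append((x, y, size, size))  # (x, y, width, height)
    return detections
```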
• HOG Detector
N. Dalal and B. Triggs proposed the Histogram of Oriented Gradients (HOG) [3] feature descriptor in 2005. HOG can be regarded as a significant advancement over the scale-invariant feature transforms and shape contexts of its time. The HOG descriptor is designed to be computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization (on blocks) to improve accuracy while maintaining invariance.

• Deformable Part-based Model (DPM)
DPM was the peak of traditional object detection systems, having won the VOC-07, VOC-08, and VOC-09 detection competitions. P. Felzenszwalb [4] proposed DPM in 2008 as an extension of the HOG detector, and R. Girshick later improved it significantly. The DPM uses the "divide and conquer" detection philosophy, in which the training can simply be thought of as learning a proper way to decompose an object, and the inference can be thought of as an ensemble of detections on distinct object parts.
1.3 Problem Statement
Object detection of camouflaged objects can be quite complex using machine learning techniques, as it can be quite difficult to spot the camouflaged object. For example, the pygmy seahorse is a kind of fish that is only 1 cm long, and its body is a natural camouflage against objects under the sea, so detecting it can be quite difficult. The same goes for the military: most military uniforms are natural camouflage, so soldiers can easily blend with the natural surroundings at war, which makes detecting them quite complex. Therefore, building a robust model for object detection of camouflaged objects using deep learning helps us detect these complex objects with a lot more ease. We are providing solutions where the users can easily understand the objects and recognize them amidst their camouflage background.

1.4 Objective
General Objective
In this study, we propose a camouflage object detection system. This system will aim to detect objects that are camouflaged in nature, and will help us detect even small animals such as pygmy seahorses with ease.

Specific Objectives
• To design a robust camouflage object detection system using deep learning
• To develop a camouflage object detection system using the TensorFlow platform
• To find out the effectiveness of this architecture for object detection

Our research is based on object detection using deep learning and aims to build a robust system that is able to detect camouflaged images. The proposed object detection system will help us yield a result with high accuracy. Working with TensorFlow helps us perform a variety of tasks related to deep neural network training and inference.

II. Literature review

This research is based on publications from a variety of fields, involving a literature review of papers on object detection, deep learning, CNN architectures, and anchor-based object detection.

What exactly do we mean by the term object detection? Detecting instances of visual objects of a specific class (such as persons, animals, or cars) in digital photographs is an important computer vision task. Object detection's goal is to create computational models and approaches that provide one of the most fundamental pieces of information needed by computer vision applications. Object detection is one of the trickiest tasks in computer vision, and it provides the foundation for many other computer vision tasks, such as segmentation, image captioning, and so on. It is possible to divide object detection into two research areas: general object detection and detection applications. The former aims at investigating methods for detecting numerous kinds of objects in a unified framework to imitate human vision and recognition, while the latter refers to detection in particular application situations, such as detection and tracking, face identification, text recognition, and so on. The fast advancement of deep learning techniques in recent years [5] has given object detection a new life, resulting in remarkable innovations and propelling it to the forefront of study with unheard-of attention [6]. Currently, object detection is widely used in numerous practical applications, such as autonomous driving, machine vision, video surveillance, and others.

In object detection frameworks, anchors must be constructed and handled with care. The density with which anchors cover the instance location space is one of the most critical factors in anchor design. To attain a high recall rate, anchors are meticulously designed based on statistics computed from the training/validation set. A design decision based on a specific dataset may or may not apply to other applications, reducing generality. Anchor-based methods use intersection-over-union (IoU) to determine positive/negative samples during the training phase, which adds extra computation and hyper-parameters to an object detection system.

FoveaBox is a straightforward concept: it consists of a backbone network and a fovea head network. An already-built convolutional network serves as the backbone and is in charge of computing a convolutional feature map over the full input image. The fovea head is divided into two sections: the first does box prediction for every area that could be
covered by an item, and the second performs per-pixel classification on the output of the backbone. For each position potentially contained by an instance, FoveaBox predicts the possibility of object existence and the accompanying boundary.
Given a ground-truth box (x1, y1, x2, y2) and the down-sample factor s_l of pyramid level l, the box is first mapped onto the target feature map:

$$x'_1 = \frac{x_1}{s_l}, \quad y'_1 = \frac{y_1}{s_l}, \quad x'_2 = \frac{x_2}{s_l}, \quad y'_2 = \frac{y_2}{s_l},$$

$$c'_x = 0.5\,(x'_2 + x'_1), \qquad c'_y = 0.5\,(y'_2 + y'_1),$$

$$w' = x'_2 - x'_1, \qquad h' = y'_2 - y'_1.$$
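For illustration, the mapping above translates directly into code; this is a sketch under the assumption of scalar box coordinates and a single pyramid level:

```python
def map_box_to_pyramid(x1, y1, x2, y2, s_l):
    """Project a ground-truth box onto a pyramid level with down-sample factor s_l."""
    x1p, y1p, x2p, y2p = x1 / s_l, y1 / s_l, x2 / s_l, y2 / s_l
    cx, cy = 0.5 * (x2p + x1p), 0.5 * (y2p + y1p)   # center of the projected box
    w, h = x2p - x1p, y2p - y1p                     # width / height on the feature map
    return cx, cy, w, h
```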
We will go over the major components one by one in this section.
1) Possibility of an Object Occurrence: Assume a valid ground-truth box (x1, y1, x2, y2), and let s_l be the down-sample factor; we first map the box into the target feature pyramid P_l as above. The score map's positive region R_pos is designed to roughly resemble the original area while being smaller. During the training phase, each cell inside the positive region is labeled with the appropriate target class. The negative area is the whole feature map except the area in R_pos [7].
2) Scale Assignment: Based on the number of feature pyramid levels, the scales of objects are divided into several bins. Each of the pyramid levels P3 through P7 has a basic scale r_l that ranges from 32 to 512.
3) Box Prediction: Each ground-truth bounding box is stated as G = (x1, y1, x2, y2):
$$t_{x_1} = \log\frac{s_l(x + 0.5) - x_1}{r_l}, \qquad t_{y_1} = \log\frac{s_l(y + 0.5) - y_1}{r_l},$$

$$t_{x_2} = \log\frac{x_2 - s_l(x + 0.5)}{r_l}, \qquad t_{y_2} = \log\frac{y_2 - s_l(y + 0.5)}{r_l}.$$
FoveaBox computes the normalized offsets between a positive point (x, y) in R_pos and the four borders of the box.
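A minimal sketch of this target computation, assuming the positive cell (x, y) lies strictly inside the box so that all four distances are positive:

```python
import math

def fovea_box_targets(x, y, box, s_l, r_l):
    """Normalized log-space offsets from a positive cell (x, y) to the four box borders."""
    x1, y1, x2, y2 = box
    cx = s_l * (x + 0.5)   # cell center projected back to image coordinates
    cy = s_l * (y + 0.5)
    t_x1 = math.log((cx - x1) / r_l)
    t_y1 = math.log((cy - y1) / r_l)
    t_x2 = math.log((x2 - cx) / r_l)
    t_y2 = math.log((y2 - cy) / r_l)
    return t_x1, t_y1, t_x2, t_y2
```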
End-to-End Object Detection:
The area proposal loss, instance recognition loss, and duplication classification loss are all combined with equal weights in the end-to-end training.
Mingxing Tan et al. [8] extensively examine neural network architecture design choices for object detection and offer several major enhancements to increase efficiency. They first propose a weighted bi-directional feature pyramid network (BiFPN), which enables fast and easy multi-scale feature fusion, and then they propose a compound scaling method, which scales all backbone, feature network, and box/class prediction networks concurrently in terms of resolution, depth, and width. Based on these improvements and the EfficientNet backbones, they created the EfficientDet family of object detectors, which regularly outperforms past work in terms of efficiency across a variety of resource constraints.
EfficientDet's overall architecture is mostly based on the one-stage detector paradigm. As the backbone network, the authors use ImageNet-trained EfficientNets. The feature network is the BiFPN, which repeatedly applies top-down and bottom-up bidirectional feature fusion to the level 3-7 features from the backbone network. These fused features are fed into a class network and a box network to produce object class and bounding box predictions.
Compound Scaling:
To maximize both accuracy and efficiency, the authors developed a family of models that can accommodate a variety of resource constraints. The ability to scale up a baseline EfficientDet model is a crucial hurdle here. They offered a new compound scaling approach for object detection that leverages a straightforward compound coefficient φ to simultaneously scale up the backbone network, the BiFPN network, the class/box network, and the resolution.
Input Image Resolution:
Because BiFPN uses feature levels 3-7, the input resolution must be divisible by 2^7 = 128. As a result, resolutions are increased linearly via the equation R_input = 512 + 128·φ.
EfficientDet is evaluated on the COCO 2017 detection dataset with 118k training images; the model is further optimized using an SGD optimizer.
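Following the resolution rule above, the input sizes for the scaled models can be enumerated directly (we show φ = 0 to 6 as an illustration):

```python
# Enumerate EfficientDet input resolutions from R_input = 512 + 128 * phi.
# The same compound coefficient phi also scales network depth and width.
for phi in range(7):
    print(f"phi={phi}: input resolution {512 + 128 * phi} px")  # 512, 640, ..., 1280
```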
Xuebin Qin et al. [9] proposed BASNet for salient object detection with boundary awareness. Saliency map prediction and refinement are handled by the architecture's deeply supervised Encoder-Decoder network and residual refinement module, respectively. The hybrid loss directs the network to learn the transformation between the input picture and the ground truth in a three-level hierarchy at the pixel, patch, and map levels by mixing Binary Cross-Entropy (BCE), Structural Similarity (SSIM), and Intersection over Union (IoU) losses. Thanks to the hybrid loss, the proposed predict-refine architecture can consistently anticipate fine structures with clear boundaries and effectively segment the salient object areas; it delivers high-quality boundaries and accurate salient object segmentation. A new predict-refine network is presented to capture both global (coarse) and local (fine) contexts; it combines a unique residual refinement module with a UNet-like deeply supervised Encoder-Decoder network. The refinement module adjusts the predicted map by learning the residuals between the coarse saliency map and the ground truth after the encoder-decoder network turns the input picture into a probability map. The authors also propose a hybrid loss that combines Binary Cross-Entropy, Structural Similarity (SSIM), and IoU losses, which learn from the ground truth at the pixel, patch, and map levels, respectively, to produce a high-confidence saliency map with a distinct border.
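A sketch of such a hybrid loss in TensorFlow terms is given below; this is our illustrative reading of the three components (using tf.image.ssim for the SSIM term on 4-D saliency maps in [0, 1]), not the authors' released code:

```python
import tensorflow as tf

def hybrid_loss(y_true, y_pred, eps=1e-7):
    """BCE + SSIM + IoU loss on predicted saliency maps (batch, h, w, 1) in [0, 1]."""
    # Pixel-level term: binary cross-entropy averaged over all pixels.
    bce = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true, y_pred))
    # Patch-level term: SSIM measures structural similarity; turn it into a loss.
    ssim_loss = 1.0 - tf.reduce_mean(tf.image.ssim(y_true, y_pred, max_val=1.0))
    # Map-level term: soft IoU over the whole saliency map.
    inter = tf.reduce_sum(y_true * y_pred)
    union = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred) - inter
    iou_loss = 1.0 - inter / (union + eps)
    return bce + ssim_loss + iou_loss
```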
Refine Model:
The Refinement Module (RM) [10], [11] is commonly constructed as a residual block that refines the forecasted coarse saliency maps: S_refined = S_coarse + S_residual. To define the term "coarse", the authors point to their refinement module; "coarse" has two meanings in this context. The first is hazy and noisy boundaries. The other is regional probabilities that are unevenly forecasted.
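In code, this residual formulation is just an additive skip connection; a minimal sketch, with refine_net standing in for the module's convolutional stack (the clipping to [0, 1] is our assumption, keeping the output a valid probability map):

```python
import tensorflow as tf

def refine(s_coarse, refine_net):
    """S_refined = S_coarse + S_residual, with the residual predicted by refine_net."""
    s_residual = refine_net(s_coarse)  # learned residual correction map
    return tf.clip_by_value(s_coarse + s_residual, 0.0, 1.0)
```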
SOD, ECSSD, DUT-OMRON, PASCAL-S, HKU-IS, and DUTS are six commonly used benchmark datasets on which the authors tested their technique. SOD includes 300 photos that were created with image segmentation in mind. These photographs are difficult to work with because most of them contain several important items that are either low contrast or overlap with the image boundaries. The DUTS-TR dataset, which contains 10553 pictures, was used to train the network. The dataset is augmented by horizontal flipping to 21106 photos before training. Each image is scaled to 256 × 256 pixels and then randomly cropped to 224 × 224 pixels during training. The ResNet-34 model [12] is used to initialize some of the encoder parameters, and Xavier initialization [13] is used for the other convolutional layers. To train their network, the authors used the Adam optimizer with the hyperparameters set to their defaults.
Alexey Bochkovskiy et al. [14] proposed YOLOv4 for optimal speed and accuracy of object detection. Rather than optimizing a theoretical indicator of low computing volume (BFLOP), the major purpose of this effort is to develop an object detector with a high operating speed for use in production systems, optimized for parallel computation. The authors created a product that is simple to train and operate. Anyone who trains and tests with a conventional GPU, for example, can achieve compelling object detection results that are real-time, high-quality, and accurate, as YOLOv4 does.
III. Methodology

3.1 Overview
This chapter comprises the dataset used, its preparation and preprocessing, and the architecture implemented to find the result. It also describes the process of how we trained our model by dividing the dataset into training data and testing data with validation.
3.2 Research Methodology
The main goal is to develop a robust camouflage object detection system using transfer learning on the VGG16 architecture. We show the optimal accuracy achieved using the proposed customized training model on the VGG16 architecture and the benefits we can obtain, like detecting soldiers behind enemy lines in war, as well as the ability to study the life cycles of the many endangered species in the deep sea.

3.3 Data Description
The dataset we gathered is from various internet sources. We collected a total of approximately 6500 pictures and divided them into four categories: Army, Pygmy seahorse, Chameleon, and Octopus. Army consists of approximately 1300 pictures, Pygmy seahorse and Chameleon of 2000 each, and Octopus of over 1000. Moreover, we tried to seek out the most effective camouflage pictures from the internet.

3.4 Data Preparation and Data Preprocessing
After gathering the datasets from various sources and dividing them into four categories, we labeled the dataset using image labeling software.

Fig. 2: Chameleon, Army, Pygmy seahorse, Octopus respectively.

From the bar chart we can see how the images are divided into four categories. We then labeled these images using a labeling tool. In the preprocessing stage, we first select a particular image and then write the name of the image, so that the tool can more accurately detect the exact image we are referring to.

3.5 Model Training
At first, in the implementation, we import os, because the os module in Python has methods for adding and deleting folders, retrieving their contents, and changing and locating the current directory. Then we import cv2, which is the module name for OpenCV-Python. After that, we import glob, pyplot, sns, and the image data generator, respectively. For the training, we used seed 1000, image size 100, and batch size 128. In the training generator, we used an ImageDataGenerator with rotation_range=30, width_shift_range=0.2, height_shift_range=0.2, zoom_range=0.3, horizontal_flip=True, and validation_split=0.2. For the training batch, subset='training' and class_mode='sparse' were used; the validation batch keeps the same class mode but uses subset='validation'. In the training batch, we found 5011 images belonging to four classes, whereas in the validation batch there were 1252 images belonging to four classes. For the base model, we used 'imagenet' as the weights for our training model.
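Reconstructed from the settings listed above, the generator setup would look roughly as follows; the dataset path is a hypothetical placeholder:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

SEED, IMG_SIZE, BATCH = 1000, 100, 128
datagen = ImageDataGenerator(
    rotation_range=30,
    width_shift_range=0.2,
    height_shift_range=0.2,
    zoom_range=0.3,
    horizontal_flip=True,
    validation_split=0.2,
)
train_batches = datagen.flow_from_directory(
    "dataset/",  # hypothetical root with Army/Octopus/Pygmy seahorse/Chameleon folders
    target_size=(IMG_SIZE, IMG_SIZE),
    batch_size=BATCH,
    class_mode="sparse",
    subset="training",
    seed=SEED,
)
valid_batches = datagen.flow_from_directory(
    "dataset/", target_size=(IMG_SIZE, IMG_SIZE), batch_size=BATCH,
    class_mode="sparse", subset="validation", seed=SEED,
)
```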
Briefly, ImageNet is a sizable visual database created for use in research on visual object recognition software. The project has manually annotated more than 14 million photographs to identify the things they depict, and at least one million of the images also include bounding boxes. There are almost 20,000 categories in ImageNet, and a common category, like "balloon" or "strawberry", can include several hundred photos.
3.6 Architecture Selection
The preprocessed data was used to train transfer learning on VGG16. During the architecture selection, we tested the ResNet50, AlexNet, SSD MobileNet, and VGG16 architectures for accuracy. Among them, VGG16 gives us the highest accuracy compared to all the other architectures.
About VGG16: on the ImageNet dataset, it was shown to be the highest performing model out of all the configurations. A 224 by 224 image with three channels (R, G, and B) is the input to any of the network configurations. The only pre-processing done is to normalize each pixel's RGB values; to accomplish this, the mean value is subtracted from every pixel. The picture is routed through the first stack of two convolution layers with a very small receptive field of 3 × 3, followed by ReLU activations. Each of these two layers has 64 filters. The convolution stride is fixed at 1 pixel, and the padding is 1 pixel. In this configuration, the spatial resolution is maintained, and the output activation map has the same dimensions as the input picture. The activation maps are then passed through spatial max pooling over a 2 × 2-pixel window with a stride of 2 pixels, which cuts the size of the activations in half. The output activations of the first stack are therefore 112 × 112 × 64 in size.
After that, the activations are run through a second stack, which has 128 filters as opposed to the previous stack's 64. The size is therefore 56 × 56 × 128 after the second stack. Three convolutional layers plus a max pool layer make up the third stack; the stack output size is 28 × 28 × 256 due to the 256 filters employed. Two stacks of three convolutional layers with 512 filters each follow, so the outcome after both stacks is 7 × 7 × 512.
The stacks of convolutional layers are followed by a flattening layer and then three fully connected layers. The last fully connected layer acts as the output layer and has 1,000 neurons, which correspond to the 1,000 possible classes in the ImageNet dataset; the previous two layers each have 4,096 neurons. The Softmax activation, used for category classification, comes after the output layer [15].
In short, with transfer learning, users can take the learning from previously trained networks and then customize that model according to their needs. The benefit of transfer learning is that even a user with a really small dataset does not need to build or train a network from scratch for it to perform in an outstanding way. There is a misconception that traditional machine learning performs better than deep learning if the dataset size is very small. That is no longer true, because by using the transfer learning technique on a pre-trained network, even users with very small datasets can build a network that will outperform traditional machine learning approaches.

3.7 Summary
Overall, in this chapter we discussed the efficiency of VGG16 and its architecture and, more specifically, what transfer learning is and how it works. We also described ImageNet and how we customized our training model.

IV. Result Analysis

4.1 Overview
In this chapter, we describe the results that we have gathered from the proposed architecture. It mainly focuses on the architecture and how we trained our model with transfer learning on VGG16. Here, all the working procedures of the proposed architecture are discussed. We also explain our model's training and validation accuracy, as well as the training and validation loss. Finally, we discuss the efficiency of our developed model. Our main goal is to find the highest accuracy, which will work more effectively and robustly for detecting a camouflaged object.

4.2 Result Description
As previously mentioned, we used transfer learning on the VGG16 architecture and customized our pre-trained model. In our base model, the input shape is 100 and the weights are defined as 'imagenet'. As noted above, VGG16 was the highest performing model out of all setups on ImageNet. In this model we used a training batch and 50 epochs.
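A sketch of the customized model as described (100 × 100 × 3 input, VGG16 base with ImageNet weights, 4-class softmax head); the intermediate Dense layer and the freezing policy are our assumptions, and train_batches/valid_batches are the generators sketched in Section 3.5:

```python
import tensorflow as tf
from tensorflow.keras.applications import VGG16

base = VGG16(weights="imagenet", include_top=False, input_shape=(100, 100, 3))
base.trainable = False  # keep the pre-trained convolutional features frozen

model = tf.keras.Sequential([
    base,
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),   # assumed intermediate layer
    tf.keras.layers.Dense(4, activation="softmax"),  # Army, Octopus, Pygmy seahorse, Chameleon
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",  # matches class_mode='sparse'
              metrics=["accuracy"])
history = model.fit(train_batches, validation_data=valid_batches, epochs=50)
```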
Fig. 3: VGG16 Pre-train layer

During implementation, we made our dataset image shape 100 × 100 with 3 channels. After that, the feature maps are gradually narrowed down until 3 × 3 with 512 filters remains; ReLU is the activation function throughout, and 5 max-pooling layers and a flattening layer remain. Flattening is used to transform all of the 2-dimensional arrays produced by pooling feature maps into a single, long continuous linear vector.

Fig. 4: Extended layer of Architecture.

This is the customized layer (Fig. 4), where the width gets reduced from 512 filters down to 4 neurons. In the end, 4 neurons remain, corresponding to Army, Octopus, Pygmy seahorse, and Chameleon. Two activation functions have been used: one is ReLU and the other is softmax.

Fig. 5: Cam_VGG16 full architecture


In Fig. 5, we draw a visual architecture of the proposed network, in which all the layers discussed so far appear.

Train Accuracy | Validation Accuracy | Train Loss | Validation Loss
95%            | 91%                 | 0.1%       | 0.29%

Table 1: Predicted result from the graph.

Considering train accuracy, there are many ups and downs in the trend over the 50 epochs, where it reaches almost 95% accuracy. On the other hand, there were large fluctuations in validation accuracy, which shows nearly 91% accuracy by the time the 50 epochs have run. Moving to the loss part, both train loss and validation loss have downward trends; compared to the validation loss, the train loss gives the finer result.
For presentation, an accuracy graph with figure size (15,5) was made for plotting, where the subplot position was (1,2,1). There are two different labels on the graph: one is train accuracy and the other is validation accuracy. For showing loss, we used subplot (1,2,2) with two different labels, train loss and validation loss.
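The plots described here correspond roughly to the following matplotlib calls, where history is the object returned by model.fit in Section 4.2:

```python
import matplotlib.pyplot as plt

plt.figure(figsize=(15, 5))
plt.subplot(1, 2, 1)  # left panel: accuracy curves
plt.plot(history.history["accuracy"], label="train accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.legend()
plt.subplot(1, 2, 2)  # right panel: loss curves
plt.plot(history.history["loss"], label="train loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.legend()
plt.show()
```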

Fig. 6: Result of our model

As can be seen from the picture, the epochs are depicted on the horizontal axis, while accuracy and loss are on the vertical axis.

4.3 Summary
As we wrap up this chapter, we mostly concentrated on the creation of the architecture, how we introduced our own customized layer, and the general layer description. After that, we added the accuracy and loss from the result section and discussed them.

V. Conclusion

5.1 Overview
This chapter presents the summary of the research. The overall objective of this research is to detect camouflaged objects using deep learning techniques. We took the VGG16 architecture and customized it by applying transfer learning to it. The accuracy obtained from training and testing our model proves it to be robust and efficient.

5.2 Framework of this Model
The proposed framework of this research was based on transfer learning. With transfer learning, the user can take the learning from a previously trained network and customize the model according to the user's needs. The most significant advantage of transfer learning is resource savings and increased efficiency while training new models.

5.3 Implementation of Methodology and Objective
We implemented an object detection system using deep learning techniques and improved the VGG16 model with transfer learning. The proposed model was developed in a Jupyter notebook using the TensorFlow platform, comparing several types of architectures while seeking an optimal accuracy that can detect camouflaged objects more robustly and effectively.

5.4 Limitation of the Study
We faced several drawbacks while conducting this research.
1) In our data collection step, that is, collecting thousands of images from various internet
sources, we had to make sure all the images were .jpeg or .png, as any other format results in an "Unknown image file format" error.
2) We need a powerful GPU; the GPU we trained our model on is a GeForce 2060. An outdated graphics card will not work.
3) We need to label the images using labelme; while labeling, we have to make sure to write the right label for each image.
4) While collecting the dataset, we needed to find the right kind of images, that is, images that are properly camouflaged.

5.5 Recommendation for Future Work
Object detection technology's future is still being proven; it has the potential to release people from tiresome work that computers can accomplish more efficiently and effectively. It will also open up new research and operational possibilities, which will yield more benefits in the future. As a result, these advances could avoid the requirement for extensive training on a large number of datasets in order to perform more sophisticated jobs. With continuing evolution, as well as the devices and methodologies that enable it, object detection could soon become the next big thing.

Most present algorithms only address a tiny subset of the various tasks required for image comprehension and are expensive. To replicate even a fraction of a normal person's capacity to recognize objects, one would need to merge multiple distinct algorithms into a single system that runs in real time, which would be a huge problem with today's hardware. Detecting objects is a crucial task for most computer vision systems.

5.6 Summary of the Research
We have developed a robust object detection system using deep learning techniques. The architecture we worked on is VGG16, customized using transfer learning. We worked in a Jupyter notebook using the TensorFlow platform. We obtained an accuracy of approximately 95% when we trained the proposed model; the validation accuracy we achieved is approximately 91%, the train loss is 0.1%, and the validation loss is 0.29%. In training accuracy, there are many ups and downs in the trend over the 50 epochs. When the dataset size is small, there is a misconception that classical machine learning outperforms deep learning. That is no longer the case since, by employing the transfer learning technique on a pre-trained network, users may design a network that outperforms traditional machine learning approaches, even if they have an extremely small dataset.

References

[1] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," pp. 91-110, 2004.
[2] S. Belongie, J. Malik and J. Puzicha, "Shape matching and object recognition using shape contexts," 2002.
[3] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," pp. 886-893, 2005.
[4] P. Felzenszwalb, D. McAllester and D. Ramanan, "A discriminatively trained, multiscale, deformable part model," pp. 1-8, 2008.
[5] Y. LeCun, Y. Bengio and G. Hinton, "Deep learning," pp. 436-444, 2015.
[6] Z. Zou, Z. Shi, Y. Guo and J. Ye, "Object detection in 20 years," 2019.
[7] T. Kong, F. Sun, H. Liu, Y. Jiang, L. Li and J. Shi, "FoveaBox: Beyound anchor-based object detection," pp. 7389-7398, 2020.
[8] M. Tan, R. Pang and Q. V. Le, "EfficientDet: Scalable and efficient object detection," pp. 10781-10790, 2020.
[9] X. Qin, Z. Zhang, C. Huang, C. Gao, M. Dehghan and M. Jagersand, "BASNet: Boundary-aware salient object detection," pp. 7479-7489, 2019.
[10] M. A. Islam, M. Kalash, M. Rochan, N. D. B. Bruce and Y. Wang, "Salient object detection using a context-aware refinement network," 2017.
[11] Z. Deng, X. Hu, L. Zhu, X. Xu, J. Qin, G. Han and P.-A. Heng, "R3Net: Recurrent residual refinement network for saliency detection," 2018.
[12] K. He, G. Gkioxari, P. Dollár and R. Girshick, "Mask R-CNN," pp. 2961-2969, 2017.
[13] X. Glorot and Y. Bengio, "Understanding the difficulty of training deep feedforward neural networks," pp. 249-256, 2010.
[14] A. Bochkovskiy, C.-Y. Wang and H.-Y. M. Liao, "YOLOv4: Optimal speed and accuracy of object detection," 2020.
[15] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2015.
