
CHAPTER 2: TECHNOLOGIES

What is YOLOv4?
YOLO stands for You Only Look Once. It is a real-time object detection system that
recognizes multiple objects in a single pass over an image. It is faster than many other
detection systems while remaining competitive in accuracy. An earlier variant,
YOLO9000, can detect more than 9000 object categories, including classes for which it
has no labelled detection data. The system can recognize several objects in a single
image, draw a bounding box around each of them, and be trained quickly and deployed
in a production system. It is also an achievement in object detection research that has
led to better, faster and more adaptable computer vision algorithms.

What are the main features of the YOLOv4 method?

1. Selection of Architecture
An object detection model consists of three parts:
• Backbone: extracts features from the input image, starting with shallow features
(edges, colours, etc.); its weights can be initialised from a pre-trained network;
• Neck: enhances the extracted features by processing, combining and analysing the
feature maps produced by the backbone, optimised for the model's target task;
• Head: produces the output the model needs, such as class predictions, bounding
boxes or image segmentation masks.
The goal is to find the best balance between the input network resolution, the number of
convolutional layers, the number of parameters
($\mathit{filter\_size}^2 \times \mathit{filters} \times \mathit{channels} / \mathit{groups}$)
and the number of layer outputs (filters). The YOLOv4 authors chose the CSPDarknet53
backbone, an additional SPP module, PANet (Path Aggregation Network) as the neck and
the anchor-based YOLOv3 head as the YOLOv4 architecture. CSP (Cross Stage Partial) is
a backbone design that enhances the learning ability of a CNN. Its main technique is to
split the feature map of the base layer into two parts: one part passes through a dense
block and a transition layer, while the other bypasses them and is combined with their
output before being passed to the next stage. CSPResNeXt50 performs better than
CSPDarknet53 in classification but worse in detection. The authors therefore suggest
that a model with a larger receptive field and more parameters should be selected as the
backbone. Their experiments comparing CSPResNeXt50, CSPDarknet53 and
EfficientNet-B3 showed that CSPDarknet53 is the most suitable backbone for a detection
model. SPP (Spatial Pyramid Pooling) comes from Kaiming He's SPP-Net. It was added
mainly because it significantly increases the receptive field, separates out the most
important context features, and causes almost no reduction in network speed. PANet is
used mainly to improve feature fusion, aggregating features from different backbone
levels for the different detector levels.
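The parameter-count formula and the SPP idea above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from YOLOv4. `conv_params`, `max_pool_same` and `spp_block` are hypothetical helper names, and bias and batch-normalization parameters are ignored. The pooling kernel sizes 5, 9 and 13 are an assumption based on our understanding of the public YOLOv4 configuration.

```python
import numpy as np


def conv_params(filter_size: int, filters: int, channels: int, groups: int = 1) -> int:
    """Weights in a 2-D convolution: filter_size^2 * filters * channels / groups.

    Bias and batch-normalization parameters are ignored, as in the text's formula.
    """
    if channels % groups or filters % groups:
        raise ValueError("channels and filters must both be divisible by groups")
    return filter_size ** 2 * filters * channels // groups


def max_pool_same(x: np.ndarray, k: int) -> np.ndarray:
    """Stride-1 max pooling with 'same' padding on a (C, H, W) feature map (odd k)."""
    x = np.asarray(x, dtype=float)
    pad = k // 2
    # Pad with -inf so padded cells never win the max.
    padded = np.pad(x, ((0, 0), (pad, pad), (pad, pad)), constant_values=-np.inf)
    _, h, w = x.shape
    out = np.full_like(x, -np.inf)
    # Slide over every offset in the k x k window and keep the running maximum.
    for dy in range(k):
        for dx in range(k):
            out = np.maximum(out, padded[:, dy:dy + h, dx:dx + w])
    return out


def spp_block(x: np.ndarray, kernel_sizes=(5, 9, 13)) -> np.ndarray:
    """Concatenate the input with max-pooled copies of itself along the channel axis."""
    x = np.asarray(x, dtype=float)
    pooled = [max_pool_same(x, k) for k in kernel_sizes]
    return np.concatenate([x, *pooled], axis=0)


print(conv_params(3, 64, 32))             # 18432
print(conv_params(3, 64, 32, groups=32))  # 576

feature_map = np.random.default_rng(0).random((512, 13, 13))
print(spp_block(feature_map).shape)       # (2048, 13, 13)
```

For example, a 3 × 3 convolution with 32 input channels and 64 filters has 3² × 64 × 32 = 18,432 weights. Splitting it into 32 groups cuts that to 576. Every pooling branch uses stride 1 with padding, so the SPP output keeps the spatial size and only multiplies the channel count by four. This is how SPP enlarges the receptive field at little extra cost.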

2. Selection of BoF and BoS


Bag of Freebies (BoF): training techniques that improve the accuracy of the model
without increasing its inference complexity.
Bag of Specials (BoS): plug-in modules and post-processing methods that increase the
inference cost only slightly but significantly improve the accuracy of object detection.
• Bag of Freebies (BoF) for backbone: CutMix and Mosaic data augmentation;
• Bag of Specials (BoS) for backbone: Mish activation, Cross-Stage Partial
connections (CSP), Multi-input Weighted Residual Connections (MiWRC);
• Bag of Freebies (BoF) for detector: CIoU loss, CmBN, DropBlock regularization,
Mosaic data augmentation, Self-Adversarial Training, eliminating grid sensitivity,
using multiple anchors for a single ground truth, cosine annealing scheduler,
optimal hyperparameters, random training shapes;
• Bag of Specials (BoS) for detector: Mish activation, SPP block, SAM block,
PAN path-aggregation block, DIoU-NMS.
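Several of the listed items can be sketched directly. The code below is a minimal illustration written from the published definitions of Mish, CIoU loss and DIoU-NMS as we understand them. It is not the YOLOv4 implementation. Boxes are assumed to be `(x1, y1, x2, y2)` corner tuples, the helper names are hypothetical, and the NMS threshold of 0.45 is an arbitrary example value.

```python
import math

import numpy as np


def mish(x):
    """Mish activation: x * tanh(softplus(x)), with softplus(x) = ln(1 + e^x)."""
    x = np.asarray(x, dtype=float)
    return x * np.tanh(np.logaddexp(0.0, x))  # logaddexp(0, x) is a stable softplus


def _box_terms(a, b):
    """IoU, squared centre distance and squared enclosing diagonal of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    iou = inter / (area_a + area_b - inter)
    # Squared distance between the two box centres.
    rho2 = ((a[0] + a[2] - b[0] - b[2]) / 2) ** 2 + ((a[1] + a[3] - b[1] - b[3]) / 2) ** 2
    # Squared diagonal of the smallest box enclosing both boxes.
    c2 = (max(a[2], b[2]) - min(a[0], b[0])) ** 2 + (max(a[3], b[3]) - min(a[1], b[1])) ** 2
    return iou, rho2, c2


def ciou_loss(pred, target):
    """CIoU loss = 1 - IoU + rho^2 / c^2 + alpha * v."""
    iou, rho2, c2 = _box_terms(pred, target)
    w, h = pred[2] - pred[0], pred[3] - pred[1]
    w_gt, h_gt = target[2] - target[0], target[3] - target[1]
    # v measures how different the two aspect ratios are.
    v = (4 / math.pi ** 2) * (math.atan(w_gt / h_gt) - math.atan(w / h)) ** 2
    alpha = v / ((1 - iou) + v) if v > 0 else 0.0
    return 1 - iou + rho2 / c2 + alpha * v


def diou_nms(boxes, scores, threshold=0.45):
    """Greedy NMS that suppresses a box when IoU - rho^2 / c^2 exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        remaining = []
        for i in order:
            iou, rho2, c2 = _box_terms(boxes[best], boxes[i])
            if iou - rho2 / c2 <= threshold:
                remaining.append(i)
        order = remaining
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(diou_nms(boxes, scores))        # [0, 2]
print(ciou_loss(boxes[0], boxes[0]))  # 0.0
```

CIoU adds two penalties to the plain IoU loss: the normalised distance between box centres and an aspect-ratio term `v`. The loss therefore still gives a useful training signal when the predicted and ground-truth boxes do not overlap. DIoU-NMS reuses the centre-distance penalty, which makes it less likely to suppress a nearby box whose centre is clearly different.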

3. Additional Improvements
To make the detector more suitable for training on a single GPU, the authors made the
following additional design choices and improvements:
• New data augmentation methods: Mosaic and Self-Adversarial Training (SAT).
Mosaic is a data augmentation method that mixes four training images, so four
different contexts are combined, whereas CutMix mixes only two input images. This
allows objects to be detected outside their normal context. In addition, batch
normalization calculates activation statistics from four different images on each layer,
which significantly reduces the need for a large mini-batch size. Self-Adversarial
Training (SAT) is another data augmentation technique, which operates in two
forward-backward stages. In the first stage, the neural network alters the original image
instead of the network weights. In this way the network performs an adversarial attack
on itself, changing the image to create the deception that there is no object in it. In the
second stage, the network is trained to detect objects in this modified image in the
normal way.
• Optimal hyperparameters are selected by applying genetic algorithms;
• Some existing methods were modified, including SAM, PAN and CmBN. CmBN
(Cross mini-Batch Normalization) is a modified version of CBN (Cross-Iteration Batch
Normalization) that collects statistics only between the mini-batches within a single
batch. SAM was changed from spatial-wise attention to point-wise attention, and the
shortcut connection of PAN was replaced with concatenation.
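The Mosaic augmentation described above can be illustrated with a simplified NumPy sketch. It is a toy built on several assumptions, not the YOLOv4 code. It picks a random split point and fills each of the four quadrants with a top-left crop of one input image. It also omits the random rescaling and bounding-box remapping that a real training pipeline needs. The function name `mosaic` and the default `out_size=416` are hypothetical choices.

```python
import numpy as np


def mosaic(images, out_size=416, rng=None):
    """Tile four H x W x C images into one out_size x out_size training image."""
    if len(images) != 4:
        raise ValueError("Mosaic combines exactly four images")
    rng = np.random.default_rng() if rng is None else rng
    channels = images[0].shape[2]
    canvas = np.zeros((out_size, out_size, channels), dtype=images[0].dtype)

    # Random split point, kept away from the borders so every quadrant is visible.
    cx = int(rng.uniform(0.25, 0.75) * out_size)
    cy = int(rng.uniform(0.25, 0.75) * out_size)
    quadrants = [
        (0, cy, 0, cx),                # top-left
        (0, cy, cx, out_size),         # top-right
        (cy, out_size, 0, cx),         # bottom-left
        (cy, out_size, cx, out_size),  # bottom-right
    ]
    for image, (y1, y2, x1, x2) in zip(images, quadrants):
        # Crop the image to the quadrant size; smaller images leave black padding.
        patch = image[: y2 - y1, : x2 - x1]
        canvas[y1 : y1 + patch.shape[0], x1 : x1 + patch.shape[1]] = patch
    return canvas


# Four flat-colour "images" make the four contexts easy to see in the output.
images = [np.full((416, 416, 3), value, dtype=np.uint8) for value in (1, 2, 3, 4)]
combined = mosaic(images, rng=np.random.default_rng(0))
print(combined.shape)  # (416, 416, 3)
```

Each output image contains pieces of four scenes. This is why a single mini-batch exposes batch normalization to more varied contexts.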

What is Python?
Python is an interpreted, object-oriented, high-level programming language with
dynamic semantics. Its high-level built-in data structures, combined with dynamic
typing and dynamic binding, make it very attractive for Rapid Application
Development, as well as for use as a scripting or glue language to connect existing
components together. Python's simple, easy to learn syntax emphasizes readability and
therefore reduces the cost of program maintenance. Python supports modules and
packages, which encourages program modularity and code reuse. The Python interpreter
and the extensive standard library are available in source or binary form without charge
for all major platforms, and can be freely distributed.
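The snippet below is a small illustration of the built-in data structures and dynamic typing described above. It counts class labels in a list of detections. The detections are made-up sample data and `count_labels` is a hypothetical helper. `collections.Counter` comes from the standard library.

```python
from collections import Counter


def count_labels(detections):
    """Count how often each class label appears in (label, score) pairs."""
    # Works with any iterable of pairs: a list, a tuple or a generator.
    return Counter(label for label, _score in detections)


# Hypothetical detector output: a list of (label, confidence) tuples.
detections = [("person", 0.91), ("car", 0.78), ("person", 0.66)]
counts = count_labels(detections)
print(counts.most_common(1))  # [('person', 2)]

# Dynamic typing: a name can be rebound to a value of a different type.
value = 42
value = "forty-two"
print(type(value).__name__)  # str
```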

Application of Python
Python can be used to build many kinds of applications; two of the main areas are:

1. Artificial Intelligence and Machine Learning


Python’s simplicity, consistency, platform independence, large collection of useful
libraries and active community make it a popular tool for developing Artificial
Intelligence and Machine Learning applications. Some of the best-known Python
packages for Artificial Intelligence and Machine Learning are:
• SciPy for advanced scientific computing
• Pandas for general-purpose data analysis
• Seaborn for data visualization
• Keras, TensorFlow and Scikit-learn for Machine Learning
• NumPy for high-performance scientific computing and data analysis

2. Data analysis
Python fits data analysis well, largely because of the wide availability of open-source
libraries for different purposes, including but not limited to scientific computing.
Libraries such as NumPy run most of their heavy computation in compiled code, which
keeps Python-based analysis fast.

What is OpenCV?
OpenCV (Open Source Computer Vision Library) is an open source computer vision
and machine learning software library. OpenCV was built to provide a common
infrastructure for computer vision applications and to accelerate the use of machine
perception in commercial products. OpenCV was originally released under the BSD
license and, from version 4.5.0 onward, under the Apache 2 license. Both are
permissive licenses, which makes it easy for businesses to use and modify the code.

The library has more than 2500 optimized algorithms, which include a comprehensive
set of both classic and state-of-the-art computer vision and machine learning algorithms.
These algorithms can be used to detect and recognize faces, identify objects, classify
human actions in videos, track camera movements, track moving objects, extract 3D
models of objects, produce 3D point clouds from stereo cameras, stitch images together
to produce a high resolution image of an entire scene, find similar images from an
image database, remove red eyes from images taken using flash, follow eye movements,
recognize scenery and establish markers to overlay it with augmented reality, etc.
OpenCV has a user community of more than 47 thousand people and an estimated
number of downloads exceeding 18 million. The library is used extensively in
companies, research groups and governmental bodies.
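In a project like this one, OpenCV's `dnn` module can run a Darknet-format YOLOv4 model directly. The sketch below is an outline built on assumptions, not a tested pipeline. The file names `yolov4.cfg` and `yolov4.weights` and the 416 × 416 input size are placeholders, and the helper names are hypothetical. The output layout (`[cx, cy, w, h, objectness, class scores...]`, normalised to [0, 1]) follows OpenCV's own object-detection sample as we understand it. `cv2` is imported inside `detect` so that the decoding step can be tried without OpenCV installed.

```python
import numpy as np


def decode_yolo_outputs(outputs, img_w, img_h, conf_threshold=0.5):
    """Convert raw YOLO rows into pixel boxes [x, y, w, h], scores and class ids."""
    boxes, scores, class_ids = [], [], []
    for output in outputs:
        for row in output:
            # Assumed row layout: [cx, cy, w, h, objectness, class scores...].
            # As in OpenCV's object_detection sample (our understanding), the class
            # scores already include objectness, so they are used directly.
            class_scores = row[5:]
            class_id = int(np.argmax(class_scores))
            score = float(class_scores[class_id])
            if score < conf_threshold:
                continue
            cx, cy = row[0] * img_w, row[1] * img_h
            w, h = row[2] * img_w, row[3] * img_h
            boxes.append([int(cx - w / 2), int(cy - h / 2), int(w), int(h)])
            scores.append(score)
            class_ids.append(class_id)
    return boxes, scores, class_ids


def detect(image_path, cfg_path="yolov4.cfg", weights_path="yolov4.weights",
           conf_threshold=0.5, nms_threshold=0.4):
    """Run a Darknet YOLOv4 model with OpenCV's dnn module (placeholder file paths)."""
    import cv2  # imported lazily so decode_yolo_outputs works without OpenCV

    net = cv2.dnn.readNetFromDarknet(cfg_path, weights_path)
    image = cv2.imread(image_path)  # returns None if the file cannot be read
    if image is None:
        raise FileNotFoundError(image_path)
    img_h, img_w = image.shape[:2]
    # Scale pixels to [0, 1], resize to 416 x 416 and convert BGR to RGB.
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())
    boxes, scores, class_ids = decode_yolo_outputs(outputs, img_w, img_h, conf_threshold)
    keep = cv2.dnn.NMSBoxes(boxes, scores, conf_threshold, nms_threshold)
    return [(boxes[i], scores[i], class_ids[i]) for i in np.array(keep).flatten()]


# Decoding hand-made rows (not real network output); the second row is below threshold.
fake = [np.array([[0.5, 0.5, 0.2, 0.4, 0.9, 0.1, 0.8],
                  [0.5, 0.5, 0.2, 0.4, 0.9, 0.2, 0.3]])]
print(decode_yolo_outputs(fake, img_w=100, img_h=200))  # ([[40, 60, 20, 80]], [0.8], [1])
```

`cv2.dnn.NMSBoxes` removes overlapping duplicate boxes after decoding. The result is flattened because different OpenCV versions return the kept indices in different shapes.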

What is NumPy?
NumPy is the fundamental package for scientific computing in Python. It is a Python
library that provides a multidimensional array object, various derived objects (such as
masked arrays and matrices), and an assortment of routines for fast operations on arrays,
including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete
Fourier transforms, basic linear algebra, basic statistical operations, random simulation
and much more.

At the core of the NumPy package is the ndarray object. This encapsulates
n-dimensional arrays of homogeneous data types, with many operations being performed
in compiled code for performance.
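The short example below shows the ndarray ideas above: homogeneous data types, n dimensions, and vectorised operations that run in compiled code. It treats a tiny made-up RGB image as a 3-D array.

```python
import numpy as np

# A tiny 4x6 RGB "image": an ndarray of homogeneous uint8 values.
image = np.zeros((4, 6, 3), dtype=np.uint8)
image[1:3, 2:5] = 255                      # slicing assigns to a whole region at once

print(image.shape, image.dtype)            # (4, 6, 3) uint8

# Vectorised operations run in compiled code instead of Python loops.
gray = image.mean(axis=2)                  # average the three colour channels
print(gray.shape)                          # (4, 6)
print(int((gray > 0).sum()))               # 6

# Broadcasting: scale every pixel to [0, 1] with a single expression.
normalised = image / 255.0
print(normalised.max())                    # 1.0
```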

What is Matplotlib?
Matplotlib is a cross-platform data visualization and graphical plotting library for
Python and its numerical extension NumPy. As such, it offers a viable open source
alternative to MATLAB. Developers can also use matplotlib’s APIs (Application
Programming Interfaces) to embed plots in GUI applications.
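Here is a minimal example of the pyplot API that plots a curve and saves it as a PNG image. The loss values are synthetic and only illustrative. The non-interactive `Agg` backend is an assumed choice so the code also runs on a machine without a display. In a desktop session, `plt.show()` would open a window instead.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend: render to files or buffers, no window
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical per-epoch training losses, for illustration only.
epochs = np.arange(1, 11)
loss = 2.0 / epochs

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot(epochs, loss, marker="o", label="training loss")
ax.set_xlabel("epoch")
ax.set_ylabel("loss")
ax.legend()

# Save the figure into an in-memory PNG instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
```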

What is TensorFlow?

TensorFlow is a free and open-source software library for machine learning. It can be
used across a range of tasks but has a particular focus on training and inference of deep
neural networks. TensorFlow is a symbolic math library based on dataflow and
differentiable programming. It is used for both research and production at Google.
TensorFlow was developed by the Google Brain team for internal Google use and was
released under the Apache License 2.0 in 2015.

Why TensorFlow?

TensorFlow is an end-to-end open source platform for machine learning. It has a
comprehensive, flexible ecosystem of tools, libraries and community resources that lets
researchers push the state of the art in ML and lets developers easily build and deploy
ML-powered applications.

1. Easy model building


Build and train ML models easily using intuitive high-level APIs like Keras with eager
execution, which makes for immediate model iteration and easy debugging.

2. Robust Machine Learning everywhere

Easily train and deploy models in the cloud, on-prem, in the browser or on-device,
no matter what language you use.

3. Powerful experimentation for research


A simple and flexible architecture to take new ideas from concept to code, to state-of-
the-art models, and to publication faster.

