What is YOLOv4?
YOLO stands for You Only Look Once. It is a real-time object detection system that
recognizes multiple objects in a single frame, and it identifies them faster and more
accurately than many other recognition systems. It can predict 9000 or more classes of
objects, both seen and unseen. A real-time recognition system of this kind can
recognize several objects in a single image, draw a bounding box around each one, and
be trained and deployed quickly in a production system. It is also a milestone in
object detection research that has led to better, faster, and more adaptable computer
vision algorithms.
1. Selection of Architecture
An object detection model consists of three parts:
Backbone: Extracts low-level features (edges, colours, etc.) from the input. This
module can reuse what a pretrained network has already learned.
Neck: Strengthens the understanding and extraction of features by processing,
combining and analysing the low-level features from the backbone, and optimizing
them for the target of the model.
Head: Produces the output the model needs, such as class labels, bounding boxes,
or segmentation masks.
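The three-part structure above can be pictured as a simple pipeline. The sketch below is illustrative only: `Detector`, `toy_backbone`, `toy_neck` and `toy_head` are hypothetical names, and the toy stages work on a plain list of numbers rather than on image tensors.

```python
from dataclasses import dataclass
from typing import Callable, List

Features = List[float]


@dataclass
class Detector:
    """Hypothetical container that chains backbone -> neck -> head."""
    backbone: Callable[[Features], Features]
    neck: Callable[[Features], Features]
    head: Callable[[Features], str]

    def __call__(self, image: Features) -> str:
        low_level = self.backbone(image)  # extract low-level features
        fused = self.neck(low_level)      # combine and refine them
        return self.head(fused)           # produce the task-specific output


# Toy stand-ins (assumptions for illustration, not real network layers).
def toy_backbone(image: Features) -> Features:
    # Crude "edge" response: absolute difference between neighbouring values.
    return [abs(b - a) for a, b in zip(image, image[1:])]


def toy_neck(features: Features) -> Features:
    # Aggregate the responses into a mean and a maximum.
    return [sum(features) / len(features), max(features)]


def toy_head(features: Features) -> str:
    # Threshold the strongest response to pick a label.
    return "object" if features[1] > 0.5 else "background"


detector = Detector(toy_backbone, toy_neck, toy_head)
print(detector([0.0, 0.1, 0.9, 1.0]))
```

Keeping the three stages as separate callables mirrors how a backbone can be swapped without touching the neck or the head.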
The goal is to find the best balance between the input network resolution, the number of
with a larger receptive field and a larger number of parameters should be selected as
the backbone. Comparing CSPResNeXt50, CSPDarkNet53 and EfficientNet-B3 experimentally
shows that CSPDarkNet53 is more suitable as the backbone of the detection model. SPP
comes from Kaiming He's SPPNet; it is used mainly because it significantly increases
the receptive field, separates out the most important context features, and hardly
reduces the network operation speed. PANet is used mainly to improve feature fusion.
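To make the receptive-field effect of SPP concrete, here is a minimal pure-Python sketch of an SPP-style block. It assumes stride-1 max pooling with kernel sizes 5, 9 and 13, which are commonly used in YOLOv4 configurations but are not stated in this text; `max_pool_same` and `spp_block` are hypothetical helper names.

```python
from typing import List

FeatureMap = List[List[float]]  # one channel, indexed [row][col]


def max_pool_same(fmap: FeatureMap, k: int) -> FeatureMap:
    """Max pooling with stride 1. The window is clipped at the borders,
    so the output keeps the input's height and width."""
    h, w, r = len(fmap), len(fmap[0]), k // 2
    return [
        [
            max(
                fmap[yy][xx]
                for yy in range(max(0, y - r), min(h, y + r + 1))
                for xx in range(max(0, x - r), min(w, x + r + 1))
            )
            for x in range(w)
        ]
        for y in range(h)
    ]


def spp_block(fmap: FeatureMap, kernel_sizes=(5, 9, 13)) -> List[FeatureMap]:
    """Return the input plus one pooled copy per kernel size,
    stacked as a list of channels (channel-wise concatenation)."""
    return [fmap] + [max_pool_same(fmap, k) for k in kernel_sizes]


# A 7x7 map with a single active cell in the centre.
fmap = [[0.0] * 7 for _ in range(7)]
fmap[3][3] = 1.0
channels = spp_block(fmap)
```

Because the stride is 1, every pooled channel has the same spatial size as the input, so the only cost is the extra channels; each larger kernel spreads the centre activation over a wider neighbourhood.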
3. Additional Improvements
To make the designed detector more suitable for training on a single GPU, the
following additional designs and improvements have been made:
New data augmentation methods, Mosaic and Self-Adversarial Training (SAT):
Mosaic is a new data augmentation method that mixes four training images, so four
different contexts are mixed, whereas CutMix mixes only two input images. This makes
it possible to detect objects outside their normal context. In addition, batch
normalization calculates activation statistics from four different images on each
layer, which significantly reduces the need for a large batch size. Self-Adversarial
Training (SAT) is also a new data augmentation technique, which operates in two
forward-backward stages. In the first stage, the neural network alters the original
image rather than the weights of the network. In this way, the neural network attacks
itself, changing the original image to create the deception that there is no object
in the image. In the second stage, the neural network is trained to detect objects in
this modified image in the usual way.
A genetic algorithm is used to select the best hyperparameters.
Some existing methods have been modified, including SAM, PAN and CmBN.
CmBN is a modified version of CBN that collects statistics only between
mini-batches within a single batch. SAM is changed from spatial-wise attention to
point-wise attention, and the shortcut connection of PAN is replaced with concatenation.
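The Mosaic idea described above can be sketched in a few lines. This is a simplified illustration, not the augmentation as actually implemented: it assumes four equally sized grayscale images tiled in a fixed 2x2 grid, whereas practical versions also rescale and crop at random. The function name `mosaic` and the box format `(x_min, y_min, x_max, y_max)` are assumptions.

```python
from typing import List, Sequence, Tuple

Image = List[List[int]]          # grayscale pixels, indexed [row][col]
Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def mosaic(images: Sequence[Image], boxes: Sequence[List[Box]]):
    """Tile four equally sized images into a 2x2 grid and shift their boxes."""
    if len(images) != 4 or len(boxes) != 4:
        raise ValueError("mosaic expects exactly four images and four box lists")
    h, w = len(images[0]), len(images[0][0])
    # Join rows side by side: images 0 and 1 on top, images 2 and 3 below.
    top = [images[0][y] + images[1][y] for y in range(h)]
    bottom = [images[2][y] + images[3][y] for y in range(h)]
    # Each quadrant's boxes move by that quadrant's (dx, dy) offset.
    offsets = [(0, 0), (w, 0), (0, h), (w, h)]
    shifted = [
        (x0 + dx, y0 + dy, x1 + dx, y1 + dy)
        for (dx, dy), img_boxes in zip(offsets, boxes)
        for (x0, y0, x1, y1) in img_boxes
    ]
    return top + bottom, shifted


# Four 2x2 images filled with 0, 1, 2 and 3, each with one full-image box.
imgs = [[[i] * 2 for _ in range(2)] for i in range(4)]
combined, combined_boxes = mosaic(imgs, [[(0, 0, 1, 1)]] * 4)
```

The combined sample carries the objects and contexts of all four source images, which is why a single training example already exposes batch normalization to four different images.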
What is Python?
Python is an interpreted, object-oriented, high-level programming language with
dynamic semantics. Its high-level built in data structures, combined with dynamic
typing and dynamic binding, make it very attractive for Rapid Application
Development, as well as for use as a scripting or glue language to connect existing
components together. Python's simple, easy to learn syntax emphasizes readability and
therefore reduces the cost of program maintenance. Python supports modules and
packages, which encourages program modularity and code reuse. The Python interpreter
and the extensive standard library are available in source or binary form without charge
for all major platforms, and can be freely distributed.
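As a small illustration of the dynamic typing and dynamic binding mentioned above, consider the following sketch. The class and function names (`Duck`, `Robot`, `announce`) are hypothetical and chosen only for this example.

```python
class Duck:
    def speak(self):
        return "Quack"


class Robot:
    def speak(self):
        return "Beep"


def announce(thing):
    # No type declaration is needed: any object with a speak() method works,
    # and the method is looked up when the call happens (dynamic binding).
    return thing.speak()


value = 3        # the name refers to an int
value = "three"  # the same name can later refer to a str (dynamic typing)

print(announce(Duck()), announce(Robot()), type(value).__name__)
```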
Applications of Python
There are mainly two types of applications that can be created using Python:
Keras, TensorFlow, and Scikit-learn for Machine Learning
NumPy for high-performance scientific computing and data analysis
2. Data analysis
Python works well for data analysis, largely because of strong community support and
the availability of a wide range of open-source libraries for different purposes,
including but not limited to scientific computing.
What is OpenCV?
OpenCV (Open Source Computer Vision Library) is an open source computer vision
and machine learning software library. OpenCV was built to provide a common
infrastructure for computer vision applications and to accelerate the use of machine
perception in the commercial products. Being a BSD-licensed product, OpenCV makes
it easy for businesses to utilize and modify the code.
The library has more than 2500 optimized algorithms, which includes a comprehensive
set of both classic and state-of-the-art computer vision and machine learning algorithms.
These algorithms can be used to detect and recognize faces, identify objects, classify
human actions in videos, track camera movements, track moving objects, extract 3D
models of objects, produce 3D point clouds from stereo cameras, stitch images together
to produce a high resolution image of an entire scene, find similar images from an
image database, remove red eyes from images taken using flash, follow eye movements,
recognize scenery and establish markers to overlay it with augmented reality, etc.
OpenCV has a user community of more than 47 thousand people and an estimated number
of downloads exceeding 18 million. The library is used extensively by companies,
research groups and governmental bodies.
What is NumPy?
NumPy is the fundamental package for scientific computing in Python. It is a Python
library that provides a multidimensional array object, various derived objects (such as
masked arrays and matrices), and an assortment of routines for fast operations on arrays,
including mathematical, logical, shape manipulation, sorting, selecting, I/O, discrete
Fourier transforms, basic linear algebra, basic statistical operations, random simulation
and much more.
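A few of the array operations listed above can be seen in a short sketch. It assumes NumPy is installed and imported as `np`; the array values are arbitrary.

```python
import numpy as np

a = np.array([[3, 1, 2], [6, 5, 4]])  # a 2x3 multidimensional array

doubled = a * 2                   # element-wise mathematical operation
mask = a > 2                      # element-wise logical operation
reshaped = a.reshape(3, 2)        # shape manipulation
sorted_rows = np.sort(a, axis=1)  # sorting along each row
selected = a[mask]                # selecting with a boolean mask
col_means = a.mean(axis=0)        # basic statistics per column

print(sorted_rows)
print(selected, col_means)
```

Each operation runs over the whole array at once, without an explicit Python loop, which is where the speed of NumPy routines comes from.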
What is Matplotlib?
Matplotlib is a cross-platform, data visualization and graphical plotting library for
Python and its numerical extension NumPy. As such, it offers a viable open source
alternative to MATLAB. Developers can also use matplotlib’s APIs (Application
Programming Interfaces) to embed plots in GUI applications.
What is TensorFlow?
Why TensorFlow?
Easily train and deploy models in the cloud, on-prem, in the browser, or on-device no
matter what language you use.
https://opencv.org/about/
https://www.tensorflow.org
https://www.activestate.com/resources/quick-reads/what-is-matplotlib-in-python-how-to-use-it-for-plotting/
https://numpy.org/doc/stable/user/whatisnumpy.html
https://www.analyticssteps.com/blogs/introduction-yolov4
https://www.python.org/doc/essays/blurb/
https://iopscience.iop.org/article/10.1088/1742-6596/1865/4/042019/pdf