
electronics

Essay
YOLOv5-Ytiny: A Miniature Aggregate Detection and
Classification Model
Sheng Yuan, Yuying Du *, Mingtang Liu, Shuang Yue, Bin Li and Hao Zhang

College of Information Engineering, North China University of Water Resources and Electric Power,
Zhengzhou 450000, China; yuansheng@ncwu.edu.cn (S.Y.); liumingtang@ncwu.edu.cn (M.L.);
syncwu@yeah.net (S.Y.); mlking44@yeah.net (B.L.); 201616710@stu.ncwu.edu.cn (H.Z.)
* Correspondence: dyyncwu@yeah.net

Abstract: Aggregate classification is the prerequisite for making concrete. Traditional aggregate
identification methods have the disadvantages of low accuracy and a slow speed. To solve these
problems, a miniature aggregate detection and classification model, based on the improved You
Only Look Once (YOLO) algorithm and named YOLOv5-ytiny, is proposed in this study. Firstly, the C3
structure in YOLOv5 is replaced with our proposed CI structure. Then, the redundant part of the
Neck structure is pruned. Finally, the bounding box regression loss function GIoU is changed
to the CIoU function. The proposed YOLOv5-ytiny model was compared with other object detection
algorithms such as YOLOv4, YOLOv4-tiny, and SSD. The experimental results demonstrate that
the YOLOv5-ytiny model reaches 22.73 FPS, with a computing time 60% lower than that of the
original YOLOv5 algorithm, and reaches 99.6% mAP (mean average precision). Moreover, the
YOLOv5-ytiny model has significant speed advantages on CPU-only computer devices. This method can not only accurately identify
the aggregate but can also obtain the relative position of the aggregate, which can be effectively used
for aggregate detection.

Keywords: object detection; aggregate; YOLO; classification; computer vision

Citation: Yuan, S.; Du, Y.; Liu, M.; Yue, S.; Li, B.; Zhang, H. YOLOv5-Ytiny: A Miniature Aggregate Detection and Classification Model. Electronics 2022, 11, 1743. https://doi.org/10.3390/electronics11111743

Academic Editor: Pedro Latorre-Carmona

Received: 27 April 2022; Accepted: 26 May 2022; Published: 30 May 2022

1. Introduction

Aggregate classification is an important factor for determining the performance and quality of concrete. Concrete is composed of cement, sand, stones, and water. Aggregate generally accounts for 70% to 80% of concrete [1]. Many factors affect the strength of concrete, mainly including the cement strength and water–binder ratio, the aggregate gradation and particle shape, the curing temperature and humidity, the curing age, etc. [2]. The aggregate processing system is one of the most important auxiliary production systems used in the construction of large-scale water conservancy and hydropower projects [3]. Aggregate quality control is of great significance for promoting the sound development of the engineering construction industry [4]; it is also extremely important for improving the quality of a project and optimizing its cost [5]. Different types of aggregate have different effects on the performance of concrete [6]. Regarding the particle size and shape of the aggregate, the current specifications for coarse aggregate needle-like particles are relatively broad [7], and good-quality aggregate needs to have a standardized particle size and shape [8]. Therefore, we must ensure that the aggregate meets its quality requirements and that raw materials are selected reasonably, so as to ensure the quality of concrete. It is particularly important to find a suitable aggregate classification and detection method.

In recent years, the level of aggregate classification and detection has greatly improved [9], and there are now a variety of sand particle size measurement methods. These include, for example, mesoscale modeling of concrete static and dynamic tensile fractures for real-shape aggregates [10], the development of a particle size and shape measurement system for manufactured sand [11], the use of extreme gradient boosting-based pavement aggregate shape classification [12], the use of the wire mesh method to classify aggregate roundness [13], a method for evaluating the strength of individual ballast aggregates by point
load testing and the establishment of a classification method [14], the determination of particle
size and of core and shell size distributions of core–shell particles by analytical
ultracentrifugation [15], the use of the projected area of the particle to calculate the surface
area, equivalent diameter, and sphericity [16], the use of imaging methods to obtain reli-
able particle size distribution [17], the use of a vibration dispersion system, feed blanking
system, and backlight image acquisition system to construct a particle size and shape
measurement device [18]. Isa et al. proposed an automatic intelligent aggregate classifi-
cation system combined with robot vision [19]. Sun et al. proposed a coarse aggregate
granularity classification method based on a deep residual network [20]. Moaveni et al.
developed a new segmentation technology that can capture images quickly and reliably to
analyze their size and shape [21]. Sinecen et al. established a laser-based aggregate shape
representation system [22], which classifies aggregate from the features extracted from the
created 3D images.
However, these screening methods can only measure the size of sand particles offline.
Although digital image processing methods rely on relatively mature techniques [23,24],
research on these methods mainly focuses on evaluation indices for the shape characteristics
of the aggregate [25] and cannot achieve efficient real-time detection from images.
In practice, variations in the detection background of the aggregate, the size of the detection
target, day and night illumination, and the detection distance may introduce different kinds of
interference when images are transmitted to the processing side. In this case,
it is necessary to first detect the target position, then locate and frame the target to reduce
interference as much as possible, while still detecting target objects under
these varying conditions. Therefore, the real-time detection of aggregate features under
complex backgrounds is of great significance.
In summary, this work brings the following main contributions:
• The design of a new type of aggregate detection and classification model, which
can accurately identify the types of aggregate under complex backgrounds, such as
different light, different distances, and different states of dry and wet aggregate.
• The improvement of YOLOv5, replacing the C3 module in the model backbone network, tailoring the Neck structure, and compressing the model, so that the model can run detection quickly on a computer without GPU support. The loss function is also improved, making object bounding box selection more accurate.
• In the original YOLOv5 model, the original three detection heads are simplified into
two, which is more suitable for the detection of a single target (only one target is
recognized in a picture), thus reducing the number of parameters and the number
of calculations.

2. Related Work
There are many mature target detection algorithms, such as YOLOv4 [26], SSD [27],
YOLOv4-tiny, and YOLOv5 [28]. Compared with these algorithms, YOLOv5 is lighter
and more portable. YOLOv5 uses a backbone feature extraction network, acquires the
depth features of the input image, uses feature fusion to further improve the effectiveness
of features, effectively frames the detection target, and improves the precision of target
detection [29]. At present, YOLO is also widely used as a popular target detection algorithm.
Yan et al. proposed a real-time apple-picking target detection method for robots based on
improved YOLOv5 [30]. Yao et al. proposed a ship detection method in optical remote
sensing images based on deep convolutional neural networks [31]. Gu et al. proposed
a YOLOv5-based method for the identification and analysis of emergency behavior of
caged laying ducks [32]. Zhu et al. proposed traffic sign recognition based on deep
learning [33]. Fan et al. proposed a strawberry ripeness recognition algorithm combining
dark channel enhancement and YOLOv5 [34]. A cost-performance evaluation of livestock
activity recognition services using aerial imagery was proposed by Lema et al. [35]. Jhong et al. proposed a night object detection system based on a lightweight and deep network for the internet of vehicles [36]. Wang et al. proposed a fast smoky vehicle detection method based on improved YOLOv5 [37]. Wu et al. studied the application of YOLOv5 in the detection of small targets in remote sensing images [38]. Song et al. proposed an improved YOLOv5-based object detection method for grabbing robots [39].

Although YOLOv5 is much lighter than other object detection algorithms, its network structure is complex, with many layers and a large number of nodes, and the experimental equipment was limited. Running on a single CPU, actual training and inference take longer.

In order to solve these problems, this experiment established an aggregate classification detection model, YOLOv5-ytiny, based on YOLOv5, for complex backgrounds, compressing the YOLOv5 model, extracting detection background features in different environments, improving the detection speed, and providing real-time judgment of the classification of aggregate.
3. Materials and Methods

3.1. Data Collection and Processing

In this experiment, a high-definition camera is used to collect images, and the real-time images obtained by the camera are transmitted to the client. The model classifies and recognizes the acquired images, and then displays the results to the client. Figure 1 is a schematic diagram of image acquisition.

Figure 1. Image acquisition process: after the transport vehicle arrives, the aggregate image is collected, and the model returns the result after processing.

When capturing images, the camera is fixed 2 m above the aggregate box. Considering that the distance between the car and the camera is within 1–2 m during actual transportation, the effective detection distance we expect is also within 1–2 m. The image collection is shot under natural light and night lighting. The shooting result is saved as a 1920 pixel × 1080 pixel RGB image. There are 4 types of aggregate, namely stones, small stones, machine-made sand, and surface sand. The particle size of the stones is in the range of 3–4 cm, the particle size of small stones is in the range of 2–3 cm, the particle size of machine-made sand is in the range of 1–2 cm, and the particle size of surface sand is in the range of 0.1–0.5 cm. A total of 525 images in four types were taken, including different light conditions, dry and wet aggregate, and different shooting distances.
Taking the stones in Figure 2a and the small stones in Figure 2b as examples, the unit grayscale number of stones is distributed in the (130, 180) pixel interval, and the unit grayscale number of small stones is distributed in the (120, 180) pixel interval. The grayscale distributions of these two types of aggregate have highly overlapping areas. It may be difficult to segment an image based on a gray threshold, and it can be seen in Figure 3 that the stones and small stones are stacked. If an image is segmented using an image-based grayscale threshold segmentation method, it may not be possible to segment a single aggregate target because the grayscales connected to each other in the region are the same, which may easily cause the targets to stick together. On the other hand, the image of the aggregate with a short collection distance is clear, but the image is blurred when the distance is farther, and it is difficult to perform image processing. Therefore, this experiment uses the target detection algorithm YOLOv5 to extract the characteristics of aggregate in different backgrounds to realize the type recognition of aggregate.
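To make this concrete, a minimal sketch of the grayscale-histogram comparison behind Figure 2 (assuming OpenCV and matplotlib; the file names are placeholders, not the paper's data):

```python
import cv2
import matplotlib.pyplot as plt

# Plot the grayscale distributions of two aggregate images, as in Figure 2.
for path, label in [("stones.jpg", "stones"), ("small_stones.jpg", "small stones")]:
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)            # 8-bit grayscale read
    hist = cv2.calcHist([gray], [0], None, [256], [0, 256])  # 256-bin histogram
    plt.plot(hist, label=label)

plt.xlabel("gray level")
plt.ylabel("pixel count")
plt.legend()
plt.show()
```

Heavily overlapping curves, as in Figure 2, mean that no single gray threshold separates the two classes, which is what motivates using a learned detector instead.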

Figure 2. Comparison of grayscale histograms of stones and small stones. (a) Grayscale histogram of stones. (b) Grayscale histogram of small stones.

Figure 3. Stones and small stones. (a) Stones. (b) Small stones.

In this experiment, a total of 525 images were obtained, covering stones, small stones, machine-made sand, and surface sand. The labels delineated in this experiment are sz, xsz, jzs, and ms. We use LabelImg to label the images, and the smallest bounding rectangle of the target was used as the ground truth box. In the final data set, 80% (420 images) were randomly selected as the training set and 20% (105 images) as the test set.

The four types of aggregate show different shapes and colors due to different dry and wet states and different brightness of light. The image collection environment was under cloudy, sunny, and night conditions. The aggregate states were dry, normal, and wet. The collection distance was 1.5 m.

The computer used in this research has an Intel(R) Core(TM) i5-8250U 1.80 GHz processor, 8 GB of running memory, and 512 GB of storage; the development environment is Python 3.6.
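As an illustration of the data preparation above, a minimal sketch of the 80/20 split (the file names are hypothetical placeholders, not the actual data set):

```python
import random

random.seed(0)                                           # reproducible split
images = [f"aggregate_{i:03d}.jpg" for i in range(525)]  # 525 collected images
random.shuffle(images)
train, test = images[:420], images[420:]                 # 420 training, 105 test images
```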
3.2. Model Establishment and Experimental Process

Aggregate Classification Model Based on Improved YOLOv5

The technical route of the aggregate classification model is shown in Figure 4. The artificially-labeled aggregate data are input into the YOLOv5 model for training and fine-tuning to realize the real-time recognition of the target. The improved YOLOv5-ytiny model is used for the classification of aggregate under complex backgrounds based on the YOLOv5 model. YOLOv5-ytiny replaces the C3 module of the backbone structure of the network, cuts the Neck structure to achieve compression, reduces the network prediction heads, reduces the image size, and adjusts the network width. It simplifies the structure and parameters of the model while ensuring precision.

Figure 4. Improvements to the YOLOv5 model.

The YOLOv5 algorithm has four network structures, namely YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. These four network structures differ in width and depth, but they are the same in principle, and they can be flexibly selected according to need. The greater the depth of the selected structure, the higher the precision, but the training speed and inference speed decrease. Aggregate is not a complex target, and we hope to increase the speed of inference. Therefore, the selected structure is YOLOv5s, and improvements are made on this basis.
The YOLOv5 model mainly consists of the following five modules: (1) The Focus module slices the input image, which can achieve the effect of down-sampling the image without losing information. (2) The Conv module: three functions are encapsulated in this basic convolution module, namely a convolution (Conv2d) layer, a BN (Batch Normalization) layer, and the SiLU (Swish) activation function, so that the input features pass through the convolution layer and the activation function, and the output is obtained through the normalization layer. (3) The Bottleneck module is mainly used to reduce the number of parameters, thereby reducing the amount of calculation; after dimensionality reduction, data training and feature extraction can be performed more effectively and intuitively. (4) The C3 module: in the new version of YOLOv5, the author converts the BottleneckCSP (bottleneck layer) module to the C3 module. Its structure and function are the same as the CSP architecture, but the selection of the correction unit is different; it contains three standard convolutional layers and multiple bottleneck modules. (5) The SPP module, spatial pyramid pooling: the main purpose of this module is to fuse features of different resolutions to obtain more information.
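For concreteness, here is a minimal PyTorch sketch of modules (1) and (2) above, a simplified reading of the public YOLOv5 code rather than the authors' exact implementation:

```python
import torch
import torch.nn as nn

class Conv(nn.Module):
    # Module (2): Conv2d -> BatchNorm -> SiLU (Swish) activation.
    def __init__(self, c_in, c_out, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Focus(nn.Module):
    # Module (1): slice the image into four pixel-interleaved quarters and
    # concatenate them on the channel axis (2x down-sampling without losing
    # information), then apply one Conv block.
    def __init__(self, c_in, c_out, k=1):
        super().__init__()
        self.conv = Conv(c_in * 4, c_out, k)

    def forward(self, x):
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                                    x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1))
```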

3.3. YOLOv5-Ytiny

Although the established YOLOv5 model can realize the detection and classification of aggregate, the structure and parameters of the model are still relatively large, and the calculation takes a long time. On the other hand, the detection and classification of aggregate generally take place during vehicle transportation, and the detection results need to be displayed in real time. Therefore, to improve the detection speed of the model and reduce the amount of calculation, the model is optimized and compressed to form the YOLOv5-ytiny model.

We replace the C3 module of YOLOv5 with the CI module. As shown in Figure 5, the C3 module of YOLOv5 has a shortcut structure, which connects two adjacent layers of the network, and there are n residual blocks. For aggregate targets, the data set involves relatively simple target recognition, and the multiple residual modules in the C3 module may be a waste of resources. Thus, we replace it with a CI module. The structure of CI is shown in Figure 5.

Figure 5. Replace the C3 module with the CI module.
After the C3 module is replaced with the CI module, the corresponding network structure in the original Backbone module and Neck module will also be changed. At the same time, all the layers of the C3 module in the original model are changed to one layer to reduce the overall depth of the network. In order to achieve model compression, after replacing the C3 module, the Neck module is cut to remove the relatively redundant part of the network, and part of the structure is then deleted to reduce the depth of the network and the amount of model calculation. The modified Neck module is shown in Figure 6.

Figure 6. Modify the Neck module to compress the network structure.
There are three modules in the original YOLOv5 model, namely the Backbone module, the Neck module, and the detection head. After replacing the C3 module with the CI module, in order to reduce the number of calculation parameters, the detection heads were reduced from three to two. The original Neck module is composed of multiple convolutional layers (Conv), up-sampling, and tensor stitching (Concat). For a single, simple target, only part of the combination in the Neck layer may be useful, and the original repeated multi-layer Neck structure may cause data redundancy and increase the amount of calculation, so we tailor the Neck structure to compress the network. We use fine-tuning with iterative training: the network structure is cut a small number of times, and the number of training iterations is adjusted by judging the convergence effect; the final network structure is shown in Figure 6. It can be seen that YOLOv5-ytiny eliminates part of the repetitive hierarchy, retains the main network structure, and finally compresses it into two detection heads.
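The exact CI layout is given only in Figure 5. As one plausible reading (an assumption on our part, not the authors' code), the following sketch keeps a C3-style split-and-concatenate structure but replaces the stack of n residual bottlenecks with a single 3x3 convolution:

```python
import torch
import torch.nn as nn

def conv_bn_silu(c_in, c_out, k=1):
    # Conv2d -> BatchNorm -> SiLU, as in the Conv block of Section 3.2.
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, 1, k // 2, bias=False),
                         nn.BatchNorm2d(c_out),
                         nn.SiLU())

class CI(nn.Module):
    # Assumed CI block: two parallel 1x1 branches as in C3, but with the
    # n residual bottlenecks reduced to one plain 3x3 convolution.
    def __init__(self, c_in, c_out):
        super().__init__()
        c_mid = c_out // 2
        self.branch1 = nn.Sequential(conv_bn_silu(c_in, c_mid, 1),
                                     conv_bn_silu(c_mid, c_mid, 3))
        self.branch2 = conv_bn_silu(c_in, c_mid, 1)
        self.fuse = conv_bn_silu(2 * c_mid, c_out, 1)

    def forward(self, x):
        return self.fuse(torch.cat([self.branch1(x), self.branch2(x)], dim=1))
```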

3.4. Improvement of Loss Function


YOLOv5s uses GIoU Loss as the bounding box regression loss function to judge the
distance between the predicted box and the ground truth box. The formula is as follows.

$$\mathrm{IoU} = \frac{A \cap B}{A \cup B} \tag{1}$$

$$\mathrm{GIoU} = \mathrm{IoU} - \frac{A_c - u}{A_c} \tag{2}$$

$$L_{\mathrm{GIoU}} = 1 - \mathrm{GIoU} \tag{3}$$

In the above formulas, A is the predicted box, B is the ground truth box, IoU represents the intersection-over-union ratio of the predicted box and the ground truth box, Ac represents the area of the smallest circumscribed rectangle of the predicted box and the ground truth box, u represents the area of their union, and LGIoU is the GIoU Loss.
The original YOLOv5 model uses GIoU Loss as a position loss function to evaluate the distance between the predicted box and the ground truth box, but GIoU Loss cannot distinguish the case where the predicted box lies entirely inside the ground truth box: predicted boxes of the same size then yield the same loss regardless of position. In addition, the bounding box regression is not accurate enough, the convergence speed is slow, and only the overlapping area relationship is considered. CIoU Loss takes into account the scale information of the aspect ratio of the bounding box, and measures it from the three perspectives of overlapping area, center point distance, and aspect ratio, which makes the predicted box regression more effective.

$$L_{loc} = 1 - \mathrm{IoU}(B, B^{gt}) + \frac{d^2}{c^2} + \alpha v \tag{4}$$

$$\alpha = \frac{v}{1 - \mathrm{IoU} + v} \tag{5}$$

$$v = \frac{4}{\pi^2}\left(\arctan\frac{w^{gt}}{h^{gt}} - \arctan\frac{w}{h}\right)^2 \tag{6}$$

In the above formulas, w and h are the width and height of the predicted box, respectively, and wgt and hgt are the width and height of the ground truth box; d is the distance between the center points of the two boxes, and c is the diagonal length of their smallest enclosing box.
Compared with the GIoU Loss used in YOLOv5s, CIoU Loss takes into account the
overlapping area, center point distance, and aspect ratio for measurement, so that the
network can ensure faster convergence of the prediction frame during training and obtain
higher regression positioning accuracy; this paper uses CIoU Loss as the loss function of
the aggregate classification detection model.
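For illustration, a minimal PyTorch transcription of Equations (4)–(6) (a sketch, not the authors' implementation; boxes are assumed to be (x1, y1, x2, y2) tensors):

```python
import math
import torch

def ciou_loss(pred, gt, eps=1e-7):
    # Intersection area and IoU (Eq. 1).
    x1 = torch.max(pred[..., 0], gt[..., 0])
    y1 = torch.max(pred[..., 1], gt[..., 1])
    x2 = torch.min(pred[..., 2], gt[..., 2])
    y2 = torch.min(pred[..., 3], gt[..., 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_g = (gt[..., 2] - gt[..., 0]) * (gt[..., 3] - gt[..., 1])
    iou = inter / (area_p + area_g - inter + eps)

    # d^2 / c^2: squared center distance over squared enclosing-box diagonal.
    d2 = ((pred[..., 0] + pred[..., 2]) - (gt[..., 0] + gt[..., 2])) ** 2 / 4 \
       + ((pred[..., 1] + pred[..., 3]) - (gt[..., 1] + gt[..., 3])) ** 2 / 4
    cw = torch.max(pred[..., 2], gt[..., 2]) - torch.min(pred[..., 0], gt[..., 0])
    ch = torch.max(pred[..., 3], gt[..., 3]) - torch.min(pred[..., 1], gt[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio term v (Eq. 6) and trade-off weight alpha (Eq. 5).
    v = (4 / math.pi ** 2) * (
        torch.atan((gt[..., 2] - gt[..., 0]) / (gt[..., 3] - gt[..., 1] + eps))
        - torch.atan((pred[..., 2] - pred[..., 0]) / (pred[..., 3] - pred[..., 1] + eps))
    ) ** 2
    alpha = v / (1 - iou + v + eps)

    return 1 - iou + d2 / c2 + alpha * v  # Eq. (4)
```

In practice the returned value would be averaged over all matched prediction/ground-truth pairs in a batch.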

4. Experimental Results and Analysis


4.1. Experimental Results
The YOLOv5-ytiny model detects aggregate as shown in Figure 7. Taking small stones
as an example, they are tested at different distances, different illuminations, and different
dry humidities of aggregate. It can be seen that Figure 7a–c are, respectively, the detection results on cloudy days, on sunny days, and at night at a distance of 1.5 m. Figure 7d–f are the test results at distances of 1 m, 1.5 m, and 2 m, respectively. Figure 7g–i show the identification of small stones under dry, normal, and wet conditions. The accurate classification of aggregate under different backgrounds is realized. After verification, the effective recognition range of the YOLOv5-ytiny model is between 1 m and 2 m.

Figure 7. Small stones under different distances, light, and dry humidity. (a) is the small stones on a cloudy day, (b) is the small stones on a sunny day, (c) is the small stones at night, (d) is the small stones at a distance of 1 m, (e) is the small stones at a distance of 1.5 m, (f) is the small stones at a distance of 2 m, (g) is the small stones in the dry state, (h) is the small stones in the normal state, (i) is the small stones in the wet state.
Table 1 shows the confidence levels of the four types of aggregate under different conditions. The inspection was carried out under different light conditions on sunny and cloudy days, and at night, and at different distances, namely 1 m, 1.5 m, and 2 m. When inspecting the aggregate, the dry and wet state of the aggregate was also varied: tests were carried out in dry, normal, and wet conditions.

Table 1. Detection confidence (%) in different states: lighting (sunny day, cloudy day, night), distance (1 m, 1.5 m, 2 m), and wet/dry state (dry, normal, moist).

Category             Sunny Day   Cloudy Day   Night   1 m   1.5 m   2 m   Dry   Normal   Moist
Stones               92          96           94      94    96      68    96    95       95
Small stones         72          93           61      94    93      93    93    88       86
Machine-made sand    86          80           78      90    80      83    80    83       89
Surface sand         95          95           86      94    95      70    95    96       95

In total, there were 300 iterations of the YOLOv5 model and of YOLOv5-ytiny, respectively. Figure 8 shows the change trend of the classification loss function. We can see that the original YOLOv5 loss function drops rapidly in the early stages of the iteration, indicating that the model is fitting quickly. The convergence speed of YOLOv5-ytiny in the early stage is slower than that of the original model. By 200 iterations, the loss values of the two models are basically the same, indicating that the convergence effect of YOLOv5-ytiny is good and the learning efficiency of the model is high. As the iterations continue, at about 140 iterations the model loss value decreases slowly, and when the iteration reaches 220, the loss value fluctuates around 0.001 and the model reaches a stable state.

Figure 8. Comparison of the epoch process of the two models. Note: the experiment mainly seeks to compress the model and reduce the space occupied by the model, so it is normal to have a certain impact on precision and function loss.
4.2. Evaluation and Analysis of Model Performance

This paper selects the commonly used evaluation indicators of target detection models: precision, recall, balanced F score (F1-score), mean average precision (mAP), and FPS (frames per second) to evaluate the model. The formulas are as follows:
$$P = \frac{TP}{TP + FP} \tag{7}$$

$$R = \frac{TP}{TP + FN} \tag{8}$$

$$F1\text{-}score = 2 \times \frac{P \times R}{P + R} \tag{9}$$

$$AP = \frac{\sum P}{Num(TotalObjects)} \tag{10}$$

$$mAP = \frac{\sum AP}{N(class)} \tag{11}$$
In the above formulas, TP is the number of positive examples that are correctly classified, FP is the number of negative examples that are incorrectly classified as positive, FN is the number of positive examples that are incorrectly classified as negative, and TN is the number of negative examples that are correctly classified. AP is the average precision, and mAP is the mean of the AP of each class. Num is the number of targets in each category, and N is the total number of categories.
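As a small worked check of Equations (7)–(9), the sketch below reproduces the F1-score reported in Section 4.3 from the paper's own precision and recall values:

```python
def f1_score(p, r):
    # Harmonic mean of precision and recall, Eq. (9).
    return 2 * p * r / (p + r)

# YOLOv5-ytiny: precision 96.5%, recall 98.5% -> F1-score of about 97.5%.
print(round(f1_score(0.965, 0.985), 3))  # 0.975
```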

4.3. Comparison with Original Model


The comparison of the evaluation indicators between the original YOLOv5 model
and the YOLOv5-ytiny model is shown in Figure 9. This shows the precision, recall, and
F1-score of the YOLOv5-ytiny and the original YOLOv5 model. The precision of YOLOv5-ytiny is 96.5%, which is 0.2% lower than the original model; the recall rate is 98.5%, 0.4% higher than the original YOLOv5 model; and the F1-score is 97.5%, which is 0.1% higher than the original model.

Figure 9. Comparison of model evaluation indicators.
Below, the YOLOv5-ytiny model is compared with some parameters of the original YOLOv5 model. As shown in Table 2, the total parameters of the YOLOv5-ytiny model are 19,755,583 fewer than the original YOLOv5 model. The mAP is 99.6%, which is consistent with the original YOLOv5 model. Precision is reduced by 0.2% compared to the original model, so there is no significant drop. YOLOv5-ytiny's storage space is 3.04 MB, which is 10.66 MB smaller than the original YOLOv5, and the calculation time is 0.04 s, which is 60% faster than YOLOv5. The data in Table 2 show that the mAP of the YOLOv5-ytiny model is consistent with that of the original YOLOv5, with a slight decrease in precision, while the calculation speed is greatly improved compared with the original YOLOv5 model.

Table 2. Improved model parameters.
Parameter YOLO v5 YOLOv5-Ytiny
Total parameter quantity/piece 21,375,645 1,620,062
mAP/% 99.6 99.6
Precision/% 96.7 96.5
Model storage space/MB 13.7 3.04
Computing time/s 0.1 0.04
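As a quick arithmetic check of Table 2, using only the paper's own numbers:

```python
# Parameter reduction and computing-time speed-up from Table 2.
total_yolov5, total_ytiny = 21_375_645, 1_620_062
print(total_yolov5 - total_ytiny)  # 19755583 fewer parameters, as stated
print(1 - 0.04 / 0.1)              # 0.6 -> computing time reduced by 60%
```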

4.4. Comparison with Other Target Detection Models


In the field of target detection, SSD, YOLOv4, and YOLOv4-tiny have high detection precision. In order to verify the effectiveness of this method, the training set of this paper was used to train these three models and the YOLOv5 model, respectively, and the test data set was used to evaluate the performance of these four algorithms and obtain the precision, recall, and F1-score of the four algorithms. The comparison results are shown in Figure 10.

Figure 10. Comparison of various algorithm evaluation indicators.

We can see from Figure 10 that among the five algorithms, the SSD algorithm has the highest precision rate, while the recall rate and F1-score of the algorithm in this paper are the highest. The F1-score is defined as the harmonic mean of the precision and recall rates. It is a measure for classification problems, and in some machine learning competitions for multi-classification problems, the F1-score is often used as the final evaluation metric. This shows the superiority of YOLOv5-ytiny.

Table 3 compares the comprehensive evaluation indicators of the five algorithms: the precision, mAP, model storage space, and FPS of these five algorithms are compared, respectively. The comparison results are shown in Table 3.

Table 3. Comparison of comprehensive evaluation of algorithms.

Network        Precision/%   mAP/%   Model Storage Space/MB   FPS (frame/s)
YOLOv4         88            99.62   244.24                   0.4
YOLOv4-tiny    82            91.78   22.47                    0.4
SSD            97.54         96.94   92.6                     0.65
YOLOv5         96.7          99.6    13.7                     9.17
YOLOv5-ytiny   96.5          99.6    3.04                     22.73
v5-ytinywith the other
96.5 four models, 99.6 the detection3.04
speed of YOLOv5-ytiny22.73 is faster.
The mAP of the improved YOLOv5-ytiny model is the same as that of the original YOLOv5
model, and the with
Compared model storage
the spacemodels,
other four is reduced by 78%. Inspeed
the detection termsofofYOLOv5-ytiny
the detection speed, the
is faster.
method in this paper has the fastest detection speed. The precision of
The mAP of the improved YOLOv5-ytiny model is the same as that of the original the SSD and YOLOv5
models is slightly higher than that of YOLOv5-ytiny. The precision of the YOLOv5-ytiny
model is 8.5% and 14.5% higher than that of YOLOv4 and YOLOv4-tiny, respectively. In
terms of the models’ storage space and detection speed, YOLOv5-ytiny has an absolute
advantage, and while improving the detection speed, the mAP of the YOLOv5-ytiny model
is consistent with the original YOLOv5 model, and the precision of the model has not
dropped significantly.
In summary, compared with the other four models, YOLOv5-ytiny has a smaller
model storage space and a faster detection speed. The detection speed is higher than that
of YOLOv4, YOLOv4-tiny, SSD, and YOLOv5, 22.33 f/s, 22.33 f/s, 22.08 f/s, and 13.56 f/s,
respectively. The detection precision of YOLOv5-ytiny is high, the detection speed is fast,
and the improved space is small, which proves that the aggregate detection classification
model YOLOv5-ytiny, based on the improved YOLOv5, has good practicability.

4.5. Practical Application

The experiment was conducted in cooperation with Zhengzhou Sanhe Hydraulic Machinery Co., Ltd. The experimental method was applied to the concrete batching plant for the preparation of concrete raw materials. The model was consistent with the one used in this experiment, but the labels differed: we divided the results into four states, namely null (representing the unloaded state), the complete unloading state, melon stones, and stones12. The experimental results are shown in Figure 11a–d. The number on the image label is the probability of recognition.

Figure 11. Examples of on-site aggregate inspection applications. (a) is the unloaded state, (b) is the state of completion of unloading, (c) represents that the aggregate is melon stones, (d) represents that the aggregate is 12 stones.

5. Discussion

Although tremendous progress has been made in the field of object detection recently, it remains a difficult task to detect and identify objects accurately and quickly. Yan et al. [30] named YOLOv5 as the most powerful object detection algorithm in present times. In the current study, the overall performance of YOLOv5 was better than that of YOLOv4 and YOLOv3. This finding is in line with previous research, as we found several studies comparing YOLOv5 to previous versions of YOLO, such as YOLOv4 or YOLOv3. According to a study by Nepal [40], YOLOv5 is more accurate and faster than YOLOv4. YOLOv5 was compared to YOLOv3 and YOLOv4 for picking apples by robots, and the mAP was increased by 14.95% and 4.74%, respectively [30]. Similar results and comparisons with other YOLO models were demonstrated by [32] while using YOLOv5 to detect the behavior of cage-reared laying ducks. The recall (73%) and precision (62%) of YOLOv5 were better than those of YOLOv3-tiny (57% and 45%, respectively) for ship detection in satellite remote sensing images [31]. In experiments on grape variety detection, YOLOv5 had higher F1 scores than YOLOv4-tiny [41]. In our experiment, YOLOv5 also showed better results than YOLOv4 and YOLOv4-tiny. On the other hand, we encountered various studies showing that YOLO outperforms SSD among object detection deep learning methods. In traffic sign recognition, Zhu et al. [33] used the same data set, and the results showed that the mAP of YOLOv5 was 7.56% higher than that of SSD, and YOLOv5 was also better than SSD in terms of recognition speed. In addition, YOLOv5 was found to have better recognition accuracy than SSD [34] when detecting strawberry ripeness. In this experiment, YOLOv5 also shows better results in inference speed and mAP compared to SSD. In many studies, YOLOv5 outperforms SSD in terms of speed and accuracy [35]. In this experiment, the YOLOv5-ytiny model based on an improved YOLOv5 has advantages in both speed and mAP. Furthermore, given the previous discussion, we believe that choosing to improve YOLOv5 for aggregate identification is a wise move.

6. Conclusions
The aggregate detection classification model YOLOv5-ytiny is based on the improvement of YOLOv5. In order to adapt to the complex environmental factors in the detection process, we trained on four types of aggregate under different light, different wet and dry conditions, and different detection distances to achieve the real-time classification of aggregate. YOLOv5-ytiny uses CIoU as the loss function of the bounding box regression to improve the precision of the box regression. We modified the network structure of the Backbone C3 of YOLOv5. Under the premise of maintaining the mean average precision and precision, reducing the number of YOLOv5 detection heads simplifies the model, reduces the amount of calculation, and improves the detection speed of the model. The experiment shows that the model storage space is reduced by 10.66 MB compared with YOLOv5, and the detection speed is 60% higher than that of the original YOLOv5 model.
By comparing the experimental results of the proposed YOLOv5-ytiny model with the
object detection networks of SSD, YOLOv4, and YOLOv4-tiny, it can be proved that the
strategy proposed in this study can effectively improve the detection precision. Meanwhile,
the detection speed of 22.73 FPS enables the YOLOv5-ytiny model to be applied to the
industrial production of real-time aggregate classification.
In the experiment, on a CPU-only computer device, the original YOLOv5 model occupies a large amount of space and its inference speed is slow. After the proposed YOLOv5-ytiny is compressed, the model space is 3.04 MB, and the inference speed can reach 22.73 FPS, which can meet the actual requirements.

Author Contributions: S.Y. (Sheng Yuan), methodology; Y.D., writing—original draft preparation;
M.L., supervision; S.Y. (Shuang Yue), validation; B.L., software; H.Z., resources. All authors have
read and agreed to the published version of the manuscript.
Funding: This research was funded by Major Science and Technology Project of Henan Province,
Zhengzhou major scientific and technological innovation special project, Key scientific research project
plan of colleges and universities in Henan Province, grant numbers are 201110210300, 2019CXZX0050,
and 21A510007. The APC was funded by Major Science and Technology Project of Henan Province,
Zhengzhou major scientific and technological innovation special project, Key scientific research
project plan of colleges and universities in Henan Province.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Acknowledgments: Materials used for experiments are supported by Zhengzhou Sanhe Hydraulic
Machinery Co., Ltd.
Conflicts of Interest: The authors declare no conflict of interest. The funders' role in the study was to provide data used in the writing of the manuscript.

References
1. Yilmaz, M.; Tugrul, A. The effects of different sandstone aggregates on concrete strength. Constr. Build. Mater. 2012, 35, 294–303.
[CrossRef]
2. Jiangwei, B.; Wenbing, Z.; Zhenzhong, S.; Song, L.; Zhanglan, C. Analysis and optimization of mechanical properties of recycled
concrete based on aggregate characteristics. Sci. Eng. Compos. Mater. 2021, 28, 516–527. [CrossRef]
3. Yan, S.; Xla, B.; Sza, B.; Zl, C.; My, D.; Cl, C. Mechanical properties, durability, and ITZ characteristics of full-grade dam concrete prepared by aggregates with surface rust stains. Constr. Build. Mater. 2021, 305, 124798.
4. Aksakal, E.L.; Angin, I.; Sari, S. A new approach for calculating aggregate stability: Mean weight aggregate stability (MWAS).
Catena 2020, 194, 104708. [CrossRef]
5. Wang, D.; Liu, G.; Li, K.; Wang, T.; Shrestha, A.; Martek, I.; Tao, X. Layout optimization model for the production planning of
precast concrete building components. Sustainability 2018, 10, 1807. [CrossRef]
6. Sicakova, A. Effect of Aggregate Size on Recycled Aggregate Concrete under Equivalent Mortar Volume Mix Design. Appl. Sci.
2021, 11, 11274.

7. Ding, X.; Ma, T.; Gao, W. Morphological characterization and mechanical analysis for coarse aggregate skeleton of asphalt mixture
based on discrete-element modeling. Constr. Build. Mater. 2017, 154, 1048–1061. [CrossRef]
8. Zhan, D.A.; Pl, B.; Xu, W. Evaluation of the contact characteristics of graded aggregate using coarse aggregate composite geometric indexes. Constr. Build. Mater. 2020, 247, 118608. [CrossRef]
9. Hong, L.; Gu, X.L.; Lin, F. Effects of coarse aggregate form, angularity, and surface texture on concrete mechanical performance.
J. Mater. Civ. Eng. 2019, 31, 10. [CrossRef]
10. Sadjad, N.; Mingzhong, Z. Meso-scale modelling of static and dynamic tensile fracture of concrete accounting for real-shape
aggregates. Cem. Concr. Compos. 2021, 116, 103889. [CrossRef]
11. Mohd, S.S.; Roslan, Z.A.; Nor, A.Z.; Mohammad, F.; Ahmad, F.; Ahmad, J. Revisiting the automated grain sizing technique (AGS)
for characterizing grain size distribution. Int. J. River Basin Manag. 2021, 37, 974–980.
12. Pei, L.; Sun, Z.; Yu, T.; Li, W.; Hao, X.; Hu, Y. Pavement aggregate shape classification based on extreme gradient boosting. Constr. Build. Mater. 2020, 256, 119356. [CrossRef]
13. Park, S.S.; Lee, J.S.; Lee, D.E. Aggregate Roundness Classification Using a Wire Mesh Method. Materials 2020, 13, 3682. [CrossRef]
[PubMed]
14. Mehdi, K. Assessment of strength of individual ballast aggregate by conducting point load test and establishment of classification
method. Int. J. Rock Mech. Min. Sci. 2021, 141, 104711. [CrossRef]
15. Schmidt, T.; Linders, J.; Mayer, C.; Cölfen, H. Determination of Particle Size, Core and Shell Size Distributions of Core–Shell
Particles by Analytical Ultracentrifugation. Part. Part. Syst. Charact. 2021, 38, 2100079. [CrossRef]
16. Bagheri, G.H.; Bonadonna, C.; Manzella, I.; Vonlanthen, P. On the characterization of size and shape of irregular particles. Powder
Technol. 2015, 270, 141–153. [CrossRef]
17. Agimelen, O.S.; Jawor, B.A.; McGinty, J.; Dziewierz, J.; Tachtatzis, C.; Cleary, A.; Mulholland, A.J. Integration of in situ imaging
and chord length distribution measurements for estimation of particle size and shape. Chem. Eng. Sci. 2016, 144, 87–100.
[CrossRef]
18. Yang, J.H.; Fang, H.Y.; Chen, S.J. Development of particle size and shape measuring system for machine-made sand. Part. Sci.
Technol. 2019, 37, 974–980. [CrossRef]
19. Isa, N.A.M.; Sani, Z.M.; AlBatah, M.S. Automated intelligent real-time system for aggregate classification. Int. J. Miner. Process. 2011, 100, 41–50. [CrossRef]
20. Sun, Z.; Li, Y.; Pei, L.; Li, W.; Hao, X. Classification of Coarse Aggregate Particle Size Based on Deep Residual Network. Symmetry
2022, 14, 349. [CrossRef]
21. Moaveni, M.; Wang, S.; Hart, J.M.; Tutumluer, E.; Ahuja, N. Evaluation of aggregate size and shape by means of segmentation
techniques and aggregate image processing algorithms. Transp. Res. Rec. 2013, 2335, 50–59. [CrossRef]
22. Sinecen, M.; Topal, A.; Makinaci, M.; Baradan, B. Neural network classification of aggregate by means of line laser based 3D
acquisition. Expert Syst. 2013, 30, 333–340. [CrossRef]
23. Moon, K.H.; Falchetto, A.C.; Wistuba, M.P.; Jeong, J.H. Analyzing aggregate size distribution of asphalt mixtures using simple 2D
digital image processing techniques. Arab. J. Sci. Eng. 2015, 40, 1309–1326. [CrossRef]
24. Ozturk, H.I.; Rashidzade, I. A photogrammetry based method for determination of 3D morphological indices of coarse aggregate.
Constr. Build. Mater. 2020, 262, 120794. [CrossRef]
25. Pan, T.; Tutumluer, E. Evaluation of Visual Based Aggregate Shape Classifications Using the University of Illinois Aggregate
Image Analyzer (UIAIA). In Proceedings of the GeoShanghai International Conference 2006, Shanghai, China, 11 May 2006;
pp. 203–211. [CrossRef]
26. Wang, C.Y.; Liao, H.Y.M.; Yeh, I.H. Exploring the power of lightweight YOLOv4. In Proceedings of the IEEE/CVF International
Conference on Computer Vision, Montreal, QC, Canada, 11 October 2021; pp. 779–788.
27. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 21–37.
28. Jocher, G.; Stoken, A.; Borovec, J.; Chaurasia, A.; Xie, T. ultralytics/yolov5: v5.0 - YOLOv5-P6 1280 models, AWS, Supervise.ly and YouTube integrations. Zenodo 2021, 4679653. [CrossRef]
29. Yao, J.; Qi, J.; Zhang, J.; Shao, H.; Yang, J.; Li, X. A Real-time detection algorithm for kiwifruit defects based on YOLOv5. Electronics
2021, 10, 1711. [CrossRef]
30. Yan, B.; Fan, P.; Lei, X.; Liu, Z.; Yang, F. A Real-Time Apple Targets Detection Method for Picking Robot Based on Improved
YOLOv5. Remote Sens. 2021, 13, 1619. [CrossRef]
31. Yao, Y.; Jiang, Z.; Zhang, H. Ship detection in optical remote sensing images based on deep convolutional neural networks.
J. Appl. Remote Sens. 2017, 11, 042611. [CrossRef]
32. Gu, Y.; Wang, S.; Yan, Y.; Tang, S.; Zhao, S. Identification and Analysis of Emergency Behavior of Cage-Reared Laying Ducks Based on YOLOv5. Agriculture 2022, 12, 485. [CrossRef]
33. Zhu, Y.; Yan, W.Q. Traffic sign recognition based on deep learning. Multimed. Tools Appl. 2022, 81, 17779–17791. [CrossRef]
34. Fan, Y.; Zhang, S.; Feng, K.; Qian, K.; Wang, Y.; Qin, S. Strawberry Maturity Recognition Algorithm Combining Dark Channel
Enhancement and YOLOv5. Sensors 2022, 22, 419. [CrossRef] [PubMed]
35. Lema, D.G.; Pedrayes, O.D.; Usamentiaga, R.; García, D.F.; Alonso, Á. Cost-Performance Evaluation of a Recognition Service of
Livestock Activity Using Aerial Images. Remote Sens. 2021, 13, 2318. [CrossRef]

36. Jhong, S.Y.; Chen, Y.Y.; Hsia, C.H.; Lin, S.C.; Hsu, K.H.; Lai, C.F. Nighttime object detection system with lightweight deep network
for internet of vehicles. J. Real-Time Image Proc. 2021, 18, 1141–1155. [CrossRef]
37. Wang, C.; Wang, H.; Yu, F.; Xia, W. A High-Precision Fast Smoky Vehicle Detection Method Based on Improved Yolov5 Network.
In Proceedings of the 2021 IEEE International Conference on Artificial Intelligence and Industrial Design (AIID) IEEE, Bandung,
Indonesia, 28 May 2021; pp. 255–259.
38. Wu, W.; Liu, H.; Li, L.; Long, Y.; Wang, X.; Wang, Z.; Chang, Y. Application of local fully Convolutional Neural Network combined
with YOLOv5 algorithm in small target detection of remote sensing image. PLoS ONE. 2021, 16, e0259283. [CrossRef] [PubMed]
39. Song, Q.; Li, S.; Bai, Q.; Yang, J.; Zhang, X.; Li, Z.; Duan, Z. Object Detection Method for Grasping Robot Based on Improved
YOLOv5. Micromachines 2021, 12, 1273. [CrossRef] [PubMed]
40. Nepal, U.; Eslamiat, H. Comparing YOLOv3, YOLOv4 and YOLOv5 for Autonomous Landing Spot Detection in Faulty UAVs.
Sensors 2022, 22, 464. [CrossRef] [PubMed]
41. Sozzi, M.; Cantalamessa, S.; Cogato, A.; Kayad, A.; Marinello, F. Automatic Bunch Detection in White Grape Varieties Using
YOLOv3, YOLOv4, and YOLOv5 Deep Learning Algorithms. Agronomy 2022, 12, 319. [CrossRef]
