Abstract: Despite significant progress in vision-based detection methods, detecting traffic objects in foggy weather remains challenging. Fog reduces visibility, which in turn degrades the information about traffic objects captured in videos. However, accurate information regarding the localization and classification of traffic objects is crucial for certain traffic investigations. In this article, we present a multiclass object detection method, namely, the multiscale feature fusion attention-you only look once (MSFFA-YOLO) network, which can be trained to jointly achieve three tasks: visibility enhancement, object classification, and object localization. In the network, we employ the enhanced YOLOv7 as a detection subnet, which is responsible for learning to locate and classify objects. In the restoration subnet, the MSFFA structure is presented for visibility enhancement. The experimental results on the synthetic foggy datasets show that the presented MSFFA-YOLO achieves 64.6% accuracy on the FC005 dataset, 67.3% accuracy on the FC01 dataset, and 65.7% accuracy on the FC02 dataset. When evaluated on the natural foggy datasets, the presented MSFFA-YOLO achieves 84.7% accuracy on the RTTS dataset and 84.1% accuracy on the RW dataset, indicating its ability to accurately detect multiclass traffic objects in real foggy weather. The experimental results also show that the presented MSFFA-YOLO runs at 37 frames per second (FPS). Finally, the experimental results demonstrate the excellent performance of our presented method for object localization and classification in foggy weather, and its superior accuracy when detecting concealed traffic objects in foggy weather. These results substantiate the applicability of our presented method for traffic investigations in foggy weather.

Index Terms: Feature attention, feature fusion, foggy weather, object detection, you only look once (YOLO).

I. INTRODUCTION

In recent years, deep learning-based object detection methods [1], [2], [3], [4], [5] have exhibited remarkable performance in some traffic investigations. Nevertheless, the effectiveness of certain methods can be considerably compromised in foggy conditions. The issue has attracted considerable attention, especially in some traffic scenes [6], where accurate object information is crucial for traffic investigations related to safety [7], management [8], and other aspects [9], [10].

Acquiring accurate object information is challenging due to the adverse effects of fog [11]. The presence of fog can lead to reduced visibility, blurred information, increased noise, and other degradations in traffic videos [12]. These factors can significantly impact the accuracy of object detection results, posing substantial challenges in this study.

One of the challenges is to detect multiclass traffic objects, such as cars, bicycles, buses, and others [12], [13], [14]. In foggy conditions, reduced contrast and diminished clarity in object details may arise [11], subsequently impacting the localization and classification of traffic objects. In addition, the concealment of certain traffic objects in the fog may pose a significant obstacle to the accurate detection of multiclass traffic objects [5], [15], [16].

In this article, we present a multiclass object detection method for traffic investigations in foggy weather. The main contributions of this article can be summarized as follows.

1) We present a novel object detection method, namely, the multiscale feature fusion attention-you only look once (MSFFA-YOLO) network, that includes multiscale feature fusion, a feature attention mechanism, and the enhanced YOLOv7 to improve the accuracy of object detection in foggy weather.
2) Our presented method achieves exceptional performance in object localization and classification tasks, while also demonstrating superior accuracy in detecting concealed traffic objects in foggy conditions.
3) Our presented method can contribute to traffic investigations by providing accurate and reliable information about traffic objects in foggy weather, holding the potential to enhance traffic safety and optimize traffic management.

Furthermore, the effectiveness of our presented method is demonstrated through extensive experiments and benchmark evaluations. The experimental results consistently show that the accuracy of our presented method surpasses that of the other comparison methods.

Manuscript received 10 April 2023; revised 18 August 2023; accepted 7 September 2023. Date of publication 25 September 2023; date of current version 11 October 2023. This work was supported in part by the National Natural Science Foundation of China under Grant 52272344 and in part by the Fundamental Research Funds for the Central Universities. The Associate Editor coordinating the review process was Dr. Jianbo Yu. (Corresponding author: Xiaojian Hu.)
The authors are with the Jiangsu Key Laboratory of Urban ITS, the Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, and the School of Transportation, Southeast University, Nanjing 211189, China (e-mail: 230198700@seu.edu.cn; huxiaojian@seu.edu.cn).
Digital Object Identifier 10.1109/TIM.2023.3318671
1557-9662 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: Universitas Brawijaya. Downloaded on January 01,2024 at 01:59:17 UTC from IEEE Xplore. Restrictions apply.
2528712 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 72, 2023
The remainder of this article is organized as follows. In Section II, we review related works on object detection and dehazing methods, as well as the challenges of multiclass object detection in foggy weather. In Section III, we describe our presented method in detail. In Section IV, we present experimental results and analysis. Finally, we summarize our work in Section V.

II. RELATED WORKS

To enhance the accuracy of object detection in foggy weather, numerous studies have been conducted on dehazing and object detection methods. In this section, we provide a brief overview of excellent techniques, with a particular focus on deep learning-based object detection methods [17], [18], [19], [20], [21], [22] that have exhibited outstanding performance in these fields.

probabilities. However, if there are dense objects and small objects, the detection performance is poor.

The other members of the YOLO family achieve better detection performance through a series of improvements. YOLOv5 methods [16], [27], [28] include YOLOv5n, YOLOv5s, YOLOv5m, and YOLOv5x. Notably, these methods have achieved advancements in terms of accuracy, speed, and flexibility. Among them, YOLOv5x stands out with its exceptional performance. In the pursuit of enhancing the accuracy and efficiency of YOLO methods [16], [29], substantial advancements have been made, notably the introduction of YOLOv7 [5].

Although the above methods improve detection performance in fine weather to a certain extent, they may be insufficient for locating and classifying some traffic objects in foggy and challenging conditions.
ZHANG AND HU: MSFFA-YOLO NETWORK: MULTICLASS OBJECT DETECTION FOR TRAFFIC INVESTIGATIONS 2528712
it. Chen et al. [47] presented the end-to-end gated context aggregation network to restore the final haze-free image. Ren et al. [48] presented a gated fusion network to restore a clear image from a hazy input. Li et al. [49] presented the AOD-Net method for image dehazing using a CNN.

Although the above methods can improve the performance of dehazing to a certain extent, they may pay insufficient attention to important information, which can affect the dehazing performance.

C. Challenges of Multiclass Object Detection in Foggy Weather

Object detection in foggy weather is still an open and challenging issue that requires further research. To address this issue, some excellent methods [3], [4], [5], [11], [15], [23], [29], [42], [43], [44], [49], [50], [51], [52], [53] can be adopted to improve the performance of object detection in foggy conditions. Some of the approaches are: 1) using synthetic fog datasets to augment the training data and enhance the generalization ability of the method; 2) modifying the backbone network or adding new modules to adapt to foggy scenes; and 3) applying attention mechanisms or postprocessing techniques to suppress noise and highlight salient features.

The issue of inaccurate localization and classification has captured our attention, particularly when dealing with certain traffic objects in foggy videos. Some classes of traffic objects, such as cars, pedestrians, bicycles, and motorcycles, are often significant participants in traffic accidents [54]. For these classes of traffic objects, the results of object detection can be essential for traffic investigations and the prevention of traffic accidents. In foggy conditions, details in the images become blurred, and object information may even be hard to perceive [11], making it challenging to detect multiple classes of traffic objects accurately [12]. Therefore, how to accurately detect multiclass traffic objects in foggy and challenging conditions has emerged as a critical concern.

To achieve accurate localization and classification of traffic objects in foggy and challenging conditions, we present a multiclass object detection method, namely, the MSFFA-YOLO network. The MSFFA-YOLO network incorporates the enhanced YOLOv7 as the detection subnet, responsible for object localization and classification. In addition, we present a restoration subnet that utilizes the MSFFA structure to enhance visibility. Finally, our presented method is verified to be applicable for traffic investigations in foggy weather. This is the focus of this article.

III. METHODS

In this section, we elaborate on the details of our presented method, namely, MSFFA-YOLO. In terms of methodology, the challenges associated with object detection in foggy weather arise from the suboptimal performance exhibited by conventional methods in object localization and classification tasks. Therefore, our presented method aims to enhance the performance of object localization and classification in foggy weather, thereby ensuring its effectiveness and applicability in traffic investigations. The architecture of MSFFA-YOLO is shown in Fig. 1.

A. Detection Subnet

As discussed in Section II, many related object detection methods have been presented. However, these methods may be insufficient for locating and classifying some traffic objects in foggy and challenging conditions, where the presence of fog can constrain both feature extraction and feature expression [11], [49].

Hence, to mitigate this concern, we employ the presented detection subnet to bolster the performance of feature extraction, feature expression, and predictions. Several crucial structures in the detection subnet are described below.

1) CBS Structure: Conv-BN-sigmoid linear unit (SiLU) (CBS) includes a convolutional layer, a batch normalization layer, and a SiLU function. The SiLU function can be expressed as follows:

$$\mathrm{SiLU}(x) = x \cdot \sigma(x) \tag{3}$$

where $x$ is the input and $\sigma$ is the sigmoid function.

2) Efficient Layer Aggregation Network: As shown in Fig. 2(a), the efficient layer aggregation network (ELAN) is utilized as the computational block for the backbone. This integration of ELAN contributes to the overall stability and applicability of the network architecture.

The ELAN_2 structure is shown in Fig. 2(b). The ELAN_2 structure has two branches. One branch changes the number of channels by a 1 × 1 convolution. In the remaining branch, the number of channels is modified using a 1 × 1 convolutional layer, followed by the application of four 3 × 3 convolutional layers. ELAN_2 can be adopted to facilitate the capture of various object features.

3) MPConv Structure: The max-pooling convolution (MPConv) structure is introduced in the detection subnet. Due to its advantages, such as excellent computational efficiency and effective feature extraction, the MPConv structure can contribute to achieving excellent results in detection tasks. Especially in foggy conditions, the MPConv structure can be adopted to facilitate the extraction of object features, thereby improving the detection performance.

As shown in Fig. 3, in the MPConv structure, the max-pooling (MP) layer is introduced to form the upper and lower branches. The upper branch halves the length and width of the image through the MP layer and halves the number of image channels through the CBS layer. The lower branch halves the number of image channels through the first CBS layer, and halves the length and width of the image through the second CBS layer. Finally, the entire structure serves to strengthen the feature extraction ability of the network.

4) Introduction of SPPCSPC: As shown in Fig. 4, the SPPCSPC consists of two branches: an upper branch that typically includes three MP layers and several CBS structures, and a lower branch that usually includes only one CBS structure. In the SPPCSPC structure, the MP layers down-sample the input feature maps, while the CBS structures process feature information from the input feature maps. Both the upper and lower branches of the SPPCSPC structure are important, as they are responsible for feature fusion, which can improve the detection performance. The
Fig. 2. ELAN and ELAN_2. (a) ELAN structure. (b) ELAN_2 structure.
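Two pieces of the detection subnet described above lend themselves to a small illustration: the SiLU activation of Eq. (3) and the shape bookkeeping of the MPConv structure. The sketch below uses only the Python standard library. The function names are hypothetical, and the channel-wise concatenation of the two MPConv branches is an assumption, since the text says only that an upper and a lower branch are formed.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large negative x."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def silu(x: float) -> float:
    """SiLU(x) = x * sigmoid(x), as in Eq. (3)."""
    return x * sigmoid(x)


def mpconv_output_shape(channels: int, height: int, width: int):
    """Trace (C, H, W) through the two MPConv branches described in the text."""
    # Upper branch: the MP layer halves H and W, then a CBS layer halves C.
    upper = (channels // 2, height // 2, width // 2)
    # Lower branch: the first CBS halves C, the second CBS halves H and W.
    lower = (channels // 2, height // 2, width // 2)
    # Assumption: the two branches are concatenated along the channel axis.
    return (upper[0] + lower[0], upper[1], upper[2])


print(silu(1.0))
print(mpconv_output_shape(256, 80, 80))  # (256, 40, 40)
```

Under the concatenation assumption, MPConv halves the spatial resolution while leaving the channel count unchanged.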
each convolution operation. For the down-sampling operation, the sampled kernel size is 2 × 2.

2) Feature Conversion Structure: As shown in Fig. 5, to achieve a balance between network accuracy and computational efficiency, the feature conversion structure includes 18 two-layer residual blocks. The two-layer residual block comprises convolutional layers along with the ReLU function. The kernel size in the convolution operation is 3 × 3, and the stride of the convolution is 1. The ReLU function is adopted as the activation function. The input of the feature conversion structure is the output of the encoder structure.

3) Feature Attention Structure: As shown in Fig. 6, the feature attention structure includes a channel attention structure and a pixel attention structure. The channel attention structure includes the pooling layer, the convolutional layers, the nonlinear ReLU function, and the nonlinear sigmoid function. The pixel attention structure includes two convolutional layers, a nonlinear ReLU function, and a nonlinear sigmoid function.

Fig. 6. Feature attention structure.

Regarding the channel attention structure, the calculation process can be expressed as follows:

$$g_c = H_p(F_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_c(i, j) \tag{5}$$

$$CA_c = \sigma(\mathrm{Conv}(\delta(\mathrm{Conv}(g_c)))) \tag{6}$$

$$F_c^{*} = CA_c \otimes F_c \tag{7}$$

where $g_c$ is the output feature map of the $c$th channel; $H_p$ is the pooling function; $F_c$ is the input feature map of the $c$th channel; $H$ is the height of the feature map; $W$ is the width of the feature map; $X_c(i, j)$ is the value at pixel $(i, j)$ in the feature map of the $c$th channel; Conv is the convolutional layer, and the size of the convolutional kernel is 3 × 3; $\delta$ is the ReLU function; $\sigma$ is the sigmoid function; $CA_c$ is the weight of the $c$th channel; and $F_c^{*}$ is the feature map processed by the channel attention structure.

$$PA = \sigma(\mathrm{Conv}(\delta(\mathrm{Conv}(F^{*})))) \tag{8}$$

$$\bar{F} = F^{*} \otimes PA \tag{9}$$

where $U_i$ is the feature map of the $i$th layer in the decoding stage; and $\mathrm{Conv}_{de}$ represents the convolution in the decoding stage. For this convolution, the size of the convolution kernel is 3 × 3, the stride is 1, and the number of channels after the convolution is half the number of channels of the previous feature map. There is a nonlinear ReLU function after each convolution operation. In the up-sampling operation, the width and height of the feature map are doubled.

C. Postprocessing Algorithm

With the original NMS algorithm, missed detections are prone to occur in scenes with dense traffic objects. To overcome this issue, the Soft-NMS algorithm [56] can be adopted. The process of the algorithm can be expressed as follows.

Algorithm 1 Presented Postprocessing Algorithm
Input: B = {b_1, ..., b_N}, S = {s_1, ..., s_N}, N_t
1: D ← {}
2: while B ≠ empty do
3:   m ← argmax S
4:   M ← b_m
5:   D ← D ∪ M; B ← B − M
6:   for b_i in B do
7:     if iou(M, b_i) ≥ N_t then   # NMS
8:       B ← B − b_i; S ← S − s_i
9:     end
10:    s_i ← s_i f(iou(M, b_i))   # Soft-NMS
11:  end
12: end
Output: D, S

In the algorithm, B is the set of initial detection boxes {b_1, ..., b_N}; S contains the corresponding detection scores; N_t represents the threshold; and M represents the detection box with the maximum score.
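The Soft-NMS branch of Algorithm 1 can be sketched in plain Python as follows. This is an illustrative sketch rather than the authors' implementation. The re-scoring function f is left unspecified in the text, so the Gaussian decay exp(-iou^2 / sigma) is an assumption, as are the parameter names `sigma` and `score_floor` and the (x1, y1, x2, y2) box format.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, sigma=0.5, score_floor=0.001):
    """Return (kept_boxes, kept_scores) in the order the boxes are selected."""
    boxes, scores = list(boxes), list(scores)
    kept_boxes, kept_scores = [], []
    while boxes:
        # Steps 3-5: move the highest-scoring box M from B to D.
        m = max(range(len(scores)), key=scores.__getitem__)
        best_box, best_score = boxes.pop(m), scores.pop(m)
        kept_boxes.append(best_box)
        kept_scores.append(best_score)
        # Step 10: decay the remaining scores by their overlap with M
        # (Gaussian decay, an assumed choice of f).
        scores = [s * math.exp(-(iou(best_box, b) ** 2) / sigma)
                  for b, s in zip(boxes, scores)]
        # Assumption: drop boxes whose decayed score falls below a small floor.
        survivors = [(b, s) for b, s in zip(boxes, scores) if s >= score_floor]
        boxes = [b for b, _ in survivors]
        scores = [s for _, s in survivors]
    return kept_boxes, kept_scores


demo_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
demo_scores = [0.9, 0.8, 0.7]
print(soft_nms(demo_boxes, demo_scores))
```

In the demo, the second box overlaps the first heavily, so its score is reduced instead of the box being discarded outright. This is the behavior that helps in dense scenes, where hard NMS would suppress a genuine neighboring object.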
TABLE I
ANCHOR SIZE CORRESPONDING TO DIFFERENT DATA

TABLE III
ABLATION STUDY ON FOGGY CITYSCAPES

TABLE IV
RUNTIME OF SOFT-NMS ON FOGGY CITYSCAPES

TABLE V
COMPARISON OF DETECTION PERFORMANCE ON FOGGY CITYSCAPES AND RTTS

To conduct experiments, three fog concentrations of Foggy Cityscapes are synthesized from the Cityscapes data using the atmospheric scattering model and depth information. The generated datasets are FC005, FC01, and FC02, corresponding to fog densities of 0.005, 0.01, and 0.02, respectively. The dataset split is given in Table II, while example images and corresponding ground truths are presented in Fig. 7.

2) Natural Data: As shown in Fig. 8 and Table II, the RTTS [57] dataset is a public dataset comprising 4322 natural foggy images with annotated object classes.

C. Ablation Study

In this section, we present the findings and analysis of the ablation experiments. As shown in Table III, the baseline method is YOLOv7, and we conduct ablation tests on Foggy Cityscapes (FC005, FC01, and FC02). The runtime of the Soft-NMS algorithm is reported in Table IV.

On the FC005 dataset, introducing MSFFA increases the accuracy by 5.6% compared with the baseline method, and introducing Soft-NMS increases the accuracy by 0.8% compared with the baseline method. Finally, 64.6% accuracy (mAP@0.5) is obtained on the FC005 dataset. The Soft-NMS algorithm runs in 10.5 ms for the 640 × 640 image size and in 13.4 ms for the 1280 × 1280 image size.
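As background for the FC005, FC01, and FC02 datasets used in this ablation, the standard atmospheric scattering model can be sketched as follows. The foggy intensity is I = J·t + A·(1 − t) with transmission t = exp(−β·d). The sketch assumes that the "fog density" values 0.005, 0.01, and 0.02 play the role of the attenuation coefficient β and that depth d is in metres. The function names, the scalar pixel representation, and the airlight value A = 1.0 are hypothetical choices for illustration.

```python
import math


def transmission(depth_m: float, beta: float) -> float:
    """Fraction of scene radiance that survives a path of depth_m metres."""
    return math.exp(-beta * depth_m)


def add_fog(pixel: float, depth_m: float, beta: float, airlight: float = 1.0) -> float:
    """Apply I = J * t + A * (1 - t) to one normalized pixel intensity J."""
    t = transmission(depth_m, beta)
    return pixel * t + airlight * (1.0 - t)


# Same dark pixel (J = 0.2) at 100 m under the three fog densities.
for beta in (0.005, 0.01, 0.02):
    print(beta, round(transmission(100.0, beta), 3), round(add_fog(0.2, 100.0, beta), 3))
```

As β grows, the transmission at a fixed depth shrinks and the pixel is pulled toward the airlight value, which is why distant objects lose contrast first in the denser FC02 setting.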
TABLE VI
PERFORMANCE EVALUATION FOR EACH OBJECT CLASS ON FOGGY CITYSCAPES

TABLE VII
PERFORMANCE EVALUATION FOR EACH OBJECT CLASS ON RTTS

On the FC01 dataset, introducing MSFFA increases the accuracy by 8.6% compared with the baseline method, and introducing Soft-NMS increases the accuracy by 0.9% compared with the baseline method. Finally, 67.3% accuracy (mAP@0.5) is obtained on the FC01 dataset. The Soft-NMS algorithm runs in 10.4 ms for the 640 × 640 image size and in 13.9 ms for the 1280 × 1280 image size.

On the FC02 dataset, introducing MSFFA increases the accuracy by 10.9% compared with the baseline method, and introducing Soft-NMS increases the accuracy by 1.2% compared with the baseline method. Finally, 65.7% accuracy (mAP@0.5) is obtained on the FC02 dataset. The Soft-NMS algorithm runs in 11.1 ms for the 640 × 640 image size and in 15.4 ms for the 1280 × 1280 image size.

D. Qualitative and Quantitative Results

In this section, we evaluate the localization and classification performance of our presented method, compared with two excellent detectors (YOLOv7 and YOLOv5x). We also evaluate our presented method for concealed object detection, compared with some excellent detectors and concatenation methods.

Table V compares the detection performance of our presented method with that of YOLOv7 and YOLOv5x on Foggy Cityscapes and RTTS. On the FC005 dataset, our presented method achieves a P value of 0.779, an R value of 0.576, an mAP@0.5 value of 0.646, and an mAP@0.5:0.95 value of 0.447, which are higher than those achieved by YOLOv5x and YOLOv7. On the FC01 dataset, our presented method achieves a P value of 0.817, an R value of 0.591, an mAP@0.5 value of 0.673, and an mAP@0.5:0.95 value of 0.472, which are higher than those achieved by YOLOv5x and YOLOv7. On the FC02 dataset, our presented method achieves a P value of 0.818, an R value of 0.582, an mAP@0.5 value of 0.657, and an mAP@0.5:0.95 value of 0.461, which are higher than those achieved by YOLOv5x and YOLOv7. On the RTTS dataset, our presented method achieves an R value of 0.786, an mAP@0.5 value of 0.847, and an mAP@0.5:0.95 value of 0.616, which are higher than those achieved by YOLOv5x and YOLOv7.

Tables VI and VII and Fig. 9 present the performance evaluation for each object class on Foggy Cityscapes and RTTS, compared with YOLOv7 and YOLOv5x. On the FC005 dataset, our presented method achieves APcar (0.857), APbicycle (0.616), APperson (0.744), APrider (0.814), APmotorcycle (0.668), APbus (0.689), and APtruck (0.526), which are higher than those achieved by YOLOv5x and YOLOv7. On the FC01 dataset, our presented method achieves APcar (0.853), APbicycle (0.619), APperson (0.726), APrider (0.8), APmotorcycle (0.692), APbus (0.668), APcaravan (0.469), and APtruck (0.556), which are higher than those achieved by YOLOv5x and YOLOv7. On the FC02 dataset, our presented method achieves APcar (0.841), APbicycle (0.58), APperson (0.717), APrider (0.787), APmotorcycle (0.639), APbus (0.686), APcaravan (0.469), and APtruck (0.537), which are higher than those achieved by YOLOv5x and YOLOv7. On the RTTS dataset, our presented method achieves APperson (0.903), APbus (0.746), APcar (0.924), APmotorbike (0.793), and APbicycle (0.869), which are higher than those achieved by YOLOv5x and YOLOv7.

Tables VIII and IX compare the classification performance on Foggy Cityscapes and RTTS with that of YOLOv7 and YOLOv5x. On the FC005 dataset, our presented method achieves classification performance for car (0.82), bicycle (0.61), person (0.71), rider (0.79), motorcycle (0.64), bus (0.69), caravan (0.5), and truck (0.59), which is higher than that achieved by YOLOv5x and YOLOv7. On the FC01 dataset, our presented method achieves classification performance for car (0.82), bicycle (0.61), person (0.67), rider (0.77), motorcycle (0.61), bus (0.65), caravan (0.29), and truck (0.57), which is higher than that achieved by YOLOv5x and YOLOv7. On the FC02 dataset, our presented method achieves classification performance for car (0.8), bicycle (0.55), person (0.67), rider (0.78), motorcycle (0.59), bus (0.67), caravan (0.5), and truck (0.44), which is higher than that achieved by YOLOv5x and YOLOv7. On the RTTS dataset, our presented method achieves classification performance for person (0.86), bus (0.76), car (0.91), and motorbike (0.75), which is higher than that achieved by YOLOv5x and YOLOv7.

Fig. 9. Precision–recall curves of multiple classes obtained by MSFFA-YOLO on Foggy Cityscapes and RTTS. (a) Precision–recall curves obtained by MSFFA-YOLO on FC005. (b) Precision–recall curves obtained by MSFFA-YOLO on FC01. (c) Precision–recall curves obtained by MSFFA-YOLO on FC02. (d) Precision–recall curves obtained by MSFFA-YOLO on RTTS.

TABLE VIII
COMPARISON OF CLASSIFICATION PERFORMANCE ON FOGGY CITYSCAPES

Table X compares the detection performance on the RW dataset with that of these excellent object detectors and concatenation methods. Our presented method is 41.5%, 32.3%, 31.9%, 13.6%, 12.4%, 8%, 4.8%, and 1.3% higher than those achieved by DSNet, Faster R-CNN-R101-FPN+, RetinaNet-R101-
Fig. 10. Results of localization and classification. (a) Results in the first row are obtained by YOLOv5x. (b) Results in the second row are obtained by
YOLOv7. (c) Results in the last row are obtained by MSFFA-YOLO.
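The per-class AP values and the mAP@0.5 values quoted in the text can be cross-checked with a few lines of Python, under the usual convention that mAP is the unweighted mean of the per-class AP values. The sketch below uses only figures stated in the text for FC01, FC02, and RTTS. FC005 is left out because the text lists no caravan AP for it. The function name `mean_ap` is hypothetical.

```python
def mean_ap(per_class_ap):
    """Unweighted mean of per-class average precision values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


# Per-class AP values as quoted in the text.
fc01 = {"car": 0.853, "bicycle": 0.619, "person": 0.726, "rider": 0.8,
        "motorcycle": 0.692, "bus": 0.668, "caravan": 0.469, "truck": 0.556}
fc02 = {"car": 0.841, "bicycle": 0.58, "person": 0.717, "rider": 0.787,
        "motorcycle": 0.639, "bus": 0.686, "caravan": 0.469, "truck": 0.537}
rtts = {"person": 0.903, "bus": 0.746, "car": 0.924,
        "motorbike": 0.793, "bicycle": 0.869}

# Reported mAP@0.5 values: 0.673 (FC01), 0.657 (FC02), 0.847 (RTTS).
for name, ap, reported in (("FC01", fc01, 0.673),
                           ("FC02", fc02, 0.657),
                           ("RTTS", rtts, 0.847)):
    print(name, round(mean_ap(ap), 3), reported)
```

The computed means agree with the reported mAP@0.5 values to three decimal places, which supports reading the "accuracy" figures quoted elsewhere in the article as mAP@0.5.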
TABLE IX
COMPARISON OF CLASSIFICATION PERFORMANCE ON RTTS

TABLE X
CONCEALED OBJECT DETECTION PERFORMANCE COMPARED WITH EXCELLENT DETECTORS AND CONCATENATION METHODS ON RW
Fig. 11. Concealed object detection in real-world foggy traffic scenes using
MSFFA-YOLO.
conditions. Consequently, our presented method can contribute to traffic investigations by providing accurate and reliable information about traffic objects in foggy weather, holding the potential to enhance traffic safety and optimize traffic management.

In our future research, we will collect extensive datasets from diverse sites featuring a range of traffic control scenes. This comprehensive data collection will enable us to enhance the generalizability of our presented method. Furthermore, we will explore innovative techniques and leverage advancements in CNN design to develop an even more powerful and efficient solution. By bridging the gap between theoretical advancements and practical implementation, we aspire to contribute to the field by providing excellent methods for traffic investigations in challenging conditions.

REFERENCES

[1] S. Minaee, Y. Boykov, F. Porikli, A. Plaza, N. Kehtarnavaz, and D. Terzopoulos, “Image segmentation using deep learning: A survey,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 7, pp. 3523–3542, Jul. 2022, doi: 10.1109/TPAMI.2021.3059968.
[2] M. Ye, J. Shen, G. Lin, T. Xiang, L. Shao, and S. C. H. Hoi, “Deep learning for person re-identification: A survey and outlook,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 6, pp. 2872–2893, Jun. 2022, doi: 10.1109/TPAMI.2021.3054775.
[3] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Seattle, WA, USA, Jul. 2016, pp. 779–788.
[4] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017, doi: 10.1109/tpami.2016.2577031.
[5] C. Y. Wang, A. Bochkovskiy, and H. Liao, “YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Vancouver, BC, Canada, Jun. 2022, pp. 7464–7475.
[6] M. Humayun, F. Ashfaq, N. Z. Jhanjhi, and M. K. Alsadun, “Traffic management: Multi-scale vehicle detection in varying weather conditions using YOLOv4 and spatial pyramid pooling network,” Electronics, vol. 11, no. 17, p. 2748, Sep. 2022, doi: 10.3390/electronics11172748.
[7] K. Wang, W. Zhang, Z. Feng, and C. Wang, “Research on the classification for road traffic visibility based on the characteristics of driving behaviour: A driving simulator experiment,” J. Intell. Connected Vehicles, vol. 3, no. 1, pp. 30–36, 2020, doi: 10.1108/JICV-10-2019-0011.
[8] M. Todorova, D. Radojka, and B. Jasmina, “Role of functional classification of highways in road traffic safety,” Transp. Problem, vol. 4, pp. 97–104, Jan. 2009.
[9] L. Jiao et al., “A survey of deep learning-based object detection,” IEEE Access, vol. 7, pp. 128837–128868, 2019, doi: 10.1109/access.2019.2939201.
[10] S. R. E. Datondji, Y. Dupuis, P. Subirats, and P. Vasseur, “A survey of vision-based traffic monitoring of road intersections,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 10, pp. 2681–2698, Oct. 2016, doi: 10.1109/tits.2016.2530146.
[11] S.-C. Huang, T.-H. Le, and D.-W. Jaw, “DSNet: Joint semantic learning for object detection in inclement weather conditions,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 8, pp. 2623–2633, Aug. 2021, doi: 10.1109/tpami.2020.2977911.
[15] N. Carion et al., “End-to-end object detection with transformers,” in Proc. Comput. Vis. (ECCV), Glasgow, U.K., 2020, pp. 213–229.
[16] M. Sozzi, S. Cantalamessa, A. Cogato, A. Kayad, and F. Marinello, “Automatic bunch detection in white grape varieties using YOLOv3, YOLOv4, and YOLOv5 deep learning algorithms,” Agronomy, vol. 12, no. 2, p. 319, Jan. 2022, doi: 10.3390/agronomy12020319.
[17] M. Krišto, M. Ivasic-Kos, and M. Pobar, “Thermal object detection in difficult weather conditions using YOLO,” IEEE Access, vol. 8, pp. 125459–125476, 2020, doi: 10.1109/access.2020.3007481.
[18] W. Y. Liu et al., “Image-adaptive YOLO for object detection in adverse weather conditions,” in Proc. AAAI Conf. Artif. Intell., Jun. 2022, vol. 36, no. 2, pp. 1792–1800, doi: 10.1609/aaai.v36i2.20072.
[19] D. F. Liu, Y. M. Cui, Z. W. Cao, and Y. J. Chen, “A large-scale simulation dataset: Boost the detection accuracy for special weather conditions,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Glasgow, U.K., 2020, pp. 1–8.
[20] P. Tumas, A. Nowosielski, and A. Serackis, “Pedestrian detection in severe weather conditions,” IEEE Access, vol. 8, pp. 62775–62784, 2020, doi: 10.1109/access.2020.2982539.
[21] G. Li, Z. Ji, X. Qu, R. Zhou, and D. Cao, “Cross-domain object detection for autonomous driving: A stepwise domain adaptative YOLO approach,” IEEE Trans. Intell. Vehicles, vol. 7, no. 3, pp. 603–615, Sep. 2022, doi: 10.1109/TIV.2022.3165353.
[22] Q. Lang, L. Zhang, W. Shi, W. Chen, and S. Pu, “Exploring implicit domain-invariant features for domain adaptive object detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 33, no. 4, pp. 1816–1826, Apr. 2023, doi: 10.1109/TCSVT.2022.3216611.
[23] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, May 2015, doi: 10.1038/nature14539.
[24] T. Y. Lin et al., “Feature pyramid networks for object detection,” in Proc. 30th IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, Jun. 2017, pp. 936–944.
[25] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Region-based convolutional networks for accurate object detection and segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 1, pp. 142–158, Jan. 2016, doi: 10.1109/tpami.2015.2437384.
[26] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders, “Selective search for object recognition,” Int. J. Comput. Vis., vol. 104, no. 2, pp. 154–171, Sep. 2013, doi: 10.1007/s11263-013-0620-5.
[27] B. Yan, P. Fan, X. Lei, Z. Liu, and F. Yang, “A real-time apple targets detection method for picking robot based on improved YOLOv5,” Remote Sens., vol. 13, no. 9, p. 1619, Apr. 2021, doi: 10.3390/rs13091619.
[28] Y. Yu, J. Zhao, Q. Gong, C. Huang, G. Zheng, and J. Ma, “Real-time underwater maritime object detection in side-scan sonar images based on transformer-YOLOv5,” Remote Sens., vol. 13, no. 18, p. 3555, Sep. 2021, doi: 10.3390/rs13183555.
[29] G. Jocher, A. Chaurasia, and J. Qiu. YOLO by Ultralytics. Accessed: Jan. 10, 2023. [Online]. Available: https://github.com/ultralytics/ultralytics
[30] J. Y. Chiang and Y.-C. Chen, “Underwater image enhancement by wavelength compensation and dehazing,” IEEE Trans. Image Process., vol. 21, no. 4, pp. 1756–1769, Apr. 2012, doi: 10.1109/tip.2011.2179666.
[31] C. O. Ancuti and C. Ancuti, “Single image dehazing by multi-scale fusion,” IEEE Trans. Image Process., vol. 22, no. 8, pp. 3271–3282, Aug. 2013, doi: 10.1109/tip.2013.2262284.
[32] R. Fattal, “Dehazing using color-lines,” ACM Trans. Graph., vol. 34, no. 1, pp. 1–14, Nov. 2014, doi: 10.1145/2651362.
[33] D. Berman, T. Treibitz, and S. Avidan, “Non-local image dehazing,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Seattle, WA, USA, Jun. 2016, pp. 1674–1682.
[12] H. Wang et al., “YOLOv5-fog: A multiobjective visual detection [34] R. Fattal, “Single image dehazing,” ACM Trans. Graph., vol. 27, no. 3,
algorithm for fog driving scenes based on improved YOLOv5,” pp. 1–9, Aug. 2008, doi: 10.1145/1360612.1360671.
IEEE Trans. Instrum. Meas., vol. 71, pp. 1–12, 2022, doi: [35] K. He, J. Sun, and X. Tang, “Single image haze removal using dark
10.1109/tim.2022.3196954. channel prior,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 12,
[13] K. Hsieh et al., “Focus: Querying large video datasets with low latency pp. 2341–2353, Dec. 2011, doi: 10.1109/tpami.2010.168.
and low cost,” in Proc. 13th USENIX Symp. Operating Syst. Design [36] M. Ju, D. Zhang, and X. Wang, “Single image dehazing via an
Implement. (OSDI), Carlsbad, CA, USA, 2018, pp. 269–286. improved atmospheric scattering model,” Vis. Comput., vol. 33, no. 12,
[14] A. Geiger, M. Lauer, C. Wojek, C. Stiller, and R. Urtasun, “3D traffic pp. 1613–1625, Dec. 2017, doi: 10.1007/s00371-016-1305-1.
scene understanding from movable platforms,” IEEE Trans. Pattern [37] W. Wang, X. Yuan, X. Wu, and Y. Liu, “Fast image dehazing method
Anal. Mach. Intell., vol. 36, no. 5, pp. 1012–1025, May 2014, doi: based on linear transformation,” IEEE Trans. Multimedia, vol. 19, no. 6,
10.1109/tpami.2013.185. pp. 1142–1155, Jun. 2017, doi: 10.1109/tmm.2017.2652069.
2528712 IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 72, 2023
[38] Q. Zhu, J. Mai, and L. Shao, “A fast single image haze removal algorithm using color attenuation prior,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 3522–3533, Nov. 2015, doi: 10.1109/tip.2015.2446191.
[39] W. Zhang, J. Liang, H. Ju, L. Ren, E. Qu, and Z. Wu, “A robust haze-removal scheme in polarimetric dehazing imaging based on automatic identification of sky region,” Opt. Laser Technol., vol. 86, pp. 145–151, Dec. 2016, doi: 10.1016/j.optlastec.2016.07.015.
[40] G. Yan, M. Yu, S. Shi, and C. Feng, “The recognition of traffic speed limit sign in hazy weather,” J. Intell. Fuzzy Syst., vol. 33, no. 2, pp. 873–883, Jul. 2017, doi: 10.3233/jifs-162138.
[41] J. Zhang, X. Wang, C. Yang, J. Zhang, D. He, and H. Song, “Image dehazing based on dark channel prior and brightness enhancement for agricultural remote sensing images from consumer-grade cameras,” Comput. Electron. Agricult., vol. 151, pp. 196–206, Aug. 2018, doi: 10.1016/j.compag.2018.06.010.
[42] J. Yang, C. Wu, B. Du, and L. Zhang, “Enhanced multiscale feature fusion network for HSI classification,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 12, pp. 10328–10347, Dec. 2021, doi: 10.1109/tgrs.2020.3046757.
[43] W. J. Zhou, X. Y. Lin, J. S. Lei, L. Yu, and J. N. Hwang, “MFFENet: Multiscale feature fusion and enhancement network for RGB–thermal urban road scene parsing,” IEEE Trans. Multimedia, vol. 24, pp. 2526–2538, 2022, doi: 10.1109/tmm.2021.3086618.
[44] D. F. Liu et al., “DenserNet: Weakly supervised visual localization using multi-scale feature aggregation,” in Proc. AAAI Conf. Artif. Intell., May 2021, vol. 35, no. 7, pp. 6101–6109, doi: 10.1609/aaai.v35i7.16760.
[45] B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao, “DehazeNet: An end-to-end system for single image haze removal,” IEEE Trans. Image Process., vol. 25, no. 11, pp. 5187–5198, Nov. 2016, doi: 10.1109/tip.2016.2598681.
[46] W. Ren et al., “Single image dehazing via multi-scale convolutional neural networks,” in Proc. 14th Eur. Conf. Comput. Vis. (ECCV), Amsterdam, The Netherlands, 2016, pp. 154–169.
[47] D. Chen et al., “Gated context aggregation network for image dehazing and deraining,” in Proc. 19th IEEE Winter Conf. Appl. Comput. Vis. (WACV), Waikoloa Village, HI, USA, Jan. 2019, pp. 1375–1383.
[48] W. Ren et al., “Gated fusion network for single image dehazing,” in Proc. 31st IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Salt Lake City, UT, USA, Jun. 2018, pp. 3253–3261.
[49] B. Y. Li et al., “AOD-Net: All-in-one dehazing network,” in Proc. 16th IEEE Int. Conf. Comput. Vis. (ICCV), Venice, Italy, 2017, pp. 4780–4788.
[50] X. Fan, Z. Zhao, W. Yan, X. Yan, and P. Shi, “Multi-scale feature fusion image dehazing algorithm combined with attention mechanism,” Comput. Sci., vol. 49, no. 5, pp. 50–57, 2022.
[51] C. Sakaridis, D. Dai, and L. Van Gool, “Semantic foggy scene understanding with synthetic data,” Int. J. Comput. Vis., vol. 126, no. 9, pp. 973–992, Sep. 2018, doi: 10.1007/s11263-018-1072-8.
[52] T. Y. Lin et al., “Focal loss for dense object detection,” in Proc. 16th IEEE Int. Conf. Comput. Vis. (ICCV), Venice, Italy, Oct. 2017, pp. 2999–3007.
[53] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 318–327, Feb. 2020, doi: 10.1109/tpami.2018.2858826.
[54] V. Milanés, S. E. Shladover, J. Spring, C. Nowakowski, H. Kawazoe, and M. Nakamura, “Cooperative adaptive cruise control in real traffic situations,” IEEE Trans. Intell. Transp. Syst., vol. 15, no. 1, pp. 296–305, Feb. 2014, doi: 10.1109/tits.2013.2278494.
[55] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proc. Med. Image Comput. Comput.-Assisted Intervent., Munich, Germany, 2015, pp. 234–241.
[56] N. Bodla, B. Singh, R. Chellappa, and L. S. Davis, “Soft-NMS—Improving object detection with one line of code,” in Proc. 16th IEEE Int. Conf. Comput. Vis. (ICCV), Venice, Italy, Oct. 2017, pp. 5562–5570.
[57] B. Li et al., “Benchmarking single-image dehazing and beyond,” IEEE Trans. Image Process., vol. 28, no. 1, pp. 492–505, Jan. 2019, doi: 10.1109/tip.2018.2867951.
[58] T. Y. Lin et al., “Microsoft COCO: Common objects in context,” in Proc. 13th Eur. Conf. Comput. Vision (ECCV), Zurich, Switzerland, 2014, pp. 740–755.
[59] A. Moffat and J. Zobel, “Rank-biased precision for measurement of retrieval effectiveness,” ACM Trans. Inf. Syst., vol. 27, no. 1, pp. 1–27, Dec. 2009, doi: 10.1145/1416950.1416952.
[60] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Las Vegas, NV, USA, Oct. 2016, pp. 3213–3223.

Qiang Zhang received the master’s degree from Dalian Maritime University, Dalian, China, in 2019. He is currently pursuing the Ph.D. degree in transportation engineering with the School of Transportation, Southeast University, Nanjing, China.
His research interests include computer vision, deep learning, traffic investigation, traffic safety analysis, data analysis, and intelligent transportation systems.
Mr. Zhang serves as a reviewer for several journals, such as Neural Computing and Applications, IEEE Transactions on Intelligent Transportation Systems, and IET Intelligent Transport Systems.

Xiaojian Hu received the Ph.D. degree in transportation engineering from Southeast University, Nanjing, China, in 2009.
He is currently an Associate Professor with Southeast University. His research interests include computer vision, deep learning, transportation organization and management, intelligent traffic systems, transportation planning, and road traffic safety.