MSFFA-YOLO Network Multiclass Object Detection For Traffic Investigations in Foggy Weather

IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, VOL. 72, 2023 2528712
Qiang Zhang and Xiaojian Hu
Abstract— Despite significant progress in vision-based detection methods, detecting traffic objects in foggy weather remains challenging. Fog reduces visibility, which in turn degrades the information about traffic objects in videos. However, accurate information regarding the localization and classification of traffic objects is crucial for certain traffic investigations. In this article, we present a multiclass object detection method, namely, the multiscale feature fusion attention-you only look once (MSFFA-YOLO) network, which can be trained to jointly achieve three tasks: visibility enhancement, object classification, and object localization. In the network, we employ the enhanced YOLOv7 as a detection subnet, which is responsible for learning to locate and classify objects. In the restoration subnet, the MSFFA structure is presented for visibility enhancement. The experimental results on the synthetic foggy datasets show that the presented MSFFA-YOLO achieves 64.6% accuracy on the FC005 dataset, 67.3% accuracy on the FC01 dataset, and 65.7% accuracy on the FC02 dataset. When evaluated on the natural foggy datasets, the presented MSFFA-YOLO achieves 84.7% accuracy on the RTTS dataset and 84.1% accuracy on the RW dataset, indicating its ability to accurately detect multiclass traffic objects in real foggy weather. The experimental results also show that the presented MSFFA-YOLO achieves an efficiency of 37 frames per second (FPS). Finally, the experimental results demonstrate the excellent performance of our presented method for object localization and classification in foggy weather. When detecting concealed traffic objects in foggy weather, our presented method exhibits superior accuracy. These results substantiate the applicability of our presented method for traffic investigations in foggy weather.

Index Terms— Feature attention, feature fusion, foggy weather, object detection, you only look once (YOLO).

I. INTRODUCTION

IN RECENT years, deep learning-based object detection methods [1], [2], [3], [4], [5] have exhibited remarkable performance in some traffic investigations. Nevertheless, the effectiveness of certain methods can be considerably compromised in foggy conditions. The issue has attracted considerable attention, especially in some traffic scenes [6], where accurate object information is crucial for traffic investigations related to safety [7], management [8], and other aspects [9], [10].

Acquiring accurate object information is challenging, due to the adverse effects of fog [11]. The presence of fog can lead to reduced visibility, blurred information, increased noise, and other factors in traffic videos [12]. These factors can significantly impact the accuracy of object detection results, posing substantial challenges in this study.

One of the challenges is to detect multiclass traffic objects, such as cars, bicycles, buses, and others [12], [13], [14]. In foggy conditions, reduced contrast and diminished clarity in object details may arise [11], subsequently impacting the localization and classification of traffic objects. In addition, the concealment of certain traffic objects in the fog may pose a significant obstacle to the accurate detection of multiclass traffic objects [5], [15], [16].

In this article, we present a multiclass object detection method for traffic investigations in foggy weather. The main contributions of this article can be summarized as follows.

1) We present a novel object detection method, namely, the multiscale feature fusion attention-you only look once (MSFFA-YOLO) network, that includes multiscale feature fusion, a feature attention mechanism, and the enhanced YOLOv7 to improve the accuracy of object detection in foggy weather.
2) Our presented method achieves exceptional performance in object localization and classification tasks, while also demonstrating superior accuracy in detecting concealed traffic objects in foggy conditions.
3) Our presented method can contribute to traffic investigations by providing accurate and reliable information about traffic objects in foggy weather, holding the potential to enhance traffic safety and optimize traffic management.

Furthermore, the effectiveness of our presented method is demonstrated through extensive experiments and benchmark evaluations. The experimental results provide substantial evidence of the accuracy of our presented method, which surpasses that of the other comparison methods.

Manuscript received 10 April 2023; revised 18 August 2023; accepted 7 September 2023. Date of publication 25 September 2023; date of current version 11 October 2023. This work was supported in part by the National Natural Science Foundation of China under Grant 52272344 and in part by the Fundamental Research Funds for the Central Universities. The Associate Editor coordinating the review process was Dr. Jianbo Yu. (Corresponding author: Xiaojian Hu.)
The authors are with the Jiangsu Key Laboratory of Urban ITS, the Jiangsu Province Collaborative Innovation Center of Modern Urban Traffic Technologies, and the School of Transportation, Southeast University, Nanjing 211189, China (e-mail: 230198700@seu.edu.cn; huxiaojian@seu.edu.cn).
Digital Object Identifier 10.1109/TIM.2023.3318671
1557-9662 © 2023 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
The remainder of this article is organized as follows. In Section II, we review related works on object detection and dehazing methods, as well as the challenges of multiclass object detection in foggy weather. In Section III, we describe our presented method in detail. In Section IV, we present the experimental results and analysis. Finally, we summarize our work in Section V.

II. RELATED WORKS

To enhance the accuracy of object detection in foggy weather, numerous studies have been conducted on dehazing and object detection methods. In this section, we provide a brief overview of these techniques, with a particular focus on deep learning-based object detection methods [17], [18], [19], [20], [21], [22] that have exhibited outstanding performance in these fields.

A. Object Detection

In the realm of deep learning, object detection based on the convolutional neural network (CNN) has gained extensive interest [23], [24]. The CNN methods can be divided into two main categories.

1) Region proposal-based object detection methods, in which regions of interest (RoIs) can be processed for object detection based on region proposal methods.
2) Regression-based object detection methods, in which bounding box coordinates and class probabilities are predicted directly from images by a single CNN.

Regarding object detection methods in the first category, the mainstream methods include region proposals with CNNs [region-based convolutional neural network (R-CNN) family]. Girshick et al. [25] introduced the R-CNN, which takes an input image and generates region proposals using an algorithm that calculates the hierarchical grouping of similar regions with respect to numerous compatible elements, a process known as selective search [26]. Subsequently, a CNN is employed to extract features for each proposal, followed by the adoption of a support vector machine (SVM) for performing classification [11]. Although R-CNN has achieved impressive accuracy, it still consumes a considerable amount of time during training and inference.

To overcome this issue, Ren et al. [4] presented Faster R-CNN by introducing an additional region proposal network to replace the selective search algorithm for improving both accuracy and speed. The Faster R-CNN-R101-FPN+ [15] is an improved version of the original Faster R-CNN that includes additional feature pyramids and residual connections. Region-based convolutional networks [4], [15], [25] can improve the detection accuracy by region proposal methods, but their efficiency is poor due to a large amount of calculation.

Regarding methods in the second category, the mainstream methods include the YOLO family. Redmon et al. [3] presented the YOLO method, which has efficient performance. In the YOLO method, images are scaled to the same size and divided into grids. The method processes the images once to obtain the bounding boxes and class probabilities. However, if there are dense objects and small objects, the detection performance is poor.

The other members of the YOLO family can achieve better detection performance through some improvements. YOLOv5 methods [16], [27], [28] include YOLOv5n, YOLOv5s, YOLOv5m, and YOLOv5x. Notably, these methods have achieved advancements in terms of accuracy, speed, and flexibility. Among them, YOLOv5x stands out with its exceptional performance. In the pursuit of enhancing the accuracy and efficiency of YOLO methods [16], [29], substantial advancements have been made, notably the introduction of YOLOv7 [5].

Although the above methods improve detection performance in fine weather to a certain extent, they may be insufficient for locating and classifying some traffic objects in foggy and challenging conditions.

B. Image Dehazing

With the development of computer vision, many dehazing methods [30], [31], [32], [33], [34] have been presented to reduce the interference of fog, thereby improving the accuracy of multiclass object detection in foggy conditions.

Based on the imaging principle [35] in foggy conditions, Ju et al. [36] and Wang et al. [37] use the atmospheric scattering model to simulate the imaging process. The atmospheric scattering model can be expressed as follows:

$$I(x) = J(x)\,t(x) + A\,(1 - t(x)) \tag{1}$$

where $I(x)$ represents the hazy image recorded by the camera; $J(x)$ represents the clear image; $A$ represents the global atmospheric light; and $t(x)$ represents the medium transmission. Therefore, the image dehazing process can be expressed as follows:

$$J(x) = \frac{I(x) - A}{t(x)} + A. \tag{2}$$

As shown in (2), the traditional method can be used for image dehazing based on prior knowledge. He et al. [35] presented the method called dark channel prior (DCP), in which the medium transmission can be obtained based on prior knowledge, and the atmospheric scattering model can be used for image dehazing. This method can achieve a good dehazing performance, but it can cause color distortion in some scenes. Zhu et al. [38] presented a color attenuation prior method, and the clear images can be obtained by using the atmospheric scattering model. Some traditional dehazing methods [39], [40], [41] need some prior information and use handcrafted methods to extract the relevant features, resulting in large errors in the estimates of the medium transmission and atmospheric light values.

To overcome this issue, deep learning-based methods [42], [43], [44] can be adopted. Cai et al. [45] presented a dehazing method that estimates the mapping relationship between the foggy image and the medium transmission based on image features. Ren et al. [46] presented the dehazing method called multi-scale convolutional neural network, in which the large-scale network is used to estimate the medium transmission, and the small-scale network is used to refine
it. Chen et al. [47] presented the end-to-end gated context aggregation network to restore the final haze-free image. Ren et al. [48] presented a gated fusion network to restore a clear image from a hazy input. Li et al. [49] presented the AOD-Net method for image dehazing using a CNN.

Although the above methods can improve the performance of dehazing to a certain extent, they may pay insufficient attention to important information, which can affect the dehazing performance.

C. Challenges of Multiclass Object Detection in Foggy Weather

Object detection in foggy weather is still an open and challenging issue that requires further research. To address this issue, some excellent methods [3], [4], [5], [11], [15], [23], [29], [42], [43], [44], [49], [50], [51], [52], [53] can be adopted to improve the performance of object detection in foggy conditions. Some of the approaches are: 1) using synthetic fog datasets to augment the training data and enhance the generalization ability of the method; 2) modifying the backbone network or adding new modules to adapt to foggy scenes; and 3) applying attention mechanisms or postprocessing techniques to suppress noise and highlight salient features.

The issue of inaccurate localization and classification has captured our attention, particularly when dealing with certain traffic objects in foggy videos. Some classes of traffic objects, such as cars, pedestrians, bicycles, and motorcycles, are often significant participants in traffic accidents [54]. For these classes of traffic objects, the results of object detection can be essential for traffic investigations and the prevention of traffic accidents. In foggy conditions, details in the images become blurred, and object information may even be hard to perceive [11], making it challenging to detect multiple classes of traffic objects accurately [12]. Therefore, how to accurately detect multiclass traffic objects in foggy and challenging conditions has emerged as a critical concern.

To achieve accurate localization and classification of traffic objects in foggy and challenging conditions, we present a multiclass object detection method, namely, the MSFFA-YOLO network. The MSFFA-YOLO network incorporates the enhanced YOLOv7 as the detection subnet, responsible for object localization and classification. In addition, we present a restoration subnet that utilizes the MSFFA structure to enhance visibility. Finally, our presented method is verified to be applicable for traffic investigations in foggy weather. This is the focus of this article.

III. METHODS

In this section, we elaborate on the details of our presented method, namely, MSFFA-YOLO. In terms of methodology, the challenges associated with object detection in foggy weather arise from the suboptimal performance exhibited by conventional methods in object localization and classification tasks. Therefore, our presented method aims to enhance the performance of object localization and classification in foggy weather, thereby ensuring its effectiveness and applicability in traffic investigations. The architecture of MSFFA-YOLO is shown in Fig. 1.

A. Detection Subnet

As discussed in Section II, many related object detection methods have been presented. However, these methods may be insufficient for locating and classifying some traffic objects in foggy and challenging conditions, when the presence of fog can constrain both feature extraction and feature expression [11], [49].

Hence, to mitigate this concern, we employ the presented detection subnet to bolster the performance of feature extraction, feature expression, and predictions. Several crucial structures in the detection subnet are presented below.

1) CBS Structure: Conv-BN-sigmoid linear unit (SiLU) (CBS) includes a convolutional layer, a batch normalization layer, and a SiLU function. The SiLU function can be expressed as follows:

$$\mathrm{SiLU}(x) = x \cdot \sigma(x) \tag{3}$$

where $x$ is the input factor and $\sigma$ is the sigmoid function.

2) Efficient Layer Aggregation Network: As shown in Fig. 2(a), the efficient layer aggregation network (ELAN) is utilized as the computational block for the backbone. This integration of ELAN contributes to the overall stability and applicability of the network architecture.

The ELAN_2 structure is presented in Fig. 2(b). The ELAN_2 structure has two main branches. One branch changes the number of channels by a 1 × 1 convolution. In the remaining part, the number of channels is modified using a 1 × 1 convolutional layer, followed by the application of four 3 × 3 convolutional layers. ELAN_2 can be adopted to facilitate the capture of various object features.

3) MPConv Structure: The max-pooling convolution (MPConv) structure is introduced in the detection subnet. Due to its advantages, such as excellent computational efficiency and effective feature extraction, the MPConv structure can contribute to achieving excellent results in detection tasks. Especially in foggy conditions, the MPConv structure can be adopted to facilitate the extraction of object features, thereby improving the detection performance.

As shown in Fig. 3, in the MPConv structure, the max-pooling (MP) layer is introduced to form the upper and lower branches. The upper branch halves the length and width of the image through the MP layer and halves the image channel through the CBS layer. The lower branch halves the image channel through the first CBS layer and halves the length and width of the image through the second CBS layer. Finally, the utilization of the entire structure can serve to strengthen the feature extraction ability of the network.

4) Introduction of SPPCSPC: As shown in Fig. 4, the SPPCSPC consists of two main branches: an upper branch that typically includes three MP layers and several CBS structures, and a lower branch that usually only includes one CBS structure. In the SPPCSPC structure, the MP layers can down-sample the input feature maps, while the CBS structures are used to process feature information from the input feature maps. Both the upper and lower branches of the SPPCSPC structure are important, as they are responsible for feature fusion, which can improve the detection performance. The
design of the entire structure can contribute to reinforcing the expression of features, thereby playing a pivotal role in achieving excellent performance when handling object detection tasks.

Fig. 1. Architecture of MSFFA-YOLO network.
Fig. 2. ELAN and ELAN_2. (a) ELAN structure. (b) ELAN_2 structure.
Fig. 3. MPConv structure.
Fig. 4. SPPCSPC structure.

In summary, the process of the detection subnet is as follows. First, the input images are preprocessed and sent to the backbone network, which extracts features from the processed images. Second, the extracted features are transformed into features with varying scales. Third, the fused features are processed by the head network, and the outputs are obtained after detection.

B. Restoration Subnet

As discussed in Section II, to overcome the limitations of the existing methods, we present an MSFFA restoration subnet for foggy conditions, as shown in Fig. 1. The restoration subnet includes the encoder structure, the feature conversion structure, the feature attention structure, and the decoder structure. A CNN is adopted to extract feature maps of different scales from images, and then skip connections [50], [55] are adopted to fuse the feature maps. Incorporating the feature attention structure enables the restoration subnet to exhibit exceptional performance in regions with dense fog and significant channel information.

1) Encoder Structure: Regarding the encoder structure, the convolutional operation can be represented by the following formula:

$$D_i = \mathrm{Conv}_{en}(D_{i-1}), \quad i \in \{1, 2, 3\} \tag{4}$$

where $D_i$ is the feature map of the $i$th layer in the encoding stage, and $\mathrm{Conv}_{en}$ represents the convolution in the encoding stage. For this convolution, the kernel size is 3 × 3, the stride is 1, and the channel number after the convolution is twice the channel number of the previous feature map. There is a nonlinear rectified linear unit (ReLU) function after
each convolution operation. Regarding the down-sampling operation, the sampled kernel size is 2 × 2.

2) Feature Conversion Structure: As shown in Fig. 5, to achieve a balance between network accuracy and computational efficiency, the feature conversion structure includes 18 two-layer residual blocks. The two-layer residual block comprises convolutional layers along with the ReLU function. The kernel size in the convolution operation is 3 × 3, and the stride of the convolution is 1. The ReLU function is adopted as the activation function. The input of the feature conversion structure is the output of the encoder structure.

Fig. 5. Feature conversion structure.

3) Feature Attention Structure: As shown in Fig. 6, the feature attention structure includes the channel attention structure and the pixel attention structure. The channel attention structure includes the pooling layer, the convolutional layers, the nonlinear ReLU function, and the nonlinear sigmoid function. The pixel attention structure includes two convolutional layers, a nonlinear ReLU function, and a nonlinear sigmoid function.

Fig. 6. Feature attention structure.

Regarding the channel attention structure, the calculation process can be expressed as follows:

$$g_c = H_p(F_c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X_c(i, j) \tag{5}$$

$$CA_c = \sigma(\mathrm{Conv}(\delta(\mathrm{Conv}(g_c)))) \tag{6}$$

$$F_c^{*} = CA_c \otimes F_c \tag{7}$$

where $g_c$ is the output feature map of the $c$th channel; $H_p$ is the pooling function; $F_c$ is the input feature map of the $c$th channel; $H$ is the height of the feature map; $W$ is the width of the feature map; $X_c(i, j)$ is the value at pixel $(i, j)$ in the feature map of the $c$th channel; Conv is the convolutional layer, and the size of the convolutional kernel is 3 × 3; $\delta$ is the ReLU function; $\sigma$ is the sigmoid function; $CA_c$ is the weight of the $c$th channel; and $F_c^{*}$ is the feature map processed by the channel attention structure.

Regarding the pixel attention structure, the calculation process can be expressed as follows:

$$PA = \sigma(\mathrm{Conv}(\delta(\mathrm{Conv}(F^{*})))) \tag{8}$$

$$\bar{F} = F^{*} \otimes PA \tag{9}$$

where $PA$ is the weight of the pixel; Conv is the convolutional layer; $\sigma$ is the sigmoid function; $\delta$ is the ReLU function; $F^{*}$ is the output feature map from the channel attention structure; and $\bar{F}$ is the feature map processed by the pixel attention structure.

4) Decoder Structure: The decoder structure includes convolutional layers and up-sampling layers. Regarding the decoder structure, the convolutional operation can be represented by the following formula:

$$U_{i-1} = \mathrm{Conv}_{de}(U_i), \quad i \in \{3, 2, 1\} \tag{10}$$

where $U_i$ is the feature map of the $i$th layer in the decoding stage, and $\mathrm{Conv}_{de}$ represents the convolution in the decoding stage. For this convolution, the kernel size is 3 × 3, the stride is 1, and the channel number after the convolution is half the channel number of the previous feature map. There is a nonlinear ReLU function after each convolution operation. In the up-sampling operation, the width and height of the feature map are doubled.

C. Postprocessing Algorithm

With the original algorithm, missed detections are prone to occur in scenes with dense traffic objects. To overcome this issue, the Soft-NMS algorithm [56] can be adopted. The process of the algorithm can be expressed as follows.

Algorithm 1 Presented Postprocessing Algorithm
Input: B = {b_1, ..., b_N}, S = {s_1, ..., s_N}, N_t
1: D ← {}
2: while B ≠ empty do
3:     m ← argmax S
4:     M ← b_m
5:     D ← D ∪ M; B ← B − M
6:     for b_i in B do
7:         if iou(M, b_i) ≥ N_t then    # NMS
8:             B ← B − b_i; S ← S − s_i
9:         end
10:        s_i ← s_i f(iou(M, b_i))    # Soft-NMS
11:    end
12: end
Output: D, S

In the algorithm, B is the set of initial detection boxes {b_1, ..., b_N}; S contains the corresponding detection scores; N_t represents the threshold; and M represents the detection box with the maximum score.
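The Soft-NMS branch of Algorithm 1 can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' implementation: the text does not specify the re-weighting function f, so the linear and Gaussian forms commonly associated with Soft-NMS are assumed here, and the box format `(x1, y1, x2, y2)`, the parameter `sigma`, and the pruning threshold `score_min` are hypothetical choices introduced for the example.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, nt=0.5, sigma=0.5, score_min=0.001, method="linear"):
    """Soft-NMS sketch: decay the scores of overlapping boxes instead of deleting them.

    nt        -- IoU threshold N_t (used by the assumed linear decay)
    sigma     -- width of the assumed Gaussian decay (hypothetical default)
    score_min -- boxes whose decayed score falls below this are dropped (hypothetical)
    """
    boxes, scores = list(boxes), list(scores)
    kept_boxes, kept_scores = [], []
    while boxes:
        # Steps 3-5 of Algorithm 1: move the highest-scoring box M from B to D.
        m = max(range(len(scores)), key=scores.__getitem__)
        best_box, best_score = boxes.pop(m), scores.pop(m)
        kept_boxes.append(best_box)
        kept_scores.append(best_score)
        # Step 10: re-weight the remaining scores with f(iou(M, b_i)).
        for i in range(len(boxes)):
            overlap = iou(best_box, boxes[i])
            if method == "linear":
                if overlap >= nt:
                    scores[i] *= 1.0 - overlap
            else:  # assumed Gaussian decay
                scores[i] *= math.exp(-(overlap ** 2) / sigma)
        # Drop boxes whose score has decayed below the pruning threshold.
        keep = [i for i, s in enumerate(scores) if s >= score_min]
        boxes = [boxes[i] for i in keep]
        scores = [scores[i] for i in keep]
    return kept_boxes, kept_scores


# Usage: two heavily overlapping boxes and one isolated box.
example_boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
example_scores = [0.9, 0.8, 0.7]
result_boxes, result_scores = soft_nms(example_boxes, example_scores, nt=0.5)
```

With hard NMS the second box would be deleted because its overlap with the first exceeds the threshold; in this sketch it is kept with a reduced score, which is the behavior intended to reduce missed detections in dense scenes.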
D. Loss Function

In addition to the loss function of YOLOv7 [5], the loss function of MSFFA is introduced. The loss function of MSFFA can be expressed as follows:

$$L = \frac{1}{N} \sum_{p=1}^{N} \sum_{c=1}^{3} F\left(\bar{y}_c(p) - y_c(p)\right) \tag{11}$$

$$F(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & |x| \geq 1 \end{cases} \tag{12}$$

where $N$ is the total number of pixels in the image; $p$ represents the $p$th pixel; $c$ represents the $c$th channel; $\bar{y}_c(p)$ represents the $p$th pixel value of the $c$th channel in the predicted image; and $y_c(p)$ represents the $p$th pixel value of the $c$th channel in the real image.

IV. EXPERIMENTS AND RESULTS

In this section, we conduct thorough experiments and analyses to validate the performance of our presented method. As shown in Fig. 7, the evaluation process involves utilizing synthetic foggy images sourced from the Foggy Cityscapes [51] datasets, as well as natural foggy images extracted from the RTTS [57] and RW datasets. Through a comprehensive analysis of the results, it is evident that our presented method exhibits exceptional performance in foggy and challenging conditions.

Fig. 7. Example images and ground truths.

A. Implementation Details

1) Training Strategy: The experiments were implemented using the Python programming language and the PyTorch 1.8 framework. Prior to conducting experiments, a suitable hardware environment was chosen. In order to accelerate the training process, we used the high-performance Tesla V100-32G GPU. To ensure the stability and effectiveness of training, appropriate parameters were set. Specifically, the learning rate was set to 0.001, and the batch size was set to 16. The training process was as follows.

First, we trained the detection subnet with some Microsoft Common Objects in Context (MS COCO) [58] data and natural data. The detection subnet was trained for 300 epochs on the MS COCO data and for 100 epochs on the natural data.

Second, we trained the network with the synthetic data. The network was trained for 100 epochs on each of the FC005, FC01, and FC02 datasets. The continuous training process makes the network converge, and the network parameters are then stored.

2) Evaluation: To evaluate the performance of object detection methods, the publicly available metrics of pattern analysis, statistical modeling and computational learning (PASCAL) visual object classes (VOC) [59] serve as the established standard measures. The adopted metrics can be expressed as follows.

Average precision (AP) is the area under the precision–recall curve. The precision (P) and recall (R) are defined as follows:

$$P = \frac{TP}{TP + FP} \tag{13}$$

$$R = \frac{TP}{TP + FN} \tag{14}$$

where true positive (TP) is the number of positive samples predicted as positive; true negative (TN) is the number of negative samples predicted as negative; false positive (FP) is the number of negative samples predicted as positive; and false negative (FN) is the number of positive samples predicted as negative.

The AP value is the precision averaged across all values of recall between 0 and 1

$$AP = \int_{0}^{1} p(x)\,dx. \tag{15}$$

In general, AP is the value for a single class, and the mean AP (mAP) is calculated by taking the average of the AP values across all classes.

B. Datasets

Fig. 8 presents the distribution of label information and bbox sizes for instances of road traffic objects in the datasets. As shown in Table I, there are variations in the sizes of anchor boxes, which are taken into account during the method design and training process.

TABLE I
ANCHOR SIZE CORRESPONDING TO DIFFERENT DATA

1) Synthetic Data: The data on Foggy Cityscapes [51] are generated by adding synthetic fog to the images from Cityscapes [60]. Cityscapes provides instance segmentation annotations, which are converted to the tightest rectangle of an instance segmentation mask as the ground-truth bounding boxes in our experiment. Regarding the Cityscapes data featuring specified objects, 3457 images (2048 × 1024 pixels) with annotations are adopted.
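The conversion from an instance segmentation mask to its tightest enclosing rectangle, as described for the Cityscapes annotations, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' conversion script: the mask is assumed to be a nested list of 0/1 values (rows of pixels), the returned coordinates are assumed to be inclusive pixel indices, and the function name `mask_to_bbox` is hypothetical.

```python
def mask_to_bbox(mask):
    """Return the tightest box (x_min, y_min, x_max, y_max) around nonzero pixels.

    mask -- list of rows, each row a list of 0/1 values (assumed representation).
    Returns None when the mask contains no foreground pixel.
    """
    # Row indices that contain at least one foreground pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    # Column indices of every foreground pixel.
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return (min(cols), rows[0], max(cols), rows[-1])


# Usage: an L-shaped instance occupying rows 1-3 and columns 2-4.
example_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
example_box = mask_to_bbox(example_mask)  # (2, 1, 4, 3)
```

Whether box coordinates are stored as inclusive pixel indices or as corner offsets depends on the annotation format used for training, so the convention above would need to be matched to the detector's label format.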
To conduct experiments, based on the Cityscapes data, three concentrations of Foggy Cityscapes are artificially generated through the atmospheric scattering model and the depth information. On Foggy Cityscapes, the generated datasets include FC005, FC01, and FC02, corresponding to different levels of fog density (0.005, 0.01, and 0.02). As shown in Table II, the datasets are split, while example images and corresponding ground truths are presented in Fig. 7.

2) Natural Data: As shown in Fig. 8 and Table II, the RTTS [57] dataset is a public dataset, which involves 4322 natural foggy images with annotated object classes.

Fig. 8. Label information and bbox size distribution.

TABLE II
STATISTIC OF FOGGY DATASETS

To complement images featuring concealed traffic objects, we introduced the RW dataset. This dataset encompasses real-world foggy images captured in various traffic scenes, featuring a substantial presence of concealed traffic objects. As shown in Table II, the dataset comprises 3538 images allocated for training, 443 images allocated for validation, and 442 images allocated for testing.

C. Ablation Study

In this section, the findings and analysis from the ablation experiments are presented. As shown in Table III, the baseline method is YOLOv7, and we conduct ablation tests on Foggy Cityscapes (FC005, FC01, and FC02). The runtime of the Soft-NMS algorithm is presented in Table IV.

TABLE III
ABLATION STUDY ON FOGGY CITYSCAPES

TABLE IV
RUNTIME OF SOFT-NMS ON FOGGY CITYSCAPES

TABLE V
COMPARISON OF DETECTION PERFORMANCE ON FOGGY CITYSCAPES AND RTTS

On the FC005 dataset, introducing MSFFA increases the accuracy by 5.6% compared with the baseline method, and introducing Soft-NMS increases the accuracy by 0.8% compared with the baseline method. Finally, 64.6% accuracy is obtained on the FC005 dataset. For the 640 × 640 image size, the Soft-NMS algorithm achieves a runtime of 10.5 ms; for the 1280 × 1280 image size, it achieves a runtime of 13.4 ms.
TABLE VI
P ERFORMANCE E VALUATION FOR E ACH O BJECT C LASS ON F OGGY C ITYSCAPES
TABLE VII
PERFORMANCE EVALUATION FOR EACH OBJECT CLASS ON RTTS

On the FC01 dataset, introducing MSFFA increases the accuracy by 8.6% compared with the baseline method, and introducing Soft-NMS increases the accuracy by 0.9% compared with the baseline method. Finally, 67.3% accuracy can be obtained on the FC01 dataset. For the 640 × 640 image size, the Soft-NMS algorithm achieves a runtime of 10.4 ms; for the 1280 × 1280 image size, it achieves a runtime of 13.9 ms.

On the FC02 dataset, introducing MSFFA increases the accuracy by 10.9% compared with the baseline method, and introducing Soft-NMS increases the accuracy by 1.2% compared with the baseline method. Finally, 65.7% accuracy can be obtained on the FC02 dataset. For the 640 × 640 image size, the Soft-NMS algorithm achieves a runtime of 11.1 ms; for the 1280 × 1280 image size, it achieves a runtime of 15.4 ms.

D. Qualitative and Quantitative Results

In this section, we evaluate the localization and classification performance of our presented method against two strong detectors (YOLOv7 and YOLOv5x). We also evaluate our presented method for concealed object detection against several strong detectors and concatenation methods.

Table V compares the detection performance of our presented method with that of YOLOv7 and YOLOv5x on Foggy Cityscapes and RTTS. On the FC005 dataset, our presented method achieves a P value of 0.779, an R value of 0.576, an mAP@0.5 value of 0.646, and an mAP@0.5:0.95 value of 0.447, which are higher than those achieved by YOLOv5x and YOLOv7. On the FC01 dataset, it achieves a P value of 0.817, an R value of 0.591, an mAP@0.5 value of 0.673, and an mAP@0.5:0.95 value of 0.472, which are higher than those achieved by YOLOv5x and YOLOv7. On the FC02 dataset, it achieves a P value of 0.818, an R value of 0.582, an mAP@0.5 value of 0.657, and an mAP@0.5:0.95 value of 0.461, which are higher than those achieved by YOLOv5x and YOLOv7. On the RTTS dataset, it achieves an R value of 0.786, an mAP@0.5 value of 0.847, and an mAP@0.5:0.95 value of 0.616, which are higher than those achieved by YOLOv5x and YOLOv7.

Tables VI and VII and Fig. 9 present the performance for each object class on Foggy Cityscapes and RTTS, compared with YOLOv7 and YOLOv5x. On the FC005 dataset, our presented method achieves AP values for car (0.857), bicycle (0.616), person (0.744), rider (0.814), motorcycle (0.668), bus (0.689), and truck (0.526) that are higher than those achieved by YOLOv5x and YOLOv7. On the FC01 dataset, it achieves AP values for car (0.853), bicycle (0.619), person (0.726), rider (0.8), motorcycle (0.692), bus (0.668), caravan (0.469), and truck (0.556) that are higher than those achieved by YOLOv5x and YOLOv7. On the FC02 dataset, it achieves AP values for car (0.841), bicycle (0.58), person (0.717), rider (0.787), motorcycle (0.639), bus (0.686), caravan (0.469), and truck (0.537) that are higher than those achieved by YOLOv5x and YOLOv7. On the RTTS dataset, it achieves AP values for person (0.903), bus (0.746), car (0.924), motorbike (0.793), and bicycle (0.869) that are higher than those achieved by YOLOv5x and YOLOv7.

Tables VIII and IX present the classification performance on Foggy Cityscapes and RTTS, compared with YOLOv7 and YOLOv5x. On the FC005 dataset, our presented method achieves excellent classification performance for car (0.82), bicycle (0.61), person (0.71), rider (0.79), motorcycle (0.64), bus (0.69), caravan (0.5), and truck (0.59), which is higher than that achieved by YOLOv5x and YOLOv7. On the FC01 dataset, our presented method achieves excellent classification performance for car (0.82), bicycle
(0.61), person (0.67), rider (0.77), motorcycle (0.61), bus (0.65), caravan (0.29), and truck (0.57), which is higher than that achieved by YOLOv5x and YOLOv7. On the FC02 dataset, our presented method achieves excellent classification performance for car (0.8), bicycle (0.55), person (0.67), rider (0.78), motorcycle (0.59), bus (0.67), caravan (0.5), and truck (0.44), which is higher than that achieved by YOLOv5x and YOLOv7. On the RTTS dataset, our presented method achieves excellent performance for person (0.86), bus (0.76), car (0.91), and motorbike (0.75), which is higher than that achieved by YOLOv5x and YOLOv7.

Fig. 9. Precision–recall curves of multiple classes obtained by MSFFA-YOLO on Foggy Cityscapes and RTTS. (a) Precision–recall curves obtained by MSFFA-YOLO on FC005. (b) Precision–recall curves obtained by MSFFA-YOLO on FC01. (c) Precision–recall curves obtained by MSFFA-YOLO on FC02. (d) Precision–recall curves obtained by MSFFA-YOLO on RTTS.

TABLE VIII
COMPARISON OF CLASSIFICATION PERFORMANCE ON FOGGY CITYSCAPES

TABLE IX
COMPARISON OF CLASSIFICATION PERFORMANCE ON RTTS

Table X presents the detection performance on the RW dataset, compared with several strong object detectors and concatenation methods. The detection performance of our presented method is 41.5%, 32.3%, 31.9%, 13.6%, 12.4%, 8%, 4.8%, and 1.3% higher than that achieved by DSNet, Faster R-CNN-R101-FPN+, RetinaNet-R101-FPN, YOLOv5x, YOLOv8x, YOLOv7, AOD-YOLOv7, and MSFFA-YOLOv7, respectively. The evaluation results in Table X also show that our presented method achieves an efficiency of 37 frames per second (FPS).

TABLE X
CONCEALED OBJECT DETECTION PERFORMANCE COMPARED WITH EXCELLENT DETECTORS AND CONCATENATION METHODS ON RW

Fig. 10 shows the results obtained using our presented method and the most representative comparison methods on representative examples of foggy traffic scenes. The results of YOLOv5x and YOLOv7 exhibit problems such as missed detection, repeated detection, and classification errors in examples 1–4. In contrast, our presented method demonstrates strong performance in both object localization and classification.

Fig. 10. Results of localization and classification. (a) Results in the first row are obtained by YOLOv5x. (b) Results in the second row are obtained by YOLOv7. (c) Results in the last row are obtained by MSFFA-YOLO.

As shown in Fig. 11, in these challenging situations, our presented method accurately detects a wide range of traffic objects, including concealed ones. Its effectiveness in reliably detecting concealed traffic objects demonstrates its advantages and potential for real-world applications.

Fig. 11. Concealed object detection in real-world foggy traffic scenes using MSFFA-YOLO.

In summary, the experimental results on the synthetic foggy datasets show that our presented method achieves 64.6% accuracy on the FC005 dataset, 67.3% accuracy on the FC01 dataset, and 65.7% accuracy on the FC02 dataset. When evaluated on the natural foggy datasets, our presented method achieves 84.7% accuracy on the RTTS dataset and 84.1% accuracy on the RW dataset, indicating its ability to accurately detect multiclass traffic objects in real foggy weather. The experimental results also show that our presented method achieves an efficiency of 37 FPS. Finally, the experimental results demonstrate the excellent performance of our presented method for object localization and classification in foggy weather. When detecting concealed traffic objects in foggy weather, our presented method exhibits superior accuracy. These results substantiate the applicability of our presented method for traffic investigations in foggy weather.

V. CONCLUSION

In this article, a multiclass object detection method, namely, the MSFFA-YOLO network, is presented for traffic investigations in foggy weather. In this network, the presented detection subnet is utilized for object localization and classification. In addition, we present a restoration subnet that utilizes the MSFFA structure to enhance visibility. Qualitative and quantitative evaluation results on the synthetic and natural foggy datasets show that our presented method has excellent performance for multiclass object detection in foggy weather. Our presented method achieves strong performance in object localization and classification tasks, while also demonstrating superior accuracy in detecting concealed traffic objects amidst foggy
conditions. Consequently, our presented method can contribute to traffic investigations by providing accurate and reliable information about traffic objects in foggy weather, holding the potential to enhance traffic safety and optimize traffic management.

In our future research, we will collect extensive datasets from diverse sites featuring a range of traffic control scenes. This comprehensive data collection will enable us to enhance the generalizability of our presented method. Furthermore, we will explore innovative techniques and leverage advancements in CNN design to develop an even more powerful and efficient solution. By bridging the gap between theoretical advancements and practical implementation, we aspire to contribute to the field by providing excellent methods for traffic investigations in challenging conditions.

REFERENCES

[1] S. Minaee, Y. Boykov, F. Porikli, A. Plaza, N. Kehtarnavaz, and D. Terzopoulos, “Image segmentation using deep learning: A survey,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 7, pp. 3523–3542, Jul. 2022, doi: 10.1109/TPAMI.2021.3059968.
[2] M. Ye, J. Shen, G. Lin, T. Xiang, L. Shao, and S. C. H. Hoi, “Deep learning for person re-identification: A survey and outlook,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 6, pp. 2872–2893, Jun. 2022, doi: 10.1109/TPAMI.2021.3054775.
[3] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Seattle, WA, USA, Jul. 2016, pp. 779–788.
[4] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017, doi: 10.1109/tpami.2016.2577031.
[5] C. Y. Wang, A. Bochkovskiy, and H. Liao, “YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors,” in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Vancouver, BC, Canada, Jun. 2022, pp. 7464–7475.
[6] M. Humayun, F. Ashfaq, N. Z. Jhanjhi, and M. K. Alsadun, “Traffic management: Multi-scale vehicle detection in varying weather conditions using YOLOv4 and spatial pyramid pooling network,” Electronics, vol. 11, no. 17, p. 2748, Sep. 2022, doi: 10.3390/electronics11172748.
[7] K. Wang, W. Zhang, Z. Feng, and C. Wang, “Research on the classification for road traffic visibility based on the characteristics of driving behaviour—A driving simulator experiment,” J. Intell. Connected Vehicles, vol. 3, no. 1, pp. 30–36, 2020, doi: 10.1108/JICV-10-2019-0011.
[8] M. Todorova, D. Radojka, and B. Jasmina, “Role of functional classification of highways in road traffic safety,” Transp. Problem, vol. 4, pp. 97–104, Jan. 2009.
[9] L. Jiao et al., “A survey of deep learning-based object detection,” IEEE Access, vol. 7, pp. 128837–128868, 2019, doi: 10.1109/access.2019.2939201.
[10] S. R. E. Datondji, Y. Dupuis, P. Subirats, and P. Vasseur, “A survey of vision-based traffic monitoring of road intersections,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 10, pp. 2681–2698, Oct. 2016, doi: 10.1109/tits.2016.2530146.
[11] S.-C. Huang, T.-H. Le, and D.-W. Jaw, “DSNet: Joint semantic learning for object detection in inclement weather conditions,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 8, pp. 2623–2633, Aug. 2021, doi: 10.1109/tpami.2020.2977911.
[12] H. Wang et al., “YOLOv5-fog: A multiobjective visual detection algorithm for fog driving scenes based on improved YOLOv5,” IEEE Trans. Instrum. Meas., vol. 71, pp. 1–12, 2022, doi: 10.1109/tim.2022.3196954.
[13] K. Hsieh et al., “Focus: Querying large video datasets with low latency and low cost,” in Proc. 13th USENIX Symp. Operating Syst. Design Implement. (OSDI), Carlsbad, CA, USA, 2018, pp. 269–286.
[14] A. Geiger, M. Lauer, C. Wojek, C. Stiller, and R. Urtasun, “3D traffic scene understanding from movable platforms,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 5, pp. 1012–1025, May 2014, doi: 10.1109/tpami.2013.185.
[15] N. Carion et al., “End-to-end object detection with transformers,” in Proc. Comput. Vis. (ECCV), Glasgow, U.K., 2020, pp. 213–229.
[16] M. Sozzi, S. Cantalamessa, A. Cogato, A. Kayad, and F. Marinello, “Automatic bunch detection in white grape varieties using YOLOv3, YOLOv4, and YOLOv5 deep learning algorithms,” Agronomy, vol. 12, no. 2, p. 319, Jan. 2022, doi: 10.3390/agronomy12020319.
[17] M. Krišto, M. Ivasic-Kos, and M. Pobar, “Thermal object detection in difficult weather conditions using YOLO,” IEEE Access, vol. 8, pp. 125459–125476, 2020, doi: 10.1109/access.2020.3007481.
[18] W. Y. Liu et al., “Image-adaptive YOLO for object detection in adverse weather conditions,” in Proc. AAAI Conf. Artif. Intell., Jun. 2022, vol. 36, no. 2, pp. 1792–1800, doi: 10.1609/aaai.v36i2.20072.
[19] D. F. Liu, Y. M. Cui, Z. W. Cao, and Y. J. Chen, “A large-scale simulation dataset: Boost the detection accuracy for special weather conditions,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Glasgow, U.K., 2020, pp. 1–8.
[20] P. Tumas, A. Nowosielski, and A. Serackis, “Pedestrian detection in severe weather conditions,” IEEE Access, vol. 8, pp. 62775–62784, 2020, doi: 10.1109/access.2020.2982539.
[21] G. Li, Z. Ji, X. Qu, R. Zhou, and D. Cao, “Cross-domain object detection for autonomous driving: A stepwise domain adaptative YOLO approach,” IEEE Trans. Intell. Vehicles, vol. 7, no. 3, pp. 603–615, Sep. 2022, doi: 10.1109/TIV.2022.3165353.
[22] Q. Lang, L. Zhang, W. Shi, W. Chen, and S. Pu, “Exploring implicit domain-invariant features for domain adaptive object detection,” IEEE Trans. Circuits Syst. Video Technol., vol. 33, no. 4, pp. 1816–1826, Apr. 2023, doi: 10.1109/TCSVT.2022.3216611.
[23] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, May 2015, doi: 10.1038/nature14539.
[24] T. Y. Lin et al., “Feature pyramid networks for object detection,” in Proc. 30th IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, Jun. 2017, pp. 936–944.
[25] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Region-based convolutional networks for accurate object detection and segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 1, pp. 142–158, Jan. 2016, doi: 10.1109/tpami.2015.2437384.
[26] J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders, “Selective search for object recognition,” Int. J. Comput. Vis., vol. 104, no. 2, pp. 154–171, Sep. 2013, doi: 10.1007/s11263-013-0620-5.
[27] B. Yan, P. Fan, X. Lei, Z. Liu, and F. Yang, “A real-time apple targets detection method for picking robot based on improved YOLOv5,” Remote Sens., vol. 13, no. 9, p. 1619, Apr. 2021, doi: 10.3390/rs13091619.
[28] Y. Yu, J. Zhao, Q. Gong, C. Huang, G. Zheng, and J. Ma, “Real-time underwater maritime object detection in side-scan sonar images based on transformer-YOLOv5,” Remote Sens., vol. 13, no. 18, p. 3555, Sep. 2021, doi: 10.3390/rs13183555.
[29] G. Jocher, A. Chaurasia, and J. Qiu. YOLO by Ultralytics. Accessed: Jan. 10, 2023. [Online]. Available: https://github.com/ultralytics/ultralytics
[30] J. Y. Chiang and Y.-C. Chen, “Underwater image enhancement by wavelength compensation and dehazing,” IEEE Trans. Image Process., vol. 21, no. 4, pp. 1756–1769, Apr. 2012, doi: 10.1109/tip.2011.2179666.
[31] C. O. Ancuti and C. Ancuti, “Single image dehazing by multi-scale fusion,” IEEE Trans. Image Process., vol. 22, no. 8, pp. 3271–3282, Aug. 2013, doi: 10.1109/tip.2013.2262284.
[32] R. Fattal, “Dehazing using color-lines,” ACM Trans. Graph., vol. 34, no. 1, pp. 1–14, Nov. 2014, doi: 10.1145/2651362.
[33] D. Berman, T. Treibitz, and S. Avidan, “Non-local image dehazing,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Seattle, WA, USA, Jun. 2016, pp. 1674–1682.
[34] R. Fattal, “Single image dehazing,” ACM Trans. Graph., vol. 27, no. 3, pp. 1–9, Aug. 2008, doi: 10.1145/1360612.1360671.
[35] K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 12, pp. 2341–2353, Dec. 2011, doi: 10.1109/tpami.2010.168.
[36] M. Ju, D. Zhang, and X. Wang, “Single image dehazing via an improved atmospheric scattering model,” Vis. Comput., vol. 33, no. 12, pp. 1613–1625, Dec. 2017, doi: 10.1007/s00371-016-1305-1.
[37] W. Wang, X. Yuan, X. Wu, and Y. Liu, “Fast image dehazing method based on linear transformation,” IEEE Trans. Multimedia, vol. 19, no. 6, pp. 1142–1155, Jun. 2017, doi: 10.1109/tmm.2017.2652069.
[38] Q. Zhu, J. Mai, and L. Shao, “A fast single image haze removal algorithm using color attenuation prior,” IEEE Trans. Image Process., vol. 24, no. 11, pp. 3522–3533, Nov. 2015, doi: 10.1109/tip.2015.2446191.
[39] W. Zhang, J. Liang, H. Ju, L. Ren, E. Qu, and Z. Wu, “A robust haze-removal scheme in polarimetric dehazing imaging based on automatic identification of sky region,” Opt. Laser Technol., vol. 86, pp. 145–151, Dec. 2016, doi: 10.1016/j.optlastec.2016.07.015.
[40] G. Yan, M. Yu, S. Shi, and C. Feng, “The recognition of traffic speed limit sign in hazy weather,” J. Intell. Fuzzy Syst., vol. 33, no. 2, pp. 873–883, Jul. 2017, doi: 10.3233/jifs-162138.
[41] J. Zhang, X. Wang, C. Yang, J. Zhang, D. He, and H. Song, “Image dehazing based on dark channel prior and brightness enhancement for agricultural remote sensing images from consumer-grade cameras,” Comput. Electron. Agricult., vol. 151, pp. 196–206, Aug. 2018, doi: 10.1016/j.compag.2018.06.010.
[42] J. Yang, C. Wu, B. Du, and L. Zhang, “Enhanced multiscale feature fusion network for HSI classification,” IEEE Trans. Geosci. Remote Sens., vol. 59, no. 12, pp. 10328–10347, Dec. 2021, doi: 10.1109/tgrs.2020.3046757.
[43] W. J. Zhou, X. Y. Lin, J. S. Lei, L. Yu, and J. N. Hwang, “MFFENet: Multiscale feature fusion and enhancement network for RGB–thermal urban road scene parsing,” IEEE Trans. Multimedia, vol. 24, pp. 2526–2538, 2022, doi: 10.1109/tmm.2021.3086618.
[44] D. F. Liu et al., “DenserNet: Weakly supervised visual localization using multi-scale feature aggregation,” in Proc. AAAI Conf. Artif. Intell., May 2021, vol. 35, no. 7, pp. 6101–6109, doi: 10.1609/aaai.v35i7.16760.
[45] B. Cai, X. Xu, K. Jia, C. Qing, and D. Tao, “DehazeNet: An end-to-end system for single image haze removal,” IEEE Trans. Image Process., vol. 25, no. 11, pp. 5187–5198, Nov. 2016, doi: 10.1109/tip.2016.2598681.
[46] W. Ren et al., “Single image dehazing via multi-scale convolutional neural networks,” in Proc. 14th Eur. Conf. Comput. Vis. (ECCV), Amsterdam, The Netherlands, 2016, pp. 154–169.
[47] D. Chen et al., “Gated context aggregation network for image dehazing and deraining,” in Proc. 19th IEEE Winter Conf. Appl. Comput. Vis. (WACV), Waikoloa Village, HI, USA, Jan. 2019, pp. 1375–1383.
[48] W. Ren et al., “Gated fusion network for single image dehazing,” in Proc. 31st IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Salt Lake City, UT, USA, Jun. 2018, pp. 3253–3261.
[49] B. Y. Li et al., “AOD-Net: All-in-one dehazing network,” in Proc. 16th IEEE Int. Conf. Comput. Vis. (ICCV), Venice, Italy, 2017, pp. 4780–4788.
[50] X. Fan, Z. Zhao, W. Yan, X. Yan, and P. Shi, “Multi-scale feature fusion image dehazing algorithm combined with attention mechanism,” Comput. Sci., vol. 49, no. 5, pp. 50–57, 2022.
[51] C. Sakaridis, D. Dai, and L. Van Gool, “Semantic foggy scene understanding with synthetic data,” Int. J. Comput. Vis., vol. 126, no. 9, pp. 973–992, Sep. 2018, doi: 10.1007/s11263-018-1072-8.
[52] T. Y. Lin et al., “Focal loss for dense object detection,” in Proc. 16th IEEE Int. Conf. Comput. Vis. (ICCV), Venice, Italy, Oct. 2017, pp. 2999–3007.
[53] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 318–327, Feb. 2020, doi: 10.1109/tpami.2018.2858826.
[54] V. Milanés, S. E. Shladover, J. Spring, C. Nowakowski, H. Kawazoe, and M. Nakamura, “Cooperative adaptive cruise control in real traffic situations,” IEEE Trans. Intell. Transp. Syst., vol. 15, no. 1, pp. 296–305, Feb. 2014, doi: 10.1109/tits.2013.2278494.
[55] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proc. Med. Image Comput. Comput.-Assisted Intervent., Munich, Germany, 2015, pp. 234–241.
[56] N. Bodla, B. Singh, R. Chellappa, and L. S. Davis, “Soft-NMS—Improving object detection with one line of code,” in Proc. 16th IEEE Int. Conf. Comput. Vis. (ICCV), Venice, Italy, Oct. 2017, pp. 5562–5570.
[57] B. Li et al., “Benchmarking single-image dehazing and beyond,” IEEE Trans. Image Process., vol. 28, no. 1, pp. 492–505, Jan. 2019, doi: 10.1109/tip.2018.2867951.
[58] T. Y. Lin et al., “Microsoft COCO: Common objects in context,” in Proc. 13th Eur. Conf. Comput. Vision (ECCV), Zurich, Switzerland, 2014, pp. 740–755.
[59] A. Moffat and J. Zobel, “Rank-biased precision for measurement of retrieval effectiveness,” ACM Trans. Inf. Syst., vol. 27, no. 1, pp. 1–27, Dec. 2009, doi: 10.1145/1416950.1416952.
[60] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, and B. Schiele, “The cityscapes dataset for semantic urban scene understanding,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Las Vegas, NV, USA, Oct. 2016, pp. 3213–3223.

Qiang Zhang received the master’s degree from Dalian Maritime University, Dalian, China, in 2019. He is currently pursuing the Ph.D. degree in transportation engineering with the School of Transportation, Southeast University, Nanjing, China.
His research interests include computer vision, deep learning, traffic investigation, traffic safety analysis, data analysis, and intelligent transportation systems.
Mr. Zhang serves as a reviewer for several journals, such as Neural Computing and Applications, IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, and IET Intelligent Transport Systems.

Xiaojian Hu received the Ph.D. degree in transportation engineering from Southeast University, Nanjing, China, in 2009.
He is currently an Associate Professor with Southeast University. His research interests include computer vision, deep learning, transportation organization and management, intelligent traffic systems, transportation planning, and road traffic safety.
