
Article

3D Object Detection in Foggy Weather Conditions


Nguyen Anh Minh MAI 1,2,*, Pierre DUTHON 3, Louahdi KHOUDOUR 1, Alain CROUZIL 2 and Sergio A. VELASTIN 4,5

1 Cerema, Equipe-projet STI, 1 Avenue du Colonel Roche, 31400 Toulouse, France; nguyen-anh-minh.mai@cerema.fr (N.A.M.M.); louahdi.khoudour@cerema.fr (L.K.)
2 Institut de Recherche en Informatique de Toulouse, Université de Toulouse, UPS, 31062 Toulouse, France; nguyen-anh-minh.mai@irit.fr (N.A.M.M.); alain.crouzil@irit.fr (A.C.)
3 Cerema, Equipe-projet STI, 8-10 rue Bernard Palissy, 63017 Clermont-Ferrand Cedex 2, France; pierre.duthon@cerema.fr
4 Department of Computer Science and Engineering, Universidad Carlos III de Madrid, 28911 Leganés, Madrid, Spain; sergio.velastin@ieee.org
5 School of Electronic Engineering and Computer Science, Queen Mary University of London, UK
* Correspondence: mainguyenanhminh1996@gmail.com; Tel.: +33-7585-55539 (N.A.M.M.)
† These authors contributed equally to this work.

Abstract: The role of sensors such as cameras or LiDAR is crucial for the environmental awareness of self-driving cars. However, the data collected from these sensors is subject to distortions in extreme weather conditions such as fog, rain, and snow. This issue could lead to many safety problems while operating a self-driving vehicle. The purpose of this study is to analyze the effects of fog on the detection of objects in driving scenes and then to propose methods for improvement. Collecting and processing data in adverse weather conditions is often more difficult than in good weather conditions. Hence, a synthetic dataset that can simulate bad weather conditions is a good choice to validate a method, as it is simpler and more economical, before working with a real dataset. In this paper, we apply fog synthesis on the public KITTI dataset to generate the Multifog KITTI dataset for both images and point clouds. In terms of processing tasks, we test our previous 3D object detector based on LiDAR and camera, named Sparse LiDAR and Stereo Fusion network (SLS-Fusion), to see how it is affected by foggy weather conditions. We propose to train on both the original dataset and the augmented dataset to improve performance in foggy weather conditions while keeping good performance under normal conditions. We perform experiments on the KITTI and the proposed Multifog KITTI datasets and show that, before any improvement, performance is reduced by 42.67% in 3D object detection for "Moderate" objects in foggy weather conditions. By using a specific training strategy, results are significantly improved by 26.72% and remain quite good on the original dataset with a drop of only 8.23%. In summary, fog often causes the failure of 3D detection in driving scenes. By additional training with the augmented dataset, we significantly improve the performance of the proposed 3D object detection algorithm for self-driving cars in foggy weather conditions.

Keywords: Adverse weather conditions, Foggy perception, Synthetic datasets, 3D object detection, Autonomous vehicles
1. Introduction


(a) Original sample (b) Foggy sample


Figure 1. An example of the distortion of an image and a point cloud in the proposed dataset. Note that the color of each point in the point cloud is the color of the corresponding pixel in the image (for visualization only). (a) shows the original sample from the KITTI dataset [1], with the image above and the point cloud below, collected by a Velodyne HDL64 S2 LiDAR [1]. (b) shows the image and the point cloud simulated in foggy conditions with visibility equal to 52 m (V = 52 m) from the proposed dataset. Because of back-scattered light from water droplets, fog lowers the contrast of the image and causes incorrect point measurements in the point cloud.

Today, the trend is to bring automation into everyday life to reduce costs, lighten human workloads, and improve efficiency. While humans use sensory organs such as eyes, ears, nose, and the sense of touch to perceive the world around them, industrial applications use sensors such as cameras, RADAR (Radio Detection and Ranging), Kinect, LiDAR (Light Detection and Ranging), or IMU (Inertial Measurement Unit) to collect data before processing them with complex algorithms. This research focuses mainly on self-driving cars, where cameras and LiDAR play a vital role in vehicle perception. Cameras are integrated in many practical computer vision applications such as retail [2], security [3], the automotive industry [4], healthcare [5], agriculture [6], banking [7], and industrial automation [8]. Meanwhile, LiDAR is widely used to create high-resolution graphic products, with many advanced applications in the fields of geodesy [9], geomatics [10], archeology [11], geography [12], geology [13], geomorphology [14], mapping [15] and aeronautics [16]. This technology is also widely used for the sensing mechanisms of autonomous vehicles, and it is even integrated in the iPhone 12 Pro Max camera with a LiDAR scanner [17]. The fusion of cameras, LiDAR, and other sensors such as RADAR and GPS/IMU gives self-driving cars the ability to perceive the environment and then make operational decisions.
Autonomous vehicles have been put into testing or even commercial use, with varying degrees of automation, by big tech companies such as Waymo (Google), Uber, Lyft, Tesla, etc. [18]. However, many factors can affect the perception ability of self-driving cars, thereby potentially causing serious consequences for the lives of other road users [18]. One of the causes is that the data collected from these sensors is distorted by environmental influences, and this may directly affect the awareness of self-driving cars. While applications using these sensors do quite well in controlled lighting or indoor environments unaffected by weather, outdoor applications face many problems. For example, applications that use a camera to perceive the environment often fail in extreme lighting conditions such as strong glare, low light, or night-time conditions. Significant challenges arise for self-driving cars under adverse weather conditions such as fog, rain, or snow, in which both the camera and LiDAR are severely affected, as shown in Figure 1. As said before, this may lead to failures to sense and perceive the surrounding environment, which can endanger road users such as pedestrians or vehicles. Therefore, this work focuses on 3D object detection with camera and LiDAR sensors operating in foggy weather conditions.

The methods that produce the best results are mostly based on deep learning architectures, trained on large amounts of labeled data (supervised learning). While labeling and collecting data in good conditions such as daytime or sunny weather takes time, it takes even more time and effort in extreme weather. Thus, there is an imbalance between the amount of data recorded in extreme weather conditions and the amount recorded in normal conditions [19–23]. In addition, there may be an imbalance between different levels of fog, rain, or snow when collecting data. Most of the data is only collected at a given time and place, so it cannot fully cover all situations, and a model trained on such data may make mistakes. So, in addition to real data, the creation of synthetic data, which can be simulated under many controlled parameters, is equally important. This paper aims at generating physics-based fog data on a well-known dataset collected in daytime and mild sunlight conditions to provide improvements under foggy conditions.
Some studies also show failures and how to fix them in extreme weather conditions [24–28]. However, most of them only experiment on images, and very few experiment on both images and point clouds [28]. To improve the performance of perception algorithms, several studies suggest dehazing fog, rain, or snow on images [24,29,30] and, more recently, on point clouds [31]. Others pointed out how the combination of LiDAR and cameras also affects performance [26,28]. Other studies proposed to increase the amount of data during the training phase [24,25,27,32].
In this paper, we extend our previous work [33], a 3D object detector for self-driving cars in normal weather conditions, to run in foggy conditions. We show that a late-fusion-based architecture can perform well in foggy weather conditions when a suitable training strategy is used. The main contributions of this paper are:
• Firstly, we propose a new public dataset augmented from the KITTI dataset [1], for both images and point clouds, covering visibility ranges (in fog) from 20 to 80 meters so as to resemble actual foggy environments.
• Secondly, we show that the data collected from the camera and LiDAR is significantly distorted in foggy scenes. This directly affects the performance of the 3D object detection algorithm of self-driving cars, as confirmed by our experiments.
• Finally, we propose an effective strategy to train models. Experiments show that the resulting model runs well in both clear and foggy weather conditions.

2. Related work
2.1. Datasets in Adverse Weather Conditions
The most common datasets in the literature are collected in good conditions, such as KITTI [1] and Cityscapes [34], or in different lighting conditions, such as BDD100K [20], Waymo [35] and nuScenes [19]. More recent attention has focused on the perception ability of self-driving cars in adverse weather conditions, because such conditions negatively affect the quality of camera and LiDAR sensing, resulting in degraded performance. Consequently, several datasets have been collected in foggy conditions, including Foggy Driving [24], Foggy Zurich [32], SeeingThroughFog [28], nuScenes [19], BDD100K [20], the Oxford RobotCar dataset [21], and [22], [23], [36], as well as in rain [36,37] or snow [36,38,39] conditions.
However, data collection is not easy, and it may cause post-processing problems such as imbalance issues or labeling errors. In contrast, synthetic datasets are increasingly close to real data and can avoid such problems. Synthetic datasets can be divided into two categories: physics-based, such as Foggy Cityscapes [24], RainCityscapes [40], Foggy Cityscapes [32], Rain augmented [41], [25], [42], and generative adversarial network based (GAN-based), such as [41,43].

While most synthetic datasets focus only on images [24,25,32,40,41], this work takes fog into account for both images and point clouds, starting from a clear-weather dataset [1]. We use the physics-based procedure proposed in [28] in order to retain the physical properties of real fog.

2.2. Perception in Adverse Weather Conditions

While outdoor perception algorithms are usually more sensitive to varying lighting conditions than indoor ones [44,45], perception under extreme conditions is even more challenging [24,25,28] in terms of sensor degradation, lower contrast and limited visibility, thereby causing prediction errors. In fact, previous research has shown a performance drop in segmentation [24–27,32], 2D object detection [24,27,28] and depth estimation [27] algorithms for the perception of self-driving cars in adverse weather conditions. Previous studies have also shown that performance may be improved by learning with synthetic data [24,25,27,32], by dehazing [24] or by using late fusion [26,28].
Like previous works [24–28,32], this research aims to analyze the perception algorithms of self-driving cars in foggy scenes, with a special emphasis on the 3D object detection task. Indeed, performance is greatly reduced for 3D object detection in foggy scenes, but by training on both the normal and the synthetic datasets that performance can be improved. Overall, these findings are in accordance with those reported in [24,25,27,32].

3. Methods
This section presents the definition of the fog phenomenon and how fog is modeled to generate the proposed Multifog KITTI dataset, described later in Section 4. Then, our previous 3D object detection work [33] is extended to investigate its performance under foggy weather conditions.

3.1. Fog phenomenon

Fog is the phenomenon of water vapor condensing into tiny cloud-like particles that appear on and near the ground instead of in the sky. The Earth's moisture slowly evaporates and, as it does, it moves up, cools, and condenses to form fog. Fog can be seen as a form of low cloud.
Physically, fog is a scattering phenomenon. Light is scattered by the suspended water droplets before reaching the image sensor. This scattering has two primary effects: first, the chief ray is attenuated before reaching the sensor, and second, the scattered light adds a signal floor. These effects reduce contrast, as can be seen in Figure 2: the contrast of the image is inversely proportional to the fog density, and the reduction in contrast causes driving difficulties both for human drivers and for sensor-based autonomous systems or driving aids.
The Meteorological Optical Range (MOR), also called visibility and denoted by V, is the distance in meters at which the contrast of a black-and-white target is no longer distinguishable. Such a target is perceived by the human eye as uniformly grey when moving away in the fog. A limit contrast level of 5% has been defined in the standard [46]. The lower the MOR, the denser the fog. By international agreement, the term fog is used when visibility V is less than 1 km.

3.2. Fog rendering

Datasets recorded under extreme weather conditions are far less numerous than those recorded under normal (clear weather) conditions. Firstly, adverse weather conditions do not occur frequently. Secondly, it is more difficult to clean and label this kind of data. This causes an imbalance problem in real datasets between the different types of weather conditions. Therefore, the generation of synthetic datasets is very useful to develop systems that work under adverse weather conditions.

(a) From left to right, image and point cloud are shown for the "Clear" sample and simulated fog samples (64 beams) at different levels of visibility (V = 80 m, V = 50 m, V = 20 m). Point clouds are shown in bird's eye view (BEV) representation.
(b) The image histogram (number of pixels versus pixel intensity) shows the pixel intensity distribution of the four images in (a), which are converted to grayscale images here for simplicity.
Figure 2. The distortion of the image and point cloud according to different levels of visibility in a foggy scene. As visibility decreases, the contrast of the image and the range of the point cloud also decrease significantly, as shown in (a) and (b).

There are different ways to generate artificial data under foggy weather conditions: (a) acquisition under controlled conditions [47] or (b) augmentation of another dataset recorded in normal conditions. For the second approach, there are also different ways to model fog, such as physics-based [24,25,32,40,41] or GAN-based [41] modeling. This paper uses the physics-based method, as it preserves the physical properties of the weather and has been studied for a long time.
While radar is not significantly affected by fog [48], data collected from LiDAR and cameras are quite distorted, as shown in Figure 1 and Figure 2. Therefore, the following describes how point clouds (LiDAR) and images (camera) are modeled, and later tested, to show the influence of fog on these sensors and on the performance of vision algorithms.

3.2.1. Camera in fog

Based on Koschmieder's law [49] from 1924, Sakaridis et al. [24] formulated the equation giving the observed foggy image I_foggy(u, v) at pixel (u, v) as follows:

I_foggy(u, v) = t(u, v) I_clear(u, v) + (1 − t(u, v)) L,   (1)

where I_clear(u, v) denotes the latent clear image, L the atmospheric light, which is assumed to be globally constant (generally valid only for daytime images), and, in the case of a homogeneous medium, t(u, v) = exp(−βD(u, v)) is the transmission coefficient, where β is the fog density (or attenuation) coefficient and D(u, v) is the scene depth at pixel (u, v).
Fog thickness is controlled by β. Using Koschmieder's law, the visibility V can be described by the equation:

C_T = exp(−βV)   (2)

or, equivalently,

V = −ln(C_T) / β,   (3)

where C_T is the contrast threshold. According to the International Commission on Illumination (CIE) [50], C_T has a value of 0.05 for daytime visibility estimation.
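To make the image fog model concrete, here is a minimal NumPy sketch of Equations (1)-(3), assuming a dense per-pixel depth map is available (in this work it is obtained with the depth completion method of [58], see Section 4.1.1); the function name, the default atmospheric light L = 255 and the clipping to 8-bit values are choices of this illustration, not part of the original pipeline.

```python
import numpy as np

def add_fog_to_image(img_clear, depth, visibility_m, L_atm=255.0, contrast_threshold=0.05):
    """Apply homogeneous fog to an image following Koschmieder's law (Eqs. 1-3).

    img_clear    : HxWx3 (or HxW) array of clear-weather intensities.
    depth        : HxW array of per-pixel scene depth D(u, v) in meters.
    visibility_m : Meteorological Optical Range V in meters.
    L_atm        : atmospheric light, assumed globally constant (daytime).
    """
    # Eq. (3): fog density beta from visibility, with C_T = 0.05 (CIE daytime threshold).
    beta = -np.log(contrast_threshold) / visibility_m
    # Transmission t(u, v) = exp(-beta * D(u, v)) for a homogeneous medium.
    t = np.exp(-beta * depth)
    if img_clear.ndim == 3:                       # broadcast over color channels
        t = t[..., None]
    # Eq. (1): attenuated scene radiance plus airlight.
    img_foggy = t * img_clear.astype(np.float32) + (1.0 - t) * L_atm
    return np.clip(img_foggy, 0, 255).astype(np.uint8)
```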

3.2.2. LiDAR in fog

Gruber et al. [51] assume that beam divergence is not affected by fog. In this model, a returned pulse echo is registered as long as the received laser intensity is larger than the effective noise floor. However, severe back-scatter from fog may lead to direct back-scatter from points within the scattering fog volume, which is quantified by the transmissivity t(u, v). The observed foggy LiDAR signal L_foggy(u, v) can then be modeled using the following equation [51]:

L_foggy(u, v) = t(u, v) L_clear(u, v),   (4)

where L_clear(u, v) and L_foggy(u, v) are, respectively, the intensity of the light pulse emitted from the LiDAR and the received signal intensity.
Modern scanning LiDAR systems implement an adaptive laser gain g to increase the signal for a given noise floor, yielding the maximum distance d_max = (1/(2β)) ln(n/L_clear + g), with n being the detectable noise floor.
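A similar sketch for the LiDAR model of Equation (4) is given below. It attenuates each return by a two-way transmissivity exp(−2βd) and keeps only echoes whose received intensity, scaled by the gain g, stays above the noise floor n; the exact way g enters the detection threshold in [51] may differ, so this should be read as a simplified illustration using the default values g = 0.35 and n = 0.05 mentioned in Section 4.

```python
import numpy as np

def add_fog_to_point_cloud(points, intensities, visibility_m,
                           noise_floor=0.05, gain=0.35,
                           contrast_threshold=0.05):
    """Attenuate LiDAR returns in fog (Eq. 4) and drop echoes below the noise floor.

    points      : Nx3 array of (x, y, z) coordinates in the sensor frame.
    intensities : N array of clear-weather received intensities L_clear.
    The default noise floor n = 0.05 and gain g = 0.35 are the values used for the
    HDL64 S2 configuration of [51]; the two-way transmissivity t = exp(-2*beta*d)
    and the way the gain scales the threshold are assumptions of this sketch.
    """
    beta = -np.log(contrast_threshold) / visibility_m
    d = np.linalg.norm(points, axis=1)              # range of each return
    t = np.exp(-2.0 * beta * d)                     # two-way attenuation along the path
    foggy_intensities = t * intensities             # Eq. (4)
    keep = gain * foggy_intensities >= noise_floor  # echo still detectable
    return points[keep], foggy_intensities[keep]
```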
This section has shown how to digitally add fog to a dataset initially acquired in clear weather. The 3D object detection method applied to this dataset is presented next.

3.3. 3D object detection algorithm

Here, we use our previous 3D object detection algorithm, called SLS-Fusion [33]. Figure 3 shows a block diagram of this 3D detector. It takes as input a pair of stereo images and the re-projected depth maps of a simulated 4-beam LiDAR on the left and right images. It uses late fusion and is divided into three parts: depth estimation, conversion of the data representation, and LiDAR-based 3D object detection.
The model takes the stereo images (I_l, I_r) and the corresponding simulated stereo depth maps obtained by projecting the 4-beam LiDAR onto the left and right images, (S_l, S_r). S_l and S_r are simulated using the formula proposed in [52]. An encoder-decoder network is used to extract features from both the images and the point clouds. The proposed network has a weight-sharing pipeline for both LiDAR and images, (I_l, S_l) and (I_r, S_r), instead of only using the left and right images as proposed in [52,53]. Once the left and right features are obtained from the decoding stage, they are passed to the Depth Cost Volume (DeCV) proposed in [52] to learn the depth information. Here, as in [52], the smooth L1 loss function is used:

∑_{(u,v) ∈ I_l} |d(u, v) − D(u, v)|,   (5)

where d(u, v) denotes the valid depth ground truth and D is the predicted depth map, D(u, v) being the depth corresponding to pixel (u, v) in the left image I_l. Then pseudo point clouds are generated using a pinhole camera model.
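As an illustration of this supervision, a minimal PyTorch sketch of the loss in Equation (5) is given below; the validity mask convention (ground-truth depth > 0) is an assumption of the sketch, and the smooth L1 variant follows the text rather than the plain L1 sum printed in Equation (5).

```python
import torch
import torch.nn.functional as F

def depth_loss(pred_depth, gt_depth):
    """Depth supervision of Eq. (5): smooth L1 between the predicted depth map and the
    sparse ground-truth depth, evaluated only where the ground truth is valid."""
    valid = gt_depth > 0                              # assumed validity convention
    return F.smooth_l1_loss(pred_depth[valid], gt_depth[valid], reduction='sum')
```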

Figure 3. Overview of the SLS-Fusion framework. The left and right images, each paired with the corresponding projected 4-beam LiDAR depth map, go through a weight-sharing encoder-decoder; the resulting features feed a Depth Cost Volume and a Disparity Cost Volume (with softmax) to predict a depth map, which is converted into a pseudo point cloud, reinforced (corrected) by the 4-beam LiDAR, and finally passed to a LiDAR-based 3D object detector that predicts 3D bounding boxes.

Given the depth D(u, v) and the camera intrinsic matrix, the 3D location (X_c, Y_c, Z_c) in the camera coordinate system for each pixel (u, v) is given by:

(depth)  Z_c = D(u, v),   (6a)
(width)  X_c = (u − c_U) × Z_c / f_U,   (6b)
(height) Y_c = (v − c_V) × Z_c / f_V,   (6c)

where c_U and c_V are the coordinates of the principal point and f_U and f_V are, respectively, the focal lengths expressed in pixel width and height. Following [52], the 4-beam LiDAR is used to reinforce the quality of the pseudo point cloud. Each point (X_c, Y_c, Z_c, 1) is then transformed into (X_l, Y_l, Z_l, 1) in the LiDAR coordinate system (the real-world coordinate system), and the pseudo point cloud is completed by setting the reflectance to 1. Given the camera extrinsic matrix

C = [ R  t ; 0  1 ],

where R and t are, respectively, the rotation matrix and the translation vector, the pseudo point cloud is obtained as follows:

[X_l, Y_l, Z_l, 1]^T = C^(−1) [X_c, Y_c, Z_c, 1]^T.   (7)
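The back-projection of Equations (6) and (7) can be sketched as follows in NumPy; the function name is ours, and the convention that C maps LiDAR coordinates to camera coordinates simply follows the use of C^(−1) in Equation (7).

```python
import numpy as np

def depth_to_pseudo_point_cloud(depth, K, C):
    """Back-project a depth map into a pseudo point cloud (Eqs. 6-7).

    depth : HxW predicted depth map D(u, v) in meters.
    K     : 3x3 camera intrinsic matrix containing f_U, f_V, c_U, c_V.
    C     : 4x4 camera extrinsic matrix [R t; 0 1] (LiDAR-to-camera, per Eq. 7).
    Returns an Nx4 array (x, y, z, reflectance) in the LiDAR frame, reflectance set to 1.
    """
    f_u, f_v = K[0, 0], K[1, 1]
    c_u, c_v = K[0, 2], K[1, 2]
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z_c = depth                                   # Eq. (6a)
    x_c = (u - c_u) * z_c / f_u                   # Eq. (6b)
    y_c = (v - c_v) * z_c / f_v                   # Eq. (6c)
    pts_cam = np.stack([x_c, y_c, z_c, np.ones_like(z_c)], axis=-1).reshape(-1, 4)
    pts_lidar = (np.linalg.inv(C) @ pts_cam.T).T  # Eq. (7): camera -> LiDAR frame
    pts_lidar[:, 3] = 1.0                         # fill the reflectance channel with 1
    return pts_lidar
```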

Once the pseudo point cloud is obtained, it can be treated like a normal point cloud, although its accuracy depends on the quality of the predicted depth. As in Pseudo-LiDAR++ [52], the input 4-beam point cloud is used to correct errors in the pseudo point cloud; this refinement step yields a more accurate point cloud. The corrected pseudo point cloud is then fed to a LiDAR-based detector: the idea is to leverage the performance of current leading LiDAR-based methods such as PointRCNN [54] to detect objects.

4. Experiments
4.1. Datasets and evaluation metrics
4.1.1. Datasets
This part presents the datasets used in this work and introduces a new synthetic foggy dataset, for both images and LiDAR, based on the KITTI dataset [1]. As the detector [33] is based on the pseudo-LiDAR pipeline [55], the training procedure is divided into two parts: depth prediction and 3D object detection. The corresponding datasets are introduced as follows. The Scene Flow dataset [56], a large-scale synthetic dataset, is first used for training the depth estimation part. It has 35,454 images of size 960×540 for training and 4,370 images for testing.
The KITTI object detection dataset [1] is used both for fine-tuning the depth estimation part and for training the 3D object detection part. To avoid confusion, from now on it is referred to as the "Clear KITTI" dataset. This is the most common dataset for driving scenes, collected in the daytime and under good weather conditions. It has 7,481 training samples and 7,518 testing samples for both stereo and LiDAR. Like most other studies, the training dataset is divided here into a training part (3,712 samples) and a validation part (3,769 samples), as proposed in [57].
The new proposed Multifog KITTI dataset is generated by using the equations given in Section 3 for visibility levels ranging from 20 to 80 m. As a depth map is required in Equations (1) and (4) for each frame, the depth completion algorithm proposed in [58] is applied to obtain the depth map D corresponding to each left image I_l, and similarly for the right image I_r. The default configuration proposed in [51] is used, with g = 0.35 and n = 0.05 for the Velodyne HDL64 S2 LiDAR used in the KITTI dataset. Figure 4 shows the number of samples for each visibility level in the training and validation sets of the Multifog KITTI dataset. This dataset is used in the same way as the Clear KITTI dataset for both the depth estimation and 3D object detection parts. The proposed Multifog KITTI dataset contains 7,481 training samples and 7,518 testing samples for stereo images, 64-beam LiDAR and 4-beam LiDAR.
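For clarity, the sketch below outlines how one foggy frame could be generated under the assumptions above; the roughly uniform sampling of the visibility in [20, 80] m is consistent with the distribution shown in Figure 4, and the helper functions add_fog_to_image and add_fog_to_point_cloud are the illustrative sketches of Section 3.2 (gathered here in a hypothetical module fog_rendering), not released code.

```python
import numpy as np
from fog_rendering import add_fog_to_image, add_fog_to_point_cloud  # hypothetical module (Section 3.2 sketches)

def make_multifog_sample(img_left, img_right, depth_left, depth_right,
                         lidar_points, lidar_intensities, rng=None):
    """Generate one foggy sample from a clear KITTI frame, using the completed
    depth maps of [58] and the fog helpers sketched in Section 3.2."""
    rng = rng or np.random.default_rng()
    V = rng.uniform(20.0, 80.0)   # visibility level, roughly uniform over 20-80 m (Figure 4)
    foggy_left = add_fog_to_image(img_left, depth_left, V)
    foggy_right = add_fog_to_image(img_right, depth_right, V)
    foggy_pts, foggy_int = add_fog_to_point_cloud(lidar_points, lidar_intensities, V,
                                                  noise_floor=0.05, gain=0.35)
    return foggy_left, foggy_right, foggy_pts, foggy_int, V
```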

4.1.2. Evaluation metrics

To measure the performance of the 3D object detection task, the average precision (AP) is computed across 11 recall positions between 0 and 1, as proposed in [59], at both the 3D and bird's eye view (BEV) levels, with intersection over union (IoU) thresholds of 0.5 and 0.7:

AP = (1/11) ∑_{r ∈ {0, 0.1, ..., 1}} AP_r = (1/11) ∑_{r ∈ {0, 0.1, ..., 1}} p_interp(r),   (8)

where the interpolated precision for a given recall value r is given by:

p_interp(r) = max_{r̃ : r̃ ≥ r} p(r̃).   (9)
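For reference, the 11-point interpolated AP of Equations (8) and (9) can be computed as in the following sketch, which is a direct transcription of the PASCAL VOC definition [59] rather than the official KITTI evaluation code.

```python
import numpy as np

def eleven_point_ap(recalls, precisions):
    """11-point interpolated average precision (Eqs. 8-9).

    recalls, precisions : paired recall/precision values sampled along the
    detector's precision-recall curve.
    """
    recalls = np.asarray(recalls)
    precisions = np.asarray(precisions)
    ap = 0.0
    for r in np.linspace(0.0, 1.0, 11):
        mask = recalls >= r
        # p_interp(r): maximum precision over all recalls >= r, 0 if that recall is never reached.
        p_interp = precisions[mask].max() if mask.any() else 0.0
        ap += p_interp / 11.0
    return ap
```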

According to [1], objects are divided into three levels of difficulty, easy, moderate and hard, depending on the 2D bounding box size, occlusion, and truncation extent in the image. This study focuses on detecting "Car" objects because the car is one of the main objects and occupies the largest percentage of the KITTI dataset.

Figure 4. Multifog KITTI dataset: distribution of the visibility distance level (MOR, in meters) over the samples. (a) Training set (µ = 50.0 m, σ = 17.5 m). (b) Validation set (µ = 50.1 m, σ = 17.3 m).

4.2. Experimental protocols

To better understand the different steps of the experiments, this part presents the series of experiments carried out and explains why they were undertaken.
As said earlier, SLS-Fusion [33] is used for 3D object detection. Satisfactory results have been achieved with this algorithm on the KITTI dataset, which is collected in good lighting and normal weather conditions. First, the model trained on the Clear KITTI training set is evaluated on the Multifog KITTI test set, to verify whether a model trained only on clear weather remains satisfactory when dealing with foggy weather conditions.
Secondly, the opposite of the above test is done: the model is trained on Multifog KITTI and then evaluated on both Clear KITTI and Multifog KITTI. The purpose of this experiment and of the one above is to verify how the model performance is affected by swapping the training and evaluation datasets.
Then, a single training is performed on the combined Multifog and Clear KITTI training sets, and the resulting model is evaluated on both test sets.
Finally, the proposed algorithm is compared with the baseline, Pseudo-LiDAR++ [52], chosen because SLS-Fusion builds on it. The results are presented on the Clear KITTI dataset as well as on the Multifog KITTI dataset.

4.3. Implementation details

Both the training and evaluation processes are carried out in the same way throughout the experiments presented in Section 4.2.
The 4-beam LiDAR used as input to the network is simulated from the HDL-64E LiDAR sensor so as to resemble the ScaLa sensor, as proposed in [52].
For both training and evaluation, the depth estimation part, which is written in the PyTorch framework, is trained and evaluated first on the Scene Flow dataset. As Scene Flow does not contain point cloud data, the LiDAR input is set to zeros. The network is then fine-tuned on the KITTI dataset (Clear or Multifog) for 100 epochs with a batch size of 4 and a learning rate of 0.001; when training on both the Clear and Multifog KITTI datasets, the network is trained for twice as many epochs. For the detector part, the common LiDAR-based object detector PointRCNN [54] is employed with its default configuration for both training and evaluation. PointRCNN is a high-performance LiDAR-based method used by many other methods, and it is used in this work to detect objects from point clouds. As it was designed for sparse point clouds, the dense pseudo point cloud is sub-sampled to a 64-beam LiDAR (see the sketch at the end of this subsection). The released implementation of PointRCNN is used directly, and its guidelines are followed to train it on the training set of the KITTI object detection dataset. As in our previous work, only the "Car" class is trained, because cars are one of the main objects and occupy the largest percentage of the KITTI dataset; this causes an imbalance between "Car" and the other classes, but solving this imbalance problem is not the goal of this work.
All experiments in this paper were run on two NVIDIA GeForce GTX 1080 Ti GPUs with 11 GB of memory each, under the Ubuntu operating system.
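The beam sub-sampling step mentioned above can be sketched as an elevation-angle binning, as below; the vertical field of view values and the tolerance around each beam centre are assumptions of this illustration, not the exact procedure of [52] or [54].

```python
import numpy as np

def subsample_to_beams(points, num_beams=64, fov_down=-24.9, fov_up=2.0):
    """Sparsify a dense (pseudo) point cloud to a fixed number of horizontal beams
    by binning points on their elevation angle (assumed HDL-64-like vertical FOV)."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    elevation = np.degrees(np.arctan2(z, np.sqrt(x ** 2 + y ** 2)))
    bin_edges = np.linspace(fov_down, fov_up, num_beams + 1)
    beam_idx = np.digitize(elevation, bin_edges) - 1
    in_fov = (beam_idx >= 0) & (beam_idx < num_beams)
    # Keep only points close to the centre of their beam, mimicking discrete scan lines.
    centres = (bin_edges[:-1] + bin_edges[1:]) / 2.0
    tol = (fov_up - fov_down) / num_beams / 4.0
    near_centre = np.abs(elevation - centres[np.clip(beam_idx, 0, num_beams - 1)]) < tol
    return points[in_fov & near_centre]
```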

5. Results
Tables 1 and 2 show the results for IoU thresholds of 0.5 and 0.7, respectively. In each cell of these tables, a pair of numbers A/B corresponds to the results obtained with the AP_BEV and AP_3D metrics on the Clear KITTI or Multifog KITTI datasets (Table 1 for IoU 0.5 and Table 2 for IoU 0.7). These two tables report the results obtained in the different experiments according to the training and testing datasets. Similarly, Tables 3 and 4 compare the proposed method with Pseudo-LiDAR++ [52].

5.1. SLS-Fusion algorithm evaluation in foggy conditions

In this section, the SLS-Fusion 3D object detector is evaluated for different combinations of training and test data, and the results are reported in Tables 1 (IoU 0.5) and 2 (IoU 0.7). For both IoU thresholds, the first row gives the detection results of our algorithm, SLS-Fusion [33], trained and evaluated on the Clear KITTI dataset.
The following row (row 2 in both tables) shows that the detection performance is drastically reduced when the weights trained on Clear KITTI are evaluated on Multifog KITTI (the foggy dataset). This shows that a method trained only on clear weather data does not respond well to degraded conditions. Note that this data is not dehazed before being fed to the model.
Subsequently, the objective is to investigate the detection performance under fog when the model is also trained under fog (row 3): the training part of the Multifog KITTI dataset is used as training data and the test part of Multifog KITTI is used for testing. It can be seen that the detection performance increases markedly for the "Easy", "Moderate" and "Hard" cases compared to the Clear/MultiFog test (row 2).
To further test the robustness of the algorithm, the training and testing datasets are then inverted (row 4 of the two tables): Multifog KITTI is used for training and Clear KITTI for evaluation. The results are quite poor, which is understandable because the distribution of the original clear data differs from that of the augmented dataset, mirroring the Clear/MultiFog test of row 2.
Finally, the last tests (rows 5 and 6 of the two tables) consist of using the broadest possible training set: the combined Multifog and Clear KITTI datasets are used for training, and the model is tested on the Clear and Multifog test sets independently. The results are close to those obtained with the models trained on the individual datasets (either Clear KITTI or Multifog KITTI), so a single training provides a good trade-off between the performance on Clear KITTI and on Multifog KITTI.
The main result is therefore this case of training on both Clear KITTI and Multifog KITTI (rows 5 and 6 in Tables 1 and 2). The results on the Multifog KITTI dataset are:
• AP_3D@0.5: 89.82% (Easy), 78.19% (Moderate) and 75.05% (Hard)
• AP_3D@0.7: 69.07% (Easy), 47.95% (Moderate) and 45.50% (Hard)
and the results on the Clear KITTI dataset are:
• AP_3D@0.5: 95.10% (Easy), 85.04% (Moderate) and 79.10% (Hard)
• AP_3D@0.7: 72.14% (Easy), 55.67% (Moderate) and 53.57% (Hard)
Figure 5 presents some qualitative results of car detection, showing the predicted 3D bounding boxes; they are consistent with the quantitative results reported in Tables 1 and 2.
In the next part, the proposed method is compared with the baseline method, Pseudo-LiDAR++ [52].

Table 1. AP_BEV/AP_3D results with training and testing on the Clear KITTI or Multifog KITTI datasets for "Car" objects with IoU at 0.5 and on three levels of difficulty: Easy, Moderate and Hard.

Method | Training set | Test set | Easy | Moderate | Hard   (cells: AP_BEV/AP_3D @0.5)
Ours [33] | Clear | Clear | 93.16/93.02 | 88.81/86.19 | 83.35/84.02
Ours | Clear | MultiFog | 72.66/66.60 | 47.11/41.86 | 41.24/37.97
Ours | MultiFog | MultiFog | 90.27/90.15 | 79.17/78.01 | 76.12/70.21
Ours | MultiFog | Clear | 57.33/55.00 | 39.10/38.38 | 33.55/32.58
Ours | MultiFog + Clear | Clear | 95.36/95.10 | 86.90/85.04 | 84.75/79.10
Ours | MultiFog + Clear | MultiFog | 89.94/89.82 | 79.13/78.19 | 76.36/75.05

Table 2. AP_BEV/AP_3D results with training and testing on the Clear KITTI or Multifog KITTI datasets for "Car" objects with IoU at 0.7 and on three levels of difficulty: Easy, Moderate and Hard.

Method | Training set | Test set | Easy | Moderate | Hard   (cells: AP_BEV/AP_3D @0.7)
Ours [33] | Clear | Clear | 87.51/76.67 | 76.88/63.90 | 73.55/56.78
Ours | Clear | MultiFog | 44.57/30.89 | 29.76/21.23 | 26.50/18.61
Ours | MultiFog | MultiFog | 83.42/69.57 | 62.79/48.19 | 56.84/44.85
Ours | MultiFog | Clear | 41.54/30.07 | 27.74/21.10 | 23.87/18.37
Ours | MultiFog + Clear | Clear | 86.23/72.14 | 72.15/55.67 | 67.40/53.57
Ours | MultiFog + Clear | MultiFog | 84.30/69.07 | 63.12/47.95 | 57.84/45.50

5.2. Comparison with the baseline method

In [33], we compared our method with the baseline method Pseudo-LiDAR++ on the Clear KITTI dataset (normal weather conditions), and it was found that our algorithm provided better results on several metrics, as shown in Tables 3 and 4 (better results in bold). These tables contain all the comparisons (AP_BEV/AP_3D) for IoU of 0.5 (Table 3) and IoU of 0.7 (Table 4).
In Table 3 (rows 1 and 2), for the experiment on Clear KITTI, it can be noticed that the proposed algorithm is better in some categories, such as AP_3D for "Easy" objects (93.02% against 90.3%) and AP_BEV for "Moderate" objects (88.81% against 87.7%), while the baseline remains slightly ahead for "Hard" objects.
Table 3 (rows 3 and 4), for the experiment on Multifog KITTI, shows that the proposed algorithm outperforms the baseline in all comparisons regardless of the object difficulty. For example, for the 3D detection comparison (AP_3D), the method achieves 89.82% versus 88.65% for "Easy", 78.19% versus 77.21% for "Moderate" and 75.05% versus 69.59% for "Hard".

Figure 5. Qualitative results for 3D object detection on the Clear KITTI dataset and on the proposed dataset (Multifog KITTI). Columns correspond to the model trained on Clear only and to the model trained on Clear + Multifog; rows correspond to testing on Clear KITTI and on Multifog KITTI. 3D bounding boxes in red and in green denote the ground truth and the predictions, respectively. Note that the point cloud is shown in BEV representation.

In Table 4 (rows 1 and 2), for Clear KITTI, it can be noticed that the algorithm is better in some categories, such as AP_3D for "Easy" objects (76.67% against 75.1%), AP_3D for "Moderate" objects (63.90% against 63.8%) and AP_BEV for "Hard" objects (73.55% against 73.4%), while the differences in the remaining categories are small and in favor of the baseline.
In Table 4 (rows 3 and 4), for the experiment on Multifog KITTI, the algorithm is better on the AP_BEV metric for every level of difficulty: 84.30% against 82.47% for "Easy", 63.12% against 62.54% for "Moderate" and 57.84% against 56.87% for "Hard", whereas the AP_3D values are very close, with a slight advantage for the baseline.

Table 3. Comparison with the baseline method, Pseudo-LiDAR++ [52], with IoU of 0.5.

Method | Training set | Test set | Easy | Moderate | Hard   (cells: AP_BEV/AP_3D @0.5)
PL++ [52] | Clear | Clear | 90.3/90.3 | 87.7/86.9 | 84.6/84.2
Ours [33] | Clear | Clear | 93.16/93.02 | 88.81/86.19 | 83.35/84.02
PL++ | MultiFog + Clear | MultiFog | 88.77/88.65 | 78.39/77.21 | 75.23/69.59
Ours | MultiFog + Clear | MultiFog | 89.94/89.82 | 79.13/78.19 | 76.36/75.05

Table 4. Comparison with the baseline method, Pseudo-LiDAR++ [52], with IoU of 0.7.

Method | Training set | Test set | Easy | Moderate | Hard   (cells: AP_BEV/AP_3D @0.7)
PL++ [52] | Clear | Clear | 88.2/75.1 | 76.9/63.8 | 73.4/57.4
Ours [33] | Clear | Clear | 87.51/76.67 | 76.88/63.90 | 73.55/56.78
PL++ | MultiFog + Clear | MultiFog | 82.47/70.22 | 62.54/48.34 | 56.87/45.74
Ours | MultiFog + Clear | MultiFog | 84.30/69.07 | 63.12/47.95 | 57.84/45.50

6. Conclusions
The main objective of this research was to analyze the effect of fog on 3D object detection algorithms. To do this, a new synthetic dataset for foggy driving scenes, called the Multifog KITTI dataset, was created from an initial dataset (the KITTI dataset) by applying fog at different levels of visibility (20 to 80 m). This dataset covers the left and right images, 4-beam LiDAR data and 64-beam LiDAR data, although the foggy 64-beam data is yet to be exploited.
It was found that adding fog to the images and the LiDAR data leads to irregularities (lower contrast) in the images and distortions in the 3D point clouds of the LiDAR targets. The objective was then to test the robustness of the SLS-Fusion algorithm when dealing with this degradation of the image and LiDAR data. The first series of tests verified the negative effect of fog on the detection algorithm: processing foggy data as input while using only normal data for training leads to a clear degradation of the 3D object detection results (detection rate decreasing from 93% to 72%). The second major finding was that the performance of the 3D object detection algorithm can be improved by directly training with both the KITTI dataset and the synthetic Multifog KITTI dataset, even without dehazing. These results add to the rapidly expanding field of perception in adverse weather conditions, specifically in foggy scenes.
Another test consisted of comparing the SLS-Fusion algorithm with the baseline, Pseudo-LiDAR++. It was found that the proposed method outperforms the baseline on several metrics on the proposed Multifog KITTI dataset. This result is very satisfactory because it shows the robustness of the method when dealing with foggy datasets.
The scope of this study was limited in terms of the effect of fog on point clouds: as the LiDAR data used in the experiments is 4-beam point clouds, the effect of fog on the algorithm is not as large as it would be for an algorithm that uses 64-beam point clouds. In the future, we plan to perform experiments on the Multifog KITTI dataset with various 3D object detection algorithms, including detectors that exploit the foggy 64-beam data, to show more clearly the effects of fog.

Author Contributions: For research articles with several authors, a short paragraph specifying their individual contributions must be provided. The following statements should be used "Conceptualization, X.X. and Y.Y.; methodology, X.X.; software, X.X.; validation, X.X., Y.Y. and Z.Z.; formal analysis, X.X.; investigation, X.X.; resources, X.X.; data curation, X.X.; writing—original draft preparation, X.X.; writing—review and editing, X.X.; visualization, X.X.; supervision, X.X.; project administration, X.X.; funding acquisition, Y.Y. All authors have read and agreed to the published version of the manuscript.", please turn to the CRediT taxonomy for the term explanation. Authorship must be limited to those who have contributed substantially to the work reported.

Data Availability Statement: In this section, please provide details regarding where data supporting reported results can be found, including links to publicly archived datasets analyzed or generated during the study. Please refer to suggested Data Availability Statements in section "MDPI Research Data Policies" at https://www.mdpi.com/ethics. You might choose to exclude this statement if the study did not report any data.

Conflicts of Interest: "The authors declare no conflict of interest."

Abbreviations
The following abbreviations are used in this manuscript:

MOR          Meteorological Optical Range
IoU          Intersection over Union
SLS-Fusion   Sparse LiDAR and Stereo Fusion
PL++         Pseudo-LiDAR++
PL           Pseudo-LiDAR
BEV          Bird's Eye View
AP           Average Precision

References
1. Geiger, A.; Lenz, P.; Urtasun, R. Are we ready for autonomous driving? The KITTI vision benchmark suite. 2012 IEEE Conference
on Computer Vision and Pattern Recognition 2012, pp. 3354–3361.
2. Tonioni, A.; Serra, E.; Di Stefano, L. A deep learning pipeline for product recognition on store shelves. 2018 IEEE International
Conference on Image Processing, Applications and Systems (IPAS). IEEE, 2018, pp. 25–31.
3. Sreenu, G.; Durai, M.S. Intelligent video surveillance: a review through deep learning techniques for crowd analysis. Journal of
Big Data 2019, 6, 1–27.
4. Kuutti, S.; Bowden, R.; Jin, Y.; Barber, P.; Fallah, S. A survey of deep learning applications to autonomous vehicle control. IEEE
Transactions on Intelligent Transportation Systems 2020.
5. Gao, J.; Yang, Y.; Lin, P.; Park, D.S. Computer vision in healthcare applications. Journal of Healthcare Engineering 2018.
6. Gomes, J.F.S.; Leta, F.R. Applications of computer vision techniques in the agriculture and food industry: a review. European Food
Research and Technology 2012, 235, 989–1000.
7. Hemery, B.; Mahier, J.; Pasquet, M.; Rosenberger, C. Face authentication for banking. First International Conference on Advances
in Computer-Human Interaction. IEEE, 2008, pp. 137–142.
8. Villalba-Diez, J.; Schmidt, D.; Gevers, R.; Ordieres-Meré, J.; Buchwitz, M.; Wellbrock, W. Deep learning for industrial computer
vision quality control in the printing industry 4.0. Sensors 2019, 19, 3987.
9. Kim, C.; Cho, S.; Sunwoo, M.; Resende, P.; Bradaï, B.; Jo, K. A Geodetic Normal Distribution Map for Long-term LiDAR
Localization on Earth. IEEE Access 2020.
10. Popescu, G.; Iordan, D.; others. An overall view of LiDAR and sonar systems used in geomatics applications for hydrology.
Scientific Papers 2018, p. 174.
11. Albrecht, C.M.; Fisher, C.; Freitag, M.; Hamann, H.F.; Pankanti, S.; Pezzutti, F.; Rossi, F. Learning and Recognizing Archeological
Features from LiDAR Data. 2019 IEEE International Conference on Big Data (Big Data). IEEE, 2019, pp. 5630–5636.
12. Dong, P.; Chen, Q. LiDAR remote sensing and applications; CRC Press, 2017.
13. Putkinen, N.; Eyles, N.; Putkinen, S.; Ojala, A.E.; Palmu, J.P.; Sarala, P.; Väänänen, T.; Räisänen, J.; Saarelainen, J.; Ahtonen, N.;
others. High-resolution LiDAR mapping of glacial landforms and ice stream lobes in Finland. Bulletin of the Geological Society of
Finland 2017, 89.
14. Le Mauff, B.; Juigner, M.; Ba, A.; Robin, M.; Launeau, P.; Fattal, P. Coastal monitoring solutions of the geomorphological response
of beach-dune systems using multi-temporal LiDAR datasets (Vendée coast, France). Geomorphology 2018, 304, 121–140.
15. Labbé, M.; Michaud, F. RTAB-Map as an open-source lidar and visual simultaneous localization and mapping library for
large-scale and long-term online operation. Journal of Field Robotics 2019, 36, 416–446.
16. Vrancken, P.; Herbst, J. Development and Test of a Fringe-Imaging Direct-Detection Doppler Wind Lidar for Aeronautics. EPJ
Web of Conferences, 2019, pp. 1–4.

17. iPhone 12 Pro and iPhone 12 Pro Max - Apple. Available online: http://web.archive.org/web/20080207010024/http://www.808multimedia.com/winnt/kernel.htm (accessed on 19 April 2021).
18. Wikipedia contributors. Self-driving car — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=
Self-driving_car&oldid=1026874153, 2021. [Online; accessed 7-June-2021].
19. Caesar, H.; Bankiti, V.; Lang, A.H.; Vora, S.; Liong, V.E.; Xu, Q.; Krishnan, A.; Pan, Y.; Baldan, G.; Beijbom, O. nuscenes:
A multimodal dataset for autonomous driving. Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition, 2020, pp. 11621–11631.
20. Yu, F.; Chen, H.; Wang, X.; Xian, W.; Chen, Y.; Liu, F.; Madhavan, V.; Darrell, T. Bdd100k: A diverse driving dataset for
heterogeneous multitask learning. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020,
pp. 2636–2645.
21. Maddern, W.; Pascoe, G.; Linegar, C.; Newman, P. 1 year, 1000 km: The Oxford RobotCar dataset. The International Journal of
Robotics Research 2017, 36, 3–15.
22. Maanpää, J.; Taher, J.; Manninen, P.; Pakola, L.; Melekhov, I.; Hyyppä, J. Multimodal End-to-End Learning for Autonomous
Steering in Adverse Road and Weather Conditions. arXiv preprint arXiv:2010.14924 2020.
23. Dahmane, K.; Amara, N.E.B.; Duthon, P.; Bernardin, F.; Colomb, M.; Chausse, F. The Cerema pedestrian database: A specific
database in adverse weather conditions to evaluate computer vision pedestrian detectors. 2016 7th International Conference on
Sciences of Electronics, Technologies of Information and Telecommunications (SETIT). IEEE, 2016, pp. 472–477.
24. Sakaridis, C.; Dai, D.; Van Gool, L. Semantic foggy scene understanding with synthetic data. International Journal of Computer
Vision 2018, 126, 973–992.
25. Hahner, M.; Dai, D.; Sakaridis, C.; Zaech, J.N.; Van Gool, L. Semantic understanding of foggy scenes with purely synthetic data.
2019 IEEE Intelligent Transportation Systems Conference (ITSC). IEEE, 2019, pp. 3675–3681.
26. Pfeuffer, A.; Dietmayer, K. Robust semantic segmentation in adverse weather conditions by means of sensor data fusion. 2019
22th International Conference on Information Fusion (FUSION). IEEE, 2019, pp. 1–8.
27. Tremblay, M.; Halder, S.S.; de Charette, R.; Lalonde, J.F. Rain rendering for evaluating and improving robustness to bad weather.
International Journal of Computer Vision 2021, 129, 341–360.
28. Bijelic, M.; Gruber, T.; Mannan, F.; Kraus, F.; Ritter, W.; Dietmayer, K.; Heide, F. Seeing through fog without seeing fog: Deep
multimodal sensor fusion in unseen adverse weather. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition, 2020, pp. 11682–11692.
29. He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE transactions on pattern analysis and machine
intelligence 2010, 33, 2341–2353.
30. Wang, Y.K.; Fan, C.T. Single image defogging by multiscale depth fusion. IEEE Transactions on image processing 2014, 23, 4826–4837.
31. Heinzler, R.; Piewak, F.; Schindler, P.; Stork, W. Cnn-based lidar point cloud de-noising in adverse weather. IEEE Robotics and
Automation Letters 2020, 5, 2514–2521.
32. Sakaridis, C.; Dai, D.; Hecker, S.; Van Gool, L. Model adaptation with synthetic and real data for semantic dense foggy scene
understanding. Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 687–704.
33. Mai, N.A.M.; Duthon, P.; Khoudour, L.; Crouzil, A.; Velastin, S.A. Sparse LiDAR and Stereo Fusion (SLS-Fusion) for Depth
Estimationand 3D Object Detection. arXiv preprint arXiv:2103.03977 2021.
34. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes
Dataset for Semantic Urban Scene Understanding. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016,
pp. 3213–3223.
35. Sun, P.; Kretzschmar, H.; Dotiwalla, X.; Chouard, A.; Patnaik, V.; Tsui, P.; Guo, J.; Zhou, Y.; Chai, Y.; Caine, B.; others. Scalability
in perception for autonomous driving: Waymo open dataset. Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition, 2020, pp. 2446–2454.
36. Kenk, M.A.; Hassaballah, M. DAWN: Vehicle Detection in Adverse Weather Nature Dataset. arXiv preprint arXiv:2008.05402 2020.
37. Jin, J.; Fatemi, A.; Lira, W.; Yu, F.; Leng, B.; Ma, R.; Mahdavi-Amiri, A.; Zhang, H. RaidaR: A Rich Annotated Image Dataset of
Rainy Street Scenes. arXiv preprint arXiv:2104.04606 2021.
38. Pitropov, M.; Garcia, D.E.; Rebello, J.; Smart, M.; Wang, C.; Czarnecki, K.; Waslander, S. Canadian adverse driving conditions
dataset. The International Journal of Robotics Research 2021, 40, 681–690.
39. Lei, Y.; Emaru, T.; Ravankar, A.A.; Kobayashi, Y.; Wang, S. Semantic image segmentation on snow driving scenarios. 2020 IEEE
International Conference on Mechatronics and Automation (ICMA). IEEE, 2020, pp. 1094–1100.
40. Hu, X.; Fu, C.W.; Zhu, L.; Heng, P.A. Depth-attentional features for single-image rain removal. Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, 2019, pp. 8022–8031.
41. Halder, S.S.; Lalonde, J.F.; Charette, R.d. Physics-based rendering for improving robustness to rain. Proceedings of the IEEE/CVF
International Conference on Computer Vision, 2019, pp. 10203–10212.
42. Michaelis, C.; Mitzkus, B.; Geirhos, R.; Rusak, E.; Bringmann, O.; Ecker, A.S.; Bethge, M.; Brendel, W. Benchmarking robustness
in object detection: Autonomous driving when winter is coming. arXiv preprint arXiv:1907.07484 2019.
43. Li, X.; Kou, K.; Zhao, B. Weather GAN: Multi-Domain Weather Translation Using Generative Adversarial Networks. arXiv
preprint arXiv:2103.05422 2021.

44. Sabzi, S.; Abbaspour-Gilandeh, Y.; Javadikia, H. Machine vision system for the automatic segmentation of plants under different
lighting conditions. Biosystems Engineering 2017, 161, 157–173.
45. Zhang, Y.; Song, S.; Yumer, E.; Savva, M.; Lee, J.Y.; Jin, H.; Funkhouser, T. Physically-based rendering for indoor scene
understanding using convolutional neural networks. Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 2017, pp. 5287–5295.
46. Jarraud, M. Guide to meteorological instruments and methods of observation (WMO-No. 8). World Meteorological Organisation:
Geneva, Switzerland 2008, 29.
47. Seck, I.; Dahmane, K.; Duthon, P.; Loosli, G. Baselines and a datasheet for the Cerema AWP dataset. arXiv preprint arXiv:1806.04016
2018.
48. Jokela, M.; Kutila, M.; Pyykönen, P. Testing and validation of automotive point-cloud sensors in adverse weather conditions.
Applied Sciences 2019, 9, 2341.
49. Koschmieder, H. Theorie der horizontalen Sichtweite. Beitrage zur Physik der freien Atmosphare 1924, pp. 33–53.
50. Barbrow, L. International lighting vocabulary. Journal of the SMPTE 1964, 73, 331–332.
51. Gruber, M.B.T.; Mannan, F.; Kraus, F.; Ritter, W.; Dietmayer, K.; Heide, F. Seeing Through Fog Without Seeing Fog: Deep
Multimodal Sensor Fusion in Unseen Adverse Weather (Supplemental Material).
52. You, Y.; Wang, Y.; Chao, W.L.; Garg, D.; Pleiss, G.; Hariharan, B.; Campbell, M.; Weinberger, K.Q. Pseudo-lidar++: Accurate depth
for 3d object detection in autonomous driving. arXiv preprint arXiv:1906.06310 2019.
53. Chang, J.R.; Chen, Y.S. Pyramid stereo matching network. Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 2018, pp. 5410–5418.
54. Shi, S.; Wang, X.; Li, H. Pointrcnn: 3d object proposal generation and detection from point cloud. Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition, 2019, pp. 770–779.
55. Wang, Y.; Chao, W.L.; Garg, D.; Hariharan, B.; Campbell, M.; Weinberger, K.Q. Pseudo-lidar from visual depth estimation:
Bridging the gap in 3d object detection for autonomous driving. Proceedings of the IEEE/CVF Conference on Computer Vision
and Pattern Recognition, 2019, pp. 8445–8453.
56. Mayer, N.; Ilg, E.; Hausser, P.; Fischer, P.; Cremers, D.; Dosovitskiy, A.; Brox, T. A large dataset to train convolutional networks for
disparity, optical flow, and scene flow estimation. Proceedings of the IEEE conference on computer vision and pattern recognition,
2016, pp. 4040–4048.
57. Chen, X.; Ma, H.; Wan, J.; Li, B.; Xia, T. Multi-view 3d object detection network for autonomous driving. Proceedings of the IEEE
conference on Computer Vision and Pattern Recognition, 2017, pp. 1907–1915.
58. Park, J.; Joo, K.; Hu, Z.; Liu, C.K.; Kweon, I.S. Non-local spatial propagation network for depth completion. arXiv preprint
arXiv:2007.10042 2020, 3.
59. Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. International
journal of computer vision 2010, 88, 303–338.
