6, JUNE 2021
Abstract: In recent years, neural networks (NNs) have shown great potential in image recognition tasks for autonomous driving systems, such as traffic sign recognition and pedestrian detection. However, NNs that are well trained in theory often perform poorly when facing real-world scenarios. For example, adverse real-world conditions, e.g., bad weather and poor lighting, can introduce different physical variations and cause considerable accuracy degradation. The generalization capability of NNs therefore remains one of the most critical challenges for autonomous driving systems. To facilitate robust image recognition, in this work we build the RobuTS dataset: a comprehensive Robust Traffic Sign Recognition dataset that includes images with different environmental variations, e.g., rain, fog, darkening, and blurring. Then, to enhance the NN’s generalization capability, we propose two generalization-enhanced training schemes: 1) REIN for robust training without data in adverse scenarios and 2) Self-Teaching (ST) for robust training with unlabeled adverse data. The main advantage of these two training schemes is that they are data-free (REIN) and label-free (ST), which reduces the human effort and cost of on-road driving data collection as well as the expensive manual data annotation. We conduct extensive experiments to validate our methods’ performance on both classification and detection tasks. For classification tasks, our proposed training algorithms consistently improve model performance by +15%–25% (REIN) and +16%–30% (ST) in all adverse scenarios of our RobuTS dataset. For detection tasks, ST also improves the detector’s performance by +10.1 mean average precision (mAP) on Foggy-Cityscapes, outperforming previous state-of-the-art works by +2.2 mAP.
Index Terms: Autonomous driving, deep neural network (NN), robust image recognition.
Manuscript received March 1, 2020; revised June 21, 2020; accepted September 28, 2020. Date of publication October 23, 2020; date of current version May 20, 2021. This work was supported in part by the NSF under Grant 1717775. (Corresponding author: Fuxun Yu.)
Fuxun Yu, Zhuwei Qin, and Xiang Chen are with the School of Electrical and Computer Engineering, George Mason University, Fairfax, VA 22030 USA (e-mail: fyu2@gmu.edu).
Chenchen Liu is with the Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, Baltimore, MD 21250 USA.
Di Wang is with Microsoft Cognition, Microsoft, Redmond, WA 98052 USA.
Digital Object Identifier 10.1109/TCAD.2020.3033498
I. INTRODUCTION
IN RECENT years, neural networks (NNs) with exceptional performance have shown great potential in autonomous driving systems [1]–[3]. However, an NN with 100% accuracy in theoretical testing is still not ready to be deployed in cars. Many practical factors affect NN performance, such as different weather, lighting conditions, camera/sensor discrepancy, etc. [4]–[6]. These factors may introduce unexpected variations to real-world images, which lie beyond the limited training data and can cause considerable accuracy degradation. As shown in Fig. 1(a), traffic sign examples under adverse weather conditions are wrongly classified by a well-trained model. Such situations can cause severe safety issues; e.g., several autonomous driving accidents due to self-driving system errors have already been reported during the last few years [7]–[10].
The inability of current autonomous driving systems to cope with real-world variations demonstrates a functional flaw of state-of-the-art NNs, i.e., insufficient generalization capability. Generalization describes a model’s capability to extend its functionality from finite training cases to infinitely many unseen testing scenarios. To enhance the generalization ability of autonomous driving systems, many research works have emerged: Lim et al. [11] targeted traditional data augmentation techniques, i.e., trying to include as many fully annotated training examples under adverse weather conditions as possible, including rainy and cloudy images; Tian et al. [7] proposed an NN verification and testing system that uses large numbers of simulated scenarios to help find existing corner cases in which autonomous driving systems may fail to operate correctly; and several other works aim to recognize and remove the practical variations from real-world captured images, thereby restoring theoretically clean testing scenarios [4], [5], [12], [13].
Although these works help alleviate the problem to some extent, they mainly focus on “compensating” for the NN generalization ability with large amounts of auxiliary data and post-training effort. Rather than enhancing the generalization ability, most of the methods end up constructing case-specific or scenario-specific networks, which cover only limited scenarios and may still fail under new unexpected conditions. By comparison, another common industry practice is to collect tremendous amounts of data through thousands of hours of on-road driving, hoping that the training data will cover as many corner cases as possible to enable the model’s generalization capability in the real world. However, such practice not only requires huge human driving effort (over years) but also produces thousands of hours of raw data, which demands huge human annotation cost.
In this work, we focus on improving the generalization performance of two subtasks in autonomous driving systems: 1) robust traffic sign classification and 2) object detection. Specifically, we make the following contributions.
0278-0070
© 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: SASTRA. Downloaded on March 21,2023 at 06:35:19 UTC from IEEE Xplore. Restrictions apply.
YU et al.: REIN RobuTS: ROBUST DNN-BASED IMAGE RECOGNITION IN AUTONOMOUS DRIVING SYSTEMS 1259
Fig. 1. System overview. (a) A DNN model’s accuracy can degrade significantly in adverse weather. We propose two training schemes to enhance model generalization in adverse weather, targeting two scenarios: (b) REIN, with only clean training data available, and (c) ST, with extra unlabeled training data in adverse conditions.
1) First, we build a comprehensive RobuTS dataset to facilitate research on Robust Traffic Sign recognition. The dataset contains a large number of traffic sign images under four different adverse conditions with varied variation intensities, e.g., rain, fog, day/night lighting, blurring, etc. The dataset will be open sourced for research purposes.
2) We benchmark a state-of-the-art NN-based traffic sign classifier on our RobuTS dataset, which helps reveal the existing weaknesses and performance flaws of the well-trained classifier and gives valuable feedback on potential enhancement directions.
3) We analyze the influence of different practical variations and summarize these practical variation models into one unified model. Guided by the unified model, we propose REIN: a robust and efficient training approach that significantly improves model generalization without needing extra data during training.
4) Considering the potentially large amounts of unlabeled data available, we also propose the ST algorithm, which combines the labeled clean data and the unlabeled data under adverse conditions to enhance the model’s generalization capability during training.
5) Finally, we implement our training methods and evaluate the generalization enhancement on both our RobuTS dataset and the common FoggyCityscapes detection benchmark, which demonstrates our methods’ effectiveness and potential in autonomous driving systems.
Experiments show that on our RobuTS classification dataset (rain, fog, darkening, and bokeh/motion blurring), the accuracy of a well-trained model with 96% accuracy can drop dramatically by 40%–60%. In such cases, both the REIN and ST training methods bring significant generalization improvement, achieving consistently better performance with +15%–25% and +16%–30% accuracy improvement, respectively, in all practical scenarios. Meanwhile, on object detection, such as vehicle/pedestrian detection tasks, ST also brings a +10.1 mean average precision (mAP) improvement, outperforming recent state-of-the-art works by +2.2 mAP. Besides, REIN requires no extra training data and ST requires no extra annotations in these scenarios, which saves considerable on-road driving data collection cost and human annotation effort. To interpret the improvement of our training methods, we also conduct gradient analysis and NN visualization, showing that our generalization-enhanced model better captures the main features of image content, which could be the reason for the generalization improvement.
II. PRELIMINARY
In this section, we briefly review the background of deep NNs applied in autonomous driving systems and introduce several common practical variations of traffic sign images in on-road driving scenarios.
A. Neural Networks in Autonomous Driving
There are several major subtasks in autonomous driving systems, including image classification tasks, like traffic sign classification; object detection tasks, like vehicle/pedestrian detection [14]; lane detection tasks [15]; etc. Recently, with the fast development of deep learning, NN-based models have been widely utilized and achieve great performance in such tasks [16], [17].
In this work, we take both the image classification and object detection subtasks in autonomous driving systems as our target applications. Taking the classification task as an example, the convolutional NN is the most popular model due to its exceptional performance in image classification tasks [18]. Currently, in clean testing scenarios, the state-of-the-art traffic sign classification model [19] can usually achieve over 99% accuracy on the German Traffic Sign Recognition Benchmark (GTSRB) [20]. Such performance is considered the primary achievement of NNs in autonomous driving systems.
1260 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 40, NO. 6, JUNE 2021
However, such strong models often show low accuracy when applied to practical cases with unexpected variations, and so do state-of-the-art detection models.
B. Performance Degradation With Practical Variations
The input examples of real-world traffic signs usually contain various kinds of practical variations, such as rain [4], fog [5], different ambient light conditions [11], camera/sensor discrepancy [21], etc. However, these unexpected practical variations are usually beyond the clean training dataset and thus can easily corrupt the NN classification accuracy. For example, previous works have demonstrated thousands of erroneous behaviors across three top-performing DNNs in the Udacity self-driving car challenges due to such practical variations [7], [22]. Once such erroneous behaviors occur, the steering or speed modules can fail to work correctly, and the performance degradation can cause critical safety issues in real-world autonomous driving systems. Several autonomous driving accidents due to self-driving system errors have already been reported during the last few years [8]–[10], which has raised great customer concern about autonomous driving security.
Based on previous research, several representative practical driving-variation scenarios can be categorized and are taken into consideration in our work.
1) Weather Conditions: Rain, fog/haze, snow, etc. [4], [5].
2) Ambient Light Variation: Day/night, cloudy, etc. [11].
3) Camera Discrepancy: Camera aging, blurring, etc. [21].
4) Others: Camera perspective, occlusion, etc. [21].
These environmental factors can cause different variations in the camera-captured street scene images. When the variation intensity is high, the NN-based classification or detection system is highly likely to mispredict the captured images.
In the next section, we take the classification task as an example and choose four representative practical variations to demonstrate the current flaw of NN-based systems. We then introduce our RobuTS dataset, which helps evaluate and demonstrate the influence of such practical variations on the NN-based classification system.
III. ROBUTS: ROBUST TRAFFIC SIGN RECOGNITION DATASET
In this section, we introduce our RobuTS dataset for robust traffic sign recognition. We choose four representative driving scenarios, i.e., the commonly seen rain and fog as the weather cases, the darkening effect as the ambient light case, and bokeh/motion blurring as the camera discrepancy case. We use image synthesis to generate the new RobuTS dataset with seed images from the GTSRB dataset [20]. To do so, we model the four different practical variations in detail based on previous research and synthesize new traffic sign images with different variations and varying intensities. After synthesizing the new dataset, we benchmark a state-of-the-art NN-based classifier on the new RobuTS test sets and demonstrate the model’s poor generalization capability, as it shows significant performance degradation.
A. Practical Variation Modeling With Traffic Signs
1) Rain Variation: Physically, raindrops are uniformly distributed in space, and the drop size follows the Marshall–Palmer distribution for a given rain rate [4], [12]. During the camera exposure time $t_{cam}$, a drop with speed $v_{rain}$ covers a distance of length $l_{rain} = t_{cam} \cdot v_{rain}$ [12]. The raindrops appear white with a certain opacity $\alpha$ due to motion blur [12]. Therefore, the rain variation can be modeled as
$$X_{rain} = M(p) \times X_{org} + \alpha\,(1 - M(p)) \times \mathrm{Rain} \tag{1}$$
where $X_{org}$ and $X_{rain}$ denote the original and the observed image, $M(p)$ denotes the raindrop position matrix with pixel index $p$, and $\alpha$ is the raindrop opacity. $\omega$ is the number of nonzero positions in $M(p)$; it denotes the number of raindrops and is thus used to simulate different rainfall intensities. The synthesized rainfall images are shown in Fig. 2(a).
2) Fog Variation: Based on atmospheric optics [13], the observed color of a camera-captured image in the presence of fog/haze can be modeled as follows:
$$X_{fog} = (1 - t(p)) \times X_{org} + t(p) \times A \tag{2}$$
where $A = (A_r, A_g, A_b)^T$ is the global atmospheric light that represents the ambient light in the atmosphere. Also, $t(p)$ is inversely proportional to the scene depth and fog intensity, since fog and long distance scatter and attenuate the light during transmission. The larger $t(p)$ is, the vaguer the image becomes. Therefore, we use the parameter $t(p)$ to control the fog intensity. The synthesized foggy images are shown in Fig. 2(b).
3) Darkening Effect: One important factor for driving is sufficient ambient light. In our simulation, we change the image brightness to simulate the darkening effect. In image processing, adjusting brightness is equivalent to adding a constant offset to each of the R, G, and B channels of an image. Therefore, to get a darker image, we add a negative constant to every channel. The ambient light condition can be modeled as
$$X_{light} = X_{org} \pm \beta \times A \tag{3}$$
where $\beta$ controls the darkening intensity. The synthesized images with the darkening effect are shown in Fig. 2(c).
4) Bokeh/Motion Blurring: During camera exposure, bokeh blur is a common phenomenon; it is the aesthetic quality of the blur produced by a lens in the out-of-focus parts of an image [23]. Meanwhile, fast movement and car vibration also introduce a blurring effect. In our simulation, we use Gaussian blur to simulate the blurring effect, which is generated by convolving an image with a Gaussian kernel. The standard deviation $\delta$ controls the intensity of blurring. The blurring effect can be modeled as
$$X_{blur} = \mathrm{Gaussian}(\delta) * X_{org} \tag{4}$$
where $\mathrm{Gaussian}(\delta)$ denotes the Gaussian kernel with zero mean and standard deviation $\delta$, which controls the variation intensity, and $*$ denotes the convolution operation. The synthesized blurred images are shown in Fig. 2(d).
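The four variation models in (1)–(4) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the dataset-generation code: images are single-channel lists of rows with values in [0, 1], the fog term t is uniform over the image, the atmospheric light A and the rain color are scalars set to 1.0, and the mask convention (1 keeps the original pixel, 0 marks a raindrop pixel) is one literal reading of (1). All function and parameter names are hypothetical.

```python
import math

def clip01(v):
    """Keep a pixel value inside the valid [0, 1] range."""
    return max(0.0, min(1.0, v))

def add_rain(img, mask, alpha, rain=1.0):
    # Eq. (1) per pixel: X_rain = M * X_org + alpha * (1 - M) * Rain.
    # Assumed convention: mask 1 keeps the original pixel, 0 marks a raindrop.
    return [[clip01(m * x + alpha * (1.0 - m) * rain) for x, m in zip(row, mrow)]
            for row, mrow in zip(img, mask)]

def add_fog(img, t, airlight=1.0):
    # Eq. (2) with a uniform t: X_fog = (1 - t) * X_org + t * A.
    return [[clip01((1.0 - t) * x + t * airlight) for x in row] for row in img]

def darken(img, beta, airlight=1.0):
    # Eq. (3) with the minus sign: X_light = X_org - beta * A.
    return [[clip01(x - beta * airlight) for x in row] for row in img]

def gaussian_kernel(sigma, radius):
    """Normalized 1-D Gaussian kernel with zero mean and std sigma."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur_rows(img, kernel):
    """Convolve every row with the kernel, replicating border pixels."""
    radius = len(kernel) // 2
    out = []
    for row in img:
        n = len(row)
        out.append([sum(w * row[min(max(j + k - radius, 0), n - 1)]
                        for k, w in enumerate(kernel))
                    for j in range(n)])
    return out

def transpose(img):
    return [list(col) for col in zip(*img)]

def gaussian_blur(img, sigma, radius=2):
    # Eq. (4): 2-D Gaussian convolution done as two separable 1-D passes.
    kernel = gaussian_kernel(sigma, radius)
    return transpose(blur_rows(transpose(blur_rows(img, kernel)), kernel))

# Toy 4x4 "image": dark left half, bright right half.
image = [[0.2, 0.2, 0.8, 0.8] for _ in range(4)]
mask = [[1.0, 0.0, 1.0, 1.0] for _ in range(4)]  # one raindrop column (assumed)

rainy = add_rain(image, mask, alpha=0.7)
foggy = add_fog(image, t=0.5)
dark = darken(image, beta=0.3)
blurred = gaussian_blur(image, sigma=1.0)
```

Each function exposes the single intensity parameter named in the text (the mask density, t, β, and δ), so sweeping that parameter gives a family of increasingly degraded images from one seed image.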
Fig. 2. Practical variation modeling and synthesis effect in four scenarios: rain drops, fog variation, darkening effect, and bokeh/motion blurring. With the
highest variation intensity (0 → 1), the traffic sign images are still relatively clear to human drivers, but the NN-based classifier’s accuracy drops significantly
by 40%–60% in all four scenarios.
IV. REIN: ROBUST AND EFFICIENT TRAINING WITHOUT COLLECTING ADVERSE DATA
In this section, we first analyze the underlying mechanism by which these variations influence the NN. We then define a unified model that uses the first-order gradient to evaluate the variation influence as well as the NN’s generalization capability. Guided by it, we formulate a new training loss function to enhance model generalization by regulating the first-order gradient magnitude. The double-backpropagation technique [24] is used in the training process to calculate the second-order gradients. Finally, we give an overview of our generalization-enhanced training algorithm.
A. Theoretical Generalization Problem Abstraction
Suppose we have a natural image input $x$, e.g., a 32×32×3 traffic sign image. The practical variation can be denoted by $\Delta x$. Note that $\Delta x$ is usually small/moderate and will not influence the main pattern of the original image $x$. The NN can be seen as a large-scale nonlinear function $F_\theta(x)$ composed of massive numbers of neurons, where $\theta$ denotes the network’s weights and bias values. As a result, the NN classification failure cases can be denoted as
$$F_\theta(x + \Delta x) \neq F_\theta(x) \tag{5}$$
which means that, with some variation $\Delta x$ added, the classification result of the NN changes to a wrong label that is different from the original correct prediction.
According to the first-order Taylor expansion, we can linearly approximate the NN function $F_\theta(x + \Delta x)$ in the neighborhood of $x$ as follows:
$$F_\theta(x + \Delta x) = F_\theta(x) + \frac{\partial F_\theta(x)}{\partial x} \times \Delta x \tag{6}$$
where $\partial F_\theta(x)/\partial x$ is the first-order gradient of the function $F_\theta(x)$. Combining (5) and (6), we can then calculate that the influence on the NN brought by $\Delta x$ is
$$\mathrm{Effect}(\Delta x) = F_\theta(x + \Delta x) - F_\theta(x) = \frac{\partial F_\theta(x)}{\partial x} \times \Delta x. \tag{7}$$
From (7), the influence of the variation $\Delta x$ on the NN $F_\theta(x)$ can be approximated as linearly correlated with the gradient $\partial F_\theta(x)/\partial x$. In other words, gradients can be seen as the amplification coefficients of the small variation $\Delta x$. The larger the gradients are, the more influence the small variations bring.
Fig. 3 illustrates the relationship between gradients and the influence of $\Delta x$ by showing two NNs’ decision boundaries. The network with larger first-order gradients [Fig. 3(a)] forms a function surface with steeper slopes (gradients). For a natural traffic sign image of class Priority Road, if we add some small variation to the image, the network with larger gradients is more susceptible to misclassifying it: the influence of the rain variation might push the output across the decision boundary (violet surface) and change the classification result to Yield. In contrast, an NN with a smoother decision boundary [Fig. 3(b)] is more resistant to such small variations, because all its decision surface neighbors are still in the same class.
To conclude, the network’s resistance to a small variation $\Delta x$ is inversely proportional to its gradient magnitude. That is to say, an NN with high generalization ability should have first-order gradients that are as small as possible. Therefore, in the next section, we introduce gradient regularization, i.e., regulating the network’s gradient magnitude to be small in the training process for the purpose of generalization enhancement.
B. Generalization-Enhanced Training Loss Formulation
As mentioned above, large first-order gradients might amplify small variations and then influence the classification results. Therefore, in the training process, we should force the network gradients to be as small as possible while maintaining the same accuracy. Since the loss function can be used in the training process to update the network parameters to satisfy certain constraints, we introduce the gradient loss penalty $L_{grad}$ into the network training loss function
$$L_\theta(x) = L_{ce} + c \cdot L_{grad}, \quad \text{where } L_{grad} = L_{norm}\!\left(\frac{\partial \mathrm{Reg}_\theta(x)}{\partial x}\right) \tag{8}$$
where $\theta$ denotes the network parameters, $L_{ce}$ is the normal cross-entropy loss, and $L_{grad}$ is the gradient loss penalty we added. $\partial \mathrm{Reg}_\theta(x)/\partial x$ is the Jacobian matrix of our regularizer function $\mathrm{Reg}_\theta(x)$, and $L_{norm}$ can be the $l_1$, $l_2$, or $l_\infty$ norm. The coefficient $c$ is used to adjust the gradient loss penalty strength so that it does not harm the accuracy in the training process.
As for the regularizer function $\mathrm{Reg}_\theta(x)$, it could be the softmax outputs or the logit outputs of the network function $F_\theta(x)$. Here, we choose the logit outputs since they preserve most of the useful information before the softmax operation. The regularizer function is defined as follows:
$$\mathrm{Reg}_\theta(x) = \max\{Z(x)_i,\ i \neq t\} - Z(x)_t \tag{9}$$
where $Z(\cdot)$ is the logit output before the softmax layer, and $t$ is the correct label of the input $x$. This regularizer function $\mathrm{Reg}_\theta(x)$ can be interpreted as the difference between the maximum wrong logit and the correct one. As long as we keep the gradients of this function small, the influence of a small variation $\Delta x$ on this function is kept small. That is to say, the wrong logits will not easily exceed the correct one and thus will not cause a wrong classification. Therefore, the network’s resistance to small variations $\Delta x$ can be improved to the maximum extent.
C. Gradient Descent With Double Backpropagation
The general network training process is usually done by the stochastic gradient descent algorithm. In normal gradient descent, every parameter is updated using the following equation:
$$\theta = \theta - lr \cdot \frac{\partial L_{ce}}{\partial \theta} \tag{10}$$
where $lr$ is the learning rate. However, in our defined training loss function, the first-order gradient penalty $L_{grad}$ is included. As a result, different from normal gradient descent problems,
Fig. 3. REIN: robust training with gradient regularization. An NN model should have a smooth decision boundary so that small variations (e.g., rain) cannot change its prediction easily. Through gradient regularization, i.e., regulating gradient magnitudes in the training loss, we can improve the smoothness of the NN’s decision boundary, thus enhancing the NN’s generalization capability.
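The gradient regularization summarized in Fig. 3 can be made concrete on a model small enough to differentiate by hand. The sketch below is an illustration under stated assumptions, not the authors’ implementation: it uses a linear classifier Z(x) = Wx + b, for which the input gradient of the regularizer in (9) is simply the difference between two weight rows, and it takes the norm in (8) to be the squared l2 norm (the text allows l1, l2, or l∞). The update applies the coefficient c from (8) to the penalty term. All function names are hypothetical.

```python
import math

def logits(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def top_wrong_class(z, t):
    """Index of the largest logit among the classes other than t."""
    return max((i for i in range(len(z)) if i != t), key=lambda i: z[i])

def reg(W, b, x, t):
    # Eq. (9): maximum wrong logit minus the correct logit.
    z = logits(W, b, x)
    return z[top_wrong_class(z, t)] - z[t]

def rein_loss_and_grads(W, b, x, t, c):
    z = logits(W, b, x)
    p = softmax(z)
    ce = -math.log(p[t])
    j = top_wrong_class(z, t)
    # Input gradient of Reg for a linear model: dReg/dx = W[j] - W[t].
    g = [wj - wt for wj, wt in zip(W[j], W[t])]
    l_grad = sum(v * v for v in g)  # assumed norm: squared l2
    # Cross-entropy gradients w.r.t. the parameters: (p - onehot) outer x.
    err = [p[i] - (1.0 if i == t else 0.0) for i in range(len(W))]
    dW = [[e * xi for xi in x] for e in err]
    db = list(err)
    # Derivative of the input-gradient penalty w.r.t. the parameters
    # (the "second-order" term); analytic here because the model is linear.
    for k, v in enumerate(g):
        dW[j][k] += c * 2.0 * v
        dW[t][k] -= c * 2.0 * v
    return ce + c * l_grad, l_grad, dW, db

def sgd_step(W, b, dW, db, lr):
    W = [[w - lr * d for w, d in zip(row, drow)] for row, drow in zip(W, dW)]
    b = [bi - lr * d for bi, d in zip(b, db)]
    return W, b

# Toy problem: 3 classes, 2 input features, correct class t = 0.
W = [[1.0, -0.5], [0.3, 0.8], [-0.7, 0.2]]
b = [0.0, 0.0, 0.0]
x, t = [0.5, 1.0], 0

loss, penalty, dW, db = rein_loss_and_grads(W, b, x, t, c=0.5)
W_new, b_new = sgd_step(W, b, dW, db, lr=0.1)
_, penalty_new, _, _ = rein_loss_and_grads(W_new, b_new, x, t, c=0.5)
print(penalty, penalty_new)  # the penalty shrinks after one update
```

In a deep network the input gradient depends on the activations, so the penalty term cannot be written in closed form and is instead obtained by a second backward pass through the computation of the first, as the double-backpropagation description states. The linear case is used here only because both derivatives can be checked by hand.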
introducing the gradient loss into the training loss requires us to solve a second-order gradient computation problem.
To compute the second-order gradients, we adopt the double-backpropagation technique [24]. In double backpropagation, we first compute the cross-entropy and gradient loss by forward propagation, with the gradients then being calculated by backpropagation. Then, to minimize the gradient loss $L_{grad}$, we need to calculate the second-order derivative of $L_{grad}$. Therefore, a second backpropagation operation is performed to compute the second-order derivative of $L_{grad}$ with respect to $\theta$. After this, the weights of the NN are updated according to the gradient descent equation
$$\theta = \theta - lr \cdot \frac{\partial L_{ce}}{\partial \theta} - lr \cdot \frac{\partial L_{norm}\!\left(\partial \mathrm{Reg}_\theta(x)/\partial x\right)}{\partial \theta} \tag{11}$$
where $-(\partial L_{ce}/\partial \theta)$ is the first-order gradient to minimize the cross-entropy, and $-([\partial L_{norm}(\partial \mathrm{Reg}_\theta(x)/\partial x)]/\partial \theta)$ is the second-order partial derivative to minimize the gradient loss.
D. Generalization-Enhanced Training Overview
In summary, our generalization-enhanced training method introduces a new gradient penalty loss into the normal training procedure to regulate the gradient magnitude to be as small as possible. The second-order gradients are computed with the double-backpropagation algorithm. As shown in Fig. 3, during REIN training, the model’s large gradients are penalized throughout the whole training procedure, so the final decision boundary is smoother than that of the naturally trained model. At the same time, the model accuracy can be well preserved by controlling the penalty coefficient $c$. As a result, we can effectively improve the network’s generalization ability across practical varied scenarios. Furthermore, one advantage of our algorithm is that our training method does not need scenario-specific training data. In autonomous driving systems, this could save thousands of miles of on-road driving data collection, which has great practical significance.
As for the extra cost of our training algorithm compared to natural training, double backpropagation costs one more backpropagation per training iteration; no other training overhead is introduced into our training process. Meanwhile, we can maintain the convergence speed at the same order as natural training without sacrificing the training efficiency, as we will show in the later experiments.
V. SELF-TEACHING: ROBUST TRAINING WITH UNLABELED PRACTICAL DATA
The robust training algorithm, REIN, can achieve general robustness against various variations without using any extra training data with practical variations. In many real cases, however, raw practical training data can be available through onboard recording cameras and sensors. For example, to facilitate auto-pilot system development, Tesla has announced its data sharing policy, which includes collecting driving scene videos on customers’ cars in order to improve self-driving performance [25]. With millions of customers driving on the road, collecting large amounts of raw data in all kinds of scenarios becomes simple. However, the labeling cost for these raw but complex driving data can be very high. This leaves a huge amount of unlabeled data from all practical scenarios unused, even though it has great potential to improve the self-driving system’s performance.
Targeting this problem, in this section we propose a novel training method, ST, to utilize such unlabeled data. With the ST algorithm, we can further improve the NN’s robustness against practical variations by utilizing unlabeled practical data.
A. Self-Teaching Overview
The ST algorithm originates from the semisupervised learning problem, where the data labeling is incomplete, i.e., part of the data is labeled but the rest is unlabeled [26], [27]. The main idea is to first use labeled data to train the learning system and then use the trained system itself as a teacher to (pseudo-)annotate the unlabeled data. Finally, by combining data with real labels and data with pseudo-labels, the learning system can be retrained to achieve better performance. This process can be iterated multiple times until convergence.
Problem Setting: In our problem, for the labeled training data, we assume all the clean traffic sign images in GTSRB are available with labels, which are denoted as $(X_s, Y_s)$. In contrast, all the traffic sign images with synthesized practical
Fig. 4. Overview of the ST algorithm. (a) In the initial step, we first use the labeled clean data to train an initial annotator model and predict the initial pseudo-labels on the adverse data. (b) Then, both clean and adverse data are involved in ST. During the iterative ST, the model performance can be improved. Therefore, the pseudo-labels are also relabeled by the better annotator to further improve the ST performance.
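The loop summarized in Fig. 4 (train on clean data, pseudo-label the adverse data, keep only confident labels, retrain, relabel) can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors’ implementation: `fit` stands for any training routine that returns a probability predictor, a fixed number of rounds replaces the convergence test, and a toy nearest-centroid classifier on 1-D features stands in for the NN. All names are hypothetical.

```python
import math

def pseudo_label(probs, threshold):
    """Arg-max class if its confidence exceeds the threshold, else None."""
    best = max(range(len(probs)), key=lambda i: probs[i])
    return best if probs[best] > threshold else None

def annotate(unlabeled, predict_proba, threshold):
    """Split unlabeled samples into confidently pseudo-labeled and rejected."""
    accepted, rejected = [], []
    for x in unlabeled:
        y = pseudo_label(predict_proba(x), threshold)
        if y is None:
            rejected.append(x)
        else:
            accepted.append((x, y))
    return accepted, rejected

def self_teach(clean, unlabeled, fit, threshold, rounds):
    predict_proba = fit(clean)  # initial annotator trained on clean labels only
    accepted, rejected = annotate(unlabeled, predict_proba, threshold)
    for _ in range(rounds):
        # Retrain on real labels plus the current pseudo-labels ...
        predict_proba = fit(clean + accepted)
        # ... then relabel all adverse data with the improved annotator.
        accepted, rejected = annotate(unlabeled, predict_proba, threshold)
    return predict_proba, accepted, rejected

def fit_centroids(dataset, sharpness=10.0):
    """Toy stand-in for NN training: one centroid per class on 1-D features."""
    sums, counts = {}, {}
    for x, y in dataset:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    centroids = [sums[k] / counts[k] for k in sorted(sums)]

    def predict_proba(x):
        # Softmax over negative squared distances to the class centroids.
        scores = [-sharpness * (x - c) ** 2 for c in centroids]
        m = max(scores)
        e = [math.exp(s - m) for s in scores]
        total = sum(e)
        return [v / total for v in e]

    return predict_proba

clean = [(0.0, 0), (0.2, 0), (1.0, 1), (1.2, 1)]  # labeled clean samples
adverse = [0.15, 1.4, 0.65]                        # unlabeled adverse samples
model, accepted, rejected = self_teach(
    clean, adverse, fit_centroids, threshold=0.8, rounds=2)
```

In this toy run the ambiguous sample near the class boundary never passes the confidence threshold and stays unlabeled, which is the behavior the thresholding step is meant to provide: uncertain samples are kept out of the retraining set rather than given a likely-wrong label.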
variations are assumed to be realistic data collected on-road but without annotations. This part of the data is denoted as $(X_t, ?)$, where $?$ denotes unknown labels. Our major task is to train an effective and generalizable NN-based classifier $F$ that can perform classification tasks well on both regular data, $X_s$, and data with variations, $X_t$.
Algorithm Overview: Fig. 4 shows the overview of our ST algorithm, which is an iterative training process. In the first iteration, we use the labeled data $(X_s, Y_s)$ to train the classifier with the regular cross-entropy loss, i.e.,
$$\text{Minimize} \sum_{i}^{N} -y_s \log F(x_s) + (1 - y_s) \log F(x_s) \tag{12}$$
where $N$ is the number of images with labels.
After the initial training process, we get a well-trained classifier $F$, which is then used to generate pseudo-labels $Y_t$ on the unlabeled raw data $X_t$. However, directly applying the classifier $F$ to the new data can generate erroneous pseudo-labels due to the intense practical variations. Without precautions, these wrong pseudo-labels can significantly hinder the following ST process. Therefore, we conduct confidence-based thresholding to choose the most confident pseudo-labels, as they are most likely to be correct. Specifically, the pseudo-label selection criterion depends on whether the pseudo-label’s largest confidence is higher than a predefined confidence threshold $\theta$. If so, the image is labeled as the class with the highest confidence; otherwise, it is not annotated and remains unlabeled in the following iteration:
$$y_t = \begin{cases} \arg\max_i (p_i), & \max(p_i) > \theta \\ \text{none}, & \text{otherwise} \end{cases} \tag{13}$$
where $p_i$ is the predicted probability of each class $i$ for image $x_t$, and $y_t$ is the output pseudo-label.
After the pseudo-label annotation process, we get the extra pseudo-labeled data $(X_t, Y_t)$ from all possible scenarios. Here, $Y_t$ denotes the pseudo-labels generated by the previous annotating process. With these extra images with various practical variations, we can then conduct retraining by combining both data sources $(X_s, Y_s)$ and $(X_t, Y_t)$ to improve classifier $F$’s performance on such new data
$$\text{Minimize}\ \bigl[-y_s \log F(x_s) + (1 - y_s) \log F(x_s)\bigr] + \bigl[-y_t \log F(x_t) + (1 - y_t) \log F(x_t)\bigr]. \tag{14}$$
This combined learning process allows the classifier to extract knowledge not only from regular data but also from practical data with various scenario variations, greatly enhancing its generalization capability and improving accuracy in these unlabeled practical scenarios.
The above process completes the first iteration of the ST algorithm. Since we get a better classifier $F$ than the initial one, more accurate pseudo-labels can also be obtained by applying the new classifier to reannotate. Therefore, the ST algorithm repeats the annotation process and the data-combined training to get an increasingly better classifier and better pseudo-labels. The iteration process stops when the upper bound performance is achieved. The pseudocode of the ST algorithm is given in Algorithm 1.
As we can see, throughout the iterations, the classifier $F$ acts as the teacher that generates pseudo-labels as its supervision, while it also acts as the student that relearns from the real-labeled and pseudo-labeled supervision. Therefore, it is named the ST algorithm. ST can greatly boost the classifier performance by involving large amounts of realistic data with practical variations. Meanwhile, since the training data already covers as many practical scenarios as possible, the generalization capability of the classifier can also be greatly enhanced, as we will show in the later experiments.
B. Self-Teaching Optimizations
ST improves the classifier’s performance mainly by training with new images and the corresponding generated pseudo-labels. However, the pseudo-labels generated by the classifier inevitably contain certain errors. Even with confidence-based thresholding, certain erroneous pseudo-labels can still exist. When applying naive ST training, these wrong labels can cause error accumulation: During the
following training process, some wrong pseudo-labels are used as ground-truth labels, and classifier F will further reinforce its wrong predictions on such images. Even worse, more classification errors will appear on similar images due to the wrong supervision.

Algorithm 1 ST Algorithm
procedure INITIALIZATION (A)
    Input: Clean data [x_s, y_s], Adverse data [x_t, ?]
    Initialize model F
    while not converged do
        Train F using [x_s, y_s];
        Update F according to Eq. (12);
    end while
    Get pseudo-label y_t using F(x_t) by Eq. (13)
    Return [x_t, y_t]
end procedure
procedure ITERATIVE SELF-TEACHING (B)
    Input: Clean data [x_s, y_s], Adverse data [x_t, y_t]
    Initialize model F
    while not converged do
        Train F using [x_s, y_s] and [x_t, y_t];
        Update F according to Eq. (14);
    end while
    Update pseudo-label y_t using F(x_t) by Eq. (13)
    Retrain model F and iterate until convergence.
    Return model F
end procedure

To avoid the influence of such error accumulation, we further propose three optimizations on ST: 1) curriculum learning; 2) imbalanced sampling; and 3) progressive confidence-based thresholding.

Curriculum Learning: To avoid the overwhelming effects produced by erroneous pseudo-labels, we first optimize the ST algorithm following the practice of curriculum learning. When combining the pseudo-labeled dataset into ST, we involve the pseudo-labeled training data in an easy-to-hard manner. That is, we first introduce images with smaller-intensity variations, since their pseudo-label accuracy is higher than that of images with more intense variations. Once the model has learned to classify these images well, it can also generate more accurate pseudo-labels on the images with higher-intensity variations. Therefore, we gradually introduce more images with higher-intensity variations, i.e., a curriculum ST practice. As we will show later, under certain scenarios where the pseudo-label accuracy is relatively low, curriculum learning is essential to ensure a significant improvement from the ST algorithm.

Imbalanced Sampling: After following the curriculum learning practice, we still have two data sources: 1) the data with accurate real labels and 2) the data with pseudo-labels, which may not be perfectly accurate. In naive ST, all real and pseudo-labels are sampled randomly from the two data sources. However, when the pseudo-labels contain too many errors, the gradient descent on one mini-batch can produce overwhelmingly wrong gradients that hurt the training performance. Therefore, when combining the two data sources during the training process, we conduct imbalanced sampling, i.e., we oversample the real labels and undersample the pseudo-labels in every mini-batch. Thus, we ensure that most image samples in every mini-batch are accurately labeled and can produce useful gradient information. With the imbalanced sampling optimization, the ST process achieves better performance than naive random sampling, as we will show later.

Progressive Confidence Thresholds: The last optimization concerns the confidence threshold. Under our assumption, the confidence threshold $\theta$ controls the pseudo-label quality: a higher $\theta$ produces more accurate pseudo-labels and vice versa. But a higher $\theta$ also causes another issue: most data will remain unlabeled, since their pseudo-label confidence cannot reach the threshold in (13). To trade off pseudo-label quality against quantity, we propose progressive confidence thresholds. In the beginning iterations, we set a high confidence threshold $\theta$ to ensure high label quality. With more training iterations, classifier F progressively learns to classify more unlabeled data correctly. Thus, in the later iterations, we lower the threshold to include a larger amount of new training data and further enhance the model performance.

Overall, REIN and ST are both robust training algorithms aimed at enhancing the generalization capability of the NN-based traffic sign classification system. Compared to the REIN training algorithm, which does not require any extra training data in adverse conditions, ST provides an alternative, i.e., utilizing the potentially available unlabeled training data to enhance generalization, which can achieve higher performance while avoiding the huge annotation cost. In the next two sections, we evaluate both algorithms and demonstrate their performance improvement in terms of generalization enhancement.

VI. EXPERIMENTS AND EVALUATION FOR REIN: ROBUST TRAINING WITHOUT PRACTICAL DATA

In this section, we evaluate the performance of our robust training method REIN on traffic sign classification tasks.

Experiment Setup: We use the GTSRB dataset as the source dataset for model training in the experiments. Before training, data augmentation techniques, including scaling, random cropping, and rotating, are used. The base model is a convolutional NN similar to the one used in Section III. Two raw models are then trained by natural training and by our generalization-enhanced training method REIN, using the Momentum Optimizer with a 5e-3 learning rate in TensorFlow 1.6 [28]. All other training configurations are kept the same for a fair comparison. The two trained models are named the natural model and the generalization-enhanced model, and they achieve 95.8% and 94.4% accuracy, respectively. On clear testing images, the accuracy of the robust training method is slightly lower (−1.4%) than that of regular model training due to the gradient penalty, which we will discuss later. But under all practical scenarios with variations, we will demonstrate that our method greatly outperforms the natural model by +15%–25%.
Fig. 5. REIN effectiveness evaluation: natural model versus generalization-enhanced model by REIN. With four different kinds of practical variations on traffic
sign images, our generalization-enhanced model consistently outperforms the natural model by large margins, achieving +15%–25% accuracy improvement
at the highest variation intensity.
Fig. 10. ST evaluation. During testing with four different kinds of practical variations on traffic sign images, our ST model consistently outperforms natural
model by large margins of +16%–30% at the highest variation intensity. (a) Rain variation. (b) Fog variation. (c) Darkening effect. (d) Bokeh/motion blur.
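The pseudo-labeling rule of Eq. (13) and the progressive threshold used by the ST model evaluated in Fig. 10 can be illustrated with a minimal sketch. The function names, the use of -1 to encode "none", and the linear shape of the schedule are assumptions made for illustration; the text only states that the threshold decreases from 0.9 to 0.7 during ST.

```python
import numpy as np

def pseudo_label(probs, theta):
    """Eq. (13): keep the arg-max class when its confidence exceeds theta.

    probs: (num_images, num_classes) array of predicted class probabilities.
    Returns an int array; -1 stands for "none" (the image stays unlabeled).
    """
    probs = np.asarray(probs, dtype=float)
    confidence = probs.max(axis=1)   # max_i(p_i) per image
    labels = probs.argmax(axis=1)    # arg max_i(p_i) per image
    return np.where(confidence > theta, labels, -1)

def progressive_threshold(iteration, num_iterations, start=0.9, end=0.7):
    """Hypothetical linear schedule from `start` down to `end` (an assumption)."""
    if num_iterations <= 1:
        return start
    frac = iteration / (num_iterations - 1)
    return start + frac * (end - start)

# Toy predictions for three unlabeled images over three classes.
probs = np.array([
    [0.95, 0.03, 0.02],
    [0.10, 0.80, 0.10],
    [0.40, 0.35, 0.25],
])
early = pseudo_label(probs, progressive_threshold(0, 5))  # high threshold
late = pseudo_label(probs, progressive_threshold(4, 5))   # lowered threshold
print(early.tolist(), late.tolist())
```

With the high initial threshold only the most confident image receives a pseudo-label; once the threshold is lowered, the second image is also annotated, while the ambiguous third image stays unlabeled in both cases.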
Fig. 11. Ablation study for curriculum optimization in ST. As the result shows, there is a 10% accuracy drop without curriculum optimization during ST.
Fig. 12. Ablation study for imbalanced sampling optimization in ST. As the result shows, there is an 8% accuracy drop without imbalanced sampling during ST.
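The two optimizations ablated in Figs. 11 and 12 can be sketched together as follows. This is an illustrative sketch, not the authors' implementation: the helper names, the dictionary layout of the samples, and the assumption that each pseudo-labeled image carries a known variation intensity level are all hypothetical. The 2:1 real-to-pseudo ratio is the example value given in the text.

```python
import random

def curriculum_pool(pseudo_items, max_intensity):
    """Easy-to-hard selection: keep pseudo-labeled items up to max_intensity."""
    return [item for item in pseudo_items if item["intensity"] <= max_intensity]

def imbalanced_batch(real_items, pseudo_items, batch_size, ratio=(2, 1), rng=random):
    """Draw one mini-batch whose real:pseudo composition follows `ratio`."""
    n_real = batch_size * ratio[0] // (ratio[0] + ratio[1])
    n_pseudo = batch_size - n_real
    # Sampling with replacement, so the real-labeled data can be oversampled.
    batch = rng.choices(real_items, k=n_real)
    if pseudo_items:
        batch += rng.choices(pseudo_items, k=n_pseudo)
    else:
        # No pseudo-labels available yet: fall back to real-labeled data only.
        batch += rng.choices(real_items, k=n_pseudo)
    rng.shuffle(batch)
    return batch

# Toy data: 100 real-labeled items and 50 pseudo-labeled items
# with hypothetical variation intensity levels 1..5.
rng = random.Random(0)
real = [{"source": "real", "id": i} for i in range(100)]
pseudo = [{"source": "pseudo", "id": i, "intensity": 1 + i % 5} for i in range(50)]

# Early curriculum stage: only low-intensity pseudo-labeled images are used.
stage_pool = curriculum_pool(pseudo, max_intensity=2)
batch = imbalanced_batch(real, stage_pool, batch_size=30, rng=rng)
n_real = sum(1 for item in batch if item["source"] == "real")
print(n_real, len(batch) - n_real)
```

Raising `max_intensity` in later stages admits the harder images, while the fixed ratio keeps most samples in every mini-batch accurately labeled.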
and w/o curriculum learning). Clearly, the ST with curriculum optimization (yellow line) outperforms the naive version (blue line) by at most +10% accuracy, demonstrating the necessity of curriculum optimization.

The reasons for the performance difference are as follows: For ST without curriculum learning, we directly pseudo-label all the unlabeled data and combine them into the training dataset. These data contain images with high-intensity rain variations, and the corresponding labeling accuracy at such rain intensity levels is only around 30%–40%. As a result, these bad pseudo-labels greatly hinder the performance improvement of ST. In contrast, ST with curriculum learning gradually involves the data in an easy-to-hard manner. Therefore, the ST model learns to predict these images with variations more and more accurately, thus achieving better performance.

2) Improvement of Imbalanced Sampling Optimization: We conduct a similar ablation study for the imbalanced sampling optimization, and the results are shown in Fig. 12. As the results show, imbalanced sampling also provides a +8% accuracy improvement over the naive version.

Specifically, the imbalanced sampling is implemented as follows. Before combining the labeled and unlabeled data, we control their composition ratio in the training dataset, e.g., 2:1 (real-labeled data : pseudo-labeled data), as more clean data with accurate labels are needed to stabilize the gradients of each mini-batch. Also, for all ST experiments, we use a progressive confidence thresholding strategy to control the pseudo-label quality, i.e., decreasing the threshold from 0.9 to 0.7 during the process of ST, which shows somewhat better performance (+2%–3%) than constant thresholding.

3) Self-Teaching Algorithm Overhead Analysis: The ST overhead lies in the extra effort of model training on the pseudo-labeled data. As we adopt the imbalanced sampling optimization, the sampling ratio of labeled data to pseudo-labeled data also affects the training overhead. In our experiments, we adopt a sampling ratio of 2:1 between labeled and unlabeled data in each iteration. Therefore, the per-epoch training effort is around three times the default one. Since model training can mostly be done offline, we consider such overhead acceptable in exchange for the classification accuracy improvement.

C. Detection Enhancement Under Practical Variations

The ST method can also improve the performance of detection tasks in autonomous driving. We apply our method to multiclass object detection to demonstrate the performance enhancement on two widely used detection datasets: 1) Cityscapes [39] and 2) Foggy-Cityscapes [40]. As the name suggests, the Cityscapes dataset contains images of street scenes and includes eight detection classes, such as Pedestrian, Rider, Car, etc. The Foggy-Cityscapes dataset includes similar street scene images but with certain fog variations.

For implementation, we build our algorithm on top of CycleGAN for style translation [36], and then use the labeled Cityscapes and unlabeled Foggy-Cityscapes raw images as training data for ST. The test data is from the
TABLE I
DETECTION PERFORMANCE ENHANCEMENT ON FOGGY-CITYSCAPES DATASET. INPUT IMAGES ARE RESIZED WITH 512 OR 600 PIXELS AS THE SHORTER SIDE FOR FAIR COMPARISONS WITH DIFFERENT STATE-OF-THE-ART WORKS
Foggy-Cityscapes test set for performance evaluation in the adverse foggy condition. We use the Faster R-CNN detector [41] and the same input size settings as previous works [34], [35], [37], [38]. For the performance evaluation, we report the mAP of all classes, following the settings in the aforementioned previous works for a fair comparison. The oracle performance denotes the mAP of the model trained on the fully annotated Foggy-Cityscapes training set, which can be regarded as the upper-bound performance.

As Table I shows, our ST method improves the baseline detector's performance by +10.1 mAP at 512 × 1024 resolution. Compared with previous state-of-the-art works, our method also achieves better detection performance, e.g., +2.0 mAP and +2.2 mAP over [36] and [38] at 512 × 1024 and 600 × 1200 resolutions.

Overall, we show that our ST algorithm brings a +16%–30% accuracy improvement for classification tasks and +10.1 mAP for detection tasks in adverse conditions. In practice, the requirement of unlabeled data is also easy to fulfill through a safe data sharing policy without compromising user privacy [25]. Therefore, we believe the ST algorithm has great potential for enhancing the generalization capability of NN models in autonomous driving systems.

VIII. CONCLUSION

In this work, we first build a comprehensive RobuTS dataset including traffic sign images in four adverse weather conditions, e.g., rainy, foggy, etc. Based on that, we benchmark NN-based classifiers and demonstrate their low generalization ability under practical variations. Then, we propose two novel robust training schemes, REIN and Self-Teaching (ST), aimed at boosting model generalization ability in two scenarios: 1) without extra training data and 2) with unlabeled data in adverse conditions. Experiments show that our proposed methods greatly improve the model's intrinsic generalization in both classification and detection tasks.

REFERENCES
[1] D. Feng, L. Rosenbaum, and K. Dietmayer, “Towards safe autonomous driving: Capture uncertainty in the deep neural network for Lidar 3D vehicle detection,” in Proc. 21st Int. Conf. Intell. Transp. Syst. (ITSC), 2018, pp. 3266–3273.
[2] B. Wu, A. Wan, F. Iandola, P. H. Jin, and K. Keutzer, “Squeezedet: Unified, small, low power fully convolutional neural networks for real-time object detection for autonomous driving,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops, 2017, pp. 446–454.
[3] Z. Chen and Z. Chen, “RBNet: A deep neural network for unified road and road boundary detection,” in Proc. Int. Conf. Neural Inf. Process., 2017, pp. 677–687.
[4] D. Hospach, S. Mueller, W. Rosenstiel, and O. Bringmann, “Simulation of falling rain for robustness testing of video-based surround sensing systems,” in Proc. Design Autom. Test Eur. Conf. Exhibit. (DATE), 2016, pp. 233–236.
[5] R. Gallen, A. Cord, N. Hautiére, É. Dumont, and D. Aubert, “Nighttime visibility analysis and estimation method in the presence of dense fog,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 1, pp. 310–320, Feb. 2015.
[6] H. H. Aghdam and E. J. Heravi, Guide to Convolutional Neural Networks: A Practical Application to Traffic-Sign Detection and Classification. Cham, Switzerland: Springer, 2017.
[7] Y. Tian, K. Pei, S. Jana, and B. Ray, “DeepTest: Automated testing of deep-neural-network-driven autonomous cars,” 2017. [Online]. Available: arXiv:1708.08559.
[8] A. Lubben. (2018). Self Driving Uber Killed a Pedestrian as Human Safety Driver Watched. [Online]. Available: https://www.vice.com/en-us/article/kzxq3y
[9] J. Horwitz and H. Timmons. (2019). There Are Some Scary Similarities Between Tesla’s Deadly Crashes Linked to Autopilot. [Online]. Available: https://qz.com/783009/
[10] J. Green. (2018). Tesla: Autopilot Was on During Deadly Mountain View Crash. [Online]. Available: https://www.mercurynews.com/2018/03/30
[11] K. Lim, Y. Hong, Y. Choi, and H. Byun, “Real-time traffic sign recognition based on a general purpose GPU and deep-learning,” PLoS ONE, vol. 12, no. 3, 2017, Art. no. e0173317.
[12] M. Nentwig and M. Stamminger, “Hardware-in-the-loop testing of computer vision based driver assistance systems,” in Proc. Intell. Veh. Symp. (IV), 2011, pp. 339–344.
[13] J.-H. Kim, W.-D. Jang, J.-Y. Sim, and C.-S. Kim, “Optimized contrast enhancement for real-time image and video dehazing,” J. Vis. Commun. Image Represent., vol. 24, no. 3, pp. 410–425, 2013.
[14] P. Dollár, C. Wojek, B. Schiele, and P. Perona, “Pedestrian detection: A benchmark,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2009, pp. 304–311.
[15] C. Lee and J.-H. Moon, “Robust lane detection and tracking for real-time applications,” IEEE Trans. Intell. Transp. Syst., vol. 19, no. 12, pp. 4043–4048, Dec. 2018.
[16] W. Liu, S. Liao, W. Ren, W. Hu, and Y. Yu, “High-level semantic feature detection: A new perspective for pedestrian detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 5187–5196.
[17] Y. Hou, Z. Ma, C. Liu, and C. C. Loy, “Learning lightweight lane detection CNNs by self attention distillation,” in Proc. IEEE Int. Conf. Comput. Vis., 2019, pp. 1013–1021.
[18] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2009, pp. 248–255.
[19] P. Sermanet and Y. LeCun, “Traffic sign recognition with multi-scale convolutional networks,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), 2011, pp. 2809–2813.
[20] J. Stallkamp, M. Schlipsing, J. Salmen, and C. Igel, “The German traffic sign recognition benchmark: A multi-class classification competition,” in Proc. IEEE Int. Joint Conf. Neural Netw., 2011, pp. 1453–1460.
[21] M.-Y. Fu and Y.-S. Huang, “A survey of traffic sign recognition,” in Proc. Int. Conf. Wavelet Anal. Pattern Recognit. (ICWAPR), 2010, pp. 119–124.
[22] (2017). Udacity Self-Driving-Car Challenge. [Online]. Available: https://github.com/udacity/self-driving-car/tree/master/
[23] J. Wu, C. Zheng, X. Hu, Y. Wang, and L. Zhang, “Realistic rendering of bokeh effect based on optical aberrations,” Vis. Comput., vol. 26, nos. 6–8, pp. 555–563, 2010.
[24] H. Drucker and Y. Le Cun, “Double backpropagation increasing generalization performance,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), vol. 2, 1991, pp. 145–150.
[25] Tesla. (2019). Tesla Data Sharing Privacy Policy. [Online]. Available: https://www.tesla.com/about/legal
[26] F. Yu et al., “Unsupervised domain adaptation for object detection via cross-domain semi-supervised learning,” 2019. [Online]. Available: arXiv:1911.07158.
[27] X. Zhu and A. B. Goldberg, “Introduction to semi-supervised learning,” in Synthesis Lectures on Artificial Intelligence and Machine Learning, vol. 3. San Rafael, CA, USA: Morgan & Claypool, 2009, pp. 1–130.
[28] M. Abadi et al., “TensorFlow: Large-scale machine learning on heterogeneous distributed systems,” 2016. [Online]. Available: arXiv:1603.04467.
[29] I. Evtimov et al., “Robust physical-world attacks on deep learning models,” 2017. [Online]. Available: arXiv:1707.08945.
[30] F. Yu, Z. Qin, C. Liu, L. Zhao, Y. Wang, and X. Chen, “Interpreting and evaluating neural network robustness,” in Proc. 28th Int. Joint Conf. Artif. Intell. (IJCAI), 2019, pp. 4199–4205.
[31] F. Yu, C. Liu, Y. Wang, L. Zhao, and X. Chen, “Interpreting adversarial robustness: A view from decision surface in input space,” 2018. [Online]. Available: arXiv:1810.00144.
[32] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” 2016. [Online]. Available: arXiv:1608.04644.
[33] A. Kurakin, I. Goodfellow, and S. Bengio, “Adversarial examples in the physical world,” 2016. [Online]. Available: arXiv:1607.02533.
[34] Y. Chen, W. Li, C. Sakaridis, D. Dai, and L. Van Gool, “Domain adaptive faster R-CNN for object detection in the wild,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2018, pp. 3339–3348.
[35] X. Zhu, J. Pang, C. Yang, J. Shi, and D. Lin, “Adapting object detectors via selective cross-domain alignment,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 687–696.
[36] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proc. IEEE Int. Conf. Comput. Vis., 2017, pp. 2242–2251.
[37] K. Saito, Y. Ushiku, T. Harada, and K. Saenko, “Strong-weak distribution alignment for adaptive object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2019, pp. 6956–6965.
[38] R. Xie, F. Yu, J. Wang, Y. Wang, and L. Zhang, “Multi-level domain adaptive learning for cross-domain detection,” in Proc. IEEE Int. Conf. Comput. Vis. Workshops, 2019, pp. 3213–3219.
[39] M. Cordts et al., “The Cityscapes dataset for semantic urban scene understanding,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 3213–3223.
[40] C. Sakaridis, D. Dai, and L. Van Gool, “Semantic foggy scene understanding with synthetic data,” Int. J. Comput. Vis., vol. 126, no. 9, pp. 973–992, 2018.
[41] S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Proc. Adv. Neural Inf. Process. Syst., 2015, pp. 91–99.

Fuxun Yu received the B.S. degree from the Harbin Institute of Technology, Harbin, China, in 2017. He is currently pursuing the Ph.D. degree with the Department of Electrical and Computer Engineering, George Mason University, Fairfax, VA, USA, under the supervision of Prof. X. Chen.
His current research interests include deep learning robustness, high-performance deep neural network computing and optimization, interpretability, and explainability of deep learning.

Zhuwei Qin received the B.S. degree from the Tianjin University of Science and Technology, Tianjin, China, in 2014, and the M.S. degree from Oregon State University, Corvallis, OR, USA, in 2017. He is currently pursuing the Ph.D. degree with the ECE Department, George Mason University, Fairfax, VA, USA, under the supervision of Prof. X. Chen.
His current research directions include deep neural network compression, and interpretable deep neural network for mobile applications.

Chenchen Liu received the M.S. degree from Peking University, Beijing, China, in 2013, and the Ph.D. degree from the ECE Department, University of Pittsburgh, Pittsburgh, PA, USA, in 2017.
In 2017, she joined the Department of Electrical and Computer Engineering, Clarkson University, Potsdam, NY, USA. She is currently an Assistant Professor with the Department of Computer Science and Electrical Engineering, University of Maryland at Baltimore County, Baltimore, MD, USA. Her current researches include brain-inspired computing system and security, machine learning, integrated circuits design, and emerging nonvolatile memory technologies.

Di Wang received the B.E. degree in computer science and technology from Zhejiang University, Hangzhou, China, in 2005, the M.S. degree in computer systems engineering from the Technical University of Denmark, Lyngby, Denmark, in 2008, and the Ph.D. degree in computer science and engineering from Pennsylvania State University, State College, PA, USA, in 2014.
He is currently a Principal Research Lead with Microsoft, Redmond, WA, USA. He has authored more than 40 peer-reviewed papers. His research spans the areas of artificial intelligence, computer systems, computer architecture, and energy-efficient system design and management.
Dr. Wang received five best paper awards and two best paper nominations.

Xiang Chen (Member, IEEE) received the M.S. and Ph.D. degrees from the ECE Department, University of Pittsburgh, Pittsburgh, PA, USA, in 2012 and 2016, respectively.
He is currently an Assistant Professor with the Department of the Computer Engineering, George Mason University, Fairfax, VA, USA, where he is the Founder of the Intelligence Fusion Laboratory. He also stays in close cooperation with not only academic society, such as Duke University, Durham, NC, USA; University of California at Santa Barbara, Santa Barbara, CA, USA; University of Pittsburgh, Pittsburgh, PA, USA; Syracuse University, Syracuse, NY, USA; Tsinghua University, Beijing, China; Hong Kong University of Science and Technology, Hong Kong; and City University of Hong Kong, Hong Kong, but also industries, such as the research labs of HP, Palo Alto, CA, USA; Samsung, Suwon, South Korea; MSRA, Beijing, China; Marvell, Hamilton, Bermuda; Amazon, Seattle, WA, USA; and Apple, Cupertino, CA, USA. In the past years of research, he has published more than 30 papers in the top international conferences and journals and received many best paper nominations and other awards. His research interests are in the low-power mobile system, high-performance mobile computing, machine learning, and secure computing system.