
Proximity Based Automatic Data Annotation for Autonomous Driving

Chen Sun, Student Member, IEEE, Jean M. Uwabeza Vianney, Ying Li, Long Chen, Member, IEEE, Li Li, Fellow, IEEE, Fei-Yue Wang, Fellow, IEEE, Amir Khajepour, and Dongpu Cao, Member, IEEE

Abstract—The recent development in autonomous driving involves high-level computer vision and detailed road scene understanding. Today, most autonomous vehicles employ expensive high-quality sensor sets such as light detection and ranging (LIDAR) and HD maps with high-level annotations. In this paper, we propose a scalable and affordable data collection and annotation framework, image-to-map annotation proximity (I2MAP), for affordance learning in autonomous driving applications. We provide a new driving dataset using our proposed framework for driving scene affordance learning by calibrating the data samples with available tags from online databases such as OpenStreetMap (OSM). Our benchmark consists of 40 000 images with more than 40 affordance labels under various daytime and weather conditions, including very challenging heavy snow. We implement sample advanced driver-assistance system (ADAS) functions by training neural networks (NN) on our data and cross-validate the results on benchmarks such as KITTI and BDD100K, which indicates the effectiveness of our framework and training models.

Index Terms—Affordance learning, autonomous vehicles, data synchronization, scene understanding.

Manuscript received January 21, 2020; revised February 4, 2020; accepted February 12, 2020. Recommended by Associate Editor Lingxi Li. (Corresponding author: Dongpu Cao.)
Citation: C. Sun, J. M. Uwabeza Vianney, Y. Li, L. Chen, L. Li, F.-Y. Wang, A. Khajepour, and D. P. Cao, "Proximity based automatic data annotation for autonomous driving," IEEE/CAA J. Autom. Sinica, vol. 7, no. 2, pp. 395-404, Mar. 2020.
C. Sun, J. M. Uwabeza Vianney, A. Khajepour, and D. P. Cao are with the Department of Mechanical and Mechatronics Engineering, University of Waterloo, Waterloo ON N2L3G1, Canada (e-mails: chen.sun@uwaterloo.ca; jmuviann@uwaterloo.ca; dongpu.cao@uwaterloo.ca; a.khajepour@uwaterloo.ca).
Y. Li is with the Department of Geography and Environmental Management, University of Waterloo, Waterloo ON N2L3G1, Canada (e-mail: y2424li@uwaterloo.ca).
L. Chen is with the School of Data and Computer Science, Sun Yat-sen University, Zhuhai 519082, China, and also with Waytous Inc., Qingdao 266109, China (e-mail: chenl46@mail.sysu.edu.cn).
L. Li is with the Department of Automation, Tsinghua University, Beijing 100084, China (e-mail: forcelee@gmail.com).
F.-Y. Wang is with the State Key Laboratory of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China (e-mail: feiyue.wang@ia.ac.cn).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/JAS.2020.1003033

I.  Introduction

AUTONOMOUS driving has become a popular research field in recent years. Information technology and autonomous driving systems can together be used to promote better commuting choices, provide optimal route planning, improve bus scheduling and routing, and ultimately reduce travel time and traffic congestion. A safe and robust autonomous driving system could also greatly reduce traffic accidents caused by human drivers [1]. To design a safe and robust autonomous driving system, the ability to understand the driving environment as well as the current vehicle state is essential [2], [3]. The techniques used in environment perception range from simple object marker detection by hand-crafted rules [4] to recent deep learning approaches [5]. The final goal is primarily to have an affordable and robust system applicable under diverse environments.

One of the autonomous driving frameworks most frequently used by car companies is the modular pipeline approach, where expensive LIDAR, highly accurate global navigation satellite system (GNSS) receivers and 3-D high definition maps are used to reconstruct a consistent world representation of the surrounding environment [6], [7]. The ego vehicle then takes all the information into account and makes further control decisions. However, such a way of perception is costly and raises problems in both storage space and processing speed. Furthermore, as mentioned in [8], a human driver only needs relatively compact driving information to make driving and control decisions. Instead of reconstructing a three-dimensional high definition map with bounding boxes of other traffic participants, a compact set of driving affordances, or even vehicle states [9], may be enough for control decisions. On the other hand, end-to-end learning [10] and direct perception [8] attempt to map directly from camera images to either control inputs or driving scene affordances. End-to-end learning for autonomous driving enjoys cheap annotation of the training data set; however, it is hard to interpret the control decisions. The direct perception approach proposed in [8] leverages interpretability by using compact annotations of driving scene affordances. Autonomous driving systems trained in both ways are highly dependent on the distribution and label accuracy of the training data sets. The data collection and attribute annotation for neural network-based training methods raise several problems, e.g., how to collect driving data in a scalable way in a diverse environment, and how to ease the human annotation effort in training data affordance labelling.

An alternative to manual annotation is offered by modern computer graphics techniques, which allow generating large-scale synthetic data sets with pixel-level ground truth. However, the creation of photo-realistic virtual worlds is time-consuming and expensive. Games like TORCS, limited by their development, do not produce images and videos that are photo-realistic enough.


Recently released open-sourced simulation platforms CARLA [11] and AirSim [12] are built on the Unreal Engine; the images they generate are more realistic, and they offer the flexibility to define various kinds of weather and environment settings. Autoware [13] shares real public road and region data in Japan for the software testing of self-driving vehicles. However, the problem remains that a simulator does not fully represent reality. It is more reasonable to collect and to label the driving data in a distributed approach. Xu et al. [14] published the BDD100K data set, where diverse driving data are collected in a distributed way by Uber drivers across California and New York and annotated by human labour. The driving data are collected with a phone camera, but the main effort of the manual annotation task is not relieved. In [15], the authors deploy the APIs from OpenStreetMap (OSM) [16] to tag images from Google Street View (GSV). It is indeed a clever implementation, where the training samples are available online (no driver needed) and the labels are free to use from OSM. However, the major drawbacks of this scheme are 1) low diversity of the driving samples, because GSV images are collected in sunny daytime only; and 2) low accuracy of the sample labels, due to the calibration error between GNSS and OSM and the lack of label filtering.

This work provides the following contributions: 1) We present an affordable, scalable driving data collection and annotation scheme, image-to-map annotation proximity (I2MAP), with GNSS calibration and confidence score filtering for autonomous driving systems, which provides diverse samples and improved label accuracy; 2) We propose our driving scene understanding benchmark named Ontario driving dataset-affordances (ODD-A), which is composed of more than 40 000 samples with more than 40 labels under various weather, lighting conditions and road segments. We provide multiple labels for each image for driving scene understanding with the proposed I2MAP method. The effectiveness of calibration and confidence score filtering is examined through the correlation of label accuracy with human-expert labelling; 3) We experiment on sample ADAS functions including traffic flow prediction, and a multi-label affordance state observer is trained in a single multi-task learning network with our dataset and cross-validated on the KITTI [6] and BDD100K [17] benchmarks.

The rest of this paper is organized as follows: Section II discusses related work on driving benchmarks, driving-related affordance understanding and multi-task learning. We outline our I2MAP data collection and annotation framework in Section III, where the detailed sensor fusion and proximity-based mapping algorithms are discussed. In Section IV, we demonstrate the ODD-A dataset in a typical multi-task learning setting, and present the network structure and experimental results alongside recent research works. Finally, the conclusion and a discussion of future work are presented in Section V.

II.  Related Works

A. Driving Benchmarks

In the pursuit of understanding driving scenes and driving autonomy, researchers have contributed massive effort to publicly available benchmarks. Geiger et al. [6] presented the KITTI benchmark in 2012, containing both camera images and LIDAR sweeps. They focused on local driving scenes in Europe. The published benchmark is quite useful for the modular pipeline approach of autonomous driving, where typical vision tasks, including traffic agent detection and lane and road mark detection, are demonstrated. In 2016, the Cityscapes benchmark [18] collected various urban driving scenes across 50 cities for semantic segmentation tasks. Authors open-sourced their benchmarks for pedestrian, lane and road marking detection under various weather and daytime conditions in [19], [20]. However, accurate annotation is time- and labour-consuming. Baidu proposed their annotation pipeline along with the ApolloScape [21] benchmark in 2018. At the same time, UC Berkeley released the BDDV dataset [17], which provided a semantic evaluation benchmark containing large-scale driving data collected in a distributed manner across four cities. In most cases, the driving data quality and labelling accuracy are proportional to the human labelling power and sensor equipment costs.

B. Driving Affordance Understanding

In the autonomous driving setting, the authors in [8] first argued that the traditional modular pipeline approach uses redundant information and makes the driving task even harder. Typically, human drivers can estimate the affordances of the environment, including the road attributes and the relative distances to related traffic participants, instead of identifying bounding boxes or segmentation for all instances appearing in the image. Sauer et al. [22] further examined the idea of direct perception by extending the driving scenario to urban driving using the more photo-realistic simulation platform CARLA [11]. The images with the affordance attributes attached in both works are collected easily through the provided simulation API. However, affordance annotation is a challenging task in real driving environments since it requires a certain level of understanding of the current driving environment. To annotate driving affordances such as heading angle, speed limit and road type, as well as the distance to the intersection, the human labeller may have to label along the ride instead of following the image-by-image annotation scheme mentioned above. In 2016, Seff and Xiao [15] presented an affordance learning method combining GSV panoramas and OSM road attributes. They use cropped GSV panoramas to train a CNN model for a list of selected static road attributes. This method was further examined in [23] for various geographical locations. Ideally, we would be able to use a compact set of affordances to represent the background and events of the driving scene, with enough samples under the chosen driving scenario.

C. Multi-Task Learning

Multi-task learning (MTL) is a sub-branch of supervised learning which can simultaneously solve multiple learning tasks and make use of the commonalities and differences between tasks. The affordance learning task of understanding a provided driving scene is similar to the visual question answering (VQA) task [24], where the model needs to extract the common features of the sample and reason about the common sense of the scene representation.


Compared with training various NN models alone, the learning efficiency, model size and prediction accuracy of the specific task model can be improved by utilizing MTL. The recent review [25] illustrates the advantage of MTL in learning internal representations that improve the generalization ability of the trained NN. The effect of MTL in the context of autonomous driving applications is demonstrated in [14], [22].

III.  I2MAP Framework

In this section, we introduce the proposed I2MAP data collection and affordance annotation framework, as shown in Fig. 1. We use a Honda Civic LX 2017 as our ego-vehicle for the driving data collection. In the I2MAP setup, we include a front camera, an iPhone and a Panda (Gray version) OBDII interface from comma.ai. All the sensory data are synchronized with coordinated universal time (UTC). Once the recordings are calibrated, we apply the extended Kalman filter (EKF) as the sensor fusion method to the raw GNSS signal, the phone sensors and the CAN bus readings to obtain a better positioning result. The estimated pose and UTC are then used as a query for online labels for the image tags. To reduce the auto-labelling error, we propose a confidence score based data filtering post-processing method, which is discussed in this section.
A. Sensor Setup

1) Vehicle Proprioceptive Sensors: Vehicles have many proprioceptive sensors, such as steering angle and throttle input, accessible via the vehicle CAN bus. We access the vehicle dynamics using the Panda OBDII interface and further decode them based on the DBC file matching our vehicle model. The Panda (Gray version) OBDII interface also comes with a Tallysman GPS antenna as the GNSS receiver, so the raw GNSS signal can be retrieved in real-time. For our ODD-A data collection, messages of interest such as longitudinal acceleration and steering angle are currently captured and saved to a separate file at 1 Hz.

2) Phone Data Collector App: We mounted an iPhone on the dashboard of the ego-vehicle as a complementary device for vehicle dynamics estimation. We built an app capable of logging the phone position and orientation estimates, and the accelerometer and gyroscope sensor readings, in real-time. Notice that the accuracy of the IMU and magnetometer equipped on the iPhone is relatively low due to the cheap implementation. To compensate for the low accuracy, we implemented on-board automatic accelerometer and gyro calibration, where the iPhone is calibrated in an upward portrait plane position as demonstrated in Fig. 2. In this regard, we only need to focus on bias estimation, and the accelerometer sensor scaling can be ignored. As a complementary source of measurement to the CAN bus, we can estimate the vehicle heading by logging the iPhone heading, which is determined using the embedded magnetometer sensors. We need to calculate the bias for the calibration process; hence we drove slowly on flat ground for a few minutes until the phone's x-axis pointed to true north, matching the device attitude orientation. The biases are subtracted from the corresponding measurements in real-time as the driver collects data.

3) Garmin Dash Camera: The Phone Data Collector App can capture driving videos similarly to [14], which uses the phone camera for recording. However, one drawback is that the horizontal view angle of the captured iPhone videos is only about a 60° field of view (FOV). We use a Garmin Dash Cam 45 with a 122° FOV to overcome the raw data quality issues of the distributed driving sampling applications in [14], [17]. This camera records videos at 30 fps with a frame resolution of 1920 × 1080. The camera provides 3 channels (RGB) and has a night colour mode setting, which helps capture relatively good images at night. Hence, the raw image quality is much better than that of the GSV samples used in [15], and the camera is also able to capture more abundant driving scenes. Each frame of the dashcam recording is tagged with UTC, GPS position and movement speed.
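To make the UTC-based synchronization concrete, the following minimal Python sketch aligns each dashcam frame timestamp with the nearest CAN-bus and phone log entries. It is an illustration only, not the authors' released tooling; the record field names, the file layout and the 0.5 s tolerance are assumptions.

import bisect

def nearest(sorted_ts, t):
    """Index of the timestamp in sorted_ts closest to t."""
    i = bisect.bisect_left(sorted_ts, t)
    if i == 0:
        return 0
    if i == len(sorted_ts):
        return len(sorted_ts) - 1
    return i if sorted_ts[i] - t < t - sorted_ts[i - 1] else i - 1

def synchronize(frame_ts, can_log, phone_log, tol=0.5):
    """Attach the closest CAN and phone records (within tol seconds of UTC) to each frame.
    can_log / phone_log: lists of dicts, each containing a 'utc' key in seconds."""
    can_ts = [r["utc"] for r in can_log]
    phone_ts = [r["utc"] for r in phone_log]
    samples = []
    for t in frame_ts:
        i, j = nearest(can_ts, t), nearest(phone_ts, t)
        if abs(can_ts[i] - t) <= tol and abs(phone_ts[j] - t) <= tol:
            samples.append({"frame_utc": t, "can": can_log[i], "phone": phone_log[j]})
    return samples

# Example: a 1 Hz CAN log and a 10 Hz phone log matched against 30 fps frame timestamps.
can = [{"utc": float(s), "steer_angle": 0.0} for s in range(10)]
phone = [{"utc": 0.1 * k, "heading": 90.0} for k in range(100)]
frames = [k / 30.0 for k in range(300)]
print(len(synchronize(frames, can, phone)))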
B. Proximity Mapping

The main idea of proximity mapping is to overlay road feature attributes onto the recorded driving scenes. The OSM-to-image matching algorithm was first proposed in [15] and further examined in [23]. The contributed data in OSM are tied together using location information in a world geodetic coordinate system (WGS84). The tags for road attributes and visibility can then be queried from OSM and an online weather API, respectively. However, the major challenge of this type of proximity mapping-based data auto-annotation is how to ensure label accuracy. The accuracy of the feature association is directly affected by the location accuracy in both GSV and OSM and by whether the weather information in both sources was updated in the same time frame. As highlighted later, we found some mislabeled affordances due to unresolved location differences, especially at bridges, intersections and close road networks, where a small location deviation would associate features of one road to an image showing a different road.
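As a concrete illustration of the proximity lookup, the sketch below associates an estimated pose with the road attributes of the nearest OSM reference point using the haversine distance and rejects the label when no reference point is close enough. It is a simplification of the idea rather than the paper's exact implementation; the attribute dictionaries and the 25 m rejection radius are hypothetical.

import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_osm_attributes(lat, lon, osm_points, max_dist_m=25.0):
    """Return the road attributes of the closest OSM reference point,
    or None when every candidate is farther than max_dist_m (label rejected)."""
    best, best_d = None, float("inf")
    for p in osm_points:
        d = haversine_m(lat, lon, p["lat"], p["lon"])
        if d < best_d:
            best, best_d = p, d
    return best["tags"] if best is not None and best_d <= max_dist_m else None

# Hypothetical OSM reference points with pre-fetched tags.
osm = [
    {"lat": 43.4723, "lon": -80.5449, "tags": {"highway": "secondary", "lanes": 2}},
    {"lat": 43.4730, "lon": -80.5460, "tags": {"highway": "service", "lanes": 1}},
]
print(nearest_osm_attributes(43.4724, -80.5450, osm))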
1) Extended Kalman Filter: The extended Kalman filter (EKF) is an effective recursive filter for estimating the state of a nonlinear dynamic system, and it has been applied to vehicle positioning with GPS and vehicle sensors in [26]-[28]. We use the kinematic bicycle model [29] given by the following:

ẋ = v cos(ψ + β)   (1)
ẏ = v sin(ψ + β)   (2)
ψ̇ = (v / Lr) sin(β)   (3)
v̇ = a   (4)

where β = tan⁻¹(Lr tan(δf) / (Lr + Lf)), with the steer input δf read from the CAN bus and a the acceleration. Notice that the global pose elements x, y and the velocity v are estimated from the vehicle proprioceptive sensors, while the inertial heading ψ is estimated by the gyroscope sensor in the smartphone. Our state vector is then X = [x, y, ψ, v] with input vector u = [δf, a].
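A direct transcription of (1)-(4) in Python is shown below; the wheelbase split Lr, Lf and the numerical values are placeholders for illustration only.

import math

def bicycle_dot(state, u, lr=1.5, lf=1.2):
    """State derivative of the kinematic bicycle model (1)-(4).
    state = [x, y, psi, v], u = [delta_f, a]."""
    x, y, psi, v = state
    delta_f, a = u
    beta = math.atan(lr * math.tan(delta_f) / (lr + lf))
    return [v * math.cos(psi + beta),
            v * math.sin(psi + beta),
            (v / lr) * math.sin(beta),
            a]

# One forward-Euler step at dt = 0.1 s for a gentle left steer at 10 m/s.
state = [0.0, 0.0, 0.0, 10.0]
dot = bicycle_dot(state, [0.05, 0.2])
state = [s + 0.1 * d for s, d in zip(state, dot)]
print(state)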


Fig. 1. A framework demonstration of the proposed I2MAP driving data collection and automatic annotation pipeline. Data from all devices are synchronized by UTC time: the phone sensors (GPS, speed and acceleration, vehicle drift and heading), the proprioceptive sensors (vehicle dynamics, driver control inputs) and the dash cam (40k key frames, 30 fps, 1920×1080 resolution) are calibrated and fused by EKF-based pre-processing, the online database (road static attributes, weather, road type, bus stop, intersection, stop signs) is queried, and the annotated data sets are produced after confidence score based data filtering.
 
Fig. 2. The phone is mounted with its z-axis parallel to the vehicle's x-axis; the ego-vehicle forward direction is the same as the −z direction in the iPhone coordinate system. (a) iPhone coordinate reference system; (b) the phone and dash camera setup in the ego vehicle.

Lumping the white noise uncertainties of the dynamic model into w, we have the nonlinear dynamic system for the model forecast

Ẋ = F(X) + w   (5)

where F(X) wraps up the functions in (1)-(4). The measurement model can be defined as follows, based on the need to estimate the position (x, y), the velocity v, and the heading angle ψ:

zₖ = I₄×₄ [x, y, ψ, v]ᵀ + v = H(X) + v.   (6)

The Jacobian matrices JF = ∂F/∂X |X̂ and JH = ∂H/∂X |X̂ are derived as

JF = [ 0  0  −v sin(ψ + β)   cos(ψ + β)
       0  0   v cos(ψ + β)   sin(ψ + β)
       0  0   0              sin(β)/Lr
       0  0   0              0 ]   (7)

JH = I₄×₄.   (8)

The process noise covariance P and the measurement noise covariance R are chosen based on experimental data:

P = diag(1, 1, 0.5, 0.3),   R = diag(4, 4, 2, 4).   (9)
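Putting (1)-(9) together, a minimal EKF predict/update loop can be sketched as follows. This is a didactic sketch under assumptions, not the authors' code: the wheelbase split and time step are placeholders, the process and measurement noise mirror (9) (the paper's P is named Q here to avoid clashing with the state covariance), and the fused measurement vector is synthetic.

import numpy as np

LR, LF = 1.5, 1.2                    # placeholder wheelbase split (m)
Q = np.diag([1.0, 1.0, 0.5, 0.3])    # process noise, cf. (9)
R = np.diag([4.0, 4.0, 2.0, 4.0])    # measurement noise, cf. (9)

def f(x, u, dt):
    """Discretized bicycle model (1)-(4): x = [x, y, psi, v], u = [delta_f, a]."""
    beta = np.arctan(LR * np.tan(u[0]) / (LR + LF))
    dx = np.array([x[3] * np.cos(x[2] + beta),
                   x[3] * np.sin(x[2] + beta),
                   x[3] / LR * np.sin(beta),
                   u[1]])
    return x + dt * dx

def jacobian_f(x, u, dt):
    """Discrete-time Jacobian I + dt * JF, with JF from (7)."""
    beta = np.arctan(LR * np.tan(u[0]) / (LR + LF))
    JF = np.array([[0, 0, -x[3] * np.sin(x[2] + beta), np.cos(x[2] + beta)],
                   [0, 0,  x[3] * np.cos(x[2] + beta), np.sin(x[2] + beta)],
                   [0, 0,  0,                           np.sin(beta) / LR],
                   [0, 0,  0,                           0]])
    return np.eye(4) + dt * JF

def ekf_step(x, P, u, z, dt):
    """One predict/update cycle with the identity measurement model (6), (8)."""
    x_pred = f(x, u, dt)
    F = jacobian_f(x, u, dt)
    P_pred = F @ P @ F.T + Q
    # With JH = I the innovation is simply z - x_pred.
    K = P_pred @ np.linalg.inv(P_pred + R)
    x_new = x_pred + K @ (z - x_pred)
    P_new = (np.eye(4) - K) @ P_pred
    return x_new, P_new

x = np.array([0.0, 0.0, 0.0, 10.0])
P = np.eye(4)
z = np.array([1.05, 0.0, 0.01, 10.1])   # synthetic fused GNSS/phone/CAN measurement
x, P = ekf_step(x, P, [0.02, 0.1], z, dt=0.1)
print(x)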
2) GNSS Preprocessing: With a minimum of four satellites in view, a good position fix for the GPS receiver is achieved by solving a four-dimensional trilateration problem [30]. The primary sources of GPS error are the user equivalent range errors (UERE), related to the timing and path readings, and the dilution of precision (DoP), caused by the arrangement of the satellites in the sky. The open-sourced Laika package [31] provides useful sources to obtain information about the satellites and the signal delays from the raw GNSS data. We can obtain the pseudorange delay from the summation of the tropospheric delay, ionospheric delay, clock error and DCB corrections in the recording area [32]. The positioning accuracy is then improved by querying the position of a nearby continuously operating reference station (CORS) through Laika [31] and compensating with a differential correction scheme using the known pseudorange delay.
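The correction step itself reduces to per-satellite arithmetic once the delay terms are available. The sketch below is schematic only: in the real pipeline the delay terms and the CORS data come from Laika [31], whose API is not reproduced here, and the numbers are toy values.

import numpy as np

def correct_pseudoranges(raw_pr, tropo, iono, clock_err, dcb, ref_residual=None):
    """Subtract the modelled delay terms from the raw pseudoranges (metres).
    ref_residual, when available from a nearby CORS, is the reference station's
    measured-minus-geometric range for the same satellites (differential correction)."""
    pr = np.asarray(raw_pr, dtype=float) - (np.asarray(tropo) + np.asarray(iono)
                                            + np.asarray(clock_err) + np.asarray(dcb))
    if ref_residual is not None:
        pr = pr - np.asarray(ref_residual)
    return pr

raw = [20_200_123.4, 21_450_987.1, 22_310_555.9, 23_005_444.2]   # toy pseudoranges
print(correct_pseudoranges(raw, tropo=[2.3] * 4, iono=[4.1] * 4,
                           clock_err=[15.0] * 4, dcb=[1.2] * 4,
                           ref_residual=[0.8, -0.4, 0.2, 0.5]))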


Other than the UERE, we can also evaluate the DoP from the raw GNSS data to estimate the error propagation onto the positional measurement precision. If the elevations and azimuths of the satellites in view are similar, the DoP is high. A high DoP means that small errors in the UERE can cause significant errors in the computed position [33]. In our paper, we use the DoP as one feature to specify the correctness of the processed GNSS position result.
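The DoP feature itself can be computed from the satellite geometry alone: with unit line-of-sight vectors to each satellite, the geometry matrix G gives the covariance factor (GᵀG)⁻¹, whose trace yields the geometric DoP. The sketch below is a standard textbook computation offered only as an illustration of how such a feature could be derived; it is not the Laika [31] API.

import numpy as np

def geometric_dop(elevations_deg, azimuths_deg):
    """Geometric DoP from satellite elevations/azimuths (degrees).
    Rows of G are [-east, -north, -up, 1] line-of-sight components."""
    el = np.radians(np.asarray(elevations_deg, dtype=float))
    az = np.radians(np.asarray(azimuths_deg, dtype=float))
    e = np.cos(el) * np.sin(az)
    n = np.cos(el) * np.cos(az)
    u = np.sin(el)
    G = np.column_stack([-e, -n, -u, np.ones_like(e)])
    cov = np.linalg.inv(G.T @ G)
    return float(np.sqrt(np.trace(cov)))

# Well-spread satellites give a low DoP; clustered ones give a high DoP.
print(geometric_dop([60, 30, 30, 30], [0, 0, 120, 240]))    # spread geometry
print(geometric_dop([45, 46, 44, 45], [10, 12, 14, 16]))    # clustered geometry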
C. Confidence Score Filtering

One major issue in automatic labelling is label accuracy and data quality. The annotated label needs to be as accurate as possible to avoid introducing bias into the subsequent use in supervised learning. We have improved the GNSS positioning through the proposed raw GNSS compensation and EKF filtering. Beyond the positioning accuracy, the final label accuracy is strongly correlated with the label confidence given the road segment structure complexity. For example, we demonstrate a branching intersection in Fig. 3(a), where at time t the vehicle is positioned with state Xt; after some time period δt, the true state of the vehicle is portrayed as Xt+δt and the estimated state is given as X̂t+δt. Based on proximity mapping, we find the nearest intersection distance along the corresponding ego-vehicle heading for the ground truth label mapping. Consider the case at time t + δt: the image sampled at the real vehicle state Xt+δt is then tagged with the static road attributes annotated at the nearby OSM position (OSM-P2).

Labels like number of lanes, road type and vehicle-to-road heading are likely to be erroneous in this case. The problem of intersection label accuracy is inherited from the lack of a formal semantic definition of labels at scenario transitions, which was elaborated in [34]. One popular strategy is to have multiple human labellers evaluate the image labels and find the final consensus as the image tag. However, using consensus labels with even more human effort is against the intention of reducing the human labelling workload. Moreover, even after reaching a consensus tag, it remains vague and lacks formal definitions. We introduce confidence scores to measure the data label accuracy in the following three categories:

Sc_po(Δp) = exp(−α₁ Δp²)   (10)
Sc_DoP(Δd) = exp(−α₂ Δd²)   (11)
Sc_il(Δt) = σ(|Δt|)   (12)

where Sc_po measures the confidence score for the pitch offset: by our assumption, the ground is mostly flat for the sake of sensor bias reduction, hence a large pitch offset Δp is not expected. Similarly, the calculated DoP Δd is a positive real value stating how errors in the measurement affect the final state estimation. As mentioned in [33], a considerable Δd > 10 often represents a low confidence level, and the positional measurements should be discarded. The intersection label confidence is measured by Sc_il(Δt), where Δt is the frame offset from the given data sample frame to the sample frame corresponding to the intersection. Since this label confidence issue is mostly caused by the structure transition, we use the sigmoid function σ to map the frame difference Δt to [0.5, 1]. Readers can find a demonstration plot of the confidence scores in Fig. 3(b), corresponding to (10)-(12).
D. Performance Evaluation

1) Position Accuracy: To prove the effectiveness of our algorithm, we compare our result with the raw positioning signals from the Phone, the DashCam, and the raw GNSS data. Since the ground truth position of the receiver is never known, it is hard to judge the filtering algorithms directly. However, with the same assumption as in [31], we can estimate the altitude accuracy of a position by checking the variation of the estimated road height over a small batch. It is reasonable to assume that the vertical and horizontal accuracies are equally correlated for the computed position. Fig. 4 shows the altitude error distributions for positions computed with the Phone GPS, the DashCam, the raw GNSS, the EKF-fusion algorithm and the EKF with confidence score filtering. Overall, the positioning error was reduced by 50%-60% using the EKF with the following confidence score specification:

Φ = (Sc_po > 0.8) ∧ (Sc_DoP > 0.6) ∧ (Sc_il > 0.6).   (13)

Notice that the position error can be further reduced by providing more stringent criteria Φ; the downside is that we have to filter out more data.
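A minimal sketch of the confidence scores (10)-(12) and the filtering predicate (13) is given below; α₁ = 0.1 and α₂ = 0.01 follow Fig. 3(b), and the example inputs are illustrative values rather than measurements from the dataset.

import math

ALPHA_1, ALPHA_2 = 0.1, 0.01   # per Fig. 3(b)

def sc_pitch(delta_p):
    """Pitch-offset confidence, (10)."""
    return math.exp(-ALPHA_1 * delta_p ** 2)

def sc_dop(delta_d):
    """DoP confidence, (11)."""
    return math.exp(-ALPHA_2 * delta_d ** 2)

def sc_intersection(delta_t):
    """Intersection-label confidence, (12): sigmoid of |frame offset|, in [0.5, 1]."""
    return 1.0 / (1.0 + math.exp(-abs(delta_t)))

def keep_sample(delta_p, delta_d, delta_t):
    """Filtering specification Phi, (13)."""
    return (sc_pitch(delta_p) > 0.8 and
            sc_dop(delta_d) > 0.6 and
            sc_intersection(delta_t) > 0.6)

# A sample far from an intersection transition with a good DoP is kept;
# one close to the transition with a poor DoP is rejected.
print(keep_sample(delta_p=0.5, delta_d=3.0, delta_t=8))    # True
print(keep_sample(delta_p=0.5, delta_d=12.0, delta_t=0))   # False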
2) Label Accuracy: To compare the label accuracy, we generated a single set of human labels as ground truth by combining human annotator consensus following [35]. Each human volunteer was first shown fifty example data samples to help them understand the corresponding affordance definitions. We choose four classification tasks: number of lanes, heading angle, road type, and day or night. Notice that estimating the real heading angle as a regression task is tough for human annotators. We set a pseudo rule that classifies the calculated heading angle into three types: left (> 15°), right (< −15°), and center (the rest).
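Written out, the pseudo rule is simply a bucketing function, e.g.:

def heading_class(angle_deg):
    """Map a calculated heading angle (degrees) to the three classes used for the human comparison."""
    if angle_deg > 15:
        return "left"
    if angle_deg < -15:
        return "right"
    return "center"

print([heading_class(a) for a in (20, -30, 4)])   # ['left', 'right', 'center']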
As evident in Fig. 5, the proposed I2MAP label accuracy is boosted by 8% on average by the EKF fusion algorithm; the performance is further improved by confidence filtering with the specification (13), by 5% to 14%. We find that for the affordances that have a strong correlation with pose estimation, such as number of lanes, heading angle and road type, the corresponding label accuracy can be greatly improved through our proposed algorithm. On the other hand, tags such as weather and visibility are not very sensitive to road structure changes and localization error; hence the labelling quality of these affordances is naturally good enough.

IV.  Benchmark Analysis and Experiments

A. Data Statistics

Our current dataset consists of about 40 000 images taken during three weeks of driving in Waterloo. In total, more than 40 label classes are automatically annotated. The available tags and annotated attributes are shown in Table I. We recorded the data in various driving scenarios across various road types, in both daytime and nighttime. The data balance of a selected subset of attributes is demonstrated in Fig. 6.


Fig. 3. Label mapping error due to proximity and the proposed confidence score functions to filter the error. (a) An example scenario at a branching intersection which induces the proximity-based label mapping error: the black dots correspond to the OSM reference locations (OSM-P1, OSM-P2) with road attribute annotations in the database; the green dots represent the ground-truth states of the ego-vehicle; the red dot is the estimated state at time t + δt. (b) The confidence score functions Sc(Δx) for pitch offset, DoP and intersection label transition; although they share the same x-axis, the meaning of the corresponding Δx depends on the corresponding confidence score function (α₁ = 0.1, α₂ = 0.01).

Fig. 4. Estimated altitude error distributions comparison for the signals from five schemes. (a) Error distribution comparison between Phone (purple), DashCam (green), and raw GNSS (red); (b) error distribution comparison between raw GNSS (red), EKF-fusion (pink) and EKF with confidence score filtering with our specification Φ (light green).

Our data set contains 53% secondary roads, which is the highest share among all road types. Secondary roads mostly indicate a route with two lanes and traffic moving in both directions. The recorded high percentage is a true reflection of most of the road networks from which we collected data. The smallest represented road type, with 6%, is the service road, which provides access to business areas and public gathering places such as business parks and campsites. The detailed image tags, including the vehicle dynamics and the environmental labels, are listed in Table I.

Fig. 5. Label accuracy compared with human consensus on selected affordances (raw GNSS vs. EKF fusion vs. EKF fusion with confidence filtering, over the categories number of lanes, heading angle, road type, and day or night).

B. Affordance Observer Network

One application of our proposed dataset ODD-A is to solve the VQA problem as a multi-label problem. We use a single neural network, demonstrated in Fig. 7, as an affordance observer. The task blocks correspond to different sub-tasks on learning the affordances, or even control outputs, as in an imitation learning structure. The network used here in Fig. 7 can be extended with controller modules as in [11], [22], [36]. Here, we choose one example implementation predicting two sets of tasks from the image inputs and the history logs of velocity and driver's inputs. We use ResNet50 [37] pre-trained on ImageNet [38] as the encoder to extract features from the raw input images. The features are extracted from the input sequence and stacked as one set of N feature maps. The encoded image features are concatenated with N frames of velocity and the driver's steer and throttle inputs and forward-passed to each task block as input. The task blocks are task-specific layers which are grouped into texture-oriented and geometric-oriented tasks. A texture-oriented task aims to focus more on the texture instead of the geometric relationships in the image.


TABLE I
Current Available Labels From the I2MAP Data Collection Scheme

Labels from vehicle sensors            | Labels from Phone and DashCam              | Labels from OSM and online API
Longitudinal acceleration              | GPS & images                               | Road type & intersection type
Engine torque & estimate               | Speed, heading angle & drift               | Intersection distance
Steer angle & steering wheel angle     | Estimated attitude: roll, pitch, yaw       | Bike lane? & one_way?
Engine RPM, odometer & pedal gas       | Gyro & accelerometer measurements          | Number of lanes
Throttle & brake                       | Vertical, horizontal & heading accuracy    | Weather index & road condition
Front left & front right wheel speed   |                                            | Moving traffic? Bus stop? & stop sign?
Rear left & rear right wheel speed     |                                            | Wind speed & visibility

Fig. 6. The sampled driving data distribution over a set of road types (secondary, tertiary, primary, service), number of lanes (n = 1 to 4) and different weathers (sunny, cloudy, snowy, rainy, rest).

Fig. 7. Overview of the multi-task learning network. The auto-encoder extracts the features from the input image. We store the last N feature maps in memory, where N is the length of the image sequence for the designed perception tasks. The N frames of velocity and driver's inputs are concatenated with the N feature maps from images and branched to each task block (texture oriented and geometric oriented, each an FCN/LSTM head). The defined affordances are predicted from each task block.
For example, the features needed to identify weather condition or visibility from a front-view image are different from those for learning the cross-track error or the lane structure. Different weighting combinations should be learned corresponding to the features extracted by the auto-encoder. Notice that the two types of tasks can be treated as augmentation noise for each other during the training phase, improving the robustness of the neural network.
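A minimal PyTorch sketch of such an observer is shown below. It is an illustrative architecture under assumptions, not the authors' released model: the head sizes, hidden dimensions and the three-element driver-input history are placeholders, and the backbone is instantiated without downloading weights (the paper uses ImageNet pre-training).

import torch
import torch.nn as nn
import torchvision.models as models

class AffordanceObserver(nn.Module):
    """ResNet50 encoder plus texture-oriented and geometric-oriented task heads."""
    def __init__(self, n_frames=5, n_texture_out=4, n_geometric_out=8):
        super().__init__()
        backbone = models.resnet50()            # paper: pre-trained on ImageNet [38]
        self.encoder = nn.Sequential(*list(backbone.children())[:-1])  # 2048-d pooled features
        feat_dim = n_frames * 2048 + n_frames * 3   # image features + [v, steer, throttle] history
        self.texture_head = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                          nn.Linear(256, n_texture_out))
        self.geometric_head = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU(),
                                            nn.Linear(256, n_geometric_out))

    def forward(self, frames, history):
        # frames: (B, N, 3, H, W); history: (B, N, 3) with velocity, steer, throttle.
        b, n = frames.shape[:2]
        feats = self.encoder(frames.flatten(0, 1)).flatten(1)   # (B*N, 2048)
        feats = feats.reshape(b, n * 2048)
        x = torch.cat([feats, history.flatten(1)], dim=1)
        return self.texture_head(x), self.geometric_head(x)

model = AffordanceObserver(n_frames=5)
imgs = torch.randn(2, 5, 3, 224, 224)
hist = torch.randn(2, 5, 3)
texture_logits, geometric_out = model(imgs, hist)
print(texture_logits.shape, geometric_out.shape)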
We choose the following seven affordances as the learning tasks: heading angle, intersection distance, number of lanes, intersection type, weather type, moving traffic and road type. The weather task block is a texture-oriented block, whereas the rest of the tasks focus on learning the embedded geometric relationships. We use sixty per cent of the proposed ODD-A as the training set and the rest for validation and testing. The network is trained on mini-batches of size 32. We use the class-weighted categorical cross-entropy loss for classification, where the weights are chosen based on the data balance in the training set, and the mean average error for the regression tasks. The data augmentation set is different between the weather task and the remaining geometric-oriented task blocks. Notice that we experiment with an FCN (N = 1) and LSTMs (N = 5 or 10) for the task-specific layers.
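For the class-weighted categorical cross-entropy, one common recipe, shown below, weights each class by its inverse frequency; the exact weighting scheme is an assumption, since the paper only states that the weights follow the training-set balance.

import torch
import torch.nn as nn
from collections import Counter

def class_weights(labels, n_classes):
    """Inverse-frequency class weights, normalised so their mean is 1."""
    counts = Counter(labels)
    freq = torch.tensor([counts.get(c, 1) for c in range(n_classes)], dtype=torch.float)
    return freq.sum() / (n_classes * freq)

train_labels = [0, 0, 0, 0, 1, 1, 2]          # toy imbalanced label set
weights = class_weights(train_labels, n_classes=3)
criterion = nn.CrossEntropyLoss(weight=weights)

logits = torch.randn(4, 3)
targets = torch.tensor([0, 1, 2, 0])
print(weights, criterion(logits, targets))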
We evaluated the experimental results in Table II with cross-validation on KITTI [6] and BDD100K [17]. To synchronize the labels for comparison, we obtain the image tags for all testing datasets with the proposed I2MAP algorithm. We found that by employing a temporal task block, most of the prediction results improve, which demonstrates that the additional temporal information can help the classification and regression ability. However, the number of feature maps stacked in memory can affect the prediction result as well. In our experiment, we sample the data at 1 fps, so the longer temporal trace (N = 10) can become a disturbance in the final network prediction. Almost all the data in KITTI [6] are collected in clear weather, hence we list NA in Table II. Notice that the proposed dataset ODD-A and BDD100K [17] are sampled similarly in terms of driving location (both in North America) and driving scenario (various lighting conditions, urban). The ODD-A-trained network generalizes well when validated on the BDD100K [17] data. Although a more diligent pre-processing needs to be done to improve the validation result on KITTI [6], we can demonstrate the applicability and effectiveness of the proposed ODD-A benchmark and the multi-task learning network structure shown in Fig. 7.

C. Traffic Flow Prediction

We used the trained model to suggest actions of stop or drive given the driving scene, as a sample ADAS function. By restricting the moving direction considered by the Moving Traffic task (it must be the same as the ego-vehicle's), and combining the historical driver's input and the vehicle speed, we can utilize the result of the Moving Traffic task as a driving aid function.
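A hedged sketch of how the Moving Traffic output could be turned into a stop/drive suggestion is shown below; the probability threshold, field names and the specific combination rule are illustrative assumptions, not the paper's exact logic.

def driving_suggestion(moving_traffic_prob, ego_speed_mps, brake_applied, p_go=0.6):
    """Combine the Moving Traffic prediction (restricted to the ego direction)
    with the historical driver input and the vehicle speed."""
    traffic_flows = moving_traffic_prob >= p_go
    driver_stopping = brake_applied and ego_speed_mps < 1.0
    return "safe to drive" if traffic_flows and not driver_stopping else "suggest to stop"

print(driving_suggestion(0.85, ego_speed_mps=8.0, brake_applied=False))   # safe to drive
print(driving_suggestion(0.20, ego_speed_mps=0.0, brake_applied=True))    # suggest to stop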
Some of the driving scenes, along with the predicted actions, are presented in Fig. 8. We present scenes of snowy roads in both daytime and nighttime. The image in the top row of Fig. 8 shows a snowy road with unclear road geometry; vehicles ahead of the ego vehicle appear far away. The image was correctly classified, and the model suggested that it is safe to drive. The top middle image in the same figure shows a snowy road with a vehicle in the left lane but very close to the ego vehicle.


TABLE II
The Performance of the Multi-Task Network. The Higher the Accuracy (ACC) and the Lower the Mean Average Error (MAE), the Better

                                      ODD-A                          KITTI [6]                      BDD100K [17]
Task                   Metric   N = 1    N = 5    N = 10      N = 1    N = 5    N = 10      N = 1    N = 5    N = 10
Heading angle          MAE      3.52     1.16     1.98        5.86     2.83     4.12        4.88     2.64     2.13
Intersection distance  MAE      4.77     2.28     3.02        8.12     6.33     6.98        6.73     6.02     6.97
Number of lanes        ACC      87.41%   91.27%   84.32%      90.37%   93.44%   90.08%      83.86%   89.37%   88.26%
Intersection           ACC      93.52%   92.12%   88.48%      78.23%   83.12%   81.94%      80.09%   83.12%   81.88%
Weather                ACC      91.74%   92.73%   92.53%      NA       NA       NA          90.87%   91.33%   92.84%
Moving traffic         ACC      81.26%   94.29%   88.65%      77.86%   82.61%   83.02%      80.54%   81.88%   81.79%
Road type              ACC      91.53%   92.41%   91.88%      76.96%   81.49%   81.09%      91.35%   92.16%   91.37%

(N = 1: non-temporal task block; N = 5 and N = 10: temporal task blocks.)

The model can recognize that such a scene setting suggests a safe-to-drive action. The top right image and the bottom centre image in Fig. 8 indicate that the model does not just associate the green traffic light with a clear-to-move action but also considers the actions and positions of other participants. It is also able to read the intention of the ego vehicle given its orientation on the road. The top right image clearly shows that the traffic lights are green and that another vehicle (white) shown in the scene continued to move straight. However, the model predicted that the ego vehicle intended to turn right and, given that there are pedestrians, the prediction is a suggestion to stop, based on the learned dynamic affordances. Similarly, the bottom centre image in Fig. 8 shows a scene with clear green lights. However, the model learned the difference in lighting between a moving and a stopping vehicle (in a road setting). Consequently, it was able to suggest that the correct action at that instance was to stop even though the lights were green. The bottom right image in Fig. 8 indicates a case where the traffic light is red but is obstructed by a large cargo vehicle moving in the adjacent traffic. This scenario represents an obstruction object that the model can recognize. Finally, the bottom left image in the same figure points out the difficulty of driving at night while raining: the traffic lights and the lights of other participating vehicles might appear exaggerated and misleading. However, the model can learn the most important traffic flow cues given the intent of the ego vehicle.

Fig. 8. Our model prediction on traffic flow driving suggestion under visibility conditions. (Predicted actions, top row left to right: safe to drive, safe to drive, suggest to stop; bottom row: safe to drive, suggest to stop, suggest to stop.)

V.  Conclusion

In this work, we introduced a scalable driving data collection and automatic annotation framework driven by EKF proximity mapping and confidence score filtering. The data collected from distributed devices are synchronized and annotated with filtered labels for the direct perception scheme. The proposed benchmark, ODD-A, includes vehicle dynamics and road attributes under various scenarios, including daytime and nighttime under various weather conditions (no rain, rain, snow). We train and evaluate the affordance observer as a multi-task learning network. One sample ADAS function for traffic flow prediction in harsh weather is evaluated and demonstrates its effectiveness on popular benchmarks. We conclude that the proposed data collection and annotation framework can be further employed at a larger scale in order to enhance the generalization ability of the trained model. Exploring more advanced neural network structures and refining the static OSM labels with dynamic observations is currently under investigation.
under investigation.

References

[1] F.-Y. Wang, N.-N. Zheng, D. Cao, C. M. Martinez, L. Li, and T. Liu, "Parallel driving in CPSS: a unified approach for transport automation and vehicle intelligence," IEEE/CAA J. Autom. Sinica, vol. 4, no. 4, pp. 577-587, 2017.
[2] C. Lv, D. Cao, Y. Zhao, D. J. Auger, M. Sullman, H. Wang, L. M. Dutka, L. Skrypchuk, and A. Mouzakitis, "Analysis of autopilot disengagements occurring during autonomous vehicle testing," IEEE/CAA J. Autom. Sinica, vol. 5, no. 1, pp. 58-68, 2017.
[3] Y. Xing, C. Lv, L. Chen, H. Wang, H. Wang, D. Cao, E. Velenis, and F.-Y. Wang, "Advances in vision-based lane detection: algorithms, integration, assessment, and perspectives on ACP-based parallel vision," IEEE/CAA J. Autom. Sinica, vol. 5, no. 3, pp. 645-661, 2018.
[4] B. Jähne and H. Haußecker, Computer Vision and Applications: A Guide for Students and Practitioners. Elsevier, 2000.
[5] S. J. Russell and P. Norvig, Artificial Intelligence: A Modern Approach. Malaysia: Pearson Education Limited, 2016.
[6] A. Geiger, P. Lenz, C. Stiller, and R. Urtasun, "Vision meets robotics: the KITTI dataset," The Int. J. Robotics Research, vol. 32, no. 11, pp. 1231-1237, 2013.


[7] Z. Chen, J. Zhang, and D. Tao, "Progressive LiDAR adaptation for road detection," IEEE/CAA J. Autom. Sinica, vol. 6, no. 3, pp. 693-702, 2019.
[8] C. Chen, A. Seff, A. Kornhauser, and J. Xiao, "DeepDriving: learning affordance for direct perception in autonomous driving," in Proc. IEEE Int. Conf. Computer Vision, 2015, pp. 2722-2730.
[9] H. Guo, D. Cao, H. Chen, C. Lv, H. Wang, and S. Yang, "Vehicle dynamic state estimation: state of the art schemes and perspectives," IEEE/CAA J. Autom. Sinica, vol. 5, no. 2, pp. 418-431, 2018.
[10] V. Rausch, A. Hansen, E. Solowjow, C. Liu, E. Kreuzer, and J. K. Hedrick, "Learning a deep neural net policy for end-to-end control of autonomous vehicles," in Proc. IEEE American Control Conf., 2017, pp. 4914-4919.
[11] A. Dosovitskiy, G. Ros, F. Codevilla, A. Lopez, and V. Koltun, "CARLA: an open urban driving simulator," in Proc. 1st Annual Conf. Robot Learning, 2017, pp. 1-16.
[12] S. Shah, D. Dey, C. Lovett, and A. Kapoor, "AirSim: high-fidelity visual and physical simulation for autonomous vehicles," in Field and Service Robotics. Springer, 2018, pp. 621-635.
[13] S. Kato, S. Tokunaga, Y. Maruyama, S. Maeda, M. Hirabayashi, Y. Kitsukawa, A. Monrroy, T. Ando, Y. Fujii, and T. Azumi, "Autoware on board: enabling autonomous vehicles with embedded systems," in Proc. ACM/IEEE 9th Int. Conf. Cyber-Physical Systems, 2018, pp. 287-296.
[14] H. Xu, Y. Gao, F. Yu, and T. Darrell, "End-to-end learning of driving models from large-scale video datasets," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2017, pp. 3530-3538.
[15] A. Seff and J. Xiao, "Learning from maps: visual common sense for autonomous driving," arXiv Preprint arXiv: 1611.08583, 2016.
[16] M. Haklay and P. Weber, "OpenStreetMap: user-generated street maps," IEEE Pervas. Comput., vol. 7, no. 4, pp. 12-18, 2008.
[17] F. Yu, W. Xian, Y. Chen, F. Liu, M. Liao, V. Madhavan, and T. Darrell, "BDD100K: a diverse driving video database with scalable annotation tooling," arXiv Preprint arXiv: 1805.04687, 2018.
[18] M. Cordts, M. Omran, S. Ramos, T. Rehfeld, M. Enzweiler, R. Benenson, U. Franke, S. Roth, and B. Schiele, "The Cityscapes dataset for semantic urban scene understanding," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2016, pp. 3213-3223.
[19] S. Lee, J. Kim, J. Shin Yoon, S. Shin, O. Bailo, N. Kim, T.-H. Lee, H. Seok Hong, S.-H. Han, and I. So Kweon, "VPGNet: vanishing point guided network for lane and road marking detection and recognition," in Proc. IEEE Int. Conf. Computer Vision, 2017, pp. 1947-1955.
[20] G. Li, Y. Yang, and X. Qu, "Deep learning approaches on pedestrian detection in hazy weather," IEEE Trans. Industrial Electronics, 2019. [Online]. Available: https://ieeexplore.ieee.org/document/8880634/
[21] X. Huang, X. Cheng, Q. Geng, B. Cao, D. Zhou, P. Wang, Y. Lin, and R. Yang, "The ApolloScape dataset for autonomous driving," in Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops, 2018, pp. 954-960.
[22] A. Sauer, N. Savinov, and A. Geiger, "Conditional affordance learning for driving in urban environments," arXiv Preprint arXiv: 1806.06498, 2018.
[23] C. Sun, J. M. U. Vianney, and D. Cao, "Affordance learning in direct perception for autonomous driving," arXiv Preprint arXiv: 1903.08746, 2019.
[24] S. Antol, A. Agrawal, J. Lu, M. Mitchell, D. Batra, C. Lawrence Zitnick, and D. Parikh, "VQA: visual question answering," in Proc. IEEE Int. Conf. Computer Vision, 2015, pp. 2425-2433.
[25] S. Ruder, "An overview of multi-task learning in deep neural networks," arXiv Preprint arXiv: 1706.05098, 2017.
[26] S. Rezaei and R. Sengupta, "Kalman filter-based integration of DGPS and vehicle sensors for localization," IEEE Trans. Control Systems Technology, vol. 15, no. 6, pp. 1080-1088, 2007.
[27] S. E. Li, G. Li, J. Yu, C. Liu, B. Cheng, J. Wang, and K. Li, "Kalman filter-based tracking of moving objects using linear ultrasonic sensor array for road vehicles," Mechanical Systems and Signal Processing, vol. 98, pp. 173-189, 2018.
[28] S. Gao, Y. Hou, H. Dong, S. Stichel, and B. Ning, "High-speed trains automatic operation with protection constraints: a resilient nonlinear gain-based feedback control approach," IEEE/CAA J. Autom. Sinica, vol. 6, no. 4, pp. 992-999, 2019.
[29] J. Kong, M. Pfeiffer, G. Schildbach, and F. Borrelli, "Kinematic and dynamic vehicle models for autonomous driving control design," in Proc. IEEE Intelligent Vehicles Symp., 2015, pp. 1094-1099.
[30] K. T. Leung, J. F. Whidborne, D. Purdy, and P. Barber, "Road vehicle state estimation using low-cost GPS/INS," Mechanical Systems and Signal Processing, vol. 25, no. 6, pp. 1988-2004, 2011.
[31] H. Schafer, E. Santana, A. Haden, and R. Biasini, "A commute in data: the comma2k19 dataset," arXiv Preprint arXiv: 1812.05752, 2018.
[32] S. Miura, L.-T. Hsu, F. Chen, and S. Kamijo, "GPS error correction with pseudorange evaluation using three-dimensional maps," IEEE Trans. Intelligent Transportation Systems, vol. 16, no. 6, pp. 3104-3115, 2015.
[33] H. Sairo, D. Akopian, and J. Takala, "Weighted dilution of precision as quality measure in satellite positioning," IEE Proceedings-Radar, Sonar and Navigation, vol. 150, no. 6, pp. 430-436, 2003.
[34] K. Czarnecki and R. Salay, "Towards a framework to manage perceptual uncertainty for safe automated driving," in Proc. Int. Conf. Computer Safety, Reliability, and Security. Springer, 2018, pp. 439-445.
[35] E. Herrera-Viedma, F. Herrera, and F. Chiclana, "A consensus model for multiperson decision making with different preference structures," IEEE Trans. Systems, Man, and Cybernetics-Part A: Systems and Humans, vol. 32, no. 3, pp. 394-402, 2002.
[36] P. M. Kebria, A. Khosravi, S. M. Salaken, and S. Nahavandi, "Deep imitation learning for autonomous vehicles based on convolutional neural networks," IEEE/CAA J. Autom. Sinica, vol. 7, no. 1, pp. 82-95, 2019.
[37] Z. Wu, C. Shen, and A. Van Den Hengel, "Wider or deeper: revisiting the ResNet model for visual recognition," Pattern Recognition, vol. 90, pp. 119-133, 2019.
[38] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and F. F. Li, "ImageNet: a large-scale hierarchical image database," in Proc. IEEE Conf. Computer Vision and Pattern Recognition, 2009, pp. 248-255.

Chen Sun (SM'17) received the M.A.Sc. degree in electrical & computer engineering from University of Toronto, Canada in 2017 and the B.Eng. degree in automation from the University of Electronic Science and Technology of China in 2014. He is currently working toward the Ph.D. degree in mechanical & mechatronics engineering at University of Waterloo. He is a Member of the Cognitive Autonomous Driving Lab (CogDrive) and supervised by Prof. Dongpu Cao and Prof. Amir Khajepour. His research interests include end-to-end autonomous driving, safety validation for cyber-physical systems, planning, and control for robots.

Jean M. Uwabeza Vianney received the B.Sc. degree from University of Calgary, Canada in 2015. He is currently a second year M.A.Sc. student in mechanical & mechatronics engineering at University of Waterloo. He is a Member of the Cognitive Autonomous Driving Lab (CogDrive) and supervised by Prof. Dongpu Cao. Prior to starting the master's program, Jean was a Product Support Analyst at Applanix for more than 2 years. His research interests include end-to-end autonomous driving, vision-based systems, sensor fusion, machine learning, geospatial data engineering, prediction, and planning for autonomous driving.

Ying Li received the M.Sc. degree in remote sensing from Wuhan University in 2017. She is currently working toward the Ph.D. degree with the Mobile Sensing and Geodata Science Laboratory, Department of Geography and Environmental Management, University of Waterloo, Canada. Her research interests include autonomous driving, mobile laser scanning, intelligent processing of point clouds, geometric and semantic modeling, and augmented reality.


Long Chen (M'12) received the B.Sc. degree in communication engineering and the Ph.D. degree in signal and information processing from Wuhan University. He is currently an Associate Professor with the School of Data and Computer Science, Sun Yat-sen University. His research interests include autonomous driving, robotics, and artificial intelligence, where he has contributed more than 70 publications. He serves as an Associate Editor for IEEE Transactions on Intelligent Transportation Systems.

Li Li (S'05-M'06-SM'10-F'17) is currently an Associate Professor with the Department of Automation, Tsinghua University, where he was involved in artificial intelligence, intelligent control and sensing, intelligent transportation systems, and intelligent vehicles. He has authored over 90 SCI-indexed international journal papers and over 50 international conference papers. Dr. Li was a Member of the Editorial Advisory Board for Transportation Research Part C: Emerging Technologies, and a Member of the Editorial Board of Transport Reviews and ACTA Automatica Sinica. He serves as an Associate Editor for the IEEE Transactions on Intelligent Transportation Systems.

Fei-Yue Wang (S'87-M'89-SM'94-F'03) received the Ph.D. degree in computer and systems engineering from the Rensselaer Polytechnic Institute, USA, in 1990. In 1990, he joined the University of Arizona, USA, where he became a Professor and the Director of the Robotics and Automation Laboratory and the Program in Advanced Research for Complex Systems. In 1999, he founded the Intelligent Control and Systems Engineering Center, Institute of Automation, Chinese Academy of Sciences (CAS), under the support of the Outstanding Overseas Chinese Talents Program from the State Planning Council and the 100 Talent Program from CAS. In 2002, he joined the Laboratory of Complex Systems and Intelligence Science, CAS, as the Director, where he was the Vice President for Research, Education, and Academic Exchanges with the Institute of Automation from 2006 to 2010. In 2011, he was named the State Specially Appointed Expert and Director of the State Key Laboratory for Management and Control of Complex Systems. His current research interests include methods and applications for parallel systems, social computing, parallel intelligence, and knowledge automation. Dr. Wang was elected as a Fellow of INCOSE, IFAC, ASME, and AAAS. In 2007, he was a Recipient of the National Prize in Natural Sciences of China and named an Outstanding Scientist by ACM for his research contributions in intelligent control and social computing. He was a Recipient of the IEEE Intelligent Transportation Systems (ITS) Outstanding Application and Research Awards in 2009, 2011, and 2015 and the IEEE SMC Norbert Wiener Award in 2014. He was the Chair of the IFAC TC on Economic and Social Systems from 2008 to 2011. He has been the General or Program Chair of more than 30 IEEE, INFORMS, ACM, and ASME conferences. He was the Founding Editor-in-Chief of the International Journal of Intelligent Control and Systems from 1995 to 2000 and the IEEE ITS Magazine from 2006 to 2007. He was the Editor-in-Chief of IEEE Intelligent Systems from 2009 to 2012, and of the IEEE Transactions on ITS from 2009 to 2016. He is currently the Editor-in-Chief of the IEEE Transactions on Computational Social Systems and the Founding Editor-in-Chief of the IEEE/CAA Journal of Automatica Sinica and the Chinese Journal of Command and Control. He was the President of the IEEE ITS Society from 2005 to 2007, the Chinese Association for Science and Technology, USA, in 2005, and the American Zhu Kezhen Education Foundation from 2007 to 2008. He was the Vice President of the ACM China Council from 2010 to 2011. He is currently the President of the IEEE Council on Radio Frequency Identification. Since 2008, he has been the Vice President and the Secretary General of the Chinese Association of Automation.

Amir Khajepour received the Ph.D. degree in mechanical engineering from the University of Waterloo, Canada, in 1996. He is a Professor of the Department of Mechanical and Mechatronics Engineering at the University of Waterloo. He holds the Canada Research Chair in Mechatronic Vehicle Systems and an NSERC/General Motors Industrial Research program that applies his expertise in several key multidisciplinary areas including system modeling and control of dynamic systems. His research has resulted in many patents and technology transfers. He is the author of more than 400 journal and conference publications as well as several books. Dr. Khajepour is a Fellow of the Engineering Institute of Canada, the American Society of Mechanical Engineers, and the Canadian Society of Mechanical Engineering. His research interests include the development of hybrid powertrains (electric and air hybrid engines), component sizing and power management design through concurrent optimization, vehicle modeling through real-time simulation and hardware-in-the-loop, active and adaptive suspension systems, vehicle stability, ultra-high-speed robotics, automated laser fabrication, and micro-electro-mechanical systems.

Dongpu Cao (M'13) received the Ph.D. degree from Concordia University, Canada, in 2008. He is a Canada Research Chair in Driver Cognition and Automated Driving, and currently an Associate Professor and Director of the Waterloo Cognitive Autonomous Driving (CogDrive) Lab at University of Waterloo, Canada. His current research interests include driver cognition, automated driving, and cognitive autonomous driving. He has contributed more than 180 publications, 2 books and 1 patent. He received the SAE Arch T. Colwell Merit Award in 2012, and three Best Paper Awards from ASME and IEEE conferences. Dr. Cao serves as an Associate Editor for IEEE Transactions on Vehicular Technology, IEEE Transactions on Intelligent Transportation Systems, IEEE/ASME Transactions on Mechatronics, IEEE Transactions on Industrial Electronics and ASME Journal of Dynamic Systems, Measurement and Control. He was a Guest Editor for Vehicle System Dynamics and IEEE Transactions on SMC: Systems. He serves on the SAE Vehicle Dynamics Standards Committee and acts as the Co-Chair of the IEEE ITSS Technical Committee on Cooperative Driving.

