Abstract— Pedestrian lane detection is an important task in many assistive and autonomous navigation systems. This article presents a new approach for pedestrian lane detection in unstructured environments, where the pedestrian lanes can have arbitrary surfaces with no painted markers. In this approach, a hybrid deep learning-Gaussian process (DL-GP) network is proposed to segment a scene image into lane and background regions. The network combines a compact convolutional encoder–decoder net and a powerful nonparametric hierarchical GP classifier. The resulting network, with a smaller number of trainable parameters, helps mitigate the overfitting problem while maintaining the modeling power. In addition to the segmentation output for each test image, the network also generates a map of uncertainty, a measure that is negatively correlated with the confidence level with which we can trust the segmentation. This measure is important for pedestrian lane-detection applications, since the prediction affects the safety of the users. We also introduce a new data set of 5000 images for training and evaluating pedestrian lane-detection algorithms. This data set is expected to facilitate research in pedestrian lane detection, especially the application of DL in this area. Evaluated on this data set, the proposed network shows significant performance improvements compared with several existing methods.

Index Terms— Assistive and autonomous navigation, benchmark data set, deep learning (DL), Gaussian process (GP) classifier, pedestrian lane detection.

I. INTRODUCTION

AUTOMATIC detection of the pedestrian lane in a scene is an important component in many assistive and autonomous navigation systems. It assists vision-impaired people in finding the walkable path and maintaining their balance while walking, a difficult task that is currently performed mostly using a white cane or a guide dog [1]. It also allows a smart wheelchair, with little guidance from its disabled user, to traverse a pedestrian lane [2]. Pedestrian lane detection is also useful for autonomous vehicles to avoid pedestrians or off-limit regions in a scene [3]. In addition, it complements other features of electronic navigation devices such as obstacle detection [4], [5] and GPS-based guidance [6].

The existing methods proposed for pedestrian lane detection are mostly designed for detecting pedestrian lanes painted with white markers [7]–[10], so lanes without painted markers have received little attention. This article addresses this gap by focusing on the camera-based detection of pedestrian lanes in unstructured environments, where the pedestrian lanes can have arbitrary surfaces with no painted markers. The scenes containing the pedestrian lane are under varying lighting conditions and could be indoor or outdoor.

Existing algorithms for unmarked lane detection mostly rely on hand-engineered features. These methods either use the color- and texture-based features of the lane surfaces to differentiate the lane pixels from the background [11]–[13], or use the edge features to locate the lane boundaries [14]–[16]. In general, these methods are sensitive to scene variations, which cannot be easily captured by such model-based systems.

Recently, lane-detection methods that use deep neural networks (DNNs) for automatic feature-learning have been proposed [17]–[19]. These methods yielded promising performance in terms of accuracy and processing time. However, they were mostly designed for vehicle road lane detection. To the best of our knowledge, there are few publicly available methods based on DNNs for unmarked pedestrian lane detection, which is generally a more challenging problem than road lane detection, because the appearances, surfaces, and shapes of pedestrian lanes often vary more significantly than those of vehicle lanes.

There are two major challenges in adopting DNN methods for pedestrian lane detection. First, training a typical DNN often requires a large volume of data, especially for complex problems. However, the data sets of labeled images for training a pedestrian lane-detection system are generally small compared with the training sets in other computer vision tasks. The largest publicly available data set for pedestrian lane detection was introduced in [16] with only 2000 images. Since typical DNNs usually have a huge number of parameters to model complex problems, they are highly prone to overfitting if the amount of available data is small. Second, for the safety of the users (vision-impaired persons, for example), the pedestrian lane-detection system must generate not only an accurate segmentation of the lane but also a confidence measure with which we can trust its predictive output. Ideally, the system should give a full-resolution confidence map, so that the user can decide which parts of the scene to avoid

Manuscript received April 7, 2019; revised September 26, 2019 and December 21, 2019; accepted January 1, 2020. Date of publication February 13, 2020; date of current version December 1, 2020. This work was supported by a grant from the Australian Research Council (ARC). (Corresponding author: Thi Nhat Anh Nguyen.)

Thi Nhat Anh Nguyen and Son Lam Phung are with the School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong, NSW 2522, Australia (e-mail: ngt.nhatanh@gmail.com; phung@uow.edu.au).

Abdesselam Bouzerdoum is with the School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong, NSW 2522, Australia, and also with the College of Science and Engineering, Hamad Bin Khalifa University, Doha 7675, Qatar (e-mail: a.bouzerdoum@uow.edu.au).

This article has supplementary downloadable material available at https://ieeexplore.ieee.org, provided by the authors.

Color versions of one or more of the figures in this article are available online at https://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TNNLS.2020.2966246
2162-237X © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: University of Hull. Downloaded on August 12,2023 at 17:52:35 UTC from IEEE Xplore. Restrictions apply.
NGUYEN et al.: HYBRID DL-GP NETWORK FOR PEDESTRIAN LANE DETECTION IN UNSTRUCTURED SCENES 5325
Fig. 1. Conceptual comparison of a softmax layer of a deep network versus a GP classifier, using an example in binary classification. A softmax layer gives
only a point prediction for each test input. In comparison, a GP classifier gives a probabilistic prediction with a predictive mean and an uncertainty level
for each test input: the test inputs far from the training data have higher uncertainty. In Fig. 1(c), the uncertainty interval (the gray area) is plotted with the
predictive 95% confidence bounds. (a) Input to a softmax layer as a function of the network input x. (b) Output of the softmax layer as a function of the
network input x. (c) Predictive output of GP classifier as a function of its input x.
(the parts with low confidence levels). A typical DNN does not naturally produce such a confidence measurement for its prediction.

To address the above two challenges, this article presents a new approach for pedestrian lane detection using a hybrid deep learning-Gaussian process (DL-GP) architecture. In this approach, we cast pedestrian lane detection in unstructured environments as a segmentation problem where a scene image is segmented into pedestrian lane and background regions. The contributions of the article can be highlighted as follows.

1) We propose a hybrid architecture for pedestrian lane detection that combines a compact convolutional encoder–decoder (E-D) network and a hierarchical GP (HGP) classifier. Unlike existing lane-detection approaches that use very deep convolutional networks, the proposed architecture combines a compact network having fewer parameters with a powerful nonparametric GP classifier. This strategy helps mitigate the overfitting problem while maintaining the modeling power. The proposed architecture can be trained in an end-to-end manner. An additional benefit of using a GP classifier in our architecture is that, besides the segmentation output for each test image, the classifier also generates a map of well-calibrated uncertainty, a quantity that is negatively correlated with the confidence with which we can trust the segmentation. In a typical DNN classifier, predictive probabilities obtained by a softmax layer are often erroneously interpreted as model confidence. In fact, since softmax uses point estimation without considering model uncertainty, a model can be uncertain in its predictions even with a high softmax output. As illustrated in Fig. 1, passing a point estimate of a function [see Fig. 1(a)] through a softmax layer results in extrapolations [see Fig. 1(b)] with unjustified high confidence for points far from the training data. For example, an input x = 6 is classified as class 1 with a probability of 1. In contrast, a GP classifier [see Fig. 1(c)] gives a full probabilistic prediction with a predictive mean and an uncertainty level for each test input.

2) The main limitation of the GP is its high computational cost. Existing methods to reduce this cost can be categorized into global and local approximation approaches. Global approximations use only sparse information from the training data and normally cannot account for nonstationarity and locality in complex data sets. Local approximations fit a separate GP to each subregion of the input space and are prone to overfitting. In [20], we proposed a GP approximation method that uses a hierarchical structure to combine global and local information from the data set. This method has been shown to improve the predictive performance over methods that use only global or only local information. However, that article mainly focuses on multiclass classification. A model specifically designed for binary classification will lead to a further reduction in the computational cost. This article discusses the HGP model for binary classification in detail. We then formulate a single loss function and present an algorithm for end-to-end training of the hybrid DL-GP architecture for pedestrian lane detection.

3) We create a new data set with manually annotated ground truth for the objective evaluation of algorithms for pedestrian lane detection. This data set consists of 5000 images collected from realistic indoor and outdoor scenes, with various shapes, textures, and surface colors. It is the largest data set for pedestrian lane detection in the literature, and it extends the data set previously introduced in [16]. It is expected to facilitate research in pedestrian lane detection, especially the application of DL in this area. The data set is available at slp-lab.com/phung/plvp2.html.

The rest of the article is organized as follows. Section II reviews the related work. Section III presents the proposed hybrid DL-GP architecture for pedestrian lane detection. Section IV presents the experiments and analysis. Finally, Section V concludes the article.

II. RELATED WORK

In this section, we review the related work on unmarked lane detection, including traditional and DL-based methods.

A. Traditional Methods for Unmarked Lane Detection

Representative traditional methods based on hand-engineered features for detecting pedestrian lanes in unstructured scenes are listed in Table I. They can be divided into two categories: 1) lane segmentation and 2) lane-border detection. In the lane-segmentation approach, color models, which are built through offline training, are used to differentiate the lane pixels from the background [11], [12], [21], [22]. These methods use different color spaces and classifiers. Crisman and Thorpe [11] represent the on-road and off-road classes with Gaussian color models in the red–green–blue (RGB) color space. Tan et al. [12] use color histograms in the RGB space. Ramstrom and Christensen [22] construct Gaussian mixture models from the UV, normalized red and green, and luminance components. Sotelo et al. [21] employ the hue–saturation–intensity (HSI) color space and classify pixels by thresholding their chromatic distance to the
5326 IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 31, NO. 12, DECEMBER 2020
TABLE I
REPRESENTATIVE TRADITIONAL METHODS FOR UNMARKED LANE DETECTION. THE MARK “-” MEANS THAT THE TECHNIQUE IS NOT USED
color models. In general, the above methods using offline-trained models do not cope well with variations in lane appearance, lane surfaces, and illumination conditions.

To address this problem, several methods build the lane model directly from sample regions in the input image [23]–[26]. There are different ways to obtain sample lane regions. In [26] and [27], small random areas are selected at the bottom and in the middle of the input image. In [25], the sample lane region is initialized as a trapezoid at the bottom center of the image and then refined using the vanishing point. In [23], the sample lane region is formed from candidate lane boundaries that are detected using the vanishing point and an assumption about the lane width. The performance of the above methods depends heavily on the quality of the sample lane regions, which in turn relies on prior knowledge about the lane.

In the lane-border-detection approach, the lane borders are detected using the vanishing point [14], [15] or templates of lane boundaries [28]. In [14], the two lane borders are found among the edges pointing to the vanishing point using an objective function that measures the color and texture differences between the lane and background regions. This method requires the color and texture of the lane region to be homogeneous and to differ significantly from those of the background regions. In [15], the lane borders are also found from the edges directed toward the vanishing point; the edges are ranked using texture orientation and color features. In another method, the lane boundaries are found from the edges of homogeneous color regions by matching them with lane templates [28]. The above methods for lane-border detection are sensitive to background edges. To overcome this problem, Chang et al. [29] propose combining lane-border detection and lane segmentation. In this method, lane borders are detected using the vanishing point, and the lane region is segmented using a color model learned from a homogeneous region at the bottom middle of the input image. In [16], the sample lane region is first identified using the vanishing point method. The lane region is then segmented using two matching scores: one between the edges of homogeneous color regions and lane templates, and one between the colors of these regions and the color model learned from the sample lane region.

B. DL-Based Methods for Unmarked Lane Detection

Convolutional neural networks (CNNs) were originally proposed for entire-image classification, but they are making progress in structured prediction problems such as object detection and semantic segmentation. Inspired by this progress of CNNs in semantic segmentation, a few road lane-detection methods using CNNs for automatic feature-learning have been proposed recently [18], [19], [30]. Mendes et al. [19] design and train a CNN for image patch classification and then convert it into a network for road lane segmentation. During training, each image is divided into 4 × 4 regions, and a patch centered at each region is extracted. The CNN is trained to classify the patches into road or nonroad classes, which are attributed to the corresponding 4 × 4 regions. For inference, the CNN is converted into a fully convolutional network by turning the fully connected (FC) layers into convolutional layers. In this way, the network can receive an entire image (instead of a patch) as input and output a class for every image region. However, the classification net uses subsampling to reduce the number of features and the computational complexity. This also coarsens the segmentation output of the final network and reduces its size compared with the input image.

To address this problem, newer DL methods for semantic segmentation learn to decode low-resolution image representations into pixelwise predictions [30]–[32]. These methods typically employ the E-D architecture, which comprises two parts: encoder and decoder networks. The encoder network acts as a feature extractor that transforms an input image into a low-resolution feature representation. It typically resembles the VGG16 classification network [33], which has 13 convolutional layers and three FC layers. The decoder network is responsible for decoding, or mapping, the low-resolution feature representation to a probability map of the same size as the input image.

A decoder network typically comprises several decoders, each of which increases the size of its input feature map by a factor of 2. In the fully convolutional network [31], each decoder learns to upsample its input feature map through deconvolution with a 2 × 2 stride. The upsampled feature map is combined with the corresponding encoder feature map to produce the input to the next decoder. In the deconvolutional network [31], each decoder performs two main operations: unpooling and deconvolution. Unpooling performs nonlinear upsampling, the reverse operation of pooling in the encoder network: it uses the locations of the maximum activations selected during pooling to place each activation back at its original location. Deconvolution with a 1 × 1 stride is then used to densify the sparse feature map obtained by unpooling. In SegNet [30], a convolution layer is used in place of deconvolution in a similar decoder network. In Bayesian SegNet, Kendall et al. [34] extend SegNet to a Bayesian network, which can produce a probabilistic segmentation output. This is done by adding dropout layers to the network and keeping them active in both the training and testing phases. Dropout is used at test time to sample the network with randomly dropped-out units and, thereby, obtain samples of the posterior distribution of softmax class
TABLE II
REPRESENTATIVE DL-BASED METHODS FOR ITS
Fig. 3. Example E-D network with four encoder/decoder units. The convolutional layers are denoted as “Conv kernel size-number of channels.” Max-pooling layers are denoted as “Pool kernel size.” Upsampling layers are denoted as “Up kernel size.” Each Conv layer is immediately followed by a batch normalization layer and a ReLU layer.
image, the E-D network extracts 64 full-resolution feature maps, which are then rearranged into an array of feature vectors; each vector consists of 64 features corresponding to a pixel. HGP takes a feature vector as input and classifies the corresponding pixel as lane or background.

Let $\mathbf{x}$ denote an extracted feature vector (a row vector of $D$ features; $D = 64$ in this application), and let $y$ denote the corresponding class label: $y = 1$ for a lane pixel and $y = 0$ for background. Let $N$ denote the total number of pixels from all the training images, and let $\mathbf{X} = [(\mathbf{x}_1)^T, \ldots, (\mathbf{x}_N)^T]^T$ and $\mathbf{y} = [y_1, \ldots, y_N]^T$ denote the collection of all the feature vectors and the corresponding labels of all $N$ training pixels. $\mathbf{X}$ is an $N \times D$ matrix, and $\mathbf{y}$ is an $N \times 1$ column vector. We adopt the convention that lowercase italic letters denote scalar variables or functions, uppercase italic letters denote scalar constants, lowercase bold letters denote vectors, and uppercase bold letters denote matrices. The operators $\mathrm{row}(\mathbf{A})$ and $\mathrm{col}(\mathbf{A})$ return the number of rows and columns of a matrix $\mathbf{A}$, respectively.

1) Background on GP Classifiers: GP models are powerful tools for Bayesian classification [42]–[46]. They have two highly desirable properties. First, since a GP is a nonparametric model, it has only a few hyperparameters that need to be learned, and thus, overfitting is less likely. Second, GPs give probabilistic predictions with well-calibrated uncertainties.

Let $\mathcal{X}$ denote the feature space. In GP classification, we assume that there is an underlying latent function $f(\mathbf{x}): \mathcal{X} \rightarrow \mathbb{R}$ that is distributed according to a GP: $f(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), \kappa(\mathbf{x}, \mathbf{x}'))$. Here, $m(\mathbf{x})$ and $\kappa(\mathbf{x}, \mathbf{x}')$ denote the mean function and the covariance function that characterize the GP. For each input $\mathbf{x}$, the value of $f(\mathbf{x})$ is a latent variable. For any given set of input points, the GP places a multivariate normal prior distribution on the corresponding latent variables. The observed output $y$, given the latent variable $f(\mathbf{x})$, is then distributed according to a non-Gaussian likelihood: $p(y \mid f(\mathbf{x})) = h(f(\mathbf{x}))$. The objects of interest are the posterior of the latent variables placed at the training inputs and the predictive distribution $p(y^* \mid \mathbf{y})$ for the class label $y^*$ at a test point $\mathbf{x}^*$; they can be estimated using approximate inference methods such as variational inference.

2) Rationale Behind HGP: A limitation of GP is its high computational cost, mainly due to the inversion of the kernel matrix, which takes $O(N^3)$ training time, where $N$ is the number of training samples. Many approximation methods for GP have been proposed to overcome this limitation. They can be classified into global and local approaches. Global GP approximation methods (also known as sparse GP methods) try to summarize all the training data using a set of $M \ll N$ inducing points to reduce the computational cost [47]–[50]. Sparse GPs work well for simple data sets, but they cannot deal with locality and nonstationarity in complex data sets.

Local approaches attempt to overcome the limitation of global methods by employing a mixture of GP experts (MoGPE). In MoGPE, a gating network divides the input space into regions within which a specific GP expert is responsible for making predictions [51]–[54]. In this way, the locality and nonstationarity in the data are addressed. The computational cost is also reduced, as the inversion of one large kernel matrix is replaced by that of several smaller matrices. However, since each GP expert is trained independently using only the local data assigned to it, without considering the global information (the correlations between the clusters), the experts are likely to overfit the local training data.

An HGP classifier is proposed to combine the advantages of sparse approximation and MoGPE by exploiting both global and local information from the training data through a two-layer hierarchical structure. In the upper layer, a sparse GP, hereafter called the global GP, is used to coarsely model the entire data set. In the lower layer, a gating network divides the input space into $T$ regions, and within each region, a specific local GP expert is used for finer modeling. All the local GPs have a common mean function $m(\mathbf{x})$, which encodes information from the global GP. In this way, information is shared between the two layers as well as among the local experts to avoid overfitting.

Herein, we use subscript 0 for the global unit and subscript $k$ for the $k$th local expert ($k = 1, \ldots, T$).

3) Details of the GPs: The global GP is associated with a latent function $f_0(\mathbf{x})$, a zero mean function, and a covariance function $\kappa_0(\mathbf{x}, \mathbf{x}')$: $f_0(\mathbf{x}) \sim \mathcal{GP}(0, \kappa_0(\mathbf{x}, \mathbf{x}'))$. The $k$th local GP is associated with a latent function $f_k(\mathbf{x})$, the mean function $m(\mathbf{x})$, and a covariance function $\kappa_k(\mathbf{x}, \mathbf{x}')$: $f_k(\mathbf{x}) \sim \mathcal{GP}(m(\mathbf{x}), \kappa_k(\mathbf{x}, \mathbf{x}'))$, for $k = 1, \ldots, T$. Each GP in the model is a sparse GP in which the training latent variables (i.e., the latent variables placed at the training inputs) are summarized by a set of $M$ inducing points. Each inducing point comprises an inducing input, which is a point from the input space $\mathcal{X}$, and its corresponding latent variable.

We introduce the following notation for the $k$th GP ($k = 0, \ldots, T$): $f_{kn}$ denotes $f_k(\mathbf{x}_n)$; $\mathbf{f}_k$ is the vector (size $N \times 1$) of all training latent variables; $\mathbf{g}_k$ is the vector (size $M \times 1$) of the inducing variables; $\mathbf{U}_k$ is the matrix (size $M \times D$) formed by the $M$ inducing inputs; $\theta_k$ is the set of hyperparameters of the covariance function $\kappa_k(\mathbf{x}, \mathbf{x}')$; and $\mathbf{K}^{(k)}_{\mathbf{A}\mathbf{B}}$ is the covariance matrix [size $\mathrm{row}(\mathbf{A}) \times \mathrm{row}(\mathbf{B})$] formed by evaluating $\kappa_k(\mathbf{x}, \mathbf{x}')$ at all pairs of points $(\mathbf{x}, \mathbf{x}')$, where $\mathbf{x}$ is in $\mathbf{A}$ and $\mathbf{x}'$ is in $\mathbf{B}$.

a) Global GP prior: The global GP places a joint prior distribution on its latent variables $\mathbf{g}_0$ and $\mathbf{f}_0$, which results in the following distributions:

$$p(\mathbf{g}_0) = \mathcal{N}\left(\mathbf{0}, \mathbf{K}^{(0)}_{\mathbf{U}_0\mathbf{U}_0}\right) \tag{1}$$

$$p(\mathbf{f}_0 \mid \mathbf{g}_0) = \mathcal{N}\left(\mathbf{K}^{(0)}_{\mathbf{X}\mathbf{U}_0}\left[\mathbf{K}^{(0)}_{\mathbf{U}_0\mathbf{U}_0}\right]^{-1}\mathbf{g}_0,\ \mathbf{K}^{(0)}_{\mathbf{X}\mathbf{X}} - \mathbf{K}^{(0)}_{\mathbf{X}\mathbf{U}_0}\left[\mathbf{K}^{(0)}_{\mathbf{U}_0\mathbf{U}_0}\right]^{-1}\mathbf{K}^{(0)}_{\mathbf{U}_0\mathbf{X}}\right). \tag{2}$$

b) Local GP prior: To encode information from the global unit, we set the prior mean of the local GPs to be the mean of $p(f_0(\mathbf{x}) \mid \mathbf{g}_0)$, which, according to (2), is $m(\mathbf{x}) = \mathbf{K}^{(0)}_{\mathbf{x}\mathbf{U}_0}[\mathbf{K}^{(0)}_{\mathbf{U}_0\mathbf{U}_0}]^{-1}\mathbf{g}_0$. This imposes a complex dependence between $\mathbf{g}_k$ and $\mathbf{g}_0$, which makes model inference difficult. To remove this dependence, we introduce new latent variables $\mathbf{h}_k = \mathbf{g}_k - m(\mathbf{U}_k)$. This results in the following distributions:

$$p(\mathbf{h}_k) = \mathcal{N}\left(\mathbf{0}, \mathbf{K}^{(k)}_{\mathbf{U}_k\mathbf{U}_k}\right) \tag{3}$$

$$p(\mathbf{f}_k \mid \mathbf{h}_k, \mathbf{g}_0) = \mathcal{N}\left(\mathbf{K}^{(k)}_{\mathbf{X}\mathbf{U}_k}\left[\mathbf{K}^{(k)}_{\mathbf{U}_k\mathbf{U}_k}\right]^{-1}\mathbf{h}_k + \mathbf{K}^{(0)}_{\mathbf{X}\mathbf{U}_0}\left[\mathbf{K}^{(0)}_{\mathbf{U}_0\mathbf{U}_0}\right]^{-1}\mathbf{g}_0,\ \mathbf{K}^{(k)}_{\mathbf{X}\mathbf{X}} - \mathbf{K}^{(k)}_{\mathbf{X}\mathbf{U}_k}\left[\mathbf{K}^{(k)}_{\mathbf{U}_k\mathbf{U}_k}\right]^{-1}\mathbf{K}^{(k)}_{\mathbf{U}_k\mathbf{X}}\right). \tag{4}$$

4) Likelihood Function: In the upper layer, the observed pixel labels are related to the latent variables according to the following likelihood:

$$p(y_n \mid f_{0n}) = \mathcal{B}\left(y_n \mid \phi(f_{0n})\right) = \phi(f_{0n})^{y_n}\left(1 - \phi(f_{0n})\right)^{1 - y_n}.$$

Here, $\phi(z)$ is the standard normal cumulative distribution function (i.e., a probit likelihood), which maps the latent variables into the unit interval [0, 1], and $\mathcal{B}$ denotes a Bernoulli distribution. The above equation implies that the class membership
Fig. 6. HGP classification model for pixelwise lane segmentation. Arrows indicate the dataflow at test time.
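The local prior in (4) and the Bernoulli likelihood defined above can be made concrete with a minimal NumPy sketch. It assumes a squared-exponential kernel for every GP and adds a small diagonal jitter to the inducing-point kernel matrices before solving; neither choice is specified in this section, and the helper names (`rbf`, `local_conditional`, `bernoulli_loglik`) are hypothetical. Setting $\mathbf{h}_k = \mathbf{0}$ shows how each local expert falls back to the global mean $m(\mathbf{X})$.

```python
import math
import numpy as np

def rbf(A, B, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel (an assumption; the text leaves kappa_k unspecified)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def local_conditional(X, U0, g0, Uk, hk, kern0=rbf, kernk=rbf, jitter=1e-6):
    """Mean and covariance of p(f_k | h_k, g_0) as written in (4)."""
    K00 = kern0(U0, U0) + jitter * np.eye(len(U0))
    Kkk = kernk(Uk, Uk) + jitter * np.eye(len(Uk))
    KXUk = kernk(X, Uk)
    # Global mean m(X) = K^(0)_{X U0} [K^(0)_{U0 U0}]^{-1} g0, shared by all local experts.
    m_X = kern0(X, U0) @ np.linalg.solve(K00, g0)
    # A = K^(k)_{X Uk} [K^(k)_{Uk Uk}]^{-1}; Kkk is symmetric, so solve and transpose.
    A = np.linalg.solve(Kkk, KXUk.T).T
    mean = A @ hk + m_X
    cov = kernk(X, X) - A @ KXUk.T
    return mean, cov

def phi(z):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(z, dtype=float) / math.sqrt(2.0)))

def bernoulli_loglik(y, f0, eps=1e-12):
    """Sum over pixels of log B(y_n | phi(f_0n))."""
    p = np.clip(phi(f0), eps, 1.0 - eps)
    return float(np.sum(y * np.log(p) + (1.0 - y) * np.log1p(-p)))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))  # 5 toy "pixels" with D = 3 features (the article uses D = 64)
U0, Uk = rng.normal(size=(4, 3)), rng.normal(size=(4, 3))
g0 = rng.normal(size=4)
mean, cov = local_conditional(X, U0, g0, Uk, hk=np.zeros(4))
```

With `hk` set to zero, `mean` equals the global mean evaluated at the toy pixels, which is the sharing mechanism the hierarchical prior is designed to provide.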
Algorithm 1 End-to-End Training Algorithm for the Hybrid DL-HGP Network
1: Initialize all the parameters (denoted as $\delta$) of the E-D network.
2: Initialize all the parameters of HGP, i.e., $\gamma$ and $q(z)$.
3: repeat
4: Sample a mini-batch of $B_I$ images randomly from the training set.
5: Pass the mini-batch through the E-D network to generate a set of 64 full-resolution feature maps for each image in the mini-batch.
6: Rearrange the feature maps to form a set of feature vectors denoted as $S^{(t)}$. Each vector in $S^{(t)}$ has 64 features corresponding to a pixel in the mini-batch.
7: Update $q(z_n)$ according to Eq. (10) for each feature vector $\mathbf{x}_n$ in $S^{(t)}$.
8: Calculate the partial derivatives of $\mathcal{L}(S^{(t)})$ (Eq. (11)) with respect to all the GP variables in $\gamma$ and all the parameters $\delta$ of the E-D network using backpropagation.
9: Update the current estimates of $\gamma$ and $\delta$ using the calculated partial derivatives.
10: until convergence.

We note that the computational complexity is independent of the number of experts $T$. Therefore, we can freely select the value of $T$ to fit the data. To this end, an autoselection mechanism for $T$ is implemented as follows. We first initialize $T = T_0$, where $T_0$ is larger than the expected number of experts and is typically smaller than 10. After each optimization iteration, if an expert $k$ does not have any training pixel assigned to it (i.e., no training pixel $n$ has $q(z_n = k) = 1$), this expert is removed.

8) Predicting With HGP: Consider a test pixel with the feature vector $\mathbf{x}^*$. Let $f_k^*$ denote $f_k(\mathbf{x}^*)$. The predictive distribution for $f_k^*$ can be approximated as

$$p(f_k^* \mid \mathbf{y}) \approx \int p(f_k^* \mid \mathbf{h}_k, \mathbf{g}_0)\, q(\mathbf{h}_k)\, q(\mathbf{g}_0)\, d\mathbf{h}_k\, d\mathbf{g}_0 = \mathcal{N}\left(\mu_k^*, \sigma_k^{*2}\right)$$

where

$$\mu_k^* = \mathbf{a}_k^{*T}\mathbf{m}_k + \mathbf{a}_0^{*T}\mathbf{m}_0, \qquad \sigma_k^{*2} = \kappa_k(\mathbf{x}^*, \mathbf{x}^*) + \mathbf{a}_k^{*T}\left(\mathbf{S}_k - \mathbf{K}^{(k)}_{\mathbf{U}_k\mathbf{U}_k}\right)\mathbf{a}_k^* + \mathbf{a}_0^{*T}\mathbf{S}_0\mathbf{a}_0^* \tag{12}$$

with $\mathbf{a}_k^* = [\mathbf{K}^{(k)}_{\mathbf{U}_k\mathbf{U}_k}]^{-1}\mathbf{K}^{(k)}_{\mathbf{U}_k\mathbf{x}^*}$ for $k = 0, \ldots, T$. It can be observed that the last terms of $\mu_k^*$ and $\sigma_k^{*2}$ in (12) are the posterior mean and variance of $m(\mathbf{x}^*)$, which represent the information passed from the upper layer to the lower layer:

$$q(m(\mathbf{x}^*)) = \mathcal{N}\left(\bar{m}(\mathbf{x}^*), \bar{v}(\mathbf{x}^*)\right) = \mathcal{N}\left(\mathbf{a}_0^{*T}\mathbf{m}_0,\ \mathbf{a}_0^{*T}\mathbf{S}_0\mathbf{a}_0^*\right).$$
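The moments in (12) can be computed per test pixel as in the following minimal NumPy sketch. It assumes (i) a squared-exponential kernel, (ii) that $\mathbf{m}_k, \mathbf{S}_k$ and $\mathbf{m}_0, \mathbf{S}_0$ are the means and covariances of the variational distributions $q(\mathbf{h}_k)$ and $q(\mathbf{g}_0)$, as the structure of (12) suggests, and (iii) a small jitter on the inducing kernel matrices; the function names are hypothetical. A useful sanity check: if $\mathbf{S}_k$ equals the prior covariance $\mathbf{K}^{(k)}_{\mathbf{U}_k\mathbf{U}_k}$ and $\mathbf{S}_0 = \mathbf{0}$, the variance reduces to $\kappa_k(\mathbf{x}^*, \mathbf{x}^*)$.

```python
import numpy as np

def rbf(A, B, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel (assumed; any valid covariance function could be used)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def expert_moments(x_star, Uk, mk, Sk, U0, m0, S0, kernk=rbf, kern0=rbf, jitter=1e-6):
    """Return (mu_k*, sigma_k*^2, m_bar, v_bar) for one test feature vector, following (12)."""
    x = np.atleast_2d(x_star)
    Kkk = kernk(Uk, Uk) + jitter * np.eye(len(Uk))
    K00 = kern0(U0, U0) + jitter * np.eye(len(U0))
    a_k = np.linalg.solve(Kkk, kernk(Uk, x)).ravel()  # a_k* = [K_UkUk]^{-1} K_Uk,x*
    a_0 = np.linalg.solve(K00, kern0(U0, x)).ravel()  # a_0* = [K_U0U0]^{-1} K_U0,x*
    m_bar = a_0 @ m0          # posterior mean of m(x*)
    v_bar = a_0 @ S0 @ a_0    # posterior variance of m(x*)
    mu = a_k @ mk + m_bar
    var = kernk(x, x)[0, 0] + a_k @ (Sk - Kkk) @ a_k + v_bar
    return float(mu), float(var), float(m_bar), float(v_bar)

rng = np.random.default_rng(1)
Uk, U0 = rng.normal(size=(6, 3)), rng.normal(size=(6, 3))
x_star = rng.normal(size=3)
m0 = rng.normal(size=6)
# Sanity case: q(h_k) equal to its prior, and q(g_0) with zero covariance.
Kkk_prior = rbf(Uk, Uk) + 1e-6 * np.eye(6)
mu, var, m_bar, v_bar = expert_moments(x_star, Uk, np.zeros(6), Kkk_prior, U0, m0, np.zeros((6, 6)))
```

Each call needs only $M$-dimensional solves and quadratic forms; caching the factorizations of the inducing kernel matrices across pixels would leave the $O(M^2)$ quadratic forms as the per-pixel cost.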
The prediction at $\mathbf{x}^*$ by expert $k$ is then given by

$$p(y^* \mid \mathbf{x}^*, \mathbf{y}, z^* = k) = \int p(y^* \mid f_k^*)\, p(f_k^* \mid \mathbf{y})\, df_k^* \tag{13}$$

which can be computed by the Gauss–Hermite quadrature method, giving two outputs: the predictive mean (or the lane probability) and the predictive variance (or the uncertainty). The final prediction by HGP at $\mathbf{x}^*$ can be computed as the weighted average of the predictions from the $T$ GP experts

$$p(y^* \mid \mathbf{x}^*, \mathbf{y}) = \sum_{k=1}^{T} p(z^* = k \mid \mathbf{x}^*, \mathbf{y})\, p(y^* \mid \mathbf{x}^*, \mathbf{y}, z^* = k) \approx \sum_{k=1}^{T} q(z^* = k)\, p(y^* \mid \mathbf{x}^*, \mathbf{y}, z^* = k) \tag{14}$$

where $q(z^* = k)$ is the output of the gating network, which is calculated according to (10).

The computational cost of predicting the label of a single test pixel by HGP is dominated by the computation of $\sigma_k^{*2}$ in (12), which has a time complexity of $O(M^2)$.

Fig. 6 illustrates the input and output of the HGP and the dataflow among its main components, including the global GP, the local GP experts, and the gating network, for a single test image.

C. Training and Testing for the Hybrid DL-GP Architecture

When HGP is used as the classifier in the proposed hybrid DL-GP architecture, we refer to the resulting network as the DL-HGP network.

1) Training: The end-to-end training procedure for the DL-HGP network is presented in Algorithm 1.

As discussed in Section III-B6, the HGP classifier alone will converge to a local or global maximum. However, this does not guarantee the convergence of the training algorithm for the hybrid DL-HGP network. In fact, it is well known that even traditional neural networks are not guaranteed to converge to a global optimum. Nevertheless, throughout our experiments, the training algorithm for DL-HGP always converged to a meaningful state, in which the network gives good predictions on a validation set. This convergence state can be detected when neither the network parameters nor the objective function changes significantly anymore (i.e., their changes are below a certain threshold).
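Both halves of this section can be illustrated with a minimal NumPy sketch: the test-time prediction in (13) and (14), and the training stopping rule just described. The sketch assumes that the likelihood in (13) has the same Bernoulli form with the standard normal CDF $\phi$ as the likelihood function defined earlier, and it interprets the "predictive variance" as the variance of $\phi(f_k^*)$ under $\mathcal{N}(\mu_k^*, \sigma_k^{*2})$; that interpretation, the 20-point rule, the threshold `tol`, and all function names are assumptions. Because the mean of this integral has the closed form $\phi(\mu/\sqrt{1+\sigma^2})$, it offers a convenient check on the quadrature.

```python
import math
import numpy as np

def phi(z):
    """Standard normal CDF, applied elementwise."""
    return 0.5 * (1.0 + np.vectorize(math.erf)(np.asarray(z, dtype=float) / math.sqrt(2.0)))

def expert_prediction(mu, var, n_points=20):
    """Gauss-Hermite estimates of the mean and variance of phi(f), f ~ N(mu, var), as in (13)."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    # Substituting f = mu + sqrt(2 var) t turns E[g(f)] into sum_i w_i g(f_i) / sqrt(pi).
    p = phi(mu + math.sqrt(2.0 * var) * nodes)
    mean = float(np.sum(weights * p) / math.sqrt(math.pi))
    second = float(np.sum(weights * p**2) / math.sqrt(math.pi))
    return mean, max(second - mean**2, 0.0)  # clip tiny negative rounding errors

def hgp_lane_probability(mus, variances, gate):
    """Mixture (14): weight each expert's lane probability by the gating output q(z* = k)."""
    means = np.array([expert_prediction(m, v)[0] for m, v in zip(mus, variances)])
    return float(np.asarray(gate, dtype=float) @ means)

def has_converged(old_params, new_params, old_obj, new_obj, tol=1e-4):
    """Stop when the largest parameter change and the objective change both fall below tol."""
    dp = max(float(np.max(np.abs(n - o))) for o, n in zip(old_params, new_params))
    return dp < tol and abs(new_obj - old_obj) < tol
```

For example, `hgp_lane_probability([0.3, -1.0], [0.5, 0.2], gate=[0.7, 0.3])` blends two hypothetical experts' lane probabilities with gating weights 0.7 and 0.3.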
Fig. 7. Examples from the PLVP2 data set. First rows: pedestrian lane images. Second rows: ground-truth-segmentation masks.
TABLE IV
PERFORMANCE OF DIFFERENT E-D NETWORK CONFIGURATIONS FOR LANE DETECTION ON THE FOLD-1 PLVP2 VALIDATION SET. THE LISTED NUMBER OF PARAMETERS COUNTS ALL THE PARAMETERS OF THE E-D NETWORK AND THE GP CLASSIFIER
TABLE VI
PERFORMANCE OF DIFFERENT LANE-DETECTION METHODS ON THE PLVP2 DATA SET USING FIVEFOLD CROSS-VALIDATION; THE BEST PERFORMANCE MEASURES ARE GIVEN IN BOLD
listed in Table IV. We observe that the network achieves higher accuracy and F-measure as more encoder/decoder units are used, up to four or five units. The performance drops when a sixth encoder/decoder unit is added. This drop in performance can be explained by the fact that so much spatial information is lost after six pooling layers of the encoder network that it cannot be effectively recovered by the decoder network. The configurations C4K5 and C5K5, which have a filter size of 5 × 5 and use four and five encoder/decoder units, respectively, achieve the best performance in terms of both accuracy and F-measure. The configuration C4K5 is chosen for our network, because it has fewer trainable parameters (and, hence, it is less prone to overfitting).
2) Initial Number of Expert Search: Next, we fix the configuration C4K5 for the E-D network and vary the initial number of GP experts T0 among 3, 6, 9, and 12. The performance of the network using different values of T0 on the PLVP2 validation set is presented in Table V. The network achieves the best performance at T0 = 9, where the number of experts T at convergence is 9. When T0 increases to 12 or 15, the network also converges at T = 9 and achieves performance very close to that obtained with T0 = 9. This result shows that the proposed autoselection mechanism for T is effective as long as the initial number of experts T0 is larger than or equal to the expected T. For the remaining experiments, we fix T0 = 9.

C. Comparison With Other Lane-Detection Methods

1) Quantitative Comparison: In this experiment, we use fivefold cross-validation to compare the performance of different lane-detection methods on the PLVP2 data set. The proposed DL-HGP network is compared with traditional methods that use hand-engineered features and with DL-based methods for unmarked lane detection. Two representative and relevant traditional methods are included in this experiment.
1) Edge-Based Method [15]: This method detects the lane boundaries from the edges pointing to the vanishing points. We use the MATLAB code provided by Kong et al. [15].
2) Border-Detection + Segmentation [16]: This method combines lane-border detection and lane segmentation. We use the MATLAB code provided by Phung et al. [16].
Four state-of-the-art DL-based methods for road scene segmentation are evaluated in this experiment.
1) DeepLabv3+ [60]: This method also uses an E-D structure. The encoder network can extract features at an arbitrary resolution by applying atrous convolution. It also uses atrous Spatial Pyramid Pooling with multiple atrous rates to capture multiscale context. Normally, the encoder generates feature maps with an output stride of 8 or 16 (i.e., the size of the feature maps is 1/8 or 1/16 of that of the input image). A simple decoder module recovers the object boundaries. We find that an encoder output stride of 8 performs best for this data set. Therefore, the result of DeepLabv3+ with an output stride of 8 is reported.
2) Fully Convolutional DenseNets [61]: This method extends DenseNets [62] to the problem of semantic segmentation. It uses a downsampling–upsampling style E-D network. Each stage (between the pooling layers) uses a dense block, in which each layer takes as input a concatenation of the feature maps from all preceding layers. It also concatenates skip connections from the encoder to the decoder. We report the results for the Fully Convolutional DenseNets with 56 layers (FC-DenseNet56) and with 103 layers (FC-DenseNet103).
3) SegNet [30]: An E-D network is followed by a linear classifier (an FC layer and a Softmax layer), which generates a class label for each image pixel. The encoder network of SegNet resembles the first part of the VGG16 classification network [33] with 13 convolutional layers.
4) Bayesian SegNet [34]: It adds dropout layers to the SegNet architecture, active in both the training and testing phases, to produce a probabilistic segmentation output.

We also experiment with the variants of SegNet, Bayesian SegNet, and the proposed network (C4K5 + HGP).
1) SegNet-Basic [30]: This is a smaller version of SegNet, where each of the encoder and decoder networks has four convolutional layers with filter size 7 × 7. Each convolutional layer is followed by a pooling or upsampling layer.
2) Bayesian SegNet-Basic [34]: Dropout layers are added to SegNet-Basic to produce probabilistic segmentation.
3) SegNet + HGP, Bayesian SegNet + HGP, SegNet-Basic + HGP, and Bayesian SegNet-Basic + HGP: These are, respectively, the variants of SegNet, Bayesian
Fig. 8. Pedestrian lane segmentation using different algorithms. Column 1: input images. Column 2: output of the Border-detection + segmentation method
[16]. Column 3: output of SegNet [30]. Column 4: output of Bayesian SegNet [34]. Column 5: output of the proposed DL-HGP network. See the electronic
color image.
SegNet, SegNet-Basic, and Bayesian SegNet-Basic, where the linear classifier is replaced by the HGP classifier.
4) C4K5 + Linear-Classifier: This is a variant of the proposed network, where HGP is replaced by a linear classifier.
All the DL-based methods in our experiments are implemented in Python using TensorFlow. For Bayesian SegNet and its variants, each test image is passed through the network 30 times to obtain 30 samples of the posterior distribution of the segmentation output.
The performance of different methods on the PLVP2 data set using fivefold cross-validation is presented in Table VI. Here, the methods are arranged into four groups. The first group consists of traditional methods that use hand-engineered features. The second group includes the variants of DeepLabv3+ and Fully Convolutional DenseNets. The third group comprises the variants of SegNet, Bayesian SegNet, and the proposed network that use the linear classifier. The last group involves the variants of SegNet, Bayesian SegNet, and the proposed network that use the HGP classifier. The following can be observed from the results in Table VI.
1) The DL methods outperform the traditional methods that use hand-engineered features for unmarked lane detection, and, in general, they take much less time for inference. SegNet-Basic, which has the second-lowest accuracy among the DL methods, outperforms Border-detection + segmentation, the more accurate of the two evaluated traditional methods, by 2.12% in accuracy and 1.67% in F-measure, while being approximately 173 times faster than Border-detection + segmentation for inference.
2) For each of the DL networks in the third group, replacing its linear classifier with the HGP classifier in the fourth group improves the performance. In particular, we find that combining the HGP classifier with a compact E-D network such as SegNet-Basic or C4K5 provides larger increases in accuracy (2.29% and 1.21% improvement) and F-measure (1.50% and 1.67% improvement). This result shows that using a powerful GP classifier can help mitigate the need for a complex E-D network in pedestrian lane detection.
3) The compact E-D network C4K5 has fewer than half the number of trainable parameters of the E-D network of SegNet: 13 017 472 versus 29 480 064 parameters. Nevertheless, the proposed network combining C4K5 with HGP is more accurate than SegNet + HGP. It also gives the best performance among the tested methods in terms of accuracy, F-measure, and recall. It outperforms the state-of-the-art methods SegNet and Bayesian SegNet by 1.21% and 0.82% in terms of accuracy, and 1.56% and 1.19% in terms of F-measure. This is because pairing the compact E-D network C4K5 with the powerful HGP classifier gives a modeling power similar to that of the larger networks while mitigating the overfitting problem, especially for small data sets. We can also expect that
Fig. 9. Pedestrian lane-detection results of Bayesian SegNet and DL-HGP. Column 1: input images. Columns 2 and 3: detected lanes and uncertainty maps by Bayesian SegNet. Columns 4 and 5: detected lanes and uncertainty maps by DL-HGP. A brighter intensity in the uncertainty maps represents a higher uncertainty level. See the electronic color image.
the proposed network can perform better than the above larger networks on new test samples that are quite different from those seen in the PLVP2 data set.
4) In Bayesian SegNet-Basic and Bayesian SegNet, adding the dropout layers for probabilistic segmentation has a positive side effect: it slightly improves the accuracy of these two networks over SegNet-Basic and SegNet. This improvement comes at the cost of much slower inference: Bayesian SegNet-Basic and Bayesian SegNet require 0.484 and 0.995 s per test image, respectively. On the other hand, adding the dropout layers to the networks that use the HGP classifier does not seem to improve their performance (compare Bayesian SegNet-Basic + HGP classifier with SegNet-Basic + HGP classifier, and Bayesian SegNet + HGP classifier with SegNet + HGP classifier).
5) The Fully Convolutional DenseNet with 103 layers (FC-DenseNet103) performs well, close to the proposed network. Hence, we believe that it would be beneficial to combine an E-D architecture based on dense blocks with an HGP classifier for the pedestrian lane-detection problem. We leave this for future work.
2) Visual Comparison on Lane-Detection Results: Fig. 8 shows a visual comparison of the results of different methods for pedestrian lane detection. The compared methods include the Border-detection + segmentation method [16], SegNet [30], Bayesian SegNet [34], and DL-HGP. The Border-detection + segmentation method [16] is susceptible to background edges (as seen in images 1, 4, 5, and 7). As a result, its performance is poor for indoor scenes that contain many strong structured edges (image 7). It also does not perform well in scenes with extreme lighting conditions such as strong shadows (image 2) or when the colors of the lane and background regions are similar (images 3 and 6). The three remaining (DL-based) methods are less susceptible to the above problems. In particular, they give much better results for indoor scenes (image 7). Among these methods, DL-HGP seems to be the most robust to background edges (as seen in images 1 and 5) and to extreme lighting conditions (as seen in images 2, 3, and 5). There are three possible reasons why DL-HGP handles these conditions better than the two other DL methods.

1) While the linear classifier at the end of a traditional DNN requires the network to extract linearly separable features, the HGP classifier does not. The HGP can classify well using nonlinearly separable features.
2) Since HGP does not require linearly separable features, we can afford to use fewer network layers in DL-HGP (16 convolutional and four pooling layers) than in SegNet and Bayesian SegNet (26 convolutional and five pooling layers). Using fewer pooling layers means that the resolution of the final encoder feature maps is not reduced as much. Therefore, more edge and boundary information is retained in the final encoder feature maps, which helps DL-HGP detect the lane boundary better.
3) HGP is a generative model, which is well known for generalizing better and for being more robust to noise than a discriminative model (e.g., the linear classifier and the SVM), especially for small data sets [63]. The compact E-D network with a smaller number of parameters in the proposed DL-HGP architecture is also less likely to overfit the training data and is more robust to noise.

3) Visual Comparison on Uncertainty Maps: Fig. 9 shows a visual comparison of the uncertainty maps generated by Bayesian SegNet and by the proposed DL-HGP. It can be seen that DL-HGP is generally more certain about its predictions than Bayesian SegNet. The areas of high uncertainty generated by DL-HGP are mostly located near the lane boundaries. On the other hand, the areas of high uncertainty produced by Bayesian SegNet appear at more random locations, as can be seen clearly in images 1, 2, 3, 4, and 7.
The uncertainty maps of DL-HGP can be more useful for the pedestrian lane-detection application, since they can be used to locate the lane boundaries and to warn blind users about the areas of high uncertainty near the lane boundary, keeping them within the safe pedestrian lane area.

V. CONCLUSION

This article presents a method for pedestrian lane detection in unstructured environments that uses a hybrid DL-GP architecture to segment scene images into pedestrian lane and background regions. The proposed hybrid network, which can be trained in an end-to-end manner, combines a compact convolutional E-D network with a powerful nonparametric HGP classifier to mitigate the overfitting problem while maintaining its modeling power. In addition to the segmentation output for each test image, the network also generates a map of well-calibrated uncertainty. Finally, a new data set of 5000 images for training and evaluating pedestrian lane-detection algorithms is introduced. It is expected to facilitate research in pedestrian lane detection, especially the application of DL in this area.

REFERENCES

[1] A. J. Jackson and J. S. Wolffsohn, Low Vision Manual. Amsterdam, The Netherlands: Elsevier, 2007.
[2] R. C. Simpson, “How many people would benefit from a smart wheelchair?” J. Rehabil. Res. Develop., vol. 45, no. 1, pp. 53–72, Dec. 2008.
[3] J. Kim and H. Shin, Algorithm and SoC Design for Automotive Vision Systems. Amsterdam, The Netherlands: Springer, 2014.
[4] I. Ulrich and J. Borenstein, “The GuideCane-applying mobile robot technologies to assist the visually impaired,” IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 31, no. 2, pp. 131–136, Mar. 2001.
[5] S. Shoval, J. Borenstein, and Y. Koren, “Auditory guidance with the Navbelt—A computerized travel aid for the blind,” IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 28, no. 3, pp. 459–467, Mar. 1998.
[6] HumanWare. (2015). BrailleNote GPS. [Online]. Available: https://store.humanware.com/hus/braillenote-gps-software-and-receiver-package.html
[7] S. Se and M. Brady, “Road feature detection and estimation,” Mach. Vis. Appl., vol. 14, no. 3, pp. 157–165, Jul. 2003.
[8] M. S. Uddin and T. Shioyama, “Bipolarity and projective invariant-based zebra-crossing detection for the visually impaired,” in Proc. CVPRW, 2005, pp. 22–30.
[9] V. Ivanchenko, J. Coughlan, and S. Huiying, “Detecting and locating crosswalks using a camera phone,” in Proc. CVPRW, 2008, pp. 1–8.
[10] M. C. Le, S. L. Phung, and A. Bouzerdoum, “Pedestrian lane detection for assistive navigation of blind people,” in Proc. ICPR, 2012, pp. 2594–2597.
[11] J. Crisman and C. Thorpe, “SCARF: A color vision system that tracks roads and intersections,” IEEE Trans. Robot. Automat., vol. 9, no. 1, pp. 49–58, Feb. 1993.
[12] C. Tan, H. Tsai, T. Chang, and M. Shneier, “Color model-based real-time learning for road following,” in Proc. IEEE Conf. Intell. Transp. Syst., Sep. 2006, pp. 939–944.
[13] J. M. Alvarez, T. Gevers, and A. M. Lopez, “Vision-based road detection using road models,” in Proc. ICIP, 2009, pp. 2073–2076.
[14] C. Rasmussen, “Texture-based vanishing point voting for road shape estimation,” in Proc. BMVC, 2004, pp. 470–477.
[15] H. Kong, J.-Y. Audibert, and J. Ponce, “General road detection from a single image,” IEEE Trans. Image Process., vol. 19, no. 8, pp. 2211–2220, Aug. 2010.
[16] S. L. Phung, M. C. Le, and A. Bouzerdoum, “Pedestrian lane detection in unstructured scenes for assistive navigation,” Comput. Vis. Image Understand., vol. 149, pp. 186–196, Aug. 2016.
[17] J. Kim and M. Lee, “Robust lane detection based on convolutional neural network and random sample consensus,” in Proc. ICONIP, 2014, pp. 454–461.
[18] D. P. A. Nugroho and M. Riasetiawan, “Road lane segmentation using deconvolutional neural network,” in Proc. Int. Conf. Soft Comput. Data Sci., 2017, pp. 13–22.
[19] C. C. T. Mendes, V. Fremont, and D. F. Wolf, “Exploiting fully convolutional neural networks for fast road detection,” in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2016, pp. 3174–3179.
[20] T. N. A. Nguyen, A. Bouzerdoum, and S. L. Phung, “A scalable hierarchical Gaussian process classifier,” IEEE Trans. Signal Process., vol. 67, no. 11, pp. 3042–3057, Jun. 2019.
[21] M. A. Sotelo, F. J. Rodriguez, L. Magdalena, L. M. Bergasa, and L. Boquete, “A color vision-based lane tracking system for autonomous driving on unmarked roads,” Auto. Robots, vol. 16, no. 1, pp. 95–116, Jan. 2004.
[22] O. Ramstrom and H. Christensen, “A method for following unmarked roads,” in Proc. IEEE Intell. Vehicles Symp., 2005, pp. 650–655.
[23] Y. He, H. Wang, and B. Zhang, “Color-based road detection in urban traffic scenes,” IEEE Trans. Intell. Transp. Syst., vol. 5, no. 4, pp. 309–318, Dec. 2004.
[24] C. Oh, J. Son, and K. Sohn, “Illumination robust road detection using geometric information,” in Proc. 15th Int. IEEE Conf. Intell. Transp. Syst., Sep. 2012, pp. 1566–1571.
[25] O. Miksik, P. Petyovsky, L. Zalud, and P. Jura, “Robust detection of shady and highlighted roads for monocular camera based navigation of UGV,” in Proc. ICRA, 2011, pp. 64–71.
[26] J. M. Á. Alvarez and A. M. Lopez, “Road detection based on illuminant invariance,” IEEE Trans. Intell. Transp. Syst., vol. 12, no. 1, pp. 184–193, Mar. 2011.
[27] J. M. Alvarez, T. Gevers, Y. LeCun, and A. M. Lopez, “Road scene segmentation from a single image,” in Proc. ECCV, 2012, pp. 376–389.
[28] J. D. Crisman and C. E. Thorpe, “UNSCARF—A color vision system for the detection of unstructured roads,” in Proc. ICRA, 1991, pp. 2496–2501.
[29] C.-K. Chang, C. Siagian, and L. Itti, “Mobile robot monocular vision navigation based on road region and boundary estimation,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Oct. 2012, pp. 1043–1050.
[30] V. Badrinarayanan, A. Kendall, and R. Cipolla, “SegNet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 12, pp. 2481–2495, Dec. 2017.
[31] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. CVPR, Jun. 2015, pp. 3431–3440.
[32] H. Noh, S. Hong, and B. Han, “Learning deconvolution network for semantic segmentation,” in Proc. ICCV, Dec. 2015, pp. 1520–1528.
[33] K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” 2014, arXiv:1409.1556. [Online]. Available: https://arxiv.org/abs/1409.1556
[34] A. Kendall, V. Badrinarayanan, and R. Cipolla, “Bayesian SegNet: Model uncertainty in deep convolutional encoder-decoder architectures for scene understanding,” in Proc. BMVC, 2017.
[35] G. J. Brostow, J. Fauqueur, and R. Cipolla, “Semantic object classes in video: A high-definition ground truth database,” Pattern Recognit. Lett., vol. 30, no. 2, pp. 88–97, Jan. 2009.
[36] J. Jin, K. Fu, and C. Zhang, “Traffic sign recognition with hinge loss trained convolutional neural networks,” IEEE Trans. Intell. Transp. Syst., vol. 15, no. 5, pp. 1991–2000, Oct. 2014.
[37] M. M. Bejani and M. Ghatee, “Convolutional neural network with adaptive regularization to classify driving styles on smartphones,” IEEE Trans. Intell. Transp. Syst., early access, doi: 10.1109/TITS.2019.2896672.
[38] A. Pashaei, M. Ghatee, and H. Sajedi, “Convolution neural network joint with mixture of extreme learning machines for feature extraction and classification of accident images,” J. Real-Time Image Process., vol. 16, pp. 1–16, Feb. 2019.
[39] R. Bastani Zadeh, M. Ghatee, and H. R. Eftekhari, “Three-phases smartphone-based warning system to protect vulnerable road users under fuzzy conditions,” IEEE Trans. Intell. Transp. Syst., vol. 19, no. 7, pp. 2086–2098, Jul. 2018.
[40] Y. Lv, Y. Duan, W. Kang, Z. Li, and F. Wang, “Traffic flow prediction with big data: A deep learning approach,” IEEE Trans. Intell. Transp. Syst., vol. 16, no. 2, pp. 865–873, Apr. 2014.
[41] L. Zhang, F. Yang, Y. Daniel Zhang, and Y. J. Zhu, “Road crack detection using deep convolutional neural network,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Sep. 2016, pp. 3708–3712.
[42] M. Kuss and C. E. Rasmussen, “Assessing approximate inference for binary Gaussian process classification,” J. Mach. Learn. Res., vol. 6, pp. 1679–1704, Oct. 2005.
[43] Y. Altun, T. Hofmann, and A. J. Smola, “Gaussian process classification for segmenting and annotating sequences,” in Proc. ICML, 2004, pp. 4–11.
[44] A. Kapoor, K. Grauman, R. Urtasun, and T. Darrell, “Active learning with Gaussian processes for object categorization,” in Proc. ICCV, 2007, pp. 1–8.
[45] H. Nickisch and C. E. Rasmussen, “Approximations for binary Gaussian process classification,” J. Mach. Learn. Res., vol. 9, no. 10, pp. 2035–2078, 2008.
[46] T. P. N. A. Centeno and N. D. Lawrence, “Optimising kernel parameters and regularisation coefficients for non-linear discriminant analysis,” J. Mach. Learn. Res., vol. 7, pp. 455–491, Feb. 2006.
[47] N. Lawrence, M. Seeger, and R. Herbrich, “Fast sparse Gaussian process methods: The informative vector machine,” in Proc. NIPS, 2003, pp. 625–632.
[48] A. Naish-Guzman and S. Holden, “The generalized FITC approximation,” in Proc. NIPS, 2007, pp. 1057–1064.
[49] J. Hensman, A. Matthews, and Z. Ghahramani, “Scalable variational Gaussian process classification,” in Proc. AISTATS, 2015, pp. 351–360.
[50] J. Hensman, A. G. Matthews, M. Filippone, and Z. Ghahramani, “MCMC for variationally sparse Gaussian processes,” in Proc. NIPS, 2015, pp. 1648–1656.
[51] C. E. Rasmussen and Z. Ghahramani, “Infinite mixtures of Gaussian process experts,” in Proc. NIPS, 2002, pp. 881–888.
[52] V. Tresp, “Mixtures of Gaussian processes,” in Proc. NIPS, 2000, pp. 654–660.
[53] T. N. A. Nguyen, A. Bouzerdoum, and S. L. Phung, “Variational inference for infinite mixtures of sparse Gaussian processes through KL-correction,” in Proc. IEEE Int. Conf. Acoust., Speech Signal Process. (ICASSP), Mar. 2016, pp. 2579–2583.
[54] J. Shi, R. Murray-Smith, and D. Titterington, “Bayesian regression and classification using mixtures of Gaussian processes,” Int. J. Adapt. Control Signal Process., vol. 17, no. 2, pp. 149–161, Mar. 2003.
[55] T. N. A. Nguyen, A. Bouzerdoum, and S. L. Phung, “Stochastic variational hierarchical mixture of sparse Gaussian processes for regression,” Mach. Learn., vol. 107, no. 12, pp. 1947–1986, Dec. 2018.
[56] M. Girolami and S. Rogers, “Variational Bayesian multinomial probit regression with Gaussian process priors,” Neural Comput., vol. 18, no. 8, pp. 1790–1817, Aug. 2006.
[57] J. Fritsch, T. Kuhnl, and A. Geiger, “A new performance measure and evaluation benchmark for road detection algorithms,” in Proc. 16th Int. IEEE Conf. Intell. Transp. Syst. (ITSC), Oct. 2013, pp. 1693–1700.
[58] A. G. D. G. Matthews et al., “GPflow: A Gaussian process library using TensorFlow,” J. Mach. Learn. Res., vol. 18, no. 40, pp. 1–6, 2017.
[59] M. K. Titsias, “Variational learning of inducing variables in sparse Gaussian processes,” in Proc. AISTATS, 2009, pp. 567–574.
[60] L. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in Proc. ECCV, 2018, pp. 801–818.
[61] S. Jegou, M. Drozdzal, D. Vazquez, A. Romero, and Y. Bengio, “The one hundred layers Tiramisu: Fully convolutional DenseNets for semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jul. 2017, pp. 11–19.
[62] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten, “Densely connected convolutional networks,” in Proc. CVPR, 2017, vol. 1, no. 2, p. 3.
[63] A. Y. Ng and M. I. Jordan, “On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes,” in Proc. NIPS, 2002, pp. 841–848.

Thi Nhat Anh Nguyen received the B.Eng. degree (Hons.) and the M.Eng. degree from Nanyang Technological University, Singapore, in 2007 and 2012, respectively, and the Ph.D. degree from the University of Wollongong, Wollongong, NSW, Australia, in 2019, all in computer engineering.
Her research interests include machine learning, pattern recognition, computer vision, and image processing.

Son Lam Phung (Senior Member, IEEE) received the B.Eng. degree (Hons.) and the Ph.D. degree from Edith Cowan University, Perth, WA, Australia, in 1999 and 2003, respectively, all in computer engineering.
He is currently an Associate Professor with the School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong, NSW, Australia. His general research interests are in the areas of image and signal processing, neural networks, pattern recognition, and machine learning.
Dr. Phung received the University and Faculty Medals in 2000.

Abdesselam Bouzerdoum (Senior Member, IEEE) received the M.S.E.E. and Ph.D. degrees in electrical engineering from the University of Washington, Seattle, WA, USA, in 1986 and 1991, respectively.
In 1991, he joined The University of Adelaide, Adelaide, SA, Australia, where he was a Research Associate and then a Senior Assistant Professor. In 1998, he joined Edith Cowan University, Perth, WA, Australia, as an Associate Professor. In 2004, he was appointed as a Professor of computer engineering and the Head of the School of Electrical, Computer and Telecommunications Engineering, University of Wollongong, Wollongong, NSW, Australia, where he served as an Associate Dean Researcher for the Faculty of Informatics from 2007 to 2013. He is currently serving as the Head of the Information and Computing Technology Division, College of Science and Engineering, Hamad Bin Khalifa University, Doha, Qatar, and a Senior Professor of computer engineering with the University of Wollongong. He held several visiting professor appointments at the Institut Galilée, Université Paris-13, Villetaneuse, France; LAAS/CNRS, Toulouse, France; Institut Femto-st, Besancon, France; Villanova University, Villanova, PA, USA; and The Hong Kong University of Science and Technology, Hong Kong. He has published over 360 technical articles, graduated 50 Ph.D. and research master’s students, and supervised over 60 honors theses. His research interests include radar imaging and signal processing, image processing, vision, machine learning, and pattern recognition.
Dr. Bouzerdoum was a member of the Australian Research Council (ARC) College of Experts from 2009 to 2011. He received the Eureka Prize for Outstanding Science in Support of Defense or National Security in 2011, the Chester Sall Award of the IEEE TRANSACTIONS ON CONSUMER ELECTRONICS in 2005, and the Distinguished Researcher Award, Chercheur de Haut Niveau, from the French Ministry in 2001. He was the Deputy Chair of the Engineering, Mathematics and Informatics Panel from 2010 to 2011. He has served as an Associate Editor for five international journals, including the IEEE TRANSACTIONS ON IMAGE PROCESSING and the IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS from 1999 to 2006.