
Communicated by Yue Gao

Accepted Manuscript

Multiscale road centerlines extraction from high-resolution aerial imagery

Ruyi Liu, Qiguang Miao, Jianfeng Song, Yining Quan, Yunan Li,
Pengfei Xu, Jing Dai

PII: S0925-2312(18)31224-4
DOI: https://doi.org/10.1016/j.neucom.2018.10.036
Reference: NEUCOM 20058

To appear in: Neurocomputing

Received date: 5 October 2017
Revised date: 12 August 2018
Accepted date: 9 October 2018

Please cite this article as: Ruyi Liu, Qiguang Miao, Jianfeng Song, Yining Quan, Yunan Li, Pengfei Xu,
Jing Dai, Multiscale road centerlines extraction from high-resolution aerial imagery, Neurocomputing
(2018), doi: https://doi.org/10.1016/j.neucom.2018.10.036

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service
to our customers we are providing this early version of the manuscript. The manuscript will undergo
copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please
note that during the production process errors may be discovered which could affect the content, and
all legal disclaimers that apply to the journal pertain.
Multiscale road centerlines extraction from high-resolution aerial imagery✩

Ruyi Liu^a, Qiguang Miao^a,∗, Jianfeng Song^a, Yining Quan^a, Yunan Li^a, Pengfei Xu^b, Jing Dai^c

a School of Computer Science and Technology, Xidian University, Xi’an, Shaanxi, 710071, China
b School of Information Science and Technology, Northwest University, Xi’an, Shaanxi, 710127, China
c The hospital of Cheng Du military area authority, Chengdu, Sichuan, 610011, China

✩ Fully documented templates are available in the elsarticle package on CTAN.
∗ Corresponding author
Email address: qgmiao@126.com (Qiguang Miao)
URL: http://web.xidian.edu.cn/qgmiao/en/index.html (Qiguang Miao)

Preprint submitted to Journal of LaTeX Templates, October 16, 2018

Abstract

Accurate road extraction from high-resolution aerial imagery has many applications, such as urban planning and vehicle navigation systems. Common road extraction methods are based on classification algorithms, which require robust handcrafted features for roads. However, designing such features is difficult. For the road centerlines extraction problem, existing algorithms have limitations such as spurs and high computational cost. To address these issues to some extent, we introduce feature learning based on deep learning to extract robust features automatically, and present a method to extract road centerlines based on multiscale Gabor filters and multiple directional non-maximum suppression. The proposed algorithm consists of four steps. First, the aerial imagery is classified by a pixel-wise classifier based on a convolutional neural network (CNN). Specifically, the CNN learns features, especially structural features, automatically from raw data. Then, edge-preserving filtering is applied to the resulting classification map, with the original imagery serving as the guidance image. It is exploited to preserve the edges and details of the road. After that, post-processing based on shape features is performed to extract more reliable roads. Finally, multiscale Gabor filters and multiple directional non-maximum suppression are integrated to obtain a complete and accurate road network. Experimental results show that the proposed method achieves comparable or higher quantitative results, as well as more satisfactory visual performance.

Keywords: Convolutional Neural Network (CNN), Edge-preserving filtering, multiscale Gabor filters, centerlines extraction

1. Introduction

Road information has become essential because it supports many applications, such as vehicle navigation systems, change detection and urban planning, so road extraction from high-resolution images is very important. In recent years, much effort has been made and various road extraction methods have been developed. Roads are modeled as a network of intersections and links between these intersections, and are found by grouping processes [1]. A scheme for road extraction in rural areas which integrates three different modules with specific strengths is presented. It gets good results for simple open areas with bright ribbon roads, but does not work well for images with complex backgrounds. Shao et al. [2] used a fast linear detector to extract road centerlines on high-contrast road pixels with increased performance. Besides linear features, road intersections are another signature in road networks. Owing to the simple road extraction operator of only two directions and the structure optimization algorithm, the algorithm can be realized rapidly. However, this method emphasizes the speed of the algorithm but disregards its effectiveness.

Most popular and successful methods rely on classification [3] [4] [5] [6] [7] [8] [9] [10] [11] [12]. Yager et al. applied SVM to road extraction from remote sensing images using edge-based features [3]. Song and Civco exploited smoothness and compactness to reduce the misclassification between roads and other objects in an SVM classifier [4]. To improve the accuracy of road extraction, Zhang and Couloigner [5] integrated traditional k-means clustering with the angular texture signature. Multiscale structural features and support vector machines were applied in road network extraction by Huang and Zhang [6]. Das et al. designed a multistage framework to extract road networks based on probabilistic SVM and histogram-based features, such as mean, standard deviation, skew, energy, and entropy [7]. Classification is applied to label the pixels or voxels as belonging to the structure of interest or to the background [8]. What makes the problem challenging is the complex structure of the prior: roads form a connected network of thin segments which meet at junctions and crossings. Such a priori knowledge is more difficult to turn into a tractable model than standard smoothness or co-occurrence assumptions. Shi et al. proposed an integrated method for urban main road centerlines extraction, in which SVM, general adaptive neighborhood, local Geary’s C and local linear kernel smoothing regression were utilized [9]. Xu et al. presented a bio-inspired model for road extraction from remote sensing imagery. The model was an improved support vector machine (SVM) based on the pooling of feature vectors [10]. To obtain a comprehensive feature extraction method for road extraction, Miao et al. proposed a novel object-based automatic method [11]. This method used not only spectral information but also other spatial and spectral features derived from objects. A semisupervised method was introduced by Cheng et al. [12], which explored the intrinsic structures between the labeled samples and the unlabeled ones.

From 2015 onwards, some new methods have been proposed. An information fusion based approach is proposed in [13]. Spectral and shape features are computed at the pixel level and used to select road segments with two different methods (i.e., expectation maximization clustering and linearness filtering). A novel multi-stage object-based approach for road extraction from VHR satellite images is proposed in [14]. Object-based information is embedded as heuristic information in the ant colony optimization (ACO) algorithm for handling the road network extraction problem [15]. Wei et al. proposed an RSRCNN approach which incorporates road structure in learning the CNN model with structured output of road regions [16]. Liu et al. proposed a multiview dictionary learning formulation to approximate the Hough transform for straight road detection in multispectral images [17]. A new method for extracting roads from high-resolution imagery based on hierarchical graph-based image segmentation is presented in [18]. Abdollahi et al. proposed a new automatic method for road extraction by integrating the SVM and Level Set methods. The probability of classification estimated by the SVM is used as input to the Level Set method [19]. Zang et al. proposed a task-oriented enhancing technique for extracting road networks from satellite images [20]. An unsupervised road detection method based on a Gaussian mixture model and object-based features is proposed in [21].
The aforementioned methods have achieved accurate road extraction from remote sensing imagery. They all depend on features that characterize the distinctiveness of a road region with respect to its surrounding area, so designing robust features is critical for the performance of road extraction. Recently, feature learning has been a topic of interest and considerable progress has been achieved. Features learned by deep learning have resulted in state-of-the-art performance in various classification tasks [22] [23] [24] [25] [26]. Deep learning is a machine learning approach that establishes deep hierarchical models to represent and analyze data. Deep learning methods mainly include convolutional neural networks (CNN), deep belief networks (DBN) and stacked auto-encoders (SAE) [27]. CNN is a popular deep model and has been widely used in computer vision tasks such as object detection, image classification and image segmentation. In a CNN model, trainable filters and local neighborhood pooling operations are applied alternately to the raw input images. Such a multilayer architecture can extract robust features from raw pixels automatically. Features extracted by such networks are highly versatile and often more effective than traditional handcrafted features [28]. Inspired by this, we obtain features for roads by training a CNN. Our trained CNN also classifies raw pixels into road or non-road. During the last decade, edge-preserving filtering has been successfully applied to improve the accuracy of hyperspectral image classification [29]. Here, we also use edge-preserving filtering to smooth the classification map and recover the real road boundaries.

Many approaches have been proposed to extract road centerlines. These approaches include the Radon transform [5], the morphological thinning algorithm [6], the Hough transform [30], and regression [9] [31]. Although each of them can extract road centerlines, they also have some disadvantages. The morphological thinning algorithm always produces some spurs. The Radon transform based method can only extract the centerlines of straight road segments. Due to the limitations of the Hough transform, some false centerlines exist. Some regression based methods fail to extract the centerlines of complicated junctions. To overcome these shortcomings of centerlines extraction algorithms, we use multiscale Gabor filters [32] [33] to enhance the centerlines. This helps to find the accurate location of the centerlines. The main contributions of our approach are highlighted as follows. a) We use a CNN to capture local contrast, texture and shape information, and to predict the label of each pixel without the need for hand-crafted features. b) We introduce edge-preserving filtering based on guided image filtering to align the initial road map with real road boundaries. c) A multiscale centerlines extraction method using Gabor filters and multiple directional non-maximum suppression is proposed.


In this paper, we propose a method to extract road centerlines from high-resolution aerial images. The pixel-wise classification map is obtained by the CNN and then processed with edge-preserving filtering. To extract more reliable roads, we perform post-processing based on shape features. Finally, the road centerlines are extracted by multiscale Gabor filters and multiple directional non-maximum suppression. The rest of this paper is organized as follows. In Section 2, the proposed method is described. Section 3 gives the experimental results and provides a discussion. Finally, the concluding remarks are given in Section 4.

2. Proposed methodology

The proposed method consists of four steps: pixel-wise classification with CNN, edge-preserving filtering based on guided image filtering, post-processing based on shape features, and multiscale road centerlines extraction using Gabor filters and multiple directional non-maximum suppression. The organization of this method is shown in Figure 1. The details of each step are presented as follows.

[Figure 1 flowchart, in order: input image; patches and predictions from pixel-wise classification based on CNN; initial road map; edge-preserving filtering; post-processing based on shape features; multiscale centerlines extraction using Gabor filters and multiple directional non-maximum suppression; final road centerlines map. The legend distinguishes data from process boxes.]

Figure 1: Flowchart of the proposed method.

2.1. Pixel-wise classification with CNN


CNN has been successfully applied to classification tasks. It is a trainable multistage feed-forward neural network [34]. The layers of a CNN can be divided into three types: convolution layers, pooling layers and multi-layer perceptron layers. All the convolution and pooling layers compose the feature extractor of the CNN. After features are extracted with the multilayer convolutional network, fully connected layers with a classifier are added to output class predictions. A large number of labeled samples are needed to train a CNN that generalizes well. Given an image $I$, the input image patches are represented as $x_O = \{x_1, x_2, \ldots, x_s\}$, where $s$ is the number of training samples. These input samples are obtained by extracting the patches centered on the pixels $O = \{o_1, o_2, \ldots, o_s\}$. For the input image patches $x_O = \{x_1, x_2, \ldots, x_s\}$, the corresponding network output is expressed as $y = \{y_1, y_2, \ldots, y_s\}$. Each $y_i$ takes its value from a finite set of classes $\Omega_1 = \{1, 2, \ldots, K\}$, where $K$ is the number of classes. The labels in our dataset are binary, i.e., road or non-road. Real-world applications often face a constrained time budget, so the design of the network architecture should trade off factors such as depth, number of filters and filter sizes. Our network consists of five layers, with three convolutional layers and two fully connected layers. Each layer contains learnable parameters. The network takes an RGB image patch of 31 × 31 pixels as input, and uses a softmax regression model as the output layer to generate the probabilities of the central pixel being road and non-road. The architecture details are listed in Table 1.

Table 1: Architecture details of our network. C: convolutional layer; BN: batch normalization, which reduces internal covariate shift and thereby accelerates the training of deep neural nets; P: pooling; F: fully connected layer; R: ReLU, the most commonly used activation function in deep learning models, which returns 0 for any negative input and returns x for any positive value x; S: softmax; Channels: the number of output feature maps; Input size: the spatial size of the input feature maps.

Layer            1         2         3         4      5 (Output)
Type             C+BN+P    C+BN+P    C+BN+R    F      F+S
Channels         12        24        50        50     2
Filter size      6×6       6×6       4×4       -      -
Pooling size     2×2       2×2       -         -      -
Pooling stride   2         2         -         -      -
Input size       31×31     13×13     4×4       1×1    1×1
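The input sizes listed in Table 1 can be checked with a few lines of arithmetic. The sketch below assumes unpadded ("valid") convolutions with stride 1, which the paper does not state explicitly; under that assumption the sizes 31, 13, 4 and 1 follow from the filter and pooling sizes. The function names are illustrative only.

```python
def out_size(size, kernel, stride=1):
    """Spatial size after an unpadded convolution or pooling (assumed 'valid' mode)."""
    return (size - kernel) // stride + 1


def trace_sizes(input_size=31):
    """Return the input sizes of layers 1 to 4 for the filter sizes in Table 1."""
    sizes = [input_size]
    s = out_size(input_size, 6)   # layer 1: 6x6 convolution, 31 -> 26
    s = out_size(s, 2, 2)         # layer 1: 2x2 pooling, stride 2, 26 -> 13
    sizes.append(s)
    s = out_size(s, 6)            # layer 2: 6x6 convolution, 13 -> 8
    s = out_size(s, 2, 2)         # layer 2: 2x2 pooling, stride 2, 8 -> 4
    sizes.append(s)
    s = out_size(s, 4)            # layer 3: 4x4 convolution, 4 -> 1
    sizes.append(s)
    return sizes


print(trace_sizes())  # [31, 13, 4, 1]
```

Because the third convolution reduces each feature map to a single value, the two fully connected layers operate on a 1 × 1 × 50 feature vector per patch.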
For the datasets used in our paper, we randomly select 1% of the labeled patches of size 31 × 31 as training samples. All the 31 × 31 patches centered on the pixels are used as testing samples. To label the training patches, we mainly consider the ground truth of their central pixels as well as the overlaps between the patches and the ground truth road mask. A patch $B$ is labeled as a positive training example if i) its central pixel belongs to the road class, and ii) it sufficiently overlaps with the ground truth road region $G$: $|B \cap G| \ge 0.8 \times \min(|B|, |G|)$. Similarly, a patch $B$ is labeled as a negative training example if i) its central pixel is located within the background, and ii) its overlap with the ground truth road region is less than a predefined threshold: $|B \cap G| < 0.2 \times \min(|B|, |G|)$. To better understand the CNN, we show the feature maps of the convolutional layers in Figure 2. Figure 2(a) is the original image patch. Figure 2(b) and (c) are the feature maps of the first and the second convolutional layer, respectively. It can be observed that the CNN learns the features well. Besides, the extracted features become more abstract and global as the layers get deeper.

Figure 2: Feature maps of different convolutional layers. (a) The original patch with the size of 31 × 31; (b) The feature maps of the first convolutional layer; (c) The feature maps of the second convolutional layer.
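The patch labeling rule above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that $G$ is the set of all road pixels in the ground truth mask, that the mask is a list of rows of 0/1 values, and that the patch lies fully inside the image. The function name `label_patch` and its parameters are hypothetical.

```python
def label_patch(mask, row, col, half=15, pos_frac=0.8, neg_frac=0.2):
    """Label the square patch centered at (row, col).

    Returns 1 (road), 0 (non-road) or None (ambiguous, not used for training).
    half=15 gives the 31 x 31 patches used in the paper. The caller must keep
    the patch inside the image, because negative indices would wrap around.
    """
    rows = range(row - half, row + half + 1)
    cols = range(col - half, col + half + 1)
    area_b = len(rows) * len(cols)                      # |B|
    area_g = sum(sum(r) for r in mask)                  # |G|, assumed: all road pixels
    overlap = sum(mask[r][c] for r in rows for c in cols)   # |B ∩ G|
    limit = min(area_b, area_g)
    if mask[row][col] == 1 and overlap >= pos_frac * limit:
        return 1
    if mask[row][col] == 0 and overlap < neg_frac * limit:
        return 0
    return None


# Toy 7 x 7 mask with a 3 x 3 road block, using 3 x 3 patches (half=1).
mask = [[0] * 7 for _ in range(7)]
for r in range(1, 4):
    for c in range(1, 4):
        mask[r][c] = 1
print(label_patch(mask, 2, 2, half=1))  # 1
print(label_patch(mask, 5, 5, half=1))  # 0
print(label_patch(mask, 3, 3, half=1))  # None
```

Patches that satisfy neither condition are simply left out of the training set in this sketch, which is one plausible reading of the rule.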

2.2. Edge-preserving based on guided image filtering


The result of pixel-wise classification appears noisy and is not aligned with real road boundaries. To solve this problem, it is optimized by edge-preserving filtering based on guided image filtering [29] [35]. The guided filter is based on a local linear model which assumes that the filtering output $q$ can be represented as a linear transform of the guidance image $I$ in a local window $w_j$ of size $(2r + 1) \times (2r + 1)$ as follows:

$$q_i = a_j I_i + b_j, \quad \forall i \in w_j \tag{1}$$

This model ensures $\nabla q \approx a \nabla I$, which means that the filtering output $q$ will have an edge only if the guidance image $I$ has an edge. To determine the coefficients $a_j$ and $b_j$, the output $q$ is modeled as the input $p$ minus some unwanted components $n$ such as noise or textures:

$$q_i = p_i - n_i \tag{2}$$

A solution that minimizes the difference between $q$ and $p$ while maintaining the linear model (1) is proposed in [35]. The following cost function is minimized in the window $w_j$:

$$E(a_j, b_j) = \sum_{i \in w_j} \left( (a_j I_i + b_j - p_i)^2 + \varepsilon a_j^2 \right) \tag{3}$$

where $\varepsilon$ is a regularization parameter deciding the degree of blurring for the guided filter. Figure 3 shows an illustration of the guided filtering process. We adopt the input image as the color guidance image. For the guided filter, (1) can be represented in a weighted average form, so the filtering output at a pixel is expressed as a weighted average:

$$q_i = \sum_j w_{i,j}(I)\, p_j \tag{4}$$

where $i$ and $j$ are pixel indexes and $p$ represents the result of pixel-wise classification. The filtering weight $w_{i,j}$ is chosen so that the filter preserves the edges of the guidance image $I$. $w_{i,j}$ can be expressed as follows:

$$w_{i,j}(I) = \frac{1}{|w|^2} \sum_{k \in w_i,\, k \in w_j} \left( 1 + \frac{(I_i - \mu_k)(I_j - \mu_k)}{\sigma_k^2 + \varepsilon} \right) \tag{5}$$

where $w_i$ and $w_j$ are local windows around pixels $i$ and $j$, respectively, $\mu_k$ and $\sigma_k^2$ are the mean and variance of $I$ in $w_k$, and $|w|$ is the number of pixels in $w_k$. A 1-D step edge example is presented in Figure 4 to demonstrate the edge-preserving property of the filtering weight of the guided filter. As shown in Figure 4, when $I_i$ and $I_j$ are on the same side of an edge, the term $(I_i - \mu_k)(I_j - \mu_k)$ has a positive sign. However, if the two pixels are on different sides, the term has a negative sign, so pixels on opposite sides of an edge receive small weights and are hardly averaged together.

Figure 3: Illustrations of the guided filtering process.

Figure 5 shows an example of edge-preserving filtering. It shows that the noise in Figure 5(b) is effectively smoothed. More importantly, the image obtained by edge-preserving filtering tends to be aligned with edges in the guidance image. Then a thresholding algorithm is applied to extract desirable road segments. The thresholding method can be expressed mathematically as

$$q_i = \begin{cases} 1, & q_i > \text{threshold} \\ 0, & \text{otherwise} \end{cases} \tag{6}$$
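To make Eqs. (1) to (6) concrete, the following sketch applies a guided filter to a 1-D signal. It uses the standard closed-form minimizer of Eq. (3) from the guided filter literature [35], namely $a_k = \mathrm{cov}_k(I, p) / (\sigma_k^2 + \varepsilon)$ and $b_k = \bar{p}_k - a_k \mu_k$, which this section does not spell out. It is a simplified illustration with a scalar guidance signal, whereas the paper uses a color guidance image; the values of `r`, `eps` and the threshold are assumptions.

```python
def box_mean(x, r):
    """Mean of x over the window [i - r, i + r], clipped at the borders."""
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - r), min(n, i + r + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


def guided_filter_1d(guide, p, r=2, eps=1e-3):
    """1-D guided filter: q_i = mean(a)_i * I_i + mean(b)_i."""
    mean_i = box_mean(guide, r)
    mean_p = box_mean(p, r)
    corr_ip = box_mean([g * v for g, v in zip(guide, p)], r)
    corr_ii = box_mean([g * g for g in guide], r)
    # Closed-form minimizer of Eq. (3) in every window.
    a = [(cip - mi * mp) / (cii - mi * mi + eps)
         for cip, cii, mi, mp in zip(corr_ip, corr_ii, mean_i, mean_p)]
    b = [mp - ak * mi for mp, ak, mi in zip(mean_p, a, mean_i)]
    # Average the coefficients of all windows that cover each pixel.
    mean_a = box_mean(a, r)
    mean_b = box_mean(b, r)
    return [ma * g + mb for ma, g, mb in zip(mean_a, guide, mean_b)]


def threshold(q, t=0.5):
    """Eq. (6): binarize the filtered map (t = 0.5 is an assumed value)."""
    return [1 if v > t else 0 for v in q]


guide = [0.0] * 6 + [1.0] * 6     # guidance signal with one step edge
noisy = list(guide)
noisy[2] = 1.0                    # one misclassified pixel in a flat region
print(threshold(guided_filter_1d(guide, noisy)))
# [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```

In this toy case the isolated misclassification is averaged away because the guidance is flat around it, while the step edge of the guidance is kept in the output.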
Figure 4: Example of 1-D step edge. Here, µ and σ are shown for a filtering kernel centered exactly at an edge.

Figure 5: Guided image filtering. (a) The guidance image I; (b) The result of pixel-wise classification; (c) An image obtained by edge-preserving filtering.

2.3. Post-processing based on shape features


After the above processes, false roads still exist. A refinement process is necessary to improve the reliability of road extraction. In general, roads have the following characteristics: 1) they do not have small areas, so regions with small areas can be regarded as noise and should be removed, and 2) they are narrow and long [36]. Based on these characteristics, shape features filtering [31] and multidirectional morphological filtering [37] are used to distinguish potential road segments from other segments. For holes caused by noise in some road regions, a morphological closing operator is applied to fill the holes. We apply the closing operation with a structure element whose size is 2 to 4 pixels in the high-resolution images.

A typical shape features filtering method uses the linear feature index (LFI). We use connected component analysis to divide pixels into connected components. Each component is then converted to a rectangle which satisfies

$$LW = n_p \tag{7}$$

where $L$ is the length of the new rectangle, $W$ is the width of the new rectangle, and $n_p$ is the area of the road segment (i.e., its number of pixels). LFI can be calculated by

$$\mathrm{LFI} = \frac{L}{W} = \frac{L}{n_p / L} = \frac{L^2}{n_p} \tag{8}$$

According to the characteristics of roads, road regions should have large values of LFI, so regions with small values of LFI can be regarded as nonlinear features and are removed.
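A small sketch of LFI filtering is given below. The text does not say how the rectangle length $L$ is measured, so this example assumes, purely for illustration, that $L$ is the diagonal of the component's axis-aligned bounding box; the threshold `min_lfi` is likewise a hypothetical parameter. Components are given as lists of (row, column) pixel coordinates.

```python
import math


def bbox_diagonal(pixels):
    """Diagonal of the axis-aligned bounding box (assumed stand-in for L)."""
    rows = [r for r, _ in pixels]
    cols = [c for _, c in pixels]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return math.hypot(height, width)


def linear_feature_index(pixels):
    """Eq. (8): LFI = L^2 / n_p, with n_p the number of pixels."""
    length = bbox_diagonal(pixels)
    return length ** 2 / len(pixels)


def keep_linear(components, min_lfi):
    """Keep only components whose LFI reaches the (assumed) threshold."""
    return [c for c in components if linear_feature_index(c) >= min_lfi]


strip = [(r, c) for r in range(3) for c in range(40)]    # 3 x 40 road-like strip
blob = [(r, c) for r in range(10) for c in range(10)]    # 10 x 10 compact blob
print(round(linear_feature_index(strip), 2))   # 13.41
print(round(linear_feature_index(blob), 2))    # 2.0
print(len(keep_linear([strip, blob], min_lfi=5.0)))  # 1
```

The elongated strip has a much larger LFI than the compact blob, which is the property the filtering step relies on.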
To ensure the independence of each road candidate, directional morphological filtering is applied to eliminate neighboring non-road segments. We perform the morphological opening operation using the line structure element $E_{L,\alpha}$. It can be expressed as

$$f = \bigcup_{\alpha_i = -90^\circ}^{90^\circ} I \circ E_{L,\alpha_i} \tag{9}$$

$$E_{L,\alpha_i} = \left\{ (x_i, y_i) \;\middle|\; \begin{array}{ll} y_i = x_i \tan\alpha_i,\ x_i = 0, \pm 1, \ldots, \pm\frac{(L-1)\cos\alpha_i}{2}, & \text{if } |\alpha_i| \le 45^\circ \\ x_i = y_i \cot\alpha_i,\ y_i = 0, \pm 1, \ldots, \pm\frac{(L-1)\sin\alpha_i}{2}, & \text{if } 45^\circ \le |\alpha_i| < 90^\circ \end{array} \right\} \tag{10}$$

where $\alpha_i$ is the orientation angle, $L$ is the length of the line structure element, $I$ is the image after shape features filtering, and $\circ$ is the morphological opening operator.
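Eqs. (9) and (10) can be illustrated with a set-based sketch of binary opening. The rounding of coordinates to the pixel grid, the use of the absolute value of the sine for negative angles, the treatment of ±90° in the second branch, and the 15° angular step are all assumptions made for this example; the paper does not specify them. Images are represented as sets of (x, y) foreground coordinates.

```python
import math


def line_element(length, angle_deg):
    """Discrete line structure element in the spirit of Eq. (10)."""
    a = math.radians(angle_deg)
    points = []
    if abs(angle_deg) <= 45:
        half = int((length - 1) * math.cos(a) / 2)
        for x in range(-half, half + 1):
            points.append((x, round(x * math.tan(a))))
    else:  # steep lines, including +/-90 degrees (assumption)
        half = int((length - 1) * abs(math.sin(a)) / 2)
        for y in range(-half, half + 1):
            points.append((round(y / math.tan(a)), y))
    return points


def opening(pixels, element):
    """Binary opening (erosion then dilation) on a set of coordinates."""
    eroded = {(x, y) for (x, y) in pixels
              if all((x + dx, y + dy) in pixels for dx, dy in element)}
    return {(x + dx, y + dy) for (x, y) in eroded for dx, dy in element}


def directional_opening(pixels, length, angles):
    """Eq. (9): union of openings with line elements at several angles."""
    result = set()
    for angle in angles:
        result |= opening(pixels, line_element(length, angle))
    return result


line = {(x, 0) for x in range(9)}                  # thin horizontal segment
blob = {(20, 20), (20, 21), (21, 20), (21, 21)}    # small non-road blob
kept = directional_opening(line | blob, 5, range(-90, 91, 15))
print(kept == line)  # True
```

The elongated segment survives because a line element fits inside it at one orientation, while the 2 × 2 blob is removed because no element fits at any orientation.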
2.4. Multiscale road centerlines extraction using Gabor filters and multiple directional non-maximum suppression

Road centerlines extraction has been an active research topic. To produce a smoother and more accurate road network, multiscale Gabor filters [32] [33] and multiple directional non-maximum suppression [38] are introduced.

A 2-D even Gabor function is a Gaussian modulated by a cosine, as the following equations illustrate:

$$h_{F,\theta}(x, y) = \frac{1}{2\pi\sigma^2} \exp\left\{ -\frac{x'^2 + y'^2}{2\sigma^2} \right\} \cos\left( 2\pi F x' \right) \tag{11}$$

$$(x', y') = (x\cos\theta + y\sin\theta,\ -x\sin\theta + y\cos\theta) \tag{12}$$

$$F = \frac{1}{2 \cdot \mathrm{width}} \tag{13}$$

$$\sigma = \sqrt{\frac{\ln 2}{2}}\ \frac{2^{B_F} + 1}{\pi F \left( 2^{B_F} - 1 \right)} \tag{14}$$

where $\theta$ and $F$ denote the orientation of the filter and the spatial center frequency, respectively, width represents the width of a region, $\sigma$ is the standard deviation of the Gaussian envelope, and $B_F$ is the spatial frequency bandwidth. Figure 6 illustrates the support area of a symmetrical Gabor filter.

Figure 6: A 2-D even Gabor filter in spatial domain at orientation θ = 0 rendered as (a) a 2D intensity map and (b) a 3D surface.
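The relations in Eqs. (11) to (14) can be evaluated directly. The sketch below computes $F$ and $\sigma$ from a road width and evaluates the kernel at single points; the function names are illustrative, and the default bandwidth of 1 is an assumed value. With $F = 1/(2 \cdot \mathrm{width})$ the cosine is positive for $|x'| < \mathrm{width}/2$, so the central positive lobe of the filter is exactly one road width wide.

```python
import math


def gabor_params(width, bandwidth=1.0):
    """Eqs. (13) and (14): center frequency F and envelope sigma."""
    freq = 1.0 / (2.0 * width)
    ratio = (2.0 ** bandwidth + 1.0) / (2.0 ** bandwidth - 1.0)
    sigma = math.sqrt(math.log(2.0) / 2.0) * ratio / (math.pi * freq)
    return freq, sigma


def even_gabor(x, y, width, theta, bandwidth=1.0):
    """Eqs. (11) and (12): value of the even Gabor kernel at (x, y)."""
    freq, sigma = gabor_params(width, bandwidth)
    xr = x * math.cos(theta) + y * math.sin(theta)
    yr = -x * math.sin(theta) + y * math.cos(theta)
    envelope = math.exp(-(xr * xr + yr * yr) / (2.0 * sigma ** 2))
    carrier = math.cos(2.0 * math.pi * freq * xr)
    return envelope * carrier / (2.0 * math.pi * sigma ** 2)


freq, sigma = gabor_params(width=9)
print(round(freq, 4), round(sigma, 2))
print(even_gabor(0, 0, 9, 0.0) > 0)    # True: positive at the road center
print(even_gabor(9, 0, 9, 0.0) < 0)    # True: negative one road width away
```

A full filter bank would sample this function on a grid for each orientation and width before convolving it with the road segmentation image.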
Centerlines extraction from binary images aims to obtain a maximum response map, which should have large values on centerlines and gradually decrease as the distance of a point to the centerlines increases. To make the road centerline positions have local maximum values, we convolve the road segmentation image with a bank of Gabor filters tuned to different orientations and frequencies, and choose the maximum response among the different scales and orientations. Thus, the response maps $I_{res}(x, y)$ are given as

$$I_{res}(x, y) = |I * h_{F,\theta}(x, y)| \tag{15}$$

where $I$ denotes the road segmentation image and $*$ denotes the convolution operation. After obtaining these convolution results, we take the maximum of the corresponding pixel values of the images $I_{res}(x, y)$ over all scales and orientations as

$$I_{max} = \max_{F,\theta} I_{res}(x, y) \tag{16}$$

where $I_{max}$ denotes the maximum response map.

Figure 7: The proposed road centerlines extraction method. (a) A binary image; (b) Maximum response map; (c) Final centerline result.

To get the road centerlines, multiple directional non-maximum suppression [38] is applied to the maximum response map. Non-maximum suppression is the process of setting to zero all pixels whose intensity is not maximal within a certain local neighbourhood. The shape of this local neighbourhood is usually a square or a rectangular window. We use eight linear windows at angles of 0°, 22.5°, 45°, 67.5°, 90°, 112.5°, 135° and 157.5°. As Figure 7 shows, our method can extract smooth and accurate centerlines.
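The last two steps can be sketched on small nested lists. How the eight directional results are combined is not described in this section, so the example assumes that a pixel is kept when it is a strict maximum along at least one linear window that lies fully inside the image; the window half-length of 2 and the function names are also assumptions.

```python
import math

ANGLES = (0, 22.5, 45, 67.5, 90, 112.5, 135, 157.5)


def max_response(responses):
    """Eq. (16): pixel-wise maximum of |response| over all filter outputs."""
    rows, cols = len(responses[0]), len(responses[0][0])
    return [[max(abs(m[r][c]) for m in responses) for c in range(cols)]
            for r in range(rows)]


def line_offsets(angle_deg, half):
    """(row, column) offsets of a linear window, without its center."""
    a = math.radians(angle_deg)
    return [(round(t * math.sin(a)), round(t * math.cos(a)))
            for t in range(-half, half + 1) if t != 0]


def directional_nms(resp, half=2, angles=ANGLES):
    """Keep pixels that are strict maxima along at least one direction."""
    rows, cols = len(resp), len(resp[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            v = resp[r][c]
            if v <= 0:
                continue
            for angle in angles:
                window = [(r + dr, c + dc) for dr, dc in line_offsets(angle, half)]
                if not all(0 <= rr < rows and 0 <= cc < cols for rr, cc in window):
                    continue  # skip windows that leave the image
                if all(v > resp[rr][cc] for rr, cc in window):
                    out[r][c] = 1
                    break
    return out


# A horizontal ridge: the response peaks on the middle row.
ridge = [[v] * 7 for v in (1, 2, 3, 2, 1)]
for row in directional_nms(ridge):
    print(row)
```

On this ridge only the middle row is marked, because each of its pixels exceeds its neighbours along the vertical window, while off-ridge pixels always have an equal or larger neighbour in every complete window.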
3. Experiments and analysis

Our experiments are performed on a PC with an i7-4790 3.6 GHz CPU and an Nvidia GeForce GTX Titan X GPU, using the MatConvNet toolbox. To demonstrate the performance of our approach, we conduct experiments on aerial images and discuss the results. For evaluation, we use two public datasets. One is the EPFL dataset, which was developed by Turetken et al. [39]; the other is the Massachusetts Roads dataset. Some images have complex backgrounds and occlusions by trees. Figure 8(a) shows an aerial image. The result of pixel-wise classification based on CNN is shown in Figure 8(b). Figure 8(c) shows the image after edge-preserving filtering. Figure 8(d) shows the result of applying a thresholding algorithm to Figure 8(c). Figure 8(e) shows the image after post-processing based on shape features. The road centerlines result is shown in Figure 8(f). In this experiment, we use even Gabor filters of twelve different orientations and five scales, width ∈ {9, 10, 11, 12, 13}, and $B_F$ is set to 1.

Figure 8: The result of each step. (a) Aerial image; (b) Road extraction result by CNN; (c) The image obtained by edge-preserving filtering; (d) Result by a thresholding algorithm; (e) Image after post-processing based on shape features; (f) Result of the proposed method.

We compare our classifier with an SVM to test the classification performance. For each window representing a road or non-road sample, histogram-based measures, namely mean, standard deviation, skew, energy, kurtosis and entropy, are computed to form an 18-D feature vector for training the SVM. For a fair comparison, the same training samples are used to train the SVM. Figure 9 shows the comparison results. Table 2 and Figure 10 present the corresponding classification accuracy. As can be seen, our method achieves higher classification accuracy than the SVM-based method, which uses the mean, standard deviation, skew, energy, kurtosis and entropy of the samples. A visual comparison reveals that the roads extracted by our method are more accurate than those extracted by the SVM. Although the initial road network is noisy and not entirely aligned with real road boundaries, these problems are addressed by the subsequent processing steps.
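The baseline feature vector can be sketched as follows. The paper lists six histogram-based measures and an 18-D vector; this example assumes that the six measures are computed separately for the three color channels of a window, and it uses common textbook definitions of energy (sum of squared histogram probabilities) and entropy (in bits). Both points are assumptions, not details given in the text.

```python
import math


def histogram_features(values, levels=256):
    """Mean, std, skew, energy, kurtosis and entropy of one channel.

    `values` is a flat list of integer intensities in [0, levels).
    """
    n = len(values)
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    p = [h / n for h in hist]                       # normalized histogram
    mean = sum(i * pi for i, pi in enumerate(p))
    var = sum((i - mean) ** 2 * pi for i, pi in enumerate(p))
    std = math.sqrt(var)
    if std > 0:
        skew = sum((i - mean) ** 3 * pi for i, pi in enumerate(p)) / std ** 3
        kurt = sum((i - mean) ** 4 * pi for i, pi in enumerate(p)) / std ** 4
    else:                                           # constant window
        skew, kurt = 0.0, 0.0
    energy = sum(pi * pi for pi in p)
    entropy = -sum(pi * math.log2(pi) for pi in p if pi > 0)
    return [mean, std, skew, energy, kurt, entropy]


def window_features(red, green, blue):
    """Concatenate the six measures of three channels into an 18-D vector."""
    return (histogram_features(red) + histogram_features(green)
            + histogram_features(blue))


channel = [0, 0, 255, 255]
print(histogram_features(channel))
print(len(window_features(channel, channel, channel)))  # 18
```

Such a vector would then be passed to any SVM implementation; the classifier itself is outside the scope of this sketch.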
Figure 9: (a) (d) Original images; (b) (e) Road extraction results by SVM; (c) (f) Road extraction results by the pixel-wise classification with CNN.

Table 2: Comparison of the classification accuracy of SVM and our method.

Experiment     Methods       Classification accuracy
Figure 9(a)    SVM           78.13%
               Our method    88.70%
Figure 9(d)    SVM           75.85%
               Our method    88.23%

Figure 10: The classification accuracy of SVM and our method.

To quantify the performance, we use the following three accuracy measures proposed by Wiedemann et al. [40]:

$$\text{Completeness} = \frac{TP}{TP + FN} \tag{17}$$

$$\text{Correctness} = \frac{TP}{TP + FP} \tag{18}$$

$$\text{Quality} = \frac{TP}{TP + FP + FN} \tag{19}$$

where $TP$ is the number of road pixels obtained by an extraction algorithm that coincide with the reference data, $FP$ is the number of obtained road pixels that are not in the reference data, and $FN$ is the number of road pixels that are in the reference data but not in the obtained result.
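The three measures are straightforward to compute from pixel counts. The sketch below also uses an identity that follows from Eqs. (17) to (19), $1/\text{Quality} = 1/\text{Completeness} + 1/\text{Correctness} - 1$, which makes it possible to cross-check reported values. The function names are illustrative.

```python
def road_metrics(tp, fp, fn):
    """Eqs. (17) to (19) from true positive, false positive and false negative counts."""
    completeness = tp / (tp + fn)
    correctness = tp / (tp + fp)
    quality = tp / (tp + fp + fn)
    return completeness, correctness, quality


def quality_from(completeness, correctness):
    """Quality implied by completeness and correctness."""
    return 1.0 / (1.0 / completeness + 1.0 / correctness - 1.0)


comp, corr, qual = road_metrics(tp=80, fp=10, fn=20)
print(round(comp, 4), round(corr, 4), round(qual, 4))   # 0.8 0.8889 0.7273
print(round(quality_from(comp, corr), 4))               # 0.7273

# Cross-check with the values reported for Shao's method in Table 3
# (completeness 91.78%, correctness 84.61%, quality 78.65%).
print(round(quality_from(0.9178, 0.8461), 4))           # 0.7865
```

The reported quality for that row agrees with the value implied by its completeness and correctness, which is a quick sanity check on the table.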

Figure 11: Comparison with different methods ([2], [39], [9] and [41]) on aerial images. (a1) and (a2) Aerial images; (b1) and (b2) Ground truth; (c1) and (c2) Results of Shao’s method [2]; (d1) and (d2) Results of Sironi’s method [39]; (e1) and (e2) Results of Shi’s method [9]; (f1) and (f2) Results of Liu’s method [41]; (g1) and (g2) Results of the proposed method; (h1) and (h2) Magnified images of the sub-regions obtained by different methods.

Figure 12: Comparison with different methods ([2], [39], [9] and [41]) on aerial images. (a3), (a4), (a5) and (a6) Aerial images; (b3), (b4), (b5) and (b6) Ground truth; (c3), (c4), (c5) and (c6) Results of Shao’s method [2]; (d3), (d4), (d5) and (d6) Results of Sironi’s method [39]; (e3), (e4), (e5) and (e6) Results of Shi’s method [9]; (f3), (f4), (f5) and (f6) Results of Liu’s method [41]; (g3), (g4), (g5) and (g6) Results of the proposed method.

To verify the performance, we have compared the proposed algorithm with
four existing road extraction methods from the literature. These four methods
were introduced by Shao et al. [2], Sironi et al. [39], Shi et al. [9] and Liu et
al. [41]. Figure 11 and Figure 12 give the comparison results of the different road
extraction methods. A visual comparison reveals that the completeness and
correctness of the proposed method are superior to those of the other
methods. To evaluate these methods quantitatively, the completeness, correctness,
and quality of each method are computed. The results are given in Table
3 and Figure 13. As can be seen, for some images with complex backgrounds,
the completeness of the roads extracted by Sironi’s method is higher than ours.
However, the road centerlines extracted by the proposed method are, in most
cases, more correct than those of the other methods. Thus the proposed method
achieves the highest Quality values, where Quality is an overall evaluation index
that combines completeness and correctness into a more general measure of the
final result. Shao’s method gets the lowest Quality values because it emphasizes
the speed of the algorithm but disregards its effectiveness. Besides, from the
magnified images in Figure 11 we can see that the centerlines extracted by our
method have less noise. Overall, our method performs well in centerline
extraction from high-resolution aerial imagery.
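The three measures used above can be sketched in a few lines. This is a minimal illustration that assumes the usual definitions from the road-evaluation literature (cf. [40]): completeness = TP / (TP + FN), correctness = TP / (TP + FP) and quality = TP / (TP + FP + FN), where TP, FP and FN are matched, spurious and missed centerline lengths (or pixel counts). The function name `centerline_metrics` and the example counts are hypothetical and are not taken from the experiments.

```python
def centerline_metrics(tp: float, fp: float, fn: float) -> dict:
    """Return completeness, correctness and quality as fractions in [0, 1].

    tp: extracted centerline that matches the reference (length or pixel count)
    fp: extracted centerline with no matching reference
    fn: reference centerline that was not extracted
    """
    if tp <= 0:
        raise ValueError("tp must be positive")
    return {
        "completeness": tp / (tp + fn),
        "correctness": tp / (tp + fp),
        "quality": tp / (tp + fp + fn),
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
scores = centerline_metrics(tp=900, fp=50, fn=100)
for name, value in scores.items():
    print(f"{name}: {value:.2%}")
```

Because quality penalizes both missed and spurious centerlines, a method with very high completeness but low correctness can still score a lower quality than a more balanced method.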

To further verify the robustness of the proposed algorithm for different complex
backgrounds, we chose two images from the Massachusetts dataset. The size of
each of these two images, shown in Figure 14 (a1) and (a2), is 1500 × 1500 pixels.
Both have complex backgrounds. Figure 14 (b1) and (b2) show the ground truth
segmentation, and Figure 14 (c1) and (c2) show the roads extracted by our
method. One can see that our method obtains an accurate and complete road
network.
A CNN consists of convolutional layers, pooling layers and fully connected
layers. The pooling and fully connected layers often take only 5-10%
of the computational time [42]; most of the computational time is consumed
by the convolutional layers. As shown in [42], the total time complexity of all


Table 3: The quantitative evaluation results of various methods.

Experiment      Method                 Completeness   Correctness   Quality

Figure 11(a1)   Shao’s method [2]      91.78%         84.61%        78.65%
                Sironi’s method [39]   98.86%         92.08%        91.12%
                Shi’s method [9]       94.69%         92.88%        88.28%
                Liu’s method [41]      95.46%         95.92%        91.74%
                Proposed method        96.23%         95.70%        92.24%

Figure 11(a2)   Shao’s method [2]      93.89%         82.89%        78.65%
                Sironi’s method [39]   98.15%         87.96%        86.53%
                Shi’s method [9]       87.56%         90.68%        80.33%
                Liu’s method [41]      90.84%         90.79%        83.18%
                Proposed method        94.12%         91.77%        86.79%

Figure 12(a3)   Shao’s method [2]      82.60%         65.33%        57.43%
                Sironi’s method [39]   87.67%         74.96%        67.81%
                Shi’s method [9]       83.90%         94.40%        80.0%
                Liu’s method [41]      88.20%         95.60%        84.80%
                Proposed method        93.91%         92.72%        87.46%

Figure 12(a4)   Shao’s method [2]      89.28%         54.09%        50.79%
                Sironi’s method [39]   85.92%         76.88%        68.28%
                Shi’s method [9]       89.68%         81.65%        74.63%
                Liu’s method [41]      86.72%         78.34%        69.95%
                Proposed method        95.92%         84.80%        81.85%

Figure 12(a5)   Shao’s method [2]      85.96%         57.53%        52.59%
                Sironi’s method [39]   92.77%         68.79%        65.29%
                Shi’s method [9]       88.20%         82.69%        74.46%
                Liu’s method [41]      82.58%         81.74%        69.72%
                Proposed method        95.58%         83.04%        79.97%

Figure 12(a6)   Shao’s method [2]      85.43%         69.26%        61.94%
                Sironi’s method [39]   96.75%         70.27%        68.65%
                Shi’s method [9]       89.88%         86.66%        78.95%
                Liu’s method [41]      88.08%         84.37%        75.72%
                Proposed method        96.65%         91.84%        89.00%
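As a reading aid for Table 3, the three columns are not independent. Assuming (this is an assumption, not a statement from the evaluation protocol) that all three measures are computed from a single set of counts TP, FP and FN, with completeness = TP / (TP + FN), correctness = TP / (TP + FP) and quality = TP / (TP + FP + FN), quality follows from the other two:

```latex
\[
\frac{1}{\text{Quality}}
  = \frac{TP + FP + FN}{TP}
  = \frac{TP + FN}{TP} + \frac{TP + FP}{TP} - 1
  = \frac{1}{\text{Completeness}} + \frac{1}{\text{Correctness}} - 1 .
\]
% Worked check, proposed method on Figure 11(a1):
\[
\frac{1}{0.9623} + \frac{1}{0.9570} - 1 \approx 1.0841,
\qquad
\text{Quality} \approx \frac{1}{1.0841} \approx 0.9224 .
\]
```

The worked value agrees with the 92.24% listed for that row. If the matched lengths used for completeness and correctness differ, as they can in buffer-based matching, the identity holds only approximately.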


Figure 13: The visualization of quantitative evaluation results.



convolutional layers is

$$O\left(\sum_{l=1}^{d} t_{l-1} \cdot v_l^{2} \cdot t_l \cdot m_l^{2}\right) \qquad (20)$$

where $l$ is the index of the convolutional layer, and $d$ is the depth (the number
of convolutional layers). $t_l$ is the number of filters in the $l$th layer, and $t_{l-1}$ is
the number of input channels of the $l$th layer, that is, the number of filters in
the $(l-1)$th layer. Moreover, $v_l$ and $m_l$ are the spatial sizes (side lengths) of
the filter and of the output feature map, respectively.
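To make Eq. (20) concrete, the sum can be evaluated for a small network. The sketch below is illustrative only: the `ConvLayer` class, the function name and the two-layer configuration are hypothetical and do not describe the architecture used in this paper. Only the 31 × 31 input patch size is borrowed from the text, so a 5 × 5 filter without padding yields a 27 × 27 output map.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConvLayer:
    in_channels: int  # t_{l-1}: number of input channels of layer l
    filter_size: int  # v_l: side length of the square filter
    num_filters: int  # t_l: number of filters in layer l
    out_size: int     # m_l: side length of the square output feature map


def conv_time_complexity(layers: Sequence[ConvLayer]) -> int:
    """Evaluate the sum inside O(.) of Eq. (20), in multiply-accumulate units."""
    return sum(
        layer.in_channels * layer.filter_size ** 2
        * layer.num_filters * layer.out_size ** 2
        for layer in layers
    )


# Hypothetical two-layer configuration (not the network from the paper).
layers = [
    ConvLayer(in_channels=3, filter_size=5, num_filters=32, out_size=27),
    ConvLayer(in_channels=32, filter_size=3, num_filters=64, out_size=25),
]
total = conv_time_complexity(layers)
print(f"total multiply-accumulates per patch: {total:,}")
```

In this toy configuration the second layer dominates the cost even though its filters are smaller, because the product of input channels and filter count grows faster than the feature map shrinks.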
In fact, our proposed approach mainly consists of four parts: pixel-wise
classification with the CNN, edge-preserving filtering, post-processing and
multiscale centerline extraction. Hence, the computational complexity of our
proposed approach is the sum of the computational complexities of the four parts.
Although the proposed approach has high computational complexity, it achieves
good performance, and its promising results make it worth further study.

4. Conclusion

In this paper, we present a method to extract road centerlines from
high-resolution aerial images accurately. The pixel-wise classification map is
obtained by the CNN. Edge-preserving filtering is used to optimize the
pixel-wise classification map. Shape feature filtering, multidirectional
morphological filtering and hole filling are used to improve the reliability of
road extraction. The centerlines are extracted by multiscale Gabor filters and
multiple directional non-maximum suppression. Experimental results show
the effectiveness of the proposed method. However, the proposed method still
has several flaws that require further research. Its main limitation is that
some of the extracted centerlines are not single-pixel wide; thus a more
accurate method needs to be studied.

Acknowledgment

The work was jointly supported by the National Natural Science Foundation
of China under grant Nos. 61772396, 61472302, 61772392 and 61672409,
the Fundamental Research Funds for the Central Universities under grant Nos.
JB170306 and JB170304, the Natural Science Foundation of Hebei Province of
China under grant No. F2018203096, and the China Postdoctoral Science
Foundation under grant No. 2017M611188.

Figure 14: Road area extraction on the Massachusetts Roads dataset. (a1) and (a2) Two
test areas; (b1) and (b2) Ground truth; (c1) and (c2) Corresponding road extraction results
produced by the proposed method.

References

[1] A. Baumgartner, C. Steger, H. Mayer, W. Eckstein, H. Ebner, Automatic
road extraction based on multi-scale, grouping, and context, Photogrammetric
Engineering and Remote Sensing 65 (1999) 777–786.

[2] Y. Shao, B. Guo, X. Hu, L. Di, Application of a fast linear feature detector
to road extraction from remotely sensed imagery, IEEE Journal of Selected
Topics in Applied Earth Observations and Remote Sensing 4 (3) (2011)
626–631.

[3] N. Yager, A. Sowmya, Support vector machines for road extraction from
remotely sensed images, in: International Conference on Computer Analysis
of Images and Patterns, Springer, 2003, pp. 285–292.

[4] M. Song, D. Civco, Road extraction using SVM and image segmentation,
Photogrammetric Engineering & Remote Sensing 70 (12) (2004) 1365–1371.

[5] Q. Zhang, I. Couloigner, Benefit of the angular texture signature for the
separation of parking lots and roads on high resolution multi-spectral
imagery, Pattern Recognition Letters 27 (9) (2006) 937–946.

[6] X. Huang, L. Zhang, Road centreline extraction from high-resolution
imagery based on multiscale structural features and support vector machines,
International Journal of Remote Sensing 30 (8) (2009) 1977–1987.

[7] S. Das, T. Mirnalinee, K. Varghese, Use of salient features for the design
of a multistage framework to extract roads from high-resolution multispectral
satellite images, IEEE Transactions on Geoscience and Remote Sensing
49 (10) (2011) 3906–3931.

[8] J. D. Wegner, J. A. Montoya-Zegarra, K. Schindler, A higher-order CRF
model for road network extraction, in: Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, 2013, pp. 1698–1705.

[9] W. Shi, Z. Miao, J. Debayle, An integrated method for urban main-road
centerline extraction from optical remotely sensed imagery, IEEE Transactions
on Geoscience and Remote Sensing 52 (6) (2014) 3359–3372.

[10] J. Xu, R. Wang, S. Yue, Bio-inspired classifier for road extraction from
remote sensing imagery, Journal of Applied Remote Sensing 8 (1) (2014)
5946–5957.

[11] Z. Miao, W. Shi, P. Gamba, Z. Li, An object-based method for road network
extraction in VHR satellite images, IEEE Journal of Selected Topics in
Applied Earth Observations and Remote Sensing 8 (10) (2015) 4853–4862.

[12] G. Cheng, F. Zhu, S. Xiang, C. Pan, Road centerline extraction via
semisupervised segmentation and multidirection nonmaximum suppression, IEEE
Geoscience and Remote Sensing Letters 13 (4) (2016) 545–549.

[13] Z. Miao, W. Shi, A. Samat, G. Lisini, P. Gamba, Information fusion for
urban road extraction from VHR optical satellite images, IEEE Journal of
Selected Topics in Applied Earth Observations and Remote Sensing 9 (5)
(2016) 1817–1829.

[14] M. Maboudi, J. Amini, M. Hahn, M. Saati, Road network extraction from
VHR satellite images using context aware object feature integration and
tensor voting, Remote Sensing 8 (8) (2016) 637.

[15] M. Maboudi, J. Amini, M. Hahn, M. Saati, Object-based road extraction
from satellite images using ant colony optimization, International Journal
of Remote Sensing 38 (1) (2017) 179–198.

[16] Y. Wei, Z. Wang, M. Xu, Road structure refined CNN for road extraction in
aerial image, IEEE Geoscience and Remote Sensing Letters 14 (5) (2017)
709–713.

[17] W. Liu, Z. Zhang, X. Chen, S. Li, Y. Zhou, Dictionary learning-based Hough
transform for road detection in multispectral image, IEEE Geoscience and
Remote Sensing Letters 14 (12) (2017) 2330–2334.

[18] R. Alshehhi, P. R. Marpu, Hierarchical graph-based segmentation for
extracting road networks from high-resolution satellite images, ISPRS Journal
of Photogrammetry and Remote Sensing 126 (2017) 245–260.

[19] A. Abdollahi, H. R. R. Bakhtiari, M. P. Nejad, Investigation of SVM and
level set interactive methods for road extraction from Google Earth images,
Journal of the Indian Society of Remote Sensing (2017) 1–8.

[20] Y. Zang, C. Wang, Y. Yu, L. Luo, K. Yang, J. Li, Joint enhancing filtering
for road network extraction, IEEE Transactions on Geoscience and Remote
Sensing 55 (3) (2017) 1511–1525.

[21] J. Li, Q. Hu, M. Ai, Unsupervised road extraction via a Gaussian mixture
model with object-based features, International Journal of Remote Sensing
39 (8) (2018) 2421–2440.

[22] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, T. Darrell,
DeCAF: A deep convolutional activation feature for generic visual
recognition, in: ICML, 2014, pp. 647–655.

[23] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for
accurate object detection and semantic segmentation, in: Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp.
580–587.

[24] C. Nebauer, Evaluation of convolutional neural networks for visual
recognition, IEEE Transactions on Neural Networks 9 (4) (1998) 685–696.

[25] Y. LeCun, K. Kavukcuoglu, C. Farabet, et al., Convolutional networks and
applications in vision, in: ISCAS, 2010, pp. 253–256.

[26] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification with
deep convolutional neural networks, in: Advances in Neural Information
Processing Systems, 2012, pp. 1097–1105.

[27] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015)
436–444.

[28] G. Li, Y. Yu, Visual saliency based on multiscale deep features, in:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition,
2015, pp. 5455–5463.

[29] X. Kang, S. Li, J. A. Benediktsson, Spectral–spatial hyperspectral image
classification with edge-preserving filtering, IEEE Transactions on
Geoscience and Remote Sensing 52 (5) (2014) 2666–2677.

[30] C. Poullis, S. You, Delineation and geometric modeling of road networks,
ISPRS Journal of Photogrammetry and Remote Sensing 65 (2) (2010) 165–181.

[31] Z. Miao, W. Shi, H. Zhang, X. Wang, Road centerline extraction from
high-resolution imagery based on shape features and multivariate adaptive
regression splines, IEEE Geoscience and Remote Sensing Letters 10 (3) (2013)
583–587.

[32] N. Sang, Q. Tang, X. Liu, W. Weng, Multiscale centerline extraction of
angiogram vessels using Gabor filters, in: International Conference on
Computational and Information Science, Springer, 2004, pp. 570–575.

[33] Z. Cao, X. Liu, B. Peng, Y.-S. Moon, DSA image registration based on
multiscale Gabor filters and mutual information, in: 2005 IEEE International
Conference on Information Acquisition, IEEE, 2005, pp. 6–pp.

[34] P. Sermanet, S. Chintala, Y. LeCun, Convolutional neural networks applied
to house numbers digit classification, in: Pattern Recognition (ICPR), 2012
21st International Conference on, IEEE, 2012, pp. 3288–3291.

[35] K. He, J. Sun, X. Tang, Guided image filtering, in: European Conference
on Computer Vision, Springer, 2010, pp. 1–14.

[36] P. P. Singh, R. Garg, A two-stage framework for road extraction from
high-resolution satellite images by using prominent features of impervious
surfaces, International Journal of Remote Sensing 35 (24) (2014) 8074–8107.

[37] T. Chao, T. Yihua, C. Huajie, et al., Object-oriented method of hierarchical
urban building extraction from high resolution remote sensing imagery,
Acta Geodaetica et Cartographica Sinica 39 (1) (2010) 39–45.

[38] C. Sun, P. Vallotton, Fast linear feature detection using multiple directional
non-maximum suppression, Journal of Microscopy 234 (2) (2009) 147–157.

[39] A. Sironi, V. Lepetit, P. Fua, Multiscale centerline detection by learning
a scale-space distance transform, in: 2014 IEEE Conference on Computer
Vision and Pattern Recognition, IEEE, 2014, pp. 2697–2704.

[40] C. Wiedemann, C. Heipke, H. Mayer, O. Jamet, Empirical evaluation of
automatically extracted road axes, Empirical Evaluation Techniques in
Computer Vision (1998) 172–187.

[41] R. Liu, Q. Miao, B. Huang, J. Song, J. Debayle, Improved road centerlines
extraction in high-resolution remote sensing images using shear transform,
directional morphological filtering and enhanced broken lines connection,
Journal of Visual Communication and Image Representation 40 (2016) 300–311.

[42] K. He, J. Sun, Convolutional neural networks at constrained time cost,
in: Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition, 2015, pp. 5353–5360.

Biography

Ruyi Liu received the B.S. degree from Shaanxi Normal University, Shaanxi,
China, in 2012. She is currently working toward the Ph.D. degree in the School
of Computer Science and Technology, Xidian University, Shaanxi, China. Her
current interests include image classification and segmentation, and computer
vision methods with applications in remote sensing. Email: ruyi198901210121@126.com

Qiguang Miao received the M.Eng. and doctoral degrees in Computer Science
from Xidian University, China. He is currently a professor at the
School of Computer Science, Xidian University. His research interests include
intelligent information processing, intelligent image processing, and
multiscale geometric representations for images. Email: qgmiao@126.com

Jianfeng Song received his B.E. and M.Eng. degrees in Computer Application
Technology from Xidian University, China. He is currently a
lecturer at the School of Computer Science and Technology, Xidian University.
His research interests include intelligent image processing and system
security. Email: jfsong@mail.xidian.edu.cn

Yining Quan is an associate professor and graduate student supervisor at the
School of Computer Science and Technology i… received his doctor degree in
cryptology from Xidian University in 2010. His research interests include
network computing and security. Email: ynquan@xidian.edu.cn

Yunan Li received the B.S. degree from the School of Computer Science and
Technology, Xidian University, Xi’an, China, in 2014. He is currently working
toward the Ph.D. degree at Xidian University. His research interests include
pattern recognition and digital image processing. Email: xdfzliyunan@163.com

Pengfei Xu is a lecturer at the School of Information Science and Technology,
Northwest University, China. His main research interests include image
processing and pattern recognition. Email: pfxu@nwu.edu.cn