
Automation in Construction 137 (2022) 104214


Vision-based navigation planning for autonomous post-earthquake inspection of reinforced concrete railway viaducts using unmanned aerial vehicles

Yasutaka Narazaki a,*, Vedhus Hoskere b, Girish Chowdhary c, Billie F. Spencer Jr. d

a Zhejiang University/University of Illinois at Urbana-Champaign Institute, Zhejiang University, China
b Department of Civil and Environmental Engineering, University of Houston, USA
c Coordinated Science Laboratory, University of Illinois at Urbana-Champaign, USA
d Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, USA

ARTICLE INFO

Keywords:
Post-earthquake inspection
Autonomous structural inspection
Unmanned Aerial Vehicles
Online structural component recognition
Semantic segmentation
Path planning

ABSTRACT

This research proposes an approach for vision-based autonomous navigation planning of unmanned aerial vehicles (UAVs) for the collection of images suitable for the rapid post-earthquake inspection of reinforced concrete railway viaducts. The proposed approach automatically recognizes and localizes critical structural components, columns in this case, and determines appropriate viewpoints for inspection relative to the identified components. Structural component recognition and localization are formulated as the online detection of rectangular prismatic shapes from parsed sparse point-cloud data, where prior knowledge of the target structural system is incorporated. The proposed approach is tested in a synthetic environment representing Japanese high-speed railway viaducts. First, the ability to detect the columns of the target viaduct is assessed. The results show that the columns are detected completely and robustly, with centimeter-level accuracy. Subsequently, the entire approach is demonstrated in the synthetic environment, showing the significant potential of collecting high-quality images for post-earthquake structural inspection efficiently.

1. Introduction

Rapid post-earthquake structural inspection is critical during the initial phase of disaster response to minimize the negative impact on people's lives and businesses. Moreover, for a large and catastrophic earthquake, the first 72 h are known to be crucial for rescue activities, as well as for maintaining the supply of critical goods and services, such as water, power, and food [1]. On the other hand, human resources are often limited immediately after a major earthquake, due to damage to homes, transportation systems, etc. For example, during the first, second, and third days after the Great Hanshin Earthquake (also called the Kobe Earthquake, 1/17/1995, M7.2), only 41%, 60%, and 70%, respectively, of the Kobe Government staff was available [2]. These limitations can significantly hinder the structural inspections required for effective planning and execution of emergency response activities after an earthquake.

Strong demand for rapid post-earthquake structural inspection also exists when the ground motion is not intense enough to cause severe structural damage. For example, damage caused by the Shizuoka Earthquake (8/11/2009, M6.5) was limited to nonstructural components. However, railway operations in the affected area were shut down for 3-5 h to allow the rail infrastructure to be inspected. As a result, 31 Shinkansen trains (Japanese bullet trains) were canceled, and 99 Shinkansen trains were delayed more than four hours [3]. These cancellations and delays hindered people's social and economic activities. Additionally, the Central Japan Railway Company suffered financial losses due to the need to refund the cost of special express tickets. Another example that highlights the need for rapid post-earthquake structural inspection is the initial response in the Tokyo metropolitan area to the Great East Japan Earthquake (also called the Tohoku Earthquake, 3/11/2011, M9.0). The Tokyo metropolitan area is about 400 km from the epicenter, allowing the ground motion to attenuate substantially before it arrived. Nonetheless, major railway lines in the area were shut down for 16 to 18 h. The main problems limiting the availability of inspectors and their access to the target structures were traffic congestion and communication failure [4].

* Corresponding author. E-mail address: narazaki@intl.zju.edu.cn (Y. Narazaki).

https://doi.org/10.1016/j.autcon.2022.104214
Received 10 September 2021; Received in revised form 25 January 2022; Accepted 13 March 2022; Available online 30 March 2022

Because transportation in the Tokyo metropolitan area depends heavily on the railway systems, more than 5 million people were stranded in the area and unable to return home on the day of the earthquake [2]. To alleviate these problems, post-earthquake structural inspection needs to be improved to rapidly collect information about structural damage with an increased level of automation.

The use of robotic platforms, including Unmanned Aerial Vehicles (UAVs), is seeing widespread adoption for structural inspections of diverse structures. Earlier work focused on collecting images using UAVs operated manually and remotely, and assessing the data visually [5-8]. As robot-assisted inspection has become more feasible, researchers have started investigating post-processing methods that can extract information about the geometry of structures and their conditions from images collected using such platforms. This category of research includes the investigation of 3D reconstruction methods [9-12], target surface recognition and localization [13-16], structural damage identification methods [17-22], and their combinations [23-26]. Promising results from those research efforts push the need for automation in one of the most time-consuming parts of post-earthquake structural inspection: accessing appropriate viewpoints to inspect critical structural components and collecting images suitable for such post-processing methods, particularly in or near the GPS-denied environments typically encountered during civil structural inspections.

Table 1. Summary of robotic navigation planning approaches for autonomous structural inspection.
- Description. Model-based: the navigation path is determined based on models, such as pre-built 3D maps and BIMs. Logic-based: the navigation path is determined based on on-site robot perception.
- Potential. Model-based: detailed scans with high repeatability; technically more established for routine inspection scenarios (e.g., use of Skydio drones). Logic-based: "almost behavioral" inspection with high reactivity to environmental changes; limited dependence on data communication with the central server.
- Challenge. Model-based: an accurate (up-to-date) model should be available to the robot during inspection, and the model should be aligned accurately to the current 3D map. Logic-based: appropriate on-site data processing methods need to be developed and implemented.
- Suitable scenarios. Model-based: routine inspections (with models); post-disaster inspections of singular structures (with models). Logic-based: inspection of structures without models; rapid post-disaster inspections of relatively simple structures distributed over the entire affected area.


Investigation of autonomous robotic navigation for structural inspection is relatively limited in the literature. Well-established robot navigation approaches based on predetermined GPS-based waypoints are not always straightforward to apply for rapidly inspecting structures in an entire city or region; the waypoints would have to be designed for each of the structures in the city/region, taking into consideration the GPS signal strength and potential changes in the structure and surrounding environment. Failure to do so could result in waypoints or the connecting trajectories intersecting with obstacles. These problems have motivated researchers to propose reactive automated techniques to explore and collect information in damaged areas.

The use of autonomous robots to create a complete map of an unstructured environment (the exploration problem) has been investigated extensively [27-31]. Some of those approaches have led to commercial inspection robots adopted by many asset managers (e.g., the Skydio 2 has enabled autonomous demand-driven vision-based navigation around structures in GPS-denied environments [32]). However, these approaches are not directly applicable to post-earthquake inspection, because a complete map of the environment is not the final outcome being sought; rather, what is needed is the collection of close-up images of critical structural components, balancing the required image quality and rapidity (similar discussions have been made for visual recognition for structural condition assessment applied to the collected images [33], and are further extended to the robotic navigation context herein). More specifically, post-earthquake inspection requires that: (i) the robot be aware of the semantics of the environment and determine which parts of the structure need images (e.g., the robot can skip inspection of unselected components), and (ii) image quality be determined based on inspection needs, rather than by the creation of an accurate and complete point cloud model (e.g., in some cases, a single high-quality image is sufficient to determine critical structural damage to the surface within the frame, even without a large amount of overlap).

To investigate autonomous robotic navigation for structural inspection and other related scenarios, several different problem formulations have been proposed. One approach uses accurate pre-built maps of the environment to define the paths. Sheng et al. [34] investigated aircraft rivet inspection using an autonomous crawler robot that visits all target rivets by following a path defined with respect to a Computer-Aided Design (CAD) model. Ibrahim et al. [35] investigated indoor construction progress monitoring using a Building Information Model (BIM)-driven autonomous Unmanned Ground Vehicle (UGV), where the model of a building is aligned to the map created by the robot navigating in that building, and the robot is then directed to follow the path defined using the model. Mansouri et al. extended such model-based path planning to scenarios with multiple UAVs cooperating with each other (cooperative coverage path planning) [36]. With an accurate model (or pre-built map) of the target structure and localization of the robots relative to the available model, those approaches have demonstrated significant potential and flexibility for performing detailed inspection tasks.

Another approach for autonomous structural inspection is to encode the logic that characterizes the desired tasks. Stokkeland et al. [37] proposed an autonomous navigation approach for the UAV-based inspection of wind turbines, where a UAV is directed to recognize the shaft and blades using machine vision techniques (Hough transform and post-processing using prior knowledge about the angles of/between lines), and then track the blades using a Kalman filter. Hui et al. [38] proposed a vision-based UAV navigation approach for power line inspection, where the transmission tower is detected using the Faster R-CNN object detection algorithm [39], with the power lines being recognized as parallel lines. The UAV is then directed to point the camera at the tower center, and flies along a path parallel to the power lines toward their vanishing point. Higuti et al. [40] proposed a Light Detection and Ranging (LiDAR)-based autonomous navigation approach using a small ground robot to perform phenotyping by following the rows of canopy-like spaces between crop lines in agricultural fields. To deal with the noisy measurement data caused by frequent occlusions by hanging leaves, the estimates of the crop lines are refined based on prior knowledge about the line width, robot width, etc. These logic-based, almost behavioral, approaches do not need complete pre-built maps of the environment and tend to have simpler initialization steps (complete models are not aligned explicitly), while the perception and path planning steps are versatile and dependent on the specific application scenario. Consequently, new sets of logic need to be developed to extend those approaches to the problem of autonomous rapid post-earthquake structural inspection. Properties of the different robotic navigation approaches for autonomous structural inspection are summarized in Table 1.

In this context, the authors envision an autonomous system that mimics the logic followed by human inspectors during the post-earthquake assessment of railway viaducts. First, an inspector obtains an initial understanding of the target viaduct by walking nearby. In this step, the inspector does not need to understand the entire structure. Instead, the inspector needs to obtain rough information about the target structure, as well as a clear idea of where the first few critical structural components are located.


Fig. 1. Roadmap for the autonomous post-earthquake structural inspection system. The first two blocks, "conceptual and theoretical development" and "prototype development in synthetic environments," are investigated in this paper.

After the initiation of the task, the inspector's understanding of the target structure is improved incrementally and continuously during the inspection. The inspector does not need access to the complete design drawings of the structure to perform this task. Rather, prior knowledge is required regarding the typical geometric or visual patterns of the structure (e.g., regular grid patterns of the column locations, typical dimensions of the columns, etc.). After this process, the inspector has taken a detailed look at all structurally important parts of the viaduct, based on which the structural conditions are evaluated.

A roadmap for realizing the envisioned autonomous system for UAV-based rapid post-earthquake structural inspection is shown in Fig. 1. The roadmap is divided into three parts: conceptual and theoretical development, prototype development in synthetic environments, and development and validation in experimental/field environments. In the conceptual and theoretical development, the high-level description of the logic of the human inspectors should be converted into technical and mathematical descriptions to enable software implementations that can be readily combined with other generic components of autonomous navigation systems. Then, the system should be investigated in synthetic environments. Synthetic environments are suitable during preliminary development stages because of their capability to support extensive tests under controlled conditions without actual safety risks. Once the autonomous system (software) works successfully in synthetic environments, a prototype system should be developed and tested in experimental and field environments. In this step, the prototype high-level navigation planning system should be integrated with generic components of autonomous navigation, such as state-of-the-art mapping algorithms and local path planning (including obstacle avoidance). Hardware implementation of the combined system should then be investigated. Finally, the system should be validated in field environments, where the robustness should be improved to handle various uncertainties that have not been considered in the synthetic and experimental environments. This roadmap is based on the previous work by the authors on synthetic environments that represent post-earthquake scenarios of Japanese high-speed railway (Tokaido Shinkansen) viaducts [25]. In that research, 2000 viaducts were generated by following the actual design procedures adopted by the Tokaido Shinkansen, providing a relevant platform for investigating autonomous systems for the post-earthquake structural inspection problem. Supported by the work to date and the vision for the future, this research investigates the first two parts of the roadmap: conceptual and theoretical development, and prototype development in synthetic environments.

This research proposes an approach for vision-based autonomous UAV navigation planning for rapid post-earthquake inspection of reinforced concrete railway viaducts. The proposed approach automatically recognizes and localizes critical structural components, columns in this case, and determines appropriate viewpoints for inspection relative to the identified components. To date, structural component recognition and localization have been investigated as 2D semantic segmentation [15,25] or off-line dense point cloud segmentation [16,41] problems. In contrast, the approach presented in this research identifies rectangular prismatic shapes (not just segmented points) representing bridge columns (i) using sparse point cloud data, (ii) by online processing, (iii) that is aware of the key characteristics of the target structure. The first point, the use of sparse point cloud data, is critical for this research, because accurate dense point cloud data is typically not available in robotic navigation scenarios. The second point, online processing, is also appropriate for the application context: the system does not have to wait until it obtains complete information about the target structure. Instead, the system can recognize some of the structural components from partial point cloud data of the structure and start planning the waypoints quickly. The last point, awareness of the characteristics of the structure, is meant to mimic the prior knowledge of the human inspector. This research encodes typical geometric patterns of railway viaducts to improve the column recognition results, as well as to infer column information when insufficient or no 3D points are associated with those columns. Following the conceptual and theoretical development of the proposed autonomous navigation planning approach, a prototype system is developed and tested in a synthetic environment representing Japanese high-speed railway viaducts.

This paper first defines the autonomous UAV navigation planning problem investigated in this research, including the associated assumptions. Then, the algorithmic components that embody the autonomous navigation planning approach are discussed. Finally, validation and demonstration are performed in the synthetic environment of Japanese high-speed railway viaducts. This research makes a key step toward realizing autonomous UAV navigation for rapid post-earthquake structural inspection.


2. Problem statement

Based on the high-level description of the desired behavior of the robot discussed in the introduction, this research investigates the following problem:

Given an image stream from a monocular camera, incrementally determine waypoints that a UAV (or another robot that navigates in 3D) can follow to collect close-up images of the critical structural components for rapid post-earthquake inspection.

This system does not replace generic components of robotic platforms, such as odometry, motion control, obstacle avoidance, and local path planning given the waypoints. Rather, the system supplements the existing generic systems and provides directions that such systems can execute to perform post-earthquake inspection. With this consideration, this research focuses on the formulation that is specific to the post-earthquake structural inspection problem and simplifies the problem for those generic components.

The assumptions for this research are as follows:

Assumption 1 (Target structure): This research investigates rapid post-earthquake inspection of Reinforced Concrete (RC) railway viaducts for a Japanese high-speed railway line (the Tokaido Shinkansen) operated and maintained by the Central Japan Railway Company. When the Tokaido Shinkansen was constructed, such viaducts were mass-produced based on a standardized design procedure [42]. Consequently, viaducts of this type comprise the majority of the 116 km of railway bridges along the 515 km Tokaido Shinkansen line [43]. The advantages of focusing on this type of railway viaduct are: (i) the developed approach can be applied to numerous structures that share the same design, (ii) a relatively simple structure facilitates the development and testing of the approach, and (iii) this generic structure enables straightforward extensions to other railway viaducts designed similarly. Note that pre-built digital models do not exist or are not immediately accessible for most of those structures, supporting the need for investigating logic-based systems.

Typical RC viaducts with the standard design are shown in Fig. 2. Each viaduct is 24 m in the longitudinal direction, comprising three central spans (6 m each) and cantilever-type end spans (3 m each), with a 5.2 m interval between the lines of columns in the transverse direction. The dimensions of the cross-sections of the viaduct columns and intermediate beams are determined by the viaduct height and the viaduct trajectory type, as shown in Table 2. The readers are directed to the original Japanese literature for other dimensions and further details of the design [42].

Fig. 2. Google street view image of typical RC viaducts with the standard design.

Table 2. Dimensions of structural component cross-sections [cm x cm] [42]. Straight: radius of curvature larger than 5000 m. Curved: radius of curvature smaller than 5000 m.

Viaduct height H [m] | Columns (Straight) | Intermediate beams (Straight) | Columns (Curved) | Intermediate beams (Curved)
5.5 < H <= 7 | 60 x 60 | None | 60 x 70 | None
7 < H <= 8.5 | 70 x 70 | None | 70 x 80 | None
8.5 < H <= 10 | 80 x 80 | None | 80 x 90 | None
10 < H <= 12 | 70 x 70 | 80 x 60 | 70 x 85 | 80 x 60
12 < H <= 14 | 80 x 80 | 80 x 60 | 80 x 95 | 80 x 60

Assumption 2 (Target structural component): This research focuses on the rapid inspection of viaduct columns, because post-earthquake surveys of RC railway viaducts have shown that cracks, spalling, and exposed rebar on the column surfaces are the major modes of structural damage [44-46].

Assumption 3 (Sensory input): This research uses data from a monocular camera to determine waypoints. This assumption does not limit the sensory input for the entire navigation system; for example, odometry and local path planning can use other types of sensors (e.g., LiDAR). On the other hand, the system investigated in this research is intended to supplement generic components of autonomous navigation systems. A system based on a monocular camera is advantageous, because most robotic platforms (e.g., commercial UAVs) are equipped with such cameras.

Assumption 4 (GPS): This research does not use GPS coordinates to determine the waypoints (viewpoints) for structural inspection directly, because GPS signals are either unavailable or unstable underneath viaducts. However, this research assumes that GPS data is available around the target structure, and therefore the initialization phase of the inspection can be guided by the GPS data.

Fig. 3. Overview of the proposed autonomous UAV navigation approach for post-earthquake structural inspection of RC railway viaducts.

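The standardized dimensions in Table 2 amount to a simple lookup from viaduct height and trajectory type to cross-section sizes. A minimal sketch of such a lookup in Python (the table values are from Table 2; the function and variable names are illustrative, not from the authors' implementation):

# Column cross-section lookup per Table 2 (dimensions in cm).
COLUMN_DIMS = {
    "straight": [(5.5, 7.0, (60, 60)), (7.0, 8.5, (70, 70)),
                 (8.5, 10.0, (80, 80)), (10.0, 12.0, (70, 70)),
                 (12.0, 14.0, (80, 80))],
    "curved":   [(5.5, 7.0, (60, 70)), (7.0, 8.5, (70, 80)),
                 (8.5, 10.0, (80, 90)), (10.0, 12.0, (70, 85)),
                 (12.0, 14.0, (80, 95))],
}

def column_cross_section(height_m: float, trajectory: str):
    """Return the column cross-section (cm x cm) for a given viaduct height."""
    for lo, hi, dims in COLUMN_DIMS[trajectory]:
        if lo < height_m <= hi:
            return dims
    raise ValueError("height outside the standardized design range")

print(column_cross_section(7.0, "straight"))  # (60, 60), since 5.5 < 7 <= 7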

Fig. 4. UNET3+ architecture used in the frame-wise image processing of this research. E(i): i-th sub-encoder; D(i): i-th sub-decoder; Conv: convolution; Dconv: depth-wise separable convolution; (m,n,f,s) specifies the filter size (m x n), number of filters (f), and stride (s x s) of the Conv or Dconv operation; BN-ReLu: batch normalization followed by the ReLu activation function; ↓n: max-pooling (n x n); ↑n: bilinear upsampling by a factor of n; Sup: prediction for deep supervision.

Assumption 5 (Sparse point cloud): Besides raw image data, the approach developed in this research processes a map of the environment represented by sparse point cloud data to determine waypoints. During navigation, the map is created progressively, typically by simultaneous localization and mapping (SLAM) algorithms, Structure from Motion (SfM) algorithms, or LiDAR-based odometry algorithms. Those mapping algorithms embody an independent, active research area with rapid improvement in accuracy, point density, and computational efficiency (e.g., [47-51]). This research, on the other hand, simplifies the mapping component to focus on the problem formulation; the sparse point cloud data is computed batch-wise using SfM [52], and the 3D points that fall in each frame are fed incrementally to the algorithms for waypoint determination.

3. Methods

3.1. Overview

An overview of the proposed approach for autonomous UAV navigation planning for post-earthquake structural inspection of RC railway viaducts is shown in Fig. 3. The system determines waypoints to collect close-up images of critical structural components, or columns, by processing the input image stream sequentially. The system begins by processing each frame of the input image stream. Two types of processing are performed on the raw image data: updating the sparse point cloud map of the environment and visual recognition. The sparse point cloud update is typically performed by SLAM or SfM algorithms. Visual recognition has two components: 2D semantic segmentation of structural components, and monocular depth estimation. The semantic segmentation results are used to incrementally parse the sparse point cloud data, and the depth estimation results are used during the initial phase of the navigation to estimate the approximate scale of the point cloud data. Once the parsed point cloud data is updated based on the semantic segmentation results, 3D rectangles are searched for among the points labeled as columns. Once every predetermined number of frames (Ncb), the rectangles are further processed to find pairs that form rectangular prismatic shapes representing viaduct columns (blue shapes in Fig. 3). If enough columns are detected to identify the regular grid pattern, information about columns with unpaired rectangles (yellow) and no rectangle (red) is inferred based on the identified pattern. With this approach, the UAV can fly around the viaducts using GPS data until it detects the first few columns, and then start missions to inspect those components, while improving the column recognition results continuously.

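The per-frame flow of Fig. 3 can be summarized in a short event loop. The following is a minimal Python sketch; the pipeline object bundles the processing blocks described above, and every attribute name here is hypothetical, not the authors' API:

# Sketch of the per-frame loop of Fig. 3 (all pipeline attributes are illustrative).
N_CB = 10  # column detection runs once every N_cb frames (value illustrative)

def process_stream(frames, pipeline):
    waypoints = []
    for k, frame in enumerate(frames):
        pipeline.update_sparse_map(frame)        # SLAM/SfM point cloud update
        labels = pipeline.segment(frame)         # UNET3+ semantic segmentation
        depth = pipeline.estimate_depth(frame)   # UNET3+ monocular depth
        pipeline.parse_points(labels)            # Bayesian label fusion, Eqs. (1)-(3)
        pipeline.update_scale(depth)             # point cloud scale, Eqs. (5)-(6)
        pipeline.detect_rectangles()             # online EM, Eqs. (7)-(8)
        if k % N_CB == 0:                        # lower-rate column detection
            columns = pipeline.detect_columns()  # pairs -> unpaired -> grid pattern
            waypoints = pipeline.plan_waypoints(columns)
    return waypoints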

3.2. Sparse point cloud update and frame-wise image processing

The first steps after obtaining each frame of the image stream are to enrich the sparse point cloud data, as well as to perform the semantic segmentation and monocular depth estimation. The sparse point cloud data update is simplified in this research (Assumption 5 discussed previously), and therefore this section discusses the visual recognition algorithms applied to perform the semantic segmentation and depth estimation tasks.

This research extends the 58-layer Fully Convolutional Network (FCN) [53] used by Narazaki et al. [25] to the UNET3+ [54] architecture to perform the frame-wise image processing tasks. UNET3+ improves on the U-net [55] by implementing dense and nested skip connections and deep supervision, achieving high segmentation performance without significantly increasing the number of trainable parameters. The network architecture used in this study is shown in Fig. 4. The network has an encoder that consists of 6 sub-encoders working at different scales (E0-E5). The sub-decoders at the corresponding scales (D0-D4, since D5 = E5) are then created by combining sub-encoder outputs from the same and lower scales, as well as sub-decoder outputs from the higher scales (Fig. 4(c)). If necessary, linear convolution is applied to change the number of channels of the incoming sub-encoder/decoder output to 8, and the resolution is changed to the current one by either max-pooling or bilinear up-sampling (the order of the operations and the number of convolutions have been changed from those presented in the original paper to reduce the computational workload). The resulting feature map has 48 channels (8 channels from each of the 6 scales), which is further processed by a convolutional layer with 8 filters, followed by batch normalization and the ReLu activation function. The classification-guided module discussed in [54] is not implemented, because all input images contain bridges (the original paper discusses a medical image segmentation problem, where images that contain the target patterns are rare). The network is trained by generating predictions from D0-D4 and minimizing a loss function summed over all those scales. For semantic segmentation of structural components, a loss function that combines focal loss, Multi-Scale Structural Similarity Index (MS-SSIM) loss, and IoU loss is used, following the original paper.

During depth estimation, the camera focal length affects the relation between the distance to an object and its appearance in images. This research concatenates a layer of size 20 x 11 x 1 to the output of D5 and sets the value to the focal length everywhere in that additional layer. As a result, the output of D5 has 129 channels for depth estimation. The reverse Huber loss function is used for this task, following [25,56].

3.3. Point cloud parsing

In this step, semantic segmentation results are fused with the existing point labels using Bayes' theorem. For the i-th point, the probability that the point belongs to the class c given an image I and a prior probability P(c) is expressed as

P_i(c \mid I) = \frac{P(I \mid c)\, P(c)}{\sum_c P(I \mid c)\, P(c)}    (1)

or equivalently,

\ln P_i(c \mid I) = \ln P(I \mid c) + \ln P(c) + \text{Const.}    (2)

When a point is observed for the first time, the prior probability is initialized to 0.125 (= 1/8), because 8 types of structural components are identified by UNET3+. During subsequent steps, the posterior probability from the previous update is used as the prior probability of the current update. The softmax probability computed by UNET3+ is then used as the likelihood, P(I|c). Using the theorem, the log-probability is updated every time a new image is streamed. A point is defined to have a valid label when the following condition is satisfied:

P_i(c \mid I) > 99.99\%    (3)

This step works as the online semantic segmentation, as well as the removal of outlier points.

After updating the point labels, the approximate scale of the point cloud is estimated using the monocular depth estimation results. In this research, the point cloud scale is represented by a Gaussian probability density function with mean μs and variance σs², which are updated every time a new frame, I, is obtained. First, the UNET3+ for depth estimation is used to obtain a depth map, D(x), where x denotes the image pixel location. When the image point x has a corresponding 3D point in the sparse point cloud data, the depth in the point cloud coordinate system, Dpc(x), is used to compute the scale estimate

\frac{D(x)}{D_{pc}(x)}    (4)

This procedure is repeated for all points in the frame that have corresponding 3D points and valid component labels (excluding the non-bridge class). Considering the scale observations as samples drawn from the Gaussian distribution, the mean and standard deviation are then updated:

\mu_s \leftarrow \frac{m \mu_s^- + n \mu_{obs}}{m + n}    (5)

\sigma_s^2 \leftarrow \frac{1}{m + n - 1} \left( (m-1)(\sigma_s^-)^2 + m(\mu_s^- - \mu_s)^2 + (n-1)\sigma_{obs}^2 + n(\mu_{obs} - \mu_s)^2 \right)    (6)

where m is the number of previously obtained samples, n is the number of samples obtained during the current step, and (μs⁻, σs⁻) denote the mean and standard deviation computed from the m samples obtained previously. The monocular depth estimation and scale parameter update are performed until the global pattern of the structure is identified.

3.4. Rectangle detection

This research recognizes and models viaduct columns as rectangular prismatic shapes, and therefore the first step after point cloud parsing is to find 3D rectangles among the points with the valid "Column" label. Rectangle detection in this research is based on the online Expectation Maximization (EM) method discussed in [57]. The method first parameterizes a rectangle by a 9-element vector, θ, that represents the surface normal (3 degrees of freedom, or 3DOF), the distance to the plane (1DOF), the location on the plane (2DOF), the size (2DOF), and the rotation in the plane (1DOF). Then, the method models the probability density function of the points as a mixture of J Gaussian distributions:

p(z_i \mid \theta_j) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{d^2(z_i, \theta_j)}{2\sigma^2} \right), \quad 1 \le j \le J    (7)

and a uniform distribution that represents points that are not assigned to any of the rectangles:

p(z_i \mid \theta^*) = \begin{cases} 1/|V| & (z_i \text{ is in the bounding volume } V) \\ 0 & (\text{otherwise}) \end{cases}    (8)

In this probabilistic model, d(z_i, θ_j) indicates the minimum distance between the point z_i and a rectangle θ_j. The design parameters, σ² and V, are determined for each specific application. After initialization, the method iteratively evaluates the log-likelihood (E-step) and updates the mixture weights (M-step) to find the best set of rectangles and point assignments.
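The label fusion of Eqs. (1)-(3) and the running scale update of Eqs. (5)-(6) in Section 3.3 map directly onto a few lines of array code. A minimal numpy sketch under the stated assumptions (8 classes with a uniform 1/8 prior; function and variable names are illustrative, not the authors' implementation):

import numpy as np

NUM_CLASSES = 8
LOG_PRIOR = np.log(np.full(NUM_CLASSES, 1.0 / NUM_CLASSES))  # 0.125 each

def update_point_label(log_prob, softmax_prob):
    """One Bayesian update, Eq. (2): add the log-likelihood, then renormalize (Eq. (1))."""
    log_prob = log_prob + np.log(softmax_prob)
    log_prob -= np.log(np.sum(np.exp(log_prob)))
    return log_prob

def has_valid_label(log_prob, threshold=0.9999):
    """Eq. (3): the point label is valid once one class exceeds 99.99%."""
    return np.exp(log_prob.max()) > threshold

def update_scale(mu, var, m, obs):
    """Pooled mean/variance update of Eqs. (5)-(6) given n new scale samples obs."""
    n = len(obs)
    mu_obs = np.mean(obs)
    var_obs = np.var(obs, ddof=1) if n > 1 else 0.0
    mu_new = (m * mu + n * mu_obs) / (m + n)
    var_new = ((m - 1) * var + m * (mu - mu_new) ** 2
               + (n - 1) * var_obs + n * (mu_obs - mu_new) ** 2) / (m + n - 1)
    return mu_new, var_new, m + n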
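For the mixture model of Eqs. (7) and (8), the E-step reduces to evaluating each point's likelihood under every rectangle plus the uniform outlier component, and normalizing. A minimal numpy sketch (the distance function and mixture weights are placeholders standing in for the online EM method of [57]):

import numpy as np

def responsibilities(points, rectangles, dist, weights, sigma, volume):
    """E-step for Eqs. (7)-(8): posterior probability of each point belonging
    to each rectangle or to the uniform outlier component.
    dist(z, theta) stands in for the point-rectangle distance d(z_i, theta_j);
    weights has length J+1 (rectangles plus the outlier component)."""
    J = len(rectangles)
    lik = np.empty((len(points), J + 1))
    for j, theta in enumerate(rectangles):
        d = np.array([dist(z, theta) for z in points])
        lik[:, j] = np.exp(-d**2 / (2 * sigma**2)) / np.sqrt(2 * np.pi * sigma**2)
    lik[:, J] = 1.0 / volume                    # uniform outlier component, Eq. (8)
    lik *= weights                              # mixture weights from the M-step
    return lik / lik.sum(axis=1, keepdims=True)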

This research makes the following adjustment to the method to better accommodate the proposed autonomous UAV navigation strategy, namely, the use of a two-dimensional (2D) Gaussian distribution instead of the 1D distribution:

p(z_i \mid \theta_j) = \frac{1}{2\pi |\Sigma_d|^{1/2}} \exp\left( -\frac{1}{2}\, \mathbf{d}(z_i, \theta_j)^T \Sigma_d^{-1}\, \mathbf{d}(z_i, \theta_j) \right)    (9)

where d(z_i, θ_j) = [d_ip(z_i, θ_j), d_op(z_i, θ_j)]^T is now a 2D vector that contains the in-plane and out-of-plane distances, and

\Sigma_d = \begin{bmatrix} \sigma_{ip}^2 & 0 \\ 0 & \sigma_{op}^2 \end{bmatrix}    (10)

is the associated covariance matrix. This adjustment enables the use of sparse in-plane points, while maintaining the quality of the rectangles by allowing only small deviations in the out-of-plane direction. Based on a preliminary investigation that tested different values of V, σip², and σop², the values of σip and σop are determined to be 1.0/μs and 0.005/μs, respectively (μs is the current estimate of the point cloud scale), and the value of V is determined such that p(z_i|θ_j) and p(z_i|θ*) balance at a distance of 1 m in the in-plane direction.

Every time an image is obtained, the method initializes a predetermined number of rectangles (Nrect). While the original method initializes the rectangle parameters (θ) randomly, this research initializes rectangles by (i) selecting Nrect points with the "Column" label randomly from the points that have not been assigned to any of the rectangles, (ii) finding a small number of points with the "Column" label nearest to the selected point (5 points in this research), and (iii) fitting a rectangle to those points. After the initialization, the online EM algorithm is applied for 10 iterations using the points with the "Column" label that have not been assigned to any of the rectangles. Finally, model selection using a Bayesian prior is applied to keep only informative rectangles, following [57].

Fig. 5. Example of detected rectangles (yellow) and their surface normal vectors (red) illustrated in the identified X-Y plane. Points indicate the 3D points from the sparse point cloud with the valid "Column" label. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

After rectangles are detected by the online EM method, the results are refined by checking their dimensions. Based on the prior knowledge about the column dimensions (Table 2), rectangles are verified if one of their sides has a length in the range [w-/μs, w+/μs] and the other side has a length in the range [h-/μs, h+/μs]. The rectangles that do not satisfy either of the conditions are discarded in this step. This research defines [w-, w+] to be [0.3, 1.425] (a tolerance of 50% on the minimum and maximum dimensions) and [h-, h+] to be [2.0, 14.5], respectively. With this procedure, candidate column faces are identified incrementally, and at the same time, the effect of outlier points is further reduced, because of the capability of the online EM method to model outlier points and the post-processing that discards low-quality rectangles.

3.5. Column detection

Based on the rectangle detection results, rectangular prismatic shapes are detected as representations of bridge columns. This step is run at a lower rate (every Ncb steps) to reduce the computational load. The column detection consists of three subtasks: (i) detecting rectangular prismatic shapes from pairs of rectangles, (ii) recovering rectangular prismatic shapes from unpaired rectangles, and (iii) identifying the grid patterns of the column locations, based on which information about the unseen columns is inferred. The prior knowledge about the target structure encoded in this subsystem includes the dimensions of the grid patterns and the fact that each column faces non-bridge points at the bottom and beam points at the top. These assumptions do not limit the applicability of the proposed approach significantly, because those patterns can be generally observed for railway viaducts that are not necessarily designed based on the standardized procedure adopted by the Tokaido Shinkansen line.

3.5.1. Column detection based on pairs of rectangles

Rectangular prismatic shapes can be detected by finding pairs of closely spaced rectangles that are either parallel or perpendicular to each other. To identify such rectangle pairs, the system should define the world coordinate system aligned with the vertical direction and the two horizontal directions associated with the column face orientations. When a predetermined number of rectangles (more than 4 rectangles in this research) have been identified, the Random Sample Consensus (RANSAC) algorithm [58] is applied to the rectangle surface normal vectors as follows: (i) determine the threshold t (= 0.05), (ii) select n = 2 rectangles randomly, (iii) create an n x 3 matrix, A, whose i-th row is the i-th surface normal vector, n_i, (iv) if |n_0 . n_1| < t (the vectors are nearly perpendicular), estimate the vertical direction, n_v, by singular value decomposition of the matrix A; otherwise go back to (ii), and (v) select the rectangles (inliers) that satisfy |n_v . n_i| < t. The steps (ii)-(v) are repeated 100 times, and the case with the largest number of inliers is selected. The final estimate of the vertical vector is obtained using those inliers.

The two other axes are determined by applying a similar process to the projections of the rectangle surface normal vectors onto the horizontal plane. First, two horizontal axes are initialized arbitrarily. This research defines the first horizontal axis in the direction of the X-axis of the point cloud coordinate system projected onto the horizontal plane, and the second axis is defined accordingly. Using this initial coordinate system, the rotation that brings the X-axis into the projected surface normal vector, α, can be identified. Because the projected surface normal vectors are expected to exist at multiples of 90-degree intervals, the remainders calculated by dividing α by 90 degrees, α_rem, are expected to be close to each other. Therefore, when more than 4 rectangles exist after identifying the vertical direction, each rectangle is scanned in the following manner: the remainder computed for the i-th rectangle, (α_rem)_i, is subtracted from (α_rem)_j for all j ≠ i, and the j-th rectangle is determined to be an inlier if |(α_rem)_i - (α_rem)_j| < t = 0.05. Other processing follows the standard RANSAC algorithm.

After identifying the world coordinate system, the consistency of the rectangle orientations about the surface normal direction is checked. In this step, the two in-plane axes that define the rectangle sides are compared with the vertical direction of the world coordinate system. The rectangle is kept if the sine of the minimum absolute angle difference between the vertical direction and a rectangle in-plane axis is less than 0.1, and discarded otherwise. This step removes severely inclined rectangles that would contaminate the column detection results.
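The vertical-direction estimate in step (iv) above is the direction most orthogonal to the sampled normals, which the singular value decomposition exposes directly. A minimal numpy sketch of that step (the surrounding 100-trial RANSAC loop is omitted; names are illustrative):

import numpy as np

def vertical_from_normals(normals, t=0.05):
    """Estimate the vertical axis from the surface normals of nearly
    vertical column faces. The vertical direction is the right-singular
    vector with the smallest singular value, i.e., the direction most
    orthogonal to every sampled normal."""
    A = np.asarray(normals, dtype=float)
    _, _, vt = np.linalg.svd(A)
    n_v = vt[-1]                       # smallest-singular-value direction
    inliers = np.abs(A @ n_v) < t      # step (v): |n_v . n_i| < t
    return n_v, inliers

# Example: two perpendicular column faces with normals along X and Y.
n_v, inl = vertical_from_normals([[1, 0, 0], [0, 1, 0]])
print(np.round(n_v, 3), inl)           # ~[0, 0, 1] (up to sign), [True, True]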

Each rectangle projected onto the horizontal plane can be expressed using the following parameters: (i) a 2D vector of the projected surface normal, n2D, (ii) a 2D vector of the projected rectangle center, c2D, and (iii) the length of the line segment obtained by projecting the rectangle (a scalar), L2D. Example rectangle detection results expressed using those parameters are shown in Fig. 5. Using this parameterization, two rectangles, (n2D, c2D, L2D) and (n2D', c2D', L2D'), are compared to determine whether they should be paired. The logic applied in this step is shown in Table 3. First, the two rectangles are classified as either nearly perpendicular or nearly parallel, based on the dot product n2D . n2D'. At the same time, the 2D distance between the rectangle centers, ||c2D' - c2D||, is evaluated and decomposed into the direction of the surface normal, n2D, and the direction perpendicular to n2D, yielding the distance components dn and dip. The logic for the nearly perpendicular case is derived from the fact that the distance components of two perpendicular column faces are halves of the widths of the faces. Similarly, the logic for pairing nearly parallel rectangles is derived from the fact that the distance between parallel faces in the surface normal direction is the width of the perpendicular column face. In addition to those cases, nearly parallel rectangles that are close to each other in the surface normal direction are merged, because those rectangles are likely to lie in the same column face. This case occurs when, for example, the lower part of a column face is detected first and the upper part is detected later.

Table 3. Logic applied to find rectangle pairs. dn and dip indicate the distances between c2D and c2D' in the surface normal direction and its perpendicular direction, respectively.
- Nearly perpendicular (|n2D . n2D'| < 0.05): paired if w-/2μs < dn < w+/2μs and w-/2μs < dip < w+/2μs.
- Nearly parallel (|n2D . n2D'| > 0.95): paired if w-/μs < dn < w+/μs and dip < w+/2μs; merged if dn < 0.005/μs and dip < 0.5/μs.

Once rectangle pairs are identified, rectangular prismatic shapes representing viaduct columns are initialized. In this step, 2D projections of the columns (rectangular cross-sections) are first obtained using (n2D, c2D, L2D) and dn. The ends of the columns along the vertical axis cannot be identified from the rectangles alone, because in this online approach the rectangles do not necessarily span the entire column face from bottom to top. To address this challenge, this research uses prior knowledge of the target structure, namely that columns face the non-bridge surface (the ground) at the bottom and beams at the top. Using 3D points with valid labels of those categories, this research determines the lower and upper ends of the columns. The steps are: (i) find valid non-bridge or beam points within a horizontal distance of 12 m from the mean of the coordinates of the rectangle centers, considering that the length of each viaduct is 24 m in the longitudinal direction, and (ii) compute the vertical coordinates of the non-bridge/beam planes by RANSAC (20 iterations, sampling 1 point in each iteration, with the inlier-selection threshold set to 0.15). If no inlier point is found by RANSAC, or the selected non-bridge/beam plane is higher/lower than the mean height of the rectangle centers, the vertical coordinates of the column ends are replaced by those of the nearest column. To complete the column initialization, the horizontal distances between the column centers are checked, and closely spaced columns (horizontal distance less than w+/μs) are regarded as duplicates, from which only one column is chosen for the subsequent analysis.

The final step of column detection using rectangle pairs is to fit the rectangular prismatic shapes to the 3D points. This step applies the Iterative Closest Point (ICP) algorithm with the point-to-plane distance metric [59]. To apply the method, this research first parameterizes each column by a transform of a unit cube centered on the origin [0, 0, 0]^T, that is,

T(\theta_{cb}) = \begin{bmatrix} RS & t \\ 0^T & 1 \end{bmatrix}    (11)

where R is a rotation matrix, t is the location of the column center, and S is a matrix that applies the appropriate scale to the unit cube:

S = \begin{bmatrix} \lambda_0 & 0 & 0 \\ 0 & \lambda_1 & 0 \\ 0 & 0 & \lambda_2 \end{bmatrix}    (12)

This transform has 9 DOF (note that [59] focuses on 6DOF rigid-body motion):

\theta_{cb} = [\lambda_0\ \lambda_1\ \lambda_2\ \theta_x\ \theta_y\ \theta_z\ t_x\ t_y\ t_z]^T    (13)

where [θx θy θz]^T are Euler angles and t = [tx ty tz]^T. Using this parameterization, the incremental transform

d\theta_{cb} = [d\lambda_0\ d\lambda_1\ d\lambda_2\ d\theta_x\ d\theta_y\ d\theta_z\ dt_x\ dt_y\ dt_z]^T    (14)

can be defined as the transform that leads to the following composite transform:

T(\theta_{cb}) = \begin{bmatrix} R'S' & t' \\ 0^T & 1 \end{bmatrix}    (15)

where

S' = \begin{bmatrix} \lambda_0 + d\lambda_0 & 0 & 0 \\ 0 & \lambda_1 + d\lambda_1 & 0 \\ 0 & 0 & \lambda_2 + d\lambda_2 \end{bmatrix}    (16)

\begin{bmatrix} R' & t' \\ 0^T & 1 \end{bmatrix} = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} dR & dt \\ 0^T & 1 \end{bmatrix}    (17)

(dR and dt are the rotation matrix and translation vector associated with dθcb). Among the 9 parameters of dθcb, 5 parameters (dλ2, dθx, dθy, dθz, dtz) are constrained, because those parameters specify the world coordinate system and the vertical locations of the column ends. Therefore, the rectangular prismatic shape fitted in this step has 4DOF.

Fig. 6. Point assignment for the iterative closest point (ICP) algorithm. The length of each side of the square is 1, and the colored regions are ±0.1 from the projected faces of the initial cuboid.

The ICP algorithm in this research proceeds by the following steps. First, the 3D points are transformed to the coordinate system of the unit cube by applying T^{-1}(θcb), as shown in Fig. 6. Then, the points that lie in the height range [-0.5, 0.5] and within the horizontal distance range [-0.1, 0.1] from the faces are selected for the ICP algorithm (other points are grayed out in Fig. 6). The selected points are assigned to one of the four sides of the cube, based on the four regions illustrated by different colors in Fig. 6.
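A minimal numpy sketch of this selection step (the transform and thresholds follow the description above; the pose composition and least-squares update of the full ICP are omitted, the local z-axis is assumed vertical, and all names are illustrative):

import numpy as np

def select_icp_points(points, T_cb):
    """Transform 3D points into the unit-cube frame and keep those within
    the height range [-0.5, 0.5] and within +/-0.1 of a vertical face."""
    pts_h = np.c_[points, np.ones(len(points))]          # homogeneous coordinates
    local = (np.linalg.inv(T_cb) @ pts_h.T).T[:, :3]     # apply T^-1(theta_cb)
    in_height = np.abs(local[:, 2]) <= 0.5               # assumes local z is vertical
    # Distance to the nearest vertical face (faces at x = +/-0.5 and y = +/-0.5).
    dx = np.abs(np.abs(local[:, 0]) - 0.5)
    dy = np.abs(np.abs(local[:, 1]) - 0.5)
    near_face = np.minimum(dx, dy) <= 0.1
    keep = in_height & near_face
    # Side index 0-3 for the four faces (an arbitrary but consistent encoding).
    side = np.where(dx < dy, np.sign(local[:, 0]) + 1, np.sign(local[:, 1]) + 2)
    return local[keep], side[keep].astype(int)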

The objective function used at each iteration is written as

J = \sum_i (d_i \lambda_i)^2    (18)

where the summation is computed over all the selected points that are assigned to one of the sides, d_i is the horizontal distance from each point to the nearest side, and λ_i is the scale of the column in the corresponding direction. The objective function J is minimized with respect to the 4 free parameters of θcb using the least-squares method, based on which the point assignments are updated. This process is repeated until the change in the objective function becomes less than 0.001 or the number of iterations reaches 10.

The ICP does not always converge to the correct solution when the number or quality of the points is insufficient. To handle such cases, this research checks whether the dimensions of the cross-section of the column after the ICP are in the range [0.55/μs, 1.0/μs], and resets the parameters to their initial values if the condition is not satisfied (this condition is stricter than the condition used previously, that is, [w-/μs, w+/μs], because this is the final check).

The columns detected from pairs of rectangles are reliable and robust to measurement noise. On the other hand, not all columns are detected by this approach, because at least two faces of a column need to be detected accurately. To augment the column detection results, methods to infer information about columns from one or no rectangle are developed, which are discussed next.

3.5.2. Column detection based on unpaired rectangles

Mathematically, a column (rectangular prismatic shape) cannot be identified uniquely from only one rectangle. To obtain reasonable estimates of such columns, this research examines the global grid-type pattern of the column locations. In the standard design used by the Tokaido Shinkansen, the column intervals are 5.2 m in the transverse direction and 6.0 m in the longitudinal direction. This prior knowledge can be incorporated to improve the accuracy and robustness of the column detection results. Moreover, this grid-type pattern is shared with many other railway viaducts, making generalizations to those structures straightforward.

The grid lines perpendicular to the X and Y axes of the world coordinate system can be represented by their X- and Y-coordinates:

\text{Grid lines (X)}: g_x = \{g_{x1}, g_{x2}, \ldots\}, \qquad \text{Grid lines (Y)}: g_y = \{g_{y1}, g_{y2}, \ldots\}    (19)

where g_{xi} and g_{yj} are the X and Y coordinate values of the i-th and j-th grid lines in the associated sets. This research initializes the grid lines using the columns detected based on rectangle pairs. For each column, a new grid line is defined if no other column exists within the distance w+/μs in the associated direction (Fig. 7(a)); otherwise, the column is assigned to the existing grid line defined by the closest column (Fig. 7(b)). The grid lines are then finalized by computing the mean of the X or Y coordinates of the associated column centers.

Fig. 7. Grid line initialization step. (a) No column exists nearby. (b) Columns exist nearby in the associated direction.

The next step is to assign unpaired rectangles to the grid lines. Two types of rectangle configurations are considered herein: (i) the rectangle is approximately parallel to the grid line, and (ii) the rectangle is approximately perpendicular to the grid line. The two types of configurations are illustrated in Fig. 8. Based on this classification, type 1 rectangles are assigned to the existing grid lines when the distance from the rectangle center to the line is in the range [w-/2μs, w+/2μs]. Then, those rectangles are checked for the type 2 configuration; the rectangles are assigned to the existing grid lines when the distance from the rectangle center to the line is less than w+/2μs, and otherwise grid lines passing through the rectangle centers are added to the set of grid lines in the associated direction.

To complete this step, the grid lines are refined, and an accurate estimate of the scale of the point cloud, μs, is obtained. First, distance matrices in the X and Y directions, DX and DY, are defined as follows:

(D_X)_{ij} = g_{xj} - g_{xi}    (20)

(D_Y)_{ij} = g_{yj} - g_{yi}    (21)

Fig. 8. Types of configurations of unpaired rectangles with respect to the grid lines.

where the subscript (.)_{ij} indicates the (i, j) element of the matrices. Then, we assume that the X-axis is in the transverse direction, and apply the RANSAC algorithm by the following steps: (i) randomly select one nonzero distance value, dX, from DX, (ii) assuming |dX| = Dtrans = 5.2 m (the known transverse interval of the grid lines), assess the intervals of the longitudinal grid lines, IY, by

(I_Y)_{ij} = \left| (D_Y)_{ij} \right| \cdot \frac{D_{trans}}{|d_X|}    (22)

(iii) evaluate the difference between (IY)_{ij} and the closest nonzero integer multiple of the known longitudinal interval of the grid lines, Dlong = 6 m, (iv) select the (IY)_{ij} values that satisfy a threshold of w+/2μs as inliers, and (v) repeat steps (i)-(iv) 20 times and choose the results from the case with the largest number of inliers. After the RANSAC process, we assume that the Y-axis is in the transverse direction, and apply the same RANSAC process. Finally, the assumption with the larger number of inliers is selected, from which the inlier grid lines and the point cloud scale μs are obtained. Based on the selected assumption, the global coordinate system is rotated about the Z-axis to align the X-axis with the viaduct transverse direction. The intersections of the refined grid lines are considered as column locations even if no column has been detected at those locations so far.

Fig. 9. Example waypoints with camera directions shown by red bars. The waypoint after scanning each face is inserted to avoid the UAV flying too close to the column. (Npx = 7, R = 0.5, focal length: 35 mm, sensor size: 36 mm, resolution: 1,920 x 1,080.) (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Fig. 10. Pseudocode for the autonomous UAV navigation strategy for rapid post-earthquake inspection of RC railway viaducts.
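The hypothesis test of Eq. (22) checks whether the longitudinal grid intervals become integer multiples of 6 m once the transverse interval is pinned to 5.2 m. A minimal numpy sketch of one RANSAC hypothesis (names are illustrative; the 20-trial loop and the X/Y role swap are omitted):

import numpy as np

D_TRANS, D_LONG = 5.2, 6.0   # known grid intervals [m] of the standard design

def grid_inliers(DY, d_x, thresh):
    """One RANSAC hypothesis, Eq. (22): rescale the Y-direction grid distances
    as if |d_x| equaled the 5.2 m transverse interval, then flag entries close
    to a nonzero integer multiple of the 6 m longitudinal interval."""
    IY = np.abs(DY) * (D_TRANS / abs(d_x))
    k = np.rint(IY / D_LONG)                  # nearest integer multiple
    k[k == 0] = 1                             # closest *nonzero* multiple
    residual = np.abs(IY - k * D_LONG)
    return residual < thresh

# Example: three grid lines at 0, 6, 12 m, observed at an unknown scale of 0.5.
DY = np.array([[0.0, 3.0, 6.0], [-3.0, 0.0, 3.0], [-6.0, -3.0, 0.0]])
print(grid_inliers(DY, d_x=2.6, thresh=0.3))  # all off-diagonal entries True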


Fig. 11. Synthetic environment of RC railway viaducts.

based on the awareness to the global structure, or more specifically, grid (Fig. 9). The navigation path is defined such that the UAV inspects the
lines, gx and gy. First, gy is converted to integer values that represents the columns in the following order: from reliable columns (columns detec­
number of intervals (Dlong) counted from the line with the minimum ted based on rectangle pairs) to less reliable ones (columns detected
value of gyi. If missing integer values are identified between the mini­ based on a single rectangle or no rectangle). The order of inspection
mum and maximum values, those values are added to gy as new grid lines. Then, the intersections between gx and gy are identified as the expected column locations.

At each candidate column location (grid line intersection), a 2D square region with side w + μs centered at the candidate location is checked for the existence of a column that has already been detected. If no column has been detected so far in that area, a column is added at that location, where all parameters other than the column center are obtained from the nearest previously detected column. At the same time, previously detected columns whose centers do not lie inside the square with side w + μs centered at a candidate location are regarded as outliers and discarded.

3.5.4. Waypoint determination

Visual recognition methods for structural damage, such as cracks, spalling, and exposed rebar, are sensitive to the viewpoint relative to the target surface and, in particular, to the distance to the surface. For example, Narazaki et al. [25] performed semantic segmentation of structural damage using deep fully convolutional networks, where the farthest target surfaces considered correspond to a resolution of 1.5 pixels per centimeter (pixel/cm). The network does not always perform accurately for far surfaces, and therefore the regions imaged at less than 1.5 pixel/cm were excluded from consideration. To accommodate such image post-processing approaches, this research determines the waypoints by the following steps. First, a distance to the target structure in pixel/cm, Npx, and the ratio of overlap between adjacent images, R, are determined. Then, for each column, the waypoints that collect images of all faces with the distance Npx and the overlap R are determined. Example waypoints and the path connecting the waypoints are shown in Fig. 9.

The UAV starts the mission by flying along a predetermined GPS-based path around (but not underneath) the target structure until the following two conditions are satisfied: (i) the scale of the structure (grid lines) has been identified, and (ii) at least two columns are detected. Once those conditions are satisfied, the UAV switches from the GPS-based navigation to the navigation using the inspection waypoints. The order of inspection within each reliability category is determined based on the distance from the camera (from close columns to far columns). This research uses the A* algorithm [60] in the PythonRobotics library [61] to define the 2D path connecting inspection waypoints of different columns in the horizontal plane, and linearly interpolates the vertical coordinates of the origin and destination to realize the 3D path. Throughout the navigation process, the column detection results and navigation path are updated every 10 frames. If the 10th frame falls during the column inspection depicted in Fig. 9, the updates are deferred until the end of the inspection of that column.

The autonomous UAV navigation planning for rapid post-earthquake inspection of RC railway viaducts discussed in this section is summarized in the pseudocode in Fig. 10. The system can start planning the navigation path once partial information about the structure (grid lines and the first two columns) is obtained, without waiting for complete information about the structure. The system can also make inferences about missing or partially occluded columns during the early stages of the navigation, and progressively improve the information for those columns as the system experiences more viewpoints.

4. Demonstration using synthetic environment

4.1. Overview: Synthetic environment of RC railway viaducts

This research demonstrates the proposed autonomous UAV navigation planning approach using the synthetic environment of RC railway viaducts developed for [25]. In the synthetic environment, viaducts are modeled following the standard design procedure adopted by the Tokaido Shinkansen line. The modeling procedure is programmed in Python; therefore, every time the program is executed, viaducts with random geometry, surface texture, and damage scenario are created within random surrounding environments. This research generates an environment that contains three viaducts with a straight trajectory, zero slope, and a height of 7 m, where the autonomous navigation planning for the inspection of the central viaduct is investigated. An overview of the synthetic environment is shown in Fig. 11.
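Before turning to the demonstration, the waypoint-determination rule of Section 3.5.4 can be made concrete with a small sketch. It is a minimal illustration, not the authors' implementation: it assumes a pinhole camera with the sample-flight parameters used later (1,920-pixel image width, 36 mm sensor, 35 mm focal length), and the helper names (face_waypoints, order_by_distance) and the station layout are hypothetical. The 2D connection between waypoints of different columns would then be delegated to a grid planner such as the A* implementation in PythonRobotics [61].

```python
import math

def standoff_distance_m(npx, image_width_px=1920, focal_mm=35.0, sensor_mm=36.0):
    """Camera-to-surface distance [m] that yields npx pixel/cm on the surface,
    assuming a simple pinhole model (parameters follow the sample flight)."""
    d_cm = image_width_px * focal_mm / (sensor_mm * npx)
    return d_cm / 100.0

def face_waypoints(center_xy, outward_normal_xy, face_width_m, z_levels,
                   npx=7, overlap=0.5):
    """Waypoints in front of one vertical column face (illustrative only)."""
    d = standoff_distance_m(npx)
    footprint_m = (1920.0 / npx) / 100.0   # imaged width on the surface [m]
    step = footprint_m * (1.0 - overlap)   # horizontal spacing between stations
    n = max(1, math.ceil(face_width_m / step))
    cx, cy = center_xy
    nx, ny = outward_normal_xy             # unit normal of the face
    tx, ty = -ny, nx                       # tangent direction along the face
    stations = []
    for i in range(n):
        s = (i - (n - 1) / 2.0) * step     # offset along the face
        for z in z_levels:
            stations.append((cx + d * nx + s * tx, cy + d * ny + s * ty, z))
    return stations

def order_by_distance(waypoints, uav_xy):
    """Inspect closer columns first, mirroring the ordering rule in the text."""
    return sorted(waypoints, key=lambda w: math.hypot(w[0] - uav_xy[0],
                                                      w[1] - uav_xy[1]))
```

For Npx = 7 pixel/cm, the sketch yields a standoff of roughly 2.7 m and an image footprint of about 2.7 m on the surface, consistent with the close-up imagery shown later in Fig. 21.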

Fig. 12. Sparse point cloud data obtained by the sample UAV flight.


Table 4
Structural component recognition performance on the testing set [%] (synthetic images).

                 FCN58 [25]                   UNET3+
                 Precision  Recall  IoU      Precision  Recall  IoU
  No Bridge      98.8       99.4    98.2     99.6       99.6    99.2
  Slab           96.1       94.6    91.1     97.5       97.3    95.0
  Beam           94.0       93.9    88.6     96.2       96.9    93.3
  Column         96.6       97.6    94.4     98.6       98.8    97.4
  Nonstructural  97.5       91.7    89.5     97.7       95.8    93.6
  Rail           95.2       91.6    87.6     95.1       96.0    91.5
  Sleeper        84.2       75.3    66.0     87.4       86.7    77.1
  Mean           94.6       92.0    87.9     96.0       95.9    92.4

Table 5
Structural component recognition performance on the testing set [%] (real-world images).

                 FCN58 [25]                   UNET3+
                 Precision  Recall  IoU      Precision  Recall  IoU
  No Bridge      89.6       93.3    84.2     94.7       94.4    89.6
  Slab           85.2       86.7    75.3     88.8       88.7    79.7
  Beam           84.1       82.5    71.3     87.9       85.6    76.6
  Column         84.9       90.5    78.0     87.5       91.3    80.7
  Nonstructural  88.0       66.3    60.8     90.9       90.7    83.2
  Others         55.4       30.2    24.3     51.5       52.5    35.1
  Mean*          86.4       83.9    73.9     89.9       90.1    82.0

  * The "Others" class is not included.

Table 6
Monocular depth estimation results for the Tokaido dataset (testing set). MAE: mean absolute error [m], RMSE: root mean squared error [m], ARD: absolute relative distance [%].

  FCN58 [25]            UNET3+ (without focal length)   UNET3+ (with focal length)
  MAE   RMSE   ARD      MAE    RMSE   ARD               MAE    RMSE   ARD
  1.20  1.85   10.02    1.01   1.66   8.46              0.82   1.43   6.72

Based on assumptions 4–5 discussed previously, this section defines a sample UAV path around the bridge that collects 235 images with a resolution of 1,920 × 1,080, a sensor size of 36 mm, and a focal length of 35 mm. The UAV scans each side of the viaduct at three height levels (2 m, 5 m, 8 m) and then flies to the other side, performing the scan at the same three height levels. During flight, the UAV collects images at 1 m intervals. The sparse point cloud data is then generated using the Agisoft Metashape photogrammetry software [52]. The sparse point cloud data and the UAV path are shown in Fig. 12. The sparse point cloud contains 33,829 points in total.

This section first discusses the training of the frame-wise image processing subsystem using the UNET3+ method ("Preparation"). Then, the UAV navigation approach is demonstrated by feeding the parts of the sparse point cloud data associated with each image to the waypoint determination system incrementally ("Demonstration 1"). After evaluating the system's capability of detecting columns robustly and reliably, the section extends the discussion to incremental sparse point cloud building and progressive improvement and addition of waypoints ("Demonstration 2").

4.2. Preparation: Frame-wise image processing

The first step toward the demonstration of the autonomous navigation planning is to train its perception module: frame-wise semantic segmentation of structural components and depth estimation. The deep UNET3+ architecture discussed previously is trained for that purpose. A large-scale synthetic dataset, termed the Tokaido dataset [25], is used to enable the training of the deep networks with more than a million parameters. During training, the input images and the associated ground truth labels are flipped horizontally with probability 0.5 and rotated by an angle sampled uniformly from the range [−π/12, π/12]. The optimization is based on the Adam method [62] with a batch size of 12. TensorFlow 2 is used for the implementation of the algorithms [63].

For structural component recognition, a small number of real-world images (51 for training, 50 for testing) are mixed with the Tokaido dataset (7,275 images for training, 300 images for validation, and 1,073 images for testing) to form the training set, resulting in 610 iterations per epoch. The training is performed for 200 epochs with learning rate 1.0 × 10⁻³, then 50 epochs with learning rate 1.0 × 10⁻⁴, and finally 10 epochs with learning rate 1.0 × 10⁻⁵. Precision, recall, and Intersection over Union (IoU) values for the testing set are shown in Table 4 and Table 5 for synthetic and real-world testing images, respectively. The comparison with the previously trained network (FCN58) shows consistent improvement in all metric values.

Similar to the previous work [25], synthetic data is used for the training and testing of the UNET3+ for depth estimation, leading to 606 iterations per epoch with a batch size of 12. Other hyperparameters, such as the number of epochs and learning rates, follow those of the structural component recognition. The performance of the network is evaluated using three accuracy metrics: the mean absolute error, mean|e_d|; the root mean squared error, √(mean(e_d²)); and the absolute relative distance, mean|e_d/d|, where d and e_d denote the ground truth depth and the depth estimation error, respectively. The performance metric values are shown in Table 6 for three networks: the FCN58 used in the previous research, UNET3+ without focal length information, and UNET3+ with focal length information. Compared to the FCN58 network, the depth estimation results are improved consistently with the state-of-the-art architecture. Moreover, by using the focal length information as part of the input to the network, further accuracy improvement is attained.

4.3. Demonstration 1: Online column detection using images from sample UAV flight

Example results of point cloud parsing, rectangle detection, and column detection for the sample UAV flight data with the number of rectangles per step (Nrect) of 1 are shown in Fig. 13. In an online procedure, the 2D semantic segmentation parses the point cloud data and extracts the points associated with the target structural components. Then, the rectangle detection extracts the key geometry of the column faces. Finally, columns are detected from rectangle pairs, a single unpaired rectangle, or no rectangle. The figure shows that the global grid line pattern is identified, based on which the existence of unseen columns is inferred during the early stages of the sample flight. Moreover, as the UAV experiences more views, the number of columns detected from rectangle pairs increases, indicating the progressive improvement of the detection results.

To investigate the effect of Nrect (the number of rectangles initialized in each step), the column detection is performed 100 times for the sample UAV flight data with Nrect = 1, 2, 3, 4. The numbers of rectangles detected for each of the four cases are shown in Fig. 14. At every frame, up to Nrect rectangles are identified, which are then reduced at every Ncb steps by applications of RANSAC to identify the global coordinate system, resulting in the zigzag patterns of the plots. Compared to the case with Nrect = 1, the case with Nrect = 2 detects more rectangles at a faster rate. When the value of Nrect increases further, the number of rectangles increases accordingly. However, the refinement of the detection results at every Ncb (=10) frames lowers the rate of increase by either discarding or merging rectangles. When the value of Nrect is too large, most of the detected rectangles are of low quality and are discarded by the refinement process.
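The interplay between per-frame rectangle initialization and the periodic RANSAC-based refinement can be summarized as in the sketch below. It is a schematic illustration under stated assumptions: the three callables stand in for the paper's rectangle fitting, the RANSAC [58] global-frame estimation, and the grid-consistency check, none of which is reproduced here.

```python
N_RECT = 2   # rectangle hypotheses initialized per frame
N_CB = 10    # frames between RANSAC-based refinements

def process_frame(frame_idx, labeled_points, state,
                  initialize_rectangles, ransac_global_frame, is_consistent):
    """One step of the online detection loop (schematic sketch)."""
    # Initialize up to N_RECT new rectangle hypotheses from "Column" points.
    state["rectangles"] += initialize_rectangles(labeled_points, max_new=N_RECT)

    # Every N_CB frames, re-estimate the global frame and prune hypotheses.
    if frame_idx % N_CB == 0:
        frame = ransac_global_frame(state["rectangles"])
        # Keep only hypotheses consistent with the global grid; low-quality
        # ones are dropped (near-duplicates would be merged here as well).
        state["rectangles"] = [r for r in state["rectangles"]
                               if is_consistent(r, frame)]
    return state
```

This structure explains the zigzag patterns in Fig. 14: the hypothesis count grows between refinements and drops at every Ncb-th frame.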


Fig. 13. Example results of data processing steps (every 20 frame from 20th to 200th frame, Nrect = 1).


Fig. 14. Number of rectangles detected with different values of Nrect.

Fig. 15. Number of columns detected with different values of Nrect.

The numbers of columns detected for each of the four cases are shown in Fig. 15. Each figure shows five lines, corresponding to the maximum, top 25%, median, top 75%, and minimum numbers of columns detected among the 100 runs. The figures also show that the columns tend to be detected more quickly for larger values of Nrect. For all Monte Carlo runs, eight columns are detected before all frames are processed. The rate of column detection is faster with larger values of Nrect. For example, when Nrect ≥ 2, all eight columns are detected by the 140th step for all runs, and by the 40th step for 75% of the tested cases. This observation supports the advantage of the proposed approach: the UAV can acquire an understanding of the global structure at the early stages of the sample flight path.
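The five summary curves can be obtained by aggregating the per-frame detection counts over the Monte Carlo runs, for example as below (a small NumPy sketch; the variable counts is a hypothetical integer array of shape (runs, frames)).

```python
import numpy as np

def detection_quantiles(counts):
    """Per-frame min/25%/median/75%/max over Monte Carlo runs.
    counts: integer array of shape (n_runs, n_frames)."""
    return {
        "min": counts.min(axis=0),
        "q25": np.percentile(counts, 25, axis=0),
        "median": np.median(counts, axis=0),
        "q75": np.percentile(counts, 75, axis=0),
        "max": counts.max(axis=0),
    }
```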


Fig. 16. Detailed results for the case with Nrect = 3.
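As context for the scale-estimation traces in Fig. 16 (discussed below): the absolute scale of an up-to-scale SfM point cloud can be estimated by comparing SfM depths with the metric depths predicted by the monocular network, for example via a robust median of ratios. The sketch below is hypothetical and not necessarily the authors' exact estimator; d_sfm and d_pred are assumed per-pixel depth arrays for the same frame.

```python
import numpy as np

def estimate_scale(d_sfm, d_pred, valid):
    """Robust scale between SfM depths (arbitrary units) and predicted
    metric depths: median of per-point ratios over valid pixels."""
    ratios = d_pred[valid] / d_sfm[valid]
    return float(np.median(ratios))
```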

Table 7
Column detection accuracy (root mean squared error of column center location and orientation). Blue: columns detected based on rectangle pairs; red: columns detected from an unpaired rectangle or no rectangle.

             After 150th frame         After 235th frame
             RMS (Blue)  RMS (Red)     RMS (Blue)  RMS (Red)
  X [m]      0.057       0.074         0.038       NA
  Y [m]      0.061       0.092         0.043       NA
  Z [m]      0.127       0.122         0.159       NA
  θz [rad]   0.011       0.010         0.014       NA

Table 8
Column detection accuracy (dimensions). Blue: columns detected based on rectangle pairs; red: columns detected from an unpaired rectangle or no rectangle.

             After 150th frame                     After 235th frame
             Blue              Red                 Blue
             Mean      STD     Mean      STD       Mean      STD
  wx [m]     0.584     0.032   0.582     0.036     0.575     0.044
  wy [m]     0.509     0.065   0.487     0.084     0.552     0.069
  h [m]      5.596     0.029   5.582     0.028     5.592     0.024

Fig. 17. Step processing time (Nrect = 2).

The number of points with the valid "Column" label that are not assigned to any of the rectangles, as well as the point cloud scale estimation results, are shown in Fig. 16 for the case with Nrect = 2. Among the 33,829 points contained in the point cloud data, up to about 1,000 points (3.0%) are analyzed to detect rectangles and columns. The number of points decreases further as the method finds valid rectangles representing column faces. The point cloud scale is approximate initially (about 22) and converges to the true value (10.34) once the grid pattern is identified (every run gives the same value for the scale estimate based on the monocular depth estimation, and therefore the standard deviation is zero initially).

The accuracy of the estimated column locations and dimensions is evaluated using the ground truth values obtained from the synthetic environment (Nrect = 2). The root mean squared (RMS) errors of the estimated column center locations and orientations are evaluated after the 150th and 235th (last) frames and shown in Table 7 for two categories: columns detected based on rectangle pairs ("Blue"), and columns detected from an unpaired rectangle or no rectangle ("Red"). After the 235th frame, all columns were detected based on rectangle pairs, and therefore RMS errors are presented only for the "Blue" category. The RMS errors are less than 5 cm horizontally and less than 16 cm vertically. The orientations are also estimated accurately, with an RMS error of 0.014 rad. After the 150th frame, the column detection results are less complete, and some of the columns are detected from an unpaired rectangle or no rectangle ("Red" columns). The horizontal accuracies are lower than those evaluated after the 235th frame, but maintain RMS errors of less than 7 cm. Column vertical locations and rotations are determined globally (the ground plane below, the beam plane above, and the global grid line patterns are used), and therefore no significant accuracy improvement is observed once those patterns are detected.

The means and standard deviations of the column dimensions estimated after the 150th and 235th frames are shown in Table 8 (Nrect = 2), where wx is the column side length in the longitudinal direction (ground truth is 0.6 m), wy is the column side length in the transverse direction (ground truth is 0.6 m), and h is the column height (ground truth is 5.3 m). Again, all columns were detected as "Blue" after the last frame, and all columns are detected as "Blue" or "Red" columns after the 150th frame, with comparable accuracy.

The step processing time for the case with Nrect = 2 is evaluated using a desktop computer with an Intel Core i9-10900K 3.70 GHz CPU, an NVIDIA RTX 3090 graphics card, and 64 GB of RAM. The results are shown in Fig. 17. The processing time for point cloud parsing and rectangle detection is shorter than that of the column detection step, which runs at a lower rate. While the results are promising, the investigation of the computational aspects of the autonomous navigation is preliminary in this research: the software implementation has not been optimized for on-board and real-time applications. Candidate improvements include, for example, optimization of the UNET3+ architecture, multi-thread processing, and additional logic that removes data associated with columns that have already been inspected or have not been assigned to valid rectangles for a long time. Those investigations are part of future work.
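The RMS-error evaluation reported in Table 7 amounts to comparing estimated column poses against the ground truth from the synthetic environment. A minimal sketch is given below; the arrays detected and truth are hypothetical matched column parameters, and the angle-wrapping convention is an assumption.

```python
import numpy as np

def rms_errors(detected, truth):
    """RMS error per parameter (x, y, z, theta_z) over matched columns.
    detected, truth: float arrays of shape (n_columns, 4)."""
    err = detected - truth
    # Wrap the orientation difference to (-pi, pi] before averaging.
    err[:, 3] = (err[:, 3] + np.pi) % (2 * np.pi) - np.pi
    return np.sqrt(np.mean(err ** 2, axis=0))
```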


Fig. 18. Data processing procedure for the demonstration of the autonomous navigation planning approach.
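Fig. 18 outlines the batch-wise data processing used in the second demonstration (Section 4.4 below): every 10 newly acquired images are fed to the SfM, after which column detection is re-run. A minimal organizational sketch follows; run_sfm and detect_columns are hypothetical placeholders, and in this research the SfM step itself is performed by the Agisoft Metashape software [52].

```python
BATCH_SIZE = 10  # images per SfM/detection update, following the text

class IncrementalMapper:
    """Accumulate images and refresh the sparse map every BATCH_SIZE frames
    (sketch; run_sfm and detect_columns are hypothetical callables)."""
    def __init__(self, run_sfm, detect_columns):
        self.run_sfm = run_sfm              # images -> (sparse points, poses)
        self.detect_columns = detect_columns
        self.images, self.columns = [], []

    def add_image(self, image):
        self.images.append(image)
        if len(self.images) % BATCH_SIZE == 0:
            points, poses = self.run_sfm(self.images)
            self.columns = self.detect_columns(points, poses)
        return self.columns
```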

Fig. 19. Coordinate systems used in the second demonstration.
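The coordinate systems in Fig. 19 are related by rigid-body transforms T1 and T2 (defined in Section 4.4 below). The sketch here illustrates how a path planned in the world frame could be mapped back through the default SfM map frame into the synthetic environment for rendering; the 4 × 4 homogeneous-matrix convention and the composition order are assumptions for illustration only.

```python
import numpy as np

def transform(points, T):
    """Apply a 4x4 homogeneous transform T to an (n, 3) array of points."""
    homog = np.hstack([points, np.ones((points.shape[0], 1))])
    return (homog @ T.T)[:, :3]

# T2: default SfM map frame -> world frame (grid finding, path planning)
# T1: synthetic environment frame -> default SfM map frame (rendering only)
def world_path_to_synthetic(path_world, T1, T2):
    path_map = transform(path_world, np.linalg.inv(T2))  # world -> default map
    return transform(path_map, np.linalg.inv(T1))        # map -> synthetic env
```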

4.4. Demonstration 2: Autonomous UAV navigation for rapid structural inspection

This section demonstrates the autonomous UAV navigation planning approach following the procedure illustrated in Fig. 18. First, a UAV flies along a predetermined GPS-based path (Fig. 12) to initiate the mission. In contrast to the previous section, which computed the sparse point cloud data off-line using the entire dataset, this section takes a more realistic approach: every time 10 new images are obtained, those images are processed by the SfM, and then the column detection algorithm discussed in this research is applied. If the scale (grid lines) and at least two columns are detected, the UAV switches from the initial GPS-based navigation to the navigation using the inspection waypoints (the "inspection phase," which uses the waypoints shown in Fig. 9); otherwise, it stays on the GPS-based path. Once the inspection phase is initiated, image batches are rendered in the synthetic environment as needed to enable the demonstration of the progressive path planning and refinement approach.

The coordinate systems used in this demonstration are summarized in Fig. 19. The synthetic environment has its own coordinate system, in which the ground truth configurations of the viaducts and the UAV are described. From the images collected in the synthetic environment, the sparse map and the UAV poses are obtained in the default coordinate system used by the SfM software. During the data processing, the world coordinate system relative to the default coordinate system (transform T2) is identified, which is used for finding grid patterns and planning navigation paths. The column detection and path planning results are visualized either in this world coordinate system or in the 2D horizontal plane derived from the world coordinate system. Once the inspection path is defined in the world coordinate system, the path is converted back to the default map coordinate system, and then to the coordinate system of the synthetic environment to render images. The transform between the coordinate systems of the synthetic environment and the default map, T1, is defined by estimating the transform between the initial GPS-based waypoints (whose ground truth values are defined in the synthetic environment) and the corresponding camera poses estimated with the map by the SfM software, following the method discussed in [64] (this transform is used only for image rendering, and is therefore not visible to the proposed navigation planning system).

During the demonstration, the conditions for switching from the GPS-based navigation to the one using the inspection waypoints are satisfied at the 50th frame (50 m from the initiation of the mission). The parsed sparse point cloud data, the navigation path in the 2D plane, and the navigation path in 3D space are shown in Fig. 20 at every 100 frames after entering the inspection phase. The proposed strategy can plan a UAV navigation path for rapid post-earthquake structural inspection automatically, based on the understanding of the target structure. Furthermore, columns in the adjacent viaducts are detected toward the end of the inspection of the current viaduct, enabling a smooth transition to the inspection of the next structure.

Fig. 20. Parsed sparse point cloud, navigation path (2D), and navigation path (3D) at the 1st, 100th, 200th, 300th, 400th, and 446th (last) frames after entering the inspection phase. Dotted lines are finished parts, and the green solid lines show the planned path to be followed. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

Images of one of the columns inspected during the inspection phase are shown in Fig. 21. As expected, images from a close and controlled distance are collected with overlaps, which are ideal for automated post-earthquake structural damage recognition. The results of applying the previously trained semantic segmentation algorithm for structural damage recognition [25] to the images shown in Fig. 21 are presented in the appendix.

Fig. 21. Images of the four faces of a column collected by the proposed UAV navigation strategy (Npx = 7, R = 0.5).

5. Discussions

Following the roadmap presented in Fig. 1, this research developed the conceptual and theoretical framework for UAV navigation planning to enable autonomous rapid post-earthquake inspection of railway viaducts, and demonstrated the prototype system in the synthetic environment. The demonstration showed significant potential for detecting and localizing critical structural components (columns) in an online manner, and then planning appropriate paths to collect close-up images of those components at the desired levels of detail and overlap. On the other hand, further work is needed to complete the aspects of the autonomous post-earthquake structural inspection system that were not covered in this research, which are discussed herein.

5.1. Prototype development in experimental environments

In this step, a prototype system that includes both hardware and software should be developed, and its capabilities of performing inspection tasks in experimental environments should be validated.


The prototype system developed in this phase should incorporate advanced techniques for generic components of autonomous navigation systems, such as control, local path planning, and SLAM. These extensions will enable the UAV to perform missions in complex environments using the inspection waypoints determined based on the understanding of the structural systems, as discussed in this research. In addition, the current system pays little attention to computational and power resource management, which needs to be investigated thoroughly in this phase. The keys to improving the current system would be (i) algorithm optimization to reduce computation (network architectures, eliminating repeated computation in the rectangle initialization stage, etc.), (ii) frame rate optimization (image processing for high-level path planning can occur at a lower rate than other tasks such as control and local path planning), (iii) optimization of the timing of the switch to the inspection phase (the sooner the better in terms of power consumption), and (iv) hardware optimization.

5.2. Development and validation in field environments

This step extends the prototype system developed for the experimental environments to field environments. Field environments have an even higher degree of uncertainty than experimental environments. The perception of the system should be improved to recognize the types and locations of various objects in the field, and the navigation unit should handle those complexities appropriately (depth recognition could be improved further by incorporating the GPS data available during the initialization phase). The system's robustness to various non-standard trajectories of the viaducts (e.g., large curvature) should also be improved. Moreover, the current system assumes that the damage to the target structure does not reach complete failure (collapse), so that the prior knowledge of the global geometric patterns can be exploited. To accommodate the complete collapse scenario, the system should perform a global structure-level collapse classification prior to the process discussed in this research (the "system level" assessment discussed in [33]); the inspection process is initiated only when the structure is classified as "not collapsed".


5.3. Generalization to the inspection of other structural systems/components

This research investigated the autonomous navigation planning system for RC railway viaduct columns with rectangular cross-sections, whose locations are characterized by regular grid-type patterns. This type of structure comprises a significant portion of railway lines in Japan and other countries; for example, nearly 50% of the total length of some Japanese high-speed railway lines consists of this type of viaduct [65]. This fact indicates the importance of developing an effective autonomous inspection system for this specific structural type. On the other hand, extending the system to other structural systems (e.g., non-square cross-sections, simply supported bridges) and components (e.g., light poles, rails, slabs) is not straightforward with the current formulation. A possible approach for generalizing the proposed system to such scenarios is to find tight 3D bounding boxes of the critical structural components, followed by a complete scan of those selected parts, as is often done by the Skydio platform.

5.4. Waypoint determination based on the observed structural conditions

In this research, the waypoint determination subsystem after column detection and localization was deterministic and pre-designed relative to the square prismatic shapes representing the columns. However, the navigation planning approach investigated in this research has the potential of designing paths in response to the observed structural conditions, similar to the human inspection process. After earthquakes, structural damage, such as cracks and large deformations of structural/nonstructural components, may occur, which is the primary source of information for deriving the inspection ratings. By improving and extending the logic of the waypoint determination step, the "reactive" autonomous system is expected to be able to design a path that collects many images of observed damage, without increasing the workload for intact parts of the structure.

6. Conclusions

This research developed an approach for vision-based autonomous UAV navigation planning for rapid post-earthquake inspection of reinforced concrete railway viaducts. The approach mimics the way human inspectors perform the task: the system does not require a complete 3D model of the environment, and instead uses the key characteristics of the target structure as prior knowledge. In an online manner, the system parses the sparse point cloud data using frame-wise semantic segmentation results, and identifies the approximate scale using the frame-wise depth estimation results. Then, rectangles are fitted to the points with the "Column" label, from which columns are detected by finding near-parallel or near-perpendicular pairs. In addition to the columns thus detected, the approach tries to find the grid-line pattern of the column locations, based on which the detection results are refined and information about missing columns is inferred. With this approach, the UAV can initiate the inspection mission before obtaining complete information about the target structure. The information (map) can be improved progressively as the UAV experiences more views during the inspection. The capability of the developed approach to detect columns robustly and reliably was evaluated using synthetic data. The results showed that all eight columns of the target viaducts were detected in all 100 trials with centimeter-level accuracy. Finally, the entire approach was demonstrated in the synthetic environment, showing the significant potential of collecting high-quality images for post-earthquake structural inspection efficiently and quickly.

The approach investigated in this research is a prototype for autonomous UAV navigation planning for post-earthquake structural inspection, and therefore needs to be improved to accommodate experimental and field implementation. First, this research postulates a simplifying assumption on the sparse point cloud data generation and updating: this research applies the SfM to image batches. While producing accurate results, this approach is computationally expensive and is not necessarily optimal for mobile robotics applications. Further investigation of other computational aspects of the proposed approach is also needed to enable real-time and on-board data processing (e.g., reducing repeated computations for column detection at every Ncb steps, not performing point cloud processing for already inspected columns). Finally, the demonstration in this research is performed in the synthetic environment only, which is free of the uncertainty encountered in experimental and field environments, such as obstacles and modeling imperfections. The robustness of the proposed approach should be improved in the future by combining the developed system with state-of-the-art techniques for autonomous navigation in general, such as local path planning and control, including collision avoidance. The system investigated in this research is expected to form a basis for those future investigations toward fully autonomous post-earthquake structural inspection.

Declaration of Competing Interest

None.

Acknowledgement

The authors would like to acknowledge the financial support of the U.S. Army Corps of Engineers (Contract/Purchase Order No. W912HZ-17-2-0024). This research was also supported in part by the National Natural Science Foundation of China, Grant No. 51978182.

Appendix A. Damage recognition using collected images

The synthetic viaducts used for the demonstration in this research contain damage, such as concrete damage and exposed rebar. The 58-layer Fully Convolutional Network trained previously [25] is applied to the images presented in Fig. 21, and the estimated masks of concrete damage are shown in Fig. 22. Fine cracks can be observed in the detection results, illustrating the application of the proposed autonomous navigation planning approach. For detailed discussions of the performance of the semantic segmentation algorithm used herein, the readers are directed to the reference [25].
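For readers who wish to reproduce this kind of post-processing, a minimal TensorFlow 2 inference sketch is given below. The model path, input size, and binarization threshold are hypothetical placeholders; the network itself would be the previously trained segmentation model of [25].

```python
import numpy as np
import tensorflow as tf

def damage_masks(image_paths, model_path="fcn58_damage.h5", threshold=0.5):
    """Run a trained segmentation network on collected images and return
    binary concrete-damage masks (sketch; path/size/threshold are assumed)."""
    model = tf.keras.models.load_model(model_path)
    masks = []
    for path in image_paths:
        img = tf.io.decode_image(tf.io.read_file(path), channels=3)
        img = tf.image.convert_image_dtype(img, tf.float32)
        img = tf.image.resize(img, (360, 640))[tf.newaxis, ...]  # assumed input size
        prob = model.predict(img)[0, ..., 0]   # per-pixel damage probability
        masks.append((prob > threshold).astype(np.uint8))
    return masks
```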


Fig. 22. Results of applying previously trained semantic segmentation algorithm to the images shown in Fig. 21.

References

[1] National Research Council, National Earthquake Resilience: Research, Implementation, and Outreach, National Academies Press, 2011. https://www.nationalacademies.org/our-work/national-earthquake-resilience—research-implementation-and-outreach (accessed Nov. 15, 2021).
[2] Cabinet Office, Government of Japan. http://www.bousai.go.jp/ (accessed Jul. 08, 2020).
[3] The Asahi Shimbun evening newspaper (in Japanese), Aug. 11, 2009. [Online]. Available: https://www.asahi.com/ (accessed Jan. 23, 2022).
[4] A report of the Committee on the Resumption of Train Operation in Tokyo Area After Large-Scale Earthquakes (in Japanese), 2012. [Online]. Available: https://www.mlit.go.jp/common/000208774.pdf (accessed Jul. 08, 2020).
[5] L.D. Otero, N. Gagliardo, D. Dalli, W.H. Huang, P. Cosentino, Proof of Concept for Using Unmanned Aerial Vehicles for High Mast Pole and Bridge Inspection, Project report for the Florida Department of Transportation, 2015. [Online]. Available: https://rosap.ntl.bts.gov/view/dot/29176 (accessed Nov. 15, 2021).
[6] J. Zink, B. Lovelace, Unmanned Aerial Vehicle Bridge Inspection Demonstration Project, Project report for the Minnesota Department of Transportation, 2015. [Online]. Available: http://www.dot.state.mn.us/research/TS/2015/201540.pdf (accessed Nov. 15, 2021).
[7] J. Wells, B. Lovelace, Unmanned Aircraft System Bridge Inspection Demonstration Project Phase II Final Report, Project report for the Minnesota Department of Transportation, 2017. [Online]. Available: https://rosap.ntl.bts.gov/view/dot/32636 (accessed Nov. 15, 2021).
[8] C. Brooks, R. Dobson, D. Banach, D. Dean, T. Oommen, R. Wolf, T. Havens, T. Ahlborn, B. Hart, Evaluating the Use of Unmanned Aerial Vehicles for Transportation Purposes, A project for the Michigan Department of Transportation, 2015. [Online]. Available: https://www.michigan.gov/mdot/0,4616,7-151-9622_11045_24249-353767–,00.html (accessed Nov. 15, 2021).
[9] D. Lattanzi, G.R. Miller, 3D scene reconstruction for robotic bridge inspection, J. Infrastruct. Syst. 21 (2) (2015) 04014041, https://doi.org/10.1061/(ASCE)IS.1943-555X.0000229.
[10] S. Chen, D.F. Laefer, E. Mangina, S.M.I. Zolanvari, J. Byrne, UAV bridge inspection through evaluated 3D reconstructions, J. Bridg. Eng. 24 (4) (2019) 05019001, https://doi.org/10.1061/(ASCE)BE.1943-5592.0001343.
[11] T.G. Mondal, M.R. Jahanshahi, Autonomous vision-based damage chronology for spatiotemporal condition assessment of civil infrastructure using unmanned aerial vehicle, Smart Struct. Syst. 25 (6) (2020) 733–749, https://doi.org/10.12989/SSS.2020.25.6.733.


[12] I. Brilakis, H. Fathi, A. Rashidi, Progressive 3D reconstruction of infrastructure with videogrammetry, Autom. Constr. 20 (7) (2011) 884–895, https://doi.org/10.1016/j.autcon.2011.03.005.
[13] C.M. Yeum, J. Choi, S.J. Dyke, Autonomous image localization for visual inspection of civil infrastructure, Smart Mater. Struct. 26 (3) (2017) 035051, https://doi.org/10.1088/1361-665X/aa510e.
[14] Y. Narazaki, V. Hoskere, T.A. Hoang, B.F. Spencer, Automated bridge component recognition using video data, in: The 7th World Conference on Structural Control and Monitoring, Qingdao, China, July 22–25, 2018. [Online]. Available: https://arxiv.org/abs/1806.06820 (accessed Nov. 15, 2021).
[15] Y. Narazaki, V. Hoskere, T.A. Hoang, Y. Fujino, A. Sakurai, B.F. Spencer, Vision-based automated bridge component recognition with high-level scene consistency, Comp. Aided Civil Infrastruct. Eng. 35 (5) (2020) 465–482, https://doi.org/10.1111/mice.12505.
[16] H. Kim, J. Yoon, S. Sim, Automated bridge component recognition from point clouds using deep learning, Struct. Control. Health Monit. 27 (9) (2020) e2591, https://doi.org/10.1002/stc.2591.
[17] S. Dorafshan, R.J. Thomas, C. Coopmans, M. Maguire, Deep learning neural networks for sUAS-assisted structural inspections: feasibility and application, in: 2018 International Conference on Unmanned Aircraft Systems, Dallas, TX, USA, Aug. 2018, pp. 874–882, https://doi.org/10.1109/ICUAS.2018.8453409.
[18] F.-C. Chen, M.R. Jahanshahi, NB-CNN: deep learning-based crack detection using convolutional neural network and Naïve Bayes data fusion, IEEE Trans. Ind. Electron. 65 (5) (2017) 4392–4400, https://doi.org/10.1109/TIE.2017.2764844.
[19] V. Hoskere, Y. Narazaki, T.A. Hoang, B.F. Spencer, MaDnet: multi-task semantic segmentation of multiple types of structural materials and damage in images of civil infrastructure, J. Civ. Struct. Heal. Monit. 10 (2020) 757–773, https://doi.org/10.1007/s13349-020-00409-0.
[20] M.R. Saleem, J.-W. Park, J.-H. Lee, H.-J. Jung, M.Z. Sarwar, Instant bridge visual inspection using an unmanned aerial vehicle by image capturing and geo-tagging system and deep convolutional neural network, Struct. Health Monit. 20 (4) (2020) 1760–1777, https://doi.org/10.1177/1475921720932384.
[21] J. Shi, R. Zuo, J. Dang, Bridge damage classification and detection using fully convolutional neural network based on images from UAVs, in: Experimental Vibration Analysis for Civil Structures, CRC Press, 2020. ISBN: 9781003090564.
[22] E. McLaughlin, N. Charron, S. Narasimhan, Automated defect quantification in concrete bridges using robotics and deep learning, J. Comput. Civ. Eng. 34 (5) (2020) 04020029, https://doi.org/10.1061/(asce)cp.1943-5487.0000915.
[23] M.R. Jahanshahi, S.F. Masri, Adaptive vision-based crack detection using 3D scene reconstruction for condition assessment of structures, Autom. Constr. 22 (2012) 567–576, https://doi.org/10.1016/j.autcon.2011.11.018.
[24] B.F. Spencer, V. Hoskere, Y. Narazaki, Advances in computer vision-based civil infrastructure inspection and monitoring, Engineering 5 (2) (2019) 199–222, https://doi.org/10.1016/J.ENG.2018.11.030.
[25] Y. Narazaki, V. Hoskere, K. Yoshida, B.F. Spencer, Y. Fujino, Synthetic environments for vision-based structural condition assessment of Japanese high-speed railway viaducts, Mech. Syst. Signal Process. 160 (2021) 107850, https://doi.org/10.1016/j.ymssp.2021.107850.
[26] Cross-ministerial strategic innovation promotion program (SIP) report, Cabinet Office of Japan, 2018. [Online]. Available: https://www.jst.go.jp/sip/dl/k07/pamphlet_2018_en.pdf (accessed Nov. 15, 2021).
[27] B. Yamauchi, Frontier-based approach for autonomous exploration, in: Proceedings of the IEEE International Symposium on Computational Intelligence in Robotics and Automation, 1997, pp. 146–151, https://doi.org/10.1109/cira.1997.613851.
[28] S.K. Ramakrishnan, Z. Al-Halah, K. Grauman, Occupancy anticipation for efficient exploration and navigation, in: Proceedings of the European Conference on Computer Vision, 2020. [Online]. Available: http://arxiv.org/abs/2008.09285 (accessed Aug. 29, 2020).
[29] M. Srinivasan Ramanagopal, A.P. Van Nguyen, J. Le Ny, A motion planning strategy for the active vision-based mapping of ground-level structures, IEEE Trans. Autom. Sci. Eng. 15 (1) (2018) 356–368, https://doi.org/10.1109/TASE.2017.2762088.
[30] A. Howard, M.J. Matarić, G.S. Sukhatme, An incremental self-deployment algorithm for mobile sensor networks, Auton. Robot. 13 (2) (2002) 113–126, https://doi.org/10.1023/A:1019625207705.
[31] F. Fraundorfer, L. Heng, D. Honegger, G.H. Lee, L. Meier, P. Tanskanen, M. Pollefeys, Vision-based autonomous mapping and exploration using a quadrotor MAV, IEEE Int. Conf. Intel. Robots Syst. (2012) 4557–4564, https://doi.org/10.1109/IROS.2012.6385934.
[32] 3D Scan™ | Skydio. https://www.skydio.com/3d-scan (accessed Sep. 10, 2021).
[33] X. Liang, Image-based post-disaster inspection of reinforced concrete bridge systems using deep learning with Bayesian optimization, Comp. Aided Civil Infrastruct. Eng. 34 (5) (2018) 415–430, https://doi.org/10.1111/mice.12425.
[34] W. Sheng, H. Chen, N. Xi, Navigating a miniature crawler robot for engineered structure inspection, IEEE Trans. Autom. Sci. Eng. 5 (2) (2008) 368–373, https://doi.org/10.1109/TASE.2007.910795.
[35] A. Ibrahim, A. Sabet, M. Golparvar-Fard, BIM-driven mission planning and navigation for automatic indoor construction progress detection using robotic ground platform, in: 2019 European Conference on Computing in Construction, 2019, pp. 182–189, https://doi.org/10.35490/EC3.2019.195.
[36] S.S. Mansouri, C. Kanellakis, E. Fresk, D. Kominiak, G. Nikolakopoulos, Cooperative coverage path planning for visual inspection, Control. Eng. Pract. 74 (2018) 118–131, https://doi.org/10.1016/J.CONENGPRAC.2018.03.002.
[37] M. Stokkeland, K. Klausen, T.A. Johansen, Autonomous visual navigation of unmanned aerial vehicle for wind turbine inspection, in: 2015 International Conference on Unmanned Aircraft Systems, 2015, pp. 998–1007, https://doi.org/10.1109/ICUAS.2015.7152389.
[38] X. Hui, J. Bian, X. Zhao, M. Tan, Vision-based autonomous navigation approach for unmanned aerial vehicle transmission-line inspection, Int. J. Adv. Robot. Syst. 15 (1) (2018) 1–15, https://doi.org/10.1177/1729881417752821.
[39] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell. 39 (6) (2015) 91–99, https://doi.org/10.1109/TPAMI.2016.2577031.
[40] V.A.H. Higuti, A.E.B. Velasquez, D.V. Magalhaes, M. Becker, G. Chowdhary, Under canopy light detection and ranging-based autonomous navigation, J. Field Robot. 36 (3) (2019) 547–567, https://doi.org/10.1002/rob.21852.
[41] Y. Perez-Perez, M. Golparvar-Fard, K. El-Rayes, Artificial neural network for semantic segmentation of built environments for automated Scan2BIM, in: Am. Soc. Civil Eng. Int. Conf. Comp. Civil Eng., 2019, pp. 97–104, https://doi.org/10.1061/9780784482438.013.
[42] M. Kono, Y. Matsumoto, Design of the standard rigid frame railway bridge in new Tokaido line (in Japanese), Trans. Japan Soc. Civil Eng. 115 (1965) 13–25, https://doi.org/10.2208/jscej1949.1965.115_13.
[43] M. Ohba, The design history of the railway viaduct from the design of Tokaido Shinkansen to the recent design (in Japanese), Concrete J. 51 (1) (2013) 112–115, https://doi.org/10.3151/coj.51.112.
[44] M. Kobayashi, K. Shinoda, K. Mizuno, S. Nozawa, T. Ishibashi, Study on damage caused to Shinkansen RC viaducts by the 2011 off the Pacific coast of Tohoku earthquake (in Japanese), J. Japan Soc. Civil Eng. A1 70 (4) (2014) I_688–I_700, https://doi.org/10.2208/jscejseee.70.I_688.
[45] H. Inaguma, M. Seki, Experimental study on earthquake strengthening using polyester sheets of RC railway viaduct columns (in Japanese), Japan Soc. Civil Eng. J. Struct. Eng. 50A (2) (2004) 515–526. [Online]. Available: http://library.jsce.or.jp/jsce/open/00127/2004/50-0515.pdf (accessed Nov. 15, 2021).
[46] Y. Takahashi, Report on Damage Caused by the 2011 Tohoku Earthquake (in Japanese), 2011. [Online]. Available: https://committees.jsce.or.jp/report/system/files/10_takahashi.pdf (accessed Nov. 15, 2021).
[47] K. Tateno, F. Tombari, I. Laina, N. Navab, CNN-SLAM: real-time dense monocular SLAM with learned depth prediction, in: IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6243–6252, https://doi.org/10.1109/CVPR.2017.695.
[48] T. Schöps, T. Sattler, M. Pollefeys, BAD SLAM: bundle adjusted direct RGB-D SLAM, in: IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 134–144, https://doi.org/10.1109/CVPR.2019.00022.
[49] R. Mur-Artal, J.D. Tardos, ORB-SLAM2: an open-source SLAM system for monocular, stereo and RGB-D cameras, IEEE Trans. Robot. 33 (5) (2016) 1255–1262, https://doi.org/10.1109/TRO.2017.2705103.
[50] J. Engel, T. Schöps, D. Cremers, LSD-SLAM: large-scale direct monocular SLAM, in: 2014 European Conference on Computer Vision, 2014, pp. 834–849, https://doi.org/10.1007/978-3-319-10605-2_54.
[51] X. Yang, Y. Gao, H. Luo, C. Liao, K.T. Cheng, Bayesian DeNet: monocular depth prediction and frame-wise fusion with synchronized uncertainty, IEEE Trans. Multimedia 21 (11) (2019) 2701–2713, https://doi.org/10.1109/TMM.2019.2912121.
[52] Agisoft Metashape. https://www.agisoft.com/ (accessed Aug. 30, 2020).
[53] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 39 (4) (2015) 640–651, https://doi.org/10.1109/TPAMI.2016.2572683.
[54] H. Huang, L. Lin, R. Tong, H. Hu, Q. Zhang, Y. Iwamoto, X. Han, Y.W. Chen, J. Wu, UNet 3+: a full-scale connected UNet for medical image segmentation, in: 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020, pp. 1055–1059, https://doi.org/10.1109/ICASSP40776.2020.9053405.
[55] O. Ronneberger, P. Fischer, T. Brox, U-Net: convolutional networks for biomedical image segmentation, in: 2015 International Conference on Medical Image Computing and Computer-Assisted Intervention, vol. 9351, 2015, pp. 234–241, https://doi.org/10.1007/978-3-319-24574-4_28.
[56] I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, N. Navab, Deeper depth prediction with fully convolutional residual networks, in: 2016 Fourth International Conference on 3D Vision, 2016, pp. 239–248, https://doi.org/10.1109/3DV.2016.32.
[57] C. Martin, S. Thrun, Real-time acquisition of compact volumetric 3D maps with mobile robots, IEEE Int. Conf. Robot. Automat. 1 (2002) 311–316, https://doi.org/10.1109/ROBOT.2002.1013379.
[58] M.A. Fischler, R.C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM 24 (6) (1981) 381–395, https://doi.org/10.1145/358669.358692.
[59] Y. Chen, G. Medioni, Object modeling by registration of multiple range images, IEEE Int. Conf. Robot. Automat. 3 (1991) 2724–2729, https://doi.org/10.1109/robot.1991.132043.
[60] P.E. Hart, N.J. Nilsson, B. Raphael, A formal basis for the heuristic determination of minimum cost paths, IEEE Trans. Syst. Sci. Cybernet. 4 (2) (1968) 100–107, https://doi.org/10.1109/TSSC.1968.300136.
[61] A. Sakai, D. Ingram, J. Dinius, K. Chawla, A. Raffin, A. Paques, PythonRobotics: a Python code collection of robotics algorithms, Aug. 2018. [Online]. Available: http://arxiv.org/abs/1808.10703 (accessed Sep. 20, 2020).
[62] D.P. Kingma, J. Ba, Adam: a method for stochastic optimization, in: The 3rd International Conference for Learning Representations, 2015, pp. 1–15. [Online]. Available: https://arxiv.org/pdf/1412.6980.pdf (accessed Nov. 15, 2021).


[63] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D.G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, X. Zheng, TensorFlow: a system for large-scale machine learning, in: 12th Symposium on Operating Systems Design and Implementation, 2016, pp. 265–283. [Online]. Available: https://research.google/pubs/pub45381/ (accessed Nov. 15, 2021).
[64] D.W. Eggert, A. Lorusso, R.B. Fisher, Estimating 3-D rigid body transformations: a comparison of four major algorithms, Mach. Vis. Appl. 9 (5) (1997) 272–290, https://doi.org/10.1007/S001380050048.
[65] S. Takatsu, M. Doi, High-speed railways in Japan – past and future (in Japanese), Railway Pictorial 58 (2) (2008) 142–153. [Online]. Available: https://ci.nii.ac.jp/naid/40015748291 (accessed Jul. 10, 2020).
