Automation in Construction
journal homepage: www.elsevier.com/locate/autcon
ARTICLE INFO

Keywords: Post-earthquake inspection; Autonomous structural inspection; Unmanned Aerial Vehicles; Online structural component recognition; Semantic segmentation; Path planning

ABSTRACT

This research proposes an approach for vision-based autonomous navigation planning of unmanned aerial vehicles for the collection of images suitable for the rapid post-earthquake inspection of reinforced concrete railway viaducts. The proposed approach automatically recognizes and localizes critical structural components, columns in this case, and determines appropriate viewpoints for inspection relative to the identified components. Structural component recognition and localization are formulated through online detection of rectangular prismatic shapes from the parsed sparse point-cloud data, where prior knowledge of the target structural system is incorporated. The proposed approach is tested in a synthetic environment representing Japanese high-speed railway viaducts. First, the ability to detect the columns of the target viaduct is assessed. The results show that the columns are detected completely and robustly, with centimeter-level accuracy. Subsequently, the entire approach is demonstrated in the synthetic environment, showing the significant potential of collecting high-quality images for post-earthquake structural inspection efficiently.
* Corresponding author.
E-mail address: narazaki@intl.zju.edu.cn (Y. Narazaki).
https://doi.org/10.1016/j.autcon.2022.104214
Received 10 September 2021; Received in revised form 25 January 2022; Accepted 13 March 2022
Available online 30 March 2022
0926-5805/© 2022 Elsevier B.V. All rights reserved.
Y. Narazaki et al. Automation in Construction 137 (2022) 104214
Fig. 1. Roadmap for the autonomous post-earthquake structural inspection system. The first two blocks, "conceptual and theoretical development" and "prototype development in synthetic environments," are investigated in this paper.
structural components are located. After the initiation of the task, the inspector's understanding of the target structure is improved incrementally and continuously during the inspection. The inspector does not need to have access to the complete design drawing of the structure to perform this task. Rather, prior knowledge is required regarding the typical geometric or visual patterns of the structure (e.g., regular grid patterns of the column locations, typical dimensions of the columns, etc.). After this process, the inspector has taken a detailed look at all structurally important parts of the viaduct, based on which the structural conditions are evaluated.

A roadmap for realizing the envisioned autonomous system for UAV-based rapid post-earthquake structural inspection is shown in Fig. 1. The roadmap is divided into three parts: conceptual and theoretical development, prototype development in synthetic environments, and development and validation in experimental/field environments. In the conceptual and theoretical development, the high-level description of the logic of the human inspectors should be converted into technical and mathematical descriptions to enable software implementations that can be readily combined with other generic components of autonomous navigation systems. Then, the system should be investigated in synthetic environments. Synthetic environments are suitable during preliminary development stages, because of their capabilities to support extensive tests under controlled environments without actual safety risks. Once the autonomous system (software) works successfully in synthetic environments, a prototype system should be developed and tested in experimental and field environments. In this step, the prototype high-level navigation planning system should be integrated with generic components of autonomous navigation, such as state-of-the-art mapping algorithms and local path planning (including obstacle avoidance). Hardware implementation of the combined system should then be investigated. Finally, the system should be validated in field environments, where the robustness should be improved to handle various uncertainties which have not been considered in the synthetic and experimental environments. This roadmap is based on the previous work by the authors about synthetic environments that represent post-earthquake scenarios of Japanese high-speed railway (Tokaido Shinkansen) viaducts [25]. In that research, 2000 viaducts were generated by following the actual design procedures adopted by Tokaido Shinkansen, providing a relevant platform for investigating autonomous systems for the post-earthquake structural inspection problem. Supported by the work to date and vision for the future, this research investigates the first two parts of the roadmap, conceptual and theoretical development, and prototype development in synthetic environments.

This research proposes an approach for vision-based autonomous UAV navigation planning for rapid post-earthquake inspection of reinforced concrete railway viaducts. The proposed approach automatically recognizes and localizes critical structural components, columns in this case, and determines appropriate viewpoints for inspection relative to the identified components. To date, structural component recognition and localization have been investigated as 2D semantic segmentation [15,25] or off-line dense point cloud segmentation [16,41] problems. On the other hand, the approach presented in this research identifies rectangular prismatic shapes (not just segmented points) representing bridge columns (i) using sparse point cloud data, (ii) by online processing, (iii) that is aware of the key characteristics of the target structure. The first point, the use of sparse point cloud data, is critical for this research, because accurate dense point cloud data is typically not available in robotic navigation scenarios. The second point, online processing, is also appropriate for the application context: the system does not have to wait until it gets the complete information about the target structure. Instead, the system can recognize some of the structural components from partial point cloud data of the structure and start planning the waypoints quickly. The last point, the awareness of the characteristics of the structure, is meant to mimic the prior knowledge of the human inspector. This research encodes typical geometric patterns of railway viaducts to improve the column recognition results, as well as to guess column information from insufficient or no 3D points associated with those columns. Following conceptual and theoretical development of the proposed autonomous navigation planning approach, the prototype system is developed and tested in a synthetic environment representing Japanese high-speed railway viaducts.

This paper first defines the autonomous UAV navigation planning problem investigated in this research, including the associated assumptions. Then, the algorithmic components that embody the autonomous navigation planning approach are discussed. Finally, the validation and demonstration are performed in the synthetic environment of Japanese high-speed railway viaducts. This research makes a key step toward realizing the autonomous UAV navigation for rapid post-earthquake structural inspection.
Fig. 3. Overview of the proposed autonomous UAV navigation approach for post-earthquake structural inspection of RC railway viaducts.
Fig. 4. UNET3+ architecture used in the frame-wise image processing of this research.
E(i): i-th sub-encoder, D(i): i-th sub-decoder, Conv: convolution, Dconv: depth-wise separable convolution, (m,n,f,s) specifies filter size (m × n), number of filters (f),
and stride (s × s) of the Conv or Dconv operation, BN-ReLu: batch normalization, followed by ReLu activation function, ↓n: max-pooling (n × n), ↑n: bilinear
upsampling by a factor of n, Sup: prediction for deep supervision.
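As an aside on the Dconv blocks listed in the Fig. 4 caption: depth-wise separable convolutions are what keep the parameter count of the network low. A quick comparison of parameter counts (illustrative only; the layer sizes below are examples, not the exact ones in the figure):

```python
def conv_params(m, n, c_in, f):
    """Parameters of a standard m x n convolution with f filters (no bias):
    every filter spans all c_in input channels."""
    return m * n * c_in * f

def dconv_params(m, n, c_in, f):
    """Depth-wise separable convolution: one m x n depth-wise filter per
    input channel, followed by a 1 x 1 point-wise convolution to f channels."""
    return m * n * c_in + c_in * f
```

For a 3 x 3 layer with 64 input and 64 output channels, the standard convolution needs 36,864 weights while the separable version needs only 4,672, roughly an eight-fold reduction at this layer size.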
inspection can be guided by the GPS data.

Assumption 5 (Sparse point cloud): Besides raw image data, the approach developed in this research processes the map of the environment represented by sparse point cloud data to determine waypoints. During navigation, the map is created progressively, typically by simultaneous localization and mapping (SLAM) algorithms, Structure from Motion (SfM) algorithms, or LiDAR-based odometry algorithms. Those mapping algorithms embody an independent active research area with rapid improvement in accuracy, point density, and computational efficiency (e.g., [47-51]). This research, on the other hand, simplifies the mapping component to focus on the problem formulation; sparse point cloud data is computed batch-wise using SfM [52], and the 3D points that fall in each frame are fed incrementally to the algorithms for waypoint determination.

3. Methods

3.1. Overview

An overview of the proposed approach for autonomous UAV navigation planning for post-earthquake structural inspection of RC railway viaducts is shown in Fig. 3. The system determines waypoints to collect close-up images of critical structural components, or columns, by processing the input image stream sequentially. The system begins with processing each frame of the input image stream. Two types of processing are performed on the raw image data: updating the sparse point cloud map of the environment and visual recognition. The sparse point cloud update is typically performed by SLAM or SfM algorithms. Visual recognition has two components: 2D semantic segmentation of structural components, and monocular depth estimation. The semantic segmentation results are used to incrementally parse the sparse point cloud data, and the depth estimation results are used during the initial phase of the navigation to estimate the approximate scale of the point cloud data. Once the parsed point cloud data is updated based on the semantic segmentation results, 3D rectangles are searched for from the points labeled as columns. Once every predetermined number of frames (Ncb), the rectangles are further processed to find pairs that form rectangular prismatic shapes representing viaduct columns (blue shapes in Fig. 3). If enough columns are detected to identify the regular grid pattern, information about columns with unpaired rectangles (yellow) and no rectangle (red) is guessed based on the identified pattern. With this approach, the UAV can fly around the viaducts using GPS data until it
detects the first few columns, and then starts missions to inspect those components, while improving the column recognition results continuously.

3.2. Sparse point cloud update and frame-wise image processing

The first steps after getting each frame of the image stream are to enrich the sparse point cloud data, as well as to perform the semantic segmentation and monocular depth estimation. The sparse point cloud data update is simplified in this research (Assumption 5 discussed previously), and therefore this section discusses the visual recognition algorithms applied to perform semantic segmentation and depth estimation tasks.

This research extends the 58-layer Fully Convolutional Network (FCN) [53] used by Narazaki et al. [25] to the UNET3+ [54] architecture to perform the frame-wise image processing tasks. The UNET3+ improves the U-net [55] by implementing dense and nested skip connections and deep supervision. UNET3+ achieves high segmentation performance without increasing the number of trainable parameters significantly. The network architecture used in this study is shown in Fig. 4. The network has an encoder that consists of 6 sub-encoders working at different scales (E0-E5). The sub-decoders at the corresponding scales (D0-D4, since D5 = E5) are then created by combining sub-encoder outputs from the same and lower scales, as well as sub-decoder outputs from the higher scales (Fig. 4 (c)). If necessary, convolution is applied to change the number of channels of the incoming sub-encoder/decoder output to 8, and the resolution is changed to the current one by either max-pooling or bilinear up-sampling (the order of the operations and the number of convolutions have been changed from those presented in the original paper to reduce computational workload). The resulting feature map has 48 channels (8 channels from 6 scales), which is further processed by a convolutional layer with 8 filters, followed by batch normalization and the ReLu activation function. The classification-guided module discussed in [54] is not implemented, because all input images contain bridges (the original paper discusses a medical image segmentation problem, where images that contain target patterns are rare). The network is trained by generating predictions from D0-D4, and minimizing a loss function summed over all those scales. For semantic segmentation of structural components, a loss function that combines focal loss, Multi-Scale Structural Similarity Index (MS-SSIM) loss, and IoU loss is used, following the original paper.

During depth estimation, camera focal lengths affect the relations between the distance to an object and its appearance in images. This research concatenates a layer of the size 20 × 11 × 1 to the output of D5, and sets the value to the focal length everywhere in that additional layer. As a result, the output of D5 has 129 channels for depth estimation. The reverse Huber loss function is used for this task, following [25,56].

3.3. Point cloud parsing

In this step, semantic segmentation results are fused with the existing point labels using Bayes' theorem. For the i-th point, the probability that the point belongs to the class c given an image I and a prior probability P(c) is expressed as

P_i(c|I) = \frac{P(I|c)P(c)}{\sum_c P(I|c)P(c)}    (1)

or equivalently,

\ln P_i(c|I) = \ln P(I|c) + \ln P(c) + \text{Const.}    (2)

When a point is observed for the first time, the prior probability is initialized to 0.125 (= 1/8), because 8 types of structural components are identified by UNET3+. During subsequent steps, the posterior probability from the previous update is used as the prior probability of the current update. The softmax probability computed by UNET3+ is then used as the likelihood, P(I|c). Using the theorem, the log-probability is updated every time a new image is streamed. A point is defined to have a valid label when the following condition is satisfied:

P_i(c|I) > 99.99\%    (3)

This step works as the online semantic segmentation, as well as the removal of outlier points.

After updating point labels, the approximate scale of the point cloud is estimated using monocular depth estimation results. In this research, the point cloud scale is represented by the Gaussian probability density function with mean μs and variance σs², which are updated every time a new frame, I, is obtained. First, UNET3+ for depth estimation is used to obtain a depth map, D(x), where x denotes the image pixel location. When the image point x has the corresponding 3D point in the sparse point cloud data, the depth in the point cloud coordinate system, D_pc(x), is used to compute the scale estimate

\frac{D(x)}{D_{pc}(x)}    (4)

This procedure is repeated for all points in the frame that have corresponding 3D points and valid component labels (excluding the non-bridge class). Considering the scale observations are samples drawn from the Gaussian distribution, the mean and standard deviation are then updated:

\mu_s \leftarrow \frac{m\mu_s^- + n\mu_{obs}}{m + n}    (5)

\sigma_s^2 \leftarrow \frac{1}{m + n - 1}\left[(m - 1)(\sigma_s^-)^2 + m(\mu_s^- - \mu_s)^2 + (n - 1)\sigma_{obs}^2 + n(\mu_{obs} - \mu_s)^2\right]    (6)

where m is the number of previously obtained samples, n is the number of samples obtained during the current step, and (μs⁻, σs⁻) denote the mean and standard deviation computed from the m samples obtained previously. The monocular depth estimation and scale parameter update are performed until the global pattern of the structure is identified.

3.4. Rectangle detection

This research recognizes and models viaduct columns by rectangular prismatic shapes, and therefore the first step after point cloud parsing is to find 3D rectangles from the points with the valid "column" label. Rectangle detection in this research is based on the online Expectation-Maximization (EM) method discussed in [57]. The method first parameterizes a rectangle by a 9-element vector, θ, that represents the surface normal (3 degrees of freedom, or 3DOF), distance to the plane (1DOF), location on the plane (2DOF), size (2DOF), and rotation in the plane (1DOF). Then, the method models the probability density function of points as the mixture of J Gaussian distributions:

p(z_i|\theta_j) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{d^2(z_i, \theta_j)}{2\sigma^2}\right) \quad (1 \le j \le J)    (7)

and a uniform distribution that represents points that are not assigned to any of the rectangles:

p(z_i|\theta^*) = \begin{cases} 1/|V| & (z_i \text{ is in a bounding volume } V) \\ 0 & \text{(otherwise)} \end{cases}    (8)

In this probabilistic model, d(z_i, θ_j) indicates the minimum distance between the point z_i and a rectangle θ_j. The design parameters, σ² and V, are determined for each specific application. After initialization, the method iteratively evaluates the log-likelihood (E-step) and updates the mixture weights (M-step) to find the best set of rectangles and point assignments.

This research makes the following adjustment to the method to
where d(z_i, θ_j) = [d_ip(z_i, θ_j), d_op(z_i, θ_j)]^T is now a 2D vector that contains the in-plane and out-of-plane distances, and

\Sigma_d = \begin{bmatrix} \sigma_{ip}^2 & 0 \\ 0 & \sigma_{op}^2 \end{bmatrix}    (10)
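The probabilistic machinery above can be sketched in plain Python: the Bayes label update of Eqs. (1)-(3), the running scale statistics of Eqs. (5)-(6), and the anisotropic rectangle likelihood implied by Eq. (10). Note that the equation defining the adjusted mixture component itself is lost in the extraction, so the form of `rect_likelihood` is our assumption; all function names are ours, not the authors':

```python
import math

NUM_CLASSES = 8  # structural component classes output by UNET3+

def uniform_log_prior(k=NUM_CLASSES):
    """Prior for a point seen for the first time: P(c) = 1/8 for all classes."""
    return [math.log(1.0 / k)] * k

def update_label_logprob(log_prior, softmax_probs):
    """Bayes update of one point's class log-probabilities (Eqs. 1-2).
    softmax_probs plays the role of the likelihood P(I|c)."""
    unnorm = [lp + math.log(p) for lp, p in zip(log_prior, softmax_probs)]
    log_norm = math.log(sum(math.exp(u) for u in unnorm))
    return [u - log_norm for u in unnorm]

def valid_label(log_post, threshold=0.9999):
    """Return the class index once its posterior exceeds 99.99% (Eq. 3),
    otherwise None (the point has no valid label yet)."""
    best = max(range(len(log_post)), key=lambda c: log_post[c])
    return best if math.exp(log_post[best]) > threshold else None

def update_scale(mu, sigma2, m, observations):
    """Fold n new scale observations D(x)/D_pc(x) into the running Gaussian
    (mu_s, sigma_s^2) built from m earlier samples (Eqs. 5-6)."""
    n = len(observations)
    mu_obs = sum(observations) / n
    var_obs = (sum((x - mu_obs) ** 2 for x in observations) / (n - 1)
               if n > 1 else 0.0)
    mu_new = (m * mu + n * mu_obs) / (m + n)
    sigma2_new = ((m - 1) * sigma2 + m * (mu - mu_new) ** 2
                  + (n - 1) * var_obs + n * (mu_obs - mu_new) ** 2) / (m + n - 1)
    return mu_new, sigma2_new, m + n

def rect_likelihood(d_ip, d_op, sigma_ip, sigma_op):
    """Assumed form of the adjusted mixture component: a zero-mean 2D
    Gaussian with covariance Sigma_d = diag(sigma_ip^2, sigma_op^2)
    (Eq. 10) over the in-plane and out-of-plane distances."""
    norm = 1.0 / (2.0 * math.pi * sigma_ip * sigma_op)
    quad = (d_ip / sigma_ip) ** 2 + (d_op / sigma_op) ** 2
    return norm * math.exp(-0.5 * quad)
```

Because Eq. (6) is the exact pooled-sample-variance update, folding observations in frame-sized batches yields the same statistics as processing all samples at once. Separating the two distance directions in the likelihood lets the model be strict about points floating off a column surface (small σop) while tolerating in-plane spread over a partially observed face (larger σip).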
Table 3
Logic applied to find rectangle pairs. dn and dip indicate distances between c2D and c2D′ in the surface normal direction and its perpendicular direction, respectively.

Nearly perpendicular (|n2D ⋅ n2D′| < 0.05):
  Paired if w−/2μs < dn < w+/2μs and w−/2μs < dip < w+/2μs

Nearly parallel (|n2D ⋅ n2D′| > 0.95):
  Paired if w−/μs < dn < w+/μs and dip < w+/2μs
  Merged if dn < 0.005/μs and dip < 0.5/μs
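Table 3 translates directly into a small predicate. In the sketch below (our naming, not the authors' code), w− and w+ are the lower and upper bounds on the column face width used throughout this section, and μs is the point cloud scale:

```python
def classify_pair(n_dot, d_n, d_ip, w_minus, w_plus, mu_s):
    """Pairing logic of Table 3. n_dot is the dot product n2D . n2D' of the
    unit surface normals; d_n and d_ip are the center-distance components
    along and across the surface normal. Returns 'paired', 'merged', or
    None when the two rectangles are unrelated."""
    if abs(n_dot) < 0.05:  # nearly perpendicular faces of the same column
        if (w_minus / (2 * mu_s) < d_n < w_plus / (2 * mu_s)
                and w_minus / (2 * mu_s) < d_ip < w_plus / (2 * mu_s)):
            return "paired"
    elif abs(n_dot) > 0.95:  # nearly parallel faces
        if d_n < 0.005 / mu_s and d_ip < 0.5 / mu_s:
            return "merged"  # likely two patches of the same face
        if w_minus / mu_s < d_n < w_plus / mu_s and d_ip < w_plus / (2 * mu_s):
            return "paired"  # opposite faces of one column
    return None
```

For example, with width bounds w− = 0.4 m, w+ = 1.0 m and μs = 1, two perpendicular rectangles whose centers are offset by 0.3 in both directions are paired, while two parallel rectangles almost coplanar along the normal are merged into one face.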
normal, n2D, (ii) a 2D vector of the projected rectangle center, c2D, and (iii) the length of the line segment obtained by projecting the rectangle (a scalar), L2D. Example rectangle detection results expressed using those parameters are shown in Fig. 5. Using the parameterization, two rectangles, (n2D, c2D, L2D) and (n2D′, c2D′, L2D′), are compared to determine whether they should be paired or not. The logic applied in this step is shown in Table 3. First, the two rectangles are classified into either nearly perpendicular or nearly parallel, based on the dot product, n2D ⋅ n2D′. At the same time, the 2D distance between the rectangle centers, ‖c2D′ − c2D‖, is evaluated and decomposed into the direction of the surface normal, n2D, and the direction perpendicular to n2D, yielding the distance components dn and dip. The logic for the nearly perpendicular case has been derived from the fact that the distance components of the two perpendicular column faces are halves of the widths of the faces. Similarly, the logic for pairing nearly parallel rectangles has been derived from the fact that the distance between parallel faces in the surface normal direction is the width of the perpendicular column face. In addition to those cases, nearly parallel rectangles that are close to each other in the surface normal direction are merged, because those rectangles are likely to be in the same column face. This case occurs when, for example, the lower part of the column face is detected first, and the upper part is detected later.

Once rectangle pairs are identified, rectangular prismatic shapes representing viaduct columns are initialized. In this step, 2D projections of the columns (rectangular cross-sections) are first obtained using (n2D, c2D, L2D) and dn. The ends of the columns in the vertical axis are not identified by the rectangles alone, because in this online approach, the rectangles do not necessarily span the entire column face from bottom to top. To address this challenge, this research uses prior knowledge of the target structure, that is, columns face the non-bridge surface (ground) at the bottom and beams at the top. Using 3D points with the valid labels of those categories, this research determines the lower and upper ends of the columns. The steps are: (i) find valid non-bridge or beam points within the horizontal distance of 12 m from the mean of the coordinates of the rectangle centers, considering that the length of each viaduct is 24 m in the longitudinal direction, and (ii) compute the vertical coordinates of the non-bridge/beam planes by RANSAC (20 iterations, sampling 1 point in each iteration, and the threshold for selecting inliers set to 0.15). If no inlier point is found by RANSAC, or the selected non-bridge/beam plane is higher/lower than the mean height of the rectangle centers, the vertical coordinates of the column ends are replaced by those of the nearest column. To complete the column initialization, the horizontal distances between column centers are checked, and closely spaced columns (the horizontal distance is less than w+/μs) are regarded as duplicates, from which only one column is chosen for the subsequent analysis.

The final step of column detection using rectangle pairs is to fit rectangular prismatic shapes to 3D points. This step applies the Iterative Closest Point (ICP) algorithm with the point-to-plane distance metric [59]. To apply the method, this research first parameterizes each column by a transform of a unit cube centered on the origin [0, 0, 0]^T, that is,

T(\theta_{cb}) = \begin{bmatrix} RS & t \\ 0^T & 1 \end{bmatrix}    (11)

where R is a rotation matrix, t is the location of the column center, and S is a matrix that applies the appropriate scale to the unit cube;

S = \begin{bmatrix} \lambda_0 & 0 & 0 \\ 0 & \lambda_1 & 0 \\ 0 & 0 & \lambda_2 \end{bmatrix}    (12)

This transform is 9DOF (note that [59] focuses on 6DOF rigid body motion):

\theta_{cb} = [\lambda_0\ \lambda_1\ \lambda_2\ \theta_x\ \theta_y\ \theta_z\ t_x\ t_y\ t_z]^T    (13)

where [θ_x θ_y θ_z]^T are Euler angles and t = [t_x t_y t_z]^T. Using this parameterization, the incremental transform,

d\theta_{cb} = [d\lambda_0\ d\lambda_1\ d\lambda_2\ d\theta_x\ d\theta_y\ d\theta_z\ dt_x\ dt_y\ dt_z]^T    (14)

can be defined as the transform that leads to the following composite transform:

T(\theta_{cb}) = \begin{bmatrix} R'S' & t' \\ 0^T & 1 \end{bmatrix}    (15)

where

S' = \begin{bmatrix} \lambda_0 + d\lambda_0 & 0 & 0 \\ 0 & \lambda_1 + d\lambda_1 & 0 \\ 0 & 0 & \lambda_2 + d\lambda_2 \end{bmatrix}    (16)

\begin{bmatrix} R' & t' \\ 0^T & 1 \end{bmatrix} = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} dR & dt \\ 0^T & 1 \end{bmatrix}    (17)

(dR and dt are the rotation matrix and translation vector associated with dθ_cb). Among the 9 parameters of dθ_cb, 5 parameters (dλ2, dθx, dθy, dθz, dtz) are constrained, because those parameters specify the world coordinate system and the vertical location of the column ends. Therefore, the rectangular prismatic shape fitted in this step has 4DOF.

The ICP algorithm in this research proceeds by the following steps. First, 3D points are transformed to the coordinate system of the unit cube by applying T⁻¹(θ_cb), as shown in Fig. 6. Then, the points that exist in the height range of [−0.5, 0.5] and the horizontal distance range of [−0.1, 0.1] from the faces are selected for the ICP algorithm (other points are grayed out in Fig. 6). The selected points are assigned to one of the four sides of the cube, based on the four regions illustrated by different colors in Fig. 6.

Fig. 6. Point assignment for the iterative closest point (ICP) algorithm. The length of each side of the square is 1, and the colored regions are ±0.1 from the projected faces of the initial cuboid.
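The transform of Eqs. (11)-(12) and the point selection of Fig. 6 can be illustrated as follows. The yaw-only rotation reflects the 4-DOF fit described above; the nearest-face assignment rule is our assumption for the four colored regions, and all names are ours:

```python
import math

def to_cube_frame(p, scales, theta_z, t):
    """Apply T^-1(theta_cb): translate by -t, rotate back, then divide out
    the scales, mapping a world point into the unit-cube frame (Eq. 11 with
    R a yaw rotation and S = diag(scales) as in Eq. 12)."""
    x, y, z = (p[i] - t[i] for i in range(3))
    c, s = math.cos(theta_z), math.sin(theta_z)
    xr, yr = c * x + s * y, -s * x + c * y  # R^T applied to (x, y)
    return [xr / scales[0], yr / scales[1], z / scales[2]]

def select_for_icp(p_cube, band=0.1):
    """Keep points inside the cube's height range [-0.5, 0.5] and within
    +/-band of one of the four side faces (Fig. 6); return the index of the
    assigned face (0-3) or None if the point is discarded."""
    x, y, z = p_cube
    if not -0.5 <= z <= 0.5:
        return None
    dist = [abs(x - 0.5), abs(x + 0.5), abs(y - 0.5), abs(y + 0.5)]
    i = min(range(4), key=lambda k: dist[k])
    return i if dist[i] <= band else None
```

For example, with a 0.6 m wide, 7 m tall column centered at (10, 20, 3.5), a point 5 cm outside one face is kept and assigned to that face, while points far from all faces, or above the column top, are discarded.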
Fig. 8. Types of configurations of unpaired rectangles with respect to the grid lines.
Fig. 9. Example waypoints with camera directions shown by red bars. The waypoint after scanning each face is inserted to avoid the UAV flying too close to the column (Npx = 7, R = 0.5, focal length: 35 mm, sensor size: 36 mm, resolution: 1,920 × 1,080). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)

where the subscript (\cdot)_{ij} indicates the (i, j) element of the matrices. Then, we assume that the X-axis is in the transverse direction, and apply the RANSAC algorithm by the following steps: (i) randomly select one nonzero distance value, dX, from DX; (ii) assuming |dX| = Dtrans = 5.2 m (the known transverse interval of the grid lines), assess the intervals of the longitudinal grid lines, IY, by

(I_Y)_{ij} = |(D_Y)_{ij}| \cdot \frac{D_{trans}}{|d_X|}    (22)

(iii) evaluate the difference between (IY)ij and the closest non-zero integer multiple of the known longitudinal interval of the grid lines, Dlong = 6 m; (iv) select the (IY)ij values that satisfy a threshold w+/2μs as inliers; and (v) repeat steps (i)-(iv) 20 times and choose the results from the case with the largest number of inliers. After the RANSAC process, we assume that the Y-axis is in the transverse direction, and apply the same RANSAC process. Finally, the assumption with the larger number of inliers is selected, from which the inlier grid lines and the point cloud scale μs are obtained. Based on the selected assumption, the global coordinate system is rotated about the Z-axis to align the X-axis with the viaduct transverse direction. The intersections of the refined grid lines are considered as column locations even if no column has been detected at the location so far.
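Steps (i)-(v) above reduce to a compact RANSAC loop. The sketch below is a simplified 1D version (the distance matrices are flattened to lists, and the helper names are ours), returning the point cloud scale implied by the winning hypothesis:

```python
import random

D_TRANS = 5.2  # known transverse grid interval [m]
D_LONG = 6.0   # known longitudinal grid interval [m]

def count_inliers(d_x, d_ys, tol):
    """Steps (ii)-(iv): scale the longitudinal center distances assuming
    |d_x| spans one transverse interval (Eq. 22), and keep those that land
    within tol of a nonzero integer multiple of D_LONG."""
    inliers = []
    for d in d_ys:
        interval = abs(d) * D_TRANS / abs(d_x)    # Eq. (22)
        k = max(1, round(interval / D_LONG))      # closest nonzero multiple
        if abs(interval - k * D_LONG) < tol:
            inliers.append(d)
    return inliers

def ransac_scale(d_xs, d_ys, tol, iters=20, rng=random):
    """Step (v): resample d_x and keep the hypothesis with the most inliers;
    the winner implies the scale mu_s = D_TRANS / |d_x| (metres per
    point-cloud unit, matching mu_s = D(x)/D_pc(x))."""
    best_count, best_scale = 0, None
    for _ in range(iters):
        d_x = rng.choice(d_xs)
        n = len(count_inliers(d_x, d_ys, tol))
        if n > best_count:
            best_count, best_scale = n, D_TRANS / abs(d_x)
    return best_scale
```

For instance, if the true scale is 2 m per point-cloud unit, a transverse sample of 2.6 units together with longitudinal distances of 3, 6, and 9 units yields three inliers and recovers μs = 2.0.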
Fig. 10. Pseudocode for the autonomous UAV navigation strategy for rapid post-earthquake inspection of RC railway viaducts.
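The sample camera quoted in the Fig. 9 caption (Npx = 7, R = 0.5, 35 mm lens, 36 mm sensor width, 1,920 px frames) implies, under a simple pinhole model, a stand-off distance and a waypoint spacing. The formulas and names below are our illustration, not the authors' implementation:

```python
def standoff_distance(n_px, focal_mm=35.0, sensor_mm=36.0, width_px=1920):
    """Distance [m] at which the target surface is imaged at n_px pixels per
    centimetre, from the pinhole model (defaults follow the sample camera)."""
    f_px = focal_mm / sensor_mm * width_px   # focal length in pixels
    return f_px * 0.01 / n_px                # 1 cm must span n_px pixels

def waypoint_spacing(n_px, overlap, focal_mm=35.0, sensor_mm=36.0,
                     width_px=1920):
    """Spacing [m] between adjacent waypoints so that consecutive images
    overlap by the ratio `overlap` (R in the text); assumes the camera
    faces the surface squarely."""
    z = standoff_distance(n_px, focal_mm, sensor_mm, width_px)
    footprint = z * sensor_mm / focal_mm     # imaged width of the face [m]
    return footprint * (1.0 - overlap)
```

With Npx = 7 pixel/cm this gives a stand-off of about 2.7 m and, at R = 0.5, roughly 1.4 m between adjacent waypoints along a face.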
based on the awareness of the global structure, or more specifically, the grid lines, gx and gy. First, gy is converted to integer values that represent the number of intervals (Dlong) counted from the line with the minimum value of gyi. If missing integer values are identified between the minimum and maximum values, those values are added to gy as new grid lines. Then, the intersections between gx and gy are identified as the expected column locations.

At each candidate column location (grid line intersection), a 2D square region with the side w+/μs centered at each candidate location is checked for the existence of a column that has already been detected. If no column has been detected so far in that area, a column is added to that location, where the parameters other than the column center are obtained from the nearest column detected previously. At the same time, columns detected previously whose centers do not exist inside the square with the side w+/μs centered at each candidate location are regarded as outliers and discarded.

3.5.4. Waypoint determination

Visual recognition methods for structural damage, such as cracks, spalling, and exposed rebar, are sensitive to the viewpoints to the target surface and, in particular, the distance to the surface. For example, Narazaki et al. [25] performed semantic segmentation of structural damage using deep fully convolutional networks, where the distance to the target surface is 1.5 pixels per centimeter (pixel/cm) at most. The network does not always perform accurately for far surfaces, and therefore the regions with less than 1.5 pixel/cm have been excluded from consideration. To accommodate such image post-processing approaches, this research determines the waypoints by the following steps. First, a distance to the target structure in pixel/cm, Npx, and the ratio of overlap between adjacent images, R, are determined. Then, for each column, the waypoints that collect images of all faces with the distance Npx and the overlap R are determined. Example waypoints and the path connecting the waypoints are shown in Fig. 9.

The UAV starts the mission by flying along a predetermined GPS-based path around (but not underneath) the target structure until the following two conditions are satisfied: (i) the scale of the structure (grid lines) has been identified, and (ii) at least two columns are detected. Once those conditions are satisfied, the UAV switches from the GPS-based navigation to the navigation using the inspection waypoints (Fig. 9). The navigation path is defined such that the UAV inspects the columns in the following order: from reliable columns (columns detected based on rectangle pairs) to less reliable ones (columns detected based on a single rectangle or no rectangle). The order of inspection within each reliability category is determined based on the distance from the camera (from close columns to far columns). This research uses the A* algorithm [60] in the PythonRobotics library [61] to define the 2D path connecting the inspection waypoints of different columns in the horizontal plane, and linearly interpolates the vertical coordinates of the origin and destination to realize the 3D path. Throughout the navigation process, the column detection results and navigation path are updated every 10 frames. If the 10-th frame falls during the column inspection depicted in Fig. 9, the updates are deferred until the end of the inspection of that column.

The autonomous UAV navigation planning for rapid post-earthquake inspection of RC railway viaducts discussed in this section is summarized in the pseudocode in Fig. 10. The system can start planning the navigation path once partial information of the structure (grid lines and the first two columns) is obtained, without waiting for the complete information of the structure. The system can also make inferences about missing or partially occluded columns during the early stages of the navigation, and progressively improve the information for those columns as the system experiences more viewpoints.

4. Demonstration using synthetic environment

4.1. Overview: Synthetic environment of RC railway viaducts

This research demonstrates the proposed autonomous UAV navigation planning approach using a synthetic environment of RC railway viaducts developed for [25]. In the synthetic environment, viaducts are modeled following the standard design procedure adopted by the Tokaido Shinkansen line. The modeling procedure is programmed using Python, and therefore every time the Python program is executed, viaducts with random geometry, surface texture, and damage scenario can be created with random surrounding environments. This research generates an environment that contains three viaducts with straight trajectory, zero slope, and a height of 7 m, where the autonomous navigation planning for the inspection of the central viaduct is
Fig. 12. Sparse point cloud data obtained by the sample UAV flight.
Table 4
Structural component recognition performance on the testing set [%] (synthetic images).

               FCN58 [25]                  UNET3+
               Precision  Recall  IoU      Precision  Recall  IoU
No Bridge      98.8       99.4    98.2     99.6       99.6    99.2
Slab           96.1       94.6    91.1     97.5       97.3    95.0
Beam           94.0       93.9    88.6     96.2       96.9    93.3
Column         96.6       97.6    94.4     98.6       98.8    97.4
Nonstructural  97.5       91.7    89.5     97.7       95.8    93.6
Rail           95.2       91.6    87.6     95.1       96.0    91.5
Sleeper        84.2       75.3    66.0     87.4       86.7    77.1
Mean           94.6       92.0    87.9     96.0       95.9    92.4

Table 5
Structural component recognition performance on the testing set [%] (real-world images).

               FCN58 [25]                  UNET3+
               Precision  Recall  IoU      Precision  Recall  IoU
No Bridge      89.6       93.3    84.2     94.7       94.4    89.6
Slab           85.2       86.7    75.3     88.8       88.7    79.7
Beam           84.1       82.5    71.3     87.9       85.6    76.6
Column         84.9       90.5    78.0     87.5       91.3    80.7
Nonstructural  88.0       66.3    60.8     90.9       90.7    83.2
Others         55.4       30.2    24.3     51.5       52.5    35.1
Mean*          86.4       83.9    73.9     89.9       90.1    82.0

* "Others" class is not included.

Table 6
Monocular depth estimation results for the Tokaido dataset (testing set). MAE: mean absolute error [m], RMSE: root mean squared error [m], ARD: absolute relative distance [%].

                               MAE    RMSE   ARD
FCN58 [25]                     1.20   1.85   10.02
UNET3+ (without focal length)  1.01   1.66   8.46
UNET3+ (with focal length)     0.82   1.43   6.72

investigated. The overview of the synthetic environment is shown in Fig. 11.

Based on assumptions 4-5 discussed previously, this section defines a sample UAV path around the bridge that collects 235 images with a resolution of 1,920 × 1,080, a sensor size of 36 mm, and a focal length of 35 mm. The UAV scans each side of the viaduct at three height levels (2 m, 5 m, 8 m) and flies to the other side, performing the scan at the same height levels.

dataset (7275 images for training, 300 images for validation, and 1073 images for testing) to form the training set, resulting in 610 iterations per epoch. The training has been performed for 200 epochs with a learning rate of 1.0 × 10⁻³, then 50 epochs with a learning rate of 1.0 × 10⁻⁴, and finally 10 epochs with a learning rate of 1.0 × 10⁻⁵. Precision, recall, and Intersection over Union (IoU) values for the testing set are shown in Table 4 and Table 5 for synthetic and real-world testing images, respectively. The comparison with the previously trained network (FCN58) shows consistent improvement of all metric values.

Similar to the previous work [25], synthetic data is used for the training and testing of the UNET3+ for depth estimation, leading to 606 iterations per epoch with batch size 12. Other hyperparameters, such as the number of epochs and learning rates, followed those of structural component recognition. The performance of the network is evaluated using three accuracy metrics: mean absolute error (mean|e_d|), root mean squared error (sqrt(mean(e_d²))), and absolute relative distance (mean|e_d/d|), where d and e_d denote the ground truth depth and the depth estimation error, respectively. The performance metric values are shown in Table 6 for three networks: FCN58 used in the previous research, UNET3+ without focal length information, and UNET3+ with focal length information. Compared to the FCN58 network, the depth estimation results are improved consistently with the state-of-the-art architecture. Moreover, using focal length information as part of the input to the network, further accuracy improvement has been attained.
three height levels. During flight, the UAV collects images at 1 m in
tervals. The sparse point cloud data is then generated using Agisoft
4.3. Demonstration 1: Online column detection using images from sample
Metashape photogrammetry software [52]. The sparse point cloud data
UAV flight
and the UAV path are shown in Fig. 12. The sparse point cloud data
contains 33,829 points in total.
Example results of point cloud parsing, rectangle detection, and
This section first discusses the training of the frame-wise image
column detection for the sample UAV flight data with the number of
processing subsystem using UNET3+ method (“Preparation”). Then, the
rectangles per step (Nrect) of 1 are shown in Fig. 13. In an online pro
UAV navigation approach is demonstrated by feeding parts of the sparse
cedure, the 2D semantic segmentation can parse the point cloud data
point cloud data associated with each image to the waypoint determi
and extract points associated with the target structural components.
nation system incrementally (“Demonstration 1”). After evaluating the
Then, the rectangle detection can extract the key geometry of the col
system’s capability of detecting columns robustly and reliably, the sec
umn faces. Finally, columns are detected from rectangle pairs, a single
tion extends the discussion to incremental sparse point cloud building
unpaired rectangle, or no rectangle. The figure shows that the global
and progressive improvement and addition of waypoints (“Demonstra
grid line pattern is identified, based on which the existence of unseen
tion 2”).
columns is inferred during the early stages of the sample flight. More
over, as the UAV experiences more views, the number of columns
4.2. Preparation: Frame-wise image processing detected from rectangle pairs increases, indicating the progressive
improvement of the detection results.
The first step toward the demonstration of the autonomous naviga To investigate the effect of Nrect (the number of rectangles initialized
tion planning is to train its perception module: frame-wise semantic in each step), the column detection is performed 100 times for the
segmentation of structural components and depth estimation. The deep sample UAV flight data with Nrect = 1, 2, 3, 4. The numbers of rectangles
UNET3+ architecture discussed previously is trained for that purpose. A detected for each of the four cases are shown in Fig. 14. At every frame,
large-scale synthetic dataset, termed Tokaido dataset [25], is used to up to Nrect rectangles are identified, which are then reduced at every Ncb
enable the training of the deep networks with more than a million pa step by the applications of RANSAC to identify the global coordinate
rameters. During training, the input images and the associated ground system, resulting in zigzag patterns of the plots. Compared to the case
truth labels are flipped horizontally with probability 0.5, and rotated by with Nrect = 1, the case with Nrect = 2 detects more rectangles at a faster
an angle sampled uniformly from the range [− π/12, π/12]. The opti rate. When the value of Nrect further increases, the number of rectangles
mization is based on Adam method [62] with batch size of 12. Tensor increases accordingly. However, the refinement of the detection results
flow 2 is used for the implementation of algorithms [63]. at every Ncb(=10) frame lower the rate of increase by either discarding
For structural component recognition, small number of real-world or merging rectangles. When the value of Nrect is too large, most of the
images (51 for training, 50 for testing) are mixed with the Tokaido detected triangles are low quality and discarded by the refinement
12
Y. Narazaki et al. Automation in Construction 137 (2022) 104214
Fig. 13. Example results of data processing steps (every 20 frame from 20th to 200th frame, Nrect = 1).
13
Y. Narazaki et al. Automation in Construction 137 (2022) 104214
process.

The numbers of columns detected for each of the four cases are shown in Fig. 15. Each figure shows five lines, corresponding to the maximum, top 25%, median, top 75%, and minimum number of rectangles detected among the 100 runs. The figures also show that the columns tend to be detected more quickly for larger values of Nrect. For all Monte Carlo …
Table 7
Column detection accuracy (root mean squared error of column center location and orientation). Blue: columns detected based on rectangle pairs, red: columns detected from unpaired/no rectangle.
After 150th frame    After 235th frame

Table 8
Column detection accuracy (dimensions). Blue: columns detected based on rectangle pairs, red: columns detected from unpaired/no rectangle.
After 150th frame    After 235th frame
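The root mean squared errors of the detected column centers and orientations reported in Table 7 can be computed directly from their definitions. A minimal NumPy sketch follows; the array layout (plan-view centers and orientations in degrees) is an assumption for illustration, not the paper's data format:

```python
import numpy as np

def column_rmse(est_centers, true_centers, est_angles, true_angles):
    """RMSE of column center location and orientation.

    est_centers, true_centers: (N, 2) plan-view column centers [m].
    est_angles, true_angles: (N,) column orientations [deg].
    """
    loc_err = np.linalg.norm(est_centers - true_centers, axis=1)
    rmse_loc = np.sqrt(np.mean(loc_err ** 2))                      # [m]
    rmse_ang = np.sqrt(np.mean((est_angles - true_angles) ** 2))   # [deg]
    return rmse_loc, rmse_ang

# Example: two columns, each center off by 3 cm and 4 cm in x and y
true_c = np.array([[0.0, 0.0], [5.0, 0.0]])
est_c = true_c + np.array([[0.03, 0.04], [0.03, 0.04]])
rmse_loc, rmse_ang = column_rmse(est_c, true_c,
                                 np.array([1.0, -1.0]),
                                 np.array([0.0, 0.0]))
```

Here each center error is 5 cm, so the location RMSE is 0.05 m, consistent with the centimeter-level accuracy discussed in the text.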
Fig. 18. Data processing procedure for the demonstration of the autonomous navigation planning approach.
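The batch-wise procedure summarized in Fig. 18 (every 10 new images are processed by the SfM and the column detection, and the UAV enters the inspection phase once the scale and at least two columns are found) can be sketched as follows. The helpers `run_sfm_batch` and `detect_columns` are hypothetical stand-ins for the actual SfM and column-detection modules, and the toy stubs below are constructed only to exercise the loop:

```python
BATCH_SIZE = 10  # new images per SfM / column-detection update

def navigate(frames, run_sfm_batch, detect_columns):
    """Return the frame index at which the inspection phase starts.

    run_sfm_batch(images) -> new sparse points (hypothetical stub).
    detect_columns(points) -> (scale_found, columns) (hypothetical stub).
    Returns None if the switching conditions are never satisfied.
    """
    points, batch = [], []
    for i, image in enumerate(frames, start=1):
        batch.append(image)
        if len(batch) < BATCH_SIZE:
            continue
        points.extend(run_sfm_batch(batch))
        batch = []
        scale_found, columns = detect_columns(points)
        # Switching conditions: grid-line scale known and >= 2 columns
        if scale_found and len(columns) >= 2:
            return i
    return None

# Toy stubs: one point per image; scale and two columns exist at 50 points
run_sfm_batch = lambda imgs: [(0.0, 0.0, 0.0)] * len(imgs)
detect_columns = lambda pts: (len(pts) >= 50, list(range(len(pts) // 25)))
switch_frame = navigate(range(100), run_sfm_batch, detect_columns)
```

With these stubs the switch occurs at frame 50; that this matches the 50th-frame switch reported in Section 4.4 is by construction of the stubs, not a property of the sketch.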
…this research: the software implementation has not been optimized for on-board and real-time applications. For example, future improvements include optimization of UNET3+ architectures, multi-thread processing, and implementation of additional logics that remove data associated with columns that have been inspected already or have not been assigned to valid rectangles for a long time. Those investigations are part of future work.

4.4. Demonstration 2: Autonomous UAV navigation for rapid structural inspection

This section demonstrates the autonomous UAV navigation planning approach following the procedure illustrated in Fig. 18. First, a UAV flies along a predetermined GPS-based path (Fig. 12) to initiate the mission. In contrast to the previous section that computed the sparse point cloud data off-line using the entire dataset, this section takes a more realistic approach; every time 10 new images are obtained, those images are processed by the SfM, and then the column detection algorithm discussed in this research is applied. If the scale (grid lines) and at least two columns are detected, the UAV switches from the initial GPS-based navigation to the navigation using the inspection waypoints (the "Inspection phase" that uses the waypoints shown in Fig. 9), and otherwise stays on the GPS-based path. Once the inspection phase is initiated, image batches are rendered in the synthetic environment as needed to enable the demonstration of the progressive path planning and refinement approach.

The coordinate systems used in this demonstration are summarized in Fig. 19. The synthetic environment has its own coordinate system, in which the ground truth configurations of the viaducts and the UAV are described. From images collected in the synthetic environment, the sparse map and the UAV poses are obtained in the default coordinate system used by the SfM software. During the data processing, the world coordinate system relative to the default coordinate system (transform T2) is identified, which is used for finding grid patterns and planning navigation paths. The column detection and path planning results are visualized either in this world coordinate system, or in the 2D horizontal plane derived from the world coordinate system. Once the inspection path is defined in the world coordinate system, the path is converted back to the default map coordinate system, and then to the coordinate system of the synthetic environment to render images. The transform between the coordinate systems of the synthetic environment and the default map, T1, is defined by estimating the transform between the initial GPS-based waypoints (ground truth values defined in the synthetic environment) and the corresponding camera poses estimated with the map by the SfM software, following the method discussed in [64] (this transform is used only for image rendering, and is therefore not visible to the proposed navigation planning system).

During the demonstration, the conditions for switching from the GPS-based navigation to the one using the inspection waypoints are satisfied at the 50th frame (50 m from the initiation of the mission). The parsed sparse point cloud data, the navigation path in the 2D plane, and the navigation path in 3D space are shown in Fig. 20 at every 100 frames after entering the inspection phase. The proposed strategy can plan a UAV navigation path for rapid post-earthquake structural inspection automatically based on the understanding of the target structure. Furthermore, columns in the adjacent viaducts are detected toward the end of the inspection of the current viaduct, enabling a smooth transition to the inspection of the next structure.

Images of one of the columns inspected during the inspection phase are shown in Fig. 21. As expected, images from a close and controlled distance are collected with overlaps, which are ideal for automated post-earthquake structural damage recognition. The results of applying the previously trained semantic segmentation algorithm for structural damage recognition [25] to the images shown in Fig. 21 are presented in the appendix.

5. Discussions

Following the roadmap presented in Fig. 1, this research developed the conceptual and theoretical framework for UAV navigation planning to enable autonomous rapid post-earthquake inspection of railway viaducts, and demonstrated the prototype system in the synthetic environment. The demonstration showed significant potential for detecting and localizing critical structural components (columns) in an online manner, and then planning appropriate paths to collect close-up images of those components with desired levels of detail and overlap. On the other hand, further work is needed to complete the aspects of the autonomous post-earthquake structural inspection system that were not covered in this research, which are discussed herein.

5.1. Prototype development in experimental environments

In this step, a prototype system that includes both hardware and software should be developed, and its capabilities of performing inspection tasks in experimental environments should be validated. The
Fig. 20. Parsed sparse point cloud, navigation path (2D), and navigation path (3D) at 1st, 100th, 200th, 300th, 400th, and 446th (last) frames after entering the
inspection phase. Dotted lines are finished parts, and the green solid lines show planned path to be followed. (For interpretation of the references to colour in this
figure legend, the reader is referred to the web version of this article.)
Fig. 21. Images of the four faces of a column collected by the proposed UAV navigation strategy (Npx = 7, R = 0.5).
prototype system developed in this phase should incorporate advanced techniques for generic components of autonomous navigation systems, such as control, local path planning, and SLAM. These extensions will enable the UAV to perform missions in complex environments using the inspection waypoints determined based on the understanding of the structural systems, as discussed in this research. Besides, the current system pays little attention to computational and power resource management, which needs to be investigated thoroughly in this phase. The keys to improving the current system would be (i) algorithm optimization to reduce computation (network architectures, eliminating repeated computation for the rectangle initialization stage, etc.), (ii) frame rate optimization (image processing for high-level path planning can occur at a lower rate than other tasks for control and local path planning), (iii) optimization of the timing to switch to the inspection (the sooner the better in terms of power consumption), and (iv) hardware optimization.

…experimental environments to field environments. The field environments have an even higher degree of uncertainty than experimental environments. Perception of the system should be improved to recognize the types and locations of various objects in the field, and the navigation unit should consider those complexities appropriately (depth recognition could be improved further by incorporating GPS data available during the initialization phase). The system's robustness to various non-standard trajectories of the viaducts (e.g., large curvature) should be improved. Moreover, the current system assumes that the damage to the target structure does not reach complete failure (collapse), so that the prior knowledge of the global geometric patterns can be explored. To accommodate the complete collapse scenario, the system should perform a global structure-level collapse classification prior to the process discussed in this research (the "system level" assessment discussed in [33]). The inspection process is initiated only when the structure is classified as "not collapsed".
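The map alignment used in Section 4.4, where the transform T1 is estimated from the initial GPS-based waypoints and the corresponding SfM-estimated camera positions following [64], is an instance of the point-set registration problem. A minimal SVD-based (orthogonal Procrustes) sketch is shown below; this is a standard solution, not the authors' implementation, and it omits the scale factor that a full similarity alignment would include:

```python
import numpy as np

def rigid_transform(A, B):
    """Least-squares R, t such that B_i ≈ R @ A_i + t.

    A: (N, 3) source points (e.g., planned GPS-based waypoints),
    B: (N, 3) targets (e.g., camera positions estimated by SfM).
    """
    ca, cb = A.mean(axis=0), B.mean(axis=0)
    H = (A - ca).T @ (B - cb)          # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cb - R @ ca
    return R, t

# Recover a known 90-degree rotation about z plus a translation
A = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
R_true = np.array([[0.0, -1, 0], [1, 0, 0], [0, 0, 1]])
t_true = np.array([1.0, 2, 3])
B = A @ R_true.T + t_true
R, t = rigid_transform(A, B)
```

Because the approach in this research identifies the map scale separately from the depth estimation, a rigid (rotation plus translation) alignment suffices for this illustration.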
5.3. Generalization to the inspection of other structural systems/components

This research investigated the autonomous navigation planning system for RC railway viaduct columns with rectangular cross-sections, whose locations are characterized by regular grid-type patterns. This type of structure comprises a significant portion of railway lines in Japan and other countries; for example, nearly 50% of the total lengths of some Japanese high-speed railway lines are made of this type of viaducts [65]. This fact indicates the importance of developing an effective autonomous inspection system for this specific structural type. On the other hand, extending the system to other structural systems (e.g., non-square cross-sections, simply supported bridges) and components (e.g., light poles, rails, slabs) is not straightforward with the current formulation. A possible approach for generalizing the proposed system to such scenarios is to find tight 3D bounding boxes of the critical structural components, followed by the complete scan of those selected parts, as is often done by the Skydio platform.

5.4. Waypoint determination based on the observed structural conditions

In this research, the waypoint determination subsystem after column detection and localization was deterministic and pre-designed relative to the square prismatic shapes representing the columns. However, the navigation planning approach investigated in this research has the potential of designing paths in response to the observed structural conditions, similar to the human inspection process. After earthquakes, structural damage, such as cracks and large deformations of structural/nonstructural components, may occur, which is the primary source of information for deriving the inspection ratings. By improving and extending the logic for the waypoint determination step, the "reactive" autonomous system is expected to be able to design a path that collects many images of observed damage, without increasing the workload for intact parts of the structure.

6. Conclusions

This research developed an approach for vision-based autonomous UAV navigation planning for rapid post-earthquake inspection of reinforced concrete railway viaducts. The approach mimics the way human inspectors perform the task: the system does not require a complete 3D model of the environment, and instead uses the key characteristics of the target structure as prior knowledge. In an online manner, the system parses the sparse point cloud data using frame-wise semantic segmentation results, and identifies the approximate scale using the frame-wise depth estimation results. Then, rectangles are fitted to the points with the "Column" label, from which columns are detected by finding near-parallel or near-perpendicular pairs. In addition to the columns thus detected, the approach tries to find the grid-line pattern of the column locations, based on which the detection results are refined, and information about missing columns is inferred. With this approach, the UAV can initiate the inspection mission before obtaining complete information of the target structure. The information (map) can be improved progressively as the UAV experiences more views during the inspection. The capability of the developed approach in detecting columns robustly and reliably was evaluated using synthetic data. The results showed that all eight columns of the target viaducts were detected in all 100 trials with centimeter-level accuracy. Finally, the entire approach was demonstrated in the synthetic environment, showing the significant potential of collecting high-quality images for post-earthquake structural inspection efficiently and quickly.

The approach investigated in this research is a prototype for the autonomous UAV navigation planning for post-earthquake structural inspection, and therefore needs to be improved to accommodate experimental and field implementation. First, this research postulates a simplifying assumption on the sparse point cloud data generation and updating: this research applies the SfM to image batches. While producing accurate results, the approach is computationally expensive, and is not necessarily optimal for mobile robotics applications. Further investigation of other computational aspects of the proposed approach is also needed to enable real-time and on-board data processing (e.g., reducing repeated computations for column detection at every Ncb step, not performing point cloud processing for inspected columns). Finally, the demonstration in this research is performed in the synthetic environment only, which is free of the uncertainty encountered in experimental and field environments, such as obstacles and modeling imperfections. The robustness of the proposed approach should be improved in the future by combining the developed system with state-of-the-art techniques for autonomous navigation in general, such as local path planning and control, including collision avoidance. The system investigated in this research is expected to form a basis for those future investigations toward fully autonomous post-earthquake structural inspection.

Declaration of Competing Interest

None.

Acknowledgement

The authors would like to acknowledge the financial support by the U.S. Army Corps of Engineers (Contract/Purchase Order No. W912HZ-17-2-0024). This research was also supported in part by the National Natural Science Foundation of China Grant No. 51978182.
The synthetic viaducts used for the demonstration in this research contain damage, such as concrete damage and exposed rebar. The 58-layer Fully Convolutional Network trained previously [25] is applied to the images presented in Fig. 21, and the estimated masks of concrete damage are shown in Fig. 22. We can observe fine cracks in the detection results, illustrating the application of the proposed autonomous navigation planning approach. For detailed discussions about the performance of the semantic segmentation algorithm used herein, the readers are directed to the reference [25].
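The damage masks in Fig. 22 are obtained from per-pixel class scores produced by the trained network. A generic post-processing sketch for this step is shown below; the class index and threshold are illustrative assumptions, not values from the paper:

```python
import numpy as np

def damage_mask(scores, damage_class=1, threshold=0.5):
    """Binary damage mask from per-pixel class probabilities.

    scores: (H, W, C) softmax output of a segmentation network.
    A pixel is flagged when the (assumed) damage-class probability
    exceeds the threshold.
    """
    return scores[..., damage_class] > threshold

# Example: 2x2 image with two classes (background, concrete damage)
scores = np.array([[[0.9, 0.1], [0.2, 0.8]],
                   [[0.6, 0.4], [0.3, 0.7]]])
mask = damage_mask(scores)  # True where damage probability > 0.5
```

In practice the mask would be overlaid on the inspection image, as in Fig. 22, to visualize cracks and spalling.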
Fig. 22. Results of applying previously trained semantic segmentation algorithm to the images shown in Fig. 21.
[12] I. Brilakis, H. Fathi, A. Rashidi, Progressive 3D reconstruction of infrastructure with videogrammetry, Autom. Constr. 20 (7) (2011) 884–895, https://doi.org/10.1016/j.autcon.2011.03.005.
[13] C.M. Yeum, J. Choi, S.J. Dyke, Autonomous image localization for visual inspection of civil infrastructure, Smart Mater. Struct. 26 (3–35051) (2017) 1–12, https://doi.org/10.1088/1361-665X/aa510e.
[14] Y. Narazaki, V. Hoskere, T.A. Hoang, B.F. Spencer, Automated bridge component recognition using video data, in: The 7th World Conference on Structural Control and Monitoring, Qingdao, China, July 22–25, 2018. Accessed: Nov. 15, 2021. [Online]. Available: https://arxiv.org/abs/1806.06820.
[15] Y. Narazaki, V. Hoskere, T.A. Hoang, Y. Fujino, A. Sakurai, B.F. Spencer, Vision-based automated bridge component recognition with high-level scene consistency, Comp. Aided Civil Infrastruct. Eng. 35 (5) (2020) 465–482, https://doi.org/10.1111/mice.12505.
[16] H. Kim, J. Yoon, S. Sim, Automated bridge component recognition from point clouds using deep learning, Struct. Control. Health Monit. 27 (9-e2591) (2020) 1–13, https://doi.org/10.1002/stc.2591.
[17] S. Dorafshan, R.J. Thomas, C. Coopmans, M. Maguire, Deep learning neural networks for sUAS-assisted structural inspections: feasibility and application, in: 2018 International Conference on Unmanned Aircraft Systems, Dallas, TX, USA, 2018, pp. 874–882, https://doi.org/10.1109/ICUAS.2018.8453409.
[18] F.-C. Chen, R.M.R. Jahanshahi, NB-CNN: deep learning-based crack detection using convolutional neural network and Naïve Bayes data fusion, IEEE Trans. Ind. Electron. 65 (5) (2017) 4392–4400, https://doi.org/10.1109/TIE.2017.2764844.
[19] V. Hoskere, Y. Narazaki, T.A. Hoang, B.F. Spencer, MaDnet: multi-task semantic segmentation of multiple types of structural materials and damage in images of civil infrastructure, J. Civ. Struct. Heal. Monit. 10 (2020) 757–773, https://doi.org/10.1007/s13349-020-00409-0.
[20] M.R. Saleem, J.-W. Park, J.-H. Lee, H.-J. Jung, M.Z. Sarwar, Instant bridge visual inspection using an unmanned aerial vehicle by image capturing and geo-tagging system and deep convolutional neural network, Struct. Health Monit. 20 (4) (2020) 1760–1777, https://doi.org/10.1177/1475921720932384.
[21] J. Shi, R. Zuo, J. Dang, Bridge damage classification and detection using fully convolutional neural network based on images from UAVs, in: Experimental Vibration Analysis for Civil Structures, CRC Press, 2020. ISBN: 9781003090564.
[22] E. McLaughlin, N. Charron, S. Narasimhan, Automated defect quantification in concrete bridges using robotics and deep learning, J. Comput. Civ. Eng. 34 (5–04020029) (2020) 1–12, https://doi.org/10.1061/(asce)cp.1943-5487.0000915.
[23] M.R. Jahanshahi, S.F. Masri, Adaptive vision-based crack detection using 3D scene reconstruction for condition assessment of structures, Autom. Constr. 22 (2012) 567–576, https://doi.org/10.1016/j.autcon.2011.11.018.
[24] B.F. Spencer, V. Hoskere, Y. Narazaki, Advances in computer vision-based civil infrastructure inspection and monitoring, Engineering 5 (2) (2019) 199–222, https://doi.org/10.1016/J.ENG.2018.11.030.
[25] Y. Narazaki, V. Hoskere, K. Yoshida, B.F. Spencer, Y. Fujino, Synthetic environments for vision-based structural condition assessment of Japanese high-speed railway viaducts, Mech. Syst. Signal Process. 160 (107850) (2021) 1–22, https://doi.org/10.1016/j.ymssp.2021.107850.
[26] Cross-ministerial strategic innovation promotion program (SIP) report, Cabinet office of Japan, 2018. Accessed: Nov. 15, 2021. [Online]. Available: https://www.jst.go.jp/sip/dl/k07/pamphlet_2018_en.pdf.
[27] B. Yamauchi, Frontier-based approach for autonomous exploration, in: Proceedings of IEEE International Symposium on Computational Intelligence in Robotics and Automation, 1997, pp. 146–151, https://doi.org/10.1109/cira.1997.613851.
[28] S.K. Ramakrishnan, Z. Al-Halah, K. Grauman, Occupancy anticipation for efficient exploration and navigation, in: Proceedings of the European Conference on Computer Vision, 2020. Accessed: Aug. 29, 2020. [Online]. Available: http://arxiv.org/abs/2008.09285.
[29] M. Srinivasan Ramanagopal, A.P. Van Nguyen, J. Le Ny, A motion planning strategy for the active vision-based mapping of ground-level structures, IEEE Trans. Autom. Sci. Eng. 15 (1) (2018) 356–368, https://doi.org/10.1109/TASE.2017.2762088.
[30] A. Howard, M.J. Matarić, G.S. Sukhatme, An incremental self-deployment algorithm for mobile sensor networks, Auton. Robot. 13 (2) (2002) 113–126, https://doi.org/10.1023/A:1019625207705.
[31] F. Fraundorfer, L. Heng, D. Honegger, G.H. Lee, L. Meier, P. Tanskanen, M. Pollefeys, Vision-based autonomous mapping and exploration using a quadrotor MAV, IEEE Int. Conf. Intel. Robots Syst. (2012) 4557–4564, https://doi.org/10.1109/IROS.2012.6385934.
[32] 3D Scan™ | Skydio. https://www.skydio.com/3d-scan (accessed Sep. 10, 2021).
[33] X. Liang, Image-based post-disaster inspection of reinforced concrete bridge systems using deep learning with Bayesian optimization, Comp. Aided Civil Infrastruct. Eng. 34 (5) (2018) 415–430, https://doi.org/10.1111/mice.12425.
[34] W. Sheng, H. Chen, N. Xi, Navigating a miniature crawler robot for engineered structure inspection, IEEE Trans. Autom. Sci. Eng. 5 (2) (2008) 368–373, https://doi.org/10.1109/TASE.2007.910795.
[35] A. Ibrahim, A. Sabet, M. Golparvar-Fard, BIM-driven mission planning and navigation for automatic indoor construction progress detection using robotic ground platform, in: 2019 European Conference on Computing in Construction, 2019, pp. 182–189, https://doi.org/10.35490/EC3.2019.195.
[36] S.S. Mansouri, C. Kanellakis, E. Fresk, D. Kominiak, G. Nikolakopoulos, Cooperative coverage path planning for visual inspection, Control. Eng. Pract. 74 (2018) 118–131, https://doi.org/10.1016/J.CONENGPRAC.2018.03.002.
[37] M. Stokkeland, K. Klausen, T.A. Johansen, Autonomous visual navigation of unmanned aerial vehicle for wind turbine inspection, Int. Conf. Unmanned Aircraft Syst. 2015 (2015) 998–1007, https://doi.org/10.1109/ICUAS.2015.7152389.
[38] X. Hui, J. Bian, X. Zhao, M. Tan, Vision-based autonomous navigation approach for unmanned aerial vehicle transmission-line inspection, Int. J. Adv. Robot. Syst. 15 (1) (2018) 1–15, https://doi.org/10.1177/1729881417752821.
[39] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell. 39 (6) (2015) 91–99, https://doi.org/10.1109/TPAMI.2016.2577031.
[40] V.A.H. Higuti, A.E.B. Velasquez, D.V. Magalhaes, M. Becker, G. Chowdhary, Under canopy light detection and ranging-based autonomous navigation, J. Field Robot. 36 (3) (2019) 547–567, https://doi.org/10.1002/rob.21852.
[41] Y. Perez-Perez, M. Golparvar-Fard, K. El-Rayes, Artificial neural network for semantic segmentation of built environments for automated Scan2BIM, Am. Soc. Civil Eng. Int. Conf. Comp. Civil Eng. (2019) 97–104, https://doi.org/10.1061/9780784482438.013.
[42] M. Kono, Y. Matsumoto, Design of the standard rigid frame railway bridge in new Tokaido line (in Japanese), Trans. Japan Soc. Civil Eng. 115 (1965) 13–25, https://doi.org/10.2208/jscej1949.1965.115_13.
[43] M. Ohba, The design history of the railway viaduct from the design of Tokaido Shinkansen to the recent design (in Japanese), Concrete J. 51 (1) (2013) 112–115, https://doi.org/10.3151/coj.51.112.
[44] M. Kobayashi, K. Shinoda, K. Mizuno, S. Nozawa, T. Ishibashi, Study on damage caused to Shinkansen RC viaducts by the 2011 off the Pacific coast of Tohoku earthquake (in Japanese), J. Japan Soc. Civil Eng. A1 70 (4) (2014) I_688–I_700, https://doi.org/10.2208/jscejseee.70.I_688.
[45] H. Inaguma, M. Seki, Experimental study on earthquake strengthening using polyester sheets of RC railway viaduct columns (in Japanese), Japan Soc. Civil Eng. J. Struct. Eng. 50A (2) (2004) 515–526. Accessed: Nov. 15, 2021. [Online]. Available: http://library.jsce.or.jp/jsce/open/00127/2004/50-0515.pdf.
[46] Y. Takahashi, Report on Damage Caused by the 2011 Tohoku Earthquake (in Japanese), 2011. Accessed: Nov. 15, 2021. [Online]. Available: https://committees.jsce.or.jp/report/system/files/10_takahashi.pdf.
[47] K. Tateno, F. Tombari, I. Laina, N. Navab, CNN-SLAM: real-time dense monocular SLAM with learned depth prediction, in: IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 6243–6252, https://doi.org/10.1109/CVPR.2017.695.
[48] T. Schöps, T. Sattler, M. Pollefeys, BAD SLAM: bundle adjusted direct RGB-D SLAM, in: IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 134–144, https://doi.org/10.1109/CVPR.2019.00022.
[49] R. Mur-Artal, J.D. Tardos, ORB-SLAM2: an open-source SLAM system for monocular, stereo and RGB-D cameras, IEEE Trans. Robot. 33 (5) (2016) 1255–1262, https://doi.org/10.1109/TRO.2017.2705103.
[50] J. Engel, T. Schöps, D. Cremers, LSD-SLAM: large-scale direct monocular SLAM, in: 2014 European Conference on Computer Vision, 2014, pp. 834–849, https://doi.org/10.1007/978-3-319-10605-2_54.
[51] X. Yang, Y. Gao, H. Luo, C. Liao, K.T. Cheng, Bayesian DeNet: monocular depth prediction and frame-wise fusion with synchronized uncertainty, IEEE Trans. Multimedia 21 (11) (2019) 2701–2713, https://doi.org/10.1109/TMM.2019.2912121.
[52] Agisoft Metashape. https://www.agisoft.com/ (accessed Aug. 30, 2020).
[53] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 39 (4) (2015) 640–651, https://doi.org/10.1109/TPAMI.2016.2572683.
[54] H. Huang, L. Lin, R. Tong, H. Hu, Q. Zhang, Y. Iwamoto, X. Han, Y.W. Chen, J. Wu, UNet 3+: a full-scale connected UNet for medical image segmentation, in: 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, 2020, pp. 1055–1059, https://doi.org/10.1109/ICASSP40776.2020.9053405.
[55] O. Ronneberger, P. Fischer, T. Brox, U-Net: convolutional networks for biomedical image segmentation, in: 2015 International Conference on Medical Image Computing and Computer-Assisted Intervention, vol. 9351, 2015, pp. 234–241, https://doi.org/10.1007/978-3-319-24574-4_28.
[56] I. Laina, C. Rupprecht, V. Belagiannis, F. Tombari, N. Navab, Deeper depth prediction with fully convolutional residual networks, in: 2016 Fourth International Conference on 3D Vision, 2016, pp. 239–248, https://doi.org/10.1109/3DV.2016.32.
[57] C. Martin, S. Thrun, Real-time acquisition of compact volumetric 3D maps with mobile robots, IEEE Int. Conf. Robot. Automat. 1 (2002) 311–316, https://doi.org/10.1109/ROBOT.2002.1013379.
[58] M.A. Fischler, R.C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM 24 (6) (1981) 381–395, https://doi.org/10.1145/358669.358692.
[59] Y. Chen, G. Medioni, Object modeling by registration of multiple range images, IEEE Int. Conf. Robot. Automat. 3 (1991) 2724–2729, https://doi.org/10.1109/robot.1991.132043.
[60] P.E. Hart, N.J. Nilsson, B. Raphael, A formal basis for the heuristic determination of minimum cost paths, IEEE Trans. Syst. Sci. Cybernet. 4 (2) (1968) 100–107, https://doi.org/10.1109/TSSC.1968.300136.
[61] A. Sakai, D. Ingram, J. Dinius, K. Chawla, A. Raffin, A. Paques, PythonRobotics: a Python code collection of robotics algorithms, Aug. 2018. Accessed: Sep. 20, 2020. [Online]. Available: http://arxiv.org/abs/1808.10703.
[62] D.P. Kingma, J. Ba, Adam: a method for stochastic optimization, in: The 3rd International Conference for Learning Representations, 2015, pp. 1–15. Accessed: Nov. 15, 2021. [Online]. Available: https://arxiv.org/pdf/1412.6980.pdf.
[63] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, M. Kudlur, J. Levenberg, R. Monga, S. Moore, D.G. Murray, B. Steiner, P. Tucker, V. Vasudevan, P. Warden, M. Wicke, Y. Yu, X. Zheng, TensorFlow: a system for large-scale machine learning, in: 12th Symposium on Operating Systems Design and Implementation, 2016, pp. 265–283. Accessed: Nov. 15, 2021. [Online]. Available: https://research.google/pubs/pub45381/.
[64] D.W. Eggert, A. Lorusso, R.B. Fisher, Estimating 3-D rigid body transformations: a comparison of four major algorithms, Mach. Vis. Appl. 9 (5) (1997) 272–290, https://doi.org/10.1007/S001380050048.
[65] S. Takatsu, M. Doi, High-speed railways in Japan – past and future (in Japanese), Railway Pictorial 58 (2) (2008) 142–153. Accessed: Jul. 10, 2020. [Online]. Available: https://ci.nii.ac.jp/naid/40015748291.