
DOI: 10.1111/mice.12693

ORIGINAL ARTICLE

Metrics and methods for evaluating model-driven reality capture plans

Amir Ibrahim¹, Mani Golparvar-Fard², Khaled El-Rayes¹

¹ Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, IL, USA
² Computer Science and Tech Entrepreneurship, University of Illinois at Urbana-Champaign, IL, USA

Correspondence
Mani Golparvar-Fard, Associate Professor of Civil Engineering, Computer Science and Tech Entrepreneurship, University of Illinois at Urbana-Champaign, 205 N. Mathews Ave., MC-250, Urbana, IL 61801, USA.
Email: mgolpar@illinois.edu

Present address
Amir Ibrahim, 205 N. Mathews Ave., MC-250, Urbana, IL 61801, USA

Funding information
National Science Foundation (NSF), Grant/Award Numbers: 1446765, 1544999

Abstract
This paper presents new metrics and methods for evaluating the quality of reality capture plans—commonly used to operate camera-mounted unmanned aerial vehicles (UAVs) or ground rovers—for construction progress monitoring and inspection of as-is conditions. Using a 4D building information model (BIM) or a 3D reality model as a priori, these metrics provide feedback on the quality of a plan (within a few minutes), accounting for resolution, visibility, accuracy, completeness of the capture, and satisfying battery capacity and line-of-sight requirements. A cloud-based system is introduced to create and optimize UAV/rover missions in the context of the prior model. Results from real-world construction data sets demonstrate that the proposed metrics offer actionable insights into the accuracy and completeness of reality capture plans. Additionally, a capture plan—with a combination of canonical and noncanonical camera views—that satisfies the introduced metrics is statistically correlated with the quality of reconstructed reality. These metrics can improve computer-vision progress monitoring and inspection methods that rely on the construction site's appearance and geometry.

1 INTRODUCTION

Capturing reality and modeling as-is conditions of construction sites or existing infrastructure assets using laser scanners, 360° cameras, camera-mounted unmanned aerial vehicles (UAVs), and ground rovers is becoming an indispensable part of construction monitoring and inspection practices. Visual data collected using high-end laser scanners are used for progress monitoring (Zhang & Arditi, 2013), as-built documentation (Zhang et al., 2016), and structural health monitoring (Park et al., 2007). However, high operational costs and the slow planning and execution of scanning tasks hinder streamlined adoption of the technology. On the other hand, visual data, in the form of images and videos, present a quick and low-cost solution to construction monitoring and documentation.

For construction and infrastructure monitoring tasks, the collected images and videos are converted into informative data formats such as 3D point clouds, orthophotos, and contour maps. The resulting measurable images and point clouds are used for construction planning, site logistics, as-built modeling, progress reporting, quality control, and infrastructure condition assessment. Despite the benefits of reality capture data, their successful conversion into complete, accurate, and actionable 3D models and 2D orthophotos relies heavily on the engineering team's experience to collect and postprocess the data. Nevertheless, manual planning and collection of the data lead—in most cases—to low-quality visual data (Asadi et al., 2020). Figure 1 shows examples of low-quality reality models produced from poor data collection practices. As shown, the results are incomplete, contain low-density regions,




FIGURE 1  Manual reality capture planning may result in (a) incomplete coverage, (b) inaccurate 3D reconstruction, and (c) low visibility measured in SSD

and are inaccurate due to drift errors and low surface sampling distance (SSD), that is, x units of measure on the structure's surface per image pixel. Consequently, project teams return to construction sites to perform additional captures. This practice is costly, and a complementary capture might be too late to document critical changes. Of course, project teams can overcollect reality data for better completeness and accuracy; however, this practice exponentially increases the data's postprocessing time and, in turn, takes away from productivity gains.

According to Ham et al. (2016), automatic collection of reality capture plans using autonomous platforms and assessing the plans' quality before collection offer an opportunity to address the latter challenges. In practice, commercially available flight planning applications are used by drone operators to create user-defined flight plans. By fine-tuning parameters such as the height of the flight, image overlap, and image resolution, lawn-mowing—grid-based—2D flight missions are created to visually cover a specific region. Then, dedicated mobile applications are used to operate UAVs and collect the data automatically. Such planning methods can only provide users with feedback on the expected ground sampling distance (GSD)—which measures the distance between two pixels' centers on the ground. While GSD feedback is useful for surveying tasks, specifically to evaluate the accuracy of measurements conducted on the collected images, it is not suitable for construction monitoring tasks as the GSD value assumes a fixed distance between the UAV and the structure's topology. Accordingly, GSD does not assure accurate measurements for buildings or infrastructure assets.

Moreover, applications such as automated construction progress monitoring and condition assessment of existing assets have several additional requirements: for example, (1) visual coverage to the monitored assets, that is, setting the viewpoints of a capture device to visually cover a region of interest on the construction site; (2) clear visibility of each construction asset in the collected frames—measured in SSD; and (3) canonical trajectories of data frames to the structure's topology. UAVs and ground rovers are also operated near existing buildings, trees, power lines, and on-site personnel; thus, there are inherent risks of damages and injuries.

To address the limitations mentioned above, this work introduces methods to plan for image-based reality capture, evaluate the quality of capture plans based on a comprehensive set of metrics, and improve the plans for supporting construction monitoring applications. Additionally, a cloud-based solution with an intuitive user interface is presented for reality capture planning and visualizing each capture plan's performance against the introduced criteria. The next sections provide an overview of related works, followed by a discussion on visual quality metrics, evaluation methods, conducted experiments, and findings.

2 RELATED WORKS

Image-based 3D reconstruction has been widely used for automated progress monitoring, for instance, the work by Hamledari et al. (2017). Moreover, Xu et al. (2020) used 3D reconstructed models for quality control during construction, and W. Y. Lin (2020) utilized such models for condition assessment of assets. Yang et al. (2015) and Kopsida et al. (2015) offer an extensive literature review on recent techniques for automated evaluation of progress deviations using 3D reality models. The following sections review prior work on the quality of 3D reconstructed models and captured data for progress monitoring, quality control, and condition assessment.

2.1 Quality of 3D reconstruction

Prior research works have focused on methods to compare point clouds (with or without associated images) against the building information model (BIM). Prior works have examined the quality of laser scanner point clouds from capture planning and quality control perspectives.

Anil et al. (2013) proposed a deviation analysis method to assess 3D as-built point clouds' quality by measuring distances between laser scanner points and reference BIM. Zhang et al. (2016) evaluated the level of detail (LOD) of point clouds by measuring the density of laser scanner points projected on structure surfaces and used a minimum LOD requirement for optimizing laser scanning plans. Kalyan et al. (2016) used dimensional analysis to measure the accuracy of reality models collected using a depth sensor by comparing the dimensions of scanned point clouds to a ground truth (GT) model. Rebolj et al. (2017) investigated different techniques for evaluating the quality of laser scanner point clouds. In such work, the 3D model's quality is indicated by the points' density (number of points per m²) and accuracy, which measures the difference in depth between scanned points and reference points. To this end, similar metrics and methods to assess the accuracy and completeness of image-based reality capture plans—necessary for the success of automated progress monitoring methods—have not been the subject of research.

In the computer vision community, Seitz et al. (2006) established quality evaluation methods for calculating the accuracy and completeness of 3D reality models generated via different image-based reconstruction pipelines. In this work, the accuracy of a model is measured by the 90th percentile of the distances between 3D reconstructed points and a reference model (GT). Besides, the completeness is calculated as a percentage of the GT points having a distance below a fixed threshold to the nearest reconstructed point. These evaluation metrics and methods are useful in comparing 3D reconstruction algorithms. Accordingly, they offer an opportunity to compare the impact of reality plans on the quality of reconstructed reality models.

2.2 Metrics to evaluate reality capture plans

To date, prior research works have introduced some metrics to evaluate and improve the quality of reality capture plans before data collection, including the following:

Visual coverage: to evaluate the best sensor configurations (camera positions and trajectories) that completely observe all the primitives of a 3D structure. A survey by Galceran and Carreras (2013) presented various robotic path planning methods for providing complete visual coverage to 3D structures. The latter work recommended simplifying the structures to basic geometries to solve coverage path planning problems. More recent work by Phung et al. (2017) optimized UAV configurations for complete visual coverage to an infrastructure asset. Here, the 3D geometry of the infrastructure asset is decomposed into surface patches. Then, the UAV path is optimized at a fixed distance from the structure to cover all patches with a maximum GSD. Moreover, Baik and Valenzuela (2019) used a visual coverage metric to create UAV plans for complete visual inspection of electric transmission towers using simple geometries. However, for construction monitoring tasks, visual coverage for all elements cannot be assured using simplified geometry of the structure where some elements—especially those with small dimensions—will be excluded from the evaluated model. Nevertheless, clear visibility of each construction asset in the collected frames is also needed to support automatic vision-based monitoring methods (Han & Golparvar-Fard, 2015).

Sensor parameters: to select the best camera functional parameters, including sensor's type, resolution, field of view (FOV), image type (e.g., perspective or equirectangular), and frame rate for time lapse or standard videos. Tuttas et al. (2016) compared different methods and setups for image acquisition on construction sites using hand-held cameras, UAVs, and crane cameras. Zhang et al. (2016) optimized plans for laser scanning by defining the best sensor configurations to improve the density and accuracy of scanned point clouds. Rodríguez-Gonzálvez et al. (2017) utilized visual sensors for weld inspection and compared macro photography and laser scanners in terms of the resulting GSD and operational costs. Accordingly, planning for visual reality capture has to consider the variability in sensor parameters.

GSD: to indicate the resolution of measurements in the collected images. Daftry et al. (2015) used GSD as an indicator of the quality of image-based reconstructed point clouds collected using aerial platforms. Baik and Valenzuela (2019) evaluated and optimized the resolution of UAV images obtained for inspection missions by calculating the expected GSD. Kim et al. (2019) set a fixed altitude for the UAV missions to satisfy a maximum GSD value suitable for generating an initial 3D map of the construction environment. However, using GSD as feedback does not assure accurate measurements for buildings or infrastructure assets with complex geometries, particularly when the camera takes close-up images. In such a case, the GSD metric can be extended to measure SSD that has better performance in evaluating the expected accuracy of a 3D reconstruction.

Visual features: where image-based 3D reconstruction relies on detecting and matching visual features. These features are mathematically represented as feature descriptors that can be automatically detected, be matched across images, form feature tracks, and be transformed into 3D reconstructed points via bundle adjustment optimization. Degol, Lee, et al. (2018) evaluated the success of 3D reconstruction through the probability of detecting and matching simulated features and forming successful feature tracks (scene graph) between image pairs. Furthermore, Javadnejad et al. (2021) sampled features from sparse point clouds and attributed the quality of reconstructed dense point clouds to the distance between reconstructed points and reference features. While detecting and matching visual features are essential for successful 3D reconstruction, to date, this metric has not been used to assess and optimize reality plans.

Sensor orientation: which measures the angle of incidence between the sensor's look-at direction and the structure's surfaces. As shown in Ham et al. (2016) and Javadnejad et al. (2021), canonical views where camera orientations are orthogonal to the structure can improve image resolution and enhance the material detection algorithms' performance. Moreover, Degol, Golparvar-Fard, et al. (2016) have shown that canonical camera trajectories also improve construction materials' detection using on-site images. Consequently, assuring canonical views in a reality plan is vital to enhance the accuracy of appearance-informed material recognition methods used for automated progress monitoring.

Although the previously researched metrics have been validated in different studies and under different assumptions, to date, no research has investigated the collective usage of these metrics for evaluating construction reality capture plans. Besides, the relative significance of these metrics in evaluating reality plans' quality is still missing in the literature.

Beyond the monitoring metrics listed above, several additional operational metrics should be accounted for during the automatic collection of the data. Operating autonomous platforms is associated with inherent risks of damages and injuries. Metrics for measuring (1) safe proximity to the structure, (2) availability of enough batteries to collect the data, and (3) preservation of a continuous line-of-sight during UAV operation—required by the Federal Aviation Administration (FAA)—are equally important to assess the feasibility of executing reality capture plans.

2.3 Methods to evaluate reality capture plans

Previous research endeavors have investigated the usage of a priori—such as BIMs—to create, communicate, and evaluate the quality and safety of reality capture plans (Ibrahim, Roberts, et al., 2017; Y. H. Lin et al., 2013; Taneja et al., 2016). However, the latest site conditions—which are modeled using 4D BIM and temporal 3D reality models—are rarely accounted for during data collection planning. Also, the applicability of prior works was validated in the context of UAV images only; it did not consider 360° images/videos and interior spaces where automated data collection and progress monitoring are more challenging.

An earlier work by the authors (Ibrahim, Golparvar-Fard, et al., 2017) discussed the importance of using the most up-to-date state of an asset—either represented in the form of 4D BIM or 4D reality model—for quality assessment of reality capture plans. However, the latter research focused only on providing a method to calculate the visibility of elements in a manually created reality plan and using the redundant visibility of elements to indicate the completeness of reconstructed point clouds. Other critical criteria such as SSD, viewpoint orientation, reconstruction stability, and operational requirements were not addressed. Moreover, an objective validation utilizing real-world construction data is missing, which is required to confirm the metrics and methods for evaluating reality capture plans.

Nevertheless, visualizing the quality metrics for reality capture plans—particularly for users in the field—is as important as the methods used to calculate them. For instance, Daftry et al. (2015) used color heat maps to visualize GSD values and indicate the number of overlapping images observing the same mesh primitives. Ibrahim, Golparvar-Fard, et al. (2017) colored BIM elements with a metaphor of traffic-light colors to visualize the values of visual coverage and redundant observations of BIM elements in the data. Visualizing other metrics, such as completeness of the capture and the operational metrics introduced before, has not been thoroughly investigated.

2.4 Optimizing reality plans

Optimizing a reality plan before execution is essential for assuring complete and accurate results and reducing data collection time and costs. Several research studies have focused on solving coverage path planning problems, such as Chen et al. (2018) and Lindner et al. (2019). These works provide various algorithms, such as greedy next-best-view and set-cover optimization, to generate

FIGURE 2  The new method and cloud-based system—built with a client-server architecture—enable fast and memory-efficient evaluation and improvement of the reality capture plans against six evaluation criteria

data collection paths with optimal visual coverage. While optimizing visual coverage using 3D models' primitives is sufficient for some applications, Ibrahim, Golparvar-Fard, et al. (2017) showed the importance of associating visual coverage to the number of back-projected pixels per BIM element. By doing so, a sufficient surface area of each element would be visible in the data. Such a metric is designed to satisfy the requirements of appearance-based recognition methods used for progress monitoring (Han & Golparvar-Fard, 2017).

3 METHOD

In this section, a comprehensive set of objective metrics is presented to benchmark, compare, and support optimizing reality capture plans before their execution. A new method is introduced to enable fast and memory-efficient simulation of capture plans in the context of 4D BIM and existing point clouds. A prototype of these methods and metrics is also presented that runs efficiently in a web browser. Besides, the prototype offers visual and numeric feedback on data collection plans' performance against all evaluation criteria.

3.1 Metrics

As shown in Figure 2, a reality capture plan is created considering six main criteria: (1) the visual coverage of data frames to the structure's elements, (2) the resolution of each monitored element in SSD units, (3) the orientation of the camera viewpoints to the model's topology, (4) the expected stability of the 3D reconstruction pipeline, (5) satisfaction of the FAA regulation for maintaining line-of-sight during drone operation, and (6) battery operation time during data collection. These criteria are transformed into visual quality metrics to assess visual coverage, redundant visibility, back-projection resolution, viewpoint orientation, and stability of 3D reconstruction. Simultaneously, the operational criteria are accounted for during reality capture planning to enforce safe and successful execution. Furthermore, each reality plan is improved through iterative manual modification and evaluation before execution.

In this paper, a reality capture plan ℝ has n frames (perspective or equirectangular) given by (f | f = 1 : n; f ∈ ℝ). ℝ is evaluated at time D for a structure Γ that has z number of elements. An evaluated element—that is, an element in BIM or a face of a polygon mesh in a reality model—is defined by (i | i = 1 : z; i ∈ Γ, ∂(i) ≤ D), where ∂(i) returns the planned construction date for element i. Because reality models are dominantly reconstructed using structure-from-motion techniques (Szeliski, 2020), a pinhole camera model is considered for each frame in ℝ. As such, each element i is back-projected into frame f using Equation (1):

$$p_{i,f} = M_f P_i = Q_f \,[R_f \mid T_f]\, P_i \qquad (1)$$

where P_i is the 3D coordinate of a point that belongs to element i, and M_f is an 11 degrees-of-freedom (DoFs) camera

projection matrix for frame f. The camera projection matrix has six DoFs for rotation R_f and translation T_f of the camera with respect to the world coordinate system (WCS) and represented by the extrinsic camera matrix [R_f | T_f]. The other five DoFs are associated with the intrinsic camera matrix Q_f, which defines the horizontal and vertical focal lengths, the sensor skew parameter, and the camera's 2D principal point. p_{i,f} represents the coordinates of the back-projected point P_i in the frame coordinate system (FCS). Additionally, γ_{i,f} represents the region of back-projected pixels of element i in frame f. An element i is considered visible in frame f when γ_{i,f} exceeds a predefined visibility threshold Ω, which ensures that elements have minimum back-projected pixels to be identified in frames (see Figure 3). For instance, if an automatic material recognition method such as in Han and Golparvar-Fard (2017) requires extraction of 25 × 25 image patches to detect elements' materials in the collected frames, then Ω is set to 625. The following defines the metrics of visual quality assessment.

FIGURE 3  Back-projecting point P_i of element i from WCS to FCS of frame f using Equation (1). γ_{i,f} shows all back-projected points of element i in frame f
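The back-projection of Equation (1) can be illustrated with a short numerical sketch. The snippet below is only an illustration of the pinhole model, not the paper's implementation (which runs inside WebGL shaders, as described in Section 3.3); the intrinsic values and the sample point are made up for the example.

```python
import numpy as np

def projection_matrix(fx, fy, skew, cx, cy, R, t):
    """Build the 11-DoF pinhole projection M = Q [R | T] of Equation (1)."""
    Q = np.array([[fx, skew, cx],
                  [0.0,  fy, cy],
                  [0.0, 0.0, 1.0]])        # intrinsic matrix Q_f (5 DoFs)
    Rt = np.hstack([R, t.reshape(3, 1)])   # extrinsic matrix [R_f | T_f] (6 DoFs)
    return Q @ Rt                          # 3x4 projection matrix M_f

def back_project(M, P_world):
    """Project a 3D point P_i (world coordinates) to pixel coordinates p_{i,f}."""
    P_h = np.append(P_world, 1.0)          # homogeneous coordinates
    u, v, w = M @ P_h
    return np.array([u / w, v / w])        # coordinates in the frame coordinate system

# Illustrative values only (not taken from the paper)
R = np.eye(3)                              # camera aligned with the world axes
t = np.array([0.0, 0.0, 10.0])             # camera placed 10 m from the origin
M = projection_matrix(fx=2400, fy=2400, skew=0.0, cx=1920, cy=1080, R=R, t=t)
print(back_project(M, np.array([1.0, 2.0, 5.0])))
```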
3.1.1 Visual coverage

This metric measures the visual coverage of all elements i ∈ Γ in a reality capture plan ℝ. Specifically, it observes if γ_{i,f} meets a minimum number of back-projected pixels Ω. A complete visual coverage is achieved when ∃(γ_{i,f} ≥ Ω), ∀i ∈ Γ. The visual coverage metric V_C is defined by the ratio of elements that pass the visibility criterion (Equation (2)):

$$V_C = \frac{\sum_{i=1}^{z} \delta\!\left(\left(\sum_{f=1}^{n} \delta(\gamma_{i,f} \geq \Omega)\right) > 0\right)}{z} \qquad (2)$$

where δ(⋅) is a binary operator (1 if the underlying condition is satisfied or 0 otherwise), i refers to the evaluated element, and z is the total number of elements. Besides, it is crucial to evaluate the relative visual coverage, which indicates the ratio of visually observed elements satisfying Ω in ℝ to all observed elements (Equation (3)). Calculating the relative visual coverage V̄_C is necessary to determine if the visual coverage can be improved by increasing the number of back-projected pixels, for example, by moving cameras closer to elements or adding more image frames to the capture plan:

$$\bar{V}_C = \frac{V_C \times z}{\sum_{i=1}^{z} \delta\!\left(\left(\sum_{f=1}^{n} \delta(\gamma_{i,f} \geq 1)\right) > 0\right)} \qquad (3)$$

3.1.2 Redundant visibility

Camera positioning errors during execution of a capture plan, due to poor GPS signal and wind in the case of UAVs or localization errors in the case of ground rovers, lower the probability of observing expected visible elements. As such, redundant visibility reduces the chances of poor-quality data. Also, based on Han and Golparvar-Fard (2015), the redundancy in elements' observations improves automatic progress monitoring methods. Hence, a new metric is introduced to measure the redundancy in observing the structure's elements in ℝ. The redundant visibility per element is measured by enumerating frames observing the element while considering the Ω threshold. Hence, the redundant visibility metric V_R is the average redundancy value for all visible elements (Equation (4)):

$$V_R = \frac{\sum_{i=1}^{z}\left(\sum_{f=1}^{n} \delta(\gamma_{i,f} \geq \Omega)\right)}{V_C \times z} \qquad (4)$$

3.1.3 Back-projection resolution

The resolution of the back-projected structure Γ in a reality plan ℝ is important for visually detecting elements, conducting accurate measurements, and improving the accuracy of the reconstructed model. The SSD metric is used to calculate the resolution of the elements in each frame, indicating the distance—also surface area—that each back-projected pixel represents. The average resolution for all back-projected pixels ρ_{i,f} for element i in frame f is calculated using the mean SSD for all back-projected pixels p_{i,f} ∈ γ_{i,f} using Equation (5):

$$\rho_{i,f} = \frac{2 \tan\!\left(\frac{\phi_f}{2}\right) \times \bar{d}_{i,f}}{h_f} \qquad (5)$$

where φ_f is the camera's FoV, h_f is the frame's diagonal size measured in pixels, and d̄_{i,f} is the mean depth of all back-projected pixels. The depth d_{i,f} of each pixel p_{i,f} is calculated by measuring the distance between the 3D point P_i and the camera frame f (see Figure 3).

An element i can be visible in several frames with different resolutions. However, typically the best k resolutions across all visual frames contribute to reconstructing a structure's element in 3D. In addition, image-based measurements of a structural element in a 3D viewer are most accurate when conducted using canonical views or the best frames observing the element (Hoiem, 2018). Thus, the resolution of an element i in plan ℝ is set to the mean of the top k resolutions of the back-projected element across all frames. The redundant visibility of an element, V_{R,i} = Σ_{f=1}^{n} δ(γ_{i,f} ≥ Ω), can end up lower than the desired k value. Thus, k_i is defined per element i, where

$$k_i = \begin{cases} k & k \leq V_{R,i} \\ V_{R,i} & \text{otherwise} \end{cases} \qquad (6)$$

Finally, the back-projected resolution metric R for plan ℝ is calculated using Equation (7):

$$R = \frac{\sum_{i=1}^{z} \mu_f^{k_i}\!\left(\rho_{i,f}\right)}{V_C \times z} \qquad (7)$$

where μ_f^{k_i} is the top-k_i mean resolution of an element i across reality capture frames and ρ_{i,f} is the average SSD of element i in frame f.
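Once the per-frame pixel counts γ_{i,f} and per-frame resolutions ρ_{i,f} are available (for example, from the rendering pipelines of Section 3.3), Equations (2)–(7) reduce to simple array aggregations. The sketch below assumes dense z × n arrays and interprets the "best" resolutions as the smallest SSD values; it is an illustration of the formulas, not the paper's GPU implementation.

```python
import numpy as np

def capture_plan_metrics(gamma, rho, omega, k=3):
    """gamma[i, f]: back-projected pixel count of element i in frame f.
       rho[i, f]:   mean SSD (m/pixel) of element i in frame f.
       omega:       visibility threshold (minimum pixels), Eq. (2).
       k:           number of best observations per element, Eq. (6)."""
    z, n = gamma.shape
    visible = gamma >= omega                      # delta(gamma_{i,f} >= Omega)
    covered = visible.any(axis=1)                 # element passes the coverage test
    observed = (gamma >= 1).any(axis=1)           # element appears at all

    V_C = covered.sum() / z                                  # Eq. (2)
    V_C_rel = covered.sum() / max(observed.sum(), 1)         # Eq. (3)
    V_R = visible.sum() / max(covered.sum(), 1)              # Eq. (4)

    # Back-projection resolution: mean of the top k_i (smallest SSD) values, Eqs. (6)-(7)
    per_element = []
    for i in np.flatnonzero(covered):
        res = np.sort(rho[i, visible[i]])         # best (smallest SSD) first, an assumption
        k_i = min(k, res.size)                    # Eq. (6)
        per_element.append(res[:k_i].mean())
    R = float(np.mean(per_element)) if per_element else float("nan")   # Eq. (7)
    return V_C, V_C_rel, V_R, R
```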
3.1.4 Viewpoint orientation

The camera's relative orientation to an element's surfaces affects the back-projected resolution and the overall quality of a 3D reconstruction. Hence, the viewpoint orientation metric is designed to calculate the mean orientation of a reality plan's camera trajectory against all visible elements. Equation (8) calculates the orientation angle between a frame f and the surface normal vector at a point P_i that belongs to element i:

$$\Theta_{i,f} = \cos^{-1}\!\left(\frac{\vec{N}_i \cdot \vec{N}_f}{\|\vec{N}_i\| \, \|\vec{N}_f\|}\right) \qquad (8)$$

where Θ_{i,f} is the orientation angle between the surface normal at point P_i and frame f, N⃗_i is the surface normal at the point P_i, and N⃗_f is the view direction of frame f. A canonical orientation occurs when the view direction N⃗_f satisfies N⃗_i ⋅ N⃗_f = 0. For consistency with the other evaluation metrics, which are all calculated in the FCS, the relative orientation is measured for all back-projected pixels γ_{i,f}. The average relative orientation θ_{i,f} between element i and camera f is computed per frame in the same way as an element's resolution. Also, the mean of the top k_i (Equation (6)) orientation values of an element i is used to indicate the relative orientation of the element across all the frames in ℝ. The viewpoint orientation metric O is calculated as the average orientation of all visible elements (Equation (9)):

$$O = \frac{\sum_{i=1}^{z} \mu_f^{k_i}\!\left(\theta_{i,f}\right)}{V_C \times z} \qquad (9)$$
3.1.5 Stability of reconstruction

Three-dimensional reconstruction algorithms require the detection and matching of visual features across the frames of ℝ so that visual tracks are formed and points are triangulated in 3D. To ensure success, implementations of these algorithms recommend a minimum number of features Λ to be detected per visual frame, for example, Λ = 20 in Golparvar-Fard et al. (2009). In the presented method, these visual features are extracted from the a priori model Γ. Because these features are simulated, it is difficult—if not impossible—to precisely predict the actual position (P_u) of each visual feature u in the reality data before data collection. In the worst-case scenario, visual features exist only at locations with a high probability of generating robust feature descriptors. These locations are typically around corners of elements and along highly textured surfaces. Thus, the features are sampled from the simulated model at the corners and at highly textured meshes with a sampling rate of 1 feature per m² to measure the 3D reconstruction's stability in the most conservative state.

Once the features are detected, they need to be matched across image pairs. Mathematically, epipolar geometry is used to model the transformations between image pairs, utilizing the corresponding inlier features to fit a fundamental matrix using a RANSAC loop (Szeliski, 2020). The fundamental matrix estimation per frame requires matching at least eight corresponding point pairs. Since the simulated features may not be captured in the collected reality data, the necessary number of simulated feature pairs Λ is practically set higher than eight.

Finally, global optimization using bundle adjustment is applied to estimate the 3D position of visual features accurately. A robust global optimization process requires feature tracks between consecutive images in ℝ. Accordingly, an additional constraint is utilized to ensure each feature is visible in at least Y (e.g., 5) consecutive frames, where the frames in ℝ are spatially ordered along data

collection paths. Algorithm 1 shows how these constraints are enforced. Here, a feature track t_u counts the number of consecutive frames where a 3D feature u was observed, and 𝕋 is a set of all features that formed a feature track. The number of simulated visual features ν_f that pass the stability test in Algorithm 1 and are detected in each frame f is used to measure the stability of reconstruction metric T (Equation (10)). Figure 4 shows an example of how a stable 3D reality capture plan can provide a high-quality point cloud model:

$$T = \frac{\sum_{f=1}^{n} \nu_f}{n} \qquad (10)$$

ALGORITHM 1  Simulating stability of a 3D reconstruction

FIGURE 4  An example of a stable 3D reality capture plan, which has resulted in a complete reconstructed point cloud
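The pseudocode of Algorithm 1 does not survive the extraction of the original layout, so the sketch below reconstructs its logic from the description above: simulated features must form tracks of at least Y consecutive frames, and each frame should see at least Λ of the track-forming features. The exact handling of Λ and the return values are assumptions made for this illustration.

```python
def stability_of_reconstruction(feature_visibility, Y=5, Lambda=20):
    """feature_visibility[u][f]: True when simulated feature u is rendered
       (unoccluded) in frame f; frames are spatially ordered along the path.
       Y: required length of a consecutive feature track.
       Lambda: recommended minimum number of stable features per frame."""
    n_features = len(feature_visibility)
    n_frames = len(feature_visibility[0])

    tracked = set()                                  # the set T of track-forming features
    for u in range(n_features):
        run, longest = 0, 0
        for f in range(n_frames):
            run = run + 1 if feature_visibility[u][f] else 0
            longest = max(longest, run)              # t_u: longest consecutive run of feature u
        if longest >= Y:
            tracked.add(u)

    # nu_f: stable features detected in frame f; flag frames that fall below Lambda
    nu = [sum(feature_visibility[u][f] for u in tracked) for f in range(n_frames)]
    weak_frames = [f for f, count in enumerate(nu) if count < Lambda]
    T = sum(nu) / n_frames                           # Eq. (10)
    return T, nu, weak_frames
```

In this reading, frames listed in `weak_frames` are the ones a user would target when adding viewpoints to stabilize the reconstruction.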

Here, each metric's formulation shows a single scalar value per ℝ. Nevertheless, the first four metrics are calculated per model's element (or surface of a reality mesh). The last metric, on the stability of 3D reconstruction, is measured per plan's frame f in terms of ν_f. These element- and camera-based metrics are used to color-code BIM or point cloud/mesh models to provide visual feedback for users to improve and optimize their capture plans.

3.2 Reality capture planning

A web-based application is developed for reality capture planning. The application automatically generates grid-based missions utilizing a priori in the form of 4D BIM or reality model. The application uses a client-server architecture for easy access and sharing of capture plans among project teams on-site and off-site without installing any software package. Using this application, users can create, revise, and visualize a reality plan for indoor and outdoor environments. The planning method in this application is based on a selected number of preset templates and user-defined parameters. As such, this method does not focus on automatically optimizing capture plans. For outdoors, the camera's 3D trajectories along the capture path are defined by waypoints. Similar to the authors' previous work (Ibrahim & Golparvar-Fard, 2019), a 3D grid pattern with waypoints is sampled from a bounding-box geometry (see Figure 4). Unlike previous work, circular, top-down lawn-mower, and bridge data capture templates are also integrated to support various monitoring applications.

These waypoints are created along missions' paths while maintaining a minimum safety distance between the capture device and the a priori model to ensure a safe flight and avoid collisions. The safety distance is defined by the user based on the company's guidelines for safe UAV operation while considering factors affecting the visibility and positioning of the UAV, such as the strength of GPS signal and weather conditions. Additionally, the safety distance accounts for vertical obstacles such as trees, power lines, and tower cranes at the data collection site. Through a web form, user-defined parameters including image overlap and camera specifications are used to sample waypoints

along the data collection path. The spacing s between waypoints is calculated using Equation (11), where g is the user-defined safety distance from the model's bounding box, Φ is the camera's FoV, and l is the percent image overlap:

$$s = 2 g \tan\!\left(\frac{\Phi}{2}\right)(1 - l) \qquad (11)$$

The reality capture plan is split into multiple missions to ensure each mission satisfies the FAA's requirement for preserving the operator's line of sight. All waypoints associated with a single mission are visually observable from a ground station, assuming that the operator will set the ground station at one corner of the flight plan's region to maximize the visual observation angle. Missions are split into segments that are individually executable within the UAV's battery limitations. The latter helps the UAV's operator know how many batteries are required to execute the whole plan and track the completed missions. Next, the viewpoint of each waypoint is set perpendicular to the structure to achieve canonical views. Besides, the viewpoints along the top and lateral edges of the plan are set to an oblique 45°, allowing for a smooth transition between viewpoints for improving feature tracks.
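Equation (11) and the grid sampling it drives can be illustrated with a short sketch. The bounding-box handling and the oblique edge views are simplified away here, and the parameter values in the example are illustrative only.

```python
import math

def waypoint_spacing(g, fov_deg, overlap):
    """Spacing s between waypoints, Eq. (11): s = 2 g tan(FOV/2)(1 - l)."""
    return 2.0 * g * math.tan(math.radians(fov_deg) / 2.0) * (1.0 - overlap)

def grid_waypoints(x_extent, z_extent, g, fov_deg, overlap):
    """Sample a vertical grid of waypoints on one face of the bounding box,
       offset by the safety distance g (a simplified version of the template)."""
    s = waypoint_spacing(g, fov_deg, overlap)
    cols = int(x_extent // s) + 1
    rows = int(z_extent // s) + 1
    return [(c * s, -g, r * s) for r in range(rows) for c in range(cols)]

# Example: 84 deg FOV, 80% overlap, 10 m safety distance -> roughly 3.6 m spacing
print(round(waypoint_spacing(10.0, 84.0, 0.80), 2))
```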
Given the complexity of indoor spaces, reality capture plans are manually created by the users, and hence no automated template is provided. Specifically, to create an indoor capture plan, the user selects a set of consecutive waypoints along the capture path aligned to the prior model as a reference (see Figure 5). The indoor waypoints are designed for touring each navigable space of the structure at a fixed elevation while avoiding collision with the structure, obstacles, and vertical shafts. Because indoor reality captures are generally conducted using 360° cameras, image overlap cannot be used to sample waypoints along the path. Accordingly, the user manually sets the sampling distance along the camera motion trajectory (e.g., one frame per second). Finally, the data collection path is split into missions based on the ground rover's battery limitation, where each mission includes consecutive waypoints that can be executed with a single battery.

FIGURE 5  A reality capture plan in an indoor space: (a) ground rover's motion trajectory is shown in 3D and (b) plan view of the same motion trajectory against elevated floor
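Splitting an ordered capture path into missions that each fit one battery charge can be sketched as a greedy pass over the waypoints. The time model below (travel time plus a fixed per-waypoint capture time) and the parameter values are assumptions for illustration, not the system's actual scheduling rules.

```python
import math

def split_into_missions(waypoints, speed_mps=1.0, capture_s=2.0, battery_s=1200.0):
    """Greedily group consecutive waypoints so that each mission's estimated
       duration stays within a single battery charge."""
    missions, current, elapsed = [], [], 0.0
    for idx, wp in enumerate(waypoints):
        leg = 0.0
        if current:
            leg = math.dist(waypoints[idx - 1], wp) / speed_mps   # travel from previous waypoint
        if current and elapsed + leg + capture_s > battery_s:
            missions.append(current)                              # close the mission, swap battery
            current, elapsed, leg = [], 0.0, 0.0
        current.append(wp)
        elapsed += leg + capture_s
    if current:
        missions.append(current)
    return missions
```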
3.3 Simulation environment

The developed application enables users to upload and view BIM models and project schedules. Autodesk Forge (2020) is used to load and visualize these models efficiently in a web viewer. Connecting schedule tasks with model elements follows conventional 4D BIM practices. A state-of-the-art engine for 3D reconstruction (Reconstruct, 2020), capable of processing both perspective and 360° images and videos, is used to process the reality data. A hierarchical octree structure is used to load and visualize point cloud data by storing the original point cloud at different resolutions, similar to J. J. Lin and Golparvar-Fard (2016). In this approach, the octree's root node stores a low-resolution point cloud, and with each level, the resolution gradually increases. This structure also culls parts of the point cloud outside the viewer and renders distant regions at lower levels of detail. This performance optimization is essential because BIM models are already memory intensive while the application still needs to load and render point cloud data, image data, and reality capture plans. Figure 6 shows the interface of the developed application with a reality model overlaid on a 4D BIM and green colors indicating in-progress elements at the data collection date.

Similar to Ibrahim, Golparvar-Fard, et al. (2017), a GPU-powered WebGL rendering engine is used for efficient back-projection of each model's element into camera frames using Equation (1). The rendering engine utilizes the GPU's multithread architecture to instantly calculate the 2D location of mesh vertices in a target frame using a vertex shader program. A fragment shader program then colors the mesh surface using ambient material while leveraging z-buffer computation for considering occlusions. Based on this engine, four new rendering pipelines are developed to calculate the visual evaluation metrics (see Figure 7): (1) the visibility pipeline ⊳_V that renders each element i with a unique color in each frame f; (2) the resolution pipeline ⊳_R, which colors back-projected pixel p_{i,f} in frame f based

on its depth, encoded using RGB colors; (3) the orientation pipeline ⊳_O that renders the relative orientation of each back-projected pixel p_{i,f} with respect to frame f, encoded using RGB colors; and (4) the feature pipeline ⊳_F that renders each simulated feature u with a unique color as well.

FIGURE 6  Visualizing 4D BIM and reality model in the developed web-based application

FIGURE 7  4D BIM visualized in the developed web-based application and color-encoded for visual quality assessment

The color coding of the visibility and feature rendering pipelines follows the equation I_i or I_u = 256² R + 256 G + B, where I_i is the index of the element i ∈ Γ, I_u is similarly the index of the feature u ∈ Γ, and R, G, and B are the red, green, and blue color channels, respectively. This color-coding strategy encodes indices with over 16.7 million values, sufficient for the simulated structure. For simulating visual features, occlusion of the features by the model's elements is accounted for using the feature rendering pipeline ⊳_F, which renders the structure Γ using the background color to hide occluded features. Additionally, the features are translated with a small offset (e.g., 1 mm) from the structure's surfaces to ensure features are rendered on top of their corresponding elements. For the resolution rendering pipeline, the depth of each pixel is color encoded using

$$d_{i,f} = \left(\frac{R}{255} + \frac{G}{255^2} + \frac{B}{255^3}\right) \times D_{\max} \qquad (12)$$

where d_{i,f} is the depth of back-projected pixel p_{i,f} and D_max is the maximum rendering depth. The value of D_max is set to 1000 to only render elements within 1000 m from the camera's optical center, leading to a depth encoding resolution of ∼0.1 mm. Similarly, the surface normal N⃗_{i,f} at each back-projected pixel p_{i,f} is color-coded using

$$\vec{N}_{i,f} = -1 + \frac{2}{255}\,[R, G, B] \qquad (13)$$

The overall simulation and evaluation are executed according to Algorithm 2. Applying each rendering pipeline at frame f results in a new frame; thus, the four rendering pipelines result in f_V, f_R, f_O, and f_F. The rendering clear color is set to black, which has a decoded index equal to zero, to remove the background during evaluation.

ALGORITHM 2  Simulation and evaluation process
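The pseudocode of Algorithm 2 is not recoverable from the extracted text, so the sketch below illustrates the kind of per-frame decoding it describes: rendered RGB values are converted back into element indices, depths (Equation (12)), and normals (Equation (13)), and accumulated into the per-element visibility counts that feed the metrics. Array names, shapes, and the 1-based element indexing are assumptions.

```python
import numpy as np

D_MAX = 1000.0            # maximum rendering depth (m)

def decode_index(rgb):
    """Invert I = 256^2 R + 256 G + B per pixel (0 decodes to background)."""
    r, g, b = rgb.astype(np.int64).transpose(2, 0, 1)
    return 256 * 256 * r + 256 * g + b

def decode_depth(rgb):
    """Invert Eq. (12): d = (R/255 + G/255^2 + B/255^3) * D_max."""
    r, g, b = rgb.astype(np.float64).transpose(2, 0, 1)
    return (r / 255 + g / 255**2 + b / 255**3) * D_MAX

def decode_normal(rgb):
    """Invert Eq. (13): N = -1 + (2/255) [R, G, B]."""
    return -1.0 + (2.0 / 255.0) * rgb.astype(np.float64)

def accumulate_visibility(index_frame, gamma, frame_id):
    """Add this frame's back-projected pixel counts to gamma[element, frame]."""
    ids, counts = np.unique(index_frame, return_counts=True)
    for element_id, count in zip(ids, counts):
        if element_id != 0:                     # 0 = background (black clear color)
            gamma[element_id - 1, frame_id] += count   # assumes 1-based element ids
```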
3.4 Visual quality analysis and feedback

While visual quality metrics are useful to compare data collection plans, they do not provide spatial feedback on locations linked with low visual quality, which is vital for optimizing reality plans. Thus, each metric is calculated per element i ∈ Γ, and visual feedback is provided using

a traffic-light color gradient, with red indicating low visual quality and green indicating acceptable visual quality. The range of acceptable values per metric is manually defined so that each element and camera frame is rendered with the traffic-color gradient from red to green. Setting these ranges depends on the data collection purpose. For instance, inspection tasks require low SSD values (i.e., 1 cm/px to 1 mm/px) to allow for detecting cracks and corrosion. This flexibility enables the user to visually detect low-resolution elements (i.e., red-colored elements) and modify the reality plan by moving frames closer to the elements, improving their back-projected SSD. Similarly, if a frame is rendered in red, more frames are needed to increase the feature tracks near that frame's location. The visual feedback enables the users to analyze and improve a plan's visual quality in the same environment to guarantee stability in image-based 3D reconstruction. Such evaluation and optimization are conducted iteratively until the visual quality of the plan is satisfactory.

4 EXPERIMENTS AND RESULTS

Validation of the metrics, the methods, and the whole system was conducted using 13 data sets from seven different real-world construction projects. The objectives are to measure the effectiveness of the visual quality metrics to indicate the expected completeness and accuracy of the reconstructed models, and to evaluate the system's capability in providing numeric and visual feedback to improve reality plans before their execution.

4.1 Data set

As shown in Table 1, projects P1, P2, and P3 are data collected for existing structures, while projects P4, P5, P6, and P7 are collected on sites under construction. Project P7 includes data collected at seven different construction stages and thus provides an example of using a 4D a priori for evaluating reality plans created for construction progress monitoring. The collected data for P7 observe the construction at the structural foundation stage (P7:1), installing structural steel frames and architectural components (P7:2–P7:4), partial installation of mechanical and plumbing systems (P7:5–P7:6), and the complete installation of mechanical and plumbing systems (P7:7).

The collected data sets represent images observing various structures with different building systems, captured using camera-equipped UAVs and 360° cameras. Three-dimensional flight plans were used to collect data on P1 and P2, while top-down 2D flight plans were used for P5, P7:1, P7:2, P7:3, and P7:7; a 360° camera was used to collect data manually for the remaining projects to mimic ground rover capture. The number of frames collected per data set is reported in the table. A cubic projection was used to simulate a 360° equirectangular frame during validation. The projection converts each equirectangular frame into six orthogonal perspective frames. Overall, the assembled data sets include three capture modalities for outdoor, indoor, and integrated outdoor/indoor environments (see Figure 8).

4.2 Experimental results

Each reality data set contains reconstructed reality models (3D point cloud and mesh model) and the project's 4D BIM. Four different indicators were used to evaluate the reconstructed model's quality, focusing on assessing its accuracy and completeness. A variation on the metrics introduced by Seitz et al. (2006) is adopted. Here, instead of using a point-to-point comparison, the validation process measures the distance between depth values of the reality mesh Ψ and the GT BIM Γ, per data frame. The latter method improves the accuracy and speed of the calculation. Thus, for each back-projected pixel p_{i,f}, a depth value d_{i,f} is measured and attributed with the index of a back-projected element I_i to capture the accuracy and completeness metrics per element. A weighted average accuracy A (measured in meters) was used since the GT BIM models do not have a high level of development (e.g., LOD = 400). Also, some elements across the data sets are not reconstructed correctly due to low coverage, which significantly affects accuracy. The weighted average accuracy is calculated using a pair-wise difference in depth

TABLE 1  Experimental setup

ID     Project description                        Simulated systems   Progress state   Capture modality   Device   # of frames
P1     Five-storey institutional building          S,A                 Completed        Outdoors           UAV      1924
P2     Four-storey commercial building             S,A                 Completed        Outdoors           UAV      366
P3     Five-storey residential building            S,A                 Completed        Integrated         360°     183
P4     Two mechanical rooms and a facade           S,A                 In progress      Outdoors           360°     282
P5     30-storey high-rise commercial building     S,A                 In progress      Outdoors           UAV      153
P6     One floor of a commercial building          S,A,M,P             In progress      Indoors            360°     268
P7:1–P7:7  A warehouse and a connected one-storey office building captured at different dates
P7:1                                               S                   In progress      Outdoors           UAV      289
P7:2                                               S,A                 In progress      Outdoors           UAV      349
P7:3                                               S,A                 In progress      Outdoors           UAV      369
P7:4                                               S,A                 In progress      Indoors            360°     508
P7:5                                               S,A,M               In progress      Indoors            360°     354
P7:6                                               S,A,M,P             In progress      Integrated         360°     1251
P7:7                                               S,A,M,P             In progress      Outdoors           UAV      503

Note: S: structural, A: architectural, M: mechanical, P: plumbing.

FIGURE 8 Sample reality capture plans for (a) UAV and (b) ground rover in outdoor and indoor environments

TABLE 2  Evaluation results

Data set ID   V_C (%)   V̄_C (%)   V_R (#)   R (m)   O (°)   T (#)   A (m)   C_e (%)   C_v (%)   C_t (%)   Time (min)
P1 36.42 50.86 20.42 0.011 17.11 479.37 0.057 26.11 92.51 38.05 57.0
P2 48.84 96.34 35.77 0.007 24.61 499.83 0.076 88.94 98.03 49.11 4.8
P3 13.95 25.35 13.93 0.031 39.62 3326.11 0.067 30.3 94.27 29.29 13.8
P4 55.08 60.75 34.28 0.03 24.67 1765.8 0.143 40.31 94.37 56.78 9.6
P5 1.56 2.81 14.87 0.023 53.76 805.62 0.302 8.44 87.5 8.34 2.9
P6 3.64 11.38 23.34 0.013 39.42 1093.11 0.03 28.1 79.21 8.67 20.7
P7:1 91.86 92.94 24.99 0.007 29.75 209.71 0.072 18.96 93.85 70.93 3.8
P7:2 25.32 30.29 14.39 0.007 29.49 575.38 0.04 7.19 82.77 12.64 4.6
P7:3 7.81 31.29 12.41 0.012 42.77 116.42 0.03 1.98 87.5 2.86 4.1
P7:4 2.76 5.06 16.1 0.044 35.76 2400.47 0.1 53.46 94.36 30.44 27.5
P7:5 3.25 7.43 20.78 0.044 27.48 4597.41 0.106 45.55 91.66 31.70 36.7
P7:6 7.61 10.62 20.41 0.015 30.17 1012.56 0.04 27.35 85.58 36.68 49.7
P7:7 4.29 39.24 15.31 0.009 43.4 84.11 0.064 37.37 84.36 5.40 9.8

values between BIM and reality meshes (see Equation (14)). The relative weight ω_{i,f} for each pixel p_{i,f} is set to the redundant observation value for the BIM element detected at the pixel, which reflects the probability of correct reconstruction of an element:

$$A = \frac{\sum_{f}^{n} \sum_{i}^{z} \omega_{i,f} \times \left(d_{i,f}(\Gamma) - d_{i,f}(\Psi)\right)}{\sum_{f}^{n} \sum_{i}^{z} \omega_{i,f}} \qquad (14)$$

Measuring the completeness per element i ∈ Γ is as significant as the overall completeness of a reality model Ψ, because completely reconstructed elements are important for computer vision–based progress monitoring. Hence, the average completeness per element C_e and the overall completeness of the reconstruction C_t are measured. The percentage of back-projected pixels that have |d_{i,f}(Γ) − d_{i,f}(Ψ)| ≤ λ is used to measure the completeness. Since λ is affected by the manual reality-to-BIM registration process, the value of λ is set to 1.2× the BIM–point cloud registration error. Also, the average completeness per element for only visible elements in the data set, C_v, is measured to investigate low completeness of visible elements. Moreover, the minimum number of features Λ per image to form a feature track was set to 50.
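Given per-pixel depths rendered from the ground-truth BIM and from the reconstructed mesh, with the BIM element index attached to each pixel, Equation (14) and the completeness measures reduce to a few array operations. The sketch below works on a single frame and uses the redundant-visibility values as the per-pixel weights, as described above; accumulation across frames and the variable names are assumptions for illustration.

```python
import numpy as np

def accuracy_and_completeness(d_bim, d_mesh, element_idx, element_weight, lam):
    """d_bim, d_mesh:  per-pixel depths rendered from BIM (GT) and reality mesh.
       element_idx:    BIM element index per pixel (0 = background).
       element_weight: redundant-visibility value per element (1-based ids).
       lam:            completeness tolerance (1.2x the registration error)."""
    valid = element_idx > 0
    diff = d_bim[valid] - d_mesh[valid]
    w = element_weight[element_idx[valid] - 1]     # omega_{i,f} per pixel

    A = np.sum(w * diff) / np.sum(w)               # weighted average accuracy, Eq. (14)
    C_t = np.mean(np.abs(diff) <= lam)             # overall completeness

    per_element = []                               # average completeness per element (C_e)
    for e in np.unique(element_idx[valid]):
        mask = element_idx[valid] == e
        per_element.append(np.mean(np.abs(diff[mask]) <= lam))
    C_e = float(np.mean(per_element))
    return A, C_t, C_e
```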
Table 2 shows the visual evaluation metrics for each data set calculated using the equations and methods defined in Section 3. The reported values are compared to the quality of the 3D reconstruction per data set in terms of accuracy A, average completeness per element C_e, average completeness per visible element C_v, and overall completeness C_t. Further analysis of the results is provided in the next section. The evaluations were run in Google Chrome using an Intel Core i7 12 × 2.60 GHz CPU and an NVIDIA GeForce GTX 1650 GPU, and computational times are reported. Figures 9 and 10 show examples of the visual feedback for outdoor data set P1 and indoor data set P7:5. For more results, a demonstration video can be accessed via https://vimeo.com/477370145.

4.3 Validation of visual quality metrics

Pearson correlation analysis was conducted to validate the metrics' effectiveness in indicating the quality of reconstruction. Only significant correlations are considered, which include strong correlations (≥ 50% or ≤ −50%) and moderate correlations ([25%, 50%] or [−50%, −25%]). Figure 11 demonstrates the significant correlations between each metric and the reconstructed model's quality for the 13 data sets.

The correlation shows a significant contribution of the resolution R and viewpoint orientation O metrics to the accuracy of the 3D reconstruction. The analysis shows that visual coverage V_C, redundant visibility V_R, and viewpoint orientation O contribute significantly to the completeness of 3D reconstruction. It is also noted that using the visual quality metrics collectively is imperative since all the metrics contribute to the overall reconstruction quality.
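The validation step itself is a straightforward correlation analysis; a sketch using SciPy is shown below, where the two input vectors stand in for a plan-level metric (e.g., V_C) and a reconstruction-quality indicator (e.g., C_t) over the 13 data sets, and the strength buckets follow the thresholds stated above. The helper name is illustrative.

```python
from scipy.stats import pearsonr

def classify_correlation(metric_values, quality_values):
    """Pearson correlation between a capture-plan metric and a reconstruction-
       quality indicator across data sets, bucketed as in Section 4.3."""
    r, p_value = pearsonr(metric_values, quality_values)
    if abs(r) >= 0.50:
        strength = "strong"
    elif abs(r) >= 0.25:
        strength = "moderate"
    else:
        strength = "not significant"
    return r, p_value, strength
```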
5 DISCUSSION

It is found that the visual quality of a reality plan depends on the completeness of the plan's missions (e.g., 3D grid plan vs. 2D top-down plan) and the level of spatial complexity of the mapped structure (e.g., capturing structural systems vs. mechanical systems). For instance, data set P7:1 has 289 UAV top-down frames capturing the site during the foundation and structural walls construction,

FIGURE 9  Visual quality feedback on outdoor capture plan P1: (a) visual coverage and redundant visibility, (b) elements' resolution, (c) viewpoint orientation, and (d) stability of reconstruction

FIGURE 10  Visual quality feedback on indoor reality plan P7:5: (a) visual coverage and redundant visibility, (b) elements' resolution, (c) viewpoint orientation, and (d) stability of reconstruction

leading to a visual coverage V_C of 91.86% and an average redundant visibility of 24.99 instances. Similarly, data set P4 captures the construction of two mechanical rooms and a building's facade—simple structures—leading to high visual coverage. For data set P2, a 3D flight plan is used to collect data for an existing box-shaped four-storey commercial building, resulting in a low overall visual coverage (48.84%). However, the reported relative visual coverage is 96.34%, which means that using a complete 3D outdoor reality plan is insufficient for covering indoor elements. The lowest visual coverage metrics are reported for data set P5 due to using a top-down UAV flight plan inadequate to cover the complex vertical structure. It is also clear that data sets of complex mechanical and plumbing systems can result in low visual coverage, which can be enhanced by revising capture plans and collecting more frames, as in data set P7:6.

FIGURE 11  Correlation between visual quality metrics and quality of 3D reconstruction

Figure 12 shows the overall reconstruction completeness C_t against the calculated visual coverage metric V_C for the 13 data sets. The finding here shows a strong positive correlation between V_C and C_t, which supports the conducted correlation analysis (Figure 11). Additionally, the redundant visibility metric V_R shows the strongest correlation (with a 65% coefficient) with the average reconstruction completeness per element. This correlation demonstrates that redundant observation of an element in the collected data is important for correctly documenting the element's as-is condition.

FIGURE 12  Correlation between visual coverage metric and overall completeness of reconstruction. Blue region shows 95% confidence in correlation

The experiments show better resolution in back-projection for images captured with a perspective UAV-mounted camera compared to a 360° camera. Additionally, in data set P6, where 360° frames are captured closer to the structure, the resolution is superior compared to other data sets collected using 360° cameras. A positive correlation between back-projection resolution R and the weighted average accuracy A is shown in Figure 13, which confirms the correlation results in Figure 11.

FIGURE 13  Correlation between back-projection resolution metric and the accuracy of reconstruction. Blue region shows 95% confidence in correlation

It is found that canonical views lead to better accuracy as they improve the back-projection resolution. However, canonical views negatively impact the completeness of the model. This finding means a successful 3D reconstruction and improved image-based measurement require mixing canonical and noncanonical views. Finally, the stability of 3D reconstruction is positively correlated with the completeness of the reality model while having no significant effect on the accuracy of the 3D reconstruction.

These findings confirm the prior works' results that attributed visual coverage and resolution of collected data to the quality of 3D reconstruction. Additionally, they show the importance of measuring SSD and evaluating the five proposed metrics collectively. The conducted correlation analysis indicates—for the first time—the relative importance of the proposed metrics in evaluating the accuracy and completeness of reconstructed models.

Through analyzing the simulation and evaluation duration per frame for both the UAV and 360° data sets, it is found that the average processing time per frame is ∼2.3 s. Since perspective frames have smaller dimensions than equirectangular frames, the average evaluation time per perspective frame is ∼1 s, while that for an equirectangular frame is ∼3.8 s. Accordingly, this proves that utilizing the proposed GPU-based methods for visualizing and

evaluating reality plans is vital for the proposed system’s cations. Besides, the iterative evaluation and modification
feasibility, where bench-marking and optimizing reality processes are tedious. Such processes have to be repeated
capture plans before their execution require generating per data collection date where changes in the structure—
visual quality feedback promptly (within a few minutes). represented through 4D a priori—require altering the
reality plan. Future work will focus on automating the
creation and optimization of reality capture plans offline
6 CONCLUSION AND FUTURE WORK using the five developed metrics and operational require-
ments. Moreover, this work does not consider localization
This work demonstrated the importance of utilizing five errors during plans’ execution, which presents a challenge
visual quality metrics simultaneously to assess construc- to collect the data accurately.
tion reality capture plans. Besides, the proposed quality
metrics and their calculation methods were effective in
AC K N OW L E D G M E N T S
providing—within a few minutes—feedback on the recon-
The authors would like to acknowledge the financial
structed models’ quality during reality capture planning.
support of National Science Foundation (NSF) Grants
Nevertheless, the work showed that using 4D a priori
1446765 and 1544999. The authors also appreciate the
is essential for evaluating reality plans for construction
support of Reconstruct Inc. and all other construction
monitoring and asset inspection tasks.
companies who offered the real-world project data. Any
Results from 13 reality plans—created for seven con-
opinions, findings, conclusions, or recommendations
struction projects—concluded a significant correlation
expressed in this material are those of the authors. They
between the proposed metrics and the completeness and
do not necessarily reflect the view of the NSF, industry
accuracy of reconstructed reality models. It was found that
partners, or professionals mentioned above.
the expected visual coverage and redundant visibility of
elements in the reality plan are indicative of the overall
Finally, the feasibility of a client-server architecture for deploying the developed web-based system relies on the performance of data visualization and storage using efficient data structures. It was shown that leveraging the GPU's power for visualization, simulation, and processing tasks can promptly provide feedback (∼2.3 s per frame) and supports interactive optimization of reality capture plans.
While the presented methods provide feedback on the quality of reality plans, creating an optimal reality plan still relies on user-defined parameters and manual modifications. Moreover, the iterative evaluation and modification processes are tedious. Such processes have to be repeated for each data collection date, where changes in the structure—represented through the 4D a priori model—require altering the reality plan. Future work will focus on automating the creation and optimization of reality capture plans offline using the five developed metrics and operational requirements. In addition, this work does not consider localization errors during plan execution, which makes collecting the data accurately more challenging.

ACKNOWLEDGMENTS
The authors would like to acknowledge the financial support of National Science Foundation (NSF) Grants 1446765 and 1544999. The authors also appreciate the support of Reconstruct Inc. and all other construction companies who offered the real-world project data. Any opinions, findings, conclusions, or recommendations expressed in this material are those of the authors. They do not necessarily reflect the views of the NSF, industry partners, or professionals mentioned above.

REFERENCES
Anil, E. B., Tang, P., Akinci, B., & Huber, D. (2013). Deviation analysis method for the assessment of the quality of the as-is Building Information Models generated from point cloud data. Automation in Construction, 35, 507–516. https://doi.org/10.1016/j.autcon.2013.06.003
Asadi, K., Kalkunte Suresh, A., Ender, A., Gotad, S., Maniyar, S., Anand, S., Noghabaei, M., Han, K., Lobaton, E., & Wu, T. (2020). An integrated UGV-UAV system for construction site data collection. Automation in Construction, 112, 103068. https://doi.org/10.1016/j.autcon.2019.103068
Autodesk Forge. (2020). https://forge.autodesk.com/api/model-derivative-cover-page/
Baik, H., & Valenzuela, J. (2019). Unmanned aircraft system path planning for visually inspecting electric transmission towers. Journal of Intelligent and Robotic Systems: Theory and Applications, 95(3-4), 1097–1111. https://doi.org/10.1007/s10846-018-0947-9
Chen, M., Koc, E., Shi, Z., & Soibelman, L. (2018). Proactive 2D model-based scan planning for existing buildings. Automation in Construction, 93, 165–177. https://doi.org/10.1016/j.autcon.2018.05.010
Daftry, S., Hoppe, C., & Bischof, H. (2015). Building with drones: Accurate 3D facade reconstruction using MAVs. IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, USA (pp. 3487–3494). https://doi.org/10.1109/ICRA.2015.7139681
Degol, J., Golparvar-Fard, M., & Hoiem, D. (2016). Geometry-informed material recognition. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA (pp. 1554–1562). https://doi.org/10.1109/CVPR.2016.172
Degol, J., Lee, J. Y., Kataria, R., Yuan, D., Bretl, T., & Hoiem, D. (2018). FEATS: Synthetic feature tracks for structure from motion evaluation. In International Conference on 3D Vision, 3DV 2018, Verona, Italy (pp. 352–361). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/3DV.2018.00048
Galceran, E., & Carreras, M. (2013). A survey on coverage path planning for robotics. Robotics and Autonomous Systems, 61(12), 1258–1276. https://doi.org/10.1016/j.robot.2013.09.004
Golparvar-Fard, M., Peña-Mora, F., Arboleda, C. A., & Lee, S. (2009). Visualization of construction progress monitoring with 4D simulation model overlaid on time-lapsed photographs. Journal of Computing in Civil Engineering, 23(6), 391–404. https://doi.org/10.1061/(ASCE)0887-3801(2009)23:6(391)
Ham, Y., Han, K. K., Lin, J. J., & Golparvar-Fard, M. (2016). Visual monitoring of civil infrastructure systems via camera-equipped unmanned aerial vehicles (UAVs): A review of related works. Visualization in Engineering, 4(1), 1. https://doi.org/10.1186/s40327-015-0029-z
Hamledari, H., McCabe, B., & Davari, S. (2017). Automated computer vision-based detection of components of under-construction indoor partitions. Automation in Construction, 74, 78–94. https://doi.org/10.1016/j.autcon.2016.11.009
Han, K. K., & Golparvar-Fard, M. (2015). Appearance-based material classification for monitoring of operation-level construction progress using 4D BIM and site photologs. Automation in Construction, 53, 44–57. https://doi.org/10.1016/j.autcon.2015.02.007
Han, K. K., & Golparvar-Fard, M. (2017). Potential of big visual data and building information modeling for construction performance analytics: An exploratory study. Automation in Construction, 73, 184–198. https://doi.org/10.1016/j.autcon.2016.11.004
Hoiem, D. (2018). Maximize measurement accuracy with images overlaid on point clouds. 1–4. https://medium.com/reconstruct-inc/maximize-measurement-accuracy-with-images-overlaid-on-point-clouds-dca828f4a539
Ibrahim, A., & Golparvar-Fard, M. (2019). 4D BIM based optimal flight planning for construction monitoring applications using camera-equipped UAVs. In Computing in Civil Engineering 2019, Atlanta, Georgia (pp. 217–224). https://doi.org/10.1061/9780784482438.028
Ibrahim, A., Golparvar-Fard, M., Bretl, T., & El-Rayes, K. (2017). Model-driven visual data capture on construction sites: Method and metrics of success. In International Workshop for Computing in Civil Engineering (IWCCE 2017), Seattle, Washington (pp. 109–116). https://doi.org/10.1061/9780784480847.014
Ibrahim, A., Roberts, D., Golparvar-Fard, M., & Bretl, T. (2017). An interactive model-driven path planning and data capture system for camera-equipped aerial robots on construction sites. International Workshop for Computing in Civil Engineering (IWCCE 2017), Seattle, Washington (pp. 117–124). https://doi.org/10.1061/9780784480847.015
Javadnejad, F., Slocum, R. K., Gillins, D. T., Olsen, M. J., & Parrish, C. E. (2021). Dense point cloud quality factor as proxy for accuracy assessment of image-based 3D reconstruction. Journal of Surveying Engineering, 147(1), 04020021. https://doi.org/10.1061/(ASCE)SU.1943-5428.0000333
Kalyan, T. S., Zadeh, P. A., Staub-French, S., & Froese, T. M. (2016). Construction quality assessment using 3D as-built models generated with Project Tango. Procedia Engineering, 145, 1416–1423. https://doi.org/10.1016/j.proeng.2016.04.178
Kim, P., Park, J., Cho, Y. K., & Kang, J. (2019). UAV-assisted autonomous mobile robot navigation for as-is 3D data collection and registration in cluttered environments. Automation in Construction, 106, 102918. https://doi.org/10.1016/j.autcon.2019.102918
Kopsida, M., Brilakis, I., & Vela, P. A. (2015). A review of automated construction progress monitoring and inspection methods. In 32nd CIB W78 Conference 2015 (pp. 421–431). Eindhoven, The Netherlands. http://itc.scix.net/data/works/att/w78-2015-paper-044.pdf
Lin, W. Y. (2020). Automatic generation of high-accuracy stair paths for straight, spiral, and winder stairs using IFC-based models. ISPRS International Journal of Geo-Information, 9(4), 22–26. https://doi.org/10.3390/ijgi9040215
Lin, J. J., & Golparvar-Fard, M. (2016). Web-based 4D visual production models for decentralized work tracking and information communication on construction sites. Construction Research Congress 2016, San Juan, Puerto Rico (pp. 1731–1741). https://doi.org/10.1061/9780784479827.203
Lin, Y. H., Liu, Y. S., Gao, G., Han, X. G., Lai, C. Y., & Gu, M. (2013). The IFC-based path planning for 3D indoor spaces. Advanced Engineering Informatics, 27(2), 189–205. https://doi.org/10.1016/j.aei.2012.10.001
Lindner, S., Garbe, C., & Mombaur, K. (2019). Optimization based multi-view coverage path planning for autonomous structure from motion recordings. IEEE Robotics and Automation Letters, 4(4), 3278–3285. https://doi.org/10.1109/LRA.2019.2926216
Park, H. S., Lee, H. M., Adeli, H., & Lee, I. (2007). A new approach for health monitoring of structures: Terrestrial laser scanning. Computer-Aided Civil and Infrastructure Engineering, 22(1), 19–30. https://doi.org/10.1111/j.1467-8667.2006.00466.x
Phung, M. D., Quach, C. H., Dinh, T. H., & Ha, Q. (2017). Enhanced discrete particle swarm optimization path planning for UAV vision-based surface inspection. Automation in Construction, 81, 25–33. https://doi.org/10.1016/j.autcon.2017.04.013
Rebolj, D., Pucko, Z., Babic, N. C., Bizjak, M., & Mongus, D. (2017). Point cloud quality requirements for Scan-vs-BIM based automated construction progress monitoring. Automation in Construction, 84, 323–334. https://doi.org/10.1016/j.autcon.2017.09.021
Reconstruct. (2020). https://www.reconstructinc.com/
Rodríguez-Gonzálvez, P., Rodríguez-Martín, M., Ramos, L. F., & González-Aguilera, D. (2017). 3D reconstruction methods and quality assessment for visual inspection of welds. Automation in Construction, 79, 49–58. https://doi.org/10.1016/j.autcon.2017.03.002
Seitz, S. M., Curless, B., Diebel, J., Scharstein, D., & Szeliski, R. (2006). A comparison and evaluation of multi-view stereo reconstruction algorithms. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06) (Vol. 1, pp. 519–528). https://doi.org/10.1109/CVPR.2006.19
Szeliski, R. (2020). Computer vision: Algorithms and applications. Springer. https://doi.org/10.1007/978-1-84882-935-0
Taneja, S., Akinci, B., Garrett, J. H., & Soibelman, L. (2016). Algorithms for automated generation of navigation models from building information models to support indoor map-matching. Automation in Construction, 61, 24–41. https://doi.org/10.1016/j.autcon.2015.09.010
Tuttas, S., Braun, A., Borrmann, A., & Stilla, U. (2016). Evaluation of acquisition strategies for image-based construction site monitoring. ISPRS - International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, 41, 733–740. https://doi.org/10.5194/isprsarchives-XLI-B5-733-2016
Xu, Z., Kang, R., & Lu, R. (2020). 3D reconstruction and measurement of surface defects in prefabricated elements using point clouds. Journal of Computing in Civil Engineering, 34(5), 04020033. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000920
Yang, J., Park, M. W., Vela, P. A., & Golparvar-Fard, M. (2015). Construction performance monitoring via still images, time-lapse photos, and video streams: Now, tomorrow, and the future. Advanced Engineering Informatics, 29(2), 211–224. https://doi.org/10.1016/j.aei.2015.01.011
Zhang, C., & Arditi, D. (2013). Automated progress control using laser scanning technology. Automation in Construction, 36, 108–116. https://doi.org/10.1016/j.autcon.2013.08.012
Zhang, C., Kalasapudi, V. S., & Tang, P. (2016). Rapid data quality oriented laser scan planning for dynamic construction environments. Advanced Engineering Informatics, 30(2), 218–232. https://doi.org/10.1016/j.aei.2016.03.004

How to cite this article: Ibrahim A, Golparvar-Fard M, El-Rayes K. Metrics and methods for evaluating model-driven reality capture plans. Comput Aided Civ Inf. 2021;1–18. https://doi.org/10.1111/mice.12693