
Automation in Construction 131 (2021) 103892

Contents lists available at ScienceDirect

Automation in Construction
journal homepage: www.elsevier.com/locate/autcon

Review

Deep-learning-based visual data analytics for smart construction management
Aritra Pal, Shang-Hsien Hsieh *
Department of Civil Engineering, National Taiwan University, Taipei, Taiwan

ARTICLE INFO

Keywords: Deep learning; Visual data analytics; Construction management; Generalized workflow; 3D visual data

ABSTRACT

Visual data captured at construction sites is a rich source of information for the day-to-day operation of construction projects. The development of deep-learning-based methods has demonstrated their capabilities in analyzing complex visual data and inferring valuable insights. Recent applications of these methods in construction have also shown promising performance in making the construction management process smarter. To understand the current research trends and to highlight future research directions, this study reviews state-of-the-art deep-learning applications on visual data analytics in the context of construction project management. This in-depth review identifies six major fields and fifty-two subfields of construction management where deep-learning-based visual data analytics have been applied. It also proposes a generalized workflow for applying deep-learning-based visual data analytics methods for solving construction management problems. In addition, the study highlights three future research directions where deep-learning-based visual data analytics can be applied on relatively less explored 3D visual data.

1. Introduction

Visual data analytics is a solution for improving construction management practices. According to the Construction Management Association of America (CMAA), "Construction management is a professional service that provides a project's owner(s) with effective management of the project's schedule, cost, quality, safety, scope, and function" [1]. However, traditional construction management (CM) practices have often failed to achieve their primary objective because of common problems in this industry, such as poor productivity, high occupational safety risk, and poor quality of product delivery. The McKinsey Global Institute (MGI) reported in 2017 that construction-related spending contributes 13% to the global GDP but the annual productivity growth for this sector has remained at only 1% over the past twenty years. This growth rate is much lower than the global average of 2.8%, as well as the 3.6% growth rate of the manufacturing industry [2]. According to the Occupational Safety and Health Administration, United States Department of Labor, 3.5 per 100,000 full-time-equivalent workers died on the job in 2019 [3]. Needless to say, the current situation demands improved construction management practices.

The application of digital solutions in the construction industry has already shown significant benefits in overcoming these challenges. The initiative of automating construction monitoring and control has also prompted researchers and practitioners to try different methods similar to vision-based methods. Some of these alternative methods used audio-signal processing for recognition of construction activities [4], while others used installed devices like ultra-wideband, inertial measurement units, radio-frequency identification, or a global positioning system on construction workers or equipment items for monitoring and tracking their performances. The major limitation of the audio-based method is distinguishing the unique sounds of each activity on a noisy construction site [4]. Another is that worn installed devices cause discomfort to workers, which negatively affects their productivity [5]. Furthermore, the adoption of these methods is often hindered by the high cost of installation and monitoring [6].

The visual data collected daily from construction sites in the form of images, time-lapse videos, and video streams contain a lot of information relevant for effective project monitoring and management [7]. Proper and systematic deployment of visual data collection and analysis strategies in construction projects can significantly reduce labor-intensive site-monitoring tasks, so project managers and site personnel can focus more on evidence-based decision-making and quick resolutions to project constraints. In recent years, the advent of computer vision (CV) techniques, the relatively lower cost of storing and

* Corresponding author.
E-mail address: shhsieh@ntu.edu.tw (S.-H. Hsieh).

https://doi.org/10.1016/j.autcon.2021.103892
Received 23 May 2021; Received in revised form 3 August 2021; Accepted 12 August 2021
Available online 19 August 2021
0926-5805/© 2021 Elsevier B.V. All rights reserved.

processing information, and the substantially reduced cost of visual-data-capturing tools have made the application of vision-based monitoring techniques for construction management obvious.

Although CV applications in construction projects have been reported since 2009, they have been limited to some specific tasks. Widespread applications of vision-based methods in construction management were constrained until the adoption of deep learning [8]. Deep learning (DL) is a subset of machine learning that is largely based on artificial neural networks (ANN) [9]. DL models use neural networks with multiple layers to progressively extract higher-level features from raw input data [10]. This end-to-end learning approach makes DL significantly different from traditional CV methods, which heavily rely on handcrafted features. Extraction and selection of important features for initiating the traditional CV methods require expert CV engineers and a long trial-and-error process [11]. Fig. 1 shows the difference in workflows between traditional CV methods and DL methods. Traditional CV methods such as the histogram of gradients and Haar-like features methods have been widely used in construction applications for shape feature extraction, whereas mixture-of-Gaussians-based background subtraction or foreground detection methods have been used for spatio-temporal feature extraction [12]. Visual data analytics received a boost in the 2012 ImageNet large-scale visual recognition challenge (ILSVRC2012), where Krizhevsky et al. [13] used a convolutional neural network (CNN), a DL algorithm, for image classification. Additionally, the rapid advancement of graphics processing units (GPU) has accelerated the computation speed of complex DL algorithms, which has eventually triggered the application of DL-based visual data analytics in many fields [14,15], along with construction.

In recent years, researchers and practitioners have extensively used DL for analyzing visual data and inferring valuable information for effective construction management purposes. The visual data collected from construction sites are utilized for various operations-level management purposes, such as monitoring construction safety [16], monitoring equipment and worker performance [17,18], monitoring the progress of construction activities [19], in situ and post-construction quality assessment [20], construction waste management [21], facilities management [22], and dynamic worksite management [23]. Various CV tasks such as image classification, object detection, object tracking, pose estimation, and activity recognition were solved through DL algorithms as an integral part of these applications. The number of studies related to DL applications in construction has been skyrocketing since 2018 because of many factors, including the availability of a few public datasets, the advancement of GPU processing, and so on. However, there are still many open challenges to address for deploying these research findings directly in complex construction projects. At this point, it is important to review the state of the art, highlight future research directions for the construction management community, and provide a generalized workflow for DL-based visual data analytics for construction practitioners. Although some recent reviews [24,25] have focused on vision-based techniques for construction, those are not specific to DL. As DL algorithms have outperformed almost all traditional CV methods in terms of performance and robustness, a dedicated review of DL is much needed. To the best of our knowledge, so far only two papers [26,27] have reviewed DL applications in the architecture, engineering, and construction (AEC) industries. Details of these reviews are presented in Table 1. As Hou et al. [27] reviewed DL applications only related to safety management, other applications need further review. Akinosho et al. [26] included papers published until early 2020 in their review. However, a significant number of relevant papers have been published after that. Additionally, both papers are not specific to vision-based methods, and they did not provide any generic workflow for DL-based visual data analytics. These knowledge gaps in the earlier literature highlight the need for further review.

To bridge these gaps, this paper aims to review state-of-the-art DL applications in the vision-based monitoring of construction projects and, to this end, a generalized workflow for DL-based visual data analytics is proposed for improved construction management. During the review process, a few open challenges are identified and three future research directions are highlighted. The remainder of this paper is organized as follows: Section 2 describes the research methodology, including literature selection and analyses; Section 3 demonstrates the generalized workflow for DL-based visual data analytics; Section 4 reviews the evolution of DL applications in construction management; Section 5 highlights the open challenges and future research directions; and finally, Section 6 summarizes and concludes the study.

2. Research methodology

2.1. Literature search

The rise of artificial intelligence (AI) applications in almost every field of study has encouraged AEC experts to explore different machine-learning and DL applications in their domain for solving many existing problems. The interesting outcomes and favorable conditions in recent times have accelerated the growth of such research during the last decade. To explore the research evolution in the domain of DL-based

Fig. 1. Workflows of the traditional CV method (top) and DL-based methods (bottom) [11].
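To make the contrast in Fig. 1 concrete, the toy sketch below computes a handcrafted HOG-style feature, a histogram of gradient orientations, using a fixed recipe chosen by an engineer; in the DL workflow, comparable features are learned end-to-end from labeled images instead. This is an illustrative simplification, not the full HOG descriptor (no cell grid, block normalization, or magnitude weighting):

```python
# Toy illustration of a handcrafted shape feature: a histogram of
# gradient orientations (the idea behind HOG), computed with no
# learning involved. The image values and bin count are arbitrary.
import math

def gradient_orientation_histogram(img, bins=8):
    """Quantize gradient directions of a 2D grayscale array into a histogram."""
    h = [0] * bins
    rows, cols = len(img), len(img[0])
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            gx = img[y][x + 1] - img[y][x - 1]   # horizontal gradient
            gy = img[y + 1][x] - img[y - 1][x]   # vertical gradient
            angle = math.atan2(gy, gx) % (2 * math.pi)
            h[int(angle / (2 * math.pi) * bins) % bins] += 1
    return h

# A vertical edge: left half dark, right half bright.
edge = [[0, 0, 0, 9, 9, 9] for _ in range(6)]
hist = gradient_orientation_histogram(edge)
```

A traditional pipeline would feed such histograms to a shallow classifier such as an SVM, whereas a CNN learns both the features and the classifier jointly from the raw pixels.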


visual data analytics for construction management, a systematic literature review was conducted. The systematic review process consisted of three steps: primary literature search and retrieval; sorting of the collected papers based on relevance to the subject area, together with a secondary search for including papers cited by the sorted papers; and qualitative content analysis for indicating the major application fields so far, open challenges, and probable future research directions. Finally, based on the accumulated information, a DL-based visual data analytics workflow integrated with construction domain knowledge was proposed. Fig. 2 shows a graphical representation of the research methodology.

In the first step, journal articles and conference papers were retrieved from two major online databases: the Web of Science core collection (WoS) and Scopus. The main reason for selecting these databases was the wide availability of high-quality academic journals and conference papers related to the engineering and management domain [28,29]. Two sets of keywords were used for searching. The first set was related to DL applications in construction for analyzing visual data, and the other set was related to CV applications in construction management. Keyword set 1 consisted of ("deep learning" OR CNN OR conv OR convnet OR convolutional OR RNN OR recurrent OR GAN OR autoencoder OR "supervised learning" OR "unsupervised learning" OR "reinforcement learning" OR "deep reinforcement learning" OR LSTM) AND vision AND construction, and keyword set 2 consisted of "computer vision" AND "construction" AND (manage* OR monitor* OR track*). The keywords within one set were combined using 'AND' or 'OR' operators, and the '*' symbol helped to search all variants of a related word. The keywords were searched within the titles, keywords, and abstracts of the papers. As the CNN introduced by Krizhevsky et al. [13] in ILSVRC2012 was a breakthrough in the DL domain, the start time for the literature search was set as January 2012. Data compilation and analysis started in March 2021; accordingly, documents published until February 2021 were included in this study. The search results from both databases using the two sets of keywords were exported into a spreadsheet, and the lists were integrated by removing the duplicate entries: the conditional formatting function was applied to the 'Article Title' column to identify the duplicate entries, and the advanced filtering function was applied to remove them. The initial search retrieved 153 papers from WoS and 468 papers from Scopus. The integration process retained 380 papers.

In the next step, the retained papers that were written in English were checked for their relevance to the research theme by manually skimming through the titles, abstracts, and keywords. After this initial sorting, 145 papers were selected for thorough review. However, during the detailed review, seventeen papers were discarded for lack of relevance to the research theme or for not providing enough information required for this study. Furthermore, a secondary search strategy was adopted to check and include the papers that were referred to by the finally sorted papers. A Scopus RESTful API wrapper, pybliometrics [30], was used to compile a list of unique cited papers from the references of the sorted papers. The list contained both Scopus-indexed and non-indexed papers. A Python script was written to retrieve the metadata of the Scopus-indexed papers using pybliometrics and of the non-indexed papers through web scraping. Next, the relevant papers were filtered out by applying the same search (keywords and time period) and sorting (initial and final) criteria as used in the primary search. At the end of this process, an additional fourteen papers were included for thorough review. Finally, a total of 142 papers, comprising 81 journal articles, 49 conference papers, and 12 review papers, were reviewed in depth for addressing the research objective. The content analysis was done in two steps: bibliometric analysis and in-depth review. The bibliometric analyses focused on identifying year-on-year publication growth, the journals and conferences involved in publications of related literature, and the relevance of the selected papers to the research theme through keyword co-occurrence mapping. In the in-depth review, the papers were classified based on their application fields and subfields within the domain of construction management. The relevant information required for formulating the proposed workflow was extracted from the papers at the same time.

2.2. Bibliometric analysis

In Fig. 3, the year-wise publication numbers indicate the popularity of this research field in recent years. Although the first study in this domain was published in the year 2015, the research started gaining importance in 2018. The year-on-year publication growth has had an average rate of nearly 50% in the last three years. It is worth noting that this growth is apparent in 2021 too: in the first couple of months, eleven papers were published in reputable journals and conferences worldwide. The faster growth surely indicates the potential for more advanced future research in this area. Articles selected in this review were published in 24 highly reputable journals. Table 2 gives the list of

Table 1
Details of the earlier DL-related reviews in the AEC domain.

Ref. [27]
  Databases searched: Web of Science (WoS)
  Search period: 2010 to 2020
  Search keywords: ("deep learning" OR "machine learning" OR "convolutional neural network*" OR CNN* OR RNN OR "Recurrent neural network*") AND (construction* OR site* OR civil* OR "AEC industr*") AND (crack* OR "structu* health monitoring" OR SHM OR damage* OR defect* OR maintenance* OR inspection* OR behavi* OR safe* OR unsafe* OR fatigue* OR concrete* OR "computer vision*" OR "Natural Language Processing" OR NLP OR integration*)
  No. of papers reviewed: 527 papers for bibliometric analysis; the number of papers used for thorough review was not specified
  Scope of the paper: Applications related to safety management

Ref. [26]
  Databases searched: Scopus and Science Direct
  Search period: 2012 to 2020
  Search keywords: "deep learning", "deep learning in the construction industry", "recurrent neural networks", "deep neural networks", "convolution neural networks", "Auto-Encoders"
  No. of papers reviewed: 45 SCI-indexed papers for the in-depth review
  Scope of the paper: Applications related to the construction industry
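The integration step in Section 2.1 (merging the WoS and Scopus exports and removing entries with duplicate titles) was performed by the authors with spreadsheet functions; the same logic can be sketched in a few lines of Python. The sample records below are hypothetical:

```python
# Sketch of the literature-integration step: merge database exports and
# keep one record per normalized article title. The record fields and
# titles are invented for illustration; the authors used spreadsheet
# conditional formatting and advanced filtering instead.
def integrate(*exports):
    """Merge lists of paper records, keeping the first entry per title."""
    seen, merged = set(), []
    for export in exports:
        for paper in export:
            key = " ".join(paper["title"].lower().split())  # normalize case/spacing
            if key not in seen:
                seen.add(key)
                merged.append(paper)
    return merged

wos = [{"title": "Deep learning for crack detection", "source": "WoS"}]
scopus = [{"title": "Deep Learning for Crack Detection", "source": "Scopus"},
          {"title": "Vision-based progress monitoring", "source": "Scopus"}]
papers = integrate(wos, scopus)
```

Normalizing the title before comparison is what makes the two differently capitalized crack-detection records collapse into one entry.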


[Fig. 2 outlines the methodology: (1) literature search and retrieval of journal articles and conference papers from the Web of Science Core Collection and Scopus, published between January 2012 and February 2021, using the two keyword sets given in Section 2.1; (2) integration (duplicate removal), initial and final sorting, and a secondary search over the papers cited by the finally sorted papers; and (3) literature content analysis, comprising a scientometric analysis (publication statistics and keyword co-occurrence analysis) and an in-depth review of the DL application fields (construction safety, productivity, progress monitoring, quality management, waste management, and facilities management) and of the components of the generalized framework (preparation, pre-processing, model selection, training, validation, and testing). The contributions are, for CM researchers, future research directions (3D localization, instance segmentation of 3D point clouds, automatic scan-to-BIM, and integrated applications) and, for CM practitioners, a generalized workflow for DL-based visual data analytics integrated with construction domain knowledge.]
Fig. 2. Research methodology.

journals with more than two publications, along with their impact factors. The ten journals appearing in this list contain 85% of the selected articles. The impact factor information is based on the 2020 Journal Citation Reports (Clarivate Analytics). This review also includes conference papers from 33 international conferences especially focusing on automating engineering processes. Table 3 shows the list of conferences where at least two papers related to this research theme were published. It is observed that almost 51%, i.e., 25 out of 49, of the selected conference papers were published in these five conferences held in different years.

Keyword co-occurrence mapping can provide a reasonable understanding of the scope of any research field. Moreover, it can provide an idea about the relevance of the selected papers to the research theme [28]. For creating the keyword co-occurrence map, the VOSviewer software was used. Out of 362 author keywords, the 27 keywords whose frequency of occurrence was more than three were included in the co-occurrence map. Table 4 shows the frequency of occurrences and the average citation scores of the 27 selected keywords. The keywords were grouped into four clusters based on the linkages between them, and four different colors were assigned to represent the clusters. The co-occurrence map is visualized in Fig. 4. The size of the bubble corresponding to each keyword represents the frequency of occurrence, and the thickness of the links between two keywords highlights the strength of linkage. The


strongest links between the two most frequent keywords, 'computer vision' and 'deep learning', surely indicate the importance of DL for visual data analytics. The other associated keywords also motivated the formulation of the generalized workflow for DL-based visual data analytics by highlighting the construction applications (Safety, Productivity Analysis, Inspection, Defect Detection, and Structural Health Monitoring), CV tasks (Image Processing, Object Detection, Semantic Segmentation, and Pose Estimation), methods (Transfer Learning, Deep Neural Network, Convolutional Neural Network, and Faster R-CNN), and the target entities (Construction Site, Construction Equipment, and Construction Workers). The clustering in Fig. 4 shows that keywords from different categories often occurred more frequently together. For example, the 'object detection' keyword, which belongs to the CV task category, is found in the cluster of construction applications. This indicates that certain methods are predominant in addressing some specific challenges and researchers frequently refer to those methods.

According to the VOSviewer manual, the average citation score is the average number of citations received by the documents in which a keyword occurs. In this study, this score was estimated as of 23rd June 2021. The average citation score can indicate the importance of certain keywords within the research community. 'Faster R-CNN', 'Construction Site', 'Transfer Learning', 'Object Detection' and 'Safety' were found to be the five most important keywords within this research field.

Fig. 3. Year-wise publication numbers.

Table 2
Journals with more than two publications.

Journal title | Number of articles | Impact factor
Automation in Construction | 42 | 7.700
Journal of Computing in Civil Engineering | 10 | 4.640
Advanced Engineering Informatics | 7 | 5.603
Computer-Aided Civil and Infrastructure Engineering | 4 | 11.775
Journal of Construction Engineering and Management | 4 | 3.951
Journal of Building Engineering | 3 | 5.318
Frontiers in Built Environment | 3 | N/A
Applied Sciences (Switzerland) | 2 | 2.679
Sensors (Switzerland) | 2 | 3.576
Structural Health Monitoring | 2 | 5.929

Table 3
Conferences with more than two publications.

Conference name | Number of papers
International Symposium on Automation and Robotics in Construction (ISARC) | 12
ASCE International Conference on Computing in Civil Engineering (I3CE) | 5
Construction Research Congress (CRC) | 4
International Conference on Computing in Civil and Building Engineering (ICCCBE) | 2
Workshop of the European Group for Intelligent Computing in Engineering (EG-ICE) | 2

Table 4
Frequency of occurrences and average citation scores of author keywords.

Author keyword | Frequency | Avg. citations
Faster R-CNN | 4 | 45.00
Construction site | 4 | 30.00
Transfer learning | 3 | 30.00
Object detection | 14 | 22.64
Safety | 13 | 21.54
Deep learning | 57 | 18.46
Image processing | 4 | 16.25
Earthmoving | 3 | 15.67
Productivity analysis | 3 | 15.67
Simulation | 3 | 15.67
Machine learning | 13 | 15.46
Convolutional neural network | 30 | 15.17
Computer vision | 52 | 14.81
Vision-based method | 7 | 14.57
Construction worker | 8 | 14.50
Automation | 3 | 14.33
Deep neural network | 5 | 13.80
Defect detection | 9 | 13.67
Construction | 6 | 13.67
Structural health monitoring | 3 | 11.67
Artificial intelligence | 4 | 10.50
Construction equipment | 5 | 7.80
Semantic segmentation | 5 | 7.60
Pose estimation | 3 | 7.00
Machine vision | 3 | 6.67
Tracking | 3 | 6.33
Inspection | 4 | 4.75

3. Generalized workflow for deep-learning-based visual data analytics

As DL is becoming more powerful for analyzing visual data, it is necessary to formulate a generalized workflow for using DL in construction management applications, supported with domain expertise and knowledge. The CV community has provided various off-the-shelf DL models for applications that can be implemented directly without having a deeper insight into the models. However, domain expertise and knowledge always play a vital role in getting the desired solution to an engineering problem [31,32]. The proposed workflow suggests the fusion of construction domain knowledge and knowledge extracted from visual data through four main steps: domain knowledge mapping, knowledge extraction, knowledge fusion, and knowledge synthesis. Fig. 5 shows a schematic diagram of the generalized workflow. The first step, domain knowledge mapping, deals with problem identification and ontology modeling related to that problem based on the experts' experience in that field. The main decisive outcome of this step is whether or not to use vision-based methods for solving the problem. Once the vision-based methods are chosen to be part of the solution, the next step is problem-specific knowledge extraction through visual data analytics. In this study, only DL-based approaches for visual data analytics are discussed, owing to various reasons stated in the Introduction section. The knowledge extracted from this step can be of various types, such as the identity of an object, the trajectory of the object, the pose of the


Fig. 4. Keyword co-occurrence map.
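The co-occurrence counting behind a map like Fig. 4 is straightforward to sketch. The snippet below counts keyword frequencies, applies a minimum-frequency threshold (this study used a threshold of three in VOSviewer; two is used here to suit the tiny toy input), and tallies pairwise links; the per-paper keyword lists are hypothetical:

```python
# Sketch of keyword co-occurrence counting, the computation underlying
# a VOSviewer-style map. The example papers' keyword lists are invented.
from itertools import combinations
from collections import Counter

def cooccurrence(papers, min_freq=2):
    """Count keyword frequencies and pairwise co-occurrences across papers."""
    freq = Counter(kw for kws in papers for kw in kws)
    keep = {kw for kw, n in freq.items() if n >= min_freq}  # threshold filter
    links = Counter()
    for kws in papers:
        for a, b in combinations(sorted(set(kws) & keep), 2):
            links[(a, b)] += 1  # alphabetical pair order avoids double counting
    return freq, links

papers = [["deep learning", "computer vision", "safety"],
          ["deep learning", "computer vision"],
          ["deep learning", "object detection", "safety"]]
freq, links = cooccurrence(papers)
```

In a full analysis, `freq` would drive the bubble sizes and `links` the edge thicknesses of the map; VOSviewer additionally normalizes link strengths and runs a clustering algorithm.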

[Fig. 5 depicts the four main steps: domain knowledge mapping (problem identification, expert experience, and ontology modeling), knowledge extraction (problem-specific knowledge extraction through visual data analytics), knowledge fusion (fusing the domain knowledge and extracted knowledge for reasoning, e.g., spatial-temporal analysis), and knowledge synthesis (synthesizing the fused knowledge for problem solving). The embedded knowledge-extraction strip comprises data collection (capturing images, time-lapse images, or videos), preparation (dataset preparation, filtering and selection of images, and labeling), pre-processing (image resizing, image augmentation, and dataset splitting), model selection (CV task identification and model selection; model types: CNN, RNN, GAN), training (model training, hyperparameter tuning, and selection of training strategies), validation and testing (validating the results, predicting on test data, and evaluating model performance), and deployment (selecting deployment options considering the cost vs. performance trade-off).]

Fig. 5. Generalized workflow for DL-based visual data analytics.
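As a minimal illustration of the knowledge-fusion step in Fig. 5, the sketch below combines extracted knowledge (worker positions, as a hypothetical DL detector might report them) with domain knowledge (a hazard-zone boundary taken from a site plan) to reach an actionable inference. All coordinates and the rectangular zone are invented for illustration:

```python
# Toy knowledge-fusion example: a spatial rule from the construction
# domain (hazard-zone boundary) applied to DL-extracted worker positions.
# Coordinates and the zone are hypothetical.
def inside(point, box):
    """Axis-aligned containment check: is the point within the zone?"""
    (x, y), (x0, y0, x1, y1) = point, box
    return x0 <= x <= x1 and y0 <= y <= y1

hazard_zone = (10.0, 10.0, 20.0, 20.0)          # domain knowledge: crane swing area
worker_positions = [(12.5, 14.0), (35.0, 4.0)]  # extracted knowledge: detections
alerts = [p for p in worker_positions if inside(p, hazard_zone)]
```

A real system would extend this spatial check with the temporal dimension (e.g., how long a worker remains in the zone) before synthesizing an alert, which is the spatio-temporal analysis the text describes.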

object, or the activities being performed by the object. This information can be further fused with the domain knowledge in the knowledge fusion step through spatio-temporal analysis for extracting more semantic meaning. The knowledge fusion can provide information about the spatial position and the temporal behavior of the object and also highlight the interaction and relationship between different objects in a given problem scenario. The final knowledge synthesis step provides the solution to the identified problem by synthesizing the knowledge acquired from the past three steps. This step can help in applying the knowledge inferred from the domain expertise and DL-based visual data analytics for improving the current construction management practices. Although all four steps are equally important for obtaining the desired solution to a given problem, knowledge extraction through DL-based visual data analytics is discussed in detail to remain focused on the research theme. A detailed step-by-step explanation of each component of DL-based visual data analytics is given in the next sub-section.

3.1. Knowledge extraction through DL-based visual data analytics

The performance of DL-based methods heavily relies on data. Before putting these methods into practice, the collection, preparation, and pre-processing of those data are necessary [33]. Additionally, these data processing tasks demand human effort and skill [34]. Therefore, knowledge extraction through DL-based visual data analytics is completed through seven major steps: data collection, preparation, pre-processing of data, model selection based on the purpose, model training, validation and testing, and deployment. Fig. 6 shows the seven-step workflow of the knowledge extraction from visual data.

3.1.1. Data collection

Problem-specific data collection is the first step for implementing DL-based methods in construction applications. Visual data generally consist of still photos, time-lapse images, and video streams [7]. These


Fig. 6. Details of the knowledge extraction workflow.
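Within the knowledge-extraction workflow of Fig. 6, the pre-processing step includes splitting the prepared dataset before training. A reproducible three-way split can be sketched as follows; the file names and the 70/15/15 ratio are illustrative choices, not prescribed by the reviewed papers:

```python
# Deterministic train/validation/test split of an annotated image list,
# one piece of the pre-processing step before model training. The file
# names and split ratios are hypothetical.
import random

def split_dataset(samples, train=0.7, val=0.15, seed=42):
    """Shuffle samples reproducibly and partition them into three subsets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps splits repeatable
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

images = [f"site_{i:03d}.jpg" for i in range(100)]
train_set, val_set, test_set = split_dataset(images)
```

Fixing the random seed makes the validation and test subsets stable across experiments, which matters when comparing models during the later training and testing steps.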


visual data collected from construction sites are valuable sources of information for construction management. However, because of the unavailability of a proper workflow for visual data analytics on construction sites, captured data are often unordered and may not be useful for a specific purpose. The establishment of a data capture plan at the site level could be a useful strategy. Today, a wide range of visual-data-capturing devices, from inexpensive point-and-shoot cameras to expensive ground-dwelling robots equipped with cameras, are available in the market. Two types of data collection methods using these devices are generally found in construction applications: in some applications, cameras were kept fixed at one location and, in others, cameras were mounted on mobile devices that can move across the project sites. Surveillance cameras [33,35,36] and common red-green-blue (RGB) cameras [37], such as smartphone cameras, point-and-shoot cameras, DSLRs, and camcorders fixed with tripods or other supports, have generally been used as fixed cameras. In the case of fixed-camera applications, camera placement plays an important role. However, optimizing the camera placement by maximizing the coverage area with minimum cost has been challenging to date. In recent studies, researchers have tried to address both single-camera [38] and multi-camera placement [39] issues as an optimization problem. Meanwhile, mobile cameras have been attached to aerial robots such as UAVs [40] or ground-dwelling robots [41] for robotic operations and/or inspection of dynamic job sites [42]. Some applications also highlighted the use of cameras mounted on the hard hats of construction workers for localizing them on the construction sites [43]. Apart from normal RGB cameras, stereo/depth cameras and 360-degree cameras have also shown promising performance in construction-specific applications. Stereo cameras were found to be useful in capturing the depth information of the three-dimensional (3D) environment [44,45], tracking dynamic objects within job sites [35], and measuring concrete slump for in situ quality management [46]. The 360-degree cameras were used when large visual fields needed to be covered, and the vastness of construction sites suited such applications [47]. The recent commercialization of

[58], specific datasets related to construction applications are limited. Hence, researchers made an effort to create and share construction-application-specific datasets for promoting CV research and practice in the construction domain. The AIRCon Lab from the University of Alberta developed the Alberta Construction Image Dataset (ACID), which consists of 10,000 labeled images of ten categories of construction machines [50]. This dataset can be used for object detection purposes. Roberts and Golparvar-Fard [59] developed a public dataset for activity analysis of earthmoving operations by annotating ten construction videos. The Max Planck Institute for Informatics (MPII) human pose dataset for 2D joint recognition and the Human3.6M dataset for 3D joint coordinate estimates were frequently used by construction researchers for workers' pose estimation [60] and activity analysis [18]. The Advanced Infrastructure Management (AIM) construction vehicle dataset, which is a subset of the ImageNet dataset, was developed by Kim et al. (2018c) and further improved by Arabi et al. [51] for multiple construction equipment detections. The GDUT-Hardhat Wearing Detection (GDUT-HWD) dataset was used widely in construction research for detecting whether workers wore hard hats [61]. Construction material recognition datasets created by Dimitrov and Golparvar-Fard [62] were used for construction progress monitoring. In the case of nonavailability of problem-specific datasets, researchers developed customized datasets with the visual data collected from construction sites [6,63,64]. However, the visual data collected through web crawling or from construction sites need to go through a filtering and selection process to ensure their usability. This process removes duplicate and privacy-protected images and checks the suitability of image resolution and object size [50]. Another important part of the preparation process is annotating the selected images according to the ground truth information. Various annotation tools have been used by researchers to label visual data manually. Some popular offline annotation tools are LabelImg [65], LabelMe [66], and LabelBox [67]. The use of LabelImg was mainly found in annotating images for object detection with bounding boxes and image classification [68–70], whereas LabelMe and LabelBox were mainly used for se­
quadruped robots such as ‘SPOT’ also made them a viable option for mantic segmentation [16] and instance segmentation [21], apart from
visual data collection from construction sites [48,49]. the similar usage like LabelImg. Some researchers have used custom-
As there exist multiple data collection techniques, the project man­ made GUIs written in MATLAB for annotating images for semantic
agers can choose the one that best fulfills their problem-specific re­ segmentation purposes [41]. Roberts et al. [71] developed a web-based
quirements and meets the budget. Once camera type is selected, annotation tool for construction worker’s pose estimation and activity
effective camera placement for fixed cameras, and path planning and analysis. As the annotation task is very time-consuming and requires
waypoints selection for mobile cameras are still quite challenging in real domain knowledge and expertise, researchers also tried a crowdsourcing
construction scenarios. Waypoints are the intermediate points from annotation method using Amazon Mechanical Turk [72]. A similar
where the mobile cameras capture visual data while traversing through method was applied in a study of human-object interaction recognition
a predefined data collection path. Further research in these directions by Tang et al. [73,74] introduced a novel crowdsourcing-based labeling
can be of interest to future researchers. approach for effective construction site safety. VATIC is another online
video annotation tool from Irvine, California, that crowdsources work
3.1.2. Preparation from Amazon Mechanical Turk [75]. This was used by Kim et al. [76] for
The success of DL-based supervised learning models heavily depends annotating videos of tunnel construction. Roberts et al. [77] developed
on the quality and quantity of training data. In the case of visual data an annotation tool for labeling construction equipment with segmenta­
analytics, models need to be trained with images. However, collecting a tion masks and its pose using the Unity 3D game engine. Whatever
large number of construction images is sometimes difficult, because of process may be followed for annotating the data, the final annotations
the limited access to construction sites [50]. The annotation of the need to be reviewed thoroughly by an independent reviewer before
collected images is challenging because it requires a huge amount of moving to the next step. The reviewer needs to check the quality of the
manual effort and skill [50]. Researchers have followed various methods annotations and ensure that there is no missing annotation.
for dataset preparations. Web crawling is an effective method for col­ Researchers so far have created several public datasets and explored
lecting images automatically from the internet. Arabi et al. [51] and methods like crowdsourcing for accelerating the time-consuming human
Xiao and Kang [52] used a similar approach for collecting images of efforts for annotation. However, those datasets are still limited for spe­
construction machinery. Some researchers used 3D models and game cific purposes and more public datasets are required for addressing
engines for generating pre-annotated synthetic datasets of construction various construction challenges. To reduce the data dependency for
entities. Soltani et al. [53] used synthetic data for pose estimation of training and to avoid the labor-intensive annotation process, future
construction equipment. The Unity game engine was used in the study of research needs to explore several semi-supervised learning approaches
Torres Calderon et al. [54] to create synthetic datasets for activity where fewer data are required for training.
analysis. Similarly, Neuhausen et al. [55] trained an object tracking
model using synthetic data of construction worker images created using 3.1.3. Pre-processing
Blender 2.8. Although there exist some common large image datasets for Pre-processing of the data is a precursor to the DL model training.
visual recognition such as ImageNet [56], PASCAL VOC [57], and COCO Resizing the image up to a certain dimension is a necessary step in pre-


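As a concrete illustration of this resizing step, the following is a minimal pure-Python sketch of square resizing with padding, without any imaging library; the function and parameter names (`letterbox`, `pad_value`) are illustrative, not drawn from the reviewed studies.

```python
def letterbox(image, size, pad_value=0):
    """Resize a 2D grayscale 'image' (list of rows) onto a size x size square.

    The larger dimension is scaled to `size` with nearest-neighbor sampling,
    and the remaining gap along the smaller dimension is centered and filled
    with `pad_value` pixels (black padding when pad_value is 0).
    """
    h, w = len(image), len(image[0])
    scale = size / max(h, w)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    # Nearest-neighbor resampling of the original pixels.
    resized = [
        [image[min(h - 1, int(r / scale))][min(w - 1, int(c / scale))]
         for c in range(new_w)]
        for r in range(new_h)
    ]
    # Center the resized content on a square canvas of padding pixels.
    canvas = [[pad_value] * size for _ in range(size)]
    top, left = (size - new_h) // 2, (size - new_w) // 2
    for r in range(new_h):
        for c in range(new_w):
            canvas[top + r][left + c] = resized[r][c]
    return canvas
```

In practice the same effect is obtained with library routines (e.g., padding utilities in common imaging toolkits), but the sketch makes explicit why the aspect ratio is preserved while the network still receives a fixed square input.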
Generally, DL models will be trained faster on smaller images. In addition, many model architectures require images of the same size for training [78]. As customized image datasets may contain images of several sizes, resizing is required. Sometimes, images may also need to be resized to a square shape. In such cases, the larger dimension of the raw image needs to be adjusted to the required dimension, and the gap on the side of the smaller dimension may be filled with padding pixels of black or white color or with pixels showing a reflection of the original image [78]. The resizing dimensions can be decided based on the selected model's requirements and the available processing capacity of the system (CPU/GPU). Image enhancement was found to be beneficial in some applications in improving model performance by highlighting certain features in the images. Common image-enhancement techniques include color augmentation [64]; changing brightness, contrast, sharpness, hue, and saturation values [79,80]; and application of different image-processing techniques [21,28]. Image enhancement through DL is also found in recent studies [81]. As preparing a customized dataset with a large number of images is often difficult, image augmentation is a feasible solution. The number of training samples can be boosted by adding augmented images. Researchers have tried different image-augmentation techniques such as cropping, flipping, rotating, translating, and shearing [80,82]; temporal jittering [54]; adding noise to the training images [21]; and so on. Recent studies have also used generative adversarial networks (GAN) for augmenting images by removing and inpainting pixels [83]. Although the augmented images look very similar to the original image to the human eye, computers perceive them as completely different images because of the change in RGB values and pixel locations [79]. Once the dataset is ready after all the pre-processing tasks, it is divided into mutually exclusive training, validation, and test sets. As there is no fixed ratio for dataset splitting, different researchers have tried different ratios. Kim [24] comprehensively reviewed model performances corresponding to the different splitting ratios used in various visual data analytics research. In general, the training set needs to contain a much higher number of images than the validation and test sets, as more training data provides better learning opportunities to the model.

Along with the commonly used data augmentation techniques, future research can benefit from DL-based image enhancement and GAN-applied image augmentation. These techniques may address the illumination challenges faced by CV methods in general.

3.1.4. Model selection
One needs to choose the model that best fits the problem to be solved. Thus far, there exist different DL models for solving various CV tasks. The prior experience of construction professionals will be handy for identifying the CV tasks that need to be solved for extracting the required knowledge for solving the pre-defined problem. The CV tasks that have been solved by researchers through DL-based visual data analytics in various construction management applications can be divided into six major groups: image classification, object detection with the bounding box, object detection with semantic segmentation, activity recognition, object tracking, and pose estimation. Individual solutions for these tasks can be found in simple applications. However, a combination of solutions for different tasks is needed for solving more complex problems. For example, object detection alone was useful for confirming the PPE usage of workers at a site [84], but for analyzing the productivity of an earthmoving operation, a combination of object detection, object tracking, and activity recognition was required [52]. Different researchers have tried different DL models for visual data analytics with three major neural network architectures, namely convolutional neural networks (CNN), recurrent neural networks (RNN), and generative adversarial networks (GAN), for solving these tasks in the context of solving the bigger problem of a construction project.

A CNN is a neural network architecture with multiple layers used for visual recognition and classification tasks. A typical CNN consists of an input layer, convolutional layers, an activation function, pooling layers, fully connected layers, and a classification layer. The main advantage of CNNs is parameter sharing, which can control the number of parameters and increase the classification efficiency. The convolutional layers are useful for feature extraction from the visual data, whereas pooling layers sort the important features. They are associated with nonlinear activation functions to enhance the expression ability of the model and to deal with non-linear problems. Fully connected layers combine the features associated with the objects and output to the classification layer for classifying the object [85]. The advent of AlexNet [13] in 2012 triggered the use of CNNs for visual data analytics. Since then, researchers have tried several CNN models for solving visual recognition tasks. Table 5 provides a list of CNN models used for image classification in construction applications; it also highlights the target classes. It was found that safety hazard identification [86] and defect detection and classification [87] from images were the major early applications of DL-based visual data analytics methods in construction. However, in due course, researchers have classified images by construction equipment [88], construction materials [89], equipment breakdown condition [90], building fabrics [91], and progress stage of construction [92].

Table 5. List of image classification models used in construction applications.
Model name | Target class & references
AlexNet | Hazard [86], Equipment [88], Defect [87]; [93]; [94], Building fabric [91]
GoogLeNet | Defect [87]; [94], Building fabric [91]
VGG-16 | Equipment [88], Defect [87]; [93]; [94], Construction materials [89], Building fabric [91]
VGG-19 | Defect [87]; [93]; [92]
ResNet-12, 18, 34 | Defect [95]
ResNet-50 | Defect [87]; [93]; [94], Progress state [92], Rock [96]
ResNet-101, 152 | Defect [87]; [93]
Inception-v3 | Equipment breakdown [90], Defect [93]
Inception-v4, Inception-ResNet-v2, DenseNet-121, DenseNet-169, ResNeXt-50-32x4d, ResNeXt-101-32x8d, Wide-ResNet-50-2, Wide-ResNet-101-2 | Defect [93]
MobileNet, Xception | Progress state [92]
Customized CNN | Defect [97]; [64]

Although image classification is the fundamental task to be solved in CV, it alone cannot provide the spatial and semantic information of the target object. Hence the importance of object detection, which not only identifies the object classes but also locates them inside a given image [98]. It is therefore the prerequisite for initiating many other CV tasks, such as object tracking [42], pose estimation [99], and activity recognition [18]. Object detection models are divided into two categories: two-stage detection models and single-stage detection models. The region-based CNN (R-CNN) and its variants (Fast R-CNN and Faster R-CNN), and the region-based fully convolutional network (R-FCN), come under the first category, whereas You Only Look Once (YOLO) and its variants (YOLO V2, V3, and V4), and the single-shot multi-box detector (SSD), fall in the second category [50]. The main difference between the two-stage and single-stage detection models is the concept of the region proposal network (RPN). After the image features are determined in the first stage, the RPN in the second stage directs the model to search for target objects in some specific regions by providing multiple options for bounding boxes [98]. The detection speed of R-CNN models [100] was improved further by improving the RPN with region-of-interest (ROI) pooling in Fast R-CNN [101] and by introducing anchor boxes in Faster R-CNN [102]. The RPN in the region-based methods works as the backbone architecture. Dai et al. [103] proposed R-FCN with a more accelerated detection speed than Faster R-CNN. Because of the two-stage operation, region-based detection models were found to be more accurate than their single-stage counterparts.


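Whichever detector family is chosen, predicted bounding boxes are commonly compared against ground-truth boxes through their intersection-over-union (IoU), the overlap measure that also underlies ROI matching and anchor assignment. A minimal sketch (the helper name is illustrative, not from the reviewed studies):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width/height of the overlap rectangle; zero when the boxes are disjoint.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - inter)
    return inter / union if union > 0 else 0.0
```

A detection is typically counted as correct when its IoU with a ground-truth box exceeds a threshold (0.5 is a common choice), which is how the accuracy figures of the two families are usually compared.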
However, the single-stage models such as YOLO and SSD outperformed them in terms of processing speed and became the choice of researchers for real-time applications [104]. The YOLO algorithm uses one CNN for feature extraction and bounding box generation at the same time by evaluating the whole image [105]. Although this model had the limitation of small-scale object detection, it was further improved in its latest versions. In 2016, Liu et al. [106] proposed SSD, which had greater accuracy and detection speed than YOLO and was developed by combining the bounding-box regression of YOLO and the anchor box of Faster R-CNN. The single CNN responsible for the entire object detection task is considered the backbone of the single-stage models. Table 6 provides a list of object detection models and their backbone CNNs used in construction applications.

Table 6. List of object detection models used in construction applications.
Model name | Backbone architecture | Target object and reference
R-FCN | Not specified | Equipment [76]
R-FCN | ResNet-50 | Equipment [114]
R-CNN | ResNet-50 | Screw [115]
Faster R-CNN | Not specified | Equipment [68]; [70]; [116]; [117]; [82]; Worker [70]; [116]; [63]; [118]; [117]; [119]; PPE [70]; [120]; [63]; Defect [20]
Faster R-CNN | ResNet-50 | Worker [73]; [109]; [99]; PPE [73]; Material [109]; Equipment [73]; [109]
Faster R-CNN | ResNet-101 | Defect [98]; Worker [36]; Equipment [36]; [17]; Building [121]
Faster R-CNN | ResNet-152 | Worker [122]
Faster R-CNN | VGG-16 | Defect [79]; [80]; Worker [107]; Equipment [107]
Faster R-CNN | ZF Net | Construction waste [110]; Defect [79]; [80]
YOLOv2 | Darknet | Hoisting hooks [123], Equipment [78]; [69]; Worker [78]; [69]; Building [78]
YOLOv3 | Darknet53 | Equipment body parts [124]; PPE [125]; [126]; [84]; Equipment [127]; [68]; [78]; [104]; [42]; [50]; Worker [127]; [128]; [23]; [78]; [129]; [42]; [18]; Rebar [111]; Building [78], Road manhole cover [130]
YOLOv4 | Darknet53 | Rebar [111]
SSD | VGG-16 | Defect [108]; Equipment [131]; PPE [61]; [132]; [133]; Worker [133]
SSD | DenseNet | Defect [108]; Building structural elements [112]
SSD | ResNet-50 | Defect [134]
SSD | MobileNet | Equipment [51]; PPE [135]
SSD | Not specified | Equipment [68]; Falling object [136]; Precast element [137]
Customized CNN | SqueezeNet | PPE [138]
Customized CNN | MobileNet | PPE [138]
Customized CNN | AlexNet | Defect [40]
Customized CNN | VGG-16 | BIM element [113]; PPE [85]
Customized CNN | ResNet-50 | Equipment [32]
Customized CNN | ResNet-101 | Equipment [32]
Customized CNN | InceptionV3 | Equipment [32]
Customized CNN | Xception | Equipment [32]
Customized CNN | DenseNet | PPE [85]
Customized CNN | Not specified | PPE [139]; Worker [35]

Researchers have used these models for detecting construction entities such as workers [73], equipment [107], personal protective equipment (PPE) [70], defects [108], construction materials [109], construction waste [110], rebar [111], buildings, structural elements of a building [112], building information model (BIM) elements [113], and so on from visual data. These detection results were used in various independent applications or combined with other CV tasks. Apart from the specified CV tasks, CNN-based detection models were also used in optical character recognition (OCR), where earthmoving operations were monitored by checking the number plates of the construction vehicles entering and leaving the site [5].

Although object detection algorithms have improved significantly over the past few years, occlusion, low light conditions, the brightness of the images, and the shape, color, scale, and orientation of the objects still affect the detection models' performance. Continuing research for solving these challenges is still needed.

Bounding-box-based object detection is powerful for many applications; however, some demand segmentation of the image pixel by pixel for inferring more semantic information about the object in addition to its location and identity [21]. These segmentation methods are classified into semantic segmentation and instance segmentation; the latter is more powerful for distinguishing between instances of the same object class. Mask R-CNN [140], U-Net [141], and DeepLabV3 [142] were found to be some of the popular models for image segmentation. Mask R-CNN is an extension of Faster R-CNN that was made capable of instance segmentation by adding an additional branch for predicting the object mask [140]. On the other hand, U-Net and DeepLabV3 use a downsampling and upsampling approach for image segmentation through encoder and decoder CNNs. The downsampling path helps in identifying the object class, and the upsampling path helps in precise localization. To improve the segmentation accuracy, researchers incorporated depth information using two-stream fusion-based CNNs; FuseNet [143] is an example of that type of model. Construction researchers have used semantic/instance segmentation methods for various object classes such as building structural components [19], construction materials [16], concrete slump [46], BIM elements in synthetically generated images [44], buildings in satellite images [144], urban scenes [145], and so on. Table 7 provides a list of popular CNN-based semantic/instance segmentation methods used in construction applications.

Table 7. List of semantic/instance segmentation models used in construction applications.
Model name | Backbone architecture | Target object and reference
Mask R-CNN | Default CNN | Structural support [146]; Worker [16,31,146]; Equipment [16,31]; Vehicle [147]; Building structural elements [19]; Construction waste [21]; Concrete slump [46]; Material [16]
Mask R-CNN | ResNet101 | Buildings [144]
U-Net | Encoder: ResNet | Defect [22]
U-Net | Not specified | Defect [45]
DeepLabV3 | Encoder: MobileNetV2 | Defect [41]
DeepLabV3 | Encoder: ResNet-18, MobileNet V1, Cross Net | Defect [148]
CNN with encoder-decoder | Encoder: VGG Net | Defect [149]
FuseNet | Encoder: VGG Net | BIM elements [44]

Semantic segmentation and instance segmentation models face challenges similar to those of object detection and need further improvements. Also, labeling a segmentation dataset with polygon annotations is much more tedious than with bounding boxes. Future research may look into possible solutions for automating this process. Additionally, active learning or other semi-supervised learning approaches, which have mostly been investigated for object detection, may draw the attention of future researchers.

Object tracking is useful for estimating the position of an object spatio-temporally. Additionally, it can find the trajectory of movements of the object and predict its future position [8]. This method received special interest in construction management research for identifying human-object interaction for safety management purposes [42,107], and for tracking construction machinery [59] and workers [150] for productivity enhancement. The automatic tracking-by-detection methods have become very popular among researchers for solving this task, as they have already overcome many challenges like the 'Cold Start' of the conventional tracking methods [23]. 'Cold Start' refers to the phenomenon where the user needs to specify the object of interest to initialize the tracking process.


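The per-frame matching idea behind tracking-by-detection can be sketched as a greedy IoU association between existing tracks and new detections; this is a simplification in the spirit of SORT-style matching (which additionally uses a Kalman filter for motion prediction), and all names below are illustrative, not from the reviewed studies.

```python
def box_iou(a, b):
    # IoU of (x1, y1, x2, y2) boxes; 0 when the boxes are disjoint.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, iou_threshold=0.3):
    """Greedily match track boxes to detection boxes by descending IoU.

    `tracks` maps an integer track ID to its last known box; returns a
    (matches, unmatched-detection-indices) pair. Unmatched detections are
    the candidates for starting new tracks, so the 'cold start' happens
    automatically instead of requiring manual initialization.
    """
    pairs = sorted(
        ((box_iou(t_box, d_box), t_id, d_idx)
         for t_id, t_box in tracks.items()
         for d_idx, d_box in enumerate(detections)),
        reverse=True,
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for score, t_id, d_idx in pairs:
        if score < iou_threshold:
            break
        if t_id in used_tracks or d_idx in used_dets:
            continue
        matches[t_id] = d_idx
        used_tracks.add(t_id)
        used_dets.add(d_idx)
    unmatched = [i for i in range(len(detections)) if i not in used_dets]
    return matches, unmatched
```

Production trackers replace the greedy pass with optimal assignment (e.g., the Hungarian algorithm) and add appearance features, but the sketch shows why identity can persist across frames without user input.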
In the case of CNN-based tracking methods, an object detection algorithm detects the object features, and the tracking algorithm matches the features in different frames and assigns unique tracking ID numbers [81]. Table 8 shows a list of tracking models with their model bases used by construction researchers. Simple Online and Real-time Tracking (SORT) and its modified versions were frequently used in multi-object tracking applications for their lower computational cost and higher processing speed [17]. Some CNN-based tracking models, such as MD Net [150] for single-object tracking, and Deep SORT [107] and Tubelets with CNNs (T-CNN) [59] for multi-object tracking, were very successful in construction applications.

Table 8. List of object-tracking models used in construction applications.
Model name | Model basis | Target object & reference
SORT | Kalman-filter-based tracking method | Worker [133]; [16]; [151]; Equipment [16]
Modified SORT | Improved SORT for tackling the ID-switch issue | Equipment [17]
DeepSORT | CNN with Kalman-filter-based tracking | Worker [107]; Equipment [107]; [17]
SORT + KCF | High-speed tracking with kernelized correlation filters (KCF) [152] | Worker [23]
Construction Machine Tracker | IoU and image-hashing-based per-frame detection matching | Equipment [52]
T-CNN: Tubelets with Convolutional Neural Networks | Convolutional-network-based tracker (FCNT) for multi-object tracking [153] | Equipment [59]
MD Net [154] | CNN-based method for single-object tracking | Worker [150]
Tracking via Convolutional Networks Without Training (CNT) [155] | A CNN-based method that doesn't require training | Equipment [156]
Hierarchical convolutional features for tracking (CF2) [157] | CNN-based method | Equipment [156]
Social GAN with generator and discriminator network of LSTM [158] | Combined with LSTM for trajectory prediction | Worker [42]; [158]; Equipment [42]
Customized algorithm | RNN based on temporal attention pooling | Worker [35]
Customized algorithm | LSTM-based trajectory prediction [137] | Precast element
Customized algorithm | Kalman-filter-based tracking method | Worker [55]; [128]
Customized algorithm | Based on tracking-learning-detection (TLD) [159] | Equipment [156]; [82]; [39]

Xiao and Zhu [156] conducted a comparative study on 15 state-of-the-art 2D visual tracking methods in construction scenarios and presented some interesting outcomes. These methods were tested in different scenarios such as varying lighting conditions of day and night, occlusion, scale variations, and background clutter. The comparison results highlighted that the methods named adaptive structural local sparse appearance model (ASLA), tracking via sparse collaborative appearance model (SCM), tracking via multi-task sparse learning (MTT), the L1 tracker using the accelerated proximal gradient approach (L1APG), circulant structure of tracking-by-detection with kernels (CSK), and distribution fields for tracking (DFT) achieved good performance in both accuracy and robustness. The study also highlighted that the accuracy of the DL-based tracking methods, such as tracking via convolutional networks without training (CNT), hierarchical convolutional features for tracking (CF2), and tracking-learning-detection (TLD), was relatively low. However, these methods could track objects effectively even in severe occlusion conditions. A recent study by Xiao et al. [81] tracked construction machines at night time. All frames of the nighttime video were processed with GLADNet [160] to enhance the illumination. Later the machines were detected using a YOLOv3 detector and tracked using Kalman-filter-based tracking.

Although researchers have tried several visual tracking algorithms, their performance is heavily affected by false detection, severe occlusion, cold start, and identity switching. Integrated application of sensor-based technologies and vision-based tracking needs more attention from researchers to tackle these challenges.

The pose estimation of construction entities can provide valuable information about their work patterns, which can be beneficial for different CM applications. The CNN-based models play an important role in estimating the 2D or 3D joint positions of the object skeleton. These models have been heavily used in construction applications for workers' workload assessment [60] and ergonomic assessment [161], and for equipment pose estimation for avoiding collisions with nearby objects [33]. Table 9 shows the list of pose estimation models used in construction. It is observed that the Stacked Hourglass Network (SHG) [18] and Cascade Pyramid Networks (CPN) [16] were used most frequently. However, a more sophisticated CNN model, OpenPose [162], which is capable of multi-person pose estimation, was also tried by construction researchers recently [163].

Table 9. List of pose estimation models used in construction applications.
Model name | Model basis | Target object & references
Stacked Hourglass Network | A CNN-based method | Worker [18]; [37]; [165]; [60]; Equipment [33]; [166]; [167]
Cascade Pyramid Network | A CNN-based method | Equipment [16]; [33,164]; Worker [16]
Body 25 | A CNN-based method | Worker [168]
Convolutional Pose Machine | A CNN-based method | Worker [161]
OpenPose [162] | A CNN-based method | Worker [163]
Customized CNN | A CNN-based method | Worker [99]
Gated Recurrent Unit | An RNN-based method for pose forecasting | Equipment [164]

As the pose estimation models depend on keypoint detection, they also face problems similar to those of object detection models. Additionally, some postures cause severe self-occlusion and lead to false detection or missed detection of some keypoints. For equipment pose estimation, keypoints associated with the relatively static body parts are predicted less accurately [164]. These are some of the open challenges in pose estimation that need further attention.

Activity recognition/action recognition is a fundamental method in applications related to vision-based construction productivity monitoring research [54]. Project managers could benefit from the valuable information about resource usage at construction sites. The CNN-based models were found to be more robust in complex environments or group activity recognition than the earlier conventional models [169]. As temporal features are the key to understanding the state of any construction activity, CNN models combining spatial and temporal features were mainly used for activity recognition. Researchers have used many such CNN models in their applications. The Multi-Scale Temporal CNN (MS-TCN) was used by Torres Calderon et al. [54] for earthmoving operation productivity analysis, and the Temporal Segment Networks (TSN) [150] and i3D [18] models were used for worker activity recognition. A few other customized 3D CNN models were also used for these purposes [23,52]. Table 10 provides a list of activity recognition models that were used in construction applications.

In most cases, activity recognition of a single object in each frame was studied. However, real construction application scenarios may appear where multiple entities are involved in different activities. Also, very few studies have considered object-to-object interactions while analyzing the activities. Future research may investigate these limitations and provide solutions that are more real-life application-ready.

RNN is a type of neural network that works well with sequential data. It is widely used in machine learning for processing time-dependent data.


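The recurrent "memory" that makes RNNs suitable for sequences can be illustrated with a single-unit Elman-style step; the toy weights below are fixed illustrative values, not trained parameters from any reviewed study.

```python
import math

def rnn_step(x, h_prev, w_x=0.5, w_h=0.9, b=0.0):
    """One step of a scalar Elman-style RNN: h_t = tanh(w_x*x + w_h*h_prev + b).

    The recurrent weight w_h carries information from previously processed
    frames forward, which is the 'memory' property discussed in the text.
    """
    return math.tanh(w_x * x + w_h * h_prev + b)

def run_sequence(xs):
    """Fold a sequence of per-frame features through the recurrent step."""
    h = 0.0
    history = []
    for x in xs:
        h = rnn_step(x, h)
        history.append(h)
    return history

# After a burst of activity the hidden state stays elevated for a while,
# i.e., the network 'remembers' recent frames even when the input drops.
states = run_sequence([1.0, 1.0, 0.0, 0.0])
```

In a plain RNN this memory decays geometrically with the recurrent weight, which is precisely the limitation that the gated memory cell of an LSTM is designed to relax.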
For visual data analytics, the use of RNN on video sequences was found to be very useful for extracting temporal features. The main advantage of an RNN is that it can remember information from the previously processed video frames and provide the required prediction result for the next frames. This property was found by researchers to be very significant for object trajectory prediction, pose prediction, and activity recognition applications. Wei et al. [35] used an RNN-based temporal attention pooling method for re-identifying workers at construction sites for checking unsafe behaviors. Hegde et al. [90] analyzed the breakdown conditions in a rig connection process through a CNN–RNN combination. LSTM is a type of RNN with a more sophisticated architecture. LSTM units include a "memory cell" that can maintain information in memory for long periods. A set of gates is used to control when information enters the memory, when it is output, and when it is forgotten. This architecture lets them learn longer-term dependencies [82]. An LSTM-based object tracking method was applied by Zhang et al. [137] for the automatic alignment of a precast concrete element. Combined CNN-LSTM double-layered models were frequently applied for action recognition of construction entities for productivity monitoring [39], accident prevention [136], human-object interaction [173], and site management [174].

Table 10. List of activity recognition models used in construction applications.
Model name | Base model | Target object & references
Dense trajectory estimation | Hidden Markov models-based tracking method | Equipment [59]
MS-TCN (Multi-Scale Temporal Convolutional Neural Network) | A CNN | Equipment [54]
Temporal Segment Networks (TSN) | Based on spatial and temporal ConvNets [170] | Worker [150]
i3D | 2D ConvNet inflation [171] | Worker [18]
Customized CNN | 3D ResNet | Equipment [172]
Customized CNN | VGG-16 | Worker [169]
Customized CNN | 3D ResNeXt | Equipment [52]; Worker [23]
Customized CNN + LSTM | CNN-LSTM hybrid network | Worker [173], [136]; Equipment [82]; [39]; [17]

Such combined models, however, demand high processing capacity [59]. The processing speed of any selected model highly depends on processing resources such as central processing units (CPUs) and GPUs. A system equipped with an Intel i7 or above equivalent processor and the latest NVIDIA GPUs would be recommended for a satisfactory outcome.

3.1.5. Model training
Model training is the most crucial part of this entire workflow. In this stage, the selected models are trained with the training datasets. Based on the availability of labeled training data, the DL training strategies are chosen. These DL training strategies for visual data can be broadly classified into three categories: supervised learning, semi-supervised learning, and unsupervised learning. In the case of the availability of a large amount of labeled data, one can choose the supervised learning strategy. However, with very few labeled data or with no labeled data, semi-supervised learning or unsupervised learning strategies, respectively, may be used. With supervised learning, the model can predict only those classes for which it is trained. Several training methods were tried by researchers for supervised learning, and the most promising was found to be transfer learning [178]. To train DL models from scratch, one needs to feed in a huge amount of labeled input data. However, in practice, acquiring such huge labeled datasets for construction project-related applications is very time-consuming and cumbersome [50]. Researchers have found that, through transfer learning, models can be initially trained with large public datasets, such as ImageNet [56], PASCAL VOC [57], and COCO [58], to learn the low-level visual features [179]. The learned weights can then be transferred to the target models. This approach can significantly reduce the time and resources required for training [114]. Most of the recent construction-related research took advantage of this method [124,132]. The target models then need to be trained with new datasets specific to construction applications. During this process, the model is tuned to make it fit for the specific task, and the last layer of the model is replaced with the new class definitions as per the construction-specific application. The fine-tuning process generally modifies the hyperparameters associated with the neural network, such as the initial network weights, depth of the model, nonlinear activation functions, learning rates, regularization parameters, and epoch and batch sizes [180]. For transfer-learning cases, the target network's weights are generally initialized with the pre-trained model weights. The learning rate governs the learning speed of the model. Regularization methods such as dropout and max norm are deployed to avoid overfitting. The epoch number
structure. They also use a set of gates to control the flow of information, indicates the number of cycles that the model will traverse through the
but they do not use separate memory cells, and they use fewer gates. whole data. Additionally, the model performance can be optimized by
GRUs are well known for their low computation costs. Luo et al. [164] trying different optimizers such as Stochastic Gradient Descent, Adam,
used a GRU network for pose forecasting of excavators to avoid collision RMS Prop, etc., and by implementing different classifiers such as Ada­
accidents. The RNN models used for object tracking, pose estimation, boost, support vector machines (SVM), NavieBayesian, and SoftMax.
and activity recognition are shown in Tables 8 to 10, respectively. Changing these parameters significantly affects the training time and the
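The gate mechanism described above can be illustrated with a minimal, self-contained sketch of a single LSTM cell (plain Python; the scalar weights are hypothetical values chosen only to make the gating behavior visible, not taken from any of the cited construction models):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell_step(x, h_prev, c_prev, w):
    """One LSTM step for scalar input/state; w maps each gate
    name to its (input, recurrent, bias) weights."""
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])    # input gate
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])    # forget gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])    # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])  # candidate content
    c = f * c_prev + i * g  # memory cell: keep old content, admit new content
    h = o * math.tanh(c)    # exposed hidden state
    return h, c

def run(w, steps=20):
    """Start with memory c = 1 and feed 20 empty inputs."""
    h, c = 0.0, 1.0
    for _ in range(steps):
        h, c = lstm_cell_step(0.0, h, c, w)
    return c

# Illustrative weights: a large positive forget-gate bias keeps the
# memory over many steps; a large negative one erases it quickly.
weights_keep = {"i": (0.5, 0.1, 0.0), "f": (0.0, 0.0, 10.0),
                "o": (0.5, 0.1, 0.0), "g": (0.0, 0.0, 0.0)}
weights_forget = dict(weights_keep, f=(0.0, 0.0, -10.0))

print(round(run(weights_keep), 3))    # ~0.999: information retained
print(round(run(weights_forget), 3))  # ~0.0: information forgotten
```

The GRU simplifies this structure by merging gates and dropping the separate memory cell, which is why it is cheaper to compute.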
GAN was introduced by Goodfellow et al. [175]. It is mainly studied for unsupervised learning. This network consists of two competing neural networks, namely a generator and a discriminator. The first network (the generator) has the goal of generating a data sample indistinguishable from ground-truth data. The second network (the discriminator) is asked to decide whether a given sample is real ground-truth data or a generated one. Kim et al. [42] used Social GAN (S-GAN) [176] for predicting construction equipment's trajectory in a robotic construction environment to avoid collision accidents. Both the generator and discriminator networks of S-GAN consist of LSTM units for processing sequential data. In this case, the generator predicted the trajectories and the discriminator evaluated the prediction results. The application of GANs with CNN-based generators and discriminators was found in image augmentation [83] and image resolution enhancement [177] for improving object detection accuracy. In these cases, the GAN was used to reconstruct the missing pixels in the images.

These DL models are generally computationally heavy and require substantial data and computing resources for training. Depending on the availability of labeled training data, the DL training strategies are chosen. These DL training strategies for visual data can be broadly classified into three categories: supervised learning, semi-supervised learning, and unsupervised learning. In the case of the availability of a large amount of labeled data, one can choose the supervised learning strategy. However, with very few labeled data or with no labeled data, semi-supervised learning or unsupervised learning strategies, respectively, may be used. With supervised learning, the model can predict only those classes for which it is trained. Several training methods were tried by researchers for supervised learning, and the most promising was found to be transfer learning [178]. To train DL models from scratch, one needs to feed in a huge amount of labeled input data. However, in practice, acquiring such huge labeled datasets for construction project-related applications is very time-consuming and cumbersome [50]. Researchers have found that, through transfer learning, models can be initially trained with large public datasets, such as ImageNet [56], PASCAL VOC [57], and COCO [58], to learn the low-level visual features [179]. The learned weights can then be transferred to the target models. This approach can significantly reduce the time and resources required for training [114]. Most of the recent construction-related research took advantage of this method [124,132]. The target models then need to be trained with new datasets specific to construction applications. During this process, the model is tuned to make it fit for the specific task, and the last layer of the model is replaced with the new class definitions as per the construction-specific application. The fine-tuning process generally modifies the hyperparameters associated with the neural network, such as the initial network weights, depth of the model, nonlinear activation functions, learning rates, regularization parameters, epochs, and batch sizes [180]. For transfer-learning cases, the target network's weights are generally initialized with the pre-trained model weights. The learning rate governs the learning speed of the model. Regularization methods such as dropout and max-norm are deployed to avoid overfitting. The epoch number indicates the number of cycles that the model will traverse through the whole dataset. Additionally, the model performance can be optimized by trying different optimizers such as Stochastic Gradient Descent, Adam, and RMSProp, and by implementing different classifiers such as AdaBoost, support vector machines (SVM), Naive Bayes, and SoftMax. Changing these parameters significantly affects the training time and the GPU memory usage. The ultimate objective of the training process is to minimize a loss function such as cross-entropy loss, hinge loss, Kullback-Leibler divergence loss, or any other suitable one according to the problem. For further details about neural network training processes, one may refer to the book written by Goodfellow et al. [180]. The availability of recent DL software such as PyTorch [181], TensorFlow [182], Keras [183], Caffe [184], and Theano [185] has eased its applications in various domains in addition to computer science, especially in construction [61,124,164,173].

The collection and annotation of a large amount of data from construction sites are always time-consuming and labor-intensive. Also, construction projects in different phases require different resources. Training DL models for different phases with huge image databases is difficult and costly [68]. To overcome these challenges, researchers are constantly looking for alternatives where a DL model can be trained with a small amount of quality data to achieve equivalent or better performance than a model trained with a huge quantity of data. Semi-supervised learning strategies are found effective for training models

12
A. Pal and S.-H. Hsieh Automation in Construction 131 (2021) 103892

with less amount of data. Two promising learning methods, namely the deep active learning approach and the few-shot learning approach, are mainly found in recent studies. Kim et al. [68] used deep active learning for vision-based monitoring of construction sites. The deep active learning approach took unlabeled data as input and calculated the uncertainty of the data. The top 10% of data with the highest uncertainty were labeled by a human annotator in each stage. An object detection model using Faster R-CNN was trained with the labeled images. The result showed that training with as few as 180 images could achieve 80% mean Average Precision (mAP). A similar semi-supervised learning approach was observed for defect detection [186,187] and in urban scene segmentation [145]. In the case of the few-shot learning approach adopted by Kim and Chi [188], the model learned meta-knowledge from the annotated images of base classes and used that information to learn novel classes. The few-shot learning model consisted of three modules: meta-knowledge extraction, class attention, and prediction. In the meta-knowledge extraction module, the model learned general visual features that represent the characteristics of various construction objects. The class attention module helped in learning class-related features from the input images. Finally, the prediction module classified the object types and localized their bounding boxes. To generate few-shot scenarios for training, the base classes were randomly split into k-ways with n-shots. With only 20 training images of an unseen construction object, the few-shot object detection model could perform with 73.1% mAP.

In the case of the nonavailability of labeled data, unsupervised learning approaches learn patterns from the unlabeled data and cluster them together. k-means clustering is a prominent method in unsupervised learning. It clusters the instances based on their Euclidean distances. Czerniawski et al. [189] used Density-Based Spatial Clustering of Applications with Noise (DBSCAN) and k-means clustering to detect and classify planar objects from as-built building point-cloud data. The technique has potential application in facility management and construction progress monitoring.
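The clustering idea can be illustrated with a toy k-means sketch (plain Python; the 2D points and initial centers are hypothetical stand-ins for planar point-cloud features, not the actual DBSCAN/k-means pipeline of [189]):

```python
import math

def kmeans(points, centers, iters=10):
    """Plain k-means: assign each point to its nearest center
    (Euclidean distance), then move centers to cluster means."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            d = [math.dist(p, c) for c in centers]
            clusters[d.index(min(d))].append(p)
        # Recompute each center as the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

# Two hypothetical planar groups of points (e.g., samples from two walls)
wall_a = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2)]
wall_b = [(5.0, 5.1), (5.2, 5.0), (5.1, 5.2)]
centers, clusters = kmeans(wall_a + wall_b, centers=[(0.0, 0.0), (1.0, 1.0)])
print(len(clusters[0]), len(clusters[1]))  # 3 3
```

DBSCAN, by contrast, grows clusters from density-connected neighborhoods, so it needs no preset cluster count and can mark sparse points as noise.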
So far, most of the DL applications in construction management have leveraged supervised learning, where transfer learning is the most adopted approach. The high average citation score of the keyword 'transfer learning' also highlights that. Although there exist difficulties in data collection and annotation of construction images, very few studies have explored semi-supervised and unsupervised learning approaches. However, to overcome the data collection and annotation difficulties, future research in the AEC industry may need to focus more on exploring these methods. To reduce the annotation load further, the active learning approach linked with other labeling efforts, such as crowdsourcing or synthetic datasets, can be explored in the future.
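The uncertainty-based selection step behind such deep active learning can be sketched as follows (plain Python; the image names and softmax scores are made up, and a real pipeline would obtain the scores from the trained detector):

```python
import math

def entropy(probs):
    """Shannon entropy of a predicted class distribution: higher
    values mean the model is less certain about the image."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_for_labeling(predictions, fraction=0.10):
    """Rank unlabeled images by prediction entropy and return the
    most uncertain fraction for human annotation."""
    ranked = sorted(predictions, key=lambda kv: entropy(kv[1]), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return [name for name, _ in ranked[:k]]

# Hypothetical softmax outputs for three unlabeled site images
unlabeled = [
    ("img_001.jpg", [0.98, 0.01, 0.01]),  # confident prediction
    ("img_002.jpg", [0.34, 0.33, 0.33]),  # very uncertain
    ("img_003.jpg", [0.70, 0.20, 0.10]),
]
print(select_for_labeling(unlabeled, fraction=0.34))  # ['img_002.jpg']
```

The selected images would then be annotated and used to retrain the detector in the next stage, as in the procedure of Kim et al. [68].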
formance, often the solution provided against a construction problem
3.1.6. Validation and testing

During training, the model's performance is checked through a k-fold cross-validation strategy [17] before deploying for testing. In this case, the entire training dataset is randomly divided into k subsets, where (k − 1) sets are used for training and one set is used for validation. The validation-set results are useful for updating higher-level hyperparameters during the training. The performance of the model is calculated as the average of the performance achieved at the end of the entire training with a specified number of epochs. Once satisfactory performance is obtained through validation, the model is applied to completely unseen test data for prediction. Finally, the prediction results of different CV tasks are evaluated through various metrics. Some of the most common metrics [174], with their representations, are shown in Table 11. These evaluation methods were frequently used by construction researchers.

Table 11
Common evaluation metrics with their representations.

Confusion matrix | A matrix representation of the predicted results
Accuracy | (TP + TN) / (TP + FP + TN + FN)
Precision | TP / (TP + FP)
Recall / Sensitivity / True Positive Rate (TPR) | TP / (TP + FN)
Specificity | TN / (TN + FP)
False Positive Rate (FPR) = 1 − Specificity | FP / (TN + FP)
F1 score | 2 * Precision * Recall / (Precision + Recall); it highlights the balance between precision and recall and is a suitable measure for models tested with imbalanced datasets
Precision-Recall (PR) curve | A trade-off curve between precision and recall; a higher area under the curve represents better performance
Receiver operating characteristic (ROC) curve | A plot of TPR vs. FPR; it illustrates the diagnostic ability of a classifier system

Here, TP stands for true positives, i.e., predicted positive results that are actually positive; FP is false positives, i.e., predicted positive results that are actually negative; TN is true negatives, i.e., predicted negative results that are actually negative; and FN is false negatives, i.e., predicted negative results that are actually positive.

Apart from these common evaluation metrics, different CV tasks' performances are evaluated through different sets of metrics. Multiple-object detection models are evaluated through the average precision (AP) of each class and the mean average precision (mAP) [17,124]. Some additional evaluation metrics that confirm the performance of semantic segmentation models are pixel accuracy, mean pixel accuracy, and mean intersection over union (mIoU) [22]. Pose estimates are checked for normalized error (NE) and percentage of correct key points (PCK) [33]. Tracking models' performances are confirmed through ID switches (IDSW), multiple-object tracking accuracy (MOTA), multiple-object tracking precision (MOTP), the number of ground-truth trajectories (GT), the number of most-tracked trajectories (MT), tracking speed in frames per second (FPS) [52], average overlap score (AOS), center location error ratio (CER), and tracking length (TL) [55]. Along with the standard evaluation metrics, row-normalized confusion matrices are used for checking activity recognition models [18].

Although these metrics are useful for evaluating a model's performance, often the solution provided against a construction problem cannot be evaluated due to the lack of proper evaluation metrics, e.g., Luo et al. [23]. Another line of future research may consider the formulation of evaluation metrics specific to construction applications.
3.1.7. Deployment

For the successful implementation of DL-based visual data analytics for efficient construction management, selecting a proper deployment strategy is necessary. With the advent of modern computing systems, several options for deploying DL models in construction projects are available today. Some of the most adopted options are deployment on GPU-enabled desktop or laptop computers, on smartphones, on edge computing devices, and on cloud servers. These options can be broadly classified into two paradigms: cloud computing and edge computing. In the case of cloud computing, data are collected from the edge of the network (construction sites) through various sensing devices and sent to a cloud server for storing, processing, inferring, and decision making. On the other hand, edge computing carries out the storing, processing, inferring, and decision-making tasks on the edge of the network itself. The Internet is used as the medium of communication in these processes. Both strategies come with their respective pros and cons. Latency, data privacy, network connectivity, and cost are some of the important


factors that influence the choice of deployment strategy. The deployment of DL through cloud computing is relatively fast and easy to set up, but latency, data privacy, and high cost are often considered major limitations. On the contrary, edge computing takes advantage of operating at the edge, where the delay and cost of to-and-fro data transfer can be reduced significantly and more secure data control policies can be adopted by the local management authorities [190]. Several devices can be used as an inference system on the edge, such as desktop computers, smartphones, and efficient edge computing devices. Often the use of desktop computers requires connectivity through optical fiber cables, and the installation and setup of such a network may involve high cost. Smartphones can also be used as edge computing devices for their high inference speed. Maeda et al. [191] used smartphones for detecting and classifying road damages. Currently, there are different edge computing embedded devices that range from simple computing devices to highly efficient GPU-enabled devices for running DL models. After training the DL model on a computer, it is converted to a lighter model for loading into mobile or embedded devices. Further, the model is optimized by quantization for efficiency. TensorFlow Lite and PyTorch Mobile are some of the software packages that help in this conversion process. Arabi et al. [51] compared the performance and cost of three such embedded devices, namely NVIDIA Jetson TX2, NVIDIA Jetson Nano, and Raspberry Pi 3B+ with Intel NCS, for object detection at construction sites. While the first two are NVIDIA GPU-enabled systems, Raspberry Pi (R. Pi) is a high-performing edge computer for basic computing tasks. Intel Neural Computing Stick (NCS), a DL inference accelerator, was attached to the R. Pi to support it with DL inference computation. Jetson Nano and R. Pi with NCS showed substantial normalized benefits. The use of Jetson TX2 and Jetson Nano was recommended for more real-time applications, such as construction safety, for their high inference speed, and R. Pi with NCS was recommended for semi-real-time or batch inference applications such as construction productivity and progress monitoring. Wang et al. [192] deployed a semantic segmentation model through a mobile robot equipped with an RGB-D camera and an R. Pi computer for visual understanding of construction sites. For deploying DL models through cloud computing, leading cloud providers offer dedicated services such as Amazon's AWS SageMaker [193], Microsoft's Azure Machine Learning [194], and Google Cloud's Vertex AI [195]. However, they come with certain expenses. For example, Arabi et al. [190] reported that the cost of 20 hours of computing time and 890,000 batch predictions on Amazon Machine Learning was more than $90 USD. In addition, Karaaslan et al. [186] deployed a DL-based semantic segmentation model on a wearable holographic headset and a handheld mixed reality (MR) device for infrastructure defect inspection. The inference was supported by cloud computing.

Although the deployment of the DL model is very important for visual data analytics on construction sites, so far very few studies by AEC researchers have focused on it. The maturity of the research on the model development side and the advancement of recent computing systems have created a huge opportunity for future research in this direction. In the current scenario, for construction applications, model

Fig. 7. DL-based visual data analytics applications in construction management.


deployment through embedded systems offers more benefits than cloud-based deployment in terms of cost, latency, and data privacy.

4. Evolution of construction-management-specific applications

DL-based visual data analytics have been widely used in construction management applications in the last five years. This study identified six major fields and 52 subfields of applications through the in-depth review of the selected papers. The major application fields include safety management (44%), productivity management (24%), facilities management (19%), progress monitoring (5%), quality management (3%), construction waste management (2%), and other applications (3%). The percentages in parentheses for each application indicate their proportions of the entire collection of papers. Fig. 7 shows a tree-diagram visualization of those applications. As previous reviews have already discussed various CV applications [24,196] and DL applications in construction [26,27], this study mainly focuses on highlighting the evolution of DL-based visual data analytics research related to construction management. To understand the evolution in terms of complexity, the selected papers were classified based on the CV tasks performed in each construction application.

In Fig. 8, the proportions of publications representing each CV task involved in a construction application are plotted against the year of publication. It is clearly observed that early applications until 2017 attempted to solve simpler problems dealing with a single CV task such as image classification [86,88] or object detection [139]. However, in later applications starting from 2018, researchers focused on solving more complex problems integrating two or more CV tasks simultaneously [107,109]. Recent research papers have even integrated three or more CV tasks to generate more semantic insights from the visual data [18,82]. Furthermore, they have analyzed them for extracting relevant information to improve construction management practices [17,50,174].

As object detection or semantic/instance segmentation works as the basis for initiating other CV tasks, a considerable amount of research effort has been spent on improving these methods. This is evident from the higher proportion of such research, even recently. The exponential growth trend of object-detection-related publications can be seen in Fig. 9. Once applications involving a CV task attained sufficient maturity, they evolved to integrate other CV tasks for solving more complex challenges. From Fig. 9, it can be observed that, after attaining certain maturity in object detection research, researchers focused on integrating object tracking and activity recognition with object detection. These individual research areas have evolved at their own pace. However, again with a certain maturity, these areas started mingling with each other and evolved to more complex scenarios where object detection, tracking, pose estimation, and activity recognition tasks are integrated. Some recent applications where these integrations can be observed are discussed next.

Fig. 8. Evolution of the complexity of visual data analytics in construction applications. (IC – Image Classification; OD – Object Detection; AR – Activity Recognition; PE – Pose Estimation; SS/IS – Semantic/Instance Segmentation; OCR – Optical Character Recognition; OT – Object Tracking)

Fig. 9. Importance of maturity for complexity evolution.

As safety management is a crucial requirement for smooth management of construction projects, it was studied most often by researchers. Fang et al. [16] proposed a method for localizing construction-related entities such as equipment, workers, and materials from monocular vision data by combining it with prior knowledge and semantic information. The study leveraged semantic segmentation, tracking, and pose estimation models and proposed a possible application in the development of a real-time early-warning safety system that could reduce the chances of accidents. As earthmoving operations are a basic necessity for any construction project, improving the productivity of such operations will greatly impact the project schedule and cost. In recent years, researchers have proposed several methods for improving this by solving three CV tasks simultaneously: object detection, tracking, and activity recognition. Roberts and Golparvar-Fard [59] and Xiao and Kang [50] proposed end-to-end methods for detection, tracking, and activity recognition of excavators and dump trucks. However, the interaction between machines was not considered in their studies. Later, Kim and Chi [82] used a CNN+LSTM-based sequential pattern analysis technique for improving an excavator's performance and operation cycle time. The proposed method contributes to the automated action recognition and operation analysis of excavators. The method was improved by Kim and Chi [39]. In that study, a multi-camera-based approach was adopted to monitor the interactions between excavators and dump trucks during the excavation process. Lin et al. [17] analyzed the operation cycle and the interaction between machines to highlight irregular activities using a line chart. Project managers can inspect those irregular activities to take corrective actions for improving the operation process. Whether for monitoring construction workers' productivity or for ergonomic assessment, information about a worker's pose and the activities being performed is necessary. Roberts et al. [18] applied object detection, pose estimation, and activity recognition methods on RGB video footage containing construction workers for analyzing brick work and plastering activity. Workspace planning is useful for effective site management. Luo et al. [6] detected and tracked workers and identified their activities to align them in the dynamic construction workspace. Furthermore, the workspace was visualized in accordance with the site layout plan. This arrangement helped project managers to focus on active site locations to enhance performance and ensure safety. Understanding the semantic relationships between various construction entities from visual data and applying them for job-site management purposes has recently gained the attention of researchers. Liu et al. [174] tried to describe the construction activities seen in visual data through image captioning. Similarly, Zhang et al. [70] recognized high-risk conditions in building construction through image semantics by estimating the relationships between various entities present at the site. Tang et al. [73] detected human-object interaction for job-site safety inspections. It is worth mentioning here that there are some recent applications of vision-assisted robotic operations for construction productivity enhancement


[123,197], and waste management [110].

5. Future research directions

DL-based visual data analytics have evolved with time, and in recent years they have gained sufficient maturity to tackle more complex engineering problems. The successful applications of DL to 2D visual data in the context of construction engineering and the maturity of CV research in other disciplines provide a strong foundation for future researchers to explore 3D visual data for smart construction management. Keeping that in mind, this section highlights three prospective research areas where DL-based visual data analytics can be applied to 3D scene data. The proposed applications, supported with knowledge from related disciplines, will be of interest to construction researchers for improving future construction management practices.

5.1. Detection and localization of objects in a 3D scene

Because of the dynamic and complex nature of the construction industry, it is often difficult for construction workers to identify the potential safety risks at construction sites, which eventually lead to accidents. As discussed in the previous sections, researchers have tried several methods using 2D images or video frames to identify spatial relationships between workers, machines, and objects and to estimate proximity, crowdedness, or interactions between them in order to avoid potential accidents. However, often those predictions were found to be inaccurate [109].

To enhance accuracy, the detection and localization of objects (workers, machines, and other hazardous objects) in a 3D scene and the estimation of the spatial relationships between them can be a potential solution. Jeelani et al. [43] adopted an approach similar to the simultaneous localization and mapping (SLAM) approach [198] for locating workers in a 3D scene and tried to establish a spatial relationship between the worker and a hazardous area. They avoided the risk of drift in the SLAM approach by constructing a global map with previously captured visual data and identified the hazardous condition using semantic segmentation of images. However, the research was unable to address dynamic hazard conditions. Multiple-object detection and behavior recognition from 3D motion data may be a possible solution to this issue. A combination of the research of Kim et al. [199] and the latest DL techniques needs to be explored further for several construction applications related to object detection and behavior assessment in 3D scenes. Qi et al. [200] proposed a method named 'ImVoteNet' for 3D object detection that applied DL on point clouds combined with 2D images. Sarlin et al. [201] proposed a method for localization of a six-degree-of-freedom (DoF) camera in a 3D scene using a monolithic CNN hierarchical feature network (HF-Net). Future studies may consider fusing the research of Jeelani et al. [43], Qi et al. [200], and Sarlin et al. [201] for localizing workers and objects in a complex construction site environment.

5.2. Instance segmentation of point-cloud data and geometric-shape modeling

Another line of research that is gaining popularity is the segmentation of point-cloud data and automatic geometric-shape modeling. This has potential for various construction management applications such as construction progress monitoring, quality management, and facility management [202]. It is observed that the segmentation of point-cloud data is a foundation-level task in the effort of digital twinning of already built entities [203]. It is also found to be a necessary step in automatic 'scan to BIM' processes [204]. Techniques of 3D point-cloud segmentation can be broadly classified into semantic segmentation and instance segmentation, where the latter is much more challenging than the former. In the case of instance segmentation, points are not only divided into different clusters of semantic meaning but also need to be aligned with various instances with the same semantic meaning [205]. Some recent research by Perez-Perez et al. [206] and Wang et al. [204] for automatic conversion of MEP components to equivalent BIM models shows really promising results. However, the proposed methods are yet to include DL in their workflows. Agapaki and Brilakis [203] proposed a method for instance segmentation of point-cloud data based on their previously developed class segmentation network CLOI-Net [207], which consisted of an optimized PointNet++ [208]. However, near-miss instances were still a limitation in that study. Agapaki and Brilakis [203] highlighted the requirement for the improvement of class segmentation algorithms for better instance segmentation. DL-based 3D semantic segmentation methods proposed by the CV community [205,209] can be explored further to improve the instance segmentation methods required for construction applications.

Research initiated by Czerniawski et al. [210] for BIM-assisted automatic filling of incomplete point clouds and building change detection possesses huge potential for application in building maintenance and facility management. A hierarchical deep variational autoencoder was used for filling the missing data to complete the point cloud. However, the study suggested that the combination of point-cloud completion and building change detection into one algorithm with the concept of triplet learning [211] can be investigated further.

5.3. Integrated application of BIM, point cloud, and sensors for work progress measurement

Quantified measurement of work progress is the key requirement for productivity assessment of construction resources. Volume-based or area-based measurement of work progress through visual data can help project managers in productivity control of construction activities. Thus far, vision-based progress-monitoring methods are capable of detecting the existence of building elements in a 3D scene by leveraging 4D BIM and point clouds. However, to date, there is no method that can measure the progress of building elements in terms of volume or area from visual data. Future researchers can explore techniques similar to those proposed by Zhang et al. [212] and Fallqvist [213] on top of existing progress monitoring methods for measuring progress quantities of building elements. Furthermore, DL-based point-cloud segmentation and voxelization of point clouds in connection with BIM models [214] can be investigated further for measurement of construction activities associated with BIM models.

A few activities are generally not modeled in BIM. One such activity is earthworks. For earthworks activities, a few researchers have already started using DL-based methods on visual data for volume measurement [215,216], which can be useful for improving the efficiency of earthmoving operations. However, vision-based methods have limitations due to occlusion. To overcome this, integration with other sensor-based methods could be a solution. The research by Rasul et al. [217] integrated point clouds, image data, sensors, and CAD models for excavation volume measurement and progress monitoring to ease automated excavation processes at job sites. However, Rasul et al. encouraged further research in this direction to improve the accuracy of the measured quantities. Progress measurement of those construction activities that are usually neither modeled in BIM nor can be measured directly from point clouds (e.g., plastering, painting, etc.) is still an open question for researchers to solve.

6. Conclusion

With the rise of artificial intelligence (AI), the world is becoming smarter, as are construction projects. The recent availability of inexpensive options for visual data collection, storage, and processing has encouraged architecture, engineering, and construction professionals to leverage visual data for smart construction management. Additionally, deep learning (DL), a subfield of AI, has demonstrated promising performance for extracting valuable information from


visual data that can improve existing construction management practices by addressing age-old construction industry problems such as occupational safety, poor productivity, improper product quality, and so on. Therefore, this study conducted an in-depth review of 142 journal articles and conference papers retrieved from two major online databases, the Web of Science core collection and Scopus, to indicate the current trend of DL-based visual data analytics research in smart construction management and to highlight open challenges and future research directions. The systematic review indicated an average year-on-year publication growth of approximately 50% since 2018 in this research domain. The keyword co-occurrence analysis stressed the importance of DL for visual data analytics by revealing a strong relationship between the two most frequently used keywords in this area, "computer vision" and "deep learning". The highest average citation scores of keywords such as "transfer learning", "object detection", and "safety" indicated their importance among researchers of DL-based visual data analytics.

The contribution of this paper is fourfold. First, to the best of our knowledge, this is the only paper so far that has provided a generalized workflow for DL-based visual data analytics for smart construction management, detailing every step of the process. The workflow specified knowledge management through four steps: domain knowledge mapping, knowledge extraction, knowledge fusion, and knowledge synthesis. It also indicated the importance of integrating construction-domain knowledge within the workflow for achieving the desired outcome. Furthermore, knowledge extraction from visual data through DL-based methods was elaborated; this step consists of seven major stages: data collection, data preparation, data pre-processing, model selection, model training, validation and testing, and deployment. This workflow can be easily adopted by construction projects that are on the verge of applying DL-based visual data analytics methods in practice.

Second, this review investigated the research gaps and open challenges in every stage of the knowledge extraction workflow and suggested future research possibilities, some of which are summarized here. For visual data collection from construction sites, the effective placement of fixed cameras and the selection of waypoint locations for mobile cameras are still challenging. Datasets required for solving construction-specific problems are still limited, and research efforts on minimizing the manual annotation effort are insufficient. In general, object detection is required to initiate other CV tasks; however, detection performance is still affected by many factors such as occlusion, low-light conditions, image brightness, and the shape, color, scale, and orientation of objects. Tracking models are yet to completely solve the problems of cold start and identity switching. Self-occlusion in various postures still limits the accurate detection of keypoints for pose-estimation models. Very few studies have investigated multi-object activity analysis or object-to-object interaction for effective activity analysis. So far, DL-based visual data analytics research in the AEC industry relies heavily on supervised learning; semi-supervised learning approaches, such as active learning and few-shot learning, need extensive research to reduce the data dependency of model training. Evaluation of DL-based visual data analytics specific to construction scenarios is sometimes hindered by the lack of proper metrics, and research formulating such metrics is still limited. Lastly, there exist very few examples of DL model deployment on construction sites; real case studies demonstrating DL application in construction from data collection to deployment would benefit the AEC community significantly.

Third, the study identified six major application fields (construction safety, productivity, quality management, progress monitoring, facilities management, and construction waste management), along with other applications, and 52 subfields within the construction management domain where DL-based visual data analytics have been applied so far. It also highlighted the evolution of DL-based visual data analytics methods over the years for solving complex problems in construction management. It was observed that the overall research field automatically evolved to the next stage after attaining a certain level of maturity in previous stages of research. Recent applications in which more complex construction management problems were solved by integrating three or more CV tasks simultaneously were also discussed in detail.

Fourth, this exploration presented three promising future research directions that leverage the relatively less explored 3D visual data: detection and localization of objects in 3D scenes, instance segmentation of point-cloud data for geometric-shape modeling, and integrated application of BIM, point clouds, and sensors for measurement of work progress. These research areas can be considered the inevitable evolution of the DL-based 2D visual data analytics research that created a strong foundation for DL research within the construction management research community.

Despite the significant contributions of this study, a few limitations exist. Although this research briefly introduced the four stages of the knowledge management system within the proposed workflow, only the knowledge extraction stage was discussed in detail to maintain thematic relevance; details of the other three stages will be part of future studies. This study reviewed papers published until February 2021, but because of the very fast growth of this research domain, a significant number of research publications may have appeared between the cutoff date and the publication date of this paper. Also, this study only includes papers written in English because of the authors' limited linguistic competence in other languages.

With the recent rapid growth of visual data at construction sites and the advancement of DL-based analytics, construction management is reaching new heights. Understanding this technology will not be optional but mandatory for construction professionals in the near future. We believe that this review will benefit new researchers and construction practitioners in understanding the overall process of DL-based visual data analytics and inspire future researchers to try advanced methods for solving the open challenges in this field.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This research was supported jointly by two projects (Grant numbers: MOST 109-2621-M-002-012 and MOST 109-2221-E-002-054-MY3) funded by the Ministry of Science and Technology, Taiwan. Also, this article was subsidized for English editing by National Taiwan University under the Excellence Improvement Program for Doctoral Students (Grant number: MOST 108-2926-I-002-002-MY4).

References

[1] CMAA, What is Construction Management? Construction Management Association of America, 2020. https://www.cmaanet.org/about-us/what-construction-management (Apr. 3, 2021).
[2] McKinsey Global Institute, Reinventing Construction: A Route to Higher Productivity, McKinsey & Company, 2017. https://www.mckinsey.com/business-functions/operations/our-insights/reinventing-construction-through-a-productivity-revolution (Jun. 25, 2021).
[3] USDL, National Census of Fatal Occupational Injuries in 2019, Bureau of Labor Statistics, 2020. https://www.bls.gov/news.release/pdf/cfoi.pdf (Mar. 22, 2021).
[4] K.M. Rashid, J. Louis, Activity identification in modular construction using audio signals and machine learning, Autom. Constr. 119 (2020) 103361, https://doi.org/10.1016/j.autcon.2020.103361.
[5] H. Kim, Y. Ham, W. Kim, S. Park, H. Kim, Vision-based nonintrusive context documentation for earthmoving productivity simulation, Autom. Constr. 102 (2019) 135–147, https://doi.org/10.1016/j.autcon.2019.02.006.


[6] X. Luo, H. Li, H. Wang, Z. Wu, F. Dai, D. Cao, Vision-based detection and visualization of dynamic workspaces, Autom. Constr. 104 (2019) 1–13, https://doi.org/10.1016/j.autcon.2019.04.001.
[7] J. Yang, M.W. Park, P.A. Vela, M. Golparvar-Fard, Construction performance monitoring via still images, time-lapse photos, and video streams: Now, tomorrow, and the future, Adv. Eng. Inform. 29 (2) (2015) 211–224, https://doi.org/10.1016/j.aei.2015.01.011.
[8] A. Pal, S.-H. Hsieh, Vision based construction site monitoring: a review from construction management point of view, in: Enabling the Development and Implementation of Digital Twins: Proceedings of the 20th International Conference on Construction Applications of Virtual Reality, Teesside University, 30 Sep. - 2 Oct. 2020, Middlesbrough, UK, 2020, pp. 44–55, ISBN 9780992716127.
[9] L. Deng, D. Yu, Deep learning: methods and applications, Found. Trends Sign. Proc. 7 (3–4) (2014) 197–387, https://doi.org/10.1561/2000000039.
[10] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444, https://doi.org/10.1038/nature14539.
[11] N. O'Mahony, S. Campbell, A. Carvalho, S. Harapanahalli, G.V. Hernandez, L. Krpalkova, D. Riordan, J. Walsh, Deep learning vs. traditional computer vision, in: Proceedings of the 2019 Computer Vision Conference (CVC), 2-3 May 2019, Las Vegas, USA, 2020, pp. 128–144, https://doi.org/10.1007/978-3-030-17795-9_10.
[12] M.W. Park, I. Brilakis, Construction worker detection in video frames for initializing vision trackers, Autom. Constr. 28 (2012) 15–25, https://doi.org/10.1016/j.autcon.2012.06.001.
[13] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: Proceedings of the 25th International Conference on Neural Information Processing Systems, 3-6 December 2012, Lake Tahoe, Nevada, USA, 2012, pp. 1097–1105, https://doi.org/10.1145/3065386.
[14] X. Lv, C. Dai, L. Chen, Y. Lang, R. Tang, Q. Huang, J. He, A robust real-time detecting and tracking framework for multiple kinds of unmarked object, Sensors (Switzerland) 20 (1) (2020), https://doi.org/10.3390/s20010002.
[15] M.-A. Zamora-Hernández, J.A. Castro-Vargas, J. Azorin-Lopez, J. Garcia-Rodriguez, Deep learning-based visual control assistant for assembly in Industry 4.0, Comput. Ind. 131 (2021) 103485, https://doi.org/10.1016/j.compind.2021.103485.
[16] Q. Fang, H. Li, X. Luo, C. Li, W. An, A sematic and prior-knowledge-aided monocular localization method for construction-related entities, Comp. Aid. Civil Infrastr. Eng. 35 (9) (2020) 979–996, https://doi.org/10.1111/mice.12541.
[17] Z. Lin, A.Y. Chen, S.-H. Hsieh, Temporal image analytics for abnormal construction activity identification, Autom. Constr. 124 (2021) 103572, https://doi.org/10.1016/j.autcon.2021.103572.
[18] D. Roberts, W. Torres Calderon, S. Tang, M. Golparvar-Fard, Vision-based construction worker activity analysis informed by body posture, J. Comput. Civ. Eng. 34 (4) (2020) 1–17, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000898.
[19] A. Braun, S. Tuttas, A. Borrmann, U. Stilla, Improving progress monitoring by fusing point clouds, semantic data and computer vision, Autom. Constr. 116 (2020) 103210, https://doi.org/10.1016/j.autcon.2020.103210.
[20] L. Liu, R.J. Yan, V. Maruvanchery, E. Kayacan, I.M. Chen, L.K. Tiong, Transfer learning on convolutional activation feature as applied to a building quality assessment robot, Int. J. Adv. Robot. Syst. 14 (3) (2017) 1–12, https://doi.org/10.1177/1729881417712620.
[21] Z. Wang, H. Li, X. Yang, Vision-based robotic system for on-site construction and demolition waste sorting and recycling, J. Build. Eng. 32 (2020) 101769, https://doi.org/10.1016/j.jobe.2020.101769.
[22] Z. Dong, J. Wang, B. Cui, D. Wang, X. Wang, Patch-based weakly supervised semantic segmentation network for crack detection, Constr. Build. Mater. 258 (2020) 120291, https://doi.org/10.1016/j.conbuildmat.2020.120291.
[23] X. Luo, H. Li, H. Wang, Z. Wu, F. Dai, D. Cao, Vision-based detection and visualization of dynamic workspaces, Autom. Constr. 104 (2019) 1–13, https://doi.org/10.1016/j.autcon.2019.04.001.
[24] J. Kim, Visual analytics for operation-level construction monitoring and documentation: state-of-the-art technologies, research challenges, and future directions, Front. Built Environ. 6 (2020) 1–20, https://doi.org/10.3389/fbuil.2020.575738.
[25] B. Sherafat, C.R. Ahn, R. Akhavian, A.H. Behzadan, M. Golparvar-Fard, H. Kim, Y.C. Lee, A. Rashidi, E.R. Azar, Automated methods for activity recognition of construction workers and equipment: state-of-the-art review, J. Constr. Eng. Manag. 146 (6) (2020) 1–19, https://doi.org/10.1061/(ASCE)CO.1943-7862.0001843.
[26] T.D. Akinosho, L.O. Oyedele, M. Bilal, A.O. Ajayi, M.D. Delgado, O.O. Akinade, A.A. Ahmed, Deep learning in the construction industry: a review of present status and future innovations, J. Build. Eng. 32 (2020) 101827, https://doi.org/10.1016/j.jobe.2020.101827.
[27] L. Hou, H. Chen, G.K. Zhang, X. Wang, Deep learning-based applications for safety management in the AEC industry: a review, Appl. Sci. (Switzerland) 11 (2) (2021) 1–18, https://doi.org/10.3390/app11020821.
[28] M. Arashpour, T. Ngo, H. Li, Scene understanding in construction and buildings using image processing methods: a comprehensive review and a case study, J. Build. Eng. 33 (2021) 101672, https://doi.org/10.1016/j.jobe.2020.101672.
[29] B. Zhong, H. Wu, L. Ding, P.E.D. Love, H. Li, H. Luo, L. Jiao, Mapping computer vision research in construction: Developments, knowledge gaps and implications for research, Autom. Constr. 107 (2019) 102919, https://doi.org/10.1016/j.autcon.2019.102919.
[30] M.E. Rose, J.R. Kitchin, pybliometrics: Scriptable bibliometrics using a Python interface to Scopus, SoftwareX 10 (2019) 100263, https://doi.org/10.1016/j.softx.2019.100263.
[31] W. Fang, L. Ma, P.E.D. Love, H. Luo, L. Ding, A. Zhou, Knowledge graph for identifying hazards on construction sites: integrating computer vision with ontology, Autom. Constr. 119 (2020) 103310, https://doi.org/10.1016/j.autcon.2020.103310.
[32] R. Wu, Y. Fujita, K. Soga, Integrating domain knowledge with deep learning models: An interpretable AI system for automatic work progress identification of NATM tunnels, Tunn. Undergr. Space Technol. 105 (2020) 103558, https://doi.org/10.1016/j.tust.2020.103558.
[33] H. Luo, M. Wang, P.K.Y. Wong, J.C.P. Cheng, Full body pose estimation of construction equipment using computer vision and deep learning techniques, Autom. Constr. 110 (2020) 103016, https://doi.org/10.1016/j.autcon.2019.103016.
[34] J. Gong, C.H. Caldas, Computer vision-based video interpretation model for automated productivity analysis of construction operations, J. Comput. Civ. Eng. 24 (3) (2010) 252–263, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000027.
[35] R. Wei, P.E.D. Love, W. Fang, H. Luo, S. Xu, Recognizing people's identity in construction sites with computer vision: a spatial and temporal attention pooling network, Adv. Eng. Inform. 42 (2019) 100981, https://doi.org/10.1016/j.aei.2019.100981.
[36] X. Yan, H. Zhang, H. Li, Computer vision-based recognition of 3D relationship between construction entities for monitoring struck-by accidents, Comp. Aid. Civil Infrastr. Eng. (2020) 1–16, https://doi.org/10.1111/mice.12536.
[37] Y. Yu, H. Li, X. Yang, L. Kong, X. Luo, A.Y.L. Wong, An automatic and non-invasive physical fatigue assessment method for construction workers, Autom. Constr. 103 (2019) 1–12, https://doi.org/10.1016/j.autcon.2019.02.020.
[38] X. Yang, H. Li, T. Huang, X. Zhai, F. Wang, C. Wang, Computer-aided optimization of surveillance cameras placement on construction sites, Comp. Aid. Civil Infrastr. Eng. 33 (12) (2018) 1110–1126, https://doi.org/10.1111/mice.12385.
[39] J. Kim, S. Chi, Multi-camera vision-based productivity monitoring of earthmoving operations, Autom. Constr. 112 (2020) 103121, https://doi.org/10.1016/j.autcon.2020.103121.
[40] A. Karmokar, N. Jani, A. Kalla, H. Harlalka, P. Sonar, Inspection of Concrete Structures by a Computer Vision Technique and an Unmanned Aerial Vehicle, in: Proceedings of the 2020 International Conference on Computational Performance Evaluation, ComPE 2020, IEEE, 2-4 July 2020, Shillong, India, 2020, pp. 338–343, https://doi.org/10.1109/ComPE49325.2020.9200107.
[41] E. McLaughlin, N. Charron, S. Narasimhan, Combining deep learning and robotics for automated concrete delamination assessment, in: Proceedings of the 36th International Symposium on Automation and Robotics in Construction, ISARC 2019, 21-24 May 2019, Banff, Canada, 2019, pp. 485–492, https://doi.org/10.22260/isarc2019/0065.
[42] D. Kim, S. Lee, V.R. Kamat, Proximity prediction of mobile objects to prevent contact-driven accidents in co-robotic construction, J. Comput. Civ. Eng. 34 (4) (2020) 1–10, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000899.
[43] I. Jeelani, K. Asadi, H. Ramshankar, K. Han, A. Albert, Real-time vision-based worker localization & hazard detection for construction, Autom. Constr. 121 (2021) 103448, https://doi.org/10.1016/j.autcon.2020.103448.
[44] F. Pour Rahimian, S. Seyedzadeh, S. Oliver, S. Rodriguez, N. Dawood, On-demand monitoring of construction projects through a game-like hybrid application of BIM and machine learning, Autom. Constr. 110 (2020) 103012, https://doi.org/10.1016/j.autcon.2019.103012.
[45] P. Shokri, M. Shahbazi, D. Lichti, J. Nielsen, Vision-based approaches for quantifying cracks in concrete structures, Int. Archiv. Photogram. Rem. Sens. Spat. Inform. Sci. ISPRS Archiv. 43 (B2) (2020) 1167–1174, https://doi.org/10.5194/isprs-archives-XLIII-B2-2020-1167-2020.
[46] N.M. Tuan, Q. Van Hau, S. Chin, S. Park, In-situ concrete slump test incorporating deep learning and stereo vision, Autom. Constr. 121 (2021) 103432, https://doi.org/10.1016/j.autcon.2020.103432.
[47] T. Lu, S. Tervola, X. Lü, C.J. Kibert, Q. Zhang, T. Li, Z. Yao, A novel methodology for the path alignment of visual SLAM in indoor construction inspection, Autom. Constr. 127 (2021) 103723, https://doi.org/10.1016/j.autcon.2021.103723.
[48] K. Afsari, S. Halder, M. Ensafi, S. DeVito, J. Serdakowski, Fundamentals and prospects of four-legged robot application in construction progress monitoring, EPiC Ser. Built Environ. (2021) 274–283, https://doi.org/10.29007/cdpd.
[49] M. Day, Spot in construction, in: AEC Magazine, Wolverhampton, 2020. https://aecmag.com/reality-capture-modelling/spot-in-construction-boston-dynamics/ (Jun. 25, 2021).
[50] B. Xiao, S.-C. Kang, Development of an image data set of construction machines for deep learning object detection, J. Comput. Civ. Eng. 35 (2) (2021) 1–18, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000945.
[51] S. Arabi, A. Haghighat, A. Sharma, A deep-learning-based computer vision solution for construction vehicle detection, Comp. Aid. Civil Infrastr. Eng. 35 (7) (2020) 753–767, https://doi.org/10.1111/mice.12530.
[52] B. Xiao, S.-C. Kang, Vision-based method integrating deep learning detection for tracking multiple construction machines, J. Comput. Civ. Eng. 35 (2) (2021) 1–18, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000957.


[53] M.M. Soltani, Z. Zhu, A. Hammad, Skeleton estimation of excavator by detecting its parts, Autom. Constr. 82 (2017) 1–15, https://doi.org/10.1016/j.autcon.2017.06.023.
[54] W. Torres Calderon, D. Roberts, M. Golparvar-Fard, Synthesizing pose sequences from 3D assets for vision-based activity analysis, J. Comput. Civ. Eng. 35 (1) (2021) 1–17, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000937.
[55] M. Neuhausen, P. Herbers, M. König, Synthetic data for evaluating the visual tracking of construction workers, in: Proceedings of the Construction Research Congress 2020, American Society of Civil Engineers (ASCE), 8-10 March 2020, Tempe, Arizona, USA, 2020, pp. 354–361, https://doi.org/10.1061/9780784482865.038.
[56] O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A.C. Berg, F.-F. Li, ImageNet large scale visual recognition challenge, Int. J. Comp. Vis. 115 (3) (2015) 211–252, https://doi.org/10.1007/s11263-015-0816-y.
[57] M. Everingham, L. Van Gool, C.K.I. Williams, J. Winn, A. Zisserman, The pascal visual object classes (VOC) challenge, Int. J. Comput. Vis. 88 (2) (2010) 303–338, https://doi.org/10.1007/s11263-009-0275-4.
[58] T.Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, C.L. Zitnick, Microsoft COCO: common objects in context, in: Proceedings of the 2014 European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, Springer, 6-12 September 2014, Zurich, Switzerland, 2014, pp. 740–755, https://doi.org/10.1007/978-3-319-10602-1_48.
[59] D. Roberts, M. Golparvar-Fard, End-to-end vision-based detection, tracking and activity analysis of earthmoving equipment filmed at ground level, Autom. Constr. 105 (2019) 102811, https://doi.org/10.1016/j.autcon.2019.04.006.
[60] Y. Yu, H. Li, W. Umer, C. Dong, X. Yang, M. Skitmore, A.Y.L. Wong, Automatic biomechanical workload estimation for construction workers by computer vision and smart insoles, J. Comput. Civ. Eng. 33 (3) (2019) 1–13, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000827.
[61] J. Wu, N. Cai, W. Chen, H. Wang, G. Wang, Automatic detection of hardhats worn by construction personnel: a deep learning approach and benchmark dataset, Autom. Constr. 106 (2019) 102894, https://doi.org/10.1016/j.autcon.2019.102894.
[62] A. Dimitrov, M. Golparvar-Fard, Vision-based material recognition for automated monitoring of construction progress and generating building information modeling from unordered site image collections, Adv. Eng. Inform. 28 (1) (2014) 37–49, https://doi.org/10.1016/j.aei.2013.11.002.
[63] W. Fang, L. Ding, H. Luo, P.E.D. Love, Falls from heights: a computer vision-based approach for safety harness detection, Autom. Constr. 91 (2018) 53–61, https://doi.org/10.1016/j.autcon.2018.02.018.
[64] M. Kouzehgar, Y. Krishnasamy Tamilselvam, M. Vega Heredia, M. Rajesh Elara, Self-reconfigurable façade-cleaning robot equipped with deep-learning-based crack detection based on convolutional neural networks, Autom. Constr. 108 (2019) 102959, https://doi.org/10.1016/j.autcon.2019.102959.
[65] T. Darrenl, LabelImg: A Graphical Image Annotation Tool. https://github.com/tzutalin/labelImg, 2018 (Mar. 29, 2021).
[66] W. Kentaro, Labelme: Image Polygonal Annotation with Python. https://github.com/wkentaro/labelme, 2016 (Mar. 29, 2021).
[67] LabelBox, LabelBox. https://labelbox.com/solutions, 2019 (Mar. 29, 2021).
[68] J. Kim, J. Hwang, S. Chi, J.O. Seo, Towards database-free vision-based monitoring on construction sites: a deep active learning approach, Autom. Constr. 120 (2020) 103376, https://doi.org/10.1016/j.autcon.2020.103376.
[69] H. Luo, J. Liu, W. Fang, P.E.D. Love, Q. Yu, Z. Lu, Real-time smart video surveillance to manage safety: a case study of a transport mega-project, Adv. Eng. Inform. 45 (2020) 101100, https://doi.org/10.1016/j.aei.2020.101100.
[70] M. Zhang, M. Zhu, X. Zhao, Recognition of high-risk scenarios in building construction based on image semantics, J. Comput. Civ. Eng. 34 (4) (2020) 1–16, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000900.
[71] D. Roberts, M. Wang, W. Torres Calderon, M. Golparvar-Fard, An annotation tool for benchmarking methods for automated construction worker pose estimation and activity analysis, in: Proceedings of the International Conference on Smart Infrastructure and Construction 2019, ICSIC 2019: Driving Data-Informed Decision-Making, ICE Publishing, 8-10 July 2019, Cambridge, UK, 2019, pp. 307–313, https://doi.org/10.1680/icsic.64669.307.
[72] K. Liu, M. Golparvar-Fard, Crowdsourcing construction activity analysis from jobsite video streams, J. Constr. Eng. Manag. 141 (11) (2015) 1–19, https://doi.org/10.1061/(ASCE)CO.1943-7862.0001010.
[73] S. Tang, D. Roberts, M. Golparvar-Fard, Human-object interaction recognition for automatic construction site safety inspection, Autom. Constr. 120 (2020) 103356, https://doi.org/10.1016/j.autcon.2020.103356.
[74] Y. Wang, P.C. Liao, C. Zhang, Y. Ren, X. Sun, P. Tang, Crowdsourced reliable labeling of safety-rule violations on images of complex construction scenes for advanced vision-based workplace safety, Adv. Eng. Inform. 42 (2019) 101001, https://doi.org/10.1016/j.aei.2019.101001.
[75] C. Vondrick, D. Patterson, D. Ramanan, Efficiently scaling up crowdsourced video annotation, Int. J. Comp. Vis. (IJCV) 101 (1) (2013) 184–204, https://doi.org/10.1007/s11263-012-0564-1.
[76] H. Kim, S. Bang, H. Jeong, Y. Ham, H. Kim, Analyzing context and productivity of tunnel earthmoving processes using imaging and simulation, Autom. Constr. 92 (2018) 188–198, https://doi.org/10.1016/j.autcon.2018.04.002.
[77] D. Roberts, Y. Wang, A. Sabet, M. Golparvar-Fard, Annotating 2D imagery with 3D kinematically configurable assets of construction equipment for training pose-informed activity analysis and safety monitoring algorithms, in: Proceedings of the ASCE International Conference on Computing in Civil Engineering 2019, 17-19 June 2019, Atlanta, GA, USA, 2019, pp. 32–38, https://doi.org/10.1061/9780784482421.005.
[78] N.D. Nath, A.H. Behzadan, Deep convolutional networks for construction object detection under different visual conditions, Front. Built Environ. 6 (2020) 1–22, https://doi.org/10.3389/fbuil.2020.00097.
[79] J.C.P. Cheng, M. Wang, Automated detection of sewer pipe defects in closed-circuit television images using deep learning techniques, Autom. Constr. 95 (2018) 155–171, https://doi.org/10.1016/j.autcon.2018.08.006.
[80] M. Wang, J.C.P. Cheng, Development and improvement of deep learning based automated defect detection for sewer pipe inspection using faster R-CNN, in: Proceedings of the 2018 Workshop of the European Group for Intelligent Computing in Engineering (EG-ICE), Lecture Notes in Computer Science, Springer International Publishing, 10-13 June 2018, Lausanne, Switzerland, 2018, pp. 171–192, https://doi.org/10.1007/978-3-319-91638-5_9.
[81] B. Xiao, Q. Lin, Y. Chen, A vision-based method for automatic tracking of construction machines at nighttime based on deep learning illumination enhancement, Autom. Constr. 127 (2021) 103721, https://doi.org/10.1016/j.autcon.2021.103721.
[82] J. Kim, S. Chi, Action recognition of earthmoving excavators based on sequential pattern analysis of visual features and operation cycles, Autom. Constr. 104 (2019) 255–264, https://doi.org/10.1016/j.autcon.2019.03.025.
[83] S. Bang, F. Baek, S. Park, W. Kim, H. Kim, Image augmentation to improve construction resource detection using generative adversarial networks, cut-and-paste, and image transformation techniques, Autom. Constr. 115 (2020) 103198, https://doi.org/10.1016/j.autcon.2020.103198.
[84] Z. Xie, H. Liu, Z. Li, Y. He, A convolutional neural network based approach towards real-time hard hat detection, in: Proceedings of the 2018 IEEE International Conference on Progress in Informatics and Computing, PIC 2018, IEEE, 14-16 Dec. 2018, Suzhou, China, 2018, pp. 430–434, https://doi.org/10.1109/PIC.2018.8706269.
[85] J. Shen, X. Xiong, Y. Li, W. He, P. Li, X. Zheng, Detecting safety helmet wearing on construction sites with bounding-box regression and deep transfer learning, Comp. Aid. Civil Infrastr. Eng. 36 (2) (2021) 180–196, https://doi.org/10.1111/mice.12579.
[86] S. McMahon, N. Sunderhauf, M. Milford, B. Upcroft, TripNet: detecting trip hazards on construction sites, in: Proceedings of the Australasian Conference on Robotics and Automation 2015, 2-4 December 2015, Canberra, Australia, 2015. https://www.araa.asn.au/acra/acra2015/papers/pap158.pdf (Jul. 30, 2021).
[87] F. Özgenel, A. Gönenç Sorguç, Performance comparison of pretrained convolutional neural networks on crack detection in buildings, in: Proceedings of the 35th International Symposium on Automation and Robotics in Construction, ISARC 2018, 20-25 July 2018, Berlin, Germany, 2018, pp. 693–700, https://doi.org/10.22260/isarc2018/0094.
[88] M.M. Soltani, S.F. Karandish, W. Ahmed, Z. Zhu, A. Hammad, Evaluating the performance of Convolutional Neural Network for classifying equipment on construction sites, in: Proceedings of the 34th International Symposium on Automation and Robotics in Construction, ISARC 2017, 28 June - 1 July 2017, Taipei, Taiwan, 2017, pp. 509–516, https://doi.org/10.22260/isarc2017/0071.
[89] J.J. Lin, J.Y. Lee, M. Golparvar-Fard, Exploring the potential of image-based 3D geometry and appearance reasoning for automated construction progress monitoring, in: Proceedings of the ASCE International Conference on Computing in Civil Engineering 2019, 17-19 June 2019, Atlanta, GA, USA, 2019, pp. 162–170, https://doi.org/10.1061/9780784482438.021.
[90] C. Hegde, O. Awan, T. Wiemers, Application of real-time video streaming and analytics to breakdown rig connection process, in: Proceedings of the Annual Offshore Technology Conference, 30 April - 3 May 2018, Houston, Texas, USA, 2018, pp. 2505–2518, https://doi.org/10.4043/28742-ms.
[91] Z. Fang, J. Qi, T. Yang, L. Wan, Y. Jin, 'Reading' cities with computer vision: a new multi-spatial scale urban fabric dataset and a novel convolutional neural network solution for urban fabric classification tasks, in: Proceedings of the 28th International Symposium on Advances in Geographic Information Systems, SIGSPATIAL '20, 3-6 November 2020, Seattle, WA, USA, 2020, pp. 507–517, https://doi.org/10.1145/3397536.3422240.
[92] P. Byvshev, P.A. Truong, Y. Xiao, Image-based renovation progress inspection with deep siamese networks, in: Proceedings of the 2020 12th International Conference on Machine Learning and Computing (ICMLC 2020), 15-17 February 2020, Shenzhen, China, 2020, pp. 96–104, https://doi.org/10.1145/3383972.3384036.
[93] A.S. Rao, T. Nguyen, M. Palaniswami, T. Ngo, Vision-based automated crack detection using convolutional neural networks for condition assessment of infrastructure, Struct. Health Monit. 20 (4) (2020) 2124–2142, https://doi.org/10.1177/1475921720965445.
[94] E. Holm, A.A. Transeth, O.O. Knudsen, A. Stahl, Classification of corrosion and coating damages on bridge constructions from images using convolutional neural networks, in: Proceedings of the Twelfth International Conference on Machine Vision (ICMV 2019), SPIE Proceedings, 16-18 Nov 2019, Amsterdam, Netherlands, 2020, https://doi.org/10.1117/12.2557380.
[95] M. Alipour, D.K. Harris, Increasing the robustness of material-specific deep learning models for crack detection across different materials, Eng. Struct. 206
(December 2019) (2020) 110157, https://doi.org/10.1016/j.engstruct.2019.110157.
[96] L. Chen, Q. Zheng, Z. Liu, W. Mao, F. Lin, Self-intersection attention pooling based classification for rock recognition, in: Proceedings of the 2020 16th International Conference on Control, Automation, Robotics and Vision (ICARCV), IEEE, 13-15 December 2020, Shenzhen, China, 2020, pp. 1210–1215, https://doi.org/10.1109/ICARCV50220.2020.9305305.
[97] K. Chaiyasarn, W. Khan, L. Ali, M. Sharma, D. Brackenbury, M. DeJong, Crack detection in masonry structures using convolutional neural networks and support vector machines, in: Proceedings of the 35th International Symposium on Automation and Robotics in Construction, ISARC 2018, 20-25 July 2018, Berlin, Germany, 2018, pp. 118–125, https://doi.org/10.22260/isarc2018/0016.
[98] M. Wang, P. Wong, H. Luo, S. Kumar, V. Delhi, J. Cheng, Predicting safety hazards among construction workers and equipment using computer vision and deep learning techniques, in: Proceedings of the 36th International Symposium on Automation and Robotics in Construction, ISARC 2019, 21-24 May 2019, Banff, Canada, 2019, pp. 399–406, https://doi.org/10.22260/isarc2019/0054.
[99] X. Yan, H. Zhang, H. Li, Estimating worker-centric 3D spatial crowdedness for construction safety management using a single 2D camera, J. Comput. Civ. Eng. 33 (5) (2019) 1–13, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000844.
[100] R. Girshick, J. Donahue, T. Darrell, J. Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in: Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 23-28 June 2014, Columbus, OH, USA, 2014, pp. 580–587, https://doi.org/10.1109/CVPR.2014.81.
[101] R. Girshick, Fast R-CNN, in: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), IEEE, 7-13 December 2015, Santiago, Chile, 2015, pp. 1440–1448, https://doi.org/10.1109/ICCV.2015.169.
[102] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Mach. Intell. 39 (6) (2017) 1137–1149, https://doi.org/10.1109/TPAMI.2016.2577031.
[103] J. Dai, Y. Li, K. He, J. Sun, R-FCN: object detection via region-based fully convolutional networks, in: Proceedings of the 30th International Conference on Neural Information Processing Systems, NIPS’16, 5 December 2016, Barcelona, Spain, 2016. https://proceedings.neurips.cc/paper/2016/file/577ef1154f3240ad5b9b413aa7346a1e-Paper.pdf (Jul. 30, 2021).
[104] B. Xiao, S.C. Kang, Deep learning detection for real-time construction machine checking, in: Proceedings of the 36th International Symposium on Automation and Robotics in Construction, ISARC 2019, 21-24 May 2019, Banff, Canada, 2019, pp. 1136–1141, https://doi.org/10.22260/isarc2019/0151.
[105] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: unified, real-time object detection, in: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27-30 June 2016, Las Vegas, NV, USA, 2016, pp. 779–788, https://doi.org/10.1109/CVPR.2016.91.
[106] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.Y. Fu, A.C. Berg, SSD: single shot multibox detector, in: Proceedings of the 2016 European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, Springer, 11-14 October 2016, Amsterdam, Netherlands, 2016, pp. 21–37, https://doi.org/10.1007/978-3-319-46448-0_2.
[107] N. Wang, X. Zhao, P. Zhao, Y. Zhang, Z. Zou, J. Ou, Automatic damage detection of historic masonry buildings based on mobile deep learning, Autom. Constr. 103 (July 2019) (2019) 53–66, https://doi.org/10.1016/j.autcon.2019.03.003.
[108] F. Ding, Z. Zhuang, Y. Liu, D. Jiang, X. Yan, Z. Wang, Detecting defects on solid wood panels based on an improved SSD algorithm, Sensors (Switzerland) 20 (18) (2020) 1–17, https://doi.org/10.3390/s20185315.
[109] X. Luo, H. Li, D. Cao, Y. Yu, X. Yang, T. Huang, Towards efficient and objective work sampling: recognizing workers’ activities in site surveillance videos with two-stream convolutional networks, Autom. Constr. 94 (October 2018) (2018) 360–370, https://doi.org/10.1016/j.autcon.2018.07.011.
[110] Z. Wang, H. Li, X. Zhang, Construction waste recycling robot for nails and screws: computer vision technology and neural network approach, Autom. Constr. 97 (January 2019) (2019) 220–228, https://doi.org/10.1016/j.autcon.2018.11.009.
[111] Y. Li, Y. Lu, J. Chen, A deep learning approach for real-time rebar counting on the construction site based on YOLOv3 detector, Autom. Constr. 124 (April 2021) (2021) 103602, https://doi.org/10.1016/j.autcon.2021.103602.
[112] X. Hou, Y. Zeng, J. Xue, Detecting structural components of building engineering based on deep-learning method, J. Constr. Eng. Manag. 146 (2) (2020) 1–11, https://doi.org/10.1061/(ASCE)CO.1943-7862.0001751.
[113] J. Kim, J. Song, J. Lee, Inference of relevant BIM objects using CNN for visual-input based auto-modeling, in: Proceedings of the 36th International Symposium on Automation and Robotics in Construction, ISARC 2019, 21-24 May 2019, Banff, Canada, 2019, pp. 393–398, https://doi.org/10.22260/isarc2019/0053.
[114] H. Kim, H. Kim, Y.W. Hong, H. Byun, Detecting construction equipment using a region-based fully convolutional network and transfer learning, J. Comput. Civ. Eng. 32 (2) (2018) 1–15, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000731.
[115] P. Martinez, M. Al-Hussein, R. Ahmad, Intelligent vision-based online inspection system of screw-fastening operations in light-gauge steel frame manufacturing, Int. J. Adv. Manuf. Technol. 109 (2020) 645–657, https://doi.org/10.1007/s00170-020-05695-y.
[116] W. Fang, L. Ding, B. Zhong, P.E.D. Love, H. Luo, Automated detection of workers and heavy equipment on construction sites: a convolutional neural network approach, Adv. Eng. Inform. 37 (August 2018) (2018) 139–149, https://doi.org/10.1016/j.aei.2018.05.003.
[117] J. Zhang, D. Zhang, X. Liu, R. Liu, G. Zhong, A framework of on-site construction safety management using computer vision and real-time location system, in: Proceedings of the International Conference on Smart Infrastructure and Construction 2019 (ICSIC 2019): Driving Data-Informed Decision-Making, ICE Publishing, 8-10 July 2019, Cambridge, UK, 2019, pp. 327–333, https://doi.org/10.1680/icsic.64669.327.
[118] P. Martinez, B. Barkokebas, F. Hamzeh, M. Al-Hussein, R. Ahmad, A vision-based approach for automatic progress tracking of floor paneling in offsite construction facilities, Autom. Constr. 125 (May 2021) (2021) 103620, https://doi.org/10.1016/j.autcon.2021.103620.
[119] Q. Fang, H. Li, X. Luo, L. Ding, H. Luo, T.M. Rose, W. An, Detecting non-hardhat-use by a deep learning method from far-field surveillance videos, Autom. Constr. 85 (January 2018) (2018) 1–9, https://doi.org/10.1016/j.autcon.2017.09.018.
[120] Z. Fan, C. Peng, L. Dai, F. Cao, J. Qi, W. Hua, A deep learning-based ensemble method for helmet-wearing detection, PeerJ Comput. Sci. 6 (2020) 1–21, https://doi.org/10.7717/peerj-cs.311.
[121] Z. Lu, K. Liu, Y. Zhang, Z. Liu, J. Dong, L. Qingjie, T. Xu, Building detection via complementary convolutional features of remote sensing images, in: Proceedings of the Third Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2020, Lecture Notes in Computer Science, 16-18 October 2020, Nanjing, China, 2020, pp. 638–648, https://doi.org/10.1007/978-3-030-60633-6_53.
[122] H. Son, H. Choi, H. Seong, C. Kim, Detection of construction workers under varying poses and changing background in image sequences via very deep residual networks, Autom. Constr. 99 (March 2019) (2019) 27–38, https://doi.org/10.1016/j.autcon.2018.11.033.
[123] H. Li, X. Luo, M. Skitmore, Intelligent hoisting with car-like mobile robots, J. Constr. Eng. Manag. 146 (12) (2020) 1–14, https://doi.org/10.1061/(ASCE)CO.1943-7862.0001931.
[124] C. Borngrund, U. Bodin, F. Sandin, Machine vision for construction equipment by transfer learning with scale models, in: Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), IEEE, 19-24 July 2020, Glasgow, UK, 2020, pp. 1–8, https://doi.org/10.1109/IJCNN48605.2020.9207577.
[125] F. Wu, G. Jin, M. Gao, Z. He, Y. Yang, Helmet detection based on improved YOLO V3 deep model, in: Proceedings of the 2019 IEEE 16th International Conference on Networking, Sensing and Control, ICNSC 2019, IEEE, 9-11 May 2019, Banff, AB, Canada, 2019, pp. 363–368, https://doi.org/10.1109/ICNSC.2019.8743246.
[126] V.S.K. Delhi, R. Sankarlal, A. Thomas, Detection of personal protective equipment (PPE) compliance on construction site using computer vision based deep learning techniques, Front. Built Environ. 6 (136) (2020), https://doi.org/10.3389/fbuil.2020.00136.
[127] D. Kim, M. Liu, S.H. Lee, V.R. Kamat, Remote proximity monitoring between mobile construction resources using camera-mounted UAVs, Autom. Constr. 99 (March 2019) (2019) 168–182, https://doi.org/10.1016/j.autcon.2018.12.014.
[128] M. Neuhausen, P. Herbers, M. König, Using synthetic data to improve and evaluate the tracking performance of construction workers on site, Appl. Sci. (Switzerland) 10 (14) (2020) 1–18, https://doi.org/10.3390/app10144948.
[129] M. Neuhausen, D. Pawlowski, M. König, Comparing classical and modern machine learning techniques for monitoring pedestrian workers in top-view construction site video sequences, Appl. Sci. (Switzerland) 10 (23) (2020) 1–20, https://doi.org/10.3390/app10238466.
[130] L. Qing, K. Yang, W. Tan, J. Li, Automated detection of manhole covers in MLS point clouds using a deep learning approach, in: Proceedings of the 2020 IEEE International Geoscience and Remote Sensing Symposium, IEEE, 26 September - 2 October 2020, Waikoloa, HI, USA, 2020, pp. 1580–1583, https://doi.org/10.1109/IGARSS39084.2020.9324137.
[131] Y. Guo, Y. Xu, S. Li, Dense construction vehicle detection based on orientation-aware feature fusion convolutional neural network, Autom. Constr. 112 (April 2020) (2020) 103124, https://doi.org/10.1016/j.autcon.2020.103124.
[132] Y. Guo, H. Niu, S. Li, Safety monitoring in construction site based on unmanned aerial vehicle platform with computer vision using transfer learning techniques, in: Proceedings of the 7th Asia-Pacific Workshop on Structural Health Monitoring, APWSHM 2018, 12-15 November 2018, Hong Kong SAR, China, 2018. https://www.ndt.net/article/apwshm2018/papers/166.pdf (Jul. 31, 2021).
[133] Q. Fang, H. Li, X. Luo, L. Ding, H. Luo, C. Li, Computer vision aided inspection on falling prevention measures for steeplejacks in an aerial environment, Autom. Constr. 93 (September 2018) (2018) 148–164, https://doi.org/10.1016/j.autcon.2018.05.022.
[134] C.Y. Lin, C.H. Chen, C.Y. Yang, F. Akhyar, C.Y. Hsu, H.F. Ng, Cascading convolutional neural network for steel surface defect detection, in: Proceedings of the AHFE 2019 International Conference on Human Factors in Artificial Intelligence and Social Computing, the AHFE International Conference on Human Factors, Software, Service and Systems Engineering, and the AHFE International Conference of, Springer International Publishing, 24-28 July 2019, Washington D.C., USA, 2020, pp. 202–212, https://doi.org/10.1007/978-3-030-20454-9_20.
[135] Y. Li, H. Wei, Z. Han, J. Huang, W. Wang, Deep learning-based safety helmet detection in engineering management based on convolutional neural networks, Adv. Civ. Eng. 2020 (2020) 1–10, https://doi.org/10.1155/2020/9703560.
[136] C. Li, L. Ding, Falling objects detection for near miss incidents identification on construction site, in: Proceedings of the ASCE International Conference on Computing in Civil Engineering 2019, 17-19 June 2019, Atlanta, GA, USA, 2019, pp. 105–113, https://doi.org/10.1061/9780784482438.018.
[137] K. Zhang, S. Tong, H. Shi, Trajectory prediction of assembly alignment of columnar precast concrete members with deep learning, Symmetry 11 (5) (2019) 629, https://doi.org/10.3390/sym11050629.
[138] N. Filatov, N. Maltseva, A. Bakhshiev, Development of hard hat wearing monitoring system using deep neural networks with high inference speed, in: Proceedings of the 2020 International Russian Automation Conference, RusAutoCon 2020, 6-12 September 2020, Sochi, Russia, 2020, pp. 459–463, https://doi.org/10.1109/RusAutoCon49822.2020.9208155.
[139] M. Siddula, F. Dai, Y. Ye, J. Fan, Unsupervised feature learning for objects of interest detection in cluttered construction roof site images, in: Procedia Engineering: International Conference on Sustainable Design, Engineering and Construction, Elsevier B.V., 145, 2016, pp. 428–435, https://doi.org/10.1016/j.proeng.2016.04.010.
[140] K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN, IEEE Trans. Pattern Anal. Mach. Intell. 42 (2) (2018) 386–397, https://doi.org/10.1109/TPAMI.2018.2844175.
[141] O. Ronneberger, P. Fischer, T. Brox, U-Net: convolutional networks for biomedical image segmentation, in: Proceedings of the 2015 International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2015, Lecture Notes in Computer Science, 5-9 October 2015, Munich, Germany, 2015, pp. 234–241, https://doi.org/10.1007/978-3-319-24574-4_28.
[142] L.C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking atrous convolution for semantic image segmentation, arXiv preprint (2017). https://arxiv.org/abs/1706.05587 (Jul. 30, 2021).
[143] C. Hazirbas, L. Ma, C. Domokos, D. Cremers, FuseNet: incorporating depth into semantic segmentation via fusion-based CNN architecture, in: Proceedings of the 13th Asian Conference on Computer Vision, Lecture Notes in Computer Science, 20-24 November 2016, Taipei, Taiwan, 2017, pp. 213–228, https://doi.org/10.1007/978-3-319-54181-5_14.
[144] B. Lv, L. Peng, T. Wu, R. Chen, Research on urban building extraction method based on deep learning convolutional neural network, in: Proceedings of the First China Digital Earth Conference, 18-20 November 2019, Beijing, China, 2020, https://doi.org/10.1088/1755-1315/502/1/012022.
[145] L. Chen, R.G. Lopes, B. Cheng, M.D. Collins, E.D. Cubuk, B. Zoph, H. Adam, J. Shlens, Naive-student: leveraging semi-supervised learning in video sequences for urban scene segmentation, in: Proceedings of the 2020 European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, Springer, 23-28 August 2020, Glasgow, UK, 2020, pp. 695–714, https://doi.org/10.1007/978-3-030-58545-7_40.
[146] W. Fang, B. Zhong, N. Zhao, P.E.D. Love, H. Luo, J. Xue, S. Xu, A deep learning-based approach for mitigating falls from height with computer vision: convolutional neural network, Adv. Eng. Inform. 39 (January 2019) (2019) 170–177, https://doi.org/10.1016/j.aei.2018.12.005.
[147] Y. Xia, X. Jian, B. Yan, D. Su, Infrastructure safety oriented traffic load monitoring using multi-sensor and single camera for short and medium span bridges, Remote Sens. 11 (22) (2019) 2651, https://doi.org/10.3390/rs11222651.
[148] Q. Song, Y. Wu, X. Xin, L. Yang, M. Yang, H. Chen, C. Liu, M. Hu, X. Chai, J. Li, Real-time tunnel crack analysis system via deep learning, IEEE Access 7 (2019) 64186–64197, https://doi.org/10.1109/ACCESS.2019.2916330.
[149] M.M. Manjurul Islam, J.M. Kim, Vision-based autonomous crack detection of concrete structures using a fully convolutional encoder–decoder network, Sensors (Switzerland) 19 (19) (2019) 1–12, https://doi.org/10.3390/s19194251.
[150] X. Luo, H. Li, D. Cao, F. Dai, J. Seo, S. Lee, Recognizing diverse construction activities in site images via relevance networks of construction-related objects detected by convolutional neural networks, J. Comput. Civ. Eng. 32 (3) (2018) 1–16, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000756.
[151] Q. Fang, H. Li, X. Luo, L. Ding, T.M. Rose, W. An, Y. Yu, A deep learning-based method for detecting non-certified work on construction sites, Adv. Eng. Inform. 35 (January 2018) (2018) 56–68, https://doi.org/10.1016/j.aei.2018.01.001.
[152] J.F. Henriques, R. Caseiro, P. Martins, J. Batista, High-speed tracking with kernelized correlation filters, IEEE Trans. Pattern Anal. Mach. Intell. 37 (3) (2015) 583–596, https://doi.org/10.1109/TPAMI.2014.2345390.
[153] L. Wang, W. Ouyang, X. Wang, H. Lu, Visual tracking with fully convolutional networks, in: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), IEEE, 7-13 December 2015, Santiago, Chile, 2015, pp. 3119–3127, https://doi.org/10.1109/ICCV.2015.357.
[154] H. Nam, B. Han, Learning multi-domain convolutional neural networks for visual tracking, in: Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 27-30 June 2016, Las Vegas, NV, USA, 2016, pp. 4293–4302, https://doi.org/10.1109/CVPR.2016.465.
[155] K. Zhang, Q. Liu, Y. Wu, M.-H. Yang, Robust visual tracking via convolutional networks without training, IEEE Trans. Image Process. 25 (4) (2016) 1779–1792, https://doi.org/10.1109/TIP.2016.2531283.
[156] B. Xiao, Z. Zhu, Two-dimensional visual tracking in construction scenarios: a comparative study, J. Comput. Civ. Eng. 32 (3) (2018) 1–10, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000738.
[157] C. Ma, J.-B. Huang, X. Yang, M.H. Yang, Hierarchical convolutional features for visual tracking, in: Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), 7-13 December 2015, Santiago, Chile, 2015, pp. 3074–3082, https://doi.org/10.1109/ICCV.2015.352.
[158] D. Kim, M. Liu, S. Lee, V.R. Kamat, Trajectory prediction of mobile construction resources toward pro-active struck-by hazard detection, in: Proceedings of the 36th International Symposium on Automation and Robotics in Construction, ISARC 2019, 21-24 May 2019, Banff, Canada, 2019, pp. 982–988, https://doi.org/10.22260/isarc2019/0131.
[159] Z. Kalal, K. Mikolajczyk, J. Matas, Tracking-learning-detection, IEEE Trans. Pattern Anal. Mach. Intell. 34 (7) (2012) 1409–1422, https://doi.org/10.1109/TPAMI.2011.239.
[160] W. Wang, C. Wei, W. Yang, J. Liu, GLADNet: low-light enhancement network with global awareness, in: Proceedings of the 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018), IEEE, 15-19 May 2018, Xi’an, China, 2018, pp. 751–755, https://doi.org/10.1109/FG.2018.00118.
[161] H. Zhang, X. Yan, H. Li, Ergonomic posture recognition using 3D view-invariant features from single ordinary camera, Autom. Constr. 94 (October 2018) (2018) 1–10, https://doi.org/10.1016/j.autcon.2018.05.033.
[162] Z. Cao, G. Hidalgo, T. Simon, S.E. Wei, Y. Sheikh, OpenPose: realtime multi-person 2D pose estimation using part affinity fields, IEEE Trans. Pattern Anal. Mach. Intell. 43 (1) (2021) 172–186, https://doi.org/10.1109/TPAMI.2019.2929257.
[163] A. Ojelade, F. Paige, Construction worker posture estimation using OpenPose, in: Proceedings of the Construction Research Congress 2020, American Society of Civil Engineers (ASCE), 8-10 March 2020, Tempe, Arizona, USA, 2020, pp. 556–564, https://doi.org/10.1061/9780784482872.060.
[164] H. Luo, M. Wang, P.K.-Y. Wong, J. Tang, J.C.P. Cheng, Vision-based pose forecasting of construction equipment for monitoring construction site safety, in: Proceedings of the 18th International Conference on Computing in Civil and Building Engineering, Lecture Notes in Civil Engineering, Springer, 18-20 August 2020, São Paulo, Brazil, 2020, pp. 1127–1138, https://doi.org/10.1007/978-3-030-51295-8_78.
[165] Y. Yu, X. Yang, H. Li, X. Luo, H. Guo, Q. Fang, Joint-level vision-based ergonomic assessment tool for construction workers, J. Constr. Eng. Manag. 145 (5) (2019) 1–15, https://doi.org/10.1061/(ASCE)CO.1943-7862.0001647.
[166] C.J. Liang, K.M. Lundeen, W. McGee, C.C. Menassa, S.H. Lee, V.R. Kamat, A vision-based marker-less pose estimation system for articulated construction robots, Autom. Constr. 104 (August 2019) (2019) 80–94, https://doi.org/10.1016/j.autcon.2019.04.004.
[167] C.J. Liang, K.M. Lundeen, W. McGee, C.C. Menassa, S. Lee, V.R. Kamat, Stacked hourglass networks for markerless pose estimation of articulated construction robots, in: Proceedings of the 35th International Symposium on Automation and Robotics in Construction, ISARC 2018, 20-25 July 2018, Berlin, Germany, 2018, pp. 859–865, https://doi.org/10.22260/isarc2018/0120.
[168] D.A. Calvache, H.A. Bernal, J.F. Guarín, K. Aguía, A.D. Orjuela-Cañón, O.J. Perdomo, Automatic estimation of pose and falls in videos using computer vision model, in: Proceedings of the 16th International Symposium on Medical Information Processing and Analysis, SPIE Proceedings, 3-4 October 2020, Lima, Peru, 2020, pp. 1–8, https://doi.org/10.1117/12.2579615.
[169] H. Luo, C. Xiong, W. Fang, P.E.D. Love, B. Zhang, X. Ouyang, Convolutional neural networks: computer vision-based workforce activity assessment in construction, Autom. Constr. 94 (October 2018) (2018) 282–289, https://doi.org/10.1016/j.autcon.2018.06.007.
[170] L. Wang, Y. Xiong, Z. Wang, Y. Qiao, D. Lin, X. Tang, L. Van Gool, Temporal segment networks: towards good practices for deep action recognition, in: Proceedings of the 2016 European Conference on Computer Vision (ECCV), Lecture Notes in Computer Science, Springer, 11-14 October 2016, Amsterdam, Netherlands, 2016, pp. 20–36, https://doi.org/10.1007/978-3-319-46484-8_2.
[171] J. Carreira, A. Zisserman, Quo vadis, action recognition? A new model and the Kinetics dataset, in: Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 21-26 July 2017, Honolulu, HI, USA, 2017, pp. 4724–4733, https://doi.org/10.1109/CVPR.2017.502.
[172] C. Chen, Z. Zhu, A. Hammad, W. Ahmed, Vision-based excavator activity recognition and productivity analysis in construction, in: Proceedings of the ASCE International Conference on Computing in Civil Engineering 2019, 17-19 June 2019, Atlanta, Georgia, USA, 2019, pp. 241–248, https://doi.org/10.1061/9780784482438.031.
[173] L. Ding, W. Fang, H. Luo, P.E.D. Love, B. Zhong, X. Ouyang, A deep hybrid learning model to detect unsafe behavior: integrating convolution neural networks and long short-term memory, Autom. Constr. 86 (February 2018) (2018) 118–124, https://doi.org/10.1016/j.autcon.2017.11.002.
[174] H. Liu, G. Wang, T. Huang, P. He, M. Skitmore, X. Luo, Manifesting construction activity scenes via image captioning, Autom. Constr. 119 (November 2020) (2020) 103334, https://doi.org/10.1016/j.autcon.2020.103334.
[175] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial networks, Commun. ACM 63 (11) (2020) 139–144, https://doi.org/10.1145/3422622.
[176] A. Gupta, J. Johnson, F.-F. Li, S. Savarese, A. Alahi, Social GAN: socially acceptable trajectories with generative adversarial networks, in: Proceedings of the 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 18-23 June 2018, Salt Lake City, UT, USA, 2018, pp. 2255–2264, https://doi.org/10.1109/CVPR.2018.00240.
[177] N. Nath, A. Behzadan, Deep generative adversarial network to enhance image quality for fast object detection in construction sites, in: Proceedings of the 2020 Winter Simulation Conference, IEEE, 14-18 December 2020, Orlando, FL, USA, 2020, pp. 2447–2459, https://doi.org/10.1109/WSC48552.2020.9383890.
[178] Z. Kolar, H. Chen, X. Luo, Transfer learning and deep convolutional neural networks for safety guardrail detection in 2D images, Autom. Constr. 89 (May 2018) (2018) 58–70, https://doi.org/10.1016/j.autcon.2018.01.003.
[179] S.J. Pan, Q. Yang, A survey on transfer learning, IEEE Trans. Knowl. Data Eng. 22 (10) (2010) 1345–1359, https://doi.org/10.1109/TKDE.2009.191.
[180] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016. http://www.deeplearningbook.org (Jun. 25, 2021).
[181] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, A. Lerer, PyTorch: an imperative style, high-performance deep learning library, in: Advances in Neural Information Processing Systems 32
(NeurIPS 2019), Curran Associates, Inc., 8-14 December 2019, Vancouver, Canada, 2019. http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf (Mar. 20, 2021).
[182] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G.S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, M. Schuster, R. Monga, S. Moore, D. Murray, C. Olah, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, TensorFlow: large-scale machine learning on heterogeneous systems, tensorflow.org, 2015. https://www.tensorflow.org/ (Jun. 25, 2021).
[183] F. Chollet, Keras, GitHub, 2015. https://github.com/fchollet/keras (Apr. 18, 2021).
[184] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, T. Darrell, Caffe: convolutional architecture for fast feature embedding, in: Proceedings of the 22nd ACM International Conference on Multimedia, 3-7 November 2014, Orlando, Florida, USA, 2014, pp. 675–678, https://doi.org/10.1145/2647868.2654889.
[185] R. Al-Rfou, G. Alain, A. Almahairi, C. Angermueller, D. Bahdanau, N. Ballas, Theano: a Python framework for fast computation of mathematical expressions, arXiv preprint (2016). http://arxiv.org/pdf/1605.02688.pdf (Jul. 30, 2021).
[186] E. Karaaslan, U. Bagci, F.N. Catbas, Attention-guided analysis of infrastructure damage with semi-supervised deep learning, Autom. Constr. 125 (May 2021) (2021) 103634, https://doi.org/10.1016/j.autcon.2021.103634.
[187] G. Zhang, Y. Pan, L. Zhang, Semi-supervised learning with GAN for automatic defect detection from images, Autom. Constr. 128 (August 2021) (2021) 103764, https://doi.org/10.1016/j.autcon.2021.103764.
[188] J. Kim, S. Chi, A few-shot learning approach for database-free vision-based monitoring on construction sites, Autom. Constr. 124 (April 2021) (2021) 103566, https://doi.org/10.1016/j.autcon.2021.103566.
[189] T. Czerniawski, B. Sankaran, M. Nahangi, C. Haas, F. Leite, 6D DBSCAN-based segmentation of building point clouds for planar object classification, Autom. Constr. 88 (April 2018) (2018) 44–58, https://doi.org/10.1016/j.autcon.2017.12.029.
[190] S. Arabi, A. Haghighat, A. Sharma, A deep learning based solution for construction equipment detection: from development to deployment, arXiv preprint (2019). http://arxiv.org/abs/1904.09021 (Jul. 30, 2021).
[191] H. Maeda, Y. Sekimoto, T. Seto, T. Kashiyama, H. Omata, Road damage detection and classification using deep neural networks with smartphone images, Comput. Aided Civ. Infrastruct. Eng. 33 (12) (2018) 1127–1141, https://doi.org/10.1111/mice.12387.
[192] Z. Wang, Y. Zhang, K.M. Mosalam, Y. Gao, S.L. Huang, Deep semantic segmentation for visual understanding on construction sites, Comput. Aided Civ. Infrastruct. Eng. (2021) 1–18, https://doi.org/10.1111/mice.12701.
[193] AWS, Amazon SageMaker. https://aws.amazon.com/sagemaker/, 2021 (Jun. 25, 2021).
[194] Microsoft, Azure Machine Learning. https://azure.microsoft.com/en-us/services/machine-learning/, 2021 (Jun. 25, 2021).
[195] Google, Vertex AI. https://cloud.google.com/vertex-ai, 2021 (Jun. 25, 2021).
[196] K. Mostafa, T. Hegazy, Review of image-based analysis and applications in construction, Autom. Constr. 122 (February 2021) (2021) 103516, https://doi.org/10.1016/j.autcon.2020.103516.
[200] C.R. Qi, X. Chen, O. Litany, L.J. Guibas, ImVoteNet: boosting 3D object detection in point clouds with image votes, in: Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 13-19 June 2020, Seattle, WA, USA, 2020, pp. 4403–4412, https://doi.org/10.1109/CVPR42600.2020.00446.
[201] P.E. Sarlin, C. Cadena, R. Siegwart, M. Dymczyk, From coarse to fine: robust hierarchical localization at large scale, in: Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 15-20 June 2019, Long Beach, CA, USA, 2019, pp. 12708–12717, https://doi.org/10.1109/CVPR.2019.01300.
[202] F. Leite, Automated approaches towards BIM-based intelligent decision support in design, construction, and facility operations, in: Proceedings of the 2018 Workshop of the European Group for Intelligent Computing in Engineering (EG-ICE), Lecture Notes in Computer Science, Springer International Publishing, 10-13 June 2018, Lausanne, Switzerland, 2018, pp. 276–286, https://doi.org/10.1007/978-3-319-91638-5_15.
[203] E. Agapaki, I. Brilakis, Instance segmentation of industrial point cloud data, arXiv preprint (2020). https://arxiv.org/pdf/2012.14253.pdf (Jul. 30, 2021).
[204] B. Wang, C. Yin, H. Luo, J.C.P. Cheng, Q. Wang, Fully automated generation of parametric BIM for MEP scenes based on terrestrial laser scanning data, Autom. Constr. 125 (May 2021) (2021) 103615, https://doi.org/10.1016/j.autcon.2021.103615.
[205] Y. Guo, H. Wang, Q. Hu, H. Liu, L. Liu, M. Bennamoun, Deep learning for 3D point clouds: a survey, IEEE Trans. Pattern Anal. Mach. Intell. (2020) 1–27, https://doi.org/10.1109/tpami.2020.3005434.
[206] Y. Perez-Perez, M. Golparvar-Fard, K. El-Rayes, Segmentation of point clouds via joint semantic and geometric features for 3D modeling of the built environment, Autom. Constr. 125 (May 2021) (2021) 103584, https://doi.org/10.1016/j.autcon.2021.103584.
[207] E. Agapaki, I. Brilakis, CLOI-NET: class segmentation of industrial facilities’ point cloud datasets, Adv. Eng. Inform. 45 (November 2019) (2020) 101121, https://doi.org/10.1016/j.aei.2020.101121.
[208] C.R. Qi, L. Yi, H. Su, L.J. Guibas, PointNet++: deep hierarchical feature learning on point sets in a metric space, in: Proceedings of the 31st International Conference on Neural Information Processing Systems, 4-9 December 2017, Long Beach, CA, USA, 2017. https://papers.nips.cc/paper/2017/file/d8bf84be3800d12f74d8b05e9b89836f-Paper.pdf (Feb. 12, 2021).
[209] J. Zhang, X. Zhao, Z. Chen, Z. Lu, A review of deep learning-based semantic segmentation for point cloud, IEEE Access 7 (2019) 179118–179133, https://doi.org/10.1109/ACCESS.2019.2958671.
[210] T. Czerniawski, J.W. Ma, F. Leite, Automated building change detection with amodal completion of point clouds, Autom. Constr. 124 (April 2021) (2021) 103568, https://doi.org/10.1016/j.autcon.2021.103568.
[211] E. Hoffer, N. Ailon, Deep metric learning using triplet network, in: Proceedings of the 2015 International Workshop on Similarity-Based Pattern Recognition, Lecture Notes in Computer Science, 7-9 May 2015, San Diego, CA, USA, 2015, pp. 84–92, https://doi.org/10.1007/978-3-319-24261-3_7.
[212] B. Zhang, N. Guo, J. Huang, B. Gu, J. Zhou, Computer vision estimation of the volume and weight of apples by using 3D reconstruction and noncontact measuring methods, J. Sens. 2020 (2020), https://doi.org/10.1155/2020/5053407.
[213] M. Fallqvist, Automatic Volume Estimation Using Structure-from-Motion Fused with a Cellphone’s Inertial Sensors, Linköping University, 2016. http://liu.diva-portal.org/smash/get/diva2:1172784/FULLTEXT01.pdf (May 20, 2021).
[214] J.W. Ma, T. Czerniawski, F. Leite, Semantic segmentation of point clouds of building interiors with deep learning: augmenting training datasets with synthetic BIM-based point clouds, Autom. Constr. 113 (May 2020) (2020) 103144, https://doi.org/10.1016/j.autcon.2020.103144.
[197] C.J. Liang, V.R. Kamat, C.C. Menassa, Teaching robots to perform construction tasks via learning from demonstration, in: Proceedings of the 36th International Symposium on Automation and Robotics in Construction, ISARC 2019, 21-24 May 2019, Banff, Canada, 2019, pp. 1305–1311, https://doi.org/10.22260/
isarc2019/0175. [215] M. Kamari, Y. Ham, Vision-based volumetric measurements via deep learning-
[198] R. Mur-Artal, J.M.M. Montiel, J.D. Tardos, ORB-SLAM: a versatile and accurate based point cloud segmentation for material management in jobsites, Autom.
monocular SLAM system, IEEE Trans. Robot. 31 (5) (2015) 1147–1163, https:// Constr. Elsevier BV 121 (January 2021) (2021) 103430, https://doi.org/
doi.org/10.1109/TRO.2015.2463671. 10.1016/j.autcon.2020.103430.
[199] K. Kim, M. Cao, S. Rao, J. Xu, S. Medasani, Y. Owechko, Multi-object detection [216] Z.W. Yao, Q. Huang, Z. Ji, X.F. Li, Q. Bi, Deep learning-based prediction of piled-
and behavior recognition from motion 3D data, in: Proceedings of the IEEE up status and payload distribution of bulk material, Autom. Constr. 121 (January
Computer Society Conference on Computer Vision and Pattern Recognition 2021) (2021) 103424, https://doi.org/10.1016/j.autcon.2020.103424.
Workshops (CVPRW), IEEE, 20-25 June 2011, Colorado Springs, CO, USA, 2011, [217] A. Rasul, J. Seo, A. Khajepour, Development of integrative methodologies for
pp. 37–42, https://doi.org/10.1109/CVPRW.2011.5981808. effective excavation progress monitoring, Sensors (Switzerland) 21 (2) (2021)
1–25, https://doi.org/10.3390/s21020364.