Automation in Construction
journal homepage: www.elsevier.com/locate/autcon
Review
A R T I C L E  I N F O

Keywords: Safety; Deep learning; Computer vision; Natural language processing; Knowledge graph

A B S T R A C T

Deep learning has been acknowledged as being robust in managing and controlling the performance of construction safety. However, there is an absence of a state-of-the-art review that examines its developments and applications from the perspective of data utilization. Our review aims to fill this void and addresses the following research question: what developments in deep learning for data mining have been made to manage safety in construction? We systematically review the extant literature of deep-learning-based data analytics for construction safety management, including: (1) image/video-based; (2) text-based; (3) non-visual sensor-based; and (4) multi-modal-based. The review revealed three challenges of existing research in the construction industry: (1) lack of a high-quality database; (2) inadequate ability of deep learning models; and (3) limited application scenarios. Based on our observations of the prevailing literature and practice, we identify that future research on safety management is needed and should focus on the: (1) development of a dynamic multi-modal knowledge graph; and (2) knowledge graph-based decision-making for safety. The application of deep learning is an emerging line of inquiry in construction, and this study not only identifies new research opportunities to support safety management, but also facilitates practicing deep learning for construction projects.
* Corresponding author.
E-mail address: luohbcem@hust.edu.cn (H. Luo).
https://doi.org/10.1016/j.autcon.2022.104302
Received 13 September 2021; Received in revised form 13 February 2022; Accepted 26 April 2022
Available online 10 May 2022
0926-5805/© 2022 Elsevier B.V. All rights reserved.
J. Liu et al. Automation in Construction 140 (2022) 104302
4). In Section 5, we review the existing deep learning applications for safety in the construction industry to provide a contextual backdrop for the research. Finally, the research challenges, recommendations for future research, and conclusions are discussed and presented.

2. Data source for safety in construction

Within the context of safety, we focus on unsafe behavior and relevant conditions that pervade on-site practice. It has been observed that 88% of the accidents that occurred on construction sites were a result of unsafe behaviors, while 10% were attributable to unsafe work conditions (e.g., plant and structural monitoring). As addressed, AI requires different types of data (i.e., digital images and safety reports) useful for the management of safety. These include:

• Image/video: In the construction industry, video surveillance systems are installed on sites to record daily activities in real-time with permission. Additionally, safety inspectors use cameras or phones to record hazards during their on-site inspections (Fig. 1(a)).
• Safety report: As noted from Fig. 1(b), safety report data (e.g., quality and safety reporting notices, weekly supervision forms) are recorded by engineers who are experienced in documenting safety incidents.
• Non-visual sensor data: Having non-visual sensors (e.g., RFID, UWB, GPS) in place is also a common practice to monitor unsafe behavior and conditions. For example, positioning sensors are used to track people's location on construction sites. Likewise, sensors (e.g., displacement, deformation) are used to monitor surrounding settlement during excavation. This type of data is stored in numeric format, as shown in Fig. 1(c).

3. Deep learning in computer science

The advent of deep learning algorithms has enabled a wide application of deep learning to various areas such as healthcare [20], self-driving cars [21], and language translation [22]. In line with the research question above, the deep learning techniques and the development of two prevailing applications (i.e., computer vision and natural language processing) are presented. We examine the deep-learning-related technologies in computer science, as they form the underlying basis for deep learning-based analytics in construction.

3.1. Deep learning techniques

Deep learning has advanced by benefitting from large-scale training data. However, in industries where training data are lacking, the applications of deep learning are limited due to the rigorously required amount of input data. Transfer learning can leverage knowledge and skills learned in similar domains/tasks, and it is being applied as a common approach to overcome the absence of domain data sets [23,24]. Based on the representation of the knowledge to be transferred, transfer learning algorithms can be categorized into: (1) instance-transfer; (2) feature-representation-transfer; (3) parameter-transfer; and (4) relational-knowledge-transfer [25].

Unlabeled data, essentially, is adequate or can be easily collected in many cases, while high-quality labeled data is normally limited and expensive to acquire. More importantly, the model's accuracy may not be significantly improved, though additional time is consumed, with an enlarged training data set. In this regard, active learning has been developed, with the aim of iteratively selecting informative samples from the unlabeled pool to minimize the cost of constructing datasets and thereby improve model performance [26,27]. The application scenarios engender three types of active learning: (1) membership query synthesis; (2) stream-based selective sampling; and (3) pool-based active learning [26]. Membership query synthesis actively generates new unlabeled instances to be queried. Stream-based selective sampling is based on the hypothesis that one unlabeled instance is observed at a time, and query selection is performed in order by deciding whether each instance is queried. Additionally, pool-based active learning performs query selection by prioritizing instances using acquisition functions and assumes that there are many unlabeled instances.

Data augmentation is another technique widely used in deep learning tasks. It deals with overfitting and contributes to improving the performance of a model on unseen samples [28]. Data augmentation focuses on enhancing the value of limited data and can be divided into supervised and unsupervised methods. Supervised data augmentation adopts preset data transformation rules. It performs data augmentation depending on existing data, e.g., (1) single-sample data augmentation, which is based on the geometric or color transformation of the sample; and (2) multi-sample data augmentation, which integrates multiple samples to generate new samples. By contrast, unsupervised data augmentation learns the distribution of data through the model, randomly generating samples that are consistent with the distribution of the training data set. Notably, a typical application of unsupervised data augmentation is the Generative Adversarial Network.

3.2. Deep learning for computer vision

Computer vision is an interdisciplinary field of science, which focuses on constructing computational models to capture contextual information from digital images or videos for automating the tasks of the human visual system [29]. Deep-learning-based computer vision has contributed to producing impressive results and has even exceeded human-
level performance in some cases [30,31,32]. The typical tasks of computer vision include:

• Object detection: Object detection describes the ability to detect or identify objects in any given image correctly along with their spatial position in the given image. The output is in the form of rectangular boxes (known as bounding boxes) within which the object is bounded. Cases in point regarding object detection networks are Faster R-CNN [33], R-FCN [34], SSD [35], YOLO [36] and RetinaNet [37].
• Object tracking: Object tracking is to predict the size and location of the target in subsequent frames, given the size and location of the target in the initial frame of a video sequence. As a result, the position relationship of the object to be tracked in the continuous video sequence is established; thus, the complete trajectory of the object can be obtained. Typical tracking algorithms developed based on deep learning include C-COT [38], DLT [39], FCNT [40], and MDNet [41].
• Instance segmentation: It aims to locate the individual objects and segment their boundaries (i.e., lines and curves) within a scene, regardless of whether they are of the same type. Well-known instance segmentation networks comprise Mask R-CNN [42], MaskLab [43], and YOLACT [44].
• Pose/Activity recognition: Pose/Activity recognition is used to understand human actions and behaviors by determining the location of human joints from images, for example, CPM [45], Hourglass [46], CPN [47], and HRNet [48].

Table 1 presents examples of deep-learning-based computer vision algorithms in the context of computer science.

3.3. Deep learning for natural language processing

Natural language processing (NLP) is a series of theoretically inspired computing techniques used to represent and analyze human language automatically [59,60]. The general pipeline of practicing NLP tasks is indicated in Fig. 2. A corpus is a collection of texts, which could be either standard public data sets or existing electronic texts. However, corpus preprocessing is time-consuming and laborious, including corpus cleaning, word segmentation, and stop-word removal. Word embedding is adopted to convert words into vector representations prior to feature extraction.

The common tasks of NLP include:

• Named entity recognition (NER): The NLP model is used to identify, extract and categorize entities from sentences. It comprises extracting the names of people, locations and things, which are derived from the text and placed under certain categories, such as person, location and time. For instance, BERT and BiLSTM-CRF are applications of NER [61,62].
• Text classification: The NLP model is trained to classify documents/text reports according to specific attributes, e.g., subject, document type and time. The model extracts the features of the text and then matches them with the predetermined categories for classification. A popular case in point regarding text classification models is BERT [61].
• Information retrieval: The NLP model can locate the required information among unstructured data. An information retrieval system indexes a collection of documents and analyzes the user's query. Then, it compares the description of each document with the query and finally generates and presents the relevant results. The prevailing methods for information retrieval involve DRMM [63] and SNRM_PRF [64].
• Text summarization: An NLP algorithm can be used to create a shortened version of a text report with its main points. Fundamentally, there are two general approaches for text summarization, i.e., abstractive and extractive summarization. In the abstractive case, the NLP model engenders an entirely new summary in terms of phrases and sentences used in the analyzed text. In extractive summarization, the model extracts phrases and sentences from the existing text and groups them into a summary.
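The preprocessing stages of the NLP pipeline described above (corpus cleaning, word segmentation, stop-word removal, and conversion to vector representations) can be sketched in plain Python. This is an illustrative count-based stand-in: real systems replace the bag-of-words step with learned embeddings such as Word2Vec or BERT, and the stop-word list here is a hypothetical abbreviated example.

```python
import re
from collections import Counter

# Hypothetical abbreviated stop-word list; production pipelines use larger ones.
STOP_WORDS = {"the", "a", "an", "is", "was", "on", "at", "of", "and", "to"}

def preprocess(text):
    """Corpus cleaning, word segmentation, and stop-word removal."""
    tokens = re.findall(r"[a-z]+", text.lower())       # clean and segment
    return [t for t in tokens if t not in STOP_WORDS]  # drop stop words

def bag_of_words(tokens, vocabulary):
    """Convert a token list into a fixed-length count vector."""
    counts = Counter(tokens)
    return [counts[w] for w in vocabulary]

report = "The worker was not wearing a hard hat on the scaffold."
tokens = preprocess(report)
vocab = sorted(set(tokens))
vector = bag_of_words(tokens, vocab)
```

The resulting vector is what a downstream classifier or retrieval model would consume in place of raw text.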
Table 1
Examples of deep learning-based computer vision algorithms in computer science.

Task | Algorithm | Performance | Reference
Object detection | SAPD | A single-model, single-scale SAPD achieves 47.4% AP on the COCO database | Zhu et al. [49]
Object detection | FCOS | FCOS with ResNeXt-64x4d-101 achieves 44.7% AP on the COCO database | Tian et al. [50]
Object detection | FSAF | FSAF achieves a state-of-the-art 44.6% mAP on the COCO database | Zhu et al. [51]
Object tracking | GMTracker | GMTracker achieves 63.8 IDF1 on MOT17 and 63.9 IDF1 on MOT16 | He et al. [52]
Object tracking | SiamMOT | SiamMOT (DLA-169) achieves 53.2 MOTA and 51.7 IDF1 on the HiEve challenge | Shuai et al. [53]
Object tracking | DyGLIP | DyGLIP achieves 84.6 MOTA and 39.9 IDF1 in S05 on the AI City challenge database | Quach et al. [54]
Object segmentation | DCNAS | DCNAS achieves 84.3% on Cityscapes and 86.9% on PASCAL VOC 2012 | Zhang et al. [55]
Object segmentation | DystaB | DystaB achieves 58.9 and 47.2 on the DAVIS17 and YouTube-VOS databases | Yang et al. [56]
Pose/Activity recognition | SimPoE | SimPoE achieves 56.7 MPJPE and 26.6 MPJPE on the Human3.6M and MPI-INF-3DHP datasets, respectively | Yuan et al. [57]
Pose/Activity recognition | FCPose | FCPose achieves 64.8% APkp on the COCO dataset | Mao et al. [58]

4. Research approach

The systematic review, as addressed above, has been performed for this research. It was initially built on retrieving the relevant research papers from the Scopus database. The use of this database is robust for optimizing search results for a systematic review [65]. A series of keywords, such as "deep learning", "construction safety" or "safety management" or "safety", and "construction", were inputted into the search engine of the selected database. Keywords were applied in the Title/Abstract/Keyword (T/A/K) field of Scopus. In this research, the filters were applied as follows: (1) peer-reviewed articles; (2) language: English; and (3) timespan: between 2012 and 2021.

To avoid missing influential papers, the top five journals in terms of the number of their articles relating to deep learning-based construction safety were further searched to filter out the papers containing "deep learning" in their titles, keywords or abstracts. These include Automation in Construction (AIC), Advanced Engineering Informatics (AEI), ASCE Journal of Construction Engineering and Management (JCEM), ASCE Journal of Computing in Civil Engineering (JCCE), and Computer-Aided Civil and Infrastructure Engineering (CACIE). The filtered papers were then manually reviewed to determine their relevance. In summary, a total of 185 articles were identified for our subsequent analysis.

Fig. 3 illustrates the distribution of the 185 bibliographic records from 2012 to 2021. There is a trend that the number of publications on this topic has significantly increased over the past five years. Accordingly, we conclude that deep-learning-based safety management has become and continues to be a significant research theme and, therefore, a plethora of scholarly works is expected in this area in the near future.
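The screening criteria described above can be expressed as a reproducible filter. The sketch below is an illustration, not the authors' actual search script: the Scopus-style query string and the record field names are assumptions made for demonstration.

```python
# Illustrative Scopus-style query mirroring the reported keywords and filters
# (field syntax assumed; the authors' exact query string is not given).
QUERY = ('TITLE-ABS-KEY("deep learning" AND ("construction safety" OR '
         '"safety management" OR "safety") AND "construction") '
         'AND PUBYEAR > 2011 AND PUBYEAR < 2022')

def passes_screen(record):
    """Keep peer-reviewed English articles from 2012-2021 whose
    title/abstract/keywords mention deep learning."""
    text = " ".join(
        [record["title"], record["abstract"], " ".join(record["keywords"])]
    ).lower()
    return (record["peer_reviewed"]
            and record["language"] == "English"
            and 2012 <= record["year"] <= 2021
            and "deep learning" in text)

paper = {"title": "Deep learning for hard hat detection", "abstract": "...",
         "keywords": ["construction safety"], "peer_reviewed": True,
         "language": "English", "year": 2019}
```

Applying such a predicate to the retrieved bibliographic records automates the first screening pass; the relevance judgment is still manual, as the section notes.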
Fig. 3. The year profile of publications.

Research collaboration describes collaborative research works undertaken to achieve the common goal of producing new scientific knowledge. A collaboration analysis in this study was run to illustrate

As noted from the above sections, deep-learning-related techniques relate to data analytics. Therefore, they have the potential to accurately detect unsafe behaviors and unsafe conditions. Considering this point of view, we examine the progress of deep-learning-based data analytics for safety in construction.
5.1. Image/video-based data analytics

Exploring the relevant literature has identified that previous studies focused on using image/video to manage construction safety. They primarily concentrated on: (1) prevention of people's unsafe behavior; and (2) structural health monitoring.

5.1.1. Prevention of people's unsafe behavior
Research has shown that approximately 88% of accidents are caused by people's unsafe behavior [66,67]. Accurate detection of people's unsafe behavior in real-time is thus pivotal for managing safety behavior. As demonstrated above, deep-learning-based computer vision approaches have been adopted by past studies to identify people's unsafe behavior automatically. According to the methods used for behavior monitoring, we categorized such studies into: (1) object detection; (2) proximity measurement; (3) activity-based; and (4) semantic reasoning.

5.1.1.1. Object detection. Studies suggest that a considerable number of injuries occurred as a result of the absence of personal protective equipment (PPE) [68]. Detecting whether workers wear PPE as required has therefore been a hotspot, including the detection of the hard hat [69,70,71], vest [69] and safety harness [72]. Emerging from Nath et al.'s research [69], for example, are three deep learning approaches built on You-Only-Look-Once (YOLO) v3 for determining whether a person wears a hard hat, vest, or both properly. The result showed that the one-step end-to-end approach ensures the best performance, with a mean average precision of 72.3% and a processing speed of 11 frames per second.

5.1.1.2. Proximity measurement. An improper spatial relationship on construction sites engenders risks for people who enter a hazard area. Fang et al. [73] utilized Mask R-CNN to detect people traversing structural supports according to whether their masks overlap. Continuing with this theme, more studies have developed deep-learning-based computer vision methods to detect people entering dynamic hazard areas, such as the working area of heavy vehicles, which can lead to struck-by accidents [74,75,8,76]. For instance, Luo et al. [8] applied YOLO v2 to detect plant and people and then measured the distance between them based on perspective transformation. Similarly, Yan et al. [76] estimated the 3D spatial relationship between people and heavy vehicles using 3D bounding boxes reconstructed from video recorded by a monocular camera, aiming to address the spatial relationship distortion caused by two-dimensional, pixel-based estimation.

5.1.1.3. Activity-based. People-activity monitoring enables automatic detection of unsafe behavior, which relies on mining spatiotemporal information in video frames [14,77,78,79]. For example, Ding et al. [14] applied a CNN model to extract visual features from videos and utilized an LSTM model to sequence the learned features for unsafe behavior recognition, which outperformed descriptor-based methods. In response to the characteristics of far-field surveillance videos, Luo et al. [78] adapted a deep action recognition method with a fusion strategy to recognize workers' actions and then proposed a Bayesian nonparametric hidden semi-Markov model to infer people's activities based on action sequences.

5.1.1.4. Semantic reasoning. A task-oriented model developed based on predetermined rules cannot identify multiple types of unsafe behaviors, because it is unable to accommodate adjustments of safety regulations. To solve this issue, the semantic-reasoning-based approach has been developed [80,81,82,83]. For example, Tang et al. [84] proposed a human-object interaction (HOI) recognition model to detect spatial relationships among objects in digital images so that hazards implied in the image can be determined and detected. Continuing with this, Fang et al. [80] combined computer vision algorithms with ontology to develop a knowledge graph to understand the construction scene and then infer whether unsafe behavior exists. Likewise, Pan et al. [81] integrated zero-shot learning techniques with a knowledge graph to improve the accuracy and timeliness of knowledge graph updates. Table 2 summarizes prior key studies on computer vision and deep-learning-based unsafe behavior recognition.

5.1.2. Structural health monitoring
Structural health monitoring (SHM) is a strategy and process for damage identification and characterization of engineering structures [89]. Structural damage refers to the change of structural material parameters and geometric characteristics, which can lead to structural failure as well as significant safety accidents under unforeseen
Table 3
Prior works on deep learning-based structural health monitoring.

Application/Task | Description | Type of data | Reference
Crack detection | 1) Whether there are cracks | Image, video | [23,90]
Crack detection | 2) Locate cracks in the image | Image, video | [91,92]
Crack detection | 3) Geometric measurement | Image, video | [93,94,95,96,97,98,99]
Pothole detection | 1) Locate potholes in the image | Image, video | [92]
Pothole detection | 2) Geometric measurement | Image, video, point cloud | [100,101]
Corrosion detection | 1) Locate corrosion in the image | Image, video | [102]
Bolt-loosening detection | 1) Locate bolts in the image and estimate the rotational angles | Image, video | [103]
Sewer pipe defect detection | 1) Whether there are defects | Image, video | [104]
Sewer pipe defect detection | 2) Locate defects in the image | Image, video | [105]
Sewer pipe defect detection | 3) Geometric measurement | Image, video | [106]
Structure strength estimation | 1) Concrete strength evaluation | Image | [107,108]
Structure damage diagnosis | 1) Provide fine-detailed defect information for the structure | Vibration signal | [109,110,111,112]

5.2. Text-based data analytics

Safety reports record valuable sources of information about how and why an event occurred. The existing attempts to analyze such reports depend on manual processing, while the unstructured free-text format can make the work lengthy and challenging. With the development of deep learning in NLP, there is a trend to explore deep learning techniques for automatic safety report analysis. Examples of typical application scenarios are shown in Table 4.

Table 4
Prior works on deep learning-based safety report analysis.

Application/Task | Description | Method | Reference
Accident causes analysis | 1) Classify accident causes | Word2Vec skip-gram model and a hybrid structured deep neural network | [113]
Accident causes analysis | 1) Classify accident causes | Convolutional bidirectional long short-term memory model | [114]
Accident causes analysis | 2) Identify the most impactful hazards and behaviors | CNN and hierarchical attention networks | [115]
Hazard classification | 1) Classify hazard types | BERT | [116]
Hazard classification | 1) Classify hazard types | Word embedding, CNN, and LDA | [117]
Hazard classification | 1) Classify hazard types | LDA, CNN, word co-occurrence network and word cloud technology | [118]
Accident information extraction | 1) Critical information extraction from accident reports | BiLSTM-CRF | [119]
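The report-classification tasks listed in Table 4 can be illustrated with a minimal keyword-scoring sketch. The hazard categories and cue words below are hypothetical examples; the cited studies learn such associations with CNN or BERT classifiers rather than fixed lists.

```python
# Hypothetical hazard categories and cue words for illustration only;
# production systems learn these associations from labeled reports.
HAZARD_CUES = {
    "fall": {"ladder", "scaffold", "roof", "edge", "harness"},
    "struck-by": {"excavator", "crane", "vehicle", "load", "swing"},
    "electrical": {"cable", "voltage", "wire", "shock"},
}

def classify_report(text):
    """Score each hazard type by cue-word overlap and return the best match."""
    words = set(text.lower().split())
    scores = {label: len(words & cues) for label, cues in HAZARD_CUES.items()}
    return max(scores, key=scores.get)

label = classify_report("Worker fell from a scaffold while the harness was unclipped")
```

Even this crude baseline shows why deep models are preferred: cue lists cannot handle paraphrase, negation, or multi-cause narratives, which the neural methods in Table 4 address.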
The analysis of accident causes extracted from accident reports is useful for understanding, predicting, and preventing the occurrence of construction accidents. In this instance, Zhang et al. [114] proposed a convolutional bidirectional long short-term memory-based method to classify construction accident narratives according to the type of accident cause. Moreover, Baker et al. [115] applied hierarchical attention networks to learn valid injury precursors from accident reports to identify the most significant hazards and behaviors in a fully automated and data-driven manner. In contrast, classifying safety reports based on hazard types has received attention [116,117,118]. For example, Zhong et al. [118] applied a Latent Dirichlet Allocation algorithm to identify hazard topics, then used a convolutional neural network algorithm to classify hazard records, and finally generated a word co-occurrence network to determine the interrelations between hazards and exploited word cloud technology to provide a visual overview of hazard records. In addition to classification tasks focusing on a single dimension, Feng and Chen [119] adopted the BiLSTM-CRF model to extract critical information from accident reports, including date, location, accident causes, type of accident fatality, and injury situations.

Despite the significant contributions made to text-based information extraction for safety, issues such as open information extraction, long-text-based information extraction and dynamic information updating remain open questions. Further, using the extracted information for safety decision-making remains challenging for research.

5.3. Non-visual sensor-based data analytics

Managing construction safety through the use of non-visual sensors is one of the foci in the existing literature. The research scope of relevant studies includes: (1) prevention of plant accidents; and (2) structural health monitoring.

5.3.1. Prevention of plant accidents
Although plant-related accidents can result in severe casualties and significant economic loss, the existing research on applying deep learning to manage such issues is limited. In essence, several studies have made headway in the operation status control of the shield machine [120,121,122,123]. For example, Zhou et al. [123] proposed a hybrid deep learning model composed of the wavelet transform, CNN, and LSTM to predict the shield machine's attitude and position during Tunnel Boring Machine (TBM) excavation. Their model outperformed three widely used predictive models (ARIMA, LSTM, and WLSTM) in terms of prediction accuracy. Subsequently, Zhang et al. [122] identified a cyber-physical system (CPS)-based hierarchical autonomous control scheme to support pressure balance control of slurry shield tunnel boring machines. Emerging from Zhang et al.'s [122] work is a deep learning model for the coordination-level controller. However, their results showed that its performance was not better than an autonomous control system based on the proposed switched model predictive controller.

In the case of plant, there has been a paucity of research examining its location and status. This is due to the difficulties in identifying mechanical parameters and extracting attributes. Moreover, AI-based plant status prediction using historical and real-time data remains a grand challenge in real projects.

5.3.2. Structural health monitoring
The vibration signal is another data source widely used for SHM; the underlying assumption is that the dynamic characteristics and responses will change when damage exists. In this instance, Sajedi and Liang [112] applied a fully convolutional encoder-decoder neural network for a vibration-based SHM technique and performed semantic damage segmentation in a grid environment framework available for large-scale SHM. The proposed model achieved accuracies of 96.3% and 93.2% in locating damage and classifying sixteen damage mechanisms, respectively.

5.4. Multi-modal-based data analytics

Multiple data fusion is the process of integrating data from different sources to produce more consistent, accurate, and useful information than that collected from a singular 'supplier'. Some research has tended to focus on how multi-modal data can be fused and used to manage safety in construction, attempting to leverage reports to improve the accuracy of managing safety. For example, Fang et al. [124] applied a matching approach by fusing image features and text features to detect people's unsafe behavior from images. While headway is being made to manage safety by leveraging reports to improve accuracy, multi-source data has not been fully utilized and fused for safety.

6. Research challenges

Deep learning has been an important technique to support safety management in the construction industry. Nonetheless, there are challenges that hinder its applications.

6.1. Lack of a high-quality database

Deep learning for safety management in the construction industry requires an extensive database for model training. However, current databases are limited to specific tasks such as worker and excavator detection [125,126]. One of the main challenges hindering the creation of a high-quality database is establishing ground-truth labels.

Ground-truth labels are critical to the performance of deep learning methods. Nevertheless, it is difficult to obtain such high-quality labels due to the limited availability of experienced safety engineers as well as the huge manual ground-truth labeling workload. For example, establishing the ground-truth labels of psychological signals used for occupational safety requires people to complete questionnaires, which can be subjective and error-prone [127,128]. Likewise, supervised deep learning-based computer vision algorithms for hazard identification need thousands of images and corresponding labels. Such training processes require human experts to label images and indicate the relationships between construction resources (e.g., workers, equipment and the environment in the scenes), which may result in 'false alarms' in those manual labels. The false labels can significantly mislead the machine learning process and generate an issue whereby computer vision models produce false hazard detections.

Collecting quality data in complex and dynamic working conditions is another challenge for high-quality database creation. The harsh work environment and people's perspiration and movement can adversely affect the signal quality of the physiological sensors for occupational health and safety monitoring. Put simply, a model that performs well in the laboratory may not work effectively on a construction site in practice.

Strategies such as data augmentation (e.g., noise addition, generative models) and transfer learning can contribute to addressing and resolving the above issues. Further research into augmentation techniques would produce more extensive data sets, thereby enhancing the performance of deep learning. In addition, semi-supervised learning or unsupervised learning is also a promising way to accommodate the challenges above, because they require smaller labeled samples for training. As such, future work is needed to develop new semi-supervised or unsupervised learning algorithms for safety.

6.2. Limited ability of deep learning models

Extant studies tend to use a relatively small database to train deep learning. Therefore, the generalization of the models needs to be enhanced. Generalization describes the ability of a trained machine
learning model to predict unseen samples accurately during the model testing phase [129]. Thus, good generalization is needed for any practical deep learning algorithm. Normally, the training and testing samples are different, and the performance of a deep learning model during the testing stage is worse than that in the training process.

Multiple factors may determine the generalization performance of deep learning models, such as cross-site and cross-device variations. For example, there are many wristbands for psychological signal collection on the market. Although many of them can measure PPG signals, the signal quality varies by brand. Even for devices from the same company, the quality of signals varies across different generations of models. Thus, signal processing and machine learning approaches need to handle cross-device variations effectively. Transfer learning techniques can be used to alleviate the above limitations by making use of data or machine learning models from an auxiliary domain or task.

Compared with traditional machine learning algorithms, the superiority of deep learning lies in the elimination of feature engineering design. However, deep learning models have been identified as being "black boxes". In this instance, it is difficult for engineers to understand the model's learning processes and how a detection is made, and to identify the parameters that need to be adjusted for improving detection accuracy. To address the above challenges, there is a need for deep learning models to integrate engineering knowledge for characterizing engineering.

6.3. Limited to specific applications

Whilst significant development has been made to identify hazards using deep learning, it is essential to realize that these approaches are specific to certain safety tasks only, such as defect (e.g., crack) detection or the detection of not wearing PPE. Put simply, a wide range of problems has not been well addressed by using deep learning techniques. With the advancement of big data, cloud computing, deep learning, and other informatics technologies, new methods need to be developed so that such techniques can be further developed and utilized to mine data for safety management in the construction industry (Fig. 6).

7.1. Construction of dynamic multi-modal knowledge graph

Currently, safety-related information is stored by different stakeholders in the form of heterogeneous data (e.g., images, videos and texts). The data is scattered in multiple systems with diverse, complex and isolated characteristics. The value of a single data source is not prominent. To address the emerging challenges, future research should focus on developing new solutions to integrate and use multi-source data for knowledge extraction and decision-making.

One of the feasible solutions for data integration is developing a multi-modal knowledge graph, where entities and their interdependencies (relationships) are represented and captured as nodes and edges, respectively [130]. Both the nodes and edges in a knowledge graph are high-dimensional. As addressed above, Fang et al. [80] can be regarded as pioneers in embedding computer vision into ontology to establish a knowledge graph for unsafe behavior recognition that is underpinned by rule-based reasoning. Multiple unsafe behaviors in the scene can be identified at one time; however, the developed knowledge graph is small-scale with very limited reasoning ability.

As noted from Fig. 6, the first step for constructing a multi-modal knowledge graph is to develop a domain ontology. We have identified that safety codes, historical safety reports, and expert experience are pivotal for enabling the ontology (Fig. 6).

The second step is extracting knowledge from multi-source data. A deep-learning-based computer vision model is robust in extracting knowledge from images: understanding objects and object-object interactions, and implying unsafe events (from images/videos). Similarly, the NLP technique should be developed to identify entities and entity-entity/attribute relationships. Then, different types of knowledge graph can be identified from the extracted knowledge accordingly. The well-defined entities enable filtering relevant information from multi-source data and constructing precise knowledge graphs [130].
needed to discover knowledge from multi-source data and train them to The final stage of the Step 2 relates to the fusion of different
detect on-site safety issues in construction within a wider context. knowledge graph. Put simply, it is regarding how to link different
knowledge graphs and fuse them. We have constructed different
7. Recommendation for future work knowledge graphs based on the previous discussion, such as the image/
video knowledge and safety report-based event knowledge graphs.
Addressing the challenges discussed above requires future research. Additionally, the update of the multi-modal knowledge graph should
Noteworthy, existing studies on deep-learning-based safety are also be considered, as multi-source data can be collected using sensors
detection-focused and have been limited to certain safety-related prob (e.g., CCTV) and manual inspection daily. Hence, accurately updating
lems. With this in mind, future research is needed to be conducted the knowledge graph and reducing the “noise” of the data will be
within a wider context so that outcomes generated from deep learning research challenges in our future work.
can be learned by site managers to better manage safety issues and then
identify and take actions to control and minimize them. Accordingly, we
have proposed a framework to illustrate how deep learning can be
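The extraction-and-fusion steps above can be sketched minimally in Python; the triples, entity names, and the single consistency rule below are invented for illustration and are not taken from any cited system:

```python
# Minimal sketch of fusing knowledge extracted from two modalities into one
# knowledge graph. Triples are (head, relation, tail); all names are
# illustrative placeholders, not drawn from the review.

# Knowledge a vision model might extract from site images/videos.
vision_triples = [
    ("worker_01", "is_a", "worker"),
    ("worker_01", "not_wearing", "hardhat"),
    ("worker_01", "near", "excavator_03"),
]

# Knowledge an NLP pipeline might extract from safety codes/reports.
text_triples = [
    ("worker", "must_wear", "hardhat"),
    ("excavator", "has_hazard", "struck_by"),
]

def fuse(*triple_sets):
    """Fuse several knowledge graphs: the union of their triples keyed by
    entity, so facts about the same entity from different sources align."""
    graph = {}
    for triples in triple_sets:
        for head, rel, tail in triples:
            graph.setdefault(head, set()).add((rel, tail))
    return graph

kg = fuse(vision_triples, text_triples)

def violations(graph):
    """Trivial rule-based reasoning over the fused graph: an instance
    violates a 'must_wear' requirement stated for its class in text."""
    found = []
    for entity, edges in graph.items():
        classes = {t for r, t in edges if r == "is_a"}
        required = {t for c in classes for r, t in graph.get(c, ()) if r == "must_wear"}
        missing = {t for r, t in edges if r == "not_wearing"} & required
        for item in missing:
            found.append((entity, item))
    return found

print(violations(kg))  # [('worker_01', 'hardhat')]
```

The point of the sketch is the alignment step: an image-derived instance (`worker_01`) only becomes checkable against a text-derived rule once both subgraphs share the class node `worker`.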
7.2. Knowledge graph-based decision-making for safety

With an accurate multi-modal knowledge graph, the final step of the proposed framework (Step 3) concerns making effective and efficient decisions to reduce or prevent accidents. Decision-making on construction sites needs to consider the different relevant stakeholders (e.g., the construction unit and supervision). In addition to identifying and assessing hazards from the as-built multi-modal knowledge graph of the construction site, future work on decision-making needs to focus on: (1) dynamic prediction of hazards on construction sites; (2) recommendation of knowledge for hazard prevention; and (3) recommendation of knowledge for hazard intervention (e.g., worker safety training).

We suggest that hazards on construction sites can be predicted early, as the characteristics and patterns of an unsafe event can be observed before it occurs. For example, when a person enters the dangerous area of an excavator, their direction of movement is toward the excavator and the distance between the worker and the machine keeps decreasing. Based on the as-built multi-modal knowledge graph, the characteristics and patterns of a hazard can be extracted and analyzed so that hazards can be predicted early. Nevertheless, the following research challenges need to be considered: (1) how can an event's characteristics and patterns be defined before it occurs? (2) how can the hazard be predicted from the extracted information? and (3) how early can the hazard be predicted?

Knowledge can also be recommended to different stakeholders so that they can take actions to prevent accidents, as per the as-built multi-modal knowledge graph. By considering the type of hazard, the multi-modal knowledge graph can be used to recommend possible solutions to prevent hazards. In this instance, it is essential for future research to address three questions: (1) how to define the hazard from the perspective of knowledge? (2) how to design a recommendation system that recommends knowledge to stakeholders based on the multi-modal knowledge graph? and (3) how to evaluate the effectiveness and feasibility of the recommendation system for resolving hazards?

8. Conclusion

We have presented a systematic review in this paper to understand the developments of deep-learning-related studies and their applications to construction safety, focusing on four key areas according to different types of data: (1) image/video-based; (2) text-based; (3) non-visual sensor-based; and (4) multi-modal-based. Furthermore, our review has provided insight into the challenges confronting research and practice. These involve: (1) lack of a high-quality database; (2) inadequate ability of deep learning models; and (3) limitation to specific applications. To further manage safety through the use of deep learning, we have identified that future studies need to focus on utilizing multi-source data and extracting knowledge from the data. In doing so, attention also needs to be paid to: (1) the construction of a dynamic multi-modal knowledge graph; and (2) knowledge graph-based decision-making for safety management. To this end, the contributions of our research are threefold, as we have: (1) provided novel insights into examining the development of deep learning for studying safety in construction from the perspective of data utilization; (2) identified the research challenges of adopting deep learning; and (3) presented new research opportunities to support and facilitate deep learning-based data analytics for safety in construction.

Declaration of Competing Interest

This manuscript has not been published or presented elsewhere in part or in entirety and is not under consideration by another journal. We have read and understood your journal's policies, and we believe that neither the manuscript nor the study violates any of these. There are no conflicts of interest to declare.

Acknowledgements

The authors would like to acknowledge the financial support provided by the National Natural Science Foundation of China (Grant Nos. 51978302, U21A20151, 71732001) and the China Scholarship Council.

References

[1] S. Guo, L. Ding, Y. Zhang, M.J. Skibniewski, K. Liang, Hybrid recommendation approach for behavior modification in the Chinese construction industry, J. Constr. Eng. Manag. 145 (6) (2019) 04019035, https://doi.org/10.1061/(ASCE)CO.1943-7862.0001665.
[2] P.E.D. Love, L. Ika, B. Luo, Y. Zhou, B. Zhong, W. Fang, Rework, failures, and unsafe behavior: moving toward an error management mindset in construction, IEEE Trans. Eng. Manag. (2020) 1–13, https://doi.org/10.1109/TEM.2020.2982463. In press.
[3] P.E. Love, P. Teo, J. Smith, F. Ackermann, Y. Zhou, The nature and severity of workplace injuries in construction: engendering operational benchmarking, Ergonomics 62 (10) (2019) 1273–1288, https://doi.org/10.1080/00140139.2019.1644379.
[4] Health and Safety Executive, Workplace Fatal Injuries in Great Britain 2021, 2021. http://www.hse.gov.uk/statistics/pdf/fatalinjuries.pdf (accessed February 14, 2022).
[5] U.S. Bureau of Labor Statistics, Census of Fatal Occupational Injuries. https://www.bls.gov/iif/oshwc/cfoi/cftb0330.htm, 2019 (accessed February 14, 2022).
[6] Ministry of Housing and Urban-Rural Development of the People's Republic of China, Circular of the General Office of the Ministry of Housing and Urban-Rural Development on the production safety accidents of housing and municipal engineering in 2019. https://www.mohurd.gov.cn/gongkai/fdzdgknr/tzgg/202006/20200624_246031.html, 2020 (accessed February 14, 2022).
[7] W. Fang, L. Ding, P.E.D. Love, H. Luo, H. Li, F. Peña-Mora, B. Zhong, C. Zhou, Computer vision applications in construction safety assurance, Autom. Constr. 110 (2020) 103013, https://doi.org/10.1016/j.autcon.2019.103013.
[8] H. Luo, J. Liu, W. Fang, P.E. Love, Q. Yu, Z. Lu, Real-time smart video surveillance to manage safety: a case study of a transport mega-project, Adv. Eng. Inform. 45 (2020) 101100, https://doi.org/10.1016/j.aei.2020.101100.
[9] C. Zhou, H. Luo, W. Fang, R. Wei, L. Ding, Cyber-physical-system-based safety monitoring for blind hoisting with the internet of things: a case study, Autom. Constr. 97 (2019) 138–150, https://doi.org/10.1016/j.autcon.2018.10.017.
[10] S. Tang, D.R. Shelden, C.M. Eastman, P. Pishdad-Bozorgi, X. Gao, A review of building information modeling (BIM) and the internet of things (IoT) devices integration: present status and future trends, Autom. Constr. 101 (2019) 127–139, https://doi.org/10.1016/j.autcon.2019.01.020.
[11] C.J. Anumba, A. Akanmu, X. Yuan, C. Kan, Cyber-physical systems development for construction applications, Front. Eng. Manag. 8 (1) (2021) 72–87, https://doi.org/10.1007/s42524-020-0130-4.
[12] T. Hartmann, A. Trappey, Advanced engineering informatics-philosophical and methodological foundations with examples from civil and construction engineering, Develop. Built Environ. 4 (2020) 100020, https://doi.org/10.1016/j.dibe.2020.100020.
[13] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444, https://doi.org/10.1038/nature14539.
[14] L. Ding, W. Fang, H. Luo, P.E. Love, B. Zhong, X. Ouyang, A deep hybrid learning model to detect unsafe behavior: integrating convolution neural networks and long short-term memory, Autom. Constr. 86 (2018) 118–124, https://doi.org/10.1016/j.autcon.2017.11.002.
[15] Q. Fang, H. Li, X. Luo, L. Ding, H. Luo, T.M. Rose, W. An, Detecting non-hardhat-use by a deep learning method from far-field surveillance videos, Autom. Constr. 85 (2018) 1–9, https://doi.org/10.1016/j.autcon.2017.09.018.
[16] B. Zhong, H. Wu, L. Ding, P.E. Love, H. Li, H. Luo, L. Jiao, Mapping computer vision research in construction: developments, knowledge gaps and implications for research, Autom. Constr. 107 (2019) 102919, https://doi.org/10.1016/j.autcon.2019.102919.
[17] T.D. Akinosho, L.O. Oyedele, M. Bilal, A.O. Ajayi, M.D. Delgado, O.O. Akinade, A.A. Ahmed, Deep learning in the construction industry: a review of present status and future innovations, J. Build. Eng. 32 (2020) 101827, https://doi.org/10.1016/j.jobe.2020.101827.
[18] L. Hou, H. Chen, G.K. Zhang, X. Wang, Deep learning-based applications for safety management in the AEC industry: a review, Appl. Sci. 11 (2) (2021) 821, https://doi.org/10.3390/app11020821.
[19] A. Pal, S.-H. Hsieh, Deep-learning-based visual data analytics for smart construction management, Autom. Constr. 131 (2021) 103892, https://doi.org/10.1016/j.autcon.2021.103892.
[20] A. Esteva, A. Robicquet, B. Ramsundar, V. Kuleshov, M. DePristo, K. Chou, C. Cui, G. Corrado, S. Thrun, J. Dean, A guide to deep learning in healthcare, Nat. Med. 25 (1) (2019) 24–29, https://doi.org/10.1038/s41591-018-0316-z.
[21] S. Ramos, S. Gehrig, P. Pinggera, U. Franke, C. Rother, Detecting unexpected obstacles for self-driving cars: fusing deep learning and geometric modeling, in: 2017 IEEE Intelligent Vehicles Symposium (IV), 2017, pp. 1025–1032, https://doi.org/10.1109/IVS.2017.7995849.
[22] D.W. Otter, J.R. Medina, J.K. Kalita, A survey of the usages of deep learning for natural language processing, IEEE Trans. Neural Networks Learn. Syst. 32 (2) (2020) 604–624, https://doi.org/10.1109/TNNLS.2020.2979670.
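The excavator example lends itself to a small illustrative sketch; the coordinates, thresholds, and warning rule below are assumptions made for illustration only, not a proposed system:

```python
# Illustrative sketch of early hazard prediction from tracked positions: a
# worker steadily closing in on an excavator's danger zone triggers a warning
# before the zone is actually entered. All thresholds are invented.
import math

EXCAVATOR = (0.0, 0.0)   # machine position (site coordinates, metres)
DANGER_RADIUS = 3.0      # radius of the danger zone around the machine
WARN_HORIZON = 3         # consecutive approaching steps that trigger a warning

def predict_entry(track):
    """Warn when the worker has moved toward the machine for WARN_HORIZON
    consecutive steps while still outside the danger zone."""
    dists = [math.dist(p, EXCAVATOR) for p in track]
    approaching = 0
    for i in range(1, len(dists)):
        approaching = approaching + 1 if dists[i] < dists[i - 1] else 0
        if approaching >= WARN_HORIZON and dists[i] > DANGER_RADIUS:
            return i  # index of the frame where the early warning fires
    return None

# Worker walking from (10, 0) straight toward the excavator, 1 m per step.
track = [(10.0 - step, 0.0) for step in range(8)]
print(predict_entry(track))  # 3: warns at 7 m, well before entry at 3 m
```

A real predictor would replace this hand-written rule with patterns mined from the knowledge graph, but the sketch shows what "predicting a hazard early" means operationally: firing on the approach pattern, not on zone entry.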
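A minimal sketch of such knowledge recommendation follows; the knowledge base, tags, and ranking rule are invented placeholders (a real system would query the multi-modal knowledge graph rather than a list):

```python
# Minimal sketch of recommending safety knowledge by hazard type, with a
# stakeholder tag used only to break ties. All entries are invented.

knowledge_base = [
    {"id": "K1", "tags": {"fall_from_height"}, "action": "install guardrails"},
    {"id": "K2", "tags": {"fall_from_height", "training"}, "action": "harness training"},
    {"id": "K3", "tags": {"struck_by"}, "action": "set exclusion zone around plant"},
]

def recommend(hazard_type, stakeholder_tags=frozenset()):
    """Rank knowledge entries: a hazard-type match is mandatory; overlap with
    the stakeholder's tags (e.g. 'training' for safety trainers) ranks first."""
    matches = [k for k in knowledge_base if hazard_type in k["tags"]]
    return sorted(matches, key=lambda k: -len(k["tags"] & stakeholder_tags))

for entry in recommend("fall_from_height", {"training"}):
    print(entry["id"], "->", entry["action"])
# K2 -> harness training
# K1 -> install guardrails
```

Even this toy version surfaces the three open questions in order: what a "hazard type" is (here a bare tag), how matching and ranking work (here set overlap), and how to judge whether the ranked output actually helps the stakeholder.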
[23] K. Gopalakrishnan, S.K. Khaitan, A. Choudhary, A. Agrawal, Deep convolutional neural networks with transfer learning for computer vision-based data-driven pavement distress detection, Constr. Build. Mater. 157 (2017) 322–330, https://doi.org/10.1016/j.conbuildmat.2017.09.110.
[24] L. Torrey, J. Shavlik, Transfer learning, in: Handbook of Research on Machine Learning Applications and Trends: Algorithms, Methods, and Techniques, IGI Global (2010) 242–264, https://doi.org/10.4018/978-1-60566-766-9.ch011.
[25] S.J. Pan, Q. Yang, A survey on transfer learning, IEEE Trans. Knowl. Data Eng. 22 (10) (2009) 1345–1359, https://doi.org/10.1109/TKDE.2009.191.
[26] J. Han, S. Kang, Active learning with missing values considering imputation uncertainty, Knowl.-Based Syst. 224 (2021) 107079, https://doi.org/10.1016/j.knosys.2021.107079.
[27] J. Kim, J. Hwang, S. Chi, J. Seo, Towards database-free vision-based monitoring on construction sites: a deep active learning approach, Autom. Constr. 120 (2020) 103376, https://doi.org/10.1016/j.autcon.2020.103376.
[28] C. Shorten, T.M. Khoshgoftaar, A survey on image data augmentation for deep learning, J. Big Data 6 (1) (2019) 1–48, https://doi.org/10.1186/s40537-019-0197-0.
[29] W. Fang, P.E. Love, H. Luo, L. Ding, Computer vision for behaviour-based safety in construction: a review and future directions, Adv. Eng. Inform. 43 (2020) 100980, https://doi.org/10.1016/j.aei.2019.100980.
[30] A. Esteva, B. Kuprel, R.A. Novoa, J. Ko, S.M. Swetter, H.M. Blau, S. Thrun, Dermatologist-level classification of skin cancer with deep neural networks, Nature 542 (7639) (2017) 115–118, https://doi.org/10.1038/nature21056.
[31] H.A. Haenssle, C. Fink, R. Schneiderbauer, F. Toberer, T. Buhl, A. Blum, A. Kalloo, A.B.H. Hassen, L. Thomas, A. Enk, Man against machine: diagnostic performance of a deep learning convolutional neural network for dermoscopic melanoma recognition in comparison to 58 dermatologists, Ann. Oncol. 29 (8) (2018) 1836–1842, https://doi.org/10.1093/annonc/mdy166.
[32] A. Jamaludin, T. Kadir, A. Zisserman, SpineNet: automatically pinpointing classification evidence in spinal MRIs, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2016, pp. 166–175, https://doi.org/10.1007/978-3-319-46723-8_20.
[33] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal. Machine Intell. 39 (6) (2017) 1137–1149, https://doi.org/10.1109/TPAMI.2016.2577031.
[34] J. Dai, Y. Li, K. He, J. Sun, R-FCN: object detection via region-based fully convolutional networks, Adv. Neural Inf. Proces. Syst. 29 (2016). https://proceedings.neurips.cc/paper/2016/file/577ef1154f3240ad5b9b413aa7346a1e-Paper.pdf.
[35] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, A.C. Berg, SSD: single shot multibox detector, in: European Conference on Computer Vision, Springer, 2016, pp. 21–37, https://doi.org/10.1007/978-3-319-46448-0_2.
[36] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: unified, real-time object detection, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2016) 779–788, https://doi.org/10.1109/CVPR.2016.91.
[37] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2980–2988. https://openaccess.thecvf.com/content_ICCV_2017/papers/Lin_Focal_Loss_for_ICCV_2017_paper.pdf.
[38] M. Danelljan, A. Robinson, F.S. Khan, M. Felsberg, Beyond correlation filters: learning continuous convolution operators for visual tracking, in: European Conference on Computer Vision, Springer, 2016, pp. 472–488, https://doi.org/10.1007/978-3-319-46454-1_29.
[39] N. Wang, D.Y. Yeung, Learning a deep compact image representation for visual tracking, Adv. Neural Inf. Proces. Syst. 26 (2013) 809–817. https://proceedings.neurips.cc/paper/2013/file/dc6a6489640ca02b0d42dabeb8e46bb7-Paper.pdf.
[40] L. Wang, W. Ouyang, X. Wang, H. Lu, Visual tracking with fully convolutional networks, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3119–3127. https://openaccess.thecvf.com/content_iccv_2015/papers/Wang_Visual_Tracking_With_ICCV_2015_paper.pdf.
[41] H. Nam, B. Han, Learning multi-domain convolutional neural networks for visual tracking, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2016) 4293–4302. https://openaccess.thecvf.com/content_cvpr_2016/papers/Nam_Learning_Multi-Domain_Convolutional_CVPR_2016_paper.pdf.
[42] K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2961–2969. https://openaccess.thecvf.com/content_ICCV_2017/papers/He_Mask_R-CNN_ICCV_2017_paper.pdf.
[43] L.-C. Chen, A. Hermans, G. Papandreou, F. Schroff, P. Wang, H. Adam, MaskLab: instance segmentation by refining object detection with semantic and direction features, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2018) 4013–4022. https://openaccess.thecvf.com/content_cvpr_2018/papers/Chen_MaskLab_Instance_Segmentation_CVPR_2018_paper.pdf.
[44] D. Bolya, C. Zhou, F. Xiao, Y.J. Lee, YOLACT: real-time instance segmentation, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 9157–9166. http://openaccess.thecvf.com/content_ICCV_2019/papers/Bolya_YOLACT_Real-Time_Instance_Segmentation_ICCV_2019_paper.pdf.
[45] S.-E. Wei, V. Ramakrishna, T. Kanade, Y. Sheikh, Convolutional pose machines, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2016) 4724–4732. https://openaccess.thecvf.com/content_cvpr_2016/papers/Wei_Convolutional_Pose_Machines_CVPR_2016_paper.pdf.
[46] A. Newell, K. Yang, J. Deng, Stacked hourglass networks for human pose estimation, in: European Conference on Computer Vision, Springer, 2016, pp. 483–499, https://doi.org/10.1007/978-3-319-46484-8_29.
[47] Y. Chen, Z. Wang, Y. Peng, Z. Zhang, G. Yu, J. Sun, Cascaded pyramid network for multi-person pose estimation, Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (2018) 7103–7112. https://openaccess.thecvf.com/content_cvpr_2018/papers/Chen_Cascaded_Pyramid_Network_CVPR_2018_paper.pdf.
[48] K. Sun, B. Xiao, D. Liu, J. Wang, Deep high-resolution representation learning for human pose estimation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 5693–5703. https://openaccess.thecvf.com/content_CVPR_2019/papers/Sun_Deep_High-Resolution_Representation_Learning_for_Human_Pose_Estimation_CVPR_2019_paper.pdf.
[49] C. Zhu, F. Chen, Z. Shen, M. Savvides, Soft anchor-point object detection, in: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IX, 2020, pp. 91–107, https://doi.org/10.1007/978-3-030-58545-7_6.
[50] Z. Tian, C. Shen, H. Chen, T. He, FCOS: fully convolutional one-stage object detection, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019, pp. 9627–9636. https://openaccess.thecvf.com/content_ICCV_2019/papers/Tian_FCOS_Fully_Convolutional_One-Stage_Object_Detection_ICCV_2019_paper.pdf.
[51] C. Zhu, Y. He, M. Savvides, Feature selective anchor-free module for single-shot object detection, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 840–849. https://openaccess.thecvf.com/content_CVPR_2019/papers/Zhu_Feature_Selective_Anchor-Free_Module_for_Single-Shot_Object_Detection_CVPR_2019_paper.pdf.
[52] J. He, Z. Huang, N. Wang, Z. Zhang, Learnable graph matching: incorporating graph partitioning with deep feature learning for multiple object tracking, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 5299–5309. https://openaccess.thecvf.com/content/CVPR2021/papers/He_Learnable_Graph_Matching_Incorporating_Graph_Partitioning_With_Deep_Feature_Learning_CVPR_2021_paper.pdf.
[53] B. Shuai, A. Berneshawi, X. Li, D. Modolo, J. Tighe, SiamMOT: Siamese multi-object tracking, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 12372–12382. https://openaccess.thecvf.com/content/CVPR2021/papers/Shuai_SiamMOT_Siamese_Multi-Object_Tracking_CVPR_2021_paper.pdf.
[54] K.G. Quach, P. Nguyen, H. Le, T.D. Truong, C.N. Duong, M.T. Tran, K. Luu, DyGLIP: a dynamic graph model with link prediction for accurate multi-camera multiple object tracking, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 13784–13793. https://openaccess.thecvf.com/content/CVPR2021/papers/Quach_DyGLIP_A_Dynamic_Graph_Model_With_Link_Prediction_for_Accurate_CVPR_2021_paper.pdf.
[55] X. Zhang, H. Xu, H. Mo, J. Tan, C. Yang, L. Wang, W. Ren, DCNAS: densely connected neural architecture search for semantic image segmentation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 13956–13967. https://openaccess.thecvf.com/content/CVPR2021/papers/Zhang_DCNAS_Densely_Connected_Neural_Architecture_Search_for_Semantic_Image_Segmentation_CVPR_2021_paper.pdf.
[56] Y. Yang, B. Lai, S. Soatto, DyStaB: unsupervised object segmentation via dynamic-static bootstrapping, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 2826–2836. https://openaccess.thecvf.com/content/CVPR2021/papers/Yang_DyStaB_Unsupervised_Object_Segmentation_via_Dynamic-Static_Bootstrapping_CVPR_2021_paper.pdf.
[57] Y. Yuan, S.E. Wei, T. Simon, K. Kitani, J. Saragih, SimPoE: simulated character control for 3D human pose estimation, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 7159–7169. https://openaccess.thecvf.com/content/CVPR2021/papers/Yuan_SimPoE_Simulated_Character_Control_for_3D_Human_Pose_Estimation_CVPR_2021_paper.pdf.
[58] W. Mao, Z. Tian, X. Wang, C. Shen, FCPose: fully convolutional multi-person pose estimation with dynamic instance-aware convolutions, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 9034–9043. https://openaccess.thecvf.com/content/CVPR2021/papers/Mao_FCPose_Fully_Convolutional_Multi-Person_Pose_Estimation_With_Dynamic_Instance-Aware_Convolutions_CVPR_2021_paper.pdf.
[59] N. Indurkhya, F.J. Damerau, Handbook of Natural Language Processing, CRC Press, 2010, https://doi.org/10.1201/9781420085938.
[60] Y. Zou, A. Kiviniemi, S.W. Jones, Retrieving similar cases for construction project risk management using natural language processing techniques, Autom. Constr. 80 (2017) 66–76, https://doi.org/10.1016/j.autcon.2017.04.003.
[61] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, BERT: pre-training of deep bidirectional transformers for language understanding, arXiv preprint arXiv:1810.04805, 2018. https://arxiv.org/pdf/1810.04805.pdf.
[62] Z. Huang, W. Xu, K. Yu, Bidirectional LSTM-CRF models for sequence tagging, arXiv preprint arXiv:1508.01991, 2015. https://arxiv.org/pdf/1508.01991.pdf.
[63] J. Guo, Y. Fan, Q. Ai, W.B. Croft, A deep relevance matching model for ad-hoc retrieval, in: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, 2016, pp. 55–64, https://doi.org/10.1145/2983323.2983769.
[64] H. Zamani, M. Dehghani, W.B. Croft, E. Learned-Miller, J. Kamps, From neural re-ranking to neural ranking: learning a sparse representation for inverted indexing, in: Proceedings of the 27th ACM International Conference on Information and Knowledge Management, 2018, pp. 497–506, https://doi.org/10.1145/3269206.3271800.
[65] W.M. Bramer, M.K. Rethlefsen, J. Kleijnen, O.H. Franco, Optimal database combinations for literature searches in systematic reviews: a prospective exploratory study, Syst. Rev. 6 (1) (2017) 1–12, https://doi.org/10.1186/s13643-017-0644-y.
[66] H.W. Heinrich, Industrial Accident Prevention: A Scientific Approach, Second edition, 1941, https://doi.org/10.2105/ajph.22.1.119-b.
[67] H. Li, M. Lu, S.-C. Hsu, M. Gray, T. Huang, Proactive behavior-based safety management for construction safety improvement, Saf. Sci. 75 (2015) 107–117, https://doi.org/10.1016/j.ssci.2015.01.013.
[68] H. Li, X. Li, X. Luo, J. Siebert, Investigation of the causality patterns of non-helmet use behavior of construction workers, Autom. Constr. 80 (2017) 95–103, https://doi.org/10.1016/j.autcon.2017.02.006.
[69] N.D. Nath, A.H. Behzadan, S.G. Paal, Deep learning for site safety: real-time detection of personal protective equipment, Autom. Constr. 112 (2020) 103085, https://doi.org/10.1016/j.autcon.2020.103085.
[70] J. Shen, X. Xiong, Y. Li, W. He, P. Li, X. Zheng, Detecting safety helmet wearing on construction sites with bounding-box regression and deep transfer learning, Comp. Aided Civil Infrastruct. Eng. 36 (2) (2021) 180–196, https://doi.org/10.1111/mice.12579.
[71] J. Wu, N. Cai, W. Chen, H. Wang, G. Wang, Automatic detection of hardhats worn by construction personnel: a deep learning approach and benchmark dataset, Autom. Constr. 106 (2019) 102894, https://doi.org/10.1016/j.autcon.2019.102894.
[72] Q. Fang, H. Li, X. Luo, L. Ding, H. Luo, C. Li, Computer vision aided inspection on falling prevention measures for steeplejacks in an aerial environment, Autom. Constr. 93 (2018) 148–164, https://doi.org/10.1016/j.autcon.2018.05.022.
[73] W. Fang, B. Zhong, N. Zhao, P.E. Love, H. Luo, J. Xue, S. Xu, A deep learning-based approach for mitigating falls from height with computer vision: convolutional neural network, Adv. Eng. Inform. 39 (2019) 170–177, https://doi.org/10.1016/j.aei.2018.12.005.
[74] S. Bang, Y. Hong, H. Kim, Proactive proximity monitoring with instance segmentation and unmanned aerial vehicle-acquired video-frame prediction, Comp. Aided Civil Infrastruct. Eng. 36 (6) (2021) 800–816, https://doi.org/10.1111/mice.12672.
[75] D. Kim, M. Liu, S. Lee, V.R. Kamat, Remote proximity monitoring between mobile construction resources using camera-mounted UAVs, Autom. Constr. 99 (2019) 168–182, https://doi.org/10.1016/j.autcon.2018.12.014.
[76] X. Yan, H. Zhang, H. Li, Computer vision-based recognition of 3D relationship between construction entities for monitoring struck-by accidents, Comp. Aided Civil Infrastruct. Eng. 35 (9) (2020) 1023–1038, https://doi.org/10.1111/mice.12536.
[77] S. Han, S. Lee, F. Peña-Mora, Comparative study of motion features for similarity-based modeling and classification of unsafe actions in construction, J. Comput. Civ. Eng. 28 (5) (2014) A4014005, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000339.
[78] X. Luo, H. Li, X. Yang, Y. Yu, D. Cao, Capturing and understanding workers' activities in far-field surveillance videos with deep action recognition and Bayesian nonparametric learning, Comp. Aided Civil Infrastruct. Eng. 34 (4) (2019) 333–351, https://doi.org/10.1111/mice.12419.
[79] D. Roberts, W.T. Calderon, S. Tang, M. Golparvar-Fard, Vision-based construction worker activity analysis informed by body posture, J. Comput. Civ. Eng. 34 (4) (2020) 04020017, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000898.
[80] W. Fang, L. Ma, P.E. Love, H. Luo, L. Ding, A. Zhou, Knowledge graph for identifying hazards on construction sites: integrating computer vision with ontology, Autom. Constr. 119 (2020) 103310, https://doi.org/10.1016/j.autcon.2020.103310.
[81] Z. Pan, C. Su, Y. Deng, J. Cheng, Video2Entities: a computer vision-based entity extraction framework for updating the architecture, engineering and construction industry knowledge graphs, Autom. Constr. 125 (2021) 103617, https://doi.org/10.1016/j.autcon.2021.103617.
[82] R. Xiong, Y. Song, H. Li, Y. Wang, Onsite video mining for construction hazards identification with visual relationships, Adv. Eng. Inform. 42 (2019) 100966, https://doi.org/10.1016/j.aei.2019.100966.
[83] B. Zhong, H. Li, H. Luo, J. Zhou, W. Fang, X. Xing, Ontology-based semantic modeling of knowledge in construction: classification and identification of hazards implied in images, J. Constr. Eng. Manag. 146 (4) (2020) 04020013, https://doi.org/10.1061/(ASCE)CO.1943-7862.0001767.
[84] S. Tang, D. Roberts, M. Golparvar-Fard, Human-object interaction recognition for automatic construction site safety inspection, Autom. Constr. 120 (2020) 103356, https://doi.org/10.1016/j.autcon.2020.103356.
[91] D. Kang, Y.-J. Cha, Autonomous UAVs for structural health monitoring using deep learning and an ultrasonic beacon system with geo-tagging, Comp. Aided Civil Infrastruct. Eng. 33 (10) (2018) 885–902, https://doi.org/10.1111/mice.12375.
[92] C. Zhang, C.-C. Chang, M. Jamshidi, Concrete bridge surface damage detection using a single-stage detector, Comp. Aided Civil Infrastruct. Eng. 35 (4) (2020) 389–409, https://doi.org/10.1111/mice.12500.
[93] S. Bang, S. Park, H. Kim, H. Kim, Encoder–decoder network for pixel-level road crack detection in black-box images, Comp. Aided Civil Infrastruct. Eng. 34 (8) (2019) 713–727, https://doi.org/10.1111/mice.12440.
[94] K. Jang, Y.-K. An, B. Kim, S. Cho, Automated crack evaluation of a high-rise bridge pier using a ring-type climbing robot, Comp. Aided Civil Infrastruct. Eng. 36 (1) (2021) 14–29, https://doi.org/10.1111/mice.12550.
[95] S. Jiang, J. Zhang, Real-time crack assessment using deep neural networks with wall-climbing unmanned aerial system, Comp. Aided Civil Infrastruct. Eng. 35 (6) (2020) 549–564, https://doi.org/10.1111/mice.12519.
[96] R. Kalfarisi, Z.Y. Wu, K. Soh, Crack detection and segmentation using deep learning with 3D reality mesh model for quantitative assessment and integrated visualization, J. Comput. Civ. Eng. 34 (3) (2020) 04020010, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000890.
[97] J. Liu, X. Yang, S. Lau, X. Wang, S. Luo, V.C.S. Lee, L. Ding, Automated pavement crack detection and segmentation based on two-step convolutional neural network, Comp. Aided Civil Infrastruct. Eng. 35 (11) (2020) 1291–1305, https://doi.org/10.1111/mice.12622.
[98] Q. Song, Y. Wu, X. Xin, L. Yang, M. Yang, H. Chen, C. Liu, M. Hu, X. Chai, J. Li, Real-time tunnel crack analysis system via deep learning, IEEE Access 7 (2019) 64186–64197, https://doi.org/10.1109/ACCESS.2019.2916330.
[99] X. Zhang, D. Rajan, B. Story, Concrete crack detection using context-aware deep semantic segmentation network, Comp. Aided Civil Infrastruct. Eng. 34 (11) (2019) 951–971, https://doi.org/10.1111/mice.12477.
[100] F. Wei, G. Yao, Y. Yang, Y. Sun, Instance-level recognition and quantification for concrete surface bughole based on deep learning, Autom. Constr. 107 (2019) 102920, https://doi.org/10.1016/j.autcon.2019.102920.
[101] H. Wu, L. Yao, Z. Xu, Y. Li, X. Ao, Q. Chen, Z. Li, B. Meng, Road pothole extraction and safety evaluation by integration of point cloud and images derived from mobile mapping sensors, Adv. Eng. Inform. 42 (2019) 100936, https://doi.org/10.1016/j.aei.2019.100936.
[102] D.J. Atha, M.R. Jahanshahi, Evaluation of deep learning approaches based on convolutional neural networks for corrosion detection, Struct. Health Monit. 17 (5) (2018) 1110–1128, https://doi.org/10.1177/1475921717737051.
[103] T.-C. Huynh, J.-H. Park, H.-J. Jung, J.-T. Kim, Quasi-autonomous bolt-loosening detection method using vision-based deep learning and image processing, Autom. Constr. 105 (2019) 102844, https://doi.org/10.1016/j.autcon.2019.102844.
[104] S.S. Kumar, D.M. Abraham, M.R. Jahanshahi, T. Iseley, J. Starr, Automated defect classification in sewer closed circuit television inspections using deep convolutional neural networks, Autom. Constr. 91 (2018) 273–283, https://doi.org/10.1016/j.autcon.2018.03.028.
[105] M. Wang, S.S. Kumar, J.C.P. Cheng, Automated sewer pipe defect tracking in CCTV videos based on defect detection and metric learning, Autom. Constr. 121 (2021) 103438, https://doi.org/10.1016/j.autcon.2020.103438.
[106] G. Pan, Y. Zheng, S. Guo, Y. Lv, Automatic sewer pipe defect semantic segmentation based on improved U-net, Autom. Constr. 119 (2020) 103383, https://doi.org/10.1016/j.autcon.2020.103383.
[107] Y. Jang, Y. Ahn, Y. Kim Ha, Estimating compressive strength of concrete using deep convolutional neural networks with digital microscope images, J. Comput. Civ. Eng. 33 (3) (2019) 04019018, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000837.
[108] W. Wang, P. Shi, L. Deng, H. Chu, X. Kong, Residual strength evaluation of corroded textile-reinforced concrete by the deep learning-based method, Materials 13 (14) (2020) 3226, https://doi.org/10.3390/ma13143226.
[109] M. Azimi, G. Pekcan, Structural health monitoring using extremely compressed data through deep learning, Comp. Aided Civil Infrastruct. Eng. 35 (6) (2020) 597–614, https://doi.org/10.1111/mice.12517.
[110] Y.-Z. Lin, Z.-H. Nie, H.-W. Ma, Structural damage detection with automatic feature-extraction through deep learning, Comp. Aided Civil Infrastruct. Eng. 32 (12) (2017) 1025–1046, https://doi.org/10.1111/mice.12313.
[111] F. Ni, J. Zhang, M.N. Noori, Deep learning for data anomaly detection and data compression of a long-span suspension bridge, Comp. Aided Civil Infrastruct. Eng. 35 (7) (2020) 685–700, https://doi.org/10.1111/mice.12528.
[85] W. Fang, L. Ding, H. Luo, P.E. Love, Falls from heights: a computer vision-based
approach for safety harness detection, Autom. Constr. 91 (2018) 53–61, https:// [112] S.O. Sajedi, X. Liang, Vibration-based semantic damage segmentation for large-
doi.org/10.1016/j.autcon.2018.02.018. scale structural health monitoring, Comp. Aided Civil Infrastruct. Eng. 35 (6)
[86] H. Son, H. Seong, H. Choi, C. Kim, Real-time vision-based warning system for (2020) 579–596, https://doi.org/10.1111/mice.12523.
prevention of collisions between workers and heavy equipment, J. Comput. Civ. [113] F. Zhang, A hybrid structured deep neural network with Word2Vec for
Eng. 33 (5) (2019) 04019029, https://doi.org/10.1061/(ASCE)CP.1943- construction accident causes classification, Int. J. Constr. Manag. (2019) 1–21,
5487.0000845. https://doi.org/10.1080/15623599.2019.1683692.
[87] Z.H. Lin, A.Y. Chen, S.H. Hsieh, Temporal image analytics for abnormal [114] J. Zhang, L. Zi, Y. Hou, D. Deng, W. Jiang, M. Wang, A C-BiLSTM approach to
construction activity identification, Autom. Constr. 124 (2021), 103572, https:// classify construction accident reports, Appl. Sci. (Switzerland) 10 (17) (2020)
doi.org/10.1016/j.autcon.2021.103572. 5754, https://doi.org/10.3390/app10175754.
[88] H. Wu, B. Zhong, H. Li, P. Love, X. Pan, N. Zhao, Combining computer vision with [115] H. Baker, M.R. Hallowell, A.J.P. Tixier, Automatically learning construction
semantic reasoning for on-site safety management in construction, J. Build. Eng. injury precursors from text, Autom. Constr. 118 (2020), 103145, https://doi.org/
42 (2021), 103036, https://doi.org/10.1016/j.jobe.2021.103036. 10.1016/j.autcon.2020.103145.
[89] D. Balageas, C.-P. Fritzen, A. Güemes, Structural Health Monitoring, John Wiley [116] W. Fang, H. Luo, S. Xu, P.E.D. Love, Z. Lu, C. Ye, Automated text classification of
& Sons, 2010. ISBN:978-1-905209-01-9. near-misses from safety reports: An improved deep learning approach, Adv. Eng.
[90] S. Park, S. Bang, H. Kim, H. Kim, Patch-based crack detection in black box images Inform. 44 (2020), 101060, https://doi.org/10.1016/j.aei.2020.101060.
using convolutional neural networks, J. Comput. Civ. Eng. 33 (3) (2019) [117] B. Zhong, X. Pan, P.E.D. Love, L. Ding, W. Fang, Deep learning and network
04019017, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000831. analysis: classifying and visualizing accident narratives in construction, Autom.
Constr. 113 (2020), 103089, https://doi.org/10.1016/j.autcon.2020.103089.
[118] B. Zhong, X. Pan, P.E.D. Love, J. Sun, C. Tao, Hazard analysis: a deep learning and text mining framework for accident prevention, Adv. Eng. Inform. 46 (2020) 101152, https://doi.org/10.1016/j.aei.2020.101152.
[119] D. Feng, H. Chen, A small samples training framework for deep learning-based automatic information extraction: case study of construction accident news reports analysis, Adv. Eng. Inform. 47 (2021) 101256, https://doi.org/10.1016/j.aei.2021.101256.
[120] D. Guo, J. Li, S.-H. Jiang, X. Li, Z. Chen, Intelligent assistant driving method for tunnel boring machine based on big data, Acta Geotech. (2021) 1–12, https://doi.org/10.1007/s11440-021-01327-1.
[121] Q. Wang, X. Xie, H. Yu, M.A. Mooney, Predicting slurry pressure balance with a long short-term memory recurrent neural network in difficult ground condition, Comp. Intel. Neurosci. 2021 (2021), https://doi.org/10.1155/2021/6678355.
[122] Y. Zhang, G. Gong, H. Yang, W. Li, J. Liu, Precision versus intelligence: autonomous supporting pressure balance control for slurry shield tunnel boring machines, Autom. Constr. 114 (2020) 103173, https://doi.org/10.1016/j.autcon.2020.103173.
[123] C. Zhou, H. Xu, L. Ding, L. Wei, Y. Zhou, Dynamic prediction for attitude and position in shield tunneling: a deep learning method, Autom. Constr. 105 (2019) 102840, https://doi.org/10.1016/j.autcon.2019.102840.
[124] W. Fang, P.E. Love, L. Ding, S. Xu, T. Kong, H. Li, Computer vision and deep learning to manage safety in construction: matching images of unsafe behavior and semantic rules, IEEE Trans. Eng. Manag. (2021), https://doi.org/10.1109/TEM.2021.3093166.
[125] D. Roberts, M. Golparvar-Fard, End-to-end vision-based detection, tracking and activity analysis of earthmoving equipment filmed at ground level, Autom. Constr. 105 (2019) 102811, https://doi.org/10.1016/j.autcon.2019.04.006.
[126] B. Xiao, S.-C. Kang, Development of an image data set of construction machines for deep learning object detection, J. Comput. Civ. Eng. 35 (2) (2021) 05020005, https://doi.org/10.1061/(ASCE)CP.1943-5487.0000945.
[127] C.R. Ahn, S. Lee, C. Sun, H. Jebelli, K. Yang, B. Choi, Wearable sensing technology applications in construction safety and health, J. Constr. Eng. Manag. 145 (11) (2019) 03119007, https://doi.org/10.1061/(ASCE)CO.1943-7862.0001708.
[128] I. Awolusi, E. Marks, M. Hallowell, Wearable technology for personalized construction safety monitoring and trending: review of applicable devices, Autom. Constr. 85 (2018) 96–106, https://doi.org/10.1016/j.autcon.2017.10.010.
[129] M.T. Musavi, K.H. Chan, D.M. Hummels, K. Kalantri, On the generalization ability of neural network classifiers, IEEE Trans. Pattern Anal. Mach. Intell. 16 (6) (1994) 659–663, https://doi.org/10.1109/34.295911.
[130] C. Fan, C. Zhang, A. Yahja, A. Mostafavi, Disaster City digital twin: a vision for integrating artificial and human intelligence for disaster management, Int. J. Inf. Manag. 56 (2021) 102049.