
A Novel and Intelligent Safety-Hazard Classification Method with Syntactic and Semantic Features for Large-Scale Construction Projects
Dan Tian 1; Mingchao Li, M.ASCE 2; Shuai Han 3; and Yang Shen 4
Downloaded from ascelibrary.org by SOUTH CHINA UNIVERSITY OF on 08/17/22. Copyright ASCE. For personal use only; all rights reserved.

Abstract: To improve the efficiency of safety management, it is important to classify massive and complex construction site safety hazard
texts in large-scale projects. High-precision safety hazard text classification is a lengthy and challenging process. Most existing safety
hazard text classification methods capture semantic information using machine learning or deep learning, ignoring the syntactic dependency
between words. However, syntactic dependency contains rich structural information that is useful to alleviate information loss and enrich
text features. To address these issues, this study proposes a graph structure–based hybrid deep learning method to achieve the automatic
classification of large-scale project safety hazard texts. The method uses syntactic dependency and Bidirectional Encoder Representations
from Transformers to express the syntactic structure and semantic information of text, and a graph structure fusing the syntactic structure
and semantic information is constructed to quantify text information. Further, an encoding-decoding mechanism is built using a graph
convolutional neural network and bidirectional long short-term memory to address graph structure data and classify safety hazard texts.
Our proposed method is used to classify hydraulic engineering construction safety hazard texts, and the classification accuracy reaches
86.56%. Meanwhile, the experimental results demonstrate that our model achieves superior performance compared to existing methods. This
proves the ability of our model to capture and analyze text information and verifies the reliability and effectiveness of this method in large-
scale project safety hazard management. DOI: 10.1061/(ASCE)CO.1943-7862.0002382. © 2022 American Society of Civil Engineers.
Author keywords: Large-scale project; Construction safety hazard; Text classification; Graph structure; Bidirectional Encoder
Representations from Transformers (BERT); Graph convolutional network (GCN); Bidirectional long short-term memory (BiLSTM).

1 Ph.D. Candidate, State Key Laboratory of Hydraulic Engineering Simulation and Safety, Tianjin Univ., Tianjin 300350, China. Email: tiand@tju.edu.cn
2 Professor, State Key Laboratory of Hydraulic Engineering Simulation and Safety, Tianjin Univ., Tianjin 300350, China (corresponding author). ORCID: https://orcid.org/0000-0002-3010-0892. Email: lmc@tju.edu.cn
3 Postdoctoral Fellow, Dept. of Building and Real Estate, Hong Kong Polytechnic Univ., Hong Kong. Email: drjasonhan@163.com
4 Senior Engineer, China Three Gorges Corporation, No. 1 Yuyuantan South Rd., Haidian District, Beijing 100038, China. Email: tjusy1984@163.com
Note. This manuscript was submitted on February 22, 2022; approved on May 31, 2022; published online on August 1, 2022. Discussion period open until January 1, 2023; separate discussions must be submitted for individual papers. This paper is part of the Journal of Construction Engineering and Management, © ASCE, ISSN 0733-9364.

Introduction

Construction site safety hazards can easily lead to safety incidents and are a major concern of safety management (Kim and Chi 2019; Xu et al. 2021a). In particular, the construction of a large-scale project is characterized by a wide construction area, scattered personnel, many parallel construction projects, and complex procedures and construction environment, which increases the difficulty of controlling safety hazards (Ding and Li 2013; Tian et al. 2021). To obtain construction site safety hazard information in a timely manner, a construction unit will organize a large number of investigations and record safety hazard information with text (Lin et al. 2020; Tixier et al. 2016; Qiu et al. 2021). Therefore, large-scale project construction sites generate a large number of safety hazard texts, which affects the efficiency of safety hazard management and analysis. Fortunately, the same safety hazard type has high similarity in terms of management measure and hazard risk and contains similar safety hazard knowledge, so safety hazard management efficiency can be improved by text classification (Chi et al. 2016; Zhong et al. 2020a).

Problem Statement

The main task of construction site safety management is to discover and analyze safety hazards and formulate corresponding safety hazard control measures (Han et al. 2021). However, the massive safety hazard text data are diversified and fragmented, which directly affects the efficiency of construction site safety management. Text classification can be used to cluster related texts, which can make it possible to give a specific theme to clustered texts (Chen et al. 2022). It is helpful to reduce the difficulty of text analysis and improve the application value of text information. Thus, managers will define the safety hazard type in investigating safety hazards.

Currently, safety hazard text classification at construction sites mainly relies on manual operation. There are two main problems with manual classification methods because the content of safety hazards is complicated. First, manual analysis efficiency is low and requires considerable time to understand and classify safety hazard text, which does not meet the requirement of automated safety hazard text management (Hong et al. 2021; Baker et al. 2020). Second, the accuracy of manual operation is low, and manual classification mainly relies on safety management experience, but there are differences in the experience and knowledge of managers, and their understanding of safety hazards is also different, making it difficult to unify the classification results of safety hazards. Meanwhile, the manual operation makes it difficult to utilize experience with safety hazards and the knowledge gained from that experience. Because there are many safety hazard

© ASCE 04022109-1 J. Constr. Eng. Manage.

J. Constr. Eng. Manage., 2022, 148(10): 04022109


types at a large-scale project construction site, the boundaries of safety hazard types are undefined, and it is easy to cause errors in understanding.

Especially in the field of large-scale project safety hazard management, the investigation and recording of safety hazards are usually carried out by many people. Safety hazard texts are complex and diverse, and the same safety hazard type may have different forms of expression, which decreases the correlation between safety hazard texts and increases the difficulty of text analysis. Although some scholars have developed safety hazard classification methods by machine learning and deep learning, it is difficult to obtain comprehensive text features only by analyzing the semantic information for complex safety hazard texts (Zhang et al. 2019; Chen et al. 2022). The existing methods still have considerable room for improvement in computational accuracy. Fortunately, there are many technical terms in a safety hazard text. Although the semantic difference between technical terms is ubiquitous, the function of technical terms of the same type is similar in syntactic structure, which allows for an effective approach to strengthening the relationship between different expressions of safety hazards. Thus, it is necessary to establish an automatic classification method suitable for the management of large-scale construction site safety hazards.

Proposed Solution

Motivated by the preceding discussion, this study proposes a graph structure–based hybrid deep learning method (GSHDLM) that integrates syntactic structure and semantic information. It aims to improve text classification accuracy and solve large-scale project safety hazard management problems. The method uses syntactic dependency to express text structure. The Bidirectional Encoder Representations from Transformers (BERT) method is adopted for training and obtaining word vectors. The text semantic information is expressed as the similarity between words. The syntactic structure and semantic information are used to build a safety hazard information network graph as the model input. The nodes indicate words in a document and the edges indicate the syntactic and semantic relations from one node to another. Furthermore, a graph convolutional network (GCN) and bidirectional long short-term memory (BiLSTM) are used to establish a text encoding and decoding framework that can extract key features from information network graphs and achieve end-to-end safety hazard recognition.

The text classification model proposed in this study combines the safety hazard text features of large-scale projects, not only capturing semantic correlations but also considering syntactic structure relationships. The model strengthens the relationship between words, enhances the features of text information, and reduces the impact of information fragmentation for safety hazard classification by constructing an information network graph. Furthermore, the encoding and decoding framework based on a GCN and BiLSTM can fully integrate the node information and discover the potential syntactic and semantic relationships, which means the model is capable of selectively utilizing relevant information that is helpful for safety hazard classification.

The main innovations and contributions of this paper can be summarized as follows. (1) An encoding-decoding structure framework based on a GCN and BiLSTM is proposed to fuse syntactic structure and semantic information, which enhances safety hazard text features and improves the text understanding effect. (2) To incorporate the syntactic structure and semantic information into text features, BERT is used to calculate the semantic similarity between words; a graph structure is constructed to capture the syntactic and semantic information with words as nodes based on the syntactic dependency and semantic similarity within text. (3) To the best of our knowledge, this is the first application of a GCN model to safety hazard classification. The GCN carries out in-depth analysis of the syntactic structure and semantic information presented by graph data and improves the accuracy of safety hazard recognition.

Literature Review

There exist two text classification methods for construction safety management text: shallow machine learning–based text classification and deep learning–based text classification (Salama and El-Gohary 2016; Liu et al. 2021; Zhang et al. 2020). Machine learning–based text classification methods are used to extract features by analyzing text semantics. Chokor et al. (2016) used unsupervised machine learning methods to analyze safety incidents and establish an accident report classification model based on the k-means method. The model evaluated the capabilities of machine learning methods in safety management. Goh and Ubeynarayana (2017) evaluated six machine learning algorithms, including a support vector machine (SVM), linear regression, random forest, k-nearest neighbor, decision tree, and naive Bayes, and found that the SVM outperformed other classifiers, which verified the applicability of machine learning algorithms for safety management text analysis. Zhang et al. (2019) proposed an ensemble model integrating five machine learning methods to classify the causes of accidents, and an actual case was used to verify the accuracy and robustness of the method. The proposed joint learning model has been reported to have better accuracy than a single machine learning method. Shallow machine learning can only use shallow features specified by humans, and the process of feature extraction is influenced by human domain knowledge; thus, it is difficult to achieve deep mining of text features (Zhong et al. 2020a; Fang et al. 2020). Unlike shallow machine learning, deep learning algorithms can automatically identify text features and use nonlinear combinations of multiple functions to learn complex tasks from training data (Lecun et al. 2015; Alam et al. 2020). Zhong et al. (2020b) built a text classification model fusing the Latent Dirichlet Allocation (LDA) method and a convolutional neural network (CNN) model to automatically recognize construction safety hazards, which confirmed the operability and reliability of deep learning methods in construction text classification. Fang et al. (2020) developed an improved deep learning method based on BERT to achieve automatic text classification and validate the effectiveness and feasibility of the method, which proved the superiority of BERT for safety text classification. Cheng et al. (2020) proposed a symbiotic gated recurrent unit (GRU) incorporating a GRU and symbiotic organisms search (SOS) to classify construction site accident texts, which can search the best parameters of the GRU to ensure optimal performance. Feng and Chen (2021) proposed a natural language data augmentation–based small sample training framework, and then the BiLSTM–conditional random field was used to classify construction safety accident features, which carried out the automatic extraction of safety hazard information.

These studies verified the practicability and reliability of machine learning and deep learning in safety text classification. However, they mainly used semantic analysis to extract text features, and text features contain not only semantic information but also syntactic structure information (Xu et al. 2021b). A syntactic structure provides a rule for a sentence to express the structural relationship between words. Especially for large-scale project safety hazard texts, the syntactic structure can strengthen the text features and improve the accuracy of text analysis. The syntactic structure is the key to extracting the correlation information between words,



which can enrich text information and avoid or reduce information loss caused by long-distance dependency (Hu et al. 2021; Ko and Jeong 2020). An association network generated by syntactic structure analysis is graph data, which can represent the spatial relationship between words and show the propagation path of information (Lu et al. 2018). It obtains structural information on the network, such as the degree distribution and communities, which further enriches the text information. Graph data, as non-Euclidean spatial data, do not meet translation invariance requirements and cannot be addressed using traditional deep learning models (Hu et al. 2021). GCNs, a specialized method for processing graph data, have been applied to a number of fields (Gao and Huang 2021). Park et al. (2020) developed an attention-based graph convolutional network (AGCN) leveraging contextual information and structural knowledge to address the problem of key information loss. Hu et al. (2021) proposed a graph convolutional network over interclause dependency to fuse semantics and structural information, which can automatically learn how to selectively attend the relevant clauses useful for emotion cause analysis.

In summary, a GCN provides an effective method for extracting and analyzing text syntactic structure features. If only syntactic structure information is used for rule construction and feature extraction, the nonlinear semantic relationship between words in sentences will not be learned and utilized, which will easily lead to text classification errors. Therefore, it is necessary to establish a large-scale project safety hazard classification method fusing semantic and structural features.

Methodology

Construction Text Information Quantization

Syntactic Structure Information

Text content is mainly expressed in the form of words, and the words are combined according to specific syntactic rules to describe construction site safety hazard content. Syntactic structure can accurately define and distinguish the role of the same words in a sentence, which is significant for improving the accuracy of text understanding (Barnes et al. 2021). For example, “The inside of the distribution box is seriously polluted, which can easily cause a short circuit in the distribution box” includes “distribution box” twice, which have the same semantics, but the functions of the words in the syntactic structure are not the same. The first “distribution box” is used as the subject to describe the specific object of pollution; the second “distribution box” is used as an attribute to modify the “short circuit.” Therefore, it is necessary to judge the syntactic structure relationship between words in safety hazard text analysis. Syntactic dependency can reveal the syntactic structure relationship between words, which can offset a deficiency of syntactic structure information in semantic analysis (Guo et al. 2021).

In safety hazard text, the dependency between words includes the subject–predicate relationship (SBV), verb–object relationship (VOB), adverbial relationship (ADV), and so on, but only one core relationship (HED) is included (Fig. 1).

The HanLP toolkit is used to calculate the syntactic dependency between words in safety hazard texts, and an adjacency matrix is constructed to show the syntactic structure, which is defined as follows:

$$
A = \begin{bmatrix}
0 & a_{12} & a_{13} & \cdots & a_{1n} \\
a_{21} & 0 & a_{23} & \cdots & a_{2n} \\
a_{31} & a_{32} & 0 & \cdots & a_{3n} \\
\cdots & \cdots & \cdots & \cdots & \cdots \\
a_{n1} & a_{n2} & a_{n3} & \cdots & 0
\end{bmatrix} \tag{1}
$$

where $A$ = syntactic dependency-based adjacency matrix; $n$ = number of words contained in sentence; and $a_{ij}$ = syntactic dependency between the $i$th and $j$th words, which is either 0 or 1 and defined by

$$
a_{ij} = \begin{cases}
1 & \text{if } R(w_i, w_j) \neq \text{null} \\
0 & \text{if } R(w_i, w_j) = \text{null}
\end{cases} \tag{2}
$$

where $R(w_i, w_j)$ = syntactic dependency calculated by HanLP; when there exists a syntactic dependency between word $w_i$ and word $w_j$, that is, $R(w_i, w_j) \neq \text{null}$, $a_{ij}$ is equal to 1. Otherwise, $a_{ij}$ is equal to 0.

Text Semantic Information

Syntactic dependency allows for the extraction of text information at a structural level, but it is difficult to express the semantic relationship between words. To deeply analyze text semantics and fully consider the contextual relationship, the BERT model is used to calculate a word vector (Wang et al. 2020a). In Fig. 2, the input to the BERT model includes three parts: word embedding, position embedding, and segmentation embedding. Word embedding is the sentence feature. Position embedding encodes the position of words in the input text and quantifies the relative positional relationship of words. Segmentation embedding is mainly used to distinguish sentences. When the text contains a single sentence, the segmentation embedding is all 0. Otherwise, segmentation embedding will use different numbers to express sentences and realize sentence distinction. The hidden layer of BERT mainly uses a bidirectional transformer model for encoding and decoding text. The bidirectional transformer model is used to analyze the textual context and obtain text features (Devlin et al. 2018). The BERT model needs to be pretrained to match the safety hazard word vector calculation task. The pretraining mainly contains two parts. One is the word masking model, which uses [MASK] to randomly replace some words in the sentence and then transmits the sentence to BERT to encode each word’s information. Finally, the encoding information of [MASK] is used to predict the

Fig. 1. Syntactic dependency. [Dependency arcs labeled HED, ATT, ADV, and CMP drawn from the root over the words “Distribution box,” “inner,” “pollution,” and “serious.”]
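
As an illustration of Eqs. (1) and (2), the following Python sketch builds the 0/1 adjacency matrix from a list of dependency arcs. It is a minimal sketch under stated assumptions rather than the implementation used in this study: the function name `dependency_adjacency` and the arcs in the example are hypothetical stand-ins for parser output, and an arc is assumed to link its two words in both directions so that the matrix is symmetric.

```python
def dependency_adjacency(num_words, arcs):
    """Return the n x n matrix A with a_ij = 1 when words i and j share an arc.

    `arcs` is a list of (head_index, dependent_index) pairs. Treating every
    arc as undirected is an assumption made for this sketch.
    """
    A = [[0] * num_words for _ in range(num_words)]
    for head, dependent in arcs:
        if head == dependent:
            continue  # the diagonal of A stays 0, as in Eq. (1)
        A[head][dependent] = 1
        A[dependent][head] = 1
    return A


# Hypothetical arcs for illustration only; they are not real parser output.
words = ["distribution box", "inner", "pollution", "serious"]
arcs = [(1, 0), (2, 1), (3, 2)]
A = dependency_adjacency(len(words), arcs)

for word, row in zip(words, A):
    print(f"{word:>16}: {row}")
```

In practice the arcs would come from the dependency parser, and the artificial root node would be dropped before indexing the words.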



Fig. 2. BERT model. [The input “[CLS] The lamp is burnt out [SEP] There exist safety hazard problem [SEP]” is masked to “[CLS] The lamp is [Mask] out [SEP] There exist safety [Mask] problem [SEP]”, mapped to embeddings E, passed through stacked transformer (Trm) layers, and output as vectors T.]
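
The input preparation shown in Fig. 2 can be sketched in a few lines of Python. This is a simplified illustration, not the pretraining code of this study: the helper names `mask_tokens` and `segment_ids`, the masking probability, and the random seed are assumptions, and every selected word is replaced by [MASK], whereas the original BERT recipe also leaves some selected words unchanged or swaps them for random words.

```python
import random

SPECIAL_TOKENS = {"[CLS]", "[SEP]"}


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Randomly replace ordinary words with [MASK]; return masked tokens and targets."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    masked, targets = [], {}
    for position, token in enumerate(tokens):
        if token not in SPECIAL_TOKENS and rng.random() < mask_prob:
            masked.append("[MASK]")
            targets[position] = token  # word the model must predict here
        else:
            masked.append(token)
    return masked, targets


def segment_ids(tokens):
    """Give each sentence its own segment number; [SEP] closes a sentence."""
    ids, current = [], 0
    for token in tokens:
        ids.append(current)
        if token == "[SEP]":
            current += 1
    return ids


text = "[CLS] The lamp is burnt out [SEP] There exist safety hazard problem [SEP]"
tokens = text.split()
masked, targets = mask_tokens(tokens, mask_prob=0.3, seed=1)
segments = segment_ids(tokens)

print(masked)
print(targets)
print(segments)
```

A single-sentence input yields all-zero segment numbers, which matches the description of the segmentation embedding.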

correct word for this position (Devlin et al. 2018; Li et al. 2021a). The other is the prediction of the next sentence, inputting sentences into the BERT model to predict the sequence of sentences. By pretraining on safety hazard text, the semantic features are deeply extracted, and the word vector is obtained.

Inputting the construction site safety hazard text into the BERT model calculates the text word vector, and then the word vector is used to express the correlation between words, which is defined as

$$
S_{ij} = \frac{V_i \cdot V_j}{|V_i| \times |V_j|} \tag{3}
$$

where $S_{ij}$ = similarity between words $i$ and $j$ in sentence; and $V_i$ and $V_j$ = word vectors of words $i$ and $j$ calculated by BERT.

A semantic correlation matrix is built by combining the word co-occurrence and similarity to represent the semantic information, which is defined as follows:

$$
S = \begin{bmatrix}
1 & S_{12} & S_{13} & \cdots & S_{1n} \\
S_{21} & 1 & S_{23} & \cdots & S_{2n} \\
S_{31} & S_{32} & 1 & \cdots & S_{3n} \\
\cdots & \cdots & \cdots & \cdots & \cdots \\
S_{n1} & S_{n2} & S_{n3} & \cdots & 1
\end{bmatrix} \tag{4}
$$

where $S$ = semantic correlation matrix; and $S_{ij}$ = similarity between $i$th and $j$th words in the sentence.

GCN-Based Text Information Encoding

Based on the syntactic structure–based adjacency matrix and semantic correlation matrix, a GCN is used to encode the syntactic structure and semantic information. The information transfer mechanism of the GCN is similar to that of a multilayer perceptron, and the difference between them is that the GCN has graph data to capture the information of adjacent nodes (Hu et al. 2021; Wang et al. 2020b). Unlike the existing GCN model (Zhou et al. 2020; Bai et al. 2021), this study takes graph data built by an adjacency matrix and semantic correlation matrix as input, which contains the syntactic structure and semantic information. For the text information graph $G = (N, E)$, $N$ denotes the nodes of the graph, that is, the words, and $E$ denotes the edges, which represent the syntactic structure correlation and semantic information correlation obtained in the section “Construction Text Information Quantization” (Fig. 3).

For a $k$-layer GCN, the initial input data are $x \in \mathbb{R}^{n \times m}$, where $n$ is the node number of Graph $G$ and $m$ is the dimension of the word vector. Thus, the input of the GCN is defined as

$$
X^{(0)} = x \tag{5}
$$

where $X^{(0)}$ = initial input of the GCN model; and $x$ = word vector set calculated by BERT. The normalized matrix $X^{(0)}$ is regarded as input, text features are propagated among layers of the GCN, and the feature propagation mode is defined as

$$
X^{(k)} = \sigma\left(\tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} X^{(k-1)} W^{(k-1)}\right) \tag{6}
$$

where $X^{(k)}$ = $k$th-layer output of GCN; $X^{(k-1)}$ = $k$th-layer input of GCN; $\sigma(\cdot)$ = activation function; $W^{(k-1)}$ = weight matrix; $\tilde{D}$ = degree matrix of Graph $G$; and $\tilde{A}$ = adjacency matrix that contains the syntactic structure and semantic information, which is determined as

$$
\tilde{A} = A + S \tag{7}
$$

The quantified construction site safety hazard text is input into the GCN model, and the text syntactic structure and semantic information are analyzed by Eq. (6); the encoding result is obtained as follows:

$$
X = \{X_1^{(k)}, X_2^{(k)}, X_3^{(k)}, \ldots, X_C^{(k)}\} \tag{8}
$$

where $X$ = text feature with GCN encoding; $X_1^{(k)}$ = first sentence in safety hazard text; and $C$ = total number of sentences.

BiLSTM-Based Text Information Decoding

Considering the long-distance dependencies between text features, BiLSTM is used to decode the text features obtained using GCN encoding. BiLSTM provides a bidirectional mechanism that can deeply analyze text features from two different directions (Zhang et al. 2020; Zhong et al. 2020c). It is composed of multiple long



Fig. 3. GCN model. [Panels: graph structure with node inputs e1 to e8; graph feature extraction through the hidden layer; graph feature with node outputs f1 to f8.]
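
The graph encoding of Eqs. (3) through (7) can be traced with a small dependency-free Python sketch. It is an illustration under stated assumptions, not the model of this study: the word vectors, the weight matrix, and the function names are invented for the example, ReLU is assumed for the activation function, and the degree matrix is assumed to hold the row sums of the fused matrix, which must be positive.

```python
import math


def cosine_matrix(vectors):
    """Semantic correlation matrix S of Eqs. (3)-(4) from word vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    n = len(vectors)
    return [[sum(a * b for a, b in zip(vectors[i], vectors[j])) / (norms[i] * norms[j])
             for j in range(n)] for i in range(n)]


def matmul(P, Q):
    """Plain matrix product of two lists of rows."""
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def gcn_layer(A, S, X, W):
    """One propagation step of Eq. (6) using the fused matrix of Eq. (7)."""
    n = len(A)
    A_tilde = [[A[i][j] + S[i][j] for j in range(n)] for i in range(n)]  # Eq. (7)
    degrees = [sum(row) for row in A_tilde]  # assumed definition of the degree matrix
    if min(degrees) <= 0:
        raise ValueError("degrees must be positive for D^(-1/2)")
    scale = [1.0 / math.sqrt(d) for d in degrees]
    A_hat = [[scale[i] * A_tilde[i][j] * scale[j] for j in range(n)] for i in range(n)]
    Z = matmul(matmul(A_hat, X), W)
    return [[max(0.0, z) for z in row] for row in Z]  # ReLU assumed for sigma


# Toy inputs for illustration only.
vectors = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]  # stand-ins for BERT word vectors
A = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]           # syntactic adjacency matrix
W = [[1.0, 0.0], [0.0, 1.0]]                    # identity weights keep the sketch readable

S = cosine_matrix(vectors)
X1 = gcn_layer(A, S, vectors, W)
for row in X1:
    print([round(value, 4) for value in row])
```

Adding the two matrices gives every node a self-connection of weight 1 from the diagonal of the semantic matrix, so no separate identity term is needed in this sketch.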

Fig. 4. LSTM unit. [Components: forget gate, input gate, output gate, tanh activations, and the previous internal state C(t-1).]
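
A single LSTM step following Eqs. (9) and (10) can be sketched as below. This is a minimal illustration, not the trained model: the parameter names in `params` are hypothetical, and the candidate state is assumed to follow the standard LSTM form, a tanh of an affine map of the concatenated previous external state and current input, because the text does not give its formula.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, v, b):
    """Compute W v + b for a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t, h_prev, c_prev, params):
    """Return (h_t, C_t) for one time step."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, X_t]
    f = [sigmoid(a) for a in affine(params["wf"], z, params["bf"])]  # forget gate
    i = [sigmoid(a) for a in affine(params["wi"], z, params["bi"])]  # input gate
    o = [sigmoid(a) for a in affine(params["wo"], z, params["bo"])]  # output gate
    # Candidate state: standard LSTM form, assumed here.
    c_hat = [math.tanh(a) for a in affine(params["wc"], z, params["bc"])]
    c = [ft * cp + it * ch for ft, cp, it, ch in zip(f, c_prev, i, c_hat)]  # Eq. (10)
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]                        # Eq. (10)
    return h, c


# Zero weights make every gate 0.5 and the candidate 0, so the result is easy to check.
hidden, inputs = 1, 2
zeros = [[0.0] * (hidden + inputs) for _ in range(hidden)]
params = {name: zeros for name in ("wf", "wi", "wo", "wc")}
params.update({name: [0.0] * hidden for name in ("bf", "bi", "bo", "bc")})

h_t, c_t = lstm_step([0.3, -0.2], [0.0], [2.0], params)
print(h_t, c_t)
```

A bidirectional layer would run this step once over the sequence from left to right and once from right to left, each direction with its own parameters.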

short-term memory (LSTM) units, and each LSTM unit includes an input gate, forget gate, and output gate (Ren et al. 2021) (Fig. 4). The input gate selectively inputs new information into the LSTM unit. The forget gate selectively forgets information in the LSTM unit, controlling the amount of information that needs to be forgotten in the previous LSTM unit. The output gate controls how much information exists in the current LSTM unit to input into the external state (Li et al. 2021b).

The conversion relationship of the text feature encoded by the GCN model between the three gates is defined as follows:

$$
\begin{aligned}
f_t &= \sigma\left(w_f \cdot [h_{t-1}, X_t^{(k)}] + b_f\right) \\
i_t &= \sigma\left(w_i \cdot [h_{t-1}, X_t^{(k)}] + b_i\right) \\
o_t &= \sigma\left(w_o \cdot [h_{t-1}, X_t^{(k)}] + b_o\right)
\end{aligned} \tag{9}
$$

where $f_t$ = forget gate; $i_t$ = input gate; $o_t$ = output gate; $X_t^{(k)}$ = text feature encoded by GCN; $h_{t-1}$ = external state of previous LSTM unit; and $w$ and $b$ = weight vector and bias vector of each gate, respectively. The external state $h_t$ of the LSTM unit is calculated by combining the input, output, and forget gates, defined as

$$
\begin{aligned}
h_t &= o_t \odot \tanh(C_t) \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
\end{aligned} \tag{10}
$$

where $C_t$ = internal state at current moment; $C_{t-1}$ = internal state of previous LSTM unit; and $\tilde{C}_t$ = candidate state of text information at current moment.

The text features encoded by the GCN are input into the BiLSTM model to calculate the forward hidden state $\overrightarrow{H} = \{\overrightarrow{h}_1, \overrightarrow{h}_2, \ldots, \overrightarrow{h}_u\}$ and the backward hidden state $\overleftarrow{H} = \{\overleftarrow{h}_1, \overleftarrow{h}_2, \ldots, \overleftarrow{h}_u\}$. The forward and backward hidden states are connected to form a text feature decoding result:

$$
H = [\overrightarrow{H}, \overleftarrow{H}] \tag{11}
$$

where $H$ = text feature decoding result based on BiLSTM model; and $[,]$ = concatenation operations.

Construction Safety Hazard Text Classification Model

We propose a GSHDLM combining BERT, a GCN, and BiLSTM to fuse syntactic structure and semantic information and achieve intelligent large-scale project safety hazard recognition (Fig. 5). The method includes five layers: input layer, graph construction layer, text encoding layer, text decoding layer, and output layer. They are described as follows:

Input layer: The input layer includes two aspects. One is the syntactic structure analysis of the safety hazard text, which uses HanLP to analyze and extract the syntactic relationship between



Fig. 5. Graph structure–based hybrid deep learning method workflow. [Input layer: words W1 to Wn analyzed for syntactic structure and by BERT. Graph structure layer: a 0/1 syntactic adjacency matrix and a semantic similarity matrix over w1 to wn, summed into one weighted matrix. Encoding layer: graph over the word nodes. Decoding layer: forward and backward LSTM units. Output layer.]
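
The last two stages of the workflow, concatenating the forward and backward hidden states as in Eq. (11) and scoring the categories with a Softmax classifier as in Eq. (12), can be sketched as follows. The hidden states, the weight matrix, and the three labels are toy values chosen for illustration, the function names are hypothetical, and Dropout is omitted because it is only active during training.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(h_forward, h_backward, W_out, labels):
    """Concatenate both directions, apply the output weights, and pick a category."""
    H = h_forward + h_backward                                           # Eq. (11)
    scores = [sum(h * w for h, w in zip(H, column)) for column in zip(*W_out)]
    probabilities = softmax(scores)                                      # Eq. (12)
    best = max(range(len(labels)), key=probabilities.__getitem__)
    return labels[best], probabilities


# Toy values for illustration only.
labels = ["electrocution", "fire", "drowning"]
h_forward = [1.0, 0.0]
h_backward = [0.0, 1.0]
W_out = [
    [2.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0],
]

label, probabilities = classify(h_forward, h_backward, W_out, labels)
print(label, [round(p, 3) for p in probabilities])
```

With 12 hazard categories the weight matrix would have 12 columns, and each row of the prediction matrix would hold the probabilities for one text.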

words. The other is the semantic analysis of safety hazard text. BERT is used to analyze the textual context, extract the text semantic information, and obtain a word vector from the semantic level.

Graph construction layer: The graph construction layer uses the syntactic structure relationship and the word vector input from the input layer to build a safety hazard graph structure, which takes the words as nodes and uses the syntactic and semantic relationship between words as edges. Notably, semantic relations between words are expressed as the similarity calculated by word vector.

Text encoding layer: The text encoding layer uses the GCN model to analyze the graph structure obtained in the graph construction layer. It aims to realize safety hazard text encoding from the level of the syntactic structure and semantic information and convert the graph data into a text feature vector, which is regarded as the key data for feature decoding to realize safety hazard depth analysis.

Text decoding layer: The text decoding layer takes the encoded text features as input, and the BiLSTM model is used to analyze the text features in the forward and backward directions and obtain long-distance dependency. The purpose of this layer is to strengthen the text feature through the decoding process, which is helpful to improve the accuracy of safety hazard text classification.

Output layer: The output layer classifies the safety hazard according to the extracted text feature. The text feature decoded using BiLSTM is taken as input, and Dropout technology is used to prevent overfitting. The Softmax classifier is adopted to calculate the probability of each safety hazard category and obtain the text classification result:

$$
Y = \operatorname{softmax}(H W') \tag{12}
$$

where $W'$ = weight matrix; $H$ = text feature obtained from encoding-decoding mechanism; and $Y$ = safety hazard text category prediction matrix. The category of each safety hazard text can be defined by analyzing the prediction matrix.

Case Study

Text Data Collection and Preprocessing

Taking the safety hazard text derived from a hydraulic engineering construction site as the data source, a total of 28,756 safety hazard texts were collected; each text recorded a safety hazard on the construction site, which contained the location, time, and content of safety hazards. Combining the text characteristics and national standards, the hydraulic engineering safety hazard was divided into 12 categories: vehicle damage, electrocution, falling accident, collapse, incivilization construction, object strike, mechanical damage, lifting injury, violation behavior (e.g., illegal operations, illegal command, violation of labor discipline), fire, explosion, and drowning.

Owing to the wide construction scope and complex procedures of hydraulic engineering, the investigation of safety hazards mainly depends on employees. Employees upload the safety hazards at the construction site through the safety hazard management app, and then managers compile a list of safety hazards based on the uploaded safety hazards (Lin et al. 2019). The safety hazard list can reflect in real time the safety management status of the construction site. However, the safety hazard category is not clearly marked in the list. It is necessary to mark safety hazard texts before classification calculation. The safety hazard texts are marked by analyzing the relationship between key information and category. For example, “the insulation rubber of the cable is damaged and the cable core is exposed” includes key information, for example, “cable,” “insulation rubber,” and “damaged,” which is related to electrocution, so the category of this text is defined as electrocution.

The marked text is divided into three parts: (1) training set, which is used to extract text features and optimize text data samples; the training set contains 20,540 texts; (2) validating set, which is used to adjust the parameter values and evaluate the computing ability of the model; the data volume of the validating set is 4,108; and (3) testing set, which is used to evaluate the generalization ability of the model; the data volume of the testing set is 4,108. Table 1 shows the division of the texts.



Table 1. Hydraulic engineering safety hazard text sample
Label | Content | Training | Validating | Testing
Vehicle damage | There is no edge warning on the road, which is prone to vehicle damage. | 750 | 150 | 150
Electrocution | The welding machine is placed directly on the steel bars without insulation protection. | 3,480 | 696 | 696
Falling accident | The gap in the railings was not restored in time after the rack was removed. | 3,060 | 612 | 612
Collapse | Rock fissures are obviously open and fragmented, and there is a risk of collapse. | 1,070 | 214 | 214
Incivilization construction | The construction site materials were messy, and the rubbish was not cleaned up in time. | 3,185 | 637 | 637
Object strike | A small amount of bamboo gangplank was placed on the side slope frame, and no binding measures were taken. | 2,235 | 447 | 447
Mechanical damage | Woodworking chainsaws did not meet the requirements for tool and equipment. | 1,150 | 230 | 230
Lifting injury | The slag cleaning workers did not take avoidance measures when the excavator was operated. | 910 | 182 | 182
Violation behavior | Workers did not wear safety helmets correctly. | 910 | 182 | 182
Fire | The place where the welding machine is put was not equipped with a fire extinguisher. | 1,615 | 323 | 323
Explosion | Oxygen and acetylene cylinder were placed in the open air, which may cause explosion hazards. | 1,305 | 261 | 261
Drowning | There were no warnings in the water-collecting well, and there were safety hazards. | 870 | 174 | 174
Total | | 20,540 | 4,108 | 4,108
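
In Table 1, every category's training count is exactly five times its validating and testing counts (for example, 750 : 150 : 150), which is what a per-category 5:1:1 split would produce. The paper does not state how the split was performed, so the following is only a minimal sketch of one way to obtain such a division. The function name `stratified_split`, the fixed seed, and the placeholder records are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(samples, ratios=(5, 1, 1), seed=0):
    """Split (text, label) pairs into train/val/test sets per label.

    Each label is shuffled and divided separately so that every
    category keeps the same train:val:test proportions.
    """
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))

    rng = random.Random(seed)  # fixed seed so the split is repeatable
    total = sum(ratios)
    train, val, test = [], [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_val = len(items) * ratios[1] // total
        n_test = len(items) * ratios[2] // total
        n_train = len(items) - n_val - n_test  # remainder goes to training
        train += items[:n_train]
        val += items[n_train:n_train + n_val]
        test += items[n_train + n_val:]
    return train, val, test


# Placeholder records sized like two Table 1 categories
# (750 + 150 + 150 = 1,050 and 870 + 174 + 174 = 1,218).
corpus = [(f"record {i}", "Vehicle damage") for i in range(1050)]
corpus += [(f"record {i}", "Drowning") for i in range(1218)]
train, val, test = stratified_split(corpus)
```

With these placeholder sizes, the split reproduces the Table 1 counts for the two categories. Splitting per label rather than over the pooled corpus keeps small categories such as vehicle damage represented in all three sets.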

The computing environment was developed in Python 3.6, with PyTorch used to implement the GCN and BiLSTM models. The model was designed with the following parameters: number of LSTM units = 128, embedding dimension = 768, hidden layer size = 128, number of iterations = 100, dropout probability = 0.5, learning rate = 0.001, and neurons in the GCN layer = 128.

Model Performance Evaluation

For construction site safety hazard texts, BERT is used to calculate word vectors and obtain similarities between words. An adjacency matrix is obtained by combining similarity and syntactic structure correlations. Taking the graph constructed by the correlation matrix as the input, the GCN is used to carry out text feature encoding, and then BiLSTM is used to complete text feature decoding and output the safety hazard classification result. Precision, recall, and F1 score (F1) are widely used as indicators to evaluate a model’s performance (Zhong et al. 2020a; Tian et al. 2021). The results are shown in Table 2.

The results in Table 2 show that the average precision rate, recall rate, and F1 value of the GSHDLM proposed in this study are 87.12%, 86.59%, and 86.46%, respectively, indicating that the model can accurately identify potential safety hazards on construction sites. The method performs well on categories such as vehicle damage, electrocution, incivilization construction, mechanical damage, lifting injury, fire, and explosion, for which the F1 value exceeds 85%. The accuracy of the model was verified by comparison with the TextCNN and BiLSTM methods. As seen in Table 2, the average F1 value of the GSHDLM was 86.46%, and the accuracy rate was 86.56%; the average F1 values of the TextCNN and BiLSTM methods were 80.42% and 79.56%, respectively. Compared with the TextCNN method, the precision rate, recall rate, and F1 value of the model in this study were 5.60, 5.92, and 6.04 percentage points higher, respectively. Compared with the BiLSTM method, the precision rate, recall rate, and F1 value of the model in this study were 6.21, 6.53, and 6.90 percentage points higher, respectively. Except for electrocution, the F1 value of the GSHDLM was higher than those of the TextCNN and BiLSTM methods in all categories, demonstrating that the GSHDLM has higher accuracy in safety hazard recognition.

Table 2. Safety hazard text classification performance
Label | GSHDLM Precision | GSHDLM Recall | GSHDLM F1 | TextCNN Precision | TextCNN Recall | TextCNN F1 | BiLSTM Precision | BiLSTM Recall | BiLSTM F1 | Support
Vehicle damage | 0.9259 | 0.8333 | 0.8772 | 0.8182 | 0.8400 | 0.8289 | 0.7857 | 0.8067 | 0.7961 | 150
Electrocution | 0.8873 | 0.9842 | 0.9332 | 0.8090 | 0.9856 | 0.8886 | 0.9342 | 0.9784 | 0.9558 | 696
Falling accident | 0.8761 | 0.8088 | 0.8411 | 0.7606 | 0.8203 | 0.7893 | 0.8177 | 0.8284 | 0.8231 | 612
Collapse | 0.7500 | 0.9112 | 0.8228 | 0.8410 | 0.7664 | 0.8020 | 0.6215 | 0.7290 | 0.6710 | 214
Incivilization construction | 0.8951 | 0.8980 | 0.8966 | 0.9095 | 0.6939 | 0.7872 | 0.8106 | 0.8666 | 0.8376 | 637
Object strike | 0.7382 | 0.8076 | 0.7714 | 0.6587 | 0.6779 | 0.6681 | 0.7335 | 0.6711 | 0.7009 | 447
Mechanical damage | 0.8565 | 0.8565 | 0.8565 | 0.8895 | 0.7348 | 0.8048 | 0.6867 | 0.6957 | 0.6911 | 230
Lifting injury | 0.8619 | 0.8571 | 0.8595 | 0.8038 | 0.9231 | 0.8593 | 0.6371 | 0.9066 | 0.7483 | 182
Violation behavior | 0.9244 | 0.6044 | 0.7309 | 0.8015 | 0.5769 | 0.6709 | 0.6912 | 0.5165 | 0.5912 | 182
Fire | 0.9759 | 0.8762 | 0.9233 | 0.9618 | 0.7802 | 0.8615 | 0.9660 | 0.7926 | 0.8707 | 323
Explosion | 0.8519 | 0.9693 | 0.9068 | 0.7823 | 0.9502 | 0.8581 | 0.7695 | 0.9080 | 0.8330 | 261
Drowning | 0.9542 | 0.7184 | 0.8197 | 0.7602 | 0.8563 | 0.8054 | 0.9524 | 0.3448 | 0.5063 | 174
Average | 0.8712 | 0.8659 | 0.8646 | 0.8152 | 0.8067 | 0.8042 | 0.8091 | 0.8006 | 0.7956 | 4,108
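
The paper does not say how the Average row of Table 2 was computed. Recomputing from the published per-category values suggests that it is a support-weighted average rather than an unweighted (macro) mean. The sketch below checks this for the GSHDLM columns. The helper names `f1_score` and `weighted_average` are illustrative, and the weighting scheme is an inference from the numbers, not a statement from the authors.

```python
# (precision, recall, support) for GSHDLM, copied from Table 2.
GSHDLM = {
    "Vehicle damage": (0.9259, 0.8333, 150),
    "Electrocution": (0.8873, 0.9842, 696),
    "Falling accident": (0.8761, 0.8088, 612),
    "Collapse": (0.7500, 0.9112, 214),
    "Incivilization construction": (0.8951, 0.8980, 637),
    "Object strike": (0.7382, 0.8076, 447),
    "Mechanical damage": (0.8565, 0.8565, 230),
    "Lifting injury": (0.8619, 0.8571, 182),
    "Violation behavior": (0.9244, 0.6044, 182),
    "Fire": (0.9759, 0.8762, 323),
    "Explosion": (0.8519, 0.9693, 261),
    "Drowning": (0.9542, 0.7184, 174),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def weighted_average(values, supports):
    """Average of per-class values weighted by the number of test samples."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


supports = [s for _, _, s in GSHDLM.values()]
avg_precision = weighted_average([p for p, _, _ in GSHDLM.values()], supports)
avg_recall = weighted_average([r for _, r, _ in GSHDLM.values()], supports)
avg_f1 = weighted_average([f1_score(p, r) for p, r, _ in GSHDLM.values()], supports)
macro_recall = sum(r for _, r, _ in GSHDLM.values()) / len(GSHDLM)

print(f"weighted P/R/F1: {avg_precision:.4f} {avg_recall:.4f} {avg_f1:.4f}")
print(f"macro recall:    {macro_recall:.4f}")
```

The weighted values agree with the Average row to within rounding, whereas the macro recall is noticeably lower because small categories with weak recall, such as violation behavior and drowning, count as much as the large ones.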



Table 3. Ablation study for GSHDLM model
No. Syntactic structure BERT GCN BiLSTM Precision Recall F1 Accuracy
1 Yes Yes Yes Yes 0.8712 0.8659 0.8646 0.8656
2 Yes Yes Yes No 0.8309 0.8242 0.8201 0.8234
3 Yes No Yes Yes 0.8447 0.8406 0.8359 0.8406
4 Yes Yes No Yes 0.8162 0.8089 0.8040 0.8089
5 No Yes Yes Yes 0.8237 0.8165 0.8118 0.8165
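
To make the ablation comparison in Table 3 concrete, the short sketch below computes how far each reduced configuration falls below the full model, in percentage points. The dictionary keys and the helper `drops_in_points` are names chosen for illustration; the numbers are copied from Table 3.

```python
# Precision, recall, F1, and accuracy for each configuration in Table 3.
FULL = {"precision": 0.8712, "recall": 0.8659, "f1": 0.8646, "accuracy": 0.8656}
ABLATIONS = {
    "without BiLSTM": {"precision": 0.8309, "recall": 0.8242, "f1": 0.8201, "accuracy": 0.8234},
    "without BERT": {"precision": 0.8447, "recall": 0.8406, "f1": 0.8359, "accuracy": 0.8406},
    "without GCN": {"precision": 0.8162, "recall": 0.8089, "f1": 0.8040, "accuracy": 0.8089},
    "without syntactic structure": {"precision": 0.8237, "recall": 0.8165, "f1": 0.8118, "accuracy": 0.8165},
}


def drops_in_points(full, ablated):
    """Difference between the full and ablated model, in percentage points."""
    return {metric: round((full[metric] - ablated[metric]) * 100, 2) for metric in full}


drops = {name: drops_in_points(FULL, scores) for name, scores in ABLATIONS.items()}

# List the configurations from the largest to the smallest accuracy drop.
for name, delta in sorted(drops.items(), key=lambda kv: -kv[1]["accuracy"]):
    print(f"{name}: accuracy -{delta['accuracy']:.2f}, F1 -{delta['f1']:.2f}")
```

Expressing the differences in percentage points avoids ambiguity with relative changes, since all four metrics are themselves percentages.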

The method proposed in this study includes four parts: syntactic structure, BERT-based semantic information, a GCN, and BiLSTM. To understand the GSHDLM in depth, an ablation study was conducted on the proposed model by removing individual components or features to verify the importance of each part. Table 3 provides the results.

In Table 3, we observe that removing BiLSTM from the GSHDLM model led to a lower F1 score of 0.8201: the accuracy of text classification dropped by 4.22 percentage points, and the precision, recall, and F1 values dropped by 4.03, 4.17, and 4.45 percentage points, respectively, which confirmed the significance of BiLSTM for safety hazard classification. It has been proven that BiLSTM can capture contextualized information and strengthen text features. Without BERT-based semantic information, the accuracy rate of the GSHDLM decreased by 2.50 percentage points. This confirmed the importance of the semantic information extracted by BERT in safety hazard analysis. When the GCN was removed from the GSHDLM, the F1 score and accuracy rate dropped to 0.8040 and 0.8089, respectively. We speculate that the GCN can accurately and fully capture the syntactic structure and semantic information and can avoid the loss of information in the feature extraction process. Removing the syntactic structure reduced the accuracy rate of our model by nearly 5 percentage points because, without it, the GSHDLM could not apply all the syntactic information necessary for the text classification task. This showed the importance of syntactic structure in safety hazard text analysis.

Discussion

Efficient and accurate safety hazard recognition is an important approach to improving the efficiency of safety management for the construction site safety hazard text. Existing text classification methods use text semantics to deeply extract potential relationships. However, syntactic structure and semantic information are equally important in the text analysis process, and both affect classification performance. Large-scale project safety hazard texts contain many technical terms, the same technical term may play different roles within one safety hazard text, and it is difficult to identify differences in syntactic structure using semantic analysis alone. Therefore, it is necessary to analyze the syntactic structure and obtain the syntactic information of a text. Although existing studies have demonstrated the importance of syntactic structure in text classification, it is rarely applied in the text analysis of construction site safety hazards. This study proposes a safety hazard text classification model based on syntactic structure and semantic information that aims to improve the safety hazard classification efficiency. Taking a hydraulic engineering construction site safety hazard as an example, the superiority of our model over existing text classification methods was confirmed (Fig. 6).

Fig. 6 shows the classification effect of different methods. The model proposed in this study had the best effect on the classification of hydraulic engineering text, with an accuracy rate of 86.56%. Compared with deep learning models such as Region-CNN (RCNN), TextCNN, BiLSTM, bidirectional gated recurrent unit (BiGRU), and BERT, the accuracy of our model is 5.13, 5.89, 6.50, 7.06, and 7.88 percentage points higher, respectively. The aforementioned methods only consider the semantic relationship and lack syntactic structure analysis, which explains their lower accuracy. FastText has the lowest accuracy, with an accuracy rate of 72.36%. FastText is a shallow network, and it has difficulty carrying out the deep mining of safety hazard features. This indicates that a

Fig. 6. Performance comparison of different text classification models. (Bar chart; x-axis: text classification methods; y-axis: accuracy. Our model 86.56%, RCNN 81.43%, TextCNN 80.67%, BiLSTM 80.06%, BiGRU 79.50%, BERT 78.68%, FastText 72.36%.)



deep network structure can improve the accuracy of the analysis of hydraulic engineering safety hazard text. Unlike the aforementioned research methods, this study fully considered text syntactic structure information. BERT was used to extract text semantic features, and an encoding and decoding mechanism was built based on a GCN and BiLSTM. This not only ensured the comprehensiveness of text information extraction but also improved the accuracy of text classification.

Compared with existing safety analysis methods and intelligent text classification models, the value of the method proposed in this paper is reflected in three aspects. First, BERT is used to calculate word vectors and obtain semantic similarities between words. A graph structure fusing syntactic and semantic information is used to quantify safety hazard texts, which enriches the text quantification method and improves the accuracy and comprehensiveness of text feature extraction. An ablation study indicated that when only semantic information or syntactic structure was considered, the text classification accuracy was 84.06% and 80.89%, respectively, lower than our model considering syntactic structure and semantic information. This illustrates the importance of syntactic structure and semantic information in safety hazard text classification.

Second, a GCN model is introduced into the safety hazard text deep analysis method to process the graph structure and carry out the encoding of syntactic structure and semantic information. Furthermore, a BiLSTM-based safety hazard text decoding mechanism is proposed to mine text features and represent the long-distance dependency of text features, which can show the complex relationship between text features and ensure the accuracy of syntactic and semantic analysis. This method enriches the theory of intelligent construction text analysis, which has important guiding significance for improving the accuracy of text classification. When BiLSTM and a GCN are each used alone, the classification accuracy is 80.89% and 82.34%, respectively, both lower than that of our model. The results show that the GCN and BiLSTM-based safety hazard encoding–decoding mechanism can effectively improve the text classification accuracy.

Third, this study provides an intelligent and accurate safety hazard recognition method that can quickly determine the category of safety hazards on construction sites. Furthermore, it is unnecessary to preprocess uploaded safety hazard records, and the records can be directly input into the model to obtain the corresponding safety hazard categories, which improves the application efficiency of the model and ensures the timeliness of safety hazard analysis. The output safety hazard categories can be used to cluster texts, analyze the occurrence rule of safety hazards, and formulate safety hazard management measures. This helps manage safety hazard texts systematically.

Limitation

The method proposed in this study has high accuracy and reliability and can efficiently identify construction site safety hazards. However, there is still room for further refinement. Construction site safety hazard texts contain a lot of information, and a safety hazard text may contain one or more safety hazard categories. This study defines each safety hazard text as corresponding to a single safety hazard category and does not consider the multilabel safety hazard text problem, which affects the comprehensiveness of text information mining. Therefore, future work should build a multilabel safety hazard text classification model to comprehensively mine safety hazard information and improve the accuracy of safety hazard recognition. Meanwhile, current research mainly focuses on safety hazard text classification. It is necessary to automatically obtain safety hazard solution measures based on classified texts and form a construction site safety hazard intelligent management system. Therefore, future work will be to establish a question-and-answer model for classified construction site safety hazards and automatically generate safety hazard management measures.

Conclusions

For large-scale project construction sites with massive text and various safety hazard categories, an intelligent text classification method helps improve the efficiency of hazard management. The safety hazard information of large-scale projects is highly fragmented, which increases the difficulty of text classification. Large-scale project construction text records involve a large number of technical terms, and the accuracy of technical term descriptions directly affects text semantic analysis. Therefore, an intelligent classification method for large-scale projects combining syntactic structure and semantic information was built to automatically recognize construction site safety hazards.

1. BERT is used to calculate a word vector and obtain the similarity between words, and a similarity-based adjacency matrix is built to represent semantic information. The syntactic structure extracted by HanLP is used to calculate the syntactic relationship between words and form an adjacency matrix. A graph structure fusing syntactic and semantic information is constructed to quantify text content. Based on the adjacency matrix, words are taken as nodes, and semantic and syntactic information is taken as edges to construct the graph structure and quantify text content. The graph data are regarded as the basic data of text mining to carry out text feature mining from the syntactic structure and semantic information levels.

2. We proposed a graph structure–based hybrid deep learning method to determine complex syntactic and semantic relationships and identify large-scale project construction site safety hazards. A GCN model was used to encode syntactic and semantic features, and the encoded features were decoded using BiLSTM. Twelve safety hazard categories were established based on hydraulic engineering construction site texts and relevant standards. Safety hazard texts were used to train the intelligent classification model, and the recognition accuracy was 86.56%. Comparison with other deep learning models, such as CNN, RNN, and RCNN, showed that the model proposed in this study was superior to other intelligent text classification models. Meanwhile, an ablation study was carried out to demonstrate the effect of syntactic structure, BERT-based semantic information, GCN, and BiLSTM in the model and verify the accuracy and reliability of the safety hazard intelligent classification model.

3. The GSHDLM adopts a new deep learning method and a graph structure fusing syntactic structure and semantic information to carry out intelligent classification of large-scale project construction site safety hazards within a short time, so that text features are not limited to text semantics but also fully consider the impact of syntactic structure. This helps improve the classification efficiency of safety hazards. The research effort enriches the theoretical system of large-scale project construction text analysis, which can provide necessary key information for construction site safety hazard management and decision-making, and it is an important prerequisite for intelligent construction safety management.
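
The graph construction summarized in conclusion 1 can be illustrated with a small sketch. It is not the authors' implementation: the two-dimensional vectors stand in for BERT word vectors, the dependency edge stands in for a parser's output, and both the similarity threshold of 0.5 and the fusion rule (an edge exists if either source provides one, plus self-loops) are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_adjacency(vectors, threshold=0.5):
    """Connect two words when their vectors are similar enough (assumed rule)."""
    n = len(vectors)
    return [[1.0 if i != j and cosine(vectors[i], vectors[j]) >= threshold else 0.0
             for j in range(n)] for i in range(n)]


def syntactic_adjacency(n, dependency_edges):
    """Connect each head word with its dependent, in both directions."""
    adj = [[0.0] * n for _ in range(n)]
    for head, dependent in dependency_edges:
        adj[head][dependent] = 1.0
        adj[dependent][head] = 1.0
    return adj


def fuse(sem, syn):
    """Union of semantic and syntactic edges, with self-loops (assumed rule)."""
    n = len(sem)
    return [[1.0 if i == j else max(sem[i][j], syn[i][j]) for j in range(n)]
            for i in range(n)]


# Words are the graph nodes; the values below are toy placeholders.
words = ["cable", "insulation", "damaged"]
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # stand-ins for BERT vectors
dependency_edges = [(2, 0)]  # hypothetical parse: "damaged" governs "cable"

graph = fuse(semantic_adjacency(vectors),
             syntactic_adjacency(len(words), dependency_edges))
for word, row in zip(words, graph):
    print(word, row)
```

In this toy case the semantic matrix links "cable" with "insulation", the syntactic matrix links "cable" with "damaged", and the fused graph carries both edges, which is the sense in which text features are no longer limited to semantics alone.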



Data Availability Statement

Data generated and analyzed during this study are available from the corresponding author by request.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grant 52179139) and the Open Fund of Hubei Key Laboratory of Construction and Management in Hydropower Engineering (Grant 2020KSD05).

References

Alam, K. M., N. Siddique, and H. Adeli. 2020. “Dynamic ensemble learning algorithm for neural networks.” Neural Comput. Appl. 32 (12): 8675–8690. https://doi.org/10.1007/s00521-019-04359-7.
Bai, Y., C. Li, Z. Lin, Y. Wu, Y. Miao, Y. Liu, and Y. Xu. 2021. “Efficient data loader for fast sampling-based GNN training on large graphs.” IEEE Trans. Parallel Distrib. Syst. 32 (10): 2541–2556. https://doi.org/10.1109/TPDS.2021.3065737.
Baker, H., M. R. Hallowell, and A. J.-P. Tixier. 2020. “Automatically learning construction injury precursors from text.” Autom. Constr. 118 (Oct): 103145. https://doi.org/10.1016/j.autcon.2020.103145.
Barnes, J., R. Kurtz, S. Oepen, L. Ovrelid, and E. Velldal. 2021. “Structured sentiment analysis as dependency graph parsing.” In Proc., Joint Conf. of 59th Annual Meeting of the Association-for-Computational-Linguistics (ACL)/11th Int. Joint Conf. on Natural Language Processing (IJCNLP)/6th Workshop on Representation Learning for NLP (RepL4NLP), 3387–3402. Stroudsburg, PA: Association for Computational Linguistics.
Chen, S., J. Xi, Y. Chen, and J. Zhao. 2022. “Association mining of near misses in hydropower engineering construction based on convolutional neural network text classification.” Comput. Intell. Neurosci. 2022 (Jan): 1–16. https://doi.org/10.1155/2022/4851615.
Cheng, M.-Y., D. Kusoemo, and R. A. Gosno. 2020. “Text mining-based construction site accident classification using hybrid supervised machine learning.” Autom. Constr. 118 (Oct): 103265. https://doi.org/10.1016/j.autcon.2020.103265.
Chi, N.-W., K.-Y. Lin, N. El-Gohary, and S.-H. Hsieh. 2016. “Evaluating the strength of text classification categories for supporting construction field inspection.” Autom. Constr. 64 (Apr): 78–88. https://doi.org/10.1016/j.autcon.2016.01.001.
Chokor, A., H. Naganathan, W. K. Chong, and M. El Asmar. 2016. “Analyzing Arizona OSHA injury reports using unsupervised machine learning.” Procedia Eng. 145 (Jan): 1588–1593. https://doi.org/10.1016/j.proeng.2016.04.200.
Devlin, J., M. W. Chang, K. Lee, and K. Toutanova. 2018. “BERT: Pre-training of deep bidirectional transformers for language understanding.” Preprint, submitted October 11, 2018. https://arxiv.org/abs/1810.04805.
Ding, L. Y., and H. Li. 2013. “Information technologies in safety management of large-scale infrastructure projects.” Autom. Constr. 34 (Sep): 1–2. https://doi.org/10.1016/j.autcon.2012.10.016.
Fang, W., H. Luo, S. Xu, P. E. D. Love, Z. Lu, and C. Ye. 2020. “Automated text classification of near-misses from safety reports: An improved deep learning approach.” Adv. Eng. Inf. 44 (Apr): 101060. https://doi.org/10.1016/j.aei.2020.101060.
Feng, D., and H. Chen. 2021. “A small samples training framework for deep learning-based automatic information extraction: Case study of construction accident news reports analysis.” Adv. Eng. Inf. 47 (Jan): 101256. https://doi.org/10.1016/j.aei.2021.101256.
Gao, W., and H. Huang. 2021. “A gating context-aware text classification model with BERT and graph convolutional networks.” J. Intell. Fuzzy Syst. 40 (3): 4331–4343. https://doi.org/10.3233/JIFS-201051.
Goh, Y. M., and C. U. Ubeynarayana. 2017. “Construction accident narrative classification: An evaluation of text mining techniques.” Accid. Anal. Prev. 108 (Nov): 122–130. https://doi.org/10.1016/j.aap.2017.08.026.
Guo, Q., X. Qiu, X. Xue, and Z. Zhang. 2021. “Syntax-guided text generation via graph neural network.” Sci. China Inf. Sci. 64 (5): 1–10. https://doi.org/10.1007/s11432-019-2740-1.
Han, Y., Y. Diao, Z. Yin, R. Jin, J. Kangwa, and O. J. Ebohon. 2021. “Immersive technology-driven investigations on influence factors of cognitive load incurred in construction site hazard recognition, analysis and decision making.” Adv. Eng. Inf. 48 (Apr): 101298. https://doi.org/10.1016/j.aei.2021.101298.
Hong, Y., H. Xie, G. Bhumbra, and I. Brilakis. 2021. “Comparing natural language processing methods to cluster construction schedules.” J. Constr. Eng. Manage. 147 (10): 04021136. https://doi.org/10.1061/(ASCE)CO.1943-7862.0002165.
Hu, G., G. Lu, and Y. Zhao. 2021. “FSS-GCN: A graph convolutional networks with fusion of semantic and structure for emotion cause analysis.” Knowl.-Based Syst. 212 (Jan): 106584. https://doi.org/10.1016/j.knosys.2020.106584.
Kim, T., and S. Chi. 2019. “Accident case retrieval and analyses: Using natural language processing in the construction industry.” J. Constr. Eng. Manage. 145 (3): 04019004. https://doi.org/10.1061/(ASCE)CO.1943-7862.0001625.
Ko, T., and H. D. Jeong. 2020. “Syntactic approach to extracting key elements of work modification cause in change-order documents.” In Proc., Construction Research Congress (CRC) on Construction Research and Innovation to Transform Society, 134–142. Tucson: Construct Res Council.
LeCun, Y., Y. Bengio, and G. Hinton. 2015. “Deep learning.” Nature 521 (7553): 436–444. https://doi.org/10.1038/nature14539.
Li, R., L. Wang, Z. Jiang, D. Liu, M. Zhao, and X. Lu. 2021a. “Incremental BERT with commonsense representations for multi-choice reading comprehension.” Multimedia Tools Appl. 80 (21–23): 32311–32333. https://doi.org/10.1007/s11042-021-11197-0.
Li, X., M. Cui, J. Li, R. Bai, Z. Lu, and U. Aickelin. 2021b. “A hybrid medical text classification framework: Integrating attentive rule construction and neural network.” Neurocomputing 443 (Jul): 345–355. https://doi.org/10.1016/j.neucom.2021.02.069.
Lin, J.-R., Z.-Z. Hu, J.-L. Li, and L.-M. Chen. 2020. “Understanding on-site inspection of construction projects based on keyword extraction and topic modeling.” IEEE Access 8 (Nov): 198503–198517. https://doi.org/10.1109/ACCESS.2020.3035214.
Lin, P., P. Wei, Q. Fan, and W. Chen. 2019. “CNN model for mining safety hazard data from a construction site.” [In Chinese.] J. Tsinghua Univ. 59 (8): 628–634. https://doi.org/10.16511/j.cnki.qhdxxb.2019.26.008.
Liu, J., Z. S. Y. Wong, H.-Y. So, and K. L. Tsui. 2021. “Evaluating resampling methods and structured features to improve fall incident report identification by the severity level.” J. Am. Med. Inf. Assoc. 28 (8): 1756–1764. https://doi.org/10.1093/jamia/ocab048.
Lu, J., J. Xuan, G. Zhang, and X. Luo. 2018. “Structural property-aware multilayer network embedding for latent factor analysis.” Pattern Recognit. 76 (Apr): 228–241. https://doi.org/10.1016/j.patcog.2017.11.004.
Park, C., J. Park, and S. Park. 2020. “AGCN: Attention-based graph convolutional networks for drug-drug interaction extraction.” Expert Syst. Appl. 159 (Nov): 113538. https://doi.org/10.1016/j.eswa.2020.113538.
Qiu, Z., Q. Liu, X. Li, J. Zhang, and Y. Zhang. 2021. “Construction and analysis of a coal mine accident causation network based on text mining.” Process Saf. Environ. Prot. 153 (Sep): 320–328. https://doi.org/10.1016/j.psep.2021.07.032.
Ren, Q., M. Li, H. Li, and Y. Shen. 2021. “A novel deep learning prediction model for concrete dam displacements using interpretable mixed attention mechanism.” Adv. Eng. Inf. 50 (Oct): 101407. https://doi.org/10.1016/j.aei.2021.101407.
Salama, D. M., and N. M. El-Gohary. 2016. “Semantic text classification for supporting automated compliance checking in construction.” J. Comput. Civ. Eng. 30 (1): 04014106. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000301.
Tian, D., M. Li, J. Shi, Y. Shen, and S. Han. 2021. “On-site text classification and knowledge mining for large-scale projects construction by integrated intelligent approach.” Adv. Eng. Inf. 49 (Aug): 101355. https://doi.org/10.1016/j.aei.2021.101355.
Tixier, A. J. P., M. R. Hallowell, B. Rajagopalan, and D. Bowman. 2016. “Automated content analysis for construction safety: A natural language processing system to extract precursors and outcomes from unstructured injury reports.” Autom. Constr. 62 (Feb): 45–56. https://doi.org/10.1016/j.autcon.2015.11.001.
Wang, Z., Z. Huang, and J. Gao. 2020a. “Chinese text classification method based on BERT word embedding.” In Proc., 5th Int. Conf. on Mathematics and Artificial Intelligence (ICMAI), 66–71. New York: Association for Computing Machinery. https://doi.org/10.1145/3395260.3395273.
Wang, Z., C.-H. Wu, Q.-B. Li, B. Yan, and K.-F. Zheng. 2020b. “Encoding text information with graph convolutional networks for personality recognition.” Appl. Sci. 10 (12): 4081. https://doi.org/10.3390/app10124081.
Xu, N., L. Ma, Q. Liu, L. Wang, and Y. Deng. 2021a. “An improved text mining approach to extract safety risk factors from construction accident reports.” Saf. Sci. 138 (Jun): 105216. https://doi.org/10.1016/j.ssci.2021.105216.
Xu, N., L. Ma, L. Wang, Y. Deng, and G. Ni. 2021b. “Extracting domain knowledge elements of construction safety management: Rule-based approach using Chinese natural language processing.” J. Manage. Eng. 37 (2): 04021001. https://doi.org/10.1061/(ASCE)ME.1943-5479.0000870.
Zhang, F., H. Fleyeh, X. Wang, and M. Lu. 2019. “Construction site accident analysis using text mining and natural language processing techniques.” Autom. Constr. 99 (Mar): 238–248. https://doi.org/10.1016/j.autcon.2018.12.016.
Zhang, J., L. Zi, Y. Hou, D. Deng, W. Jiang, and M. Wang. 2020. “A C-BiLSTM approach to classify construction accident reports.” Appl. Sci. 10 (17): 5754. https://doi.org/10.3390/app10175754.
Zhong, B., X. Pan, P. E. D. Love, L. Ding, and W. Fang. 2020a. “Deep learning and network analysis: Classifying and visualizing accident narratives in construction.” Autom. Constr. 113 (May): 103089. https://doi.org/10.1016/j.autcon.2020.103089.
Zhong, B., X. Pan, P. E. D. Love, J. Sun, and C. Tao. 2020b. “Hazard analysis: A deep learning and text mining framework for accident prevention.” Adv. Eng. Inf. 46 (Oct): 101152. https://doi.org/10.1016/j.aei.2020.101152.
Zhong, B., X. Xing, H. Luo, Q. Zhou, H. Li, T. Rose, and W. Fang. 2020c. “Deep learning-based extraction of construction procedural constraints from construction regulations.” Adv. Eng. Inf. 43 (Jan): 101003. https://doi.org/10.1016/j.aei.2019.101003.
Zhou, J., J. X. Huang, Q. V. Hu, and L. He. 2020. “SK-GCN: Modeling Syntax and Knowledge via Graph Convolutional Network for aspect-level sentiment classification.” Knowl.-Based Syst. 205 (Oct): 106292. https://doi.org/10.1016/j.knosys.2020.106292.
