State-of-the-Art Review
Addressing Legal and Contractual Matters in Construction Using Natural Language Processing: A Critical Review
Fahad ul Hassan, S.M.ASCE 1; Tuyen Le, A.M.ASCE 2; and Xuan Lv 3
Abstract: Claims, disputes, and litigation are major legal issues in construction projects, which often result in cost overruns, delays, and adverse working relationships among the contracting parties. Recent advances in natural language processing (NLP) techniques offer great potential to process voluminous unstructured data from legal documents and draw insightful information about the root causes of issues and prevention strategies. Several efforts have been undertaken in the last decades that used NLP to tackle a wide range of problems related to legal issues in construction, such as the quality review of contracts and the identification of common patterns in legal cases. The research line on NLP-based techniques for analyzing legal texts of construction projects has progressed well recently; however, it is still at an early stage. This paper aims to perform a critical review of recently published articles to analyze the achievements and limitations of the state of the art on NLP-based approaches to address common legal issues associated with legal documents arising across different project stages. The study also provides a roadmap for future research to expand the adoption of NLP for the processing of legal texts in construction. DOI: 10.1061/(ASCE)CO.1943-7862.0002122. © 2021 American Society of Civil Engineers.
Author keywords: Legal issues; Disputes; Artificial intelligence; Natural language processing (NLP); Contracts; Project requirements;
Litigation; Claims; Linguistics.
1 Ph.D. Student, Glenn Dept. of Civil Engineering, Clemson Univ., Clemson, SC 29634. ORCID: https://orcid.org/0000-0002-2308-2606. Email: fhassan@g.clemson.edu
2 Assistant Professor, Glenn Dept. of Civil Engineering, Clemson Univ., Clemson, SC 29634 (corresponding author). ORCID: https://orcid.org/0000-0002-8606-9214. Email: tuyenl@clemson.edu
3 Assistant Professor, Moss School of Construction, Infrastructure and Sustainability, Florida International Univ., Miami, FL 33174. Email: xulv@fiu.edu
Note. This manuscript was published online on July 7, 2021. Discussion period open until December 7, 2021; separate discussions must be submitted for individual papers. This paper is part of the Journal of Construction Engineering and Management, © ASCE, ISSN 0733-9364.
© ASCE 03121004-1 J. Constr. Eng. Manage.

Introduction

Legal issues such as disputes are common in construction projects due to the natural complexity in terms of size, disciplines, and the number of separate parties (Younis et al. 2008). These disputes generally cause significant loss of time and money for the stakeholders (Walsh 2017). Research has repeatedly reported that the use of natural language in legal documents is among the leading causes of legal issues, necessitating advanced techniques in text processing (Lee et al. 2019). Examples of well-known linguistic issues of construction contracts include vagueness and ambiguities (Çevikol and Aydemir 2019) or biased clauses that shift risk to one party (Lee et al. 2019). The client often maliciously modifies the contractual terms and specifications to shift risks to the contractors (Lee et al. 2019). Conflicts in each party's understanding of their roles and obligations provided in the contracts often lead to construction disputes (Chong 2012; Rajoo 2010; Walsh 2017). Zait and Zarour (2019) reported the imprecise inferences of contractual terms as a major cause of disagreements leading to project failures. Furthermore, the traditional method of checking the compliance of a project with construction laws and regulations such as design codes or safety procedures is prone to errors due to the manual processing of voluminous regulatory texts (Zhang and El-Gohary 2016a). Noncompliance can lead to redesign, rework, and monetary losses that often require complex, time-consuming, and costly solutions. Therefore, automated methods to augment engineers in drafting, reviewing, and processing contractual and legal requirements have emerged as a critical need to minimize legal matters in construction (Çevikol and Aydemir 2019; Pishdad-Bozorgi and De La Garza 2012; Shen et al. 2017).

The implementation of natural language processing (NLP), a branch of artificial intelligence that deals with natural language such as text and speech, has been demonstrated in the literature as a promising solution to improve contract quality, comprehension, and management (Arora et al. 2015; Zait and Zarour 2019). NLP offers a collection of techniques to process natural language text for many applications such as machine translation, speech recognition, information retrieval, and text classification (Manning and Schütze 1999). Recent advances in machine learning–based NLP have offered unprecedented opportunities for the construction domain to prevent potential human errors in provisions and legal issues (Zait and Zarour 2019). NLP can support the contract drafting and reviewing process to ensure precise scope comprehension along with the identification of any poisonous clauses. In addition, NLP can assist in verifying consistency across multiple contracts as well as the compliance checking of construction activities in accordance with the project requirements, codes, and regulatory documents. However, the body of knowledge lacks a holistic understanding of the previously developed NLP frameworks to control legal issues and any gaps that merit further research.

This study has two primary objectives. The first objective is to conduct a comprehensive review of existing NLP frameworks developed by previous researchers in the construction domain. The limitations of existing frameworks in addressing legal issues in construction are also examined. To achieve this objective, the study investigated a number of articles published in high-impact journals in the last decades on the implementation of NLP to reduce the likelihood of legal disputes or litigation. The second objective of the study is to report potential areas requiring more research where the capability of NLP should be explored. The findings of this study will help the practitioners and researchers to understand
better the current body of knowledge and future research needs in the utilization of NLP in addressing legal issues in construction.

Background

Common Causes of Legal Issues Associated with Legal Documents in Construction

Disputes between the stakeholders of a construction project such as the owner, designer, contractor, and subcontractors during the execution of a construction project are prevalent (Kilian 2003). Any dispute that cannot be resolved by negotiation efforts would lead to costly litigation and cause significant negative impacts on the financial condition and reputation of all parties (Jagannathan and Delhi 2020). Studies have reported a serious increase in litigation in construction projects over the past years, necessitating new solutions to reduce litigation in construction by controlling the causes that actually lead to it (Mitkus and Mitkus 2014).

Many legal issues in the construction industry are associated with the natural language in the following two common types of legal documents: (1) contracts, and (2) laws and regulations (Cakmak and Cakmak 2013). Contract drafting has been repeatedly reported as a major source of legal disputes (Cakmak and Cakmak 2013; Kumar Viswanathan et al. 2020). One of the most common sources of legal disputes associated with contract drafting is the lack of clarity in contracts, resulting in poor understanding among the contracting parties regarding their roles and obligations provided in contracts (Chong 2012; Rajoo 2010; Walsh 2017). Mitkus and Mitkus (2014) found that contracts with incomplete requirements often lead to design changes and associated disputes. Missing critical contractual information also adversely affects construction work quality (Jagannathan and Delhi 2020). Besides, incomplete project requirements in a subcontract often lead to disagreements among the contracting parties (Sun and Meng 2009). The misinterpretation of contracts is another primary reason behind conflicts or even legal issues in construction projects (Chan and Suen 2005). Such misinterpretations of contractual terminologies are caused by lexical ambiguities (i.e., polysemes) (Berry et al. 2003) or syntactic ambiguities (i.e., sentence structure) (Zait and Zarour 2019). Azghandi-Roshnavand (2019) found 24 sources of ambiguities related to the following types of clauses: roles and responsibilities, dispute resolution, insurance and bonds, payment conditions, transportation criteria, criteria related to testing, defects, and damages. Furthermore, the use of poisonous clauses in contracts also causes disagreements among the contracting parties in dispute resolutions (Lee et al. 2019). These biased clauses are often added in construction contracts by the owner to shift the maximum risks to contractors and other parties (Lee et al. 2019; Youssef et al. 2018).

In addition to contracts, the violation of regulatory codes and laws has been cited as a main source of legal issues in construction. These violations often lead to redesign, rework, and consequently contract disputes due to monetary loss (Almutairi et al. 2020; Kilian and Gibson 2005). One major cause of design errors was found to be the unfamiliarity with relevant and up-to-date design codes, construction laws, and regulations among engineers (Yap et al. 2018). Design errors also occur due to the use of inefficient methods of compliance checking of designs with relevant standards and laws (Cakmak and Cakmak 2013). The traditional compliance checking methods are time-consuming and prone to human errors due to the reliance on manual code reading and extraction of rules (Zhang and El-Gohary 2016a). NLP has become an emerging technique that can help improve compliance checking, thus reducing violations of laws and regulations in construction.

Natural Language Processing

As discussed earlier, many legal issues in construction are strongly associated with the use of natural language in legal documents (Chong 2012; Walsh 2017); thus, recent advances in NLP offer great potential to improve the drafting, reading, reviewing, comprehension, and management of legal texts. NLP is a branch of artificial intelligence (AI) that refers to a set of advanced computational techniques capable of enabling computers to understand linguistic data in natural language such as speech and text (Chowdhury 2003). Examples of NLP tasks include machine translation, information extraction, and opinion mining (Manning and Schütze 1999). NLP extracts and processes linguistic information ranging from the syntactic level of words, such as the occurrence frequency and their part of speech (POS), to the grammatical dependencies between linguistic units and to the semantics of individual words and sequences of words such as sentences or documents.

Fundamental NLP Techniques

The most fundamental NLP techniques include POS tagging, dependency parsing, and lemmatization. POS tagging assigns POS tags (e.g., noun, verb, adjective) to each linguistic unit in the text, which are then used to perform certain text processing tasks, for instance, information extraction (Cutting et al. 1992). Dependency parsing generates a parse tree indicating the grammatical structure of a sentence, in which the grammatical relationships between the words included in the sentence are shown (Kübler et al. 2009). The dependency information is also useful for information extraction tasks. In addition, lemmatization reduces the different grammatical forms of a word to its root form (Balakrishnan and Ethel 2014). For instance, constructs, constructed, and construction can be returned to the same root form construct. A few other basic NLP techniques include tokenization (conversion of a series of text into individual tokens), lowercasing (conversion of the whole text into a single lowercase format), and removal of punctuation and stopwords (removal of all punctuation marks and stopwords such as the, is, of, etc.) (Runeson et al. 2007). The linguistic data extracted using NLP is used to infer meaningful information such as locations, time, actions, and topics (Chi et al. 2017; Mollá et al. 2006; Moon et al. 2021). In the context of legal texts, NLP has the potential to obtain legal concepts, including but not limited to the following: obligation, parties, rights, and penalties. These concepts serve as the input information that the computer can utilize to automatically assess the quality of the contractual document or examine patterns of legal cases.

NLP Approaches to Text Processing

NLP-based information extraction algorithms can be categorized into the following two common approaches: rule-based and machine learning–based text processing (Salama and El-Gohary 2013). Human-crafted rules are often used when relatively high performance is required (Li et al. 2012). For example, to support distinguishing between obligation and rights, a keyword such as must can be used as a criterion rule. Manually constructed rules yield higher performance, but that performance is achieved at the cost of the large human effort required in developing those rules (Manning and Schütze 1999). Rule-based methods produce very robust models for specific applications due to the human involvement in rule development. A simple rule-based model is founded on a set of IF-THEN rules in which the rule part after IF reflects the condition and the rule part after THEN is the conclusion (Liu et al. 2014). Compared with the rule-based approach, machine learning (ML) is considered more robust and greatly scalable (Li et al. 2012). ML is
a process that first learns from completed examples during the training phase and then applies the gained knowledge to perform the same task on unseen samples in the testing phase (Sebastiani 2002). ML typically relies on examples to make predictions on new instances. ML can be of two types: (1) supervised learning, in which the labeled text data is provided during the training phase (Sebastiani 2002); and (2) unsupervised learning, in which the model learns automatically on its own from the corpus without any labeled data or instructions (Ozgur 2004). Several ML algorithms have been developed by researchers in the computer science domain that can be applied to develop applications for other disciplines. The most commonly used supervised learning algorithms include naïve Bayes (NB), support vector machines (SVM), logistic regression (LR), k-nearest neighbor (kNN), decision tree (DT), and deep learning algorithms such as neural networks. Among these, the SVM algorithm, which generates a hyperplane to classify text, is reported as the best performing algorithm in most cases (Yasodha and Prakash 2012). An extra benefit of using SVM is the presence of kernels, which help it to process nonlinear data as well (Yasodha and Prakash 2012). On the other hand, the probabilistic algorithms, including NB and LR, are simpler than SVM yet produce promising results (Domingos 2012). The probabilistic algorithms also produce probability values implying the confidence level of the predictions. In contrast, proximity-based algorithms such as kNN process the test data by identifying the labels of its k nearest matches in the training data (Yuan et al. 2008). Although the DT algorithm is the simplest and easiest to understand, it is highly susceptible to overfitting (Apté et al. 1994). The DT algorithm uses a tree structure where the nodes of the tree represent a series of rules used to process the text. Recently, neural networks such as the convolutional neural network (CNN) and recurrent neural network (RNN) have gained much attention due to their higher performance in text processing tasks (Kamath et al. 2018; Kowsari et al. 2017). RNN is generally applied in long short-term memory (LSTM) and gated recurrent unit (GRU) architectures. However, neural networks require larger training data as well as longer training time than the other supervised algorithms (Kowsari et al. 2019).

Vectorization of Textual Data

Text processing using machine learning typically requires the input text data to be converted into a computer-understandable numerical format (Baker et al. 2020). Two common methods used for converting text data into numerical format are: (1) bag-of-words (BOW), and (2) word embeddings. BOW is a vector space model which converts the whole corpus into a bag of unique words, ignoring the context and sequential information of the words in the corpus (Gonçalves and Quaresma 2004). In BOW, each document or sentence of the target corpus is represented by a vector of N elements, where N corresponds to the total number of unique words in the corpus (Orsenigo et al. 2018). The elements of the vector can either be zero or a real number, implying the absence or occurrence frequency of a certain word, respectively. The occurrence frequency of a word typically indicates the power of that word in predicting a certain output (e.g., topics of a text) (Sebastiani 2002). The word frequencies are also commonly weighted to identify discriminating words using the following two methods: term frequency (TF), and term frequency-inverse document frequency (TF-IDF). TF is a simple method in which the weight indicates the frequency of a word in a sentence normalized by the sentence length (Robertson and Spärck Jones 1994). This method assigns higher weights to the words that occur frequently in a corpus. However, certain words, including domain-specific stop words, occur frequently in a corpus but carry less discriminating power to predict an output. Therefore, in order to identify actual discriminating words, the TF-IDF method is often used (Sebastiani 2002). The IDF parameter in this method weighs down high-frequency words while it scales up the weights of rare terms to identify actual discriminating words in a corpus (Salton and Buckley 1988).

Recently, word embeddings have been increasingly used by researchers for representation of text as numerical vectors due to the capability of capturing the contextual and semantic information in text (Gutiérrez-Batista et al. 2019). Unlike BOW, which produces a vector for each sentence, word embeddings use a pretrained vector space model to generate a unique fixed-length n-dimensional vector for each word in the corpus (Gutiérrez-Batista et al. 2019; Wei et al. 2019). Distributional models are used to measure the semantic similarity between terms according to their context in the corpus. Word embeddings typically generate a high-dimensional vector space where the vectors of words appearing in the same context in a corpus are adjacent to each other due to the high semantic similarity among them. Two state-of-the-art methods used to generate word embeddings include word2vec (Mikolov et al. 2013) and GloVe (Pennington et al. 2014). Word2vec is generally applied in two architectures: (1) continuous bag of words (CBOW), and (2) skip-gram. CBOW combines the vector representations of surrounding words to predict the target word, whereas skip-gram uses the vector representation of the target word to predict the surrounding words or context. In addition, GloVe is a vector space model trained on a global word co-occurrence matrix such that the dot product of two word vectors approximates the logarithm of their co-occurrence count.

Review Scope and Methodology

This paper involves a literature review of the academic articles that focus on the utilization of NLP in addressing legal issues in the construction field. The study adopted a modified four-step standard literature review methodology proposed by Arksey and O’Malley (2005). Fig. 1 shows the steps involved in the methodology. First, the research questions, along with the extent and range of research, were identified. Second, a keyword-based literature search was used to collect relevant studies from the selected search engines and journals. In the third step, the screening of the collected articles was carried out by reading the abstracts and the conclusions to identify closely related papers. The fourth step involved an in-depth critical review of the screened articles to collate, summarize, and report the research findings. Additionally, the leading research gaps in the current body of knowledge were also identified to highlight the potential future research areas.

Fig. 1. Methodology of the literature review process.

As noted earlier, the goal of this study is to review current NLP frameworks and their limitations in addressing the legal issues in the construction domain, as identified earlier in the Background section. Relevant articles were collected using several search engines and databases, including Google Scholar, ASCE Library, Engineering Village, ASCE Civil Engineering Database, ResearchGate, Web of Science, and Scopus. The study adopted the keyword-based search method, which has been widely used by many
researchers to collect relevant articles (Deng and Smyth 2013; Lin and Shen 2007; Xue et al. 2010; Yi and Chan 2014). The keywords included “construction”, “legal issues”, “disputes”, “litigation”, “claims”, “change orders”, “contracts”, “laws”, “codes”, and “natural language processing”, along with their variations in spellings and tenses. Only relevant papers published in high-impact journals and conference proceedings were considered. Initially, a total of 45 articles were collected. Of those, 31 were published in journals while the remaining 14 were published in the proceedings (as illustrated in Table 1). The papers were then screened by reviewing their abstracts and conclusions. The articles irrelevant to the research goal were then discarded. As a result of this stage, a total of 28 articles were found to be relevant, which were critically reviewed by the authors.

Table 1. Number of reviewed papers by publication type
Publication type | Initial number of publications | Final number of publications
Journal articles | 31 | 19
Conference papers | 14 | 9
Total | 45 | 28

Results

Descriptive Results

Publication Trend of NLP-Based Research in the Past Two Decades

To help the reader understand the overall trend of NLP-related research to address legal issues in construction, a summary of collected articles by year is illustrated in Fig. 2. As shown, there is a significant increase in the number of articles throughout the past decade, indicating that more and more researchers are interested in using NLP to handle legal issues in the construction field. According to the articles reviewed in this study, no article was found before 2000. From that time on, relatively few articles that met the keyword search criteria were published in the first 10 years. However, an increase in the trend was observed between 2011 and 2015, with 6 papers published in the period. Especially within the last six years, from 2016 to 2021, 19 papers were found. This jump in the number of publications shows that studies on the topic are significantly emerging.

Fig. 2. Publication trend of the NLP related articles over time.

Distribution of Publications by Journal

The distribution of the reviewed articles based on the publication source is shown in Fig. 3. The review revealed eight journal sources with at least one article on NLP implementation to prevent legal issues. As shown, the maximum number of articles (7 articles) was published in the Journal of Computing in Civil Engineering, while Automation in Construction published 5 articles. Moreover, many journals and proceedings contributed one article each. Overall, a total of 9 articles were published in the different conference proceedings.

Fig. 3. Distribution of reviewed papers based on the publication source.

Themes of State-of-the-Art Research on NLP Approaches to Construction Legal Matters

Fig. 4. Number of NLP publications addressing legal issues associated with different types of legal documents.

Although a growing number of researchers have investigated the ability of NLP to address legal issues in the construction domain, this line of research in construction is still at a very early stage. This study found that existing NLP models primarily aimed to address the legal issues associated with different legal documents. Accordingly, the present review study classified the state of the art on NLP to address legal issues in construction into three categories according to types of legal documents (see Fig. 4): (1) contracts, (2) regulatory codes and standards, and (3) legal cases. As shown in
the figure, previous studies have focused more on the contract-related legal issues with 13 publications as compared with other legal documents. The use of NLP to analyze regulatory codes and standards is also well studied with 11 publications. The following sections provide a detailed critical analysis of prior studies found in this state-of-the-art review for each of the categories.

NLP-Aided Contract Review

This critical review found that various studies have been conducted to explore the ability of NLP in improving contract drafting with the ultimate aim of enabling contracting parties to be aware of their rights, obligations, and the associated risks before signing the contract. Relevant studies in this regard, as summarized in Table 2, have been focused on tackling the following challenges: automated detection of ambiguities, automated detection of risk-prone clauses, and requirement structuring for ease of contract administration. Details of NLP-based approaches to address those three areas are as follows.

Automated Detection of Ambiguities. This review found only one study that aimed at improving the clarity of a contract draft through automated detection of ambiguous linguistic units. That study by Curtotti and McCreath (2011) used corpus profiling and chunk analysis to examine the distinctive characteristics of the contract language compared with general English. The authors specifically compared a contract corpus of one million words with two general English corpora, namely the Brown and Reuters corpora. They found that the contract corpus is less sparse than the general English corpora, which means that the contract corpus includes specific well-defined terms and sentence structure. The descriptive statistics reported the following distinctive terms of the contract corpus: agreement, shall, party, clause, services, etc., whereas those of the general corpora are but, was, his, they, would, etc. The analysis also revealed the frequent use of lengthy sentences and prepositions in contract language, whereas the use of past tense and pronouns in contract language is less common. The authors suggested the use of these unique characteristics of contract language as an input for the development of a software-based contract drafting tool. They envisioned a tool that would assist contract drafters in detecting uncommon legal terms and prepositional phrases, which would help to minimize ambiguities.

Automated Detection of Poisonous Clauses. Another focus of the literature regarding contract review was to develop NLP techniques to support risk assessment of construction clauses. The first contribution in this line of research was made by Serag et al. (2010), who used parsing along with ontology and reasoners to detect high-risk clauses and conflicts between general conditions and supplementary conditions. Named entity extractors were used to parse the input clause. The extracted information is then converted into a structured logical format of ontology. The reasoner module detects the conflict or similarity of the input clause with the well-known risk-prone clauses included in a database produced from previous post-project reviews. However, since the authors did not evaluate the model, the effectiveness of the model in identifying conflicts and risk-prone clauses is not known yet. Another model developed by Lee et al. (2019) used rule-based NLP to detect poisonous FIDIC clauses that are biased in favor of the owner. The method includes the development of a lexicon describing the semantic elements of contract clauses (e.g., actors, payment, action), a contract parser using dependency parsing (DP) and syntactic rules for extracting the semantic information from clauses, and a set of hand-coded rules for identifying poisonous clauses based on the information extracted. The method showed reliable results with an F-score of 81.8% on a test dataset prepared by human experts. This approach was also adopted by Lee et al. (2020) to predict if a FIDIC clause is favorable to the contractor. This prediction-based model reported an F-score of 80.0%. Despite the impressive results, the method is less scalable due to the reliance on manual development of rules, which seems tedious and may not perform well on other standard contracts such as those of the American Institute of Architects (AIA). More importantly, it is impossible to develop generic rules applicable to all types of text, as acknowledged by Lee et al. (2019). To address the drawbacks of rule-based approaches, another intelligent NLP framework, namely the “risk-o-meter”, for assessing risk-prone clauses was developed by Chakrabarti et al. (2019) using supervised ML algorithms such as NB and SVM. To prepare training data, they converted contractual text into numeric vectors using word embeddings and labeled the data with one of the following risk categories: liability, indemnity, and confidentiality. The ML-based risk-o-meter model outperformed the previously discussed rule-based model developed by Lee et al. (2019). The risk-o-meter achieved its best performance of 92% for accuracy and 86% for F-measure when trained with SVM. This superior performance compared with the rule-based approach is because the ML-based detection of risk-prone clauses enables the self-learning of complex relationships between input text features.

Structuring Text for Ease of Contractual Requirements Management. Another line of NLP approaches to tackle legal issues related to natural language contracts was focused on developing classification models, mostly using ML, for structuring the text in the contract documents to facilitate ease of contract review. The first effort in this area was made by Caldas et al. (2002), who developed a model for classifying the information in construction documents into thirteen topics (general, schedule, plumbing, HVAC, etc.) using supervised ML. They used the term frequency-inverse document frequency (TF-IDF) to extract text features. They tested three algorithms (NB, SVM, and kNN) and compared them with existing commercial text mining tools such as Rocchio and IBM Miner for Text using a test set comprising 845 text instances. The SVM model achieved the highest accuracy of 91.12%, whereas the kNN model reported the lowest performance of 49.11% accuracy. Addressing the same classification problem but using a new approach, Caldas and Soibelman (2003) developed a hierarchical text classification model with more detailed labels indicating the 121 CSI MasterFormat items. Among these, 16 items were included in level 1 of the label hierarchy, 52 items in level 2, and 53 items in level 3. The accuracy of 95.88% achieved for level 1 labels by the hierarchical method on a dataset of 3030 documents was higher than that of the flat text classification method used in the earlier model. The performance dropped to 86.37% for classifying documents into labels at level 3 of the hierarchy. Similar to the previous study, this study also found that SVM was superior to other ML alternatives. Recently, Hassan and Le (2020) developed a highly reliable classification model with a recall of 95% using ML to distinguish between requirement and nonrequirement text in contracts. The best recall of 95% was reported by the SVM algorithm, which outperformed the other algorithms, including a rule-based algorithm, NB, LR, and a feedforward neural network (FNN). The rule-based method could reach a recall of 90.6% only. In contrast with previous studies in which the authors used ML for structuring the text in contracts, Al Qady and Kandil (2010) used other approaches, including shallow parsing, to identify the tags (e.g., noun, verbs, subject) for each language unit in a clause and their dependencies to facilitate the organization of clauses based on the information extracted, such as types of actors and types of required action. The study by Al Qady and Kandil (2010) performs the information organization using linguistic units such as active subjects, verbs, and passive

Table 2. Summary of studies on NLP-assisted contract review

Topic | Specific solution | Approach | Methodology description | Performance | References
Automated detection of ambiguities | Automated detection of ambiguities associated with domain-specific words, prepositions, and conjunctions | Corpus profiling and chunk analysis | Compared a contract corpus with two general English corpora to identify the specific qualitative and quantitative characteristics of contract language | N/A | Curtotti and McCreath (2011)
Automated detection of poisonous clauses | Identification of conflicting and risk-prone clauses | Named entity extraction, ontology and logical reasoning | Developed a reasoner that compares the logic facts with logic rules to identify conflicting and risk-prone clauses | N/A | Serag et al. (2010)
Automated detection of poisonous clauses | Identification of poisonous clauses | Rule-based information extraction method | Utilized NLP along with semantic and syntactic rules to extract the poisonous clauses from contracts | 81.8% (F-score) | Lee et al. (2019)
Automated detection of poisonous clauses | Identification of contractor-friendly clauses | Rule-based NLP method | Rules-based framework for identifying contractor-friendly clauses | 85.7% (precision), 75.0% (recall), 80.0% (F-score) | Lee et al. (2020)
Automated detection of poisonous clauses | Extraction of risk-prone clauses from legal documents | ML-based text classification | Developed an automated model “risk-o-meter” to detect the risk-prone paragraphs in contracts | 94% (accuracy), 86% (F-score) | Chakrabarti et al. (2019)
Structuring text for ease of contract review and management | Contractual requirement classification | ML-based text classification | Categorized the contract documents into thirteen categories using ML algorithms | 91.12% (accuracy) | Caldas et al. (2002)
Structuring text for ease of contract review and management | Contractual requirement classification | ML-based hierarchical text classification | Performed the hierarchical text classification to classify the contract documents into sixteen categories using ML algorithms | 95.88% (accuracy) | Caldas and Soibelman (2003)
Structuring text for ease of contract review and management | Contractual requirement recognition | Rule-based and ML-based binary text classification | Classified the contractual text into requirements and nonrequirements using ML algorithms | 98.15% (recall) | Hassan and Le (2020)
Structuring text for ease of contract review and management | Concept relation extraction from contracts | Shallow parsing and rules | Utilized shallow parsing to perform syntactic segmentation of contract clauses and extract active concepts, passive concepts, and relations | 68% (F-measure) | Al Qady and Kandil (2010)
Structuring text for ease of contract review and management | Automated extraction of subcontract scope | ML-based text classification | Used ML to classify main contract requirements into different categories corresponding to subcontract disciplines | 94.18% (accuracy) | Hassan et al. (2020)
Structuring text for ease of contract review and management | Automated extraction of subcontract scope | ML-based text classification | Compared the performance of ML and deep learning algorithms in classifying general DB contract requirements into different categories such as design, construction, and O&M | 93.08% (recall) | Hassan and Le (2021)
J. Constr. Eng. Manage.
subjects in comparison with previous studies in which the information organization was performed based on complete clauses. The proposed system used the Sundance shallow syntactic parser (Riloff and Phillips 2004) to segment a contract clause into active concepts (i.e., the client and the contractor), passive concepts (i.e., work items), and their relations (i.e., action). When evaluated on the general conditions of the standard form construction contract, the proposed sentence parsing system achieved an F-score of 68% in comparison with the 76% reported by human annotators who used the conventional method.

A few authors also developed NLP models for structuring the requirements of design-build (DB) contracts to support automated subcontract scope extraction. Hassan et al. (2020) implemented supervised ML algorithms including NB, SVM, LR, kNN, DT, and FNN to develop a model that can extract subcontractor scope from DB contracts. The model can classify DB requirements into three categories indicating different disciplines, namely, design, construction, and operation and maintenance (O&M). The models were evaluated on a dataset of 2634 DB requirements, where the LR model achieved the highest accuracy of 94.18% and the NB model the lowest accuracy of 87.48%. Following this study, Hassan and Le (2021) compared the performance of traditional ML and deep learning in the classification of DB requirements. They tested a total of eight different machine learning methods, including five traditional ML algorithms (NB, SVM, LR, kNN, DT) and three deep learning algorithms (CNN, RNN_LSTM, RNN_GRU). They also examined an ensemble of the classifiers, which was found to exhibit the highest recall of 93.08%. Additionally, they reported that the classification reliability depends on the vectorization method, where word embedding vector space models seemed to outperform the bag-of-words (BOW) method. Such classification models for separating DB requirements into different categories, namely, design, construction, and O&M, can assist general contractors in extracting the precise scope of a subcontractor from a lengthy and complex DB contract. The classification models developed by researchers for structuring contractual text can assist in improving organization and access to critical information provided in a large number of electronic text documents produced in a construction project. Enabling quick access to essential information through document organization improves the coordination, collaboration, and information exchange among project members to support effective planning and decision making. It can further protect the contractor from financial loss and disputes due to missing information.

NLP-Assisted Detection of Violation with Construction Laws and Regulations

Another main research theme of NLP approaches to address legal issues in construction was aimed at automated development of logic rules from natural language regulatory provisions to support automated compliance checking (ACC). Table 3 summarizes the research efforts undertaken that explored the capability of NLP-based ACC. As shown in the table, the main focus in this area was on the following topics: automated classification of regulatory provisions, automated information extraction for rules development from regulatory provisions, and ACC of building information models (BIMs). The details of the NLP frameworks developed in this area of research are discussed below.

Automated Classification of Regulatory Provisions. The classification of voluminous sets of unstructured provisions in laws and regulations is the first step required for ACC platforms. To address this, Salama and El-Gohary (2013) proposed an ML-based text classification model for classifying regulatory provisions into fourteen predefined categories of compliance checking (CC) such as environmental, health, safety, security, quality, etc. The developed classifier used NB, SVM, and maximum entropy (ME) algorithms. The SVM algorithm achieved the highest recall of 100% on their dataset comprising 330 regulatory provisions, whereas NB and ME both reached a recall of only 96%. To further classify the environmental regulatory provisions into more detailed categories, Zhou and El-Gohary (2016b) investigated a different approach, namely, hierarchical text classification, to classify the clauses in environmental regulatory codes into 10 different topics of environmental compliance checking, including air leakage, fenestration, lighting power, thermal insulation, etc. The authors used NB, SVM, kNN, DT, radius-based neighbors (RBN), and random forest (RF) algorithms to develop classification models. The best SVM model achieved a recall and precision of 97% and 84%, respectively, when tested on a dataset of 1200 clauses. Other algorithms also reported reliable results; however, RBN exhibited a very low recall of 37.5%. Another study by Song et al. (2018) used a deep learning algorithm to classify the regulatory provisions into different categories such as site, building, structure, facility, evacuation and fireproof, etc. The proposed method applies the word2vec embedding technique to convert the meaning of words into numerical values. The classification labels help indicate what type of information needs to be extracted from a provision. The developed model can also extract the top related words for any input word in the provision. The advantage of performing text classification before information extraction is that the irrelevant text is filtered out and semantically similar provisions are grouped, which improves the efficiency and accuracy of information extraction and subsequent CC rules.

Automated Information Extraction for Rules Development from Regulatory Provisions. In an attempt to automate the compliance checking of designed structures against regulatory provisions, a few studies have developed models for converting natural language provisions into computer-understandable rules. Most studies used rule-based approaches for this purpose. The first model for information extraction from regulatory provisions to develop CC rules was proposed by Zhang (2011). The developed model can automatically extract essential information (i.e., subject, attribute, comparison, quantity), which is then used to develop logic rules. The information is extracted by matching patterns produced using semantic (domain-specific meaning/context-related features) and syntactic features (nouns, verbs, etc.) of the text. An ontology and a syntactic parser were used to extract semantic and syntactic features, respectively, from text to develop patterns. Their evaluation results on quantitative requirements of the International Building Code 2006 indicated that the use of semantic features (using ontology) to develop patterns yields better performance than using syntactic features (using POS tags), since the semantic features could better capture domain-specific terms and contexts. The syntactic and semantic features-based information extraction methods reported an F-measure of 75% and 97.4%, respectively. Following this study, Zhang and El-Gohary (2016b) developed a more advanced rule-based information extraction model that utilized dependency information (i.e., nsubj, dobj, nmod) along with simple syntactic (i.e., noun, verb, adjective, adverb) and semantic (i.e., domain-specific meaning/context-related features) features of the text. In comparison with the previous model, this model can extract more information from a provision, such as subject, subject restriction, compliance checking attribute, deontic operator indicator, quantitative relation, comparative relation, quantity value, quantity unit, and quantity restriction. The information extraction rules of the current study were validated on the 2009 International Building Code, resulting in a precision and recall of 96.9% and 94.4%, respectively. This approach also produces impressive performance with

Table 3. Summary of studies on NLP-assisted detection of violation with construction laws and regulations

Topic | Solutions | Approach | Methodology description | Performance | References
Automated classification of regulatory provisions | Classification of regulatory provisions | ML-based text classification | Used ML algorithms to classify general conditions of a contract into fourteen CC categories | 100% (recall), 96% (precision) | Salama and El-Gohary (2013)
Automated classification of regulatory provisions | Environmental code classification | ML-based hierarchical text classification | Used ML algorithms to perform hierarchical text classification of environmental regulatory codes into ten different topics | 97% (recall), 84% (precision) | Zhou and El-Gohary (2016b)
Automated classification of regulatory provisions | Automated rule checking system | Deep learning-based approach | Converted the meaning of words in numerical values to classify the topic of sentences | N/A | Song et al. (2018)
Automated information extraction for rules development from regulatory provisions | Information extraction from building code provisions for rules construction | Pattern matching-based rules using parsing information, POS tagging, and ontology | Extracted and represented the semantic information from building codes in a computer-understandable structure to enable rules construction | 75%–100% (precision), 75%–95% (recall), 75%–97.4% (F-score) | Zhang (2011)
Automated information extraction for rules development from regulatory provisions | Information extraction from building code provisions for rules construction | Rule-based information extraction approach | Extracted regulatory information from building codes using syntactic (syntax/grammar-related) and semantic (meaning/context-related) features | 96.9% (precision), 94.4% (recall) | Zhang and El-Gohary (2016b)
Automated information extraction for rules development from regulatory provisions | Extraction of information from environmental requirements | Rule-based information extraction methods | Used rule-based NLP methods to classify and extract information from environmental requirements in contracts | 98.1% (recall), 98.5% (precision) | Zhou and El-Gohary (2016a)
Automated information extraction for rules development from regulatory provisions | Information extraction from fire code provisions for rules construction | Dependency parsing (DP) and phrase structure grammar (PSG) methods | Compared the performance of DP and PSG to extract information from fire codes | 94.3%–96.9% (F-score) | Zhang and El-Gohary (2012b)
Automated information extraction for rules development from regulatory provisions | Information extraction from utility policies | Semantic frame-based information extraction method | Used a semantic frame-based information extraction method with a focus on domain semantics and lexical semantics | 92.23% (precision) | Xu and Cai (2019)
Automated information extraction for rules development from regulatory provisions | A nonproprietary and user-understandable representation of building regulations | Logic-based representation and tree-based visualization approaches | Investigated a logic-based representation and tree-based visualization methods | text representation > visual representation > logic representation | Zhang (2017)
ACC of Building Information Models (BIMs) | Extension of current IFC schema | Pattern-matching-based rules, ML-based text classification | Extract concepts from building regulations and match the extracted concepts to concepts in the IFC class hierarchy to further extend the schema | 88.7%–97.1% (precision), 94.2%–99.2% (recall), 91.4%–98.1% (F-score) | Zhang and El-Gohary (2016a)
ACC of Building Information Models (BIMs) | Development of logic reasoning system | A combination of semantic NLP and EXPRESS data-based technique | Used the NLP and EXPRESS data-based techniques to extract and transform both regulatory and design information in BIMs into logical format to perform automated compliance reasoning | 87.6% (precision), 98.7% (recall), 92.8% (F-score) | Zhang and El-Gohary (2017)
a precision of 98.5% and a recall of 98.1%, which is higher than the 96.9% and 94.4%, when tested on a different dataset comprising environmental regulations of a real construction contract (Zhou and El-Gohary 2016a). In a separate study, Zhang and El-Gohary (2012b) compared the performance of rules developed based on DP and phrase-structure grammar (PSG). PSG corresponds to a set of phrase structure relations defined by rules that predict the different combinations of tokens forming a grammatical phrase. PSG is used to produce phrase tags (QP→JJR IN CD) from individual POS tags (JJR, IN, CD). Phrase tags are used to extract information when a specific combination of POS tags is encountered. When validated on the 2009 International Fire Code, DP rules slightly outperformed PSG rules, with reported F-measures of 96.9% and 94.3%, respectively. A recent study by Xu and Cai (2019) used dependency parsing and POS tagging along with knowledge bases such as ontologies and lexicons for entity recognition. In comparison with the previously discussed rule-based methods, the proposed method reported a lower precision of 92.23% for extracting the regulatory information from the Indiana utility accommodation policy. Another contribution to the digital representation of building regulations was made by Zhang (2017). To enable the nonproprietary and user-understandable representation of building regulations, the author investigated a logic-based representation and tree-based visualization method for building regulatory requirements to improve the understandability and reading speed of building regulations. In terms of understandability and reading speed, the text representation performed better than the visual and logic representations in the experimental study.

ACC of Building Information Models. Another critical task required for ACC of building information models (BIMs) is the automatic extraction of CC-related information from BIMs to integrate machine-readable rules with BIMs. Due to the limited coverage of CC-related concepts in the existing Industry Foundation Classes (IFC)–based BIM models, a few efforts have been made to automate the process of extending IFC schemas using NLP. To address this issue, a new method for extending the IFC schema was proposed by Zhang and El-Gohary (2016a). The method utilizes pattern matching-based methods and ML techniques including NB, SVM, kNN, and DT to extract concepts from regulatory documents and predict their relationship with IFC concepts to extend the schema. Pattern matching–based rules based on POS tag information are used for concept extraction from codes; the extracted concepts are then matched with the IFC class hierarchy to extract the most related IFC concepts using WordNet. The ML algorithm predicts the relationship between the extracted concepts and the most related IFC concepts, which is used to further extend the IFC schema. The kNN and NB models reported the highest and lowest precision of 91% and 76.2%, respectively, when evaluated on International Building Codes 2006 and 2009. Following this study, Zhang and El-Gohary (2017) also offered a novel system for fully automated checking of BIMs for compliance with building regulations. The system comprises three modules: regulatory information extraction, design information extraction, and compliance reasoning. The first module uses pattern matching-based rules to extract semantic information elements (subject, compliance checking attribute, quantity, etc.) from regulatory codes and convert them into logic rules. In the second module, the EXPRESS data-based techniques are implemented to transform design information in BIMs into logic facts. Finally, the compliance of the logic facts with the logic rules is checked by executing the semantic-based logic reasoning algorithms in the compliance reasoning module. The model developed in this study reported an F-score of 92.8%.

In the area of compliance checking, most existing NLP-based methods were validated on small sets of building codes. Further research on automated design verification should be expanded to other domains such as the highway sector and should use larger samples of codes. Additionally, previous studies have mainly dealt with prescriptive requirements of which the design constraint can be presented using first-order logic rules. However, real-world design codes also include objective requirements that do not explicitly specify any quantitative constraint. We know very little about whether the existing methods perform well on such requirements.

NLP Frameworks for the Analysis of Historical Legal Cases

Another main focus of the previous studies was centered on the processing of legal case data. The major NLP frameworks developed by previous researchers, as summarized in Table 4, were aimed at addressing the following research problems: analysis of legal cases data of construction defects and similar case retrieval. NLP frameworks are highly effective in analyzing the previous case data to provide important insights to the project participants as well as the court professionals. The details of the NLP frameworks developed in this area are discussed below.

Analysis of Legal Cases Data of Construction Defects. To examine the common types of construction defects resulting in costly delays and disputes, Jallan et al. (2019) tested the ability of NLP for analyzing legal cases related to construction defects. The authors used the frequency analysis of keywords in the historic construction legal case documents from the LexisNexis database to identify the common topics. They used the latent Dirichlet allocation (LDA) method, a modeling method for identifying the topic of a document, to cluster the documents by topics based on their distinct keyword features. The study identified 14 unique topics from the cluster analysis. The developed automated framework can help project participants better understand the construction defects that led to legal cases in the past. Subsequently, this knowledge can reveal the root causes of defects to prevent similar issues and costly litigation in future projects.

Similar Case Retrieval. Several studies have used NLP to enhance the efficiency and effectiveness of legal case analysis by reviewing similar cases in a certain legal database. Fan and Li (2013) developed a framework for retrieving similar cases in a legal case library given a certain input case. The search results are based on the semantic similarity rather than string similarity between the user’s input and the case descriptions. In their study, the documents were vectorized using the bag-of-words (BOW) method along with the use of TF-IDF for measuring the term weight. The method was found to outperform the traditional string-based document search. It was tested on the Westlaw legal information database with different input legal case descriptions and achieved approximately 40% precision and 90% recall at the top 90 results. One limitation of this study is that it did not consider the variation of terms such as synonyms in the input description and the text database. Besides, this approach also requires the user to have a lengthy description of the case, whereas searching for information using key phrases is preferable. To address those drawbacks, Zou et al. (2017) developed a new keyword-based search algorithm that uses the semantics of words for legal case retrieval. They used a risk-related domain dictionary as well as the general dictionary WordNet, which helps the algorithm to improve the search results as the synonyms and other lexical forms of the input keywords are also used as additional input. The authors tested their method on WorkSafeBC and NIOSH case datasets with a precision of mostly above 90% and an F-score varying from 42% to 83% at the top 10 results.

Although similar case retrieval is useful for making an appropriate judgment on new dispute situations, an insightful comparison

between dissimilar cases would also be necessary, especially when only a limited number of cases are available. This problem was recently tackled by Cavar et al. (2018), who developed a new method to support in-depth examination of the difference and contradiction between legal cases. The authors proposed a deep linguistic NLP method for constructing the knowledge graph for each of the cases in a legal document repository. They used graph theories to identify parts of the case description that differ or overlap with one another. These graphs also include a domain vocabulary composed of synonyms and other semantically related terms. This study, however, was not robustly tested due to the lack of gold standard resources and corpora.

Table 4. Summary of studies on NLP-based analysis of historical legal cases

Topic | Solution | Approach | Methodology description | Performance | References
Analysis of legal cases data of construction defects | Identification of patterns in construction-defect litigation cases | Latent Dirichlet Allocation (LDA) | Analyze past 10-year historical legal cases to identify patterns such as topics, types of defects | 14 unique topics identified | Jallan et al. (2019)
Similar case retrieval | Automated information retrieval from similar cases | Vector space model | Used NLP to retrieve similar cases for alternative dispute resolution in construction accidents | 90%–100% (recall) | Fan and Li (2013)
Similar case retrieval | Automated information retrieval from similar cases | Vector space model (VSM) and semantic query expansion approach | Utilized NLP techniques to retrieve similar cases for construction project risk management | 70%–100% (precision), 50%–100% (recall), 42.3%–83.3% (F-score) | Zou et al. (2017)
Similar case retrieval | Case law analysis | Deep linguistic NLP method | Model for comparing cases in terms of similarity and contradiction | N/A | Cavar et al. (2018)

Directions for Future Research

Overview of Future Needs

To date, many scholars have devoted themselves to enabling a significant improvement in the analysis and management of textual legal data in construction by leveraging recent advances in NLP. However, the status quo regarding this line of research still involves the following three common limitations. First, the lack of available domain-specific datasets poses a major challenge for applying NLP in the construction domain. Due to the application-oriented nature of research in construction, the collection of domain-specific datasets is critical for developing and evaluating NLP models, as well as providing benchmarks for future efforts. However, obtaining high-quality datasets in construction can be very difficult. Existing databases for legal cases and documents in the construction domain are very limited due to data privacy concerns from stakeholders. To ensure reliable performance of NLP models, significant effort in partnering with industry organizations to make labeled data widely available is a pressing need. Second, the performance of most existing NLP models to address legal issues in construction is still inadequate as the contextual information is seldom integrated. Since legal construction documents consist of unique terminologies and syntactic patterns, it is difficult for generic NLP models that proved their effectiveness in other domains to produce equally reliable performance without adapting to the construction domain. A few recent studies have attempted to address this challenge by infusing domain knowledge into the NLP models with the help of semantic models such as ontologies, taxonomies, and lexicons (Le and Jeong 2017; Lee et al. 2020; Xu and Cai 2019; Zhang and El-Gohary 2016b). The findings from those efforts showed an impressive improvement; thus, future studies would need to continue the use of domain knowledge models to further strengthen the applicability of NLP frameworks to the legal context in construction. Third, most of the existing methods for addressing construction legal issues are in the proof-of-concept stage. These studies were developed and validated based on researcher-labeled data. There is still a lack of evaluation using real-world scenarios and feedback from construction practitioners.

In addition to the need for enhancing the robustness of the existing NLP models, future research is encouraged to tackle other aspects of legal issues that have not been well investigated. Table 5 suggests several new research problems that are worth investigating to support the following tasks: contract drafting, contract review, contract simplification for ease of reading, impact tracing of changes in contractual requirements, early compliance checking of design codes, construction safety violation assessment, and court outcome prediction of construction disputes. The description of the potential future research problems is provided as follows.
Table 5. Suggested future research areas on legal text processing in construction using NLP

Topic | Research problem | Proposed solution | References
Contract drafting assistance | Contract drafting and comprehension | An automated framework to assist in precise contract drafting as well as to facilitate the expansion of the abbreviations for contract comprehension | Demasco and McCoy (1992), Hamie and Abdul-Malak (2018)
Contract drafting assistance | Subcontract scope extraction | Separation of the general contract into small subcontracts indicating different disciplines | Assaad et al. (2020), Hassan and Le (2021)
Automated contract review | Contract clarity and preciseness | Automated model to verify the inclusion of sufficient details and conditions in contracts to prevent litigious behavior | Jagannathan and Delhi (2020)
Automated contract review | Ambiguity in contractual clauses | An automated model to predict the litigation proneness of contractual clauses | Jagannathan and Delhi (2019)
Automated contract review | Administration of owner obligations | A model to extract and classify the owner obligations in standard contracts into different categories, including permits, design documents, reviews, etc. | Abotaleb et al. (2019)
Simplification of legal text for ease of reading | Contract language simplification | Replacement of infrequent legal jargon with domain-specific common and simpler words to improve the readability of legal texts | Lin et al. (2009), Saseendran et al. (2020)
Simplification of legal text for ease of reading | Contract language simplification | Expansion of abbreviations used in contracts to reduce the reader’s cognitive load spent on remembering abbreviations | McCoy and Demasco (1995)
Simplification of legal text for ease of reading | Contract language simplification | An automated text simplification method to simplify the structure and language used in contracts | Rameezdeen and Rodrigo (2013, 2014), Uusitalo et al. (2011)
Impact tracing of changes in contractual requirements | Requirement changes impact analysis | Development of a model to predict all requirements in a contract being impacted by the change in a specific requirement | Arora et al. (2015), Navon and Isaac (2009), Rameezdeen and Rodrigo (2014)
Enabling early compliance checking with design codes | Precise selection of applicable provisions from codes | An automated model to select the correct applicable requirements from the regulatory design codes to carry out conceptual design | Altmann and Samani (unpublished report, 1978), Nedev and Khan (2011)
Construction safety violation assessment | Construction safety violation assessment | Automated analysis of accident inspection reports for identification of the OSHA citations violated by the contractor to determine the penalty | Zhang et al. (2019a)
Court outcome prediction of construction disputes | Assessment of quality of construction claims | Development of a model for identifying similar cases in previous litigation history to predict the outcome of a new legal case at an early stage | Aletras et al. (2016)
Contract Drafting Assistance
Despite recent advances in the implementation of NLP in controlling the quality of contract drafting, writing assistance models for contract drafting are still an open challenge (Curtotti and McCreath 2011; McCoy and Demasco 1995). The current state of the art in this regard involves several NLP approaches to the automated detection of writing errors such as ambiguous terminologies, use of poisonous clauses, or missing clauses (Chakrabarti et al. 2019; Lee et al. 2019, 2020; Serag et al. 2010). Still, little effort has been devoted to computational methods that can assist contract drafters in selecting the right language or structure to phrase clauses. It would be helpful to examine the ability of NLP in determining the attributes of high-quality contract documents and effective strategies that can help contract writers improve their writing. NLP methods such as the sentence COMPANSION technique developed by Demasco and McCoy (1992) may be adapted to develop a domain-specific writing assistance method for the drafting of contracts or other legal documents in construction. That model was developed by computer science experts for individuals with language impairments, and it generates a well-formed sentence from an input sequence of uninflected words supplied by the user. The principle underpinning their system is the analysis of semantic information, or how words co-occur in a trained corpus. Furthermore, an intelligent NLP model that prompts the contract writer with appropriate words would allow him/her to select the most appropriate word choice. As suggested by Garay-Vitoria and Gonzalez-Abascal (1997), researchers may explore how NLP can learn the sequence of words and their contexts to predict the most suitable terms.

Moreover, there is a need for an automated or semiautomated approach to the separation of the main contract into smaller subcontracts. Typically, the majority of the construction work (from 80%–90%) is performed by subcontractors (Arditi and Chotibhongs 2005; Mbachu 2008). Therefore, the process of subcontract separation is often required (Assaad et al. 2020). Each subcontractor is typically responsible for only a small portion of the work; thus, the clauses related to a particular work portion need to be identified and added to the subcontract requirements. Missing any critical requirement associated with the subcontractor scope can lead to costly disputes among the contracting parties. Although the NLP models developed by Hassan et al. (2020) and Hassan and Le (2021) facilitated subcontract scope extraction, these models are applicable to only three disciplines in highway projects. Since several subcontractors are generally involved in large multidisciplinary projects, a more comprehensive model that can extract the scope associated with any subcontracted task (such as plumbing, electrical, etc.) in any project type (industrial, commercial, etc.) is still required.

Automated Contract Review
Despite the fact that significant efforts have been made to develop contract review methods for enhancing the quality of contracts, there are still significant challenges that require further investigation. For example, several researchers expressed the need for new contract review methods capable of detecting dispute-prone clauses in the contract (Chakrabarti et al. 2019). Specifically, future research should offer effective means for automated identification of contractual clauses (e.g., penalties, rights, obligations, payments clauses) that are more likely to lead to disputes (Jagannathan and Delhi 2019). Data-driven approaches that leverage legal case databases such as LexisNexis would be worth considering. These approaches require novel NLP models for analyzing the frequency of contract clauses cited in legal cases. Novel measures of litigation-proneness are also critically needed to address the above challenge. Additionally, the inclusion of contradictory clauses in a single contract is a common issue found in construction project contracts, necessitating further effort from the academic community (Mshali 2016; Parvizimosaed 2020). The aforementioned issues in contract documents can be resolved by developing NLP models using hand-coded rules (Lee et al. 2019). Although the development of rules is a time-consuming task, rule-based approaches yield higher performance than ML-based approaches given that large datasets are not often available in the construction domain (Salama and El-Gohary 2013).

Furthermore, most previous studies on contract administration were oriented towards contractor obligations (Caldas et al. 2002; Hassan and Le 2021). However, owners also have several obligations in a project, such as making payments, providing site information, assistance with permits, work inspection, and reviewing and approving submittals. Failure to comply with these obligations, whether unintentional or intentional, was also reported as a main source of disputes in construction projects (Abotaleb et al. 2019). Since standard forms of contract (e.g., AIA and FIDIC) are increasingly used in construction (Fawzy and El-adaway 2012), researchers are encouraged to leverage NLP to examine and highlight the similarities and differences in the owner’s obligation clauses between the standard forms. Each type of contract defines different obligations for the owner. For instance, FIDIC contracts require the owner only to assist with document preparation when obtaining approvals, whereas ConsensusDOCS contracts also require the owner to pay for obtaining approvals and permits. Furthermore, extraction of the owner obligations from the contracts and subsequent classification into different categories (e.g., permits, design documents, review of submittals and requests, etc.) can assist in the better administration of contracts.

Simplification of Legal Texts for Ease of Reading
Clearly, reading contracts is a key task performed by engineers. However, legal language is typically not a native language of engineers, which makes legal text difficult for many of them to read (Chong 2012; Rameezdeen and Rajapakse 2007; Rameezdeen and Rodrigo 2014). Thus, legal contract provisions should be written in such a way that they are easier for engineers to comprehend (Saseendran et al. 2020). In this regard, potential future research includes using state-of-the-art NLP methods for text simplification to convert the legal language into an equivalent but simpler language (Temnikova 2012; Wang et al. 2016). Readability assessment tools should be developed that can help contract writers identify difficult legal terms. Domain-specific methods capable of replacing infrequent legal jargon with more common and simple words may improve the ease of reading legal documents for construction engineers. It is worth investigating such a strategy for simplifying construction contracts as it has shown considerable success in other fields (Dell’Orletta et al. 2015). The use of simpler language would be extremely helpful, especially for projects in which the stakeholders are from multiple countries (Foliente 2000; Lin et al. 2009; Moku et al. 2012). Foreign professionals whose native language is not English may face great difficulties in understanding legal documents written in English. Moreover, abbreviations are frequently used in contracts. Research shows that expanding the abbreviations used in natural text can further improve comprehension as it reduces the cognitive load the reader spends on remembering a large number of abbreviations for different words (McCoy and Demasco 1995). Another text simplification strategy that is worth investigating is proposing a controlled natural language for construction project contracts. The use of standardized vocabulary and grammar will help reduce variation in contract writing styles, thus minimizing reading effort spent on a new contract.
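As a minimal sketch of two of the simplification strategies discussed in this section (replacing infrequent legal jargon with plainer words and expanding abbreviations), the following Python example applies two small lookup tables to a contract clause. The tables, the sample clause, and the function names are hypothetical illustrations rather than resources taken from the reviewed studies; a practical tool would presumably derive its vocabulary from a construction-contract corpus or a curated glossary.

```python
import re

# Hypothetical lookup tables (assumed for illustration only).
# Keys of PLAIN_TERMS must be lowercase because matching ignores case.
PLAIN_TERMS = {
    "herein": "in this contract",
    "forthwith": "immediately",
    "notwithstanding": "despite",
    "pursuant to": "under",
}
ABBREVIATIONS = {
    "RFI": "request for information",
    "NTP": "notice to proceed",
    "LD": "liquidated damages",
}


def _replace_terms(text, mapping, ignore_case):
    """Replace whole-word occurrences of the mapping keys in text."""
    # Longest keys first so multiword phrases win over shorter ones.
    keys = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE if ignore_case else 0,
    )

    def substitute(match):
        found = match.group(0)
        if not ignore_case:
            return mapping[found]
        replacement = mapping[found.lower()]
        # Keep sentence-initial capitalisation of the replaced term.
        if found[0].isupper():
            return replacement[0].upper() + replacement[1:]
        return replacement

    return pattern.sub(substitute, text)


def simplify_clause(clause):
    """Expand abbreviations first, then swap jargon for plainer wording."""
    expanded = _replace_terms(clause, ABBREVIATIONS, ignore_case=False)
    return _replace_terms(expanded, PLAIN_TERMS, ignore_case=True)


clause = ("Pursuant to Clause 8, the Contractor shall forthwith answer "
          "each RFI issued after the NTP.")
print(simplify_clause(clause))
# Under Clause 8, the Contractor shall immediately answer each request
# for information issued after the notice to proceed.
```

Abbreviations are matched case-sensitively so that ordinary words are not expanded by accident, while jargon terms are matched regardless of case. A purely dictionary-based substitution like this ignores context, so it should be read as a baseline against which learned simplification models could be compared.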

Such a controlled natural language approach was tested in the industrial sector by Uusitalo et al. (2011). Given that no similar efforts have been found in construction, researchers are strongly encouraged to explore such text simplification strategies specifically for construction legal documents.

Impact Tracing of Changes in Contractual Requirements
Contractual requirements are often changed according to the needs that evolve during the execution stage (Jallow et al. 2008; Navon and Isaac 2009; Rendon 2012). Assessing the impact of the changes made for a specific requirement on other requirements in the contract is essential to maintain the consistency and correctness of requirements (Rameezdeen and Rodrigo 2014). As shown in the results of this review, no study was found that assists construction engineers in tracing the dependencies between requirements. In software engineering, efforts have been made to address similar requirement tracing using NLP. For example, Arora et al. (2015) introduced a change impact analysis model for software design requirements that adopts NLP techniques to identify relevant requirements potentially impacted by a given change in a certain requirement based on the matching phrases between requirements. Adopting such a phrase-based approach for construction contracts may be a good research opportunity. The authors of this paper strongly believe that such a string-based relevance measurement method should be supplemented with engineering knowledge sources. Ontologies that model civil engineering rules are necessary to allow more precise identification of dependent requirements. This is a challenging problem as construction is well identified as a fragmented industry where stakeholders are from different unique entities and use different language.

Enabling Early Compliance with Design Codes
Compliance checking is again a critical measure to avoid legal issues as it helps to control design errors and ensures the structure meets all required safety measures to control environmental risks such as flood, earthquake, etc. (Pritchard 2013). As discussed earlier, researchers have developed plenty of NLP models that transform natural language requirements into machine-readable rules (Zhang 2011; Zhang and El-Gohary 2012a, 2016b). The benefit of these techniques in minimizing design errors is undeniable. Still, there are significant research gaps that are worth investigating. Existing NLP models for compliance checking assumed that a set of requirements applicable to the project is given. Thus, they are only applicable to the later design phase when a complete BIM model has been developed. Any errors found at this stage will be costly to fix. In addition, previous studies require a set of requirements to be defined, which is hardly available without a tedious process of reading various voluminous regulatory documentation to identify what provisions are to be included at the beginning of the design stage. The selection of correct provisions from standards and codes is critical to control environmental risks involved in a project (Altmann and Samani, unpublished report, 1978; Nedev and Khan 2011). However, the standards and codes are often complex documents including a large set of requirements, which makes the selection of applicable requirements a challenging task, especially for young engineers having limited knowledge and experience (Bulleit 2012). Therefore, more attention should be paid to exploring the capability of NLP to support early conceptual designs. Developing query frameworks using NLP to assist engineers in determining design provisions applicable to a given type of risk factor could be an interesting research problem. Such research is expected to significantly accelerate and improve the quality of the design process right in the early phase.

Construction Safety Violence Assessment
As shown earlier in the “Results” section, previous studies were focused on automated compliance checking of design products against those requirements in codes and standards. No NLP model has been developed to support compliance checking of the construction operation. During the construction stage, legal issues often arise once an accident occurs on the site. Construction operation is typically governed by various safety standards developed and managed by the Occupational Safety and Health Administration (OSHA). These regulatory documents mandate the contractors to maintain safety programs to protect the health and safety of construction workers (Arditi and Chotibhongs 2005). However, contractors quite often fail to comply with such safety programs, resulting in many fatalities and serious injuries (Arditi and Chotibhongs 2005). Once that occurs, inspectors are required to develop an inspection narrative report, which is then used to determine OSHA violation citations and corresponding penalties (Nichting 1999; Treadwell 1971). The need for an automated method for retrieving legal articles relevant to the accident description has emerged as the manual process of identifying such legal references is time-consuming (Çevikbaş and Köksal 2018). NLP models that analyze accident inspection reports and retrieve appropriate references in lengthy OSHA safety standards would greatly benefit safety inspection officers. To tackle this problem, deep neural networks such as the convolutional neural network (CNN) were shown to be superior to keyword-based search, as reported by Zhang et al. (2019b), who trained a supervised classification model using CNN that predicts the applicable articles for a criminal case. Researchers may also examine other traditional ML algorithms and compare them with CNN. However, one well-known challenge of this end-to-end ML approach may be the lack of data since such labelled data is hardly available for accident inspection reports. Given the fact that OSHA regulations include a large number of articles, this difficulty should not be overlooked.

Court Outcome Prediction of Construction Disputes
Construction projects often suffer costly delays due to the lengthy litigation process (Essex 1996). The loss of time and money can be prevented if the contracting parties can evaluate and understand the worth of their claims before taking them to the costly litigation process (Iyer et al. 2008). This information is crucial for both the contractor and the owner in their negotiation and dispute resolution. The state of the art has overlooked the application of NLP in this area. There is a need to develop models for predicting the court outcome for a certain construction claim. Indeed, a few studies were found that used NLP for predicting court judgment for human rights-related cases, for example, the one by Aletras et al. (2016). They developed a binary classification model using supervised ML to predict if a case violated or did not violate a human rights article or regulation. Such models can significantly reduce the frequency of litigation in construction. Given the fact that legal cases are widely available in legal datasets such as LexisNexis, training a predictive model on those textual data using ML seems to be a feasible approach. Future research needs to use NLP techniques to extract valuable data such as types of disputed delays from these unstructured sources and leverage these data to predict if the outcome is favorable to a plaintiff. Anticipating the likelihood of success

would help the contractor and the owner avoid a costly litigation process.

Conclusions
The paper presents an in-depth analysis of the state of the art and identifies potential future research needs regarding the use of NLP to reduce legal issues in construction. The review demonstrated that various NLP models, using either human-crafted rules or machine learning, have been developed for processing different types of legal documents including contracts, regulatory codes, and legal case data. While most research efforts have centered on contracts and regulatory texts, legal case data have gained comparatively less attention. Various NLP frameworks have been developed that can assist engineers in classifying contractual requirements and detecting risk-prone clauses. In addition, significant efforts have been focused on developing computer-readable rules from natural language design provisions to support the automated verification of design products. Despite the significant advancement made in the last decades, the full potential of NLP for reducing legal conflicts in construction is yet to be achieved. It is worth noting that none of the current methods was tested in a real-world context and many studies reported a low validation accuracy. The main barrier can be attributed to the lack of high-quality data due to privacy concerns from construction firms. In addition, the inadequate integration of domain knowledge bases such as taxonomies and ontologies into generic NLP techniques could further degrade their performance.

As demonstrated, this critical review has also identified the following key research areas that are worth further investigation. First, insightful knowledge on contractual language extracted from the analysis of historical case data should be integrated into computational tools that can assist contract writers in selecting the right terms and structure. Additionally, it is necessary to develop new advanced NLP methods that can make legal texts more readable to engineers. Replacing difficult terms with more popular word choices or shortening complex sentences have been identified as potential strategies for simplifying legal texts. Finally, research on automated compliance checking should be expanded from the design to the construction operation context. Researchers may investigate the capability of NLP in processing construction operation related documents such as safety accident reports to verify their compliance with national and regional construction standards.

The contribution of this paper is threefold. First, the study explicitly identified and systematically synthesized legal issues and appropriate state-of-the-art NLP solutions. This is meaningful to construction professionals as it helps to enhance their awareness about the applicability of NLP in tackling a specific problem in the use of legal documents in construction. Second, this paper is expected to deepen the reader’s understanding of methodological concerns by synthesizing knowledge from various studies. This would help to improve researchers’ understanding regarding the reliability and limitations of different technical approaches. The study findings can provide guidance for future research in designing suitable research methodology. Finally, this paper contributes to the body of knowledge by making various recommendations for future research areas. It is expected that the identified research gaps would encourage researchers to make continuous contributions in order to realize the full potential of NLP, particularly in enabling ease and effectiveness when dealing with the natural language used in legal construction documents. Ultimately, successfully filling these gaps in processing legal documents is expected to bring continuous improvement in reducing legal conflicts.

Data Availability Statement
The literature review data generated or used during the study are available from the corresponding author by request.

References
Abotaleb, I. S., I. H. El-adaway, and M. B. Moussa. 2019. “Guidelines for administrating and drafting nonpayment owners’ obligation provisions under design-build contracts.” J. Manage. Eng. 35 (4): 04019010. https://doi.org/10.1061/(ASCE)ME.1943-5479.0000693.
Aletras, N., D. Tsarapatsanis, D. Preoţiuc-Pietro, and V. Lampos. 2016. “Predicting judicial decisions of the European Court of Human Rights: A natural language processing perspective.” PeerJ Comput. Sci. 2: e93. https://doi.org/10.7717/peerj-cs.93.
Almutairi, S., J. Kashiwagi, D. Kashiwagi, and K. Sullivan. 2020. “Factors causing construction litigation in Saudi Arabia.” J. Adv. Perform Inf. Value 7 (1): 58. https://doi.org/10.37265/japiv.v7i1.54.
Al Qady, M., and A. Kandil. 2010. “Concept relation extraction from construction documents using natural language processing.” J. Constr. Eng. Manage. 136 (3): 294–302. https://doi.org/10.1061/(ASCE)CO.1943-7862.0000131.
Apté, C., F. Damerau, and S. M. Weiss. 1994. “Automated learning of decision rules for text categorization.” ACM Trans. Inf. Syst. 12 (3): 233–251. https://doi.org/10.1145/183422.183423.
Arditi, D., and R. Chotibhongs. 2005. “Issues in subcontracting practice.” J. Constr. Eng. Manage. 131 (8): 866–876. https://doi.org/10.1061/(ASCE)0733-9364(2005)131:8(866).
Arksey, H., and L. O’Malley. 2005. “Scoping studies: Towards a methodological framework.” Int. J. Soc. Res. Methodol. Theory Pract. 8 (1): 19–32. https://doi.org/10.1080/1364557032000119616.
Arora, C., M. Sabetzadeh, A. Goknil, L. C. Briand, and F. Zimmer. 2015. “Change impact analysis for Natural Language requirements: An NLP approach.” In Proc., 2015 IEEE 23rd Int. Requirements Engineering Conf., RE 2015, 6–15. New York: IEEE.
Assaad, R., A. Elsayegh, G. Ali, M. Abdul Nabi, and I. H. El-Adaway. 2020. “Back-to-back relationship under standard subcontract agreements: Comparative study.” J. Leg. Aff. Dispute Resolut. Eng. Constr. 12 (3): 04520020. https://doi.org/10.1061/(ASCE)LA.1943-4170.0000406.
Azghandi-Roshnavand, A. 2019. “Evaluation of construction contract documents to be applied in modular construction focusing ambiguities: A text processing approach.” Ph.D. dissertation, Dept. of Building, Civil and Environmental Engineering, Concordia Univ.
Baker, H., M. R. Hallowell, and A. J. P. Tixier. 2020. “Automatically learning construction injury precursors from text.” Autom. Constr. 118 (Oct): 103145. https://doi.org/10.1016/j.autcon.2020.103145.
Balakrishnan, V., and L.-Y. Ethel. 2014. “Stemming and lemmatization: A comparison of retrieval performances.” Lect. Notes Software Eng. 2 (3): 262–267. https://doi.org/10.7763/LNSE.2014.V2.134.
Berry, D. M., E. Kamsties, and M. M. Krieger. 2003. “From contract drafting to software specification: Linguistic sources of ambiguity—A handbook version 1.0.” In Automated Software Engineering, 1–80. State College, PA: Pennsylvania State Univ. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.9.7928.
Bulleit, W. M. 2012. “Structural building codes and communication systems.” Pract. Period. Struct. Des. Constr. 17 (4): 147–151. https://doi.org/10.1061/(ASCE)SC.1943-5576.0000136.
Cakmak, P. I., and E. Cakmak. 2013. “An analysis of causes of disputes in the construction industry using analytical hierarchy process (AHP).” In Proc., AEI 2013: Building Solutions for Architectural Engineering—Proc., 2013 Architectural Engineering National Conf., 93–101. Reston, VA: ASCE.
Caldas, C. H., and L. Soibelman. 2003. “Automating hierarchical document classification for construction management information systems.” Autom. Constr. 12 (4): 395–406. https://doi.org/10.1016/S0926-5805(03)00004-9.
Caldas, C. H., L. Soibelman, and J. Han. 2002. “Automated classification of construction project documents.” J. Comput. Civ. Eng. 16 (4): 234–243. https://doi.org/10.1061/(ASCE)0887-3801(2002)16:4(234).

Cavar, D., J. Herring, and A. Meyer. 2018. “Case law analysis using deep NLP and knowledge graphs.” In Proc. 11th Int. Conf. Language Resources and Evaluation, edited by G. Rehm, V. Rodríguez-Doncel, and J. Moreno-Schneider, 7–12. Paris: European Language Resources Association.
Çevikbaş, M., and A. Köksal. 2018. “An investigation of litigation process in construction industry in Turkey.” Teknik Dergi/Techn. J. Turk. Chamber Civ. Eng. 29 (6): 8715–8729. https://doi.org/10.18400/tekderg.389757.
Çevikol, S., and F. B. Aydemir. 2019. “Detecting inconsistencies of natural language requirements in satellite ground segment domain.” In Proc., CEUR Workshop, 2376. Aachen, Germany: Center for European Union Research.
Chakrabarti, D., N. Patodia, U. Bhattacharya, I. Mitra, S. Roy, J. Mandi, N. Roy, and P. Nandy. 2019. “Use of artificial intelligence to analyse risk in legal documents for a better decision support.” In Proc., IEEE Region 10 Annual Int. Conf., TENCON, 683–688. New York: IEEE.
Chan, E. H., and H. C. Suen. 2005. “Disputes and dispute resolution systems in Sino-foreign joint venture construction projects in China.” J. Civ. Eng. Educ. 131 (2): 141–148. https://doi.org/10.1061/(ASCE)1052-3928(2005)131:2(141).
Chi, N. W., K. Y. Lin, N. El-Gohary, and S. H. Hsieh. 2017. “Gazetteers for information extraction applications in construction safety management.” In Computing in civil engineering, 401–408. Reston, VA: ASCE.
Chong, H. Y. 2012. “Improving contract administration: Information system approach on legal information.” Information 15 (11B): 4891–4900.
Chowdhury, G. G. 2003. “Natural language processing.” Annu. Rev. Inf. Sci. Technol. 37 (1): 51–89. https://doi.org/10.1002/aris.1440370103.
Curtotti, M., and E. C. McCreath. 2011. “A corpus of Australian contract language: Description, profiling and analysis.” In Proc., Int. Conf. on Artificial Intelligence and Law, 199–208. New York: Association for Computing Machinery.
Cutting, D., J. Kupiec, J. Pedersen, and P. Sibun. 1992. “A practical part-of-speech tagger.” In Proc., 3rd Conf. on Applied Natural Language Processing, 133–140. New York: Association for Computing Machinery.
Dell’Orletta, F., M. Wieling, G. Venturi, A. Cimino, and S. Montemagni. 2015. “Assessing the readability of sentences: Which corpora and features?” In Proc., 9th Workshop on Innovative Use of NLP for Building Educational Applications, 163–173. Stroudsburg, PA: Association for Computational Linguistics.
Demasco, P. W., and K. F. McCoy. 1992. “Generating text from compressed input: An intelligent interface for people with severe motor impairments.” Commun. ACM 35 (5): 68–78. https://doi.org/10.1145/129875.129881.
Deng, F., and H. Smyth. 2013. “Contingency-based approach to firm performance in construction: Critical review of empirical research.” J. Constr. Eng. Manage. 139 (10): 04013004. https://doi.org/10.1061/(ASCE)CO.1943-7862.0000738.
Domingos, P. M. 2012. “A few useful things to know about machine learning.” Commun. ACM 55 (10): 78–87. https://doi.org/10.1145/2347736.2347755.
Essex, R. J. 1996. “Means of avoiding and resolving disputes during construction.” Tunnelling Underground Space Technol. 11 (1): 27–31. https://doi.org/10.1016/0886-7798(96)00048-X.
Fan, H., and H. Li. 2013. “Retrieving similar cases for alternative dispute resolution in construction accidents using text mining techniques.” Autom. Constr. 34 (Sep): 85–91. https://doi.org/10.1016/j.autcon.2012.10.014.
Fawzy, S. A., and I. H. El-adaway. 2012. “Contract administration guidelines for U.S. contractors working under world bank-funded projects.” J. Leg. Aff. Dispute Resolut. Eng. Constr. 4 (2): 40–50. https://doi.org/10.1061/(ASCE)LA.1943-4170.0000088.
Foliente, G. C. 2000. “Developments in performance-based building codes and standards.” Forest Prod. J. 50 (7–8): 2–11.
Garay-Vitoria, N., and J. Gonzalez-Abascal. 1997. “Intelligent word-prediction to enhance text input rate (A syntactic analysis-based word-prediction aid for people with severe motor and speech disability).” In Proc., Int. Conf. on Intelligent User Interfaces, Proceedings IUI, 241–244. New York: Association for Computing Machinery.
Gonçalves, T., and P. Quaresma. 2004. “The impact of NLP techniques in the multilabel text classification problem.” In Intelligent information processing and web mining, 424–428. Berlin: Springer.
Gutiérrez-Batista, K., J. R. Campaña, M.-A. Vila, and M. J. Martin-Bautista. 2019. “Using word embeddings and deep learning for supervised topic detection in social networks.” In Proc., Int. Conf. on Flexible Query Answering Systems, 155–165. Cham, Switzerland: Springer.
Hamie, J., and M. A. Abdul-Malak. 2018. “Rules-based approach for construction contract documents interpretation.” In Proc., Construction Research Congress 2018, 186–195. Reston, VA: ASCE.
Hassan, F., and T. Le. 2021. “Computer-assisted separation of design-build contract requirements to support subcontract drafting.” Autom. Constr. 122 (Feb): 103479. https://doi.org/10.1016/j.autcon.2020.103479.
Hassan, F., T. Le, and D. H. Tran. 2020. “Multi-class categorization of design-build contract requirements using text mining and natural language processing techniques.” In Proc., Construction Research Congress 2020: Project Management and Controls, Materials, and Contracts, 1266–1274. Reston, VA: ASCE.
Hassan, F. U., and T. Le. 2020. “Automated requirements identification from construction contract documents using natural language processing.” J. Leg. Aff. Dispute Resolut. Eng. Constr. 12 (2): 04520009. https://doi.org/10.1061/(ASCE)LA.1943-4170.0000379.
Iyer, K. C., N. B. Chaphalkar, and G. A. Joshi. 2008. “Understanding time delay disputes in construction contracts.” Int. J. Project Manage. 26 (2): 174–184. https://doi.org/10.1016/j.ijproman.2007.05.002.
Jagannathan, M., and V. S. K. Delhi. 2019. “Litigation proneness of dispute resolution clauses in construction contracts.” J. Leg. Aff. Dispute Resolut. Eng. Constr. 11 (3): 04519011. https://doi.org/10.1061/(ASCE)LA.1943-4170.0000301.
Jagannathan, M., and V. S. K. Delhi. 2020. “Litigation in construction contracts: Literature review.” J. Leg. Aff. Dispute Resolut. Eng. Constr. 12 (1): 03119001. https://doi.org/10.1061/(ASCE)LA.1943-4170.0000342.
Jallan, Y., E. Brogan, B. Ashuri, and C. M. Clevenger. 2019. “Application of natural language processing and text mining to identify patterns in construction-defect litigation cases.” J. Leg. Aff. Dispute Resolut. Eng. Constr. 11 (4): 04519024. https://doi.org/10.1061/(ASCE)LA.1943-4170.0000308.
Jallow, A. K., P. Demian, A. N. Baldwin, and C. J. Anumba. 2008. “Life-cycle approach to requirements information management in construction projects: State-of-the-art and future trends.” In Vol. 2 of Proc., 24th Annual Conf. of Association of Researchers in Construction Management ARCOM, 769–778. Leesburg, VA: Association of Researchers in Construction Management.
Kamath, C. N., S. S. Bukhari, and A. Dengel. 2018. “Comparative study between traditional machine learning and deep learning approaches for text classification.” In Proc., ACM Symp. on Document Engineering 2018, DocEng 2018. New York: Association for Computing Machinery.
Kilian, J. J. 2003. A forensic analysis of construction litigation, US Naval Facilities Engineering Command. Monterey, CA: Naval Postgraduate School.
Kilian, J. J., and G. E. Gibson. 2005. “Construction litigation for the U.S. naval facilities engineering command, 1982–2002.” J. Constr. Eng. Manage. 131 (9): 945–952. https://doi.org/10.1061/(ASCE)0733-9364(2005)131:9(945).
Kowsari, K., D. E. Brown, M. Heidarysafa, K. Jafari Meimandi, M. S. Gerber, and L. E. Barnes. 2017. “HDLTex: Hierarchical deep learning for text classification.” In Proc., 16th IEEE Int. Conf. on Machine Learning and Applications, ICMLA 2017, 364–371. New York: IEEE.
Kowsari, K., K. J. Meimandi, M. Heidarysafa, S. Mendu, L. Barnes, and D. Brown. 2019. “Text classification algorithms: A survey.” Information (Switzerland) 10 (4): 1–68. https://doi.org/10.3390/info10040150.
Kübler, S., R. McDonald, and J. Nivre. 2009. Dependency parsing. Synthesis lectures on human language technologies. San Rafael, CA: Morgan & Claypool.
Kumar Viswanathan, S., A. Panwar, S. Kar, R. Lavingiya, and K. N. Jha. 2020. “Causal modeling of disputes in construction projects.” J. Leg. Aff. Dispute Resolut. Eng. Constr. 12 (4): 04520035. https://doi.org/10.1061/(ASCE)LA.1943-4170.0000432.

Le, T., and H. D. Jeong. 2017. “NLP-based approach to semantic classification of heterogeneous transportation asset data terminology.” J. Comput. Civ. Eng. 31 (6): 04017057. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000701.
Lee, J., Y. Ham, J. S. Yi, and J. Son. 2020. “Effective risk positioning through automated identification of missing contract conditions from the contractor’s perspective based on FIDIC contract cases.” J. Manage. Eng. 36 (3): 05020003. https://doi.org/10.1061/(ASCE)ME.1943-5479.0000757.
Lee, J., J.-S. S. Yi, and J. Son. 2019. “Development of automatic-extraction model of poisonous clauses in international construction contracts using rule-based NLP.” J. Comput. Civ. Eng. 33 (3): 04019003. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000807.
Li, L., W. Fan, D. Huang, Y. Dang, and J. Sun. 2012. “Boosting performance of gene mention tagging system by hybrid methods.” J. Biomed. Inf. 45 (1): 156–164. https://doi.org/10.1016/j.jbi.2011.10.004.
Lin, G., and Q. Shen. 2007. “Measuring the performance of value management studies in construction: Critical review.” J. Manage. Eng. 23 (1): 2–9. https://doi.org/10.1061/(ASCE)0742-597X(2007)23:1(2).
Lin, K. Y., K. W. Chou, H. T. Lin, S. H. Hsieh, and H. P. Tserng. 2009. “Exploring the effectiveness of Chinese-to-English machine translation for CLIR applications in earthquake engineering.” J. Comput. Civ. Eng. 23 (3): 140–147. https://doi.org/10.1061/(ASCE)0887-3801(2009)23:3(140).
Liu, H., A. Gegov, and F. Stahl. 2014. “Categorization and construction of rule based system.” In Proc., Int. Conf. on Engineering Applications of Neural Networks, 183–194. Berlin: Springer.
Manning, C. D., and H. Schütze. 1999. Foundations of statistical natural language processing. Cambridge, MA: MIT Press.
Mbachu, J. 2008. “Conceptual framework for the assessment of subcontractors’ eligibility and performance in the construction industry.” Constr. Manage. Econ. 26 (5): 471–484. https://doi.org/10.1080/01446190801918730.
McCoy, K. F., and P. Demasco. 1995. “Some applications of natural language processing to the field of augmentative and alternative communication.” In Proc., IJCAI’95 Workshop on Developing AI Applications for Disabled People, 97–112. Washington, DC: National Institute on Disability and Rehabilitation Research.
Mikolov, T., K. Chen, G. Corrado, and J. Dean. 2013. “Efficient estimation of word representations in vector space.” In Proc., 1st Int. Conf. on Learning Representations, ICLR 2013—Workshop Track, 1–12. Stroudsburg, PA: Association for Computational Linguistics.
Mitkus, S., and T. Mitkus. 2014. “Causes of conflicts in a construction industry: A communicational approach.” Procedia—Social Behav. Sci. 110 (Jan): 777–786. https://doi.org/10.1016/j.sbspro.2013.12.922.
Moku, M., K. Yamamoto, and A. Makabi. 2012. “Automatic easy Japanese translation for information accessibility of foreigners.” In Proc., Workshop on Speech and Language Processing Tools in Education, 85–90. Stroudsburg, PA: Association for Computational Linguistics.
Mollá, D., M. Van Zaanen, and D. Smith. 2006. “Named entity recognition for question answering.” In Proc., 2006 Australasian Language Technology Workshop 2006, 51–58. Stroudsburg, PA: Association for Computational Linguistics.
Moon, S., G. Lee, S. Chi, and H. Oh. 2021. “Automated construction specification review with named entity recognition using natural language processing.” J. Constr. Eng. Manage. 147 (1): 04020147. https://doi.org/10.1061/(ASCE)CO.1943-7862.0001953.
Mshali, R. M. 2016. “An investigation into construction contracts in Malawi—Turnkey versus traditional contracts.” Ph.D. dissertation, Malawi Institute of Management, Univ. of Bolton.
Navon, R., and S. Isaac. 2009. “An automated tool for identifying the implications of changes in construction projects.” In Vol. 134 of Proc., 2009 Construction Research Congress: Building a Sustainable Future, 1–10. Reston, VA: ASCE.
Nedev, G., and U. Khan. 2011. “Guidelines for conceptual design of short-span bridges.” Master’s thesis, Dept. of Civil and Environmental Engineering, Chalmers Univ. of Technology.
Nichting, A. T. 1999. “OSHA reform: An examination of third party audits.” In Vol. 75 of Symp. on Legal Disputes over Body Tissue, 195–227. Chicago: Chicago-Kent College of Law.
Orsenigo, C., C. Vercellis, and C. Volpetti. 2018. “Concatenating or averaging? Hybrid sentences representations for sentiment analysis.” In Proc., Int. Conf. on Intelligent Data Engineering and Automated Learning, 567–575. Berlin: Springer.
Ozgur, A. 2004. Supervised and unsupervised machine learning techniques for text document categorization. İstanbul: Boğaziçi Univ.
Parvizimosaed, A. 2020. “Towards the specification and verification of legal contracts.” In Proc., 2020 IEEE 28th Int. Requirements Engineering Conf. (RE), 445–450. New York: IEEE.
Pennington, J., R. Socher, and C. D. Manning. 2014. “GloVe: Global vectors for word representation.” In Proc., 2014 Conf. on Empirical Methods in Natural Language Processing (EMNLP), 1532–1543. Stroudsburg, PA: Association for Computational Linguistics.
Pishdad-Bozorgi, P., and J. M. De La Garza. 2012. “Comparative analysis of design-bid-build and design-build from the standpoint of claims.” In Proc., 2012 Construction Research Congress: Construction Research Congress 2012: Construction Challenges in a Flat World, 21–30. Reston, VA: ASCE.
Pritchard, R. W. 2013. “2011 to 2012 Queensland floods and cyclone events: Lessons learnt for bridge transport infrastructure.” Aust. J. Struct. Eng. 14 (2): 167–176. https://doi.org/10.7158/S13-009.2013.14.2.
Rajoo, S. 2010. “The PAM 2006 standard form of building contract—A change in risk allocation.” Malayan Law J. 4 (10): 14.
Rameezdeen, R., and C. Rajapakse. 2007. “Contract interpretation: The impact of readability.” Constr. Manage. Econ. 25 (7): 729–737. https://doi.org/10.1080/01446190601099228.
Rameezdeen, R., and A. Rodrigo. 2013. “Textual complexity of standard conditions used in the construction industry.” Aust. J. Constr. Econ. Build. 13 (1): 1–12. https://doi.org/10.5130/AJCEB.v13i1.3046.
Rameezdeen, R., and A. Rodrigo. 2014. “Modifications to standard forms of contract: The impact on readability.” Aust. J. Constr. Econ. Build. 14 (2): 31–40. https://doi.org/10.5130/AJCEB.v14i2.3778.
Rendon, R. 2012. “The contract changes management process: Managing and controlling contract changes Part 2.” Contract Manage. 52: 56–64.
Riloff, E., and W. Phillips. 2004. An Introduction to the Sundance and AutoSlog Systems. Technical Report UUCS-04-015. Salt Lake City: Univ. of Utah.
Robertson, S. E., and K. Spärck Jones. 1994. Simple, proven approaches to text retrieval. Cambridge, UK: Univ. of Cambridge.
Runeson, P., M. Alexandersson, and O. Nyholm. 2007. “Detection of duplicate defect reports using natural language processing.” In Proc., Int. Conf. on Software Engineering, 499–508. New York: IEEE.
Salama, D. M., and N. M. El-Gohary. 2013. “Semantic text classification for supporting automated compliance checking in construction.” J. Comput. Civ. Eng. 30 (1): 04014106. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000301.
Salton, G., and C. Buckley. 1988. “Term-weighting approaches in automatic text retrieval.” Inf. Process. Manage. 24 (5): 513–523. https://doi.org/10.1016/0306-4573(88)90021-0.
Saseendran, A., B. F. Bigelow, Z. K. Rybkowski, and D. E. Jourdan. 2020. “Disputes in construction: Evaluation of contractual effects of ConsensusDOCS.” J. Leg. Aff. Dispute Resolut. Eng. Constr. 12 (2): 04520008. https://doi.org/10.1061/(ASCE)LA.1943-4170.0000377.
Sebastiani, F. 2002. “Machine learning in automated text categorization.” ACM Comput. Surv. (CSUR) 34 (1): 1–47. https://doi.org/10.1145/505282.505283.
Serag, E., H. Osman, and M. Ghanem. 2010. “Semantic detection of risks and conflicts in construction contracts.” In Proc., 27th Int. Conf.: CIB W78 2010, 16–18. Rotterdam, Netherlands: International Council for Research and Innovation in Building.
Shen, W., W. Tang, W. Yu, C. F. Duffield, F. K. P. Hui, Y. Wei, and J. Fang. 2017. “Causes of contractors’ claims in international engineering-procurement-construction projects.” J. Civ. Eng. Manage. 23 (6): 727–739. https://doi.org/10.3846/13923730.2017.1281839.
Song, J., J. Kim, and J.-K. Lee. 2018. “NLP and deep learning-based analysis of building regulations to support automated rule checking system—The international association for automation and robotics in construction.” In Vol. 35 of Proc., Int. Symp. on Automation and Robotics in Construction, 1–7. Oulu, Finland: International Association for Automation and Robotics in Construction.

Sun, M., and X. Meng. 2009. “Taxonomy for change causes and effects in construction projects.” Int. J. Project Manage. 27 (6): 560–572. https://doi.org/10.1016/j.ijproman.2008.10.005.
Temnikova, I. 2012. “Text complexity and text simplification in the crisis management domain.” Ph.D. dissertation, Univ. of Wolverhampton. https://wlv.openrepository.com/handle/2436/297482.
Treadwell, J. S. 1971. “Contractor quality control: A method of construction inspection.” Master’s thesis, Dept. of Civil, Architectural and Environmental Engineering, Univ. of Missouri-Rolla.
Uusitalo, E., M. Raatikainen, T. Männistö, and T. Tommila. 2011. “Structured natural language requirements in nuclear energy domain: Towards improving regulatory guidelines.” In Proc., 2011 4th Int. Workshop on Requirements Engineering and Law, RELAW 2011, Held in Conjunction with the 19th International Requirements Engineering Conf., 67–73. New York: IEEE.
Walsh, K. P. 2017. “Identifying and mitigating the risks created by problematic clauses in construction contracts.” J. Leg. Aff. Dispute Resolut. Eng. Constr. 9 (3): 03717001. https://doi.org/10.1061/(ASCE)LA.1943-4170.0000225.
Wang, T., P. Chen, K. Amaral, and J. Qiang. 2016. “An experimental study of LSTM encoder-decoder model for text simplification.” Preprint, submitted September 13, 2016. http://arxiv.org/abs/1609.03663.
Wei, F., H. Qin, S. Ye, and H. Zhao. 2019. “Empirical study of deep learning for text classification in legal document review.” In Proc., 2018 IEEE Int. Conf. on Big Data, Big Data 2018, 3317–3320. New York: IEEE.
Xu, X., and H. Cai. 2019. “Semantic frame-based information extraction from utility regulatory documents to support compliance checking.” In Advances in informatics and computing in civil and construction engineering, 223–230. New York: Springer.
Xue, X., Q. Shen, and Z. Ren. 2010. “Critical review of collaborative working in construction projects: Business environment and human behaviors.” J. Manage. Eng. 26 (4): 196–208. https://doi.org/10.1061/(ASCE)ME.1943-5479.0000025.
Yap, J. B. H., H. Abdul-Rahman, C. Wang, and M. Skitmore. 2018. “Exploring the underlying factors inducing design changes during building production.” Prod. Plan. Control 29 (7): 586–601. https://doi.org/10.1080/09537287.2018.1448127.
Yasodha, S., and P. S. Prakash. 2012. “Data mining classification technique for talent management using SVM.” In Proc., 2012 Int. Conf. on Computing, Electronics and Electrical Technologies, ICCEET 2012, 959–963. New York: IEEE.
Yi, W., and A. P. C. Chan. 2014. “Critical review of labor productivity research in construction journals.” J. Manage. Eng. 30 (2): 214–225. https://doi.org/10.1061/(ASCE)ME.1943-5479.0000194.
Younis, G., G. Wood, and M. A. A. Malak. 2008. “Minimizing construction disputes: The relationship between risk allocation and behavioural attitudes.” In Proc., CIB Int. Conf. on Building Education & Research BEAR2008, Sri Lanka, 134–135. Salford, UK: Univ. of Salford.
Youssef, A., H. Osman, M. Georgy, and N. Yehia. 2018. “Semantic risk assessment for ad hoc and amended standard forms of construction contracts.” J. Leg. Aff. Dispute Resolut. Eng. Constr. 10 (2): 04518002. https://doi.org/10.1061/(ASCE)LA.1943-4170.0000253.
Yuan, P., Y. Chen, H. Jin, and L. Huang. 2008. “MSVM-kNN: Combining SVM and k-NN for multi-class text classification.” In Proc., 1st IEEE Int. Workshop on Semantic Computing and Systems, WSCS 2008, 133–140. New York: IEEE.
Zait, F., and N. Zarour. 2019. “Addressing lexical and semantic ambiguity in natural language requirements.” In Proc., 5th Int. Symp. on Innovation in Information and Communication Technology, ISIICT 2018. New York: IEEE.
Zhang, F., H. Fleyeh, X. Wang, and M. Lu. 2019a. “Construction site accident analysis using text mining and natural language processing techniques.” Autom. Constr. 99 (Jan): 238–248. https://doi.org/10.1016/j.autcon.2018.12.016.
Zhang, H., X. Wang, H. Tan, and R. Li. 2019b. “Applying data discretization to DPCNN for law article prediction.” In Proc., CCF Int. Conf. on Natural Language Processing and Chinese Computing, 459–470. Berlin: Springer.
Zhang, J. 2011. Automated information extraction from construction-related regulatory documents for automated compliance checking. Rotterdam, Netherlands: International Council for Research and Innovation in Building.
Zhang, J. 2017. “A logic-based representation and tree-based visualization method for building regulatory requirements.” Visual. Eng. 5 (1): 1–14. https://doi.org/10.1186/s40327-017-0043-4.
Zhang, J., and N. El-Gohary. 2012a. “Automated regulatory information extraction from building codes leveraging syntactic and semantic information.” In Proc., Construction Research Congress 2012: Construction Challenges in a Flat World, 622–632. Reston, VA: ASCE.
Zhang, J., and N. El-Gohary. 2012b. “Extraction of construction regulatory requirements from textual documents using natural language processing techniques.” Comput. Civ. Eng. 453–460. https://doi.org/10.1061/9780784412343.0057.
Zhang, J., and N. El-Gohary. 2016a. “Extending building information models semiautomatically using semantic natural language processing techniques.” J. Comput. Civ. Eng. 30 (5): C4016004. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000536.
Zhang, J., and N. M. El-Gohary. 2016b. “Semantic NLP-based information extraction from construction regulatory documents for automated compliance checking.” J. Comput. Civ. Eng. 30 (2): 04015014. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000346.
Zhang, J., and N. M. El-Gohary. 2017. “Integrating semantic NLP and logic reasoning into a unified system for fully-automated code checking.” Autom. Constr. 73 (Jan): 45–57. https://doi.org/10.1016/j.autcon.2016.08.027.
Zhou, P., and N. El-Gohary. 2016a. “Automated extraction of environmental requirements from contract specifications.” In Vol. 1676 of Proc., 16th Int. Conf. on Computing in Civil and Building Engineering, 1669. Reston, VA: ASCE.
Zhou, P., and N. El-Gohary. 2016b. “Domain-specific hierarchical text classification for supporting automated environmental compliance checking.” J. Comput. Civ. Eng. 30 (4): 04015057. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000513.
Zou, Y., A. Kiviniemi, and S. W. Jones. 2017. “Retrieving similar cases for construction project risk management using Natural Language Processing techniques.” Autom. Constr. 80 (Aug): 66–76. https://doi.org/10.1016/j.autcon.2017.04.003.
