Abstract: Claims, disputes, and litigation are major legal issues in construction projects, which often result in cost overruns, delays, and
adverse working relationships among the contracting parties. Recent advances in natural language processing (NLP) techniques offer great
potential for processing voluminous unstructured data from legal documents to draw insightful information about the root causes of issues
and prevention strategies. Several efforts have been undertaken in recent decades that used NLP to tackle a wide range of problems related to
legal issues in construction, such as the quality review of contracts and the identification of common patterns in legal cases. The research line
on NLP-based techniques for analyzing legal texts of construction projects has progressed well recently; it is, however, still at an early stage.
This paper aims to perform a critical review of recently published articles to analyze the achievements and limitations of the state of the art on
NLP-based approaches to address common legal issues associated with legal documents arising across different project stages. The study also
provides a roadmap for future research to expand the adoption of NLP for the processing of legal texts in construction. DOI: 10.1061/(ASCE)CO.1943-7862.0002122.
© 2021 American Society of Civil Engineers.
Author keywords: Legal issues; Disputes; Artificial intelligence; Natural language processing (NLP); Contracts; Project requirements;
Litigation; Claims; Linguistics.
ML algorithms
Contractual requirement recognition | Rule-based and ML-based binary text classification | Classified the contractual text into requirements and nonrequirements using ML algorithms | 98.15% (recall) | Hassan and Le (2020)
Concept relation extraction from contracts | Shallow parsing and rules | Utilized shallow parsing to perform syntactic segmentation of contract clauses and extract active concepts, passive concepts, and relations | 68% (F-measure) | Al Qady and Kandil (2010)
Automated extraction of subcontract scope | ML-based text classification | Used ML to classify main contract requirements into different categories corresponding to subcontract disciplines | 94.18% (accuracy) | Hassan et al. (2020)
Automated extraction of subcontract scope | ML-based text classification | Compared the performance of ML and deep learning algorithms in classifying general DB contract requirements into different categories such as design, construction, and O&M | 93.08% (recall) | Hassan and Le (2021)
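To make the "ML-based text classification" entries above more concrete, the following is a minimal sketch of a multinomial naive Bayes (NB) classifier that sorts requirement sentences into design, construction, and O&M. It is not the implementation used in any of the cited studies: the training sentences, the function names `train_nb` and `predict_nb`, the whitespace tokenization, and the add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """Train a multinomial naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> term counts
    label_counts = Counter()            # label -> number of training samples
    vocab = set()
    for text, label in samples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab

def predict_nb(model, text):
    """Return the label with the highest log-posterior for the given text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        score = math.log(label_counts[label] / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in text.lower().split():
            if token in vocab:  # words never seen in training are ignored
                # add-one (Laplace) smoothed log likelihood
                score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical training sentences, invented for this sketch.
train = [
    ("the design-builder shall prepare drawings and design calculations", "design"),
    ("the design shall comply with the geometric design criteria", "design"),
    ("the contractor shall place and compact the asphalt pavement", "construction"),
    ("the contractor shall construct the bridge deck", "construction"),
    ("the operator shall maintain the pavement during the maintenance period", "O&M"),
    ("routine maintenance and inspection shall be performed annually", "O&M"),
]
model = train_nb(train)
print(predict_nb(model, "the contractor shall construct the retaining wall"))
```

With six toy sentences the result only illustrates the mechanics; the studies summarized here trained on thousands of labeled requirements and compared several algorithms and vectorization methods.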
J. Constr. Eng. Manage.
subjects in comparison with previous studies in which the information organization was performed based on complete clauses. The proposed system used the Sundance shallow syntactic parser (Riloff and Phillips 2004) to segment a contract clause into active concepts (i.e., the client and the contractor), passive concepts (i.e., work items), and their relations (i.e., action). When evaluated on the general conditions of a standard form construction contract, the proposed sentence parsing system achieved an F-score of 68%, in comparison with the 76% reported by human annotators who used the conventional method.
A few authors also developed NLP models for structuring the requirements of DB contracts to support automated subcontract scope extraction. Hassan et al. (2020) implemented supervised ML algorithms including NB, SVM, LR, kNN, DT, and FNN to develop a model that can extract subcontractor scope from design-build (DB) contracts. The model can classify DB requirements into three categories indicating different disciplines, namely design, construction, and operation and maintenance (O&M). The models were evaluated on a dataset of 2634 DB requirements, where the LR model achieved the highest accuracy of 94.18% and the NB model the lowest accuracy of 87.48%. Following this study, Hassan and Le (2021) compared the performance of traditional ML and deep learning in the classification of DB requirements. They tested a total of eight different machine learning methods, comprising five traditional ML algorithms (NB, SVM, LR, kNN, DT) and three deep learning algorithms (CNN, RNN_LSTM, RNN_GRU). They also examined an ensemble of the classifiers, which was found to exhibit the highest recall of 93.08%. Additionally, they reported that the classification reliability depends on the vectorization method, where word embedding vector space models seemed to outperform the bag-of-words (BOW) method. Such classification models for separating DB requirements into different categories, namely, design, construction, and O&M, can assist general contractors in extracting the precise scope of a subcontractor from a lengthy and complex DB contract. The classification models developed by researchers for structuring contractual text can assist in improving organization of and access to critical information provided in the large number of electronic text documents produced in a construction project. Enabling quick access to essential information through document organization improves the coordination, collaboration, and information exchange among project members to support effective planning and decision making. It can further protect the contractor from financial loss and disputes due to missing information.
NLP-Assisted Detection of Violation with Construction Laws and Regulations
Another main research theme of NLP approaches to address legal issues in construction was aimed at automated development of logic rules from natural language regulatory provisions to support automated compliance checking (ACC). Table 3 summarizes the research efforts undertaken that explored the capability of NLP-based ACC. As shown in the table, the main focus in this area was on the following topics: automated classification of regulatory provisions, automated information extraction for rules development from regulatory provisions, and ACC of building information models (BIMs). The details of the NLP frameworks developed in this area of research are discussed below.
Automated Classification of Regulatory Provisions. The classification of voluminous sets of unstructured provisions in laws and regulations is the first step required for ACC platforms. To address this, Salama and El-Gohary (2013) proposed an ML-based text classification model for classifying regulatory provisions into fourteen predefined categories of compliance checking (CC) such as environmental, health, safety, security, quality, etc. The developed classifier used NB, SVM, and maximum entropy (ME) algorithms. The SVM algorithm achieved the highest recall of 100% on their dataset comprising 330 regulatory provisions, whereas NB and ME both reached a recall of only 96%. To further classify the environmental regulatory provisions into more detailed categories, Zhou and El-Gohary (2016b) investigated a different approach, namely, hierarchical text classification, to classify the clauses in environmental regulatory codes into 10 different topics of environmental compliance checking, including air leakage, fenestration, lighting power, thermal insulation, etc. The authors used NB, SVM, kNN, DT, radius-based neighbors (RBN), and random forest (RF) algorithms to develop classification models. The best SVM model achieved a recall and precision of 97% and 84%, respectively, when tested on a dataset of 1200 clauses. Other algorithms also reported reliable results; however, RBN exhibited a very low recall of 37.5%. Another study by Song et al. (2018) used a deep learning algorithm to classify the regulatory provisions into different categories such as site, building, structure, facility, evacuation and fireproof, etc. The proposed method applies the word2vec embedding technique to convert the meaning of words into numerical values. The classification labels help to determine what type of information needs to be extracted from a provision. The developed model can also extract the top related words for any input word in the provision. The advantage of performing text classification before information extraction is that the irrelevant text is filtered out and semantically similar provisions are grouped, which improves the efficiency and accuracy of information extraction and of the subsequent CC rules.
Automated Information Extraction for Rules Development from Regulatory Provisions. In an attempt to automate the compliance checking of designed structures against regulatory provisions, a few studies have developed models for converting natural language provisions into computer-understandable rules. Most studies used rule-based approaches for this purpose. The first model for information extraction from regulatory provisions to develop CC rules was proposed by Zhang (2011). The developed model can automatically extract essential information (i.e., subject, attribute, comparison, quantity), which is then used to develop logic rules. The information is extracted by matching patterns produced using semantic (domain-specific meaning/context-related) and syntactic (nouns, verbs, etc.) features of the text. An ontology and a syntactic parser were used to extract semantic and syntactic features, respectively, from text to develop patterns. Their evaluation results on quantitative requirements of the International Building Code 2006 indicated that the use of semantic features (using ontology) to develop patterns yields better performance than using syntactic features (using POS tags), since the semantic features could better capture domain-specific terms and contexts. The syntactic and semantic features-based information extraction methods reported an F-measure of 75% and 97.4%, respectively. Following this study, Zhang and El-Gohary (2016b) developed a more advanced rule-based information extraction model that utilized dependency information (i.e., nsubj, dobj, nmod) along with simple syntactic (i.e., noun, verb, adjective, adverb) and semantic (i.e., domain-specific meaning/context-related) features of the text. In comparison with the previous model, this model can extract more information from a provision, such as subject, subject restriction, compliance checking attribute, deontic operator indicator, quantitative relation, comparative relation, quantity value, quantity unit, and quantity restriction. The information extraction rules of that study were validated on the 2009 International Building Code, resulting in a precision and recall of 96.9% and 94.4%, respectively. This approach also produces impressive performance with
Table 3. Summary of studies on NLP-assisted detection of violation with construction laws and regulations
Topic | Solutions | Approach | Methodology description | Performance | References
Automated classification of regulatory provisions | Classification of regulatory provisions | ML-based text classification | Used ML algorithms to classify general conditions of a contract into fourteen CC categories | 100% (recall), 96% (precision) | Salama and El-Gohary (2013)
Automated classification of regulatory provisions | Environmental code classification | ML-based hierarchical text classification | Used ML algorithms to perform hierarchical text classification of environmental regulatory codes into ten different topics | 97% (recall), 84% (precision) | Zhou and El-Gohary (2016b)
Automated classification of regulatory provisions | Automated rule checking system | Deep learning-based approach | Converted the meaning of words in numerical values to classify the topic of sentences | N/A | Song et al. (2018)
Automated information extraction for rules development from regulatory provisions | Information extraction from building code provisions for rules construction | Pattern matching-based rules using parsing information, POS tagging, and ontology | Extracted and represented the semantic information from building codes in a computer-understandable structure to enable rules construction | 75%–100% (precision), 75%–95% (recall), 75%–97.4% (F-score) | Zhang (2011)
Automated information extraction for rules development from regulatory provisions | Information extraction from building code provisions for rules construction | Rule-based information extraction approach | Extracted regulatory information from building codes using syntactic (syntax/grammar-related) and semantic (meaning/context-related) features | 96.9% (precision), 94.4% (recall) | Zhang and El-Gohary (2016b)
Automated information extraction for rules development from regulatory provisions | Extraction of information from environmental requirements | Rule-based information extraction methods | Used rule-based NLP methods to classify and extract information from environmental requirements in contracts | 98.1% (recall), 98.5% (precision) | Zhou and El-Gohary (2016a)
Automated information extraction for rules development from regulatory provisions | Information extraction from fire code provisions for rules construction | Dependency parsing (DP) and phrase structure grammar (PSG) methods | Compared the performance of DP and PSG to extract information from fire codes | 94.3%–96.9% (F-score) | Zhang and El-Gohary (2012b)
Automated information extraction for rules development from regulatory provisions | Information extraction from utility policies | Semantic frame-based information extraction method | Used a semantic frame-based information extraction method with a focus on domain semantics and lexical semantics | 92.23% (precision) | Xu and Cai (2019)
Automated information extraction for rules development from regulatory provisions | A nonproprietary and user-understandable representation of building regulations | Logic-based representation and tree-based visualization approaches | Investigated a logic-based representation and tree-based visualization methods | text representation > visual representation > logic representation | Zhang (2017)
ACC of Building Information Models (BIMs) | Extension of current IFC schema | Pattern-matching-based rules, ML-based text classification | Extract concepts from building regulations and match the extracted concepts to concepts in the IFC class hierarchy to further extend the schema | 88.7%–97.1% (precision), 94.2%–99.2% (recall), 91.4%–98.1% (F-score) | Zhang and El-Gohary (2016a)
ACC of Building Information Models (BIMs) | Development of logic reasoning system | A combination of semantic NLP and EXPRESS data-based technique | Used the NLP and EXPRESS data-based techniques to extract and transform both regulatory and design information in BIMs into logical format to perform automated compliance reasoning | 87.6% (precision), 98.7% (recall), 92.8% (F-score) | Zhang and El-Gohary (2017)
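The performance columns in Table 3 mix precision, recall, and F-score. As a quick consistency check, the sketch below recomputes the F-score for the Zhang and El-Gohary (2017) row from its reported precision and recall. It assumes the F-score is the usual balanced F1 (the harmonic mean of precision and recall), which the source does not state explicitly; the function name `f1_score` is chosen only for this example.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Precision and recall as reported in Table 3 for Zhang and El-Gohary (2017).
f1 = f1_score(0.876, 0.987)
print(f"{f1:.3f}")  # prints 0.928
```

Under that assumption the recomputed value agrees with the 92.8% F-score reported in the table.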
a precision of 98.5% and a recall of 98.1%, higher than the 96.9% and 94.4% above, when tested on a different dataset comprising environmental regulations of a real construction contract (Zhou and El-Gohary 2016a). In a separate study, Zhang and El-Gohary (2012b) compared the performance of rules developed based on DP and phrase-structure grammar (PSG). PSG corresponds to a set of phrase structure relations defined by rules that predict the different combinations of tokens forming a grammatical phrase. In practice, PSG is used to produce phrase tags (e.g., QP → JJR IN CD) from individual POS tags (JJR, IN, CD). Phrase tags are used to extract information when a specific combination of POS tags is encountered. When validated on the 2009 International Fire Code, DP rules slightly outperformed PSG rules, with F-measures of 96.9% and 94.3%, respectively. A recent study by Xu and Cai (2019) used dependency parsing and POS tagging along with knowledge bases such as ontologies and lexicons for entity recognition. In comparison with the previously discussed rule-based methods, the proposed method reported a lower precision of 92.23% for extracting the regulatory information from the Indiana utility accommodation policy. Another contribution to digital representation of building regulations was made by Zhang (2017). To enable a nonproprietary and user-understandable representation of building regulations, the author investigated a logic-based representation and a tree-based visualization method for building regulatory requirements to improve the understandability and reading speed of building regulations. In terms of understandability and reading speed, the text representation performed better than the visual and logic representations in the experimental study.
ACC of Building Information Models. Another critical task required for ACC of building information models (BIMs) is the automatic extraction of CC-related information from BIMs to integrate machine-readable rules with BIMs. Due to the limited coverage of CC-related concepts in the existing Industry Foundation Classes (IFC)–based BIM models, a few efforts have been made to automate the process of extending IFC schemas using NLP. To address this issue, a new method for extending the IFC schema was proposed by Zhang and El-Gohary (2016a). The method utilizes pattern matching-based methods and ML techniques including NB, SVM, kNN, and DT to extract concepts from regulatory documents and predict their relationship with IFC concepts to extend the schema. Pattern matching–based rules based on POS tag information are used for concept extraction from codes; the extracted concepts are then matched with the IFC class hierarchy to find the most related IFC concepts using WordNet. The ML algorithm predicts the relationship between the extracted concepts and the most related IFC concepts, which is used to further extend the IFC schema. The kNN and NB models reported the highest and lowest precision of 91% and 76.2%, respectively, when evaluated on International Building Codes 2006 and 2009. Following this study, Zhang and El-Gohary (2017) also offered a novel system for fully automated checking of BIMs for compliance with building regulations. The system comprises three modules: regulatory information extraction, design information extraction, and compliance reasoning. The first module uses pattern matching-based rules to extract semantic information elements (subject, compliance checking attribute, quantity, etc.) from regulatory codes and convert them into logic rules. In the second module, EXPRESS data-based techniques are implemented to transform design information in BIMs into logic facts. Finally, the compliance of the logic facts with the logic rules is checked by executing the semantic-based logic reasoning algorithms in the compliance reasoning module. The model developed in this study reported an F-score of 92.8%.
In the area of compliance checking, most existing NLP-based methods were validated on small sets of building codes. Further research on automated design verification should be expanded to other domains such as the highway sector and should use larger samples of codes. Additionally, previous studies have mainly dealt with prescriptive requirements whose design constraints can be represented using first-order logic rules. However, real-world design codes also include objective requirements that do not explicitly specify any quantitative constraint. We know very little about whether the existing methods perform well on such requirements.
NLP Frameworks for the Analysis of Historical Legal Cases
Another main focus of the previous studies was the processing of legal case data. The major NLP frameworks developed by previous researchers, as summarized in Table 4, were aimed at addressing the following research problems: analysis of legal case data on construction defects and similar case retrieval. NLP frameworks are highly effective in analyzing previous case data to provide important insights to project participants as well as court professionals. The details of the NLP frameworks developed in this area are discussed below.
Analysis of Legal Cases Data of Construction Defects. To examine the common types of construction defects resulting in costly delays and disputes, Jallan et al. (2019) tested the ability of NLP for analyzing legal cases related to construction defects. The authors used frequency analysis of keywords in historic construction legal case documents from the LexisNexis database to identify the common topics. They used the latent Dirichlet allocation (LDA) method, a modeling method for identifying the topic of a document, to cluster the documents by topic based on their distinct keyword features. The study identified 14 unique topics from the cluster analysis. The developed automated framework can help project participants to better understand the construction defects that led to legal cases in the past. Subsequently, this knowledge can reveal the root causes of defects to prevent similar issues and costly litigation in future projects.
Similar Case Retrieval. Several studies have used NLP to enhance the efficiency and effectiveness of legal case analysis by reviewing similar cases in a certain legal database. Fan and Li (2013) developed a framework for retrieving similar cases in a legal case library given a certain input case. The search results are based on semantic similarity rather than string similarity between the user’s input and the case descriptions. In their study, the documents were vectorized using the bag-of-words (BOW) method along with TF-IDF for measuring the term weight. The method was found to outperform the traditional string-based document search. It was tested on the Westlaw legal information database with different input legal case descriptions and achieved approximately 40% precision and 90% recall at the top 90 results. One limitation of this study is that it did not consider the variation of terms, such as synonyms, in the input description and the text database. In addition, this approach requires the user to provide a lengthy description of the case, whereas searching with key phrases is preferable. To address those drawbacks, Zou et al. (2017) developed a new keyword-based search algorithm that uses the semantics of words for legal case retrieval. They used a risk-related domain dictionary as well as the general dictionary WordNet, which helps the algorithm to improve the search results because the synonyms and other lexical forms of the input keywords are also used as additional input. The authors tested their method on WorkSafeBC and NIOSH case datasets with a precision of mostly above 90% and an F-score varying from 42% to 83% at the top 10 results.
Although similar case retrieval is useful for making an appropriate judgment on new dispute situations, an insightful comparison
only a limited number of cases are available. This problem was recently tackled by Cavar et al. (2018), who developed a new method to support in-depth examination of the difference and contradiction between legal cases. The authors proposed a deep linguistic NLP method for constructing the knowledge graph for each of the cases in a legal document repository. They used graph theories to identify parts of the case description that differ or overlap with one another. These graphs also include a domain vocabulary composed of synonyms and other semantically related terms. This study, however, was not robustly tested due to the lack of gold standard resources and corpora.
However, the status quo regarding this line of research still involves
(Le and Jeong 2017; Lee et al. 2020; Xu and Cai 2019; Zhang
aspects of legal issues that have not been well investigated. Table 5
[Table 4 (rotated in the source; only fragments survive). Column headers: Performance; References. Topic: Analysis of legal cases. Other cells: construction-defect litigation; construction accidents; 14 unique topics identified; 90%–100% (recall); 70%–100% (precision), 50%–100% (recall), 42.3%–83.3% (F-score); N/A.]
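As a rough illustration of the bag-of-words and TF-IDF retrieval idea described for similar case retrieval, the sketch below ranks a few case descriptions against a query by cosine similarity. It is a generic textbook formulation, not the framework of Fan and Li (2013): the case texts, the function names, the smoothed IDF formula, and the choice to include the query when computing document frequencies are all assumptions made for this example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per document (raw TF x smoothed IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def rank_cases(query, cases):
    """Return case indices ordered from most to least similar to the query."""
    vectors = tfidf_vectors(cases + [query])
    query_vec = vectors[-1]
    scored = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors[:-1])]
    return [i for _, i in sorted(scored, key=lambda s: (-s[0], s[1]))]

# Hypothetical case descriptions, invented for this sketch.
cases = [
    "contractor claims delay caused by late site access",
    "roof leakage defect after handover",
    "payment dispute over unapproved change order",
]
print(rank_cases("delay claim due to site access", cases))
```

In this toy run the first case is ranked highest because it shares "delay", "site", and "access" with the query. The query word "claim" does not match "claims" in the case text, which is a small instance of the term-variation limitation noted above and the motivation for the dictionary-based expansion used by Zou et al. (2017).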
Table 5. Suggested future research areas on legal text processing in construction using NLP
Topic | Research problem | Proposed solution | References
Contract drafting assistance | Contract drafting and comprehension | An automated framework to assist in precise contract drafting as well as to facilitate the expansion of abbreviations for contract comprehension | Demasco and McCoy (1992), Hamie and Abdul-Malak (2018)
Contract drafting assistance | Subcontract scope extraction | Separation of the general contract into small subcontracts indicating different disciplines | Assaad et al. (2020), Hassan and Le (2021)
Automated contract review | Contract clarity and preciseness | Automated model to verify the inclusion of sufficient details and conditions in contracts to prevent litigious behavior | Jagannathan and Delhi (2020)
Automated contract review | Ambiguity in contractual clauses | An automated model to predict the litigation proneness of contractual clauses | Jagannathan and Delhi (2019)
Automated contract review | Administration of owner obligations | A model to extract and classify the owner obligations in standard contracts into different categories, including permits, design documents, reviews, etc. | Abotaleb et al. (2019)
Simplification of legal text for ease of reading | Contract language simplification | Replacement of infrequent legal jargon with domain-specific common and simpler words to improve the readability of legal texts | Lin et al. (2009), Saseendran et al. (2020)
Simplification of legal text for ease of reading | Contract language simplification | Expansion of abbreviations used in contracts to reduce the reader’s cognitive load spent on remembering abbreviations | McCoy and Demasco (1995)
Simplification of legal text for ease of reading | Contract language simplification | An automated text simplification method to simplify the structure and language used in contracts | Rameezdeen and Rodrigo (2013, 2014), Uusitalo et al. (2011)
Impact tracing of changes in contractual requirements | Requirement changes impact analysis | Development of a model to predict all requirements in a contract being impacted by the change in a specific requirement | Arora et al. (2015), Navon and Isaac (2009), Rameezdeen and Rodrigo (2014)
Enabling early compliance checking with design codes | Precise selection of applicable provisions from codes | An automated model to select the correct applicable requirements from the regulatory design codes to carry out conceptual design | Altmann and Samani (unpublished report, 1978), Nedev and Khan (2011)
Construction safety violation assessment | Construction safety violation assessment | Automated analysis of accident inspection reports for identification of the OSHA citations violated by the contractor to determine the penalty | Zhang et al. (2019a)
Court outcome prediction of construction disputes | Assessment of quality of construction claims | Development of a model for identifying similar cases in previous litigation history to predict the outcome of a new legal case at an early stage | Aletras et al. (2016)
Contract Drafting Assistance
Despite recent advances in the implementation of NLP in controlling the quality of contract drafting, writing assistance models for contract drafting are still an open challenge (Curtotti and McCreath 2011; McCoy and Demasco 1995). The current state of the art in this regard involves several NLP approaches to the automated detection of writing errors such as ambiguous terminologies, use of poisonous clauses, or missing clauses (Chakrabarti et al. 2019; Lee et al. 2019, 2020; Serag et al. 2010). Still, little effort has been devoted to computational methods that can assist contract drafters in selecting the right language or structure to phrase clauses. It would be helpful to examine the ability of NLP in determining the attributes of high-quality contract documents and effective strategies that can help the contract writer improve their writing product. NLP methods such as the sentence COMPANSION technique developed by Demasco and McCoy (1992) may be adapted to develop a domain-specific writing assistance method for the drafting of contracts or other legal documents in construction. The model was developed by computer science domain experts for individuals with language impairments; it can generate a well-formed sentence from an input sequence of uninflected words provided by the user. The principle underpinning their system is the analysis of semantic information or how words co-occur in a trained corpus. Furthermore, an intelligent NLP model that prompts the contract writer with appropriate words would allow him/her to select the most appropriate word choice. As suggested by Garay-Vitoria and Gonzalez-Abascal (1997), researchers may explore how NLP can learn the sequence of words and their contexts to predict the most suitable terms.
Moreover, there is a need for an automated or semiautomated approach to the separation of the main contract into smaller subcontracts. Typically, the majority of the construction work (80%–90%) is performed by subcontractors (Arditi and Chotibhongs 2005; Mbachu 2008). Therefore, the process of subcontract separation is often required (Assaad et al. 2020). Each subcontract typically covers only a small portion of the work; thus, clauses related to a particular work portion need to be identified and added to the subcontract requirements. Omitting any critical requirement associated with the subcontractor scope can lead to costly disputes among the contracting parties. Although the NLP models developed by Hassan et al. (2020) and Hassan and Le (2021) facilitated subcontract scope extraction, the models are applicable to only three disciplines in highway projects. Since several subcontractors are generally involved in large multidisciplinary projects, a more comprehensive model that can extract the scope associated with any subcontracted task (such as plumbing, electrical, etc.) in any project type (industrial, commercial, etc.) is still required.
Automated Contract Review
Despite the fact that significant efforts have been made to develop contract review methods for enhancing the quality of contracts, there are still significant challenges that require further investigation. For example, several researchers expressed the need for new contract review methods capable of detecting dispute-prone clauses in the contract (Chakrabarti et al. 2019). Specifically, future research should offer effective means for automated identification of contractual clauses (e.g., penalties, rights, obligations, payment clauses) that are more likely to lead to disputes (Jagannathan and Delhi 2019). Data-driven approaches that leverage databases of legal cases such as LexisNexis would be worth considering. These approaches require novel NLP models for analyzing the frequency of contract clauses cited in legal cases. Novel measures of litigation-proneness are also critically needed to address the above challenge. Additionally, the inclusion of contradictory clauses in a single contract is a common issue found in construction project contracts, necessitating further effort from the academic community (Mshali 2016; Parvizimosaed 2020). The aforementioned issues in contract documents can be resolved by developing NLP models using hand-coded rules (Lee et al. 2019). Although the development of rules is a time-consuming task, rule-based approaches yield higher performance than ML-based approaches given that large datasets are not often available in the construction domain (Salama and El-Gohary 2013).
Furthermore, most previous studies on contract administration were oriented towards contractor obligations (Caldas et al. 2002; Hassan and Le 2021). However, owners also have several obligations in a project, such as making payments, providing site information, assisting with permits, inspecting work, and reviewing and approving submittals. Failure to comply with these obligations, unintentionally or intentionally, was also reported as a main source of disputes in construction projects (Abotaleb et al. 2019). Since standard forms of contract (e.g., AIA and FIDIC) are increasingly used in construction (Fawzy and El-adaway 2012), researchers are encouraged to leverage NLP to examine and highlight the similarities and differences in the owner’s obligation clauses between the standard forms. Each type of contract defines different obligations for the owner. For instance, FIDIC contracts demand that the owner provide assistance in document preparation only in obtaining approvals, whereas ConsensusDOCS contracts require the owner to pay as well to obtain approvals and permits. Furthermore, extraction of the owner obligations from the contracts and subsequent classification into different categories (e.g., permits, design documents, review of submittals and requests, etc.) can assist in the better administration of contracts.
Simplification of Legal Texts for Ease of Reading
Clearly, reading contracts is a key task performed by engineers. However, legal language is typically not a native language of engineers, causing much difficulty for many engineers when reading legal text (Chong 2012; Rameezdeen and Rajapakse 2007; Rameezdeen and Rodrigo 2014). Thus, legal contract provisions should be written in a way that is easier for engineers to comprehend (Saseendran et al. 2020). In this regard, potential future research includes using state-of-the-art NLP methods for text simplification to convert the legal language into an equivalent but simpler language (Temnikova 2012; Wang et al. 2016). Readability assessment tools should be developed that can help contract writers to identify difficult legal terms. Domain-specific methods capable of replacing infrequent legal jargon with more common and simple words may improve the ease of reading legal documents for construction engineers. It is worth investigating such a strategy for simplifying construction contracts as it has shown considerable success in other fields (Dell’Orletta et al. 2015). The use of simpler language would be extremely helpful, especially for projects in which the stakeholders are from multiple countries (Foliente 2000; Lin et al. 2009; Moku et al. 2012). Foreign professionals whose native language is not English may face great difficulties in understanding legal documents written in English. Moreover, abbreviations are frequently used in contracts. Research shows that expanding the abbreviations used in natural text can further improve comprehension, as it reduces the cognitive load the reader spends on remembering a large number of abbreviations for different words (McCoy and Demasco 1995). Another text simplification strategy that is worth investigating is proposing a controlled natural language for construction project contracts. The use of standardized vocabulary and grammar will help reduce variation in contract writing styles, thus minimizing the reading effort spent on a new contract.
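The abbreviation-expansion and jargon-replacement ideas discussed in this section can be illustrated with a minimal lookup-based sketch. It is not drawn from any of the cited studies: the two glossaries, their entries, and the function name `simplify_clause` are hypothetical, and a practical system would need a curated domain lexicon and context-aware disambiguation rather than plain string substitution.

```python
import re

# Hypothetical glossaries; the entries are illustrative only.
ABBREVIATIONS = {"O&M": "operation and maintenance", "DB": "design-build"}
JARGON = {"herein": "in this contract", "forthwith": "immediately"}

def simplify_clause(text: str) -> str:
    """Expand known abbreviations and replace infrequent legal terms."""
    for short, full in ABBREVIATIONS.items():
        # Match the abbreviation only when it stands alone as a token.
        pattern = r"(?<![\w&])" + re.escape(short) + r"(?![\w&])"
        text = re.sub(pattern, full, text)
    for term, plain in JARGON.items():
        # Whole-word, case-insensitive replacement of the legal term.
        text = re.sub(r"\b" + re.escape(term) + r"\b", plain, text, flags=re.IGNORECASE)
    return text

print(simplify_clause("The DB contractor shall forthwith submit the O&M manual."))
```

For the sample clause this prints "The design-build contractor shall immediately submit the operation and maintenance manual." A dictionary lookup of this kind cannot tell apart abbreviations with several meanings, which is one reason the text-simplification methods cited above rely on richer NLP models.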