Haytham H. Elmousalami
Zagazig University
Abstract: This study reviews the common practices and procedures conducted to identify the cost drivers that the past literature has classified
into two main categories: qualitative and quantitative procedures. In addition, the study reviews different computational intelligence (CI)
techniques and ensemble methods conducted to develop practical cost prediction models. This study discusses the hybridization of these
modeling techniques and the future trends for cost model development, limitations, and recommendations. The study focuses on reviewing
the most common artificial intelligence (AI) techniques for cost modeling such as fuzzy logic (FL) models, artificial neural networks (ANNs),
regression models, case-based reasoning (CBR), hybrid models, decision tree (DT), random forest (RF), support vector machine (SVM),
AdaBoost, scalable boosting trees (XGBoost), and evolutionary computing (EC) such as genetic algorithm (GA). Moreover, this paper pro-
vides the comprehensive knowledge needed to develop a reliable parametric cost model at the conceptual stage of the project. Additionally,
field canals improvement projects (FCIPs) are used as an actual case study to analyze the performance of the ML models. Out of 20 AI
techniques, the results showed that the most accurate and suitable method is XGBoost, with a mean absolute percentage error (MAPE) of 9.091% and an adjusted R² of 0.929. Nonlinear adaptability, handling of missing values and outliers, model interpretation,
and uncertainty are discussed for the 20 developed AI models. In addition, this study presents a publicly open data set for FCIPs to be used for
future model validation and analysis. DOI: 10.1061/(ASCE)CO.1943-7862.0001678. © 2019 American Society of Civil Engineers.
Author keywords: Artificial intelligence; Feature engineering; Ensemble methods; Hybrid intelligent systems; Fuzzy analytic hierarchy
process; Genetic algorithm; Factor analysis; Fuzzy logic; XGBoost; Project cost modeling.
($U_{ij}$), where, for example, civil (C), mechanical works (M), and electrical works (E) refer to three different criteria, respectively (Elmousalami et al. 2018a):

$$W_{ij} = \left( \Big(\prod_{n=1}^{N} l_{ijn}\Big)^{1/N},\; \Big(\prod_{n=1}^{N} m_{ijn}\Big)^{1/N},\; \Big(\prod_{n=1}^{N} u_{ijn}\Big)^{1/N} \right) \qquad (8)$$

where $i$ = a criterion such as C, M, or E; $j$ = screened cost parameter for a defined case study; $N$ = number of experts; $L_{ij}$ = minimum of the experts' common consensuses; $M_{ij}$ = average of the experts' common consensuses; $U_{ij}$ = maximum of the experts' common consensuses; $L_j$ = mean of the experts' opinions on the minimum of the common consensuses ($L_{ij}$); $M_j$ = mean of the experts' opinions on the average of the common consensuses ($M_{ij}$); $U_j$ = mean of the experts' opinions on the maximum of the common consensuses ($U_{ij}$); and $W_{ij}$ = aggregated triangular fuzzy number of the $N$ experts' views.

Based on the aggregated pairwise comparison matrix, the value of the fuzzy synthetic extent $S_i$ with respect to the $i$th criterion can be computed by Eq. (9) using algebraic operations on triangular fuzzy numbers (Saaty 1994; Srichetta and Thurachon 2012):

$$S_i = \sum_{j=1}^{m} W_{ij} \times \left[ \sum_{i=1}^{n} \sum_{j=1}^{m} W_{ij} \right]^{-1} \qquad (9)$$

where $i$ = a criterion; $j$ = screened parameter; $W_{ij}$ = aggregated triangular fuzzy number of the experts' views; and $S_i$ = value of the fuzzy synthetic extent. Based on the fuzzy synthetic extent values, this study used Chang's method (Saaty 1980) to determine the degree of possibility by Eq. (10). Accordingly, the degree of possibility can assess and evaluate the system alternatives:

$$V(S_m \ge S_c) = \begin{cases} 1, & \text{if } m_m \ge m_c \\ 0, & \text{if } l_c \ge u_m \\ \dfrac{l_c - u_m}{(m_m - u_m) - (m_c - l_c)}, & \text{otherwise} \end{cases} \qquad (10)$$

where $V(S_m \ge S_c)$ = degree of possibility between the (C) criterion and the (M) criterion; $(l_c, m_c, u_c)$ = fuzzy synthetic extent of the (C) criterion; and $(l_m, m_m, u_m)$ = fuzzy synthetic extent of the (M) criterion.

Key Cost Drivers Identification by Quantitative Procedures

One of the key challenges of predictive modeling is high data dimensionality combined with small data size. Therefore, a dimensionality reduction technique should be equipped in the prediction model to simplify the model for the decision maker and to enable faster computation. Past studies have concluded that a number of features higher than the optimal diminishes the model performance. As shown in Fig. 4, feature selection algorithms can be categorized into three main categories, namely filter, wrapper, and embedded, according to their selection manner (Guyon and Elisseeff 2003). Filter algorithms rank features by evaluating their correlation with the independent variables; examples include Boruta feature analysis and principal component analysis. They do not optimize the prediction accuracy; therefore, filter algorithms are often more computationally efficient than wrapper algorithms. However, the key limitation of filter algorithms is that they do not take the effects of the selected features into consideration. Wrapper algorithms depend on an inductive algorithm for sample regression in an iterative manner; they include optimization based on a genetic algorithm (GA) or greedy approaches such as forward selection and backward elimination. However, wrapper algorithms can be prone to overfitting and high computational cost. The embedded approach is a hybrid of the filter and wrapper approaches; it depends on certain types of inductive algorithm, such as SVM-based recursive feature elimination (RFE) (Lin et al. 2012).

The objective of variable identification is to increase the model prediction accuracy and provide a better understanding of the collected data (Guyon and Elisseeff 2003). Accurate cost driver identification leads to optimal performance of the developed cost model. Quantitative methods depend on the collected data, such as factor analysis, regression methods, and correlation methods, in which machine learning models can be applied to uncover patterns and relations in the collected data. Therefore, quantitative techniques can automatically identify the key cost drivers.

Factor Analysis

Factor analysis (FA) is a machine learning method that clusters correlated variables into a smaller number of factors, which is used to filter data and determine key parameters. Many types of factoring exist, such as principal component analysis (PCA), canonical factor analysis, and image factoring (Polit and Beck 2012). The advantage of exploratory factor analysis (EFA) is that it combines two or more variables into a single factor, which reduces the number of variables. However, factor analysis cannot provide causality to interpret the factored data.

EFA is conducted by PCA to reduce the number of variables as well as to understand the structure of a set of variables (Field 2009). The following questions should be answered before conducting EFA:
1. How large does the sample need to be?
2. Is there multicollinearity or singularity?
3. What is the method of data extraction?
4. What is the number of factors to retain?
5. What is the method of factor rotation?
6. Should factor analysis or principal component analysis be used?
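Before working through these questions, it is worth noting that the degree-of-possibility comparison of Eq. (10) above is small enough to state in code. The following is a minimal, stdlib-only sketch (the function name and the illustrative numbers are assumptions of this sketch, not from the study), encoding the three cases of Chang's extent analysis for triangular fuzzy numbers given as tuples (l, m, u):

```python
def degree_of_possibility(s_m, s_c):
    """Degree of possibility V(S_m >= S_c) between two triangular
    fuzzy synthetic extents, each given as a tuple (l, m, u)."""
    l_m, m_m, u_m = s_m
    l_c, m_c, u_c = s_c
    if m_m >= m_c:          # modal value of S_m already dominates
        return 1.0
    if l_c >= u_m:          # supports do not overlap at all
        return 0.0
    # Height of the intersection point of the two membership functions
    return (l_c - u_m) / ((m_m - u_m) - (m_c - l_c))

# Two overlapping extents with different modal values (illustrative):
v = degree_of_possibility((0.2, 0.35, 0.5), (0.3, 0.45, 0.6))
```

When the two supports overlap but the modal values differ, the returned degree lies strictly between 0 and 1, which is what allows the alternatives to be ranked.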
Sample Size

Factors obtained from small data sets cannot generalize as well as those derived from larger samples. Some researchers have suggested using the ratio of sample size to the number of variables. As illustrated in Table 1, this ratio can be 10 times the number of all variables (Nunnally 1978) or between 5 and 10 cases per variable (Kass and Tinsley 1979). For example, if the number of variables is ten, the sample size should be at least 100 observations (Nunnally 1978) or between 50 and 100 observations (Kass and Tinsley 1979). Kline (1999) stated that the absolute minimum sample size required is 100 cases.

Multicollinearity or Singularity

The first step is to check the correlation among variables and avoid multicollinearity and singularity (Tabachnick and Fidell 2007; Hays 1983). Multicollinearity means variables are too highly correlated, whereas singularity means variables are perfectly correlated (a correlation coefficient of 1 or −1). There are two methods for assessing multicollinearity or singularity: the first depends on scanning the correlation matrix, whereas the second depends on the determinant of the correlation matrix. The first method is conducted by scanning the correlation matrix among all independent variables to eliminate variables with correlation coefficients greater than 0.90 (Field 2009; Hays 1983) or greater than 0.80 (Rockwell 1975). The second method is to examine the determinant of the correlation matrix: multicollinearity or singularity may exist if the determinant is less than 0.00001, so a simple heuristic is that the determinant should be greater than 0.00001 (Field 2009; Hays 1983). If visual inspection reveals no substantial number of correlations greater than 0.3, PCA probably is not appropriate. Also, any variable that correlates with no others (R = 0) should be eliminated (Field 2009; Hays 1983).

Bartlett's test can be used to test the adequacy of the correlation matrix. It tests the null hypothesis that the correlation matrix is an identity matrix, where all diagonal values are equal to 1 and all off-diagonal values are equal to 0. A significant test (significance value less than 0.05) indicates that the correlation matrix is not an identity matrix, and the null hypothesis can be rejected (Dziuban and Shirkey 1974).

According to factor extraction, factor (component) extraction is conducted in EFA to determine the smallest number of components that can be used to represent the interrelations among a set of variables (Tabachnick and Fidell 2007). Factors can be retained based on eigenvalues, where a scree plot diagram can be developed to retain factors (Cattell 1966). All factors that have eigenvalues greater than 1 can be retained (Kaiser 1960). On the other hand, Jolliffe (1972, 1986) recommended retaining factors that have eigenvalues greater than 0.7.

According to factor rotation, two types of rotation exist: orthogonal rotation and oblique rotation (Field 2009). Orthogonal rotation can be varimax, quartimax, or equamax, whereas oblique rotation can be direct oblimin or promax. Accordingly, the resulting outputs depend on the selected rotation method. For a first analysis, the varimax rotation should be selected because it eases interpretation of the factors and can generally be applied. The objective of varimax is to maximize the dispersion of loadings within factors and to load onto a smaller number of clusters (Field 2009). Stevens (2002) concludes that no difference between factor analysis and component analysis exists if there are 30 or more variables and communalities are greater than 0.7 for all variables. On the other hand, a difference between factor analysis and component analysis exists if there are fewer than 20 variables and the communalities are low (<0.4).

Regression Methods for Key Cost Drivers Selection

Regression analysis can be used for both cost driver selection and cost prediction modeling (Ratner 2010). Regression models learn from the given data by adjusting the regression parameters to map a mathematical relationship based on the given data. The current study focuses on cost driver selection; therefore, the forward, backward, and stepwise methods are reviewed as follows.

Forward selection initiates with no variables in the model, where each added variable is tested by a comparison criterion to improve the model performance (Wilkinson and Dallal 1981). If an independent variable significantly improves the ability of the model to predict the dependent variable, then this predictor is retained in the model and the method searches for a second independent variable (Field 2009; Draper and Smith 1998).

Backward selection is the opposite of the forward method. In this method, all input independent variables are initially selected, and then the most unimportant independent variables are eliminated one by one based on the significance value of the t-test for each variable. The contribution of the remaining variables is then reassessed (Field 2009; Draper and Smith 1998).

Stepwise selection is an extension of the forward selection approach, in which input variables may also be removed at any subsequent iteration (Field 2009; Draper and Smith 1998). Unlike forward selection, stepwise selection tests at each step whether variables already in the model should be removed.
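The forward-selection loop just described can be sketched directly. The `ols_r2` helper, the toy feature columns, and the stopping threshold below are assumptions of this stdlib-only illustration, not part of the study; the comparison criterion here is training R²:

```python
def ols_r2(x_cols, y):
    """Fit y = b0 + sum(bi * xi) via the normal equations (naive
    Gauss-Jordan elimination, fine for a sketch) and return training R^2."""
    n = len(y)
    cols = [[1.0] * n] + x_cols               # prepend the intercept column
    k = len(cols)
    A = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
         for i in range(k)]
    b = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(A[r][i]))   # partial pivot
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(k):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * c for a, c in zip(A[r], A[i])]
                b[r] -= f * b[i]
    beta = [b[i] / A[i][i] for i in range(k)]
    y_hat = [sum(beta[i] * cols[i][t] for i in range(k)) for t in range(n)]
    y_bar = sum(y) / n
    ss_res = sum((yt - yh) ** 2 for yt, yh in zip(y, y_hat))
    ss_tot = sum((yt - y_bar) ** 2 for yt in y)
    return 1.0 - ss_res / ss_tot

def forward_select(features, y, min_gain=0.005):
    """Greedy forward selection: repeatedly add the feature whose inclusion
    most improves R^2; stop when the best gain falls below min_gain."""
    selected, best = [], 0.0
    remaining = sorted(features)
    while remaining:
        score, f = max((ols_r2([features[g] for g in selected + [f]], y), f)
                       for f in remaining)
        if score - best < min_gain:
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected, best

# Toy cost data: y is driven by x1 and x2, while x3 is irrelevant noise.
features = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [2, 1, 4, 3, 6, 5, 8, 7],
    "x3": [5, 3, 8, 1, 9, 2, 7, 4],
}
y = [3 * a + b for a, b in zip(features["x1"], features["x2"])]
selected, r2 = forward_select(features, y)
```

A backward or stepwise variant follows the same pattern, starting from the full variable set and additionally testing removals at each step.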
[Figure: feature selection system (an EA searching over candidate feature subsets evaluated with ANNs); diagram not reproduced.]
Study | Identification method | Summary
Petroutsatou et al. (2012) | Questionnaire survey | A questionnaire survey has been conducted to determine significant parameters for an ANN cost prediction model for tunnel construction in Greece.
Williams (2002) | Regression methods | Based on bidding data, the stepwise regression method has been utilized to check the significance of each parameter and select the key cost drivers for the regression model.
El Sawalhi (2012) | Questionnaire survey | Both a questionnaire survey and a relative index ranking technique have been conducted to investigate and rank the factors affecting the cost of building construction for the fuzzy logic model.
Knight and Fayek (2002) | Questionnaire survey | Based on past related literature and interview surveys, all parameters affecting cost overruns for building projects have been identified and ranked for the fuzzy logic model.
Choi et al. (2014) | Questionnaire survey | Based on a questionnaire survey, attributes of the road construction project have been identified.
Alroomi et al. (2012) | Questionnaire survey + factor analysis | Based on 228 completed questionnaires, all relevant cost data of competencies have been collected by experts, whereas factor analysis has been conducted to investigate the correlation effects of the estimating competencies.
Kim (2013) | Questionnaire survey + factor analysis | Based on a questionnaire survey and factor analysis, all parameters affecting best practices of infrastructure projects have been identified and ranked.
Manoliadis et al. (2009) | FAHP | Based on a qualifications survey, FDM is conducted to assess bidders' suitability for improving bidder selection.
Pan (2008) | FAHP | Fuzzy AHP is conducted to handle the vagueness and uncertainty in selecting a bridge construction method; therefore, FAHP obtains more reliable results than the conventional AHP.
Marzouk and Ahmed (2011) | Questionnaire survey | A questionnaire survey has been conducted to identify and evaluate 14 parameters affecting the construction costs of pump station projects.
Liu (2013) | FDM + FAHP | Both FDM and FAHP were conducted to evaluate and filter all factors affecting indicators of managerial competence.
Saaty (2008) | AHP | For countless applications, AHP has been conducted as a powerful decision-making procedure among different criteria and alternatives.
Laarhoven and Pedrycz (1983) | FAHP | AHP and fuzzy theory were combined to produce FAHP, in which the objective was to evaluate the most important cost parameters.
Ma et al. (2010) | FAHP | FAHP was conducted for pile-type selection based on the collected field factors, where the fuzzy AHP approach produces an efficient performance for pile-type selection.
Erensal et al. (2006) | FAHP | FAHP was conducted for evaluating key parameters in technology management.
Srichetta and Thurachon (2012) | FAHP | FAHP was conducted for evaluating notebook computer products.
Hsu et al. (2010) | FDM + FAHP | This study utilized two processes of selection and decision making: FDM first to identify the most important factors, then FAHP to identify the importance of each factor.
Marzouk and Elkadi (2016) | Questionnaire survey + FA | EFA was conducted to select the cost drivers of water treatment plants, in which a total of 33 variables were reduced to four components; such components are used as inputs to the ANN model.
Woldesenbet and Jeong (2012) | FA | Based on the collected roadway project data, factor analysis of a covariance and correlation matrix has been investigated to identify critical factors of the project.
Park and Kwon (2011) | Questionnaire survey + FA | Both a questionnaire and FA have been investigated to discover the critical success factors for infrastructure projects in Korea.
Akintoye (2000) | FA | Seven factors out of 24 factors influencing contractors' cost estimating have been selected by FA.
Stoy et al. (2012) | Regression method | The backward regression method has been computed to determine key cost drivers based on a total of 75 residential projects.
Lowe et al. (2006) | Regression method | Based on 286 sets of data collected in the United Kingdom, both forward and backward stepwise regression have been used to develop six parametric cost models.
Yang (2005) | Correlation method | The correlation matrix should be scanned to reduce variables and to detect redundant variables.
Ranasinghe (2000) | Correlation method | This study presents an induced correlation concept to analyze input cost variables for residential building projects in Germany.
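Several of the correlation-method entries above (e.g., Yang 2005) amount to scanning a correlation matrix for redundant variables, as described earlier for multicollinearity screening. A minimal stdlib-only sketch (the `drivers` columns are illustrative toy data, not from any cited study), flagging pairs above the 0.90 cutoff (Field 2009):

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def flag_collinear(data, threshold=0.90):
    """List variable pairs whose absolute correlation exceeds the cutoff
    (0.90 per Field 2009; Rockwell 1975 uses 0.80 instead)."""
    return [(a, b, round(pearson(data[a], data[b]), 3))
            for a, b in combinations(sorted(data), 2)
            if abs(pearson(data[a], data[b])) > threshold]

# Toy cost-driver columns (illustrative only):
drivers = {
    "area":   [120, 150, 180, 210, 240],
    "floors": [2.0, 2.5, 3.0, 3.5, 4.1],  # nearly proportional to area
    "soil":   [5, 1, 4, 2, 8],            # weakly related to the others
}
flags = flag_collinear(drivers)           # the (area, floors) pair is flagged
```

One of the two flagged variables would then be dropped before fitting the cost model, exactly as in the correlation-matrix scanning procedure described above.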
Single-unit rate methods calculate the total cost of the project based on a unit such as the area of the building, or on an accommodation unit such as cost per bed for hotels or hospitals. Parametric cost modeling develops a model based on statistical relations of the key parameters extracted by conducting qualitative techniques (Elmousalami et al. 2018a) or statistical analyses such as regression models, ANNs, and FL models. Elemental cost analysis divides the project into its main elements and estimates the cost of each element based on historical data. A quantity survey is a detailed cost estimate based on surveyed quantities and contract unit cost rates, in which such quantities include the resources used, such as materials, labor, and equipment, for each activity. Therefore, estimators usually apply the single-unit rate method or a parametric cost estimate at the conceptual stage, where no detailed information about the project is available.

The estimating process consists of six main elements (Kan 2002): project information, historical data, current data, estimating methodology, cost estimator, and estimates. Project information includes the project characteristics that can be used as inputs to the cost model. Historical data are the collected data of previous projects used to statistically develop the cost model. Current data are the data extracted from the project information, such as unit cost rates of material, labor, and equipment. Estimating methodology is the method used for the cost estimate, such as a parametric cost model. The cost estimator is the user who uses the cost model and enters the input parameters or data to obtain the cost estimate. The estimates are the outputs of the cost model.

[Fig. 6. Cost drivers identification methods (pie chart; shares include FAHP 24%, regression methods 12%, FDM + FAHP 8%, GA 8%, factor analysis 8%, correlation method 8%, trial and error approach 4%, and AHP 4%, with the remainder for questionnaire and literature surveys).]

Computational Intelligence

Computational intelligence (CI) techniques draw on aspects of human knowledge and make computations adaptively, becoming more vigorous in system modeling than classical mathematical modeling (Bezdek 1994). Based on CI, an intelligent system can be developed to produce consequent outputs and actions depending on the observed inputs and outputs of the system (Siddique and Adeli 2013).

The objective is to solve complex real-world problems based on data analytics such as classification, regression, prediction, and optimization in an uncertain environment. The core advantage of intelligent systems is their human-like capability to make decisions that depend on information with uncertainty and imprecision. The basic approaches to computational intelligence are fuzzy logic (FL), artificial neural networks (ANNs), and evolutionary computing (EC). Accordingly, CI is a combination of FL, neurocomputing, and EC (Engelbrecht 2002). The scope of this study focuses on the three methodologies of computational intelligence (FL, ANNs, and EC) and their fusion.

Multiple Regression Analysis

Multiple regression analysis (MRA) is a statistical analysis that uses given data for prediction applications. Based on historical cases, regression analysis develops a mathematical form to fit the given data (Field 2009; Walker 1989). This mathematical form can be formulated as Eq. (11):

$$Y = B_0 + B_1 X_1 + B_2 X_2 + \dots + B_n X_n \qquad (11)$$

where $Y$ = dependent variable; $B_0$ = constant; $B_i$ = variable coefficient; and $X_i$ = independent variables. A change of one unit in the independent variable $X_1$ causes a change of $B_1$ in the dependent variable $Y$; similarly, a change of one unit in $X_2$ causes a change of $B_2$ in $Y$. In addition, the signs of $B_1$ and $B_2$ determine the decrease or increase in the dependent variable $Y$. The objective of the regression model is to mathematically represent data with minimal prediction error. Therefore, regression analysis is applied in cost estimate modeling to represent the cost-estimate relationships, where the cost prediction is represented as the dependent variable and the cost drivers are represented as the independent variables.

Multiple regression analysis includes a model called polynomial regression. Polynomial regression regresses a dependent variable
on powers of an independent variable, as in Eq. (12):

$$Y = B_0 + B_1 X_i + B_2 X_i^2 + \dots + B_n X_i^K + e_i, \quad \text{for } i = 1, 2, \dots, n \qquad (12)$$

where $K$ = degree of the polynomial.

According to the sample size, $(50 + 8k)$ may be the minimum sample size, where $k$ is the number of predictors (Green 1991). According to deleting outliers, Cook's distance detects the impact of a certain case on the regression model (Cook and Weisberg 1982). If the Cook's value is <1, there is no need to delete that case (Stevens 2002); otherwise, if the Cook's value is >1, the case should be deleted. Variables are highly correlated where the coefficient of determination is higher than 0.8 (R² > 0.8) (Rockwell 1975). The variance inflation factor (VIF) examines the linear relationship with the other variables (Field 2009); if the average VIF is greater than 1, then multicollinearity occurs and can be detected (Bowerman and O'Connell 1990; Myers 1990). Homoskedasticity occurs when the residual terms vary constantly; the residual variance should be constant to avoid a biased regression model (Field 2009). The Durbin–Watson test is conducted to check the correlations among errors, where the test values range between 0 and 4; a value of two denotes that residuals are uncorrelated (Durbin and Watson 1951). Accordingly, regression modeling can be summarized in the following steps:
1. Collect and prepare the historical cases.
2. Divide the collected cases into a training set and a validation set.
3. Check the sample size of the collected training data (Green 1991; Stevens 2002).
4. Define key independent parameters (cost drivers) and the dependent parameter (cost variable).
5. Develop a regression model and check the significance (P-value) of each coefficient (Field 2009).
6. Check outliers (Cook and Weisberg 1982).
7. Check the variance inflation factor (VIF) (Bowerman and O'Connell 1990; Myers 1990).
8. Check homoskedasticity (Durbin and Watson 1951).
9. Calculate the resulting error, such as the mean absolute percentage error (MAPE).

Fuzzy Logic

Fuzzy logic (FL) is the modeling of human decision making by representing the uncertainty, incompleteness, and randomness of the real-world system (Zadeh 1965, 1973). In addition, FL represents the experts' experience and knowledge by developing fuzzy rules. Such knowledge is represented in fuzzy systems by membership functions (MFs), which range from zero to one. MFs can be triangular, trapezoidal, Gaussian, or bell-shaped functions, and the selection of the MF is problem-dependent. Fig. 7 illustrates a trapezoidal MF that consists of a core set {a2, a3} and a support set {a1, a2, a3, a4}. The shape of the MF significantly influences the performance of a fuzzy model (Wang 1997; Chi et al. 1996). Therefore, many methods, such as clustering approaches and genetic algorithms, are applied to develop MFs automatically and to select the optimal shape of the MFs.

[Fig. 7. Fuzzy trapezoidal membership function (MF); diagram (support set of A, cross point of A and B) not reproduced.]

Once MFs can be identified for each dependent and independent parameter, a set of operations on fuzzy sets can be conducted. Such operations are the union of fuzzy sets, the intersection of fuzzy sets, the complement of a fuzzy set, and the α-cut of a fuzzy set. Linguistic terms are used to approximately represent the system features where such features cannot be represented as quantitative terms (Zadeh 1976). Once MFs and linguistic terms have been defined, if-then rules can be developed to establish rule-based systems. Each rule represents human logic and experience, and all rules together represent the brain of the fuzzy system. A single fuzzy if-then rule can be represented by the following:

If <fuzzy proposition (x is A1)> Then <fuzzy proposition (y is B2)>

where x = input parameter; A1 = MF of x; y = output parameter; and B2 = MF of y. Rule-based systems are systems that have more than one rule to represent human logic and experience in the developed system. Aggregation of rules is the process of developing the overall consequent from the individual consequents added by each rule (Siddique and Adeli 2013).

As shown in Fig. 8, there are two parameters X1 and X2, where μX1 = {a1, b1, c1, d1}, μX2 = {a2, b2, c2, d2}, μY = {ay, by, cy, dy}, and the fuzzy system consists of two rules as follows:

Rule 1: IF X1 is a1 AND X2 is c2 THEN y is ay
Rule 2: IF X1 is b1 AND X2 is d2 THEN y is by

where two inputs are used, {X1 = 4, X2 = 6}. These two inputs intersect with the antecedent MFs of the two rules, and two rule consequents {R1 and R2} are produced based on minimum intersections. The rule consequents are aggregated based on maximum intersections, where the final crisp value is 3. The aggregated output for the Ri rules is given by

Rule 1: μR1 = min[μa1(X1), μc2(X2)]
Rule 2: μR2 = min[μb1(X1), μd2(X2)]
Y: defuzzification[max[R1, R2]]

[Fig. 8. Two-rule fuzzy inference example; the final aggregation and defuzzification step yields the crisp output Yc = 3; diagram not reproduced.]

Fuzzification is transforming crisp values into fuzzy inputs. Conversely, defuzzification is transforming a fuzzy quantity into a crisp output. Many different methods of defuzzification exist, such as max-membership, center of gravity, weighted average, mean-max, and center of sums (Runker 1997). An inference mechanism is the process of converting input space to output space, such as Mamdani fuzzy inference, Sugeno fuzzy inference, and Tsukamoto fuzzy inference (Mamdani and Assilian 1974; Takagi and Sugeno 1985; Sugeno and Kang 1988; Tsukamoto 1979).

Fuzzy modeling identification includes two phases: structure identification and parameter identification (Emami et al. 1998). Structure identification is to define input and output variables
and to develop input and output relations through if-then rules. The following points summarize the structure identification of a fuzzy system:
1. Determine relevant inputs and outputs.
2. Select the fuzzy inference system, e.g., Mamdani, Sugeno, or Tsukamoto.
3. Define the linguistic terms associated with each input and output variable.
4. Develop a set of fuzzy if-then rules to represent the relation between the inputs and outputs.

On the other hand, parameter identification is an optimization problem in which the objective is to maximize the performance of the developed system. Defining the MFs, such as the shape of the MF (triangular, trapezoidal, Gaussian, or bell-shaped) and its corresponding values, can significantly optimize the system performance.

Artificial Neural Networks

ANNs are biologically inspired models that mimic the human neural system for information-processing and computation purposes. An ANN is a machine learning (ML) technique that can learn from past data. Learning forms can be supervised, unsupervised, and reinforcement learning. Contrary to traditional modeling techniques such as linear regression analysis, ANN models have the ability to approximate nonlinear functions to a specified accuracy. The first model of artificial neural networks came in 1943, when Warren McCulloch, a neurophysiologist, and Walter Pitts, a young mathematician, outlined the first formal model of an elementary computing neuron (McCulloch and Pitts 1943). This model, based on the concept of electrical circuits and producing an output of zero or one, is called a perceptron or neuron, and such a neuron is the unit of the ANN (McCulloch and Pitts 1943). Hopfield connected these neurons and developed a network to create ANNs (Hopfield 1982). Generally, ANNs can be categorized into two main categories: feedforward networks and recurrent networks.

In a feedforward network, all neurons are connected together. The feedforward network consists of an input vector (x), a weight matrix (W), a bias vector (b), and an output vector (Y), which can be formulated as Eq. (13):

$$Y = f(W \cdot x + b) \qquad (13)$$

where $f(\cdot)$ includes a nonlinear activation function. Different types of activation functions exist, such as the linear function, step function, ramp function, and tan-sigmoid function. The selection of ANN parameters such as the number of neurons, connections, transfer functions, and hidden layers mainly depends on the ANN's application.

Several types of feedforward neural network architectures exist, such as multilayer perceptron (MLP) networks, radial basis function networks, generalized regression neural networks, probabilistic neural networks, belief networks, Hamming networks, and stochastic networks, in which each architecture is problem-dependent (Siddique and Adeli 2013). In this study, multilayer perceptron networks (MLP) are explained in some detail. As shown in Fig. 9, an MLP network is a network with several layers of perceptrons, in which each layer has a weight matrix (W), a bias vector (b), and an output vector (Y). The input vector $X = \{X_1, X_2, X_3, \dots\}$ feeds forward to $n$ neurons in the hidden layer with a transfer function $f(\cdot)$, where the weights $w = \{w_1, w_2, w_3, \dots\}$ are combined to produce the output. The outputs of each layer are computed as $Y_n^k = f(W_{n,m,k} \cdot X_m + b_{i,k})$, where $k$ is the number of layers, $d$ is the number of inputs, $i$ is the number of bias nodes, $n$ is the number of neurons, $m$ is the number of weights for each sending neuron, and $f(\cdot)$ is the activation function (e.g., sigmoid and tan-sigmoid functions). No exact rule exists to determine the number of hidden layers and the number of neurons in each hidden layer. Huang and Huang (1991) and Choi et al. (2001) stated that a one-hidden-layer MLP needs at least $(P - 1)$ hidden neurons to classify $P$ patterns. A standard rectified linear unit (ReLU) is an activation function that can enhance the computing performance of ANNs (Nair and Hinton 2010). Mathematically, ReLU is defined as follows:

$$A = \begin{cases} X_i, & \text{if } X_i \ge 0 \\ 0, & X_i < 0 \end{cases}$$

For big data and high data dimensionality, deep neural networks (DNNs) can be conducted with a ReLU activation function for high-performance computing (LeCun et al. 2015).

A three-layer network (input layer, hidden layer, and output layer) can solve a wide range of prediction, approximation, and classification problems. Moreover, to avoid overfitting problems and enhance the generalization capability, the number of training cases should be more than the size of the network (Rutkowski 2005). The learning mechanism of ANNs is modifying the weights and biases of the network to minimize the in-sample error. Developing ANNs can be summarized in the following steps:
1. Collect and prepare the historical cases.
2. Divide the collected cases into a training set and a validation set.
3. Determine relevant inputs and outputs.
4. Select the number of hidden layers.
5. Select the number of neurons in each hidden layer.
6. Select the transfer function.
7. Set initial weights.
8. Select the learning algorithm to develop the ANNs' weights.
9. Train the model for several iterations to get minimal prediction error.

Support Vector Machine

$$y_i (W \times X_i + b) \ge 1 - \xi_i, \quad i = 1, 2, 3, \dots, m \qquad (15)$$

Accordingly, the objective function for SVM optimization is as expressed in Eq. (16):

$$\text{Min} \quad \frac{1}{2}\, w \times w^{T} + C \sum_{i=1}^{m} \xi_i \qquad (16)$$

Decision Trees

Decision trees (DT) are a supervised ML model that divides the given data into hierarchical rules at each tree node by a repetitive splitting algorithm. Three of the most commonly applied algorithms for decision tree modeling are the chi-square automatic interaction detector (CHAID), classification and regression trees (CART), and the C4.5/C5.0 algorithms; CHAID is for categorical variables, whereas CART and C4.5/C5.0 handle both continuous and categorical variables (Berry and Linoff 1997; Breiman et al. 1984). CART is a tree learning model that can be applied for both regression (continuous variables) and classification (categorical variables) applications (Quinlan 2014). As an alternative to the black-box nature of ANNs, DT generates logic statements and interpretable rules that can be used for identifying the importance of
tion error. data features (Perner et al. 2001). Another advantage of DT is
10. Calculate the resulting error such as mean absolute percentage avoiding the curse of dimensionality and providing high per-
error (MAPE). formance computing efficiency through its splitting procedure
(Prasad et al. 2006). However, DT produces unsatisfactory perfor-
mance in time series, noisy, or nonlinear data (Curram and
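The layer computation Y = f(W · X + b) with a ReLU activation described above can be sketched in a few lines of plain Python. This is a minimal illustration with made-up weights and inputs, not the paper's implementation:

```python
def relu(x):
    # ReLU activation: A = x if x >= 0, else 0
    return x if x >= 0 else 0.0

def dense_layer(X, W, b, activation):
    # One feedforward layer: Y[n] = f(sum_m W[n][m] * X[m] + b[n])
    return [activation(sum(w * x for w, x in zip(row, X)) + bn)
            for row, bn in zip(W, b)]

# Toy 3-input layer feeding 2 hidden neurons (illustrative numbers only)
X = [1.0, -2.0, 0.5]
W = [[0.2, 0.4, -0.1], [-0.3, 0.1, 0.5]]
b = [0.9, -0.2]
hidden = dense_layer(X, W, b, relu)
print(hidden)  # approximately [0.25, 0.0]: the second neuron is "switched off"
```

Stacking such layers, with the output of one layer fed as the input of the next, gives the multilayer feedforward network described in the text.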
Support Vector Machines

A support vector machine (SVM) is a nonparametric supervised ML algorithm that can be applied for regression and classification problems (Vapnik 1979). The objective is to minimize the misclassified cases through optimizing the margin and hyperplane distance as shown in Fig. 10. Slack variables are added to solve the inseparability problem (Cortes and Vapnik 1995). In the linearly separable case, the objective is to maximize the hyperplane distance between the two class boundaries:

Linear SVM: W × Xi + b ≥ +1, if yi ≥ 0;  W × Xi + b < −1, if yi < 0   (14)

for i = 1, 2, 3, ..., m. Eq. (14) can be generalized as yi(W × Xi + b) ≥ 0. For optimum separation, the quantity (1/2) w × wᵀ should be minimized, which maximizes the distance between the two marginal hyperplanes. For nonlinear data, a positive slack variable (ξ) is added to handle the nonlinearity of the data as shown in Eq. (15):

yi(W × Xi + b) ≥ 0 − ξ,  i = 1, 2, 3, ..., m   (15)

Accordingly, the objective function for SVM optimization is expressed in Eq. (16):

Min (1/2) w × wᵀ + C Σ(i=0 to m) ξi   (16)

Case-Based Reasoning

Case-based reasoning (CBR) is a sustained learning and incremental approach that solves problems by searching for the most similar past case and reusing it for the new problem situation (Aamodt and Plaza 1994). Therefore, CBR mimics human problem solving (Ross 1989; Kolodner 1992). As illustrated in Fig. 11, CBR is a cyclic process of learning from past cases to solve a new case. The main processes of CBR are retrieving, reusing, revising, and retaining. The retrieving process solves a new case by retrieving the past cases. The case can be defined by key attributes. Such attributes are used to retrieve the most similar case, whereas the reusing process uses the new case information to solve the problem. The revising process evaluates the suggested solution to the problem. Finally, the retaining process updates the stored past cases by incorporating the new case into the existing case base (Aamodt and Plaza 1994). A CBR model can be developed to predict the conceptual cost based on similar attributes of the entered case, compared with the stored cases.

Fig. 10. Linear support vector machine. (Data from Burges 1998.)

Fig. 12. Additive function concept.
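Eqs. (14)-(16) can be made concrete with a small numeric sketch: the slack ξi measures how far a point falls on the wrong side of its margin, and the objective trades margin width, via (1/2) w × wᵀ, against the total slack. The points, weights, and C below are made-up numbers for illustration, not a trained model:

```python
def slack(w, b, x, y):
    # xi_i = max(0, 1 - y_i * (w . x_i + b)): zero when the point sits
    # outside its margin, positive when it violates the margin
    return max(0.0, 1.0 - y * (sum(wj * xj for wj, xj in zip(w, x)) + b))

def objective(w, b, points, C):
    # Eq. (16): (1/2) w . w^T + C * sum(xi_i)
    return (0.5 * sum(wj * wj for wj in w)
            + C * sum(slack(w, b, x, y) for x, y in points))

points = [([2.0, 2.0], +1), ([-2.0, -1.0], -1), ([0.2, 0.1], +1)]
w, b, C = [0.5, 0.5], 0.0, 1.0
print([slack(w, b, x, y) for x, y in points])  # last point violates the margin
print(objective(w, b, points, C))
```

A larger C penalizes margin violations more heavily; a smaller C tolerates them in exchange for a wider margin, which is the trade-off the slack formulation introduces.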
Once attributes are entered, the attribute similarities (AS) can be computed based on Eq. (17) (Kim et al. 2004):

AS = Min(AV_N, AV_R) / Max(AV_N, AV_R)   (17)

where AS = attribute similarity; AV_N = attribute value of the newly entered case; and AV_R = attribute value of the retrieved case. Depending on AS and the attribute weights (AW), the case similarity (CS) can be computed by Eq. (18) (Perera and Watson 1998). The AW are selected by an expert to emphasize the existence and importance of the case attributes:

CS = Σ(i=1 to n) (AS_i × AW_i) / Σ(i=1 to n) (AW_i)   (18)

where CS = case similarity; AS = attribute similarity; AW = attribute weight; and i = number of the attributes (key cost drivers). The advantage of CBR is that it deals with a vast amount of data, where all past cases and new cases are stored in database techniques (Kim et al. 2004). Developing CBR methods can be summarized in the following steps:
1. Collect and prepare the historical cases.
2. Divide the collected cases into a training set and a validation set.
3. Determine relevant input attributes and outputs.
4. Identify the similarity function and conduct the CBR processes.
5. Calculate the resulting error such as the mean absolute percentage error (MAPE).

Fig. 11. CBR processes. (Data from Aamodt and Plaza 1994.)

Ensemble Methods

Ensemble methods (fusion learning) are elegant data mining techniques that combine multiple learning algorithms to enhance the overall performance (Hansen and Salamon 1990). Ensemble methods can apply any ML algorithm, such as ANN, decision tree, and SVM, which are called the base model or base learner, as inputs for the ensemble. The concept behind ensemble methods can be illustrated as in Fig. 12 and mathematically as Eq. (19) (Chen and Guestrin 2016). For a given data set (D) with n examples and m features, D = {(xi, yi)} (xi ∈ R^m, yi ∈ R), where R is the set of real numbers, K additive functions predict the output as in Eq. (19):

ŷi = Σ(k=1 to K) f_k(Xi),  f_k ∈ F   (19)

where F = {f(x) = w_q(x)} is the space of regression trees; q: R^m → T is the structure of each tree, mapping an example to the corresponding leaf; T corresponds to the number of leaves in the tree; ŷi is the predicted dependent variable; each f_k represents an independent tree structure q and leaf weights w; and xi represents the independent variables. Fig. 12 shows a tree ensemble model. w_i corresponds to the score on the ith leaf; unlike decision trees, each leaf carries a continuous score.

Ensemble methods comprise different techniques such as voting, stacking, bagging, and boosting. Voting and averaging are two of the simplest ensemble methods, in which averaging is used for regression and voting is used for classification (Opitz and Maclin 1999). Ensemble learning methods can effectively deal with the problems of high-dimensional data, complex data structures, and small sample sizes (Dietterich 2000). Bagging algorithms [Fig. 13(a)] can increase generalization by decreasing variance (Breiman 1998), whereas boosting [Fig. 13(b)] can improve generalization by decreasing bias error (Schapire et al. 1998).

In addition, ensemble models can be classified into two main types: homogeneous and heterogeneous. The homogeneous model applies the same base algorithm on different training data sets, whereas the heterogeneous model uses different base algorithms on the same training data (Reid 2007). Ensemble methods can effectively handle continuous, categorical, and dummy features with missing values. However, ensemble methods may increase model complexity, which decreases model interpretability (Kuncheva 2004).
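Eqs. (17) and (18) translate directly into code. The sketch below uses hypothetical attribute values and expert weights (the attribute names are invented for illustration, not taken from the study's data set):

```python
def attribute_similarity(av_new, av_ret):
    # Eq. (17): AS = min(AV_N, AV_R) / max(AV_N, AV_R)
    return min(av_new, av_ret) / max(av_new, av_ret)

def case_similarity(new_case, retrieved_case, weights):
    # Eq. (18): CS = sum(AS_i * AW_i) / sum(AW_i)
    sims = [attribute_similarity(n, r)
            for n, r in zip(new_case, retrieved_case)]
    return sum(s * w for s, w in zip(sims, weights)) / sum(weights)

# Hypothetical key cost drivers of a new case vs. a stored case
new_case = [100.0, 2.0, 8.0]
stored   = [80.0, 2.5, 8.0]
weights  = [3.0, 2.0, 1.0]   # expert-assigned attribute weights (AW)
print(case_similarity(new_case, stored, weights))
```

The stored case with the highest CS is retrieved and its solution reused, which is the retrieval step of the CBR cycle described above.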
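The additive-function concept of Eq. (19) and Fig. 12, where leaf scores of +20/−20 from one tree and +5 from another combine to F(X1) = +25 and F(X2) = −15, reduces to summing per-tree scores. A toy sketch with two hand-written "trees":

```python
def tree1(x):
    # First "tree": a single split with leaf scores +20 and -20
    return 20.0 if x >= 0 else -20.0

def tree2(x):
    # Second "tree": a constant leaf score of +5
    return 5.0

def ensemble_predict(x, trees):
    # Eq. (19): y_hat = sum over k of f_k(x)
    return sum(f(x) for f in trees)

print(ensemble_predict(1.0, [tree1, tree2]))   # F(X1) = +20 + 5 = 25
print(ensemble_predict(-1.0, [tree1, tree2]))  # F(X2) = -20 + 5 = -15
```

Boosting methods such as XGBoost grow the list of trees one at a time, each new tree fitted to reduce the remaining error of the current sum.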
Bagging

Bagging is a variance reduction algorithm to train several classifiers based on bootstrap aggregation. A bagging algorithm randomly draws replicas of a training data set with replacement to train each classifier (Breiman 1996). As a result, diversity is obtained by resampling several data subsets. On average, each bootstrap sample contains 63.2% of the original training data set. The following three steps summarize the algorithm execution (Breiman 1999):
1. T bootstrap samples BS1, BS2, ..., BST are generated.
2. A classifier Ci is developed based on each bootstrap sample BSi.
3. An optimal classifier C is selected from C1, C2, ..., CT whose output is the class predicted most often by its subclassifiers, with ties broken arbitrarily.

Random Forest

The random forest (RF) is a bagging ensemble learning model that can produce accurate performance without overfitting issues (Breiman 2001). RF algorithms draw bootstrap samples to develop a forest of trees based on random subsets of features. Therefore, some features may be selected more than once, whereas others might never be selected (Breiman 2001). RF is more robust against noisy data or big data than the DT algorithm (Breiman 1996; Dietterich 2000). The key limitation of the RF algorithm is that it cannot interpret the importance of features or the mechanism of producing the results. RF does not search for the best split variables, which diminishes the correlation among the developed trees and the strength of every single tree; as a result, RF decreases the generalization error (Breiman 2001). However, RF cannot interpret the produced predictions. An extremely randomized tree (ERT) algorithm merges the randomization of the random subspace with a random selection of the cut-point during the tree node splitting process. The extremely randomized tree mainly controls the attribute randomization and smoothing parameters (Geurts et al. 2006).

Boosting and Adaptive Boosting (AdaBoost)

Schapire has presented a boosting procedure (also known as adaptive resampling) as an algorithm that boosts the performance of weak learning algorithms (Schapire 1990). Bagging generates classifiers in parallel, whereas boosting develops the classifiers sequentially as shown in Fig. 13(c). Thus, boosting converts weak models to strong ones. Freund and Schapire (1997) have presented an adaptive boosting algorithm (AdaBoost). AdaBoost is selected as one of the top ten data mining algorithms, and the authors of AdaBoost won the Gödel Prize for their work (Wu et al. 2008). AdaBoost serially manipulates the given data for each base learner. AdaBoost initially assigns equal weights to all instances, after which larger weights are assigned to the misclassified cases. The objective is to place greater focus on the misclassified cases so that they are corrected in the subsequent iteration. In addition, the AdaBoost algorithm uses other weights to rank each individual base learning algorithm based on its accuracy (Bauer and Kohavi 1999).

Extreme Gradient Boosting (XGBoost)

Extreme gradient boosting (XGBoost) is a large-scale machine learning system that can build a highly scalable end-to-end ensemble tree boosting system for big data processing. The unique advantage of XGBoost is its scalability, so it can process noisy data and fit high-dimensional data without overfitting. XGBoost applies parallel computing to effectively reduce computational complexity and learn faster (Chen and Guestrin 2016). Its objective is given by Eq. (20):

L = Σ(i=1 to n) l(ŷi, yi) + Σ(k=1 to K) Ω(f_k),  where Ω(f) = γT + (1/2) λ‖w‖²   (20)

where l represents a differentiable convex cost function that determines the difference between the predicted output ŷi and the actual output yi, and Ω avoids overfitting by smoothing the learnt weights (Wi); this second term penalizes the complexity of the regression tree functions.

XGBoost is a traditional gradient boosting tree algorithm with a regularization parameter. Once the regularization term is removed, XGBoost is converted back to traditional gradient tree boosting. The differentiable convex cost function can be replaced by a Taylor series as a second-order approximation for faster optimization (Friedman et al. 2000). Another key advantage of XGBoost is handling missing values, for which a default direction is identified as shown in Fig. 12. Accordingly, no effort is needed for cleaning the collected data (Fan et al. 2008).
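The 63.2% figure quoted for bagging follows from sampling n items with replacement: each case is left out of a given bootstrap sample with probability (1 − 1/n)^n, which approaches e⁻¹ ≈ 36.8% for large n, so about 63.2% of the distinct cases appear in each sample. A quick stdlib-only simulation:

```python
import random

def bootstrap_unique_fraction(n, trials, seed=42):
    # Draw `trials` bootstrap samples of size n and measure the average
    # fraction of distinct original cases each sample contains.
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        sample = {rng.randrange(n) for _ in range(n)}
        total += len(sample)
    return total / (trials * n)

frac = bootstrap_unique_fraction(n=1000, trials=200)
print(round(frac, 3))  # close to 1 - 1/e, i.e. about 0.632
```

The remaining ~36.8% of cases, the so-called out-of-bag cases, are what random forests exploit for an internal estimate of the generalization error.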
The fusion of the CI methodologies is called a hybrid intelligent system, and Zadeh (1994) predicted that hybrid intelligent systems will be the way of the future. FL is an approximate reasoning technique; however, it does not have any adaptive capacity or learning ability. On the other hand, ANNs provide an efficient mechanism for learning from given data and accounting for the uncertainty that naturally exists. EC enables an optimization structure for the developed system. Combining these methodologies can enhance the computational model so that the limitations of any single method can be compensated by the other methods (Siddique and Adeli 2013). Fig. 14 illustrates a fusion of three basic models: FL, ANNs, and EC. Based on such models, many hybrid models can be evolved, such as neurofuzzy models and evolutionary neural networks.
Moreover, the objective of data transformation is to address the normality assumption of the data distribution, in which the shape of the probability distribution plays an important role in statistical modeling to convert the error terms for linear models (Tabachnick and Fidell 2007). Data transformation may produce more accurate results. Stoy et al. (2008, 2012) developed a semilog model to predict the cost of residential construction in which the MAPE for the semilog model (9.6%) was better than that of the linear regression model (9.7%). This result proved that semilog models may produce a more accurate model than a plain regression model. However, this is not a rule; in other words, plain regression models may produce more accurate and simpler models than transformed models. Lowe et al. (2006) established a predictive model based on 286 historical cases in which three alternatives (cost/m2, the log of cost, and the log of cost/m2) have been developed.
Defining fuzzy rules manually by experts is possible; however, such an approach is time-consuming and does not guarantee the optimal set of fuzzy rules. Moreover, the number of fuzzy if-then inference rules increases exponentially by increasing the number of inputs, linguistic variables, or outputs. In addition, the experts cannot easily define all the required fuzzy rules and the associated MFs. In many engineering problems, the evolutionary algorithm (EA) has been conducted to automatically develop fuzzy rules and MFs to improve system performance (Chou 2006; Loop et al. 2010). The genetic-fuzzy model has been developed to optimally generate fuzzy inference rules. The formulation of the genetic algorithm model depends mainly on defining two core terms: a chromosome representation and an objective function. Based on the Michigan approach, the chromosomes represent the fuzzy rules, where the number of chromosomes is the number of fuzzy rules. Each chromosome contains a number of genes. The process of the developed model consists of five main steps:
1. An initial population of chromosomes is identified to represent the initial state of the fuzzy rules. The four key cost drivers are fed to the fuzzy system.
2. The fuzzy system produces the final predicted output of the system, ŷi.
3. The predicted cost ŷi is fed to the fitness function (F) to evaluate the model performance, where the fitness function (F) is the model evaluation function.
4. GA uses the fitness function (F) to evaluate the search process, in which the crossover probability and mutation probability have been set at 0.7 and 0.01, respectively.
5. A new population of fuzzy rules is produced based on the crossover and mutation processes to form the optimal fuzzy rules.
Fig. 14. Fusion of the three basic models: fuzzy logic, neural networks, and evolutionary computing.
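The five GA steps above (population, fitness evaluation, crossover at 0.7, mutation at 0.01) can be sketched generically. This toy GA maximizes a simple fitness on bit-string chromosomes, which stand in for rule encodings; it is illustrative only, not the paper's genetic-fuzzy model:

```python
import random

def evolve(fitness, n_genes=16, pop_size=20, generations=40,
           p_cross=0.7, p_mut=0.01, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(generations):
        def pick():
            # Tournament selection: the fitter of two random chromosomes
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            if rng.random() < p_cross:        # one-point crossover (p = 0.7)
                cut = rng.randrange(1, n_genes)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation with per-gene probability 0.01
            child = [g ^ 1 if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

# Toy fitness: count of 1-bits ("one-max"); a real model would score
# the fuzzy rule set encoded by the chromosome instead
best = evolve(fitness=sum)
print(sum(best))
```

In the genetic-fuzzy setting, the fitness function would decode each chromosome into a rule set, run the fuzzy system on the training cases, and return a score based on the prediction error.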
A model gives highly accurate prediction if the MAPE is less than 10%; between 10% and 20% it can provide good prediction; between 20% and 50% the MAPE indicates acceptable forecasting; and more than 50% gives inaccurate prediction (Lewis 1982). MAPE can be expressed as follows:

MAPE = (1/n) Σ(i=1 to n) |yi − ŷi| / ŷi × 100   (21)

where n = number of cases; i = number of the case; ŷi = outcome of the model; and yi = actual outcome. The MSE measures how well the regression fits the data, as in Eq. (22) (Aczel 1989):

MSE = (1/n) Σ(i=1 to n) [yi − ŷi]²   (22)

where yi = observed values; and ŷi = predicted values of the dependent variable Y for the ith case. The RMSE equals the square root of the MSE.
The R² is expressed as Eq. (23):

R² = 1 − SSE/SST = 1 − Σ(i=1 to n) (yi − ŷi)² / Σ(i=1 to n) (yi − ȳ)²   (23)

where SSE = sum of squares of the residuals; SST = total sum of squares; ȳ = arithmetic mean of the variable Y; and R² measures the percentage of the variation of the dependent variable Y explained by the predictor X. Thus, R² indicates how well the regression model fits the data. R² ranges from zero to one, 0 ≤ R² ≤ 1. If the R² value is 0.9 or above, it is classified as very good, above 0.8 is good, above 0.5 is satisfactory, and below 0.5 is poor (Aczel 1989; Ostertagová 2011). The adjusted R-squared is computed by Eq. (24):

Adjusted R² = R² − (1 − R²)K / (n − (K + 1))   (24)

where the adjusted R² accounts for the number of variables (K) included in the regression equation, and the adjusted R² is lower than the R² value. For model evaluation, the adjusted R² is always preferred to R² to avoid the overfitting problem (Aczel 1989; Ostertagová 2011).

Cost Modeling Review

The objective of the cost modeling review is to provide an overview of the recent and future trends in construction cost model development. The study has reviewed the past practices of parametric cost estimation at the conceptual stage for construction projects. Recently, many international journals have been reviewed, and the reviewed models are summarized in Table 4.
Many previous studies have applied AI techniques and ML models. Building information modeling (BIM) can feed data for cost estimation, whereas a predictive ML model such as a regression model or ANN can predict the project's cost on a macro level (Juszczyk 2017). ANN has been applied for cost estimation of sports fields, where the general applicability of the ANN model was investigated (Juszczyk et al. 2018). ANNs have conducted the early cost estimation of building projects for reinforced concrete buildings with acceptable performance (Ambrule and Bhirud 2017). CBR has been proposed for estimating the preliminary costs of sports field construction based on 16 predictors using 143 construction projects. Different calculations were conducted to formulate the case similarity based on quantitative and qualitative data, for which the final total error was 14% at the early stage (Leśniak and Zima 2018). The prediction performance of a cost prediction model has been improved by 17.23% and 4.39% for business facilities and multifamily housing, respectively.
Based on more than 1,400 projects, a multilayer ensemble of methods has been developed for forecasting the unit price bids of resurfacing highway projects (Cao et al. 2018). Wang and Ashuri (2016) applied a random tree model for construction cost index prediction. Williams and Gong (2014) built a stacking ensemble learning and text mining model to estimate the cost overrun using the project contract document, for which the accuracy was 44%. Building information modeling (BIM) can automate cost estimation processes and improve inaccuracies, where the New Rules of Measurement (NRM) for cost estimation can be mined for automatic cost estimation based on four-dimensional BIM modeling software (Kim et al. 2019).
Arabzadeh et al. (2018) developed ANN, regression, and hybrid models for cost estimation of spherical storage tanks. The results indicated that ANNs were more accurate than a hybrid regression model, and hybrid ANNs were more accurate than single ANNs. Linear and multiple regression models have been conducted to predict the preliminary estimate of road projects in Nigeria at the early stage (Ogungbile et al. 2018). However, the whole collected data set was only 50 cases for seven predictors, which is not a sufficient data size to train regression models. Zhang et al. (2018) converted the time series model into a graph to forecast the construction cost index, and the application showed its ability to provide more accurate estimations.
A parametric model mainly depends on parameters to simulate and describe the case studied (AACE 2004; Elfaki et al. 2014). Parametric modeling builds a mathematical relationship between dependent and independent variables. CI, machine learning, and data science are the disciplines that map the relationships among these variables and figure out such patterns. The most common techniques for those disciplines have been reviewed and discussed in the paper, including MRA, FL, ANNs, CBR, and hybrid systems.
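The evaluation metrics of Eqs. (21)-(24) translate directly into code. The following stdlib-only sketch uses illustrative numbers, not data from the study, and follows Eq. (21) as printed, taking the percentage error relative to the predicted value ŷi:

```python
import math

def mape(y, y_hat):
    # Eq. (21): (1/n) * sum(|y_i - y_hat_i| / y_hat_i) * 100
    return 100.0 / len(y) * sum(abs(a - p) / p for a, p in zip(y, y_hat))

def mse(y, y_hat):
    # Eq. (22): mean squared error; RMSE is its square root
    return sum((a - p) ** 2 for a, p in zip(y, y_hat)) / len(y)

def r2(y, y_hat):
    # Eq. (23): 1 - SSE/SST
    mean = sum(y) / len(y)
    sse = sum((a - p) ** 2 for a, p in zip(y, y_hat))
    sst = sum((a - mean) ** 2 for a in y)
    return 1.0 - sse / sst

def adjusted_r2(y, y_hat, k):
    # Eq. (24): R2 - (1 - R2) * K / (n - (K + 1))
    r = r2(y, y_hat)
    return r - (1.0 - r) * k / (len(y) - (k + 1))

y     = [100.0, 120.0, 90.0, 110.0]   # actual costs (illustrative)
y_hat = [105.0, 115.0, 95.0, 100.0]   # predicted costs (illustrative)
print(round(mape(y, y_hat), 2),
      round(math.sqrt(mse(y, y_hat)), 2),
      round(r2(y, y_hat), 3),
      round(adjusted_r2(y, y_hat, k=1), 3))
```

With more predictors (larger K) and few cases, the adjusted R² drops well below R², which is why it is the preferred score when comparing models of different sizes.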
ANNs Residential buildings An ANN model was developed to predict the cost per m2 of reinforced concrete Doğan (2004)
for 4- to 8-story residential buildings in Turkey in 2004. The cost estimation accuracy is 93%.
ANNs Highway construction In 2005, an ANN model was built for highway construction costs in which the index Wilmot and Mei
of highway construction cost reflected the change in overall cost over time. (2005)
ANNs Building projects Based on 286 past cases of data collected in the United Kingdom, linear regression Lowe et al. (2006)
models and ANN models have been established to assess the cost of buildings. Three
alternatives—cost=m2 , the log of cost variable, and the log of cost=m2 —have been
conducted instead of raw cost data when such data transformation approaches have
better results than an untransformed data model. A total of six models have been
developed based on forward and backward stepwise regression analyses. The best
regression model was the log of cost backward model.
ANNs Building projects Based on 169 examples, an ANN cost model has been developed for building El-Sawalhi and
construction projects with acceptable prediction error. Shehatto (2014)
ANNs Highway construction A prediction model has been developed with an MAPE of 1.4% for the unit cost of Elbeltagi et al.
the highway project in Libya by changing ANNs structure, training functions, and (2014)
training algorithms until the optimum model was reached.
ANNs Public construction Based on 232 public construction projects in Turkey, a multilayer perceptron Bayram et al.
projects (MLP) model and radial basis function (RBF) model were developed to (2015)
estimate construction cost. The RBF model shows superior performance, with an
MAPE of approximately 0.7%.
ANNs Building projects Based on 657 building projects in Germany, a multistep ahead approach is conducted Dursun and Stoy
to increase the accuracy of the model’s prediction. (2016)
ANNs Water treatment plant First, cost drivers that influence construction costs of water treatment plants have Marzouk and
costs been identified. Cost drivers have been determined through descriptive statistics Elkadi (2016)
ranking (DSR) and exploratory factor analysis. Principal component analysis (PCA)
with varimax rotation through five iterations has been used to minimize the
multicollinearity problem. Kaiser criterion was used so that a total of 33 variables
were reduced to eight components, whereas using Cattell’s scree test reduced
variables to four components.
ANNs and Highway projects Radial basis neural networks and regression models were developed for completed Williams (2002)
regression project cost estimation. The regression model produced better performance than the
ANN model. Moreover, a hybrid model was developed and produced reliable results.
A natural log transformation helped to improve the linear relationship between
variables.
ANNs and Cost deviation in Based on 41 examples, this study compared an ANN model with the regression Attalla and
regression reconstruction projects model for cost deviation in reconstruction projects in 2003. Hegazy (2003)
ANNs and Tunnel construction Based on 33 constructed tunnels, both ANNs and regression models have been Petroutsatou et al.
regression developed for tunnel construction in which the developed models were fitted for their (2012)
purpose and were reliable for cost prediction.
ANNs and Structural steel Based on 35 examples, a cost model consisted of three input parameters was El-Sawah and
regression buildings developed to predict the preliminary cost of structural steel buildings in 2014. ANNs Moselhi (2014)
produced better performance than regression models in which the ANNs model had
improved the MAPE by approximately 4% compared to the regression model.
ANNs and Field canals This paper developed a quadratic regression model and ANNs that can predict ElMousalami
regression improvement projects the conceptual cost at 9% MAPE. Data transformation plays an important role in et al. (2018b)
prediction accuracy.
CBR Building projects This study incorporated the decision tree into CBR to identify attribute weights of Doğan et al.
CBR. Such an approach shows more reliable results for residential building projects (2008)
cost assessment.
CBR Pavement Based on the library of past cases, this study developed a CBR model for pavement Chou (2009)
maintenance operation maintenance operations costs based on computing case similarity.
CBR Pump stations A parametric cost model was presented in which a questionnaire survey was Marzouk and
organized to analyze the most critical factors affecting the final cost of pump Ahmed (2011)
stations. Using a Likert scale, these factors were screened to determine the key
factors. A case-based reasoning was built and tested to develop the proposed model.
CBR Building projects Ji et al. (2018) have proposed a learning method to handle missing data values Ji et al. (2018)
based on a data mining algorithm to improve the stability and performance of the
CBR model.
CBR and GA Bridges A cost estimation model was developed based on CBR and GA for bridge projects, Kim and Kim
in which GA was used for optimizing the parameters of CBR. Such a methodology (2010)
improves the accuracy compared to the conventional cost model.
CBR and AHP Highway Analytic hierarchy process (AHP) was incorporated into CBR to build a reliable cost Kim (2013)
estimation model for highway projects in South Korea.
Evolutionary neural Highway Based on 18 examples, a reliable NN cost model was developed based on optimizing Hegazy and Ayed
network (NN) NN weights for highway projects. Simplex optimization of neural network weights is (1998)
more accurate than trial and error and GA optimization for which the MAPE was 1%.
Evolutionary NN Residential buildings Based on 498 cases, a reliable NN cost model was developed based on optimizing Kim et al. (2005)
NN weights for residential buildings. GA optimization of NN parameters was more
accurate than trial and error model for which the MAPE was 4.63%.
Evolutionary fuzzy Building projects This study incorporated computation intelligence models such as ANNs, FL, and EA Cheng et al.
neural inference to make a hybrid model that improves the prediction accuracy in a complex project. (2009)
model (EFNIM), As a result, an evolutionary fuzzy neural model was developed for conceptual cost
estimation for building projects with reliable accuracy.
Evolutionary fuzzy Building projects An evolutionary fuzzy hybrid neural network model was developed for conceptual Cheng and Roy
hybrid neural cost estimation. FL was used for fuzzification and defuzzification for inputs and (2010)
network outputs, respectively. GA was used for optimizing the parameters of models such as
NN layer connections and FL memberships.
Evolutionary fuzzy Building projects A hybrid AI system based on SVM, FL, and GA was built for decision making for Cheng and Roy
and SVM project construction management. The system used FL to handle uncertainty in the (2010)
system, SVM to map fuzzy inputs and outputs, and GA to optimize the FL and SVM
parameters. The objective of such a system is to produce accurate results with less
human intervention in which MF shapes and distributions can be automatically
mapped.
GA for ANNs Residential buildings This study has built three cost NN models by back-propagation (BP) algorithm, GA Kim et al. (2005)
for optimizing NN weights, and GA for parameters optimization of the BP algorithm.
Optimizing the parameters of the BP algorithm produced the best results.
GA for ANNs Bridge construction GA was used as an optimizing tool for ANNs and CBR cost models in which two Chou et al. (2015)
and CBR projects such models have been developed for bridge projects in Taiwan. Both models have
displayed reliable results.
Fuzzy linear Wastewater treatment Based on 48 wastewater treatment plants, a fuzzy logic model was developed with Chen (2002)
regression plants acceptable error and uncertainty considerations.
Fuzzy logic Design cost overruns Based on the collected building projects in 2002, a fuzzy logic model was developed Knight and Fayek
on building projects for estimating design cost overruns on building projects with acceptable error and (2002)
uncertainty considerations.
Technique | Project type | Summary | Reference
Fuzzy sets | Cost range estimation | Proposed the use of fuzzy numbers for cost range estimation and applied the fuzzy numbers to fuzzy scheduling range assessment. | Shaheen et al. (2007)
Fuzzy model | Wastewater treatment projects | Compared a linear regression model with a fuzzy linear regression model for wastewater treatment plants in Greece; the results of both models are similar and reliable. | Papadopoulos et al. (2007)
Fuzzy model | Building projects | A fuzzy model built on four inputs and one output in which a set of if-then rules, center-of-gravity defuzzification, a product inference engine, and a singleton fuzzifier are applied; the maximal error is 3.2%. | Yang and Xu (2010)
Fuzzy model | Building projects | Applied index values for membership degree and the exponential smoothing method to develop a construction cost model. | Shi et al. (2010)
Fuzzy neural network | Cost estimation | An evolutionary fuzzy neural network model developed for cost estimation with 18 training and 2 testing examples; GA is used to avoid sinking into local minima. | Zhu et al. (2010)
Fuzzy logic | Building projects | Based on 106 building projects in the Gaza Strip in 2012, a fuzzy logic model was developed with acceptable error and good generalization. | El Sawalhi (2012)
Regression analysis, NN, and CBR | Building projects | Based on 530 examples, three cost models composed of nine parameters were developed to predict building costs in Korea in 2004; the NN model outperforms the CBR and regression models, but CBR performs better for long-term use because new cases update the CBR system. | Kim et al. (2004)
Neurogenetic | Residential buildings | Based on 530 cases of residential buildings, a cost estimation model was built in which GA optimizes the BP algorithm parameters; this approach is more accurate than a trial-and-error BP algorithm. | Kim et al. (2005)
Neurofuzzy | Residential construction projects | An adaptive neurofuzzy cost estimation model that integrates the ratio estimation method with the adaptive neurofuzzy method to mine assessment knowledge that is not available in traditional approaches. | Yu and Skibniewski (2010)
Neurofuzzy and GA | Semiconductor hookup construction | Based on 54 case studies of semiconductor hookup construction, a neurofuzzy cost estimation model was built and optimized by GA; its accuracy is approximately 20% better than the conventional cost method. | Hsiao et al. (2012)
Neurofuzzy | Water infrastructure | Based on 98 examples, neural networks and fuzzy set theory were combined into a more accurate and precise model for water infrastructure projects, with a MAPE of 0.8%. | Ahiaga-Dagbui et al. (2013)
Neurofuzzy | Water infrastructure projects | Based on 1,600 water infrastructure projects in the United Kingdom, a neurofuzzy hybrid cost model was built in which max-product composition produces better results than max-min composition. | Tokede et al. (2014)
Regression | Building projects | A logarithmic regression model developed to examine the project time-cost relationship; projects in various Australian states were fitted with a transformed (semilog) regression model to estimate a building cost index from historical construction projects in several markets (Wheaton and Simonton 2007). | Love et al. (2005)
Regression | Building projects | A semilog model was used to predict the cost of residential construction projects; the MAPE of the semilog model (9.6%) is lower than that of a linear regression model (9.7%), suggesting that semilog models may be more accurate than plain regression models. This is not a rule, however: plain regression models may produce more accurate and simpler models than transformed ones. | Stoy et al. (2008)
Regression | Building projects | A semilog regression model was fitted to develop cost models for residential building projects in Germany; the most significant variables were identified by the backward regression method, and for the selected population the proposed model has a prediction accuracy of 7.55%. | Stoy et al. (2012)
ANNs and MRA | Highway infrastructure | Built an ANN and MRA for conceptual cost estimating for highway infrastructure. | Gardner et al. (2016)
Bayesian regression | Masonry retrofit | Created a Bayesian regression to develop probabilistic cost models for retrofit actions based on 167 masonry retrofit projects. | Nasrazadani et al. (2017)
LASSO regularized regression | Highway construction projects | Developed a LASSO regularized regression for forecasting the cost of highway construction projects. | Zhang et al. (2017)
ANN and GA | Sale prices of real estate units | Proposed a novel hybrid model of a deep belief restricted Boltzmann machine and a genetic algorithm for estimating the sale prices of real estate units. | Rafiei and Adeli (2015)
SVM | Building construction projects | Based on 62 cases of building construction projects in Korea, an SVM model was developed to evaluate conceptual cost estimation; such a model can help clients judge the quality and accuracy of a cost prediction. | An et al. (2007a, b)
SVM | Building projects | Combined rough set (RS) theory with SVM to improve prediction accuracy; RS was used for attribute reduction. | HongWei (2009)
SVM and ANNs | Building projects | Based on 92 building projects, ANNs and SVM were used to predict cost and schedule success at the conceptual stage, with prediction accuracies of 92% and 80% for cost success and schedule success, respectively. | Wang et al. (2012)
SVM | Commercial building projects | Based on 84 cases of commercial building projects, a principal component analysis method was integrated into SVM to predict cost estimates from project parameters. | Son et al. (2012)
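The semilog-versus-linear comparison reported in the Stoy et al. (2008) row can be illustrated with a minimal sketch. This is not the study's data or model: it fits both specifications to synthetic cost data by closed-form least squares and compares their MAPE.

```python
import math
import random

def ols_1d(xs, ys):
    # Closed-form least squares for y = a + b*x with a single feature.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b  # intercept, slope

def mape(actual, predicted):
    # Mean absolute percentage error, in percent.
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

random.seed(7)
# Synthetic "floor area -> cost" data with multiplicative noise: the case
# where a semilog (log-cost) specification tends to beat a plain linear one.
areas = [random.uniform(100, 2000) for _ in range(200)]
costs = [1000 * math.exp(0.0012 * a + random.gauss(0, 0.1)) for a in areas]

# Plain linear model: cost = a + b * area
a_lin, b_lin = ols_1d(areas, costs)
pred_lin = [a_lin + b_lin * x for x in areas]

# Semilog model: ln(cost) = a + b * area, predictions back-transformed
a_log, b_log = ols_1d(areas, [math.log(c) for c in costs])
pred_log = [math.exp(a_log + b_log * x) for x in areas]

print(f"linear MAPE:  {mape(costs, pred_lin):.2f}%")
print(f"semilog MAPE: {mape(costs, pred_log):.2f}%")
```

On this synthetic data the semilog MAPE comes out lower, mirroring the 9.6% versus 9.7% comparison; on other data the plain model can win, exactly as the table notes.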
(Siddique and Adeli 2013). For example, ElMousalami et al. (2018b) have identified the key cost drivers based on both qualitative and quantitative techniques. Consequently, ElMousalami et al. (2018b) have conducted two machine learning models by utilizing MRA and ANNs. Accordingly, the selected quadratic regression model produces a prediction accuracy of 9.12% and 7.82% for training and validation, respectively. Similarly, Marzouk and Elkadi (2016) have identified the key cost drivers based on qualitative techniques such as questionnaires and quantitative techniques such as exploratory factor analysis. Consequently, the next stage is the model development. Marzouk and Elkadi (2016) have applied ANNs, for which the MAPE on the test sets was 21.18%.

Fan et al. (2006) have developed a decision tree approach for investigating the relationship between house prices and housing characteristics. Moussa et al. (2006) have established a decision tree model using integrated multilevel stochastic networks. Cao et al. (2018) have proposed a multilayer ensemble of methods for prediction of unit price bids of resurfacing highway projects based on more than 1,400 projects. The ensemble was composed of gradient boosting machine, XGBoost, and RF models, of which XGBoost has the best accuracy. Monte Carlo simulation and a multiple linear regression model were developed as benchmark models to evaluate the model's performance, for which the MAPE was 7.56%. Wang and Ashuri (2016) have developed a highly accurate model based on random tree ensembles to predict a construction cost index, in which the model's accuracy has reached 0.8%. Williams and Gong (2014) have built a stacking ensemble combining learning and text mining to estimate the cost overrun from the project contract document, in which the accuracy was 44%. Chou and Lin (2012) have established an ensemble learning model of ANNs, SVM, and a decision tree for predicting the potential for disputes in public-private partnerships (PPP) with an accuracy of 84%.

Review Analysis and Discussion for Cost Modeling Techniques

Based on the reviewed studies shown in Table 4, the survey has been classified into six main categories to represent the models used for cost model development. These categories are the ANN model, FL model, regression model, SVM model, CBR model, and hybrid models, where hybrid models represent all combined methods such as the fuzzy neural network and the evolutionary fuzzy hybrid neural network. As shown in Fig. 15(a), the percentages of the categories are 27%, 25%, 14%, 13%, 11%, and 10% for hybrid models, ANNs, fuzzy models, regression, SVM, and CBR, respectively. These percentages indicate that hybrid models are the current trend in parametric cost estimate modeling, in which researchers use such hybrid models to enhance the performance of the developed model and the accuracy of the prediction results. In addition, hybrid models avoid the limitations of a single method. For example, the hybrid model of ANNs and FL produces a neurofuzzy model that provides uncertainty handling for ANNs; conversely, the ANN model provides learning ability to the fuzzy system.

The second percentage, 25%, represents the use of ANNs, a powerful ML technique for representing nonlinear data. The third percentage, 14%, represents fuzzy models. The fuzzy model should be widely adopted because it brings vagueness and uncertainty handling to the results and more reliable prediction for future real-world cases. The fourth percentage, 13%, represents the regression model. Generally, the regression model has been widely used because of its simplicity. However, ANNs can provide better results than the regression model, specifically with nonlinear data. SVM and CBR have similarly small percentages. However, CBR represents a promising technique in which a CBR works as an incremental search engine for similar cases.

Based on the reviewed studies as shown in Table 4 and Fig. 15(a), the survey has also been classified into four main categories to represent the projects used for the cost estimates. These categories are buildings, highway, water infrastructure, and other projects. The buildings category includes residential, industrial, and commercial building projects, whereas the highway category includes highway, road, pavement maintenance, and bridge construction projects. Water infrastructure includes wastewater treatment and water infrastructure projects. Other projects include tunnel projects, steel projects, telecommunication towers, etc.

As shown in Fig. 15(b), the building category represents 48% of all collected projects, whereas the other projects category represents 30%, the highway category represents 13%, and the water infrastructure category represents 9%. Subsequently, building projects and highway projects have the greatest
valves (P3), and construction year (P4). Accordingly, a total of 144 FCIPs constructed between 2010 and 2015 have been collected. For validation purposes, this collected sample was randomly branched into a training sample (111 instances) and a testing sample (33 instances). The training sample of 111 instances in the present case study would be sufficiently acceptable to train reliable ML models (Green 1991).

Comparison and Analysis

MAPE and R2 have been validated for the 20 developed models as displayed in Table 5. The developed models have been sorted from M1 to M20 in ascending order of MAPE (descending accuracy). ElMousalami et al. (2018b) have presented a quadratic regression model (M2) as the most accurate for FCIPs among the developed regression and ANN models (M3, M4, M5, M6, M8, M13, and M14), with 9.120 and 0.851 for MAPE and R2, respectively. However, this study shows that XGBoost (M1) is more accurate than quadratic regression (M2): XGBoost (M1) obtained first place, slightly ahead of M2, with 9.091% and 0.929 for MAPE and R2, respectively. Moreover, the unique advantage of XGBoost is its high scalability; it can process noisy data and fit high-dimension data without overfitting. XGBoost applies parallel computing to effectively reduce computational complexity and learn faster (Chen and Guestrin 2016). Another key advantage of XGBoost is its handling of missing values, for which a default direction is identified. Accordingly, no effort is needed for cleaning the collected data.

Ensemble methods such as Extra Trees (M7), bagging (M9), RF (M10), AdaBoost (M11), and SGB (M12) have produced highly acceptable performance, with accuracy ranging from 9.714% to 11.008%. The ensemble learning methods can effectively deal with the problems of high-dimension data, complex data structures, and small sample size. Bagging algorithms can increase generalization by decreasing variance error (Breiman 1998), whereas boosting can improve generalization by decreasing bias error (Schapire et al. 1998). Ensemble methods can effectively handle continuous, categorical, and dummy features with missing values. However, ensemble methods increase the model complexity, which decreases the model interpretability (Kuncheva 2004). RF (M10) is a more robust algorithm against noisy data or big data than the DT (M16) algorithm (Breiman 1996; Dietterich 2000). However, the RF algorithm is unable to interpret the importance of features or the mechanism of producing the results.

DNNs (M15) produce 12.059% MAPE, which is less accurate than all the developed MLP models (M4, M5, and M8). Accordingly, DNNs perform poorly with a small data set. Conversely, deep learning and DNNs can produce the most accurate performance with high-dimension data (LeCun et al. 2015). As an alternative to the black box nature of ANNs and DNNs, DTs generate logic statements and interpretable rules that can be used for identifying the importance of data features (Perner et al. 2001). Another advantage of DT is avoiding the curse of dimensionality and providing high-performance computing efficiency through its splitting procedure (Prasad et al. 2006). However, DT produces unsatisfactory performance for time series, noisy, or nonlinear data (Curram and Mingers 2017). Although DT (CART) is inherently used as a base learner for the ensemble methods, DT (M16) produces 12.488% MAPE, which is less accurate than all developed ensemble methods (M1, M7, M9, M10, M11, and M12). Therefore, ensemble methods produce better performance than a single learning algorithm. Moreover, ensemble methods can effectively handle missing values and noisy data because of their scalability.

Ensemble methods and data transformation play an important role in prediction accuracy. However, the main gap of the previous models is the lack of uncertainty modeling in the cost prediction model. Therefore, fuzzy logic theory has been applied to maintain the uncertainty concept through the fuzzy logic model (M17) as shown in Fig. 16 and the hybrid fuzzy model (M20) as shown in Fig. 17. The number of rules generated by the fuzzy genetic model (M17) is 63, and the MAPE is 14.7%. On the other hand, a traditional fuzzy logic model (M20) has been built based on the experts' experience, in which a total of 190 rules were generated to cover all the possible combinations of the fuzzy system, and the MAPE is 26.3%. Moreover, the fuzzy (if-then) rules generated by experts contain redundant rules that can be deleted to improve the model computation and performance. Furthermore, the experts' knowledge cannot cover all combinations to represent
M9 | Provides higher performance than a single algorithm | Depends on the other algorithms' performance | No | No | Yes
M10 | Accurate and high performance on many problems, including nonlinear ones | No interpretability; need to choose the number of trees | No | No | Yes
M11 | High scalability and high adaptability | Depends on the other algorithms' performance | No | No | Yes
M12 | Handles difficult examples | High sensitivity to noisy data | No | No | Yes
M13 | Handles data nonlinearity and trains on a small sample size | Unable to capture complex patterns | Yes | No | No
M14 | Handles data nonlinearity and trains on a small sample size | Unable to capture complex patterns | Yes | No | No
M15 | Captures complex patterns, processes big data, and offers high-performance computing | Requires sufficient training data and high-cost computation | No | No | No
M16 | Works on both linear and nonlinear data, and produces logical expressions | Poor results on too-small data sets; overfitting can easily occur | Yes | No | No
M17 | Handles uncertainty and is more accurate than the fuzzy model | More complex than the fuzzy model and needs more computational resources | Yes | Yes | No
M18 | Handles small data sets; simple and needs less computational time | Poor performance for cases in which the optimal case cannot be retrieved | Yes | No | No
M19 | Easily adaptable; works very well on nonlinear problems; not biased by outliers | Feature scaling is compulsory; more difficult to understand | No | No | No
M20 | Handles uncertainty | Low accuracy | Yes | Yes | No
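The two criteria used to rank the 20 models in Table 5, MAPE and adjusted R2, can be sketched in a few lines of plain Python. The model names and predictions below are invented for illustration only, not the study's M1-M20 results.

```python
def mape(actual, predicted):
    # Mean absolute percentage error, in percent (lower is better).
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def adjusted_r2(actual, predicted, k):
    # Adjusted R^2 for a model with k predictors: penalizes extra parameters.
    n = len(actual)
    mean = sum(actual) / n
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    r2 = 1 - ss_res / ss_tot
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Illustrative ranking of two hypothetical models against held-out costs:
actual = [100.0, 120.0, 90.0, 150.0, 110.0]
models = {
    "model_a": [104.0, 115.0, 95.0, 145.0, 108.0],
    "model_b": [120.0, 100.0, 70.0, 180.0, 130.0],
}
ranked = sorted(models, key=lambda m: mape(actual, models[m]))
print(ranked[0])  # the lower-MAPE model ranks first, as in Table 5
```

Sorting by ascending MAPE reproduces the M1-M20 ordering described above; adjusted R2 complements it by rewarding fit while discounting models that merely add parameters.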
have been extracted and summarized. The following points summarize the recommendations and future trends:
1. Computational models and information systems have been applied in business and the construction industry to effectively improve job efficiency (Davis 1993). Therefore, the hybrid model represents the current trend of parametric cost modeling to improve model performance and accuracy so that the limitations of each technique can be avoided. The objective is to develop computerized automated systems with less human intervention to save time and effort and to avoid human error in the cost estimate. Moreover, computer technologies have a great ability to deal with vast data and complicated computations.
2. AI and CI models such as ANNs, FL, and GA are widely used for hybrid model development. Moreover, ML techniques can be efficiently applied to parametric cost modeling. Therefore, the cost modeling researcher should first study ML, CI, and AI techniques.
3. CBR represents the increasing importance of ML tools and data mining techniques for knowledge acquisition, prediction, and decision making. Specifically, CBR efficiently deals with vast data and can update the case base for future problem solving. Moreover, finding similarities and similar cases improves the reliability of and confidence in the output.
4. Hybrid models can be incorporated into CBR to enhance its performance, such as by applying GA and decision trees to optimize attribute weights and by applying regression analysis for the revision process.
5. Most of the studies focus on building types of construction projects, so a need exists to apply cost models widely to different kinds of construction projects to help project managers and cost estimation engineers.
6. Automated cost models are prone to many machine learning problems such as overfitting and hyperparameter selection. It is recommended to develop more than one cost prediction model, such as regression, ANNs, FL, or CBR. As a result, the researcher can compare the results of the developed models and set a benchmark to select the most accurate model. In addition, comparisons of the developed models enhance the quality of the cost estimate and the decisions based on it (Amason 1996).
7. There is a need to develop a model that can justify its results and give answers and interpretations for the predicted cost. That may require a higher level of AI and may represent the future trend of cost modeling. Moreover, such a concept may be generalized to any prediction model. The objective is to avoid the estimator's biases, alert the user to the input parameters of the model, and avoid the limitation of the black box nature.
8. The conceptual cost estimate is conducted under uncertainty. Therefore, this study recommends using fuzzy theory such as FL and developing a hybrid model based on FL to bring uncertainty handling to the developed model and produce more reliable performance, as illustrated in Fig. 18 (Elmousalami 2019).
9. One of the key challenges in cost estimation accuracy is substantial variation over time. Therefore, each cost prediction model should have an input time-related macroeconomic indicator that represents the change in the market inflation rate over time, because significant variability in material cost or inflation rate can reduce the prediction performance.
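The time adjustment behind this recommendation is the compound-inflation formula given later in the text, Future cost = Predicted cost x (1 + IR)^n. A one-function sketch:

```python
def future_cost(predicted_cost, inflation_rate, years):
    # Escalate a present-day estimate by a constant average inflation rate (IR)
    # over n years: future = predicted * (1 + IR) ** n.
    return predicted_cost * (1 + inflation_rate) ** years

# 1,000,000 estimated today, 10% average inflation, delivery in 3 years:
print(future_cost(1_000_000, 0.10, 3))  # about 1.331 million
```

In practice IR would come from the time-related macroeconomic indicator the point recommends, rather than being a fixed constant.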
To maintain reliability for future prediction, ElMousalami et al. (2018b) use the following formula:

Future cost = Predicted cost x (1 + IR)^n

where IR = average inflation rate; and n = number of years from the present to the future time.
10. ML gives satisfactory performance and accuracy, but it needs sufficient features and a sufficient data size to train the ML algorithms. Monte Carlo simulation, regression, and time series analysis are not sufficiently robust for cost prediction with high variability and nonlinearity of data. Cao et al. (2018) reported that the ensemble learning models provide 37.98% and 26.89% more accurate results than the regression method and the Monte Carlo simulation method, respectively, on the mean absolute error (MAE) scale.
11. Ensemble methods are promising techniques that can handle a large number of features, model both numerical and categorical variables, capture nonlinear patterns, and fit data with missing values.
12. Decision tree algorithms and ensemble methods can provide an alternative to many ML algorithms such as multiple regression analysis and ANNs. The study emphasizes the importance of ensemble methods for improving prediction accuracy and handling noisy and missing data. However, the key limitation of the ensemble methods is an inability to interpret the produced results.

Most of the studies focus on building types of construction projects, so a need exists to apply cost models widely to different kinds of construction projects. In addition, more reviewed studies mean more generalization and better quality of the results. Finally, an accurate cost estimate means accurate decisions about project management. Therefore, this study has analyzed past cost modeling practices to provide a recent direction for construction cost modeling. The study shows that computational intelligence, artificial intelligence, and machine learning techniques have a powerful ability to develop applicable and accurate cost predictive models. Moreover, the cost modeling research area needs more studies to develop intelligent models that can interpret the resulting cost prediction and analyze the input model's parameters. In addition, this study has provided a list of recommendations and references for cost model developers to build a more practical parametric cost model.

Data Availability Statement

Data generated by the authors or analyzed during the study are available at: https://github.com/HaythamElmousalami/Field-canals-improvement-projects-FCIPs-.

References

...ture projects." In Proc., 29th Annual ARCOM Conf., 2-4. Reading, UK: Association of Researchers in Construction Management.
Ahn, J., M. Park, H. S. Lee, S. J. Ahn, S. H. Ji, K. Song, and B. S. Son. 2017. "Covariance effect analysis of similarity measurement methods for early construction cost estimation using case-based reasoning." Autom. Constr. 81 (Sep): 254-266. https://doi.org/10.1016/j.autcon.2017.04.009.
Ajayi, S. O., and L. O. Oyedele. 2018. "Waste-efficient materials procurement for construction projects: A structural equation modelling of critical success factors." Waste Manage. 75: 60-69.
Akintoye, A. 2000. "Analysis of factors influencing project cost estimating practice." Constr. Manage. Econ. 18 (1): 77-89. https://doi.org/10.1080/014461900370979.
Alroomi, A., D. H. S. Jeong, and G. D. Oberlender. 2012. "Analysis of cost-estimating competencies using criticality matrix and factor analysis." J. Constr. Eng. Manage. 138 (11): 1270-1280. https://doi.org/10.1061/(ASCE)CO.1943-7862.0000351.
Amason, A. 1996. "Distinguishing the effects of functional and dysfunctional conflict on strategic decision making: Resolving a paradox for top management teams." Acad. Manage. J. 39 (1): 123-148.
Ambrule, V. R., and A. N. Bhirud. 2017. "Use of artificial neural network for pre design cost estimation of building projects." Int. J. Recent Innovation Trends Comput. Commun. 5 (2): 173-176.
An, S.-H., G.-H. Kim, and K.-I. Kang. 2007a. "A case-based reasoning cost estimating model using experience by analytic hierarchy process." Build. Environ. 42 (7): 2573-2579. https://doi.org/10.1016/j.buildenv.2006.06.007.
An, S.-H., U.-Y. Park, K.-I. Kang, M.-Y. Cho, and H.-H. Cho. 2007b. "Application of support vector machines in assessing conceptual cost estimates." J. Comput. Civ. Eng. 21 (4): 259-264. https://doi.org/10.1061/(ASCE)0887-3801(2007)21:4(259).
Anderson, S. D., K. R. Molenaar, and C. J. Schexnayder. 2006. Guidance for cost estimation and management for highway projects during planning, programming, and preconstruction. NCHRP Rep. No. 574. Washington, DC: Transportation Research Board.
Angelov, P. P. 2002. Evolving rule-based models: A tool for design of flexible adaptive systems. Wurzburg, Germany: Physica-Verlag.
Arabzadeh, V., S. T. A. Niaki, and V. Arabzadeh. 2018. "Construction cost estimation of spherical storage tanks: Artificial neural networks and hybrid regression-GA algorithms." J. Ind. Eng. Int. 14 (4): 747. https://doi.org/10.1007/s40092-017-0240-8.
Ashuri, B., and J. Lu. 2010. "Time series analysis of ENR construction cost index." J. Constr. Eng. Manage. 136 (11): 1227-1237. https://doi.org/10.1061/(ASCE)CO.1943-7862.0000231.
Attalla, M., and T. Hegazy. 2003. "Predicting cost deviation in reconstruction projects: Artificial neural networks versus regression." J. Constr. Eng. Manage. 129 (4): 405-411. https://doi.org/10.1061/(ASCE)0733-9364(2003)129:4(405).
Back, W. E., W. W. Boles, and G. T. Fry. 2000. "Defining triangular probability distributions from historical cost data." J. Constr. Eng. Manage. 126 (1): 29-37. https://doi.org/10.1061/(ASCE)0733-9364(2000)126:1(29).
Bauer, E., and R. Kohavi. 1999. "An empirical comparison of voting classification algorithms: Bagging, boosting, and variants." Mach. Learn. 36 (1-2): 105-139. https://doi.org/10.1023/A:1007515423169.
...estimate using fuzzy logic." Int. J. Emerging Technol. Adv. Eng. 2 (4): 631-636.
El-Sawalhi, N. I., and O. Shehatto. 2014. "A neural network model for building construction projects cost estimating." J. Constr. Eng. Project Manage. 4 (4): 9-16. https://doi.org/10.6106/JCEPM.2014.4.4.009.
ElSawy, I., H. Hosny, and M. Abdel Razek. 2011. "A neural network model for construction projects site overhead cost estimating in Egypt." Int. J. Comput. Sci. Issues 8 (3): 273-283.
Emami, M. R., I. B. Turksen, and A. A. Goldberg. 1998. "Development of a systematic methodology of fuzzy logic modeling." IEEE Trans. Fuzzy Syst. 6 (3): 346-361. https://doi.org/10.1109/91.705501.
Emsley, M. W., D. J. Lowe, A. R. Duff, A. Harding, and A. Hickson. 2002. "Data modeling and the application of a neural network approach to the prediction of total construction costs." Constr. Manage. Econ. 20 (6): 465-472. https://doi.org/10.1080/01446190210151050.
Engelbrecht, A. P. 2002. Computational intelligence: An introduction. New York: Wiley.
Erensal, Y. C., T. Öncan, and M. L. Demircan. 2006. "Determining key capabilities in technology management using fuzzy analytic hierarchy process: A case study of Turkey." Inf. Sci. 176 (18): 2755-2770. https://doi.org/10.1016/j.ins.2005.11.004.
Fan, G. Z., S. E. Ong, and H. C. Koh. 2006. "Determinants of house price: A decision tree approach." Urban Stud. 43 (12): 2301-2315. https://doi.org/10.1080/00420980600990928.
Fan, R. E., K. W. Chang, C. J. Hsieh, X. R. Wang, and C. J. Lin. 2008. "LIBLINEAR: A library for large linear classification." J. Mach. Learn. Res. 9 (Aug): 1871-1874.
Field, A. 2009. Discovering statistics using SPSS for windows. London: Sage Publications.
Flom, P. L., and D. L. Cassell. 2007. "Stopping stepwise: Why stepwise and similar selection methods are bad, and what you should use." In Proc., 2007 Conf. NorthEast SAS Users Group (NESUG): Statistics and Data Analysis. Portland, OR: NorthEast SAS Users Group.
Freund, Y., and R. E. Schapire. 1996. "Experiments with a new boosting algorithm." In Proc., 13th Int. Conf. on Machine Learning. Princeton, NJ: International Machine Learning Society.
Freund, Y., and R. E. Schapire. 1997. "A decision-theoretic generalization of on-line learning and an application to boosting." J. Comput. Syst. Sci. 55 (1): 119-139. https://doi.org/10.1006/jcss.1997.1504.
Friedman, J., T. Hastie, and R. Tibshirani. 2000. "Additive logistic regression: A statistical view of boosting (with discussion and a rejoinder by the authors)." Ann. Stat. 28 (2): 337-407. https://doi.org/10.1214/aos/1016218223.
Friedman, J., T. Hastie, and R. Tibshirani. 2001. Vol. 1 of The elements of statistical learning. New York: Springer.
Gardner, B. J., D. D. Gransberg, and H. D. Jeong. 2016. "Reducing data-collection efforts for conceptual cost estimating at a highway agency." J. Constr. Eng. Manage. 142 (11): 04016057. https://doi.org/10.1061/(ASCE)CO.1943-7862.0001174.
Geurts, P., D. Ernst, and L. Wehenkel. 2006. "Extremely randomized trees." Mach. Learn. 63 (1): 3-42. https://doi.org/10.1007/s10994-006-6226-1.
Green, B. N., C. D. Johnson, and A. Adams. 2006. "Writing narrative literature reviews for peer-reviewed journals: Secrets of the trade."
...Trans. Pattern Anal. Mach. Intell. 12 (10): 993-1001. https://doi.org/10.1109/34.58871.
Hastie, T., R. Tibshirani, and J. Friedman. 2009. The elements of statistical learning: Data mining, inference, and prediction. 2nd ed. New York: Springer. https://doi.org/10.1007/b94608.
Hays, W. L. 1983. "Review of using multivariate statistics. [Review of the book Using Multivariate Statistics. B. G. Tabachnick & L. S. Fidell]." Contemp. Psychol. 28 (8): 642. https://doi.org/10.1037/022267.
Hegazy, T. 2014. Computer-based construction project management. Essex: Pearson Education.
Hegazy, T., and A. Ayed. 1998. "Neural network model for parametric cost estimation of highway projects." J. Constr. Eng. Manage. 124 (3): 210-218. https://doi.org/10.1061/(ASCE)0733-9364(1998)124:3(210).
Holland, J. H. 1975. Adaptation in natural and artificial systems. Ann Arbor, MI: University Michigan Press.
HongWei, M. 2009. "An improved support vector machine based on rough set for construction cost prediction." In Vol. 2 of Proc., 2009 Int. Forum on Computer Science-Technology and Applications. New York: IEEE.
Hopfield, J. J. 1982. "Neural networks and physical systems with emergent collective computational abilities." Proc. Natl. Acad. Sci. 79 (8): 2554-2558. https://doi.org/10.1073/pnas.79.8.2554.
Hsiao, F.-Y., S.-H. Wang, W.-C. Wang, C.-P. Wen, and W.-D. Yu. 2012. "Neuro-fuzzy cost estimation model enhanced by fast messy genetic algorithms for semiconductor hookup construction." Comput.-Aided Civ. Infrastruct. Eng. 27 (10): 764-781. https://doi.org/10.1111/j.1467-8667.2012.00786.x.
Hsu, C.-C., and B. A. Sandford. 2007. "The Delphi technique: Making sense of consensus." Pract. Assess. Res. Eval. 12 (10): 1-8.
Hsu, Y.-L., C.-H. Lee, and V. Kreng. 2010. "The application of fuzzy Delphi method and fuzzy AHP in lubricant regenerative technology selection." Expert Syst. Appl. 37 (1): 419-425. https://doi.org/10.1016/j.eswa.2009.05.068.
Huang, S.-C., and Y.-F. Huang. 1991. "Bounds on the number of hidden neurons in multilayer neurons." IEEE Trans. Neural Networks 2 (1): 47-55. https://doi.org/10.1109/72.80290.
Hutcheson, G., and N. Sofroniou. 1999. The multivariate social scientist. London: Sage.
Ilbeigi, M., B. Ashuri, and A. Joukar. 2016. "Time-series analysis for forecasting asphalt-cement price." J. Manage. Eng. 33 (1): 04016030. https://doi.org/10.1061/(ASCE)ME.1943-5479.0000477.
Ishikawa, A., M. Amagasa, T. Shiga, G. Tomizawa, R. Tatsuta, and H. Mieno. 1993. "The max-min Delphi method and fuzzy Delphi method via fuzzy integration." Fuzzy Sets Syst. 55 (3): 241-253. https://doi.org/10.1016/0165-0114(93)90251-C.
Ji, S.-H., J. Ahn, E.-B. Lee, and Y. Kim. 2018. "Learning method for knowledge retention in CBR cost models." Autom. Constr. 96 (Dec): 65-74. https://doi.org/10.1016/j.autcon.2018.08.019.
Ji, S.-H., M. Park, and H.-S. Lee. 2012. "Case adaptation method of case-based reasoning for construction cost estimation in Korea." J. Constr. Eng. Manage. 138 (1): 43-52. https://doi.org/10.1061/(ASCE)CO.1943-7862.0000409.
Jin, R., K. Cho, C. Hyun, and M. Son. 2012. "MRA-based revised CBR model for cost prediction in the early stage of construction projects." Expert Syst. Appl. 39 (5): 5214-5222. https://doi.org/10.1016/j.eswa.2011.11.018.
...doi.org/10.1016/j.jchromb.2012.05.020.
....1177/001316446002000116.
Kaiser, H. F. 1970. "A second generation little jiffy." Psychometrika 35 (4): 401-415. https://doi.org/10.1007/BF02291817.
Kaiser, H. F. 1974. "An index of factorial simplicity." Psychometrika 39 (1): 31-36. https://doi.org/10.1007/BF02291575.
Kan, P. 2002. "Parametric cost estimating model for conceptual cost estimating of building construction projects." Ph.D. thesis, Faculty of the Graduate School, Univ. of Texas.
Karatas, Y., and F. Ince. 2016. "Feature article: Fuzzy expert tool for small satellite cost estimation." IEEE Aerosp. Electron. Syst. Mag. 31 (5): 28-35. https://doi.org/10.1109/MAES.2016.140210.
Kass, R. A., and H. E. A. Tinsley. 1979. "Factor analysis." J. Leisure Res. 11 (4): 120-138.
Kim, G. H., D. S. Seo, and K. I. Kang. 2005. "Hybrid models of neural networks and genetic algorithms for predicting preliminary cost estimates." J. Comput. Civ. Eng. 19 (2): 208-211. https://doi.org/10.1061/(ASCE)0887-3801(2005)19:2(208).
Kim, G.-H., S.-H. An, and K.-I. Kang. 2004. "Comparison of construction cost estimating models based on regression analysis, neural networks, and case-based reasoning." Build. Environ. 39 (10): 1235-1242. https://doi.org/10.1016/j.buildenv.2004.02.013.
Kim, G.-H., J.-M. Shin, S. Kim, and Y. Shin. 2013. "Comparison of school building construction costs estimation methods using regression analysis, neural network, and support vector machine." J. Build. Constr. Plann. Res. 1 (1): 1-7. https://doi.org/10.4236/jbcpr.2013.11001.
Kim, K. J., and K. Kim. 2010. "Preliminary cost estimation model...
Liu, W.-K. 2013. "Application of the fuzzy Delphi method and the fuzzy analytic hierarchy process for the managerial competence of multinational corporation executives." IJEEEE 3 (4): 313-317. https://doi.org/10.7763/IJEEEE.2013.V3.248.
Loop, B. P., S. D. Sudhoff, S. H. Zak, and E. L. Zivi. 2010. "Estimating regions of asymptotic stability of power electronics systems using genetic algorithms." IEEE Trans. Control Syst. Technol. 18 (5): 1011-1022. https://doi.org/10.1109/TCST.2009.2031325.
Love, P. E. D., R. Y. C. Tse, and D. J. Edwards. 2005. "Time-cost relationships in Australian building construction projects." J. Constr. Eng. Manage. 131 (2): 187-194. https://doi.org/10.1061/(ASCE)0733-9364(2005)131:2(187).
Lowe, D. J., M. W. Emsley, and A. Harding. 2006. "Predicting construction cost using multiple regression techniques." J. Constr. Eng. Manage. 132 (7): 750-758. https://doi.org/10.1061/(ASCE)0733-9364(2006)132:7(750).
Ma, L., S. Shen, J. Zhang, Y. Huang, and F. Shi. 2010. "Application of fuzzy analytic hierarchy process model on determination of optimized pile-type." Front. Archit. Civ. Eng. China 4 (2): 252-257. https://doi.org/10.1007/s11709-010-0017-2.
MacCallum, R. C., K. F. Widaman, S. Zhang, and S. Hong. 1999. "Sample size in factor analysis." Psychol. Methods 4 (1): 84-99. https://doi.org/10.1037/1082-989X.4.1.84.
Makridakis, S., S. C. Wheelwright, and R. J. Hyndman. 1998. Forecasting methods and applications. New York: Wiley.
Mamdani, E. H., and S. Assilian. 1974. "Application of fuzzy algorithms...
using case-based reasoning and genetic algorithms.” J. Comput. Civ. for control of simple dynamic plant.” Proc., Institution of Electrical
Eng. 24 (6): 499–505. https://doi.org/10.1061/(ASCE)CP.1943-5487 Engineers 121 (12), 1585–1588.
.0000054. Manoliadis, O. G., J. P. Pantouvakis, and S. E. Christodoulou. 2009.
Kim, S. 2013. “Hybrid forecasting system based on case-based reasoning “Improving qualifications-based selection by use of the fuzzy Delphi
and analytic hierarchy process for cost estimation.” J. Civ. Eng. Manage. method.” Constr. Manage. Econ. 27 (4): 373–384. https://doi.org/10
19 (1): 86–96. https://doi.org/10.3846/13923730.2012.737829. .1080/01446190902758993.
Kim, S., S. Chin, and S. Kwon. 2019. “A discrepancy analysis of BIM- Marzouk, M., and M. Alaraby. 2014. “Predicting telecommunication tower
costs using fuzzy subtractive clustering.” J. Civ. Eng. Manage. 21 (1):
based quantity take-off for building interior components.” J. Manage.
67–74. https://doi.org/10.3846/13923730.2013.802736.
Eng. 35 (3): 05019001. https://doi.org/10.1061/(ASCE)ME.1943-5479
Marzouk, M., and A. Amin. 2013. “Predicting construction materials
.0000684.
prices using fuzzy logic and neural networks.” J. Constr. Eng. Manage.
Kline, P. 1999. The handbook of psychological testing. 2nd ed. London:
139 (9): 1190–1198. https://doi.org/10.1061/(ASCE)CO.1943-7862
Routledge.
.0000707.
Klir, G. J., and B. Yuan. 1995. Fuzzy sets and fuzzy logic theory and
Marzouk, M., and M. Elkadi. 2016. “Estimating water treatment plants
applications. Upper Saddle River, NJ: Prentice Hall. costs using factor analysis and artificial neural networks.” J. Clean.
Knight, K., and A. R. Fayek. 2002. “Use of fuzzy logic for predicting Prod. 112 (Part 5): 4540–4549. https://doi.org/10.1016/j.jclepro.2015
design cost overruns on building projects.” J. Constr. Eng. Manage. .09.015.
128 (6): 503–512. https://doi.org/10.1061/(ASCE)0733-9364(2002) Marzouk, M. M., and R. M. Ahmed. 2011. “A case-based reasoning ap-
128:6(503). proach for estimating the costs of pump station projects.” J. Adv. Res.
Kohavi, R., and G. John. 1997. “Wrappers for feature subset selection.” 2 (4): 289–295. https://doi.org/10.1016/j.jare.2011.01.007.
Artif. Intell. 97 (1/2): 273–324. https://doi.org/10.1016/S0004-3702 McCulloch, W. S., and W. H. Pitts. 1943. “A logical calculus of the ideas
(97)00043-X. imminent in nervous activity.” Bull. Math. Biophys. 5 (4): 115–133.
Kolodner, J. L. 1992. “An introduction to case-based reasoning.” Artif. https://doi.org/10.1007/BF02478259.
Intell. Rev. 6 (1): 3–34. https://doi.org/10.1007/BF00155578. Moselhi, O., and T. Hegazy. 1993. “Markup estimation using neural net-
Kuncheva, L. I. 2004. Combining pattern classifiers: Methods and work methodology.” Comput. Syst. Eng. 4 (2–3): 135–145. https://doi
algorithms. New York: Wiley. .org/10.1016/0956-0521(93)90039-Y.
Kursa, M., and W. Rudnicki. 2010. “Feature selection with the Boruta Moussa, M., J. Ruwanpura, and G. Jergeas. 2006. “Decision tree modeling
package.” J. Stat. Software 36 (11): 1–13. using integrated multilevel stochastic networks.” J. Constr. Eng.