Improved Similarity Measure in Case-Based Reasoning With Global Sensitivity Analysis - An Example of Construction Quantity Estimating

Jing Du, Ph.D. 1; and Jeff Bormann, M.Sc. 2
Abstract: In recognition of the importance of historical knowledge in decision making, case based reasoning (CBR) is utilized as a form of
an expert system to tackle construction management issues such as quantity takeoff in the proposal development phase of a project. It builds
on a proposition that past projects similar to the new one would suggest a reasonable range of craft quantities. This paper finds that when
measuring the similarity between the new project and historical projects, traditional similarity measure methods fail to consider the nonlinearity
and multicollinearity embedded in the problem, as well as differences across crafts. An innovative similarity measurement algorithm
was therefore proposed to tackle the above issues with a carefully designed orthogonalization process and Sobol’s total sensitivity analysis.
The application of the proposed algorithm to the craft quantity takeoff of a power plant project was introduced, demonstrating a better result
compared with traditional methods. It is likely that the proposed algorithm will advance current CBR practices in construction management.
DOI: 10.1061/(ASCE)CP.1943-5487.0000267. © 2014 American Society of Civil Engineers.
Author keywords: Conceptual cost estimation; Quantity takeoff; Case based reasoning; Global sensitivity analysis; Sobol’s TSI; Artificial
neural networks; Principal component analysis.
1 Assistant Professor, Dept. of Construction Science, Univ. of Texas at San Antonio, San Antonio, TX 78207 (corresponding author). E-mail: jing.du@utsa.edu
2 Director, Dept. of Project Controls, Zachry Holdings, Inc., 527 Logwood Ave., San Antonio, TX 78221. E-mail: bormannj@zhi.com
Note. This manuscript was submitted on July 17, 2012; approved on October 24, 2012; published online on October 29, 2012. Discussion period open until August 14, 2014; separate discussions must be submitted for individual papers. This paper is part of the Journal of Computing in Civil Engineering, © ASCE, ISSN 0887-3801/04014020(18)/$25.00.

Introduction

Quantity takeoff is an important task in the proposal development phase of a project (Alder 2006). Estimators develop estimates under extreme time pressures and with very limited information (Alder 2006). The probability of inaccurate quantity estimates and unqualified budgets further increases the financial risk in the construction execution phase (Carr 1989). In recognition of the importance of relevant historical experience, case-based reasoning (CBR) is used as an alternative to enhance conceptual quantity estimating (Perera and Watson 1998). CBR solves new problems by adopting past solutions that were proven to be successful (Ashley 2006). A new problem is compared with historical problems, and similar problems are retrieved to provide feasible solutions. This process is very similar to how an expert solves a problem, and thus CBR is considered an alternative expert system (Kim et al. 2004a). At present, CBR has been applied to tackle a variety of construction-related issues, including cost and schedule estimating (Kim et al. 2004), structural design (Heylighen and Neuckermans 2001), safety issues (Hu and Zhang 2008), bidding decision making (Chua et al. 2001), and construction management (Yau and Yang 1998a).

A typical CBR system follows four steps (Kim et al. 2004): (1) case representation; (2) case retrieval; (3) case adaptation; and (4) case update. A critical step of CBR analysis is case retrieval, i.e., retrieving historically similar cases based on similarity measurements between two cases (Changchien and Lin 2005). It has been confirmed that a proper similarity measure has a significant influence on the efficiency and accuracy of case retrieval (Changchien and Lin 2005). As a result, current research efforts have been made to refine similarity measurement methodologies and weight determination algorithms (Watson 1999). However, a closer examination has revealed certain limitations of current similarity measurement methodologies.

First, typical CBR analysis builds on the questionable assumption that “cases with similar problem descriptions (with the new case) will refer to similar problems and hence similar solutions” (Watson 1999). In quantity estimating, this means that if two projects are very similar in project characteristics (or project parameters), they will likely end up with similar craft quantities. A deeper belief behind this assumption is that there is a simple and straightforward projection between features and solutions, which is not always justified. Imagine that there are two spaces: the feature space and the solution space (Fig. 1), and $f(x_i)$ is the function that projects the features of a case to its solution space, such as a regression function that predicts construction quantities (solution) based on project parameters (features). In a typical CBR analysis, the similarity between two cases is measured by the case-to-case distance in the feature space ($D_{ij}$, Fig. 1), i.e., the similarity of project parameters as in conceptual estimating. However, it should be noted that the ultimate purpose of a CBR analysis is to find the most similar solution, i.e., the nearest case in the solution space ($D'_{ij}$, Fig. 1). For example, in conceptual estimating the target of an estimator is not to search for the most similar historical project, but to identify a project that required a similar amount of resources as the target project would need. This is not completely determined by the similarity of characteristics between two projects. Rather, what matters is the relationship between project characteristics and construction resources, i.e., how project characteristics affect construction resources. If such a relationship is simple and straightforward, then a similar solution is directly identified by looking for similar features [Fig. 1(a)]; but if such a relationship is complex and
© ASCE 04014020-1 J. Comput. Civ. Eng.
J. Comput. Civ. Eng., 2014, 28(6): 04014020

nonlinear, then searching for the most similar features is not sufficient for figuring out the most similar solution [Fig. 1(b)]. In estimating practices, we have observed certain cases where a slight change of project parameters results in significant differences in craft quantities. Assigning weights to case features (to reflect relative importance) is ineffective and mostly insufficient to capture such nonlinear complexity. In contrast, a proper method should be able to capture the projection function $f(x_i)$, which projects the distance in the feature space to the solution space. In other words, a similarity measure is given by the measurement of distances in the solution space

$$\mathrm{Sim}_{ij} = D'_{ij} = f[D_{ij}, f(x_i, x_j)] \qquad (1)$$

Fig. 1. Does similar feature refer to similar solution?

Second, even if the assigned weights can reflect the relative importance of different case features, they should not be identical across different crafts. Otherwise, the analysis probably gives biased conclusions, since the relationship between project parameters and quantities is unique to each craft. For example, in a power plant project the main rack distance is a critical factor in estimating pipes, but for underground conduit the total distance between the major working areas is more important. Consequently, main rack distance and total distance between major working areas should be assigned different weights when different crafts are considered.

The third issue stems from the correlation among project parameters. In practice, we found project parameters are often highly correlated. Such correlation leads to multicollinearity of the input data, and the effects of multicollinearity may significantly distort the analysis results (Farrar and Glauber 1967). For example, job configuration and total number of equipment are two interdependent variables. If too much importance is assigned to job configuration and total number of equipment in CBR-oriented conceptual estimating, results could be too sensitive to a slight change of either one of them, and in certain cases, the estimated quantities will be exaggerated. A method that can differentiate the pure effects and interaction effects of a case feature is needed.

This paper aims to introduce an innovative case retrieval algorithm for CBR analysis that meets the following requirements: (1) similarity between two cases is measured with an overall consideration of both feature similarity and feature-solution projection functions; (2) similarity is estimated on a craft-by-craft basis so that differences across crafts are considered; and (3) the interdependences among project parameters are accounted for to capture the pure effects. This paper considers the similarity measurement of CBR as a projection process from the feature space to the solution space, and global sensitivity analysis (GSA) is employed to discover the quantitative relationship in the projection process. An artificial neural network (ANN) is utilized to deal with the nonlinearity of the problem, and principal component analysis (PCA) is applied to deal with the multicollinearity issue. In the remainder of this paper, relevant theories and the proposed algorithm are introduced in detail.

Literature Review

Case Retrieval of Case-Based Reasoning

A central task of CBR analysis is accurate case retrieval, which aims to select one or more similar historical cases from the case library so that solutions can be adapted for new problems. Typical subtasks of case retrieval include identifying case features, initial match, search, and selection (Aamodt and Plaza 1994). In general, case retrieval methodologies can be categorized into four types (Watson 1999; Shin and Han 2001): (1) nearest neighbor, which retrieves matched cases according to a weighted distance of features between cases; (2) induction, which identifies patterns amongst cases and classifies the cases into clusters according to problem descriptions; (3) fuzzy logic, which formalizes the symbolic processing of fuzzy linguistic terms associated with differences in the attributes describing case features; and (4) knowledge guided, which applies existing domain and experimental knowledge to locate relevant cases.
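
To make the nearest-neighbor category concrete, the following is a minimal sketch of weighted-distance retrieval, assuming a weighted Euclidean distance over numeric features that have already been normalized. The case names, feature values, and weights are hypothetical and chosen only for illustration; they are not data from this study, and the function names are not part of any existing library.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def retrieve_nearest(target, case_library, weights, k=1):
    """Return the k library cases closest to the target in feature space."""
    ranked = sorted(
        case_library,
        key=lambda case: weighted_distance(target, case["features"], weights),
    )
    return ranked[:k]


# Hypothetical, already-normalized features for three historical projects.
case_library = [
    {"name": "A", "features": [0.20, 0.50, 0.10]},
    {"name": "B", "features": [0.80, 0.40, 0.90]},
    {"name": "C", "features": [0.25, 0.45, 0.20]},
]
weights = [0.5, 0.3, 0.2]      # illustrative feature weights (assumption)
target = [0.22, 0.50, 0.12]    # hypothetical new project

nearest = retrieve_nearest(target, case_library, weights, k=2)
nearest_names = [case["name"] for case in nearest]
print(nearest_names)
```

With these illustrative numbers, cases A and C are retrieved ahead of B. The ranking depends only on distances in the feature space, so it carries no information about how strongly each feature drives the solution.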

Nearest neighbor is the most commonly used approach (Shin and Han 1999, 2001; Watson 1999; Changchien and Lin 2005). Nearest neighbor algorithms all work in a similar fashion: first, the similarity of the target case to a case in the case library for each case feature is determined, and this measure can be multiplied by a weighting factor. Then the sum of the similarities of all features is calculated to provide a measure of the overall similarity of that case in the library to the target case (Watson 1999). A critical issue is the similarity measurement between two cases (Changchien and Lin 2005). Many approaches have been proposed, such as kernel methods (Fyfe and Corchado 2002), the similarity flooding algorithm (Madhusudan et al. 2004), and fuzzy similarity indices (Slonim and Schneider 2001).

One of the most obvious measures of similarity between two cases is the distance (Shin and Han 1999; Changchien and Lin 2005), as shown in Eq. (2)

$$\mathrm{Dis}_{ab} = \sqrt{\sum_{i=1}^{n} w_i \times (x_{ia} - x_{ib})^2} \qquad (2)$$

where $\mathrm{Dis}_{ab}$ is the distance between two cases a and b in the hyperspace (the number of dimensions equals the number of features), and $x_{ia}$ and $x_{ib}$ are the values of the ith feature of cases a and b. Eq. (2) is in fact a Euclidean distance. Although other definitions of distance are also used in CBR analysis, such as Minkowski distance (Duverlie and Castelain 1999), they all work in the same way. Another issue of nearest neighbor is the determination of the weight of each case feature [i.e., $w_i$ in Eq. (2)], since it has a significant influence on the efficiency and accuracy of case retrieval (Changchien and Lin 2005). In many cases, subjective weight assignment is applied and thus the quality of the retrieved solutions cannot always be guaranteed (Changchien and Lin 2005). Therefore, different methods have been proposed to enhance the determination of feature weights, such as gradient descent methods (Yau and Yang 1998a, b), genetic algorithms (Chiu 2002), artificial neural networks (Hui et al. 2001), analytic hierarchy processing (Chang et al. 2004), information gain (Wettschereck and Aha 1995), and other statistical methods to maximize variance between cases (Mohri and Tanaka 1994). These weight determination algorithms, however, are incapable of capturing the complex relationship between case features and corresponding solutions, i.e., the link between “feature similarity” and “solution similarity” (Fig. 2). As shown in Fig. 2(b), if the feature-solution relationship is nonlinear, the approximation of appropriate weights becomes very difficult. Existing methods are either uneconomic, such as genetic algorithms (Chiu 2002), or prone to getting stuck in local optima, such as gradient descent methods (Yau and Yang 1998a, b). Following information theory (Shannon 1949), it is argued that the importance of a case feature is indeed determined by its contribution to the variation of dependent variables, i.e., outcomes (Saltelli et al. 2008). The weight of a case feature should be able to reflect the projection function from the feature space to the solution space. To realize this target, sensitivity analysis (SA) can be applied.

Fig. 2. The influence of feature-solution projection function on solution selection [panels (a) and (b)]

Sensitivity Analysis

SA is used in this paper to compute the relative importance of project parameters, i.e., to what extent the variations of craft quantities are attributable to different variations in project parameters (Saltelli et al. 2008). In a comprehensive review of SA literature, Saltelli and Annoni (2010) found that most studies apply SA in an OAT (one-at-a-time) fashion, i.e., changing the value of uncertain factors one at a time while keeping the others constant. In construction literature, OAT-SA has been widely used for risk control (Tsai et al. 2011), construction operations optimization (Ozcan Deniz et al. 2012), transportation analysis (Feng and Figliozzi 2011), and structural design (Low 2005). But it has already been found that OAT-SA is justified only if the model is linear (Saltelli et al. 2006; Lilburne and Tarantola 2009). If the problem is nonlinear, OAT always leads to misleading conclusions (Thogmartin 2010; Varella et al. 2010). Regarding construction problems as nonlinear systems (Du and El-Gafy 2010; Du and El-Gafy 2012), nonlinearities should not be ignored when selecting SA methods.

Failure to capture nonlinearities has also been found in regression- and correlation-based SA. Evidence indicates that regression-based SA only works for linear models and its effectiveness depends on the goodness of fit (Lilburne and Tarantola 2009). Correlation measures are not effective at approximating the data in nonlinear problems either, since nonlinearities are poorly taken into account (Maier et al. 2005). In recognition of the nonlinearity issues, remedies have been proposed such as the Morris method and the importance measure. However, while the Morris method can account for nonlinearity, it has an assumption of monotonicity, which is not always true (Maier et al. 2005). In addition, the Morris method cannot differentiate between the effects caused by nonlinearity in the model and parameter interactions (Maier et al. 2005). As for the importance measure, it only provides first-order effects

(parameter interactions are not considered) and is really computationally demanding (Lilburne and Tarantola 2009).

As a result, variance-based GSA has been receiving increased attention recently, where the unconditional variance of the model output is decomposed into terms due to individual factors plus terms due to interaction among factors (Chen et al. 2005). GSA has the capability of accounting for model nonlinearity and nonmonotonicity, regardless of generic assumptions on the underlying model (Maier et al. 2005; Lilburne and Tarantola 2009). In a variety of studies, GSA has been proven to be better than traditional SA (Sallaberry and Helton 2006).

Among all GSA methods, the extended Fourier amplitude sensitivity test (FAST) and Sobol’s total sensitivity analysis method are cited as being the most popular in the literature because of their proven track record of reliable performance. But further comparison also finds several limitations of FAST: (1) FAST assumes uniform parameter probability distributions for input parameters. If the input distribution is not uniform, like most construction project parameters, FAST becomes too sensitive to the parameter range selected (Maier et al. 2005); (2) FAST does not work properly when inputs are not continuous in their ranges (Lilburne and Tarantola 2009); (3) Sobol’s indices are slightly better than FAST indices if a highly nonlinear problem is considered, because variation of a highly nonlinear problem is hard to represent with a monodimensional curve (Sallaberry and Helton 2006); and (4) FAST is often less effective in approaching total sensitivity indices (Sathyanarayanamurthy and Chinnam 2009). Thus, Sobol’s method is a better fit for this study.

Sobol’s total sensitivity index (TSI) method assumes a nonlinear function can be decomposed into summands of orthogonal terms of increasing order, which is called analysis of variance (ANOVA) representation (Sobol 2001)

$$f(x_1, x_2, \ldots, x_m) = f_0 + \sum_{i=1}^{m} f_i(x_i) + \sum_{i_1=1}^{m} \sum_{i_2=i_1+1}^{m} f_{i_1 i_2}(x_{i_1}, x_{i_2}) + \cdots + f_{1,\ldots,m}(x_1, \ldots, x_m) \qquad (3)$$

Assume $x_i$ $(i = 1, 2, \ldots, m)$ are independent random variables with probability density functions $p_i(x_i)$, then the constant term $f_0$ is determined by

$$f_0 = \int f(x) \prod_{i=1}^{m} [p_i(x_i)\,dx_i] \qquad (4)$$

Therefore, the general form of the k-order term of $f(x_1, x_2, \ldots, x_m)$ (a decomposition term depending on k input variables) is given by

$$f_{i_1,\ldots,i_m}(x_{i_1}, \ldots, x_{i_m}) = \int f(x) \prod_{j \neq i_1,\ldots,i_m} [p_j(x_j)\,dx_j] - \sum_{k=1}^{m-1} \sum_{j_1,\ldots,j_k \in (i_1,\ldots,i_m)} f_{j_1,\ldots,j_k}(x_{j_1}, \ldots, x_{j_k}) - f_0 \qquad (5)$$

A key assumption of Sobol’s method is orthogonality, i.e., the terms of $f(x_1, x_2, \ldots, x_m)$ are uncorrelated with each other. As a result, the variance of $f(x_1, x_2, \ldots, x_m)$ can be determined by

$$D = \sum_{i=1}^{m} D_i + \sum_{i_1=1}^{m} \sum_{i_2=i_1+1}^{m} D_{i_1 i_2} + \cdots + D_{1,\ldots,m} \qquad (6)$$

Sensitivity indices are then defined as

$$S_{i_1,\ldots,i_k} = \frac{D_{i_1,\ldots,i_k}}{D} \qquad (7)$$

and the summation of all the sensitivity indices equals 1

$$\sum_{k=1}^{n} \sum_{i_1 < \cdots < i_k}^{n} S_{i_1,\ldots,i_k} = 1 \qquad (8)$$

If $k = 1$, then $S_{i_1,\ldots,i_k}$ is called the main sensitivity index (MSI); if $k \geq 2$, then $S_{i_1,\ldots,i_k}$ is called the interaction sensitivity index (ISI). The TSI is then defined as

$$S_i^{tot} = S_i + \hat{S}_{i,\sim i} = 1 - \hat{S}_{\sim i} \qquad (9)$$

where $\hat{S}_{i,\sim i}$ is the summation of all the $S_{i_1,\ldots,i_k}$ that involve the index i and at least one index from $(1, \ldots, i-1, i+1, \ldots, m)$; $\hat{S}_{\sim i}$ is the summation of all the $S_{i_1,\ldots,i_k}$ that do not involve the index i. $S_i^{tot}$ therefore represents the average variation in the outputs of the model that is attributable to the input variable i through its sole influences and interactions with other variables. It should be noted that TSIs are meaningful only when input variables are independent of each other (Saltelli and Tarantola 2002; Chen et al. 2005; Sobol et al. 2007). Without orthogonality, the sums of squares for different components of the function no longer partition the total sum of squares (Oakley and O’Hagan 2004). Therefore, if the input variables are dependent, orthogonalization is needed. PCA is one of the most commonly used orthogonalization methods, which will be introduced later (Jolliffe 2002). Another key component of a successful global sensitivity analysis is the model that captures the latent relationships between input variables and output observations. An artificial neural network is a feasible modeling approach to reproduce such relationships.

Artificial Neural Network

For many years linear modeling has been the commonly used technique in most modeling domains such as regression analysis (Kouskoulas and Koehn 1974). Scholars found that many construction estimating problems can be solved with regression analysis if linearity is satisfied (Stewart et al. 1995). However, where the linear approximation is not valid these models suffer accordingly (Tam and Fang 1999). Besides, the parametric modeling methodology adopted by most linear models requires a priori knowledge to describe the structure of the model (e.g., a function of relationship between inputs and outputs); it is therefore very easy to build a model that does not fit the particular problem (Adeli and Wu 1998). Since the 1990s, with the development of artificial intelligence and the increasing availability of relevant software packages, the ANN has become a popular topic in construction research, including cost estimating (Adeli and Wu 1998; Ayed 1998; Kim et al. 2005). As a nonparametric approach, ANN does not require prespecified functions, but allows the data to speak for itself (Kim et al. 2005). It makes only a few assumptions on the structure of the model, and typically adapts the structure to accommodate the complexity of data (Friedman et al. 2009). Owing to these features, ANN has been found to fit nonlinear problems well and is usually used to model complex relationships or discover patterns in data (Friedman et al. 2009).

An ANN is a computational model that mimics the structure and functionalities of biological neural networks (Friedman et al. 2009). A typical ANN model consists of an interconnected group of artificial neurons called hidden layer(s), and it processes information through the negative feedback interaction of artificial neurons or

perceptrons (Fig. 3). As a kind of supervised learning, ANN can be regarded as an adaptive system that changes its structure according to external information that flows through the network (Friedman et al. 2009). Among all the available structures of modern ANN models, the best-known example of a neural network structure is the feedforward multilayer perceptron (MLP) neural network shown in Fig. 3 (Auer et al. 2008).

Fig. 3. An ANN model

The target of ANN modeling is to adjust the structure by data training. In a training iteration (or an epoch), values of input variables are weighted and summed up for an artificial neuron. Then an activation function is applied to convert the summation of weighted input to output activation. A commonly used activation function is the hyperbolic tangent function (TanH), which transforms values to be between −1 and 1, and is the centered and scaled version of the logistic function (Xiao et al. 2005)

$$f(X_j) = \mathrm{TanH}\left(\sum_{i=1}^{n} X_i w_{ij} - \theta_{ij}\right) = \frac{e^{2\sum (Xw - \theta)} - 1}{e^{2\sum (Xw - \theta)} + 1} \qquad (10)$$

Then through a linear combination, the output is obtained by

$$f(X_K) = \sum_{j=1}^{m} X_j w_{jk} \qquad (11)$$

where w is the weight between a perceptron and an input or output, and θ is a bias term. The target is to adjust the values of w and θ to minimize the difference between actual observations and trained observations. A commonly used objective function is the least-squares error (Kim et al. 2004). A possibly more robust method is the likelihood approach because composite log likelihood functions for multiple responses can be constructed by adding the individual response log likelihoods without leading to scale-variant estimates (Friedman et al. 2009). To realize the likelihood estimation, a Gaussian log likelihood is built

$$L_{\mathrm{Gaussian}} = n\left[\log(\mathrm{SSE}) + 1 + \log\frac{2\pi}{n}\right] \Big/ 2 \qquad (12)$$

where SSE is the sum of squared errors between actual and predicted observation values, given by

$$\mathrm{SSE} = \sum_{i=1}^{n} [y_i - f(x_i)]^2 \qquad (13)$$

The target of fitting an ANN model is to estimate the model parameters by maximum likelihood.

In the early phase of this study it was found that linear models failed to capture the relationship between project parameters and craft quantities (based on the available data), and therefore cannot be used to generate reliable samples for sensitivity analysis. For example, a linear regression model was built to predict the length of AG conduit (LF) based on 12 project parameters. The R square was 0.59, suggesting that the model explains the data poorly. In contrast, an ANN yields R squares of 0.82 and 0.84 for the training set and testing set, respectively, demonstrating a significant improvement in the goodness of fit. Similar evidence has been found in other crafts, indicating strong nonlinearities embedded in the problem investigated in this study. As a result, ANN should be employed.

The Algorithm

As a response to the research questions, a new similarity measure algorithm is proposed. It has two unique features to enable overall consideration of two spaces (feature space and solution space) and the projection between them: First, Sobol’s total sensitivity approach is utilized to discover the actual influences of input variables on observations. These influences then can be used as weights of distance because they represent relative contributions of input variables to observed variations. Second, the Mahalanobis (1936) distance is used instead of Euclidean distance to eliminate the unnecessary influence of covariance between variables. In order to realize Sobol’s method, as well as for a better decomposition of variations, input variables are orthogonalized in the hope that transformed input variables would not exert interacting influences. Finally, similarity is measured as the weighted Mahalanobis distance (Fig. 4).

With a focus on the case retrieval methodology of the CBR analysis, Fig. 5 shows the major analysis procedure. First, orthogonality of input data is examined, and if input variables are correlated, PCA is applied to orthogonalize data. The transformed data are then used to create an ANN model that establishes the basis for the global sensitivity analysis. Sobol’s method is utilized to construct weights of input variables represented as TSIs of transformed project parameters (principals). Finally, the weighted Mahalanobis distance is calculated as the measurement of similarity between two cases. In the rest of this section each step is explained in detail.

Orthogonality Examination

Orthogonality of input variables is the prerequisite of Sobol’s (2001) total sensitivity method. Independence between the input variables allows a tidy decomposition of the total variance into component variances so that the quantities of total sensitivity indices can be easily obtained (Sobol et al. 2007). Therefore, the orthogonality of input data should be examined prior to the following analysis. In this paper we use Pearson’s correlation coefficient to examine the orthogonality of input variables (Aldrich 1995). It is defined as

$$\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y} \qquad (14)$$

where $\mathrm{cov}(X, Y)$ is the covariance of two variables, and $\sigma_X$ and $\sigma_Y$ are their standard deviations. In practice, the sample Pearson’s correlation coefficient is used as an estimator of population correlation coefficient

$$r_{xy} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{(n-1)\,s_x s_y} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n\sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n\sum y_i^2 - \left(\sum y_i\right)^2}} \qquad (15)$$

where $\bar{x}$ and $\bar{y}$ are the sample means of two variables, and $s_x$ and $s_y$ are sample standard deviations. Besides Pearson’s correlation coefficient, a scatter graph is also plotted to demonstrate the relationships among input variables.

Fig. 4. Geometric view of the proposed algorithm [diagram labels: Orthogonalize; Global Sensitivity Analysis; Assign weights (ω1 to ω5); Mahalanobis Distance]

Fig. 5. Analysis flowchart of the proposed algorithm

Table 1. Crafts Used for Testing Proposed Algorithm
| Number | Craft | Description | UOM |
|---|---|---|---|
| 1 | Concrete PFC | Volume of concrete pouring, finishing and curing | CY |
| 2 | Concrete embedded steel | Volume of concrete reinforcing | CY |
| 3 | Concrete formwork | Area size of concrete formwork | SF |
| 4 | Steel erect | Weight of major structural steel erected on site | TN |
| 5 | Steel grating | Area size of steel grating | SF |
| 6 | Steel pipe racks | Weight of steel pipe racks | TN |
| 7 | Total steel | Total steel weight | TN |
| 8 | Complete piping | Total length of all main pipes | LF |
| 9 | Main circulating water system | Length of circulating water system | LF |
| 10 | Small bore pipe | Length of small bore pipe | LF |
| 11 | Large bore pipe | Length of large bore pipe | LF |
| 12 | Total cable | Total length of main cables | LF |
| 13 | Cable 600V and above | Length of cables for 600V and above transmission | LF |
| 14 | Cable below 600V | Length of cables for below 600V transmission | LF |
| 15 | Control cable | Length of cables used for controlling system | LF |
| 16 | Instrument cable | Length of cables used for instruments such as sensors | LF |
| 17 | Cable tray | Length of cable tray | LF |
| 18 | A/G conduit | Length of above ground conduit | LF |
| 19 | U/G conduit w/civil | Length of underground conduit | LF |
| 20 | Install instruments | Count of installed instruments | EA |

Table 2. Project Parameters Used in the Analysis
| Number | Symbol | Description | Data type | Convert to quantitative variables |
|---|---|---|---|---|
| 1 | MW | The megawatt reading of the project | Numeric | N/A |
| 2 | Type | The nature of the job | Nominal | Coding |
| 3 | ENG | Engineering company in the project | Nominal | Coding |
| 4 | Config | The mechanical equipment configuration of the project | Nominal | Coding |
| 5 | Equip | Total number of mechanical equipment | Numeric | N/A |
| 6 | Layout | Job site layout classification | Nominal | Coding |
| 7 | Stack | Measurement of center line (feet) of outside stacks in a multiple unit configuration | Numeric | N/A |
| 8 | Vendor | The vendor of steam turbine | Nominal | Coding |
| 9 | Status | Refers to whether the installation of a project is at an existing facility | Nominal | Dummy variables |
| 10 | TowerCells | Total number of cooling tower cells | Numeric | N/A |
| 11 | MainRack | End-to-end length in linear feet of the main pipe rack connecting the multiple units | Numeric | N/A |
| 12 | RadiusSum | Linear feet measurement from a central location to the major work areas of control room, water treatment facility, storage tanks and cooling tower | Numeric | N/A |

Orthogonalization

If the requirement of orthogonality is not met, orthogonalization should be performed to make transformed input variables independent of each other. In this paper, we propose the use of PCA as the orthogonalization method (Jolliffe 2002). Building on the eigenvalue decomposition of the covariance matrix, PCA enables a linear transformation to convert correlated variables into a set of uncorrelated variables called principal components (Jolliffe 2002). This transformation is defined in such a way that the first principal component accounts for as much of the variability in the data as possible, i.e., has the highest variance, and each succeeding principal component in turn has the highest variance subject to the constraint that it be orthogonal to the preceding components (Jolliffe 2002). Consider the form of the first principal component as a linear transformation of X, i.e., $\alpha_1' x$; per the definition of PCA we know $\alpha_1$ maximizes $\mathrm{var}[\alpha_1' x] = \alpha_1' \Sigma \alpha_1$ under the constraint $\alpha_1' \alpha_1 = 1$. To find $\alpha_1$, the Lagrange multiplier is used

$$\alpha_1' \Sigma \alpha_1 - \lambda(\alpha_1' \alpha_1 - 1) \qquad (16)$$

where $\lambda$ is the Lagrange multiplier. Following Eq. (16), we know

$$(\Sigma - \lambda I_p)\,\alpha_1 = 0 \qquad (17)$$

where $I_p$ is a $p \times p$ identity matrix. Here $\lambda$ is an eigenvalue of $\Sigma$, and $\alpha_1$ is the corresponding eigenvector. Our target is to find the maximum $\lambda$ so that the first principal component is obtained. For the succeeding principal components the above procedure is repeated with the constraint

$$\mathrm{cov}[\alpha_i' x, \alpha_j' x] = 0, \quad (0 < i, j \leq p,\; i \neq j) \qquad (18)$$

Finally we define a matrix A which contains all the $\alpha_i$, which can give

$$P_{1 \times n} = X_{1 \times n} \times A_{n \times n} \qquad (19)$$

where X is the standardized values of input variables; P is the transformed uncorrelated variables (i.e., principals). A is a linear transformation matrix on original input data, i.e., a matrix used to transform original correlated input variables to uncorrelated
variables (principals). A is referred to as the matrix of orthogonal transformation (MOT) in this paper.

ANN Modeling
Orthogonalized input variables are then used to build ANN models. There is no appropriate rule for determining the proper number of neurons (Kim et al. 2004). Too few neurons may be inefficient with respect to learning, but too many neurons could lead to overfitting (Tetko et al. 1995). Following a general rule (NeuralWare 2014), we determine the number of neurons as 2/3 of the total number of inputs and outputs. The other issue that needs special attention is the stopping condition of an ANN model. In the so-called generalized delta rule, the gradient vector of the error surface is calculated, which points along the line of the steepest descent from the current point to optimum. The difficulty is to determine the condition to stop (Bishop 1995). In this paper, we propose the use of a quasi-Newton method, BFGS (Broyden-Fletcher-Goldfarb-Shanno), to optimize the objective function. It has been proven that BFGS performs
Fig. 6. Correlation scatter graph of original input data
Table 3. Eigenvalues of Principal Component Analysis

Principal component 1 2 3 4 5 6 7 8 9 10 11 12
λ 5.892 2.023 1.247 0.851 0.502 0.45 0.35 0.268 0.183 0.146 0.065 0.03
% of variance 49.1 16.86 10.39 7.095 4.183 3.74 2.88 2.231 1.522 1.215 0.54 0.25
Cumulative % 49.1 65.96 76.35 83.44 87.63 91.4 94.2 96.47 97.99 99.21 99.75 100
significantly better than back propagation by allowing faster convergence (Bishop 1995). In addition, to combat the overfitting problem, a penalty function is applied to enhance the objective function (Setiono 1997).

Obtaining Total Sensitivity Indices
On the basis of the developed ANN models, Sobol’s total sensitivity method is applied to find the TSIs for transformed input variables (i.e., principal components). SimLab (2011), an open source uncertainty analysis tool, was used for this computation. In order to perform Monte Carlo simulation, probability distributions of principal components should be identified at the beginning of the analysis (Mooney 1997). A chi-square test is conducted to examine whether a fitted distribution is a good representation of a principal component (Greenwood and Nikulin 1996). After obtaining the TSIs of all the principals, we construct a matrix S to show the weights for the transformed axes

\[ S_{n \times n} = \begin{pmatrix} s_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & s_n \end{pmatrix} \tag{20} \]

where $s_i$ $(i = 1, \ldots, n)$ are the TSIs obtained from Sobol’s global sensitivity analysis. S is a diagonal matrix showing the contributions of different principal components to the variations of observations (i.e., craft quantities in this paper), and thus is referred to as the matrix of principals weights (MPW) in this paper. It is worth noting that MPW is only applicable for transformed input variables, i.e., variables after orthogonalization. As a result, it should be converted to the matrix applicable for original input variables.

Calculating Weighted Mahalanobis Distance
The final step aims to calculate the ultimate measurement of similarity based on MPW. If an MPW is known for the problem, the weighted Mahalanobis distance can be obtained by
Fig. 7. Correlation scatter graph of input data after orthogonalization
\[ \mathrm{Dist}_w = (P \times S \times S^T \times P^T)^{1/2} = (X \times A \times S \times S^T \times A^T \times X^T)^{1/2} \tag{21} \]

Then we define another matrix W to demonstrate the weights of original input variables, referred to as matrix of original weights (MOW). Following its definition, we know that

\[ \mathrm{Dist}_w = (X \times W \times W^T \times X^T)^{1/2} \tag{22} \]

Given Eqs. (21) and (22), we know that

\[ X \times W \times W^T \times X^T = X \times A \times S \times S^T \times A^T \times X^T \tag{23} \]

Finally, W is determined by

\[ W = A \times S \tag{24} \]

In the analysis, we can use W (MOW) to calculate the weighted Mahalanobis distance directly from the values of the standardized input variables [see Eq. (22)]. The weighted Mahalanobis distance then can be used as the similarity measure between two cases. If we define the maximum weighted Mahalanobis distance as 100%, a similarity measure is obtained by comparing each distance with it

\[ \mathrm{Sim}_i = 1 - \frac{\mathrm{Dist}_w^i}{\max(\mathrm{Dist}_w^i)} = 1 - \frac{\sqrt{(\vec{x}_n - \vec{x}_i) \times W \times D^{-1} \times W^T \times (\vec{x}_n - \vec{x}_i)^T}}{\max\left(\sqrt{(\vec{x}_n - \vec{x}_i) \times W \times D^{-1} \times W^T \times (\vec{x}_n - \vec{x}_i)^T}\right)} \tag{25} \]

where D is the covariance matrix. $\mathrm{Sim}_i$ is used as the improved similarity measurement in CBR.

Application

Background
We applied the proposed algorithm to the quantity takeoff of a power plant project (project 0) to demonstrate its applicability. Quantity takeoff in conceptual estimating is complicated, involving the consideration of many unique factors such as actual geographic conditions of the job site. However, for project 0 only limited information was available and the time frame for developing the proposal was very short. Even though parametric estimating tools were available, estimators still tended to validate the estimates on the basis of historically similar projects. It was critical to find similar projects as a reference for the quantity takeoff of project 0. Twenty crafts were selected to examine the effectiveness and usefulness of the proposed algorithm (Table 1). For typical EPC (engineering, procurement, and construction) projects, these crafts account for a significant portion of direct labor hours and thus can be regarded as a fair representation of the quantity usage of the entire project.
Following the Project Management Institute (2008), project parameters are “project characteristics that can be used to develop mathematical models to predict total project costs.” They are physical attributes of a project that are considered to be major cost drivers such as project size, project type, and location (NASA 2008). Statistical relationships may be found between these attributes and final construction costs (Ayed 1998). Based on expert opinions and a preliminary MANOVA (multivariate analysis of variance) (Freund and Littell 1981), 12 project parameters (Table 2) were selected as explanatory factors that are believed to affect the quantities of 20 crafts. Then data from 47 historical projects (1993 through 2010) were collected for the analysis.

Applying the Algorithm
Pearson’s correlation analysis was performed to examine the orthogonality of project parameters (Fig. 6). Among 12 project
Fig. 8. PDFs of transformed project parameters (principal)
parameters, 6 are qualitative variables. Therefore, they need to be converted into quantitative variables for further analysis. Two methods have been used to deal with qualitative variables: the “dummy variable” method (Draper et al. 1966) and the “coding method” (Ayed 1998). A dummy variable is a quantitative variable (aka an indicator variable) that takes the values 0 or 1 to replace qualitative variables, including nominal and ordinal variables (Draper et al. 1966). For certain nonnumeric variables the dummy variable method was employed. For example, there are two statuses for a new project: greenfield (brand new construction) or brownfield (construction on existing facilities). A dummy variable “status” was created with 0 designating “greenfield” and 1 designating “brownfield.” For other nonnumeric variables with too many statuses, following Ayed (1998), a coding method was adopted to simply code qualitative variables as numbers. For example, engineering partners have been coded as numbers ranging from 1 to 6. Table 2 lists the conversion method used for each nominal project parameter.
A MANOVA test finds strong relationships among input variables, such as between configuration and total number of equipment. Therefore, PCA was performed to orthogonalize the data. We keep the maximum number of principal components (i.e., the same number as the original input variables) since the target is not dimensionality reduction (Table 3).
The MOT matrix is finally obtained as follows:
\[ A = \begin{pmatrix}
0.367 & 0.209 & 0.083 & 0.058 & 0.210 & 0.137 & -0.010 & -0.045 & -0.152 & -0.533 & -0.656 & -0.096 \\
0.285 & -0.331 & -0.185 & 0.037 & 0.604 & -0.050 & -0.016 & 0.091 & 0.584 & -0.140 & 0.191 & -0.013 \\
0.002 & -0.030 & 0.749 & -0.526 & 0.311 & -0.017 & 0.128 & 0.145 & -0.113 & 0.049 & 0.108 & 0.018 \\
0.365 & -0.181 & -0.160 & -0.161 & 0.045 & 0.297 & -0.066 & 0.053 & -0.134 & 0.490 & -0.268 & 0.596 \\
0.382 & 0.105 & -0.054 & -0.065 & -0.016 & 0.386 & -0.101 & 0.051 & -0.065 & 0.380 & 0.119 & -0.715 \\
0.041 & 0.628 & 0.240 & 0.172 & 0.005 & -0.200 & -0.089 & -0.010 & 0.534 & 0.365 & -0.203 & 0.096 \\
0.293 & 0.426 & 0.066 & 0.201 & 0.064 & 0.285 & -0.057 & -0.126 & -0.174 & -0.256 & 0.616 & 0.328 \\
0.284 & -0.264 & 0.264 & 0.151 & -0.144 & -0.390 & -0.748 & -0.085 & -0.124 & -0.007 & 0.048 & -0.013 \\
-0.011 & -0.355 & 0.461 & 0.697 & -0.049 & 0.272 & 0.283 & 0.000 & 0.036 & 0.113 & -0.062 & -0.002 \\
0.347 & -0.031 & -0.042 & 0.003 & 0.067 & -0.470 & 0.461 & -0.609 & -0.176 & 0.176 & 0.044 & -0.062 \\
0.310 & -0.156 & 0.132 & -0.304 & -0.656 & 0.145 & 0.148 & -0.109 & 0.462 & -0.258 & 0.034 & 0.051 \\
0.346 & 0.066 & -0.074 & 0.129 & -0.171 & -0.391 & 0.291 & 0.746 & -0.158 & -0.018 & 0.075 & 0.011
\end{pmatrix} \]
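The orthogonalization of Eqs. (16) through (19) can be illustrated with a short Python sketch. It is not the authors' implementation (the paper does not say which software performed the PCA): it assumes NumPy, uses synthetic data in place of the 47 projects, and the function name `orthogonalize` is hypothetical. It also checks one property of Table 3, namely that the eigenvalues of 12 standardized variables sum to roughly 12, so the first component's share is about 5.892/12.

```python
import numpy as np

def orthogonalize(raw):
    """Standardize the columns of `raw` and rotate them onto principal axes.

    Returns (P, A, eigvals): scores P = X @ A as in Eq. (19), the
    transformation matrix A (eigenvectors as columns), and the eigenvalues
    sorted from largest to smallest.
    """
    X = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
    sigma = np.cov(X, rowvar=False)            # correlation matrix of raw
    eigvals, eigvecs = np.linalg.eigh(sigma)   # solves Eq. (17); ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, A = eigvals[order], eigvecs[:, order]
    return X @ A, A, eigvals

# Synthetic stand-in for the project parameters (NOT the paper's data).
rng = np.random.default_rng(0)
base = rng.normal(size=(47, 3))
raw = np.column_stack([base[:, 0],
                       base[:, 0] + 0.3 * base[:, 1],   # correlated with column 0
                       base[:, 2]])

P, A, eigvals = orthogonalize(raw)
cov_P = np.cov(P, rowvar=False)
off_diag = cov_P - np.diag(np.diag(cov_P))     # should vanish, cf. Eq. (18)

# Eigenvalues reported in Table 3; for standardized inputs they sum to about 12.
table3 = np.array([5.892, 2.023, 1.247, 0.851, 0.502, 0.45,
                   0.35, 0.268, 0.183, 0.146, 0.065, 0.03])
share = 100 * table3 / table3.sum()
print(np.abs(off_diag).max(), round(table3.sum(), 3), np.round(share[:3], 1))
```

Keeping all components, as the paper does, means `A` is a full rotation, so no information is discarded and distances in the rotated space can be mapped back to the original parameters.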
Table 4. Sobol’s TSIs of Transformed Project Parameters (Principal)

Principals | Total steel | Total cable | Complete piping | Concrete PFC | Concrete formwork | Instrumentation | Steel grating | Cable 600V− | Small bore | Pipe racks
Principal1 0.568 −0.108 0.030 0.394 0.005 0.016 0.164 0.232 0.676 0.072
Principal2 0.098 0.198 0.385 0.051 0.124 0.054 0.060 0.128 0.208 0.265
Principal3 0.867 0.869 0.844 0.815 0.632 0.741 0.954 0.791 0.580 0.766
Principal4 0.044 −0.033 0.274 0.076 0.145 0.045 0.020 0.049 0.007 0.061
Principal5 0.122 0.073 0.158 0.027 0.053 0.024 0.049 0.058 0.010 0.024
Principal6 0.251 0.175 0.025 0.098 0.170 0.175 −0.005 0.026 0.023 0.029
Principal7 0.037 0.254 0.141 0.025 0.021 0.073 0.146 0.088 0.009 0.258
Principal8 0.133 −0.030 0.011 0.128 0.117 0.041 0.028 0.054 0.013 0.114
Principal9 0.092 0.029 0.087 0.075 0.135 0.667 0.088 0.039 0.005 0.041
Principal10 0.072 0.255 0.161 0.134 0.108 0.044 0.091 0.054 0.035 0.016
Principal11 0.093 0.003 0.089 0.024 0.121 0.147 0.065 0.062 0.018 0.018
Principal12 0.086 −0.118 0.161 0.188 0.125 0.010 0.004 0.052 0.009 0.093
Principals | Cable 600V+ | Concrete embedded steel | Erect steel | Large bore | Instrument cable | Cable tray | Control cable | Main circ | U/G conduit | A/G conduit
Principal1 0.330 0.013 0.254 0.381 0.251 0.060 0.372 0.414 0.141 0.125
Principal2 0.018 0.008 0.064 0.072 0.131 0.256 −0.082 0.060 0.024 0.003
Principal3 0.962 1.022 0.539 0.886 0.623 0.752 0.741 1.029 0.815 0.869
Principal4 0.098 0.007 0.127 0.104 0.159 0.009 0.041 0.186 0.067 −0.012
Principal5 0.003 0.000 0.099 0.105 0.017 0.017 0.012 0.152 0.310 0.021
Principal6 0.025 0.002 0.048 0.011 0.092 0.023 0.023 0.245 0.078 0.009
Principal7 0.015 0.010 0.053 0.231 0.034 0.001 0.039 0.027 0.271 0.021
Principal8 0.040 −0.010 0.104 0.273 0.004 0.250 0.142 0.159 0.095 0.027
Principal9 −0.003 −0.002 0.293 0.137 0.011 0.058 0.059 −0.019 0.055 0.010
Principal10 0.062 0.001 0.095 −0.049 0.355 0.062 0.044 0.012 0.222 0.040
Principal11 0.177 −0.004 0.025 0.017 0.091 0.053 0.088 −0.008 0.254 0.105
Principal12 0.044 −0.008 0.226 0.013 0.094 0.515 0.063 0.060 0.095 0.007
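To make the meaning of a total sensitivity index concrete, the following Python sketch estimates TSIs by Monte Carlo. It is only an illustration under stated assumptions: the paper used SimLab with quasirandom sampling, whereas this sketch uses the Jansen estimator with pseudo-random NumPy sampling, and a hypothetical linear toy model stands in for a trained ANN. The names `total_sensitivity`, `toy_model`, and `standard_normal_inputs` are invented for the example.

```python
import numpy as np

def total_sensitivity(model, sample_inputs, n, rng):
    """Monte Carlo estimate of Sobol total sensitivity indices (Jansen estimator).

    model:         function mapping an (n, k) array to n outputs
    sample_inputs: function (n, rng) -> (n, k) array of independent inputs
    """
    A = sample_inputs(n, rng)
    B = sample_inputs(n, rng)
    y_A = model(A)
    var_y = np.var(np.concatenate([y_A, model(B)]), ddof=1)
    k = A.shape[1]
    tsi = np.empty(k)
    for i in range(k):
        AB = A.copy()
        AB[:, i] = B[:, i]                 # resample only input i
        tsi[i] = np.mean((y_A - model(AB)) ** 2) / (2.0 * var_y)
    return tsi

# Hypothetical stand-in for a trained ANN: y = 3*x0 + x1 + 0*x2.
def toy_model(x):
    return 3.0 * x[:, 0] + 1.0 * x[:, 1] + 0.0 * x[:, 2]

def standard_normal_inputs(n, rng):
    return rng.normal(size=(n, 3))

rng = np.random.default_rng(42)
tsi = total_sensitivity(toy_model, standard_normal_inputs, 20000, rng)
print(np.round(tsi, 3))   # analytic values for this toy model: 0.9, 0.1, 0.0
```

The inputs must be independent for this decomposition to be interpretable, which is why the paper orthogonalizes the project parameters first. Some other estimators of the same index can return slightly negative values or values above 1 at finite sample sizes; this may be why Table 4 contains a few such entries, although the paper does not say.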
Fig. 9. Examples of MPW and MOW (total steel)
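The construction of MPW and MOW and their use in the similarity measure can be sketched as follows. This is a minimal Python illustration, not the authors' code: it assumes NumPy, applies the distance of Eq. (22) to the difference between the new case and each historical case, and omits the $D^{-1}$ factor of Eq. (25) as a simplifying assumption. The two-parameter data, the rotation used as MOT, and the function name `weighted_similarity` are hypothetical.

```python
import numpy as np

def weighted_similarity(X_hist, x_new, A, tsi):
    """Percent similarity of each historical case to the new case.

    X_hist: (m, n) standardized parameters of m historical projects
    x_new:  (n,)   standardized parameters of the new project
    A:      (n, n) matrix of orthogonal transformation (MOT)
    tsi:    (n,)   total sensitivity indices of the principal components
    """
    S = np.diag(tsi)                       # MPW, Eq. (20)
    W = A @ S                              # MOW, Eq. (24)
    diff = x_new - X_hist                  # one row per historical project
    dist = np.sqrt(np.sum((diff @ W) ** 2, axis=1))   # Eq. (22) on differences
    return 100.0 * (1.0 - dist / dist.max()), W

# Hypothetical two-parameter example (not the paper's data).
theta = np.pi / 4
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
tsi = np.array([0.9, 0.1])
x_new = np.array([0.0, 0.0])
X_hist = np.array([[1.0, 1.0],     # differs along principal axis 1 (heavily weighted)
                   [1.0, -1.0],    # differs along principal axis 2 (lightly weighted)
                   [0.1, 0.1]])    # close to the new project

sim, W = weighted_similarity(X_hist, x_new, A, tsi)
print(np.round(sim, 1))
```

In this toy setting the first two historical cases are equally far from the new case in the unweighted sense, yet the second scores about 89% because it differs only along the axis with a low TSI, while the first, which differs along the influential axis, is the least similar case and scores 0%.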
Table 5. Similarity Measures of 47 Projects on a Craft Basis

Job | Standardized Euclidean distance (%) | Complete piping (%) | Total cable (%) | Total steel (%) | Concrete PFC (%) | Concrete formwork (%) | Instrument (%) | Steel grating (%) | Cable 600V− (%) | Small bore (%) | Pipe racks (%)
1 9.2 51.9 66.2 23.4 33.7 60.0 68.0 65.0 51.8 19.7 67.5
2 57.3 56.8 56.1 47.9 48.9 56.5 53.5 55.4 52.3 53.0 56.7
3 11.3 52.3 60.6 21.0 30.7 57.0 62.8 59.7 47.9 18.3 63.7
4 52.2 64.6 64.0 54.8 57.5 64.9 65.7 64.9 61.6 58.6 65.8
5 50.0 70.4 70.8 52.7 57.5 72.2 73.2 70.1 64.8 53.1 72.1
6 53.4 71.1 89.1 84.5 86.5 77.6 85.1 94.4 90.1 86.6 87.7
7 5.1 39.2 49.3 5.2 17.4 54.3 51.4 50.8 35.5 0.0 45.2
8 36.4 55.0 58.6 57.1 59.0 62.5 66.2 66.2 62.7 59.8 59.2
9 44.3 45.4 45.1 51.9 48.8 44.0 47.5 47.0 46.6 69.1 45.8
10 13.2 27.0 62.4 58.7 66.6 64.6 85.3 84.1 67.2 50.5 45.5
11 43.9 26.0 28.9 41.1 33.8 27.0 29.8 29.4 29.1 63.7 27.8
12 33.0 33.4 33.9 27.4 27.5 34.7 37.0 34.0 30.4 36.8 34.3
13 61.5 66.6 69.8 67.5 67.8 69.3 67.5 71.0 69.0 71.4 68.9
14 63.5 73.0 71.5 80.7 78.3 75.5 77.3 75.6 76.2 88.1 69.3
15 69.2 63.0 62.7 66.9 64.2 62.6 64.1 63.1 62.7 76.8 62.4
16 11.5 12.8 42.5 55.8 56.7 44.5 53.6 57.0 49.3 57.1 28.5
17 48.0 37.8 39.5 50.8 45.3 39.1 40.7 41.3 41.3 69.6 39.1
18 27.5 43.7 48.9 52.4 53.1 50.1 54.8 54.0 51.9 63.2 48.1
19 42.7 0.0 0.0 17.6 7.4 0.0 0.0 0.0 0.0 49.4 0.0
20 61.3 65.4 68.5 70.8 69.9 68.1 67.8 70.8 69.5 76.9 67.7
21 41.7 52.1 51.2 41.5 44.7 51.4 53.0 52.9 49.0 47.8 52.2
22 0.0 28.2 39.7 4.7 14.1 41.3 45.6 40.6 28.4 3.1 36.8
23 31.6 61.3 82.7 82.0 83.8 71.9 83.7 86.1 83.9 91.2 84.2
24 21.7 52.3 65.8 35.2 45.5 73.3 74.3 74.6 59.8 29.0 60.3
25 15.0 38.4 66.9 39.9 50.3 70.6 82.5 79.3 61.5 31.9 53.7
26 15.5 47.7 52.2 14.9 25.0 56.7 50.4 52.9 40.7 11.4 50.5
27 14.5 37.7 66.6 40.0 50.3 70.3 82.6 79.2 61.4 32.0 53.1
28 4.6 20.6 57.9 58.5 66.7 59.2 80.2 82.5 65.4 51.0 40.1
29 62.6 65.8 69.2 70.2 69.9 69.0 66.8 71.3 69.8 75.4 68.3
30 11.9 51.7 64.1 14.8 28.2 69.7 76.7 65.9 48.2 7.8 59.3
31 72.7 77.9 79.6 69.1 66.2 64.0 64.4 64.5 64.3 79.5 64.0
32 15.4 48.0 63.4 21.8 33.6 68.5 75.7 67.0 50.9 15.1 57.5
33 69.7 81.4 85.5 75.0 78.5 87.6 86.3 88.5 83.7 73.2 84.6
34 27.4 18.5 19.6 20.0 17.7 20.3 23.4 20.1 17.3 35.8 19.7
35 16.3 51.4 63.3 26.4 35.4 59.8 66.8 62.5 51.0 23.2 63.0
36 17.1 47.0 55.0 39.8 46.0 51.9 51.2 59.8 53.8 43.0 55.1
37 45.6 62.4 64.7 59.5 63.2 64.2 66.3 69.2 65.9 65.0 66.1
38 14.6 51.6 60.2 30.7 37.9 55.2 59.5 60.5 51.2 30.2 59.4
39 54.5 62.9 63.7 62.9 64.1 61.2 60.3 66.2 64.8 72.0 65.3
40 51.9 45.9 46.3 51.2 48.6 44.4 46.2 47.3 46.6 66.9 47.5
41 54.5 46.4 47.2 53.5 50.2 45.4 47.4 48.1 47.6 69.8 48.3
42 6.6 2.3 8.6 0.0 0.0 8.7 12.9 9.2 3.6 12.0 7.4
43 22.7 64.3 64.1 28.1 40.0 83.3 75.4 76.5 60.3 22.3 80.7
44 24.8 57.3 78.7 51.5 57.1 66.3 66.8 78.2 69.2 49.0 77.1
45 16.6 59.0 73.3 26.9 37.8 68.0 75.7 70.8 56.6 21.8 74.2
46 53.1 54.6 59.0 56.2 55.9 58.3 56.2 59.9 57.4 61.7 58.2
47 16.4 29.0 38.9 21.0 26.6 42.4 46.2 42.5 34.1 22.1 34.3
Table 5. (Continued.)
Job | Cable 600V+ (%) | Concrete embedded steel (%) | Erect steel (%) | Large bore (%) | Instrument cable (%) | Cable tray (%) | Control cable (%) | Main circ (%) | U/G conduit (%) | A/G conduit (%)
1 45.6 74.1 30.9 37.1 36.0 64.2 33.1 37.3 59.9 67.6
2 50.6 57.1 48.5 49.4 50.8 56.3 48.8 49.9 56.0 55.9
3 41.9 67.6 29.0 34.0 32.3 61.9 30.2 34.6 55.2 62.0
4 60.2 67.0 57.3 57.9 57.6 65.4 57.5 57.9 59.9 65.6
5 62.5 73.4 57.1 58.5 60.9 70.3 57.0 59.4 62.1 71.2
6 88.4 97.0 78.0 84.7 77.2 87.2 87.8 80.8 87.8 95.5
7 31.0 60.3 18.1 21.9 24.0 45.8 15.6 24.8 50.3 53.8
8 62.5 69.6 57.4 58.3 49.1 61.4 59.6 59.9 59.8 68.2
9 46.5 47.5 48.1 45.6 47.4 44.3 49.5 45.2 43.5 47.4
10 74.9 96.8 62.5 67.4 56.6 46.6 64.0 67.8 81.7 89.5
11 29.2 29.5 32.8 30.0 32.2 25.3 34.4 30.1 30.2 29.4
12 28.5 36.2 27.5 26.9 28.2 35.2 27.6 27.5 31.6 34.8
13 68.4 72.2 66.3 67.1 66.6 68.2 67.6 67.7 69.9 71.4
14 77.1 77.4 76.8 72.1 77.1 75.1 78.3 76.8 70.1 77.3
15 62.5 63.5 64.2 62.2 64.1 62.4 64.4 62.7 62.9 63.4
16 56.5 58.8 53.3 55.5 45.9 29.3 55.1 54.3 58.1 58.4
17 41.2 41.8 43.8 42.3 42.7 41.0 46.2 41.7 40.1 41.8
18 52.7 55.9 50.5 50.0 49.8 51.0 53.6 49.3 41.3 55.2
19 0.0 0.0 7.2 4.3 6.0 0.0 8.5 2.9 0.0 0.0
20 69.6 71.6 68.4 68.3 67.0 67.2 70.0 68.9 69.5 71.2
21 47.5 55.2 45.5 45.2 46.4 52.2 44.7 44.6 51.7 53.7
22 24.4 47.4 14.0 17.2 18.2 37.1 12.9 18.8 36.5 42.8
23 83.6 87.8 69.5 78.1 71.2 83.8 85.8 73.6 58.3 87.2
24 57.8 87.3 45.5 48.6 44.9 61.6 44.0 51.9 71.1 78.8
25 61.9 97.4 48.7 53.2 49.4 54.6 48.1 55.4 75.7 84.0
26 36.5 60.8 24.9 28.2 30.8 51.0 23.6 31.2 50.6 55.6
27 62.0 97.1 48.7 53.2 49.3 53.9 48.1 55.4 75.9 84.0
28 75.1 92.4 61.6 66.0 54.5 40.0 64.0 66.3 80.4 87.8
29 69.8 72.2 68.0 69.2 67.8 68.9 70.1 69.0 70.1 71.8
30 43.2 79.4 29.8 33.1 35.7 60.3 26.3 35.6 64.2 69.8
31 64.1 64.7 66.0 64.5 65.9 64.3 66.5 64.6 64.3 64.6
32 47.0 78.6 34.6 38.1 37.9 58.8 31.9 40.5 65.9 70.5
33 82.8 92.0 78.2 79.8 76.9 84.3 78.1 80.8 87.0 89.7
34 16.2 21.6 18.0 16.2 18.3 20.0 18.0 15.8 16.6 20.7
35 45.8 69.9 33.9 38.0 37.4 62.7 34.5 39.1 59.7 64.7
36 51.4 64.2 41.6 46.0 44.9 60.0 46.3 44.6 54.2 61.8
37 65.6 70.9 62.7 64.0 61.1 65.9 63.7 61.5 63.0 69.8
38 47.1 66.2 35.6 37.1 39.8 54.3 37.0 40.3 57.3 62.2
39 64.3 67.0 62.6 64.0 63.1 65.4 64.9 61.2 65.9 66.5
40 46.0 47.7 48.3 47.7 47.4 47.6 49.5 45.3 46.8 47.4
41 47.2 48.3 49.7 49.1 48.7 48.4 51.1 46.7 47.5 48.1
42 1.3 12.0 0.0 0.0 0.0 7.7 0.0 0.0 9.3 10.1
43 54.4 97.1 39.3 42.5 48.2 74.7 38.3 46.7 65.0 80.6
44 64.6 84.7 49.7 57.0 53.2 75.0 56.9 57.4 73.5 80.1
45 50.0 82.3 35.8 41.4 41.3 77.7 36.9 42.0 66.6 74.0
46 56.5 61.2 53.8 55.6 54.5 57.7 56.0 55.7 60.0 60.4
47 32.6 47.0 27.3 27.8 28.8 35.9 25.5 29.5 41.3 44.3
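The percent similarities in Table 5 translate into the rank orders of Table 6 by sorting each craft column from most to least similar. The Python sketch below shows that step for the five jobs with the highest total steel similarity in Table 5. It assumes NumPy; the helper name `similarity_ranks` and the tie-handling rule are assumptions, since the paper does not describe how ties were treated.

```python
import numpy as np

def similarity_ranks(similarity):
    """Rank cases from most similar (rank 1) to least similar; ties keep input order."""
    similarity = np.asarray(similarity, dtype=float)
    order = np.argsort(-similarity, kind="stable")
    ranks = np.empty(len(similarity), dtype=int)
    ranks[order] = np.arange(1, len(similarity) + 1)
    return ranks

# Total steel similarities (%) of five jobs taken from Table 5.
jobs = [6, 14, 20, 23, 33]
total_steel = [84.5, 80.7, 70.8, 82.0, 75.0]
for job, rank in zip(jobs, similarity_ranks(total_steel)):
    print(job, rank)
```

Because these five jobs hold the five largest total steel similarities among the 47 projects, their ranks within this subset coincide with their total steel ranks in Table 6.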
Fig. 7 shows the scatter graph after orthogonalization. As shown, there is no correlation between the transformed input variables (or transformed project parameters, i.e., principal components 1 through 12). Then a set of ANN models was built based on the transformed project parameters (principals) and actual craft quantities of 47 historical projects. Holdback validation was applied and the result shows that the ANN models fit the data well. Because principal components 1 through 3 explain 76% of the total variation, profiler graphs of the ANN models on these three principal components were plotted to check the nonlinearity of the problem (not shown in this paper). The profiler graphs indicate high nonlinearity of the problem. This is a strong signal that GSA should be utilized (Reedijk 2000). In order to realize Sobol’s total sensitivity analysis, probability density functions (PDFs) of transformed project parameters (principals) were fitted (Fig. 8). These PDFs were used as random generators of a Monte Carlo process.
The Sobol’s TSIs of transformed project parameters (principals) pertaining to 20 crafts were then computed using SimLab (2011). For each craft, 6,656 samples were generated by a quasirandom sampling process. Table 4 shows Sobol’s TSIs for each craft.
Based on Sobol’s TSIs, the MPW matrices for 20 crafts were constructed. MPW matrices are diagonal matrices with the TSIs on the diagonal. Following Eq. (24), the MOW matrices were obtained by multiplying MOT and MPW. Fig. 9 shows examples of MPW and MOW matrices of total steel. The MOW matrices were finally applied directly to original data to calculate weighted Mahalanobis distances following Eq. (22). Final percent similarities and similarity ranking were calculated based on Eq. (25) as
Table 6. Similarity Rankings of 47 Projects on a Craft Basis
Job | Standardized Euclidean distance | Complete piping | Total cable | Total steel | Concrete PFC | Concrete formwork | Instrument | Steel grating | Cable 600V− | Small bore | Pipe racks
1 43 23 14 37 36 24 15 22 27 40 12
2 8 17 30 25 24 30 32 32 25 23 26
3 42 21 25 39 38 28 26 30 34 41 17
4 13 9 19 17 16 17 23 23 17 20 14
5 15 5 8 19 15 6 14 16 12 22 7
6 11 4 1 1 1 3 3 1 1 3 1
7 45 34 34 45 44 32 34 36 40 47 37
8 22 18 28 14 14 22 22 20 16 19 23
9 18 32 38 21 25 39 37 39 37 13 35
10 39 41 24 12 9 18 2 4 9 25 36
11 19 42 44 27 35 44 44 44 43 16 44
12 23 38 43 34 40 43 43 43 42 30 41
13 6 6 9 8 7 11 17 13 8 10 9
14 4 3 7 3 4 4 8 10 4 2 8
15 3 11 23 9 11 21 25 25 15 6 19
16 41 45 39 16 18 37 31 31 31 21 43
17 16 36 41 24 29 42 42 41 38 12 39
18 25 33 35 20 20 35 30 33 26 17 33
19 20 47 47 42 46 47 47 47 47 26 47
20 7 8 11 5 6 14 16 15 6 5 11
21 21 22 33 26 30 34 33 35 32 28 30
22 47 40 40 46 45 41 41 42 44 46 40
23 24 14 3 2 2 7 4 3 2 1 3
24 29 20 15 31 28 5 13 11 21 35 20
25 36 35 12 29 22 8 6 6 18 33 28
26 34 28 32 43 42 29 36 34 39 44 31
27 38 37 13 28 21 9 5 7 19 32 29
28 46 43 29 13 8 26 7 5 11 24 38
29 5 7 10 6 5 12 19 12 5 7 10
30 40 24 17 44 39 10 9 21 33 45 22
31 1 2 4 7 10 20 24 24 14 4 16
32 35 27 21 38 37 13 10 18 30 42 25
33 2 1 2 4 3 1 1 2 3 8 2
34 26 44 45 41 43 45 45 45 45 31 45
35 33 26 22 36 34 25 18 26 29 36 18
36 30 29 31 30 27 33 35 29 24 29 27
37 17 13 16 11 13 19 21 17 10 15 13
38 37 25 26 32 32 31 28 27 28 34 21
39 9 12 20 10 12 23 27 19 13 9 15
40 14 31 37 23 26 38 39 38 36 14 34
41 10 30 36 18 23 36 38 37 35 11 32
42 44 46 46 47 47 46 46 46 46 43 46
43 28 10 18 33 31 2 12 9 20 37 4
44 27 16 5 22 17 16 20 8 7 27 5
45 31 15 6 35 33 15 11 14 23 39 6
46 12 19 27 15 19 27 29 28 22 18 24
47 32 39 42 40 41 40 40 40 41 38 42
Job | Cable 600V+ | Concrete embedded steel | Erect steel | Large bore | Instrument cable | Cable tray | Control cable | Main circ | U/G conduit | A/G conduit
1 35 15 37 35 36 16 36 36 24 21
2 26 34 24 23 18 26 24 23 30 32
3 37 23 39 37 38 19 38 38 31 28
4 19 25 15 16 13 14 15 16 23 23
5 14 16 16 14 12 8 16 15 21 16
6 1 4 2 1 1 1 1 2 1 1
7 41 32 43 43 43 36 44 43 35 35
8 15 22 14 15 22 21 14 14 25 20
9 32 39 26 28 25 37 22 28 38 39
10 6 5 12 7 14 35 11 7 3 3
11 42 44 36 39 39 44 35 40 44 44
12 43 43 40 42 42 42 39 42 43 43
13 9 18 7 8 7 10 7 8 11 14
14 4 14 3 4 2 5 3 3 10 11
Table 6. (Continued.)
Job | Cable 600V+ | Concrete embedded steel | Erect steel | Large bore | Instrument cable | Cable tray | Control cable | Main circ | U/G conduit | A/G conduit
15 16 29 9 13 9 18 10 11 20 26
16 22 33 18 19 28 43 19 21 28 31
17 38 42 29 31 31 38 28 32 41 42
18 24 35 19 22 19 31 20 24 40 34
19 47 47 46 46 46 47 46 46 47 47
20 8 19 5 6 6 11 6 6 12 15
21 28 36 27 29 27 30 29 30 33 36
22 44 40 45 44 45 40 45 44 42 41
23 2 8 4 3 4 3 2 4 27 5
24 20 9 28 25 29 20 30 22 8 10
25 18 1 23 20 20 27 26 20 6 7
26 39 31 42 40 40 32 42 39 34 33
27 17 3 22 21 21 29 25 19 5 6
28 5 6 13 9 15 39 12 9 4 4
29 7 17 6 5 5 9 5 5 9 13
30 36 12 38 38 37 22 40 37 18 19
31 13 27 8 10 8 15 8 10 17 25
32 31 13 34 33 34 24 37 33 15 17
33 3 7 1 2 3 2 4 1 2 2
34 45 45 44 45 44 45 43 45 45 45
35 34 21 35 34 35 17 34 35 26 24
36 25 28 30 27 30 23 27 29 32 29
37 10 20 10 12 11 12 13 12 19 18
38 30 26 33 36 33 28 32 34 29 27
39 12 24 11 11 10 13 9 13 14 22
40 33 38 25 26 26 34 23 27 37 38
41 29 37 20 24 23 33 21 25 36 37
42 46 46 47 47 47 46 47 47 46 46
43 23 2 31 30 24 7 31 26 16 8
44 11 10 21 17 17 6 17 17 7 9
45 27 11 32 32 32 4 33 31 13 12
46 21 30 17 18 16 25 18 18 22 30
47 40 41 41 41 41 41 41 41 39 40
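The project level comparison in this paper relies on an overall similarity measure defined as 1/average rank order. A small Python sketch of that measure follows, using the craft ranks of four jobs copied from Table 6. It assumes NumPy and assumes that the average is an unweighted mean over the 20 craft ranks, excluding the standardized Euclidean distance column; the paper does not spell out either detail.

```python
import numpy as np

# Craft-level ranks from Table 6 (20 crafts each; the Euclidean column is excluded).
craft_ranks = {
    6:  [4, 1, 1, 1, 3, 3, 1, 1, 3, 1,    1, 4, 2, 1, 1, 1, 1, 2, 1, 1],
    14: [3, 7, 3, 4, 4, 8, 10, 4, 2, 8,   4, 14, 3, 4, 2, 5, 3, 3, 10, 11],
    23: [14, 3, 2, 2, 7, 4, 3, 2, 1, 3,   2, 8, 4, 3, 4, 3, 2, 4, 27, 5],
    33: [1, 2, 4, 3, 1, 1, 2, 3, 8, 2,    3, 7, 1, 2, 3, 2, 4, 1, 2, 2],
}

# Overall similarity = 1 / average rank order; larger means more similar overall.
overall = {job: 1.0 / np.mean(ranks) for job, ranks in craft_ranks.items()}
best_first = sorted(overall, key=overall.get, reverse=True)
print(best_first)
```

Under these assumptions jobs 6, 33, and 23 lead this subset, which agrees with the three projects the paper selects for the project level comparison.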
shown in Tables 5 and 6. In addition, standardized Euclidean distances and relevant percent similarities were also obtained.
A Wilcoxon signed-rank test (Becker et al. 2012) was performed to examine whether the orders of similarity ranking across the 20 crafts are statistically equivalent. The result shows that only 35 out of 190 possible one-to-one paired groups of crafts (18%) demonstrate similar orders of similarity ranking (Table 7), suggesting significant differences among crafts with respect to similarity ranking.

Comparison of Results
Actual craft quantities of selected historical projects were compared with the approved estimates of project 0. In order to evaluate the relative advantage of the proposed algorithm, two comparisons were performed:
• Craft oriented comparison: Each individual craft of project 0 was compared with the most similar craft in history determined by the proposed algorithm (Table 6) with respect to quantity. This method reflects the target of the proposed algorithm directly, i.e., recognizing difference across crafts.
• Project level comparison: Entire projects were chosen to compare with project 0 with respect to craft quantities. Project 31 was selected based on the calculation of standard Euclidean distance. Projects 6, 23, and 33 were selected according to an overall similarity measure (1/average rank order) developed based on the proposed algorithm (Fig. 10).
Then three indices were computed to quantify the difference between approved quantities and actual historical quantities: sum of squared error (SSE), root of sum of squared error (RSS), and margin of error (RSS/sum of approved quantities). As shown in Table 8, the craft-oriented method yields the smallest margin of error with respect to craft quantities between project 0 and retrieved historical crafts, demonstrating the effectiveness of the proposed algorithm. It also highlights the importance of differentiating crafts in similar case retrieval, since the retrieved similar projects (6, 23, and 33) yielded bigger margins of error. The result of the standard Euclidean distance method is not satisfactory either.
Additional observations were made as follows:
1. The order of similarity ranking obtained from the standardized Euclidean distance remains the same for all crafts since the difference among crafts is not recognized, while the similarity obtained from the proposed algorithm results in different orders of similarity ranking for most crafts, which reflects the uniqueness of each craft.
2. The proposed algorithm tends to allocate varied importance to project parameters given different conditions. A linear ANOVA was performed to test the significance of project parameters. The result shows that using the standardized Euclidean distance method, job configuration always plays an important role (p = 0.047), but the proposed algorithm tends to prioritize project parameters differently. For example, in complete piping estimating, the most important parameters are engineering company (p = 0.011) and job type (p = 0.014), while in
total cable estimating, the most important parameters are engineering company (p = 0.007) and outside stack distance (p = 0.036). This directly leads to different orders of similarity ranking. The importance allocation of the proposed algorithm was found to better reflect the nature of craft works. For example, piping and cable usage are both highly dependent on the technologies adopted, which stem from the preferences and capacities of different engineering companies. As a result, the selection of engineering companies becomes important in estimating piping and cable. Job type, which basically indicates the actual work amount in main stacks, is a good indicator for piping amount. Outside stack distance, on the other hand, does reflect the need of total cable since cables are used to connect different major functional areas of the plant.

Table 7. Similar Pairs of Crafts with Respect to Similarity Ranking

Pairs | Test statistics | Prob > |S| | Prob > S | Prob < S
Control cable vs. small bore | −4 | 0.9668 | 0.5166 | 0.4834
Small bore vs. concrete PFC | −8.5 | 0.9295 | 0.5353 | 0.4647
Cable 600V vs. complete piping | 13 | 0.8889 | 0.4444 | 0.5556
Steel grating vs. instrument | −16.5 | 0.8547 | 0.5727 | 0.4273
Large bore vs. small bore | 18 | 0.8513 | 0.4256 | 0.5744
Main circ vs. small bore | 26.5 | 0.7825 | 0.3913 | 0.6087
Large bore vs. concrete PFC | 28.5 | 0.7516 | 0.3758 | 0.6242
Instrument cable vs. small bore | −36 | 0.7076 | 0.6462 | 0.3538
Control cable vs. instrument cable | 53.5 | 0.5646 | 0.2823 | 0.7177
Erect steel vs. small bore | −61 | 0.5244 | 0.7378 | 0.2622
Cable 600V+ vs. complete piping | −62 | 0.5042 | 0.7479 | 0.2521
Small bore vs. complete piping | −68 | 0.4777 | 0.7612 | 0.2388
Cable tray vs. pipe racks | 69 | 0.4267 | 0.2134 | 0.7866
Instrument cable vs. concrete PFC | −74.5 | 0.4216 | 0.7892 | 0.2108
Main circ vs. concrete PFC | 78 | 0.3846 | 0.1923 | 0.8077
Control cable vs. large bore | −82 | 0.3761 | 0.812 | 0.188
Instrument cable vs. large bore | −84 | 0.3644 | 0.8178 | 0.1822
Main circ vs. control cable | 105 | 0.2557 | 0.1278 | 0.8722
Instrument vs. total cable | −106.5 | 0.2488 | 0.8756 | 0.1244
Pipe racks vs. concrete formwork | −102.5 | 0.2358 | 0.8821 | 0.1179
Cable tray vs. concrete formwork | −115 | 0.2125 | 0.8938 | 0.1062
Pipe racks vs. small bore | 121 | 0.2037 | 0.1018 | 0.8982
Cable tray vs. small bore | 128.5 | 0.1765 | 0.0883 | 0.9117
Cable 600V+ vs. small bore | 130 | 0.1714 | 0.0857 | 0.9143
Small bore vs. cable 600V− | −146 | 0.1235 | 0.9382 | 0.0618
Instrument cable vs. erect steel | 142.5 | 0.1205 | 0.0602 | 0.9398
Instrument cable vs. total steel | 144 | 0.1166 | 0.0583 | 0.9417
Main circ vs. large bore | 144.5 | 0.1032 | 0.0516 | 0.9484
Small bore vs. concrete formwork | −157 | 0.097 | 0.9515 | 0.0485
Pipe racks vs. cable 600V− | 159.5 | 0.0812 | 0.0406 | 0.9594
Control cable vs. concrete PFC | −144.5 | 0.0806 | 0.9597 | 0.0403
Concrete PFC vs. complete piping | −160 | 0.0803 | 0.9599 | 0.0401
Main circ vs. instrument cable | 161 | 0.0783 | 0.0392 | 0.9608
Control cable vs. complete piping | −170 | 0.0716 | 0.9642 | 0.0358
Erect steel vs. total steel | 169.5 | 0.0633 | 0.0316 | 0.9684

Table 8. Difference between Approved Estimates of Project 0 and Selected Historical Projects

Compared to | SSE | RSS | Margin of error (%)
Craft oriented | 1.74 × 10^10 | 131,983 | 5.2
6 | 2.41 × 10^10 | 155,128 | 6.1
33 | 2.82 × 10^11 | 531,211 | 20.9
23 | 6.52 × 10^10 | 255,340 | 10.1
31 | 2.27 × 10^11 | 476,504 | 18.8

Discussion
Fundamentally, CBR analysis aims to find feasible solutions from the past for new problems. It requires an explicit and accurate definition of the similarity between cases. This paper finds that the relationship between case features and case outcomes (solutions) can significantly affect the similarity measure, which in turn determines the accuracy and reliability of CBR analysis. In cases where the feature-outcome relationship is simple and straightforward, seeking similar solutions can be achieved directly by finding cases with similar features, or problem descriptions. If the feature-outcome relationship is nonlinear and complex, however, the relationship itself affects how accurately solutions can be found. This paper attempts to develop an algorithm to capture the projection from the feature space to the solution space. Global sensitivity analysis was utilized to estimate the first-order and high-order influences of project parameters on craft quantities. Following the statistical definition of “importance” (Saltelli et al. 2008), the measurements of these influences were used directly to convert feature similarity to solution similarity.
Another issue addressed in this paper is the multicollinearity among project parameters. It was found that there are strong correlations among project parameters. Without careful processing of these correlations, the analysis becomes fragile to any slight change in the input, and the importance of certain project parameters is likely to be exaggerated. In addition, correlated input variables make the interpretation of GSA results difficult. Therefore, this paper employs PCA to orthogonalize input data (i.e., project parameters) prior to any analysis. The constructed conversion matrix with the weight matrix from GSA together can be used
Fig. 10. Overall similarities of 47 projects
to transform original input data for accurate CBR analysis. Last, an additional advantage of the proposed algorithm is that the uniqueness of each craft is recognized, and the analysis results reflect the differences between crafts.

Monte Carlo simulation was used in the GSA to sample random numbers. The distribution fitting to the input data, however, was not completely satisfactory. Certain fitted distributions did not pass the chi-square test, which suggests unsatisfactory fits. Using these fitted PDFs as random number generators may distort the variance decomposition and lead to a misleading conclusion. This is a common issue in most practical problems. A potential solution is to use expert opinion to revise the distributions, for example by adopting the triangular distribution, which is commonly used in construction research (Maio et al. 2000). Another issue with the proposed algorithm is that certain qualitative project parameters were simply converted into integer codes; this method is not as reasonable as the dummy variable method. Last, interpretation of the analysis results could be difficult because of the use of PCA. PCA serves as a linear combination function that breaks the original meanings of input variables. These issues might affect the applicability of the proposed algorithm.

Conclusions

This study proposed a new algorithm to improve the similarity measure, which is critical for proper case retrieval when applying CBR to quantity takeoff in the proposal development phase. We have noted that the current weighting method used in the CBR similarity measure lacks a statistically sound procedure. Questions remain unanswered, especially how to tackle the multicollinearity among input variables, the nonlinearity between feature space and outcome space, and the differences across crafts. Building on existing methods, including Sobol’s TSI, PCA, and ANN, the proposed algorithm addresses the multicollinearity and the uniqueness of crafts.

A better similarity measure is equally important to estimators in conceptual estimating. We have found that a practical approach adopted by many estimators at conceptual estimating phases is to make comparisons with historical jobs due to the lack of necessary information. In certain extreme cases, estimators do not have any information for estimating. A better similarity measurement method will be a significant time saver for estimators.

This study also highlights the importance of GSA in the construction engineering and management field. Sensitivity analysis is the X-ray of modelers (Fürbinger 1996). At a time when modeling and simulation approaches are of increasing interest among construction scholars (Taylor 2010), it is necessary to keep SA methods updated. We have found, to the best of our knowledge, that there is a tendency in environmental research to move from conventional SA to variance-based GSA. The application of Sobol’s TSI has recently become particularly prevalent in uncertainty analysis, sensitivity analysis, and model evaluation. Although not directly focused on introducing GSA, this paper is expected to draw the attention of construction scholars to GSA as an improved post-modeling analytical procedure.

References

Aamodt, A., and Plaza, E. (1994). “Case-based reasoning: Foundational issues, methodological variations, and system approaches.” AI Comm., 7(1), 39–59.
Adeli, H., and Wu, M. (1998). “Regularization neural network for construction cost estimation.” J. Constr. Eng. Manage., 10.1061/(ASCE)0733-9364(1998)124:1(18), 18–24.
Alder, M. A. (2006). Comparing time and accuracy of building information modeling to on-screen takeoff for a quantity takeoff of a conceptual estimate, M.S. thesis, Brigham Young Univ., Provo, UT.
Aldrich, J. (1995). “Correlations genuine and spurious in Pearson and Yule.” Statist. Sci., 10(4), 364–376.
Ashley, K. (2006). “Case-based reasoning.” Information technology and lawyers, A. R. Lodder and A. Oskamp, eds., Springer, Berlin, 23–60.
Auer, P., Burgsteiner, H., and Maass, W. (2008). “A learning rule for very simple universal approximators consisting of a single layer of perceptrons.” Neural Netw., 21(5), 786–795.
Ayed, A. (1998). “Neural network model for parametric cost estimation of highway projects.” J. Constr. Eng. Manage., 10.1061/(ASCE)0733-9364(1998)124:3(210), 210–218.
Becker, T., Jaselskis, E., El-Gafy, M., and Du, J. (2012). “Industry practices for estimating, controlling and managing key indirect construction costs at the project level.” Construction Research Congress 2012, 2469–2478.
Bishop, C. M. (1995). Neural networks for pattern recognition, Oxford University Press, Oxford, U.K.
Carr, R. I. (1989). “Cost-estimating principles.” J. Constr. Eng. Manage., 10.1061/(ASCE)0733-9364(1989)115:4(545), 545–551.
Chang, C. L., Cheng, B. W., and Su, J. L. (2004). “Using case-based reasoning to establish a continuing care information system of discharge planning.” Expert Syst. Appl., 26(4), 601–613.
Changchien, S., and Lin, M. C. (2005). “Design and implementation of a case-based reasoning system for marketing plans.” Expert Syst. Appl., 28(1), 43–53.
Chen, W., Jin, R., and Sudjianto, A. (2005). “Analytical variance-based global sensitivity analysis in simulation-based design under uncertainty.” J. Mech. Des., 127(5), 875–886.
Chiu, C. (2002). “A case-based customer classification approach for direct marketing.” Expert Syst. Appl., 22(2), 163–168.
Chua, D., Li, D., and Chan, W. (2001). “Case-based reasoning approach in bid decision making.” J. Constr. Eng. Manage., 10.1061/(ASCE)0733-9364(2001)127:1(35), 35–45.
Draper, N. R., Smith, H., and Pownell, E. (1966). Applied regression analysis, Wiley, New York.
Du, J., and El-Gafy, M. (2010). “Virtual organizational imitation for construction enterprises (VOICE): Managing business complexity using agent based modeling.” Proc., Construction Research Congress (CRC) 2012, ASCE, Reston, VA.
Du, J., and El-Gafy, M. (2012). “Virtual organizational imitation for construction enterprises: Agent-based simulation framework for exploring human and organizational implications in construction management.” J. Comput. Civ. Eng., 10.1061/(ASCE)CP.1943-5487.0000122, 282–297.
Duverlie, P., and Castelain, J. (1999). “Cost estimation during design step: parametric method versus case based reasoning method.” Int. J. Adv. Manufact. Technol., 15(12), 895–906.
Farrar, D. E., and Glauber, R. R. (1967). “Multicollinearity in regression analysis: The problem revisited.” Rev. Econ. Stat., 49(1), 92–107.
Feng, W., and Figliozzi, M. (2011). “Empirical findings of bus bunching distributions and attributes using archived AVL/APC bus data.” Proc., 11th Int. Conf. of Chinese Transportation Professionals (ICCTP), ASCE, Reston, VA.
Freund, R. J., and Littell, R. C. (1981). SAS for linear models: A guide to the ANOVA and GLM procedures, SAS Institute, Cary, NC.
Friedman, J., Tibshirani, R., and Hastie, T. (2009). The elements of statistical learning: Data mining, inference, and prediction, Springer, New York.
Fürbinger, J. (1996). “Sensitivity analysis for modellers.” Air Infiltration Rev., 17(4), 8–10.
Fyfe, C., and Corchado, J. (2002). “A comparison of kernel methods for instantiating case based reasoning systems.” Adv. Eng. Inform., 16(3), 165–178.
Greenwood, P. E., and Nikulin, M. S. (1996). A guide to chi-squared testing, Wiley, New York, NY.

Heylighen, A., and Neuckermans, H. (2001). “A case base of case-based design tools for architecture.” Comput. Aided Des., 33(14), 1111–1122.
Hu, M., and Zhang, L. X. (2008). “Research of case-based reasoning for tunnelling risk using rough set.” Comput. Eng., 34(23), 244–246.
Hui, S., Fong, A., and Jha, G. (2001). “A web-based intelligent fault diagnosis system for customer service support.” Eng. Appl. Artif. Intell., 14(4), 537–548.
Jolliffe, I. T. (2002). “Principal component analysis and factor analysis.” Principal component analysis, Springer, New York, NY, 150–166.
Kim, G., An, S., and Kang, K. (2004a). “Comparison of construction cost estimating models based on regression analysis, neural networks, and case-based reasoning.” Build. Environ., 39(10), 1235–1242.
Kim, G., Seo, D., and Kang, K. (2005). “Hybrid models of neural networks and genetic algorithms for predicting preliminary cost estimates.” J. Comput. Civ. Eng., 10.1061/(ASCE)0887-3801(2005)19:2(208), 208–211.
Kim, G. H., Yoon, J. E., An, S. H., Cho, H. H., and Kang, K. I. (2004b). “Neural network model incorporating a genetic algorithm in estimating construction costs.” Build. Environ., 39(11), 1333–1340.
Kouskoulas, V., and Koehn, E. (1974). “Predesign cost-estimation function for buildings.” J. Constr. Div., 100(4), 589–604.
Lilburne, L., and Tarantola, S. (2009). “Sensitivity analysis of spatial models.” Int. J. Geograph. Inform. Sci., 23(2), 151–168.
Low, B. (2005). “Reliability-based design applied to retaining walls.” Geotechnique, 55(1), 63–75.
Madhusudan, T., Zhao, J. L., and Marshall, B. (2004). “A case-based reasoning framework for workflow model management.” Data Knowl. Eng., 50(1), 87–115.
Mahalanobis, P. C. (1936). “On the generalized distance in statistics.” Proc. Natl. Inst. Sci. India, 2(1), 49–55.
Maier, H., Dandy, G., Norton, J., and Croke, B. (2005). “A comparison of sensitivity analysis techniques for complex models for environmental management.” Proc., Int. Congress on Modelling and Simulation, Melbourne, VIC, 2533–2539.
Maio, C., Schexnayder, C., Knutson, K., and Weber, S. (2000). “Probability distribution functions for construction simulation.” J. Constr. Eng. Manage., 10.1061/(ASCE)0733-9364(2000)126:4(285), 285–292.
Mohri, T., and Tanaka, H. (1994). “An optimal weighting criterion of case indexing for both numeric and symbolic attributes.” AAAI-94 Workshop Program: Case-Based Reasoning, Working Notes, 123–127.
Mooney, C. Z. (1997). Monte Carlo simulation, Sage Publications, Thousand Oaks, CA.
NASA. (2008). “2008 NASA cost estimating handbook.” 〈http://www.nasa.gov/pdf/263676main_2008-NASA-Cost-Handbook-FINAL_v6.pdf〉 (Jan. 22, 2014).
NeuralWare. (2014). “How many hidden nodes or hidden layers should I use?.” Frequently asked questions, 〈http://www.neuralware.com/index.php/frequently-asked-questions#professional-plus-hidden-layers〉 (Jan. 22, 2014).
Oakley, J. E., and O’Hagan, A. (2004). “Probabilistic sensitivity analysis of complex models: A Bayesian approach.” J. Roy. Stat. Soc. Ser. B, 66(3), 751–769.
Ozcan Deniz, G., Zhu, Y., and Ceron, V. (2012). “Time, cost and environmental impact analysis on construction operation optimization using genetic algorithms.” J. Manage. Eng., 10.1061/(ASCE)ME.1943-5479.0000098, 265–272.
Perera, S., and Watson, I. (1998). “Collaborative case-based estimating and design.” Adv. Eng. Softw., 29(10), 801–808.
Project Management Institute. (2008). A guide to the project management body of knowledge: PMBOK guide, 4th Ed., Project Management Institute, Newtown Square, PA.
Reedijk, C. (2000). “Sensitivity analysis of model output: Performance of various local and global sensitivity measures on reliability problems.” M.S. thesis, Delft Univ. of Technology, Delft, Netherlands.
Sallaberry, C. J., and Helton, J. C. (2006). “An introduction to complete variance decomposition.” 24th Conf. and Exposition on Structural Dynamics 2006 (IMAC-XXIV), Curran Associates, St Louis, MO.
Saltelli, A., and Annoni, P. (2010). “How to avoid a perfunctory sensitivity analysis.” Environ. Model. Software, 25(12), 1508–1517.
Saltelli, A., et al. (2008). Global sensitivity analysis: The primer, Wiley Online Library, Hoboken, NJ.
Saltelli, A., Ratto, M., Tarantola, S., and Campolongo, F. (2006). “Sensitivity analysis practices: Strategies for model-based inference.” Reliab. Eng. Syst. Safety, 91(10), 1109–1125.
Saltelli, A., and Tarantola, S. (2002). “On the relative importance of input factors in mathematical models.” J. Am. Stat. Assoc., 97(459), 702–709.
Sathyanarayanamurthy, H., and Chinnam, R. B. (2009). “Metamodels for variable importance decomposition with applications to probabilistic engineering design.” Comput. Ind. Eng., 57(3), 996–1007.
Setiono, R. (1997). “A penalty-function approach for pruning feedforward neural networks.” Neural Comput., 9(1), 185–204.
Shannon, C. E., and Weaver, W. (1949). A mathematical theory of communication, Univ. of Illinois Press, Urbana, IL.
Shin, K., and Han, I. (1999). “Case-based reasoning supported by genetic algorithms for corporate bond rating.” Expert Syst. Appl., 16(2), 85–95.
Shin, K., and Han, I. (2001). “A case-based approach using inductive indexing for corporate bond rating.” Decis. Support Syst., 32(1), 41–52.
SimLab. (2011). Software package for uncertainty and sensitivity analysis, Joint Research Centre of the European Commission, Brussels, Belgium.
Slonim, T., and Schneider, M. (2001). “Design issues in fuzzy case-based reasoning.” Fuzzy Sets Syst., 117(2), 251–267.
Sobol, I., Tarantola, S., Gatelli, D., Kucherenko, S., and Mauntz, W. (2007). “Estimating the approximation error when fixing unessential factors in global sensitivity analysis.” Reliab. Eng. Syst. Safety, 92(7), 957–960.
Sobol, I. M. (2001). “Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates.” Math. Comput. Simulat., 55(1–3), 271–280.
Stewart, R. D., Wyskida, R. M., and Johannes, J. D. (1995). Cost estimator’s reference manual, Wiley-Interscience, Hoboken, NJ.
Tam, C., and Fang, C. F. (1999). “Comparative cost analysis of using high-performance concrete in tall building construction by artificial neural networks.” ACI Struct. J., 96(6), 927–936.
Taylor, J. (2010). “Invitation to submit scholarly articles using agent-based simulation to tackle challenging civil engineering problems.” J. Comput. Civ. Eng., 10.1061/(ASCE)CP.1943-5487.0000069, 465–466.
Tetko, I. V., Livingstone, D. J., and Luik, A. I. (1995). “Neural network studies. 1. Comparison of overfitting and overtraining.” J. Chem. Inform. Comput. Sci., 35(5), 826–833.
Thogmartin, W. E. (2010). “Sensitivity analysis of North American bird population estimates.” Ecol. Model., 221(2), 173–177.
Tsai, T., Kao, C., Surampalli, R., Huang, W., and Rao, J. (2011). “Sensitivity analysis of risk assessment at a petroleum-hydrocarbon contaminated site.” J. Hazard. Toxic Radioact. Waste, 10.1061/(ASCE)HZ.1944-8376.0000067, 89–98.
Varella, H., Guérif, M., and Buis, S. (2010). “Global sensitivity analysis measures the quality of parameter estimation: the case of soil parameters and a crop model.” Environ. Model. Software, 25(3), 310–319.
Watson, I. (1999). “Case-based reasoning is a methodology not a technology.” Knowl. Based Syst., 12(5–6), 303–308.
Wettschereck, D., and Aha, D. (1995). “Weighting features.” Case-based reasoning research and development, Springer, Berlin, Heidelberg, 347–358.
Xiao, F., Honma, Y., and Kono, T. (2005). “A simple algebraic interface capturing scheme using hyperbolic tangent function.” Int. J. Numer. Meth. Fluids, 48(9), 1023–1040.
Yau, N. J., and Yang, J. B. (1998a). “Applying case-based reasoning technique to retaining wall selection.” Automat. Constr., 7(4), 271–283.
Yau, N. J., and Yang, J. B. (1998b). “Case-based reasoning in construction management.” Comput. Aided Civ. Infrastruct. Eng., 13(2), 143–150.