Abstract: In recognition of the importance of historical knowledge in decision making, case based reasoning (CBR) is utilized as a form of
an expert system to tackle construction management issues such as quantity takeoff in the proposal development phase of a project. It builds
on a proposition that past projects similar to the new one would suggest a reasonable range of craft quantities. This paper finds that when
measuring the similarity between the new project and historical projects, traditional similarity measure methods fail to consider the nonlinearity
and multicollinearity embedded in the problem, as well as differences across crafts. An innovative similarity measurement algorithm
was therefore proposed to tackle the above issues with a carefully designed orthogonalization process and Sobol’s total sensitivity analysis.
The application of the proposed algorithm to the craft quantity takeoff of a power plant project was introduced, demonstrating a better result
compared with traditional methods. It is likely that the proposed algorithm will advance current CBR practices in construction management.
DOI: 10.1061/(ASCE)CP.1943-5487.0000267. © 2014 American Society of Civil Engineers.
Author keywords: Conceptual cost estimation; Quantity takeoff; Case based reasoning; Global sensitivity analysis; Sobol’s TSI; Artificial
neural networks; Principal component analysis.
nonlinear, then searching for the most similar features is not sufficient for figuring out the most similar solution [Fig. 1(b)]. In estimating practices, we have observed certain cases where a slight change of project parameters results in significant differences in craft quantities. Assigning weights to case features (to reflect relative importance) is ineffective and mostly insufficient to capture such nonlinear complexity. In contrast, a proper method should be able to capture the projection function $f(x_i)$, which projects the distance in the feature space to the solution space. In other words, a similarity measure is given by the measurement of distances in the solution space

$$\mathrm{Sim}_{ij} = D'_{ij} = f[D_{ij}, f(x_i, x_j)] \qquad (1)$$

Second, even if the assigned weights can reflect the relative importance of different case features, they should not be identical across different crafts. Otherwise, the method probably gives biased conclusions, since the relationship between project parameters and quantities is unique to each craft. For example, in a power plant project the main rack distance is a critical factor in estimating pipes, but for underground conduit the total distance between the major working areas is more important. Consequently, main rack distance and total distance between major working areas should be assigned different weights when different crafts are considered.

The third issue stems from the correlation among project parameters. In practice, we found project parameters are often highly correlated. Such correlation leads to multicollinearity of the input data, and the effects of multicollinearity may significantly distort the analysis results (Farrar and Glauber 1967). For example, job configuration and total number of equipment are two interdependent variables. If too much importance is assigned to job configuration and total number of equipment in CBR-oriented conceptual estimating, results could be too sensitive to a slight change of either one of them, and in certain cases, the estimated quantities will be exaggerated. A method that can differentiate the pure effects and interaction effects of a case feature is needed.

This paper aims to introduce an innovative case retrieval algorithm for CBR analysis that meets the following requirements: (1) similarity between two cases is measured with an overall consideration of both feature similarity and feature-solution projection functions; (2) similarity is estimated on a craft-by-craft basis so differences across crafts are considered; and (3) the interdependences among project parameters are accounted for to capture the pure effects. This paper considers the similarity measurement of CBR as a projection process from the feature space to the solution space, and global sensitivity analysis (GSA) is employed to discover the quantitative relationship in the projection process. An artificial neural network (ANN) is utilized to deal with the nonlinearity of the problem, and principal component analysis (PCA) is applied to deal with the multicollinearity issue. In the remainder of this paper, relevant theories and the proposed algorithm are introduced in detail.

Literature Review

Case Retrieval of Case-Based Reasoning

A central task of CBR analysis is accurate case retrieval, which aims to select one or more similar historical cases from the case library so that solutions can be adapted for new problems. Typical subtasks of case retrieval include identifying case features, initial match, search, and selection (Aamodt and Plaza 1994). In general, case retrieval methodologies can be categorized into four types (Watson 1999; Shin and Han 2001): (1) nearest neighbor, which retrieves matched cases according to a weighted distance of features between cases; (2) induction, which identifies patterns amongst cases and classifies the cases into clusters according to problem descriptions; (3) fuzzy logic, which formalizes the symbolic processing of fuzzy linguistic terms associated with differences in the attributes describing case features; and (4) knowledge guided, which applies existing domain and experimental knowledge to locate relevant cases.
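Nearest-neighbor retrieval, the first of the four types listed above, can be illustrated with a minimal sketch. The function names, the feature values, and the equal weights below are hypothetical and chosen only for illustration; the distance is a weighted Euclidean distance over standardized case features.

```python
import numpy as np

def weighted_distance(x_a, x_b, weights):
    """Weighted Euclidean distance between two feature vectors."""
    x_a, x_b, weights = (np.asarray(v, dtype=float) for v in (x_a, x_b, weights))
    return float(np.sqrt(np.sum(weights * (x_a - x_b) ** 2)))

def retrieve_nearest(new_case, case_library, weights, k=1):
    """Return the indices of the k library cases closest to new_case."""
    distances = [weighted_distance(new_case, case, weights) for case in case_library]
    return [int(i) for i in np.argsort(distances)[:k]]

# Hypothetical, already standardized features of three historical cases.
library = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 0.5]])
weights = np.array([0.5, 0.5])  # assumed feature weights, not taken from the paper

nearest = retrieve_nearest([0.9, 1.2], library, weights, k=2)
print(nearest)
```

Because the weights enter the distance directly, any subjective choice of weights changes which cases are retrieved, which is the sensitivity the paper sets out to address.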
2004), and fuzzy similarity indices (Slonim and Schneider 2001). One of the most obvious measures of similarity between two cases is through the distance (Shin and Han 1999; Changchien and Lin 2005), as shown in Eq. (2)

$$\mathrm{Dis}_{ab} = \sqrt{\sum_{i=1}^{n} w_i \times (x_{ia} - x_{ib})^2} \qquad (2)$$

where $\mathrm{Dis}_{ab}$ is the distance between two cases a and b in the hyperspace (the number of dimensions is the number of features), and $x_{ia}$ and $x_{ib}$ are the values of the ith feature of cases a and b. Eq. (2) is in fact a weighted Euclidean distance. Although other definitions of distance are also used in CBR analysis, such as Minkowski distance (Duverlie and Castelain 1999), they all work in the same way. Another issue of nearest neighbor is the determination of the weight of each case feature [i.e., $w_i$ in Eq. (2)], since it has a significant influence on the efficiency and accuracy of case retrieval (Changchien and Lin 2005). In many cases, subjective weight assignment is applied and thus the quality of the retrieved solutions cannot always be guaranteed (Changchien and Lin 2005). Therefore, different methods have been proposed to enhance the determination of feature weights, such as gradient descent methods (Yau and Yang 1998a, b), genetic algorithms (Chiu 2002), artificial neural networks (Hui et al. 2001), analytic hierarchy processing (Chang et al. 2004), information gain (Wettschereck and Aha 1995), and other statistical methods to maximize variance between cases (Mohri and Tanaka 1994). These weight determination algorithms, however, are incapable of capturing the complex relationship between case features and corresponding solutions, i.e., the link between "feature similarity" and "solution similarity" (Fig. 2). As shown in Fig. 2(b), if the feature-solution relationship is nonlinear, the approximation

Sensitivity Analysis

SA is used in this paper to compute the relative importance of project parameters, i.e., to what extent the variations of craft quantities are attributable to different variations in project parameters (Saltelli et al. 2008). In a comprehensive review of SA literature, Saltelli and Annoni (2010) found that most studies apply SA in an OAT (one-at-a-time) fashion, i.e., changing the value of uncertain factors one at a time while keeping the others constant. In construction literature, OAT-SA has been widely used for risk control (Tsai et al. 2011), construction operations optimization (Ozcan Deniz et al. 2012), transportation analysis (Feng and Figliozzi 2011), and structural design (Low 2005). But it has already been found that OAT-SA is justified only if the model is linear (Saltelli et al. 2006; Lilburne and Tarantola 2009). If the problem is nonlinear, OAT always leads to misleading conclusions (Thogmartin 2010; Varella et al. 2010). Regarding construction problems as nonlinear systems (Du and El-Gafy 2010; Du and El-Gafy 2012), nonlinearities should not be ignored when selecting SA methods.

Failure to capture nonlinearities has also been found in regression- and correlation-based SA. Evidence indicates that regression-based SA only works for linear models and its effectiveness depends on the goodness of fit (Lilburne and Tarantola 2009). Correlation measures are not effective at approximating the data in nonlinear problems either, since nonlinearities are poorly taken into account (Maier et al. 2005). In recognition of the nonlinearity issues, remedies have been proposed such as the Morris method and the importance measure. However, while the Morris method can account for nonlinearity, it has an assumption of monotonicity, which is not always true (Maier et al. 2005). In addition, the Morris method cannot differentiate between the effects caused by nonlinearity in the model and parameter interactions (Maier et al. 2005). As for the importance measure, it only provides first-order effects
where SSE is the sum of squared errors between actual and predicted observation values, given by

$$\mathrm{SSE} = \sum_{i=1}^{n} [y_i - f(x_i)]^2 \qquad (13)$$

The target of fitting an ANN model is to estimate the model parameters by maximum likelihood.

$$\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{E[(X - \mu_X)(Y - \mu_Y)]}{\sigma_X \sigma_Y} \qquad (14)$$

where $\mathrm{cov}(X, Y)$ is the covariance of two variables, and $\sigma_X$ and $\sigma_Y$ are their standard deviations. In practice, the sample Pearson's correlation coefficient is used as an estimator of the population correlation coefficient
Fig. 4. Geometric view of the proposed algorithm
$$r_{xy} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{(n - 1) s_x s_y} = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{\sqrt{n \sum x_i^2 - \left(\sum x_i\right)^2}\,\sqrt{n \sum y_i^2 - \left(\sum y_i\right)^2}} \qquad (15)$$

where $\bar{x}$ and $\bar{y}$ are the sample means of the two variables, and $s_x$ and $s_y$ are the sample standard deviations. Besides Pearson's correlation coefficient, a scattergraph is also plotted to demonstrate the relationships among input variables.

Orthogonalization

If the requirement of orthogonality is not met, orthogonalization should be performed to make the transformed input variables independent of each other. In this paper, we propose the use of PCA as the orthogonalization method (Jolliffe 2002). Building on the eigenvalue decomposition of the covariance matrix, PCA enables a linear transformation to convert correlated variables into a set of uncorrelated variables called principal components (Jolliffe 2002). This transformation is defined in such a way that the first principal component accounts for as much of the variability in the data as possible, i.e., has the highest variance, and each succeeding principal component in turn has the highest variance subject to the constraint that it be orthogonal to the preceding components (Jolliffe 2002). Consider the form of the first principal component as a linear transformation of $x$, i.e., $\alpha_1' x$. Per the definition of PCA, $\alpha_1$ maximizes $\mathrm{var}[\alpha_1' x] = \alpha_1' \Sigma \alpha_1$ under the constraint $\alpha_1' \alpha_1 = 1$. To find $\alpha_1$, the Lagrange multiplier is used

$$\alpha_1' \Sigma \alpha_1 - \lambda(\alpha_1' \alpha_1 - 1) \qquad (16)$$

where $\lambda$ is the Lagrange multiplier. Following Eq. (16), we know

$$(\Sigma - \lambda I_p)\alpha_1 = 0 \qquad (17)$$

where $I_p$ is a $p \times p$ identity matrix. Thus $\lambda$ is an eigenvalue of $\Sigma$, and $\alpha_1$ is the corresponding eigenvector. Our target is to find the maximum $\lambda$ so that the first principal component is obtained. For the succeeding principal components the above procedure is repeated with the constraint

$$\mathrm{cov}[\alpha_i' x, \alpha_j' x] = 0, \quad (0 < i, j \le p;\ i \ne j) \qquad (18)$$

Finally we define a matrix $A$ which contains all the $\alpha_i$, which gives

$$\underset{1 \times n}{P} = \underset{1 \times n}{X} \times \underset{n \times n}{A} \qquad (19)$$

where $X$ is the standardized values of input variables; $P$ is the transformed uncorrelated variables (i.e., principals). $A$ is a linear transformation matrix on original input data, i.e., a matrix used to transform original correlated input variables to uncorrelated
variables (i.e., principal components). In order to perform Monte Carlo simulation, probability distributions of principal components should be identified at the beginning of the analysis (Mooney 1997). A chi-square test is conducted to examine whether the fitted distribution is a good representation of a principal component (Greenwood and Nikulin 1996). After obtaining TSIs of all the principals, we construct a matrix S to show the weights for transformed axes

i.e., variables after orthogonalization. As a result, it should be converted to the matrix applicable for original input variables.

Calculating Weighted Mahalanobis Distance

The final step aims to calculate the ultimate measurement of similarity based on MPW. If a MPW is known for the problem, the weighted Mahalanobis distance can be obtained by
$$X \times W \times W^T \times X^T = X \times A \times S \times S^T \times A^T \times X^T \qquad (23)$$

Finally, W is determined by

$$W = A \times S \qquad (24)$$

In the analysis, we can use W (MOW) to calculate the weighted Mahalanobis distance directly from the values of standardized input variables [see Eq. (22)]. The weighted Mahalanobis distance can then be used as the similarity measure between two cases. If we define the maximum weighted Mahalanobis distance as 100%, a similarity measure is obtained by comparing it with

$$\mathrm{Sim}_i = 1 - \frac{\mathrm{Dist}_{iw}}{\max(\mathrm{Dist}_{iw})} = 1 - \frac{\sqrt{(\vec{x}_n - \vec{x}_i) \times W \times D^{-1} \times W^T \times (\vec{x}_n - \vec{x}_i)^T}}{\max\left(\sqrt{(\vec{x}_n - \vec{x}_i) \times W \times D^{-1} \times W^T \times (\vec{x}_n - \vec{x}_i)^T}\right)} \qquad (25)$$

where D is the covariance matrix. $\mathrm{Sim}_i$ is used as the improved similarity measurement in CBR.

ically similar projects. It was critical to find similar projects as a reference for the quantity takeoff of project 0. Twenty crafts were selected to examine the effectiveness and usefulness of the proposed algorithm (Table 1). For typical EPC (engineering, procurement, and construction) projects, these crafts account for a significant portion of direct labor hours and thus can be regarded as a fair representation of the quantity of the usage of the entire project.

Following the Project Management Institute (2008), project parameters are "project characteristics that can be used to develop mathematical models to predict total project costs." They are physical attributes of a project that are considered to be major cost drivers, such as project size, project type, and location (NASA 2008). Statistical relationships may be found between these attributes and final construction costs (Ayed 1998). Based on expert opinions and a preliminary MANOVA (multiple factor analysis of variance) (Freund and Littell 1981), 12 project parameters (Table 2) were selected as explanatory factors that are believed to affect the quantities of 20 crafts. Then data from 47 historical projects (1993 through 2010) were collected for the analysis.

Applying the Algorithm

Pearson's correlation analysis was performed to examine the orthogonality of project parameters (Fig. 6). Among 12 project
with 0 designating "greenfield" and 1 designating "brownfield."

The MOT matrix is finally obtained as follows:

$$
\begin{pmatrix}
0.367 & 0.209 & 0.083 & 0.058 & 0.210 & 0.137 & -0.010 & -0.045 & -0.152 & -0.533 & -0.656 & -0.096 \\
0.285 & -0.331 & -0.185 & 0.037 & 0.604 & -0.050 & -0.016 & 0.091 & 0.584 & -0.140 & 0.191 & -0.013 \\
0.002 & -0.030 & 0.749 & -0.526 & 0.311 & -0.017 & 0.128 & 0.145 & -0.113 & 0.049 & 0.108 & 0.018 \\
0.365 & -0.181 & -0.160 & -0.161 & 0.045 & 0.297 & -0.066 & 0.053 & -0.134 & 0.490 & -0.268 & 0.596 \\
0.382 & 0.105 & -0.054 & -0.065 & -0.016 & 0.386 & -0.101 & 0.051 & -0.065 & 0.380 & 0.119 & -0.715 \\
0.041 & 0.628 & 0.240 & 0.172 & 0.005 & -0.200 & -0.089 & -0.010 & 0.534 & 0.365 & -0.203 & 0.096 \\
0.293 & 0.426 & 0.066 & 0.201 & 0.064 & 0.285 & -0.057 & -0.126 & -0.174 & -0.256 & 0.616 & 0.328 \\
0.284 & -0.264 & 0.264 & 0.151 & -0.144 & -0.390 & -0.748 & -0.085 & -0.124 & -0.007 & 0.048 & -0.013 \\
-0.011 & -0.355 & 0.461 & 0.697 & -0.049 & 0.272 & 0.283 & 0.000 & 0.036 & 0.113 & -0.062 & -0.002 \\
0.347 & -0.031 & -0.042 & 0.003 & 0.067 & -0.470 & 0.461 & -0.609 & -0.176 & 0.176 & 0.044 & -0.062 \\
0.310 & -0.156 & 0.132 & -0.304 & -0.656 & 0.145 & 0.148 & -0.109 & 0.462 & -0.258 & 0.034 & 0.051 \\
0.346 & 0.066 & -0.074 & 0.129 & -0.171 & -0.391 & 0.291 & 0.746 & -0.158 & -0.018 & 0.075 & 0.011
\end{pmatrix}
$$
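A transformation matrix of this kind can be obtained from an eigendecomposition of the correlation matrix of the standardized project parameters. The sketch below is illustrative only: the data are synthetic, the function name `pca_transform_matrix` is hypothetical, and working on the correlation matrix of standardized inputs is an assumption consistent with Eq. (19), not a statement of the software the authors used. It also checks orthogonality before and after the transformation with Pearson's correlation.

```python
import numpy as np

def pca_transform_matrix(X):
    """Standardize X (cases x parameters) and return (Z, A, eigenvalues).

    Columns of A are eigenvectors of the correlation matrix, sorted by
    decreasing eigenvalue, so P = Z @ A holds the uncorrelated principals.
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    corr = np.corrcoef(Z, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # symmetric matrix
    order = np.argsort(eigenvalues)[::-1]
    return Z, eigenvectors[:, order], eigenvalues[order]

# Synthetic stand-in for 47 projects and 3 parameters; two are correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=(47, 1))
X = np.hstack([
    base + 0.1 * rng.normal(size=(47, 1)),
    2.0 * base + 0.1 * rng.normal(size=(47, 1)),
    rng.normal(size=(47, 1)),
])

Z, A, eigenvalues = pca_transform_matrix(X)
P = Z @ A  # Eq. (19), applied to all cases at once

corr_before = np.corrcoef(Z, rowvar=False)  # off-diagonals far from zero
corr_after = np.corrcoef(P, rowvar=False)   # approximately the identity
print(np.round(corr_before, 2))
print(np.round(corr_after, 2))
```

The explained-variance share of each principal is `eigenvalues / eigenvalues.sum()`, which is how a statement such as "the first three components explain a given share of the total variation" would be checked.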
9 46.5 47.5 48.1 45.6 47.4 44.3 49.5 45.2 43.5 47.4
10 74.9 96.8 62.5 67.4 56.6 46.6 64.0 67.8 81.7 89.5
11 29.2 29.5 32.8 30.0 32.2 25.3 34.4 30.1 30.2 29.4
12 28.5 36.2 27.5 26.9 28.2 35.2 27.6 27.5 31.6 34.8
13 68.4 72.2 66.3 67.1 66.6 68.2 67.6 67.7 69.9 71.4
14 77.1 77.4 76.8 72.1 77.1 75.1 78.3 76.8 70.1 77.3
15 62.5 63.5 64.2 62.2 64.1 62.4 64.4 62.7 62.9 63.4
16 56.5 58.8 53.3 55.5 45.9 29.3 55.1 54.3 58.1 58.4
17 41.2 41.8 43.8 42.3 42.7 41.0 46.2 41.7 40.1 41.8
18 52.7 55.9 50.5 50.0 49.8 51.0 53.6 49.3 41.3 55.2
19 0.0 0.0 7.2 4.3 6.0 0.0 8.5 2.9 0.0 0.0
20 69.6 71.6 68.4 68.3 67.0 67.2 70.0 68.9 69.5 71.2
21 47.5 55.2 45.5 45.2 46.4 52.2 44.7 44.6 51.7 53.7
22 24.4 47.4 14.0 17.2 18.2 37.1 12.9 18.8 36.5 42.8
23 83.6 87.8 69.5 78.1 71.2 83.8 85.8 73.6 58.3 87.2
24 57.8 87.3 45.5 48.6 44.9 61.6 44.0 51.9 71.1 78.8
25 61.9 97.4 48.7 53.2 49.4 54.6 48.1 55.4 75.7 84.0
26 36.5 60.8 24.9 28.2 30.8 51.0 23.6 31.2 50.6 55.6
27 62.0 97.1 48.7 53.2 49.3 53.9 48.1 55.4 75.9 84.0
28 75.1 92.4 61.6 66.0 54.5 40.0 64.0 66.3 80.4 87.8
29 69.8 72.2 68.0 69.2 67.8 68.9 70.1 69.0 70.1 71.8
30 43.2 79.4 29.8 33.1 35.7 60.3 26.3 35.6 64.2 69.8
31 64.1 64.7 66.0 64.5 65.9 64.3 66.5 64.6 64.3 64.6
32 47.0 78.6 34.6 38.1 37.9 58.8 31.9 40.5 65.9 70.5
33 82.8 92.0 78.2 79.8 76.9 84.3 78.1 80.8 87.0 89.7
34 16.2 21.6 18.0 16.2 18.3 20.0 18.0 15.8 16.6 20.7
35 45.8 69.9 33.9 38.0 37.4 62.7 34.5 39.1 59.7 64.7
36 51.4 64.2 41.6 46.0 44.9 60.0 46.3 44.6 54.2 61.8
37 65.6 70.9 62.7 64.0 61.1 65.9 63.7 61.5 63.0 69.8
38 47.1 66.2 35.6 37.1 39.8 54.3 37.0 40.3 57.3 62.2
39 64.3 67.0 62.6 64.0 63.1 65.4 64.9 61.2 65.9 66.5
40 46.0 47.7 48.3 47.7 47.4 47.6 49.5 45.3 46.8 47.4
41 47.2 48.3 49.7 49.1 48.7 48.4 51.1 46.7 47.5 48.1
42 1.3 12.0 0.0 0.0 0.0 7.7 0.0 0.0 9.3 10.1
43 54.4 97.1 39.3 42.5 48.2 74.7 38.3 46.7 65.0 80.6
44 64.6 84.7 49.7 57.0 53.2 75.0 56.9 57.4 73.5 80.1
45 50.0 82.3 35.8 41.4 41.3 77.7 36.9 42.0 66.6 74.0
46 56.5 61.2 53.8 55.6 54.5 57.7 56.0 55.7 60.0 60.4
47 32.6 47.0 27.3 27.8 28.8 35.9 25.5 29.5 41.3 44.3
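The similarity measure of Eq. (25) can be sketched in a few lines. Everything numeric below is hypothetical: the two-parameter cases, the rotation used as A, the TSI values on the diagonal of S, and the choice of an identity matrix for D are assumptions made only to keep the example checkable by hand.

```python
import numpy as np

def percent_similarity(x_new, X_hist, W, D):
    """Similarity of each historical case to x_new, following Eq. (25).

    x_new:  (n,) standardized parameters of the new project.
    X_hist: (m, n) standardized parameters of the historical projects.
    W:      (n, n) weight matrix, W = A @ S as in Eq. (24).
    D:      (n, n) covariance matrix.
    """
    diff = (np.asarray(x_new, dtype=float) - np.asarray(X_hist, dtype=float)) @ W
    D_inv = np.linalg.inv(D)
    # Quadratic form diff_i * D^-1 * diff_i^T for every historical case i.
    squared = np.einsum("ij,jk,ik->i", diff, D_inv, diff)
    dist = np.sqrt(squared)
    return 1.0 - dist / dist.max()

# Hypothetical example: A is a 45-degree rotation, S holds made-up TSIs.
A = np.array([[np.sqrt(0.5), -np.sqrt(0.5)],
              [np.sqrt(0.5),  np.sqrt(0.5)]])
S = np.diag([0.8, 0.2])
W = A @ S
D = np.eye(2)  # assumption: unit covariance, for illustration only

X_hist = np.array([[1.0, 1.0], [1.0, -1.0], [0.0, 0.0]])
x_new = np.zeros(2)

sim = percent_similarity(x_new, X_hist, W, D)
ranking = np.argsort(-sim) + 1  # 1-based case numbers, most similar first
print(np.round(100 * sim, 1), ranking)
```

In this toy setting the first two historical cases are equally far from the new case in the original feature space, yet they receive different similarities because the first principal axis carries a larger weight than the second.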
Fig. 7 shows the scatter graph after orthogonalization. As shown, there is no correlation between transformed input variables (or transformed project parameters, i.e., principal components 1 through 12). Then a set of ANN models was built based on the transformed project parameters (principals) and actual craft quantities of 47 historical projects. Holdback validation was applied and the result shows that the ANN models fit the data well. Because principal components 1 through 3 explain 76% of the total variation, profiler graphs of the ANN models on three principal components were plotted to check the nonlinearity of the problem (not shown in this paper). Profiler graphs indicate high nonlinearity of the problem. This is a strong signal that GSA should be utilized (Reedijk 2000). In order to realize Sobol's total sensitivity analysis, probability distribution functions (PDFs) of transformed project parameters (principals) were fitted (Fig. 8). These PDFs were used as random generators of a Monte Carlo process.

The Sobol's TSIs of transformed project parameters (principals) pertaining to 20 crafts were then computed using SimLab (2011). A total of 6,656 samples were generated by a quasirandom sampling process for each craft. Table 4 shows Sobol's TSIs for each craft.

Based on Sobol's TSIs, the MPW matrices for 20 crafts were constructed. MPW matrices are diagonal matrices with TSIs on the diagonal. Following Eq. (24), the MOW matrices were obtained by multiplying MOT and MPW. Fig. 9 shows examples of MPW and MOW matrices of total steel. The MOW matrices were finally applied directly to original data to calculate weighted Mahalanobis distances following Eq. (22). Final percent similarities and similarity ranking were calculated based on Eq. (25) as
9 18 32 38 21 25 39 37 39 37 13 35
10 39 41 24 12 9 18 2 4 9 25 36
11 19 42 44 27 35 44 44 44 43 16 44
12 23 38 43 34 40 43 43 43 42 30 41
13 6 6 9 8 7 11 17 13 8 10 9
14 4 3 7 3 4 4 8 10 4 2 8
15 3 11 23 9 11 21 25 25 15 6 19
16 41 45 39 16 18 37 31 31 31 21 43
17 16 36 41 24 29 42 42 41 38 12 39
18 25 33 35 20 20 35 30 33 26 17 33
19 20 47 47 42 46 47 47 47 47 26 47
20 7 8 11 5 6 14 16 15 6 5 11
21 21 22 33 26 30 34 33 35 32 28 30
22 47 40 40 46 45 41 41 42 44 46 40
23 24 14 3 2 2 7 4 3 2 1 3
24 29 20 15 31 28 5 13 11 21 35 20
25 36 35 12 29 22 8 6 6 18 33 28
26 34 28 32 43 42 29 36 34 39 44 31
27 38 37 13 28 21 9 5 7 19 32 29
28 46 43 29 13 8 26 7 5 11 24 38
29 5 7 10 6 5 12 19 12 5 7 10
30 40 24 17 44 39 10 9 21 33 45 22
31 1 2 4 7 10 20 24 24 14 4 16
32 35 27 21 38 37 13 10 18 30 42 25
33 2 1 2 4 3 1 1 2 3 8 2
34 26 44 45 41 43 45 45 45 45 31 45
35 33 26 22 36 34 25 18 26 29 36 18
36 30 29 31 30 27 33 35 29 24 29 27
37 17 13 16 11 13 19 21 17 10 15 13
38 37 25 26 32 32 31 28 27 28 34 21
39 9 12 20 10 12 23 27 19 13 9 15
40 14 31 37 23 26 38 39 38 36 14 34
41 10 30 36 18 23 36 38 37 35 11 32
42 44 46 46 47 47 46 46 46 46 43 46
43 28 10 18 33 31 2 12 9 20 37 4
44 27 16 5 22 17 16 20 8 7 27 5
45 31 15 6 35 33 15 11 14 23 39 6
46 12 19 27 15 19 27 29 28 22 18 24
47 32 39 42 40 41 40 40 40 41 38 42
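Sobol's TSIs, which populate the diagonal MPW matrices, were computed in the paper with SimLab. As a rough illustration of what such an index measures, the sketch below estimates total sensitivity indices by plain Monte Carlo using the Jansen estimator. This is a swapped-in technique under stated assumptions: the toy model stands in for a fitted ANN, independent standard normal inputs stand in for the fitted PDFs of the principals, and pseudorandom rather than quasirandom sampling is used. All function names are hypothetical.

```python
import numpy as np

def sobol_total_indices(model, sampler, n_inputs, n_samples, rng):
    """Estimate Sobol total sensitivity indices with the Jansen estimator."""
    A = sampler(rng, (n_samples, n_inputs))
    B = sampler(rng, (n_samples, n_inputs))
    y_A = model(A)
    variance = np.var(np.concatenate([y_A, model(B)]), ddof=1)
    totals = np.empty(n_inputs)
    for i in range(n_inputs):
        AB_i = A.copy()
        AB_i[:, i] = B[:, i]  # resample only input i, keep the others fixed
        totals[i] = np.mean((y_A - model(AB_i)) ** 2) / (2.0 * variance)
    return totals

def toy_model(P):
    """Hypothetical nonlinear response: additive term plus an interaction."""
    return P[:, 0] + P[:, 1] * P[:, 2]

def standard_normal(rng, shape):
    """Assumed input distribution: independent standard normals."""
    return rng.standard_normal(shape)

tsi = sobol_total_indices(toy_model, standard_normal, 3, 50_000,
                          np.random.default_rng(1))
S = np.diag(tsi)  # diagonal weight matrix built from the TSIs
print(np.round(tsi, 2))
```

For this toy model each total index is 0.5 analytically, so the indices sum to more than 1. The excess reflects the interaction between the second and third inputs, which a first-order or one-at-a-time analysis would not attribute to either of them.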
Job | Cable 600V+ | Concrete embedded steel | Erect steel | Large bore | Instrument cable | Cable tray | Control cable | Main circ | U/G conduit | A/G conduit
1 35 15 37 35 36 16 36 36 24 21
2 26 34 24 23 18 26 24 23 30 32
3 37 23 39 37 38 19 38 38 31 28
4 19 25 15 16 13 14 15 16 23 23
5 14 16 16 14 12 8 16 15 21 16
6 1 4 2 1 1 1 1 2 1 1
7 41 32 43 43 43 36 44 43 35 35
8 15 22 14 15 22 21 14 14 25 20
9 32 39 26 28 25 37 22 28 38 39
10 6 5 12 7 14 35 11 7 3 3
11 42 44 36 39 39 44 35 40 44 44
12 43 43 40 42 42 42 39 42 43 43
13 9 18 7 8 7 10 7 8 11 14
14 4 14 3 4 2 5 3 3 10 11
15 16 29 9 13 9 18 10 11 20 26
16 22 33 18 19 28 43 19 21 28 31
17 38 42 29 31 31 38 28 32 41 42
18 24 35 19 22 19 31 20 24 40 34
19 47 47 46 46 46 47 46 46 47 47
20 8 19 5 6 6 11 6 6 12 15
21 28 36 27 29 27 30 29 30 33 36
22 44 40 45 44 45 40 45 44 42 41
23 2 8 4 3 4 3 2 4 27 5
24 20 9 28 25 29 20 30 22 8 10
25 18 1 23 20 20 27 26 20 6 7
26 39 31 42 40 40 32 42 39 34 33
27 17 3 22 21 21 29 25 19 5 6
28 5 6 13 9 15 39 12 9 4 4
29 7 17 6 5 5 9 5 5 9 13
30 36 12 38 38 37 22 40 37 18 19
31 13 27 8 10 8 15 8 10 17 25
32 31 13 34 33 34 24 37 33 15 17
33 3 7 1 2 3 2 4 1 2 2
34 45 45 44 45 44 45 43 45 45 45
35 34 21 35 34 35 17 34 35 26 24
36 25 28 30 27 30 23 27 29 32 29
37 10 20 10 12 11 12 13 12 19 18
38 30 26 33 36 33 28 32 34 29 27
39 12 24 11 11 10 13 9 13 14 22
40 33 38 25 26 26 34 23 27 37 38
41 29 37 20 24 23 33 21 25 36 37
42 46 46 47 47 47 46 47 47 46 46
43 23 2 31 30 24 7 31 26 16 8
44 11 10 21 17 17 6 17 17 7 9
45 27 11 32 32 32 4 33 31 13 12
46 21 30 17 18 16 25 18 18 22 30
47 40 41 41 41 41 41 41 41 39 40
shown in Tables 5 and 6. In addition, standardized Euclidean distances and relevant percent similarities were also obtained.

A Wilcoxon signed-rank test (Becker et al. 2012) was performed to examine if the orders of similarity ranking across 20 crafts are statistically equivalent. The result shows that only 35 out of 190 possible one-to-one paired groups of crafts (18%) demonstrate similar orders of similarity ranking (Table 7), suggesting significant differences among crafts with respect to similarity ranking.

Comparison of Results

Actual craft quantities of selected historical projects were compared with the approved estimates of project 0. In order to evaluate the relative advantage of the proposed algorithm, two comparisons were performed:
• Craft oriented comparison: Each individual craft of project 0 was compared with the most similar craft in history determined by the proposed algorithm (Table 6) with respect to quantity. This method reflects the target of the proposed algorithm directly, i.e., recognizing differences across crafts.
• Project level comparison: Entire projects were chosen to compare with project 0 with respect to craft quantities. Project 31 was selected based on the calculation of standard Euclidean distance. Projects 6, 23, and 33 were selected according to an overall similarity measure (1/average rank order) developed based on the proposed algorithm (Fig. 10).

Then three indices were computed to quantify the difference between approved quantities and actual historical quantities: sum of squared error (SSE), root of sum of squared error (RSS), and margin of error (RSS/sum of approved quantities). As shown in Table 8, the craft-oriented method yields the smallest margin of error with respect to craft quantities between project 0 and retrieved historical crafts, demonstrating the effectiveness of the proposed algorithm. It also highlights the importance of differentiating crafts in similar case retrieval, since the retrieved similar projects (6, 23, and 33) yielded bigger margins of error. The result of the standard Euclidean distance method is not satisfactory either.

Additional observations were made as follows:
1. The order of similarity ranking obtained from the standardized Euclidean distance remains the same for all crafts since the difference among crafts is not recognized, while the similarity obtained from the proposed algorithm results in different orders of similarity ranking for most crafts, which reflects the uniqueness of each craft.
2. The proposed algorithm tends to allocate varied importance to project parameters given different conditions. A linear ANOVA was performed to test the significance of project parameters. The result shows that, using the standardized Euclidean distance method, job configuration always plays an important role (p = 0.047), but the proposed algorithm tends to prioritize project parameters differently. For example, in complete piping estimating, the most important parameters are engineering company (p = 0.011) and job type (p = 0.014), while in
total cable estimating, the most important parameters are engineering company (p = 0.007) and outside stack distance (p = 0.036). This directly leads to different orders of similarity ranking. It has been confirmed that the importance allocation of the proposed algorithm performs better in reflecting the nature of craft works. For example, piping and cable usage are both highly dependent on the technologies adopted, which stem from the preferences and capacities of different engineering companies. As a result, the selection of engineering companies becomes important in estimating piping and cable.

main stacks, is a good indicator for piping amount. Outside stack distance, on the other hand, does reflect the need of total cable since cables are used to connect different major functional areas of the plant.

Erect steel - small bore −61 0.5244 0.7378 0.2622
Cable 600V+ - complete piping −62 0.5042 0.7479 0.2521
Small bore - complete piping −68 0.4777 0.7612 0.2388
Cable tray - pipe racks 69 0.4267 0.2134 0.7866
Instrument cable - concrete PFC −74.5 0.4216 0.7892 0.2108
Main circ - concrete PFC 78 0.3846 0.1923 0.8077
Control cable - large bore −82 0.3761 0.812 0.188
Instrument cable - large bore −84 0.3644 0.8178 0.1822
Main circ - control cable 105 0.2557 0.1278 0.8722
Instrument - total cable −106.5 0.2488 0.8756 0.1244
Pipe racks - concrete formwork −102.5 0.2358 0.8821 0.1179
Cable tray - concrete formwork −115 0.2125 0.8938 0.1062
Pipe racks - small bore 121 0.2037 0.1018 0.8982
Cable tray - small bore 128.5 0.1765 0.0883 0.9117
Cable 600V+ - small bore 130 0.1714 0.0857 0.9143
Small bore - cable 600V- −146 0.1235 0.9382 0.0618
Instrument cable - erect steel 142.5 0.1205 0.0602 0.9398
Instrument cable - total steel 144 0.1166 0.0583 0.9417
Main circ - large bore 144.5 0.1032 0.0516 0.9484
Small bore - concrete formwork −157 0.097 0.9515 0.0485
Pipe racks - cable 600V- 159.5 0.0812 0.0406 0.9594
Control cable - concrete PFC −144.5 0.0806 0.9597 0.0403
Concrete PFC - complete piping −160 0.0803 0.9599 0.0401
Main circ - instrument cable 161 0.0783 0.0392 0.9608
Control cable - complete piping −170 0.0716 0.9642 0.0358
Erect steel - total steel 169.5 0.0633 0.0316 0.9684

Discussion

Fundamentally, CBR analysis aims to find feasible solutions from the past for new problems. It requires an explicit and accurate definition of the similarity between cases. This paper finds that the relationship between case features and case outcomes (solutions) can significantly affect the similarity measure, which in turn determines the accuracy and reliability of CBR analysis. In cases where the feature-outcome relationship is simple and straightforward, seeking similar solutions can be achieved directly by finding cases with similar features, or problem descriptions, while if the feature-outcome relationship is nonlinear and complex, such a relationship influences accurate solution seeking. This paper attempts to develop an algorithm to capture the projection from the feature space to the solution space. Global sensitivity analysis was utilized to estimate the first-order and high-order influences of project parameters on craft quantities. Following the statistical definition of "importance" (Saltelli et al. 2008), the measurements of these influences were used immediately to convert feature similarity to solution similarity.

Another issue addressed in this paper is the multicollinearity among project parameters. It was found that there are strong correlations among project parameters. Without careful processing of these correlations, the analysis becomes fragile to any slight change in the input, and the importance of certain project parameters is likely to be exaggerated. In addition, correlated input variables make the interpretation of GSA results difficult. Therefore, this paper employs PCA to orthogonalize input data (i.e., project parameters) prior to any analysis. The constructed conversion matrix with the weight matrix from GSA together can be used
such as triangular distribution, which is commonly used in construction research (Maio et al. 2000). Another issue with the proposed algorithm is that certain qualitative project parameters were simply converted into integer codes; this method is not as reasonable as the dummy variable method. Last, interpretation of the analysis results could be difficult because of the use of PCA. PCA serves as a linear combination function that breaks the original meanings of input variables. These issues might affect the applicability of the proposed algorithm.

Conclusions

This study proposed a new algorithm to improve the similarity measure, which is critical for proper case retrieval when applying CBR in quantity takeoff in the proposal development phase. We have noted that the current weighting method used in the CBR similarity measure lacks a statistically sound procedure. Questions remain unanswered, especially how to tackle the multicollinearity among input variables, nonlinearity between feature space and outcome space, and differences across crafts. Building on existing methods, including Sobol’s TSI, PCA, and ANN, the multicollinearity and uniqueness of crafts have been addressed by the proposed algorithm.

A better similarity measure is equally important to estimators in conceptual estimating. We have found that a practical approach adopted by many estimators at conceptual estimating phases is to make comparisons with historical jobs due to the lack of necessary information. In certain extreme cases, estimators do not have any information for estimating. A better similarity measurement method will be a significant time saver for estimators.

This study also highlights the importance of GSA in the construction engineering and management field. Sensitivity analysis is the X-ray of modelers (Fürbinger 1996). At a time when modeling and simulation approaches are of increasing interest among construction scholars (Taylor 2010), it is necessary to keep SA methods updated. To the best of our knowledge, there is a tendency in environmental research to move from conventional SA to variance-based GSA. The application of Sobol’s TSI has recently become particularly prevalent in uncertainty analysis, sensitivity analysis, and model evaluation. Although not directly focused on introducing GSA, this paper is expected to draw the attention of construction scholars as an improved post-modeling analytical procedure.

References

Aamodt, A., and Plaza, E. (1994). “Case-based reasoning: Foundational issues, methodological variations, and system approaches.” AI Comm., 7(1), 39–59.
Ayed, A. (1998). “Neural network model for parametric cost estimation of highway projects.” J. Constr. Eng. Manage., 10.1061/(ASCE)0733-9364(1998)124:3(210), 210–218.
Becker, T., Jaselskis, E., El-Gafy, M., and Du, J. (2012). “Industry practices for estimating, controlling and managing key indirect construction costs at the project level.” Construction Research Congress 2012, 2469–2478.
Bishop, C. M. (1995). Neural networks for pattern recognition, Oxford University Press, Oxford, U.K.
Carr, R. I. (1989). “Cost-estimating principles.” J. Constr. Eng. Manage., 10.1061/(ASCE)0733-9364(1989)115:4(545), 545–551.
Chang, C. L., Cheng, B. W., and Su, J. L. (2004). “Using case-based reasoning to establish a continuing care information system of discharge planning.” Expert Syst. Appl., 26(4), 601–613.
Changchien, S., and Lin, M. C. (2005). “Design and implementation of a case-based reasoning system for marketing plans.” Expert Syst. Appl., 28(1), 43–53.
Chen, W., Jin, R., and Sudjianto, A. (2005). “Analytical variance-based global sensitivity analysis in simulation-based design under uncertainty.” J. Mech. Des., 127(5), 875–886.
Chiu, C. (2002). “A case-based customer classification approach for direct marketing.” Expert Syst. Appl., 22(2), 163–168.
Chua, D., Li, D., and Chan, W. (2001). “Case-based reasoning approach in bid decision making.” J. Constr. Eng. Manage., 10.1061/(ASCE)0733-9364(2001)127:1(35), 35–45.
Draper, N. R., Smith, H., and Pownell, E. (1966). Applied regression analysis, Wiley, New York.
Du, J., and El-Gafy, M. (2010). “Virtual organizational imitation for construction enterprises (VOICE): Managing business complexity using agent based modeling.” Proc., Construction Research Congress (CRC) 2012, ASCE, Reston, VA.
Du, J., and El-Gafy, M. (2012). “Virtual organizational imitation for construction enterprises: Agent-based simulation framework for exploring human and organizational implications in construction management.” J. Comput. Civ. Eng., 10.1061/(ASCE)CP.1943-5487.0000122, 282–297.
Duverlie, P., and Castelain, J. (1999). “Cost estimation during design step: parametric method versus case based reasoning method.” Int. J. Adv. Manufact. Technol., 15(12), 895–906.
Farrar, D. E., and Glauber, R. R. (1967). “Multicollinearity in regression analysis: The problem revisited.” Rev. Econ. Stat., 49(1), 92–107.
Feng, W., and Figliozzi, M. (2011). “Empirical findings of bus bunching distributions and attributes using archived AVL/APC bus data.” Proc., 11th Int. Conf. of Chinese Transportation Professionals (ICCTP), ASCE, Reston, VA.
Freund, R. J., and Littell, R. C. (1981). SAS for linear models: A guide to the ANOVA and GLM procedures, SAS Institute, Cary, North Carolina.
Friedman, J., Tibshirani, R., and Hastie, T. (2009). The elements of statistical learning: Data mining, inference, and prediction, Springer, New York.
Fürbinger, J. (1996). “Sensitivity analysis for modellers.” Air Infiltration Rev., 17(4), 8–10.
Fyfe, C., and Corchado, J. (2002). “A comparison of kernel methods for instantiating case based reasoning systems.” Adv. Eng. Inform., 16(3), 165–178.
Greenwood, P. E., and Nikulin, M. S. (1996). A guide to chi-squared testing, Wiley, New York, NY.
and genetic algorithms for predicting preliminary cost estimates.” J. Comput. Civ. Eng., 10.1061/(ASCE)0887-3801(2005)19:2(208), 208–211.
Kim, G. H., Yoon, J. E., An, S. H., Cho, H. H., and Kang, K. I. (2004b). “Neural network model incorporating a genetic algorithm in estimating construction costs.” Build. Environ., 39(11), 1333–1340.
Kouskoulas, V., and Koehn, E. (1974). “Predesign cost-estimation function for buildings.” J. Constr. Div., 100(4), 589–604.
Lilburne, L., and Tarantola, S. (2009). “Sensitivity analysis of spatial models.” Int. J. Geograph. Inform. Sci., 23(2), 151–168.
Low, B. (2005). “Reliability-based design applied to retaining walls.” Geotechnique, 55(1), 63–75.
Madhusudan, T., Zhao, J. L., and Marshall, B. (2004). “A case-based reasoning framework for workflow model management.” Data Knowl. Eng., 50(1), 87–115.
Mahalanobis, P. C. (1936). “On the generalized distance in statistics.” Proc. Natl. Inst. Sci. India, 2(1), 49–55.
Maier, H., Dandy, G., Norton, J., and Croke, B. (2005). “A comparison of sensitivity analysis techniques for complex models for environmental management.” Proc., Int. Congress on Modelling and Simulation, Melbourne, VIC, 2533–2539.
Maio, C., Schexnayder, C., Knutson, K., and Weber, S. (2000). “Probability distribution functions for construction simulation.” J. Constr. Eng. Manage., 10.1061/(ASCE)0733-9364(2000)126:4(285), 285–292.
Mohri, T., and Tanaka, H. (1994). “An optimal weighting criterion of case indexing for both numeric and symbolic attributes.” AAAI-94 Workshop Program: Case-Based Reasoning, Working Notes, 123–127.
Mooney, C. Z. (1997). Monte Carlo simulation, Sage Publications, Thousand Oaks, CA.
NASA. (2008). “2008 NASA cost estimating handbook.” 〈http://www.nasa.gov/pdf/263676main_2008-NASA-Cost-Handbook-FINAL_v6.pdf〉 (Jan. 22, 2014).
NeuralWare. (2014). “How many hidden nodes or hidden layers should I use?” Frequently asked questions, 〈http://www.neuralware.com/index.php/frequently-asked-questions#professional-plus-hidden-layers〉 (Jan. 22, 2014).
Oakley, J. E., and O’Hagan, A. (2004). “Probabilistic sensitivity analysis of complex models: A Bayesian approach.” J. Roy. Stat. Soc. Ser. B, 66(3), 751–769.
Ozcan Deniz, G., Zhu, Y., and Ceron, V. (2012). “Time, cost and environmental impact analysis on construction operation optimization using genetic algorithms.” J. Manage. Eng., 10.1061/(ASCE)ME.1943-5479.0000098, 265–272.
Perera, S., and Watson, I. (1998). “Collaborative case-based estimating and design.” Adv. Eng. Softw., 29(10), 801–808.
Project Management Institute. (2008). A guide to the project management body of knowledge: PMBOK guide, 4th Ed., Project Management Institute, Newtown Square, PA.
Reedijk, C. (2000). “Sensitivity analysis of model output: Performance of various local and global sensitivity measures on reliability problems.” M.S. thesis, Delft Univ. of Technology, Delft, Netherlands.
variable importance decomposition with applications to probabilistic engineering design.” Comput. Ind. Eng., 57(3), 996–1007.
Setiono, R. (1997). “A penalty-function approach for pruning feedforward neural networks.” Neural Comput., 9(1), 185–204.
Shannon, C. E., and Weaver, W. (1949). A mathematical theory of communication, Univ. of Illinois Press, Urbana, IL.
Shin, K., and Han, I. (1999). “Case-based reasoning supported by genetic algorithms for corporate bond rating.” Expert Syst. Appl., 16(2), 85–95.
Shin, K., and Han, I. (2001). “A case-based approach using inductive indexing for corporate bond rating.” Decis. Support Syst., 32(1), 41–52.
SimLab. (2011). Software package for uncertainty and sensitivity analysis, Joint Research Centre of the European Commission, Brussels, Belgium.
Slonim, T., and Schneider, M. (2001). “Design issues in fuzzy case-based reasoning.” Fuzzy Sets Syst., 117(2), 251–267.
Sobol, I., Tarantola, S., Gatelli, D., Kucherenko, S., and Mauntz, W. (2007). “Estimating the approximation error when fixing unessential factors in global sensitivity analysis.” Reliab. Eng. Syst. Safety, 92(7), 957–960.
Sobol, I. M. (2001). “Global sensitivity indices for nonlinear mathematical models and their Monte Carlo estimates.” Math. Comput. Simulat., 55(1–3), 271–280.
Stewart, R. D., Wyskida, R. M., and Johannes, J. D. (1995). Cost estimator’s reference manual, Wiley-Interscience, Hoboken, NJ.
Tam, C., and Fang, C. F. (1999). “Comparative cost analysis of using high-performance concrete in tall building construction by artificial neural networks.” ACI Struct. J., 96(6), 927–936.
Taylor, J. (2010). “Invitation to submit scholarly articles using agent-based simulation to tackle challenging civil engineering problems.” J. Comput. Civ. Eng., 10.1061/(ASCE)CP.1943-5487.0000069, 465–466.
Tetko, I. V., Livingstone, D. J., and Luik, A. I. (1995). “Neural network studies. 1. Comparison of overfitting and overtraining.” J. Chem. Inform. Comput. Sci., 35(5), 826–833.
Thogmartin, W. E. (2010). “Sensitivity analysis of North American bird population estimates.” Ecol. Model., 221(2), 173–177.
Tsai, T., Kao, C., Surampalli, R., Huang, W., and Rao, J. (2011). “Sensitivity analysis of risk assessment at a petroleum-hydrocarbon contaminated site.” J. Hazard. Toxic Radioact. Waste, 10.1061/(ASCE)HZ.1944-8376.0000067, 89–98.
Varella, H., Guérif, M., and Buis, S. (2010). “Global sensitivity analysis measures the quality of parameter estimation: the case of soil parameters and a crop model.” Environ. Model. Software, 25(3), 310–319.
Watson, I. (1999). “Case-based reasoning is a methodology not a technology.” Knowl. Based Syst., 12(5–6), 303–308.
Wettschereck, D., and Aha, D. (1995). “Weighting features.” Case-based reasoning research and development, Springer, Berlin, Heidelberg, 347–358.
Xiao, F., Honma, Y., and Kono, T. (2005). “A simple algebraic interface capturing scheme using hyperbolic tangent function.” Int. J. Numer. Meth. Fluids, 48(9), 1023–1040.
Yau, N. J., and Yang, J. B. (1998a). “Applying case-based reasoning technique to retaining wall selection.” Automat. Constr., 7(4), 271–283.
Yau, N. J., and Yang, J. B. (1998b). “Case-based reasoning in construction management.” Comput. Aided Civ. Infrastruct. Eng., 13(2), 143–150.