Article info
Article history:
Received 29 August 2013
Received in revised form 15 August 2014
Accepted 21 August 2014
Available online 2 September 2014
Abstract
Early prediction of the success of green building projects is an important and challenging issue. The aim
of this study was to develop a model to predict the cost and schedule performance of green building
projects based on the level of definition during the pre-project planning phase. To this end, a three-step
process was proposed: pre-processing, variable selection, and prediction model construction. Data from
53 certified green buildings were used to develop the models. After balancing the data set with respect to
the proportion of cases in each of the outcome categories by pre-processing, the number of input
variables was reduced from 64 to 13 and 7 for cost and schedule performance prediction, respectively, using
the ReliefF-W variable selection method. Then, cost and schedule performance prediction models were
constructed using the selected variables and four different classifiers: a support vector machine (SVM), a
back-propagation neural network (BPNN), a C4.5 decision tree algorithm (C4.5), and a logistic regression
(LR). The classification performance of the four models was compared to assess their applicability. The
SVM models exhibited the highest accuracy, sensitivity, and specificity in predicting both the cost and
schedule performance of green building projects. The results of this study empirically validated that the
cost and schedule performance of green building projects is highly dependent on the quality of definition
in the pre-project planning phase.
© 2014 Elsevier Ltd. All rights reserved.
Keywords:
Green building project
Certified green building
Pre-project planning
Cost performance
Schedule performance
Data mining
1. Introduction
Green building, a practice that is two decades old, has become
more prevalent in recent years. The U.S. Environmental Protection
Agency defines green building as the practice of creating structures and using processes that are environmentally responsible and
resource-efficient throughout a building's lifecycle, from siting to
design, construction, operation, maintenance, renovation, and
deconstruction (U.S. Environmental Protection Agency, 2010a).
Many countries have developed and adopted various green building rating systems (Zhang et al., 2013). The exemplar systems are
the U.K.'s Building Research Establishment Environmental Assessment Method (BREEAM), the U.S. Leadership in Energy and Environmental Design (LEED), Australian Green Star, Germany's
Deutsche Gütesiegel Nachhaltiges Bauen (DGNB), the Japanese
Comprehensive Assessment System for Building Environmental
Efficiency (CASBEE), and the Korean Green Building Certification
System (K-GBCS). These systems outline specific guidelines for
implementing green practices into the lifecycle of a building, thus
guiding the stakeholders of the building project in the green delivery of their project.
The number of buildings certified with green building rating
systems has rapidly increased as a result of the rapid spread of such
systems and the recognition of the benefits of green buildings,
which include reduced operating costs; the creation, expansion,
and development of markets for green products and services;
improved occupant productivity; and optimized life-cycle economic performance (Zhang, 2014; Atlee, 2011; U.S. Environmental
Protection Agency, 2010b; Shen et al., 2010). For example, the total U.S. green building market value is expected to increase from
$10 billion in 2005 to between $98 and $106 billion in 2013
(McGraw-Hill Construction, 2012). This rapid growth of the green
building market has raised concerns among project stakeholders
about the risks involved, especially the high level of uncertainty
with respect to project performance in the delivery of green
building projects (Hwang and Leong, 2013; Robichaud and
Anantatmula, 2011). Hwang and Leong (2013) reported that the
failure rate of 39 green building projects in their survey was 33% in
terms of schedule performance, more than twice that of 40
traditional building projects.
The successful delivery of green building projects is more difficult and complex than for traditional building projects, particularly
were certified by either the Leadership in Energy and Environmental Design (LEED), developed by the U.S. Green Building Council
(USGBC), or the Korean Green Building Certification System (K-GBCS), developed by the Korea Green Building Council (KGBC). The
median cost of the projects was $131.5 million, with a range of $13.8
to $499.8 million. The duration of these projects was 31.2 months
on average, with a range of 12–56 months. A detailed overview of
the questionnaire data is presented in Table 1.
Thirty-seven projects (70%) had unfavorable cost variance
(actual cost exceeded budgeted cost), while only 16 projects had
favorable cost variance. The average of the unfavorable variances,
i.e. cost escalation, for the 37 failed projects was 13.7%. Databases
on green building projects typically indicate a large proportion of
cost-overrun projects compared to on-budget projects; therefore,
green building projects require careful management with respect
to cost performance. In contrast to cost performance, 32 projects
were completed on schedule, while 21 projects (40%) had unfavorable schedule variance (actual duration exceeded planned
duration). In the case of delays in the completion of projects,
where the agreed planned duration is exceeded, penalties for delays are generally imposed on the contractor. Such penalties usually take the form of withheld payments. Should the contractor
make up time and deliver on the contractual completion date, any
penalties incurred for delay during construction are reimbursed.
The average unfavorable schedule variance for the 21 failed projects was 11.7%.
4. Methodology
The main objective of this study is to develop a model to predict
the cost and schedule performance of green building projects
based on the level of definition during the pre-project planning
phase. To this end, a three-step process is proposed: pre-processing, variable selection, and prediction model construction.
Pre-processing of the data set is carried out to deal with imbalances in the proportions of the outcomes in the data. Variable
selection is used to remove redundant or irrelevant variables, to
reduce the relatively high number of variables compared to the
number of cases. Then, prediction models are constructed using
the selected variables and four different classifiers (a support
vector machine (SVM), a back-propagation neural network (BPNN),
a C4.5 decision tree algorithm (C4.5), and a logistic regression (LR))
to predict the cost and schedule performance of green building
projects. Finally, the classification performance of the four models
is compared.
Table 1
Type, size and duration of the green building projects in the study (N = 53).

Characteristic                   Frequency   Percentage (%)
Project type (building use)
  Residential                    21          39.6
  Office                         16          30.2
  Educational                     8          15.1
  Retail                          3           5.7
  Cultural                        2           3.8
  Research                        2           3.8
  Other                           1           1.9
Project size
  < $50 million                  10          18.9
  $50–$100 million               12          22.6
  > $100 million                 31          58.5
Project duration
  < 24 months                     6          11.3
  24–36 months                   37          69.8
  > 36 months                    10          18.9
4.1. Pre-processing
The data set comprised 37 failed and 16 successful projects with
respect to cost performance; the failure-to-success ratio was 7:3.
There were 21 failed and 32 successful projects with respect to
schedule performance; the failure-to-success ratio was 4:6. The
data set was unbalanced because different outcome classes were
not equally represented, a prevalent characteristic in the construction industry. Such imbalances lead to a bias toward the majority class, because the standard classifiers tend to predict the class
with the higher number of cases, which are positively weighted
during training (Fernández et al., 2009). Such a bias results in
over-prediction of the majority class (Fernández et al., 2009). Therefore,
pre-processing is necessary to deal with the imbalance problem
before constructing prediction models, by balancing the class distribution in the data set. In this study, the synthetic minority
over-sampling technique (SMOTE) proposed by Chawla et al. (2002) was employed.
SMOTE over-samples the minority class by generating synthetic
cases at random points between existing minority cases rather
than duplicating existing minority cases (Gao et al., 2011; Li et al.,
2010). Because SMOTE over-samples the minority class without
data duplication, the over-fitting problem can be avoided (Gao
et al., 2011; Fernández et al., 2009). The technique first finds the
k-nearest neighbors of each minority case. In this study, the value of
k was set to 5, as recommended by Chawla et al. (2002). Next,
depending on the amount of over-sampling required, there are
several iterations in which one neighbor is randomly selected from
the k-nearest neighbors. If the amount of over-sampling required is
200% and k = 5, then there are two iterations in which one neighbor
is randomly selected from the five nearest. Then the difference
between the case under consideration and its neighbor is calculated. This difference is multiplied by a random number between
0 and 1 and added to the case under consideration, which yields a
synthetic case on the line segment between the two. Finally, the new
synthetic cases are added to the data set
and assigned to the minority class.
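The interpolation steps described above can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the function name `smote`, the parameter `n_new_per_case`, the Euclidean distance, and the fixed seed are all assumptions made for the example.

```python
import math
import random


def smote(minority, n_new_per_case=1, k=5, seed=0):
    """Return synthetic cases interpolated between minority cases and their neighbors.

    minority: list of numeric feature vectors (needs at least two cases).
    n_new_per_case: synthetic cases per minority case (2 would mean 200%).
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    synthetic = []
    for i, case in enumerate(minority):
        # k nearest minority neighbors of this case (Euclidean distance assumed).
        others = [c for j, c in enumerate(minority) if j != i]
        neighbors = sorted(others, key=lambda c: math.dist(case, c))[:k]
        for _ in range(n_new_per_case):
            neighbor = rng.choice(neighbors)  # one neighbor picked at random
            gap = rng.random()                # random number in [0, 1)
            # synthetic case = case + gap * (neighbor - case)
            synthetic.append([a + gap * (b - a) for a, b in zip(case, neighbor)])
    return synthetic
```

Under this reading, an over-sampling amount of 200% corresponds to `n_new_per_case=2`, while one synthetic case per minority case doubles the minority class.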
Fig. 1 shows the class distribution before and after pre-processing with respect to cost and schedule performance. Using
SMOTE, the original number of cases of the minority class was
doubled, which changed the failure-to-success ratio from 7.0:3.0 to 5.4:4.6 for
cost performance and from 4.0:6.0 to 5.7:4.3 for schedule
performance.
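The ratios after pre-processing follow directly from the class counts given in Section 4.1. The short check below assumes that doubling the minority class means exactly one synthetic case per original minority case; the helper name `class_shares` is introduced only for this example.

```python
def class_shares(failed, successful):
    """Express the failure-to-success split as two parts summing to 10."""
    total = failed + successful
    return round(10 * failed / total, 1), round(10 * successful / total, 1)


# Cost: 37 failed, 16 successful (minority, doubled to 32) -> 69 cases.
cost = class_shares(failed=37, successful=16 * 2)
# Schedule: 21 failed (minority, doubled to 42), 32 successful -> 74 cases.
schedule = class_shares(failed=21 * 2, successful=32)

print(cost, schedule)
```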
4.2. Variable selection
After pre-processing, 69 and 74 cases were obtained in total for
cost and schedule performance prediction, respectively. However,
64 independent variables in the data set represented a large
number compared to the relatively small number of cases. This
would make the standard classifiers complex and difficult to train
(Ng et al., 2008). In such cases, it is advisable to discard less relevant
variables by variable selection, which determines more relevant
variables in the data set (Liu and Setiono, 1998). Through this
process, both generalization and classification accuracy could be
higher than they would have been otherwise (Ng et al., 2008). In
this study, the ReliefF-W algorithm (weighted by distance) was
employed for variable selection.
ReliefF selects variables based on the estimation of their relevance according to their ability to distinguish between different
classes (Chen and Yu, 2012; Liu et al., 2010). The Relief algorithm
assigns relevance as a weight to each variable; thus irrelevant
variables can be discarded (Kohavi and John, 1997). In the implementation, cases are first selected randomly from the data set.
Then, the nearest neighbors in the same class (nearest hits) and in
the other class (nearest misses), are determined. Finally, the weight
of each variable is updated by computing the difference in the value
Fig. 1. Class distribution before and after pre-processing using SMOTE: (a) original data set; (b) after pre-processing.
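The weight update used by Relief-type variable selection (sample a case, find its nearest hit and nearest miss, adjust each variable's weight by the difference in values) can be sketched as below. This is the basic two-class Relief with a single hit and a single miss, not the distance-weighted ReliefF-W variant used in the study; the function name, the range scaling, and the Manhattan distance are assumptions made for illustration.

```python
import random


def relief_weights(X, y, n_samples=None, seed=0):
    """Basic two-class Relief: one nearest hit and one nearest miss per sampled case."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Scale differences by each variable's range so they lie in [0, 1].
    spans = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def distance(a, b):
        return sum(diff(a, b, j) for j in range(d))

    m = n_samples or n
    weights = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)  # randomly selected case
        hits = [X[k] for k in range(n) if k != i and y[k] == y[i]]
        misses = [X[k] for k in range(n) if y[k] != y[i]]
        hit = min(hits, key=lambda c: distance(X[i], c))
        miss = min(misses, key=lambda c: distance(X[i], c))
        for j in range(d):
            # Reward variables that differ across classes and
            # penalize variables that differ within the same class.
            weights[j] += (diff(X[i], miss, j) - diff(X[i], hit, j)) / m
    return weights
```

Variables would then be ranked by weight and the lowest-scoring ones discarded; where to cut the ranking is a separate choice that this sketch does not make.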
Table 2
ReliefF-W scores for the 13 selected independent variables for cost performance prediction.

Rank   Variable description   Score
1                             0.1183
2                             0.1154
3                             0.1012
4                             0.0913
5                             0.0839
6                             0.0752
7                             0.0742
8                             0.0667
9                             0.0636
10                            0.0632
11                            0.0538
12                            0.0520
13                            0.0514

Table 3
ReliefF-W scores for the 7 selected variables for schedule performance prediction.

Rank   Variable description   Score
1                             0.0946
2                             0.0901
3                             0.0856
4                             0.0773
5                             0.0674
6                             0.0598
7                             0.0546

The reduced sets of variables were used as input for the four
different types of learning classifiers (SVM, BPNN, C4.5, and LR) to
predict the cost and schedule performance of green building projects based on the level of definition during the pre-project planning phase. These four classifiers were chosen for this study
because they are well established and widely used in solving binary classification problems (for example, Olson et al., 2012; Xia
and Jin, 2008; Quintana et al., 2008; Dreiseitl et al., 2001). All
four classifiers were evaluated and their performance was then
compared.
4.3.1. Support vector machine (SVM)
The SVM, developed by Vapnik (1995), is a powerful, supervised
learning method that has strong discriminative power. It can
handle non-linear classification problems by using a kernel function to implicitly map data to a high-dimensional space. It then
constructs a hyperplane as a discriminant function to maximize
the margin of separation between two classes in the high-dimensional feature space (Vapnik, 1995). The SVM offers good
generalization ability, based on the principle of structural risk
minimization and with the help of multiplier parameters such as
Lagrange multipliers (Ali and Smith, 2006; Dreiseitl et al., 2001).
The SVM achieves this by solving an optimization problem using
training data.
Given a labeled set of $M$ training data $(x_i, y_i)$, where $x_i \in \mathbb{R}^N$ and $y_i$
is the associated label ($y_i \in \{-1, 1\}$), the discriminant hyperplane
is defined as follows:

\[ f(x) = \sum_{i=1}^{M} \alpha_i y_i k(x_i, x) + b, \tag{1} \]

where $k(\cdot,\cdot)$ is the kernel function. Constructing an optimal hyperplane is equivalent to estimating all the nonzero coefficients $\alpha_i$
(support vectors) and the bias $b$. For further information, see
Vapnik (1995, 1998).
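To make the SVM discriminant function of Eq. (1) concrete, the sketch below evaluates f(x) as a kernel-weighted sum over support vectors plus a bias, and classifies by its sign. The radial basis function kernel, the value of `gamma`, and all function names are assumptions chosen for illustration; the coefficients would normally come from solving the SVM optimization problem, which is not shown.

```python
import math


def rbf_kernel(u, v, gamma=0.5):
    """k(u, v) = exp(-gamma * ||u - v||^2); the kernel choice is an assumption."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svm_decision(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    """Evaluate f(x) = sum_i alpha_i * y_i * k(x_i, x) + b."""
    return sum(
        a * y * kernel(sv, x)
        for sv, y, a in zip(support_vectors, labels, alphas)
    ) + b


def svm_predict(x, support_vectors, labels, alphas, b, kernel=rbf_kernel):
    """Assign class +1 or -1 according to the sign of f(x)."""
    return 1 if svm_decision(x, support_vectors, labels, alphas, b, kernel) >= 0 else -1
```

Only training cases with nonzero coefficients contribute to the sum, which is why the trained model can be stored as the support vectors, their labels and coefficients, and the bias.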
\[ \Delta w_{ji}(n) = \eta \, \delta_j \, x_{ji} + \alpha \, \Delta w_{ji}(n-1), \tag{2} \]
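Equation (2) has the form of the usual back-propagation weight update with a momentum term. The sketch below is a minimal illustration under assumed symbol readings: eta as the learning rate, delta_j as the error term of unit j, x_ji as the input from unit i to unit j, and alpha as the momentum coefficient. The function name and the numeric values are hypothetical.

```python
def momentum_update(prev_delta, eta, delta_j, x_ji, alpha):
    """Return Delta w_ji(n) = eta * delta_j * x_ji + alpha * Delta w_ji(n - 1)."""
    return eta * delta_j * x_ji + alpha * prev_delta


# Two consecutive updates of one weight with the same gradient signal:
# the second step is larger because part of the first step is carried over.
w = 0.0
step = 0.0
for _ in range(2):
    step = momentum_update(step, eta=0.1, delta_j=0.5, x_ji=2.0, alpha=0.9)
    w += step
```

The momentum term reuses a fraction of the previous weight change, which tends to smooth the trajectory of the weights during training.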
input variables. Also, the relationships among the variables and the
strengths of these relationships can be obtained from the LR model
(Stojanova et al., 2012).
The major advantage of LR is that it can produce a simple
probabilistic formula for the classification. In addition, it does not
assume linearity in the relationship between the input variables
and the classes, nor does it require normally distributed input
variables (Keramati and Yousefi, 2011). The LR model is defined as
follows:
\[ \log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n, \tag{3} \]
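Solving Eq. (3) for p gives the logistic function, which is how an LR model turns a weighted sum of input variables into a class probability. The sketch below illustrates this inversion; the function names and the coefficient values used in the check are hypothetical.

```python
import math


def logit(p):
    """Log-odds of p, i.e. the left-hand side of Eq. (3)."""
    return math.log(p / (1.0 - p))


def predict_probability(x, intercept, coefs):
    """Invert Eq. (3): p = 1 / (1 + exp(-(b0 + b1*x1 + ... + bn*xn)))."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, for illustration only.
p = predict_probability([1.0, 2.0], intercept=-0.5, coefs=[0.8, -0.3])
```

A case is typically assigned to the positive class when p exceeds a chosen threshold such as 0.5, which corresponds to a log-odds of zero.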
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4} \]

\[ \text{Sensitivity} = \frac{TP}{TP + FN} \tag{5} \]

\[ \text{Specificity} = \frac{TN}{FP + TN} \tag{6} \]
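The three measures in Eqs. (4) to (6) can be computed directly from the four confusion-matrix counts. In the sketch below the counts are illustrative assumptions for a 69-case data set with 32 positive and 37 negative cases; the paper does not report a confusion matrix, and the function name is introduced only for this example.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, and specificity from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # share of positives correctly identified
        "specificity": tn / (fp + tn),   # share of negatives correctly identified
    }


# Assumed counts: 31 of 32 positives and 32 of 37 negatives classified correctly.
metrics = classification_metrics(tp=31, tn=32, fp=5, fn=1)
percentages = {name: round(100 * value, 2) for name, value in metrics.items()}
```

Reporting sensitivity and specificity alongside accuracy shows whether a classifier performs well on both outcome classes or only on the larger one.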
5. Experimental results
Table 4
Cost performance prediction results for the four classification models.

Model   Accuracy (%)   Sensitivity (%)   Specificity (%)
SVM     91.30          96.88             86.49
BPNN    86.96          87.50             86.49
C4.5    82.61          90.63             75.68
LR      76.81          81.25             72.97
number of samples per leaf (M). To achieve higher prediction accuracy and to avoid over-fitting of data, a grid search was carried
out by varying F from 0.15 to 0.35 in steps of 0.05 and M from 1 to
10 in steps of 1. Then, the pair with the best 10-fold cross-validation accuracy was chosen. In the grid search, it was
observed that the C4.5 algorithm was sensitive to both the confidence factor and the number of cases per leaf. After conducting the grid search, the optimal (F,M) for cost performance
prediction was found to be (0.25,3) with a cross-validation accuracy of 82.6%. For schedule performance prediction, the optimal
(F,M) was found to be (0.35,3) with a cross-validation accuracy of
77.0%.
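The grid search over (F, M) described above can be sketched as an exhaustive loop over all parameter pairs. In this illustration `score_fn` is a hypothetical placeholder for the 10-fold cross-validation accuracy of a C4.5 model trained with the given parameters; training the tree itself is not shown, and the names are assumptions.

```python
from itertools import product


def grid_search(score_fn, f_values, m_values):
    """Return the (F, M) pair with the highest score, together with that score."""
    best_pair, best_score = None, float("-inf")
    for f, m in product(f_values, m_values):
        score = score_fn(f, m)  # e.g. mean 10-fold cross-validation accuracy
        if score > best_score:
            best_pair, best_score = (f, m), score
    return best_pair, best_score


# Grid from the text: F from 0.15 to 0.35 in steps of 0.05, M from 1 to 10.
F_VALUES = [round(0.15 + 0.05 * i, 2) for i in range(5)]
M_VALUES = list(range(1, 11))
```

With five values of F and ten values of M, this evaluates 50 candidate pairs, each of which requires a full cross-validation run.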
5.2. Prediction performance comparisons
The average prediction accuracy, as well as sensitivity and
specificity, for the four different classifiers for cost and schedule
performance prediction are shown in Tables 4 and 5. They are
ranked primarily in descending order of their average prediction accuracy. The SVM models exhibited the highest accuracy,
sensitivity, and specificity in predicting both cost and schedule
performance, whereas the LR fared the worst. These results
clearly indicate that the cost and schedule performance of
green building projects can be predicted at an early stage with
the SVM models. The BPNN and C4.5 models showed comparable performance in terms of accuracy, sensitivity, and specificity for both cost and schedule performance predictions. The
statistical LR model had the worst predictive accuracy and the
largest errors of the four models in terms of sensitivity and
specificity.
6. Conclusion
The prediction of cost and schedule performance of green
building projects during the pre-project planning phase is an
important and challenging issue. Pre-project planning is generally
acknowledged to be a key contributor to success in green building
projects. The aim of this study was to develop a model to predict
the cost and schedule performance of green building projects based
on the level of definition during the pre-project planning phase.
A three-step process was proposed to achieve this
objective: pre-processing, variable selection, and prediction model
construction. Data from 53 certified green buildings were used to
develop the models. The data set was balanced in terms of the
relative proportion of the outcome classes by pre-processing. The
Table 5
Schedule performance prediction results for the four classification models.

Model   Accuracy (%)   Sensitivity (%)   Specificity (%)
SVM     90.54          84.38             95.24
BPNN    75.68          71.88             78.57
C4.5    77.03          62.50             88.10
LR      60.81          46.88             71.43
Acknowledgments
This research was supported by the Basic Science Research
Program through the National Research Foundation of
Korea (NRF) funded by the Ministry of Education (NRF-2013R1A1A2A10058175).
Section/category/variable description
I. Basis of Project Decision
A. Business Strategy
A1. Building Use Requirements
A2. Business Justification
A3. Business Plan
A4. Economic Analysis
A5. Facility Requirements
A6. Future Expansion/Alteration Considerations
A7. Site Selection Considerations
A8. Project Objectives Statement
B. Owner Philosophies
B1. Reliability Philosophy
B2. Maintenance Philosophy
B3. Operating Philosophy
B4. Design Philosophy
C. Project Requirements
C1. Value-Analysis Process
C2. Project Design Criteria
C3. Evaluation of Existing Facilities
C4. Scope of Work Overview
C5. Project Schedule
C6. Project Cost Estimate
II. Basis of Design
D. Site Information
D1. Site Layout
D2. Site Surveys
D3. Civil/Geotechnical Information
D4. Governing Regulatory Requirements
D5. Environmental Assessment
D6. Utility Sources with Supply Conditions
D7. Site Life Safety Considerations
D8. Special Water and Waste Treatment Requirements
E. Building Programming
E1. Program Statement
E2. Building Summary Space List
E3. Overall Adjacency Diagrams
E4. Stacking Diagrams
E5. Growth and Phased Development
E6. Circulation and Open Space Requirements
E7. Functional Relationship Diagram/Room by Room
E8. Loading/Unloading/Storage Facilities Requirements
E9. Transportation Requirements
E10. Building Finishes
E11. Room Data Sheets
E12. Furnishings, Equipment, and Built-Ins
E13. Window Treatment
F. Building/Project Design Parameters
F1. Civil/Site Design
F2. Architectural Design
F3. Structural Design
F4. Mechanical Design
F5. Electrical Design
F6. Building Life Safety Requirements
F7. Constructability Analysis
F8. Technological Sophistication
G. Equipment
G1. Equipment List
G2. Equipment Location Drawings
G3. Equipment Utility Requirements
III. Execution Approach
H. Procurement Strategy
H1. Identify Long-lead/Critical Equipment and Materials
H2. Procurement Procedures and Plans
J. Deliverables
J1. CADD/Model Requirements
J2. Documentation/Deliverables
K. Project Control
K1. Project Quality Assurance and Control
K2. Project Cost Control
K3. Project Schedule Control
K4. Risk Management
K5. Safety Procedures
References
Aarabi, A., Wallois, F., Grebe, R., 2006. Automated neonatal seizure detection: a
multistage classification system through feature selection based on relevance
and redundancy analysis. Clin. Neurophysiol. 117, 328–340.
Akay, M.F., 2009. Support vector machines combined with feature selection for
breast cancer diagnosis. Expert Syst. Appl. 36, 3240–3247.
Ali, S., Smith, K.A., 2006. On learning algorithm selection for classification. Appl. Soft
Comput. 6, 119–138.
An, S., Liu, W., Venkatesh, S., 2007. Fast cross-validation algorithms for least squares
support vector machine and kernel ridge regression. Pattern Recogn. 40,
2154–2162.
Atlee, J., 2011. Selecting safer building products in practice. J. Clean. Prod. 19,
459–463.
Bui, D.T., Pradhan, B., Lofman, O., Revhaug, I., 2012. Landslide susceptibility
assessment in Vietnam using support vector machines, decision tree, and naïve
Bayes models. Math. Probl. Eng. 2012, 1–26.
Caesarendra, W., Widodo, A., Yang, B.-S., 2010. Application of relevance vector
machine and logistic regression for machine degradation assessment. Mech.
Syst. Signal Process. 24, 1161–1171.
Chandramohan, A., Narayanan, S.L., Gaurav, A., Krishna, N., 2012. Cost and time
overrun analysis for green construction projects. Int. J. Green. Econ. 6, 167–177.
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., 2002. SMOTE: synthetic
minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357.
Chen, Y.-H., Yu, S.-N., 2012. Selection of effective features for ECG beat recognition
based on nonlinear correlations. Artif. Intell. Med. 54, 43–52.
Cho, C.-S., Gibson Jr., G.E., 2001. Building project scope definition using project
definition rating index. J. Archit. Eng. 7, 115–125.
Construction Industry Institute (CII), 1999. Project Definition Rating Index (PDRI) –
Building Projects. Implementation Resource 155-2, Austin, TX.
Cybenko, G., 1989. Approximation by superpositions of a sigmoidal function. Math.
Control Signal 2, 303–314.
Delen, D., Walker, G., Kadam, A., 2005. Predicting breast cancer survivability: a
comparison of three data mining methods. Artif. Intell. Med. 34, 113–127.
Ding, Y., Song, X., Zen, Y., 2008. Forecasting financial condition of Chinese listed
companies based on support vector machine. Expert Syst. Appl. 34, 3081–3089.
Dreiseitl, S., Ohno-Machado, L., Kittler, H., Vinterbo, S., Billhardt, H., Binder, M.,
2001. A comparison of machine learning methods for the diagnosis of pigmented skin lesions. J. Biomed. Inf. 34, 28–36.
Duda, R.O., Hart, P.E., Stork, D.G., 2001. Pattern Classification. Chapter 9. John Wiley
& Sons, New York, NY.
D'Haen, J., van den Poel, D., 2013. Model-supported business-to-business prospect
prediction based on an iterative customer acquisition framework. Ind. Mark.
Manag. 42, 544–551.
Fernández, A., del Jesus, M.J., Herrera, F., 2009. Hierarchical fuzzy rule based
classification systems with genetic rule selection for imbalanced data-sets. Int. J.
Approx. Reason 50, 561–577.
Gao, M., Hong, X., Chen, S., Harris, C.J., 2011. A combined SMOTE and PSO based RBF
classifier for two-class imbalanced problems. Neurocomputing 74, 3456–3466.
Gardner, M.W., Dorling, S.R., 1998. Artificial neural networks (the multilayer
perceptron) – a review of applications in the atmospheric sciences. Atmos.
Environ. 32, 2627–2636.
Geethanjali, P., Ray, K.K., 2011. Identification of motion from multi-channel EMG
signals for control of prosthetic hand. Australas. Phys. Eng. Sci. Med. 34,
419–427.
Hastie, T., Tibshirani, R., Friedman, J., 2001. The Elements of Statistical Learning:
Data Mining, Inference and Prediction, first ed. Springer-Verlag, New York, NY.
Hosmer, D.W., Lemeshow, S., 2000. Applied Logistic Regression, second ed. John
Wiley & Sons, New York, NY.
Hsu, C.-W., Chang, C.-C., Lin, C.-J., 2003. A Practical Guide to Support Vector Classification. Technical Report. National Taiwan University, Taipei, Taiwan.
Hwang, B.-G., Leong, L.P., 2013. Comparison of schedule delay and causal factors
between traditional and green construction projects. Technol. Econ. Dev. Eco.
19, 310–330.
Irie, B., Miyake, S., 1988. Capabilities of three-layered perceptrons. In: Proc. IEEE Int.
Conf. on Neural Networks, 24–27 July 1988, San Diego, CA, pp. 641–648.
Keramati, A., Yousefi, N., 2011. A proposed classification of data mining techniques in
credit scoring. In: Proc. 2011 Int. Conf. on Industrial Engineering and Operations
Management, 22–24 January 2011, Kuala Lumpur, Malaysia, pp. 416–424.
Kohavi, R., 1995. A study of cross-validation and bootstrap for accuracy estimation
and model selection. In: Proc. Int. Joint Conf. on Artificial Intelligence, 20–25
August 1995, Montréal, Canada, pp. 1137–1145.
Kohavi, R., John, G.H., 1997. Wrappers for feature subset selection. Artif. Intell. 97,
273–324.
Li, T., Zhang, C., Ogihara, M., 2004. A comparative study of feature selection and
multiclass classification methods for tissue classification based on gene
expression. Bioinformatics 20, 2429–2437.
Li, D.-C., Liu, C.-W., Hu, S.C., 2010. A learning method for the class imbalance
problem with medical data sets. Comput. Biol. Med. 40, 509–518.
Liu, H., Setiono, R., 1998. Some issues on scalable feature selection. Expert Syst.
Appl. 15, 333–339.
Liu, J.-W., Cheng, C.-H., Chen, Y.-H., Chen, T.-L., 2010. OWA rough set model for
forecasting the revenues growth rate of the electronic industry. Expert Syst.
Appl. 37, 610–617.
Ma, C.-Y., Yang, S.-Y., Zhang, H., Xiang, M.-L., Huang, Q., Wei, Y.-Q., 2008. Prediction
models of human plasma protein binding rate and oral bioavailability derived
by using GA-CG-SVM method. J. Pharm. Biomed. 47, 677–682.
McGraw-Hill Construction, 2012. Green Building Outlook Strong for Both Nonresidential & Residential Sectors Despite Soft Economy. Press Release.
http://www.construction.com/about-us/press/green-building-outlook-strong-forboth-non-residential-and-residential.asp.
Min, J.H., Lee, Y.-C., 2005. Bankruptcy prediction using support vector machine with
optimal choice of kernel function parameters. Expert Syst. Appl. 28, 603–614.
Ng, W.W.Y., Yeung, D.S., Firth, M., Tsang, E.C.C., Wang, X.-Z., 2008. Feature selection
using localized generalization error for supervised classification problems using
RBFNN. Pattern Recogn. 41, 3706–3719.
Olson, D.L., Delen, D., Meng, Y., 2012. Comparative analysis of data mining methods
for bankruptcy prediction. Decis. Supp. Syst. 52, 464–473.
Pulaski, M.H., Horman, M.J., Riley, D.R., 2006. Constructability practices to manage
sustainable building knowledge. J. Archit. Eng. 12, 83–92.
Quinlan, J.R., 1993. C4.5 Programs for Machine Learning, first ed. Morgan Kaufmann,
San Mateo, CA.
Quintana, D., Saez, Y., Mochon, A., Isasi, P., 2008. Early bankruptcy prediction using
ENPC. Appl. Intell. 29, 157–161.
Robichaud, L.B., Anantatmula, V.S., 2011. Greening project management practices
for sustainable construction. J. Manage. Eng. 27, 48–57.
Shen, L.-Y., Tam, V.W.Y., Tam, L., Ji, Y.-B., 2010. Project feasibility study: the key to
successful implementation of sustainable and socially responsible construction
management practice. J. Clean. Prod. 18, 254–259.
Sokolova, M., Lapalme, G., 2009. A systematic analysis of performance measures for
classification tasks. Inf. Process. Manag. 45, 427–437.
Son, H., Kim, C., Kim, C., 2012. Hybrid principal component analysis and support
vector machine model for predicting the cost performance of commercial
building projects using pre-project planning variables. Autom. Constr. 27,
60–66.
Stojanova, D., Kobler, A., Ogrinc, P., Ženko, B., Džeroski, S., 2012. Estimating the risk
of fire outbreaks in the natural environment. Data Min. Knowl. Disc. 24,
411–442.
Swarup, L., Korkmaz, S., Riley, D., 2011. Project delivery metrics for sustainable,
high-performance buildings. J. Constr. Eng. Manage. 137, 1043–1051.
U.S. Environmental Protection Agency, 2010a. Basic Information. EPA Web site.
http://www.epa.gov/greenbuilding/pubs/about.htm.
U.S. Environmental Protection Agency, 2010b. Why Build Green? EPA Web site.
http://www.epa.gov/greenbuilding/pubs/whybuild.htm.
van Hulse, J., Khoshgoftaar, T.M., Napolitano, A., Wald, R., 2012. Threshold-based
feature selection techniques for high-dimensional bioinformatics data. Netw.
Model Anal. Health Inf. Bioinforma. 1, 47–61.
Vapnik, V.N., 1995. The Nature of Statistical Learning Theory, first ed. Springer, New
York, NY.
Vapnik, V., 1998. The support vector method of function estimation. In:
Suykens, J.A.K., Vandewalle, J.P.L. (Eds.), Nonlinear Modeling: Advanced Black
Box Techniques. Springer, New York, NY, pp. 55–85.
Wang, Y.-R., 2002. Applying the PDRI in Project Risk Management. Ph.D. dissertation. The University of Texas at Austin, Austin, TX.
Wang, Y.-R., Gibson, G.E., 2010. A study of preproject planning and project success
using ANNs and regression models. Autom. Constr. 19, 341–346.
Wang, Y., Tetko, I.V., Hall, M.A., Frank, E., Facius, A., Mayer, K.F.X., Mewes, H.W.,
2005. Gene selection from microarray data for cancer classification – a machine
learning approach. Comput. Biol. Chem. 29, 37–46.
Wang, Y.-R., Yu, C.-Y., Chan, H.-H., 2012. Predicting construction cost and schedule
success using artificial neural networks ensemble and support vector machines
classification models. Int. J. Proj. Manage. 30, 470–478.
West, D., 2000. Neural network credit scoring models. Comput. Oper. Res. 27,
1131–1152.
Witten, I.H., Frank, E., 2005. Data Mining: Practical Machine Learning Tools and
Techniques, second ed. Morgan Kaufmann, San Francisco, CA.
Xia, G.-E., Jin, W.-D., 2008. Model of customer churn prediction on support vector
machine. Syst. Eng. Theory Pract. 28, 71–77.
Zhang, X., 2014. Paradigm shift toward sustainable commercial project development. Habitat Int. 42, 186–192.
Zhang, X., Platten, A., Shen, L., 2011a. Green property development practice in
China: costs and barriers. Build. Environ. 46, 2153–2160.
Zhang, X., Shen, L., Wu, Y., 2011b. Green strategy for gaining competitive advantage
in housing development: a China study. J. Clean. Prod. 19, 157–167.
Zhang, X., Shen, L., Zhang, L., 2013. Life cycle assessment of the air emissions during
building construction process: a case study in Hong Kong. Renew. Sust. Energ.
Rev. 17, 160–169.