Abstract: Construction projects are typically carried out in highly uncertain environments with the risk of cost and time overruns, and
subsequent disputes between stakeholders. One of the common risk factors is that most cost items of a project are dependent random variables.
Downloaded from ascelibrary.org by UNIVERSITE LAVAL on 07/13/16. Copyright ASCE. For personal use only; all rights reserved.
Thus, correlations between basic cost items need to be considered in predicting the total cost of the project. This paper intends to propose a
generic copula-based Monte Carlo simulation method for prediction of construction projects’ total costs with dependent cost items. An
algorithm to generate the joint probability distribution function of correlated cost items is developed and two examples are presented to
demonstrate the applicability of copulas in modeling construction costs as random variables. A merit of the proposed method is that it
not only can incorporate all different types of distributions in one framework, but it also captures the best dependence structure between
variables. This paper finds that different dependence structures can lead to different probability distributions of total cost. It also finds that the
existing goodness of fit tests can be employed in choosing the best performing copula. The paper concludes that the copula-based Monte
Carlo simulation method can predict total cost of construction projects with reasonable accuracy. DOI: 10.1061/(ASCE)CO.1943-7862.0001194. © 2016 American Society of Civil Engineers.
Author keywords: Probability distribution; Cost items; Monte Carlo simulation; Copula; Dependence structure; Cost and schedule.
Introduction

The construction industry is perhaps one of the most unreliable professions in terms of project cost and completion time. Many construction projects experience cost and time overrun and, subsequently, disputes arise between stakeholders. There are many common risk factors that influence cost and, hence, completion time of work packages and activities of a project. How to consider and include these factors in determining basic cost items of a project would inevitably affect the accuracy in estimating the total cost of the project. Touran and Wiser (1992) analyzed statistical cost data of 1014 building projects and showed that most constituent or basic cost items of a project’s total cost are random, but not independent. As such, correlations between basic cost items need to be considered in predicting the total cost of the project; otherwise the variance of total cost will be underestimated, resulting in overrun of the cost and completion time.

In construction project management, the probability distribution function (pdf) of total cost is usually used in predicting the risk of project cost overrun and planning for adequate contingencies. Skitmore and Ng (2002) derived an analytical method for calculation of first- and second-order moments of total project cost based on mean and covariance values of basic cost items. However, the method cannot calculate the higher moments of the total project cost and, importantly, determine the accurate dependence structure of these basic cost items. Touran and Wiser’s (1992) work is one of the earliest efforts in modeling correlated cost items of construction projects. Based on statistical analysis of historical data, they concluded that lognormal distributions can be used to model the cost items of most construction activities, e.g., formwork, electrical, mechanical, etc. With this assumption, they used a multivariate lognormal distribution to generate correlated random numbers for various construction components or cost items. Wall (1997) showed that when modeling the variability of project total cost, the correlation between cost items, as random variables, is more influential than the choice of distribution types of these variables. Touran and Suphot (1997) simulated total project cost using the so-called Iman and Conover (1982) method, or briefly IC method, which considers the rank correlation between basic cost items of the project. The IC method involves defining numerous marginal distributions of basic cost items and pairwise Spearman’s rank correlation coefficients between them. In the IC method, marginals for basic cost items may have different distributions. The method arranges the generated random numbers in such a way that their ranks in correlation remain the same. The weakness of this method is that the selection of marginals for each cost item and a correlation coefficient between them does not completely describe the characteristics of the joint distribution of these cost items. Thus, simulation techniques that use the IC algorithm are not able to simulate complex dependence structures.

The Monte Carlo simulation is widely used for prediction of response or output of complex systems in almost all areas of research interest, including construction management (Yang 2005). The Monte Carlo simulation can generate dependent random variables with designated correlation structures and, thus, fit a probability distribution (either parametric or empirical) and/or examine confidence levels of simulated results or outputs (e.g., total project cost). For an effective Monte Carlo simulation, the marginal distribution function of every random variable and dependence structure between them is needed. There are only a very limited number of

1 Assistant Professor, Construction Engineering and Management Group, Engineering Faculty, Science and Research Branch, Islamic Azad Univ., Hesarak, 1477893855 Tehran, Iran. E-mail: firouzi@srbiau.ac.ir
2 Associate Professor, School of Civil Engineering and Architecture, Wuhan Univ. of Technology, Wuhan 430070, China. E-mail: missyangw@163.com
3 Professor and Head, School of Civil, Environmental, and Chemical Engineering, RMIT Univ., Melbourne 3000, Australia (corresponding author). E-mail: chunqing.li@rmit.edu.au
Note. This manuscript was submitted on December 28, 2015; approved on April 19, 2016; published online on July 7, 2016. Discussion period open until December 7, 2016; separate discussions must be submitted for individual papers. This paper is part of the Journal of Construction Engineering and Management, © ASCE, ISSN 0733-9364.
$$C(u, v) = \phi^{-1}[\phi(u) + \phi(v)], \quad u, v \in [0, 1] \qquad (5)$$

where $\phi(\cdot)$ is known as a generator function of the copula and

second moment does not exist; and the third limitation is that it is not invariant under strictly increasing transformations. To overcome these shortcomings of linear correlations, the Kendall’s rank correlation, $\tau$, can be used as an alternative measure of dependence, which is defined as

$$\tau = \Pr[\text{Concordance}] - \Pr[\text{Discordance}]$$

Genest and Rémillard (2008) presented a goodness-of-fit (G-o-F) test to select the best bivariate family of copula for dependence modeling of random variables, which is introduced briefly here. They call the method “blanket test,” in the sense that it involves

$$C_n(u, v) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(U_{i,n} \le u, V_{i,n} \le v), \quad (u, v) \in [0, 1]^2 \qquad (10)$$

where $\mathbf{1}(\cdot)$ is a logical indicator function and $(U_{i,n}, V_{i,n})$ are pseudo-observations from C, which can be computed from the data $(X_1, Y_1), \ldots, (X_n, Y_n)$ as

of p-values.

For some large integer N, repeat the following two steps for every $k \in \{1, \ldots, N\}$:
a. Generate a random sample $(U_1^k, V_1^k), \ldots, (U_n^k, V_n^k)$ from copula $C_{\theta_n}$ and generate the corresponding samples $(X_1^k, Y_1^k), \ldots, (X_n^k, Y_n^k)$ using their inverse parametric/empirical distribution functions, and deduce from them the associated pseudo-observations $(U_{1,n}^k, V_{1,n}^k), \ldots, (U_{n,n}^k, V_{n,n}^k)$;
b. Let $C_n^{(k)}$ and $\theta_n^{(k)}$ stand for the versions of $C_n$ and $\theta_n$ derived from the pseudo-observations $(U_{1,n}^k, V_{1,n}^k), \ldots, (U_{n,n}^k, V_{n,n}^k)$; and
3. Form an approximate realization of the test statistic under null hypothesis $H_0$ as

7. The best performing copula $C_{\theta_n}(U, V)$ is used to simulate $N_{sim}$ samples of dependent unit uniform random variables $(u_j, v_j)$, $j = 1, \ldots, N_{sim}$.
8. Use the inverse cumulative distribution functions, $F^{-1}(\cdot)$ and $G^{-1}(\cdot)$, to simulate $N_{sim}$ samples of dependent random cost items, i.e., $x_j$ and $y_j$, $j = 1, \ldots, N_{sim}$.
9. Let $w_j = x_j + y_j$, $j = 1, \ldots, N_{sim}$, where W by definition is total project cost. The empirical cumulative distribution function of total cost can be generated from $\frac{1}{N_{sim}} \sum$
Table 2. Statistics of Building Construction Cost Items (Adapted from Touran and Suphot 1997)
Cost element   Distribution type   Mean (μ)   SD (σ)   α      β
Sitework       Gamma               5.7        4.8      1.41   0.25
Concrete       Lognormal           5.8        5.52     0.62   0.53
Masonry        Gamma               3.86       3.28     1.38   0.36
Metals         Beta                5.12       3.8      1.82   0.35
Carpentry      Beta                3.58       3.96     0.82   0.23
Isolation      Lognormal           2.95       2.44     0.36   0.48
Doors          Lognormal           3.78       3.11     0.47   0.47
Finishes       Gamma               5.84       3.74     2.44   0.42
Mechanical     Gamma               8.94       5.84     2.34   0.26
Electrical     Lognormal           5.61       3.45     0.68   0.37

where $\Gamma(\cdot)$ is the gamma function; while for lognormal random variable Y, it will be $g(y|\alpha, \beta) = \frac{1}{y\beta\sqrt{2\pi}} \exp\{-(\ln y - \alpha)^2 / 2\beta^2\}$. In the case of gamma random variables, $\mu = \alpha\beta$ and $\sigma^2 = \alpha\beta^2$, while for lognormal random variables, $\mu = \exp[\alpha + (\beta^2/2)]$ and $\sigma^2 = \exp(2\alpha + \beta^2)[\exp(\beta^2) - 1]$. Thus, using these relationships, α and β can be calculated accordingly. Furthermore, since the upper and lower bounds of beta distributions are not given in Touran and Suphot (1997), in this paper gamma distribution is assumed for these cases. In addition, their calculated linear correlation matrix is summarized in Table 3.

It should be noted that most of the values of correlation coefficients given in this matrix are positive. It is well known that, although the mean value of the summation is not affected by the correlation structure, the standard deviation will be less for the case of no correlation between random variables than the case where there are positive correlations between them (Skitmore and Ng 2002). To illustrate the difference in total cost between uncorrelated cost items and correlated ones, the total cost of the project with uncorrelated cost items is also simulated and shown in Figs. 2 and 3.

In this study, two different families of elliptical copulas, namely Gaussian and Cauchy, are tested to examine the variability of distribution of total project cost as a dependent random variable because the details of cost items for every building are not available in
Fig. 3. Comparison between empirical cumulative distributions of total cost with different dependence between cost items (curves: Gaussian Copula, Cauchy Copula, Uncorrelated; horizontal axis: Total Cost, $/ft²)

Touran and Suphot (1997). It is well known that Student’s t copula for a sufficiently large number of degrees of freedom converges to Gaussian copula (Nelsen 2007). Therefore, in this study, Cauchy copula is chosen as a lower limit case of t-copulas with degree of freedom, $\nu = 1$. Cauchy copula captures dependence in the tails

more severe. For example, the total costs at the 98th percentile will be 121 and 105 for Cauchy and Gaussian copulas, respectively. Second, for relatively small values of deviation from mean value (e.g., one standard deviation), the probability of deviation of total cost from expected value would be higher if the dependence structure is Gaussian. For example, the results of simulation suggest that for Gaussian copula $\Pr[(\mu_G - \sigma_G) \le T_G \le (\mu_G + \sigma_G)] = 0.79$, which is 13% higher than that for Cauchy copula, i.e., $\Pr[(\mu_C - \sigma_C) \le T_C \le (\mu_C + \sigma_C)] = 0.70$, where $T_G$ and $T_C$ are the total cost of project simulated via Gaussian and Cauchy copulas, respectively, and $\mu_G$, $\mu_C$ and $\sigma_G$, $\sigma_C$ are the mean and standard deviation of $T_G$ and $T_C$, respectively. Finally, the empirical cumulative distribution functions (cdf) shown in Fig. 3 can be used to explain the previously mentioned results as well. This example clearly confirms the importance of selection of appropriate copula families, other than Gaussian. The latter is the common practice in modeling the dependence structure of random variables in construction projects (e.g., Moret and Einstein 2012; Yang 2006).

It needs to be noted that the purpose of this example is to test two copulas, Cauchy and Gaussian, and to show that there can be differences in employing different copulas to bind marginals and construct joint distributions. Therefore, in this example, only
Table 4. Calculated First Four Moments and Percentile Values of Simulated Total Cost
Copula family   Mean    SD      Skewness   Kurtosis   25% percentile   75% percentile   98% percentile
Gaussian        51.20   22.65   1.08       5.11       36.0             62.6             105
Cauchy          51.26   23.17   2.6        16.62      37               58.6             121
Variation (%)   0.01    2       140        225        2.8              −6.8             15.2
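As an illustration of how statistics of the kind reported in Table 4 could be computed from a vector of simulated total costs, a minimal Python sketch is given here. It assumes NumPy and SciPy are available. The function name `summarize_total_cost` and the demonstration sample are hypothetical and are not the data behind Table 4. It is also assumed that the reported kurtosis is the non-excess (Pearson) kurtosis, which the source does not state.

```python
import numpy as np
from scipy import stats


def summarize_total_cost(w):
    """Return the first four moments and selected percentiles of a sample."""
    w = np.asarray(w, dtype=float)
    p25, p75, p98 = np.percentile(w, [25, 75, 98])
    return {
        "mean": w.mean(),
        "sd": w.std(ddof=1),                 # sample standard deviation
        "skewness": stats.skew(w),
        # Assumption: Pearson (non-excess) kurtosis, equal to 3 for a normal
        "kurtosis": stats.kurtosis(w, fisher=False),
        "p25": p25,
        "p75": p75,
        "p98": p98,
    }


# Placeholder sample for demonstration only; NOT the simulated costs of Table 4.
rng = np.random.default_rng(0)
w_demo = rng.lognormal(mean=3.8, sigma=0.4, size=100_000)
summary = summarize_total_cost(w_demo)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Applying the same function to the totals simulated under each copula would give one row of such a table per copula family.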
Fig. 4. (a) Distribution of sample data points (X,Y); (b) corresponding (U,V)
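The mapping from sample points (X, Y) to the unit square (U, V) illustrated in Fig. 4 can be sketched as follows. The sketch assumes the common rank-based scaling $U_i = R_i/(n + 1)$, where $R_i$ is the rank of $X_i$ in the sample. This scaling, the function name `pseudo_observations`, and the four data points are assumptions for illustration only. Because ranks are unchanged by strictly increasing transformations, Kendall's τ computed from (U, V) equals that computed from (X, Y).

```python
import numpy as np
from scipy import stats


def pseudo_observations(x, y):
    """Map paired data into (0, 1) via scaled ranks, U_i = rank(x_i) / (n + 1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    u = stats.rankdata(x) / (n + 1)   # assumed scaling; keeps values off 0 and 1
    v = stats.rankdata(y) / (n + 1)
    return u, v


# Hypothetical data points, for illustration only.
x = np.array([2.1, 0.7, 3.4, 1.5])
y = np.array([2.5, 1.2, 4.1, 3.0])

u, v = pseudo_observations(x, y)

# Kendall's tau is rank based, so it is the same for (x, y) and (u, v).
tau, _ = stats.kendalltau(x, y)
tau_uv, _ = stats.kendalltau(u, v)
print(u, v, tau, tau_uv)
```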
Table 5. Estimated Parameters of Marginal pdfs and Results of Chi-Square Goodness of Fit Test at the 5% Significance Level
Cost items attributes   Random variable X                        Random variable Y                                           Kendall’s $\tau_n$
pdf                     $f(x|2,1) = [1/\Gamma(2)]\, x e^{-x}$    $g(y|1,0.5) = (1/y)\sqrt{2/\pi}\,\exp\{-2(\ln y - 1)^2\}$   0.5
p-value                 0.51                                     0.23
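Steps 7 to 9 of the proposed algorithm can be sketched in Python for the marginals and Kendall's τ listed in Table 5. This is a minimal sketch under stated assumptions, not the authors' implementation. A Gaussian copula is used here only because it is simple to sample with NumPy, whereas the paper reports the Frank copula as the best performing one for this example. The copula parameter is obtained by inverting $\tau = (2/\pi)\sin^{-1}(\rho)$. The sample size, the seed, and the helper `ecdf` are hypothetical choices.

```python
import numpy as np
from scipy import stats

tau = 0.5                            # Kendall's tau from Table 5
rho = np.sin(np.pi * tau / 2.0)      # inverts tau = (2 / pi) * arcsin(rho)

n_sim = 100_000                      # assumed number of simulations
rng = np.random.default_rng(42)      # assumed seed, for reproducibility

# Step 7: dependent unit-uniform pairs (u_j, v_j) from a Gaussian copula.
cov = np.array([[1.0, rho], [rho, 1.0]])
z = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=n_sim)
u = stats.norm.cdf(z[:, 0])
v = stats.norm.cdf(z[:, 1])

# Step 8: inverse CDFs of the marginals in Table 5.
x = stats.gamma.ppf(u, a=2.0, scale=1.0)              # Gamma(2, 1)
y = stats.lognorm.ppf(v, s=0.5, scale=np.exp(1.0))    # Lognormal(1, 0.5)

# Step 9: total cost and its empirical cumulative distribution function.
w = x + y


def ecdf(sample, t):
    """Fraction of simulated values that do not exceed t."""
    return np.mean(sample <= t)


print(f"rho = {rho:.4f}, mean total = {w.mean():.3f}")
print(f"Pr[W <= 10] is approximately {ecdf(w, 10.0):.3f}")
```

Replacing the sampling in Step 7 with draws from another copula family would leave Steps 8 and 9 unchanged, which is the sense in which the marginals and the dependence structure are specified separately.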
Steps 7 to 9 are applied, knowing that Touran and Suphot (1997) have already performed Steps 1 to 3 and reported the pdf of marginals (i.e., cost items). Steps 4 to 6 of the proposed algorithm (i.e., G-o-F test of copulas) need access to the cost database of buildings, which is not provided in Touran and Suphot (1997). The results of this example have shown that simply using Gaussian copula can mean overlooking some important information, which has also been ignored in other published literature. Overlooking possible extreme values of project total cost can be very serious and possibly lead to large values of cost overrun, major disputes and adverse consequences for and between project parties.

Example 2

[Fig. 5: density plot; legend: X data, Gamma (2,1), Y data, Lognormal (1,0.5); vertical axis: Density]
[Fig. 6: scatter panels of V against U; panel titles recovered: Gaussian Copula, Cauchy Copula, Independent Variables]

very cumbersome. Then, following steps 1–3 of the proposed algorithm as per Fig. 1 and after conducting the chi-square G-o-F test, it is found that random variables X and Y follow gamma and lognormal distributions, respectively. The associated pdf of these
random variables and estimated parameters are given in Table 5 and shown in Fig. 5. Furthermore, following step 4 of the algorithm, the Kendall’s rank correlation between these variables can be readily calculated using Eq. (5). In fact, in this example, to check the performance of the proposed method, the sample points are intentionally generated via a Frank copula with positive-valued, right-tailed marginal distributions, i.e., lognormal and gamma, which are commonly used in construction risk management. The Kendall’s rank correlation, $\tau_n$, is chosen to be 0.5. After fitting marginals, they should be bound via the best copula to be ready for Monte Carlo simulation. The copula parameter, $\theta_n$, for three Archimedean copulas, i.e., Clayton, Gumbel, and Frank, is calculated using the functional relationships given in Table 1. In the case of the other two elliptical copulas, i.e., Gaussian and Cauchy, it is calculated using the relationship between Pearson linear correlation coefficient and Kendall’s rank correlation [i.e., $\tau = (2/\pi)\sin^{-1}(\rho)$]. These specified copulas have undergone the parametric bootstrapping procedure (i.e., step 5 of the algorithm) to compute the Cramér-von Mises statistic, $S_n$, and corresponding p-values and select the best performing copula according to step 6 of the algorithm.

The results in Table 6 suggest that Frank copula has the minimum $S_n$ and thus is the best performing. Furthermore, in reference to p-values, it is the only copula to be chosen at the 10% significance level. The performance of these five estimated copulas in mimicking the pattern of recorded data in a Monte Carlo simulation (i.e., step 7 of the proposed algorithm) can be visually checked in Fig. 6. Again, as is apparent from the coincidence of scatter of simulated and recorded data, the performance of Frank copula is the best. Another observation, based on a comprehensive test, is that with the increase in Kendall’s rank correlation value, the performance of the G-o-F test improves and, if the data are more inclined to extreme values, the chance of selecting Cauchy and Clayton copulas increases.

Conclusions

A generic copula-based Monte Carlo simulation method has been proposed in this paper for prediction of the total cost of construction projects. Concepts of dependency and copulas have been introduced and the blanket test for selecting the best-performing copula has been presented. An algorithm to generate the joint probability distribution function of correlated cost items has been developed in the paper with two examples to demonstrate the applicability of copulas in modeling construction costs as random variables. A merit of the proposed method is that it can not only incorporate all different types of distributions in one framework, but also captures the best dependence structure between variables. From numerical results, this paper has found that different dependence structures can lead to different probability distributions of total cost and that reliance on Gaussian copulas may result in the neglect of small probabilities of extreme project cost/time overruns. It has also been found that the existing G-o-F tests can be employed in choosing the best performing copula and that the confidence interval of simulated total cost can be determined more accurately. It can be concluded that the proposed method can predict total cost of construction projects with reasonable accuracy. Copula-based Monte Carlo simulation can be regarded as a new approach in developing advanced methods for construction risk management of cost overruns.

Acknowledgments

Financial support from Australian Research Council under DP140101547 and LP150100413 is gratefully acknowledged.

References

Embrechts, P., McNeil, A., and Straumann, D. (2002). “Correlation and dependence in risk management: Properties and pitfalls.” Risk management: Value at risk and beyond, M. A. H. Dempster, ed., Cambridge University Press, Cambridge, 176–223.
Fischer, M., Köck, C., Schlüter, S., and Weigert, F. (2009). “An empirical analysis of multivariate copula models.” Quant. Finance, 9(7), 839–854.
Genest, C., and Favre, A. C. (2007). “Everything you always wanted to know about copula modeling but were afraid to ask.” J. Hydrol. Eng., 10.1061/(ASCE)1084-0699(2007)12:4(347), 347–368.
-7862.0000507, 1075–1084.
Nelsen, R. B. (2007). An introduction to copulas (Springer series in statistics), 2nd Ed., Springer, New York.
Reddy, M. J., and Ganguli, P. (2012). “Risk assessment of hydroclimatic variability on groundwater levels in the Manjara basin aquifer in India using Archimedean copulas.” J. Hydrol. Eng., 10.1061/(ASCE)HE.1943-5584.0000564, 1345–1357.
Skitmore, M., and Ng, S. T. (2002). “Analytical and approximate variance of total project cost.” J. Constr. Eng. Manage., 10.1061/(ASCE)0733-9364(2002)128:5(456), 456–460.
Yang, I.-T. (2005). “Simulation-based estimation for correlated cost elements.” Int. J. Project Manage., 23(4), 275–282.
Yang, I.-T. (2006). “Using Gaussian copula to simulate repetitive projects.” Constr. Manage. Econ., 24(9), 901–909.
Zhang, L., and Singh, V. P. (2006). “Bivariate flood frequency analysis using the copula method.” J. Hydrol. Eng., 10.1061/(ASCE)1084-0699(2006)11:2(150), 150–164.
Zhou, W., Hong, H. P., and Zhang, S. (2012). “Impact of dependent stochastic defect growth on system reliability of corroding pipelines.” Int. J. Press. Vessels Pip., 96–97, 68–77.