
Prediction of Total Cost of Construction Project with

Dependent Cost Items


Afshin Firouzi 1; Wei Yang 2; and Chun-Qing Li 3

Abstract: Construction projects are typically carried out in highly uncertain environments with the risk of cost and time overruns, and
subsequent disputes between stakeholders. One common risk factor is that most cost items of a project are dependent random variables.
Downloaded from ascelibrary.org by UNIVERSITE LAVAL on 07/13/16. Copyright ASCE. For personal use only; all rights reserved.

Thus, correlations between basic cost items need to be considered in predicting the total cost of the project. This paper intends to propose a
generic copula-based Monte Carlo simulation method for prediction of construction projects’ total costs with dependent cost items. An
algorithm to generate the joint probability distribution function of correlated cost items is developed and two examples are presented to
demonstrate the applicability of copulas in modeling construction costs as random variables. A merit of the proposed method is that it
not only can incorporate all different types of distributions in one framework, but it also captures the best dependence structure between
variables. This paper finds that different dependence structures can lead to different probability distributions of total cost. It also finds that the
existing goodness of fit tests can be employed in choosing the best performing copula. The paper concludes that the copula-based Monte
Carlo simulation method can predict total cost of construction projects with reasonable accuracy. DOI: 10.1061/(ASCE)CO.1943-7862.0001194. © 2016 American Society of Civil Engineers.
Author keywords: Probability distribution; Cost items; Monte Carlo simulation; Copula; Dependence structure; Cost and schedule.

1 Assistant Professor, Construction Engineering and Management Group, Engineering Faculty, Science and Research Branch, Islamic Azad Univ., Hesarak, 1477893855 Tehran, Iran. E-mail: firouzi@srbiau.ac.ir
2 Associate Professor, School of Civil Engineering and Architecture, Wuhan Univ. of Technology, Wuhan 430070, China. E-mail: missyangw@163.com
3 Professor and Head, School of Civil, Environmental, and Chemical Engineering, RMIT Univ., Melbourne 3000, Australia (corresponding author). E-mail: chunqing.li@rmit.edu.au
Note. This manuscript was submitted on December 28, 2015; approved on April 19, 2016; published online on July 7, 2016. Discussion period open until December 7, 2016; separate discussions must be submitted for individual papers. This paper is part of the Journal of Construction Engineering and Management, © ASCE, ISSN 0733-9364.

Introduction

The construction industry is perhaps one of the most unreliable professions in terms of project cost and completion time. Many construction projects experience cost and time overrun and, subsequently, disputes arise between stakeholders. There are many common risk factors that influence cost and, hence, completion time of work packages and activities of a project. How to consider and include these factors in determining basic cost items of a project would inevitably affect the accuracy in estimating the total cost of the project. Touran and Wiser (1992) analyzed statistical cost data of 1014 building projects and showed that most constituent or basic cost items of a project’s total cost are random, but not independent. As such, correlations between basic cost items need to be considered in predicting the total cost of the project; otherwise the variance of total cost will be underestimated, resulting in overrun of the cost and completion time.

In construction project management, the probability distribution function (pdf) of total cost is usually used in predicting the risk of project cost overrun and planning for adequate contingencies. Skitmore and Ng (2002) derived an analytical method for calculation of first- and second-order moments of total project cost based on mean and covariance values of basic cost items. However, the method cannot calculate the higher moments of the total project cost and, importantly, cannot determine the accurate dependence structure of these basic cost items. Touran and Wiser’s (1992) work is one of the earliest efforts in modeling correlated cost items of construction projects. Based on statistical analysis of historical data, they concluded that lognormal distributions can be used to model the cost items of most construction activities, e.g., formwork, electrical, mechanical, etc. With this assumption, they used a multivariate lognormal distribution to generate correlated random numbers for various construction components or cost items. Wall (1997) showed that when modeling the variability of project total cost, the correlation between cost items, as random variables, is more influential than the choice of distribution types of these variables. Touran and Suphot (1997) simulated total project cost using the so-called Iman and Conover (1982) method, or briefly IC method, which considers the rank correlation between basic cost items of the project. The IC method involves defining numerous marginal distributions of basic cost items and pairwise Spearman’s rank correlation coefficients between them. In the IC method, marginals for basic cost items may have different distributions. The method arranges the generated random numbers in such a way that their rank correlations remain the same. The weakness of this method is that the selection of marginals for each cost item and a correlation coefficient between them does not completely describe the characteristics of the joint distribution of these cost items. Thus, simulation techniques that use the IC algorithm are not able to simulate complex dependence structures.

The Monte Carlo simulation is widely used for prediction of response or output of complex systems in almost all areas of research interest, including construction management (Yang 2005). The Monte Carlo simulation can generate dependent random variables with designated correlation structures and, thus, fit a probability distribution (either parametric or empirical) and/or examine confidence levels of simulated results or outputs (e.g., total project cost). For an effective Monte Carlo simulation, the marginal distribution function of every random variable and the dependence structure between them are needed. There are only a very limited number of

© ASCE 04016072-1 J. Constr. Eng. Manage.

J. Constr. Eng. Manage., 04016072


parametric multivariate pdfs, e.g., normal, lognormal, beta, and Student’s t. When historical data is available, a parametric marginal distribution can be fitted to these data, using maximum likelihood or method of moments, and subsequently goodness-of-fit of the fitted distribution can be verified statistically. In the absence of reliable historical data, appropriate marginal distributions can be established subjectively based on experts’ experience. However, in any case, there is no guarantee that all marginals follow the same parametric distribution family.

In the Monte Carlo simulation, copulas are a very useful tool for modeling dependence of random variables. While the Gaussian copula is the traditional choice for modeling dependence, it is most sensitive to the center of the distribution, which implies tail independence. In other words, an important reason to consider copulas other than the Gaussian copula is its failure to capture dependence between extreme events/values. These extreme values are much more important from the risk analysis point of view (i.e., cost and time overruns). Clearly, selection of other types of copula should be based on their robustness criteria and upon testing their adequacy in modeling specific data sets (Embrechts et al. 2002).

The demand to evaluate the true impact of correlation structure of individual cost items on the probability distribution of total cost is the motivation to conduct the current study. In fact, whilst finance and insurance literature is rife with different aspects of application of copula, a thorough examination of published literature reveals that there is no published work on their application to construction project management (i.e., prediction of total project cost). This gives rise to the need for the present paper, the main purpose of which is to develop a generic copula-based Monte Carlo simulation method to generate the joint probability distribution of correlated cost items that can be employed for prediction of the total cost of construction projects. The method can not only incorporate all different types of distributions in one framework, but also captures the best dependence structure between variables based on goodness-of-fit test of data and using the Cramer-von Mises statistic (Genest and Rémillard 2008). With the proposed method, this paper intends to set a new direction in studying the effect of different dependence structures for various interrelated cost items on the total cost of construction projects.

In this paper the concepts of dependency and copulas, including the structure of different elliptical and Archimedean copula families, are introduced. Then the blanket test (Genest and Rémillard 2008), which was recently used for selection of the best-performing copula model, is presented. Two worked examples are provided to demonstrate the importance of dependence structure and applicability of copulas in modeling construction costs as random variables. The proposed method can calculate the confidence interval of simulated total cost more accurately. It can be regarded as a new approach in developing advanced methods for construction risk management. Accurate prediction of total cost of construction projects can help ensure that projects are completed on time and on budget.

Modeling of Dependence Structure

Monte-Carlo simulation of a system or output of the system with multivariate dependent random variables requires selecting a probability distribution function for each individual input variable and deciding the dependency between the input variables. Copulas are a useful tool to model the system or output of the system with marginal distributions for each random variable and determining their dependence structure separately. According to Sklar’s theorem, an m-dimensional copula, or m-copula, is the function C, from unit m-cube $[0,1]^m$ to a unit interval $[0,1]$ (Sklar 1959). It follows that an m-copula can be defined as an m-dimensional cumulative probability distribution function (cdf) whose support is contained in $[0,1]^m$ and whose one-dimensional margins are uniform on $[0,1]$. The relationship between joint distribution function $F_X(X_1,\ldots,X_m)$, copula C, univariate marginal distributions $F_{X_1}(x_1),\ldots,F_{X_m}(x_m)$ and inverse functions $F_{X_1}^{-1}(u_1),\ldots,F_{X_m}^{-1}(u_m)$ can be expressed as follows:

$$F_X(X_1,\ldots,X_m) = F_X\left[F_{X_1}^{-1}(u_1),\ldots,F_{X_m}^{-1}(u_m)\right] = \Pr[U_1 \le u_1,\ldots,U_m \le u_m] = C(u_1,\ldots,u_m) \tag{1}$$

where $x_i$ and $u_i$ correspondingly are the values for the ith random variable and cdf value of the random variable $X_i$ with the marginal distribution function $F_{X_i}(x_i)$, whilst m is the number of random variables and by definition $U_i = F_{X_i}(x_i)$. Thus with the copula notion the joint distribution can be expressed in terms of its respective marginal distributions and a function C that binds them together

$$F(X_1,\ldots,X_m) = C\left[F_{X_1}(x_1),\ldots,F_{X_m}(x_m);\theta\right] \tag{2}$$

where θ is a vector or scalar quantity called the dependence parameter of copula.

It can be shown that if the marginals [i.e., $F_{X_1}(x_1),\ldots,F_{X_m}(x_m)$] are continuous, then the corresponding copula C is unique (Nelsen 2007). However, in practice because of scarcity of data for dependence structure, the unique copula cannot be determined beforehand with certainty. The appropriate copula for a particular application is the one which best captures the dependence feature of the data. A large number of parametric copula families, including elliptical copulas and Archimedean copulas, are proposed in the literature, and each of them imposes a different dependence structure on the data. These families are briefly introduced next.

Elliptical Copula. The t-copula for bivariate distribution can be defined as

$$C_t(u,v;\rho,\nu) = \int_{-\infty}^{t_\nu^{-1}(u)} \int_{-\infty}^{t_\nu^{-1}(v)} \frac{1}{2\pi(1-\rho^2)^{1/2}} \left[1 + \frac{x^2 - 2\rho xy + y^2}{\nu(1-\rho^2)}\right]^{-(\nu+2)/2} dx\,dy \tag{3}$$

where ν (the number of degrees of freedom) and ρ (linear correlation coefficient) are the parameters of the copula and $t_\nu^{-1}(\cdot)$ is the inverse cdf of a Student’s t distribution with ν degrees of freedom. The correlation coefficient is defined as $\rho(X,Y) = \operatorname{cov}[X,Y]/(\sigma_X \sigma_Y)$ between a pair of random variables $(X,Y)$, where $\operatorname{cov}[X,Y] = E[XY] - E[X]E[Y]$ is the covariance measure and $\sigma_X$ and $\sigma_Y$ denote the standard deviations of X and Y respectively.

When the number of degrees of freedom is large (approximately 30 or so), t-copulas converge to the Gaussian copula, which for the bivariate case takes the form

$$C(u,v;\rho) = \int_{-\infty}^{\Phi^{-1}(u)} \int_{-\infty}^{\Phi^{-1}(v)} \frac{1}{2\pi(1-\rho^2)^{1/2}} \exp\left[-\frac{x^2 - 2\rho xy + y^2}{2(1-\rho^2)}\right] dx\,dy \tag{4}$$

where $\Phi^{-1}(\cdot)$ is the inverse cdf of a standard normal random variable. A t-copula with low values of ν is heavy-tailed, which means it has more points in the tails than the Gaussian one. Since the elliptical copulas are derived from multivariate distributions (i.e., multivariate Student’s t and Gaussian distributions respectively), their generalization to any number of random variables



is straightforward. Fischer et al. (2009) in a comparative study showed that if the number of dimensions increases, the Student’s t-copulas outperform Archimedean copulas and thus, they dominate in empirical applications.

Archimedean Copulas

The function $C: [0,1]^2 \to [0,1]$ is called a bivariate Archimedean copula if it has the following property

$$C(u,v) = \phi^{-1}[\phi(u) + \phi(v)]; \quad u, v \in [0,1] \tag{5}$$

where $\phi(\cdot)$ is known as a generator function of the copula and $\phi^{-1}(\cdot)$ is its inverse function. The generator function is a convex, decreasing function such that $\phi: (0,1] \to [0,\infty)$ and $\phi(1) = 0$. In this study, three Archimedean families of copula functions are employed, i.e., Clayton, Gumbel, and Frank copulas. These bivariate copulas have been successfully used in risk analysis with different application areas, namely financial engineering, e.g., Mai and Scherer (2014) and McNeil et al. (2005); hydrologic engineering, e.g., Zhang and Singh (2006) and Reddy and Ganguli (2012); and infrastructure reliability engineering, e.g., Srinivas et al. (2006) and Zhou et al. (2012). The expressions for Archimedean copula families, generator functions and other properties are summarized in Table 1 (Nelsen 2007).

Measure of Dependence

As can be seen from Eq. (3), for elliptical distributions, the dependence structure is fully determined by the correlation coefficient. However, the correlation coefficient as the measure of dependence has some limitations. It is well known that zero correlation does not imply independence; the second limitation is that the correlation coefficient is not defined for some heavy-tailed distributions whose second moment does not exist; and the third limitation is that it is not invariant under strictly increasing transformations. To overcome these shortcomings of linear correlations, the Kendall’s rank correlation, τ, can be used as an alternative measure of dependence, which is defined as

$$\tau = \Pr[\text{Concordance}] - \Pr[\text{Discordance}] = \Pr[(x_i - x_j)(y_i - y_j) > 0] - \Pr[(x_i - x_j)(y_i - y_j) < 0] = \binom{n}{2}^{-1}(c - d) \tag{6}$$

where n is the sample size, i.e., $(x_1,y_1),\ldots,(x_n,y_n)$ observations of independent identically distributed random variables X and Y; and c and d are the number of concordant and discordant pairs, respectively.

The range of τ is $[-1,1]$, where 1 represents total concordance, −1 represents total discordance, and 0 means zero concordance. Furthermore, it should be noted that τ can be expressed in terms of copula functions as:

$$\tau = 4\int_0^1\!\!\int_0^1 C(u,v)\,dC(u,v) - 1 \tag{7}$$

For the Archimedean family of copulas, there is an explicit functional relationship between the copula parameter θ and Kendall’s dependency measure τ, as shown in Eq. (7) and Table 1.

Selection of Best-Fit Copula

Genest and Rémillard (2008) presented a goodness-of-fit (G-o-F) test to select the best bivariate family of copula for dependence modeling of random variables, which is introduced briefly here. They call the method “blanket test,” in the sense that it involves no parameter tuning or other strategic choices (e.g., empirical marginals can be used without need for parametric distributions). This G-o-F test is based on a parametric bootstrapping procedure and makes use of the Cramer-von Mises statistic $S_n$:

$$S_n = \int_{[0,1]^2} n\left[C_n(u,v) - C_{\theta_n}(u,v)\right]^2 dC_n(u,v) \tag{8}$$

where $C_n$ is called the empirical copula and is calculated using n observational data, and $C_{\theta_n}$ is the corresponding parametric copula with estimated parameter equal to $\theta_n$. The null hypothesis $H_0$ is defined as

$$H_0: C \in C_0;\quad C_0 = \{C_\theta : \theta \in O\} \quad \text{against} \quad H_1: C \notin C_0 \tag{9}$$

where O is an open subset of $\Re^p$ for some integer $p \ge 1$.

Genest et al. (2009) described the steps involved in the parametric bootstrap-based G-o-F procedure as
1. Compute empirical copula $C_n$ from the pseudo-observations, i.e., $(U_{1,n},V_{1,n}),\ldots,(U_{n,n},V_{n,n})$:

$$C_n(u,v) = \frac{1}{n}\sum_{i=1}^{n} \mathbf{1}(U_{i,n} \le u,\ V_{i,n} \le v); \quad (u,v) \in [0,1]^2 \tag{10}$$

where $\mathbf{1}(\cdot)$ is a logical indicator function and $(U_{i,n},V_{i,n})$ are pseudo-observations from C, which can be computed from the data $(X_1,Y_1),\ldots,(X_n,Y_n)$ as

$$U_{i,n} = \frac{1}{n+1}\sum_{j=1}^{n} \mathbf{1}(X_j \le X_i) \tag{11a}$$

$$V_{i,n} = \frac{1}{n+1}\sum_{j=1}^{n} \mathbf{1}(Y_j \le Y_i); \quad i \in \{1,\ldots,n\} \tag{11b}$$

The pseudo-observations $(U_{i,n},V_{i,n})$ can be interpreted as a sample from the underlying copula C. Although in literature there are different methods for estimating the dependence parameter of a copula, in the case of bivariate Archimedean copulas it can be directly calculated from Kendall’s dependency measure $\tau_n$ between n pairs of data (Genest and Favre

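Table 1. Summary of Expressions for Archimedean Copula Families

Copula family | $C(u,v)$ | Parameter range | $\phi(t)$ | Relation to Kendall’s τ
Clayton | $(u^{-\theta} + v^{-\theta} - 1)^{-1/\theta}$ | $\theta \in [-1,\infty) \setminus \{0\}$ | $(1/\theta)(t^{-\theta} - 1)$ | $\theta/(\theta + 2)$
Gumbel | $\exp\{-[(-\ln u)^{\theta} + (-\ln v)^{\theta}]^{1/\theta}\}$ | $\theta \in [1,\infty)$ | $(-\ln t)^{\theta}$ | $(\theta - 1)/\theta$
Frank | $-(1/\theta)\log[1 + (e^{-\theta u} - 1)(e^{-\theta v} - 1)/(e^{-\theta} - 1)]$ | $\theta \in (-\infty,\infty) \setminus \{0\}$ | $-\ln[(e^{-\theta t} - 1)/(e^{-\theta} - 1)]$ | $1 + (4/\theta)[D_1(\theta) - 1]$
Note: $D_k(x)$ is the Debye function; for any positive integer k, $D_k(x) = (k/x^k)\int_0^x t^k/(e^t - 1)\,dt$.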
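The relations in the last column of Table 1 can be inverted to obtain θ from a given Kendall’s τ. The following is a minimal sketch under stated assumptions, not the authors’ implementation: the function names are illustrative, the Frank relation is solved numerically by bisection with the Debye integral evaluated by Simpson’s rule, and only positive dependence (θ > 0) is handled for the Frank family.

```python
import math

def clayton_theta(tau):
    """Invert tau = theta / (theta + 2) for the Clayton family."""
    return 2.0 * tau / (1.0 - tau)

def gumbel_theta(tau):
    """Invert tau = (theta - 1) / theta for the Gumbel family."""
    return 1.0 / (1.0 - tau)

def debye1(x, steps=2000):
    """D1(x) = (1/x) * integral_0^x t / (e^t - 1) dt, by Simpson's rule."""
    def g(t):
        return 1.0 if t == 0.0 else t / math.expm1(t)  # integrand tends to 1 as t -> 0
    h = x / steps
    total = g(0.0) + g(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3.0 / x

def frank_tau(theta):
    """Kendall's tau of the Frank family: 1 + (4/theta) * [D1(theta) - 1]."""
    return 1.0 + 4.0 * (debye1(theta) - 1.0) / theta

def frank_theta(tau, lo=1e-6, hi=100.0):
    """Solve frank_tau(theta) = tau by bisection (assumes 0 < tau < ~0.96)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if frank_tau(mid) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def clayton_cdf(u, v, theta):
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def gumbel_cdf(u, v, theta):
    s = (-math.log(u)) ** theta + (-math.log(v)) ** theta
    return math.exp(-s ** (1.0 / theta))

def frank_cdf(u, v, theta):
    num = math.expm1(-theta * u) * math.expm1(-theta * v)
    return -math.log1p(num / math.expm1(-theta)) / theta

if __name__ == "__main__":
    tau = 0.5
    print("Clayton theta:", clayton_theta(tau))
    print("Gumbel theta:", gumbel_theta(tau))
    print("Frank theta:", frank_theta(tau))
```

Closed-form inversion is used where Table 1 allows it; the Frank family has no closed form, so a root finder is needed in any implementation.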



2007). In this method, dependence parameter $\theta_n$ for any particular copula family under consideration can be estimated via the functional relationship between copula parameter θ and Kendall’s τ as defined in Table 1. It should be noted that, in the case of elliptical copulas (i.e., Gaussian and Cauchy), the copula parameter is the Pearson correlation coefficient, ρ, where $\tau = (2/\pi)\sin^{-1}(\rho)$.
2. Compute the Cramer-von Mises statistic:

$$S_n = \int_{[0,1]^2} n\left[C_n(u,v) - C_{\theta_n}(u,v)\right]^2 dC_n(u,v) = \sum_{i=1}^{n}\left[C_n(U_{i,n},V_{i,n}) - C_{\theta_n}(U_{i,n},V_{i,n})\right]^2 \tag{12}$$

For some large integer N, repeat the following two steps for every $k \in \{1,\ldots,N\}$:
a. Generate a random sample $(U_1^k,V_1^k),\ldots,(U_n^k,V_n^k)$ from copula $C_{\theta_n}$ and generate the corresponding samples $(X_1^k,Y_1^k),\ldots,(X_n^k,Y_n^k)$ using their inverse parametric/empirical distribution functions, and deduce from them the associated pseudo-observations $(U_{1,n}^k,V_{1,n}^k),\ldots,(U_{n,n}^k,V_{n,n}^k)$;
b. Let $C_n^{(k)}$ and $\theta_n^{(k)}$ stand for the versions of $C_n$ and $\theta_n$ derived from the pseudo-observations $(U_{1,n}^k,V_{1,n}^k),\ldots,(U_{n,n}^k,V_{n,n}^k)$; and
3. Form an approximate realization of the test statistic under null hypothesis $H_0$ as

$$S_n^{(k)} = \sum_{i=1}^{n}\left[C_n^{(k)}(U_{i,n}^k,V_{i,n}^k) - C_{\theta_n^{(k)}}(U_{i,n}^k,V_{i,n}^k)\right]^2 \tag{13}$$

4. An approximate p-value for the test can be determined as

$$p = \frac{1}{N}\sum_{k=1}^{N} \mathbf{1}\left(S_n^{(k)} \ge S_n\right) \tag{14}$$

The p-value is a measure of how much evidence there is against the null hypothesis. If the p-value is larger than a particular significance level, then the null hypothesis is not rejected; otherwise, it is rejected. The larger the p-value, the weaker the evidence against the null hypothesis. By comparing the performance among the copulas, the copula that results in the smaller Cramer-von Mises distance ($S_n$) and maximum p-value is preferred.

Probability Distribution of Total Cost

A stepwise Monte Carlo simulation algorithm is developed in this paper to generate the joint probability distribution of individual cost items, which is then used to predict the total cost of construction projects. The flowchart of the algorithm is summarized in Fig. 1. It is understood that, for most construction projects, historical data on cost items (as input random variables) are available. From these data and based on statistical tests, the best performing marginals and their associated copula can be determined.

Since the introduced G-o-F in this paper is for two random variables in the case of Archimedean copulas, the developed simulation algorithm is for the case of bivariate distribution, which has nine steps. For the cases of more variables, there are extensions of these copulas, e.g., Nested Copulas, which are beyond the scope of this paper.
1. Collect n pairs of historical data of two cost items, X and Y [i.e., $(X_i,Y_i)$].
2. The parameters of fitted pdf of any distribution family under consideration, i.e., $f(x|\alpha_x,\beta_x)$ and $g(y|\alpha_y,\beta_y)$, are estimated using the maximum likelihood method for the cost items X and Y.
3. The chi-square G-o-F test is conducted to reject/not reject the fitted marginal distribution of cost items. If the null hypothesis is rejected, another distribution family for the corresponding cost item is selected and the procedure is repeated from Step 2.
4. For any copula family, Kendall’s $\tau_n$ and corresponding copula parameter $\theta_n$ are estimated using Eq. (6) and the relationships given in Table 1.
5. Conduct the blanket test, from which the so-called Cramer-von Mises distance, i.e., $S_n$, and the corresponding p-values are calculated for each copula family.
6. The best performing copula, i.e., $C_{\theta_n}(U,V)$, is selected based on two criteria: minimization of $S_n$ and maximization of the p-value.
7. The best performing copula $C_{\theta_n}(U,V)$ is used to simulate $N_{sim}$ samples of dependent unit uniform random variables $(u_j,v_j)$, $j = 1,\ldots,N_{sim}$.
8. Use the inverse cumulative distribution functions, $F^{-1}(\cdot)$ and $G^{-1}(\cdot)$, to simulate $N_{sim}$ samples of dependent random cost items, i.e., $x_j$ and $y_j$, $j = 1,\ldots,N_{sim}$.
9. Let $w_j = x_j + y_j$, $j = 1,\ldots,N_{sim}$, where W by definition is total project cost. The empirical cumulative distribution function of total cost can be generated from

$$J_{N_{sim}}(w) = \frac{1}{N_{sim}}\sum_{j=1}^{N_{sim}} \mathbf{1}(w_j \le w) \tag{15}$$

Eq. (15), i.e., $J_{N_{sim}}(w)$, can be used for prediction of total cost of construction projects and the risk of cost overrun for a given budget.

Two examples will be presented next to demonstrate the applicability of the developed copula-based Monte Carlo method. The first example shows the difference in the empirical probability distributions of total project cost between the case with dependence and the case without dependence of the random variables (cost items), and thus justifies the need for the developed method. The second example will illustrate how the best performing bivariate copula can be selected to model the dependency of cost items of a project.

It is acknowledged that, in some real-world situations, the historical data on project costs may not be readily available and thus, statistical estimation of required parameters for copulas (e.g., Kendall’s τ) may not be possible. In these cases, copulas can be used to examine the sensitivity of the dependent random variables to different dependence structures, using different copula families with the same correlation matrix.

Worked Examples

Example 1

This example is taken from Touran and Suphot (1997). Their study is based on a database of 131 low-rise office building projects that consist of 10 cost items as shown in Table 2. Since the work packages related to the cost are common in most building constructions, data of their costs are readily available. The total project cost is the sum of these 10 cost items, which are modeled as random variables. The best fitted marginals with their mean and standard deviations are given in Touran and Suphot (1997) and summarized in Table 2, whilst the corresponding parameters of these distribution functions are back-calculated in this paper (i.e., α and β in Table 2). It should be noted that the pdf of a gamma-distributed random variable X is defined as $f(x|\alpha,\beta) = \{1/[\beta^{\alpha}\Gamma(\alpha)]\}\, x^{\alpha-1} e^{-x/\beta}$


where $\Gamma(\cdot)$ is the gamma function; while for a lognormal random variable Y, it will be $g(y|\alpha,\beta) = [1/(y\beta\sqrt{2\pi})]\exp\{-(\ln y - \alpha)^2/2\beta^2\}$. In the case of gamma random variables, $\mu = \alpha\beta$ and $\sigma^2 = \alpha\beta^2$, while for lognormal random variables, $\mu = \exp[\alpha + (\beta^2/2)]$ and $\sigma^2 = \exp(2\alpha + \beta^2)[\exp(\beta^2) - 1]$. Thus, using these relationships α and β can be calculated accordingly. Furthermore, since the upper and lower bounds of beta distributions are not given in Touran and Suphot (1997), in this paper a gamma distribution is assumed for these cases. In addition, their calculated linear correlation matrix is summarized in Table 3.

Fig. 1. Stepwise Monte Carlo simulation procedure

Table 2. Statistics of Building Construction Cost Items (Adapted from Touran and Suphot 1997)
Cost element Distribution type Mean (μ) SD (σ) α β
Sitework Gamma 5.7 4.8 1.41 0.25
Concrete Lognormal 5.8 5.52 0.62 0.53
Masonry Gamma 3.86 3.28 1.38 0.36
Metals Beta 5.12 3.8 1.82 0.35
Carpentry Beta 3.58 3.96 0.82 0.23
Isolation Lognormal 2.95 2.44 0.36 0.48
Doors Lognormal 3.78 3.11 0.47 0.47
Finishes Gamma 5.84 3.74 2.44 0.42
Mechanical Gamma 8.94 5.84 2.34 0.26
Electrical Lognormal 5.61 3.45 0.68 0.37

It should be noted that most of the values of correlation coefficients given in this matrix are positive. It is well known that, although the mean value of the summation is not affected by the correlation structure, the standard deviation will be less for the case of no correlation between random variables than for the case where there are positive correlations between them (Skitmore and Ng 2002). To illustrate the difference in total cost between uncorrelated cost items and correlated ones, the total cost of the project with uncorrelated cost items is also simulated and shown in Figs. 2 and 3.

In this study, two different families of elliptical copulas, namely Gaussian and Cauchy, are tested to examine the variability of distribution of total project cost as a dependent random variable because the details of cost items for every building are not available in



Table 3. Correlation Coefficient between Cost Items (Adapted from Touran and Suphot 1997)
Cost items Sitework Concrete Masonry Metals Carpentry Isolation Doors Finishes Mechanical Electrical
Sitework 1 — — — — — — — — —
Concrete 0.18 1 — — — — — — — —
Masonry 0.15 0.06 1 — — — — — — —
Metals 0.17 0.11 −0.01 1 — — — — — —
Carpentry 0.12 −0.04 0.11 −0.33 1 — — — — —
Isolation 0.2 0.19 0.06 0.12 0.18 1 — — — —
Doors 0.36 0.12 0.14 0.39 0.03 0.16 1 — — —
Finishes 0.17 0.21 0.22 0.2 0.07 0.18 0.25 1 — —
Mechanical 0.36 0.38 0.33 0.35 −0.02 0.18 0.28 0.42 1 —
Electrical 0.44 0.4 0.3 0.33 0.07 0.15 0.36 0.44 0.79 1
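The effect of positive correlation on the spread of a summed cost can be illustrated with two of the lognormal items. This is a minimal sketch under stated assumptions, not the simulation performed in the paper: it uses only the Concrete and Electrical means and standard deviations from Table 2, back-calculates the lognormal parameters by the method of moments from the relationships stated in the text, and treats the Table 3 value of 0.40 directly as the Gaussian copula parameter ρ (an assumption, since Table 3 reports linear correlations between the costs themselves). The function names, sample size and seed are illustrative.

```python
import math
import random
import statistics

def lognormal_params(mean, sd):
    """Method-of-moments (alpha, beta) of a lognormal from its mean and SD."""
    beta2 = math.log(1.0 + (sd / mean) ** 2)
    return math.log(mean) - 0.5 * beta2, math.sqrt(beta2)

def simulate_total(rho, n_sim=20000, seed=1):
    """Sum of two lognormal costs coupled by a Gaussian copula with parameter rho."""
    rng = random.Random(seed)
    a1, b1 = lognormal_params(5.80, 5.52)  # Concrete: mean and SD from Table 2
    a2, b2 = lognormal_params(5.61, 3.45)  # Electrical: mean and SD from Table 2
    totals = []
    for _ in range(n_sim):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # For lognormal marginals, F^{-1}(Phi(z)) reduces to exp(alpha + beta * z).
        totals.append(math.exp(a1 + b1 * z1) + math.exp(a2 + b2 * z2))
    return totals

independent = simulate_total(rho=0.0)
dependent = simulate_total(rho=0.40)  # assumption: Table 3 entry used as copula rho

if __name__ == "__main__":
    for label, w in (("rho = 0.00", independent), ("rho = 0.40", dependent)):
        print(label, "mean:", statistics.mean(w), "SD:", statistics.stdev(w))
```

Both runs share the same seed so that the comparison reflects the dependence parameter rather than sampling noise; the means should be close to each other while the standard deviation is larger in the dependent case.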

Touran and Suphot (1997). It is well known that Student’s t copula for a sufficiently large number of degrees of freedom converges to the Gaussian copula (Nelsen 2007). Therefore, in this study, the Cauchy copula is chosen as a lower limit case of t-copulas with degree of freedom ν = 1. The Cauchy copula captures dependence in the tails more accurately while the Gaussian copula is concentrated more evenly in the vicinity of centered data points.

Fig. 2 shows the histograms of simulated total cost via two different copulas with the same correlation matrix, and the calculated first four moments and percentiles are shown in Table 4. These results show that, while the mean and standard deviation are practically unaffected by the chosen copula, the use of the Cauchy copula results in a higher third moment (skewness), which is 140% higher and, more importantly, in a higher fourth moment, i.e., kurtosis, which in this case is 225% higher compared with the Gaussian copula variant. Thus, if the true dependence structure of basic cost items is better captured via the Cauchy copula, the total project cost would have a tendency to a distinct peak near the mean, declining rather rapidly, and having heavy tails in the distribution density. The depicted histograms and calculated percentiles (Fig. 2 and Table 4) are clearly in agreement with this finding. Furthermore, from the risk analysis perspective, the following two results will be of interest. First, if the true dependence structure of cost items is of Cauchy type, the extreme values of total cost (i.e., cost overruns) will be more severe. For example, the total costs at the 98th percentile will be 121 and 105 for Cauchy and Gaussian copulas respectively. Second, for relatively small values of deviation from the mean value (e.g., one standard deviation), the probability that the total cost stays within that range of its expected value would be higher if the dependence structure is Gaussian. For example, the results of simulation suggest that for the Gaussian copula $\Pr[(\mu_G - \sigma_G) \le T_G \le (\mu_G + \sigma_G)] = 0.79$, which is 13% higher than that for the Cauchy copula, i.e., $\Pr[(\mu_C - \sigma_C) \le T_C \le (\mu_C + \sigma_C)] = 0.70$, where $T_G$ and $T_C$ are the total cost of the project simulated via Gaussian and Cauchy copulas, respectively, and $\mu_G$, $\mu_C$ and $\sigma_G$, $\sigma_C$ are the mean and standard deviation of $T_G$ and $T_C$ respectively. Finally, the empirical cumulative distribution functions (cdf) shown in Fig. 3 can be used to explain the previously mentioned results as well. This example clearly confirms the importance of selection of appropriate copula families, other than Gaussian. The latter is the common practice in modeling the dependence structure of random variables in construction projects (e.g., Moret and Einstein 2012; Yang 2006).

Fig. 2. Simulated total cost with different dependence between cost items

Fig. 3. Comparison between empirical cumulative distributions of total cost with different dependence between cost items (horizontal axis: total cost in dollars per square foot; vertical axis: cumulative probability; curves: Gaussian copula, Cauchy copula, uncorrelated)

It needs to be noted that the purpose of this example is to test two copulas, Cauchy and Gaussian, and to show that there can be differences in employing different copulas to bind marginal and construct joint distributions. Therefore, in this example, only

Table 4. Calculated First Four Moments and Percentile Values of Simulated Total Cost
Copula family Mean SD Skewness Kurtosis 25% percentile 75% percentile 98% percentile
Gaussian 51.20 22.65 1.08 5.11 36.0 62.6 105
Cauchy 51.26 23.17 2.6 16.62 37 58.6 121
Variation (%) 0.01 2 140 225 2.8 −6.8 15.2
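Summary statistics of the kind reported in Table 4 can be computed directly from a list of simulated totals. The sketch below is illustrative only: the helper names are hypothetical, kurtosis is taken in the non-excess convention (an assumption about how Table 4 was produced), and the percentile is read off the empirical cdf of Eq. (15) without interpolation.

```python
import math

def moments(samples):
    """Mean, SD, skewness and (non-excess) kurtosis using population moments."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((w - mean) ** 2 for w in samples) / n
    m3 = sum((w - mean) ** 3 for w in samples) / n
    m4 = sum((w - mean) ** 4 for w in samples) / n
    return mean, math.sqrt(m2), m3 / m2 ** 1.5, m4 / m2 ** 2

def empirical_cdf(samples, w):
    """J(w): fraction of simulated totals not exceeding w, as in Eq. (15)."""
    return sum(1 for wj in samples if wj <= w) / len(samples)

def percentile(samples, p):
    """Smallest simulated total w with J(w) >= p, for 0 < p <= 1."""
    ordered = sorted(samples)
    k = max(math.ceil(p * len(ordered)) - 1, 0)
    return ordered[k]

def overrun_probability(samples, budget):
    """Estimated probability that the total cost exceeds a given budget."""
    return 1.0 - empirical_cdf(samples, budget)

if __name__ == "__main__":
    demo = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy data, not the paper's simulation output
    print(moments(demo))
    print(percentile(demo, 0.98), overrun_probability(demo, 4.0))
```

Applied to the simulated totals of each copula, these helpers would give the rows of a table like Table 4 and the overrun risk for any chosen budget.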



Fig. 4. (a) Distribution of sample data points (X,Y); (b) corresponding (U,V)
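The mapping from the data in panel (a) to the unit square in panel (b) can be sketched as a rank transform. The scaling rank / (n + 1) used below is an assumption (a common convention for pseudo-observations) and may differ in detail from Eq. (11); the dependent sample and the helper name `pseudo_observations` are hypothetical and for illustration only.

```python
import numpy as np
from scipy import stats


def pseudo_observations(sample):
    """Map a sample into (0, 1) through its scaled ranks, rank / (n + 1)."""
    sample = np.asarray(sample, dtype=float)
    return stats.rankdata(sample) / (len(sample) + 1)


# Hypothetical dependent sample of 300 points (not the paper's data).
rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=1.0, size=300)
y = x * rng.lognormal(mean=0.0, sigma=0.5, size=300)

u = pseudo_observations(x)
v = pseudo_observations(y)

# Kendall's tau depends only on ranks, so it is unchanged by the transform.
tau_xy, _ = stats.kendalltau(x, y)
tau_uv, _ = stats.kendalltau(u, v)
print(tau_xy, tau_uv)
```

Since the transform only uses ranks, the dependence structure seen in (U,V) is free of the marginal distributions, which is why copula fitting is carried out on these values.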

Table 5. Estimated Parameters of Marginal pdfs and Results of Chi-Square Goodness of Fit Test at the 5% Significance Level

| Cost items attributes | Random variable X | Random variable Y | Kendall’s $\tau_n$ |
|---|---|---|---|
| pdf | $f(x \mid 2, 1) = \dfrac{1}{\Gamma(2)}\, x e^{-x}$ | $g(y \mid 1, 0.5) = \dfrac{1}{y}\sqrt{\dfrac{2}{\pi}}\, \exp\{-2(\ln y - 1)^2\}$ | 0.5 |
| p-value | 0.51 | 0.23 | |
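As a quick check, the closed forms in Table 5 can be compared with library implementations of the gamma and lognormal densities. The sketch below also shows one possible chi-square G-o-F check; the binning scheme (10 equiprobable classes), the simulated sample, and the helper name `chi_square_gof` are assumptions, since the paper's test settings are not given in this section.

```python
import numpy as np
from scipy import stats


def f_x(x):
    """Gamma(2, 1) pdf as written in Table 5 (note that Gamma(2) = 1)."""
    return x * np.exp(-x)


def g_y(y):
    """Lognormal(1, 0.5) pdf as written in Table 5."""
    return (1.0 / y) * np.sqrt(2.0 / np.pi) * np.exp(-2.0 * (np.log(y) - 1.0) ** 2)


gamma_x = stats.gamma(a=2.0, scale=1.0)
# scipy parameterization: s is the log-scale SD, scale is exp(log-scale mean).
lognorm_y = stats.lognorm(s=0.5, scale=np.exp(1.0))

grid = np.linspace(0.1, 12.0, 200)
max_err_x = np.max(np.abs(f_x(grid) - gamma_x.pdf(grid)))
max_err_y = np.max(np.abs(g_y(grid) - lognorm_y.pdf(grid)))
print(max_err_x, max_err_y)


def chi_square_gof(sample, dist, n_bins=10, n_fitted=0):
    """Chi-square test on equiprobable classes (assumed binning scheme).

    n_fitted is the number of parameters estimated from the sample; it
    reduces the degrees of freedom of the test.
    """
    u = dist.cdf(sample)                              # uniform under H0
    observed, _ = np.histogram(u, bins=n_bins, range=(0.0, 1.0))
    return stats.chisquare(observed, ddof=n_fitted)


# Hypothetical sample drawn from the hypothesized distribution itself.
rng = np.random.default_rng(7)
sample_x = gamma_x.rvs(size=300, random_state=rng)
stat_x, p_x = chi_square_gof(sample_x, gamma_x)
print(stat_x, p_x)
```

A p-value above the chosen significance level (5% in Table 5) means the hypothesized marginal is not rejected.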

Steps 7 to 9 are applied, knowing that Touran and Suphot (1997) have already performed Steps 1 to 3 and reported the pdf of marginals (i.e., cost items). Steps 4 to 6 of the proposed algorithm (i.e., G-o-F test of copulas) need access to the cost database of buildings, which is not provided in Touran and Suphot (1997). The results of this example have shown that simply using Gaussian copula can mean ignoring some important information, which has been overlooked in other published literature. Ignorance of possible extreme values of project total cost can be very serious and possibly lead to large values of cost overrun, major disputes and adverse consequences for and between project parties.

Example 2

As mentioned previously, the application of copulas is very common in many areas, such as financial risk management, insurance, hydrology, and infrastructure management, to name a few. However, their application in risk management of construction projects is very limited. A survey of literature shows that Moret and Einstein (2012) is one of the latest relevant studies that have advocated the use of Gaussian copula due to the lack of agreement on the criteria to select the most appropriate copula. In this example, the applicability of Genest and Rémillard’s (2008) method to G-o-F test and selection of best performing bivariate copulas in construction risk analysis is shown.

Fig. 5. Histograms of sample data points (X, Y) and their fitted probability distribution functions (legend: X data, Gamma (2,1); Y data, Lognormal (1,0.5); axes: Data, Density)

Table 6. Estimated Parameters of Different Copula Families and Results of Blanket Goodness-of-Fit Test

| Copula family | Estimated copula parameter $\theta_n$ | Kendall’s $\tau_n$ | $S_n$ | p-value |
|---|---|---|---|---|
| Clayton | 2.00 | 0.5 | 0.077 | <0.001 |
| Gumbel | 2.00 | 0.5 | 0.050 | 0.002 |
| Frank | 5.72 | 0.5 | 0.016 | 0.449 |
| Gaussian | 0.71 | 0.5 | 0.025 | 0.098 |
| Cauchy | 0.71 | 0.5 | 0.063 | 0.001 |

Suppose that X and Y are two arbitrary dependent random variables that can be either time or cost items of a construction project schedule/plan. In this example it is assumed that 300 historical data points (i.e., realizations) of these random variables are collected in the construction site. In Fig. 4, the scatter plots of these sample data and their corresponding inverse empirical cumulative function values, which are calculated from Eq. (11), are shown. Apparently, in the case of repetitive projects (which most construction projects are), the collection of this type of data, either cost or time, is not very cumbersome. Then, following steps 1–3 of the proposed algorithm as per Fig. 1 and after conducting the chi-square G-o-F test, it is found that random variables X and Y follow gamma and lognormal distributions, respectively. The associated pdf of these

Fig. 6. Comparison of performance of different copulas in simulation of random variables (panels: Clayton, Gumbel, Frank, Gaussian, and Cauchy copulas, and independent variables; recorded and simulated data plotted as V against U)
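The parameter values in Table 6 and the kind of simulated scatter shown in Fig. 6 can be approximated with the short sketch below. It uses the standard relationships between Kendall's tau and the copula parameter, which are assumed here to match those of Table 1, and samples the Frank copula by conditional inversion. The function names, the seed, and the use of τ = 0.5 exactly (rather than a sample estimate) are illustrative assumptions, not the authors' code.

```python
import numpy as np
from scipy import integrate, optimize, stats


def frank_tau(theta):
    """Kendall's tau of a Frank copula: 1 - (4 / theta) * (1 - D1(theta)),
    where D1 is the first-order Debye function."""
    integrand = lambda t: t / np.expm1(t) if t > 0 else 1.0
    debye1 = integrate.quad(integrand, 0.0, theta)[0] / theta
    return 1.0 - 4.0 / theta * (1.0 - debye1)


def theta_from_tau(tau):
    """Invert the tau-parameter relationship for each copula family."""
    rho = np.sin(np.pi * tau / 2.0)        # from tau = (2 / pi) * arcsin(rho)
    return {
        "Clayton": 2.0 * tau / (1.0 - tau),
        "Gumbel": 1.0 / (1.0 - tau),
        "Frank": optimize.brentq(lambda th: frank_tau(th) - tau, 1e-3, 100.0),
        "Gaussian": rho,
        "Cauchy": rho,
    }


def sample_frank(theta, n, rng):
    """Sample (u, v) from a Frank copula by inverting dC/du = p for v."""
    u = rng.uniform(size=n)
    p = rng.uniform(size=n)
    a = np.exp(-theta * u)
    b = p * np.expm1(-theta) / (a * (1.0 - p) + p)
    v = -np.log1p(b) / theta
    return u, v


params = theta_from_tau(0.5)
print({name: round(float(val), 3) for name, val in params.items()})

# Bind the Table 5 marginals with the Frank copula (300 points, as in the example).
rng = np.random.default_rng(42)
u, v = sample_frank(params["Frank"], 300, rng)
x = stats.gamma(a=2.0, scale=1.0).ppf(u)
y = stats.lognorm(s=0.5, scale=np.exp(1.0)).ppf(v)

tau_hat, _ = stats.kendalltau(x, y)
print(tau_hat)
```

For τ = 0.5 these relationships give θ = 2 for the Clayton and Gumbel copulas and ρ ≈ 0.71 for the two elliptical copulas, in line with Table 6. The Frank value obtained by this inversion is close to, but need not equal, the tabulated 5.72, which was estimated from the sample.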

random variables and estimated parameters are given in Table 5 and shown in Fig. 5. Furthermore, following step 4 of the algorithm, the Kendall’s rank correlation between these variables can be readily calculated using Eq. (5). In fact, in this example, to check the performance of the proposed method, the sample points are intentionally generated via a Frank copula with positive-valued, right-tailed marginal distributions, i.e., lognormal and gamma, which are commonly used in construction risk management. The Kendall’s rank correlation, $\tau_n$, is chosen to be 0.5. After fitting the marginals, they should be bound via the best-performing copula to be ready for Monte Carlo simulation. The copula parameter, $\theta_n$, for the three Archimedean copulas, i.e., Clayton, Gumbel, and Frank, is calculated using the functional relationships given in Table 1. In the case of the two elliptical copulas, i.e., Gaussian and Cauchy, it is calculated using the relationship between the Pearson linear correlation coefficient and Kendall’s rank correlation [i.e., $\tau = (2/\pi)\sin^{-1}(\rho)$]. These specified copulas have undergone the parametric bootstrapping procedure (i.e., step 5 of the algorithm) to compute the Cramér-von Mises statistic, $S_n$, and the corresponding p-values, and to select the best-performing copula according to step 6 of the algorithm.

The results in Table 6 suggest that the Frank copula has the minimum $S_n$ and thus is the best performing. Furthermore, in reference to the p-values, it is the only copula to be chosen at the 10% significance level. The performance of these five estimated copulas in mimicking the pattern of recorded data in a Monte Carlo simulation (i.e., step 7 of the proposed algorithm) can be visually checked in Fig. 6. Again, as is apparent from the coincidence of the scatter of simulated and recorded data, the performance of the Frank copula is the best. Another observation, based on a comprehensive test, is that with the increase in Kendall’s rank correlation value, the performance of the G-o-F test improves, and if the data are more inclined to extreme values, the chance of selecting Cauchy and Clayton copulas increases.

Conclusions

A generic copula-based Monte Carlo simulation method has been proposed in this paper for prediction of the total cost of construction projects. Concepts of dependency and copulas have been introduced and the blanket test for selecting the best-performing copula has been presented. An algorithm to generate the joint probability distribution function of correlated cost items has been developed in the paper, with two examples to demonstrate the applicability of copulas in modeling construction costs as random variables. A merit of the proposed method is that it not only can incorporate all different types of distributions in one framework, but also captures the best dependence structure between variables. From numerical results, this paper has found that different dependence structures can lead to different probability distributions of total cost and that reliance on Gaussian copulas may result in the neglect of small probabilities of extreme project cost/time overruns. It has also been found that the existing G-o-F tests can be employed in choosing the best performing copula and that the confidence interval of simulated total cost can be determined more accurately. It can be concluded that the proposed method can predict total cost of construction projects with reasonable accuracy. Copula-based Monte Carlo simulation can be regarded as a new approach in developing advanced methods for construction risk management of cost overruns.

Acknowledgments

Financial support from Australian Research Council under DP140101547 and LP150100413 is gratefully acknowledged.

References

Embrechts, P., McNeil, A., and Straumann, D. (2002). “Correlation and dependence in risk management: Properties and pitfalls.” Risk management: Value at risk and beyond, M. A. H. Dempster, ed., Cambridge University Press, Cambridge, 176–223.
Fischer, M., Köck, C., Schlüter, S., and Weigert, F. (2009). “An empirical analysis of multivariate copula models.” Quant. Finance, 9(7), 839–854.
Genest, C., and Favre, A. C. (2007). “Everything you always wanted to know about copula modeling but were afraid to ask.” J. Hydrol. Eng., 10.1061/(ASCE)1084-0699(2007)12:4(347), 347–368.



Genest, C., and Rémillard, B. (2008). “Validity of the parametric bootstrap for goodness-of-fit testing in semiparametric models.” Ann. Inst. Henri Poincaré, 44(6), 1096–1127.
Genest, C., Rémillard, B., and Beaudoin, D. (2009). “Goodness-of-fit tests for copulas: A review and a power study.” Insurance Math. Econ., 44(2), 199–213.
Iman, R. L., and Conover, W. J. (1982). “A distribution-free approach to inducing rank correlations among input variables.” Commun. Stat., 11(3), 311–334.
Mai, J. F., and Scherer, M. (2014). Financial engineering with copulas explained, Palgrave Macmillan, Basingstoke, U.K.
McNeil, A. J., Frey, R., and Embrechts, P. (2005). Quantitative risk management, Princeton University Press, Princeton, NJ.
Moret, Y., and Einstein, H. H. (2012). “Modeling correlations in rail line construction.” J. Constr. Eng. Manage., 10.1061/(ASCE)CO.1943-7862.0000507, 1075–1084.
Nelsen, R. B. (2007). An introduction to copulas (Springer series in statistics), 2nd Ed., Springer, New York.
Reddy, M. J., and Ganguli, P. (2012). “Risk assessment of hydroclimatic variability on groundwater levels in the Manjara basin aquifer in India using Archimedean copulas.” J. Hydrol. Eng., 10.1061/(ASCE)HE.1943-5584.0000564, 1345–1357.
Skitmore, M., and Ng, S. T. (2002). “Analytical and approximate variance of total project cost.” J. Constr. Eng. Manage., 10.1061/(ASCE)0733-9364(2002)128:5(456), 456–460.
Sklar, A. (1959). “Fonctions de répartition à n dimensions et leurs marges.” Publ. Inst. Stat. Univ. Paris, 8, 229–231.
Srinivas, S., Menon, D., and Prasad, A. M. (2006). “Multivariate simulation and multimodal dependence modeling of vehicle axle weight with copulas.” J. Transp. Eng., 10.1061/(ASCE)0733-947X(2006)132:12(945), 945–955.
Touran, A., and Suphot, L. (1997). “Rank correlations in simulating construction costs.” J. Constr. Eng. Manage., 10.1061/(ASCE)0733-9364(1997)123:3(297), 297–301.
Touran, A., and Wiser, E. P. (1992). “Monte Carlo technique with correlated random variables.” J. Constr. Eng. Manage., 10.1061/(ASCE)0733-9364(1992)118:2(258), 258–272.
Wall, D. M. (1997). “Distributions and correlations in Monte-Carlo simulation.” Constr. Manage. Econ., 15(3), 241–258.
Yang, I.-T. (2005). “Simulation-based estimation for correlated cost elements.” Int. J. Project Manage., 23(4), 275–282.
Yang, I.-T. (2006). “Using Gaussian copula to simulate repetitive projects.” Constr. Manage. Econ., 24(9), 901–909.
Zhang, L., and Singh, V. P. (2006). “Bivariate flood frequency analysis using the copula method.” J. Hydrol. Eng., 10.1061/(ASCE)1084-0699(2006)11:2(150), 150–164.
Zhou, W., Hong, H. P., and Zhang, S. (2012). “Impact of dependent stochastic defect growth on system reliability of corroding pipelines.” Int. J. Press. Vessels Pip., 96–97, 68–77.
