
Expert Systems with Applications 36 (2009) 4253–4267


Generalized linear model-based expert system for estimating the cost of transportation projects
Jui-Sheng Chou *
Department of Construction Engineering, National Taiwan University of Science and Technology, 43 Sec. 4, Keelung Road, Taipei 106, Taiwan

Article info

Keywords: Cost management; Generalized linear model; Expert system; Relational database; Transportation projects

Abstract

Timely effective cost management requires reliable cost estimates at every stage of project development. While underestimation of transportation costs seems to be a global trend, improving early cost prediction accuracy in estimates is difficult. This paper presents a parametric estimating technique applied to Texas highway projects using a set of project characteristics. Generalized linear models (GLM) of early quantity prediction for geometry-related work activities, namely earthwork, pavement and traffic control were developed for continuous project cost tracking. The approach of cost breakdown demonstrates the potential to separate quantity uncertainty from price uncertainty for highway construction. The benefit of this approach is to provide a platform for evolving the preliminary parametric cost estimates to a fully detailed cost management as further information becomes available as the project progresses. During project execution, managers are given opportunities to review the associated work activities and make better decisions from the developed GLM-based estimating system. Compared to typical practice of applying a gross cost per lane length during pre-project planning phase, the proposed approach with the aid of the developed expert system provides more detailed basis and efficiency for tracking the effects of changes within the project life cycle.

© 2008 Elsevier Ltd. All rights reserved.

* Tel.: +886 2 2737 6321; fax: +886 2 2737 6606.
E-mail address: ischou@mail.ntust.edu.tw

0957-4174/$ - see front matter © 2008 Elsevier Ltd. All rights reserved.
doi:10.1016/j.eswa.2008.03.017

1. Introduction

Preliminary cost estimation is conceived as the most significant starting process to influence the fate of a new transportation project. The impact to cost variation for the stakeholders decreases with the growing maturity of project planning and design. In practice, over- and underestimation of project costs are both problematic. In addition, missed estimates expose managers to attacks from the public and politicians, jeopardizing the investments needed to solve transportation problems. Poor estimates were blamed on lack of training and standard procedures, insufficient time available for estimation, incomplete project information, and inexperienced teams. It was concluded that arbitrary changes in estimates by managers, design errors, scope creep, market conditions and project delays were causes of cost overruns (Akinci & Fischer, 1998; Akintoye, 1998; Anderson, Molenaar, & Schexnayder, 2007).

Preliminary planning is crucial to the viability of a project progressing beyond the initiation stage, yet little data are available to develop an accurate budget (Bell, 1987). Many State Departments of Transportation (DOTs) have experienced highly visible projects that have suffered from excessive cost overruns (GAO, 2002). Inaccurate preliminary cost estimates for highway projects deeply affect financial operations of these organizations in the United States due to marginal budgets (Sanders, Maxwell, & Glagola, 1992). Many studies of project cost estimates have found the final total cost incurred in designing and constructing projects of all types almost always exceeds the amounts estimated (Schexnayder, Weber, & Fiori, 2003). A study of 258 transportation infrastructure projects among modernized countries led to the following observations (Flyvbjerg, Holm, & Buhl, 2002): (1) Costs are underestimated in 9 out of 10 transportation infrastructure projects; (2) For road projects, actual costs are on average 20% higher than estimated costs with a standard deviation of 30%; (3) Cost underestimation appears to be a global phenomenon.

Estimating accuracy is closely related to the extent of information available at the time when the estimate is developed. The conceptual estimate is often misleading because of the paucity of available information. In particular, a stronger case should be made for predicting and tracking quantities early, so as to eliminate cost error by better exploiting readily available recent district work item unit prices. The ability to influence project cost is the highest at the earliest stages, when scope is being defined (Duverlie & Castelain, 1999; Oberlender, 1998). As commitments are made, that ability declines over the entire project delivery period. By the time the
design is underway, there is relatively little opportunity for affecting the final cost. Thus, it is important to take control of project costs early in the initiation phase.

2. Research objectives

The objective of this research is to develop an expert estimating system for the early stage by formulating generalized linear models (GLM) incorporated within computing software for Texas highway construction, namely earthwork, pavement and traffic control. The motivation is to assemble parametric costing of work items following the highway construction specifications. Unlike the estimation at detailed design stage, the parametric method can be adopted with only conceptual configuration available. So far most kinds of approaches avoid highly complex time consuming "wild guesses" and generate acceptable preliminary cost estimates. They do not exploit the accuracy of recent unit prices nor do they establish an item-level quantity formulation upon which a more accurate, robust system could be built in the earliest phase. Identifying causes of changes in estimates is difficult on transportation projects while underestimation of capital and operating costs is common (Sanders et al., 1992). This study intended to build statistical models that toggle project input information at project initiation, and segregated unit prices from highway work item costs. The analysis is based on the project database collected from the Texas Department of Transportation (TxDOT) during the period of 2001–2003. The later projects in 2004 were randomly chosen for model validation. An expert system using relational database was developed to automate a series of quantity calculation, retrieval and storage based on the developed quantity models. The concept of this proposed architecture is to better exploit work item historical unit prices and provide a more seamless preliminary estimate with quantity-based models for later periodic adjustment. This strategy enables continuity of item-level quantity tracking as projects evolve to later phases without the influence of price fluctuation. The output, in the form of a report, can be used as a starting point for experienced engineers' subsequent cost management and better decisions.

3. Literature review

A wide range of parametric models for highway construction costs can be found in the literature. For example, Saito, Sinha, and Anderson (1991) proposed an analytical approach in developing statistical models for the estimation of bridge replacement costs using project data of 279 bridges replaced between 1980 and 1985 by the Indiana Department of Transportation (INDOT). They concluded that adding component cost models is better than total bridge cost model. Likewise, Chengalur-Smith, Ballou, and Pazer (1997) focused on the estimation of bridge rehabilitation by using non-linear regression with the cost data collected in the New York State Department of Transportation. The model addressed eight explanatory variables including region, bridge type, deck area, substructure area, age, functional class, component condition index, and completed work, to explain the response variables, component cost, subtotal costs, and unit cost respectively. Their approach, however, was concerned with highly non-linear relationships between cost and cost drivers and proved the total cost regression models superior in explanatory power to the unit cost modeling.

Bell (1987) developed multiple linear regression models for preliminary cost estimating which can be used by Alabama Highway Department (AHD) for long range cost forecasting. The total project cost per mile is the function of a list of probable predictors comprised of line items, such as quantities of work items per mile. Six independent models have been built to accommodate six project types as defined by AHD. The accuracy of estimates ranged from ±17% to ±35% of the actual project cost.

Harbuck (2002) considered transit system as linear in nature and presented the estimating procedure by breaking down the cost categories that will be needed for the type of transportation project being studied. The proposed approach is proper when work items of typical cross-section over a given length of alignment are available.

Sthapit and Mori (1994) modeled a parametric equation to estimate highway earthwork cost in consideration of cutting depth and cross slope, which acted as hill and slope factors respectively, as well as soil type in the construction area. In preparation of estimates, information of highway geometric alignment and geological condition is needed to assess the factors and coefficient of soil type.

Sanders et al. (1992) used 1987–1988 urban bridge widening projects as database to illustrate simple regression models for work items which individually made up greater than 1% of the total project cost. The cost of engineering, inspections, right-of-way, and some other items normally associated with total cost were not included in these models. Usually each bid contains hundreds of line items; therefore, a total project cost at conceptual stages can be reasonably estimated by summing up the costs of a few major work items.

Wilmot and Cheng (2003) developed a cost model to estimate future overall highway construction costs in Louisiana in terms of a highway construction cost index. The authors divided the overall model into five sub-models, each of which included one dominant construction item as a predictor. They then determined that an average growth rate of 3.3% per year in construction costs in Louisiana was expected, based on the statistical analysis of 2827 highway and bridge contracts during the period of 1984–1987.

Trost and Oberlender (2003) established a parametric model which can predict the accuracy of early estimates based on the estimate score for capital projects in the process industry. The required contingency can be decided by rating scores for the classified factors based on the experienced estimators.

Furthermore, many studies have investigated unit cost estimating relationships (CERs) between cost (dollar per lane length) and plan quantity or between cost and cost drivers for building and infrastructure construction using either statistical techniques or neural networks (Al-Tabtabai, Alex, & Tantash, 1999; Bell, 1987; Chengalur-Smith et al., 1997; Emsley, Lowe, Duff, Harding, & Hickson, 2002; Hegazy & Ayed, 1998; Herbsman, 1983; Jrade & Alkass, 2002; Lowe, Emsley, & Harding, 2006; Masi, 2003; Morcous, Bakhoum, Taha, & El-Said, 2001; Phaobunjong & Popescu, 2003; Saito et al., 1991; Sanders et al., 1992; Williams, 2005; Yu, 2006).

Hegazy and Ayed (1998) developed a parametric cost estimating model for highway projects using a neural network approach, rather than using black-box commercial software. Factors used as model inputs include description of project size, year of construction, project location, capacity, and other uncertainty-related factors. The networks weight-determining methods, simplex optimization and back-propagation training compared to genetic algorithms optimization were observed to produce smaller errors to the 18 highway projects constructed in Newfoundland, Canada. In this study, the use of the spreadsheet interface unraveled the complicated analysis process and the total budget cost for the project was output.

Morcous et al. (2001) used neural networks to estimate preliminary item-level quantity of concrete volume and prestressing weight for highway bridges in Egypt. The estimate errors were found to be within ±7.5% and ±11.5% for concrete volume and prestressing weight respectively. Adeli and Wu (1998) formulated a model to estimate the cost of reinforced-concrete (RC) pavement. They collected 242 data samples from historical RC pavement
projects conducted by the Ohio Department of Transportation (ODOT). Half of the data sets were randomly selected as training data and the other half were used to validate the model. In the illustrated example, only quantity information for RC pavement was used as input data and the unit cost was used as an output variable. The average error for predictive unit costs for pavement was 16.5% of average unit cost.

Al-Tabtabai et al. (1999) used neural networks to predict preliminary cost estimation of highway construction based on five experts' judgment. The output of this model is the percentage change in total cost based on the input of nine predominant factors: location, utilities, soil nature, type of consultant, detour construction, hauling distance, financial condition, type of road, and need. Williams (2005) developed regression and neural network models with parameters of bid ratios to predict the completed cost of the highway projects. The study showed that none of the best performing models used the bidding ratios but the natural log of the low bid as input parameter.

Some researchers recognize that cost estimates should not be represented as a definitive value but rather as a range (Molenaar, 2005; Touran, 1993). By using Monte Carlo simulation, one can develop a probabilistic cost estimating system under alternative scenarios. In this way, a range of cost estimates can be developed for a variety of scenarios. In this case, an abundant amount of statistical data is needed.

Molenaar (2005) presented a methodology developed by the Washington State Department of Transportation (WSDOT) for its Cost Estimating Validation Process (CEVP) through nine case studies. The CEVP provides better understanding and communication of the risks involved with mega highway projects for a more transparent assessment of uncertainty. Based on the concept that most construction project costs can be calculated by combining fixed and variable cost components, Touran (1993) used probabilistic simulation of the variable cost components to establish a cumulative distribution function for the total project cost which offered better insight into cost variability, aside from the influence of subjective correlations between different cost components. The author indicated that there was no need to include every cost component in the project as random variables; therefore, only the desired cost elements which exhibited the greatest amount of variability were estimated with simulation technique.

Estimators should be involved in the bid analysis for a project prior to award; this affords an opportunity to spot bidding trends early, which may impact the award decision. A research report pointed out that 31 DOTs generate conceptual estimates solely based upon historic lane-mile cost averages for similar projects (Schexnayder et al., 2003). There are a variety of estimating applications utilized in the State DOTs. Based on the findings, the research classified the approaches into five categories: (1) lane-unit length cost average; (2) conventional quantity-take-off and adjusted historical unit price; (3) component-level parametric estimates with qualitative adjustment factors; (4) work item unit price range according to quantity range estimates; (5) a mix of the above.

4. Challenges of preliminary cost estimates

Estimate accuracy is closely related to the extent of information available at the time the estimate is developed. Thus developing accurate preliminary cost estimates is very challenging. The use of historical bid prices is a relatively straightforward practice and can yield good results if applied properly. However, it does not mean that the average bid prices should represent the same conditions for the estimating projects. The estimators must have the ability to reflect the fluctuation of prices due to market conditions, time inflation, restricted working hours, political risk and unknown geologic conditions in preparing estimates. Without cautious utilization of post bid data, this practice will lead to an inaccurate estimate. The conventional approach lacks quantity information and does not initiate the first quantity estimates until 90–95% of the design is completed. A quantity-based estimating system enables periodic quantity adjustment as projects evolve to later phases.

The quantity-based approach was implemented in this study to split an item cost into two parts, measure of quantity and unit price. The quantity of a work item is deemed unaffected by economic conditions and can be predicted from a number of associated project characteristics with the aid of generalized linear modeling. Then the readily available unit price, updated regularly by the DOT, can be applied to the item quantity to develop an accurate item-level cost at early stages. With the use of information technology, the statistical models can be incorporated within an expert estimating system to generate a semi-detailed quantity-take-off report. This output provides a platform for tracking the effects of changes during project development. The GLM-based estimation enables periodic quantity adjustment as projects evolve to later phases with a new set of project characteristics.

5. Methodology

A transportation project is organized by a series of processes undertaken to create a comfortable facility for the road users. The processes can be broken into activities or work items to quantify the labor, materials, equipment, project duration, quality and other resources required to complete the project. To improve preliminary estimates, sources of cost variability must be investigated. This study separated the variance of item unit price from item quantity to provide accurate estimates with regularly updated bid prices within TxDOT.

The quantity models were developed from a set of project features, functions, and characteristics mined from a notable amount of project data to screen out significant variables, which were thereafter summarized as future input basis for early estimate forecasting. To streamline the estimating procedure, a computational tool is built up to automate the item quantity prediction. Since the relationship between the work quantity and the parameters was non-linear, the empirical analysis employed GLM by regressing a translog quantity variable on a linear combination of non-quantity numerical or categorical variables in log, interaction, or original forms to establish relationships between project characteristics and associated item-level quantity. Explanatory features were extracted from Design and Construction Information System (DCIS), one of the Texas statewide databases, consisting of relational data on project information, work program, project estimate, project finance, and contract letting tables.

5.1. Cost growth factors for infrastructure projects

The most common causes of cost growth for highway projects can be divided into three groups, namely project factors, organization factors, and estimate factors. Project cost performance is directly related to project conditions. These project factors include changing economic or market conditions, project type, project complexity, project location, project size, duration of construction periods, scope changes, unforeseen engineering complications and constructability challenges, construction accessibility, restricted working hours, use of new technology, method of construction or construction techniques, and experimental or research items and special specifications or provisions (Akintoye & Fitzgerald, 2000;
Black, 1984; Odeck, 2004; Sanders et al., 1992; Wilmot & Cheng, 2003).

Projects can be influenced as well by organization factors, including organizational capacity of the owner, designer, and/or contractor, contract type and context of contract, changes in regulatory requirements, disruption or discontinuity within the management team or local political leadership, lack of site familiarity by the design team, expertise of the consultants involved in the project, and poor communication between districts and head office (Akintoye, 1998; Akintoye & Fitzgerald, 2000; Baloi & Price, 2003; Schexnayder et al., 2003; Wilmot & Cheng, 2003).

Many studies pointed out that quality and timing of the estimate significantly influence cost performance. These estimate factors include timing of estimate cost data versus timing of expenditure, estimator-related factors (e.g. cognitive biases), estimating team experience, quality of cost information, time allowed to prepare the estimate, wide variability in contractor's (subcontractors') prices, lack of review of cost estimate by management, lack of adequate guidelines for estimating, and estimators' lack of data processing techniques (Akinci & Fischer, 1998; Akintoye & Fitzgerald, 2000; Baloi & Price, 2003; Flyvbjerg et al., 2002; Trost & Oberlender, 2003; Wilmot & Cheng, 2003).

5.2. Generalized linear model formulation

In the preliminary analysis, several curve estimation models, including linear, quadratic, compound, growth, exponential, cubic, inverse, power, and logarithmic were tested to seek a better goodness of fit. The coefficient of determination resulting from logarithmic transformation was found to be better than others in this comprehensive analysis.

A box plot of engineering quantity versus project type for embankment is shown in Fig. 1 for instance. The project types considered in this study include bridge replacement (BR), bridge widening and rehabilitation (BWR), interchange (INC), new location non-freeway (NNF), rehabilitation of existing roadway (RER), upgrade non-freeway (UGN), widening of freeway (WF), and widening of non-freeway (WNF). The right-hand side plot indeed displays an effective natural logarithmic transformation with approximately normal distribution for the originally positively skewed quantity values of most of the project types except BR and UGN as shown in the left-hand side.

Before finalizing the overall models development, a pilot data analysis was carried out on the excavation for widening of non-freeways (WNF) projects. Regression analysis was performed to compare the effects of project length multiplied by project width on engineering quantity for the original and transformed datasets respectively. Scatter plots presented in Fig. 2 obviously indicated a significant alleviation on heteroscedasticity (non-equal variance) that violates one of the statistical assumptions and the transform makes the fitting feasible. The goodness of fit (R² = 0.599) indicates that after transformation the 59.9% variance of engineering quantity can be explained by the variance in total project area.

Based on the comprehensive analysis, a general parametric function used in model development was then established as shown below after a series of trial and error formulation with compliance of statistical assumptions.

\[
Y = b_0 \left( \prod_{i=1}^{n} X_i^{\,b_i} \right) e^{\left( \sum_{i=1}^{m} a_i D_i \;+\; \sum_{i=1,\,j=1,\,i \neq j}^{n,\,m} c_{ij} X_i X_j \;+\; \sum_{i=1}^{n} \sum_{j=1}^{m} \phi_{ij} X_i D_j \right)} \varepsilon \qquad (1)
\]
[Figure: two box-plot panels of embankment quantity by project type (BR, BWR, INC, NNF, RER, UGN, WF, WNF). Left panel y-axis: Quantity, 0.765 m3 (1 cubic yard); right panel y-axis: ln(Quantity); x-axis of both panels: Project Type. Legend: extreme outlier; mild outlier; maximum observation that is not an outlier; 3rd quartile; median; 1st quartile; minimum observation that is not an outlier.]

Fig. 1. Box plot for embankment quantity.



[Figure: two scatter-plot panels, each with 150 data points. Left panel (Original Dataset): Quantity, 0.765 m3 (1 cubic yard), versus Project Length x Project Width, 0.093 m2 (1 square foot). Right panel (Natural Logarithmic Transform): ln(Quantity) versus ln(Project Length x Project Width), R² = 0.599.]

Fig. 2. Scatter plots of excavation: quantity vs. total project area.

where Y is the response variable (work item quantity); a_i, b_i, c_ij and \phi_ij are the estimated parameters; X_i are the predictors representing numerical data; e is the exponential constant; D_j are the predictors representing categorical data; and \varepsilon is the error term.

This parametric model is an extended type of non-linear relationship that has firm grounding in economic theory. It is called the constant elasticity relationship or multiplicative relationship and can be transformed into a linear model by a logarithmic transformation (Albright, Winston, & Zappe, 2003; Neter, Kutner, Nachtsheim, & Wasserman, 1996; Saito et al., 1991). In this study, a natural logarithm with base e (i.e. ln or log_e) was used. The equation was therefore transformed into the following linear model:

\[
\begin{aligned}
\ln(Y) ={} & \ln(b_0) + b_1 \ln(X_1) + b_2 \ln(X_2) + \cdots + b_n \ln(X_n) + a_1 D_1 + a_2 D_2 + \cdots + a_m D_m \\
& + c_{12} X_1 X_2 + c_{13} X_1 X_3 + \cdots + c_{nm} X_n X_m \\
& + \phi_{11} X_1 D_1 + \phi_{12} X_1 D_2 + \cdots + \phi_{nm} X_n D_m + \ln(\varepsilon)
\end{aligned}
\qquad (2)
\]

Transformation can clearly reduce the impact of outliers and help allay concerns about violating assumptions of normality and homoscedasticity in regression analysis when raw data exhibit a skewed pattern (Saito et al., 1991; Neter et al., 1996; Chengalur-Smith et al., 1997; Phaobunjong & Popescu, 2003), although these two assumptions are not nearly as crucial as the need for independence. If a transformation is performed in a least squares regression, the resulting statistical properties (e.g., best, linear, unbiased estimates) are true only on the transformed values. Once the results are "back-transformed" to the original units, these statistical niceties are lost (Bobko, 2001). Eq. (2) can be expressed as a generalized linear regression model in terms of a linear combination of predictors as below:

\[
Y' = A X' + \varepsilon' \qquad (3)
\]

where

\[
\begin{aligned}
Y' &= \ln(Y) \\
A &= [\,\ln(b_0)\;\; b_1\;\; b_2, \ldots, b_n\;\; a_1, \ldots, a_m\;\; c_{12}, \ldots, c_{nm}\;\; \phi_{11}, \ldots, \phi_{nm}\,] \quad \text{(row coefficient vector)} \\
X' &= [\,1\;\; \ln(X_1)\;\; \ln(X_2), \ldots, \ln(X_n)\;\; D_1, \ldots, D_m\;\; X_1 X_2, \ldots, X_n X_m\;\; X_1 D_1, \ldots, X_n D_m\,]^{T} \quad \text{(column explanatory variable vector)} \\
\varepsilon' &= \ln(\varepsilon)
\end{aligned}
\]

Cross-product interaction regression models based on Eq. (3) were implemented in the later analyses to improve R² and to explain interaction effects on the variability of the response variable. When adding interaction terms to the regression models, caution should be exercised on the existence of multicollinearities between some of the predictor variables and some of the interaction terms. In addition, when the number of explanatory variables in the regression model is large, the potential number of interaction terms can become very large. Therefore prior knowledge concerning practical interpretation of the interaction terms that are most likely to influence the response variable should be utilized whenever possible.

Stepwise regression using probability of F-test as entry (0.05) and removal (0.10) criteria was employed throughout the model development. Upon obtaining regression model results, several assumptions ought to be revisited alongside hypothesis testing for the regression model (F-test) and regression coefficients (t-test) with 95% confidence interval.

An important part of linear regression modeling is checking whether the required assumptions of linearity and i.i.d. (independence and identical distribution) of observations are met. Although the validity of the assumptions can never be entirely certain, there are ways to check for gross violations by analyzing residuals. The prevailing techniques to diagnose residuals include box plot, Q–Q plot, scatter plot, partial regression plot, residuals versus predicted values, etc. (Norusis, 2002). If the residuals are normally distributed, then there should be a linear relationship between the expected and observed cumulative probabilities.

Multicollinearity exists as a fairly strong linear relationship between two or more independent variables (Albright et al., 2003). The degree to which the predictors are correlated among themselves can affect regression results and make estimation unstable. The strength of collinearity among the independent variables in the models was measured by a statistic called tolerance. The tolerance is the proportion of variability defined as 1 − R_i², where R_i² is the coefficient of determination of variable i when variable i is predicted from all other independent variables. SPSS (2003) suggests that if any of the tolerances are less than 0.1, multicollinearity may be a problem. In this study, predictors whose tolerance fell below this threshold were excluded and the analysis was run again.
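As an illustration of the tolerance check described above, the following is a minimal Python sketch, not the SPSS procedure used in this study. It regresses each predictor on all the others by ordinary least squares and reports 1 − R_i². The function name `tolerances` and the synthetic predictors are assumptions made purely for illustration.

```python
import numpy as np

def tolerances(X):
    """Return the tolerance 1 - R_i^2 of every column of X.

    X is an (observations x predictors) array. Column i is regressed on
    all other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    tol = np.empty(p)
    for i in range(p):
        y = X[:, i]
        others = np.delete(X, i, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        tol[i] = ss_res / ss_tot  # equals 1 - R_i^2
    return tol

# Synthetic predictors (hypothetical): x3 is almost a copy of x1.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)

tol = tolerances(np.column_stack([x1, x2, x3]))
# Indices of predictors below the 0.1 threshold cited in the text.
flagged = [i for i, t in enumerate(tol) if t < 0.1]
```

In this synthetic case the two nearly collinear predictors fall below the threshold while the independent one does not. Which of a collinear pair to drop remains a modeling judgment that the sketch does not make.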

After the needed checks have been taken to make sure the compliance of statistical assumptions, the final step in the model-building process is the validity of the predicted models. Model validation usually involves checking the model against observed or independent data (Neter et al., 1996). The validity of item-level quantity-based models developed in this study was calibrated through scatter plots of predicted versus observed values. In addition, predictive ability of models selected from studied work categories were checked through the collection of new highway projects extracted from TxDOT.

6. Data collection and preprocessing

The Texas Department of Transportation (TxDOT) statewide computer network, Design and Construction Information System (DCIS), allows all districts and Austin headquarters to maintain project data in a standardized format. The raw data were collected with the assistance of the TxDOT Austin TPP (Transportation Planning and Programming) Division. While a few projects had incomplete data, the DCIS was found to contain extremely useful and reliable data. Thus, the database system was used as the data resource for this research. By using proven statistical models with the cost data, this study attempts to develop quantity models as an alternative approach for preliminary construction cost estimation.

The statistical models were derived from analysis of quantities and early quantity drivers from historical projects. Eighty-eight project characteristics that may affect the variance of item-level quantity based on the reviews and literature findings were extracted from DCIS with more than five hundred thousand projects. Unit bid prices associated with corresponding work items were segregated in the data analysis. Hence, up-to-date unit prices which have to be obtained beforehand can be employed in the system based on the current market condition and inflation.

Table 1
Final variables considered in model development

Variable name Description Type (level)


 Response variable
1. EngQuantity – Engineering quantity for the specific work item Numerical
 Explanatory variables
1. AdtPresent – The present average daily traffic using the facility, in vehicle/day Numerical
2. DESSpeed – Designed speed for traffic of highway Numerical
3. LaneWidth – Average lane width between lane markings of proposed or existing facilities, in ft. Numerical
4. MNLNNo – Main lane number to be constructed, repaired, improved, or upgraded
5. NHS – Indicate if the highway is on the approved National Highway System Dichotomous (Yes/No)
6. NoTrucks – No. of trucks on the existing highway Numerical
7. PercentTrucks – Percent trucks in the existing roadway, in % Numerical
8. PL – Total roadway length, in miles Numerical
9. ProjType – Eight project types for highway construction^a Nominal
10. PW – Project width, derived from multiplying lane width by no. of main lanes plus shoulder widths Numerical
11. ShoulderWidth – Shoulder width of the main lanes Numerical
12. Terrain – Terrain in the construction area: Mountainous/Rolling/Level Nominal
13. TrunkSysFlag – Indicate if the highway is on the trunk system Dichotomous (Yes/No)
14. UrbanRural – Population code where the project located Dichotomous (Urban/Rural)
 Interaction terms
1. ProjType*PL – To measure if the project types have interactive effects of project length on the response variable Numerical
2. ProjType*PW – To measure if the project types have interactive effects of project width on the response variable Numerical
3. UrbanRural*DESSpeed – To measure if the project location has interactive effects of designed speed on the response variable Numerical
4. UrbanRural*PercentTrucks – To measure if the project location has interactive effects of percent trucks on the response variable Numerical
 Identification variable
1. CSJ – Control-section-job number which is used to identify projects within Texas String
^a For the project types, refer to Fig. 1.
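As a rough illustration of how the variables in Table 1 could be turned into regressors, the sketch below builds one design row with log-transformed numerical predictors, dummy indicators for project type, and project-type interaction terms. It is an assumption-laden sketch rather than the study's preprocessing code: the function name `design_row`, the dictionary layout, and the example values are hypothetical, and only PL and PW are shown. Bridge replacement (BR) is treated as the reference category, as stated in the footnote to Table 5.

```python
import math

# BR (bridge replacement) is the reference category, so it gets no dummy.
PROJECT_TYPES = ["BWR", "INC", "NNF", "RER", "UGN", "WF", "WNF"]

def design_row(project):
    """Build one row of regressors: intercept, log terms, dummies, interactions."""
    ln_pl = math.log(project["PL"])  # project length, miles
    ln_pw = math.log(project["PW"])  # project width, ft
    row = {"Intercept": 1.0, "ln_PL": ln_pl, "ln_PW": ln_pw}
    for t in PROJECT_TYPES:
        dummy = 1.0 if project["ProjType"] == t else 0.0
        row[t] = dummy
        # Interaction terms let the length and width exponents vary by project type.
        row[t + "*ln_PL"] = dummy * ln_pl
        row[t + "*ln_PW"] = dummy * ln_pw
    return row

# Hypothetical widen-freeway project, 2.5 miles long and 72 ft wide.
example = design_row({"ProjType": "WF", "PL": 2.5, "PW": 72.0})
reference = design_row({"ProjType": "BR", "PL": 2.5, "PW": 72.0})
```

For a BR project every dummy and interaction term is zero, so the fitted intercept and base exponents describe the reference category directly.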

Table 2
Standard work items for geometry-related activity

Geometry-related activity | Item no. | Work item | Scope of work
Earthwork and landscape | 100 | Preparing right of way | Remove and dispose of all obstructions
Earthwork and landscape | 110 | Excavation | Earth cuts and rock cuts
Earthwork and landscape | 132 | Embankment | Furnish, place and compact materials for construction of roadways and embankments
Subgrade treatments and base | 247 | Flexible base | Construct a foundation course composed of flexible base
Subgrade treatments and base | 260 | Lime treatment for materials used as subgrade (road mixed) | Mix and compact lime, water, and subgrade in the roadway
Subgrade treatments and base | 276 | Lime treatment for base courses (road mixed) | Mix and compact lime, water, and base in the roadway
Surface courses and pavement | 316 | Surface treatments | Construct a surface treatment consisting of 1 or more applications of a single layer of asphalt material covered with a single layer of aggregate
Surface courses and pavement | 354 | Planning and texturing pavement | Plan, or plane and texture, existing asphalt concrete pavement, asphalt-stabilized base, or concrete pavement. Texture bridge deck surfaces
Surface courses and pavement | 360 | Concrete pavement | Construct hydraulic cement concrete pavement with or without curbs on the concrete pavement
Lighting, signing, markings and signals | 662 | Work zone pavement markings | Furnish, place, and maintain work zone pavement markings

Table 3
Descriptive statistics of highway projects completed in FY01–FY03

Project type^a | No. of projects | Total obligated amount (USD) | Break-point item cost %^b | No. of major work items (standard geometry-related items) | No. of roll-up work items | Total no. of work items employed
BR | 319 | 499,705,405 | 1.51 | 5 | 332 | 337
BWR | 48 | 90,280,968 | 0.43 | 7 | 218 | 219
INC | 50 | 1,215,587,973 | 0.78 | 6 | 428 | 434
NNF | 41 | 318,574,607 | 6.24 | 3 | 251 | 254
RER | 431 | 1,672,013,465 | 3.38 | 3 | 462 | 465
UGN | 43 | 160,004,564 | 4.35 | 3 | 208 | 211
WF | 36 | 1,202,484,985 | 1.35 | 6 | 430 | 436
WNF | 150 | 1,342,092,319 | 4.54 | 3 | 448 | 451
Total | 1118 | 6,500,744,286 | | | |
^a Refer to the explanation of Fig. 1 for the full name of each project type.
^b The break-point item cost % is the minimum cost percentage among the major work items.

Table 4
Cost percentages of frequently used standard work items in each project type (Unit in %)

Activity Line Item Work item description BR BWR INC NNF RER UGN WF WNF
no.
Earthwork and landscape 1 100 Preparing right of way 1.51 0.49
2 110 Excavation 1.67 1.28 1.69 8.03 4.35 2.46
3 132 Embankment 3.09 1.72 3.62 6.24 4.36 2.94 4.54
Subtotal of cost percentage (%) 6.27 4.11 5.31 14.27 – 8.71 5.40 4.54
Subgrade treatments and 4 247 Flexible base 2.62 1.05 2.41 7.18 8.52 14.20 2.86 6.41
base 5 260 Lime treatment for materials used as 0.99 1.37
subgrade (road mixed)
6 276 Lime treatment for base courses (road 0.78 1.35
mixed)
Subtotal of cost percentage (%) 2.62 1.05 4.18 7.18 8.52 14.20 5.58 6.41
Surface courses and 7 316 Surface treatments 3.38
pavement 8 354 Planning and texturing pavement 0.66
9 360 Concrete pavement 1.55 7.60 6.20 8.53 9.31 11.17
Subtotal of cost percentage (%) 2.31 8.26 6.20 – 11.91 – 9.31 11.17
Lighting, signing, markings 10 662 Work zone pavement markings 0.43
and signals Subtotal of cost percentage (%) – 0.86 – – – – – –
Cumulative cost percentage of standard work items 14.10 18.15 19.85 21.45 20.43 22.91 20.29 22.66
Cost percentage of non-standard work items (roll-up work items) 85.90 81.85 80.15 78.55 79.57 77.09 79.71 77.34
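The cumulative percentages in Table 4 suggest a simple roll-up from standard work items to a whole-project figure: sum the estimated costs of the standard items, divide by the historical share those items represent for the project type, and add a contingency. The sketch below is one hedged reading of that arithmetic. The function name, the item quantities and unit prices, and the treatment of contingency as a percentage markup are all assumptions for illustration; only the 20.43% share for the RER project type is taken from the table.

```python
def preliminary_cost(items, standard_share_pct, contingency_pct=0.0):
    """Scale the cost of standard work items up to a preliminary project cost.

    items: iterable of (quantity, unit_price) pairs for the standard work items
    standard_share_pct: historical share of total cost carried by those items
    contingency_pct: optional markup, treated here as a percentage (assumption)
    """
    standard_cost = sum(quantity * unit_price for quantity, unit_price in items)
    base = standard_cost / (standard_share_pct / 100.0)
    return base * (1.0 + contingency_pct / 100.0)

# Hypothetical (quantity, unit price) pairs for an RER project.
rer_items = [(10000.0, 10.0), (5000.0, 20.0), (430.0, 10.0)]

base_estimate = preliminary_cost(rer_items, standard_share_pct=20.43)
with_contingency = preliminary_cost(rer_items, 20.43, contingency_pct=10.0)
```

With these made-up inputs the standard items cost 204,300, so the base estimate is about 1,000,000 and a 10% contingency raises it to about 1,100,000.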

Table 5
Parametric quantity estimating models with corresponding unit and adjusted R²

Item no. | Parametric quantity estimating model^a | Unit | Adjusted R²
100 | $Q_{100} = (PL)^{1.060 - 0.639\,\text{RER} - 0.712\,\text{BWR} - 0.590\,\text{INC} - 0.422\,\text{NNF}}\,(PW)^{0.45\,\text{INC} + 0.363\,\text{WF}}\,(\text{PercentTrucks} \times \text{AdtPresent})^{0.044}\,e^{9.415 + 2.208\,\text{NNF} + 1.221\,\text{WNF}}$ | Station (100 ft) | 0.839
110 | $Q_{110} = (PL)^{1.060 - 0.639\,\text{RER} - 0.712\,\text{BWR} - 0.590\,\text{INC} - 0.422\,\text{NNF}}\,(PW)^{0.456\,\text{INC} + 0.363\,\text{WF}}\,(\text{PercentTrucks} \times \text{AdtPresent})^{0.044}\,e^{8.215 + 2.208\,\text{NNF} + 1.221\,\text{WNF}}$ | CY | 0.640
132 | $Q_{132} = (PL)^{1.480 - 0.954\,\text{RER} - 1.031\,\text{BWR} - 1.091\,\text{INC} - 0.898\,\text{UGN} - 0.788\,\text{WNF} - 0.801\,\text{NNF} - 0.828\,\text{WF}}\,(PW)^{0.382}\,(\text{PercentTrucks} \times \text{AdtPresent})^{0.103}\,e^{8.125 - 2.036\,\text{RER} + 0.545\,\text{INC} + 0.367\,\text{TrunkSysFlag}}$ | CY | 0.568
247 | $Q_{247} = (PL)^{1.061 - 0.851\,\text{NNF} - 0.793\,\text{BWR} - 0.610\,\text{INC} - 0.234\,\text{RER} + 1.667\,\text{WF}}\,(PW)^{0.273\,\text{WNF} + 0.289\,\text{NNF}}\,e^{8.719 + 1.858\,\text{INC}}$ | CY | 0.767
260 | $Q_{260} = (PL)^{0.716 + 0.590\,\text{WF}}\,(\text{PercentTrucks} \times \text{AdtPresent})^{0.063}\,e^{10.601 - 0.596\,\text{RER}}$ | SY | 0.541
276 | $Q_{276} = (PL)^{0.562 + 1.617\,\text{WNF}}\,(\text{PercentTrucks} \times \text{AdtPresent})^{0.194}\,e^{10.124 - 2.504\,\text{WNF}}$ | SY | 0.724
316 | $Q_{316} = (PL)^{1.127\,\text{WNF} + 1.014\,\text{RER} + 0.997\,\text{UGN}}\,(PW)^{0.062\,\text{RER}}\,e^{8.215}$ | CY | 0.698
354 | $Q_{354} = (PL)^{0.54}\,(PW)^{1.465}\,e^{3.862 - 0.944\,\text{TrunkSysFlag}}$ | SY | 0.300
360 | $Q_{360} = (PL)^{1.0 - 1.472\,\text{UGN} - 0.628\,\text{INC} - 1.039\,\text{BWR} - 0.610\,\text{NNF} - 0.412\,\text{RER}}\,(PW)^{0.908}\,e^{6.384 + 0.384\,\text{NHS}}$ | SY | 0.556
662 | $Q_{662} = (PL)^{0.818\,\text{RER} + 0.897\,\text{UGN} + 0.825\,\text{WNF}}\,(PW)^{0.635 - 0.49\,\text{RER}}\,e^{7.254 + 0.952\,\text{UGN} + 0.672\,\text{WNF}}$ | LF | 0.455
^a Q100 is the quantity of preparing right of way, 30.48 m (100-ft. station); Q110 the quantity of excavation, 0.765 m³ (1 cubic yard); Q132 the quantity of embankment, 0.765 m³ (1 cubic yard); Q247 the quantity of flexible base, 0.765 m³ (1 cubic yard); Q260 the quantity of lime treatment for materials used as subgrade (road mixed), 0.836 m² (1 square yard); Q276 the quantity of lime treatment for base courses (road mixed), 0.836 m² (1 square yard); Q316 the quantity of surface treatments, 0.765 m³ (1 cubic yard); Q354 the quantity of planning and texturing pavement, 0.836 m² (1 square yard); Q360 the quantity of concrete pavement, 0.836 m² (1 square yard); Q662 the quantity of work zone pavement markings, 0.305 m (1 linear foot); PL the project length, 1.609 km (1 mile); PW the project width, 0.305 m (1 linear foot); BR the bridge replacement (reference category in statistical analysis); BWR the bridge widening or rehabilitation; INC the interchange; NNF the new location non-freeway; RER the rehabilitation of existing road; UGN the upgrade to non-freeway standards; WF the widen freeway; WNF the widen non-freeway; PercentTrucks the percent trucks (%); AdtPresent the present average daily traffic, vehicle/day; TrunkSysFlag the trunk system flag (yes = 1; no = 0); and UrbanRural the project location (urban = 1; rural = 0).
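To show how a model of this form is evaluated, the sketch below codes the Table 5 entry for item 316 (surface treatments), chosen because its coefficients are all joined by explicit plus signs. It is an illustrative sketch under stated assumptions: the coefficients are transcribed as printed, the function name `q316` and the example inputs are hypothetical, and project types without a listed coefficient are assumed to contribute a zero exponent.

```python
import math

# Coefficients transcribed from the Table 5 entry for item 316.
PL_EXPONENT = {"WNF": 1.127, "RER": 1.014, "UGN": 0.997}
PW_EXPONENT = {"RER": 0.062}
INTERCEPT = 8.215

def q316(project_type, pl_miles, pw_feet):
    """Estimated quantity of surface treatments for one project.

    Project types with no listed coefficient get a zero exponent (assumption).
    """
    a = PL_EXPONENT.get(project_type, 0.0)
    b = PW_EXPONENT.get(project_type, 0.0)
    return (pl_miles ** a) * (pw_feet ** b) * math.exp(INTERCEPT)

# Hypothetical RER project: 2 miles long, 60 ft wide.
quantity = q316("RER", 2.0, 60.0)

# The same model in the log-linear form in which it would be fitted.
ln_quantity = 1.014 * math.log(2.0) + 0.062 * math.log(60.0) + INTERCEPT
```

Taking logarithms of the multiplicative model gives a linear equation in ln(PL) and ln(PW), which is why the scatter plots compare predicted and observed values on the LN_EngQuantity scale.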

6.1. Variables employed in the statistical analysis

Candidate predictors were selected based on findings from the literature, interviews, and available data. Since cost estimates are non-negative and positively skewed, a logarithmic transformation of the raw data was used to alleviate violation of the normality assumption for cost data. The final valid parameters used in the analysis, along with the interaction terms for model development, are listed in Table 1.

This paper focuses on a variety of highway project types, as described previously, for the geometry-related activities: (a) Earthwork and landscape; (b) Subgrade treatments and base; (c) Surface courses and pavement; (d) Lighting, signing, markings and signals. Quantity models for each activity are shown in the following section. Specific descriptions of the geometry-related work items are listed in Table 2.

6.2. Data description

The list of activities involved in a project type can be very complicated. One study argued that the bill of quantities (BoQ) method is cumbersome because listing all the items of work absorbs the estimators’ full attention at the preliminary stages, when some time should instead be saved to survey the early project information available and to concentrate on the cost-significant items (Munns & Al-Haimus, 2000). Pareto analysis states an 80/20 principle: 80% of problems are often due to 20% of the causes (PMI, 2004). To identify the significant few contributors that account for most of the overall project cost, standard geometry-related work items were screened using the 80/20 rule to help initiate preliminary quantity-based estimates. The resulting numbers of top 20 standard and roll-up work items employed in the classified project types, together with the total let amount, are summarized in Table 3.

Taking widen freeway as an example, there were 436 work items, of which 6 were top 20 work items of geometry-related activities. The break-point item cost is the particular item cost consumed in FY01–FY03 divided by the total obligated amount for that specific project type. Roll-up work items comprised hundreds of work items. Once quantities for each standard item can be estimated, a preliminary project cost estimate can be determined on the basis of historical statistical analysis. Table 4 presents the vital few contributions of the most frequently used items for each project type. Blank cells indicate that the cost percentage of that work item is comparatively small for the project type, based on observations from historical project data.

6.3. Data inspection and manipulation

This research is essentially a retrospective study that analyzes a historical database with statistical techniques to discover trends and to document lessons learned. To explore these data, pattern recognition using descriptive statistics, box plots, scatter plots, and data transformation was adopted to discover features in seemingly trivial data. Data inspection allows the researcher to observe the quality of the data and to find patterns. A few records were found to mix metric and English units and were therefore omitted at the very beginning. Although categorical variables were collected, these discrete variables were converted to dummy indicators for quantitative modeling. Recoding was executed on the fields for national highway system, trunk system flag, project type, project location, terrain, and main lane type. The last step in data manipulation was to compute and transform the predictors needed for regression analysis. For categorical data, dummy variables were created, as well as interaction terms to account for the likely effects on the dependent variable.

7. Presentation and validation of parametric estimating models

Front-end parametric equations for work items were derived by regressing historical quantities on the aforementioned predictors. The fitted models were statistically significant using the F-test at the 95% confidence level, with the corresponding coefficients of determination. The regression coefficients of all the fitted models were individually tested with the t-value and were found to be significantly different from zero at the 95% confidence level. Table 5 shows only representative items for each category, with adjusted R² values.

7.1. Model validation

The validity of the general linear models was inspected through scatter plots of the predicted versus historical values, as shown in Figs. 3 and 4. The derived predictive models were calibrated in terms of goodness of fit (R² values). In the figures, most of the predicted and observed values cluster approximately around the diagonal line. These findings are more obvious for preparing right of way, flexible base, lime treatment for base courses, and surface treatment.

Fig. 3. Predicted values of logarithmic engineering quantity (y-axis: unstandardized predicted value) vs. observed values (x-axis: LN_EngQuantity) for standard work items 100, 110, 132, 247, 260, and 276 (dashed lines: 95% confidence interval; solid line: fit line). Panels: 100, preparing right of way (sample size = 795, R² linear = 0.841); 110, excavation (sample size = 1229, R² linear = 0.65) (Chou et al., 2006); 132, embankment (sample size = 1053, R² linear = 0.575); 247, flexible base (sample size = 492, R² linear = 0.772) (Chou et al., 2006); 260, lime treatment for materials used as subgrade (sample size = 363, R² linear = 0.546); 276, lime treatment for base courses (sample size = 33, R² linear = 0.716).

Fig. 4. Predicted values of logarithmic engineering quantity (y-axis: unstandardized predicted value) vs. observed values (x-axis: LN_EngQuantity) for standard work items 316, 354, 360, and 662 (dashed lines: 95% confidence interval; solid line: fit line). Panels: 316, surface treatments (sample size = 458, R² linear = 0.699); 354, planning and texturing pavement (sample size = 214, R² linear = 0.31); 360, concrete pavement (sample size = 188, R² linear = 0.582); 662, work zone pavement markings (sample size = 522, R² linear = 0.464).

The accuracy of model prediction was checked against a collection of new projects, to examine whether the regression models developed from the earlier data are still applicable to other projects. The new data comprised 30 projects, 14 from bridge replacement (BR) and 16 from widen non-freeway (WNF), randomly extracted from the 2004 contract lettings contained in DCIS and provided by the TxDOT Transportation Planning and Programming (TPP) division. Not all modeled work items were present in these randomly selected projects. The predictive ability of the quantity models was checked by comparing the predicted item quantity against the actual historical quantity recorded. The results for selected work items can be seen in Table 6.

Table 6
Prediction accuracy of item cost for new data sets

Work category | Item no. | Project CSJ | % Error
Earthwork and landscape (n = 15) | 132 | 019503062 | 8.0
Earthwork and landscape (n = 15) | 132 | 070001024 | .3
Earthwork and landscape (n = 15) | 132 | 070001025 | 22.9
Earthwork and landscape (n = 15) | 132 | 073002035 | 13.2
Earthwork and landscape (n = 15) | 132 | 083311019 | 23.0
Earthwork and landscape (n = 15) | 132 | 090132018 | 20.9
Earthwork and landscape (n = 15) | 132 | 090248462 | 7.8
Earthwork and landscape (n = 15) | 132 | 090824038 | 1.1
Earthwork and landscape (n = 15) | 132 | 090828011 | 15.8
Earthwork and landscape (n = 15) | 132 | 091712055 | 13.0
Earthwork and landscape (n = 15) | 132 | 091811034 | 3.6
Earthwork and landscape (n = 15) | 132 | 091811037 | 6.0
Earthwork and landscape (n = 15) | 132 | 091811042 | 10.8
Earthwork and landscape (n = 15) | 132 | 091846092 | 9.9
Earthwork and landscape (n = 15) | 132 | 092323012 | 11.4
% error range: -23.0 to +22.9. Average prediction % error: 0.63%
Subgrade treatments and base (n = 8) | 247 | 006502045 | .5
Subgrade treatments and base (n = 8) | 247 | 008703026 | 4.3
Subgrade treatments and base (n = 8) | 247 | 010908037 | 4.5
Subgrade treatments and base (n = 8) | 247 | 014402040 | 5.2
Subgrade treatments and base (n = 8) | 247 | 068301069 | 10.1
Subgrade treatments and base (n = 8) | 247 | 100601052 | 24.6
Subgrade treatments and base (n = 8) | 247 | 137802022 | 12.8
Subgrade treatments and base (n = 8) | 247 | 250601021 | 11.7
% error range: -11.7 to +24.6. Average prediction % error: 2.9%
Lighting, signing, markings, and signals (n = 14) | 662 | 005001060 | 15.7
Lighting, signing, markings, and signals (n = 14) | 662 | 006502045 | 2.1
Lighting, signing, markings, and signals (n = 14) | 662 | 007004026 | 2.0
Lighting, signing, markings, and signals (n = 14) | 662 | 008703026 | 5.9
Lighting, signing, markings, and signals (n = 14) | 662 | 010908037 | 1.1
Lighting, signing, markings, and signals (n = 14) | 662 | 014402040 | 3.7
Lighting, signing, markings, and signals (n = 14) | 662 | 068301069 | 3.9
Lighting, signing, markings, and signals (n = 14) | 662 | 090248575 | 10.3
Lighting, signing, markings, and signals (n = 14) | 662 | 090506029 | 8.3
Lighting, signing, markings, and signals (n = 14) | 662 | 091512366 | 7.8
Lighting, signing, markings, and signals (n = 14) | 662 | 097802057 | 15.9
Lighting, signing, markings, and signals (n = 14) | 662 | 100601052 | 8.6
Lighting, signing, markings, and signals (n = 14) | 662 | 137802022 | 6.2
Lighting, signing, markings, and signals (n = 14) | 662 | 250601021 | 13.4
% error range: -15.9 to +10.3. Average prediction % error: 5.6%

In practice, the predicted quantity is not expected to equal the actual quantity exactly during the early stages. Instead, an average value for the given parameters is obtained as the best point estimate produced by the statistical model. This study assumes that the unit price is segregated from the quantity, and the same unit price for each work item was used in the model validation. Therefore, the examination of quantity accuracy is similar to that of cost accuracy. The Construction Industry Institute (CII) has published recommended practices regarding preliminary estimate classes with accuracy from ±30% to ±50% (Trost, 1998). The Association for the Advancement of Cost Engineering International (AACE, 1997) suggested that the earliest two classes of cost estimate fall in the range of -30% to +100%. The validation results indicate that the quantity models performed well for the new highway projects examined, with average prediction errors ranging from 0.63% to 5.6%. For individual observations, the prediction errors are between -23.0% and +24.6%. These percentage error ranges meet the accuracy suggested by CII and AACE for preliminary estimates.

8. Discussion

The average R² value of the presented quantity models is 0.61. In summary, the quality of the data was sufficient to develop predictive models with moderately high coefficients of determination. The quantities from the derived models, multiplied by the corresponding unit prices for a specific project type as shown in Table 4, can be summed and then divided by a fixed percentage associated with each project type to produce a preliminary cost estimate. For instance, the standard work items in the RER project type historically comprised 20.43% of total project costs over the last three years. Therefore, the preliminary cost estimate for RER can be taken as the sum of the estimated costs of the individual standard work items, divided by that percentage for the project type, plus a contingency based upon the perceived level of uncertainty.

9. GLM-based expert system for cost estimation

An automated expert system was developed by utilizing the aforementioned generalized linear models. Since TxDOT updates
unit bid prices every month, the system can incorporate the latest unit prices for the vital items. The remaining work activities, which can be identified as more information is acquired, are modeled as a single roll-up item with a percentage cost markup at this stage.

9.1. System flow

Based upon the user's selection of project type, the software program generates a list of associated standard work items. The system then requires the user to input basic project information, which is sufficient for the statistical functions to calculate the quantities. Unit prices are then applied to the quantities. The final result is a semi-detailed preliminary cost estimate listing the basis information, item description, item quantity and unit, unit price, item cost, and preliminary total project cost. The report screen allows the user to print or save the item-level estimates, including the basic project information. The user can view and update previous estimates at any time. For long-term use, the database system is extensible, allowing more work items, updated statistical quantity models, and related unit prices to be added. The information flow of the expert system for cost estimation is shown in Fig. 5.

Fig. 5. System flow for the automatic expert system. (Flowchart boxes: PILCES main menu [create new estimate, track previous estimate, exit]; project type selection [BR, BWR, INC, NNF, RER, UGN, WF, WNF]; project information [input basic parameters, calculate preliminary quantity and unit price, save to database]; item bank and statistical model library [query standard work items and unit prices and call associated models using SQL and programming]; estimating results [input or select CSJ, view/customize estimating reports, print].)

Fig. 6. Project type selection window.

Fig. 7. Example of input data for widen freeway project.

Fig. 8. Windows for preview, customize, and print preliminary cost estimating results.

Fig. 9. Sample screen shot of estimating report.

9.2. PILCES program

To transform the estimating models into a user-friendly application, a prototype Preliminary Item-level Cost Estimating System (PILCES) was developed with the aid of computer programming. The design concept driving the development of this automated system was to exploit historical work item unit prices, rather than continuing to speculate based on subjective experience, and to initiate preliminary estimates with quantity-based models. This system enables subsequent item-level quantity growth tracking with a paper trail as the project evolves to the next development phase.

As the user proceeds, the system displays a graphical interface. The user can choose to create a new estimate, track a previous estimate, or exit the system. Once the user enters the system and selects "Create New Estimate", the program prompts the user to select a project type, as shown in Fig. 6. The graphical interface offers a selection among eight project types. Within the system, a list of major work items is generated through structured query language (SQL) based on project type.

Upon selection of a project type, the program requests basic project information as input parameters for preliminary cost estimation. A sample input, compiled from the analysis results in this study, is illustrated in Fig. 7. After the user clicks the calculate button, a message box pops up once the quantity and unit price calculation has completed successfully with the provided information.

As the user proceeds to the next page, the program generates a list of major work items related to the selected project. In this window, the user can preview preliminary cost estimating results by selecting or entering the control-section-job (CSJ) number, as shown in Fig. 8.

The user can export this output into other software applications or directly as a report, as shown in Fig. 9. For other project types, the program generates similar screens and outputs in the same format.

10. Conclusions

In creating conceptual estimates, 62% of US DOTs use estimating cost data that are based solely upon historic lane-mile cost averages for similar projects (Schexnayder et al., 2003). For infrastructure projects, they often use only one parameter, historical square-foot or square-meter cost data. This study proposed an alternative solution that employs generalized linear models for transportation construction; the models are separated from fluctuating unit prices and are expected to capture more of a project’s unique characteristics.

This research contributed to the enhancement of TxDOT’s present approach to preliminary cost estimating. Statistical techniques were employed to identify the sources of quantity variability, known at the time of developing the initial estimate, that influence the quantities for earthwork and landscape, subgrade treatments and base, surface courses and pavement, and lighting, signing, markings and signals for eight infrastructure project types. The significant factors were found to pertain to highway geometry, such as project length, proposed main lane number, lane width, shoulder width, and project width.

In addition, other basic project functions or features included the truck percentage measured on the road, average daily traffic, project type, national highway system, and designed traffic speed of the highway. Among the factors mentioned above, interaction effects were also considered for the highway construction estimates. Interestingly, whether the project is located in a rural or an urban area appeared not to make a significant difference to work quantity estimates, based on the analysis results. All of these parameters are readily accessible early during project definition without requiring detailed information such as the geometric design drawings, which reduces potentially time-consuming estimation and increases cost effectiveness.

By transforming the models into a user-friendly application, the research facilitates early and continuous quantity tracking and easy parameter documentation for subsequent use. The prototype expert system for transportation cost estimation can automatically predict project elements and item quantities, and track changes over the project life. The program was developed in the form of a collection of relational tables. Preliminary cost estimate information can be extracted in a consistent format from the relational data warehouse for further analysis (Rob & Coronel, 2007). This system enables subsequent quantity growth tracking with minimal effort and minimal information, and leaves a documentation trail for early estimate input parameters. The stored information can also be used for long-range cost management and control. Over time and with good documentation, feedback from projects can be used to improve the prediction models.

The primary advantage of separating quantity uncertainty from unit price uncertainty is to remove the need to consider regional cost conversion and fluctuating market rates. Therefore, the potential influence of unit price-related drivers on the variance of item cost was reduced. Instead of making wild guesses, the item unit price can be easily determined from the three lowest recent bids, which are usually posted publicly on the agencies’ websites.

Estimates made during conceptual planning or project scope definition are usually inaccurate due to a lack of sufficient project information. Inaccuracy may be amplified when estimates are prepared by inexperienced estimators or with poor procedures and tools. In practice, estimators are highly dependent on past bid data to predict future highway budgets in preliminary planning. Therefore, the models developed in this study should be used with caution and must be updated regularly as new project data accumulate.

This paper presented the development of an expert system with embedded statistical models that takes advantage of historical close-out data to predict the cost of a work item. At project inception, this system produces preliminary estimates and provides an opportunity for periodic quantity and price adjustment until design completion. The estimating models developed in this study should be applied with caution when used to initiate quantity estimates in other states.

Acknowledgements

This study was based on a research project funded by the Texas Department of Transportation (TxDOT). The author gratefully acknowledges the historical project information provided by the Texas Transportation Planning and Programming Office. In addition, the author would like to express heartfelt gratitude to the Taiwan National Science Council, whose sponsorship in part allowed this research work to continue.

References

AACE. (1997). Recommended Practice No. 17R-97: Cost estimate classification system. AACE, Inc.
Adeli, H., & Wu, M. (1998). Regularization neural network for construction cost estimation. Journal of Construction Engineering and Management, 124(1), 18–24.
Akinci, B., & Fischer, M. (1998). Factors affecting contractors’ risk of cost overburden. Journal of Management in Engineering, 14(1), 67–76.
Akintoye, A. (1998). Analysis of factors influencing project cost estimating practice. Construction Management and Economics, 18, 77–89.
Akintoye, A., & Fitzgerald, E. (2000). A survey of current cost estimating practices in the UK. Construction Management and Economics, 18, 161–172.
Albright, S. C., Winston, W., & Zappe, C. J. (2003). Data analysis and decision making with Microsoft Excel. CA: Thomson Brooks/Cole.
Al-Tabtabai, H., Alex, A. P., & Tantash, M. (1999). Preliminary cost estimation of highway construction using neural networks. Cost Engineering, 41(3), 19–24.
Anderson, S., Molenaar, K., & Schexnayder, C. (2007). Guidance for cost estimation and management for highway projects during planning, programming, and preconstruction. NCHRP Report 574. Washington, DC: National Cooperative Highway Research Program, Transportation Research Board.
Baloi, D., & Price, A. D. F. (2003). Modelling global risk factors affecting construction cost performance. International Journal of Project Management, 21, 261–269.
Bell, L. C., & Bozai, G. A. (1987). Preliminary cost estimating for highway construction projects. AACE Transactions, C.6.1–C.6.4.

Black, J .H. (1984). Application of parametric estimating to cost engineering. AACE Munns, A. K., & Al-Haimus, K. M. (2000). Estimating using cost significant global
Transactions, B.10.1–B.10.5. cost models. Construction Management and Economics, 18(5), 587–598.
Bobko, P. (2001). Correlation and regression applications for industrial organizational Norusis, M.J. (2002). SPSS 11.0 Guide to Data Analysis, Prentice Hall.
psychology and management. CA: Sage Publications Inc. Neter, J., Kutner, M. H., Nachtsheim, C. J., & Wasserman, W. (1996). Applied linear
Chengalur-Smith, I. N., Ballou, D. P., & Pazer, H. L. (1997). Modeling the costs of statistical models. Irwin.
bridge rehabilitation. Transportation Research Part A, 31(4), 281–293. Oberlender, G. D. (1998). Improving early estimates-best practices guide. construction
Chou, J.-S., Peng, M., Persad, K. R., & O’Connor, J. T. (2006). Quantity-based approach industry institute (Implementation Resource 131-2), Austin, TX.
to preliminary cost estimates for highway projects. Transportation Research Odeck, J. (2004). Cost overruns in road construction – what are their sizes and
Record (1946), 22–30. determinants? Transport Policy, 11(1), 43–53.
Duverlie, P., & Castelain, J. M. (1999). Cost estimation during design step: Phaobunjong, K., & Popescu, C. M. (2003). Parametric cost estimating model for
Parametric method versus case based reasoning method. International Journal buildings. AACE International Transaction, EST.13.1–EST.13.11.
of Advanced Manufacturing Technology, 15(12), 895–906. PMI. (2004). A guide to the project management body of knowledge (PMBOK Guide)
Emsley, M. W., Lowe, D. J., Duff, A. R., Harding, A., & Hickson, A. (2002). Data modeling (3rd ed.). Project Management Institute.
and the application of a neural network approach to the prediction of total Rob, P., & Coronel, C. (2007). In Database systems: Design, implementation, &
construction costs. Construction Management and Economics, 20(6), 465–472. management. Thomson Course Technology.
Flyvbjerg, B., Holm, M. S., & Buhl, S. (2002). Underestimating costs in public works Saito, M., Sinha, K. C., & Anderson, V. L. (1991). Statistical models for the estimation
projects – Error or Lie? Journal of the American Planning Association, 68(3), of bridge replacement costs. Transportation Research Part A, 25A(6), 339–
279–295. 350.
GAO. (2002). Transportation infrastructure: Cost and oversight issues on major Sanders, S. R., Maxwell, R. R., & Glagola, C. R. (1992). Preliminary estimating models
highway and bridge projects. GAO-02-702T. Washington, DC: United States for infrastructure projects. Cost Engineering, 34(8), 7–13.
General Accounting Office. Schexnayder, C. J., Weber, S. L., & Fiori, C. (2003). Project cost estimating: A synthesis
Harbuck, R. H. (2002). Using models in parametric estimating for transportation of highway practice. national cooperative research program, Transportation
projects. AACE International Transactions, EST.05(ES51), EST.05.1–EST.05.09. Research Board.
Hegazy, T., & Ayed, A. (1998). Neural network model for parametric cost estimation SPSS. (2003). SPSS for windows, release 12.0.0. Statistical Package for Social Sciences.
of highway projects. Journal of Construction Engineering and Management, 124(3), Chicago: SPSS Inc.
210–218. Sthapit, N., & Mori, H. (1994). Model to estimate highway earthwork cost in Nepal.
Herbsman, Z. (1983). Long-range forecasting highway construction costs. Journal of Journal of Transportation Engineering, 120(3), 498–504.
Construction Engineering and Management, 109(4), 423–435. Touran, A. (1993). Probabilistic cost estimating with subjective correlations. Journal
Jrade, A., & Alkass, S. (2002). An integrated system for conceptual cost estimating of of Construction Engineering and Management, 119(1), 58–71.
building projects. Cost Engineering, 44(10), 28–35. Trost, S. M. (1998). A quantitative model for predicting the accuracy of early cost
Lowe, D. J., Emsley, M. W., & Harding, A. (2006). Predicting construction cost using estimates for construction projects in the process industry, Ph.D. Dissertation,
multiple regression techniques. Journal of Construction Engineering and Oklahoma State University, Stillwater, OK.
Management, 132(7), 750–758. Trost, S. M., & Oberlender, G. D. (2003). Predicting accuracy of early cost estimates
Masi, J. G. (2003). Development and maintenance of a parametric building cost using factor analysis and multivariate regression. Journal of Construction
estimating system. Cost Engineering, 45(2), 33–37. Engineering and Management, 129(2), 198–204.
Molenaar, K. R. (2005). Programmatic cost risk analysis for highway megaprojects. Williams, T. P. (2005). Bidding ratios to predict highway project costs. Engineering,
Journal of Construction Engineering and Management, 131(3), 343–353. Construction and Architectural Management, 12(1), 38–51.
Morcous, G., Bakhoum, M. M., Taha, M. A., & El-Said, M. (2001). Preliminary quantity Wilmot, C. G., & Cheng, G. (2003). Estimating future highway construction costs.
estimate of highway bridges using neural networks. In Proceedings of the sixth Journal of Construction Engineering and Management, 129(3), 272–279.
international conference on the application of artificial intelligence to civil and Yu, W.-D. (2006). PIREM: A new model for conceptual cost estimation. Construction
structural engineering computing (pp. 51–52). Management and Economics, 24(3), 259.
