

SQIT 3033 - Knowledge Acquisition in Decision Making
Group A
Individual Assignment 1

Title:

Data Mining Application in Construction Project

Prepared by

Chan Yao Liang 211905

Prepared for:

Dr. Izwan Nizal Bin Mohd Shaharanee

Date submitted: 6 March 2014

1 Introduction
2 Cost Overruns in Construction Project
2.1 Cost Model Development
3 Conclusion
4 References


1 Introduction

Data mining, often treated as a synonym for knowledge discovery in databases (KDD), is the computational process of discovering patterns and behaviors in large amounts of data using machine learning, artificial intelligence, statistics, and database systems. The main objective of data mining is to extract implicit information from large data sets and transform it into a clear, understandable structure for future use. One field in which data mining can be applied is construction project management.

2 Cost Overruns in Construction Project

Cost performance is a critical criterion for the success of a construction project. Good cost performance creates a win-win situation for the contractor and the client: the contractor earns a higher profit and the client pays within budget. However, the final cost of a construction project is difficult to estimate because of the complex web of cost-influencing factors that must be considered, such as the type of project, the type of client, the ground conditions, and other related factors. One of the main causes of cost overruns is the lack of adequate information on which to base realistic and accurate estimates: the more information is available, the more accurate the estimates tend to be. Data mining is therefore a valuable way to capture the information held in historical project data and use it to support the estimation process.

2.1 Cost Model Development

Data visualization is one of the disciplines of data mining. Here, scatter plots and mean plots are used to examine the (often non-linear) relationships between each cost variable and the final cost. However, most of the predictors are categorical rather than numerical. Artificial Neural Networks (ANNs) were therefore chosen as the modeling technique, because an ANN is a non-linear predictive model that can cope with both non-linear relationships and categorical variables. Much of the power of ANNs as predictive modeling techniques comes from their ability to learn from experience and to generalize from acquired knowledge. Because of their complexity, they are best employed in situations where a model can be used and reused, such as relating the cost variables to the final cost of construction projects.

In the initial stage of the research, ANNs were used to develop prototype models. ANNs have also been used to forecast tender prices and to identify and quantify risk. The final model was developed through an iterative process of fine-tuning the network parameters and inputs until the error reached an acceptable level or the model showed no further improvement.

Overall network performance was measured using the Sum of Squares (SOS) error and the correlation coefficient between the predicted and actual final costs. The SOS is defined as

SOS = \sum_{i=1}^{n} (O_i - T_i)^2

where O_i is the predicted final cost of the ith data set (output), T_i is the actual final cost of the ith data set (target), and n is the number of data sets. The higher the SOS value, the poorer the network is at generalization. Conversely, the higher the correlation coefficient, the better the network. In addition, the p-value of each correlation coefficient was computed to measure its statistical significance: the higher the p-value, the less reliable the observed correlation.

Finally, sensitivity analysis was performed on each retained network to assess each predictor's contribution to network performance. Model performance was measured while deleting one input factor at a time, starting from the least important, until the model showed no further improvement.
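
The paragraph above explains that mean plots were used to inspect relationships with the final cost and that ANNs were chosen because most predictors are categorical. The following minimal Python sketch shows two routine preparation steps under stated assumptions: computing the per-category mean final cost (the values a mean plot displays) and one-hot encoding categorical predictors so a neural network can receive them as 0/1 inputs. The records, category names, and costs are hypothetical and invented for illustration, and one-hot encoding is an assumed choice; the source does not say how its categorical inputs were encoded.

```python
from collections import defaultdict

# Hypothetical historical records: (project type, client type, ground condition, final cost).
# All values are illustrative only, not data from the study.
RECORDS = [
    ("school", "public", "good", 1.20),
    ("school", "public", "poor", 1.45),
    ("office", "private", "good", 2.10),
    ("office", "private", "poor", 2.60),
    ("housing", "private", "good", 0.90),
    ("housing", "public", "poor", 1.15),
]


def category_means(records, column):
    """Mean final cost (last field) per category of one predictor column."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in records:
        key = row[column]
        totals[key] += row[-1]
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


def one_hot_encoder(records, columns):
    """Return a function mapping a record to 0/1 indicator inputs for the given columns."""
    # Sorting each column's categories fixes the order of the indicator inputs.
    vocab = [sorted({row[c] for row in records}) for c in columns]

    def encode(row):
        vector = []
        for c, categories in zip(columns, vocab):
            # One indicator per category; exactly one is 1.0 for each predictor.
            vector.extend(1.0 if row[c] == cat else 0.0 for cat in categories)
        return vector

    return encode


if __name__ == "__main__":
    print(category_means(RECORDS, 0))
    encode = one_hot_encoder(RECORDS, [0, 1, 2])
    print(encode(RECORDS[0]))
```

With these toy records, encoding the first record yields seven indicator values (three project types, two client types, two ground conditions), with exactly one 1.0 per predictor.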

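The performance measures described in the cost model section can be computed directly. This sketch, in plain Python with illustrative numbers, implements the SOS formula and the Pearson correlation coefficient between predicted and actual final costs. For the p-value it swaps in a permutation test, which needs no statistical library; the source does not say which significance test was used, so `permutation_p_value` and its `n_perm` and `seed` parameters are assumptions.

```python
import math
import random


def sum_of_squares(predicted, actual):
    """SOS error: sum over data sets of (O_i - T_i)^2; lower is better."""
    return sum((o - t) ** 2 for o, t in zip(predicted, actual))


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def permutation_p_value(xs, ys, n_perm=2000, seed=0):
    """Two-sided permutation-test p-value for r (a stand-in for a parametric test)."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_perm):
        # Shuffling breaks any real pairing between predictions and targets.
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # The +1 terms count the observed arrangement itself, so p is never exactly 0.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Illustrative predicted (O_i) and actual (T_i) final costs, not study data.
    predicted = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.2, 7.9]
    actual = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    print(sum_of_squares(predicted, actual))
    print(pearson_r(predicted, actual))
    print(permutation_p_value(predicted, actual))
```

A classical alternative is the t-test on r with n - 2 degrees of freedom, which `scipy.stats.pearsonr` reports alongside the coefficient.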

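The sensitivity analysis and one-at-a-time input deletion described in the cost model section can be sketched as a backward elimination loop. Here `error_of` is a hypothetical callback standing in for retraining or re-evaluating the network on a list of inputs and returning its error (for example, SOS on validation data); the toy error function and input names are invented for illustration. Importance is taken as the ratio of the error without an input to the error with all current inputs, so the least important input has the lowest ratio. That ratio definition is an assumption, not a detail confirmed by the source.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical callback type: takes a list of input names, returns a positive error.
ErrorFn = Callable[[List[str]], float]


def sensitivity_ratios(inputs: List[str], error_of: ErrorFn) -> Dict[str, float]:
    """Error without each input divided by error with all inputs (assumes error > 0)."""
    base = error_of(inputs)
    return {
        name: error_of([other for other in inputs if other != name]) / base
        for name in inputs
    }


def backward_eliminate(inputs: List[str], error_of: ErrorFn) -> Tuple[List[str], float]:
    """Delete the least important input while doing so lowers the error."""
    current = list(inputs)
    current_error = error_of(current)
    while len(current) > 1:
        ratios = sensitivity_ratios(current, error_of)
        least = min(current, key=lambda name: ratios[name])
        candidate = [name for name in current if name != least]
        candidate_error = error_of(candidate)
        if candidate_error >= current_error:
            break  # removing the least important input no longer improves the model
        current, current_error = candidate, candidate_error
    return current, current_error


# Hypothetical contribution of each input to reducing error (illustrative only).
TOY_CONTRIBUTION = {
    "project_type": 4.0,
    "client_type": 2.0,
    "ground_condition": 3.0,
    "site_colour": -0.5,  # a noise input that slightly hurts the model
    "tender_weekday": -0.3,  # another noise input
}


def toy_error(inputs: List[str]) -> float:
    """Stand-in for retraining a network: error falls as useful inputs are kept."""
    return 10.0 - sum(TOY_CONTRIBUTION[name] for name in inputs)


if __name__ == "__main__":
    kept, error = backward_eliminate(list(TOY_CONTRIBUTION), toy_error)
    print(kept, error)
```

With these toy numbers the two noise inputs are removed in turn, and the loop stops at project_type, client_type, and ground_condition because deleting client_type would raise the error from 1.0 to 3.0.
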
3 Conclusion

Using data mining in construction projects can improve cost performance and help contractors transform their large data sets into useful information for business improvement. This study focuses on cost estimation because cost overruns are often caused by a lack of information at the beginning of a construction project. Data mining plays an important role by combining knowledge with existing information to forecast the final cost, and ANNs are applied to address the cost issues of construction projects. One of the major challenges for data mining is the poor culture of data warehousing in the construction industry. Data mining requires large data sets to produce useful information, yet most construction firms hold little useful and complete data with which to model their construction processes. This limitation is a potential pitfall that must always be communicated clearly to the end user, and ways to overcome it must be found.

4 References

Data Mining. (n.d.). Retrieved March 1, 2014, from

The Best Data Mining Tools You Can Use for Free in Your Company. (n.d.). Retrieved March 1, 2014, from

Data Mining 101: Tools and Techniques. (n.d.). Retrieved March 2, 2014, from

Data Mining To Reduce Construction Cost Overruns. (n.d.). Retrieved March 2, 2014, from uce_construction_cost_overruns
