Professional Documents
Culture Documents
Big Data in Construction Industry PDF
Big Data in Construction Industry PDF
DOI 10.1016/j.aei.2016.07.001
ESSN 2352-7102
Publisher: Elsevier
NOTICE: this is the author’s version of a work that was accepted for publication in
Advanced Engineering Informatics. Changes resulting from the publishing process, such as
peer review, editing, corrections, structural formatting, and other quality control
mechanisms may not be reflected in this document. Changes may have been made to this
work since it was submitted for publication. A definitive version was subsequently
published in Advanced Engineering Informatics, [3, 30, (2016)] DOI:
10.1016/j.aei.2016.07.001
Copyright © and Moral Rights are retained by the author(s) and/ or other copyright owners. A
copy can be downloaded for personal non-commercial research or study, without prior permission
or charge. This item cannot be reproduced or quoted extensively from without first obtaining
permission in writing from the copyright holder(s). The content must not be changed in any way or
sold commercially in any format or medium without the formal permission of the copyright
holders.
This document is the author’s post-print version, incorporating any revisions agreed during the
peer-review process. Some differences between the published version and this version may
remain and you are advised to consult the published version if you wish to cite from it.
1
1 Abstract—The ability to process large amounts of data and computing), compressed, in diverse proprietary formats, and 51
2 to extract useful insights from data has revolutionised society. intertwined [6]. Accordingly, this diverse data is collated in 52
3 This phenomenon—dubbed as Big Data—has applications for a federated BIM models, which are enriched gradually and 53
4 wide assortment of industries, including the construction industry. persisted beyond the end-of-life of facilities. BIM files can 54
5 The construction industry already deals with large volumes of quickly get voluminous, with the design data of a 3-story 55
6 heterogeneous data; which is expected to increase exponentially
7 as technologies such as sensor networks and the Internet of
building model easily reaching 50GB in size [7]. Noticeably, 56
8 Things are commoditized. In this paper, we present a detailed this data in any form and shape has intrinsic value to the 57
9 survey of the literature, investigating the application of Big Data performance of the industry. With the advent of embedded 58
10 techniques in the construction industry. We reviewed related devices and sensors, facilities have even started to generate 59
11 works published in the databases of American Association of massive data during the operations and maintenance stage, 60
12 Civil Engineers (ASCE), Institute of Electrical and Electronics eventually leading to more rich sources of Big BIM Data. This 61
13 Engineers (IEEE), Association of Computing Machinery (ACM), vast accumulation of BIM data has pushed the construction 62
14 and Elsevier Science Direct Digital Library. While the application industry to enter the Big Data era. 63
15 of data analytics in the construction industry is not new, the Big Data has three defining attributes (a.k.a. 3V‘s), namely 64
16 adoption of Big Data technologies in this industry remains at a (i) volume (terabytes, petabytes of data and beyond); (ii) 65
17 nascent stage and lags the broad uptake of these technologies in
variety (heterogeneous formats like text, sensors, audio, video, 66
18 other fields. To the best of our knowledge, there is currently no
19 comprehensive survey of Big Data techniques in the context of graphs and more); and (iii) velocity (continuous streams of the 67
20 the construction industry. This paper fills the void and presents a data). The 3V‘s of Big Data are clearly evident in construction 68
21 wide-ranging interdisciplinary review of literature of fields such data. Construction data is typically large, heterogeneous, and 69
22 as statistics, data mining and warehousing, machine learning, and dynamic [8]. Construction data is voluminous due to large 70
23 Big Data Analytics in the context of the construction industry. volumes of design data, schedules, Enterprise Resource Plan- 71
24 We discuss the current state of adoption of Big Data in the ning (ERP) systems, financial data, etc. The diversity of con- 72
25 construction industry and discuss the future potential of such struction data can be observed by noting the various formats 73
26 technologies across the multiple domain-specific sub-areas of the supported in construction applications including DWG (short 74
27 construction industry. We also propose open issues and directions for drawing), DXF (drawing exchange format), DGN (short for 75
28 for future work along with potential pitfalls associated with Big
design), RVT (short for Revit), ifcXML (Industry Foundation 76
29 Data adoption in the industry.
Classes XML), ifcOWL (Industry Foundation Classes OWL), 77
36 increased in 2012 is approximately 2.5 quintillion (1018 ) bytes To understand the subtleties of Big Data, we need to 85
37 per day [2]. This data growth brings significant opportunities disambiguate between two of its complementary aspects: Big 86
38 to scientists for identifying useful insights and knowledge. Data Engineering (BDE) and Big Data Analytics (BDA). The 87
39 Arguably, the accessibility of data can improve the status domain of BDE is primarily concerned with supporting the 88
40 quo in various fields by strengthening existing statistical and relevant data storage and processing activities, needed for 89
41 algorithmic methods [3], or by even making them redundant analytics [9]. BDE encompasses technology stacks such as 90
42 [4]. Hadoop and Berkeley Data Analytics Stack (BDAS). Big Data 91
43 The construction industry is not an exception to the per- Analytics (BDA), the second integral aspect, relates to the tasks 92
44 vasive digital revolution. The industry is dealing with sig- responsible for extracting the knowledge to drive decision- 93
45 nificant data arising from diverse disciplines throughout the making [9]. BDA is mostly concerned with the principles, 94
46 life cycle of a facility. Building Information Modelling (BIM) processes, and techniques to understand the Big Data. The 95
47 is envisioned to capture multi-dimensional CAD information essence of BDA is to discover the latent patterns buried 96
48 systematically for supporting multidisciplinary collaboration inside Big Data and derive useful insights therefrom [10]. 97
49 among the stakeholders [5]. BIM data is typically 3D ge- These insights have the capability to transform the future 98
50 ometric encoded, compute intensive (graphics and Boolean of many industries through data-driven decision-making. This 99
2
1 ability to identify, understanding and reacting to the latent and produce final results which are stored back to the file 55
2 trends promptly is indeed a competitive edge in this hyper- system. Hadoop—a popular Big Data platform—introduced 56
4 Contributions of this paper: While some data-driven so- to execute MR programs successfully. In a typical Hadoop 58
5 lutions have been proposed for the fields of the construction cluster, several mappers and reducers simultaneously run MR 59
6 industry, there is currently no comprehensive survey of the programs. MR is a powerful model for batch-processing tasks. 60
7 literature, targeting the application of Big Data in the context However, it is struggling with applications that require real- 61
8 of the construction industry. This paper fills the void and time, graph, or iterative processing. Latest versions of Hadoop 62
9 presents a wide-ranging interdisciplinary study of fields such as have encountered this issue to some extent where processing 63
10 Statistics, Data Mining and Warehousing, Machine Learning, aspect of MR is detached from rest of the ecosystem. To this 64
11 Big Data and their applications in the construction industry. end, Yet Another Resource Negotiator (YARN) is introduced 65
18 open research issues and future work, and pitfalls of Big Data processing model for Big Data platforms. In contrary to MR, 72
19 in the construction industry are then presented in Section V DAG relaxes the rigid map-then-reduce style of MR to a more 73
20 and VI, respectively. generic notion. BDAS—an emerging Big Data platform— 74
21 II. B IG DATA E NGINEERING (BDE) component called Spark [14]. Spark holds supremacy over MR 76
22 Big Data Engineering (BDE) provide infrastructure to sup- in many aspects. Particularly, in-memory computation and high 77
23 port Big Data Analytics (BDA). Some discussions about the expressiveness are keys to wider adoption of Spark. These 78
24 Big Data platforms worth consideration to understand the BDE capabilities heralded the Spark a natural choice to support 79
25 adequately. Various Big Data platforms are developed so far iterative as well as reactive applications [14]. Spark is reported 80
26 with varied characteristics, which can be divided into two to have ten times faster than MR on disk-resident tasks, 81
27 groups: (i) horizontal scaling platforms (HSPs), the ones that whereas hundred times faster for memory-resident tasks [11]. 82
28 distribute processing across multiple servers and scale out by Fig. 3 shows components of Spark. These technologies are 83
29 adding new machines to the cluster. (ii) And vertical scaling designed to support functions that are vital to the development 84
44 A. Big Data Processing I/O required to bring sparsely distributed data together for 99
45 [Fig. 2 about here.] analysis using MR. To overcome this, a data pre-partition 100
49 1) MapReduce (MR): MR is the distributed processing the CPU maximally during analysis [16]. This way Hadoop 105
50 model to handle Big Data [13]. The entire analytical tasks is customised for BIM data and querying components are 106
51 in MR are written as two functions, i.e., map and reduce implemented as YARN applications. A BIM system for clash 107
52 (see Fig. 2), which are submitted to separate processes called detection and quantity estimation is developed to exploit the 108
53 Mappers and Reducers. Mapper read data, process it, and gen- proposed YARN applications. It is reported that the system has 109
54 erate intermediate results. Reducers work on mappers’ output improved the performance manifold, and the required tasks are 110
3
1 executed at real time with reasonable response time. More importantly, NoSQL systems eschew the rigid schema- 55
2 Lin et al. [7] presented the development of a specialised big oriented storage in favour of schema-less storage to achieve 56
3 BIM data storage and retrieval system for experts and naive flexibility [18]. Today these systems are prevalent in myriad 57
4 BIM users. The intentions are to develop a highly interactive data-intensive applications in many industries. Pointedly, the 58
5 user interface for querying BIM data through mobile devices architecture of NoSQL systems is well suited to fragmented 59
6 to maximise its utility and usability. User queries in plain nature of construction industry’s data. 60
7 English are re-formulated using the proposed natural language NoSQL systems store schema-less data in a non-relational 61
8 processing approach to retrieve highly complex BIM data, data model. Presumably, there are four data models for these 62
10 query execution, an MR join pre-processing is demonstrated 1) Key-Value: This is the simplest data model to store 64
11 to merge two BIM collections before query evaluation. The unstructured data. However, the underlying data is not 65
13 compared to the same join pre-processing written in traditional 2) Document: This data model is suitable for storing self- 67
can be inefficient. 69
20 1) Distributed File Systems: In this subsection, we are the period of popularity and y-axis shows a change in 76
21 discussing two competing distributed file systems, namely popularity). Table VIII describes features of 12 prominent 77
27 higher in such settings, it provides greater fault tolerance Storage: Despite significance for massive BIM data storage, 82
28 for hardware failures. Data distribution and replications existing applications are still lacking their successful imple- 83
29 are the key traits of HDFS to achieve the fault tolerance mentation. Das et al. [20] proposed Social-BIM to capture 84
30 and high availability. There are however situations when social interactions of users along with the building models. 85
31 the usage of HDFS degrades performance, particularly in A distributed BIM framework, called BIMCloud, is developed 86
32 applications requiring low-latency data access. Similarly, to store this data through IFC. Apache Cassandra, hosted 87
33 it is also not ideal for storing a large number of small on Amazon EC2, is used. Jeong et al. [21] proposed a 88
34 files due to the associated overhead for managing their hybrid data management infrastructure comprised two tiers. 89
35 metadata. Lastly, HDFS is not the choice of technology The client tier that utilises MongoDB for storing the structured 90
36 if applications require a significant number of concurrent data temporarily for efficiently completing analytical tasks, 91
37 modifications at random places in data. whereas, the central tier employs Apache Cassandra to store 92
38 • Tachyon is the BDAS flagship distributed file system permanently the streams of sensor data generated over time. 93
39 that extends HDFS and provides access to the distributed Cheng et al. [22] have also employed the Apache Cassandra for 94
40 data at memory speed across the cluster. Some of the presenting their query language to extract partial BIM models. 95
41 features where Tachyon has outsmarted HDFS include: Similarly, Lin et al. [7] exploited MongoDB to store BIM data 96
42 (i) in-memory data caching for enhanced performance of building models for distributed processing through MapRe- 97
43 and (ii) backwards compatibility to work seamlessly duce. MongoDB is tailored for IFC, with minor alterations to 98
44 with Spark as well as MR tasks without any code IFC hierarchy for supporting MR-efficient query execution. 99
46 2) NoSQL Databases: Relational databases served IT in- Big Data Analytics has a rich intellectual tradition and 101
47 dustry for the past couple of decades as de facto data man- borrows from a wide variety of fields. There have been 102
48 agement standard. However, recently applications emerged traditionally many related disciplines that have essentially the 103
49 that demanded more scalability, performance, and flexibility. same core focus: finding useful patterns in data (but with a 104
50 Relational databases are found unsuitable for these applications different emphasis). These related fields are Statistics (18301 ) 105
51 due to their specialised storage and processing needs. Conse- 1 While it can be difficult to pin down the exact time of genesis of a
52 quently, new systems came into being—called “Not only SQL” technology, the year in which the domain’s seminal work was proposed is
53 (NoSQL) systems—to fill this technological gap. NoSQL sys- provided to approximately sequence the various Big Data Analytics enabling
54 tems improved traditional data management in numerous ways. technologies chronologically.
4
1 [23], Data Mining (1980), Predictive Analytics (1989 [24]), Databases are crucial to empowering various aspects of 54
2 Business Analytics (1997), Knowledge Discovery from Data data mining, in particular by taking care of the activities of 55
3 (KDD) (2002), Data Analytics (2010), Data Science (2010) efficient data access, group and ordering of operations and 56
4 and now Big Data (2012). Fig. 5 shows the relevance of these optimising the queries to scale up data mining algorithms. 57
5 multidisciplinary fields to Big Data. So, Big Data Analytics Databases provide native support for analytics in the form 58
6 is a broadening of the field of data analytics and incorporates of data warehousing. In data warehousing, the copy of the 59
7 many of the techniques that have already been performed. This transactional data is stored specifically structured for querying 60
8 is the key reason that most of the existing work, presented in and the analysis [37], [41]. The transactional data is collated 61
9 subsequent subsections, has focused on data analytics rather from the operational databases using a process usually known 62
10 than Big Data is that the Big Data revolution—i.e., the ability as Extract, Transform, and Load (ETL) [42]. Data in the 63
11 to process large amounts of diverse data on a large scale—has warehouse is typically analysed through the Online Analytical 64
12 only recently happened. Existing approaches can be possibly Processing (OLAP). OLAP outperforms SQL in computing the 65
13 extended to the environments, dealing with large, diverse summaries (roll-up) and breakdown (roll-down) of the data. 66
16 Some ML-based tools have been developed for Big Data presented knowledge discovery in databases (KDD) framework 70
17 analytics. Table IX highlight some of the important ones. To to analyse massive construction datasets. Limitations of ML 71
18 showcase the implementation of BDA, we use MLlib (MLbase) algorithms (such as incorrect prediction) are discussed and 72
19 code in the subsequent subsections. overcome through statistical methods. Buchheit et al. [43] also 73
21 1) Statistics: In scientific studies, rigorous and efficient applicability of KDD to construction datasets for identifying 77
22 techniques are used to answer research questions. Careful causes of construction delays, cost overrun, and quality con- 78
24 investigations. Statistics is the study of collecting, analysing, Carrilli et al. [32] used data mining to learn from past 80
25 and drawing conclusions from the data, with the primary projects and improve future project delivery. Approaches such 81
26 focus on selecting the right tools and techniques at every as text analysis, link analysis and dimensional matrix analysis 82
27 data analysis stage [29]. Right from the data collections, are performed on data from multiple projects. Liao et al. [45] 83
28 to efficiently analysing it, and then inferring or formulating employed association rule mining to proactively prevent oc- 84
29 conclusions out of it, all of these steps comes under the scope cupational injuries. In another similar study [46], data mining 85
30 of statistics [30]. Various fields of analytics are borrowing is used to explore the causes and distribution of occupational 86
31 techniques from statistics [29]. injuries and revealed that falls and collapses are the primary 87
32 Examples of Construction Research using Statistics: The reasons of occupational fatalities. While Oh et al. [47] em- 88
33 industry is employing statistical methods in a variety of appli- ployed DW in construction productivity data, which is utilised 89
34 cation areas, such as identifying causes of construction delays using a multi-layer analysis through OLAP in the proposed 90
35 [31], learning from post-project reviews (PPRs) [32], decision system. SQL is quite prevalent in the industry for querying 91
36 support for construction litigation [33], detecting structural partial BIM models query languages such as Express Query 92
37 damages of buildings [34], identifying actions of workers and Language (EQL) and Building Information Modelling Query 93
38 heavy machinery [35], [36], etc., are to name a few. Language (BIMQL) are developed in the various construction 94
49 ability of particular technique(s) for solving the given business a sub-field of Artificial Intelligence (AI), focuses on the task 105
50 problem. Models with the highest accuracy and tolerance are of enabling computational systems to learn from data about 106
51 chosen and applied to the actual data for generating predictive specific task automatically. ML tasks can be categorized into: 107
52 results (including predictions, rules, probability, and predictive i) classification (or supervised learning); ii) clustering (or 108
2 ML has many applications across the construction applica- safety knowledge, drinking habits, direct employer, and indi- 57
3 tions, such as the modelling of judicial reasoning and pre- vidual safety behaviour. 58
8 are discussed by ML techniques in the subsequent sections. programs emulate decisions automatically based on the pre- 61
10 Regression is the supervised ML method, which is con- or mutually exclusive). It suits situations where single but more 65
11 cerned about predicting the numerical value of a target variable focused decisions are involved. Since these algorithms learn 66
12 based on input variables. For instance, estimating the cost of by examples, carefully crafted examples of correct decisions 67
13 the design based on design specifications. Regression can be of aside with input data are vital for algorithms to learn pre- 68
14 the following types. The simple linear regression that is used cisely. These algorithms learn to mimic the examples of right 69
15 for modelling the relationship between a dependent variable y decisions contrary to clustering in which algorithms decide 70
16 and one explanatory variable x. Multiple linear regression that on their own without prior guidance. Classification intends 71
17 is used for modelling the relationship between one dependent to choose a single choice from the limited set of possible 72
18 variable (continuous) and two or more explanatory variables. choices. Prominent classification algorithms include Logistic 73
19 This is commonly used regression approach. The logistic Regression, Naive Bayes, Decision Trees, and Support Vector 74
20 regression that is used for modelling the relationship between Machine (SVM). These algorithms are slightly discussed in 75
21 on categorical dependent variable and one or more explanatory the subsequent sections. 76
34 Examples of Construction Research using Regression: Siu P osterior = (P rior ∗ Likelihood)/Evidence (1)
35 et al. [60] employed regression for predicting the cycle times
Listing 2 shows MLlib code for Naive Bayes classifier, where 89
36 of construction operations using least-square-error and least-
data is split into training (60%) and test (40%), and model is 90
37 mean-square. The approach is evaluated on a project installing
built and used for making predictions. 91
38 Viaduct Bridge and is reported to have higher accuracy of
39 predictions. Aibinu et al. [61] employed linear regression for 1 v al s p l i t s = parsedData . randomSplit ( Array ( 0 . 6 , 0 . 4 ) , 92
50 Chan et al. [64] employed multiple regression analysis for Classifiers: Jiang et al. [34] presented a Bayesian probabilis- 103
51 predicting the partnering success of contracting parties. tic methodology for detecting the structural damages. Bayes 104
52 Fang et al. [65] applied logistic regression analysis to factor evaluation metric is computed from Bayes theorem 105
53 explore the relationship between safety climate and individual and Gaussian distribution assumption for accurate damage 106
54 behaviour. The results demonstrate the vivid relationship of identification. The effectiveness of the proposed techniques 107
55 safety climate and personal behaviour such as gender, marital is reported for assessing damage confidence of structures 108
6
1 over five damaged scenarios of four-story buildings bench- an SVM has a unique solution (since it involves optimisation 57
2 mark. Gong et al. [35] presented a framework for automated of a concave function). SVM uses kernel methods to map data 58
3 classification of actions of workers and heavy machinery from input/parametric space to higher level dimensional feature 59
4 in complex construction scenarios. They employed Bag-of- space. Listing 4 shows MLlib code to illustrate SVM, where 60
5 Video-Features-Model alongside the Bayesian probability for algorithm builds a model, compute accuracy on test data, and 61
6 evaluating and tuning action discovery. It is revealed that the evaluate the model. 62
16 Big Data technologies for enhanced accuracy and response. point . label )} 73
11 74
18 2) Decision Trees: Decision trees (DTs) is the modern ML Listing 4. A Snippet of MLlib Code for SVM
19 method for predicting about qualitative and quantitative target
20 features. The process of building DT begins with identifying Examples of Construction Research using SVM: To iden- 77
21 decision node and then recursively split nodes until no further tify the damages in bridges, Liu et al. [68] employed SVM 78
22 divisions are possible. The robustness of DT depends on the and genetic algorithms (GA). The selection, crossover, and 79
23 logic for splitting nodes, which is assessed by using concepts mutation in GA are used for selecting best kernel parameters 80
24 such as information gain (IG) or entropy reduction. Listing which are used in SVM as model parameters. A numerical 81
25 3 shows MLlib code to show DTs implementation; the data simulation is presented to see the feasibility of the pro- 82
26 is split into training and testing sets, initialized parameters, posed approach. Comparative analysis of GA-RBF (radical 83
27 created DT, and evaluated the model using data. basis function) and GA-BP (back propagation networks) is 84
35 8 val model = D e c i s i o n T r e e . t r a i n C l a s s i f i e r ( t r a i n D a t a , models is compared and contrasted against the Gold standard 91
41 Examples of Construction Research using Decision Trees: second, and third-degree polynomial kernel SVM models are 98
42 Pietrzyk et al. [66] studied the issue of mould germination in compared and contrasted. Highest accuracy is revealed for the 99
43 building structures using fault tree analysis. Structure related first and second-degree polynomial SVM, of 76% and 85% 100
44 deficiencies that are introduced during the construction process respectively, implemented using TF-IDF. Similarly, SVM is 101
45 are identified and classified. A probabilistic quantification used in fault detection system for HVAC under real working 102
46 model is generated to compare building structures based on conditions [71]. The SVM classifiers for fault detection and 103
47 their tendency for mould germination. Desai et al. [67] have isolation (FDI) are developed. The proposed approach can 104
48 employed decision trees to analyse and assess the labour pro- efficiently detect and isolate many typical HVAC faults. 105
53 3) Support Vector Machines (SVM): SVM is a widely problems. Multi-layer perceptron (MLP) is the most commonly 110
54 used technique that is remarkable for being practical and used type of ANN. ANNs are typically made up of three 111
55 theoretically sound, simultaneously. SVM is rooted in the field layers including an input layer, hidden (intermediate) layer, 112
56 of statistical learning theory, and is systematic: e.g., training and output layer. 113
7
1 Data samples in MLP neural network are normalised and evolution process. It computes better solutions to optimisation 59
2 are fed into the input layer. This data moves from the input problems using the concepts such as inheritance, mutation, 60
3 layer to one or two hidden layers and is finally passed onto the selection, and crossover. Typically GA algorithms involve 61
4 output layer, producing an output of the given ANN algorithm. creating two integral components, including (i) genetic rep- 62
5 Typically, x:y:z is used to describe the ANN topology in which resentation (array of bits) of the problem, and (ii) a fitness 63
6 x, y, z corresponds to the number of nodes in input, hidden, function to evaluate solution domain. The process starts with 64
7 and output layers, respectively. During the training phase, initiating a solution randomly and then keeps improving it 65
8 the values of connections between nodes (a.k.a weights) are through iterative application of mutation, crossover, inversion 66
9 adjusted. Back propagation, simulated annealing and genetic and selection unless an optimal solution is found. 67
10 algorithms are commonly used for training ANNs. Listing 5 Examples of Construction Research using GA: Chen et 68
11 shows MLlib code to explain lifecycle stages of ANN model al. [76] used GA to develop cost/schedule integrated planning 69
12 development and evaluation. system (CSIPS) which is focused on assigning crew optimally 70
16 3 val t e s t = s p l i t s (1) Similarly, Moon et al. [77] developed an active BIM system 74
17 4
for assessing the risks imposed by schedule and workspace 75
18 5 v al l a y e r s = Array [ I n t ] ( 4 , 5 , 4 , 3)
19 6
conflicts that typically happens during the construction phase 76
21 8 . setLayers ( layers ) . setBlockSize (128) algorithms for efficiently generating the optimal plan for 78
22 9 . s e t S e e d (1234L) . s e t M a x I t e r ( 1 0 0 ) workspace conflicts. 79
2310
2411 v a l model = t r a i n e r . f i t ( t r a i n ) 6) Latent Document Analysis (LDA)/ Latent Semantic 80
2512 v a l r e s u l t = model . t r a n s f o r m ( t e s t ) Analysis (LSA): LSA determines the meaning of words over 81
2613 val predictionAndLabels = r e s u l t
2714 . select (” prediction ” , ” label ”)
a large corpus of documents using statistical techniques. It 82
2815 v a l e v a l u a t o r = new uses singular value decomposition method as its entire basis 83
Listing 5. A Snippet of MLlib Code for ANN word priming, discourse comprehension, and essay quality 86
35 diagnosis. Fang et al. [74] employed ANN for structural Listing 6. A Snippet of MLlib Code for Latent Semantic Analysis
36 damage detection. Back propagation algorithm, empowered
37 by heuristics-based tunable steepest descent method, is used 1) Precision—is the fraction of retrieved documents, which 92
38 for training the neural network. Frequency response functions are relevant. It is useful to assess the quality of LSA 93
39 (FRF) are used for structural damage detection. A case study of approaches. 94
40 cantilevered beam is analysed for unseen, single, and multiple 2) Recall—is the fraction of the relevant documents, which 95
41 damage types. Similarly, ANN is employed alongside GA in are retrieved. Recall mostly informs about the complete- 96
42 [73] for fault classification, in which ANN and GA comple- ness of LSA approaches. 97
43 mented each other in reconstructing the missing input data. 3) F-Measure—is often used to combine precision and recall 98
44 Moselhi et al. [75] deliberated the usefulness of ANN over the for assessing the accuracy of tests. 99
45 conventional expert-based systems, employed in developing Listing 6 shows MLlib code to demonstrate the implemen- 100
46 various applications for the construction industry. A generic tation of LDA, where a corpus is created, and documents are 101
47 neural network based architecture is described, which is val- clustered based on word distribution. 102
48 idated by implementing an application for optimal markup
49 estimation. It is argued that ANN-based intelligent systems Examples of Construction Research using LDA & LSA: 103
50 guarantee ideal performance over the systems, developed using Kandil et al. [79] employed LSA for automated construction 104
51 conventional expert systems based approaches. document classification. The proposed technique classified two 105
52 ANN algorithms have recently brought revolution in ma- sets of documents: (1) documents with low word variations 106
53 chine learning through deep learning. New algorithms of ANN (claims and legal documents), and (2) documents with high 107
54 are designed to learn from high dimensionality data (i.e., Big word variations (correspondence and meeting minutes). The 108
55 Data), which seek special attention in all the construction evaluation of proposed technique provided satisfactory classi- 109
56 industry applications where ANN is employed. fication results. Mahfouz et al. [69] employed a hybrid ML- 110
57 5) Genetic Algorithms (GA): Genetic Algorithms (GA) are on top of SVM and LSA. The presented results are relatively 112
58 evolutionary ML algorithms that are inspired by the natural better than approaches based on a single ML technique. Salama 113
8
1 et al. [78] employed LSA-based classifiers for this purpose Based on these ratios (as independent variables), an automated 56
2 where the documents clauses are automatically classified into ML-based algorithm (Ripple Down Rules) is employed to 57
3 predefined categories such as environmental, health, etc., be- classify the overrun potential of construction projects into 58
4 fore rule extraction. The developed method is reported to following discrete values of Near, Overrun, BigOverrun. This 59
5 achieve 100% and 96% recall and precision respectively. exploration has revealed interesting rules for assessing the 60
11 classification, document analysis, image-based classification, uses site videos to measure the workers’ behaviour towards 66
12 the classification for predicting project overrun, and finally, the safety. The proposed approach analyse the 3D skeleton motion 67
13 classification for safety analysis. Pointedly, these applications model of the workers to identify their actions. Since safe and 68
14 need to be revamped with the Big Data technologies, since they unsafe actions are known, so the training data is correctly 69
15 present similar challenges of high dimensionality, velocity, labelled for safe and unsafe actions, which is exploited by 70
16 and variety. Besides, these applications also involve classy the classifier for learning. As a case study, the motion of 71
17 computation while performing domain-specific tasks. worker while climbing the ladder is analysed. It is revealed 72
25 The prototype system is evaluated and found very relevant. characteristics. Intuitively, clustering is akin to unsupervised 78
26 Rehman et al. [81] classified construction documents into classification: while classification in supervised learning as- 79
27 two distinct groups of good and bad information-containing sumed the availability of a correctly labelled training set, the 80
28 documents. Three layered ML approach is employed. Decision unsupervised task of clustering seeks to identify the structure 81
29 Trees (DT), Naive Bayes, SVM, and KNN algorithms are used of input data directly. Items in one cluster are similar to each 82
30 to check the accuracy of classification. Except for the DT, the other whereas different from the items of other clusters. Some 83
31 rest of algorithms have significantly improved the classification of the examples of clustering algorithms include K-means, O- 84
32 accuracy. Similarly, Liu et al.[82] presented the process for means, fuzzy K-means, and canopy. Listing 7 shows MLlib 85
33 structured document retrieval for engineering based document code for clustering data using K-Means and evaluating the 86
1 val numClusters = 2 88
35 Document Analysis: Soibelman et al. [83] proposed a 2 v a l n u m I t e r a t i o n s = 20 89
36 comprehensive platform to store and analyse unstructured doc- 3 v a l c l u s t e r s = KMeans . t r a i n ( p a r s e d D a t a , n u m C l u s t e r s , 90
37 uments used within a construction project. The system captures numIterations ) 91
39 diverse data about text, web, image, and linking and stores Listing 7. A Snapshot of MLlib Code for K-Means
40 it in an analytic-friendly format. These documents are then
41 automatically linked to the appropriate binary files (building Examples of Construction Research using Clustering: 93
42 models) using different ML classifiers, which dramatically Ng et al. [88] used clustering to group the facilities based 94
43 improved the information retrieval and significantly reduced on the deficiency descriptions stored in the facility condition 95
44 overall searching time of project managers. assessment database. The results have shown that facility 96
45 Image-Based Classification: Construction site photography deficiencies are unique and always a function of location and 97
46 logs comprise a significant chunk of construction documen- type of the facility. Fan et al. [89] employed clustering for 98
47 tation. A novel ML-based classification system is proposed developing construction case retrieval system to identifying 99
48 in [84] which uses Whitening Transform (WT), SVM, and accidents occurred in the past. The goal is to resolve the 100
49 Biased Discriminant Transform (BDT) algorithms to classify disputes before provoking litigation and work interruptions. 101
50 and index construction site images. The proposed approach has It is noticed that the NLP based approaches performed far 102
51 significantly boosted the results of traditional search engines. better than case-based reasoning techniques, while measuring 103
52 Predicting Overrun Potential: Williams et al. [85] analysed A hybrid approach is adopted in [90] to group construction 105
53 highway project bidding data for interested trends informing project documents automatically. The approach initially uses 106
54 about project overruns. Data exploration revealed that bids clustering to generate classes for these documents based on 107
55 with higher ratios tend to have significant cost overruns. textual similarity measures. Later on text classifier is used 108
9
1 to classify relevant documents from the construction docu- et al. [96] developed IR system called Knowledge Map Model 54
2 ment information system. This hybrid approach has drastically System (KMMS) to facilitate construction professionals for 55
3 improved the recall and F-measure. Clustering becomes non- managing and reusing construction knowledge from a va- 56
4 trivial with massive datasets comprising millions of dimen- riety of unstructured documents. Fan et al. [97] proposed 57
7 The NLP is concerned with creating computational models Hsu et al. [98] employed context-based text mining for 3D 62
8 that resemble the linguistic abilities (reading, writing, listening, CAD documents exploration. Traditional systems depend on 63
9 and speaking) of human beings. It provides basic concepts textual naming and require designers to memorise and embed 64
10 and methods for text processing and analysis, such as part these descriptions within the design documents. To this end, 65
11 of speech (POS) tagging, tokenization, sentence splitting, a context-based CAD document retrieval system (CCRS) is 66
12 named entity recognition, and semantic role labelling, etc. This developed for extracting the context from CAD documents 67
13 field brings together diverse techniques from computational into the characteristic document (CD), which is exploited 68
14 linguistics, speech recognition, and speech synthesis to process by query planner to select the documents. Lin et al. [99] 69
15 human languages. studied the retrieval of technical documents like journal papers, 70
16 Examples of Construction Research using NLP: The NLP patents, technical reports, or domain handbooks. A concept- 71
17 has a broad range of applications for knowledge acquisition based IR system is developed to illustrate the effectiveness of 72
18 and retrieval in the construction industry. Al-Qady et al. [91] proposed partitioning approach. It is shown that the proposed 73
19 used NLP to develop ontologies from construction contrac- approach is quite useful for concept-based IR of technical 74
20 tual documents. They employed NLP-based Concept Relation documents. Al-Qasy et al. [100] introduced an electronic doc- 75
21 Identification using Shallow Parsing (CRISP) for automatically ument management system (EDMS) to manage construction 76
22 extracting the concepts and concept relationships from the text project documents. At the crux of this system lies the proposed 77
23 of contract documents. The Kappa score and F-measure have idea of document discourse, which determines the semantic 78
24 significantly improved knowledge acquisition, while construct- similarity of documents. A classification algorithm, using 79
25 ing legal ontology. The works in [92], [93], [94] proposed document discourse, is implemented for classifying project 80
26 an NLP-based information extraction system for automated documents. The system is evaluated by a group of experts. 81
31 processing. A technique for tagging, separation, and sequenc- Rapid urbanisation has escalated construction activities 84
32 ing of regulatory document elements is proposed to generate globally, which triggered construction industry to consume the 85
33 high-quality ontology. The proposed algorithm is tested on bulk of natural resources and produce massive construction 86
34 the regulatory documents, retrieved from the International and demolition (C&D) waste [101]. The adverse impact of 87
35 Building Code and the results are promising with higher construction activities on the environment has serious implica- 88
36 precision and recall. tions worldwide [102]. Existing waste management approaches 89
37 E. Information Retrieval (IR) measures to manage waste only after it happens [103]. These 91
49 Examples of Construction Research using IR: Demian et massive datasets to uncover non-obvious correlations related to 104
50 al. [95] developed CoMem-XML system to augment searching design, procurement, materials, and supply-chain, which could 105
51 through granularity and context. The system is enhanced lead to waste during the actual construction stage. It explores 106
52 for contextual similarity, which is revealed to be of greater waste data in a forward-looking way [104], [106]. Advanced 107
53 usefulness and usability to construction professionals. Tserng analytical approaches could be employed to forecast waste and 108
10
1 prescribe the best course of actions to pre-emptively minimise in next generation GD tools. Table XIV summarises the state 55
5 nificant number of diverse data of building design, material design clashes is an integral part of the building model. Ideally, 58
6 properties, and construction strategies to successfully carry out this phase should be carried out before the start of con- 59
7 the process. Storing these datasets, using traditional technolo- struction stage for effective project management. Traditional 60
8 gies, is not only insurmountable, but the real-time processing paper-based approaches are widely substituted by BIM-enabled 61
9 for underpinning high-dimensional analytical models is highly automated approaches, which are found relatively inefficient 62
10 challenging. This calls for the application of Big Data tech- as well as less accurate to identify the majority of design 63
11 nologies for effective construction waste management. Partic- conflicts. However, existing BIM-enabled conflict resolution 64
12 ularly, robust waste generation estimation models, BIM-based solutions are still tedious and time-consuming for efficient 65
13 optimal materials selection during design specification, and process automation. There are two aspects of these systems. 66
14 holistic waste minimisation framework are key research areas Firstly, adequate knowledge management is at the crux of 67
15 which call for the applications of these Big Data technologies these systems to achieve accuracy. Wang et al. [38] proposed 68
16 to be employed. Table XIV summarises the state of the art and a knowledge-based system for acquiring, formulating, and de- 69
17 potential opportunities for resource and waste optimisation. ploying knowledge in BIM-enabled MEP design coordination. 70
18 Some of these opportunities are further explained in Section However, much is required in this direction. Additionally, for 71
21 B. Value Added Services capabilities. Table XIV summarises the state of the art and 77
24 in the construction industry. have been wide applicability in various domains of the con- 80
48 The tool has to generate and compare a permutation of (1) the problems that have clearly defined and logical solutions; 104
49 models for single client requirement. This field requires more and (2) the problems that have approximate heuristic solutions 105
50 R&D for getting mature to be usable in the enterprise-grade (and no logic-based straightforward solution applies). The 106
51 applications. These challenges of GD tools are expressly the former category is handled through automated approaches, 107
52 jurisdiction of using Big Data technologies. These technologies whereas the later ones are tackled through visualisation. 108
53 can undoubtedly bring new levels of usability, accessibility, Human knowledge, creativity, and intuition are pivotal for 109
54 and democratisation in the design exploration and optimisation effective visualisation. Human knowledge works perfectly with 110
11
1 smaller datasets, but its application in involving high dimen- networking services (BSNS) over the cloud-enabled platform. 58
2 sional larger datasets becomes impractical. The field of Visual The goal is to enhance the overall comprehension of BIM 59
4 soning and visualisation to solve complex analytical problems. However, a robust framework is required to capture every 61
5 Such systems are phenomenal to empower analytical abilities useful social interaction into the BIM right from the design 62
6 of users while perceiving, understanding, and reasoning about to end-of-life of the building. Since data of social interactions 63
7 complex and uncertain situations. VA is one of the key domains are likely to be in variety, velocity, and volume, Big Data 64
8 that require Big Data technologies to execute data visualisation technologies could be harnessed to develop interesting domain 65
9 to provide personal views and interactive exploration of data. applications for enhancing the productivity of stakeholders. 66
10 One of the key reasons behind the widespread adoption Table XIV summarises the state of the art and potential 67
11 of BIM lies in its versatile visualisation capabilities. Existing opportunities for this subdomain. 68
12 software are quite competitive to visualise all dimensions (nD) 6) Personalized Services: In personalised services, the pri- 69
13 of the design using the right set of tools and techniques. mary emphasis lies on an adaptation of the given facilities 70
14 In this context, Castronov et al. [110] studied the role of based on the user’ choice. The users are empowered to 71
15 visualisation in 4D construction management. Shortcomings of control the overall usage of services the way they desire. 72
16 existing BIM visualisation are identified, and general guideli- These systems adapt based on various parameters such as user 73
17 nes/ protocols are prescribed for developing 4D visualisation behaviour. The input to such services could be manual as well 74
18 in BIM authoring tools. To enable participation of technically as automatic. 75
19 unskilled BIM users, Zhadanovsky et al. [154] studied the issue Gao et al. [116] developed SPOT+ system to enable office 76
20 of generating master plan visualisation. Similarly to promote workers to personalise the indoor thermal comfort. SPOT+ 77
21 sustainable energy use, Goodwin et al. [111] employed VA used Predictive Personal Vote (PPV) to automatically adjust in- 78
22 for classifying energy users. The data of household energy door thermal comfort that mainly involve heating. The system 79
23 consumption along with geo-demographic data is used for turns on the heating before the arrival of occupants whereas 80
24 deeper insights. Classification is reported to enable clusters turns the heating off immediately after their departure. Rabbani 81
25 and trends for understanding energy usage. However, state-of- et al. [117] proposed an enhanced personalised thermal com- 82
26 the-art approaches of visualisation are needed during clustering fort system called SPOT* that enables users to adjust lower 83
27 process and decision making to enable overall comprehension. and upper bounds of indoor temperature as desired, which is 84
28 Chuang et al. [112] studied the development of a cloud-enabled automatically regulated accordingly. SPOT* supports heating 85
29 web-based system for BIM visualisation and manipulation. The as well cooling of indoor spaces. The system has significant 86
30 system improved communication and distribution of relevant potential for energy reduction while maintaining the overall 87
31 information among the stakeholders. comfort at desired level. Panagopoulos et al. [118] proposed 88
32 The scope of BIM is widening with more applications the AdaHeat system that uses intelligent agents to regulate 89
33 from construction as well FM stage has started utilising and the heating for domestic consumption. A novel aspect of this 90
34 extending it. As BIM data grows, these models get highly system is that it requires minimal user input. Chen et al. 91
35 dimensional, so the visualisation of high-dimensional BIM [119] studied the correlation of human behaviour and energy 92
36 models is challenging. VA is essential to both BIM and consumption in smart homes. Computational models to predict 93
37 Big Data and provides sophisticated techniques to improve energy consumption based on user behaviour are developed. 94
38 BIM and Big Data visualisation for better comprehension and These models are used to develop a web-based system that 95
39 interpretation. Table XIV summarises the state of the art and provides user with insights based on behaviour for optimal 96
40 potential opportunities for this subdomain. energy consumption. 97
54 Meadati et al. [114] studied the integration of RFID, BIM, and Facilities management (FM) integrate organisational pro- 110
55 social media to support facility managers in locating data from cesses to maintain the agreed services that support and improve 111
56 multiple documents. Jiao et al. [115] brought the web3D-based the effectiveness of its primary activities. Operations and man- 112
57 AR environment for integration of BIM and business social agement are the central parts of FM and are the longest stage in 113
12
1 whole building lifecycle. Mostly FM activities (such as assets data sources and to process it for generating actionable in- 57
2 management, preventive maintenance, etc.) are laborious, and sights. Despite BEMSs use the state-of-the-art multi-processor 58
3 the efficiency of such tasks can improve by incorporating infrastructures, the issue of data management and processing 59
4 suitable supporting technology. Localization information is is reported to have taxed the boundaries of these systems. 60
5 of great importance to these technology solutions. Today Hong et al. [125] proposed a cloud-based storage system to 61
6 these facilities utilise advanced automation and integration to store and process energy data generated from a network of 62
7 measure, monitor, control, and optimise building operations thousands of Zigbee sensors. To persist this data, Singh et al. 63
8 and maintenance. They provide adaptive, real-time control over [126] proposed cloud-based storage and processing architec- 64
9 an ever-expanding array of building activities in response to a ture. Berges et al. [127], [128] proposed novel approach for 65
10 wide range of internal and external data streams. As investment identifying appliances and their events (on/off or low/high) to 66
11 ramps up and more intelligent systems are brought online, measure their electric consumption precisely from the electric 67
12 more data will enter the energy management platform at faster influx. It is reportedly revealed that proposed approach requires 68
14 Taneha et al. [120] proposed an approach to determine the life deployment. Similarly, Goodwin et al. [111] employed 70
15 FM personal location information using localization technolo- visual analytics for energy users classification. It is highlighted 71
16 gies to support FM related activities. The system employs that state-of-the-art approaches of visualisation are at the 72
17 three technologies like RFID, Wireless LAN, and Inertial core of clustering process, decision making, and enhanced 73
18 measurement units (IMUs) for this localization. To reduce the overall comprehension of energy consumption. Wei et al. [128] 74
19 FM cost, Ng et al. [121] applied knowledge discovery and proposed an IOT-based framework to monitor and analyse the 75
20 data mining over the facilities maintenance databases. Liu et energy consumption of Smart Buildings. 76
21 al. [122] evaluated the capabilities of BIM to support the FM The software as mentioned above perfectly presents the 77
22 operations. A detailed needs of FM professionals are identified opportunities for Big Data analytics to advance the field. Point- 78
23 to harness BIM to support relevant tasks. The factors affecting edly, energy-related data is of immense importance for various 79
24 the maintainability of facilities are mainly considered. analytics, which is usually discarded by building owners and 80
25 Motamedi et al. [155] highlighted three challenges faced by utility companies at a time interval. To present this data nicely 81
26 the majority of FM systems. These include (i) inefficient & for advanced analytics is the next frontier of innovation in this 82
27 time-consuming searching interfaces, (ii) no unified interface field. Table XIV summarises the state of the art and potential 83
28 for FM system to exchange information, and (iii) inability to opportunities for this subdomain. 84
37 state of the art and potential opportunities for this subdomain. (BIM) is conceived to revolutionise construction industry in 91
39 Two type of energy software are prevalent. Firstly, building plications for improving the overall building delivery process. 95
40 energy simulation software to model the energy consumption Theoretically, BIM is declared as the de facto standard for 96
41 of buildings. Their accuracy depends on the accuracy of managing building data, its applications, in practice, across 97
42 provided parameters that are fine-tuned by experts. This fine every lifecycle stages of building are yet to develop, however. 98
43 tweaking is laborious and time-consuming. Automatic fine Preconstruction stages are well-known for widely adopting the 99
44 tweaking involves lots of computations. Sanyal et al. [123] BIM, whereas, it is progressively used lesser in the later stages 100
45 studied the automatic generation of accurate input model of building lifecycle [5]. Substantial research is made to extend 101
46 with proposed Autotune workflow for the EnergyPlus energy BIM for encapsulating different types of related data. 102
47 simulation software. Pointedly, it is informed that the software Goedert et al. [133] extended BIM for construction process 103
48 operates on raw data of about 270 terabytes, and condenses documentation. Chiang et al. [157] integrated power consump- 104
49 that to approximately 80 terabytes of useful data. Data storage, tion data with BIM models. Isikdag et al. [134] integrated 105
50 transfer, and processing such datasets is inevitably the subject geographic information systems (GIS) data with BIM for 106
51 of Big Data technologies. developing a fire response system. Yeh et al. [158] employed 107
52 Secondly, Building Energy Management Systems (BEMSs) BIM for onsite building information retrieval using augmented 108
53 are vital for buildings. And as part of their architecture, reality. Wang et al. [38] extended BIM for spatial conflict data 109
54 hundreds to thousands of sensors are installed to capture for MEP models. Yu et al. [135] integrated BIMserver with 110
55 data. Linda et al. [124] used computational intelligence based OpenStudion (a platform for assessing the energy efficiency 111
56 anomaly detection to fuse data from multiple heterogeneous of building designs). Das et al. [20] tailored BIM for social 112
13
1 interactions taking place while reviewing and commenting Amarnath et al. [161] deployed Revit Server on the cloud for 58
2 on different aspects of the design. Zheng et al. [136] in- collaboration and coordination of architectural and structural 59
3 tegrated BIM with diverse project data sources. Chaung et models. Rawai et al. [162] explored cloud computing for 60
4 al. [112] exploited BIM for cloud-enabled design exploration green and sustainable developments. Fathi et al. [142] used 61
5 and manipulation. Jiao et al. [113] sorted out the issues of the cloud for BIM-based context-aware computing. Beach et 62
6 integrating BIM with project schedules, progress monitoring al. [143] discussed the issues of enabling Google SketchUp 63
7 data, and work assignments. Meadit et al. [114] integrated over the Amazon EC2 cloud. Chong et al. [144] evaluated 64
8 RFID data into BIM to locate project documents. Volk et al. existing cloud computing applications and highlighted Google 65
9 [129] illustrated the automated creation of BIM models for Apps, Autodesk BIM 360, and Viewpoint, among others, 66
11 These illustrate the gradual increase in the size and scope al. [145] used the cloud for creating e-procurement platform— 68
12 of the contents of BIM models, which eventually restricts the Cloud Marketplaces. Jiao et al. [113] exploited cloud frame- 69
13 capabilities of traditional BIM-based storage and processing work for integrating project management data with building 70
14 systems. To tackle this, Jiao et al. [6] tailored MapReduce for models. Jiao et al. [115] integrated cloud computing with 71
15 storage and processing BIM. However, there are still many latest technologies such as AR and business social networking 72
16 use cases which may require sophisticated customizations to services to create virtual environment to visualise better and 73
17 the way BIM is stored and processed. So in future, we are understand BIM models. Wong et al. [137] highlighted the 74
18 expecting BIM specialised Big Data storage and processing legal issues related to cloud-based BIM models, including 75
19 platforms. Up until recently, BIM is envisaged to contain data security, responsibility, liability, and design ownership. 76
20 of construction industry only; however the emergence of linked Cloud computing has already accelerated the uptake of IT 77
21 building data has changed this perception. Despite linking BIM in the construction industry by transforming many domain 78
22 data to inter-industry applications, many interesting applica- specific applications as discussed above. And the role of 79
23 tions can be developed by enabling the integration of BIM with Big Data in this transformation is overwhelming. Table XIV 80
24 Linked Open Data (LOD) datasets, such as weather, flooding, summarises the state of the art and potential opportunities for 81
28 area of BIM. Table XIV summarises the state of the art and fact about Internet is it keeps evolving since its percep- 84
29 potential opportunities for this subdomain. tion. It started with Internet-of-Computers and had evolved 85
30 2) Big Data with Cloud Computing: Cloud computing is shift. With fast emerging technologies, the devices are getting 87
31 Internet computing paradigm in which on-demand access to a smaller and powerful, and the broadband connectivity is get- 88
32 shared pool of configurable resources is provided [159]. The ting cheaper and ubiquitous. This has led to the proliferation 89
33 idea is to outsource data storage and computation to third-party of connected devices on the Internet, eventually resulted in an 90
34 datacentres. Multiple users can simultaneously access the cloud exciting trend coined as the Internet-of-Things (IOT) [150]. 91
35 services without having to purchase individual licenses. Cloud The primary vision behind IOT is to bring together the smart 92
36 computing offers three service models. (i) Infrastructure-as-a- devices and objects the vital parts of Internet. Fusing these 93
37 service (IaaS): In IaaS, the user is provided with an abstrac- exciting physical and digital worlds are creating fascinating 94
38 tion to manage virtual/physical computers and cloud network opportunities of growth. Some of the popular areas where 95
39 services. (ii) Platform-as-a-service (PaaS): In PaaS, a user IOT applications are successfully demonstrated across the 96
40 is provided with services pertaining to development environ- industries include logistics, transport, assets tracking, smart 97
41 ments such as operating systems, programming languages, or homes, smart buildings, to energy, defence and agriculture. 98
42 databases, among others; (iii) Software-as-a-service (SaaS): In Elghamrawy et al. [146] demonstrated RFID usage for 99
43 SaaS, the user is provided access to enterprise applications via construction monitoring and quality control. Meadati et al. 100
44 the internet such as Revit 360. [114] integrated RFID with 3D BIM documents of assets 101
45 Cloud computing is widely adopted in the construction for searching and locating objects quickly. Wei et al. [128] 102
46 industry since it supports the integration of tasks in BIM-based proposed an IOT-based framework for building energy mon- 103
47 applications. Hong et al. [125] utilised cloud computing for itoring. Zanella et al. [148] presented specifications of urban 104
48 building energy management systems using Zigbee sensors. IOT to envision the idea of Smart Cities. Kortuem et al. [163] 105
49 Das et al. [20] proposed a cloud-based BIM framework for discussed the technical specifications of the smart object for 106
50 integrating stakeholders interactions with BIM. Zhang et al. petrochemical and road construction industries. Curry et al. 107
51 [136] utilised private clouds to offer BIM services across the [147] examined the storage and processing of energy sensors 108
52 whole building lifecycle. Klinc et al. [139] proposed SaaS data using cloud-based data management framework. 109
53 platform for the structural analysis applications. Kumar et The applications of IOT are non-trivial and often deploy 110
54 al. [140] employed cloud for SMEs design and construction hundreds or even thousands of sensor devices for data collec- 111
55 firms. Chuang et al. [112] used cloud computing for BIM tion. Since construction industry presents unlimited use cases 112
56 design exploration and manipulation. Redmond et al. [160] for IOT, Big Data is inherently the subject of interest. IOT and 113
57 employed cloud for interoperability between BIM applications. Big Data are complementary trends, with former to generate 114
14
1 large volumes of data and the later to store and analyse buildings APIs can bridge this technology gap and enable 58
2 these data at the real time in construction specific domain integration of sensors, users, control systems, machinery for 59
3 applications. Table XIV summarises the state of the art and providing innovative smart building services that promise com- 60
4 potential opportunities for this subdomain. fort, safety, and energy. Table XIV summarises the potential 61
8 energy and producing lots of greenhouse gases [159]. Smart reality (AR), which is an offshoot of virtual reality, is the field 65
9 building technology is a paradigm shift to embrace the integra- in which computer-generated virtual objects are superimposed 66
10 tion of contemporary technologies with the prevailing building over real-world scenes to produce mix worlds. It enables a 67
11 systems for striking the trade-off between the comfort maximi- semi-immersive environment that accurately aligns real scenes 68
12 sation and energy minimisation [149]. Building systems such with corresponding virtual world imagery. This mixed overlay 69
13 as building automation, life safety, telecommunications, user enables the users to obtain additional information about the 70
14 systems, facility management systems, among others, provide real world. It is an emerging technology for enhancing human 71
16 allows the users to control their interactions with building Rankohi et al. [165] argued that visualisation and simulation 73
17 services better. The smart building incorporates technologies aspects of the construction industry apps can be revamped 74
18 into building systems through a unified view. Often, these with AR to enhance their usability. Some of the exciting AR 75
19 systems generate vast amounts of data and majority of this application areas are highlighted such as virtual site visits, 76
20 data remain untapped and often discarded. To truly realise proactive schedule dispute identification and resolution, and 77
21 smart buildings, this data of unprecedented size need to be as-planned vs. as-built comparison. Chi et al. [166] pointed 78
22 analysed—a task that presents significant data management out the following four pillars for wider AR adoption in the 79
23 and processing issues. To this end, Big Data analytics is of construction industry. (i) Localization, the ability to accurately 80
24 immense importance to optimise total building performance impose virtual object on the real-life scene. (ii) A natural user 81
25 via predictive analytics. interface, which provides easy and intuitive user experiences to 82
26 McKinsey [159] highlighted smart buildings amongst the increase the usability of AP software. (iii) Cloud computing, 83
27 top ten emerging technology businesses. Azam et al. [149] which enables apps to store and retrieve information seam- 84
28 implemented a prototype software Project Dasher to illustrate lessly everywhere, and (iv) mobile devices, which are getting 85
29 Smart Buildings. Data of sensors related to motion, CO2 , smaller, cheaper, and powerful and play a vital role in AR 86
30 temperature, airflow, lighting, and other acoustics properties environment. William et al. [153] went ahead by bringing 87
31 are gathered and analysed. It is reportedly revealed that more BIM, mobile technology and AR together. The BIM aspects of 88
32 than 2 billion data entries are accumulated in 3 months that geometry translation, indoor localization, attribute assignment, 89
33 reached the limits of legacy relational databases. Stankovic and registration are explored for integration with mobile AR. 90
34 et al. [150] developed sensor based fire-fighting systems for The study proposed BIM2MAR, which provides general guide- 91
35 skyscraper office building with the authorities to detect fires, lines for integrating BIM with mobile AR. It is emphasised 92
36 alter fire situations, and aid in evacuation. Bonino et al. [151] robust BIM integration requires new approaches for BIM 93
37 studied complex event processing in smart buildings. spChain geometry conversion and indoor localisation of BIM using 94
38 framework is proposed to support the real-time processing of geo-coordinates. Jiao et al. [115] developed a web3D-based 95
39 sensor data. Miller et al. [152] analysed significant energy data AR environment to integrate BIM, business social networking 96
40 through proposed DayFilter approach to precisely identify the services (BSNS), and cloud services. 97
41 diurnal patterns from the data. AR and Big Data inevitably converge. The complexity 98
53 APIs are often proprietary and lack standardization in many There are many interesting open research issues within the 109
54 cases. For these reasons, these APIs can only be exploited by construction industry for Big Data. Some of these include (but 110
55 the BMS software itself, and are not amenable to the third- are not limited to) the following: 111
1 Construction waste minimisation is the perennial issue of Currently, BIM is prevalent in the design world, with very 57
2 the construction industry. Estimating construction waste accu- limited utilisation across the construction and FM stages of 58
3 rately, at the early stages of design or as the project proceeds, is the building. The real intent of BIM could never be achieved 59
4 core to so many exciting project activities. Particularly, waste until it is employed in every stage of the building lifecycle. 60
5 estimation is preliminary to waste minimisation at the early At present, no such mechanism can facilitate the tracking of 61
6 stages of design, where it provides insights about how the de- progress of various construction sites using automated tools. 62
7 sign is generating waste. These insights enable the designers to It is indeed labour-intensive as well impractical (to some 63
8 explore further and carry out corrective measures proactively, extent) to update the BIM model with such minute details 64
9 for waste efficiency at the early stages of design. So, construc- pertaining to the daily construction progress. As a result, real- 65
10 tion waste estimation has become the key research question time construction progress monitoring is not an easy task, 66
11 in construction waste management research. This estimation because managers are required to visit their sites regularly and 67
12 requires thorough design exploration and optimisation from a assess the progress subjectively with the intended schedule, 68
13 myriad of dimensions. Existing waste estimation models are which is less effective and error-prone. Employing Big Data 69
14 based on very limited, and static project attributes such as and sensing technologies could move the state of the art in 70
15 GFA, project contract sum, etc. [107], [108], [167], [168]. domain of construction progress monitoring to the next level. 71
16 However, these attributes are incapable of informing about Using latest imaging technology, the progress of the on-going 72
17 the true size of construction waste, hence unable to generate construction is captured at the real time. Big Data analytics 73
18 a reliable waste estimate, regardless of how much data is will process the real-time streams of these images to measure 74
19 used during their model development. A comprehensive waste the daily change and updated the BIM models and construction 75
20 estimation model that considers dynamic project attributes of schedule accordingly. The project managers are presented with 76
21 deconstruction, standardisation and dimension coordination, an update to date progress on the schedule, which will, in turn, 77
22 reuse and recycling, and procurement, among together, needs enable them to see whether they are lagging behind on the 78
23 to be developed. The model is also required to consider project or still follow the schedule. Accordingly, the project 79
24 many attributes of construction materials, which heralds the managers can proactively respond in case of any delay is 80
25 development of a comprehensive materials database using reported. This will save them a lot of money due to penalty 81
26 open and linked data standards. The waste estimation model whenever the deadline is missed, and improve the overall 82
27 and construction materials database will be bundled into a project monitoring and control. This is also aligned with the 83
28 standard and handy simulation tool, where waste estimates are vision of BIM adoption. In this way, Big Data can help the 84
29 visualised onto design elements through analytical dashboard industry to deliver the projects on time. 85
33 to backstage its storage and computation related workloads. requirements and the designers experience. Thus, such designs 88
35 Existing interoperability efforts in the construction industry the data collected by manufacturers on hundred or thousands 91
36 are mainly concerned about exchanging the building data of their product lines during design specification, which might 92
37 between domain-specific applications (architectural, structural, be quite valuable. Similarly, many other sources of data can 93
38 MEP, energy simulation, etc.) pertaining to the construction be relevant to designs such as users’ sentiments while inter- 94
39 industry. However, many interesting use cases can be achieved acting with facilities, weather, flooding, energy consumption, 95
40 from greater integration of BIM data with external data commute pattern in that vicinity, and population densities, 96
41 sources such as materials, GIS, sensors, geodata, etc. This to name a few. These datasets could be harnessed to sup- 97
42 interoperability, at a wider scale, enables the construction port for example the generation of an optimal construction 98
43 industry to achieve automation of its business processes, which schedule. And the good thing is that these data are captured 99
44 can improve the overall efficiency of the project participants. using technologies such as the web, sensors, smart meters, 100
45 Linked data coupled with the Web of data technologies are mobile phones, etc., and are made available through open 101
46 found phenomenal for this integration. Substantial progress is data initiative (in most cases). However, the design world 102
47 made to develop various enabling artefacts for this integration is still detached from harnessing these data sources for their 103
48 such as ifcOWL ontology [169], [170]. However, much has purpose. Currently, there is no such tool that can facilitate the 104
49 yet to be done. To this end, the development of a robust BDA designers to leverage these data during their design activities. 105
50 enabled platform that supports the storage and processing of If this is achieved, this can result in the paradigm shift 106
51 these diverse linked data sets pertaining to the building as well of Design with Data, where these diverse data sources are 107
52 as other data, is required. This platform can provide the basis integrated within the BIM authoring tools and made available 108
53 for the development of interesting applications, particularly for to architects, engineers, contractors, and facility managers at 109
54 energy analytics and smart buildings. early design stages. Big Data Analytics is indeed the key 110
1 requirements of sustainability, users, environment, and even that construction industry can gain huge revenue from this 54
2 broader infrastructures of emerging concept of smart cities. investment as experienced by other industries, provided the 55
3 VI. P ITFALLS OF B IG DATA IN C ONSTRUCTION implication of Big Data is, however, difficult to quantify. More 57
7 concern. This section discusses some of these challenges and To monitor project site activities at real-time, instant data 61
8 provides suggestions to deal with them for the successful transmission between project sites (dams, highways, etc.) and 62
9 implementation and dissemination of Big Data technologies centralised Big Data repository should be supported. However, 63
10 across various domain applications of the construction indus- project sites usually have low bandwidth; due to unavailability 64
11 try. of sophisticated networking infrastructure in rural, underde- 65
12 A. Data Security, Privacy and Protection: veloped areas. Advanced wireless sensor networks need to be 66
17 access control, intrusion prevention, Denial of Service (DoS) The effectiveness of Big Data cannot be measured just by 71
18 prevention, etc. [171], [172], [173]. These issues also require accumulating large volumes of data; it is more of the use 72
19 more study in the context of BIM-related construction data, cases or industrial problems that dictate the usefulness of 73
20 and the appropriate solutions also need to be adopted in the these technologies. It is feared that the construction industry 74
21 underlying analytics workflows. might not extract the full value of accessible Big BIM Data 75
22 B. Data Quality of Construction Industry Datasets: if the conceived use cases are vague. To this end, researchers 76
36 pessimistic feeling in the industry. Consequently, the industry Although the construction industry generates massive 89
37 will be reluctant towards adopting such fascinating trends like amounts of data throughout the life cycle of a building, 90
38 Big Data. the adoption of Big Data technology in this sector lags the 91
41 Every technology incurs cost so introducing Big Data in in leveraging such technologies for improving the efficiency 95
42 construction is not for free of charge. Companies are required of construction processes. In this exploratory study, we have 96
43 to set up data centres and purchase software licenses, which analysed the extent to which the industry has employed Big 97
44 can be an attractive investment. Also, skilled IT personnel Data technologies. Towards this end, we have reviewed not 98
45 to keep the entire ecosystem running is another overhead. only the latest research but also relevant research articles that 99
46 So Big Data has inevitably substantial cost implication. The have been published over the last few decades in which the 100
47 construction business is considered amongst the low-profit- precursor to modern Big Data Analytics techniques have been 101
48 margin businesses, and introducing such costly add-ons to deployed in various domain-specific construction applications. 102
49 projects are more likely to be opposed and difficult to be Principal Big Data technology streams are explained to help 103
50 defended. However, Big Data has the potential to enhance the readers to understand the complicated subject. Concepts of Big 104
51 overall project delivery by optimising processes and reducing Data Engineering and Big Data Analytics are demarcated; the 105
52 risks that companies usually bear due to myriad inefficien- works utilising these technologies across various subdomains 106
53 cies such as delays, litigations, etc. It is highly optimistic of the construction industry are deliberated. 107
17
1 Through our research, we conclude that while data-driven [14] V. S. Agneeswaran, Big Data Analytics Beyond Hadoop: Real-Time 62
2 analytics have long been used in the construction industry due Applications with Storm, Spark, and More Hadoop Alternatives. FT 63
4 struction subdomains, the adoption of the recent, much agiler [15] H. Karau, A. Konwinski, P. Wendell, and M. Zaharia, Learning Spark: 65
Lightning-Fast Big Data Analysis. ” O’Reilly Media, Inc.”, 2015. 66
5 and powerful, Big Data technology has been relatively slow.
[16] C.-Y. Chang and M.-D. Tsai, “Knowledge-based navigation system for 67
6 Although Big Data trend is gradually creeping in the industry; building health diagnosis,” Advanced Engineering Informatics, vol. 27, 68
7 its applicability is amplified further by many other emerging no. 2, pp. 246–260, 2013. 69
8 trends such as BIM, IOT, cloud computing, smart buildings, [17] T. White, Hadoop: The definitive guide. ” O’Reilly Media, Inc.”, 2012. 70
9 and augmented reality, which are also slightly elaborated. We
[18] P. Helland, “If you have too much data, then’good enough’is good 71
10 presented some of the prominent future works along with enough,” Communications of the ACM, vol. 54, no. 6, pp. 40–47, 2011. 72
11 potential pitfalls associated with Big Data while adopting it [19] M. Gelbmann, “Graph DBMSs are gaining in popularity faster than 73
12 in the industry. To the best of our knowledge, this is the any other database category,” 2014. 74
13 first in-depth review of the applications of Big Data related [20] M. Das, J. C. Cheng, and S. S. Kumar, “BIMCloud: A distributed 75
14 techniques in the construction industry. In our work, we have cloud-based social BIM framework for project collaboration,” in the 76
15 identified many potential application areas in which Big Data 15th International Conference on Computing in Civil and Building 77
16 techniques can significantly advance the state-of-the-art in the Engineering (ICCCBE 2014), Florida, United States, 2014. 78
17 construction industry. This work is of utility and relevance to [21] S. Jeong, J. Byun, D. Kim, H. Sohn, I. H. Bae, and K. H. Law, “A 79
18 all the construction researchers and practitioners who will like data management infrastructure for bridge monitoring,” in SPIE Smart 80
Structures and Materials+ Nondestructive Evaluation and Health 81
19 to harness the power of Big Data in the construction industry Monitoring, pp. 94350P–94350P, International Society for Optics and 82
20 for developing exciting business applications. Photonics, 2015. 83
24 Mifflin Harcourt, 2013. natural science and statistics, 1830–1835,” The Historical Journal, 88
vol. 26, no. 03, pp. 587–616, 1983. 89
25 [2] E. Siegel, Predictive Analytics: The Power to Predict who Will Click,
26 Buy, Lie, Or Die. John Wiley & Sons, 2013. [24] R. O. Duda, P. E. Hart, et al., Pattern classification and scene analysis, 90
vol. 3. Wiley New York, 1973. 91
27 [3] “Blog post by Anand Rajaraman on how ‘More Data Usually
28 Beats Better Algorithms’ (2013) [Online].” http://anand.typepad.com/ [25] S. Owen, R. Anil, T. Dunning, and E. Friedman, Mahout in action. 92
30 [4] C. Anderson, “The end of theory,” Wired Magazine, vol. 16, 2008. [26] R. Ihaka and R. Gentleman, “R: a language for data analysis and 94
31 [5] R. Eadie, M. Browne, H. Odeyinka, C. McKeown, and S. McNiff, graphics,” Journal of computational and graphical statistics, vol. 5, 95
32 “BIM implementation throughout the UK construction project lifecy- no. 3, pp. 299–314, 1996. 96
33 cle: An analysis,” Automation in Construction, vol. 36, pp. 145–151, [27] R. Yadav, Spark Cookbook. Packt Publishing Ltd, 2015. 97
34 2013. [28] J. Dean, Big data, data mining, and machine learning: value creation 98
35 [6] Y. Jiao, S. Zhang, Y. Li, Y. Wang, B. Yang, and L. Wang, “An for business leaders and practitioners. John Wiley and Sons, 2014. 99
36 augmented Mapreduce framework for building information modeling [29] L. Wasserman, All of statistics: a concise course in statistical infer- 100
37 applications,” in Computer Supported Cooperative Work in Design ence. Springer Science & Business Media, 2013. 101
38 (CSCWD), Proceedings of the 2014 IEEE 18th International Confer-
39 ence on, pp. 283–288, IEEE, 2014. [30] D. S. Moore and G. P. McCabe, Introduction to the Practice of 102
Statistics. WH Freeman/Times Books/Henry Holt & Co, 1989. 103
40 [7] J.-R. Lin, Z.-Z. Hu, J.-P. Zhang, and F.-Q. Yu, “A natural-language-
41 based approach to intelligent data retrieval and representation for cloud [31] H. Kim, L. Soibelman, and F. Grobler, “Factor selection for delay 104
42 BIM,” Computer-Aided Civil and Infrastructure Engineering, 2015. analysis using knowledge discovery in databases,” Automation in 105
Construction, vol. 17, no. 5, pp. 550–560, 2008. 106
43 [8] G. Aouad, M. Kagioglou, R. Cooper, J. Hinks, and M. Sexton, “Tech-
44 nology management of it in construction: a driver or an enabler?,” [32] P. Carrillo, J. Harding, and A. Choudhary, “Knowledge discovery 107
45 Logistics Information Management, vol. 12, no. 1/2, pp. 130–137, from post-project reviews,” Construction Management and Economics, 108
47 [9] F. Provost and T. Fawcett, Data Science for Business: What you need [33] T. S. Mahfouz, Construction legal support for differing site conditions 110
48 to know about data mining and data-analytic thinking. ” O’Reilly (DSC) through statistical modeling and machine learning (ML). PhD 111
50 [10] F. T. Matsunaga, J. D. Brancher, and R. M. Busto, “Data mining [34] X. Jiang and S. Mahadevan, “Bayesian probabilistic inference for 113
51 techniques and tasks for multidisciplinary applications: a system- nonparametric damage detection of structures,” Journal of engineering 114
52 atic review,” Revista Eletrônica Argentina-Brasil de Tecnologias da mechanics, vol. 134, no. 10, pp. 820–831, 2008. 115
53 Informação e da Comunicação, vol. 1, no. 2, 2015. [35] J. Gong, C. H. Caldas, and C. Gordon, “Learning and classifying 116
54 [11] D. Singh and C. K. Reddy, “A survey on platforms for big data actions of construction workers and equipment using bag-of-video- 117
55 analytics,” Journal of Big Data, vol. 29, no. 9, 2015. feature-words and Bayesian network models,” Advanced Engineering 118
56 [12] J. Qadir, N. Ahad, E. Mushtaq, and M. Bilal, “SDNs, clouds, and Informatics, vol. 25, no. 4, pp. 771–782, 2011. 119
57 big data: New opportunities,” in Frontiers of Information Technology [36] Y. Huang and J. L. Beck, “Novel sparse Bayesian learning for struc- 120
58 (FIT), 2014 12th International Conference on, pp. 28–33, IEEE, 2014. tural health monitoring using incomplete modal data,” in Proceedings 121
59 [13] J. Dean and S. Ghemawat, “Mapreduce: simplified data processing on of the 2013 ASCE International Workshop on Computing in Civil 122
60 large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107– Engineering, Los Angeles, 2013. 123
61 113, 2008. [37] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “From data mining to 124
18
1 knowledge discovery in databases,” AI magazine, vol. 17, no. 3, p. 37, [58] D. Arditi and T. Pulket, “Predicting the outcome of construction 65
2 1996. litigation using boosted decision trees,” Journal of Computing in Civil 66
3 [38] L. Wang and F. Leite, “Knowledge discovery of spatial conflict Engineering, vol. 19, no. 4, pp. 387–393, 2005. 67
4 resolution philosophies in BIM-enabled mep design coordination using [59] J.-H. Chen and S. Hsu, “Hybrid ann-cbr model for disputed change 68
5 data mining techniques: a proof-of-concept,” in Proceedings of the orders in construction projects,” Automation in Construction, vol. 17, 69
6 ASCE International Workshop on Computing in Civil Engineering no. 1, pp. 56–64, 2007. 70
7 (WCCE13), pp. 419–426, 2013. [60] M. Siu, R. Ekyalimpa, M. Lu, and S. Abourizk, “Applying regres- 71
8 [39] M. Kantardzic, Data mining: concepts, models, methods, and algo- sion analysis to predict and classify construction cycle time,” Proc., 72
9 rithms. John Wiley & Sons, 2011. Computing in Civil Engineering (2013). 73
10 [40] M. J. Berry and G. S. Linoff, Data mining techniques: for marketing, [61] A. Aibinu and G. Jagboro, “The effects of construction delays on 74
11 sales, and customer relationship management. John Wiley & Sons, project delivery in nigerian construction industry,” International jour- 75
12 2004. nal of project management, vol. 20, no. 8, pp. 593–599, 2002. 76
13 [41] T. Rujirayanyong and J. J. Shi, “A project-oriented data warehouse for [62] M. Sambasivan and Y. W. Soon, “Causes and effects of delays 77
14 construction,” Automation in Construction, vol. 15, no. 6, pp. 800–807, in malaysian construction industry,” International Journal of project 78
15 2006. management, vol. 25, no. 5, pp. 517–526, 2007. 79
16 [42] X. Wu, X. Zhu, G.-Q. Wu, and W. Ding, “Data mining with big data,” [63] S. M. Trost and G. D. Oberlender, “Predicting accuracy of early cost 80
17 Knowledge and Data Engineering, IEEE Transactions on, vol. 26, estimates using factor analysis and multivariate regression,” Journal of 81
18 no. 1, pp. 97–107, 2014. Construction Engineering and Management, vol. 129, no. 2, pp. 198– 82
19 [43] R. Buchheit, J. Garrett Jr, S. Lee, and R. Brahme, “A knowledge 204, 2003. 83
20 discovery case study for the intelligent workplace,” Proc., Computing [64] A. P. Chan, D. W. Chan, Y. Chiang, B. Tang, E. H. Chan, and K. S. 84
21 in Civil Engineering, pp. 914–921, 2000. Ho, “Exploring critical success factors for partnering in construction 85
22 [44] L. Soibelman and H. Kim, “Data preparation process for construction projects,” Journal of Construction Engineering and Management, 86
23 knowledge generation through knowledge discovery in databases,” vol. 130, no. 2, pp. 188–198, 2004. 87
24 Journal of Computing in Civil Engineering, 2002. [65] D. Fang, Y. Chen, and L. Wong, “Safety climate in construction indus- 88
25 [45] C.-W. Liao and Y.-H. Perng, “Data mining for occupational injuries try: A case study in hong kong,” Journal of construction engineering 89
26 in the taiwan construction industry,” Safety Science, vol. 46, no. 7, and management, 2006. 90
27 pp. 1091–1102, 2008. [66] K. Pietrzyk, “A systemic approach to moisture problems in buildings 91
28 [46] C.-W. Cheng, S.-S. Leu, Y.-M. Cheng, T.-C. Wu, and C.-C. Lin, for mould safety modelling,” Building and Environment, vol. 86, 92
29 “Applying data mining techniques to explore factors contributing pp. 50–60, 2015. 93
30 to occupational injuries in taiwan’s construction industry,” Accident [67] V. S. Desai and S. Joshi, “Application of decision tree technique to 94
31 Analysis & Prevention, vol. 48, pp. 214–222, 2012. analyze construction project data,” in Information Systems, Technology 95
32 [47] S.-W. Oh, M.-H. Kim, and Y.-S. Kim, “The application of data ware- and Management, pp. 304–313, Springer, 2010. 96
33 house for developing construction productivity management system,” [68] H.-B. Liu and Y.-B. Jiao, “Application of genetic algorithm-support 97
34 Korean Journal of Construction Engineering and Management, vol. 7, vector machine (ga-svm) for damage identification of bridge,” Interna- 98
35 no. 2, pp. 127–137, 2006. tional Journal of Computational Intelligence and Applications, vol. 10, 99
36 [48] D. Koonce, L. Huang, and R. Judd, “EQL an express query language,” no. 04, pp. 383–397, 2011. 100
37 Computers & industrial engineering, vol. 35, no. 1, pp. 271–274, 1998. [69] T. Mahfouz, J. Jones, and A. Kandil, “A machine learning approach 101
38 [49] W. Mazairac and J. Beetz, “BIMQL–an open query language for for automated document classification: A comparison between svm 102
39 building information models,” Advanced Engineering Informatics, and lsa performances,” International Journal of Engineering Research 103
40 vol. 27, no. 4, pp. 444–456, 2013. & Innovation, p. 53, 2010. 104
41 [50] L. Soibelman, L. Y. Liu, and J. Wu, “Data fusion and modeling [70] T. Mahfouz and A. Kandil, “Construction legal decision support using 105
42 for construction management knowledge discovery,” in International support vector machine (svm),” Proc. of the CRC 2010: Innovation 106
43 Conference on Computing in Civil and Building Engineering, Weimar, for Reshaping Construction, 2010. 107
44 Germany, 2004.
[71] D. Dehestani, F. Eftekhari, Y. Guo, S. Ling, S. Su, and H. Nguyen, 108
45 [51] I. H. Witten and E. Frank, Data Mining: Practical machine learning “Online support vector machine application for model based fault 109
46 tools and techniques. Morgan Kaufmann, 2005. detection and isolation of hvac system,” International Journal of 110
47 [52] J. E. Diekmann and T. A. Kruppenbacher, “Claims analysis and Machine Learning and Computing, vol. 1, no. 1, p. 66, 2011. 111
48 computer reasoning,” Journal of construction engineering and man- [72] Q. Chen, Y. Chan, and K. Worden, “Structural fault diagnosis and iso- 112
49 agement, vol. 110, no. 4, pp. 391–408, 1984. lation using neural networks based on response-only data,” Computers 113
50 [53] D. Arditi, F. E. Oksay, and O. B. Tokdemir, “Predicting the outcome of & structures, vol. 81, no. 22, pp. 2165–2172, 2003. 114
51 construction litigation using neural networks,” Computer-Aided Civil [73] T. Marwala and S. Chakraverty, “Fault classification in structures with 115
52 and Infrastructure Engineering, vol. 13, no. 2, pp. 75–81, 1998. incomplete measured data using autoassociative neural networks and 116
53 [54] M. P. Kim, “Us army corps engineers construction contract claims genetic algorithm,” Bangalore-Current Science, vol. 90, no. 4, p. 542, 117
54 guidance system,” in Excellence in the Constructed Project, pp. 203– 2006. 118
55 209, ASCE, 1989. [74] X. Fang, H. Luo, and J. Tang, “Structural damage detection using neu- 119
56 [55] K. Chau, “Predicting construction litigation outcome using particle ral network with learning rate improvement,” Computers & structures, 120
57 swarm optimization,” in Innovations in Applied Artificial Intelligence, vol. 83, no. 25, pp. 2150–2161, 2005. 121
58 pp. 571–578, Springer, 2005. [75] O. Moselhi, T. Hegazy, and P. Fazio, “Neural networks as tools in 122
59 [56] O. B. Tokdemir, “Using case-based reasoning to predict the outcome construction,” Journal of Construction Engineering and Management, 123
60 of construction litigation,” Computer-Aided Civil and Infrastructure 1991. 124
61 Engineering, vol. 14, no. 6, pp. 385–393, 1999. [76] Y.-J. Chen, C.-W. Feng, Y.-R. Wang, H.-M. Wu, et al., “Using BIM 125
62 [57] K. Chau, “Prediction of construction litigation outcome–a case-based model and genetic algorithms to optimize the crew assignment for 126
63 reasoning approach,” in Advances in Applied Artificial Intelligence, construction project planning,” International Journal of Technology, 127
64 pp. 548–553, Springer, 2006. no. 3, pp. 179–187, 2011. 128
19
1 [77] H. Moon, H. Kim, L. Kang, and C. Kim, “BIM functions for [97] H. Fan, F. Xue, and H. Li, “Project-based as-needed information 66
2 optimized construction management in civil engineering,” Gerontech- retrieval from unstructured AEC documents,” Journal of Management 67
3 nology, vol. 11, no. 2, p. 67, 2012. in Engineering, vol. 31, no. 1, p. A4014012, 2014. 68
4 [78] D. M. Salama and N. M. El-Gohary, “Semantic text classification for [98] J.-y. Hsu et al., “Content-based text mining technique for retrieval of 69
5 supporting automated compliance checking in construction,” Journal cad documents,” Automation in Construction, vol. 31, pp. 65–74, 2013. 70
6 of Computing in Civil Engineering, p. 04014106, 2013. [99] H.-T. Lin, N.-W. Chi, and S.-H. Hsieh, “A concept-based information 71
7 [79] T. Mahfouz, “Unstructured construction document classification model retrieval approach for engineering domain-specific technical docu- 72
8 through support vector machine (SVM),” pp. 126–133, 2011. ments,” Advanced Engineering Informatics, vol. 26, no. 2, pp. 349– 73
10 classification for construction management information systems,” Au- [100] M. Al Qady and A. Kandil, “Document discourse for managing 75
11 tomation in Construction, vol. 12, no. 4, pp. 395–406, 2003. construction project documents,” Journal of Computing in Civil Engi- 76
12 [81] N. Ur-Rahman and J. A. Harding, “Textual data mining for industrial neering, vol. 27, no. 5, pp. 466–475, 2012. 77
13 knowledge management and text classification: A business oriented [101] M. Osmani, J. Glass, and A. D. Price, “Architect and contractor 78
14 approach,” Expert Systems with Applications, vol. 39, no. 5, pp. 4729– attitudes to waste minimisation,” 2006. 79
15 4739, 2012.
[102] L. O. Oyedele, M. Regan, J. von Meding, A. Ahmed, O. J. Ebohon, 80
16 [82] S. Liu, C. A. McMahon, and S. J. Culley, “A review of structured and A. Elnokaly, “Reducing waste to landfill in the UK: identifying 81
17 document retrieval (SDR) technology to improve information access impediments and critical solutions,” World Journal of Science, Tech- 82
18 performance in engineering document management,” Computers in nology and Sustainable Development, vol. 10, no. 2, pp. 131–142, 83
19 Industry, vol. 59, no. 1, pp. 3–16, 2008. 2013. 84
20 [83] L. Soibelman, J. Wu, C. Caldas, I. Brilakis, and K.-Y. Lin, “Manage- [103] Z. Wu, T. Ann, L. Shen, and G. Liu, “Quantifying construction and 85
21 ment and analysis of unstructured construction data types,” Advanced demolition waste: An analytical review,” Waste Management, vol. 34, 86
22 Engineering Informatics, vol. 22, no. 1, pp. 15–27, 2008. no. 9, pp. 1683–1692, 2014. 87
23 [84] I. Brilakis, Content based integration of construction site images in
[104] W. Lu, H. Yuan, J. Li, J. J. Hao, X. Mi, and Z. Ding, “An empirical 88
24 AEC/FM model based systems. PhD thesis, 2005.
investigation of construction and demolition waste generation rates 89
25 [85] T. P. Williams, “Using classification rules to develop a predictive indi- in shenzhen city, south china,” Waste management, vol. 31, no. 4, 90
26 cator of project cost overrun potential from bidding data,” Computing pp. 680–687, 2011. 91
27 in Civil Engineering (2007), p. 35, 2007.
[105] L. L. Ekanayake and G. Ofori, “Building waste assessment score: 92
28 [86] A. O. Elfaki, S. Alatawi, and E. Abushandi, “Using intelligent design-based tool,” Building and Environment, vol. 39, no. 7, pp. 851– 93
29 techniques in construction project cost estimation: 10-year survey,” 861, 2004. 94
30 Advances in Civil Engineering, vol. 2014, 2014.
[106] M. Bilal, L. O. Oyedele, O. O. Akinade, S. O. Ajayi, H. A. Alaka, 95
31 [87] S. Han, S. Lee, and F. Peña-Mora, “A machine-learning classification H. A. Owolabi, J. Qadir, M. Pasha, and S. A. Bello, “Big data 96
32 approach to automatic detection of workers actions for behavior-based architecture for construction waste analytics (cwa): A conceptual 97
33 safety analysis,” in Proceedings of ASCE International Workshop on framework,” Journal of Building Engineering, 2016. 98
34 Computing in Civil Engineering, 2012.
[107] W. Lu, X. Chen, Y. Peng, and L. Shen, “Benchmarking construction 99
35 [88] H. Ng, A. Toukourou, and L. Soibelman, “Knowledge discovery in a waste management performance using big data,” Resources, Conser- 100
36 facility condition assessment database using text clustering,” Journal vation and Recycling, vol. 105, pp. 49–58, 2015. 101
37 of infrastructure systems, vol. 12, no. 1, pp. 50–59, 2006.
[108] W. Lu, X. Chen, D. C. Ho, and H. Wang, “Analysis of the construction 102
38 [89] H. Fan and H. Li, “Retrieving similar cases for alternative dispute
waste management performance in hong kong: the public and private 103
39 resolution in construction accidents using text mining techniques,”
sectors compared using big data,” Journal of Cleaner Production, 104
40 Automation in Construction, vol. 34, pp. 85–91, 2013.
vol. 112, pp. 521–531, 2016. 105
41 [90] M. Al Qady and A. Kandil, “Automatic clustering of construction
42 project documents based on textual similarity,” Automation in Con- [109] N. Kargah-Ostadi, “Comparison of machine learning techniques for 106
43 struction, vol. 42, pp. 36–49, 2014. developing performance prediction models,” in Computing in Civil 107
and Building Engineering (2014), pp. 1222–1229, ASCE. 108
44 [91] M. Al Qady and A. Kandil, “Concept relation extraction from con-
45 struction documents using natural language processing,” Journal of [110] F. Castronovo, S. Lee, D. Nikolic, and J. I. Messner, “Visualization 109
46 Construction Engineering and Management, vol. 136, no. 3, pp. 294– in 4d construction management software: A review of standards and 110
54 techniques,” Journal of Computing in Civil Engineering, pp. 453–460, technology to BIM visualization and manipulation,” in 28th Inter- 118
58 tion,” in Proc., 2012 ASCE Construction Research Congress (CRC), cloud approach to unified lifecycle data management in architecture, 122
60 [95] P. Demian and P. Balatsoukas, “Information retrieval from civil engi- and SNS,” Advanced Engineering Informatics, vol. 27, no. 2, pp. 173– 124
61 neering repositories: Importance of context and granularity,” Journal 188, 2013. 125
62 of Computing in Civil Engineering, vol. 26, no. 6, pp. 727–740, 2012. [114] P. Meadati, J. Irizarry, and A. K. Akhnoukh, “BIM and RFID 126
63 [96] H. P. Tserng, S. Y.-L. Yin, and M.-H. Lee, “The use of knowledge integration: a pilot study,” Advancing and Integrating Construction 127
64 map model in construction industry,” Journal of Civil Engineering and Education, Research and Practice, pp. 570–78, 2010. 128
65 Management, vol. 16, no. 3, pp. 332–344, 2010. [115] Y. Jiao, S. Zhang, Y. Li, Y. Wang, and B. Yang, “Towards cloud 129
20
1 augmented reality for construction application by BIM and SNS case study approach,” Automation in construction, vol. 24, pp. 149– 65
2 integration,” Automation in Construction, vol. 33, pp. 37–47, 2013. 159, 2012. 66
3 [116] P. X. Gao and S. Keshav, “Optimal personal comfort management [133] J. D. Goedert and P. Meadati, “Integrating construction process doc- 67
4 using spot+,” in Proceedings of the 5th ACM Workshop on Embedded umentation into building information modeling,” Journal of construc- 68
5 Systems For Energy-Efficient Buildings, pp. 1–8, ACM, 2013. tion engineering and management, vol. 134, no. 7, pp. 509–516, 2008. 69
6 [117] A. Rabbani and S. Keshav, “The spot* system for flexible personal [134] U. Isikdag, J. Underwood, G. Aouad, and N. Trodd, “Investigating the 70
7 heating and cooling,” in Proceedings of the 2015 ACM Sixth Inter- role of building information models as a part of an integrated data 71
8 national Conference on Future Energy Systems, pp. 209–210, ACM, layer: a fire response management case,” Architectural Engineering 72
9 2015. and Design Management, vol. 3, no. 2, pp. 124–142, 2007. 73
10 [118] A. A. Panagopoulos, M. Alam, A. Rogers, and N. R. Jennings, [135] N. Yu, Y. Jiang, L. Luo, S. Lee, A. Jallow, D. Wu, J. I. Messner, 74
11 “Adaheat: A general adaptive intelligent agent for domestic heating R. M. Leicht, and J. Yen, “Integrating BIMserver and openstudio for 75
12 control,” in Proceedings of the 2015 International Conference on energy efficient building,” in Proceedings of 2013 ASCE International 76
13 Autonomous Agents and Multiagent Systems, pp. 1295–1303, Inter- Workshop on Computing in Civil Engineering, 2013. 77
14 national Foundation for Autonomous Agents and Multiagent Systems, [136] J. Zhang, Q. Liu, F. Yu, Z. Hu, and W. Zhao, “A framework of cloud- 78
15 2015. computing-based BIM service for building lifecycle,” Computing in 79
16 [119] C. Chen and D. J. Cook, “Behavior-based home energy prediction,” in Civil and Building Engineering, pp. 1514–1521, 2014. 80
17 Intelligent Environments (IE), 2012 8th International Conference on, [137] J. Wong, X. Wang, H. Li, G. Chan, and H. Li, “A review of cloud-based 81
18 pp. 57–63, IEEE, 2012. BIM technology in the construction sector,” The Journal of Information 82
19 [120] S. Taneja, A. Akcamete, B. Akinci, J. Garrett, L. Soibelman, and E. W. Technology in Construction, vol. 19, pp. 281–291, 2014. 83
20 East, “Analysis of three indoor localization technologies to support [138] Y. Hsieh, “Cloud-computing based parameter identification system– 84
21 facility management field activities,” in Proceedings of the Interna- with applications in geotechnical engineering,” in Computing in Civil 85
22 tional Conference on Computing in Civil and Building Engineering, and Building Engineering (2014), pp. 1384–1392, ASCE. 86
23 Nottingham, UK, 2010.
[139] R. Klinc, I. Peruš, and M. Dolenc, “Re-using engineering tools: 87
24 [121] H. S. Ng and L. Soibelman, “Knowledge discovery in maintenance Engineering SaaS web application framework,” in Proceedings of the 88
25 databases: Enhancing the maintainability in higher education facili- 2014 International Conference on Computing in Civil and Building 89
26 ties,” in Proc., 2003 Construction Research Congress, ASCE, Reston, Engineering, pp. 23–25, 2014. 90
27 Va, 2003.
[140] B. Kumar, J. C. Cheng, and L. McGibbney, “Cloud computing and its 91
28 [122] R. Liu and R. Issa, “Issues in BIM for facility management from implications for construction it,” in Computing in Civil and Building 92
29 industry practitioners perspectives,” Computing in Civil Engineering Engineering, Proceedings of the International Conference, vol. 30, 93
30 (2013), pp. 411–418, 2013. p. 315, 2010. 94
31 [123] J. Sanyal and J. New, “Simulation and big data challenges in tuning [141] A. Redmond, A. V. Hore, and R. West, “Building support for cloud 95
32 building energy models,” in Modeling and Simulation of Cyber- computing in the Irish construction industry,” 2010. 96
33 Physical Energy Systems (MSCPES), 2013 Workshop on, pp. 1–6, [142] F. Fathi, M. Abedi, S. Rambat, S. Rawai, M. Z. Zakiyudin, et al., 97
34 IEEE, 2013. “Context-aware cloud computing for construction collaboration,” Jour- 98
35 [124] O. Linda, D. Wijayasekara, M. Manic, and C. Rieger, “Computational nal of Cloud Computing, vol. 2012, pp. 1–11, 2012. 99
36 intelligence based anomaly detection for building energy management [143] T. H. Beach, O. F. Rana, Y. Rezgui, and M. Parashar, “Cloud 100
37 systems,” in Resilient Control Systems (ISRCS), 2012 5th International computing for the architecture, engineering & construction sector: 101
38 Symposium on, pp. 77–82, IEEE, 2012. requirements, prototype & experience,” Journal of Cloud Computing, 102
39 [125] I. Hong, J. Byun, and S. Park, “Cloud computing-based building vol. 2, no. 1, pp. 1–16, 2013. 103
40 energy management system with zigbee sensor network,” in Innovative [144] H.-Y. Chong, J. S. Wong, and X. Wang, “An explanatory case study on 104
41 Mobile and Internet Services in Ubiquitous Computing (IMIS), 2012 cloud computing applications in the built environment,” Automation in 105
42 Sixth International Conference on, pp. 547–551, IEEE, 2012. Construction, vol. 44, pp. 152–162, 2014. 106
43 [126] R. P. Singh, S. Keshav, and T. Brecht, “A cloud-based consumer-centric [145] A. Grilo and R. Jardim-Goncalves, “Cloud-marketplaces: Distributed 107
44 architecture for energy data analytics,” in Proceedings of the fourth e-procurement for the AEC sector,” Advanced Engineering Informat- 108
45 international conference on Future energy systems, pp. 63–74, ACM, ics, vol. 27, no. 2, pp. 160–172, 2013. 109
46 2013.
[146] T. Elghamrawy and F. Boukamp, “Managing construction informa- 110
47 [127] M. Berges, E. Goldman, H. S. Matthews, and L. Soibelman, “Learning tion using rfid-based semantic contexts,” Automation in construction, 111
48 systems for electric consumption of buildings,” in ASCI international vol. 19, no. 8, pp. 1056–1066, 2010. 112
49 workshop on computing in civil engineering, vol. 38, 2009.
[147] E. Curry, J. ODonnell, E. Corry, S. Hasan, M. Keane, and S. ORiain, 113
50 [128] C. Wei and Y. Li, “Design of energy consumption monitoring and “Linking building data in the cloud: Integrating cross-domain building 114
51 energy-saving management system of intelligent building based on data using linked data,” Advanced Engineering Informatics, vol. 27, 115
52 the internet of things,” in Electronics, Communications and Control no. 2, pp. 206–219, 2013. 116
53 (ICECC), 2011 International Conference on, pp. 3650–3652, IEEE,
[148] A. Zanella, N. Bui, A. Castellani, L. Vangelista, and M. Zorzi, 117
54 2011.
“Internet of things for smart cities,” Internet of Things Journal, IEEE, 118
55 [129] R. Volk, J. Stengel, and F. Schultmann, “Building information mod- vol. 1, no. 1, pp. 22–32, 2014. 119
56 eling (BIM) for existing buildingsliterature review and future needs,”
[149] A. Khan and K. Hornbæk, “Big data from the built environment,” in 120
57 Automation in Construction, vol. 38, pp. 109–127, 2014.
Proceedings of the 2nd international workshop on Research in the 121
58 [130] X. Wang, “BIM handbook: A guide to building information modeling large, pp. 29–32, ACM, 2011. 122
59 for owners, managers, designers, engineers and contractors,” Construc- [150] J. Stankovic et al., “Research directions for the internet of things,” 123
60 tion Economics and Building, vol. 12, no. 3, pp. 101–102, 2012. Internet of Things Journal, IEEE, vol. 1, no. 1, pp. 3–9, 2014. 124
61 [131] S. Azhar, “Building information modeling (BIM): Trends, benefits, [151] D. Bonino, F. Corno, and L. De Russis, “Real-time big data processing 125
62 risks, and challenges for the AEC industry,” Leadership and Manage- for domain experts, an application to smart buildings,” Big Data 126
63 ment in Engineering, 2011. Computing/Rajendra Akerkar; Taylor & Francis Press: London, UK, 127
64 [132] K. Barlish and K. Sullivan, “How to measure the benefits of BIMa vol. 1, pp. 415–447, 2013. 128
21
1 [152] C. Miller, Z. Nagy, and A. Schlueter, “Automated daily pattern filtering [173] O. Tene and J. Polonetsky, “Big data for all: Privacy and user control 66
2 of measured building performance data,” Automation in Construction, in the age of analytics,” Nw. J. Tech. & Intell. Prop., vol. 11, p. xxvii, 67
3 vol. 49, pp. 1–17, 2015. 2012. 68
1 L IST OF F IGURES
2 1 Proposed Review Structure of the Paper . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3 2 An overview of MapReduce processing [12] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
4 3 Apache Spark and Related Technology Stack [15] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5 4 Databases Popularity Trend [19] . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
6 5 Multidisciplinary Nature of Big Data Analytics (Adapted from [28]) . . . . . . . . . . . . . . . . . . . . . . . . . 27
FIGURES 23
1 L IST OF TABLES
I Prominent NoSQL Systems and Their Critical Features. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2339
V Classification-based works for the construction industry, categorized per classification technique . . . . . . . . . . 30 2343
XII Classification-based works for the construction industry, categorized per classification technique . . . . . . . . . . 37 2350
Product Key
Product Name Data Model(s) Language Concurrency Storage
Description Features
High
Columnar Java, Disk,
Cassandra Apache Cassandra is scalable database that provides proven fault-tolerance and tunable MVCC Availability,
Key-Value Python Hadoop, Plugin
consistency on cluster of commodity servers. Partition Tolerance
Columnar Consistent,
HBase HBase is distributed data store that extended Google Bigtable to scale on HDFS. Its novelty Java Locks Hadoop
Key-Value Partition Tolerance
lies in storing and accessing data with random access. It doesn‘t restrict the kind of data
being stored.
Disk
Hadoop Consistent,
HyperTable Hypertable supports data distribution for scalable data management. It offers maximum Columnar C++ MVCC
GlusterFS, Partition Tolerance
efficiency and superior performance. However, it lacks data management features such as
Kosmos File System
transaction and join processing.
Document Disk,GFS, Consistent,
MongoDB MongoDB is a document-oriented database. It facilitates storage of documents with C++ Locks
Key-Value Plugin Partition Tolerance
variable schemas and is suitable for applications, storing complex types.
Document Erlang, High Availability,
CouchDB CouchDB is suitable for large scale web and mobile applications. It facilitate data storage MVCC Disk
Key-Value C Partition Tolerance
that are queried through web browsers, via HTTP. JavaScript is used to index, integrate,
and transform the database.
C, GFS Consistent,
MarkLogic MarkLogic facilitates storing documents efficiently for easy and intuitive search. It is Document Java, ACID Hadoop High Availability,
suitable for applications that derive revenue, streamline operations, risk management, and Python S3, RDF Partition Tolerance
security.
ANSI Consistent,
Redis Redis is in-memory system that can be used as a database, cache, and message broker. Key-Value Locks RAM
C Partition Tolerance
When configured on cluster, it becomes scalable and highly available. It also supports
transaction processing.
High Availability,
Riak Riak is a distributed database that provides scalability and high availability. It achieves Key-Value Erlang ACID Disk, Plugin
Partition Tolerance
performance and fault tolerance through built-in distribution and replications.
Consistency,
BarkeleyDB Berkeley DB is embedded database for key-value dataset. It is written in C but supports Key-Value Java ACID RDF Availability,
application development for C++, PHP, Java, Perl, among others. Partition-Tolerance
High Availability,
Neo4J Neo4J is a semantic store for creating, updating, deleting, and retrieving graph data. It Graph Java Locks Disk
Partition Tolerance
captures relationships natively and processes queries as paths through language called
Cypher. Neo4J is good option for applications, dealing with connected data.
Disk, Consistent,
OrientDB OrientDB is a system for large-scale and distributed graph management. The core features Graph Java ACID Plug-in High Availability,
include multi-master replication and sharding. RAM, SSD Partition Tolerance
Columnar Consistency,
Oracle Document Berkeley DB Availability,
Oracle NoSQL is designed specifically to provide highly reliability, scalability, and Java ACID
NoSQL Key-Value Architecture, RDF Limited
maximum availability across the cluster of storage nodes. Data is replicated to survive
Graph Partition-Tolerance
rapid failure and load balancing for distributed query processing.
TABLES 30
Supported Supported
Tool Name Description ML at Scale
Languages Algorithms
-Collaborative Filtering
Apache Mahout is an open-source machine learning framework for quickly writing scalable and high -Java -Classification
Yes
Mahout [25] performance ML applications. -Scala. -Clustering
-Regression
-Collaborative Filtering
R is an open-source programming language for statistical analysis. R is extremely extensible. With
Many -Classification
R [26] huge developer base, thousands of R packages are available to provide variety of functionalities. Yes
languages -Clustering
The graphics supported by R are highly polished and very powerful.
-Regression
Spark has constituted a novel ML platform called MLbase, which has brought together highly
-Collaborative Filtering
robust ML components, such as ML optimizer, MLI, and MLlib, to support the full lifecycle -Java
-Classification
MLbase [27] activities, required to implement as well as use ML algorithms. ML optimizer automates the tasks -Scala Yes
-Clustering
of ML pipeline construction to efficiently search algorithms of MLI and MLlib. MLI is the API to -Python
-Regression
develop ML algorithms using high-level constructs. MLlib is the Spark distributed ML library.
Oryx is an open-source ML library that has evolved over time out of the libraries and toolkits -Collaborative Filtering
developed by Cloudera. Based on the distributed input from HDFS, it builds predictive models -Classification
Oryx [14] -Java Yes
that are written to output in predictive model markup language (PMML). An interesting feature -Clustering
of Oryx is its ability to keep the model updated under emerging streams of data from Hadoop. -Regression
TABLES 31
TABLE V. C LASSIFICATION - BASED WORKS FOR THE CONSTRUCTION INDUSTRY, CATEGORIZED PER CLASSIFICATION TECHNIQUE
TABLE VII. S UMMARY OF OPPORTUNITIES WITHIN THE SUB - DOMAIN OF CONSTRUCTION INDUSTRY
Construction Industry
State of the Art Potential Opportunities
Sub-domains
-BIM tools to actualize circular economy for sustainability, green
supply chains, and closed-loop supply chains
-BIM tools for optimal & auto design specification
-Construction waste generation estimation [105], [107] -BIM integrated materials database using open standards
Resource and
-Waste generation benchmarking [107] -BIM integrated linked data for waste data management
Waste Optimization
Comparative analysis of waste management performance [108] -BIM based waste estimation using predictive analytics
-BIM based waste minimisation through deconstruction
-BIM based waste minimisation through resource optimisation
-BIM based waste minimisation through interactive visualisation
-Framework to exploit Big Data analytics to parallelize algorithms for
-Autodesk Dreamcatcher—a prototype system to showcase real time GD computation
Generative Designs
the feasibility of this idea of generating design from abstract requirements -Big Data algorithms to accurately reduce the design space
-Big Data enabled GD tool
-Big Data analytics based MEP design checker that uses prescriptive
Clash Detection -BIM enabled approaches are developed to resolve conflicts in MEP design,
analytics not only to identify conflicts but also describe the best
and Resolution however these approaches are time consuming [38]
action to resolve it.
Performance -Pavement management system using pavement deterioration
-Big Data driven BIM system for pavement deterioration prediction
Prediction prediction [109]
-Visual analytics driven Big Data framework for BIM model
-4D BIM Visualisation [110], energy user classification [111], using VA visualisation
Visual Analytics
-Cloud-based BIM system for design visualisation and exploration [112] -Visual analytics driven design optimiser for energy reduction
and comfort maximisation
-Integration of project management data using social networking [113]
Social Networking -BIM framework for social network information modelling
-BIM, RFID, and social data integration [114]
Services/ Analytics using Big Data
-AR based Business social networking services (BSNS) [115]
Personalized -SPOT+ indoor air personalisation [116], SPOT* for heating/cooling [117] -Personalisation energy monitor that requires less user input to
Services -AdaHeat domestic heat regulator [118], Behavioural energy adaptation [119] regulate optimal energy consumption
-BIM based indoor localisation [120], FM cost reduction through massive
Facility Management -Big Data Analytics based BIM system for FM activities
data exploration [121], FM data modelling through BIM [122]
-Energy simulation software (EnergyPlus) [123], energy management systems
-Big Data framework for BIM based open energy data persistence
Energy Management and [124], Cloud based energy data storage and processing [125], [126]
-Big Data analytics platform to simulate and optimise energy
Analytics -Appliance event identification using NILM and Wire Spy [127], [128],
usage of buildings
Energy user classification [111], IOT framework for energy analytics [128]
-BIM models for building designs [129], [130], [131], [132], BIM models for -Big Data enabled IFC-compliant BIM storage system
construction process documentation [133], BIM models for GIS data [134], -BIM platform for IOT applications
Big Data with BIM BIM for MEP conflict resolution [38], BIM open platform [135], BIM via -Open data platform for linking BIM models with external sources
cloud [20], [136], [112], BIM and RFID [114], BIM models for project -Big Data enabled BIM processing platform to developing
management data [113], MapReduce savvy BIM data storage and processing [6] applications
-Cloud based energy data management [125]
-Cloud based BIM data management [20]
-Cloud enabled BIM design data storage & exploration [112], [137], [136], [138]
Big Data with Cloud -SaaS platform for structural MEP analysis [139] -A BDA platform to store and process BIM models on cloud for
Computing -Cloud based BIM system for SMEs [140], [141] developing domain specific applications.
-BIM based context-aware computing [142]
-Amazon EC2 enabled Google SketchUp [143], [144]
-Cloud based e-procurement platform [145]
-RFID based construction document retrieval & assets
Big Data with Internet management system [146], [114]
-Big Data driven IOT platform for Smart Buildings
of Things (IOT) -IOT based energy monitoring and analysis system [128], [147]
-Urban IOT, a framework for smart cities [148]
-Project dasher for measuring and visualising CO2 emission of buildings [149]
Big Data for Smart -A robust firefighting systems for buildings [150] -Building Personalisation Services using Big Data
Buildings -Complex event processing for smart buildings [151] -Mobiles apps to exploit building personalisation services
-DayFilter, for pattern recognition in energy data [152]
-Big BIM Data Visual Exploration System
-Big Data and AR based virtual site exploration
Big Data with Augmented -BIM2MAR, a platform to integrate BIM, mobile and AR [153] -Big Data and AR enabled Proactive Dispute Identification
Reality (AR) -Web3D-based AR system for BIM and social networking services (SNS) [115] and resolution System
-Big Data and AR enabled As-planned vs. As-built
Comparison System