www.elsevier.com/locate/autcon
Abstract
The widespread use of information technologies for construction is considerably increasing the number of electronic text
documents stored in construction management information systems. Consequently, automated methods for organizing and
improving the access to the information contained in these types of documents become essential to construction information
management. This paper describes a methodology developed to improve information organization and access in construction
management information systems based on automatic hierarchical classification of construction project documents according to
project components. A prototype system for document classification is presented, as well as the experiments conducted to verify
the feasibility of the proposed approach.
© 2003 Elsevier Science B.V. All rights reserved.
Keywords: Construction management; Classification systems; Information management; Information systems; Text/data mining
0926-5805/03/$ - see front matter © 2003 Elsevier Science B.V. All rights reserved.
doi:10.1016/S0926-5805(03)00004-9
396 C.H. Caldas, L. Soibelman / Automation in Construction 12 (2003) 395–406
classification system (CICS) defines concept hierarchies that can be used for document classification, providing a common framework for document organization and management among project organizations. These classification frameworks can be embedded in inter-organizational information systems, like project websites, project management software, and document management systems. Examples of CICSs include: the CSI MasterFormat [17], CSI UniFormat [33], CI/SfB, Uniclass, and the Overall Construction Classification System [20].

One limitation of the existing inter-organizational information systems is the reliance on manual classification methods conducted by human experts. With the growth in the use of information technologies by construction companies, the increasing availability of electronic documents, and the development of model-based systems, manual classification becomes impractical. One example of the limitations of manual classification is the time and effort that would be required to classify all documents created in a construction project (contracts, specifications, meeting minutes, change orders, field reports, and requests for information, among others), according to all components of a CICS.

Another limitation of the current systems is the consideration of documents as single units for the purpose of classification and retrieval. Many construction documents, including specifications and meeting minutes, should clearly be divided and then assigned to more than one item of a CICS. This limitation can be illustrated by the case in which a project manager wants to access information contained in meeting minutes regarding a specific CSI MasterFormat item in order to solve an issue. Using current technologies, the project manager would need to manually search and analyze each document individually in order to obtain the desired information.

A third problem that exists in available systems is the lack of support for differences in vocabularies and naming conventions. This problem can be illustrated by the case in which an architect gives a name for a particular object in a project model. Since there is usually no standard vocabulary among organizations that participate in a construction project, references to that particular object in project documents are often done using different names. Using current technologies, project managers would need to map the model object’s name to the terms being used in the different construction documents.

The previously mentioned limitations and the push towards fully integrated and automated project processes justify the need for the development of automated classification methods for construction project documents that can explore the internal characteristics of these documents and adapt to different classification frameworks.

This paper presents a unique way to improve information organization and access in inter-organizational construction management systems based on methods for automated hierarchical classification of construction project documents according to CICSs items. In order to accomplish this goal, a combination of techniques from the areas of information retrieval and text mining was explored. As a result, a methodology for automated hierarchical document classification was devised and implemented. A prototype of a construction document classification system was also developed to provide easy deployment and scalability to the classification process. The developed prototype automated all steps of the text classification process. Experiments were conducted to validate the results and demonstrate the applicability of the implemented techniques.

2. Construction management information systems

The escalating globalization and complexity of construction projects have increased the participation of companies from diverse locations in project teams [3]. In this environment, effective inter-organizational construction management information systems able to minimize time and distance constraints are necessary. Examples of such systems are described extensively in literature [18,19,22,27,32,34,39]. In the distributed and dynamic construction environment, the ability to exchange and integrate information from different sources and in different data formats becomes crucial to the improvement of the construction processes supported by these systems. Simoff and Maher [26] argue that a key issue in managing construction information is the diversity of data types, including:

- structured data files, stored in database management systems or specific applications, such as data
warehousing, enterprise resource planning, cost estimating, scheduling, payroll, finance, and accounting;
- semi-structured data files, such as HyperText Markup Language (HTML), Extensible Markup Language (XML), or Standard Generalized Markup Language (SGML) files;
- unstructured text data files, such as contracts, specifications, catalogs, change orders, requests for information, field reports, and meeting minutes;
- unstructured graphic files stored in binary format, such as 2D and 3D drawings; and
- unstructured multimedia files, such as pictures, audio, and video files.

For instance, let us consider a typical construction situation where a construction manager wants to find all available information about one construction activity, say, placing concrete in a slab. He/she will probably find the drawings in computer-aided design (CAD) files, the cost estimates in files produced by cost estimation systems, the schedule in files generated by project management software, the specifications and contracts in text documents, the communications among project members in e-mail files, and price quotes in files collected from different websites. A major task is how to retrieve, classify, and integrate information in these different file formats, especially considering that the files can also be stored in different organizations, computers, or file systems.

Information integration methodologies have been investigated worldwide in order to improve information organization and access in inter-organizational construction management information systems. Teicholz [31] argues that project information should be integrated in three dimensions: “(1) horizontal integration of multiple disciplines that take part in a construction project; (2) vertical integration of multiple stages in the life cycle of a facility; and (3) longitudinal integration over time, which is also related with the capture of knowledge that allows improved performance or better decisions in the future.”

Fischer and Kunz [8] argue that technical and managerial strategies have been used to improve information integration. On the technical side, there are four approaches to achieve integration [21,40]: “(i) communication between applications; (ii) knowledge-based interfaces linking multiple applications and multiple databases; (iii) integration through geometry; and (iv) integration through a shared project model holding all the information relating to a project according to a common infrastructure model.”

The technical integration through a shared project model can be based on the creation of model-based systems using 3D/4D CAD [1] or on the use of distributed software architectures to facilitate the integration of decentralized project information [29,32]. The adoption of data standards can support these integration approaches. Examples of initiatives in this area are presented by Eastman [7], and include the ISO-STEP, the Industry Foundation Classes (IFC) created by the International Alliance for Interoperability [12], and the aecXML specification [2].

Currently, the majority of the architecture, engineering, construction, and facilities management (AEC/FM) information integration initiatives focus on structured data types. Nevertheless, Soibelman and Caldas [27] argue that a large percentage of the construction data is stored in semi-structured and unstructured files. Recent research work has addressed some of the issues related to unstructured data integration. Fruchter [9] describes tools to capture, share, and reuse project information. Garrett et al. [10] explore the use of text analysis for building up classifications of regulation sections. Wood [38] describes an approach to extracting concepts from textual design documentation. Brüggemann et al. [4] proposed the use of arbitrarily structured metadata to mark up documents. Scherer and Reul [24] use text clustering techniques to group similar documents and retrieve project knowledge from heterogeneous AEC/FM documents. Yang et al. [35] and Kosovac et al. [16] proposed the use of controlled vocabularies (thesauri) to integrate heterogeneous data representations. Since a great percentage of AEC/FM information is exchanged using text data files, the management of the information contained in these types of documents becomes crucial to construction information management.

3. Construction information classification systems

Construction management information systems generate a significant quantity of data that needs to
be organized, stored, accessed, and used by all project organizations. The increase in the amount and types of information generated and the construction industry’s subsequent reliance on it motivated the creation of classification standards that can cover the full scope of construction information. These standards enable the organization of project information and facilitate the communication between project organizations throughout the project’s life cycle.

The information classification standards created by the AEC/FM industry are called construction information classification systems [13]. They can be defined as a standard representation of construction project information. According to Kang and Paulson [13,14], a construction information classification system provides a common method for improving organization and coordination of information in construction projects. Examples of CICSs include the CSI MasterFormat [17], the CSI UniFormat [33], the Overall Construction Classification System (OCCS) [20], and Uniclass [14]. For instance, in OCCS, project facilities, constructed entities, spaces, elements, work results, products, process phases, process services, process participants, process aids, process information, and attributes are all defined in a standard manner. Therefore, CICSs provide a common framework for information organization and access in construction management information systems as well as knowledge dissemination, being an essential component in the integration of construction project information.

4. Automated hierarchical construction document classification

From the observations and problems presented in Sections 1 and 2, we can infer that information integration, organization, and access should be considered in construction management. Since a great percentage of the information exchanged among construction organizations is stored in text data files, the management of the information contained in these types of documents becomes essential. In order to improve the management of text-based information, an automated document classification method was devised and implemented. The method was designed according to the construction document classification process developed by the authors and described in Ref. [6]. The importance of this study is that automated document classification methods can be used to improve information organization and access in current information management systems and can serve as a foundation for the integration of construction documents in emerging model-based systems.

Experiments were conducted in order to evaluate the alternative methods that could be applied in each of the phases of the document classification process. The database selected for this evaluation was the Sweet’s Product Marketplace [30]. This database stores data from over 10,700 manufacturers and 61,300 products for the construction industry. Construction products are classified using the hierarchical structure of CSI MasterFormat [17] in this database. The experiments were conducted using 3030 randomly selected documents from the Sweet’s database. The goal was to verify the classification accuracy of the proposed automated document classification method, using the classification decisions already defined in the Sweet’s database as a benchmark. The selected documents were originally classified in the database according to a subset of 121 CSI MasterFormat items. These items were distributed according to the CSI MasterFormat classification hierarchy and were composed of 16 items on level one, 52 items on level two, and 53 items on level three.

The activity diagram of the proposed document classification process is presented in Fig. 1. The definition of the classes and the selection of the training positive, training negative, testing positive, and testing negative documents that will be used to create the classification models and verify their accuracy are the initial activities that should be conducted.

The documents used to create the classification models as well as the new documents to be classified are usually stored in different data formats including: word processor, spreadsheet, e-mail, HTML, XML, PostScript (PS), and Portable Document Format (PDF) files. In order to apply the classification algorithms, these files need to be converted to text file format. This is usually done using file converter systems in order to create a text version of each document, while keeping the original documents in their native formats and locations. The text versions can then be used in the remaining activities of the classification process.
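The first activity described above, selecting training and testing positives and negatives for one class, can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `select_documents`, the `train_fraction` and `seed` parameters, and the item labels are hypothetical and are not taken from the paper, which does not report how the documents were split.

```python
import random


def select_documents(labeled_docs, target_item, train_fraction=0.5, seed=0):
    """Split documents into training/testing positives and negatives for one class.

    labeled_docs: list of (doc_id, set of CICS items assigned to the document).
    train_fraction and seed are illustrative assumptions, not values from the paper.
    """
    # A document is positive for the class if it was assigned the target item.
    positives = [doc_id for doc_id, items in labeled_docs if target_item in items]
    negatives = [doc_id for doc_id, items in labeled_docs if target_item not in items]

    # Shuffle deterministically so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(positives)
    rng.shuffle(negatives)

    def cut(doc_ids):
        k = int(len(doc_ids) * train_fraction)
        return doc_ids[:k], doc_ids[k:]

    train_pos, test_pos = cut(positives)
    train_neg, test_neg = cut(negatives)
    return {
        "train_pos": train_pos,
        "train_neg": train_neg,
        "test_pos": test_pos,
        "test_neg": test_neg,
    }


# Toy collection: every third document belongs to the hypothetical item "concrete".
docs = [(f"doc{i}", {"concrete"} if i % 3 == 0 else {"masonry"}) for i in range(30)]
sets = select_documents(docs, "concrete")
print({name: len(ids) for name, ids in sets.items()})
```

Because one classification model is built per class, this selection would be repeated for every CICS item under consideration.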
The next two steps require decisions regarding removal of stopwords and stemming. Stopwords are frequent words that do not carry information relevant to text classification, such as conjunctions, prepositions, and pronouns. Stemming is the process of prefix and/or suffix removal to generate word stems. This is done to group words that have the same conceptual meaning. Our experiments revealed that the removal of stopwords, as well as the use of stemming algorithms, improves classification accuracy in most of the cases. The index terms were obtained in one of the steps of the document classification process. Therefore, predefined index terms were not used in the process.

According to Sebastiani [25], a major characteristic, or difficulty, of text classification problems is the high dimensionality of the feature space. Many classification algorithms cannot deal with such a large feature set, since processing is extremely costly in computational terms. Hence, in many cases, there is a need to reduce the original feature set, which is commonly known as dimensionality reduction (DR) or attribute selection in the pattern recognition literature.

Various DR methods have been tested in this research. These methods are grounded on concepts from the areas of information theory and linear algebra [36]. In our experiments, the information gain method gave satisfactory results. In the information gain method, the expected reduction in entropy caused by selecting a term that will be used to classify the documents is calculated for all terms that occur in the documents belonging to each class. Terms with highest information gain are selected. The information gain is calculated using the following formula:

\[
\mathrm{Gain}(T, C) = \mathrm{Entropy}(T, C) - \frac{N_{hasT}}{N_{total}}\,\mathrm{Entropy}(T, C_{hasT}) - \frac{N_{noT}}{N_{total}}\,\mathrm{Entropy}(T, C_{noT}),
\]

where Gain(T, C) is the information gain for term T in class C, and

\[
\mathrm{Entropy}(T, C) = -\frac{N_{pos}}{N_{total}} \log_2 \frac{N_{pos}}{N_{total}} - \frac{N_{neg}}{N_{total}} \log_2 \frac{N_{neg}}{N_{total}},
\]

\[
\mathrm{Entropy}(T, C_{hasT}) = -\frac{N_{poshasT}}{N_{hasT}} \log_2 \frac{N_{poshasT}}{N_{hasT}} - \frac{N_{neghasT}}{N_{hasT}} \log_2 \frac{N_{neghasT}}{N_{hasT}},
\]

\[
\mathrm{Entropy}(T, C_{noT}) = -\frac{N_{posnoT}}{N_{noT}} \log_2 \frac{N_{posnoT}}{N_{noT}} - \frac{N_{negnoT}}{N_{noT}} \log_2 \frac{N_{negnoT}}{N_{noT}},
\]

with:
N_{total} = total number of training documents in class C;
N_{pos} = total number of positive training documents in class C;
N_{neg} = total number of negative training documents in class C;
N_{hasT} = total number of training documents in class C that have term T;
N_{noT} = total number of training documents in class C that do not have term T;
N_{poshasT} = total number of positive training documents in class C that have term T;
N_{neghasT} = total number of negative training documents in class C that have term T;
N_{posnoT} = total number of positive training documents in class C that do not have term T;
N_{negnoT} = total number of negative training documents in class C that do not have term T.

The research demonstrated that the effectiveness of DR methods depends on the classification method used. For instance, the results for support vector machines [15] without dimensionality reduction were slightly better than when dimensionality reduction was used. Table 1 presents the classification accuracy results for support vector machines in different CSI MasterFormat levels without dimensionality reduction, as well as the best classification result obtained from the test cases where dimensionality reduction was used.

Table 1
Effect of dimensionality reduction on classification accuracy using SVM

                          Classification accuracy
CSI MasterFormat level    Without DR (%)    With DR (%)
Level 1                   95.88             94.33
Level 2                   91.53             88.64
Level 3                   86.37             83.17
All levels                92.05             89.53

Classification algorithms cannot directly interpret text documents. For this reason, a preparation and indexing procedure that maps a text document into a compact representation of its content needs to be uniformly applied to training and test documents. The vector space model was selected for document representation because the resulting model can be uniformly applied to the different classification algorithms analyzed. In the vector space model, vectors represent documents. The collection of documents is represented by an m × n term-by-document weighted frequency matrix A = {a_ij}, where a_ij is defined as the weight of a term i in document j. Each of the m
unique terms in the document collection is assigned a row in the matrix, while each of the n documents in the collection is assigned a column in the matrix. A non-zero element, a_ij, indicates not only that term i occurred in document j, but also the number of times the term appears in that document or its relative weight. Since the number of terms in a given document is typically far less than the number of terms in the entire document collection, the matrix A is usually very sparse. For each class (defined here as a CICS item), only the terms selected after the dimensionality reduction step are used to create the vector space model. An independent vector space model needs to be created for each class.

Several ways of determining the weights a_ij were investigated, including: Boolean weighting, absolute frequency, term frequency-inverse document frequency (tf-idf) weighting, and normalized term frequency-inverse document frequency (tfc) weighting [23]. These approaches were originally developed based on two empirical observations regarding text documents: (i) the more times a word occurs in a document, the more relevant it is to the subject of the document, and (ii) the more times the word occurs throughout all documents in the collection, the more poorly it discriminates between documents.

In Boolean weighting, a value of 1 is given to each cell, a_ij, in which the term i occurred in document j. In absolute frequency weighting, the cell a_ij value is given by the absolute frequency of the term i in document j. tf-idf weighting uses the following formula to calculate the cell values:

\[
\text{tf-idf}_{ki} = f_{ki} \log_2 \left( \frac{N}{d_k} \right),
\]

where: tf-idf_{ki} = the tf-idf weight of term k in document i; f_{ki} = the absolute frequency of term k in document i; N = the number of documents in the collection; d_k = the number of documents containing term k.

The reasoning behind the tf-idf weighting is that if the term occurs in many of the documents in the collection, then it does not serve well as a document identifier and should be given a low weight as a potential index term. In tfc weighting, the value for each cell a_ij is calculated by the formula:

\[
\mathrm{tfc}_{ki} = \frac{\text{tf-idf}_{ki}}{\sqrt{\sum_{s=1}^{T} \left( \text{tf-idf}_{si} \right)^2}},
\]

where: tfc_{ki} = the tfc weight of term k in document i; tf-idf_{ki} = the tf-idf weight of term k in document i; tf-idf_{si} = the tf-idf weight of term s in document i; T = set of all terms that occur at least once in the collection.

In tfc weighting, the values of tf-idf weighting are normalized to minimize the effect of length differences among documents. Our experiments demonstrated that these different weighting schemes have different classification accuracies. Table 2 presents the accuracy results in different CSI MasterFormat levels, using the index weighting methods previously described.

The machine learning algorithms used to create the classification models have their own data input format and requirements. Usually, their data input is made using text files containing the data that will be processed. The data transformation step aims to create the data input files required by the classification algorithms. Basically, the information from the vector space model is converted into the appropriate text file format.

Pattern classification algorithms are used to create the classification models. In this case, the classes are represented by the items of a Construction Information Classification System. Hence, construction document classification is defined as the task of assigning a Boolean value to each pair {d_j, c_i} ∈ D × C, where D is a domain of project documents and C is a set of CICS items (classes). A value of T (true) assigned to {d_j, c_i} indicates a decision that document d_j is related to item c_i, while a value of F (false) indicates that d_j is not related to item c_i.

Several algorithms were tested, including: naive Bayes, k-nearest neighbors, Rocchio, and support vector machines (SVM). Table 3 presents the classification accuracy results in different CSI MasterFor-

Table 2
Effect of the index weighting methods on classification accuracy

                          Classification accuracy by index weighting method
CSI MasterFormat level    Boolean (%)    Abs. frequency (%)    tf-idf (%)    tfc (%)
Level 1                   89.11          81.48                 82.98         95.88
Level 2                   78.89          65.12                 64.70         91.53
Level 3                   69.49          50.05                 50.32         86.37
All levels                80.58          67.83                 68.30         92.05
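The information gain criterion and the tf-idf and tfc weights defined in this section can be sketched in Python as follows. This is a minimal illustration, assuming the formulas as stated in the text (with the usual convention that 0 · log2 0 = 0); the function names and the toy counts are hypothetical and are not taken from the prototype.

```python
import math


def entropy(n_pos, n_neg):
    """Base-2 entropy of a set with n_pos positive and n_neg negative documents."""
    total = n_pos + n_neg
    result = 0.0
    for count in (n_pos, n_neg):
        if count > 0:  # convention: 0 * log2(0) is treated as 0
            p = count / total
            result -= p * math.log2(p)
    return result


def information_gain(n_pos_has, n_neg_has, n_pos_no, n_neg_no):
    """Gain(T, C) from the four document counts for term T in class C."""
    n_has = n_pos_has + n_neg_has
    n_no = n_pos_no + n_neg_no
    n_total = n_has + n_no
    gain = entropy(n_pos_has + n_pos_no, n_neg_has + n_neg_no)
    if n_has:
        gain -= (n_has / n_total) * entropy(n_pos_has, n_neg_has)
    if n_no:
        gain -= (n_no / n_total) * entropy(n_pos_no, n_neg_no)
    return gain


def tf_idf(term_freqs, n_docs, doc_freqs):
    """tf-idf weights for one document.

    term_freqs: {term: absolute frequency f_ki in this document}
    doc_freqs:  {term: number of documents d_k containing the term}
    """
    return {t: f * math.log2(n_docs / doc_freqs[t]) for t, f in term_freqs.items()}


def tfc(tf_idf_weights):
    """Normalize one document's tf-idf weights to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in tf_idf_weights.values()))
    if norm == 0:
        return {t: 0.0 for t in tf_idf_weights}
    return {t: w / norm for t, w in tf_idf_weights.items()}


# Toy example: a term that perfectly separates positives from negatives
# versus a term that carries no information about the class.
print(information_gain(2, 0, 0, 2))
print(information_gain(1, 1, 1, 1))

# Toy document in a collection of 4 documents.
weights = tf_idf({"concrete": 3, "slab": 1, "the": 5}, 4,
                 {"concrete": 1, "slab": 2, "the": 4})
print(weights)
print(tfc(weights))
```

A term that occurs in every document, like "the" in the toy example, receives a tf-idf weight of zero, which matches the reasoning given above for down-weighting terms that do not discriminate between documents.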
Whenever a new document needs to be classified, it must be projected into the multidimensional space of each of the existing classes considering the same data preparation options (e.g., use of the stemmer, index term weighting method). This projection must be done carefully, since the index terms in the document to be classified need to match the dimensions of the corresponding multidimensional space. If the new document vector is x_new, the classification decision for a given class is given by the sign of (w · x_new + b). A positive value means that the new document is related to this class. A negative value means that the new document is not a member of this class.

Since there are several classification models (one for each class), the new document needs to be projected in several multidimensional spaces. Therefore, this process needs to be repeated for each of the existing classification models.

5. Implementing automated hierarchical document classification

A prototype system, called the Construction Document Classification System (CDCS), was implemented in order to test the feasibility of the proposed approach. The system enables the classification of construction documents according to the specific classification items found in construction information classification systems. CDCS automates the steps involved in the document classification process previously described. It is currently composed of seven main modules: data selection, data conversion, dimensionality reduction, data preparation, data transformation, learning, and classification. The system was implemented in the programming language Java and uses Java Database Connectivity (JDBC) to communicate with a database management system (SQL Server). This database stores the data generated during the creation of the classification models; this data will also be used in the classification of new documents.

In CDCS, the classification structure can be defined according to a hierarchy of classes. For instance, considering the CSI MasterFormat [17] as the classification structure, the document is initially classified according to each element of the first level (CSI MasterFormat level one, Divisions). For the elements in the first level in which the classification decision was true (meaning that the document was related to that particular CSI MasterFormat level one item, or Division), the binary classification can then be conducted for the second hierarchical level (CSI MasterFormat level two). Following the same process, for
[Figure: Classification results. Axis: Accuracy. Values shown: 95.88%, 91.53%, 86.37%.]
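The top-down procedure described in this section, in which a document is tested against each level-one item and only the children of accepted items are tested next, can be sketched as follows. The sketch combines it with the linear decision rule sign(w · x_new + b) given earlier. It is an illustration only: the prototype was written in Java, Python is used here for brevity, and the class `ClassNode`, the function `classify`, the item labels, and all weights are hypothetical. Taking the dot product over each class's own terms stands in for projecting the document into that class's vector space.

```python
from dataclasses import dataclass, field


@dataclass
class ClassNode:
    """One CICS item with its own linear model (hypothetical structure)."""
    item: str                      # illustrative label, not an actual MasterFormat number
    weights: dict                  # term -> weight, the vector w restricted to this class's terms
    bias: float                    # the offset b
    children: list = field(default_factory=list)

    def decide(self, x):
        """Return True when w . x + b is positive for document vector x."""
        score = sum(w * x.get(term, 0.0) for term, w in self.weights.items())
        return score + self.bias > 0


def classify(x, nodes):
    """Top-down classification: descend only into items judged positive."""
    assigned = []
    for node in nodes:
        if node.decide(x):
            assigned.append(node.item)
            assigned.extend(classify(x, node.children))
    return assigned


# Toy two-level hierarchy with made-up weights.
concrete = ClassNode(
    "concrete", {"concrete": 1.0, "slab": 0.5}, -0.5,
    children=[
        ClassNode("concrete/cast-in-place", {"cast": 1.0, "pour": 1.0}, -0.5),
        ClassNode("concrete/precast", {"precast": 1.0}, -0.5),
    ],
)
masonry = ClassNode("masonry", {"brick": 1.0, "mortar": 1.0}, -0.5)

document = {"concrete": 0.8, "pour": 0.6}   # term -> weight of the new document
print(classify(document, [concrete, masonry]))
```

One consequence of this design is that a document rejected at level one is never tested against the lower-level items of that branch, so an error at the top of the hierarchy propagates downward while the number of classifiers evaluated per document stays small.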
The proposed methodology can also be used to improve the organization of and access to more unstructured text documents. It has been successfully tested in other types of construction documents, such as meeting minutes, requests for information, change orders, and design review documents.

6. Conclusions

In this paper, a methodology for automated hierarchical document classification was described and evaluated. Automatic hierarchical classification is part of an ongoing research project that aims to improve the organization of and access to unstructured text documents in construction management information systems and facilitate the integration of such documents in model-based systems. This is a very important issue for construction information management because a large percentage of project information is stored in text documents and these documents contain valuable information for decision-making, data analysis, and knowledge discovery.

The methodology supports the generation of classification models based on project information classification structures, such as construction information classification systems or project model objects. After creating these classification models, new construction documents can be effectively classified. The main characteristics of the proposed methodology are:

- It does not require the manual assignment of metadata (keywords or index terms) to all documents in the information system. Manual assignment of metadata is a tedious task. It is also hard to achieve consistency when a large number of users from different organizations are adding documents to the system.
- It does not need the utilization of a controlled vocabulary that would only be effective if it was accepted as a standard by the AEC/FM organizations and adopted by all users of a construction management information system.
- It uses already existing AEC/FM standards to define the categories that will be used for classification; and
- It facilitates the creation of automated mapping mechanisms from documents to project components.

Experiments were conducted to verify the classification accuracy for hierarchical classification structures. A construction products’ database, originally classified according to a hierarchical structure, was used in this analysis. The results demonstrated the effectiveness and applicability of automated document classification methods for construction management information systems. Examples of other problems that can benefit from the proposed automated classification method include: analysis of construction project documentation, organization of multimedia project inspection files based on their description, facilitation of automated access to project specifications in proactive project controls systems, identification of problem areas and potential causes of delays, cost overruns, or quality deviations, and generation of lessons learned that could be applied in future activities and projects.

Acknowledgements

The authors would like to thank the National Science Foundation for the support under the grant number 0201299.

References

[1] F.B. Aalami, M. Fischer, J.C. Kunz, AEC 4D-CAD production model: definition and automated generation. CIFE WP 052, 1998.
[2] aecXML, <http://www.iai-na.org/domains/aecxml/about/aecxml_about.html> (Aug 28, 2002).
[3] C.J. Anumba, N.F.O. Evbuomwan, A taxonomy for communication facets in concurrent life-cycle design and construction, Computer-Aided Civil and Infrastructure Engineering 14 (1999) 37–44.
[4] B.M. Brüggemann, K. Holz, F. Molkenthin, Semantic documentation in engineering, Proceedings of the ICCCBE-VIII, Palo Alto, CA, ASCE, Reston, VA, August, 2000, pp. 828–835.
[5] C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery 2 (2) (1998) 121–167.
[6] C.H. Caldas, L. Soibelman, J. Han, Automated classification of construction project documents, Journal of Computing in Civil Engineering 16 (4) (October 2002) 234–243.
[7] C.M. Eastman, Building Product Models: Computer Environments Supporting Design and Construction, CRC Press, Boca Raton, FL, USA, 1999.
[8] M. Fischer, J. Kunz, The circle: architecture for integrating software, Journal of Computing in Civil Engineering 9 (2) (1995) 122–133.
[9] R. Fruchter, A/E/C teamwork: a collaborative design and learning space, Journal of Computing in Civil Engineering 13 (4) (1999) 261–269.
[10] J.H. Garrett Jr., S.J. Fenves, D.M. Stasiak, A WWW-based regulation broker, CIB Proceedings Publication 198: Construction on the Information Highway, CIB, Rotterdam, 1996, pp. 219–230.
[11] J. Han, M. Kamber, Data Mining: Concepts and Techniques, Morgan Kaufmann, San Francisco, CA, 2001.
[12] IAI, <http://www.iai-international.org/iai_international/> (Aug 28, 2002).
[13] L.S. Kang, B.C. Paulson, Adaptability of information classification systems for civil works, Journal of Construction Engineering and Management 123 (4) (1997) 419–426.
[14] L.S. Kang, B.C. Paulson, Information classification for civil engineering projects by Uniclass, Journal of Construction Engineering and Management 126 (2) (2000) 158–167.
[15] T. Joachims, Text categorization with support vector machines: learning with many relevant features, Proceedings of ECML-98, Chemnitz, Germany, Springer, Berlin, 1998, pp. 137–142.
[16] B. Kosovac, T. Froese, D. Vanier, Integrating heterogeneous data representations in model-based AEC/FM systems, Proceedings of CIT 2000, Reykjavik, Iceland, CIB, Rotterdam, vol. 1, 2000, pp. 556–566.
[17] MasterFormat, MasterFormat 1995 Edition, Construction

tion. Technical Report IEI-B4-31-1999, Istituto di Elaborazione dell’Informazione, CNR, Pisa, Italy, 1999.
[26] S.J. Simoff, M.L. Maher, Ontology-based multimedia data mining for design information retrieval, Proc. of Computing in Civil Engineering, ASCE, Reston, VA, 1998, pp. 212–223.
[27] L. Soibelman, C. Caldas, Project extranets for construction management: the American experience, Proceedings of Entac-2000, May, 2000, Salvador, Brazil.
[28] L. Soibelman, H. Kim, Generating construction knowledge with knowledge discovery in databases, Journal of Computing in Civil Engineering, vol. 16 (1), ASCE, 2002, pp. 39–48.
[29] L. Soibelman, F. Peña-Mora, A distributed multi-reasoning mechanism to support the conceptual phase of structural design, Journal of Structural Engineering 126 (6) (2000) 733–742.
[30] Sweet’s, Sweet’s Product Marketplace, <http://sweets.construction.com/default.jsp> (Aug 28, 2002).
[31] P. Teicholz, Vision of future practice, Berkeley-Stanford Workshop on Defining a Research Agenda for AEC Process/Product Development in 2000 and Beyond, Stanford, CA, 1999.
[32] ToCEE - Towards a Concurrent Engineering Environment Project, The ToCEE client-server system for concurrent engineering. Final Report - ESPRIT Project No. 20587, 2000.
[33] UniFormat, UniFormat 1998 Edition, Construction Specifications Institute, Alexandria, VA, 1998.
[34] VEGA, Virtual Enterprises using Groupware Tools and Distributed Architecture - VEGA Project <http://cic.cstb.fr/ILC/ecprojec/vega/home.htm> (Aug 28, 2002).
Specifications Institute, Alexandria, VA, 1995. [35] M.C. Yang, W.H. Wood, M.R. Cutkosky, Data mining for
[18] W.J. O’Brien, Implementation issues in project web-sites: a thesaurus generation in informal design information retrieval,
practitioner’s viewpoint, Journal of Management in Engineer- Proceedings of the International Computing Congress, ASCE,
ing 16 (3) (2000) 34 – 39. Reston, VA, 1998, pp. 189 – 200.
[19] OSMOS, Open System for Inter-enterprise Information Man- [36] Y. Yang, J.O. Pedersen, A comparative study on feature se-
agement in Dynamic Virtual Environments-OSMOS Proj- lection in text categorization, Proceedings of ICML-97, 1997,
ect, < http://cic.vtt.fi/projects/osmos/index.html> (Aug 28, pp. 412 – 420, Nashville, TN.
2002). [37] S.A. Weiss, S. Kasif, E. Brill, Text Classification in USENET
[20] OCCS, Overall Construction Classification System, < http:// Newsgroups: A Progress Report, Department of Computer
www.occsnet.org> (Aug 28, 2002). Science, The Johns Hopkins University, Baltimore, MD,
[21] Y. Rezgui, Y. Brown, G. Cooper, J. Yip, P. Brandon, J. Kirk- 1997 (April).
ham, An information management model for concurrent con- [38] W.H. Wood, The development of modes in textual design
struction engineering, Journal of Automation in Construction data, Proceedings of the ICCCBE-VIII, Palo Alto, CA, CESE,
5 (4) (1996) 343 – 355. Reston, CA, 2000 (August), pp. 882 – 889.
[22] E.M. Rojas, A.D. Songer, Web-centric systems: a new para- [39] A. Zarli, Y. Rezgui, A survey of internet-oriented technologies
digm for collaborative engineering, Journal of Management in for document-driven applications in construction open dynamic
Engineering 15 (1) (1999) 39 – 45. virtual environments, Proceedings of CIT 2000-International
[23] G. Salton, C. Buckley, Term weighting approaches in auto- Conf., vol. 1, Construction Information Technology, Reykja-
matic text retrieval, Information Processing and Management vik, Iceland, 2000, pp. 1089 – 1101.
2 (5) (1988) 513 – 523. [40] Y. Zhu, R.R. Issa, Web-based construction document process-
[24] R.J. Scherer, S. Reul, Retrieval of project knowledge from ing via malleable frame, Journal of Computing in Civil Engi-
heterogeneous AEC documents, Proceedings of the ICCCBE- neering 15 (3) (2001) 157 – 169.
VIII, Palo Alto, CA, ASCE, Reston, VA, August, 2000, pp.
812 – 819.
[25] F. Sebastiani, Machine learning in automated text categorisa-