
Automation in Construction 162 (2024) 105376


Automated BIM generation for large-scale indoor complex environments based on deep learning

Mostafa Mahmoud a, Wu Chen a, Yang Yang a, Yaxin Li a,b,*

a Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, Hong Kong, China
b Micro Dimension Technology Limited, Hong Kong, China

Keywords: 3D reconstruction; Building information modeling (BIM); Semantic segmentation; Point clouds; Deep learning

Abstract: Large volumes of 3D parametric datasets, such as building information modeling (BIM), are the foundation for developing and applying smart city and digital twin technologies. Those datasets are also considered valuable tools for efficiently managing rebuilt structures during the operation and maintenance stages. Nevertheless, current approaches developed for the scan-to-BIM process rely on manual or semi-automatic procedures and insufficiently leverage semantic data in point clouds. These methods struggle to accurately represent large-scale indoor complex layouts and extract details from irregular-shaped unstructured elements, causing inefficiencies in BIM model generation. To address these issues, we propose an innovative scan-to-BIM framework based on deep learning algorithms and raw point cloud data, enabling the automatic generation of 3D models for both structured and unstructured indoor elements. Initially, we propose an enhanced deep learning neural network to improve the point clouds' semantic segmentation accuracy. Subsequently, an efficient workflow is developed to reconstruct 3D building models of structured indoor scenes. The proposed workflow can reconstruct large-scale data with multiple room layouts of Manhattan or non-Manhattan structures and reconstruct 3D models automatically by using a BIM parametric algorithm implemented in Revit software. Moreover, we introduce a robust method for unstructured elements to automatically generate corresponding 3D BIM models, even when the incorporated semantic information is incomplete. The proposed approach was evaluated on synthetic and real data for different scales and complexities of indoor scenes. The results of the experiments demonstrate that the improved model significantly enhances the overall semantic segmentation accuracy compared to the baseline models. The proposed scan-to-BIM framework is efficient for indoor element 3D reconstruction, achieving precision, recall, and F-score values ranging from 96% to 99%. The generated BIM models are competitive with traditional methods regarding model completeness and geometric accuracy.

1. Introduction

Over 75% of people worldwide reside in cities and spend over 90% of their time indoors [1], making indoor 3D models extremely important. The indoor parametric model with rich semantic and geometrical information is crucial for a wide variety of applications, such as planning for building maintenance and renovations [2], product lifecycle and emergency management [3], and location-based services [4]. The uses of 3D semantic reconstruction are more comprehensive than traditional 3D geometric reconstruction techniques, especially when analyzing and interpreting models [5]. Manual indoor digital model creation is slow and impractical for frequent revisions due to space and layout changes. There's a growing need for automated workflows to save time and resources while enhancing adaptability.

Regarding the AEC (architecture, engineering, and construction) related fields, BIM has emerged as a crucial and essential tool for facility management and the construction field. In contrast to conventional 3D models, such as point clouds or mesh models, which only contain spatial information, BIMs have additional information (such as topology, material, and function). Moreover, they digitally represent linked information, enhancing the efficiency of modification and maintenance operations. In the construction industry, BIM is widely used in planning, design, and construction, and recently, the focus of usage has changed from the early lifecycle stages of building to the maintenance or renovation of existing buildings [6,7]. The operation phase is responsible for more than 60% of the costs during their lifetime [8]. Considering that as-designed BIM is never available

* Corresponding author at: Department of Land Surveying and Geo-Informatics, The Hong Kong Polytechnic University, Hong Kong, China.
E-mail address: yaxin-roan.li@polyu.edu.hk (Y. Li).

https://doi.org/10.1016/j.autcon.2024.105376
Received 10 October 2023; Received in revised form 3 February 2024; Accepted 7 March 2024
Available online 18 March 2024
0926-5805/© 2024 Elsevier B.V. All rights reserved.

for old or converted buildings, the scan-to-BIM approach has been widely adopted to generate as-built BIM models of those facilities, which often involves manual efforts by specialist engineers [9]. However, this process is still labor-intensive and difficult to standardize. There are three stages in the conventional method for creating indoor BIMs for existing structures. The first stage is collecting 3D point cloud data with spatial and color information. The semantic analysis of the 3D point cloud is the next stage; the spatial information, such as coordinates and dimensions, and the attribute information, such as the materials and functionality of various construction components, make up this semantic information. In the last stage, skilled modelers manually produce as-built building information modeling (AB BIM) using semantic data [10–12]. Increased attention is now directed toward reconstructing indoor environments to address the limitations of existing reconstruction methods. These methods can be classified into primitive-based, geometry-based, and deep learning-based techniques. While these methods have made significant progress in the scan-to-BIM reconstruction process, several challenges still need to be addressed. (1) Automation levels remain relatively low, relying heavily on manual or semi-automatic approaches to create high-quality 3D semantic and BIM models [11,13]. (2) The present modeling techniques underutilize the semantic data inherent in point clouds, missing out on valuable priors that could substantially improve the resilience of point clouds against noise and incompleteness [14,15]. (3) The reconstruction of large-scale room layouts often lacks comprehensive representation, especially for complex irregular shapes, impacting the visualization quality [16,17]. (4) Current approaches fail to capture essential details of complete indoor scenes, resulting in unsatisfying extraction results for more complicated unstructured elements with irregular shapes, such as furniture [18,19].

Therefore, this study introduces an innovative and efficient deep learning-based scan-to-BIM framework to overcome these limitations. This framework is designed to automatically generate 3D models from scanned point clouds, incorporating both structured and unstructured elements seamlessly integrated into BIM systems. The study makes significant contributions within this framework, including a proposed semantic segmentation model and an automated reconstruction approach for structured and unstructured elements. Primarily, the study introduces an improved deep learning semantic segmentation neural network to enhance the identification and classification of various elements within scanned data. Additionally, a novel approach is presented, enabling the automatic reconstruction of 3D BIM with semantic features derived from 3D indoor point clouds. This approach significantly improves the precision and effectiveness of 3D building model reconstruction by employing data-driven techniques. Specifically designed for large-scale data with complex space layouts, it doesn't rely on prior information or specific room layouts. It incorporates a BIM parametric algorithm implemented in Revit, thereby streamlining the automation of the scan-to-BIM process. Furthermore, the study introduces a robust method to generate 3D BIM object models automatically for indoor unstructured elements while integrating semantic analysis. It involves semantic segmentation using deep learning and clustering techniques to extract object data and create automatic 3D BIM objects through an algorithm implemented in Revit software. This method defines spatial relationships among indoor objects, eliminating noise generated during the semantic segmentation.

The rest of this paper is structured as follows: We discuss previous studies that have utilized existing methods for modeling indoor structures and then introduce the methodology employed in our proposed framework. Each step considered in the workflow is analyzed and discussed. Also, the detailed outcomes of data collection and results are demonstrated. We conclude with recommendations for future research.

2. Related works

In recent years, many studies have examined how to reproduce the geometry and semantics of interior environments. These works can be categorized into primitive-based, geometry-based, and deep learning-based techniques.

2.1. Primitive-based approaches

Primitive-based approaches typically involve the utilization of shape descriptors and primitives to decompose an object into multiple shape primitives, which are combined to reconstruct the object. Nan et al. [20] presented a flexible parametric technique to find and match the appropriate model from a library of 3D models. Upon discovering a certain semantic class for a segmented object in the point cloud, every model is non-rigidly transformed via localized scale deformations. The model with the least registration residual is considered the best match. Florent et al. [21] provided a method for 3D reconstruction of primitive indoor models based on segmented point cloud data. Before combining them with semantics, their approach recovers several 3D shape representations of the underlying point cloud dataset. Xu et al. [22] proposed a style-content-separating strategy to create a new instance from the shape set while considering the shape style. The effectiveness of this strategy lies in the potency of their style clustering approach. By avoiding reliance on correspondence, this methodology reveals valuable insights within the dataset, laying the foundation for further analyses such as co-segmentation, part correspondence, and improved content classification. Additionally, other approaches utilize parametric modeling [23]. Overall, analyzing 3D shapes and parameters for objects with complex geometric structures poses challenges and may lead to inaccuracies, limiting the reliability of these approaches mainly to objects with simple structures.

2.2. Geometry-based approaches

The main objective of geometric modeling is to recover the complete 3D geometry of the interior environment. These methods entail modeling polygonal structural elements like walls, floors, ceilings, doors, and windows, along with the anticipation of room layouts. Their primary aim involves incorporating prior geometric knowledge like point normals, adjacency, and spatial distance within the feature extraction algorithm, which presents inherent challenges. The identification and classification of building planes and other features have relied on the region-growing technique [24]. Although leveraging geometric priors intends to boost the accuracy of feature extraction, aiming for enhanced 3D reconstruction outcomes, the approach encounters certain limitations. To minimize interaction during the 3D reconstruction, Ma et al. [25] produced an effective reconstruction from architectural drawings using matching and classification algorithms. However, due to irregularities in architectural representations, total automation for creating 3D building models from 2D architectural drawings has not been realized. Macher et al. [12] introduced a semi-automatic method for reconstructing existing structures' 3D models using point clouds. The point clouds were collected using laser scanners, and the technique also applied an opening-recognition strategy for window and door detection. While this procedure is primarily manual, automation would reduce the time spent on manual modeling and minimize user errors. Valero et al. [26] also produced a semantic BIM modeling workflow, introducing a semi-automatic scan-to-BIM method. The technique fully incorporates point clouds and visual images to extract and model geometric aspects of diverse interior components. However, the Manhattan assumption and planar theory are the foundations for wall reconstruction, leading to inaccurate wall recognition. Generally, these approaches can reconstruct geometric elements while maintaining semantic details, but their dependability is restricted to less intricate structures such as floors, walls, and ceilings. Moreover, they are unsuitable for large-scale point clouds or reconstructing irregular indoor objects.


Fig. 1. The proposed overall reconstruction framework.

2.3. Deep learning approaches

Recent studies have emphasized deep learning techniques for 3D shape classification and segmenting point clouds. Deep learning methods outperform primitive-based and geometry-based approaches in 3D model reconstruction because they can independently learn complex features, offer a deeper understanding of semantics, exhibit flexibility across various settings, and handle extensive point cloud data more efficiently [27]. Subsequently, significant strides have been made in deep learning-based point-cloud segmentation [28]. Two types of techniques are used: indirect deep learning methodologies, which include projection- and voxel-based methods, and direct deep learning methodologies, which include point- and graph convolution network-based methods. The voxel-based process entails converting the original point cloud into a 3D voxel format to enable point cloud classification or segmentation. The projection-based method comprises projecting the 3D point cloud onto a 2D image. Qi et al. [29] presented PointNet as a ground-breaking technique to address these issues and enable direct deep learning on point clouds. PointNet's success was a foundation for later algorithms like PointNet++ [30] and PointCNN [31]. The growth of graph neural networks also led to the creation of GCN-based methods for semantic or instance segmentation [32]. The GCN method improves performance by considering both points and edges in the neighborhood. DGCNN [33], SPG [34], and RandLA-Net [35] are a few related techniques that have made notable strides in this area.

Recently, researchers have utilized deep learning techniques in indoor scene reconstruction. Li et al. [36] provided a framework for generating indoor AB BIMs using low-cost RGB-D sensors' point clouds. Their deep learning-based 3D reconstruction of indoor spaces from RGB images and depth data is more efficient and cost-effective than traditional segmentation methods. However, their approach has limitations in extracting complex building components and reconstructing indoor objects. Perez-Perez et al. [37] introduced Scan2BIM-NET, employing a deep learning technique to segment structural and mechanical components. However, they trained CNN and RNN networks on a limited 83-room dataset with only five classes. Their study relies on a specifically created dataset that narrows the scope of model comparability, as broader assessments usually depend on commonly used datasets. Moreover, they didn't introduce any work related to automatic 3D BIM model generation for structured elements or reconstruction of 3D BIM for unstructured indoor objects. Park et al. [38] employed a deep learning-based framework to automate the recognition of construction objects and their attributes. However, limitations arise from their use of the PointNet++ model, resulting in reduced segmentation accuracy. They also didn't integrate their model into BIM software like Revit, opting for a model viewer in Python. Moreover, their study primarily focuses on outdoor areas, using specialized datasets, complicating comparisons with other scenarios. Furthermore, they didn't introduce automated 3D BIM model generation work for structured and unstructured elements. In the study of Park et al. [39], the primary focus revolved around reconstructing 3D models for indoor objects using deep learning techniques, specifically through semantic segmentation. However, their utilization of the PointNet network results in low semantic segmentation accuracy, leading to decreased precision in generating 3D models. Moreover, their reliance on a limited dataset restricted the diversity of dataset application and adaptability across various settings. Additionally, they employed manual modeling for door and window openings, while their parametric algorithm exhibited less flexibility in modifying different indoor objects of the same type. Huan et al. [40] suggested a multi-task neural network using geometry information to take RGB-D data and rebuild indoor scenes in a semantic 3D format. The layout of the room was estimated


Fig. 2. The proposed neural network structure of the deep learning model.

using a geometric extractor network. However, these learning-based techniques are challenging for large-scale scene reconstruction and can only be applied in tiny spaces with Manhattan buildings. In the study of Tang et al. [18], laser scanner point clouds automatically reconstruct an indoor BIM model. An initial deep learning point cloud classification algorithm processes the entire scene. To create the 2D binary map, geometric primitives are generated using a hybrid surface extraction method. Nevertheless, the final 3D model loses some geometric precision, particularly for curved structures. Moreover, this approach cannot reconstruct furniture like tables and chairs. Kim et al. [41] presented a method for generating 3D models with materials using point clouds and panorama images, yet it exhibits limitations. They used the PointNet segmentation model with limited accuracy and didn't have a comprehensive strategy for reconstructing structured building elements. Also, their approach doesn't handle large-scale data and heavily depends on segmentation outcomes for modeling opening elements, potentially leading to inaccuracies. Xiang et al. [42] utilized a deep learning model to generate 3D BIM models from 2D images, transforming them into 3D semantic point clouds. However, this approach presents limitations: it demands high computational resources, affecting efficiency, and yields lower geometric accuracy than laser scanner data. Its scope is confined to structural elements and smaller areas, omitting indoor objects and impeding its scalability to larger environments.

Hence, our proposed study introduces a deep learning-based scan-to-BIM framework, overcoming the previous limitations in generating 3D models from scanned indoor point clouds. It improves semantic segmentation accuracy and automates precise 3D BIM model reconstruction for large-scale data and various layouts. It also integrates semantic information to automatically reconstruct 3D indoor objects for BIM, addressing spatial relationships to overcome segmentation noise.

3. Methodology

The proposed overall reconstruction framework is shown in Fig. 1. The input of the framework is the raw indoor point clouds, and the output is the 3D BIM models for structural and non-structural elements, respectively. The reconstruction process encompasses three primary stages: semantic segmentation of the point clouds, 3D reconstruction of space-forming objects, and 3D reconstruction of space-occupying objects. Firstly, the framework starts with data preprocessing to optimize data for subsequent processing. Secondly, an enhanced deep learning model tailored to improve the efficiency and accuracy of semantic segmentation is thoroughly studied and explained. Thirdly, an innovative approach that uses 3D indoor point clouds to rebuild 3D building models automatically, emphasizing space-forming objects, is presented. The proposed approach uses semantic segmentation results and a density-based room clustering algorithm [43] to make it easier to separate data into distinct room patches. The room boundary line segments are detected by applying plane detection to the planar surfaces based on the random sample consensus (RANSAC) algorithm [44]. Point cloud plane projection and contour point extraction via the Alpha shape algorithm [45] are then applied. The line feature extraction step is applied to detect each wall line and deduce room corner points. Following this step is the wall opening detection to determine the wall surface locations. Finally, automatic 3D BIM modeling for the structured elements is produced using the parametric Dynamo algorithm [46] implemented in Revit software. Fourthly, a method that uses point clouds to automatically rebuild 3D indoor object models for space-occupying objects is also presented. The proposed method uses semantic segmentation results and extracts indoor classes. It then applies clustering to each class, and bounding boxes are created. The final step involves fine-tuning these bounding boxes to their optimal dimensions, subsequently creating automatic BIM models encapsulating the unstructured indoor elements.

3.1. Preprocessing of input point clouds

Point clouds generated from buildings are typically dense and large, making it impractical to work directly with the raw data due to the time consumption and the potential for computational errors, so it is necessary to subsample the point cloud. Subsampling does not affect the linear distribution of plane projections within the building and significantly improves computational efficiency by reducing the amount of data. This procedure subdivides the entire domain into small cube-like units known as voxels with identical grid spacing. Each voxel, which contains a specific number of points (n), is represented by the arithmetic average of the points within it. However, after subsampling, the dataset may still have outliers, such as scan-generated sparse spots. Based on the local distances from their neighbors, those points are identified in order to eliminate points from a given sphere with few neighbors around them [47]. The number of neighboring points and the radius parameter can be used to adjust and filter the data.
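The two preprocessing operations described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and parameter values are ours, all d feature columns (coordinates plus color) are averaged per voxel, and the outlier filter uses a brute-force distance matrix where a production pipeline would use a KD-tree.

```python
import numpy as np

def voxel_subsample(points: np.ndarray, voxel_size: float) -> np.ndarray:
    """Replace all points falling into one voxel by their arithmetic mean."""
    # Integer voxel coordinates for every point (uses only x, y, z).
    voxel_idx = np.floor(points[:, :3] / voxel_size).astype(np.int64)
    # Group points that share a voxel and average each group.
    _, inverse = np.unique(voxel_idx, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # guard against NumPy versions returning (N, 1)
    n_voxels = inverse.max() + 1
    sums = np.zeros((n_voxels, points.shape[1]))
    counts = np.zeros(n_voxels)
    np.add.at(sums, inverse, points)
    np.add.at(counts, inverse, 1.0)
    return sums / counts[:, None]

def radius_outlier_filter(points: np.ndarray, radius: float,
                          min_neighbors: int) -> np.ndarray:
    """Drop points with fewer than `min_neighbors` other points within `radius`."""
    diffs = points[:, None, :3] - points[None, :, :3]
    dists = np.linalg.norm(diffs, axis=-1)
    neighbor_counts = (dists < radius).sum(axis=1) - 1  # exclude the point itself
    return points[neighbor_counts >= min_neighbors]
```

As in the text, the two tunable knobs of the filter are the neighborhood radius and the minimum neighbor count; tightening either removes more of the scan-generated sparse spots.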


Fig. 3. The proposed improved local feature aggregation module: The top panel shows the local spatial encoding block and the attention pooling mechanism.

3.2. Semantic segmentation-based improved deep learning model

For large-scale regions, semantic segmentation of 3D point clouds is essential for reconstructing parametric indoor environments. RandLA-Net [35] is an efficient point cloud semantic segmentation neural network that employs an encoding-decoding approach. It uses a multi-layer perceptron (MLP) as its fundamental unit and adheres to random sampling and local feature aggregation principles. For use in large-scale interior scenarios, we developed an enhanced semantic segmentation neural network for improved effectiveness. This involved refining the original algorithm's sampling strategy, strengthening the local feature aggregation module, and integrating dilation convolution. Additionally, we introduced an innovative loss function and made suitable adjustments to the overall network structure. These modifications collectively contributed to enhancing the performance of the semantic segmentation process.

3.2.1. Neural network structure

As shown in Fig. 2, the proposed network structure comprises four parts: input, encoding, decoding, and output. The input layer receives the point cloud dataset, where each point cloud's size is N × d; N represents the number of points, and d is the feature dimension. The feature dimensions primarily encompass point cloud coordinates (x, y, z) and color values (r, g, b), with no incorporation of or reliance on point cloud intensity within the feature set. The improved local feature aggregation module (ILFA) and inverse density importance sampling (IDIS) are used in each encoding layer of the new model instead of the random sampling and local feature modules used in the original model. The network employs five encoding layers, where the data size is decreased by 25% and the feature dimensions gradually increase to retain more information from the 3D point cloud. The five decoding layers consist of multiple decoding levels that expand the point clouds and reduce the feature dimensions. Improving the model's accuracy does not depend on increasing the number of encoding or decoding layers, which would incur computational cost, overfitting, and model complexity; instead, it mainly depends on adjusting the learning feature modules, sampling, and loss function. The features from the encoding side are then combined through skip connections. The output layer comprises the last four layers, which are fully connected layers (FCLs), rather than the three layers in the original model. FCLs express the learned features from the previous layers to predict the semantics or labels for each input point. The final hidden layer is extended with a dropout layer to prevent overfitting. The predicted semantics of each point constitute the network's final output.

3.2.2. Sampling strategy

The deep learning model needs to perform a sampling step initially; this involves randomly downsampling the input point cloud to reduce the total points while preserving the overall structure. This operation reduces the computational cost and memory requirements of subsequent operations. However, it can quickly result in the loss of essential data. To address this problem and to overcome imbalanced sampling, we propose using the inverse density importance sampling (IDIS) method [48]. IDIS involves giving each point in the point cloud a weight based on its local density: low-density regions are assigned larger weights than high-density regions, and the reverse is true for points in high-density regions. When IDIS samples K points from the total N points, it reorders them according to their densities before choosing the first K points with the highest densities [35]. This strategy helps ensure that crucial point cloud regions are accurately represented in the training data, thus achieving better accuracy on testing data than utilizing the original random sampling.

3.2.3. Dilation convolution

In Fig. 3, within the dilated residual block of the aggregation module, we introduce a solution to the problem of losing both spatial and semantic features in the input point clouds. This is achieved through the utilization of dilated convolution. Dilated convolution is a unique structure of the convolution process that allows the convolution kernel's receptive field to be expanded without additional parameters. The


inclusion of dilated convolution helps enhance the model's ability to capture and retain important details while processing the point cloud data. Convolutions with variable dilation rates (r = 2, r = 3) are utilized to broaden the receptive area and preserve more geometric and semantic details. The enhanced model incorporates a series of components in the feature aggregation module. It includes local spatial encoding, attention pooling, and a multi-layer perceptron applied in the dilated residual block. These components are responsible for finding the nearest neighbors, encoding the positional data of related points, and enhancing the point cloud features. After sampling, the K-nearest neighbors (KNN) algorithm is employed in the local spatial encoding section to encode the points effectively. The attentive pooling approach automatically uses point clouds to learn spatial position information, preserving the point cloud's crucial characteristic information. The SoftMax function calculates the necessary parameters and weights for this pooling technique. Using the dilated residual block becomes crucial for preserving critical point clouds as much as possible. Additionally, it significantly enhances the semantic segmentation performance of the point clouds by efficiently preserving the intricate local spatial structure. We improved the local feature aggregation module to enhance the overall accuracy of semantic segmentation. This can be achieved by utilizing the dilation convolution block twice and integrating additional LSE, attention pooling, and spatial MLP units compared to the original local feature module. The combination of these modules (i) enables the model to aggregate and process local spatial information effectively, (ii) broadens each point's receptive area, (iii) minimizes the impact of data loss due to sampling, and (iv) utilizes the dilated residual block to preserve crucial semantic information during the feature aggregation process.

3.2.4. Loss function

The loss function, which serves as the optimization objective during training, directs the deep learning model's parameter updates. The model's ability to recognize appropriate patterns, effectively generalize to new data, and achieve the desired performance are all influenced by this function choice. The weighted cross-entropy loss is used in the original semantic segmentation model [35]. Although the weighted cross-entropy loss addresses class importance, it lacks effectiveness in distinguishing challenging instances. To tackle this, we propose employing weighted cross-entropy loss along with weighted focal loss. This fusion rectifies class imbalances by emphasizing challenging samples, ensuring equity in learning across different classes for the point clouds representing urban areas. Focal loss [49] is a beneficial tool, particularly in point cloud segmentation, where challenges associated with class imbalance and noisy labels are prevalent. Compared to cross-entropy loss, focal loss evens out the class imbalance by focusing on rare and wrongly classified samples. This creates a balanced concentration of common and uncommon classes. It effectively handles noisy or mislabeled data, reducing their impact during training and enhancing the model's resilience to labeling issues. This loss function offers flexibility through a customizable focusing parameter, balancing precision and recall based on segmentation task requirements. Integrating the focal loss in training expedites convergence and improves learning stability. This enables efficient learning from complex cases, leading to better generalization and better performance on unknown data. The weighted cross-entropy with focal loss function is defined as follows:

Loss_wce(y, ŷ) = − Σ_{i=1}^{n} α_i p(y_i) log(p(ŷ_i))    (1)

where y and ŷ stand for the actual and predicted class labels, α_i is the class-specific weighting factor, equal to α_i = 1/√(f_i), f_i refers to the frequency rate of the i-th class, and n represents the total count of classes.

Focal loss incorporates an additional factor of (1 − p_t)^γ into the regular cross-entropy criterion:

Loss_focal(p_t) = − α (1 − p_t)^γ log(p_t)    (2)

where p_t is the predicted probability, γ is the focusing parameter, and α is the balancing parameter.

In conclusion, the enhanced model utilizes a summation of the weighted cross-entropy loss and the focal loss as the combined loss function:

Loss_used = Loss_wce + Loss_focal    (3)

In our empirical experiments, we systematically investigated

3.3. Workflow of point cloud reconstruction of space-forming objects

The proposed method efficiently identifies the structured components within indoor buildings through the analysis of disordered point clouds, as depicted in Fig. 1. It leverages the outcomes of semantic segmentation and employs room clustering. Subsequently, the method detects room boundary line segments by employing planar surface detection. This step is complemented by extracting line features, distinguishing individual wall lines, and identifying precise room corner points. Following this step is the wall opening detection to determine the wall surface locations. Finally, using Revit software, the method generates 3D BIM models proficiently encapsulating the structured constituents.

3.3.1. Room clustering using a density-based algorithm

The deep learning semantic segmentation model is applied to the preprocessed large-scale point clouds, producing semantic points as described in Section 3.2. Point clouds can be divided into clusters using clustering algorithms in accordance with predetermined criteria. The three primary categories of clustering methodologies are k-clustering, hierarchical, and density approaches. The utilized clustering algorithm is density-based spatial clustering of applications with noise (DBSCAN) [43]. Our objective at this stage is to cluster the entire floor into several room regions, and the major feature employed by the clustering technique is the ceiling cluster of the segmentation result. The DBSCAN method can automatically determine the number of clusters, exhibits robustness to outliers, and performs effectively with clusters of varying shapes. The study used the DBSCAN technique to automate the scan-to-BIM process using 3D point clouds. During the clustering process, this technique involves pre-setting the values of epsilon and the minimum points. The epsilon parameter represents the radius distance from a core point acting as the center, with the minimum number of points falling
different combinations of α and γ values (α = [0.25, 0.5, 0.75] and γ = within the epsilon's boundaries.
[0.5, 1, 2, 5]). Our investigations revealed that the combination (α =
0.25, γ = 2) achieved optimal segmentation results for our point cloud 3.3.2. Room boundary line segment detection
dataset. While specific values may vary depending on dataset charac­ 3D line segments of different structures are crucial for human
teristics and segmentation complexity, our findings indicate that this perception and have various applications, including building outline
combination performs favorably for point cloud segmentation tasks. extraction and building model reconstruction [50]. While detecting line
segments from 2D images has been extensively studied [51,52], unlike
3.2.5. Feature aggregation module 2D images, where pixels have close relationships with their neighbors,
Fig. 3 shows the proposed improved local feature aggregation unorganized point clouds lack connectivity information. There is a rising
module. The essential characteristics of the 3D point cloud are extracted need for the concise and significant abstraction of point cloud data due
using this module. This module comprises local spatial encoding (LSE), to point clouds' irregular, unstructured, and unordered nature [14]. In
attentive pooling, and a dilated residual block. LSE plays a crucial role in this work, the detection of room boundary line segments involves three

6
M. Mahmoud et al. Automation in Construction 162 (2024) 105376

different datasets, its adaptability in handling concavity, and its expe­


dited processing time relative to alternative methods such as Delaunay
triangulation [54]. Fig. 5a and b show the projected plane of a room
cluster and extracted line segments, respectively.
Algorithm 1. Room Boundary Detection.

Input: clustered point cloud: { }


Distance threshold of RANSAC: { }
Number of iteration threshold of RANSAC: { }
Alpha shape parameter: { }
Output: 3D line segment points: 1
1: initialize output folder { }
2: define: function { 1 } to project points to a plane ( , { , , , }):
3: extract x, y, and z coordinates from { }
4: project each point onto the plane using projection equations.
5: return: projected point cloud { }
Fig. 4. Point cloud projection on a 3D plane surface. 6: end function
7: for each In folder:
steps: (i) detecting the 3D plane, (ii) projecting the point cloud onto the 8: for ( ≤ ):
9: apply RANSAC
plane surface, and (iii) extracting contour points. The detailed imple­ 10: if calculated distance ≤ { }:
mentation of the room boundary line segment detection is presented in 11: add the points to candidate plane inliers
Algorithm 1. 12: end if

RANSAC [44], a resilient model-fitting algorithm often compared to


13: select the best candidate plane { }
14: end for
linear regression in terms of its performance, is used to identify the 3D 15: extract and save plane parameters { , , , }
plane. RANSAC follows these essential steps: random subset selection 16: project { } onto the plane using { 1 }: { }= 1 ({ })
and model fitting, detecting outliers, and iteratively refining the model 17: calculate the alpha shape with { } value

to estimate plane surface parameters. RANSAC can perform a reliable


18: save extracted line segment points 1 to { }
19: end for
estimation of the plane surface parameters by carrying out these pro­ 20: return:
cedures. To ensure consistency among the point clouds within each
cluster, a process of projecting them onto a shared planar surface is
conducted, as illustrated in Fig. 4. For applying the projection of points 3.3.3. Line feature extraction
to a plane that has specific detected plane parameters, we can use Eq. (4) The process of line feature extraction involves the identification of
to get the projected points. wall-representing lines and the extraction of their endpoints. It involves
⃒ ⃒ ⎡⃒ ⃒
⃒ xp ⃒ ⃒ xi ⃒
⃒ ⃒ ⎤/
⃒A⃒ √̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅̅ two key aspects: detecting lines and extracting corner points. Initially,
⃒ ⃒ ⃒ ⃒ ⃒ ⃒
⃒ yp ⃒ = ⎣⃒ yi ⃒ − (A*xi + B*yi + C*zi + D)*⃒ B ⃒ ⎦
(
A2 + B2 + C 2
)
(4) contour points obtained from earlier stages are used to detect wall lines.
⃒ ⃒ ⃒ ⃒ ⃒ ⃒
⃒ zp ⃒ ⃒ zi ⃒ ⃒C⃒ This is accomplished through the application of the RANSAC algorithm
( ) within the x-y plane under a predetermined threshold. Each detected
Where (xi , yi , zi ) are coordinates of the input point cloud, xp , yp , zp line corresponds to a wall's length, while the average z-values are
are coordinates of the projected point cloud, and (A, B, C, D) are the used retained to denote ceiling height. The identified lines, displayed with
plane parameters. distinct colors in Fig. 5c, are methodically arranged in a logical order by
At the contour point extraction stage, the Alpha shape algorithm their orientation and their distance from the central points to each line's
[45] is applied to detect the line segment and get the contour points of center and intersected. To conclude, the intersection of these lines fa­
edges for each resulting planar projected cluster. This procedure in­ cilitates the extraction of corner points for each room cluster. For an in-
volves connecting the points with edges and forming a convex hull depth insight into the implementation of line feature extraction, Algo­
around them [53]. Noteworthy, the advantages of utilizing the alpha rithm 2 provides a comprehensive elucidation.
shape algorithm encompass its ability to generate intricate forms of

Fig. 5. Line detection and feature extraction: (a) clustered and projected plane of a room, (b) contour line segments, and (c) line detection and corner extraction.
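The plane-detection and projection steps of Algorithm 1 can be sketched in a few lines of NumPy. This is our own simplified illustration, not the authors' implementation; the function names `ransac_plane` and `project_to_plane` and all parameter values are assumptions.

```python
import numpy as np

def ransac_plane(points, dist_th=0.02, n_iter=200, seed=0):
    """Fit a plane A*x + B*y + C*z + D = 0 with RANSAC.
    Returns the plane parameters and a boolean inlier mask."""
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, None
    for _ in range(n_iter):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(n)
        if norm < 1e-9:                      # degenerate (collinear) sample
            continue
        n = n / norm
        d = -n @ sample[0]
        inliers = np.abs(points @ n + d) <= dist_th
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_model, best_inliers = (n[0], n[1], n[2], d), inliers
    return best_model, best_inliers

def project_to_plane(points, model):
    """Project points onto the plane, following Eq. (4)."""
    A, B, C, D = model
    n = np.array([A, B, C], dtype=float)
    t = (points @ n + D) / (n @ n)
    return points - np.outer(t, n)
```

Running `ransac_plane` on a roughly planar wall or ceiling cluster recovers $(A, B, C, D)$, after which `project_to_plane` flattens the cluster for contour extraction.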


Algorithm 2. Line Feature Extraction.
Input:  3D line segment points L1
        RANSAC distance threshold {d_th}
        RANSAC iteration threshold {N_it}
        number of lines parameter {N_l}
Output: line corner points C2
1:  initialize the line endpoints {E}, the best lines {B}, and the output intersection list {I}
2:  define function {f2} to sort lines in a specific order ({B}):
3:      calculate the center of {B} = {c}
4:      calculate the angle between {c} and the axis {x = 0} = {θ}
5:      sort {B} based on {θ}
6:      return the sorted lines {Bs}
7:  end function
8:  for (i ≤ {N_l}):
9:      for (j ≤ {N_it}):
10:         apply RANSAC
11:         if calculated distance ≤ {d_th}:
12:             add the points to the candidate line inliers
13:         end if
14:     end for
15:     add the line properties to {B}
16:     calculate {E}
17:     remove the inlier points from the remaining data
18: end for
19: calculate {Bs} using {f2}: {Bs} = f2({B})
20: for (i ≤ {N_l}):
21:     get the endpoints of {Bs}_i and {Bs}_(i−1)
22:     calculate the intersection point {p}
23:     add {p} to {I}
24: end for
25: return: C2
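The sorting-and-intersection logic of Algorithm 2 can be sketched as follows. This is our illustrative code, not the authors' implementation; the midpoint-angle ordering is a simplified stand-in for the paper's orientation-and-distance sorting.

```python
import numpy as np

def sort_lines(lines):
    """Order wall lines by the angle of their midpoints around the room
    center (a simplified stand-in for function f2 in Algorithm 2)."""
    mids = np.array([(a + b) / 2.0 for a, b in lines])
    c = mids.mean(axis=0)
    ang = np.arctan2(mids[:, 1] - c[1], mids[:, 0] - c[0])
    return [lines[i] for i in np.argsort(ang)]

def intersect(l1, l2):
    """Intersection of two infinite 2D lines, each given as a pair of points."""
    p, r = l1[0], l1[1] - l1[0]
    q, s = l2[0], l2[1] - l2[0]
    denom = r[0] * s[1] - r[1] * s[0]          # 2D cross product
    t = ((q[0] - p[0]) * s[1] - (q[1] - p[1]) * s[0]) / denom
    return p + t * r

def room_corners(lines):
    """Corner points from intersections of consecutive sorted lines."""
    L = sort_lines([(np.asarray(a, float), np.asarray(b, float)) for a, b in lines])
    return [intersect(L[i], L[i - 1]) for i in range(len(L))]
```

Applied to the four wall lines of a rectangular room, `room_corners` returns the four room corners; non-Manhattan rooms work the same way as long as consecutive walls are not parallel.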

Fig. 6. Histogram analysis method for opening detection.
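DBSCAN, used for the room clustering of Section 3.3.1 and again for the per-class object clustering of Section 3.4.1, can be written in a minimal, self-contained form. This sketch is illustrative only; a production pipeline would use an optimized implementation such as scikit-learn's DBSCAN.

```python
import numpy as np
from collections import deque

def dbscan(pts, eps, min_pts):
    """Minimal O(n^2) DBSCAN; returns per-point labels, with -1 for noise."""
    dists = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(row <= eps) for row in dists]
    core = [len(nb) >= min_pts for nb in neighbors]
    labels = np.full(len(pts), -1)
    cluster_id = 0
    for i in range(len(pts)):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster_id
        queue = deque([i])
        while queue:                      # expand the cluster from core points
            j = queue.popleft()
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster_id
                    if core[k]:
                        queue.append(k)
        cluster_id += 1
    return labels
```

Feeding the ceiling points of a floor into `dbscan` with an epsilon smaller than the gap between rooms yields one label per room, with the cluster count determined automatically.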

3.3.4. Wall opening detection


The identification and precise localization of building openings harness the potential of 3D point clouds. Typically, these openings are doors and windows, which are crucial components of a building's architecture, and their precise detection contributes to creating realistic 3D models of buildings [55,56]. The method used to detect wall openings, encompassing doors, windows, and structural openings, revolves around histogram analysis. As illustrated in Fig. 6, the histogram-based method delineates the steps to detect wall openings. First, a RANSAC-based clustering method is applied to treat each wall separately by extracting the planes with their corresponding projected points. Each wall is then transformed from the global point cloud coordinate system to a local wall coordinate system, where the x-axis

Fig. 7. Example of applying the histogram method to a wall: (a) histogram of projected points and (b) scatter plot for the detected wall points.
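The occupancy-histogram test illustrated in Fig. 7 can be sketched as follows. This is our simplified version; the along-wall coordinate `u`, the height coordinate `z`, the cell size, and the occupancy threshold are all assumptions.

```python
import numpy as np

def opening_mask(u, z, wall_len, wall_h, cell=0.1, min_pts=1):
    """Occupancy grid over a wall: u is the along-wall coordinate, z the
    height. Cells containing fewer than min_pts points are candidate
    opening cells (openings leave few or no points on the wall plane)."""
    nx = int(np.ceil(wall_len / cell))
    nz = int(np.ceil(wall_h / cell))
    hist, _, _ = np.histogram2d(u, z, bins=[nx, nz],
                                range=[[0.0, wall_len], [0.0, wall_h]])
    return hist < min_pts
```

Connected groups of `True` cells inside the wall extent would then be clustered and filtered by size and height-to-width ratio, as described in the text.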


Fig. 8. A visual flowchart for the Dynamo algorithm workflow for structured elements.

coincides with the wall length and the z-axis coincides with the wall height, as described in Eq. (5). Next, the projected data undergoes grid-based processing into smaller pixel sizes, guided by a predefined threshold. This process creates an estimated histogram for the projected wall, as illustrated in Fig. 7(a). A histogram analysis determines the presence of wall openings through an occupancy analysis [57,58]. Since openings are expected to be represented by only a few points in the histogram, this criterion serves as the basis for defining the openings, as depicted in Fig. 7(b). The detected openings are clustered and processed to remove incorrect clusters using conditions such as the number of points in each cluster, the minimum cluster length, and the minimum height-to-width ratio. Subsequently, the opening clusters are classified as door or window openings by comparing the lowest average elevation of each cluster with that of the wall, and their coordinates are transformed back into the point cloud coordinate system. Moreover, the door and window classifications derived from the segmentation results can be utilized, and additional post-processing of the projected point clouds is conducted to obtain the opening centers and dimensions.

$$\begin{bmatrix} x_l \\ y_l \\ z_l \end{bmatrix} = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} x_g \\ y_g \\ z_g \end{bmatrix} + \begin{bmatrix} x_o \\ y_o \\ z_o \end{bmatrix} \tag{5}$$

where $\theta$ represents the rotation angle, and $(x_l, y_l, z_l)$, $(x_g, y_g, z_g)$, and $(x_o, y_o, z_o)$ represent the local wall coordinates, the global point cloud coordinates, and the original translation vector, respectively.

Fig. 9. An illustration of erroneously detected bounding boxes for indoor objects: (a) table cluster and (b) chair cluster.

Fig. 10. An illustration of adjusted bounding boxes for indoor objects: (a) table cluster and (b) chair cluster.

3.3.5. Automatic BIM modeling of structured elements
The input data for this step consists of the final coordinates of the corner points and openings, exported to an Excel file for automatic BIM modeling. A more effective modeling method is based on a parametric algorithm designed with Dynamo visual programming [33]. It replaces the manual creation of the elements with repeatable operations and improves the productivity of the workflow in Revit software.

The visual flowchart for the Dynamo algorithm workflow is illustrated in Fig. 8. The algorithm is composed of consecutive groups; each group contains a number of nodes connected by links. The algorithm starts by importing the input data from a specific file and then assigning the data to separate coordinates. After that, it applies suitably programmed scripts with nodes, links, and Python code for creating the different elements of walls, floors, ceilings, and openings. Rather than requiring manual creation of the BIM model, this Dynamo algorithm automates the development of all structured elements efficiently.

3.4. Workflow of point cloud reconstruction of space-occupying objects

As mentioned in Section 2, previous research has attempted to tackle the challenge of automating scan-to-BIM processes, primarily focusing on modeling essential components such as walls, ceilings, and floors. However, objects within indoor areas, like furniture, still demand segmentation and modeling efforts. Modeling objects obstructed from view in the scanned point clouds presents an additional constraint. Geometric rule-based or segmentation-based approaches have been used to determine the point clouds' minimal bounding boxes by analyzing planar


Fig. 11. A visual flowchart for the Dynamo algorithm workflow for unstructured elements.

Table 1
Statistics for separate indoor spaces of the S3DIS dataset [64].
Area Area (m2) Volume (m3) Office Conf. Room Audit. Lobby Lounge Hallway Copy Room Pantry Open Space Storage WC Total

Area 1 965 2850 31 2 0 0 0 8 1 1 0 0 1 44


Area 2 1100 3065 14 1 2 0 0 12 0 0 0 9 2 40
Area 3 450 1215 10 1 0 0 2 6 0 0 0 2 2 23
Area 4 870 2780 22 3 0 2 0 14 0 0 0 4 2 47
Area 5 1700 5370 42 3 0 1 0 15 0 1 0 4 2 68
Area 6 935 2670 37 1 0 0 1 6 1 1 1 0 0 50
Total 6020 17,950 156 11 2 3 3 61 2 3 1 19 9 270

Fig. 12. Relationship between evaluation accuracy and loss function across training steps.
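The combined loss of Eqs. (1)-(3), whose behavior over training Fig. 12 tracks, can be written compactly as follows. This is a NumPy sketch of our own, not the training code; since the labels are one-hot, the factor $p(y_i)$ in Eq. (1) reduces to 1 for the true class.

```python
import numpy as np

def combined_loss(probs, labels, class_freq, alpha=0.25, gamma=2.0):
    """probs: (N, C) softmax outputs; labels: (N,) true class indices;
    class_freq: (C,) class frequency rates f_i. Implements Eqs. (1)-(3)."""
    eps = 1e-12
    pt = probs[np.arange(len(labels)), labels]     # predicted prob. of the true class
    w = 1.0 / np.sqrt(class_freq)                   # alpha_i = 1 / sqrt(f_i)
    loss_wce = -(w[labels] * np.log(pt + eps)).mean()                      # Eq. (1)
    loss_focal = -(alpha * (1.0 - pt) ** gamma * np.log(pt + eps)).mean()  # Eq. (2)
    return loss_wce + loss_focal                                           # Eq. (3)
```

Confident predictions on frequent classes contribute little, while wrong or rare-class predictions dominate the gradient, which is the intended rebalancing effect.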

Table 2
Comparison of the semantic segmentation models tested on S3DIS area 5.
Model Standard metrics Class IOU % (Area-5)

m-IOU (%) m-Acc (%) OA (%) Ceil. Floor Wall Beam Col. Win. Door Table Chair Sofa Book. Board Clutt.

Point-Net [29] 41.1 49 – 88.8 97.3 69.8 0.1 3.9 46.3 10.8 59.0 52.6 5.9 40.3 26.4 32.2
PointNet ++ [30] 53.5 – 83 89.5 97.2 70.0 0.0 20.9 38.0 33.4 70.2 75.5 39.8 58.3 47.0 41.6
Seg-Cloud [66] 57.4 48.9 – 90.1 96.1 69.9 0.0 18.4 38.4 23.1 70.4 75.9 40.9 58.4 13.0 41.6
Tangent-Conv [67] 52.6 – – 90.5 97.7 74.0 0.0 20.7 39.0 31.3 69.4 77.5 38.5 57.3 48.8 39.8
PointCNN [31] 57.3 63.9 85.9 92.3 98.2 79.4 0.0 17.6 22.8 62.1 74.4 80.6 31.7 66.7 62.1 56.7
SPG [34] 58.0 66.5 86.4 89.4 96.9 78.1 0.0 42.8 48.9 61.6 84.7 75.4 69.8 52.6 2.1 52.2
PointWeb [68] 60.3 66.6 87 92.0 98.5 79.4 0.0 21.1 59.7 34.8 76.3 88.2 46.9 69.3 64.9 52.5
PCT [69] 61.3 67.7 – 92.5 98.4 80.6 0.0 19.4 61.6 48.0 76.6 85.2 46.2 67.7 67.9 52.3
RandLA-Net [35] 62.4 71.4 87.2 91.1 95.6 80.2 0.0 24.7 62.1 47.7 76.2 83.7 60.2 71.1 65.7 53.8
SCF-Net [70] 63.7 71.8 87.2 90.8 97 80.9 0.0 19.9 60.7 44.6 79.4 87.9 76.1 71.1 68.8 50.4
Our model 64.5 72.5 87.9 91.9 97.1 81.3 0.0 28.6 61.9 47.8 79.6 87.6 69.8 70.0 71.6 52.6
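The OA, m-Acc, and m-IOU values reported in Table 2 can be computed from a per-class confusion matrix, e.g. as below. This is our illustrative sketch following Eqs. (6)-(8); per-class accuracy is computed as TP over the class support, the common realization of the mean-accuracy metric.

```python
import numpy as np

def segmentation_metrics(conf):
    """conf[i, j]: number of points of true class i predicted as class j.
    Returns overall accuracy, mean class accuracy, and mean IoU."""
    tp = np.diag(conf).astype(float)
    fp = conf.sum(axis=0) - tp          # predicted as class i but wrong
    fn = conf.sum(axis=1) - tp          # class i points missed
    oa = tp.sum() / conf.sum()
    m_acc = (tp / conf.sum(axis=1)).mean()
    m_iou = (tp / (tp + fp + fn)).mean()
    return oa, m_acc, m_iou
```

Accumulating one confusion matrix over the whole test area and calling `segmentation_metrics` once reproduces the style of summary shown in the table.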

object shapes. Nevertheless, these methods do not consider objects with diverse shapes or indoor facilities [59,60]. The implementation of deep learning-based methods is proposed as a solution to address these limitations. These techniques can surpass the constraints of traditional approaches, allowing for more comprehensive and accurate scan-to-BIM automation by accommodating various object shapes and indoor facilities. Developments in deep learning networks for 3D semantic segmentation enable effective detection of multiple classes


of indoor objects. This section introduces a method to automatically reconstruct 3D indoor BIM objects while incorporating semantic information from 3D point clouds. As depicted in Fig. 1, the proposed method utilizes the semantic segmentation results to extract the indoor classes. Subsequently, it applies clustering to each class, generating bounding boxes. The bounding boxes are adjusted, and automatic BIM models of the indoor unstructured elements are finally created.

3.4.1. Indoor class clustering and creating bounding boxes
After extraction of the indoor classes, a pivotal step involves clustering each class, thereby enabling the distinct modeling of individual cluster objects. The DBSCAN algorithm is suitable for this because it clusters without knowing the number of clusters in advance, requiring only a predefined epsilon and minimum number of points. The clustering methodology efficiently segregates each indoor class into discrete clusters, effectively grouping the point clouds. However, to proceed toward BIM model creation, these clusters necessitate transformation into comprehensive data encapsulating each object's position and dimensions. We employ rectangular parallelepiped bounding boxes for every cluster, encompassing the essential location and size data [61]; the boxes are formed from rectangular coordinates orthogonal to the x, y, and z axes. Fig. 9 illustrates two examples of extracted indoor classes (table and chair) transformed into separate clusters, along with the resulting primary bounding boxes for each cluster.

3.4.2. Adjusting the bounding boxes and extracting model points
One of the advantages of BIM technology is its spatial relationships and elemental connections. This is applied when modeling the 3D model in Revit, where wall surfaces connect the floor ("level 1") and ceiling ("level 2"). Furthermore, the door and window classes are modeled with connections to the wall element. It is essential to address the problem of primary box boundaries by increasing their sizes and adjusting the boundary box coordinates through connections to the host element. This correction ensures accurate box dimensions, allowing the model to choose an appropriate component for each object based on width, length, and height. Indoor classes like chairs, sofas, and tables are adjusted in the z-direction to align with the floor element.

Moreover, we can introduce conditions to eliminate incomplete or small parts, specifically those with lengths below the minimum dimension or volume of the object; see Fig. 9. Small data portions emerge due to the point cloud's limited visibility, which captures only the outer surfaces of objects. For instance, after clustering, a chair class might split into two groups, with a small segment that does not fit the chair's classification, as illustrated in Fig. 9(b). Removing the identified small cluster addresses this issue, refining the chair modeling accuracy. The cluster volume condition is integral to this refinement process, set at roughly 0.05 m³. This value is derived from the minimum dimensions, including parameters such as the minimum width (40 cm), minimum length (40 cm), and minimum height (30 cm).

As depicted in Fig. 10, refining the bounding boxes results in object dimensions that closely mirror the physical environment. These conditions facilitate this adjustment, ensuring realistic alignment of the indoor objects' sizes and insertion points. Subsequently, this step yields detected objects, each associated with its unique set of 3D bounding box coordinates. These coordinates encompass vital attributes such as the insertion point (x, y, z), width, height, and depth.

3.4.3. Automatic BIM modeling of unstructured elements
Autodesk Revit and the Dynamo tool can automatically generate the BIM model, leveraging the information obtained from the bounding boxes. While the bounding box data provides essential information such as object size and position, it does not indicate each object's joint or relationship status. To address this, a custom Dynamo algorithm was created to connect adjacent walls or floors to specific objects while aligning their dimensions with the data derived from the bounding boxes. Fig. 11 illustrates a visual flowchart for the Dynamo algorithm workflow for the unstructured elements. The algorithm starts by importing the 3D bounding box data as input and then assigning the data to separate insertion points and element dimensions. Then, to make the unstructured elements look realistic, the parametric algorithm imports standard items from the BIM library and uses position parameters to place them. Lastly, it adjusts the element dimensions based on the derived bounding box data.

Fig. 13. Comparison of evaluation metrics for different semantic segmentation models.

4. Results and discussion

4.1. Semantic segmentation model evaluation

Before testing the model using the evaluated point clouds, each data piece undergoes initial preprocessing, including subsampling and filtering. The preprocessing step uses specific parameters: a voxel size of 2 cm for subsampling, 6 neighboring points, and an outlier radius of 1 m for filtering the point cloud data. Subsequently, the improved semantic segmentation model was trained and tested using an available public indoor dataset. Further details regarding the dataset, along with the analysis of training and testing the model, are given in the following subsections.

4.1.1. Datasets for 3D point cloud segmentation
Datasets of 3D point clouds are crucial for deep learning-based semantic segmentation: they are essential for training and testing the network model and for evaluating its effectiveness and efficiency [62]. With the increasing adoption of various RGB-D cameras and lidars, numerous research groups have created public indoor 3D point cloud datasets. The Stanford large-scale 3D indoor spaces (S3DIS) dataset was used as a reference dataset to train and evaluate the model [63]. The dataset is made up of six major interior regions with 270 rooms in total. Each point in the dataset is assigned one of 13 semantic classes. The 12 named classes include furniture (sofa, table, and chair), office supplies (board and bookcase), and structural elements (floor, ceiling, beam, wall, column, door, and window); items not fitting into the predefined categories are annotated as "clutter". Table 1 illustrates the distribution of indoor spaces across the six major areas [64]. Area 5 of the S3DIS dataset is the most challenging, comprising the most varied and extensive data. Consequently, many researchers conduct test evaluations for semantic segmentation using area 5 as the test set and areas 1, 2, 3, 4, and 6 as the training set.

4.1.2. Segmentation model training evaluation
The training outcomes are depicted in Fig. 12, illustrating the correlation between segmentation accuracy and the number of iteration steps (depicted in orange). Additionally, the graph displays the relationship between the loss function and the number of iteration steps


Fig. 14. Visualization of semantic segmentation results on S3DIS datasets based on space-forming and space-occupying objects.

(depicted in blue). This graph indicates that the training accuracy started low and increased to a high value over the iterations, while the opposite occurred for the model loss function. This behavior is commonly observed in deep learning models, signaling the model's effectiveness and improvement during the training process. The segmentation accuracy reached its optimum level, 98%, when the number of training steps was 18,800, across 38 epochs. The network's accuracy reaches 96% at 50 epochs during training. After conducting 50,000 training iterations, the final training accuracy reached 96.2%, surpassing the training accuracy of the original model, which achieved 95.7% [35].

We employed the Adam optimizer with a maximum epoch count of 100 and

Fig. 15. The datasets used for evaluating our approach: (a) synthetic 1, (b) synthetic 2, (c) real collected 1, and (d) real collected 2.

500 training steps per epoch for model optimization. From its initial setting of 0.01, the learning rate decreased by 5% after each epoch. Processing the training set takes about thirteen hours, while processing the test set takes around three minutes. Each processed point cloud is composed of 3D coordinates, color values, and labels. The same machine, with an Intel Xeon(R) Silver 4216 @ 2.10 GHz CPU, 125 GB RAM, and a Tesla V100 PCIe 32 GB GPU, is used for all the experiments.

4.1.3. Segmentation model test evaluation
For easier comparison with the baseline model, we maintain the consistency of the evaluation measures. We use three metrics: mean accuracy (m-acc), mean intersection over union (m-IoU), and overall accuracy (OA) [65]. Eqs. (6)-(8) illustrate how these indicators are utilized to evaluate the efficacy of point cloud semantic segmentation.

$$\mathrm{OA} = \frac{TP + TN}{TP + FP + FN + TN} \tag{6}$$

$$\text{m-acc} = \frac{1}{1+k}\sum_{i=0}^{k}\frac{TP + TN}{TP + FP + FN + TN} \tag{7}$$

$$\text{m-IoU} = \frac{1}{1+k}\sum_{i=0}^{k}\frac{TP}{TP + FP + FN} \tag{8}$$

where TP, TN, FP, and FN denote the true positive, true negative, false positive, and false negative cases, respectively, and k + 1 denotes the total number of classes, including the empty and actual classes. The OA is the most straightforward metric, calculating the probability that each random sample's semantic annotation corresponds to the specific category of the actual data annotation; it evaluates the overall correct rate as the percentage of accurately predicted points. The mean accuracy averages the accuracy over the classes. The m-IoU is a crucial indicator of segmentation accuracy: the IoU measures how much each set's intersection and union differ from one another, and the m-IoU is the average of the per-category IoU values. The experimental findings, summarized in Table 2, confirm the efficacy and applicability of the suggested segmentation network. Evaluated with the widely used OA, m-acc, and m-IoU metrics, our improved model significantly outperforms the previous models in the comparison. In addition, Fig. 13 provides a visual comparison of key evaluation metrics among the various semantic segmentation models.

Fig. 14 demonstrates the improved accuracy of our network compared to the baseline model. It visually compares semantic segmentation results for structural outer regions (space-forming objects) and non-structural inner regions (space-occupying objects). We can


Fig. 16. The semantic segmentation results: (a, b) synthetic datasets and (c, d) collected datasets.

notice that the model improves the semantic segmentation accuracy for space-forming objects (the first three scenes) and space-occupying objects (the last three scenes). It is also noticeable that the results show better accuracy in delineating the boundary areas and edges of objects, especially space-forming objects, in contrast to the original model. The model better represents the borders of clusters like the ceiling, floor, and walls, which is essential for achieving high 3D modeling accuracy. The better performance of the suggested model is due to the improvements introduced by the proposed approach.

We optimized the point-cloud sampling strategy, resulting in a more precise representation of the input data and enabling the model to capture spatial intricacies more accurately. Furthermore, we introduced a local feature aggregation module working alongside dilated convolutions. This empowers the model to aggregate local context information and effectively understand the data, while the dilated convolutions expand the model's ability to capture fine-grained and global features, improving pattern recognition. We also fine-tuned the loss function by incorporating the focal loss, prioritizing critical aspects for efficient training. Structural adjustments in the neural network further refined the data processing, increasing its discriminative ability and resulting in superior evaluation metrics and outcomes.

4.2. Reconstructed 3D BIM model of structured elements

4.2.1. Dataset description
To assess the effectiveness and accuracy of the adopted approach, two distinct types of datasets were utilized: synthetic and collected. The synthetic datasets, depicted in Fig. 15a and b, represent 3D point clouds featuring diverse room layout structures. They were specifically designed to evaluate the performance of the indoor scene modeling algorithm, following reference [71]. The real laser scanner datasets, in turn, were gathered on the Hong Kong Polytechnic University campus. They were collected using the NavVis M6, a wheel-mounted indoor 3D mapping scanner, and comprise complete furniture arrangements with varying layout complexities. The first dataset (collected 1) was obtained from a large floor area, while the second dataset (collected 2) was acquired from a lecture room, as illustrated in Fig. 15c and d, respectively. These datasets were used to validate the model's performance in realistic environments.


Fig. 17. Line detection and corner extraction results: (a, b) synthetic datasets and (c, d) collected datasets.

4.2.2. 3D-created BIM model analysis
Fig. 16 shows the semantic segmentation results obtained with our improved deep learning model on both the synthetic and real datasets. The semantic classes are acceptable and can be used for reconstruction; refer to Appendix A for a visualization of the semantic segmentation excluding the ceiling element. The input point clouds were represented in (x, y, z, r, g, b) format, denoting the point coordinates (x, y, z) alongside the RGB color values. To apply the reconstruction workflow, the semantic segmentation result is used as described in the methodology section. The line detection and corner extraction results in Fig. 17 show that the RANSAC algorithm is applied to each room cluster to find a refined group of lines. Subsequently, corner points representing the room edges are detected and materialized through the intersections of consecutive lines.

The reconstruction algorithms effectively identified the anticipated lines, as shown in Fig. 17, and accurately determined the corner points through precise line sorting and intersection. In this figure, the detected lines are represented in blue, while the corner points are depicted in black. Employing a custom algorithm to compare the 3D model with the real world yielded consistently high evaluation metrics, illustrated in Tables 4 and 5. This validation confirms 100% accurate detection of lines and points in all datasets. If room clustering is not required, the projected wall class can also be used for line and corner detection. The door and window openings are detected using the previously explained method to obtain their insertion points and dimensions. The information deduced from the line features and openings is exported for automatic modeling. As shown in Fig. 18, the 3D BIM model was finally created by applying the proposed Dynamo parametric algorithm in Revit software version 2021. The model provides an excellent representation of the whole mapped area for the structural items and wall object surfaces.

The proposed workflow efficiently reconstructs small- or large-scale data encompassing various room layouts, including Manhattan-style and intricate non-Manhattan structural arrangements. The resulting BIM models include the walls, floor, ceiling, and wall openings; the ceiling is removed for better visualization. Table 3 reports the count of detected wall object surfaces in each evaluated dataset. For synthetic datasets 1 and 2, three doors and three windows were detected, matching the actual reference data and indicating 100% accuracy for the models developed from the synthetic datasets. For the first collected dataset, 22 doors were detected against 23 actual doors, indicating 95% detection accuracy. A potential explanation is that closed doors, undetected as openings in the semantic segmentation, were not identified as door classes; this issue needs to be addressed in future work. For the second collected dataset, 2 doors and 4 windows were detected, matching the actual data and producing 100% detection accuracy.

4.2.3. 3D BIM model evaluation
The suggested method was verified using both artificial and real-world data, assessed with the evaluation method introduced by reference [72]. This method calculates three representative metrics comparing the created and reference models: precision (correctness), recall (completeness), and F-score, as described in Eqs. (9)-(11).

$$P = \frac{\mathrm{Area}_{pred\,model} \cap \mathrm{Area}_{ref\,model}}{\mathrm{Area}_{pred\,model}} \tag{9}$$
M. Mahmoud et al. Automation in Construction 162 (2024) 105376
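The line-and-corner step described above (RANSAC line fitting per room cluster, then intersecting consecutive wall lines to obtain corners) can be sketched in a few lines. This is an illustrative reimplementation under our own assumptions, not the authors' code; the names `ransac_line` and `corner` and the tolerance values are ours:

```python
import random

def ransac_line(points, n_iter=200, tol=0.03, seed=0):
    """RANSAC fit of a 2D line a*x + b*y = c (with a^2 + b^2 = 1) to projected wall points."""
    rng = random.Random(seed)
    best_line, best_inliers = None, []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        a, b = y2 - y1, x1 - x2                      # normal of the sampled segment
        norm = (a * a + b * b) ** 0.5
        if norm == 0:
            continue
        a, b = a / norm, b / norm
        c = a * x1 + b * y1
        inliers = [(x, y) for x, y in points if abs(a * x + b * y - c) < tol]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers

def corner(l1, l2):
    """Room corner = intersection of two consecutive wall lines (None if parallel)."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-9:
        return None
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)

# Two perpendicular synthetic walls meeting at (2, 0):
wall_a = [(x / 10, 0.0) for x in range(21)]
wall_b = [(2.0, y / 10) for y in range(21)]
line_a, _ = ransac_line(wall_a)
line_b, _ = ransac_line(wall_b)
print(corner(line_a, line_b))
```

Sorting the fitted lines around each room boundary and intersecting neighbours in order then yields the ordered corner polygon used for parametric modeling.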

Fig. 18. The 3D BIM model results: (a, b) synthetic datasets and (c, d) collected datasets.

Table 3
The element extraction results for the doors and windows of the proposed method.

Dataset       Detected doors   Detected windows   Actual doors   Actual windows   Door accuracy (%)   Window accuracy (%)
Synthetic 1   3                3                  3              3                100                 100
Synthetic 2   3                3                  3              3                100                 100
Collected 1   22               17                 23             17               95                  100
Collected 2   2                4                  2              4                100                 100

Table 4
Evaluation of the reconstructed BIM models based on predicted areas.

Dataset       Intersection area (m2)   Predicted area (m2)   Reference area (m2)   Precision   Recall   F-score
Synthetic 1   78.650                   79.429                79.026                0.990       0.995    0.992
Synthetic 2   212.514                  219.283               212.637               0.969       0.999    0.984
Collected 1   126.860                  128.041               127.017               0.988       0.998    0.993
Collected 2   43.067                   43.068                43.164                0.999       0.997    0.998
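Eqs. (9)–(11) reduce to simple area ratios and can be checked directly against any row of Table 4. A minimal sketch (the helper name `evaluate_model` is ours; the sample values are the Synthetic 1 row):

```python
def evaluate_model(area_intersection, area_pred, area_ref):
    """Precision (Eq. 9), recall (Eq. 10) and F-score (Eq. 11) from plan areas in m2."""
    p = area_intersection / area_pred      # correctness
    r = area_intersection / area_ref       # completeness
    f = 2 * p * r / (p + r)                # harmonic mean of p and r
    return p, r, f

# Synthetic 1 row of Table 4:
p, r, f = evaluate_model(78.650, 79.429, 79.026)
print(round(p, 3), round(r, 3))  # 0.99 0.995
```

Small discrepancies with the tabulated F-scores can arise from rounding of the intermediate values.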


R = (Area_pred model ∩ Area_ref model) / Area_ref model   (10)

F-score = 2 × (P × R) / (P + R)   (11)

Precision (P) is the area of intersection between the true model and the reconstructed model divided by the total reconstructed area. In other words, it quantifies the proportion of correctly predicted model areas out of all the predicted model areas. Recall (R) is the ratio of the intersection area to the entire actual area, signifying the proportion of correctly predicted model areas out of all the actual model areas. The F-score is a comprehensive metric that combines precision and recall into a single value, offering a balanced measure of the model's performance. The F-score assigns equal weight to both metrics by calculating the harmonic mean of recall and precision. Its value is between 0 and 1, where 1 denotes excellent recall and precision, and 0 denotes poor model performance.

As the area is a crucial attribute of BIM elements, we conducted a

Table 5
Evaluation of the reconstructed BIM models based on predicted perimeters.

Dataset       Predicted perimeter (m)   Reference perimeter (m)   Error (m)   Accuracy (%)
Synthetic 1   84.092                    83.794                    −0.298      99.60
Synthetic 2   131.306                   129.103                   −2.204      98.30
Collected 1   150.663                   149.734                   −0.929      99.30
Collected 2   27.473                    27.492                    0.019       99.90

comparison between the area measurements of the extracted elements and their actual values. The corresponding results are presented in Table 4. The comparison is developed by applying the precision, recall, and F-score metrics. The calculated model evaluation metrics indicate highly accurate created models: the metric values are greater than 96%, indicating that the developed models are nearly identical to the manually created reference models. The precision ranges from 96.9% to 99.9%, which means the correctness of the created models compared to the corresponding reference models is high. The recall values range from 99.5% to 99.9%, which means the completeness of the developed models compared to the corresponding reference models is also high. Table 4 emphasizes the models' high accuracy, highlighting the F-score as an average accuracy indicator.

Additionally, we performed a comparison between the measured dimensions of all rooms and their actual values, which are recorded in Table 5. The comparison is developed by differencing the predicted perimeter and the actual perimeter of the reference model. The error in the estimated perimeter ranges from 0.019 m to 2.204 m, which is acceptable relative to the actual perimeter. The calculated error percentage is equal to the division of the error by the reference perimeter, and the accuracy is deduced by subtracting the error percentage from 100%. The calculated accuracy gives high values of more than 98%, indicating the closeness of the created model to the actual model. Further assessment of additional datasets is available in Appendix A.

4.3. Reconstructed 3D BIM model of unstructured elements

We used the second collected dataset, which was captured inside a lecture room, to apply the indoor object reconstruction method. As depicted in Fig. 19a, the inner view of this dataset with RGB color is shown, and Fig. 19b shows the semantic segmentation results. The outcome classes are table, chair, board, floor, ceiling, wall, door, window, and clutter. The indoor classes are extracted to be modeled automatically in the BIM model. These indoor classes are shown in Fig. 20, which includes tables in brown, chairs in green, and a board in blue. The

Fig. 19. Inner view of the second collected dataset: (a) point clouds with RGB values (b) semantic segmentation result.


Fig. 20. Indoor classes of the second collected dataset.

identified indoor classes are then organized into clusters via a density-based algorithm for generating individual bounding boxes for each cluster. The clustering code developed undergoes several steps: initial segmentation, bounding box computation, object-specific adjustments, and dimension calculation. Different parameter tests on pre-evaluated data ascertain the optimal settings for DBSCAN (eps = 0.15, min_samples = 10) to ensure precise clustering outcomes. These bounding boxes are subsequently refined, as previously detailed, to ensure the absence of erroneous elements and the accurate preservation of object-spatial relationships. These refined bounding box coordinates are the basis for computing each cluster's essential attributes, such as length, width, height, and insertion points. The resulting compiled data is then harnessed to facilitate the creation of models for each cluster, blending precision and efficiency.

Fig. 21 demonstrates the final 3D BIM model not only for the structured classes but also for the indoor unstructured object classes. Notably, Figs. 18d and 21 depict the identical 3D model space: Fig. 18d displays the structural elements in a 3D view, while Fig. 21 offers an overhead view emphasizing the indoor objects. The different size proportions are due to varied scaling and perspective choices made to highlight the indoor objects. This model is reconstructed automatically by applying the created indoor Dynamo scripts. For evaluating the reconstructed indoor objects, Table 6 is proposed, which contains the detected number of each cluster and the corresponding actual number. The detected numbers of boards and tables are 1 and 8, respectively, identical to the actual numbers, so the detection accuracy of the board and table classes is 100%. The detected number of chairs is 30, while the actual number is 31, giving this class a detection accuracy of 97%. This is due to the proximity of two chairs, leading them to be merged into a single cluster after the clustering algorithm was applied. This convergence is visually depicted in Fig. 20, where the chairs are represented as a unified cluster highlighted in a red box.

Fig. 21. Final 3D BIM model for the second collected dataset, including the created indoor objects.

Table 6
The element extraction results of indoor classes using the proposed method.

Class   Detected number   Actual number   Accuracy (%)
Board   1                 1               100
Table   8                 8               100
Chair   30                31              97

As described in Fig. 22, each modeled object is created utilizing the inserted bounding box data, which ensures that the modeled elements are much closer to the real-world objects. The right part of the figure explains the inserted Excel file of the bounding box dimensions, while the left part explains the object properties of the selected table after modeling. The marked red rectangle compares the selected table dimensions before and after the modeling process. As shown in this figure, the upper right chair type differs from the other types; this chair type cannot be changed automatically with the coded algorithm because semantic segmentation does not differentiate between different types of


Fig. 22. Dimensional comparison of the created model and the extracted cluster bounding box.

chairs. Moreover, the final automatic model does not consider the object element orientation; therefore, some elements need to be rotated for the final 3D model.

5. Conclusion and future work

Precisely representing actual buildings is vital for informed decision-making in building operation and administration. Obtaining the timely status information necessary for building administration is critical. However, the traditional methods for the scan-to-BIM process for existing buildings are characterized by time-consuming, labor-intensive, and expensive procedures. These methods face challenges in accurately representing large-scale indoor complex layouts and extracting details from irregular-shaped unstructured elements, creating inaccurate indoor BIM models. This study introduces an innovative deep learning-based scan-to-BIM framework to generate 3D models automatically from scanned point clouds. The models encompass structured and unstructured elements seamlessly integrated into the BIM system. The framework includes three primary stages: a proposed semantic segmentation model and automated processes for reconstructing structured and unstructured elements.

The proposed framework presents an improved deep learning segmentation network to enhance the identification and classification of various elements within scanned data. This method optimizes semantic segmentation in 3D point clouds, enhancing the accuracy of element recognition for subsequent reconstruction processing. In addition, the framework presents a novel approach that enables the automatic reconstruction of 3D BIM with semantic features derived from 3D indoor point clouds. This approach significantly enhances the accuracy and efficiency of reconstructing 3D building models using data-driven methodologies. It is specifically designed to reconstruct large-scale data with complex and diverse space layouts without the need for pre-existing information. The method utilizes a BIM parametric algorithm integrated into Revit, streamlining the automation of the scan-to-BIM process. Moreover, the framework presents a robust method to generate 3D BIM object models automatically for indoor unstructured elements, incorporating semantic analysis. The method extracts and modifies indoor object data through semantic segmentation and clustering, considering spatial relationships to ensure noise reduction during the segmentation process. The method ends with integrating the object data into a Revit parametric algorithm to automate the scan-to-BIM procedure. The proposed indoor 3D reconstruction method for structured elements gives promising precision, recall, and accuracy metrics. Results from these model evaluation metrics demonstrate highly accurate models, indicating their equivalence to manually created reference models. In addition, the results of the reconstruction method for unstructured elements indicate precise and automatic BIM modeling.

Despite the significant improvements in the BIM model's quality, some issues must be resolved before a BIM representation that closely resembles the actual data can be achieved. It is proposed to improve the density clustering algorithm, which relies on point density to group objects together. However, if objects of the same class are located remarkably close to each other, it becomes challenging to identify them as separate entities. Furthermore, the current approach does not account for variations in shape within a single class. To accurately select objects from a library containing diverse shapes, an algorithm needs to be developed to incorporate a broader range of shapes into the library and consider the varied object orientations. Also, the semantic segmentation model's training dataset must be modified to consider different chairs or tables as multiple classes rather than as a single class. Additionally, it is proposed to enhance the accuracy of the opening detection method and implement automatic editing for opening elements in the BIM model. Finally, the proposed framework cannot reconstruct structures with curved walls. Accurately identifying curved walls presents considerable issues because of the inherent complications of segmentation methods. Reconstruction errors result from the inability of RANSAC to recognize lines within the curved wall point clouds: this technique cannot capture the nuances of curved surfaces precisely. Incorporating shape-fitting algorithms for surfaces, ellipses, cylinders, and other primitives is crucial for more generalized frameworks that reliably reconstruct curved structures.

CRediT authorship contribution statement

Mostafa Mahmoud: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Wu Chen: Writing – review & editing, Validation, Supervision, Resources, Project


administration, Investigation, Funding acquisition, Conceptualization. Yang Yang: Writing – review & editing, Visualization, Validation, Software, Methodology, Investigation. Yaxin Li: Writing – review & editing, Visualization, Validation, Resources, Methodology, Investigation, Data curation.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

The authors do not have permission to share data.

Acknowledgements

This research received significant support from the Hong Kong Research Grants Council (RGC) and research funds provided by the Research Institute of Sustainable Urban Development at the Hong Kong Polytechnic University. Additionally, it received partial funding from the Research Institute of Land and Space at the Hong Kong Polytechnic University.

Appendix A. This appendix includes Fig. A1, displaying the semantic segmentation results derived from the primary datasets used for evaluation, excluding the ceiling element. Additionally, supplementary datasets are provided in this section to facilitate additional assessment.

Fig. A1. The semantic segmentation results for primary experiments excluding the ceiling element: (a, b) synthetic datasets and (c, d) collected datasets.


Fig. A2. Additional datasets used for evaluating our approach: (a) synthetic 3 and (b) real collected 3.

Table A1
The element extraction results of doors and windows for the additional datasets.

Dataset Detected doors Detected windows Actual doors Actual windows Door accuracy (%) Windows accuracy (%)

Synthetic 3 13 16 13 16 100 100


Collected 3 3 6 3 6 100 100

The first additional dataset (synthetic 3) is taken from the same synthetic data source as before, and the second dataset (collected 3) was captured in a lecture hall on the Hong Kong Polytechnic University campus, as shown in Figs. A2a and A2b. The semantic segmentation results obtained with our improved deep learning model are demonstrated in Fig. A3 for the two datasets.
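Beyond qualitative views such as Fig. A3, segmentation quality is commonly quantified with per-class intersection-over-union (IoU). The sketch below is illustrative only and is not the evaluation protocol reported in the paper; the helper names are ours:

```python
def per_class_iou(pred, true, classes):
    """IoU per semantic class from two parallel per-point label lists."""
    ious = {}
    for c in classes:
        inter = sum(1 for p, t in zip(pred, true) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, true) if p == c or t == c)
        ious[c] = inter / union if union else 0.0
    return ious

def mean_iou(ious):
    """mIoU: unweighted mean over the evaluated classes."""
    return sum(ious.values()) / len(ious)

ious = per_class_iou(["wall", "wall", "door", "floor"],
                     ["wall", "door", "door", "floor"],
                     ["wall", "door", "floor"])
# ious == {"wall": 0.5, "door": 0.5, "floor": 1.0}
```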


Fig. A3. The semantic segmentation results of the additional datasets: (a) synthetic 3 and (b) real collected 3.

Table A2
Evaluation of the reconstructed BIM models based on predicted areas for the additional datasets.

Dataset       Intersection area (m2)   Predicted area (m2)   Reference area (m2)   Precision   Recall   F-score
Synthetic 3   281.820                  286.400               283.750               0.984       0.993    0.988
Collected 3   101.768                  102.356               102.201               0.994       0.996    0.995


Fig. A4. Line detection and corner extraction results of the additional datasets: (a) synthetic 3 and (b) real collected 3.

Table A3
Evaluation of the BIM models based on predicted perimeters for the additional datasets.

Dataset Predicted perimeter (m) Reference perimeter (m) Error (m) Accuracy (%)

Synthetic 3 242.14 240.88 − 1.263 99.46


Collected 3 50.170 50.101 − 0.069 99.86

Fig. A4 shows the line detection results with the refined room lines and corners for the additional datasets. Door and window openings are also detected and exported for
model reconstruction. Fig. A5 displays a 3D BIM model created using our parametric algorithm in Revit. These models represent complete mapped
areas, encompassing structural items and wall surfaces. The workflow efficiently reconstructs data across various scales and room layouts without
relying on Manhattan assumptions. Table A1 details the number of detected wall object surfaces for each dataset. The doors and windows detected in
the evaluated datasets match the actual reference data, resulting in a 100% accuracy rate in the developed models. Results from calculated model
evaluation metrics indicate highly accurate models. The models demonstrate high precision (over 98%) and recall (over 99%), indicating correctness
and completeness, as specified in Table A2. Moreover, we compared the measured dimensions to their actual values, displayed in Table A3. Compared
to the actual perimeter, the predicted perimeter's error is within acceptable bounds, ranging from 0.069 m to 1.263 m. The measured accuracy exhibits
values over 99%, demonstrating how closely the developed model resembles the actual model.
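The error and accuracy computation described above can be written out directly (the helper name `perimeter_accuracy` is ours; the sample values are the collected 3 row of Table A3):

```python
def perimeter_accuracy(predicted, reference):
    """Signed error (reference - predicted, in metres) and accuracy = 100% - |error| / reference."""
    error = reference - predicted
    accuracy = 100.0 * (1.0 - abs(error) / reference)
    return error, accuracy

# Collected 3 row of Table A3:
error, acc = perimeter_accuracy(50.170, 50.101)
print(round(error, 3), round(acc, 2))  # -0.069 99.86
```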


Fig. A5. The 3D BIM model results of the additional datasets: (a) synthetic 3 and (b) real collected 3.

References

[1] U. EPA, Buildings and their Impact on the Environment: A Statistical Summary, Technical Report, US Environmental Protection Agency, 2009. https://archive.epa.gov/greenbuilding/web/pdf/gbstats.pdf (accessed on 20/03/2023).

[2] C. Chen, L. Tang, BIM-based integrated management workflow design for schedule and cost planning of building fabric maintenance, Autom. Constr. 107 (2019) 102944, https://doi.org/10.1016/j.autcon.2019.102944.

[3] A.A. Ahmed, M.M. Al-Shaboti, A. Al-Zubairi, An indoor emergency guidance algorithm based on wireless sensor networks, in: International Conference on Cloud Computing (ICCC), IEEE, 2015, pp. 1–5, https://doi.org/10.1109/CLOUDCOMP.2015.7149628.

[4] X. Tian, R. Shen, D. Liu, Y. Wen, X. Wang, Performance analysis of RSS fingerprinting based indoor localization, IEEE Trans. Mob. Comput. 16 (10) (2017) 2847–2861, https://doi.org/10.1109/TMC.2016.2645221.

[5] Y. Cui, B. Yang, P. Liu, L. Kong, A review of indoor automation modeling based on light detection and ranging point clouds, Sens. & Mater. 35 (2023) 247–268, https://doi.org/10.18494/SAM4211.

[6] D. Ilter, E. Ergen, BIM for building refurbishment and maintenance: current status and research directions, Struct. Surv. 33 (3) (2015) 228–256, https://doi.org/10.1108/SS-02-2015-0008.

[7] R. Tayeh, R.R.A. Issa, Interactive holograms for construction coordination and quantification, J. Manag. Eng. 36 (6) (2020) 4020079, https://doi.org/10.1061/(ASCE)ME.1943-5479.000084.

[8] A. Akcamete, B. Akinci, J.H. Garrett, Potential utilization of building information models for planning maintenance activities, in: Proceedings of the International Conference on Computing in Civil and Building Engineering 2010, 2010, pp. 151–157. https://open.metu.edu.tr/handle/11511/84666 (accessed on 21/08/2023).

[9] J. Jung, S. Hong, S. Jeong, S. Kim, H. Cho, S. Hong, J. Heo, Productive modeling for development of as-built BIM of existing indoor structures, Autom. Constr. 42 (2014) 68–77, https://doi.org/10.1016/j.autcon.2014.02.021.

[10] S. Hong, J. Jung, S. Kim, H. Cho, J. Lee, J. Heo, Semi-automated approach to indoor mapping for 3D as-built building information modeling, Comput. Environ. Urban. Syst. 51 (2015) 34–46, https://doi.org/10.1016/j.compenvurbsys.2015.01.005.

[11] R. Volk, J. Stengel, F. Schultmann, Building information modeling (BIM) for existing buildings—literature review and future needs, Autom. Constr. 38 (2014) 109–127, https://doi.org/10.1016/j.autcon.2013.10.023.

[12] H. Macher, T. Landes, P. Grussenmeyer, From point clouds to building information models: 3D semi-automatic reconstruction of indoors of existing buildings, Appl. Sci. 7 (10) (2017) 1–30, https://doi.org/10.3390/app7101030.

[13] P. Hübner, M. Weinmann, S. Wursthorn, S. Hinz, Automatic voxel-based 3D indoor reconstruction and room partitioning from triangle meshes, ISPRS J. Photogramm. Remote Sens. 181 (2021) 254–278, https://doi.org/10.1016/j.isprsjprs.2021.07.002.

[14] S.A. Bello, S. Yu, C. Wang, J.M. Adam, J. Li, Review: deep learning on 3D point clouds, Remote Sens. 12 (11) (2020) 1–34, https://doi.org/10.3390/rs12111729.

[15] H. Tran, K. Khoshelham, Procedural reconstruction of 3D indoor models from lidar data using reversible jump Markov chain Monte Carlo, Remote Sens. 12 (5) (2020) 838, https://doi.org/10.3390/rs12050838.

[16] K. Khoshelham, L. Díaz-Vilariño, 3D modeling of interior spaces: learning the language of indoor architecture, Int. Arch. Photogramm. Remote. Sens. Spat. Inf. Sci. 40 (5) (2014) 321–326, https://doi.org/10.5194/isprsarchives-XL-5-321-2014.

[17] M. Bassier, M. Vergauwen, Unsupervised reconstruction of building information modeling wall objects from point cloud data, Autom. Constr. 120 (2020) 1–52, https://doi.org/10.1016/j.autcon.2020.103338.

[18] S. Tang, X. Li, X. Zheng, B. Wu, W. Wang, Y. Zhang, BIM generation from 3D point clouds by combining 3D deep learning and improved morphological approach, Autom. Constr. 141 (2022) 104422, https://doi.org/10.1016/j.autcon.2022.104422.

[19] S. Ochmann, R. Vock, R. Klein, Automatic reconstruction of fully volumetric 3D building models from oriented point clouds, ISPRS J. Photogramm. Remote Sens. 151 (2019) 251–262, https://doi.org/10.1016/j.isprsjprs.2019.03.017.

[20] L. Nan, K. Xie, A. Sharf, A search-classify approach for cluttered indoor scene understanding, ACM Trans. Graphics (TOG) 31 (6) (2012) 1–10, https://doi.org/10.1145/2366145.2366156.

[21] F. Poux, R. Neuville, G.-A. Nys, R. Billen, 3D point cloud semantic modeling: integrated framework for indoor spaces and furniture, Remote Sens. 10 (9) (2018) 1412, https://doi.org/10.3390/rs10091412.

[22] K. Xu, H. Li, H. Zhang, D. Cohen-Or, Y. Xiong, Z.-Q. Cheng, Style-content separation by anisotropic part scales, in: ACM SIGGRAPH Asia Papers, 2010, pp. 1–10, https://doi.org/10.1145/1866158.1866206.

[23] Y. Zheng, Q. Weng, Model-driven reconstruction of 3-D buildings using LiDAR data, IEEE Geosci. Remote Sens. Lett. 12 (7) (2015) 1541–1545, https://doi.org/10.1109/LGRS.2015.2412535.

[24] C. Wang, Y.K. Cho, C. Kim, Automatic BIM component extraction from point clouds of existing buildings for sustainability applications, Autom. Constr. 56 (2015) 1–13, https://doi.org/10.1016/j.autcon.2015.04.001.

[25] Z. Ma, P. Cooper, D. Daly, L. Ledo, Existing building retrofits: methodology and state-of-the-art, Energ. Build. 55 (2012) 889–902, https://doi.org/10.1016/j.enbuild.2012.08.018.

[26] E. Valero, D.D. Mohanty, M. Ceklarz, B. Tao, F. Bosché, G.I. Giannakis, S. Fenz, K. Katsigarakis, G.N. Lilis, D. Rovas, A. Papanikolaou, An integrated scan-to-BIM approach for buildings energy performance evaluation and retrofitting, in: Proceedings of the International Symposium on Automation and Robotics in Construction, November 2021, pp. 204–211, https://doi.org/10.22260/isarc2021/0030.

[27] Y. Li, F. Wang, X. Hu, Deep-learning-based 3D reconstruction: a review and applications, Appl. Bion. Biomechan. 2022 (2022) 1–6, https://doi.org/10.1155/2022/3458717.

[28] Y. Guo, H. Wang, Q. Hu, H. Liu, L. Liu, M. Bennamoun, Deep learning for 3D point clouds: a survey, IEEE Trans. Pattern Anal. Mach. Intell. 43 (12) (2021) 4338–4364, https://doi.org/10.1109/TPAMI.2020.3005434.


[29] C.R. Qi, H. Su, K. Mo, L.J. Guibas, PointNet: deep learning on point sets for 3D classification and segmentation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, arXiv preprint arXiv:1612.00593, 2017, https://doi.org/10.48550/arXiv.1612.00593.

[30] C.R. Qi, L. Yi, H. Su, L.J. Guibas, PointNet++: deep hierarchical feature learning on point sets in a metric space, Adv. Neural Inf. Proces. Syst. 30 (2017). ISBN: 9781510860964, https://proceedings.neurips.cc/paper_files/paper/2017/hash/d8bf84be3800d12f74d8b05e9b89836f-Abstract.html (accessed on 12/12/2023).

[31] Y. Li, R. Bu, M. Sun, W. Wu, X. Di, B. Chen, PointCNN: convolution on X-transformed points, Adv. Neural Inf. Proces. Syst. 31 (2018). ISBN: 9781510884472, https://proceedings.neurips.cc/paper/2018/hash/f5f8590cd58a54e94377e6ae2eded4d9-Abstract.html (accessed on 08/12/2023).

[32] T.N. Kipf, M. Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907 (2016), https://doi.org/10.48550/arXiv.1609.02907.

[33] Y. Wang, Y. Sun, Z. Liu, S.E. Sarma, M.M. Bronstein, J.M. Solomon, Dynamic graph CNN for learning on point clouds, ACM Trans. Graphics (TOG) 38 (5) (2019) 1–12, https://doi.org/10.1145/3326362.

[34] L. Landrieu, M. Simonovsky, Large-scale point cloud semantic segmentation with superpoint graphs, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, arXiv preprint arXiv:1711.09869, 2018, https://doi.org/10.48550/arXiv.1711.09869.

[35] Q. Hu, B. Yang, L. Xie, S. Rosa, Y. Guo, Z. Wang, N. Trigoni, A. Markham, Learning semantic segmentation of large-scale point clouds with random sampling, IEEE Trans. Pattern Anal. Mach. Intell. 44 (11) (2021) 8338–8354, https://doi.org/10.1109/TPAMI.2021.3083288.

[36] Y. Li, W. Li, S. Tang, W. Darwish, Y. Hu, W. Chen, Automatic indoor as-built building information models generation by using low-cost RGB-D sensors, Sensors (Switzerland) 20 (1) (2020) 293, https://doi.org/10.3390/s20010293.

[37] Y. Perez-Perez, M. Golparvar-Fard, K. El-Rayes, Scan2BIM-NET: deep learning method for segmentation of point clouds for scan-to-BIM, J. Constr. Eng. Manag. 147 (9) (2021) 1–14, https://doi.org/10.1061/(asce)co.1943-7862.0002132.

[38] J. Park, Y.K. Cho, Point cloud information modeling: deep learning–based automated information modeling framework for point cloud data, J. Constr. Eng. Manag. 148 (2) (2022) 1–14, https://doi.org/10.1061/(asce)co.1943-7862.0002227.

[39] J. Park, J. Kim, D. Lee, K. Jeong, J. Lee, H. Kim, T. Hong, Deep learning–based automation of scan-to-BIM with modeling objects from occluded point clouds, J. Manag. Eng. 38 (4) (2022) 1–11, https://doi.org/10.1061/(asce)me.1943-5479.0001055.

[40] L. Huan, X. Zheng, J. Gong, GeoRec: geometry-enhanced semantic 3D reconstruction of RGB-D indoor scenes, ISPRS J. Photogramm. Remote Sens. 186 (2022) 301–314, https://doi.org/10.1016/j.isprsjprs.2022.02.014.

[41] S. Kim, K. Jeong, T. Hong, J. Lee, J. Lee, Deep learning–based automated generation of material data with object–space relationships for scan-to-BIM, J. Manag. Eng. 39 (3) (2023) 1–13, https://doi.org/10.1061/jmenea.meeng-5143.

[42] Z. Xiang, A. Rashidi, G. Ou, Integrating inverse photogrammetry and a deep learning–based point cloud segmentation approach for automated generation of BIM models, J. Constr. Eng. Manag. 149 (9) (2023) 04023074, https://doi.org/10.1061/jcemd4.coeng-13020.

[43] K. Khan, S.U. Rehman, K. Aziz, S. Fong, S. Sarasvady, DBSCAN: past, present and future, in: The Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT 2014), 2014, pp. 232–238, https://doi.org/10.1109/ICADIWT.2014.6814687.

[44] M.A. Fischler, R.C. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Commun. ACM 24 (6) (1981) 381–395, https://doi.org/10.1145/358669.358692.

[45] X. Xu, K. Harada, Automatic surface reconstruction with alpha-shape method, Vis. Comput. 19 (2003) 431–443, https://doi.org/10.1007/s00371-003-0207-1.

[46] W. Thabet, J. Lucas, S. Srinivasan, Linking life cycle BIM data to a facility management system using Revit Dynamo, Organizat. Technol. Manag. Construct. 14 (1) (2022) 2539–2558, https://doi.org/10.2478/otmcj-2022-0001.

[47] Point Cloud Outlier Removal. http://www.open3d.org/docs/release/tutorial/geometry/pointcloud_outlier_removal.html, 2024.

[51] B. Song, X. Li, Power line detection from optical images, Neurocomputing 129 (2014) 350–361, https://doi.org/10.1016/j.neucom.2013.09.023.

[52] E.J. Almazan, R. Tal, Y. Qian, J.H. Elder, MCMLSD: a dynamic programming approach to line segment detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 2031–2039. https://openaccess.thecvf.com/content_cvpr_2017/papers/Almazan_MCMLSD_A_Dynamic_CVPR_2017_paper.pdf (accessed on 11/05/2023).

[53] S. Asaeedi, F. Didehvar, A. Mohades, α-Concave hull, a generalization of convex hull, Theor. Comput. Sci. 702 (2017) 48–59, https://doi.org/10.1016/j.tcs.2017.08.014.

[54] D.T. Lee, B.J. Schachter, Two algorithms for constructing a Delaunay triangulation, Int. J. Comput. Inform. Sci. 9 (3) (1980) 219–242, https://doi.org/10.1007/BF00977785.

[55] S. De Geyter, M. Bassier, H. De Winter, M. Vergauwen, Review of window and door type detection approaches, Int. Arch. Photogramm. Remote. Sens. Spat. Inf. Sci. 48 (2/W1-2022) (2022) 65–72, https://doi.org/10.5194/isprs-archives-XLVIII-2-W1-2022-65-2022.

[56] K. Pexman, D.D. Lichti, P. Dawson, Automated storey separation and door and window extraction for building models from complete laser scans, Remote Sens. 13 (17) (2021) 3384, https://doi.org/10.3390/rs13173384.

[57] W. Shi, W. Ahmed, N. Li, W. Fan, H. Xiang, M. Wang, Semantic geometric modelling of unstructured indoor point cloud, ISPRS Int. J. Geo Inf. 8 (1) (2019) 9, https://doi.org/10.3390/ijgi8010009.

[58] G.-T. Michailidis, R. Pajarola, Bayesian graph-cut optimization for wall surfaces reconstruction in indoor environments, Vis. Comput. 33 (2017) 1347–1355, https://doi.org/10.1007/s00371-016-1230-3.

[59] L. Yang, J.C.P. Cheng, Q. Wang, Semi-automated generation of parametric BIM for steel structures based on terrestrial laser scanning data, Autom. Constr. 112 (2020) 103037, https://doi.org/10.1016/j.autcon.2019.103037.

[60] C. Thomson, J. Boehm, Automatic geometry generation from point clouds for BIM, Remote Sens. 7 (9) (2015) 11753–11775, https://doi.org/10.3390/rs70911753.

[61] Y. Matsuura, A. Hayano, K. Itakura, Y. Suzuki, Estimation of planes of a rock mass in a gallery wall from point cloud data based on MD PSO, Appl. Soft Comput. 84 (2019) 105737, https://doi.org/10.1016/j.asoc.2019.105737.

[62] B. Jiang, J. He, S. Yang, H. Fu, T. Li, H. Song, D. He, Fusion of machine vision technology and AlexNet-CNNs deep learning network for the detection of postharvest apple pesticide residues, Artif. Intelligen. Agricult. 1 (2019) 1–8, https://doi.org/10.1016/j.aiia.2019.02.001.

[63] I. Armeni, S. Sax, A.R. Zamir, S. Savarese, Joint 2D-3D-semantic data for indoor scene understanding, arXiv preprint arXiv:1702.01105 (2017), https://doi.org/10.48550/arXiv.1702.01105.

[64] Stanford 2D-3D-Semantics Dataset (2D-3D-S). http://buildingparser.stanford.edu/dataset.html, 2024 (accessed on 16/04/2023).

[65] J. Zhang, X. Zhao, Z. Chen, Z. Lu, A review of deep learning-based semantic segmentation for point cloud, IEEE Access 7 (2019) 179118–179133, https://doi.org/10.1109/ACCESS.2019.2958671.

[66] L. Tchapmi, C. Choy, I. Armeni, J. Gwak, S. Savarese, SEGCloud: semantic segmentation of 3D point clouds, in: 2017 International Conference on 3D Vision (3DV), 2017, pp. 537–547, https://doi.org/10.1109/3DV.2017.00067.

[67] M. Tatarchenko, J. Park, V. Koltun, Q.-Y. Zhou, Tangent convolutions for dense prediction in 3D, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, arXiv preprint arXiv:1807.02443, 2018, https://doi.org/10.48550/arXiv.1807.02443.

[68] H. Zhao, L. Jiang, C.-W. Fu, J. Jia, PointWeb: enhancing local neighborhood features for point cloud processing, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019, pp. 5565–5573. https://openaccess.thecvf.com/content_CVPR_2019/papers/Zhao_PointWeb_Enhancing_Local_Neighborhood_Features_for_Point_Cloud_Processing_CVPR_2019_paper.pdf (accessed on 20/06/2023).

[69] M.-H. Guo, J.-X. Cai, Z.-N. Liu, T.-J. Mu, R.R. Martin, S.-M. Hu, PCT: point cloud transformer, Comput. Vis. Media 7 (2021) 187–199, https://doi.org/10.1007/s41095-021-0229-5.

[70] S. Fan, Q. Dong, F. Zhu, Y. Lv, P. Ye, F.-Y. Wang, SCF-Net: learning spatial contextual features for large-scale point cloud segmentation, in: Proceedings of the
[48] F. Groh, P. Wieschollek, H.P.A. Lensch, Flex-Convolution: Million-scale point-cloud IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021,
learning beyond grid-worlds, in: Asian Conference on Computer Vision, 2018, pp. 14504–14513. https://openaccess.thecvf.com/content/CVPR2021/paper
pp. 105–122, https://doi.org/10.1007/978-3-030-20887-5_7. s/Fan_SCFNet_Learning_Spatial_Contextual_Features_for_LargeScale_Point_Clo
[49] T.-Y. Lin, P. Goyal, R. Girshick, K. He, P. Dollár, Focal loss for dense object ud_Segmentation_CVPR_2021_paper.pdf (accessed on 08/10/2023).
detection, in: Proceedings of the IEEE International Conference on Computer [71] C. Mura, O. Mattausch, A. Jaspe Villanueva, E. Gobbetti, R. Pajarola, Automatic
Vision, arXiv preprint arXiv:1708.02002v2, 2017, https://doi.org/10.48550/ room detection and reconstruction in cluttered indoor environments with complex
arXiv.1708.02002. room layouts, Comput. Graph. 44 (1) (2014) 20–32, https://doi.org/10.1016/j.
[50] P. Tian, X. Hua, W. Tao, M. Zhang, Robust extraction of 3D line segment features cag.2014.07.005.
from unorganized building point clouds, Remote Sens. 14 (14) (2022) 3279, [72] K. Khoshelham, H. Tran, D. Acharya, L.D. Vilariño, Z. Kang, S. Dalyot, Results of
https://doi.org/10.3390/rs14143279. the ISPRS benchmark on indoor modelling, ISPRS Open J. Photogramm. Remote
Sens. 2 (August) (2021) 100008, https://doi.org/10.1016/j.ophoto.2021.100008.