
International Journal of Architectural Heritage

Conservation, Analysis, and Restoration

ISSN: (Print) (Online) Journal homepage: https://www.tandfonline.com/loi/uarc20

Creating Historical Building Models by Deep Fusion of Multi-Source Heterogeneous Data Using Residual 3D Convolutional Neural Network

Wenfa Hu & Ruiqi Hu

To cite this article: Wenfa Hu & Ruiqi Hu (26 Jun 2023): Creating Historical Building Models
by Deep Fusion of Multi-Source Heterogeneous Data Using Residual 3D Convolutional Neural
Network, International Journal of Architectural Heritage, DOI: 10.1080/15583058.2023.2229253

To link to this article: https://doi.org/10.1080/15583058.2023.2229253

Published online: 26 Jun 2023.


Creating Historical Building Models by Deep Fusion of Multi-Source Heterogeneous Data Using Residual 3D Convolutional Neural Network

Wenfa Hu(a) and Ruiqi Hu(b)

(a) School of Economics and Management, Tongji University, Shanghai, China; (b) Manning College of Information and Computer Sciences, University of Massachusetts Amherst, Amherst, USA

ABSTRACT
There are a large number of historical buildings in the world, but many of them are seriously damaged. A fundamental task in revitalizing them is to record their details and redraw them, but doing so by hand is highly inefficient. This paper aimed to create three-dimensional (3D) models automatically for those damaged historical buildings while respecting the 1964 Venice Charter, the ICOMOS recommendations, and local ordinances. After reviewing the advantages and disadvantages of various measurement tools, we implemented a machine learning algorithm, the residual 3D convolutional neural network (CNN) method, to combine heterogeneous data from multiple sources, including a 3D laser scanner, an unmanned aerial vehicle, a total station, and a camera, in a historical building information model (HBIM) to increase its accuracy. Those data, preprocessed in four steps (data registration, noise reduction, data filtering, and data integration), were trained and fused in a 3D CNN model, which consisted of one input layer, five 3D convolutional and pooling layers, and two fully connected layers. A Shanghai historical building conservation project was taken as a pilot case to verify its productivity and accuracy in developing the HBIM, on which the architectural features were thoroughly recorded and the damaged details were recovered.

ARTICLE HISTORY
Received 5 December 2021; Accepted 20 June 2023

KEYWORDS
Convolutional neural network; deep fusion; heterogeneous data; historical building; historical building information model

1. Introduction

Historic buildings, built decades or hundreds of years ago, are important assets of their cities and states, attracting tourists and adding cultural perspectives and other qualities to their communities. They are permanent symbols of their history and culture, providing tangible reminders of the past (Walter 2014). But many historical buildings are at risk of collapse because of inevitable corrosion and inadequate maintenance; their structural materials are breaking down (Ozmen, Akan, and Unay 2011; Wang et al. 2019), and their facilities are worn out (Chen, Chiu, and Tsai 2018). The 1964 Venice Charter, the ICOMOS recommendations, and local ordinances require that their original and historical features be well conserved and restored. Therefore, their details should be carefully recorded before construction, and the damaged features should be completely restored after construction (Collepardi 1999). Since local ordinances require all rebuilding proposals to follow the latest building codes, which can potentially ruin the conservation of the historical features, construction proposals must be carefully evaluated beforehand. One obstacle to restoring historic buildings is the lack of architectural documents and drawings (Karimi et al. 2022). The traditional reverse engineering method for historical buildings, e.g. building information modeling (BIM) (Bastem and Cekmis 2022), depends on a large number of field investigations, measurements, and surveys, and this method is time-consuming (Clarke and Laefer 2014), expensive (Milic et al. 2019), inaccurate (Wang et al. 2020), and inefficient.

Modern technology provides some solutions to those problems. Three-dimensional (3D) laser scanning technology, as a new surveying and mapping technology (Jaafar et al. 2017), is widely used in creating 3D models of historical buildings because of its high efficiency, precision, and integration. However, practice reveals some shortcomings, e.g. the poor quality of the point cloud in the upper part of a high-rise building due to scanning distance or scanning angle, or blind areas where other objects block the view of the building top. An unmanned aerial vehicle (UAV) carrying multi-lens sensors can fly high to avoid those obstructions (Templin and Popielarczyk 2020). With the assistance of tilt photogrammetry technology, a UAV can survey and map a building continuously from various angles, so that point cloud data with the spatial and color information of building appearances are collected through a large number of image-matching calculations.

CONTACT Wenfa Hu wenfahu@tongji.edu.cn


© 2023 Taylor & Francis
Then, a 3D model can be created, and the model can be refined with more detailed information if the UAV takes more images from the top of the building (Lim et al. 2022), which makes this a fast, high-precision reverse modeling technology for historical buildings. Due to the flight altitude control of a commercial UAV, the angle limitation of the airborne camera lens, and blocking by ground trees or other buildings, there are still many errors or mistakes in the 3D models of a historical building, e.g. high quality at the top of the building model but poor quality at its bottom when a tree stands against the building bottom or flight paths are blocked by other objects. As an electronic theodolite, the total station is generally used in surveying and building construction for measuring angles and distances of control points quickly and accurately (Alizadeh-Khameneh et al. 2018), but a normal total station cannot record the rich details of a historical building. Though an advanced total station can depict a pictorial view of the historical building from a ground standpoint, it is more expensive and can hardly take detailed pictures of the building's upper parts.

None of those modern technologies is perfect for surveying or measuring a historical building. Since each has advantages over the others, they are complementary in the measurement of historical buildings, enhancing the surveying accuracy, complementing the missed data, and recovering the damaged details. But how to create a 3D building model by fusion of those heterogeneous data is a technical challenge (Patrucco et al. 2022). This paper presents a deep fusion framework based on a residual 3D convolutional neural network (CNN) algorithm (Zou et al. 2019) to develop a 3D building model, and then a historical building information modeling (HBIM) platform can be further introduced to refine 3D building models.

The remainder of this paper is organized as follows: Section 2 is a review of the literature on the reconstruction of a 3D building model, including HBIM, the fusion of multiple data, and the development of a 3D CNN algorithm. Section 3 presents a framework for creating 3D building models and the main data sources. The process, algorithm, and formation of multiple data fusions based on 3D CNN are presented in Section 4. Section 5 illustrates a pilot project of historical buildings in Shanghai, and Section 6 is the conclusion and future studies.

2. Literature review

2.1. 3D modeling methods for historical buildings

The HBIM, as an extension of BIM, is a new method to utilize remote sensing data to reconstruct 3D models of historical buildings (Bruno and Roncella 2019; Bruno, De Fino, and Fatiguso 2018). Sztwiertnia et al. (2021) claimed that HBIM was a system combining BIM and other modeling technologies like laser scanning and photogrammetry. Nieto-Julian, Anton, and Jose Moyano (2020) illustrated how the HBIM process was applied in a reverse-engineering solution. One key problem in developing HBIM is the mapping of parameterized objects of building elements onto laser scanning or photogrammetric data (Jordan-Palomar et al. 2018). Yang et al. (2020) noted that photogrammetry and laser scanning were the two main techniques for measuring memorial buildings, which are difficult to measure because of their valuable characteristics. Pu and Vosselman (2009) also mentioned that ground laser scanning was an important data source for the automatic reconstruction of building models. Lee et al. (2019) proposed a building modeling method based on information extracted from light detection and ranging (LiDAR), and Wu et al. (2018) proposed to do it on aerial images. Yang, Sheng, and Wang (2016) proposed to create a 3D building facade with fused terrestrial LiDAR data and optical images.

To create building models efficiently, Yang et al. (2019) developed an HBIM system to reconstruct historical building models with data from a terrestrial laser scanner and digital cameras. Cornelis et al. (2008) created 3D solid models and 3D/2D surface models for an existing object after considering its methods of structure and construction in an HBIM system, presenting a procedure to automatically generate 3D models of historical buildings with multi-spectral texture information from a 2D range camera. Leonov et al. (2015) proposed to combine a two-dimensional geographic information system (GIS) and a three-dimensional model in HBIM for the preservation of architectural heritage. Jo and Hong (2019) demonstrated that a three-dimensional solution, based on a simplified parametric model, was suitable for industrial elements and buildings and could be used for the conservation and management of heritage documents.

Previous literature reveals that the methods of creating 3D historical building models are time-consuming and inefficient; therefore, an automatic 3D modeling approach would be more productive for repairing and upgrading a historical building.

2.2. Heterogeneous data from multi-sources in surveying historical buildings

Although 3D laser scanners have many advantages, Cheng et al. (2013) claimed that they might not be suitable for rebuilding an entire castle since they were
not portable or fast enough, and the vast amounts of data they generated were difficult to record and visualize. To avoid those problems, Pesci, Teza, and Bonali (2011) combined aerial and ground images to build a building model based on multi-source data using existing floor plans, assisted by limited measurement data. Franceschi et al. (2015) proposed to combine LiDAR and photogrammetric data to investigate the extensional synsedimentary structures of the Early Jurassic in the Southern Alps of Italy, where airborne LiDAR data helped to obtain geological information of vegetation-covered areas and photogrammetry was applied to generate high-resolution 3D texture models for parts inaccessible to airborne LiDAR. Cheng et al. (2011) developed a parametric object prototype library based on historical building data in a cross-platform. Andriasyan et al. (2020) developed an automatic modeling system for historical buildings using point clouds and image measurement data. The HBIM process starts with the remote collection of measurement data using a ground laser scanner combined with digital photos and then designs and constructs an object parameterization library.

Martinez-Carricondo et al. (2021) found that the combination of HBIM and UAV photogrammetry was more efficient for the modeling and documentation of forgotten heritage than traditional drawing methods. As a new innovative technology, the UAV provides an economic, reliable, and direct method to capture building data effectively and sustainably (Zaragoza et al. 2017). To create models of building tops and sides, many researchers suggest the fusion of multi-source data, especially vehicle-borne and airborne LiDAR data. Rao and Liu (2020) proposed combining 3D laser technology with close-range photogrammetry to create a 3D model of the basement and murals of Santa Cristina, Italy. Martinez et al. (2012) utilized feature recognition to extract building elevation information, but a large number of manual interventions were still required to complete a 3D building model.

The data from multiple sources, e.g. a laser scanner, LiDAR, cameras, photos, and surveyors, used in upgrading a historical building are heterogeneous, and combining those data is an inefficient and error-prone job. Those heterogeneous data should be processed to generate 3D building models in a more efficient way.

2.3. Machine learning and deep learning for modeling historical buildings

Machine learning, a subclass of artificial intelligence, is self-learning based on data. Machine learning algorithms are divided into three categories, supervised, unsupervised, and reinforcement learning, and they are applied widely in detection, prediction, and generation. As a subcategory of machine learning, deep learning is a neural network with neurons in several layers between input and output, robustly providing automatic learning of features. Machine learning provides a practical platform for detecting features in images, texts, and languages. After reviewing several examples of photogrammetry and remote sensing carried out at Leibniz University Hannover, Heipke and Rottensteiner (2020) claimed that deep learning, particularly the CNN, significantly improved photogrammetric processing in surface reconstruction, scene classification, object extraction, and object tracking and recognition. Yang, Rottensteiner, and Heipke (2021) proposed a deep learning framework with a two-step strategy to verify land use information: the first step was to detect high-resolution aerial images with a CNN, and the second step was to classify land use with another CNN.

Machine learning is a promising technique in the conservation of historical buildings and heritage (Mesanza-Moraza, Garcia-Gomez, and Azkarate 2021). Egorova and Shcherbinin (2016) suggested creating virtual heritage replicas by 3D modeling to promote their preservation. After a systematic review of various machine learning techniques applied to historical buildings, Mishra (2021) highlighted research directions to improve the robustness of several predictive models, such as the determination of the mechanical properties of building materials, damage scenarios, and damage on building surfaces. Yu et al. (2022) proposed to use deep networks to automatically restore the destroyed historical records in the Dunhuang Mogao Grottoes in China. Karadag (2023) used conditional generative adversarial networks, a subset of machine learning, to predict missing and damaged parts of 200 historical buildings within early Ottoman tombs, and their documentation was generated accurately through 3D point cloud data. With the growth of 3D capturing devices such as 3D laser scanners and UAVs equipped with high-resolution cameras, photogrammetric 3D modeling has increased the productivity of modeling historical buildings. Galanakis et al. (2023) explored deep learning to detect the detailed boundaries of stones in the Apollo Temple at Delphi, reaching stone-level accuracy. Koutsoudis et al. (2021) presented a machine-learning approach to produce a large-scale 3D model of an urban area of 0.5 square km with a multilayer texture map from multispectral aerial images.
2.4. Heterogeneous data fusion by the CNN method

As a subset of machine learning designed for processing structured arrays of data, CNNs are at the heart of deep learning algorithms and are widely used in computer vision and language processing because of their superior performance. CNNs have three main types of layers: a convolutional layer, a pooling layer, and a fully connected layer. Song et al. (2020) concluded that the CNN was a good algorithm for multi-source heterogeneous data fusion. Janssens et al. (2016) utilized the fusion of multi-source heterogeneous data in a CNN model for bearing fault detection from vibration signals. Zhang et al. (2019) proposed a fusion of a 2D CNN and a 3D DenseNet for dynamic gesture recognition. Lv et al. (2019) developed a two-route CNN model to classify bank accounts with heterogeneous data, where suspicious bank accounts were identified from transaction data through the CNN model. Nti, Felix Adekoya, and Asubam Weyori (2021) proposed a new multi-source information fusion framework to predict stock prices based on a hybrid of a CNN structure and long short-term memory. Saba et al. (2019) developed a damage detection and recognition model for patient skin images based on a CNN. Li et al. (2017) proposed a novel efficient deep fusion CNN for multimodal 2D + 3D facial expression recognition.

Previous literature reveals that multi-source data methods based on CNNs have been widely applied in computer image recognition, where the images are faces, skins, and gestures; but how large-scale building images can be recognized to create 3D models is not yet clear, which demands the further study undertaken in this paper.

3. Research framework and data sources

3.1. Research framework for creating 3D building models

The framework for creating 3D models for historical buildings consists of four steps (Figure 1). The first step is to acquire heterogeneous data from various sources, including a 3D laser scanner on the ground, aerial cameras on a UAV, handheld cameras, and total stations. The data can be classified into three types: point cloud data, image data, and survey data. The second step is data preprocessing through a 3D laser scanning system, which produces a large amount of sparse point cloud data. The point cloud data from 3D scanners will be reduced and filtered; noise in the point clouds should be deleted, and some key points must be calibrated. After data preprocessing, a draft 3D point cloud model will be developed by data integration. The third step is to recognize various features of the historical building and fuse those data into a refined 3D building model with a deep learning algorithm, the residual 3D CNN, comprising input, 3D convolution, ReLU, maximum pooling, fully connected layers, and output. A refined building model can represent detailed building facades with features of color, texture, shape, and spatial relationship, and it also includes the spatial structure information and the 3D topological relationships of building components. The fourth step is to create a complete 3D building model on an HBIM platform, integrating architectural data, building components, and facilities.

Figure 1. Flow chart of deep fusion of multi-source heterogeneous data using residual 3D CNN.
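The preprocessing in the second step, thinning the raw scans and rejecting stray returns before integration, can be illustrated with a short NumPy sketch. This is a hypothetical minimal example (the function names, voxel size, and outlier threshold are ours, not from the pilot project); a production pipeline would also register the individual scans into a common coordinate frame first.

```python
import numpy as np

def voxel_downsample(points, voxel=0.05):
    """Reduce a point cloud by keeping one centroid per occupied voxel."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, inv = np.unique(keys, axis=0, return_inverse=True)
    counts = np.bincount(inv).astype(float)
    out = np.zeros((inv.max() + 1, 3))
    for d in range(3):  # centroid of the points that fall in each voxel
        out[:, d] = np.bincount(inv, weights=points[:, d]) / counts
    return out

def remove_outliers(points, k=8, std_ratio=2.0):
    """Drop points whose mean distance to their k neighbours is unusually large."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d.sort(axis=1)
    mean_knn = d[:, 1:k + 1].mean(axis=1)  # column 0 is the self-distance 0
    keep = mean_knn <= mean_knn.mean() + std_ratio * mean_knn.std()
    return points[keep]
```

The brute-force distance matrix keeps the sketch short; for millions of scanner points a spatial index (k-d tree or voxel hashing) would replace it.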
3.2. Data sources for modeling a historical building

The main data sources for modeling a historical building are a 3D laser scanner, a UAV, a total station, and field surveyors.

3.2.1. 3D laser scanning
The 3D laser scanning technology has been widely used in creating building facade models and interior decoration models of historical buildings. Its main contents are the reconstruction of facades and interior components.

(1) Digital models of building facades. The facades and roofs of a historical building are so unique, and its structure and components so complicated, that it is difficult to model with a forward parametric drawing method; it is instead suited to reverse modeling technology, e.g. 3D laser scanning. To solve the problems of complex spatial topological relations and mutual blocking among building components, a reverse modeling method first encapsulates the point clouds into surface patches, and then the holes in the patches, including internal holes, edge holes, and other data holes, are repaired.

(2) Digital models of building interior decoration and components. A historical building, e.g. the Jade Buddha Temple, has a large number of valuable decorations and components, including statues, pillars, shrines, wall reliefs, roof murals, special roof shapes, ancient beams and columns, and unique structures, whose models are only suitable for 3D laser scanning technology. Considering both data integrity and lightweight data, a data acquisition scheme of "first overall, then local" is often adopted: the internal structure of a historical building is first scanned in a panoramic fine view, and the building interior details are scanned at high precision afterward, so that the scheme ensures data integrity while reducing data volume.

3.2.2. UAV tilt photogrammetry
The UAV tilt photogrammetry technology is a new tool in surveying, mapping, and remote sensing, which combines traditional aerial photography and close-range measurement technology, breaking through the limitation of ortho-images taken only from a vertical angle and acquiring images with single or multiple lenses mounted on a flight platform at five different angles: vertical, front-looking, left-looking, right-looking, and rear-looking. A tilted camera on a UAV may have five lenses, three lenses, two lenses, or a single lens. Except for the single lens, the angles of the other lenses are fixed. A single lens on a UAV completes five-view photogrammetry by adjusting the angle of the camera lens.

In aerial photography, the UAV flies at an altitude higher than the building roof and its camera lens angle is fixed, so there are many blind spots because of the limitations imposed by ground trees and the view angles. The blind spots in photography cause some textures in a building model to be missed, which seriously affects the modeling results. For example, in UAV ortho-images, the ground resolution of the aerial photograph is inversely proportional to the UAV altitude. When the airborne camera focal length and the photograph pixels are fixed, as the distance between the ground object and the UAV increases, the aerial photograph resolution decreases continuously. Therefore, the resolution on a building top is higher but on its bottom is lower in a UAV aerial photograph, so the model texture on the building top is good but on its bottom is poor. Other surveying or photography tools are needed to tackle this problem.

3.3. Surveying by total stations and other methods

The total station is a combination of a digital theodolite and an electronic distance-measuring device, useful for measuring vertical and horizontal angles and slope distances. Data collected and processed in a total station can be downloaded to computers for further processing. A total station can provide more accurate measurements than photogrammetry tools, e.g. 3D scanners and UAVs, and it can be used to measure the distance between building reference points or to compensate for data missed by the photogrammetry tools.

4. Multi-dimensional data fusion based on CNN

The CNN is a typical hierarchical structure, which classifies and identifies data by automatically extracting features layer by layer. A 2D CNN is often used to classify still pictures, but it has limitations in feature extraction from serial data such as video and space because it cannot identify the spatial relationship between data in a sequence. A 3D CNN can capture the correlation between sequence data. Because its convolution kernel has one more spatial dimension than a 2D CNN, it has more network parameters, which increases the computational consumption, and its computation is slow.

In a three-layer convolutional CNN, the first convolution layer detects low-order features such as edges, angles, and curves. The input of the second convolution layer is the output of the first layer, and the second layer can detect combinations of low-order features such as circles and polygons. The input
of the third convolution layer is the output of the second layer, and the third layer can detect spatial features, e.g. spatial structure. With the increase of convolution layers, more high-order features are detected, which are more complex and abstract since they are combinations of lower-order features. 1D CNNs are mainly used for natural language and time-series models. 2D CNNs are mainly used in computer vision and image processing, where many CNN variants have emerged; they are the most commonly used method in various image processing competitions. 3D CNNs are used to process 3D data and mainly capture features from the spatial dimension and the temporal dimension when analyzing videos. Therefore, the 3D CNN is the most effective method to process hyperspectral remote-sensing images.

4.1. Method of 3D convolution

The input of a 3D CNN is a set of 3D swept cubes, from which features can be extracted on three scales simultaneously. Through the 3D convolution kernel, features can be extracted from successive frames, and feature cubes can be connected to multiple 3D models in the upper layer so that all the information on a whole-space piece can be captured.

The convolution kernel in a 3D CNN is a 3D cube. In the network, each feature cube in the convolution layer can be connected with several adjacent 3D figures from the previous layer and then operated on in convolutions. The 3D convolution process is shown in Figure 2. The position value of a feature cube is felt locally by convolution of the same position of multiple 3D graphics frames in the upper layer. The 3D convolution is defined by Equation (1):

V^{xyz} = \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} \sum_{r=0}^{R-1} w^{pqr} \, u^{(x+p)(y+q)(z+r)}    (1)

The position value of a feature map at the i-th layer and the j-th sequence at position (a, b, c) is V_{ij}^{abc}, represented by Equation (2):

V_{ij}^{abc} = f\left( d_{ij} + \sum_{k=1}^{m} \sum_{p=0}^{P_i-1} \sum_{q=0}^{Q_i-1} \sum_{t=0}^{T_i-1} Y_{ijk}^{pqt} \, V_{(i-1),k}^{(a+p)(b+q)(c+t)} \right)    (2)

where m is the number of feature maps; P_i and Q_i are the convolution kernel sizes, representing the length and width of the convolution kernel, respectively (T_i is its third dimension); Y_{ijk}^{pqt} is the convolution kernel connected with the (i-1)-th layer and the k-th sequence feature map; V_{(i-1),k}^{(a+p)(b+q)(c+t)} is the value of the (i-1)-th layer and the k-th feature map at the position (a+p)(b+q)(c+t); and d_{ij} is the offset between the i-th layer and the j-th feature map.

The down-sampling layer is usually used in conjunction with the convolution layer. Due to the limitation of the convolution kernel size, the size of the feature maps is limited after convolution processing, so it is necessary to compress the features of the feature maps effectively in a pooling layer. Because the convolution operation for large-scale images in large-scale data sets is too slow, the network training speed will be seriously affected; to improve the network training speed and reduce the complexity of the network, it is best to use these two layers together. The down-sampling operation has translation invariance, that is, the extracted features can remain unchanged while a target in the image moves by a small displacement, which enhances robustness to displacement. Therefore, the down-sampling layer is an effective way to reduce the dimensionality of data images; it reduces the redundancy of sample information in the process of dimensionality reduction, so it has a certain resistance to over-fitting. The sampling layer uses Equation (3):

y^{mnl} = \max_{0 \le i \le S_1, \; 0 \le j \le S_2, \; 0 \le k \le S_3} \left( x^{(m \cdot s + i)(n \cdot t + j)(l \cdot r + k)} \right)    (3)

Figure 2. 3D convolution process.
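Equations (1) and (3) can be checked with a direct, loop-based NumPy sketch of valid 3D convolution and non-overlapping 3D max pooling. This is an illustrative implementation only; it omits the bias and activation of Equation (2), and the function names are ours.

```python
import numpy as np

def conv3d_value(u, w, x, y, z):
    """Equation (1): response at (x, y, z) of one 3D kernel w over volume u."""
    P, Q, R = w.shape
    return sum(w[p, q, r] * u[x + p, y + q, z + r]
               for p in range(P) for q in range(Q) for r in range(R))

def conv3d(u, w):
    """Valid (no padding) 3D convolution built from conv3d_value."""
    P, Q, R = w.shape
    X, Y, Z = u.shape[0] - P + 1, u.shape[1] - Q + 1, u.shape[2] - R + 1
    v = np.empty((X, Y, Z))
    for x in range(X):
        for y in range(Y):
            for z in range(Z):
                v[x, y, z] = conv3d_value(u, w, x, y, z)
    return v

def maxpool3d(x, win=(1, 2, 2)):
    """Equation (3): non-overlapping 3D max pooling (stride equals window)."""
    s1, s2, s3 = win
    m, n, l = x.shape[0] // s1, x.shape[1] // s2, x.shape[2] // s3
    y = np.empty((m, n, l))
    for a in range(m):
        for b in range(n):
            for c in range(l):
                y[a, b, c] = x[a*s1:(a+1)*s1, b*s2:(b+1)*s2, c*s3:(c+1)*s3].max()
    return y
```

For example, convolving a 4 x 4 x 4 volume of ones with a 3 x 3 x 3 kernel of ones yields a 2 x 2 x 2 output in which every entry is 27, the kernel volume.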


INTERNATIONAL JOURNAL OF ARCHITECTURAL HERITAGE 7

Where s, t, and r are the sampling steps in three the convolution kernel in each of them is 3 × 3 × 3. The
directions, respectively; x is a 3D input vector; y is the kernel’s number increases from 16 to 256 in an order to
output after sampling. generate more types of high-level features by combina­
The main functions of the down-sampling layer are tions of low-level features. S1-S5 are the down-sampling
to reduce the image resolution, reduce the amount of layers, where the maximum pooling method is adopted
network operation data, and enhance the robustness of to reduce the resolution and scale of feature maps to
the network structure. However, the down-sampling decrease computation load and to tolerate more distor­
layer should not be added too frequently, because the tion of inputted images. The down-sampling layer uses
down-sampling will zoom the image size in multiple a 1 × 2 × 2 window to sample in the spatial dimensions.
scales and fuzz the feature information seriously. The D1 layer is a fully connected layer with 256 neurons,
The pooling layer is to extract features by where the feature cubes outputted from the S5 layer
a convolution operation. Although the parameters connect. The D2 layer is the second fully connected
have been reduced by previous layers, they are still layer and also the output layer. The number of neurons
relatively complicated in calculating when they are is 6, which is the same as the number of target classes.
directly inputted into a classifier. At this time, the pool­ Each neuron in the D2 layer is fully connected with 256
ing function is generally required to convolute and neurons in the D1 layer. After regression and classifica­
process the data so that the feature is mapped. It is a cluster operation, known as down-sampling, which means that the eigenvalues of a location's neighborhood are statistically analyzed and the location is updated by this value. The common pooling methods are average pooling and maximum pooling. The pooling layer generally has no filling (padding) operation, so the size after pooling is ((n - f)/s + 1) × ((n - f)/s + 1), where n is the input size, f is the pooling window size, and s is the stride.

4.2. Structure of a 3D CNN

To extract diverse information from a set of feature maps and improve their feature representation, the number of convolution kernels can be increased. The common criterion in designing a CNN is that deeper layers and more feature maps demand more convolution kernels in the convolutional layers, which can combine more low-level feature maps and generate more diverse high-level features. The structure of the 3D CNN is developed and shown in Figure 3, including one input layer, five 3D convolutional layers and five 3D pooling layers that are interlaced with each other, and two fully connected layers which yield the classified results.

The first layer is the input layer, composed of multiple 3D images. C1-C5 are the five convolution layers, and S1-S5 are the five pooling layers; after classification by a classifier, the behavior category is marked and obtained in the output.

4.3. Process of a 3D CNN

The building images are inputted into the network, which produces pictures of various sizes whose measurements are invariant. After that, the process goes through three stages. In the first stage, a constructed image pyramid is sent to the Phase I network to obtain a boundary box vector and the building frame, where the building model is calibrated by the regression of boundary box vectors, and the candidate frames with high coincidence are merged by non-maximum suppression. In the second stage, the building images are sent to Phase II, where improper building models that do not meet the requirements are continuously rejected; the overlapped candidate frames are merged by boundary box regression and non-maximum suppression, and fewer boundary boxes remain. In Phase III, boundary box regression and non-maximum suppression continue to identify the building areas, and the final boundary box and key point positions are obtained. The visualized process on a historical building is shown in Figure 4.

Figure 3. The 3D CNN structure. (Input → C1: 16@3×3×3 → S1: 1×2×2 → C2: 32@3×3×3 → S2: 1×2×2 → C3: 64@3×3×3 → S3: 1×2×2 → C4: 128@3×3×3 → S4: 1×2×2 → C5: 256@3×3×3 → S5: 1×2×2 → D1: 256 → D2 → Output.)
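The shapes implied by the Figure 3 stack can be traced with the pooling formula (n - f)/s + 1. The sketch below assumes "same"-padded convolutions and non-overlapping 1×2×2 pooling windows; the paper does not state the padding or strides, so these are assumptions.

```python
# Shape propagation through the C1..C5 / S1..S5 stack of Figure 3.

def pool_out(n, f, s):
    """Output length of one axis after pooling: (n - f) / s + 1."""
    return (n - f) // s + 1

def propagate(shape, channels=(16, 32, 64, 128, 256)):
    """Return (channels, depth, height, width) after each conv+pool pair."""
    d, h, w = shape
    shapes = []
    for c in channels:
        # conv 3x3x3 with "same" padding: spatial size unchanged, channels -> c
        # pool 1x2x2 with matching stride: depth kept, height and width halved
        d = pool_out(d, 1, 1)
        h = pool_out(h, 2, 2)
        w = pool_out(w, 2, 2)
        shapes.append((c, d, h, w))
    return shapes
```

Under these assumptions a 16 × 128 × 128 input volume reaches the fully connected layers D1 and D2 as 256 channels of 16 × 4 × 4.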



4.3.1. Classification of buildings

Judging whether an image contains a building is a binary classification, so a cross-entropy loss function is adopted (Equation 4).

L_i^{det} = -( y_i^{det} \log(p_i) + (1 - y_i^{det}) \log(1 - p_i) )   (4)

where p_i is the predicted probability that the sample x_i is a building, and y_i^{det} is the real tag.

4.3.2. Regression of boundary boxes

The regression of boundary boxes calculates the predicted offset between a candidate box and a real box, which is regarded as a regression task. A Euclidean loss function (Equation 5) is introduced.

L_i^{box} = \| \hat{y}_i^{box} - y_i^{box} \|_2^2   (5)

where \hat{y}_i^{box} is the regressive prediction value and y_i^{box} is the real bounding box; y_i^{box} \in R^4 represents the upper-left corner coordinates, the width, and the length.

4.3.3. Location of key points

To predict the offset value, a loss function is defined by Equation (6).

L_i^{landmark} = \| \hat{y}_i^{landmark} - y_i^{landmark} \|_2^2   (6)

where \hat{y}_i^{landmark} is the predicted location of the key positions in a building and y_i^{landmark} is their real location; y_i^{landmark} \in R^{10} contains the coordinates of building components such as doors, windows, pipes, and walls.

Since there are various tasks at each level of the CNN, after reviewing different types of images such as backgrounds, buildings, and non-buildings, the total loss function is expressed as follows.

TotalLoss = \min \sum_{i=1}^{N} \sum_{j \in \{det, box, landmark\}} \alpha_j \beta_i^j L_i^j   (7)

where N represents the number of training samples, \beta_i^j \in \{0, 1\} indicates whether task j is active for sample i, and \alpha_j represents the importance of the different components.

A new type of explanatory loss function is defined as:

L = L_s + \lambda_c L_c + \lambda_{c'} L_{c'}
  = -\sum_{i=1}^{m} \log \frac{ e^{\alpha(w x_i + b_{y_i})} }{ \sum_{j=1}^{n} e^{\alpha(w x_i + b_j)} } + \frac{\lambda_c}{2} \sum_{i=1}^{m} \| x_i - c_{y_i} \|^2 + \frac{\lambda_{c'}}{2} \sum_{i=1}^{m} \frac{1}{ \sum_{j=1}^{n} \| x_i - c_{y_i} \|^2 + 1 }   (8)

Figure 4. Process of 3D CNN in a historical building.
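The per-sample losses of Equations 4-6 and the weighted total of Equation 7 can be sketched in a few lines. The task weights and indicator values below are illustrative, not the paper's settings.

```python
import math

def det_loss(y, p, eps=1e-12):
    """Cross-entropy of Eq. 4: y is the real tag (0/1), p the predicted probability."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def euclid_loss(pred, real):
    """Squared Euclidean loss of Eqs. 5-6 for box (R^4) or landmark (R^10) vectors."""
    return sum((a - b) ** 2 for a, b in zip(pred, real))

def total_loss(samples, alpha):
    """Eq. 7: sum over samples of alpha_j * beta_ij * L_ij for each task j."""
    total = 0.0
    for s in samples:
        for task in ("det", "box", "landmark"):
            beta = s["beta"][task]   # 0 or 1: is this task active for the sample?
            total += alpha[task] * beta * s["loss"][task]
    return total
```

In a cascade like the one above, background crops typically activate only the detection term (beta = 0 for box and landmark), while building crops activate all three.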



where W is the weight vector of the final inner convolution layer, m is the number of images in a training batch, n is the number of categories, b is the offset, x_i is the building feature labeled as y_i, \alpha controls the separability degree, \lambda_c adjusts the compactness within a class, and \lambda_{c'} adjusts the separability degree between classes.

4.4. Residual learning

A deep CNN can transform the feature representation of samples in the original space into a new feature space, and then obtain a hierarchical feature representation by automatic learning. Compared with traditional pattern recognition methods, a deep learning network has more layers and a more complex structure, so it has stronger abilities in feature learning and feature expression. Therefore, the depth of the network determines the model performance. However, as the network deepens, the network degrades: the accuracy of classification recognition quickly reaches a maximum, and deeper levels then cause a higher error rate. The higher error rate caused by network degradation is not caused by over-fitting but only by the extra layers. When the problem of network degradation is solved, a deeper network can be trained successfully. The residual network introduces a cross-layer connection, or shortcut connection, to solve this problem, which is constructed as follows.

If a cross-layer connection is added, the relationship between F(x) and G(x) is:

F(x) = G(x) - x   (9)

The cross-connected module is called the residual module because it calculates the residual error when not connected. If \{m_i\} represents all the weights of the residual module, then the output of the residual module is y:

y = F(x, \{m_i\}) + x   (10)

When calculating the residual module, F(x, \{m_i\}) and x should have the same dimension. If their dimensions differ, a weight matrix w_s needs to be introduced; after x is linearly projected, the dimensions are adjusted to be the same:

y = F(x, \{m_i\}) + w_s x   (11)

In forward propagation, F(x) + x is the sum of F(x) and x, where no additional parameters are introduced, so the complexity of the original network is not affected and a backpropagation algorithm can still be used to train the network.

4.5. Backpropagation algorithm

The backpropagation algorithm is widely used in most networks to update their weights. Its process is to use a mean square error loss function to measure the error between the actual output and the real label of an object, and then to repeatedly iterate and optimize the network parameters according to the error backpropagation mechanism.

For a sample (x, y) in a neural network with forward propagation, the loss function by Euclidean distance is given by Equation 12.

J(W, b; x, y) = \frac{1}{2} \| h_{W,b}(x) - y \|^2   (12)

where h_{W,b}(x) is the network output and y is the actual output.

Multiplying the expression by 1/2 makes the coefficient of the derivative of the loss function equal to 1. The whole loss function over the sample data set consists of two parts (Equation 13): the first is the overall loss over the data set of m samples, and the second is the weight attenuation of the whole network.

J(W, b) = \frac{1}{m} \sum_{i=1}^{m} J(W, b; x^i, y^i) + \frac{\lambda}{2} \sum_{l=1}^{n_l - 1} \sum_{i=1}^{s_l} \sum_{j=1}^{s_{l+1}} ( W_{ji}^{(l)} )^2   (13)

Because the function J(W, b; x, y) is continuous and differentiable, the parameters W and b can be updated along the negative gradient directions, which is called the gradient descent method. The parameters W and b are updated by Equations 14 and 15.

W_{ij}^{(l)} = W_{ij}^{(l)} - \alpha \frac{\partial}{\partial W_{ij}^{(l)}} J(W, b)   (14)

b_i^{(l)} = b_i^{(l)} - \alpha \frac{\partial}{\partial b_i^{(l)}} J(W, b)   (15)

where \alpha represents the network learning speed, or the learning rate.

The partial derivative of the overall loss function with respect to the weights is:

\frac{\partial}{\partial W_{ij}^{(l)}} J(W, b) = \left[ \frac{1}{m} \sum_{i=1}^{m} \frac{\partial}{\partial W_{ij}^{(l)}} J(W, b; x^i, y^i) \right] + \lambda W_{ij}^{(l)}   (16)
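The updates of Equations 12-16 can be checked on a toy scalar model. The data, learning rate, and decay value below are illustrative, not the paper's; the point is only that the loss of Equation 13 decreases under the updates of Equations 14-15.

```python
# Scalar instance of Eqs. 12-16: mean squared error with the 1/2
# factor, weight decay acting on W only, and gradient descent updates.

def loss(w, b, xs, ys, lam):
    """Eq. 13: average of Eq. 12 over the data set plus weight attenuation."""
    m = len(xs)
    data = sum(0.5 * (w * x + b - y) ** 2 for x, y in zip(xs, ys)) / m
    return data + 0.5 * lam * w ** 2

def step(w, b, xs, ys, lam, alpha):
    """Eqs. 14-16: move W and b along the negative gradient."""
    m = len(xs)
    dw = sum((w * x + b - y) * x for x, y in zip(xs, ys)) / m + lam * w
    db = sum((w * x + b - y) for x, y in zip(xs, ys)) / m
    return w - alpha * dw, b - alpha * db

xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]   # roughly y = 2x + 1
w, b, lam, alpha = 0.0, 0.0, 0.01, 0.1
history = [loss(w, b, xs, ys, lam)]
for _ in range(200):
    w, b = step(w, b, xs, ys, lam, alpha)
    history.append(loss(w, b, xs, ys, lam))
```

After a few hundred iterations w and b settle near the generating values 2 and 1 (slightly shrunk by the weight decay), exactly the behavior the gradient descent derivation predicts.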

" # " #
@ 1 Xm @ i i l l 1 @
ðl Þ
J ðW; bÞ ¼ ðlÞ
Jðw; b; x ; y Þ (17) b ¼b α Jðw; b; xi ; yi (23)
@b m i¼1
@b m @bðlÞ
i i i

The weight attenuation part only acts on the para­ The process of backpropagation is to search for the
meter W, and the whole backpropagation process con­ minimum value of the loss function J ðW; bÞ, in which
sists of four steps. the parameters W and b play important roles. At first, all
(1) Regarding each output unit at the nl output layer, parameters Wij are initialized to a random value close to
the error of the i-th unit can be expressed as follows. zero. Then, the gradient descent method is used to
@ � � � � reduce the value of the loss function J ðW; bÞ in repeated
ðn Þ ð nl Þ 0 ðnl Þ
δi l ¼ ðn Þ
J ðW; b; x; y Þ ¼ y i aj � f Z j iteration.
@bZi l
(18)
(2) Regarding each output unit at the hidden layer 5. Pilot project in Shanghai
nl 1, the error of the i-th unit at the l-th layer be There is an upgrading project in Shanghai, China, con­
expressed. sisting of 49 historical buildings. All of them were built
ðnl 1Þ @ a century ago and are slowly deteriorating now. Since
δi ¼ ðnl 1Þ
J ðW; b; x; yÞ they are heritages, their appearance should remain as
@Z
�X i �� � ��
Snl ðnl 1Þ ð nl 1 Þ ðn Þ
the old but their structural components should be
¼ j¼1
Wij � δi �f 0 Zj l (19) replaced and reinforced. Therefore, a reserve engineer­
ing technology is selected to create 3D models to eval­
(3) Partial derivative can be obtained. uate this project. Managers utilize a UAV, a 3D laser
@ ðlÞ ðlþ1Þ scanner, and other devices to capture building data. To
ðl Þ
J ðW; b; x; yÞ ¼ aj δi (20) avoid potential blocks during capturing data, branches
@Wij
around buildings have been trimmed (shown in
Figure 5)
@ ðlþ1Þ
ðl Þ
J ðW; b; x; yÞ ¼ δi (21)
@bi
(4) The weight parameters are updated. 5.1. Multi-source heterogeneous data and
2 3 surveying routes
ðl Þ ðl Þ 1 @ � ðl Þ Before updating a historical building, we survey and
Wij ¼ Wij α4 J w; b; xi ; yi 5 þ λWij
m @W ðlÞ record its facades and features. The surveying task
ij
consists of four types of activities: (1) surveying the
(22)
architectural features including facades by 3D laser

Figure 5. A historical building in the pilot project.


INTERNATIONAL JOURNAL OF ARCHITECTURAL HERITAGE 11

scanners, UAV, and total stations; (2) investigating building corner and there are six vertical lines around
the building components and interior by photos; (3) the 5th building, and we set eight control points along
identifying the building materials by tests; (4) locat­ a vertical line and there are 48 control points on the
ing key positions by total stations. An accurate mea­ surface of the 5th building.
surement is critical for developing 3D building The height of the control points is evaluated by the
models. remote elevation measurement method. After compar­
ing the control points between the point clouds and the
5.1.1. Surveying devices and surveying routes total station’s measurement, this building has sloped up
The surveying job is well planned in three steps: the slightly at the eastern façade, and the 3D point cloud
preliminary surveying step, the comprehensive survey­ models are calibrated by 48 control points. The calibra­
ing step, and the supplementary surveying step. tion errors of control points’ heights are shown in
A Phantom 4 pro UAV with a 20 megapixel came, Table 1. The result indicates that the maximum calibra­
a FARO FocusS 350 laser scanner and a Leica TS02 tion error is 18 mm, the minimum calibration error is 4
total station were utilized in this surveying task. mm, and the average calibration error is 13 mm, which
Surveying routes in a historical block with six buildings means that the 3D point clouds can meet the require­
(shown in Figure 6) are carefully designed for surveying ment of building models.
devices to capture building data. We have surveyed each
historical building based on its surveying route. There 5.1.3. Preliminary tests of 3D point cloud models
are nine ground stations designed for the 5th building The 5th building surface is scanned by a sampling inter­
(shown in Figure 7), which are 6 to 10 m away from the val of 5 mm and the surface model includes 2.4 million
building corners, where the surveying devices would cloud points. Since the accuracy of the 3D scanner is 1
scan or survey at least two building facades. The point mm at a single location, a dense point cloud model
cloud data from the laser scanner at two stations would would have a higher accuracy and a relative accuracy
overlap 50% of each façade, and the UAV photogram­ could reach 1 mm. To compare the accuracy of various
metry could upgrade its color, texture, and resolution. point cloud models, we develop its surface model by
A field investigation is also critical in collecting building four kinds of points: 24 million points, 4.8 million
data e.g. building materials, components, and features. points, 2.4 million points, and 1.2 million points,
whose sample intervals are from 5 mm to 20 mm and
5.1.2. Control points and calibration errors of 3D the comparison shows that all surface models can
point cloud models express this building’s details and their relative accura­
To verify the accuracy of 3D scanning, we adopt the cies can reach 1 mm. After reviewing the results, we
total station to survey the control points in historical adopted the sampling interval of 5 mm to scan other
buildings. For example, we set a vertical line along the historical buildings in this project.

Figure 6. Deployment of surveying routes and ground stations.
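The sampling-interval comparison in Section 5.1.3 amounts to decimating the dense cloud on a regular grid. The following voxel-grid sketch is a generic illustration of that operation, not the scanner software's actual algorithm.

```python
# Voxel-grid decimation: keep one representative point per cubic cell
# whose edge equals the sampling interval (in the same units as points).

def voxel_downsample(points, interval):
    """Keep the first point that falls in each interval-sized voxel."""
    seen = {}
    for p in points:
        key = tuple(int(c // interval) for c in p)
        if key not in seen:
            seen[key] = p
    return list(seen.values())
```

Coarsening the interval from 5 mm to 20 mm in this scheme is what reduces a 2.4-million-point cloud to progressively sparser models.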



Figure 7. Deployment of the ground stations and the surveying route around a building.

5.1.4. UAV surveying schemes and routes

After reviewing the surroundings and roofs, we proposed various UAV survey schemes and routes for the different buildings. In the UAV flight scheme for the 5th building, the flight height is 9.324 m for the first circle, 13.465 m for the second circle, 17.378 m for the third circle, and 22.164 m for the fourth circle; the longitudinal overlap rate is 85%, the lateral overlap is 80%, the camera angle is 45°, and the ground sample distance (GSD) is 10 mm. Since the UAV CMOS sensor size is 1 inch and f = 8.8 mm, the minimum GSD is H · a / f = 5.89 mm, which meets the surveying requirement.

The UAV flies in a circling route (shown in Figure 8) for 38 min at a speed of 5 m/s. The camera photographs the building along the same flight route in five directions: vertical, tilt forward, tilt back, tilt left, and tilt right. The 48 control points on the 5th building surface are used to calibrate the models in the UAV tilt photogrammetry. To secure similar lighting effects, the UAV tilt photography is conducted at the same time as the 3D laser scanning, which results in a better fusion of the multi-source data. There are 316 photos taken of the building, with an average ground resolution of 0.7 mm.

Figure 8. Example of UAV circling routes around a building.

A draft building model is developed (shown in Figure 9). To improve its precision and reduce measurement errors, some key positions in the building model are calibrated.

When the building images and data are trained by the 3D CNN model, the results at each convolution layer are obtained (shown in Figure 10). The results from the CNN model can be refined further in an HBIM platform

Table 1. Calibration errors of 48 control points' heights on the 5th building.

Control point   Error (m)   Control point   Error (m)   Control point   Error (m)   Control point   Error (m)
SE01            0.006       SC05            0.013       NW01            0.006       NC05            0.017
SE02            0.013       SC06            0.007       NW02            0.013       NC06            0.015
SE03            0.014       SC07            0.015       NW03            0.016       NC07            0.014
SE04            0.015       SC08            0.017       NW04            0.007       NC08            0.017
SE05            0.017       SW01            0.005       NW05            0.015       NE01            0.004
SE06            0.015       SW02            0.009       NW06            0.011       NE02            0.009
SE07            0.013       SW03            0.012       NW07            0.014       NE03            0.015
SE08            0.017       SW04            0.013       NW08            0.009       NE04            0.014
SC01            0.004       SW05            0.017       NC01            0.005       NE05            0.013
SC02            0.009       SW06            0.018       NC02            0.015       NE06            0.016
SC03            0.012       SW07            0.017       NC03            0.013       NE07            0.016
SC04            0.015       SW08            0.011       NC04            0.017       NE08            0.015
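As a check on the summary statistics quoted for Table 1 (maximum 18 mm, minimum 4 mm, average about 13 mm), the 48 tabulated errors can be aggregated directly:

```python
# The 48 height calibration errors of Table 1, in meters,
# listed column by column (SE, SC, SW, NW, NC, NE groups).
errors_m = [
    0.006, 0.013, 0.014, 0.015, 0.017, 0.015, 0.013, 0.017,  # SE01-SE08
    0.004, 0.009, 0.012, 0.015, 0.013, 0.007, 0.015, 0.017,  # SC01-SC08
    0.005, 0.009, 0.012, 0.013, 0.017, 0.018, 0.017, 0.011,  # SW01-SW08
    0.006, 0.013, 0.016, 0.007, 0.015, 0.011, 0.014, 0.009,  # NW01-NW08
    0.005, 0.015, 0.013, 0.017, 0.017, 0.015, 0.014, 0.017,  # NC01-NC08
    0.004, 0.009, 0.015, 0.014, 0.013, 0.016, 0.016, 0.015,  # NE01-NE08
]
maximum = max(errors_m)                    # 0.018 m, i.e. 18 mm
minimum = min(errors_m)                    # 0.004 m, i.e. 4 mm
average = sum(errors_m) / len(errors_m)    # about 0.0127 m, 13 mm rounded
```

The aggregates reproduce the three figures stated in the text.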

Figure 9. Extraction of building edges and modeling in draft.

so that a 3D building model with more detailed information, e.g. building components and facilities, can be created and applied in exporting a series of construction drawings.

5.3. Recovery of the damaged building details

To repair the damaged building details, we use 263 photos to train the CNN model and 140 photos to check its accuracy. When damaged building details are identified, the CNN model locates them and tries to repair them. For example, several damaged details in a door header were recognized by the CNN model and then recovered, as shown in Figure 11.

5.4. Evaluation of various models

To test the accuracy of the models generated by the multi-source data fusion method, we compare the coordinate errors of the 48 control points in the 3D point cloud model, the UAV tilt photogrammetry model, and the deep fusion model against the total station survey. The comparison of the errors in three dimensions (ΔX, ΔY, and ΔZ) and their root mean square (RMS) is shown in Table 2.

The results of the 5th building survey reveal that the deep fusion model is much better than the 3D point cloud model and the tilt photogrammetry model, which proves that the modeling precision of the multi-source data fusion technology can meet the requirements of a 3D historical building model.

Figure 10. 3D CNN process of a building model.

Figure 11. Recovery of damaged building details.

Table 2. Comparison of the coordinate errors of control points in various models of the 5th building surface.
                3D point cloud model (m)    Tilt photogrammetry model (m)    Deep fusion model (m)
Control point   ΔX      ΔY      ΔZ          ΔX      ΔY      ΔZ               ΔX      ΔY      ΔZ
SE01 0.008 0.013 0.017 0.021 0.016 0.027 0.011 0.005 0.012
SE02 0.011 0.012 0.024 0.015 0.014 0.017 0.007 0.008 0.015
SE03 0.012 0.008 0.015 0.013 0.027 0.014 0.008 0.009 0.017
SE04 0.012 0.009 0.021 0.016 0.026 0.027 0.011 0.013 0.017
SE05 0.013 0.014 0.015 0.017 0.023 0.025 0.008 0.016 0.012
SE06 0.014 0.016 0.029 0.023 0.022 0.019 0.017 0.012 0.021
SE07 0.011 0.018 0.012 0.024 0.018 0.017 0.013 0.007 0.006
SE08 0.015 0.019 0.022 0.018 0.019 0.014 0.006 0.010 0.007
SC01 0.006 0.016 0.021 0.016 0.012 0.017 0.007 0.013 0.008
SC02 0.008 0.015 0.014 0.025 0.017 0.026 0.005 0.011 0.012
SC03 0.018 0.005 0.027 0.023 0.026 0.017 0.012 0.012 0.021
SC04 0.013 0.004 0.006 0.017 0.021 0.011 0.009 0.007 0.003
SC05 0.011 0.002 0.013 0.018 0.014 0.025 0.012 0.006 0.011
SC06 0.004 0.009 0.021 0.027 0.017 0.028 0.005 0.007 0.013
SC07 0.012 0.015 0.017 0.024 0.026 0.023 0.011 0.013 0.006
SC08 0.014 0.013 0.018 0.029 0.024 0.027 0.008 0.006 0.007
RMS 0.012 0.013 0.019 0.021 0.021 0.022 0.009 0.010 0.013
Errors of the other control points are not listed here.
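The RMS row of Table 2 follows the usual root-mean-square definition. Since only 16 of the 48 control points are reproduced above, the sketch below checks the formula on the listed deep-fusion ΔX column rather than reproducing the published 0.009 m exactly:

```python
# Root mean square, as used in the last row of Table 2.

def rms(values):
    """Root mean square: sqrt of the mean of the squares."""
    return (sum(v * v for v in values) / len(values)) ** 0.5

# Deep fusion model, dX column, for the 16 listed control points (m).
fusion_dx = [0.011, 0.007, 0.008, 0.011, 0.008, 0.017, 0.013, 0.006,
             0.007, 0.005, 0.012, 0.009, 0.012, 0.005, 0.011, 0.008]
```

Over the 16 listed points the RMS comes out near 0.010 m, consistent with the 0.009 m reported over all 48 points.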

6. Conclusion and future studies

The key task before repairing a historical building is to create a complete 3D model with featured facades, detailed decorations, and architectural components. Since a historical building has so many unique features, they can hardly be captured by manual surveying, and its 3D models cannot be created effectively with manual measurement tools. As an excellent deep learning method, the CNN has been widely used to recognize complicated features, and learning from sufficient cases improves its capability of distinguishing different features. However, the building features extracted by the traditional loss function are not discriminative enough within large data sets, so the building information cannot be completely represented by those features. A 3D CNN model with one input layer, five interlaced 3D convolutional and pooling layers, and two fully connected layers is developed to fuse all the data from various sources such as a UAV, a 3D laser scanner, a total station, and other images. By improving the CNN structure and the loss function, the 3D CNN model has constructed 3D building models for a pilot project of historical buildings in Shanghai by multi-source data fusion. The pilot project proves that the 3D CNN method can improve the efficiency and accuracy of modeling historical buildings in an automatic process. How the 3D building models can be further refined in HBIM platforms will be investigated in future studies.

Acknowledgments

The authors wish to acknowledge Tongji Architectural Design (Group) Co., Ltd for extending support in carrying out the field study. Financial support is from the National Natural Science Foundation of China (grant number 71971158).

Disclosure statement

No potential conflict of interest was reported by the author(s).