Kropp - 4D Dim Construction State Recognition

Automation in Construction 86 (2018) 11–32
Contents lists available at ScienceDirect
Automation in Construction
journal homepage: www.elsevier.com/locate/autcon
Interior construction state recognition with 4D BIM registered image T

sequences
⁎
Christopher Kroppa, Christian Kochb, , Markus Königa
a
Computing in Engineering, Ruhr-University Bochum, Germany
b
Intelligent Technical Design, Bauhaus-Universität Weimar, Germany
A R T I C L E I N F O A B S T R A C T
Keywords: Deviations from planned schedules in construction projects frequently lead to unexpected financial dis-
Progress monitoring advantages. However, early assessment of delays or accelerations during the phase of construction enables the
Building information modelling adjustment of subsequent and dependent tasks. Manually performed, this involves many human resources if as-
Computer vision built information is not immediately available. This is particularly valid for indoor environments, where a
Indoor construction
general overview of tasks is not given.
Image registration
In this paper, we present a novel method that increases the degree of automation for indoor progress mon-
Image recognition
itoring. The novel method recognizes the actual state of construction activities from as-built video data based on
as-planned BIM data using computer vision algorithms. To achieve that, two main steps are incorporated. The
first step registers the images with the underlying 4D BIM model. This means the discovery of the pose of each
image of a sequence according to the coordinate system of the building model. Being aware of the image origin,
it allows for the advanced interpretation of the content in consecutive processing. In the second step, the relevant
tasks of the expected state of the 4D BIM model are projected onto the image space. The resulting image regions
of interest are then taken as input for the determination of the activity state.
The method is extensively tested in the experiment section of this paper. Since each consecutive process is
based on the output of preceding steps, each process of the introduced method is tested for its standalone
characteristics. In addition, the general manner of applicability is evaluated by means of two exemplary tasks as
a concluding proof of the success of the novel method. All experiments show promising results and direct to-
wards automatic indoor progress monitoring.
1. Introduction finishing works represent a high-ranking section of every construction

project that is sensitive to schedule disorders. The average share in a
Building Information Modelling (BIM) is an essential step to the typical total budget lies between 25% and 40% [2]. As a result, a higher
digital management of construction projects. One big advantage of BIM degree of automation in progress monitoring would result in reasonable
is the holistic collection, linking and provision of data for different cost savings. The derivation of the actual state requires the automatic
planning, construction and operation tasks. In the context of con- comparison of the as-planned information that is present in BIM with
struction management, the application of 4D building models by linking the actual as-built state of the building.
activities of a schedule with corresponding building elements is very Recently, the research community has been investigating methods
common. Based on 4D building models the construction sequence can that aim to achieve a higher degree of automation in progress mon-
be analyzed and progress monitoring can be supported. itoring. Several approaches and studies that address the comparison of
In current practice, progress monitoring is mainly performed as-built and BIM-based as-planned data for this purpose have been
manually. To this end, weekly or daily paper-based progress reports are presented. For example, sensing technologies like Radio-Frequency
collected by field personnel by hand [1]. This procedure leads to a high Identification (RFID) [3,4], Ultra-Wideband (UWB) [5–8]), Wi-Fi [9],
workload of the field personnel to fulfil the need for a reasonable fre- ZigBee [10], laser scanners [11,12], Global Positioning System [13],
quency of inspections. Hence, manual progress monitoring involves image [14–19], video [20–23], and depth image [24] capturing devices
either a massive amount of human intervention or inspection updates are used to obtain data of the as-built state.
cannot be performed as frequently as required. In particular, interior Each of these technologies has its advantages and limitations in a
⁎
Corresponding author.
E-mail address: c.koch@uni-weimar.de (C. Koch).
https://doi.org/10.1016/j.autcon.2017.10.027
Received 16 May 2017; Received in revised form 25 October 2017; Accepted 28 October 2017
Available online 09 November 2017
0926-5805/ © 2017 Elsevier B.V. All rights reserved.
C. Kropp et al. Automation in Construction 86 (2018) 11–32
specific environment. For instance, radio-based technologies are able to each of the four stated main aspects regarding BIM-based progress
determine the presence of objects through walls and can, therefore, be monitoring.
deployed indoors, but are limited to presence detection or localization. In this context, vision-based methods appear to represent the ap-
Contrarily, stationary line-of-sight 3D laser sensing technologies obtain propriate means to target indoor construction state recognition.
accurate measurements within general site scenes, but are too inflexible However, current methods for indoor construction state recognition are
for indoor applications. Although depth images from portable devices independent of, and not associated with 4D BIM. This paper presents an
are more flexible, they are currently limited in range. The main aspects approach to automate 4D BIM based progress monitoring.
of BIM-based automation of progress monitoring for a specific sensing
technology and a target environment that need to be considered are:
2. Related work
1.1. Registration of as-built data to BIM model
Automation in progress monitoring has been the subject of research
Mapping of the sensed data to the BIM object instances and asso- in recent years. Various sets of different approaches have been pre-
ciation to scheduled activities. The comparison of as-built and as- sented that are based on various acquisition and processing techniques.
planned states is only possible if registration is performed. Objects from In this section, a summary of present approaches is given and the
the BIM model need to be identified in the raw data. Detection solely of technologies are introduced that this work refers to.
the type or class is not sufficient for precise progress statements. When a
column is detected, the determined state needs to be associated with
one specific instance of all available columns. 2.1. Data acquisition
1.2. Activity coverage The acquisition of as-built data is a fundamental step to ensure ac-
curate progress monitoring. Since the choice of the acquisition tech-
The raw data delivered by the sensing technology must have the nique influences all subsequent steps, the applied technique needs to be
capability of supporting the state detection of all activities that are well-chosen. Thus, the capabilities of the technique and the resulting
supposed to be performed in the actual environment. Sensing technol- raw as-built data need to be analyzed. There are several surveys that
ogies like RFID or laser scans are not able to determine states defined by give an overview on available techniques and its applications in-
sole appearance changes, for example, wall painting. corporated for progress monitoring [26–28]. These overviews mainly
list Radio-frequency Identification (RFID) technologies, laser scanning
1.3. Coverage of the location's spatial structure methods, as well as image and video capture systems as promising
technologies for as-built assessment. With RFID, construction elements
The sensing technology must meet the requirements of the en- can be inexpensively and robustly tagged due to low sensor complexity,
vironment's structure. It needs to handle permanent and temporary but requires mounting tags and does not enable determination of con-
construction obstacles and distances of the actual environment. For struction state [29]. Applications for radio frequency based technolo-
instance, utilization of common RFID technology for triangulation of gies during the construction of buildings are the localization of mate-
objects on construction sites delivers positions with an error of about rials [4,29–32], personnel [33] and inventory [34]. The disadvantages
2 m [25], resulting in not being able to determine whether materials are of RFID make these technologies unattractive for detailed progress
already mounted or still stored. However, low distance RFID might help measurement and RFID is seen to be a supporting approach for other
detect the presence of materials or construction objects within a range techniques in recent approaches for construction progress measurement
of up to 1 m (ISO/IEC 18000–3), which is useful in interior finishing [35]. Further radio frequency based technologies are utilized to capture
inspections. the as-built data, such as global navigation satellite systems (GNSS) and
Wi-Fi. For indoor construction scenes, GNSS are not available because
1.4. Dependency on infrastructure of the missing line of sight [36] and Wi-Fi is usually not present in
buildings under construction. Laser scanning offers high accuracy point
Sensing technologies may depend on infrastructure in order to cloud measurements, but consumes a certain amount of time for each
collect data. For the usage of these technologies, it needs to be con- scan [37] and/or has problems at spatial discontinuities [38]. Besides
sidered if the state of the building provides the required infrastructure. high costs for laser equipment, heavy involvement of personnel induces
For instance, Wi-Fi-based localization of objects by triangulation as- additional expenses. However, several approaches apply laser scan
sumes the presence of appropriate hardware and a dense distribution in technology for 3D object recognition using BIM and for comparison of
every part of the building, which contrasts with the stage of interior information inferred from scanned data with designed 4D BIM
finishing. [12,39–41] to track progress. Another promising form of as-built data is
Currently, there exists no proper solution that covers the difficult images and videos. Images and videos are acquired in several ways. The
situation imposed by view obscuring walls and registration issues along advantages of images and videos as the basis for progress measurement
with the dense repetition of the similar (but distinct) building elements. lie on low costs, short time, and the ease of collection. Continuous
This makes registration challenging for line-of-view sensing technolo- advents of high definition cameras and high performance processing
gies and requires complex infrastructure for radio-based technologies. units signify the potential of these types of data with an enhanced ac-
Moreover, there are challenges associated with the wide range of con- curacy of information and fast processing. The technology's main ad-
struction activities existing indoors. vantages are low cost and portability. Additionally, no special knowl-
Vision-based as-built data acquisition methods can meet all the four edge is needed to operate and taking photos of construction sites for
main aspects leveraging computer vision and image processing methods documentation reasons is a common practice. The camera system may
based on sequential imagery. Registration is possible on discovered either consist of a monocular [42] or a stereo sensor [17]. The per-
correspondences between the image and the known geometry of the spective can be fixed and known [16,43–45], or uncertain [37,46–48].
digital building model. A wide range of activities can be covered by Besides the approach of using single photos, videos have been recently
appearance-based processing methods or scene and object structure used to capture as-built scenes [18,49]. Occlusions may occur due to
recognition approaches. In addition, image sensing devices are highly the line-of-sight characteristic, but can be handled by traversing the
portable and can easily elude barriers for a better view and do not indoor scene. Both automated registration and recognition can be
depend on infrastructure. In Table 1, sensing technologies are rated for achieved by computational processing of acquired data.
12
Table 1
Rating of main aspects regarding interior finishing for different sensing technologies (with + as appropriate and – as inappropriate).
AutoID WiFi/UWB/ Laser scans Unregistered images Registered, unordered images Registered image sequences/videos
GPS
Registration + + + − − +
Activity-coverage − − − − + +
Spatial-coverage + + − − − +
Infrastructure independent + − + + + +
2.2. Image-to-image registration between both sides. In recent years, several research contributions have
been presented that address machine-based approaches for the regis-
In this subsection, an overview of different vision-based methods tration of the sensed data and the corresponding model data. These
that estimate camera motion based on the registration from one image approaches leverage correspondences of different geometric primitives
to another through a scene is presented. The following approaches of (points/lines/planes) to determine the translation, rotation and scale in
Visual Odometry (VO), Visual Simultaneously Localization and reference to the model. On the one side, there are assisted approaches
Mapping (VSLAM) and Structure from Motion (SFM) are introduced. that depend on manual intervention for the registration. On the other
Although certain similarities between these approaches are present, side, there are approaches that work automatically, but have restric-
research in the field of vision-based localization spreads into specialized tions. However, some approaches are already being used in outdoor
branches that each make distinct assumptions. progress monitoring research projects.
SFM reconstructs the camera motion and the structure of the scene. In recent years, several research results have been published dealing
The order of the input image set is usually not regarded, which leads to with the registration of acquired site data with as-planned building
an interrelation of all images with each other. One key that has led to models. In their experiments, predefined viewpoints for cameras are
automatic SFM was the introduction of advanced feature descriptors, used. Some conceptual works target towards automatic comparison of
which allow for the robust matching of unordered image sets, since they images taken from the construction site with 4D building models with
provide invariance in terms of scale and rotation. By a final global static cameras. The position and orientation of the cameras are assumed
optimization procedure, regarding the minimization of the re-projec- to be known. However, the comparison still requires manually per-
tion error, the whole model is adjusted. This optimization procedure is formed steps, like manual camera installation on the site [56,57],
usually performed with (sparse) bundle adjustment, which adjusts cam- correspondence establishment [44], manual work package creation
eras and features in space by an overdetermined system based on [58]. In Lukins et al. [57], a fixed camera is used to classify changes in
multiple observations in various views. the scene appearance as structural events related to the building model
VO has been introduced by [50] and is described as an algorithm using the pose estimation of David & DeMenthon [59]. Lukins et al.
that estimates the trajectory of a camera traversed through its en- [57] report about failed alignments due to building model complexity
vironment. The basic approach is available in a monocular binocular and cluttered scenes, and thus perform additional optimization. The
system, but other approaches consider more simultaneously analyzed approach introduced in [16] obtains prior knowledge of building
camera observations [51]. VO incrementally updates the new camera components and their occupancy within a scene from a 4D building
pose with each new image. After the track of a feature is lost, i.e. oc- model registered to the camera. The paper argues that fixed cameras on
clusion occurs or the field of view is exited, its 3D information is dis- a construction site for tracking progress are disadvantageous due to
carded [52]. In order to reduce the drift that occurs during the incre- inflexibility in response to changing structures. In contrast to the pre-
mental addition of new images, some approaches apply regional viously described approaches, the approaches in [46] uses a large set of
optimization on a window of the last couple of frames [53]. unconstrained images that are manually registered to a 4D BIM model
VSLAM is a vision-based method that represents a branch con- with an automatically reconstructed point cloud. In [40], a laser scan
sidering both estimation of the camera location whilst building a map. derived point cloud is registered to a 3D building model. Hereby, the as-
With this technique, a global map is built that consists of landmarks, planned model is formatted as stereo lithography, before the 3D model is
which represent the estimated position of feature correspondences in a registered with the scanned model using manually selected point cor-
sequence of image frames. The information that is stored in the map respondences. This approach is extended in [41] for full automation,
allows for active search of feature recognition. Since landmarks are kept where the registration is split into a coarse and a fine registration step
in the map although they are not visible (i.e. not in the field of view or based on iterative closest point (ICP). The approach of Kim [60] deals
simply not recognized yet), they can be recognized in future image with the fully automated registration similar to the previous approach,
frames. Some references also call VSLAM a real-time or online version but re-samples the point cloud and the 3D building model data to
of SFM [54] due to the fact that the structure of the environment is kept provide a common data layout. There are several cases in which this
in the map. Applications in SFM often aim for reconstruction of objects, approach is not applicable [61]. Moreover, the ICP algorithm needs a
whereas VSLAM keeps the landmarks in the map for the purpose of good initial guess to converge [62]. In this context, [63] conducted a
recognition of the same pose, i.e. for loop-closures of the path. Hence, survey on methods for as-built generation of BIM models including the
VSLAM only needs to involve as many features as necessary to estimate setup where as-designed BIM models are available.
the localization, which can end up in very sparsely populated maps. An Other research contributions give solutions to the problem of
approach that targets the characteristics of building structures is pre- camera pose estimation for each image. Pose estimation approaches
sented in [55] and is based on lines of three dominant directions. This that work with correspondences between the model features and their
approach shows promising results in controlled environments, but bears projections to the image sensor are also called the Perspective-n-Point/
several unsolved problems reported with the recognition of lines. Line (PnP/PnL) problem. A set of three correspondences is the smallest
subset of correspondences used in solving the problem. A higher
number of features used in algorithms typically improves the accuracy
2.3. Image-to-model registration of the solution. However, in case the correspondences contain outliers,
the accuracy is negatively affected [64]. In Chen [65], a general solu-
Registering acquired data to a model combines an as-is view with tion for the PnL problem is presented. Dhome et al. Liu et al. [66]
as-designed information. This states the foundation for comparisons
13
presented an approach that delivers a unique solution using eight line with the fixed camera and that it is not feasible to confirm every
correspondences, which Ansar et al. [67] decreases to four. Mirzaei component. Furthermore, limited success rates are supported [43].
et al. [68] try to reduce the effect of outliers. Recently, progress has Moreover, there is only completion of whole components, but no in-
been achieved by [69] in terms of robustness and accuracy with im- dication of percentage completion. It is concluded, that total automa-
precise correspondences. The approach of [64] is based on lines only tion is not possible with this approach and only visible changes are
and is declared as very robust to outliers, but requires at least nine line detected. In addition, the approach is limited since it is not generic
correspondences, whereas [70] considers points and lines. Fischler and enough for other activities.
Bolles [71] introduce the RANSAC paradigm and show its abilities for In other research contributions, vision-based activity recognition is
outlier rejection by determining the pose of a camera for cartography indirectly guessing the work in progress while workers are observed
purposes on the basis of the PnP problem for given correspondences. permanently by fixed on-site cameras [6,24,86]. In Chen et al. [6],
Furthermore, there exist image-to-model registration approaches UWB is used to fuse the camera images with three dimensional space,
that establish correspondences in addition to the estimation of the pose whereby depth images are considered in the approaches of Khosrow-
[72–81]. Some approaches incorporate point features, others use line poura et al. [24] and Escorcia et al. [86]. Progress monitoring per-
features or both. The approach presented in [72] combines an algo- formed with these approaches involves a lot of time applying the
rithm to establish the correct point correspondences [73] and pose es- camera system on one task. The workers need to be observed perma-
timation proposed in [74]. It performs a local search and determinis- nently by the camera to guess the current activity. For each application,
tically converges to a local minimum solution for an initial guess of the in particular for progress monitoring, the necessity of the resulting
object pose. This does not guarantee a globally optimal solution. Hence, discretization level needs to be minded. Additionally, in practice,
a random start scheme is proposed, when no prior pose estimation is privacy policies in many countries will not allow for the observation
provided. This increases the search space significantly, but can find a and monitoring of the workers at any time. For indoor progress mon-
globally optimal solution. This approach is extended for the case of itoring the present indirect progress estimation approaches do not seem
coplanar model points [75], but is only applicable for this case. The to be appropriate. They are not applicable in a common framework for
application is modified to work for line features in David [76]. The automation in progress monitoring with a single and common tech-
authors state that for the random start approach fewer initial guesses nique. Instead, directly deriving the state from one moment of ob-
are required than with the use of point features. Diaz and Abderrahim servation seems to be a more practical approach.
[77] modify the initial algorithm in a way that it fits the requirements Although 4D BIM offers sources of rich information that may sup-
of 3D object tracking. Other similar approaches try to reduce the search port the recognition of the current state of a building component or
space by different strategies, using gaussian mixture models [78], activity, research efforts in vision-based state recognition are mainly
branch & bound [79], evolutionary algorithms [80], and differential independent of 4D BIM. This particularly affects the recognition of
evolution [81]. When prior pose information is provided, they are more objects and materials. Hence, their association to further information
robust to occlusions, clutter and repetitive patterns [78]. accessible by the link to 4D BIM and its objects and tasks is not con-
sidered and is insufficiently investigated in research. Furthermore, a
2.4. Progress recognition wide range of materials and structural objects needs to be recognized
for indoor progress monitoring, which is not covered by just a single
The research community has shown the promising applicability of recognition method.
vision based recognition and developed several approaches that focus
on the vision-based recognition of construction components and activ- 2.5. Line segment extraction
ities.
Only a few approaches are presented that consider the detection or Line segment extraction is a task within the field of image proces-
recognition of construction components. In Roh et al. [1], object de- sing that detects appearances of straight contours. The change of pixel
tection is considered especially for indoor purposes of progress mon- intensity values from high to low and vice versa states the basis for
itoring. The approach detects air condition equipment and is in general processing algorithms. The whole images are scanned for all possible
only applicable for small objects with distinct characteristics. Kropp lengths and poses of line segments. Line segment detection algorithms
et al. [84] present an approach that aims to recognize structural fin- follow different strategies to select gradient pixels for potential line
ishing components. The approach is evaluated with the recognition of association and connect them to segments. This requires the para-
heating devices and is able to differentiate between different activity metrization of each algorithm, which influences the output result. A
states. The cascaded material state determination approach presented fundamental parameter used in many algorithms is the gradient
in Kropp et al. [83] is extended by Hamledari et al. [82] to detect threshold that discards pixels, when the gradient value is too low.
electrical outlet, but is not linked to BIM and thus not able to perform Sensitive thresholds regard low gradient line contours, but include false
recognition nor identification of components. lines due to noise and clutter, and are accompanied by a long runtime.
Some work has been carried out to include construction work Contrarily, thresholds that only regard high gradients do not detect a
packages and activities into the recognition process to detect the pro- complete set of lines. In recent years, state-of-the-art line segment de-
gress and change of construction elements. Work package and object tection algorithms have been presented [87,88]. Grompone von Gioi
progress (or change) detection or recognition are considered in those et al. [87] have presented a line segment detector that works on the
approaches [43,57,85]. Two major trends can be observed that consider directions of the gradients for the environment of each pixel and con-
the estimation of progress of work packages and activities. The existing nects those that are adjacent and share a similar angle. The line seg-
approaches either analyze the result of the work by means of the re- ment detector based on the work of Akinlar and Topal [88] computes
sulting object or they do track the labor itself and derive completion anchor points that are in contrast to surrounding pixels and connects
according to the type and intensity of actions. The approaches pre- the found points to lines, when they are connected by pixels with fitting
sented by Ibrahim [85] and Zhang [43] handle visual work package gradients. Other works focus on the recognition of lines from image to
progress inspection by present 3D models and a fixed camera. These image concerning their appearances for matching purposes [89–91].
approaches investigate images of a work package under construction These approaches not only regard the pixels that state the line segment,
from the same view over the lapse of time. Nevertheless, it is reflected but also consider the surrounding pixels along the line to create de-
that this approach has several challenges by heavy clutter, uncontrolled scriptors. The disadvantage using those descriptors is the dependency
and unforeseeable lighting variations, dynamic target appearance and on the line length. Line segments cannot be recognized on basis of the
frequent occlusions [85]. Additionally, there is the concern of coverage descriptor when the observable line length varies.
14
2.6. Problem statement and objectives elaborated into a finer camera pose estimated relative to the building
model geometry. Video frames that do not contain sufficient geometry
For controlling project costs and time, on-site inspections are ne- of the building model for fine registration are forwarded to a rough
cessary to determine the current progress of interior finishing works. motion estimation process that bypasses the lack of the visibility of
Currently, there are no methods available that sufficiently assist and actual building model geometry until fine pose estimation is available
continuously automate the progress monitoring. Existing methods again. This way, switching from fine to rough estimation of the camera
mostly deal with outdoor construction sites. Furthermore, these poses enables the automatic registration from the point of initial re-
methods are neither directly applicable indoors nor validated for in- gistration. After all video frames are processed this way, the registration
terior sites. Moreover, they are predominantly concerned with accurate of as-built data to the building model is complete.
geometry reconstruction and subsequent as-design comparison, but lack After registration, the recognition block represents the part of the
the capability of activity state recognition and project delay prediction. method that determines the actual state of activities. Based on the re-
Furthermore, there is only a small amount of building activities covered gistered video frames, the single images are analyzed in an appearance-
and no approach considers a generic framework for involving the var- based approach. First, the search space for objects related to relevant
ious set of building activities indoors. Additionally, BIM data is used for activities is reduced to avoid analyzing negligible video frames. In ad-
as-built/as-planned matching and for visualization purposes. However, dition, only image areas that do contain the projected model objects are
none of the concerned approaches considers the usage of 4D BIM data further regarded for analysis. Second, a rectification of the regarded
to support activity state determination referring to the 4D BIM sche- image regions is performed to enable a more homogeneous basis of the
dule. Moreover, research in the field of indoor navigation does not image data for the succeeding state recognition. During state recogni-
provide solutions that match the required aspects stated in the in- tion, the input image regions are classified to identify the actual state of
troduction section. State-of-the-art approaches like the work of Van the observed activity.
Opdenbosch et al. [92] are based on infrastructure for a server con-
nection as well as on obtaining information about a rough localization 3.1. Registration
and a prior information of the scene in terms of laser scans and images.
This does not allow for the deployment in frequently changing scenes The registration intends to make the as-built acquired image data
like congested interior construction sites. comparable to the as-planned information contained in the BIM model.
To overcome the mentioned limitations, the conceptual framework Thereby, registration means to reveal the pose of the camera that
proposed in this paper addresses the following main research objec- captured an image frame relative to the coordinate system of the
tives: building model. Ideally, the registration is performed completely au-
tomatically and accurately. Since information extraction solely from
- Leverage of 4D BIM information to enable smart methods for video images does not always allow to establish a distinct assignment to a
based progress monitoring; pose and accurate positioning systems are not available indoors,
- Robust registration of acquired as-built data with the underlying 4D manual assistance is assumed within the proposed approach. The ap-
BIM model; proach represents the idea that the geometry of the currently expected
- A recognition approach that is able to consider a vast set of con- building model can be leveraged to relate the observed scene in the
struction activities contained in 4D BIM models. assessed images to BIM. However, it is not assured that images from
indoor scenes contain structures that are also part of the BIM model.
3. Methodology Thus, the proposed approach regards a combination of different com-
puter vision algorithms to manage this circumstance.
In this chapter, a novel method that targets a higher degree of au- The registration of each frame is illustrated in Fig. 2. This workflow
tomation for indoor progress monitoring with the consistent integration describes the processing of an arriving image until a pose is found.
of as-planned 4D BIM information into the as-built inspection process is Determined poses of succeeding frames are used as a guess for the
presented. For the first time, a coordinated set of processes is aligned following frames. This enables the automatic registration of the video
that regards challenges and requirements for indoor scenes. The frames after an initial registration. The initial registration is performed
method considers extensive access to information present in the 4D BIM for the first frame and delivers an estimation of the first camera pose.
model in order to enable and support the progress monitoring pro- This pose is used for the automatic fine pose estimation process. In case
cesses. This leads the automation for indoor progress determination to a a fine pose can be obtained from the input image, the image is suc-
degree that reduces human intervention to a low level. cessfully registered and forwarded to the activity state recognition.
To achieve this, the underlying concept of the new method regards When no fine pose could be estimated, rough motion estimation de-
all processes, from data acquisition to activity state recognition. A livers the needed pose. For the next frame of the sequence, the pose is
workflow of the concept is illustrated in Fig. 1. The as-built data is provided as an initial guess. The next subsections give further details on
captured with video cameras, the status of construction activities is the registration.
determined by analyzing the content of images. Therefore, a variety of
algorithms from the fields of computer vision, image processing and ma- 3.1.1. Acquisition
chine learning are used for different purposes to ensure the mainly au- To create as-built data of the current construction status, video
tomatic workflow. As-built video frames are first registered to the BIM frames are captured with a monocular camera system within the con-
model and in the end the state of activities is recognized. The identified struction scene during inspections. Since the image frames are analyzed
activity states can be used as an information resource for assisted de- according to the result of construction activities, the inspection is likely
cision making, which may result in re-scheduling. The updated sche- to perform well when results are visible and works are currently not in
dules can then be exploited for subsequent inspections. action. A video is recorded while the camera system is traversed
Taking the as-built video frames as input, the first block of processes through the interior of the building. The personnel that carries out the
regards the registration of the image data to the building model. This inspections needs to be aware of the abilities of trajectory reconstruc-
reveals the pose of the camera for each captured frame within the tion based on computer vision and adjust motions accordingly.
building. To achieve that output of the registration block, an initial
registration is performed. The initial registration represents the only 3.1.2. Initial registration
part with manual intervention during the whole inspection and aims to The initial registration delivers a rough estimation of the pose at the
provide a rough camera pose. This rough camera pose is then beginning of an image sequence. This is necessary, since the concept
15
Re-Scheduling
4D BIM
Building Object
Schedule
model Catalogue
Geometry Relevant Objects

Acquisition
Registration Recognition
As-built Initial Registered Search Space Identified Assisted
Video Frames Registration Video Frames Reduction Activity State Decision Making
Rough Fine Pose ROI in Video State

Camera Pose Estimation Frames Recognition
Rough Motion Fine Image Rectified

Estimation Camera Pose Rectification ROI
Fig. 1. Concept overview for video-based progress monitoring.
further assumes that the structure of the building is ambiguous in dif- to the guessed line appearance along the perpendicular of the projected
ferent rooms and several observations may look similar. The concept model line. The projected building model lines define a search region
assumes that the camera poses of consecutive images can be recovered for each line. The search region contains a buffer that considers the
automatically. Thus, an initial registration covers the first frame in the uncertainty of the pose. The model line projection allows to reduce the
image sequence (see Fig. 2) and is performed by manually navigating search space for each line significantly in terms of the analyzed image
through the digital building model. Therefore, the building model is region as well as the direction of the line detection. This has the ad-
superimposed onto the regarded image as a wire frame model in an vantage that thresholds of image gradients can be set low to find
augmented reality (AR) manner. The user selects the view on the shallow line appearances.
building model that visually matches the actual camera view. The in-
itial registration is only performed once at the beginning of each in-
spection. 3.1.4. Fine pose estimation
The fine pose estimation module intends to find an accurate pose of
the camera for a present image of the building interior during con-
3.1.3. Line extraction struction. Therefore, a rough estimate of the pose from a previous step
To find correspondences between the model and the as-built scene, is used as an initial guess (see Fig. 2) and makes use of BIM information
a line segment extraction approach that provides line candidates of the to accurately align the camera pose to the building model.
model lines within as-built images is developed. This novel method The currently expected state of the building model is analyzed and
exploits the presence of a rough camera pose. The underlying algorithm geometric primitives are extracted. Furthermore, the image is processed
expects the image and a set of model lines projected to the image space to determine candidates for the geometric primitives from the building
as input, and delivers image line segments as possible candidates for model. The used geometric primitives are line segments, since line
each model line. Instead of scanning the entire image for line segments, segments are the dominating geometric primitive in many building
the appointed method leverages the projected lines from the building models. This approach takes extracted line segments from the building
model with an estimated pose. The approach further scans an image model, projects them to the image space and incorporates the method
region of interest (ROI) with a directed filter. The ROI is set according presented in Subsection 3.1.3 to obtain candidate line segments in the
First Frame 0
Frame Initial
Frame yes
Arrival Registration
?
Pose 0 Pose i-1
no
Frame i Pose
Fine Pose Rough Motion
Found no
Estimation Estimation
?
Pose i
yes
Pose i
Recognition
Fig. 2. Processing of the registration of each frame by pose estimation.
16
image space. The approach processes this input data and, first, de- subset of correspondences. This selection is performed according to the
termines the correct correspondent line segments from the model and similarity rank of the candidate to the projection of the model line
the image, and, second, a refined camera pose. For this reason, the fine segment. Simultaneously, the subset of model lines is analyzed for
pose estimation approach expects a sufficient set of detectable building known intersections. Incorporating additional point features to the
model lines in the image. existing line correspondences improves the accuracy of the results ac-
In a first step, the potential correspondences of line segments, which cording to Xu et al. [70]. Intersections in three-dimensional space re-
are detectable in the image and are present in the building model, are sults in an intersection in two-dimensional projection space as well.
established. To achieve this, the model line segments are projected onto Thus, using the image line segments to generate intersections of their
the image according to the rough pose estimate (or the pose from the projected lines produces a set of additional point correspondences as an
initial registration for the first frame) and consequently candidates of input for pose determination. Taking the line and point correspondence
line segments visible in the image are determined. In order to obtain subset as the input for a PnX algorithm [70], the model parameters of
line segments from the building model that are potentially visible from the pose can be determined. These model parameters are consequently
the camera pose in the captured image, the model data is processed. For tested against all candidate lines. The best fitting candidate lines that
this purpose, it is not sufficient to remove model line segments that are are associated with a model line form the consensus set. Optionally, a
not visible in the input camera frustum. Rather, model line segments refinement of the model parameters with the consensus set of lines is
that are hidden by other structures need to be eliminated, since the performed. If the projection error falls below a certain threshold and
amount of model line segments potentially not visible in the image the size of the consensus set is lower than a previously defined size,
should be reduced. Thus, a state-of-the-art hidden-surface algorithm appropriate model parameters have been determined, otherwise the
[93] is incorporated to obtain a set of model lines segments potentially algorithm iterates with a new selected random subset.
visible by the camera pose. Concluding, this set states a guess of visible
lines that is dependent on the expected state of the building construc- 3.1.5. Rough motion estimation
tion and the rough pose estimation. Hence, it is possible that model line Rough motion estimation predicts camera poses for the fine pose
segments (a) are modelled and assumed to be visible in the current estimation. In the case where fine pose estimation is not successful (see
construction state, but are actually not visible due to progress devia- Fig. 2), these poses remain until it is feasible again to obtain fine poses.
tions or vice versa, and (b) are hidden by non-modelled objects, not In contrast to the fine pose estimation, which derives poses absolutely
perceptible due to light conditions, etc. Moreover, real image scenes on to the building model geometry, rough motion estimation predicts
construction sites are typically cluttered and contain objects, shadows, poses based on relative image-to-image registration. Therefore, the
etc. that cannot not be anticipated and thus modelled or modelled in originally non-transformed and un-scaled camera poses of the image-to-
sufficient detail. For these reasons, the described approach assumes that image registered video frames are transformed and scaled into the
in construction scene images not only are more line segments detected, building model coordinate system.
but also a vast number of outliers are observable. Thus, the approach Two distinct frames i and j are considered to estimate the right
presented here is designed to handle a high ratio of outliers and several transformation and scale. This approach assumes that the absolute pose
line segment candidates from the observed scene for a projected model Pa = Ra ∣ ca, composed of the absolute rotation Ra and the absolute
line. camera center ca, from previous fine pose estimation as well as its re-
Subsequently, image line segments are assigned to model line seg- lative counterpart Pr = Rr ∣ cr, composed of the relative rotation Rr and
ments according to similarity in terms of angle and position distance to the relative camera center cr, from image-to-image registration are
the projected model lines. This assigns each projected model line a set known for both frames. The ratio of the Euclidean distances between
of image line segment candidates. The resulting problem, finding the the absolute ca and the relative cr camera centers determines the scale
right assignments for the line segment correspondences and simulta- factor from the relative to the absolute coordinate system
neously determining the resulting pose is illustrated in Fig. 3.
|cai − ca j |
The problem is solved by incorporating a modified RANSAC algo- sij = .
rithm for pose estimation, depicted in Fig. 4. The algorithm starts with |cri − cr j |
the selection of a random subset of the visible model line segments.
For a frame k for which an absolute pose is not available and only Pr
Based on the resulting subset, image line segments are selected from the
is known, the relative motion from the pose of frame j is determined as
set of all candidates that are assigned to a model line to construct a
Pr ∆ = Rr ∆ cr ∆ = Pr j−1∙Pr k .
The relative motion is scaled to the absolute motion

Pa∆ = Rr ∆ sij ∙cr ∆.
Knowing the absolute motion from frame j to frame k, the absolute

rough pose is defined as.
Pa rough k = Pa∆ ∙Pa j.
This enables rough pose estimation for all video frames that are not
compliant for fine pose estimation, as well as the provision of predic-
tions for further fine pose estimates.
3.2. Recognition
The recognition block of the general framework contains several

Fig. 3. Illustration of the absolute pose estimation problem without known corre- steps in order to prepare the images for activity state determination.
spondences of line segments between the model and the observation. The red line seg- First, the relevant objects and their geometry in the image space are
ment represents a model line and the green line segments candidates for its projection determined by the search space reduction, afterwards a rectification of
extracted from the image. (For interpretation of the references to color in this figure the regarding areas is performed to reduce the recognition problem to
legend, the reader is referred to the web version of this article.)
two-dimensional space and finally the state is recognized depending on
17
Fig. 4. Extended RANSAC scheme for pose determination with

line candidates.
18
the type of input activity. 3.2.3. State recognition

The state recognition is the actual process where the identification
of the progress is taking place with a classification of the input image
3.2.1. Search space reduction content. Due to the assumption that the image content is rectified to an
The images present in a video are captured in an unsupervised way. approximated plane, the classification can be performed in two-di-
Without further comprehension of context knowledge, any object could mensional space. Thus, a vast amount of approved mainstream re-
possibly appear anytime in every part of every image with any rotation, cognition strategies can be applied. However, the approach presented
scale and translation. Thus, information about the objects that appear in in this paper regards two different branches of recognition to consider
the images captured cannot be derived easily. The complexity changes the circumstance that a wide set of activities needs to be recognized.
significantly once the image data is registered to the building model.
Reducing the search space within the video sequences relieves the ac-
tual classification from critical ambiguities in image data. Only objects 3.2.3.1. Object recognition for activity state determination. The
involved in a certain activity need to be selected. Objects of activities recognition of objects in this approach concerns the physical presence
that were already finished in an earlier point of time, as well as objects of the investigated construction element. It assumes that the element's
that lie far ahead can certainly be neglected. A specific object is selected structure remains mainly the same for each instance of an object due to
that was identified as relevant for current monitoring by scanning the a pre-defined classification. However, partially covered areas of the
schedule for activities. Other objects remain unselected because of object or different lighting conditions restrain the classification. This
missing currency. Once the objects that need to be observed are known, requires that the whole object is visible. Objects that do not fit into the
it is possible to determine a set of images with objects' visibility. In the image cannot be analyzed using the underlying approach.
case that objects have to be completely visible for state determination,
the set only contains images with full appearance of the objects. It is
assumed that objects are installed as planned, so that the as-is and as- 3.2.3.2. Material recognition for activity state determination. Material
planned positions correspond. This approach does not reassess the recognition in this approach addresses the identification of an activity
correctness of positioning. It only tests for the correct schedule and thus state. Contrary to object recognition, material does not have one
for presence of objects. specific shape, but its texture is distributed over the surface it is
applied on. Sometimes there is a fine line between the distinction of an
object and material, especially when material consists of small
3.2.2. Image rectification countable objects. Nonetheless, it ultimately depends on the details of
Since the feasibility of comparing objects' appearances greatly the building model and the construction schedule. Since the underlying
benefits from normalization [94], the present camera pose information approach expects an activity-specific recognition, this is not defined
is used to perform rectification on the images as a preparation for automatically. A further characteristic that particularly represent the
further state determination. This reduces the state recognition to a challenges in indoor scenes is that the whole extent of the material
simple two-dimensional classification problem. Depending on the ex- surface often cannot be observed by one image (camera-dependent).
tents of the object of interest defined by the search space reduction, the Thus, partial visibility of the material needs to be considered in
original images are transformed into the rectified form (see Fig. 5). The recognition strategies. However, each material has its own structural
rectification approximates the original images to a unified view that characteristics that can be leveraged for feature extraction, e.g. bricks
ideally is the same for all input images. Since the approximation is are usually aligned in rows and have a rectangular shape. Hence,
performed on a plane in 3D space, object elements that are not on this material recognition in the proposed framework for state recognition
plane are distorted. However, the main part of the rectified images expects a particular implementation for each material.
states a suitable input for classification.
Fig. 5. Rectification of two images from different poses to one

reference.
Original Image Rectified Images Original Image
19
4. Experiments and results exhaustive application of the fine pose estimation method.
The simulated deviations include rotation and translation on the
The aim of the experiments was to evaluate the output of the single camera pose. To test the impact of different types of motion, the de-
steps of the presented framework for validity and the viability of the viations were applied on single axes. The illustrations of Fig. 7 show the
whole framework as one unit. The evaluation of the registration steps is applied maximum deviations applied to each axis on an exemplary
presented first, followed by the test results for the recognition, con- image. The rotational deviation was induced by a maximum of −20°/
cluding in the final general experimental output. +20° and the translation was set to a maximum of − 1 m/+1 m. The
first row of Fig. 7(a-c) shows the model line projections with the
4.1. Registration maximum camera pose deviation simulated by the rotation of the x-, y-
and z-axis and the second row (d-f) shows the translation on the x-, y-
To evaluate the registration block of the framework, different steps and z-axis.
were tested. This set of steps contains the line extraction, fine pose The goal of this experiment was to recover the original deviation-
estimation, and rough motion estimation. In this section, the experi- less camera pose from the divergent input camera pose. To define an
ments were performed on single images. appropriate measure of deviation for the estimated camera pose, the
distance of the endpoints of the model lines (point distance) between
the original pose and the transformed estimated pose was chosen (see
4.1.1. Line extraction
Fig. 8). This unites the rotational and translational error in one value.
The line extraction step was tested to evaluate the detection rate of
The tests were performed on the same set of images as used in the
the proposed approach. Therefore, a test set of 510 images of photos
line extraction experiments (510 images). The search space for the line
with a resolution of 4288 × 2848 pixels and video frames (each frame
extraction was adaptively set according to the point distance between
processed independently) with a resolution of 1920 × 1080 pixels was
the original and the input pose of the image space endpoints of the
created. To enable straight line segment extraction, the images were
projected model lines. For the case where not enough model lines were
undistorted before they were processed.
projected within the image borders, the images were skipped to avoid
The results of the proposed algorithm are compared to the line ex-
infinite error values for erroneous line configuration input to the
traction algorithms of Grompone von Gioi et al. [87] and Akinlar et al.
RANSAC algorithm.
[88]. Those two algorithms are not directly comparable with the pre-
The charts in Fig. 9 and Fig. 10 show the results of the mean point
sented method. To obtain comparable results, a distance metric is used
distance of all tested images. The deviations were plotted increasing
that counts inliers under a certain threshold of the distance to the ex-
from the negative to the positive maximum. The lowest point distance
pected line as inliers. A positive detection was encountered as soon as at
error could be expected to lie at zero deviation. However, since the
least line was found/filtered in the search region. The expected lines are
RANSAC approach is non-deterministic, this is not guaranteed and re-
a set of manually defined line segments and state the ground truth. It
sults in the small error range are present.
was assured that a contour at the defined expected line pose was pre-
sent. The results listed in Table 2 compare the presented method with
4.1.3. Rough motion estimation
two other line segment extraction methods, namely LSD [87] and
In this section, the rough motion estimation approach is evaluated.
EDLines [88]. The time needed to perform the test for [87,88] is ex-
The main aspect investigated is whether the pose predictions are still
pressed as the extraction time for the whole image and for the presented
appropriate when a longer passage of frames does not allow for the fine
method for one line. This varies by the amount of lines that need to be
pose estimation. The main test procedure works as follows: The image
extracted for an image. However, the algorithm offers opportunities for
sequence is initialized with fine pose estimation. As soon as fine pose
parallelization. The detection rate benefits from the low threshold that
estimation stops, the rough motion estimation steps in and continues
can be applied using the presented method. Fig. 6 shows image ex-
the estimation of the pose until fine pose estimation is available again.
amples with sheer lines that could be extracted with the presented
Therefore, a test set for three different scenarios of camera motion
method, but not with the other two compared methods. Unstable
was generated: translational motion, rotational motion, and both,
transitions along the length of the edge (see Fig. 6a), edges with low
translational motion with simultaneous rotation (see Table 3). The
contrast (see Fig. 6b), and blurry edges (see Fig. 6c) are found more
input of the test procedure contains a video file, an initial pose, the
often with the presented method.
relative camera trajectory, a simple 3D line model, and a frame j that
However, the target application for the line segment extraction al-
determines the end of possible fine pose determination. It is assured
gorithm is not the detection of lines that are known to be present, but of
that until frame j and for the very last frame of the image sequence, all
model lines that are not certainly present, fully observable, or precise in
model lines are visible and the fine pose estimation is providing suc-
pose. Thus, uncertainties induce a larger distance of the expected model
cessful results. To measure the capabilities of the rough motion esti-
line. The more the line search region is extended, the more candidates
mation, the last image is registered to the 3D line model and the dif-
for a line expectation are found.
ference between the resulting registered pose and the guessed rough
pose is evaluated.
4.1.2. Fine pose estimation During the test procedure, fine registration is applied automatically
In this section, the fine pose estimation step is evaluated in terms of for a certain number of frames. With the first and last frame of this
robustness regarding uncertainty in the input camera pose. Therefore, subsequence, i and j are determined. The baseline between the camera
camera pose deviation is simulated on a set of defined ground truth center of both frames defines the scale. The trajectory of the subsequent
camera poses. A set of images is composed and their corresponding frames is guessed by the rough motion estimation procedure.
camera poses within the building model are determined by an The image sequences were recorded with an off-the-shelf smart-
phone camera capable of recording with a video resolution of
Table 2
3840 × 2160 pixels and 30 frames per second. The camera was ap-
Line extraction results.
pointed in a fixed focal length and the intrinsic camera parameters were
Detection rate Time for each line (avg.) determined in advance.
Two different image-to-image registration methods, VO and SFM,
EDLines [88] 0.18 246.238 ms
were incorporated in the test procedure. The results of the rough mo-
LSD [87] 0.41 300.98 ms
Proposed method 0.9259 2.02 ms tion estimation experiments are listed in Table 4. The table contains the
measured length of the trajectory according to the registration, the
20
Fig. 6. Image examples with sheer lines that could be extracted with the presented method, but not with the other two compared methods: (a) unstable transitions, (b) low contrast, and
(c) blurry edges.
number of frames the image sequence contains, the first and second 4.2. Recognition
frame used for registration with fine pose estimation, the appointed
technique, and the corresponding error measured in the last frame. The evaluation of the recognition block of the general framework
However, the fine pose estimation process could re-register the motion was performed by experiments of the object and material recognition.
of the camera to the building model in the last frame in all tested cases The activity state recognition was conducted on the differentiation of
to determine the error. three stages of a drywall installation and on a two-stage classification of
The results of this experiment show the induction of drift by the a radiator mounting. In this section, the experiments were performed
rough motion estimation. However, the deviation measured was very on single images.
low such that the fine pose estimation could still re-register absolutely
to the model. This shows that the incorporation of rough motion esti-
4.2.1. Drywall state determination with material recognition
mation is possible for a longer sequence. Afterwards, the association
Drywall installation activities are usually finished after completing
back to the model is feasible. The experiment includes a limited set of
three states: a) the installation of panels (Fig. 11a, b) the plastering of
motion. Most motion necessary to assess visual as-built data on the site,
the drywall (Fig. 11b and c) the painting of the drywall (Fig. 11c).
though, is covered by these examples.
Different visual features that result in a challenging recognition char-
acterize the different states of drywall installation activities. The main
challenges for determining the state of the drywall are the low structure
Fig. 7. Projections of input model lines with divergent camera poses by rotation (a–c) and translation (d–f).
21
thus disturb a clear view on the expected texture characteristics and

influence results. After these preparatory steps, the sheer drywall can be
investigated. Therefore, the subsequent step performs a compass edge
feature extraction and comparison. The compass edge features are ex-
tracted with the information from the registered motion information
that provides the camera view and rotation relative to the wall. With
that information, an edge filter can be adapted to find the horizontal
and vertical gaps as well as the round screw spots at the drywall. Also
within this step, a preliminarily trained support vector machine classifier
Point distance with the ability to decide between paneled drywalls and the other two
stages pursuant to the extracted edge features is applied.
The paneled state is the earliest outcome state in this classification
workflow, which exits if this state is reached. For the differentiation
between the remaining states, further steps within the workflow are
necessary. The further state distinction is focused on increasing the
contrast of the areas that are painted to the plastered panels. Histogram
based adaptive contrast normalization is used to expose the contrast at
the plastered areas of the wall. Afterwards, a histogram considering the
pixel intensity is used as a feature vector for classification. The result
holds a precision and recall of at least 95%, which allows for the usage
Rotation and translation distance of the method as a process in the general framework. Exemplary results
are illustrated in Fig. 14. A deeper insight into the method is given in
Fig. 8. Illustration of the point distance as the applied error metric.
[83].
4.2.2. Radiator installation state determination with object recognition

The second investigated case for state recognition is based on the
method presented in Kropp et al. [84]. The approach is intended to
derive the current state of an activity by recognizing the presence of
structural objects with changing appearance from different views by
making an approximation of its real perspective transform. It assumes
that views onto an object from a certain similar spatial region result in
comparable object appearances, such that it is sufficient to train a
classifier for achieving good recognition results. It considers available
information about the object from BIM, as well as motion information
to reduce the complex three-dimensional recognition problem into a
simple two-dimensional detection problem. The method was evaluated
Fig. 9. Mean point distance of the line endpoints with applied transformation of the es-
timated camera pose with translational deviation input. on the example of the recognition of radiators. Fig. 15 shows different
activity states of radiators that need to be recognized. Different possible
states of the radiator are reduced to two states and concluded into the
radiator being present and non-present. Examples for the non-present
state can be an unprepared (Fig. 15a) or prepared installation back-
ground (Fig. 15b), whereas the present radiator can be found wrapped
in packaging foil (Fig. 15c) or unwrapped (Fig. 15d).
The workflow that is related to the current approach is displayed in
the flow chart in Fig. 16. In the first step within this flow chart, the view
onto the object is classified by an aspect approximation. This step is
slightly modified compared to the original design to refer to the concept
of aspect graphs [95]. Unlike the drywall recognition process, not the
standard rectified image ROI is used, but an aspect-normalized ROI of
the image. As the result, the most dominant approximated surface is
rectified. Hereby, information from the BIM object geometry and the
Fig. 10. Mean point distance of the line endpoints with applied transformation of the registered motion information is used to obtain the relative pose of the
estimated camera pose with rotational deviation input. object to the camera. The approximated rectification is intended to
reduce the three-dimensional appearance into a normalized two-di-
and the homogeneous color distribution over the wall. Furthermore, in mensional representation of the object. Before visual features are ex-
indoor environments, walls may not completely be visible if a special tracted, this rectangle is illuminated to remove illumination inequal-
lens is not applied. ities. Finally, the method of Histogram of Oriented Gradients is applied to
According to the description of the introduced state recognition extract information about visual characteristics and used to train a
framework, the method is explained as a workflow to achieve state support vector machine classifier. This enables the derivation of the
differentiation. The classification workflow is illustrated as a flow chart presence of the object and hence, a state of the activity. The experi-
in Fig. 12. In the first step, the image is loaded and cut according to the ments result in a precision of 0.96 and a recall of 0.99, which satisfies
provided ROI (see example in Fig. 13). The second step includes seg- the usage of the method as a process in the general framework. In
mentation that prevents the inclusion of unexpected objects, like lad- Fig. 17 exemplified results of the described method are illustrated.
ders, painting pots or cables that may occlude the desired drywall and Kropp et al. [84] contains a detailed description of the explained steps.
22
Table 3
Test cases for rough motion estimation experiments.
Test case Overview Example images of the sequence
Translation
Rotation
Translation & rotation
Table 4
Results of the rough motion estimation experiments.
Test scenario Measured trajectory length # Frames 1st fine pose 2nd fine pose Baseline between 1st and 2nd fine pose Technique Last frame error
[m] frame frame frame[m] [m]
Translation 18.098 721 1 30 0.281 VO 0.267

SFM 0.043
Rotation 3.327 713 1 12 0.024 VO 0.195
SFM 0.640
Trans. & Rot. 16.617 541 1 8 0.181 VO 0.405
SFM 0.288
Fig. 11. Example images of (a) installed drywall panels, (b) plastered drywall, and (c) painted drywall.
4.3. Case study on construction site videos panels, the second video shows the non-installed radiator and the
plastered drywall state, and the third video contains the appearance of
For the evaluation of the whole framework, three exemplary videos the radiator and the painted drywall state. The goal of this experiment
were selected that document the radiator and drywall activity states. In was to test the ability of the final statement about the activity state
Table 5, the content of image sequences is listed. All the videos were under the influence of all previous steps starting from the initial re-
recorded at the renovation of the IC building on the campus of the gistration, over fine and rough pose estimation towards the activity
Ruhr-Universität Bochum with the monocular rear camera of a tablet state recognition.
PC. The first video shows the non-present radiator and installed drywall In the beginning of the sequences, a sufficient set of visible model
23
Drywall State Recognition
4D BIM
Building Object
Schedule Catalogue
model
Geometry Relevant Objects
Compass Edge Intensity

Rectified
Segmentation Feature Panelled? Histogram Plastered?
ROI false Extraction false
Extraction
true true
Segmented Compass Edge Intensity Identified
Rectified ROI Distribution Histogram Activity State
Fig. 12. Workflow for the classification of the drywall installation completion state.
Fig. 13. Example image of a drywall with original appearance (a), BIM extracted ROI (b) and applied segmentation (c).
Fig. 14. Exemplified classification results: (a) recognized installed drywall state, (b–c) recognized plastered drywalls state, (d) recognized painted drywall state.
Fig. 15. Different activity states for the test configuration: (a) unprepared background, (b) prepared background, (c) radiator wrapped in packaging foil, and (d) unwrapped radiator.
lines appears, such that a proper fine pose estimation was guaranteed. contains Euro pallets on the floor that cover the second drywall par-
Since earlier test results showed a positive and reliable application of tially. In the second sequence, some beams cover all activities due to the
the rough motion estimation, only a few manually set frames were camera motion. The third sequence contains a ladder in front of the
defined for fine pose registration. The image sequences contain some present radiator in most frames. The videos contain critical sequences
obstacles that partially cover the relevant activities. The first sequence like backlight, rotation-dominant motion of the camera, views with
24
Fig. 16. Workflow for the classification of the heating installation completion state with example image data.
Fig. 17. Exemplified classification results: (a-d) true negatives (TN), (e) false negatives (FN), (f–h) true positive (TP).
Table 5 containing the relevant activity subject, as well as the number of frames
Overview of the activity states within the three exemplary image sequences. for which a correct determination of the current state was derived. The
voted result shows the ultimate result for the inspection of the activity
Sequence Location Date Radiator Drywall 1 Drywall 2
calculating the absolute majority of state determinations. For all se-
#1 room 5–169 November 27, Not installed Installed Installed quences, activities and frames, the activities were classified correctly.
2012 Thus, also the voted results are correct. Although the camera pose de-
#2 room 5–169 December 20, Not installed Plastered Plastered termination was not always stable due to low feature areas and thus the
2012
#3 room 2–169 December 20, Installed Painted Painted
ROIs of the analyzed state were not always precise, they were accurate
2012 enough for a robust statement about the activity state.
Based on the image sequences, the results were composed in videos
that illustrate the BIM registration and activity state recognition pro-
low-textured walls and an insufficient set of visible model lines. cess. For the registration section, the observed activities were high-
However, for recognition terms, the states of the desired objects and lighted in the original frames. An equivalent view was created with a
materials were expected to be properly recognized. BIM viewer and additionally, a bird's eye view that shows the motion of
Table 6 shows the results of the test. The table contains the total the camera within the building model was created. For the recognition
number of frames contained in the image sequence of the videos, the section, the rectified areas of the observed activities are visualized. It
number of frames that were automatically selected for potentially was highlighted when the recognition of the state was successfully
25
Table 6
Overview of the results within the three exemplary image sequences.
Seq. Sequence length Radiator Drywall 1 Drywall 2

[frames]
Visible Correctly classified Voted Visible Correctly classified Voted result Visible Correctly classified Voted result
[frames] [frames] result [frames] [frames] [frames] [frames]
#1 808 274 274 Not 326 326 Installed 15 15 Installed

present
#2 1732 429 429 Not 345 345 Plastered 346 346 Plastered
present
#3 828 236 236 Present 223 223 Painted 70 70 Painted
Fig. 18. First example result frame of the first

image sequence of the case study (small white
text is summarized in Table 7).
Fig. 19. Second example result frame of the first

image sequence of the case study study (small
white text is summarized in Table 7).
performed and statistics about the recognition are printed. Eleven results for the regarded activity states that were recognized for the
frames are exemplarily shown (Figs. 18–28). The first three examples qualified image up to the current frame.
show the composed results from the first sequence (Figs. 18–20), the
succeeding four examples show the composed results from the second
sequence (Figs. 21–24), and the last four examples show the composed 5. Discussion
results from the third sequence (Figs. 25–28).
The information printed in the example images is summarized in The results of the experiments in the preceding section show the
Table 7. The table contains the related figure and the associated se- overall viability of the framework. However, analyzing the results of
quence the frame was taken from. Moreover, the visibility and the re- the experiments, several issues, which the proposed method has yet to
cognition statistics of the three analyzed activities are contained in the address, were encountered. This mainly includes the as-built acquisi-
table. The visibility means the visibility of the activity-related con- tion and the registration block of the framework. During the processing
struction objects in the estimated image region of the current frame. In of the image sequences, it could be observed that very rough motion of
case of the drywalls, the visibility is represented in a percentage value, the camera, especially rotation-dominant motion in front of the close-
whereas the visibility of the radiator is only given when the whole range objects like walls in combination with the monocular camera
object is visible (yes/no). The recognition statistics hold the voting system can lead to bad results. Scale varies during the trajectory re-
construction of the video sequence, especially the scale cannot be well
26
Fig. 20. Third example result frame of the first

Fig. 21. First example result frame of the second

image sequence of the case study study (small
Fig. 22. Second example result frame of the

second image sequence of the case study (small
reconstructed during rotational motion. For practical reasons, per- pose estimation as early as possible.
sonnel of future inspections on the construction site should be made Since the accumulated search space for line segment extraction
aware of this. varies according to the induced uncertainty and the amount of pro-
Furthermore, the approach for scale determination accomplished in jected model lines that are imposed onto the image differs from image
rough motion estimation assumes that the scale is fixed until the next to image, the presented approach might not be the fastest option in all
fine pose estimation process. The relative vision-based motion estima- cases. Sometimes, when a vast amount of lines is to be extracted, it can
tion techniques regarded, particularly VO, induce an unpredictable be advantageous in terms of processing time to incorporate a con-
degree of drift for each succeeding frame that can grow rapidly de- servative line segment extraction approach with subsequent filtering.
pending on many factors. Thus, it is recommendable to return to fine
27
Fig. 23. Third example result frame of the second

Fig. 24. Fourth example result frame of the

second image sequence of the case study (small
Fig. 25. First example result frame of the third

6. Conclusions and outlook in time in the construction schedule. Solely once, manual intervention
is needed for initialization to register a sequence. Consecutive images
In order to determine the actual state of buildings under construc- are registered in a fully automatic manner.
tion, this paper contributes a promising novel method that increases the With the BIM registered image sequences, the search for relevant
degree of automation for inspections. The method is based on the 4D tasks is reduced to regions of interest that contain tasks of interests
BIM model information and image sequences that are merged and according to the schedule and associated objects in the building model.
processed to derive rich information about single construction tasks of Again, registration information comes into play for the rectification of
interest. Each image of a sequence is registered to the 4D BIM model in the determined image regions. This leads to less complex problems for
terms of its origin in the building model coordinate system and its point the activity state recognition.
28
Fig. 26. Second example result frame of the third

Fig. 27. Third example result frame of the third

Fig. 28. Fourth example result frame of the third

The behavior of single independent steps of the new method was The results show that an automation of progress monitoring sup-
studied as well as the performance of the general framework for each ports the availability of quickly interpreted as-built inspections. This
step using the results of preceding steps. Both, the independent as well enables short intervals and, thus, frequent updates on the as-built
as the dependent results show promising applicability for widely au- states. Performing inspections from the beginning of finishing works on
tomatic progress monitoring combining BIM and image sequences. a daily basis reveals frequent insights about activities performed on the
However, potential errors in each of the subsequent steps are propa- construction site. However, in the case where activities are actually
gated to the following step. So far, this has not been analyzed and tested performed ahead of the schedule, occlusions might be possible and
in detail yet, but leaves room for further improvements and future would cause problems. A potential solution in future work would be to
discussions. use certain time windows of the planned schedule including activities
29
Table 7
Details of the printed information within the examples of the composed result frames.
Figure Example sequence Drywall #1 Radiator Drywall #2
Visibility Installed/Plastered/Painted Visibility Not installed/Installed Visibility Installed/Plastered/Painted
Fig. 18 1 50% 1/0/0 Yes 1/0 28% 0/0/0

Fig. 19 1 55% 216/0/0 No 133/0 0% 15/0/0
Fig. 20 1 1% 326/0/0 Yes 222/0 0% 15/0/0
Fig. 21 2 54% 0/12/0 Yes 12/0 31% 0/11/0
Fig. 22 2 59% 0/134/0 No 73/0 0% 0/21/0
Fig. 23 2 0% 0/227/0 Yes 196/0 0% 0/21/0
Fig. 24 2 21% 0/227/0 No 429/0 54% 0/325/0
Fig. 25 3 47% 0/0/4 Yes 0/4 24% 0/0/0
Fig. 26 3 54% 0/0/140 No 0/99 0% 0/0/0
Fig. 27 3 4% 0/0/221 Yes 0/143 6% 0/0/0
Fig. 28 3 0% 0/0/221 No 0/234 41% 0/0/48
and object ahead of the schedule. equipment operations using GPS data, Autom. Constr. 29 (2013) 107–122, http://
Due to the complexity of the framework, there is a vast availability dx.doi.org/10.1016/j.autcon.2012.09.004.
[14] K. Han, M. Golparvar-Fard, D. Castro-Lacouture, J. Irizarry, B. Ashuri (Eds.),
of starting points for optimization. This can reach from more robust Automated monitoring of operation-level construction progress using 4D BIM and
video recording techniques that will come up soon to real time im- daily site photologs, Construction Research Congress 2014, American Society of
plementation of the whole framework to achieve augmented reality Civil Engineers, Atlanta, Georgia, 2014, pp. 1033–1042, , http://dx.doi.org/10.
1061/9780784413517.106.
applications on mobile devices. However, the framework is generic and [15] Z. Zhu, S. German, I. Brilakis, Detection of large-scale concrete columns for auto-
modular, which allows for the smooth interchange of single steps in mated bridge inspection, Autom. Constr. 19 (2010) 1047–1055, http://dx.doi.org/
many ways. Nevertheless, there is a need to generate an extensive test 10.1016/j.autcon.2010.07.016.
[16] Y.M. Ibrahim, T.C. Lukins, X. Zhang, E. Trucco, A.P. Kaka, Towards automated
set of activities to cover for state recognition.
progress assessment of workpackage components in construction projects using
computer vision, Adv. Eng. Inform. 23 (2009) 93–103, http://dx.doi.org/10.1016/
Acknowledgements j.aei.2008.07.002.
[17] H. Son, C. Kim, 3D structural component recognition and modeling method using
color and 3D data for construction progress monitoring, Autom. Constr. 19 (2010)
The authors gratefully acknowledge the financial support by the 844–854, http://dx.doi.org/10.1016/j.autcon.2010.03.003.
German Research Foundation (DFG) for this work under the grants KO [18] I. Brilakis, M.W. Park, G. Jog, Automated vision tracking of project related entities,
4311/4-1 and KO 3473/8-1. Adv. Eng. Inform. 25 (2011) 713–724, http://dx.doi.org/10.1016/j.aei.2011.01.
003.
[19] S. El-Omari, O. Moselhi, Data acquisition from construction sites for tracking pur-
References poses, Eng. Constr. Archit. Manag. 16 (2009) 490–503, http://dx.doi.org/10.1108/
09699980910988384.
[20] M.W. Park, I. Brilakis, Construction worker detection in video frames for initializing
[1] S. Roh, Z. Aziz, F. Peña-Mora, An object-based 3D walk-through model for interior vision trackers, Autom. Constr. 28 (2012) 15–25, http://dx.doi.org/10.1016/j.
construction progress monitoring, Autom. Constr. 20 (2011) 66–75, http://dx.doi. autcon.2012.06.001.
org/10.1016/j.autcon.2010.07.003. [21] E. Rezazadeh Azar, B. McCabe, Automated visual recognition of dump trucks in
[2] P. Greiner, P. Mayer, K. Stark, Ablaufplanung, Baubetriebslehre - Proj, 4th ed., construction videos, J. Comput. Civ. Eng. 26 (2012) 769–781, http://dx.doi.org/10.
Vieweg + Teubner Verlag, 9783834806581, 2009, pp. 115–170. 1061/(ASCE)CP.1943-5487.0000179.
[3] S.N. Razavi, C.T. Haas, Multisensor data fusion for on-site materials tracking in [22] M. Golparvar-Fard, A. Heydarian, J.C. Niebles, Vision-based action recognition of
construction, Autom. Constr. 19 (2010) 1037–1046, http://dx.doi.org/10.1016/j. earthmoving equipment using spatio–temporal features and support vector machine
autcon.2010.07.017. classifiers, Advanced Enginering Informatics. 27 (2013) 652–663, http://dx.doi.
[4] S.N. Razavi, C.T. Haas, Using reference RFID tags for calibrating the estimated lo- org/10.1016/j.aei.2013.09.001.
cations of construction materials, Automation in Construction, 2011, pp. 677–685, , [23] J. Gong, C.H. Caldas, C. Gordon, Learning and classifying actions of construction
http://dx.doi.org/10.1016/j.autcon.2010.12.009. workers and equipment using bag-of-video-feature-words and Bayesian network
[5] T. Cheng, M. Venugopal, J. Teizer, P.A. Vela, Performance evaluation of ultra models, Adv. Eng. Inform. 25 (2011) 771–782, http://dx.doi.org/10.1016/j.aei.
wideband technology for construction resource location tracking in harsh en- 2011.06.002.
vironments, Autom. Constr. 20 (2011) 1173–1184, http://dx.doi.org/10.1016/j. [24] A. Khosrowpoura, J.C. Niebles, M. Golparvar-Fard, Vision-based workface assess-
autcon.2011.05.001. ment using depth images for activity analysis of interior construction operations,
[6] T. Cheng, J. Teizer, G.C. Migliaccio, U.C. Gatti, Automated task-level activity Autom. Constr. 48 (2014) 74–87, http://dx.doi.org/10.1016/j.autcon.2014.08.003.
analysis through fusion of real time location sensors and worker's thoracic posture [25] A.R. Andoh, X. Su, H. Cai, A boundary condition-based algorithm for locating
data, Autom. Constr. 29 (2013) 24–39, http://dx.doi.org/10.1016/j.autcon.2012. construction site objects using RFID and GPS, Construction Research Congress 2012
08.003. (28) (2012) 808–817, http://dx.doi.org/10.1061/9780784412329.082.
[7] A. Shahi, J.S. West, C.T. Haas, Onsite 3D marking for construction activity tracking, [26] M. Kopsida, I. Brilakis, P. Vela, A review of automated construction progress and
Autom. Constr. 30 (2013) 136–143, http://dx.doi.org/10.1016/j.autcon.2012.11. inspection methods, Proc. 32nd CIB W78 Conference on Construction IT, 2015, pp.
027. 421–431.
[8] R. Maalek, F. Sadeghpour, Accuracy assessment of ultra-wide band technology in [27] J. Teizer, Status quo and open challenges in vision-based sensing and tracking of
tracking static resources in indoor construction scenarios, Autom. Constr. 30 (2013) temporary resources on infrastructure construction sites, Adv. Eng. Inform. 29
170–183, http://dx.doi.org/10.1016/j.autcon.2012.10.005. (2015) 225–238, http://dx.doi.org/10.1016/j.aei.2015.03.006.
[9] S. Woo, S. Jeong, E. Mok, L. Xia, C. Choi, M. Pyeon, J. Heo, Application of WiFi- [28] J. Yang, M.W. Park, P.A. Vela, M. Golparvar-Fard, Construction performance
based indoor positioning system for labor tracking at construction sites: a case study monitoring via still images, time-lapse photos, and video streams: now, tomorrow,
in Guangzhou MTR, Automation in Construction, 2011, pp. 3–13, , http://dx.doi. and the future, Adv. Eng. Inform. 29 (2015) 211–224, http://dx.doi.org/10.1016/j.
org/10.1016/j.autcon.2010.07.009. aei.2015.01.011.
[10] X. Shen, W. Chen, M. Lu, Wireless sensor networks for resources tracking at building [29] E. Ergen, B. Akinci, R. Sacks, Tracking and locating components in a precast storage
construction sites, Tsinghua Sci. Technol. 13 (2008) 78–83, http://dx.doi.org/10. yard utilizing radio frequency identification technology and GPS, Autom. Constr. 16
1016/S1007-0214(08)70130-5. (2007) 354–367, http://dx.doi.org/10.1016/j.autcon.2006.07.004.
[11] S. El-Omari, O. Moselhi, Integrating 3D laser scanning and photogrammetry for [30] D. Grau, L. Zeng, Y. Xiao, Automatically tracking engineered components through
progress measurement of construction work, Autom. Constr. 18 (2008) 1–9, http:// shipping and receiving processes with passive identification technologies, Autom.
dx.doi.org/10.1016/j.autcon.2008.05.006. Constr. 28 (2012) 36–44, http://dx.doi.org/10.1016/j.autcon.2012.05.016.
[12] F. Bosché, M. Ahmed, Y. Turkan, C.T. Haas, R. Haas, The value of integrating scan- [31] J. Song, C.T. Haas, C.H. Caldas, Tracking the location of materials on construction
to-BIM and scan-vs-BIM techniques for construction monitoring using laser scan- job sites, J. Constr. Eng. Manag. 132 (2006) 911–918, http://dx.doi.org/10.1061/
ning and BIM: the case of cylindrical MEP components, Autom. Constr. (2014), (ASCE)0733-9364(2006)132:9(911).
http://dx.doi.org/10.1016/j.autcon.2014.05.014. [32] D. Grau, C.H. Caldas, C.T. Haas, P.M. Goodrum, J. Gong, Assessing the impact of
[13] N. Pradhananga, J. Teizer, Automatic spatio-temporal analysis of construction site materials tracking technologies on construction craft productivity, Autom. Constr.
30
18 (2009) 903–911, http://dx.doi.org/10.1016/j.autcon.2009.04.001. [58] X. Zhang, N. Bakis, T.C. Lukins, Y.M. Ibrahim, S. Wu, M. Kagioglou, G. Aouad,
[33] J. Teizer, D. Lao, M. Sofer, Rapid automated monitoring of construction site ac- A.P. Kaka, E. Trucco, Automating progress measurement of construction projects,
tivities using ultra-wideband, 24th Int. Symp. on Automation and Robotics in Autom. Constr. 18 (2009) 294–301, http://dx.doi.org/10.1016/j.autcon.2008.09.
Construction, ISARC 2007, 2007 9788190423519, pp. 23–28. 004.
[34] A.A. Oloufa, M. Ikeda, H. Oda, Situational awareness of construction equipment [59] P. David, D. Dementhon, Object recognition in high clutter images using line fea-
using GPS, wireless and web technologies, Automation in Construction, 2003, pp. tures, Proc. IEEE International Conference on Computer Vision, 2005, pp.
737–748, , http://dx.doi.org/10.1016/S0926-5805(03)00057-8. 1581–1588, , http://dx.doi.org/10.1109/ICCV.2005.173.
[35] S. El-Omari, O. Moselhi, Integrating automated data acquisition technologies for [60] C. Kim, C. Kim, H. Son, Fully automated registration of 3D data to a 3D CAD model
progress reporting of construction projects, Automation in Construction, 2011, pp. for project progress monitoring, Autom. Constr. 35 (2013) 587–594, http://dx.doi.
699–705, , http://dx.doi.org/10.1016/j.autcon.2010.12.001. org/10.1016/j.autcon.2013.01.005.
[36] G. Deak, K. Curran, J. Condell, A survey of active and passive indoor localisation [61] B. Bellekens, V. Spruyt, M. Weyn, A survey of rigid 3D pointcloud registration al-
systems, Comput. Commun. 35 (2012) 1939–1954, http://dx.doi.org/10.1016/j. gorithms, Ambient 2014, Fourth International Conference on Ambient Computing,
comcom.2012.06.004. Applications, Services and Technologies 2014, 2014 9781612083568, pp. 8–13.
[37] M. Golparvar-Fard, J. Bohn, J. Teizer, S. Savarese, F. Peña-Mora, Evaluation of [62] M.Y. Yang, Y. Cao, J. McDonald, Fusion of camera images and laser scans for wide
image-based modeling and laser scanning accuracy for emerging automated per- baseline 3D scene alignment in urban environments, ISPRS J. Photogramm. Remote
formance monitoring techniques, Autom. Constr. 20 (2011) 1143–1155, http://dx. Sens. 66 (2011), http://dx.doi.org/10.1016/j.isprsjprs.2011.09.004.
doi.org/10.1016/j.autcon.2011.04.016. [63] V. Pătrăucean, I. Armeni, M. Nahangi, J. Yeung, I. Brilakis, C. Haas, State of re-
[38] P. Tang, D. Huber, B. Akinci, R. Lipman, A. Lytle, Automatic reconstruction of as- search in automatic as-built modelling, Adv. Eng. Inform. 29 (2015) 162–171,
built building information models from laser-scanned point clouds: a review of http://dx.doi.org/10.1016/j.aei.2015.01.001.
related techniques, Autom. Constr. 19 (2010) 829–843, http://dx.doi.org/10.1016/ [64] B. Přibyl, P. Zemčík, M. Čadík, Camera pose estimation from lines using Plücker
j.autcon.2010.06.007. coordinates, Proc. Br. Mach. Vis. Conf. 2015, The British Machine Vision
[39] Y. Turkan, F. Bosche, C.T. Haas, R. Haas, Automated progress tracking using 4D Association and Society for Pattern Recognition, Swansea, GB, 2015, pp.
schedule and 3D sensing technologies, Automation in Construction, 2012, pp. 45.1–45.12, , http://dx.doi.org/10.5244/C.29.45.
414–421, , http://dx.doi.org/10.1016/j.autcon.2011.10.003. [65] H.H. Chen, Pose determination from line-to-plane correspondences: existence
[40] F. Bosche, C.T. Haas, B. Akinci, Automated recognition of 3D CAD objects in site condition and closed-form solutions, IEEE Trans. Pattern Anal. Mach. Intell. 13
laser scans for project 3D status visualization and performance control, J. Comput. (1991) 530–541, http://dx.doi.org/10.1109/34.87340.
Civ. Eng. 23 (2009) 311–318, http://dx.doi.org/10.1061/(ASCE)0887-3801(2009) [66] Y. Liu, T.S. Huang, O.D. Faugeras, Determination of camera location from 2-D to 3-
23:6(311). D line and point correspondences, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990)
[41] F. Bosché, Automated recognition of 3D CAD model objects in laser scans and 28–37, http://dx.doi.org/10.1109/34.41381.
calculation of as-built dimensions for dimensional compliance control in con- [67] A. Ansar, K. Daniilidis, Linear pose estimation from points or lines, IEEE Trans.
struction, Adv. Eng. Inform. 24 (2010) 107–118, http://dx.doi.org/10.1016/j.aei. Pattern Anal. Mach. Intell. 25 (2003) 578–589, http://dx.doi.org/10.1109/TPAMI.
2009.08.006. 2003.1195992.
[42] T.C. Lukins, E. Trucco, Towards automated visual assessment of progress in con- [68] F.M. Mirzaei, S.I. Roumeliotis, Globally optimal pose estimation from line corre-
struction projects, Proc. of the British Machine Vision Conference, 2007, pp. 1–10, , spondences, Proc. - IEEE International Conference on Robotics and Automation,
http://dx.doi.org/10.5244/C.21.18. 2011, pp. 5581–5588, , http://dx.doi.org/10.1109/ICRA.2011.5980272.
[43] X. Zhang, N. Bakis, T.C. Lukins, Y.M. Ibrahim, S. Wu, M. Kagioglou, G. Aouad, [69] L. Zhang, C. Xu, K.M. Lee, R. Koch, Robust and efficient pose estimation from line
A.P. Kaka, E. Trucco, Automating progress measurement of construction projects, correspondences, Lect. Notes Comput. Sci. (Including subser. Lecture Notes on
Autom. Constr. 18 (2009) 294–301, http://dx.doi.org/10.1016/j.autcon.2008.09. Artificial Intelligence, Lecture Notes Bioinformatics), 2013, pp. 217–230, , http://
004. dx.doi.org/10.1007/978-3-642-37431-9_17.
[44] D. Rebolj, N.Č. Babič, A. Magdič, P. Podbreznik, M. Pšunder, Automated con- [70] C. Xu, L. Zhang, L. Cheng, R. Koch, Pose estimation from line correspondences: a
struction activity monitoring system, Adv. Eng. Inform. 22 (2008) 493–503, http:// complete analysis and a series of solutions, IEEE Trans. Pattern Anal. Mach. Intell.
dx.doi.org/10.1016/j.aei.2008.06.002. (2016), http://dx.doi.org/10.1109/TPAMI.2016.2582162.
[45] M. Golparvar-Fard, F. Peña-Mora, Application of visualization techniques for con- [71] M.a. Fischler, R.C. Bolles, Random sample consensus: a paradigm for model fitting
struction progress monitoring, Computing in Civil Engineering, 2007, pp. 216–223, with applications to image analysis and automated cartography, Commun. ACM 24
, http://dx.doi.org/10.1061/40937(261)27. (1981) 381–395, http://dx.doi.org/10.1145/358669.358692.
[46] M. Golparvar-Fard, F. Peña-Mora, S. Savarese, Application of D4AR - A 4-dimen- [72] P. David, D. Dementhon, R. Duraiswami, H. Samet, SoftPOSIT: simultaneous pose
sional augmented reality model for automating construction progress monitoring and correspondence determination, Int. J. Comput. Vis. 59 (2004) 259–284, http://
data collection, processing and communication, Journal of Information Technology dx.doi.org/10.1023/B:VISI.0000025800.10423.1f.
in Construction 14 (2009) 129–153 (ISSN:14036835). [73] S. Gold, a. Rangarajan, C.P. Lu, S. Pappu, New algorithms for 2D and 3D point
[47] M. Golparvar-Fard, F. Peña-Mora, C.A. Arboleda, S. Lee, Visualization of con- matching:: pose estimation and correspondence, Pattern Recogn. 31 (1998)
struction progress monitoring with 4D simulation model overlaid on time-lapsed 1019–1031, http://dx.doi.org/10.1016/S0031-3203(98)80010-1.
photographs, J. Comput. Civ. Eng. 23 (2009) 391–404, http://dx.doi.org/10.1061/ [74] D.F. Dementhon, L.S. Davis, Model-based object pose in 25 lines of code, Int. J.
(ASCE)0887-3801(2009)23:6(391). Comput. Vis. 15 (1995) 123–141, http://dx.doi.org/10.1007/BF01450852.
[48] M. Golparvar-Fard, S. Savarese, F. Peña-Mora, Automated model-based recognition [75] D. Oberkampf, D.F. DeMenthon, Iterative pose estimation using coplanar feature
of progress using daily construction photographs and IFC-based 4D models, points, Comput. Vis. Image Underst. 63 (1996) 495–511, http://dx.doi.org/10.
Construction Research Congress 2010, 2010, pp. 51–60, , http://dx.doi.org/10. 1006/cviu.1996.0037.
1061/41109(373)6. [76] P. David, P. David, D. DeMenthon, D. DeMenthon, R. Duraiswami, R. Duraiswami,
[49] F. Dai, A. Rashidi, I. Brilakis, P. Vela, Comparison of image-based and time-of-flight- H. Samet, H. Samet, Simultaneous pose and correspondence determination using
based technologies for three-dimensional reconstruction of infrastructure, Journal line features, IEEE Computer Society Conference on Computer Vision and Pattern
of Construction Engineering and Management 139 (2013) 69–79, http://dx.doi. Recognition 2, 2003, pp. 424–431, , http://dx.doi.org/10.1109/CVPR.2003.
org/10.1061/(Asce)Co.1943-7862.0000565. 1211499.
[50] D. Nister, O. Naroditsky, J. Bergen, Visual odometry, Proc. 2004 IEEE computer [77] J.C. Diaz, M. Abderrahim, Modified SoftPOSIT algorithm for 3D visual tracking,
society conference on computer vision and pattern recognition, 2004, pp. 652–659, 2007 IEEE international symposium on intelligent, Signal Process. (2007), http://
, http://dx.doi.org/10.1109/CVPR.2004.1315094. dx.doi.org/10.1109/WISP.2007.4447523.
[51] D. Scaramuzza, F. Fraundorfer, Visual odometry part II, IEEE Robot. Autom. Mag. [78] F. Moreno-Noguer, V. Lepetit, P. Fua, Pose priors for simultaneously solving
18 (2011) 80–92, http://dx.doi.org/10.1109/MRA.2011.943233. alignment and correspondence, Lecture Notes in Computer Science (Lecture Notes
[52] J. Engel, J. Sturm, D. Cremers, Semi-dense visual odometry for a monocular camera, on Artificial Intelligence, Lecture Notes Bioinformatics), 2008, pp. 405–418, ,
Proc. IEEE International Conference on Computer Vision, 2013, pp. 1449–1456, , http://dx.doi.org/10.1007/978-3-540-88688-4-30.
http://dx.doi.org/10.1109/ICCV.2013.183. [79] M. Brown, D. Windridge, J.Y. Guillemaut, Globally optimal 2D-3D registration from
[53] S. Niko, K. Konolige, S. Lacroix, P. Protzel, Visual Odometry Using Sparse Bundle points or lines without correspondences, Proc. IEEE International Conference on
Adjustment on an Autonomous Outdoor Vehicle, Elektrotechnik Und Computer Vision, 2016, pp. 2111–2119, , http://dx.doi.org/10.1109/ICCV.2015.
Informationstechnik, (2005), pp. 157–163, http://dx.doi.org/10.1007/3-540- 244.
30292-1_20. [80] A.M.D.J.C, C. Rossi, EvoPose: a model-based pose estimation algorithm with cor-
[54] H. Strasdat, J.M.M. Montiel, A.J. Davison, Real-time monocular SLAM: why filter? respondences determination, IEEE International Conference on Mechatronics and
Proc. - IEEE International Conference on Robotics and Automation, 2010, pp. Automation, ICMA 2005, Niagara Falls, Ont., Canada, 2005, pp. 1551–1556, ,
2657–2664, , http://dx.doi.org/10.1109/ROBOT.2010.5509636. http://dx.doi.org/10.1109/ICMA.2005.1626786.
[55] H. Zhou, D. Zou, L. Pei, R. Ying, P. Liu, W. Yu, StructSLAM: visual SLAM with [81] J. Xia, X. Xu, J. Xiong, Simultaneous Pose and Correspondence Determination Using
building structure lines, IEEE Trans. Veh. Technol. 64 (2015) 1364–1375, http:// Differential Evolution, Proc. - International Conference on Natural Computation,
dx.doi.org/10.1109/TVT.2015.2388780. Chongqing, 2012, pp. 703–707, , http://dx.doi.org/10.1109/ICNC.2012.6234643.
[56] P. Podbreznik, D. Rebolj, Automatic comparison of site images and the 4D model of [82] H. Hamledari, B. Mccabe, S. Davari, Automated computer vision-based detection of
the building, in: R. Scherer, P. Katranuschkov, S.E. Schapke (Eds.), Conference on components of under-construction indoor partitions, Autom. Constr. 74 (2017)
Information Technology in Construction, Dresden, Germany, 2005, pp. 235–239 78–94, http://dx.doi.org/10.1016/j.autcon.2016.11.009.
(3860054783). [83] C. Kropp, C. Koch, M. König, Drywall state detection in image data for automatic
[57] T.C. Lukins, E. Trucco, Towards automated visual assessment of progress in con- indoor progress monitoring, in: R. Issa, I. Flood (Eds.), Computing in Civil and
struction projects, Proceedings of the British Machine Vision Conference, 2007, pp. Building Engineering, ASCE, Orlando, Florida, United States, 2014, pp. 347–354, ,
142–151, , http://dx.doi.org/10.5244/C.21.18. http://dx.doi.org/10.1061/9780784413616.044.
31
[84] C. Kropp, M. König, C. Koch, Object Recognition in bim registered videos for indoor on LBD descriptor and pairwise geometric consistency, J. Vis. Commun. Image
progress monitoring, Proc. 20th International Workshop on Intelligent Computing Represent. 24 (2013) 794–805, http://dx.doi.org/10.1016/j.jvcir.2013.05.006.
in Engineering, Vienna, Austria, 2013 9783200031456, pp. 1–10. [91] C. Zhao, H. Zhao, J. Lv, S. Sun, B. Li, Multimodal image matching based on mul-
[85] Y.M. Ibrahim, T.C. Lukins, X. Zhang, E. Trucco, A.P. Kaka, Towards automated timodality robust line segment descriptor, Neurocomputing 177 (2016) 290–303,
progress assessment of workpackage components in construction projects using http://dx.doi.org/10.1016/j.neucom.2015.11.025.
computer vision, Adv. Eng. Inform. 23 (2009) 93–103, http://dx.doi.org/10.1016/ [92] D. Van Opdenbosch, G. Schroth, R. Huitl, S. Hilsenbeck, A. Garcea, E. Steinbach,
j.aei.2008.07.002. Camera-based indoor positioning using scalable streaming of compressed binary
[86] M.V. Escorcia, M. Dávila, J. Niebles Golparvar-Fard, Automated vision-based re- image signatures, 2014 IEEE International Conference on Image Processing ICIP
cognition of construction worker actions for building interior construction opera- 2014, 2014, pp. 2804–2808, , http://dx.doi.org/10.1109/ICIP.2014.7025567.
tions using RGBD cameras, Construction Research Congress, 2012, pp. 879–888, , [93] F. Dévai, An optimal hidden-surface algorithm and its parallelization, Lecture Notes
http://dx.doi.org/10.1061/9780784412329.089. in Computer Science (Lecture Notes in Artificial Intelligence, Lecture Notes
[87] R. Grompone Von Gioi, J. Jakubowicz, J.M. Morel, G. Randall, LSD: a fast line Bioinformatics), 2011, pp. 17–29, , http://dx.doi.org/10.1007/978-3-642-21931-
segment detector with a false detection control, IEEE Trans. Pattern Anal. Mach. 3_2.
Intell. 32 (2010) 722–732, http://dx.doi.org/10.1109/TPAMI.2008.300. [94] S. Savarese, L. Fei-Fei, 3D generic object categorization, localization and pose es-
[88] C. Akinlar, C. Topal, EDLines: a real-time line segment detector with a false de- timation, Proc. IEEE International Conference on Computer Vision, IEEE, 2007, pp.
tection control, Pattern Recogn. Lett. 32 (2011) 1633–1642, http://dx.doi.org/10. 1–8, , http://dx.doi.org/10.1109/ICCV.2007.4408987.
1016/j.patrec.2011.06.001. [95] D. Eggert, L. Stark, K. Bowyer, Aspect graphs and their use in object recognition,
[89] Z. Wang, F. Wu, Z. Hu, MSLD: A robust descriptor for line matching, Pattern Ann. Math. Artif. Intell. 13 (1995) 347–375, http://dx.doi.org/10.1007/
Recogn. 42 (2009) 941–953, http://dx.doi.org/10.1016/j.patcog.2008.08.035. BF01530835.
[90] L. Zhang, R. Koch, An efficient and robust line segment matching approach based
32

Kropp - 4D Dim Construction State Recognition

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Kropp - 4D Dim Construction State Recognition

Uploaded by

Copyright:

Available Formats

Automation in Construction 86 (2018) 11–32

Contents lists available at ScienceDirect

Interior construction state recognition with 4D BIM registered image T

1. Introduction ﬁnishing works represent a high-ranking section of every construction

Geometry Relevant Objects

Rough Fine Pose ROI in Video State

Rough Motion Fine Image Rectified

Fig. 1. Concept overview for video-based progress monitoring.

Fig. 2. Processing of the registration of each frame by pose estimation.

The relative motion is scaled to the absolute motion

Knowing the absolute motion from frame j to frame k, the absolute

The recognition block of the general framework contains several

Fig. 4. Extended RANSAC scheme for pose determination with

the type of input activity. 3.2.3. State recognition

Fig. 5. Rectiﬁcation of two images from diﬀerent poses to one

Original Image Rectified Images Original Image

thus disturb a clear view on the expected texture characteristics and

4.2.2. Radiator installation state determination with object recognition

Test case Overview Example images of the sequence

Translation & rotation

Translation 18.098 721 1 30 0.281 VO 0.267

Drywall State Recognition

Geometry Relevant Objects

Compass Edge Intensity

Seq. Sequence length Radiator Drywall 1 Drywall 2

#1 808 274 274 Not 326 326 Installed 15 15 Installed

Fig. 18. First example result frame of the ﬁrst

Fig. 19. Second example result frame of the ﬁrst

Fig. 20. Third example result frame of the ﬁrst

Fig. 21. First example result frame of the second

Fig. 22. Second example result frame of the

Fig. 23. Third example result frame of the second

Fig. 24. Fourth example result frame of the

Fig. 25. First example result frame of the third

Fig. 26. Second example result frame of the third

Fig. 27. Third example result frame of the third

Fig. 28. Fourth example result frame of the third

Figure Example sequence Drywall #1 Radiator Drywall #2

Visibility Installed/Plastered/Painted Visibility Not installed/Installed Visibility Installed/Plastered/Painted

Fig. 18 1 50% 1/0/0 Yes 1/0 28% 0/0/0

You might also like