Visual Tracking of Construction Jobsite Workforce and Equipment with Particle Filtering

Zhenhua Zhu, A.M.ASCE¹; Xiaoning Ren²; and Zhi Chen³

Abstract: Tracking workforce and equipment at construction jobsites has attracted considerable interest, considering its importance for productivity analysis, safety monitoring, and dynamic site layout planning, for example. Several real-time locating systems (RTLSs) are commercially available, but their requirements for attaching sensors or tags to workers or equipment raise privacy concerns. Recently, the idea of using video cameras statically placed at construction jobsites to track workers and equipment has been proposed and tested. One challenge of visual tracking stems from jobsite occlusions, which significantly affect tracking performance. This paper presents a vision tracking method using particle filters to address the issue of occlusions at construction jobsites. The method includes two main phases. First, the worker or mobile equipment of interest is manually initiated with a rectangular window, and hundreds of particles are generated. Then each particle is propagated and its weight is calculated by measuring its observation likelihood. The particles are resampled based on their weights. This makes it possible to follow a worker or equipment of interest. The method has been tested at real construction jobsites in Quebec, and tracking results demonstrated its effectiveness. DOI: 10.1061/(ASCE)CP.1943-5487.0000573. © 2016 American Society of Civil Engineers.

Author keywords: Automatic identification systems; Data collection; Imaging techniques; Automation.

Downloaded from ascelibrary.org by New York University on 06/28/16. Copyright ASCE. For personal use only; all rights reserved.

¹Assistant Professor, Dept. of Building, Civil, and Environmental Engineering, Concordia Univ., Montreal, Canada H3G 1M8 (corresponding author). E-mail: zhenhua.zhu@concordia.ca
²Graduate Student, Dept. of Building, Civil, and Environmental Engineering, Concordia Univ., Montreal, Canada H3G 1M8. E-mail: ryanren528@gmail.com
³Professor, Dept. of Building, Civil, and Environmental Engineering, Concordia Univ., Montreal, Canada H3G 1M8. E-mail: zhi.chen@concordia.ca
Note. This manuscript was submitted on April 28, 2015; approved on December 15, 2015; published online on March 29, 2016. Discussion period open until August 29, 2016; separate discussions must be submitted for individual papers. This paper is part of the Journal of Computing in Civil Engineering, © ASCE, ISSN 0887-3801.

Introduction

Tracking workforce and mobile equipment at construction jobsites has attracted considerable interest. This is mainly because tracking results provide construction researchers and professionals with important insights that allow them to address several critical issues regarding construction performance and safety at jobsites. For example, the tracking of construction workers and equipment could make it possible to estimate their productivity (Gong and Caldas 2011). Also, tracking helps construction managers learn how much time workers have been wasting on obtaining materials and tools (Weerasinghe and Ruwanpura 2009). Tracking mobile equipment and workers on foot could protect workers from potential collisions with equipment at construction jobsites (Teizer and Vela 2009; Yang et al. 2010). Tracking workers at heights could be used to classify unsafe activities, such as leaning too far to one side or reaching too far overhead (Han and Lee 2013; Han et al. 2013).

Currently, several real-time locating systems (RTLSs) are commercially available that facilitate the tracking of workers and equipment at construction jobsites. These systems include but are not limited to radio frequency identification, global positioning systems, wireless local area networks, and ultra-wideband (UWB). Nasr et al. (2013) compared the applications of these RTLSs in construction scenarios. They found that the UWB technology was a good tool for tracking project-related entities in real time, but its accuracy decreased as a result of interactions between human bodies and sensor tags (Nasr et al. 2013). Also, the initial investment and long-term maintenance cost were another consideration for owners and contractors (Nasr et al. 2013). In addition, all existing RTLSs share one common limitation: workers and equipment must be physically tagged in order to track them on construction jobsites. Tagging workers might provoke strong resistance and opposition from unions because of the privacy concerns involved. As a result, such RTLSs have not been widely used to track construction workers at most construction jobsites.

Tracking workers and equipment using videos collected by high-definition (HD) video cameras provides another alternative. Typically, the cameras are statically placed on construction jobsites. They capture construction activities at jobsites into videos and do not require physically tagging workers. Therefore, compared with the use of RTLSs to track construction workers, the installation of video cameras on construction jobsites is more acceptable and more amenable to unions. In fact, HD cameras have already been installed on many construction jobsites to document project construction progress at the sites.

One significant challenge of visually tracking workers and equipment in videos stems from occlusions at jobsites. Construction jobsites are commonly cluttered with different types of materials and tools. These materials and tools, plus as-built facilities and temporary structures, block the fields of view (FOVs) of video cameras, even if the cameras are placed at heights. Therefore, workers and equipment might not always be in the line of sight of the cameras; they could be severely occluded, as shown in Fig. 1.

The existence of occlusions significantly affects the robustness of visual tracking performance. It easily leads to tracking failure because the visual features of a target become difficult to follow under occlusion conditions. Tracking failure due to occlusions further limits the use of visual tracking techniques at real construction jobsites as an alternative to the RTLS techniques. Therefore, it is important to improve visual tracking robustness under occlusion

© ASCE 04016023-1 J. Comput. Civ. Eng.

J. Comput. Civ. Eng., 04016023



Fig. 1. Example of occlusions at construction jobsites

conditions, as recommended in the recent research work of Yang et al. (2015).

This paper presents a vision-based method to track construction workers and equipment at construction jobsites. The method relies on the concept of particle filters to address severe occlusions that could happen during tracking. The method consists of two main steps. First, the tracking of workers or equipment of interest is manually initiated with a rectangular window. In the window, hundreds of particles are generated. Then the particles are propagated when a new video frame is generated. The observation likelihood of each particle is measured. The measurements are used as weights to resample the particles. Based on the distribution of the resampled particles, a new rectangular window is generated. In this way, the workforce of interest can be followed in the new video frame.

The method has been tested in videos collected from a real construction jobsite in Montreal, Canada. An HD camera was placed on the jobsite to record the activities of construction workers in the Roccabella residential project. The test results on these real-jobsite videos demonstrate the performance of the method in terms of handling occlusions. The method was able to track the construction workers and equipment even when they were severely occluded by temporary structures, as-built facilities, and other workers at the jobsite.

Related Work

This section first describes existing research studies that have been performed on visually detecting and tracking workers or equipment at jobsites. This is followed by the introduction of the idea of particle filtering and its potential applications in the area of visual tracking in general.

Visual Detection and Tracking at Construction Jobsites

So far, several research studies have been performed to investigate the potential for detecting and tracking, for example, workers, equipment, and materials at construction jobsites using video cameras. Specific to visual detection, Park and Brilakis (2012) and Rezazadeh Azar and McCabe (2012) relied on a histogram of oriented gradients (HOG) and Haar-like features to perform worker and equipment detection. A similar research study was also found in the work of Memarzadeh et al. (2012, 2013), who combined the HOG and color features with a new multiple binary support vector machine (SVM) classifier to automatically detect and distinguish workers and equipment in on-site videos.

As for visual tracking, most of the existing studies directly applied existing visual tracking methods created in the field of computer vision to facilitate construction engineering and management tasks. For example, Zou and Kim (2007) adopted a simple hue, saturation, and value (HSV) color-based tracking scheme to monitor the movement of an excavator, so that its corresponding idle time could be calculated. Weerasinghe and Ruwanpura (2009) tracked construction workers based on the shape and color of their hardhats to monitor the workers’ performance. Gong and Caldas (2011) used the mean-shift tracking algorithm to measure working cycles of a mini loader. To estimate dirt-loading cycles during earthmoving operations, Rezazadeh Azar et al. (2013) relied on KLT features (Tomasi and Kanade 1991) to track a loading truck. The KLT method is a differential method for estimating the optical flow, which assumes the flow is constant in the local neighborhood of the pixel of interest and addresses the basic optical flow equations for all the pixels that belong to that neighborhood.

In addition, some research studies have focused on evaluating existing tracking methods in construction scenarios. Park et al. (2011) conducted a comparative study and evaluated the visual tracking methods in three categories (i.e., contour-based tracker, kernel-based tracker, and point-based tracker) to track workers, equipment, and materials at construction jobsites. Based on their comparison results (Park et al. 2011), they concluded that the kernel-based tracking method (Ross et al. 2008) outperformed the point-based tracking method (Mathes and Piater 2006). A similar comparison was also made by Teizer and Vela (2009), but their study only focused on the tracking of construction workers. In that study, four common tracking methods (i.e., density mean shift tracking, Bayesian contour tracking, active contour tracking, and graph-cut tracking) were compared, and it was found that the Bayesian contour tracking outperformed the other three (Teizer and Vela 2009).

Following the evaluation results, new ideas on how to use existing tracking methods were proposed. One example was to enhance the detection of construction resources at jobsites with the integration of visual tracking. Park and Brilakis (2012) integrated kernel-based tracking (Ross et al. 2008) into their equipment detection scheme based on equipment motion, shape, and color features. The results showed that the tracking results could replace false detections and, therefore, enhance detection performance (Park and Brilakis 2012). A similar idea was proposed and implemented by Rezazadeh Azar et al. (2013), where a KLT tracker (Tomasi and Kanade 1991) was used to narrow the search space for the detection of trucks using a HOG detector (Prisacariu and Reid 2009) and to determine truck locations when detection failed.

In addition, Yang et al. (2010) developed a multiple tracking management scheme that could be used to track multiple workers under conditions of no occlusion, partial occlusion, or severe occlusions using different strategies. The scheme was based on the assumption that each worker was on the same flat ground plane with walking poses, and the scheme was integrated into the kernel covariance tracking with an offline trained spatial model and an online learned color model (Yang et al. 2010). The tracking results were promising. However, the researchers mentioned that the tracking still failed when a worker underwent severe occlusions for a period of time (Yang et al. 2010). Therefore, it was important to find an effective way to recover from lost tracking under severe occlusions.

Particle Filtering and Its Applications in Visual Tracking

Particle filtering is the implementation of a recursive Bayesian estimation using a sequential Monte Carlo approach (Rao and Satyanarayana 2013). The main objective of recursive Bayesian estimation is to track the state of a target as it evolves over time, given a series of observation measurements over time. The estimation could be approximated using a Monte Carlo approach that relies on a particle-based representation (Salmond and Gordon 2005). More details regarding the introduction of particle filtering can be found in the work of Rao and Satyanarayana (2013) and Salmond and Gordon (2005).

So far, particle filtering has been widely used to track objects of interest, such as vehicles, pedestrians, and athletes, in video sequences (Mihaylova et al. 2007; Rao and Satyanarayana 2013). For example, Shan et al. (2007) created a real-time hand tracking system using a mean shift embedded particle filter. Wei et al. (2011) proposed a layered particle filter architecture that embedded a continuous adaptive mean shift algorithm to track multiple vehicle targets. Wang et al. (2009) incorporated the CamShift algorithm into the probabilistic framework of a particle filter to track, for example, soccer players and F1 racecars. Hess and Fern (2009) tested their discriminatively trained particle filters to track American football players on the field. Lu et al. (2009) relied on a boosted particle filter to track and recognize actions of multiple hockey players. Ababsa (2010) used an iterated particle filter to perform real-time camera tracking in structured environments. Zheng and Bhandarkar (2009) detected and tracked faces using a boosted adaptive particle filter.

Research Gaps, Objective, and Scope

Research Gaps to Fill

Particle filtering has shown promising tracking results in recent research studies. It has proven to be powerful and reliable, especially when used to track nonlinear systems (Mihaylova et al. 2007). Because the problems encountered in the visual tracking of on-site construction resources are typically nonlinear or non-Gaussian, there is a huge potential for the use of particle filtering to perform construction resource tracking. In addition, construction sites are cluttered. The visual tracking of on-site construction resources must resolve occlusion issues. Particle filtering is expected to address possible occlusions since it functions even if data are missing in the visual tracking process. For these two main reasons, this paper investigates the use of particle filtering techniques to track construction resources on construction jobsites.

Currently, the potential of particle filtering in terms of tracking construction resources at construction jobsites remains unclear. On the one hand, existing particle filter–based tracking methods have not been tested in construction scenarios. Construction jobsites have unique characteristics. They are typically dirty and cluttered with materials, tools, equipment, and workers. Workers and equipment are always in close proximity to each other at construction jobsites. These real jobsite conditions produce severe occlusions. Therefore, the promising tracking results in other test scenarios do not validate the effectiveness of particle filters when it comes to tracking workers, equipment, and other things in complex and challenging construction scenarios.

On the other hand, little work has been done on the use of particle filters to track construction jobsite resources. Gong and Caldas (2011) once tested the integration of the mean shift and particle filtering methods and recommended the combination of the mean shift and Kalman filters and particle filters since such combinations took the best innate features of each method. However, insufficient visual tracking results were obtained in their tests for determining how severe occlusions might be handled by particle filters. For example, it is unknown whether a worker of interest could still be tracked under severe or even temporary full occlusions, as shown in Fig. 1.

Research Objective and Scope

This study aims to fill the aforementioned gaps. The overall objective of the study is to develop a tracking method based on particle filters and then test its effectiveness in real construction scenarios. The focus of the test will be on checking whether the particle filter–based method developed in this paper is able to track construction resources at real construction jobsites under severe and even temporary full occlusions since occlusions are common at most construction jobsites. The test results are expected to help construction researchers and professionals investigate and learn about the potential of using particle filters in visual tracking in a more comprehensive way.

The method developed in this paper is generic. It is supposed to be able to track different types of construction resources. The specific focus in this paper is on the tracking of workers and mobile construction equipment. The method does not include or rely on any kind of offline training or online learning strategies. Offline training would limit the generic nature of the tracking since only objects trained offline can be tracked. On the other hand, online learning would allow updates to be made to the visual features of objects in the tracking process, but it might lose the tracking of those objects of interest under cluttered scenarios, as shown in the comparative example presented subsequently. For these reasons, neither offline training nor online learning is considered in this paper for the moment. However, they will be integrated into the method to make the tracking more stable later on when the robustness of the method to occlusions is tested in the paper. Effective


integration of offline training or online learning represents a separate research topic that is beyond the scope of this paper.

Fig. 2. Method overview (prediction phase: first video frame → particles initialization; new video frames → particles propagation, driven by the motion model; update phase: observation likelihood computation with the observation model → weights computation → particles resampling)

Tracking Methodology Based on Particle Filters

Suppose $x_k$ is the state vector of a target to be tracked and $z_k$ is an observation related to $x_k$ at time step $k$. The tracking is used to recursively calculate the posterior probability density function (PDF) $p(x_k \mid z_{1:k})$ that represents the degree of belief in state $x_k$ at time step $k$, given $\{z_1, z_2, \ldots, z_k\}$. Here, the Monte Carlo approach is used to evaluate the posterior PDF $p(x_k \mid z_{1:k})$ with a particle-based representation, as suggested by Salmond and Gordon (2005). First, $N$ particles are randomly generated following the posterior PDF $p(x_0 \mid z_0)$ of the initial state. Suppose these particles are denoted by $\{x_0^1, x_0^2, \ldots, x_0^N\}$. At time step $k$, $k = 1, 2, \ldots$, the particles are propagated to generate a set of particles denoted by $\{x_k^{1-}, x_k^{2-}, \ldots, x_k^{N-}\}$. The distribution of the propagated particles $\{x_k^{1-}, x_k^{2-}, \ldots, x_k^{N-}\}$ represents the prior PDF $p(x_k \mid z_{1:k-1})$. Then the distribution is updated in light of observation measurement $z_k$. Specifically, the relative observation likelihood of each particle is calculated. This likelihood indicates the PDF $p(z_k \mid x_k^{i-})$. Based on the likelihood, the weight of each particle in the distribution is determined and normalized. Then the particles $\{x_k^{1-}, x_k^{2-}, \ldots, x_k^{N-}\}$ are resampled to produce a new set of particles $\{x_k^1, x_k^2, \ldots, x_k^N\}$, where particles having low weights are eliminated and particles having high importance weights are multiplied (Hol et al. 2006). The distribution of the resampled particles should follow the posterior PDF $p(x_k \mid z_{1:k})$. In this way, the estimation is made.

The overall framework of the tracking method using particle filters is illustrated in Fig. 2. It includes two main phases: prediction and updating. First, the target construction resource to be tracked is located with a rectangular window in the first video frame. Then, $N$ particles ($N = 200$ in this study) are generated around the center of the window following a multivariate normal distribution. When a new video frame is received, the positions of the particles are propagated, and the visual features of the particles in the new frame are extracted. These features are compared with the original features of the particles in the first video frame to calculate the observation likelihoods and determine the weight of each corresponding particle. Based on the weights, the particles are resampled to locate the target construction resource in the new video frame.

Prediction Phase

In the prediction phase (Fig. 2), the positions of the particles in the new video frame are estimated based on their previous positions and the corresponding motion model. Existing dynamic models include, for example, general random-walk models (Nait-Charif and McKenna 2003) and constant acceleration models (Bar-Shalom and Li 1993). Here, the idea of a mixed-state motion model is used. Brasnett et al. (2007) mentioned that mixed-state motion models could be used to overcome partial or full occlusions.

Specifically, two motion models are adopted. The first model assumes that a particle remains static. The second one assumes that the velocity of the particle in a short period of time is constant; it will be the same as the velocity of the particle from the previous time step to the current time step. Eqs. (1) and (2) mathematically describe the two motion models separately:

$$x_{k+1}^{i-} = x_k^i + w_k^i \qquad (1)$$

$$x_{k+1}^{i-} = 2 x_k^i - x_{k-1}^i + w_k^i \qquad (2)$$

where $x_{k+1}^{i-}$ = predicted position (time step $k+1$) of the $i$th particle; $x_k^i$ = current position (time step $k$) of the $i$th particle; $x_{k-1}^i$ = previous position (time step $k-1$) of the $i$th particle; and $w_k^i$ = white noise. For each particle, the decision to select which motion model to adopt is made based on a random value generated dynamically.

Fig. 3. Example camera setup
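The initialization and mixed-state propagation in Eqs. (1) and (2) can be sketched in a few lines. This is an illustrative sketch, not the authors' MATLAB prototype: the particle state is reduced to a 2D image position, and the initial spread, the noise scale, and the 50/50 model-selection probability are assumed values.

```python
import random

def init_particles(cx, cy, n=200, spread=8.0):
    """Generate n particles around the window center (cx, cy),
    following a normal distribution (N = 200 in the paper)."""
    return [(random.gauss(cx, spread), random.gauss(cy, spread)) for _ in range(n)]

def propagate(curr, prev, noise=2.0):
    """Mixed-state propagation: for each particle, randomly select either
    the static model (Eq. 1) or the constant-velocity model (Eq. 2),
    then add white noise w_k."""
    new = []
    for (xk, yk), (xp, yp) in zip(curr, prev):
        if random.random() < 0.5:
            nx, ny = xk, yk                    # Eq. (1): particle stays put
        else:
            nx, ny = 2 * xk - xp, 2 * yk - yp  # Eq. (2): linear extrapolation
        new.append((nx + random.gauss(0.0, noise), ny + random.gauss(0.0, noise)))
    return new
```

Keeping a static branch alongside the constant-velocity branch is what lets some particles wait in place while a target is occluded, while others keep extrapolating its last motion.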


local regions are used as the visual features to estimate observation
likelihoods. In the HSV space, the intensity information is de-
coupled from the colors, so it should be more robust to illumination
variations than the RGB (red, green, blue) color space at a modest
computational cost.
The histogram of each particle in the new video frame is com-
pared with its histogram in the first video frame after normalization.
Then the closeness of the two histograms are measured by comput-
ing their Bhattacharyya similarity coefficient (Bsc ) (Aherne et al.
1997). The corresponding exponential value (e−20Bsc ) related to
the coefficient (Bsc ) is calculated as the weight for the particle re-
sampling. The use of e−20Bsc was suggested by previous researcher
(Vermaak et al. 2003) and was also confirmed in the experiments
conducted for this study.
Downloaded from ascelibrary.org by New York University on 06/28/16. Copyright ASCE. For personal use only; all rights reserved.

Based on the weights, the particles are resampled. The


purpose of the resampling is to multiply the particles with high
weights and suppress those with low weights. As a result, new
particles are generated. These particles have the same weights
but still approximate the distribution of the particles before re-
sampling. Here, the deterministic resampling algorithm devel-
oped by Kitagawa (1996) is adopted for resampling since it is
one of the most frequently used algorithms for particle filters
(Hol et al. 2006).
Fig. 4. Particles initialized for tracking

Implementation and Results


Update Phase
The propagation predicts the positions of the particles in the new Implementation and Test Bed
video frame. Then the visual features of the local region centered at The method was implemented as a prototype in the MATLAB
each particle are retrieved. Here, the HSV color histograms in the R2014b platform. The image processing toolbox and computer

Video frame 0001 Video frame 0030 Video frame 0060

Video frame 0090 Video frame 0120 Video frame 0150

Video frame 0180 Video frame 0210 Video frame 0240

Fig. 5. Worker tracking with view changes

© ASCE 04016023-5 J. Comput. Civ. Eng.

J. Comput. Civ. Eng., 04016023
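The update phase can be sketched similarly. Two caveats: the paper states the weight as $e^{-20B_{sc}}$, which reads $B_{sc}$ as a distance-like quantity; the sketch below uses the Vermaak-style form $\exp(-\lambda(1-\rho))$ with $\lambda = 20$, where $\rho$ is the Bhattacharyya coefficient of the normalized histograms (an interpretation, not the authors' exact code). The per-channel 8-bin histogram and the systematic resampler are likewise assumptions in the spirit of the described method.

```python
import math
import random

def hsv_histogram(pixels, bins=8):
    """Coarse HSV histogram of a particle's local region; `pixels` is a
    list of (h, s, v) tuples with each channel in [0, 1)."""
    hist = [0.0] * (bins * 3)
    for h, s, v in pixels:
        hist[int(h * bins) % bins] += 1
        hist[bins + int(s * bins) % bins] += 1
        hist[2 * bins + int(v * bins) % bins] += 1
    total = sum(hist) or 1.0
    return [c / total for c in hist]           # normalize to sum to 1

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms (1 = identical)."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))

def weight(p, q, lam=20.0):
    """Particle weight: exponential of the histogram similarity."""
    return math.exp(-lam * (1.0 - bhattacharyya(p, q)))

def resample(particles, weights):
    """Systematic (deterministic) resampling in the spirit of Kitagawa (1996):
    high-weight particles are multiplied, low-weight ones are dropped."""
    n = len(particles)
    total = sum(weights)
    cum, acc = [], 0.0
    for w in weights:
        acc += w / total
        cum.append(acc)
    start = random.random() / n                # one random offset for all draws
    out, j = [], 0
    for i in range(n):
        u = start + i / n
        while cum[j] < u:
            j += 1
        out.append(particles[j])
    return out
```

Because the weight decays exponentially with histogram dissimilarity, particles that drift onto an occluder receive nearly zero weight and are eliminated at the next resampling step.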


Fig. 6. Tracking worker movement (video frames 0270–0670)

Fig. 7. Worker tracking with severe occlusions (video frames 0720–0960; occlusion levels ranging from 0 to 80%)


vision system toolbox built in MATLAB were used. Both toolboxes provide algorithms, functions, and tools for image and video processing, including, for example, image/video file I/O, image/video display, and drawing graphic results (MATLAB). These algorithms, functions, and tools significantly facilitated the method’s development and implementation. The method was tested in a 64-bit operating system, Microsoft Windows 7 Enterprise. The hardware configuration for the test included an Intel Xeon CPU E5-1607 at 3.00 GHz, 6 GB memory, and an NVIDIA Quadro 600 graphic processing unit (GPU).

The real jobsites of two construction projects in Quebec were selected as the test beds to evaluate the tracking performance of the
Fig. 8. Tracking the movement of a worker with a ladder (video frames 0001–0240)

Fig. 9. Worker tracking with full occlusions (video frames 230–235; occlusion levels 7–31%)


method in the paper. The first project involved building a 2-story commercial podium and two residential towers of 40 floors above a 5-story underground parking. The second one involved building four hydropower generating stations on the Rivière Romaine. HD cameras were placed on the construction jobsites to record daily construction activities. The jobsite videos, with a total size of 1.65 GB, were collected and used as the test bed. Fig. 3 shows an example of how the video cameras were set up at the construction jobsites. All construction jobsite videos were captured in outdoor environments.

Fig. 10. Roller tracking (video frames 0200–0300; occlusion levels 0–55%)

Fig. 11. Truck tracking (video frames 0190–0510; occlusion levels 0–39%)


Fig. 12. Dozer tracking (video frames 0270–2270)

Results

Fig. 4 shows an example of the particles initially generated in the rectangular window that identified a worker of interest to be tracked at the jobsite. Some of the tracking results of this worker are illustrated in Figs. 5–7. As mentioned earlier, the tracking results were obtained here without the need for offline training or online detection. Specifically, Fig. 5 shows that the particle filter–based method developed in this paper was able to address changes in the view of the worker during the tracking process. Fig. 6 illustrates that the worker could be tracked when he walked from one side of the jobsite to another. Fig. 7 shows the robustness of the method in handling severe occlusions when workers were partially occluded by temporary structures at the jobsite. The tracking results were recorded on video and presented in Zhu et al. (2016d).

Figs. 8 and 9 show another example of tracking a worker on a construction jobsite. The worker carried a ladder and walked along the side of the jobsite. Then he put the ladder down and turned back (Fig. 8). When he passed behind the column under construction, full occlusions happened. The tracking method propagated the particles and tried to find the worker. However, the worker could not be found because of the occlusions. Therefore, the tracking was kept at the position where the worker disappeared. When the worker came back to the view of the camera, the tracking recovered, as shown in Fig. 9. The detailed process of how the particles were propagated to recover the tracking can be found in the corresponding tracking video available in Zhu et al. (2016e).

In addition to the tracking of construction workers, mobile equipment can also be tracked using the proposed method without the need for modifications. Figs. 10–12 show examples of tracking a roller, truck, and dozer at construction jobsites. In Fig. 10, the roller is being driven by an operator from one side of the jobsite to another. During the movement of the roller, its view is temporarily occluded by a column in the middle. In the figure, using the proposed method, the roller was able to be tracked even with the occlusions. As with the tracked worker, the roller was lost to tracking when it was occluded by a column. When it came back to the camera’s line of sight, the tracking resumed. In Fig. 11, a truck was tracked using the proposed method as it passed by an excavator. The excavator partially occluded the truck. Also, its color is similar to that of the truck. The results showed that the method still made it possible to correctly identify the truck and continue tracking it. This is because the proposed tracking method relies not just on color values but on the histogram of the color values as well. The histogram represents the distribution of the color values centered on each particle, which makes it possible to differentiate between the truck and the excavator in the example. More tracking results similar to the tracking of the construction workers in the previous examples can be found in Zhu et al. (2016b, c, a).

Metric Measurements

To evaluate tracking performance in a quantitative manner, the ground truths of the objects of interest to track were manually labeled in the test videos. Then the tracking results from the proposed method were compared with the ground truths to calculate the


tracking precision and spatial overlap. Tracking precision is measured by center location error, which is typically defined as the Euclidean distance between the center locations of the tracked objects and their corresponding ground truths in the videos. This distance depends on the image size: the larger an image is, the larger the measured distance will be. Therefore, precision is also presented in a relative manner. Specifically, the precision in the x-direction is defined as the Euclidean distance in the x-direction divided by the image width, while the precision in the y-direction is defined as the Euclidean distance in the y-direction divided by the image height. Spatial overlap is measured by the ratio of the intersection (∩) and union (∪) of the tracked object and its manually labeled ground truth in each video frame (Szczodrak et al. 2010).

The way to calculate both metrics is illustrated in Fig. 13. The absolute center location error for the tracking of the worker in Figs. 5–7 is presented in Fig. 14, while the relative location error in the x- and y-directions is illustrated in Figs. 15 and 16. The corresponding tracking spatial overlap is illustrated in Fig. 17. The average relative tracking precisions and spatial overlaps for the previous examples are summarized in Table 1.

Fig. 13. Tracking precision and spatial overlap

Comparison

The proposed tracking method is compared with the work of Ross et al. (2008), which relied on incremental learning that was expected to increase tracking performance, especially in cluttered scenarios. Park et al. (2011) evaluated the method of Ross et al. (2008) against other tracking methods in their comparative study of vision tracking methods for the tracking of construction resources. Their results showed that the method of Ross et al. (2008) was one of the most appropriate methods for tracking construction resources at construction jobsites (Park et al. 2011). For this reason, it was chosen as the benchmark for the proposed tracking method in this paper.

Fig. 18 illustrates a comparative example for the tracking of the same worker in a test video. The results in this example show that under the method of Ross et al. (2008), the tracking of the worker stopped after he put the ladder down. Instead, the ladder started to be tracked, partly as a result of the online incremental learning strategy used in the method.

Discussion

The test results showed the potential of tracking workers and mobile equipment at construction jobsites using the particle filter–based tracking method developed in this paper. Currently, the framework proposed in this paper is limited to tracking only one construction object; it is unable to track multiple objects at the same time. Therefore, modifications are needed to extend the proposed framework so that it can simultaneously track multiple objects. Recall that the tracking strategy in the proposed framework is based on the prediction and updating of particle positions. One possible modification would be to generate a separate set of particles for each construction object to be tracked.

The occlusions at the construction jobsites in this study affected the tracking performance. Because of the occlusions, the targeted workers or equipment were not followed exactly. As a result, center location errors increased and the tracking spatial overlap ratios were reduced. It might be necessary to interpolate the tracking results to smooth the tracking transitions when a worker or
Fig. 14. Precision for tracking worker in Figs. 5–7 (center location error, as Euclidean distance in pixels, plotted against frame number for frames 200–1200)
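For reference, the two evaluation metrics can be sketched in code as follows. The (x, y, width, height) box format and the function names are illustrative assumptions for this sketch, not part of the paper's implementation.

```python
import math

def center_location_error(tracked, truth):
    """Euclidean distance between the centers of two (x, y, w, h) boxes,
    as plotted in Fig. 14 (absolute error, in pixels)."""
    tx, ty = tracked[0] + tracked[2] / 2.0, tracked[1] + tracked[3] / 2.0
    gx, gy = truth[0] + truth[2] / 2.0, truth[1] + truth[3] / 2.0
    return math.hypot(tx - gx, ty - gy)

def relative_precision(tracked, truth, img_w, img_h):
    """Center offsets in x and y divided by image width and height,
    as plotted in Figs. 15 and 16."""
    tx, ty = tracked[0] + tracked[2] / 2.0, tracked[1] + tracked[3] / 2.0
    gx, gy = truth[0] + truth[2] / 2.0, truth[1] + truth[3] / 2.0
    return abs(tx - gx) / img_w, abs(ty - gy) / img_h

def spatial_overlap(tracked, truth):
    """Intersection-over-union of the tracked box and the manually
    labeled ground-truth box, as plotted in Fig. 17."""
    x1 = max(tracked[0], truth[0])
    y1 = max(tracked[1], truth[1])
    x2 = min(tracked[0] + tracked[2], truth[0] + truth[2])
    y2 = min(tracked[1] + tracked[3], truth[1] + truth[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = tracked[2] * tracked[3] + truth[2] * truth[3] - inter
    return inter / union if union > 0 else 0.0

# A 4-pixel horizontal and 3-pixel vertical center offset gives a 5-pixel error.
print(center_location_error((10, 10, 20, 20), (14, 13, 20, 20)))  # 5.0
```

Averaging these per-frame values over a test video yields numbers comparable to the entries in Table 1.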
equipment is behind occlusions. The proposed method did not interpolate the tracking results. Instead, the tracking stayed (but did not stop) where the worker or equipment was lost as a result of occlusions. Then it found the new location to track when the worker or equipment came out from the occlusion and reappeared in the camera's view. Interpolation could be performed by introducing other types of digital filters. For example, the Kalman filter was designed and tested to predict the positions of workers when they
Fig. 15. Relative center location error in x-direction for tracking worker in Figs. 5–7 (0–3%, plotted against frame number for frames 200–1200)
Fig. 16. Relative center location error in y-direction for tracking worker in Figs. 5–7 (0–6%, plotted against frame number for frames 200–1200)
Fig. 17. Spatial overlap for tracking worker in Figs. 5–7 (overlap ratio between tracking result and ground truth, plotted against frame number for frames 200–1200)
Table 1. Average Tracking Precision and Spatial Overlap

                             Relative center location error
Tracking examples           x-direction (%)  y-direction (%)  Tracking spatial overlap (%)
Worker in Figs. 5–7               0.3              0.7                  69
Worker in Figs. 8 and 9           4.2              5.6                  50
Roller in Fig. 10                 0.4              0.5                  64
Truck in Fig. 11                  1.0              0.4                  65
Dozer in Fig. 12                  0.6              0.2                  78

are partially or fully occluded. More details regarding prediction with Kalman filters can be found in the work of Yang et al. (2011) and Zhu et al. (2015). However, additional investigations are needed to determine how to seamlessly integrate Kalman filters into the proposed tracking framework. The main focus of this paper is on designing particle filters to track workers and mobile equipment when they encounter occlusions at construction jobsites.

The proposed tracking framework would continue tracking an object even if the object has been occluded for a long period of time. When an object to be tracked encounters occlusions, the tracking stays where the object is occluded. It keeps looking for the object in the adjacent regions until the object reappears. However, if the occlusions are large, then the object of interest might be lost because of the occlusions. This is due mainly to the motion models adopted in the proposed framework. So far, only two motion models have been developed. These motion models were created based on the assumptions that either the particles remain static or their velocity over a short period of time is constant.

Moreover, to test the extent of the method's ability to address occlusion issues, the following experiment was conducted. A truck was moving from left to right at an open construction site, and an artificial occlusion (a black box) was produced manually along the truck's motion path, as shown in Fig. 19. The width of the occlusion was increased by 10 pixels each time. It was found that the proposed tracking method could successfully track the truck even after experiencing the occlusion, which by that point was 70 pixels wide, or approximately 50% of the truck's length.

Compared with other methods of tracking construction resources, the method proposed in this paper requires no specific offline training. It represents targets of interest to be tracked by a set of particles; the tracking is then based on predicting and updating these particles. As shown in the test results, the method can be generalized to track different types of construction resources at jobsites that have various shapes, colors, and textures. The generic nature of the method could enable its use for tracking construction resources, such as workers and mobile equipment, even when their poses and orientations change. For example, the particles on the workers in Figs. 5–7 and Figs. 8 and 9 could be automatically updated, so that the workers are tracked even when they turn around. Another example is shown in Fig. 20. The worker
Fig. 18. Comparison between method of Ross et al. (2008) and proposed method (video frames 0001, 0060, 0090, and 0120; each frame annotated with the tracking windows of both methods)
Fig. 19. Occlusion extent test
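The artificial-occlusion experiment behind Fig. 19 could be reproduced along these lines. The frame size, occlusion position, and function name here are illustrative assumptions; only the 10-pixel widening step and the 70-pixel final width come from the text.

```python
import numpy as np

def add_artificial_occlusion(frame, x_start, width):
    """Paint a full-height black box over the frame along the target's
    motion path, mimicking the manually produced occlusion in Fig. 19."""
    occluded = frame.copy()
    occluded[:, x_start:x_start + width] = 0
    return occluded

# Widen the occlusion by 10 pixels per trial, as described in the text.
frame = np.full((240, 320, 3), 128, dtype=np.uint8)  # stand-in gray frame
for width in range(10, 80, 10):
    test_frame = add_artificial_occlusion(frame, x_start=150, width=width)
    # ...run the tracker on the occluded sequence and record success...
```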

Fig. 20. Tracking a worker in different poses (video frames 0001, 0075, 0150, 0225, 0300, and 0375)
of interest could be tracked when working or standing on the construction jobsite.

The proposed method relies on motion models to predict the positions of particles. The two assumptions in the motion models are based on the fact that construction site videos are captured at 30 frames per second (fps). As a result, the transition of workers or mobile equipment between two consecutive frames is not fast. In such a short period of time (within 0.03 s), changes in the velocity of the worker or equipment are ignored. For this reason, it is assumed that the particles remain static or that the velocity of a particle over a short period of time is constant. In addition to the two motion models in the paper, other object motion models have been developed by other researchers. For example, Nait-Charif and McKenna (2003) and Pérez et al. (2004) created general random-walk models, and Bar-Shalom and Li (1993) created constant acceleration models. The proposed method could easily be extended to incorporate these motion models and test their effectiveness at tracking construction workers and mobile equipment at construction jobsites. Also, the two motion models in the proposed tracking framework might be modified to enlarge the tracking search regions when an object is occluded. This might resolve the issue of large occlusions.
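The two motion models described above can be sketched as follows. The (x, y, vx, vy) state layout, the noise scales, and the function names are illustrative assumptions for this sketch, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(42)

def propagate_static(particles, pos_noise=3.0):
    """Static model: particle positions are only diffused by zero-mean
    Gaussian noise; no velocity term is applied."""
    out = particles.copy()
    out[:, :2] += rng.normal(0.0, pos_noise, size=(len(out), 2))
    return out

def propagate_constant_velocity(particles, dt=1.0 / 30.0, pos_noise=3.0):
    """Constant-velocity model: each particle carries (x, y, vx, vy), and
    its velocity is assumed unchanged over one 1/30 s frame interval."""
    out = particles.copy()
    out[:, :2] += out[:, 2:4] * dt
    out[:, :2] += rng.normal(0.0, pos_noise, size=(len(out), 2))
    return out

# Example: 500 particles at (50, 50), moving right at 300 pixels per second.
particles = np.tile([50.0, 50.0, 300.0, 0.0], (500, 1))
moved = propagate_constant_velocity(particles)
print(moved[:, 0].mean())  # about 60 = 50 + 300 * (1/30)
```

A random-walk model, as cited above, corresponds to the static model with a larger diffusion term; a constant-acceleration model would additionally carry and apply (ax, ay) terms in the state.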

Also, the proposed method does not adopt an online learning strategy. Incremental online learning updates the template of a construction object of interest to be tracked, and it should improve the tracking robustness to environmental changes. However, the comparison example showed that online learning might not help in resuming tracking, especially under cluttered construction scenarios.

Conclusions

Compared with the use of RTLSs in tracking workers and mobile equipment at construction jobsites, visual tracking has a major advantage in that it does not require the installation of physical tags on workers or equipment. Therefore, it has recently attracted considerable research interest. This paper presented a method of visually tracking construction workers and mobile equipment. The method is based on particle filtering techniques. The effectiveness of the method was tested with videos collected from real construction jobsites in Quebec, Canada, to track workers, rollers, trucks, and dozers.

The test results were compared with ground truths manually labeled in the videos to indicate tracking precision and spatial overlap. The results indicated that the average center location error between tracking results and ground truths could range from 8.6 pixels to 19.7 pixels. The tracking spatial overlap ranged from 50 to 78%, even when the worker or equipment experienced severe occlusions by, for example, temporary structures or as-built facilities at cluttered construction jobsites. Moreover, the proposed tracking method was compared with the method of Ross et al. (2008). It was found that the former could facilitate the recovery of tracking at construction jobsites even without incremental online learning. Future work will focus on two aspects. First, the proposed tracking method will be extended to simultaneously track multiple construction objects. Second, more motion models for construction objects will be created, customized, and incorporated into the proposed tracking method.

Acknowledgments

This paper is based in part on work supported by the Natural Sciences and Engineering Research Council (NSERC) of Canada. The authors would also like to acknowledge the support of the Roccabella project developer for kindly allowing the authors to record jobsite activities for this research. Any opinions, findings, and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the NSERC.

References

Ababsa, F. (2010). "Real-time camera tracking for structured environment using an iterated particle filter." IEEE Int. Conf. on Systems Man and Cybernetics (SMC), IEEE, New York, 3039–3044.
Aherne, F. J., Rockett, P. I., and Thacker, N. A. (1997). "Automatic parameter selection for object recognition using a parallel multiobjective genetic algorithm." Int. Conf. on Computer Analysis of Images and Patterns (CAIP'97), Kiel, Germany.
Bar-Shalom, Y., and Li, X. (1993). Estimation and tracking: Principles, techniques and software, Artech House, Boston.
Brasnett, P., Mihaylova, L., Canagarajah, N., and Bull, D. (2007). "Sequential Monte Carlo tracking by fusing multiple cues in video sequences." J. Image Vision Comput., 25(8), 1217–1227.
Gong, J., and Caldas, C. H. (2011). "An object recognition, tracking, and contextual reasoning-based video interpretation method for rapid productivity analysis of construction operations." Autom. Constr., 20(8), 1211–1226.
Han, S., and Lee, S. (2013). "A vision-based motion capture and recognition framework for behavior-based safety management." Autom. Constr., 35, 131–141.
Han, S., Lee, S., and Pena-Mora, F. (2013). "Vision-based detection of unsafe actions of a construction worker: Case study of ladder climbing." J. Comput. Civ. Eng., 10.1061/(ASCE)CP.1943-5487.0000279, 635–644.
Hess, R., and Fern, A. (2009). "Discriminatively trained particle filters for complex multi-object tracking." Proc., IEEE Conf. on Computer Vision and Pattern Recognition, Miami, 240–247.
Hol, J. D., Schön, T. B., and Gustafsson, F. (2006). "On resampling algorithms for particle filters." Proc., IEEE Nonlinear Statistical Signal Processing Workshop, IEEE, New York, 79–82.
Kitagawa, G. (1996). "Monte Carlo filter and smoother for non-Gaussian nonlinear state space models." J. Comput. Graph. Stat., 5(1), 1–25.
Lu, W.-L., Okuma, K., and Little, J. (2009). "Tracking and recognizing actions of multiple hockey players using the boosted particle filter." J. Image Vision Comput., 27(1–2), 189–205.
Mathes, T., and Piater, J. H. (2006). "Robust non-rigid object tracking using point distribution manifolds." Proc., 28th Annual Symp. of the German Association for Pattern Recognition (DAGM), Vol. 4174, Springer, Berlin, 515–524.
MATLAB [Computer software]. MathWorks, Natick, MA.
Memarzadeh, M., Golparvar-Fard, M., and Niebles, J. (2013). "Automated 2D detection of construction equipment and workers from site video streams using histograms of oriented gradients and colors." Autom. Constr., 32, 24–37.
Memarzadeh, M., Heydarian, A., Golparvar-Fard, M., and Niebles, J. (2012). "Real-time and automated recognition and 2D tracking of construction workers and equipment from site video streams." Int. Conf. on Computing in Civil Engineering, ASCE, Reston, VA, 429–436.
Mihaylova, L., Brasnett, P., Canagarajan, N., and Bull, D. (2007). "Object tracking by particle filtering techniques in video sequences." Advances and challenges in multisensor data and information. NATO security through science series, Vol. 8, IOS Press, Netherlands, 260–268.
Nait-Charif, H., and McKenna, S. (2003). "Tracking poorly modelled motion using particle filters with iterated likelihood weighting." Proc., Asian Conf. on Computer Vision, 156–161.
Nasr, E., Shehab, T., and Vlad, A. (2013). "Tracking systems in construction: Applications and comparisons." 〈http://ascpro.ascweb.org/chair/paper/CPGT11002013.pdf〉 (Sep. 10, 2015).
Park, M. W., and Brilakis, I. (2012). "Construction worker detection in video frames for initializing vision trackers." Autom. Constr., 28, 15–25.
Park, M.-W., Makhmalbaf, A., and Brilakis, I. (2011). "Comparative study of vision tracking methods for tracking of construction site resources." Autom. Constr., 20(7), 905–915.
Pérez, P., Vermaak, J., and Blake, A. (2004). "Data fusion for tracking with particles." Proc. IEEE, 92(3), 495–513.
Prisacariu, V., and Reid, I. (2009). "fastHOG—A real-time GPU implementation of HOG." Technical Rep. 2310/09, Dept. of Engineering Science, Oxford Univ., Oxford, U.K.
Rao, G. M., and Satyanarayana, C. (2013). "Visual object target tracking using particle filter: A survey." Int. J. Images Graphics Signal Process., 5(6), 57–71.
Rezazadeh Azar, E., Dickinson, S., and McCabe, B. (2013). "Server-customer interaction tracker: Computer vision–based system to estimate dirt-loading cycles." J. Constr. Eng. Manage., 10.1061/(ASCE)CO.1943-7862.0000652, 785–794.
Rezazadeh Azar, E., and McCabe, B. (2012). "Automated visual recognition of dump trucks in construction videos." J. Comput. Civ. Eng., 10.1061/(ASCE)CP.1943-5487.0000179, 769–781.
Ross, D., Lim, J., Lin, R.-S., and Yang, M.-H. (2008). "Incremental learning for robust visual tracking." Int. J. Comput. Vision, 77(1), 125–141.
Salmond, D., and Gordon, N. (2005). "An introduction to particle filters." 〈http://dip.sun.ac.za/∼herbst/MachineLearning/ExtraNotes/ParticleFilters.pdf〉 (Apr. 14, 2015).
Shan, C., Tan, T., and Wei, Y. (2007). "Real-time hand tracking using a mean shift embedded particle filter." Pattern Recog., 40(7), 1958–1970.
Szczodrak, M., Dalka, P., and Czyzewski, A. (2010). "Performance evaluation of video object tracking algorithm in autonomous surveillance system." Proc., IEEE 2nd Int. Conf. on Information, Gdansk, Poland, 31–34.
Teizer, J., and Vela, P. A. (2009). "Personnel tracking on construction sites using video cameras." Adv. Eng. Inf., 23(4), 452–462.
Tomasi, C., and Kanade, T. (1991). "Detection and tracking of point features." Technical Rep. CMU-CS-91-132, Carnegie Mellon Univ., Pittsburgh.
Vermaak, J., Lawrence, N. D., and Perez, P. (2003). "Variational inference for visual tracking." IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, Vol. 1, IEEE, New York, I-773–I-780.
Wang, Z., Yang, X., Xu, Y., and Yu, S. (2009). "CamShift guided particle filter for visual tracking." Pattern Recog. Lett., 30(4), 407–413.
Weerasinghe, I., and Ruwanpura, J. (2009). "Automated data acquisition system to assess construction worker performance." Construction Research Congress, ASCE, Seattle, 61–70.
Wei, Q., Xiong, Z., Li, C., Ouyang, Y., and Sheng, H. (2011). "A robust approach for multiple vehicles tracking using layered particle filter." Int. J. Electron. Commun., 65(7), 609–618.
Yang, J., Arif, O., Vela, P. A., Teizer, J., and Shi, Z. (2010). "Tracking multiple workers on construction sites using video cameras." Adv. Eng. Inf., 24(4), 428–434.
Yang, J., Cheng, T., Teizer, J., Vela, P. A., and Shi, Z. K. (2011). "A performance evaluation of vision and radio frequency tracking methods for interacting workforce." Adv. Eng. Inf., 25(4), 736–747.
Yang, J., Park, M. W., Vela, P., and Golparvar-Fard, M. (2015). "Construction performance monitoring via still images, time-lapse photos, and video streams: Now, tomorrow, and the future." Adv. Eng. Inf., 29(2), 211–224.
Zheng, W., and Bhandarkar, S. (2009). "Face detection and tracking using a boosted adaptive particle filter." J. Visual Commun. Image Represent., 20(1), 9–27.
Zhu, Z., Park, M. W., Koch, C., Soltani, M., Hammad, A., and Davari, K. (2015). "Predicting movements of onsite workers and mobile equipment for enhancing construction site safety." Autom. Constr., in press.
Zhu, Z., Ren, X., and Chen, Z. (2016a). "Dozer_tracking_results." 〈https://figshare.com/articles/Dozer_tracking_results/2065428〉 (Jan. 19, 2016).
Zhu, Z., Ren, X., and Chen, Z. (2016b). "Roller_tracking_results." 〈https://figshare.com/articles/Roller_tracking_results/2065416〉 (Jan. 19, 2016).
Zhu, Z., Ren, X., and Chen, Z. (2016c). "Truck_tracking_results." 〈https://figshare.com/articles/Truck_tracking_results/2065422〉 (Jan. 19, 2016).
Zhu, Z., Ren, X., and Chen, Z. (2016d). "Worker_tracking_results_1." 〈https://figshare.com/articles/Worker_tracking_results_1_avi/2064291〉 (Jan. 19, 2016).
Zhu, Z., Ren, X., and Chen, Z. (2016e). "Worker_tracking_results_2." 〈https://figshare.com/articles/Worker_tracking_results_2/2065410〉 (Jan. 19, 2016).
Zou, J., and Kim, H. (2007). "Using hue, saturation, and value color space for hydraulic excavator idle time analysis." J. Comput. Civ. Eng., 10.1061/(ASCE)0887-3801(2007)21:4(238), 238–246.