https://doi.org/10.1108/CI-10-2015-0054
Tracking-based 3D human skeleton extraction from stereo video camera toward an on-site safety and ergonomic analysis

Meiyin Liu
Civil and Environmental Engineering Department, University of Michigan, Ann Arbor, Michigan, USA

SangUk Han

Received 17 October 2015
Revised 12 January 2016, 29 March 2016, 8 April 2016
Accepted 12 April 2016
Abstract
Purpose – As a means of data acquisition for situation awareness, computer vision-based motion capture technologies have increased the potential to observe and assess manual activities for the prevention of accidents and injuries in construction. This study thus aims to present a computationally efficient and robust method of human motion data capture for on-site motion sensing and analysis.
Design/methodology/approach – This study investigated a tracking approach to three-dimensional (3D) human skeleton extraction from stereo video streams. Instead of detecting body joints on each image, the proposed method tracks the locations of the body joints over all successive frames by learning from the initialized body posture. The body joints corresponding to the tracked ones are then identified and matched on the image sequences from the other lens and reconstructed in a 3D space through triangulation to build 3D skeleton models. For validation, a lab test is conducted to evaluate the accuracy and working ranges of the proposed method.
Findings – Results of the test reveal that the tracking approach produces accurate outcomes at a distance, with nearly real-time computational processing, and can potentially be used for site data collection. Thus, the proposed approach has potential for various field analyses of construction workers’ safety and ergonomics.
Originality/value – Recently, motion capture technologies have been rapidly developed and studied in construction. However, existing sensing technologies are not yet readily applicable to construction environments. This study explores two smartphones as stereo cameras as a potentially suitable means of data collection in construction, given the fewer operational constraints (e.g. no on-body sensor required, less sensitivity to sunlight and flexible ranges of operation).
Keywords Ergonomics, Construction safety, Stereo vision, Computer vision,
3D human skeleton extraction, Motion tracking
Paper type Research paper
Construction Innovation
Vol. 16 No. 3, 2016, pp. 348-367
© Emerald Group Publishing Limited, 1471-4175
DOI 10.1108/CI-10-2015-0054

The work presented in this paper was supported financially with a National Science Foundation Award (No. CMMI-1161123). Any opinions, findings and conclusions or recommendations expressed in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Introduction
The construction industry is labor-intensive, requiring labor force as one of the major resources for production. In the USA, 11.1 million employees worked in the construction industry, which accounted for about 7 per cent of the overall US workforce in 2010 (CPWR, 2013). From a cost perspective, labor cost also often forms 33-50 per cent of the total project cost in construction (Hanna, 2001; Siriwardana and Ruwanpura, 2012). In this regard, efficient planning, monitoring and controlling of on-site employees are key to the success of construction projects. In fact, recent studies in construction have introduced the automated monitoring of construction workers to address the problems with existing observation methods, which are time-consuming and painstaking for human observers in their daily practice (Yang et al., 2010; Gong et al., 2011a, 2011b; Ray and Teizer, 2012; Cheng et al., 2013; Memarzadeh et al., 2013; Han and Lee, 2013; Han et al., 2014).
One of the major techniques used for such monitoring is based on computer vision. Computer vision-based worker monitoring provides several advantages: it eliminates the need to attach a sensor or a tag to a human worker, does not interfere with the workers’ on-going tasks (Moeslund et al., 2006) and provides rich context for behavior-related analysis (Gong and Caldas, 2010). In previous vision-based studies, various types of motion representations (e.g. color models, silhouettes, quasi-skeletons and three-dimensional [3D] skeletons) have been proposed depending upon the purpose of monitoring (e.g. productivity measurement by Gong and Caldas (2011a, 2011b), unsafe action detection by Han and Lee (2013), and ergonomic analysis by Ray and Teizer (2012) and Seo et al. (2014)). In general, coarse motion representations, such as simple silhouettes of a human subject, are relatively easy and stable to capture but contain less-detailed information about body configurations and postures, in turn making it hard to understand the context of worker behavior. On the other hand, fine motion representations – for instance, 3D human skeletal models – are predominant in construction research on vision-based workers’ safety and ergonomics because they allow for diverse types of motion analysis, such as activity recognition (Kim and Caldas, 2013; Han et al., 2014), posture analysis (Ray and Teizer, 2012) and biomechanical analysis (Seo et al., 2013). Particularly, taking into account that continuous exposure to physically demanding activities (e.g. lifting, carrying materials, bending or twisting, repetitive and forceful hand activity) is the main cause of work-related ergonomic injuries – the leading type of non-fatalities [Bureau of Labor Statistics (BLS), 2014] – 3D skeletal models, if available, can provide rich information on human movements (e.g. angular information for ergonomic and biomechanical analysis). This is consistent with the fact that a biomechanical analysis requires joint angles as input data (Seo et al., 2013). To ultimately automate an on-site ergonomic and biomechanical analysis, it is essential to obtain joint angles from 3D skeletons. This research focuses on 3D skeleton extraction in a cost-effective and computation-efficient manner with fewer operational constraints (e.g. a longer range and operability in outdoor environments) toward on-site applications for worker monitoring.
There are several ways to obtain 3D skeleton models. For example, red, green, blue plus depth (RGB-D) sensors that capture RGB images along with depth information (e.g. Microsoft Kinect) and marker-based motion capture systems (e.g. Vicon and Optotrak) are commonly used (Ray and Teizer, 2012; Escorcia et al., 2012; Han et al., 2013, 2014). Yet, these commercial systems are not fully applicable to on-site motion capture. A detailed analysis of the pros and cons of each approach regarding potential on-site application is presented in the next background section (Table I).
This paper thus proposes a stereo vision system for 3D pose estimation that tracks positions of body joints over two-dimensional (2D) image frames and extracts 3D human skeletal models from multiple-view image sequences. Utilizing both spatial and temporal information on the human figure across video frames, the 3D pose estimation can be accelerated significantly by continuously updating training images over frames and reducing the search space for detection. To evaluate the performance of the proposed approach, a lab test is conducted, in which a commercial motion capture system is used as ground truth. The following sections review various types of motion capture systems, present technical details of the tracking-based motion capture, describe the test settings, and report and discuss the results.
On-body sensors
Using sensors (e.g. inertial measurement units, magnetic sensors, goniometers) directly attached to the body parts of interest, an on-body sensor system enables reliable and robust motion data capture. A variety of applications could take advantage of this emerging approach, such as rehabilitation, sports science and medicine, geriatric care, and health and fitness monitoring (Albinali et al., 2009). It has seldom been applied in the construction industry, however, because the attached sensors would interfere with workers’ on-going work. For lab test or education and training purposes, on-body sensors have been utilized in construction (Alwasel et al., 2013; Chen et al., 2014).

Table I. Comparison of existing human motion capture approaches adopted in construction

Approach | Advantages | Disadvantages
On-body sensors | Reliable data source, free of impact from visual occlusion and illumination change | Invasive to on-going task, complex installation and operation
Range sensors: structured light sensor | Non-invasive to on-going work | Limited active range, indoor application
Range sensors: time-of-flight camera | Non-invasive, outdoor application | Low resolution
RGB sensors: monocular camera | Non-invasive to on-going work, outdoor application, long active range, simple installation and operation | Computational complexity, interference from occlusion and illumination
RGB sensors: stereo camera | Non-invasive to on-going work, outdoor application, long active range | High computation expense, interference from occlusion and illumination
Range sensors
Depth images generated by a range sensor have become popular in recent motion capture studies, as the depth information (i.e. 3D positions) may help address issues of complex human body segmentation and ambiguities between different poses of similar appearance (Ganapathi et al., 2010). Two types of range sensors, categorized according to depth measurement techniques, have been developed and utilized as follows:
(1) Structured light sensor: One such range sensor, the Microsoft Kinect, has become a very popular device in human motion capture studies because of its low price and off-the-shelf applicability. A Kinect device consists of an RGB camera and a depth sensor. The depth information is computed based on the distortions of structured light, which projects a known pattern of infrared light dots onto the scene (Chen et al., 2013). This type of sensor has been limited to use in lab settings (Escorcia et al., 2012) or indoor construction sites (Khosrowpour et al., 2014).
(2) Time-of-flight camera: A time-of-flight camera measures the travel time of light signals between the light source and the reflecting object. This method is known to work in outdoor conditions, because it is less sensitive to sunlight and can operate in a relatively wide range – up to 10 m (Chen et al., 2013). Recently, Leone et al. (2010) extracted human skeletons from 3D data clouds captured by a time-of-flight camera for fall detection, and Diraco et al. (2013) used the extracted skeleton for posture analysis.
To estimate the 3D spatial information of the skeleton, the proposed approach first initializes the positions of individual body joints in one of the stereo images and tracks them over the following frames. Prior to recognizing the corresponding 2D skeleton in the other stereo image, the stereo camera parameters are set and obtained through stereo calibration. The camera parameters and feature matching between the two stereo images help identify dual 2D skeletons that characterize the projection of a common skeleton in 3D space onto the stereo images. Again with the camera parameters, the corresponding body joint locations on the 2D images from the two views are triangulated to calculate the 3D positions and build 3D human skeleton models. The detailed descriptions of each process are now discussed.
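The triangulation step described above can be sketched with a standard linear (direct linear transform) solver. This is an illustrative reconstruction, not the authors’ exact implementation; the 3 × 4 projection matrices stand in for the stereo camera parameters obtained from calibration:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one body joint from two views.

    P1, P2: 3x4 camera projection matrices from stereo calibration.
    x1, x2: (u, v) pixel coordinates of the same joint in each view.
    Returns the joint's 3D position in the world frame.
    """
    # Each observation contributes two linear constraints on the 3D point.
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of A with the smallest
    # singular value (the approximate null space of A).
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # de-homogenize
```

Repeating this per joint over the 14 tracked joints in each synchronized frame pair yields the full 3D skeleton.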
Figure 1. Overview of tracking-based 3D skeleton extraction from multiple-view images: initializing individual joint trackers, tracking the trackers over frames, constructing 2D skeletons over frames, matching skeletons in stereo images (using feature descriptors and calibrated stereo cameras) and reconstructing the 3D skeleton.

For the tracking of an object, a tracker model that well represents the appearance of the object needs to be carefully selected among available models, such as a single point, multiple points, patch, contour, silhouette and so on. Because the location and movement of a body joint are essentially characterized by its centroid, a point tracker suited to localizing the joint centroid is selected. A point tracker is generally simple, computationally inexpensive and less sensitive to the camera view compared with other representations (e.g. patch). In this study, the joint location is initialized on the first frame by detection algorithms (Yang and Ramanan, 2011) or user input, and its location in the successive frame is then estimated by a similarity measure between feature descriptors of consecutive frames about the target. The appearance of the target and its nearest region could be described using color space values, such as RGB values. However, identical colors perceived by humans do not have the same values in an RGB space, and the three components are also highly correlated (Paschos, 2001). In this respect, hue, saturation and value (HSV) would be a suitable option that provides more perceptually uniform color space values (Yilmaz et al., 2006), and it is thus selected to reduce the difference of color space values of the same point in different frames.
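A minimal sketch of this point-tracking idea follows, assuming the frames have already been converted to HSV arrays (the conversion itself, the patch and search-window sizes, and the sum-of-squared-differences similarity are illustrative choices, not the paper’s exact parameters):

```python
import numpy as np

def track_point(prev_hsv, next_hsv, joint, patch=7, search=15):
    """Estimate a joint's new location by matching its HSV patch descriptor.

    prev_hsv, next_hsv: HxWx3 float arrays in HSV color space (conversion
    from RGB is assumed to happen upstream, e.g. with an image library).
    joint: (row, col) of the tracked joint in the previous frame.
    Searching only a small window around the previous location is what
    keeps the frame-to-frame tracking computationally cheap.
    """
    r, c = joint
    h = patch // 2
    template = prev_hsv[r - h:r + h + 1, c - h:c + h + 1]
    best, best_pos = np.inf, joint
    for dr in range(-search, search + 1):
        for dc in range(-search, search + 1):
            rr, cc = r + dr, c + dc
            cand = next_hsv[rr - h:rr + h + 1, cc - h:cc + h + 1]
            if cand.shape != template.shape:
                continue  # candidate window fell off the image
            score = np.sum((cand - template) ** 2)  # SSD dissimilarity
            if score < best:
                best, best_pos = score, (rr, cc)
    return best_pos
```

In practice, the template would also be refreshed over frames so that the tracker follows gradual appearance changes, as the tracking approach described here does.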
Figure 2. Homography between stereo images: x′ = H · x.
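In homogeneous coordinates, the mapping in Figure 2 is a matrix-vector product followed by normalization; a small sketch (the example matrix used in the test is illustrative, not from the paper):

```python
import numpy as np

def apply_homography(H, pt):
    """Map a 2D point through a 3x3 homography H (Figure 2: x' = H . x)."""
    x = np.array([pt[0], pt[1], 1.0])  # lift to homogeneous coordinates
    xp = H @ x
    return xp[:2] / xp[2]              # back to inhomogeneous coordinates
```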
Figure 3. An example of image matching after rectification: the images from Camera 1 and Camera 2 are each rectified before matching.
Figure 4. 3D body joints triangulation: X = f(x, x′).
Figure 5. Testing setup.

Figure 6. Sample results of 2D skeleton tracking.
The temporal misalignment, if not calibrated, will lead to failure in reconstructing 3D skeletons from the dual 2D ones. In this test, the two video streams are manually synchronized afterward during data analysis by observing common visual cues presented in the frames, for instance, the human subject clapping before and after conducting the actions. Through this temporal synchronization of the stereo video streams, the 2D skeleton extracted on a right-eye image can be used to infer the corresponding 2D skeleton on the left-eye image captured simultaneously.
To triangulate the dual 2D skeletons into the 3D skeleton, the stereo cameras are initially calibrated using the MATLAB stereo calibration tool (Computer Vision System Toolbox, MATLAB R2015b) – snapshots of the calibration tasks are illustrated in Figure 7. Given the stereo camera parameters from calibration, the 3D skeleton can be obtained by computing the 3D locations of the 14 major body joints. Figure 8 shows a 3D skeleton generated from the testing data as an example of a qualitative result.
Performance evaluation
The performance of the tracking-based motion capture is evaluated in terms of:
• bone lengths between body joints; and
• rotation angles of body parts at the joints.
The bone lengths in each frame are measured by the tracking-based method as the Euclidean distance between major body joints (e.g. body part IDs 1 to 10 in Figure 9) and compared to the corresponding bone lengths in each frame measured by the Optotrak. In particular, the measurement is compared to the result of the detection-based 3D pose estimation reported in Han and Lee (2013) to assess the two different approaches overall – namely, tracking and detection – although the data sets used in the two studies are different.
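The bone-length measure above is simply the per-frame Euclidean distance between connected joints; a sketch (the joint names and coordinates here are hypothetical):

```python
import numpy as np

def bone_lengths(skeleton, bones):
    """Euclidean bone lengths from one frame's 3D skeleton.

    skeleton: dict mapping joint name -> 3D coordinate.
    bones: list of (joint_a, joint_b) pairs defining each body part.
    """
    return [float(np.linalg.norm(np.asarray(skeleton[a]) - np.asarray(skeleton[b])))
            for a, b in bones]
```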
A rotation angle is a measurement widely used in commercial motion capture systems (e.g. Vicon and Kinect) to characterize human postures (Meredith and Maddock, 2000). It is defined by the Euler rotation angle between a particular body part’s directional vector (e.g. the right forearm) at a certain time (e.g. Tx) and the vector of the identical body part at an initial time (e.g. T0) when the body configuration is initially defined (e.g. T-pose) (Han et al., 2014). Specifically, rotation angles are commonly defined in a local coordinate system based on the hierarchical body structure (Figure 10(a)); that is, the rotation angle of a child body part (e.g. Part ID 1 in Figure 10(a)) is determined according to the angle of its parent body part (e.g. Part ID 2 in Figure 10(a)), and the errors at the end joint may accumulate along with the errors of the parent body parts. For a fair evaluation of the measurements at each body joint, a rotation angle in this study is thus defined in a global coordinate system rather than in a local coordinate system, as shown in Figure 10.

Figure 7. Camera calibration with a checkerboard: all the corners of the check boxes are automatically detected by the MATLAB stereo calibration tool.

Figure 8. Qualitative experiment result: reconstructed 3D human skeleton.

Figure 9. Body part indices for bone length and rotation angle validation.
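Under the global-coordinate definition above, a body part’s rotation can be measured directly between its directional vector at the initial T-pose and at a later frame. This sketch computes only the overall angle between the two vectors; the per-axis X, Y and Z decomposition reported in the tables depends on a rotation convention not detailed here:

```python
import numpy as np

def rotation_angle_deg(v0, vt):
    """Angle (degrees) between a body part's directional vector at the
    initial T-pose (v0) and at time t (vt), both in the global frame."""
    v0, vt = np.asarray(v0, float), np.asarray(vt, float)
    cosang = np.dot(v0, vt) / (np.linalg.norm(v0) * np.linalg.norm(vt))
    # Clip to guard against floating-point values just outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))
```

Because each angle is measured against the global frame rather than a parent joint, errors at one joint do not propagate down the kinematic chain.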
The root-mean-square errors (RMSEs) over all frames between the proposed method and the commercial motion capture system are measured as a metric for performance evaluation. The calculation is shown in equation (1), where x_{exp,i} denotes the ith (out of n) sample in the experiment data and, similarly, x_{gt,i} denotes that in the ground truth:

RMSE = \sqrt{ \frac{ \sum_{i=1}^{n} (x_{exp,i} - x_{gt,i})^2 }{ n } }   (1)
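Equation (1) translates directly to code; a minimal sketch:

```python
import numpy as np

def rmse(x_exp, x_gt):
    """Equation (1): root-mean-square error between experiment samples
    and the corresponding ground-truth samples."""
    x_exp, x_gt = np.asarray(x_exp, float), np.asarray(x_gt, float)
    return float(np.sqrt(np.mean((x_exp - x_gt) ** 2)))
```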
Figure 10. Comparison of the definitions of rotation angles between (a) a local coordinate system and (b) a global coordinate system.
Table II. Bone length difference comparison (cm) between detection- and tracking-based (proposed) approaches to skeleton extraction

Body part ID    1     2     3     4     5     6     7     8    Mean    SD
X              3.5   1.3   3.7   7.6   8.7   9.6   4.9   9.2   6.4   3.2
Y             12.7   6.5  11.5   7.2   8.5   6.1   9.6   9.8   9.0   2.4
Z              5.0   3.5   6.9  12.3  59.4  17.2  22.4  42.2  21.1  20.0

Note: A slightly different body configuration is studied in Han and Lee (2013) (i.e. body part ID 8 not included).
the self-occlusions caused by bending in squat lifting. Also, even small errors in tracking a body part in a z-axial direction (i.e. from the object to a camera) may sensitively magnify the errors in 3D triangulation. Generally, the upper body parts perform better in bone length measurement than the lower body parts because there is nearly no occlusion of the upper body parts when the subject is conducting the actions of lifting and walking.
Technically, the detection-based method requires large and comprehensive training data sets for pose estimation because the detection is performed solely by comparing the testing image with the training images. These data sets include images of various human subjects, actions, backgrounds and so on, especially captured under various conditions (e.g. illumination and view angles). In contrast, the tracking-based method is free from the large amount of training data required for detection and is less sensitive to diversity in the appearance of testing images. Furthermore, the tracking approach may be computationally efficient because it roughly estimates the potential region of joint positions in each frame based on the localization decision made in the previous frame. This may be one of the reasons that the tracking-based method performs better in estimating bone length, which depends on the location information of body joints. From the customization perspective, this model-free approach also provides the user with flexibility in tracking certain body parts, for example, the upper body or specific body parts according to the purpose of the study (e.g. upper limbs for ergonomic analysis).
The RMSEs of rotation angles are also measured for the motion data sets of the two actions (i.e. squat lifting and walking), as reported in Tables III and IV. Overall, the errors in X-, Y- and Z-axis rotations are 6.4, 9.0 and 21.1 degrees, respectively, for squat lifting, and 7.0, 19.7 and 11.2 degrees, respectively, for walking. Despite the extensive movements of the thighs and legs (i.e. body part IDs 1 to 4) during squat lifting, relatively small errors are found in the experiment, which suggests the robustness of the tracking method. On the other hand, larger errors are mainly caused by the forearms (i.e. body part IDs 5 and 8), which indicates the computational difficulty in tracking the forearms and hands. Han and Lee (2013) also reported that hands are the most challenging body part to detect and localize using the detection approach. Through visual inspections for the error analysis, the low frame rate of 29 fps is found to cause severe blurring around the body parts that are moving fast, such as the forearms in squat lifting. This might be one of
Table III. Squat lifting: rotation angle comparison (degree) at body parts between tracking-based method and Optotrak

Body part ID                                         1    2    3    4    5    6    7    8    9   10  Mean
Detection-based average error (Han and Lee, 2013)   3.6  5.9  6.3 10.1 10.4  5.7  7.7  8.7  3.0   –   6.3
Tracking-based average error                        7.8  3.7  6.6  2.5  1.8  2.9  2.6  4.1  9.2  3.5   3.8
Conclusion
The method proposed in this paper successfully reduced the computational time to a near real-time level, thus filling a primary gap that current research on vision-based human monitoring has faced in the context of construction. The performance is tested by comparing the measured bone lengths and rotation angles at joints with the ones from a commercial motion capture system. The results reveal that the tracking approach performs the 3D pose estimation better than the detection approach in terms of bone length measurements, as well as rotation angles that are not reported in the previous study. Additionally, this method used two smartphone cameras to generate stereo video streams and perform 3D human skeleton reconstruction. Even though the utilization of smartphones for monitoring on-site workers might face some issues, the experimental results revealed that it can obtain good accuracy and holds potential for future on-site application. Notably, the proposed approach does not utilize any marker or sensor attached to a human body and has fewer operational constraints than range sensors.
The motion data that the proposed method can produce from video streams can potentially be used for behavior monitoring, ergonomic assessment and productivity analysis, as studied in prior work. The technological system can automatically extract 3D human skeletons that serve as crucial input for ergonomic and safety analyses. It can be integrated into diverse frameworks that depend on the skeleton configuration or partial information extracted from it (e.g. joint angles, anthropometry). For example, Ray and Teizer (2012) proposed an ergonomic analysis framework that utilizes 3D skeleton models for posture classification (e.g. standing, bending), pose estimation (e.g. body joint locations and angles) and rule-based ergonomic assessment. In addition, the proposed method can be integrated into the unsafe behavior detection framework (Han et al., 2014), which compares the pattern similarity of motion data extracted from motion sensors to detect unsafe actions similar to a pre-defined motion template of the action. Motion data (e.g. body joint angles) derived from the skeleton can also be used for biomechanical analysis. Seo et al. (2014) performed a posture assessment to identify body parts enduring forceful exertion by using motion data captured from a range sensor along with force data. These examples, in which motion capture data (i.e. 3D skeletal models) are used, are potential applications to which the proposed method is readily applicable.
The proposed tracking method does not require a significant effort in collecting images to train detection algorithms; instead, the tracking approach can continuously update the changing appearance of the tracker over frames. By so doing, however, initial localization of the trackers on the first frame, when performed incorrectly, can be a major source of errors. Also, losing the trackers in the tracking process – for instance, through occlusions – can be another critical factor affecting the performance. Accordingly, further research is required to test the introduced method’s performance by taking videos at different view angles, especially ones causing greater self-occlusion (e.g. a side view). Integrating the detection and tracking approaches to complement the shortcomings of each might also provide a plausible improvement to the performance. In addition, laboratory testing is selected to assess the performance of the proposed method before the pursuit of a scaled-up field test. However, practical issues, such as occlusion, illumination and camera views, may pose technical challenges for vision-based tracking when applied to jobsites. As the next step, the proposed method will be tested under less constrained conditions in future studies to verify the performance in a field setting.
References
Albinali, F., Goodwin, M.S. and Intille, S.S. (2009), “Recognizing stereotypical motor movements in
the laboratory and classroom: a case study with children on the autism spectrum”,
Proceedings of the 11th International Conference on Ubiquitous Computing, ACM,
Orlando, FL, pp. 71-80.
Alwasel, A., Elrayes, K., Abdel-Rahman, E. and Haas, C. (2013), “A human body posture sensor for
monitoring and diagnosing MSD risk factors”, Proceedings of the 30th International
Symposium on Automation and Robotics in Construction (ISARC), Montreal, pp. 531-539.
Bay, H., Tuytelaars, T. and van Gool, L. (2008), “SURF: speeded up robust features”, Computer
Vision and Image Understanding, Vol. 110 No. 3, pp. 346-359.
Bureau of Labor Statistics (BLS) (2014), Nonfatal Occupational Illnesses Requiring Days Away From Work, 2013, US Department of Labor, available at: www.bls.gov/news.release/osh2.nr0.htm (accessed 27 September 2015).
Chen, J., Ahn, C.R. and Han, S. (2014), “Detecting the hazards of lifting and carrying in construction
through a coupled 3D sensing and IMUs sensing system”, International Conference for
Computing in Civil and Building Engineering, ASCE, Reston, VA, pp. 1110-1117.
Chen, L., Wei, H. and Ferryman, J. (2013), “A survey of human motion analysis using depth
imagery”, Pattern Recognition Letters, Vol. 34 No. 15, pp. 1995-2006.
Cheng, T., Migliaccio, G., Teizer, J. and Gatti, U. (2013), “Data fusion of real-time location sensing
and physiological status monitoring for ergonomics analysis of construction workers”,
Journal of Computing in Civil Engineering, Vol. 27 No. 3, pp. 320-335.
CPWR (2013), The Construction Chart Book: The US Construction Industry and Its Workers, 5th
ed., CPWR – The Center for Construction Research and Training, Silver Spring.
Delon, J. and Rougé, B. (2007), “Small baseline stereo vision”, Journal of Mathematical Imaging
and Vision, Vol. 28 No. 3, pp. 209-223.
Diraco, G., Leone, A. and Siciliano, P. (2013), “Human posture recognition with a time-of-flight 3D sensor for in-home applications”, Expert Systems with Applications, Vol. 40 No. 2, pp. 744-751.
Escorcia, V., Davila, M.A., Golparvar-Fard, M. and Niebles, J.C. (2012), “Automated vision-based
recognition of construction worker actions for building interior construction operations
using RGBD cameras”, paper presented at Construction Research Congress, West
Lafayette.
Fischler, M.A. and Bolles, R.C. (1981), “Random sample consensus: a paradigm for model fitting
with applications to image analysis and automated cartography”, Communications of the
ACM, Vol. 24 No. 6, pp. 381-395.
Ganapathi, V., Plagemann, C., Koller, D. and Thrun, S. (2010), “Real time motion capture using a
single time-of-flight camera”, IEEE Conference on Computer Vision and Pattern
Recognition, San Francisco, pp. 755-762.
Gong, J. and Caldas, C.H. (2010), “Computer vision-based video interpretation model for automated
productivity analysis of construction operations”, Journal of Computing in Civil
Engineering, Vol. 24 No. 3, pp. 252-263.
Gong, J. and Caldas, C.H. (2011a), “An object recognition, tracking, and contextual reasoning-based video interpretation method for rapid productivity analysis of construction operations”, Automation in Construction, Vol. 20 No. 8, pp. 1211-1226.
Gong, J., Caldas, C.H. and Gordon, C. (2011b), “Learning and classifying actions of construction workers and equipment using Bag-of-Video-Feature-Words and Bayesian network models”, Advanced Engineering Informatics, Vol. 25 No. 4, pp. 771-782.
Han, S. and Lee, S. (2013), “A vision-based motion capture and recognition framework for behavior-based safety management”, Automation in Construction, Vol. 35 No. 1, pp. 131-141.
Han, S., Lee, S. and Peña-Mora, F. (2014), “Comparative study of motion features for
similarity-based modeling and classification of unsafe actions in construction”, Journal of
Computing in Civil Engineering, Vol. 28 No. 1.
Hanna, A.S. (2001), Quantifying the Cumulative Impact of Change Orders for Electrical and Mechanical
Corresponding author
SangHyun Lee can be contacted at: shdpm@umich.edu