

www.ietdl.org

Published in IET Computer Vision


Received on 11th December 2008
Revised on 23rd April 2009
doi: 10.1049/iet-cvi.2008.0086

ISSN 1751-9632

Hierarchical pose classification based on human physiology for behaviour analysis

V. Maik 1, D.T. Paik 2, J. Lim 1, K. Park 3, J. Paik 1

1 Image Processing and Intelligent Systems Lab, Graduate School of Advanced Imaging Science, Multimedia and Film, Chung Ang University, Seoul, South Korea
2 Boston University, One Silber Way, Boston, MA 02215, USA
3 Motion Graphics Lab, Graduate School of Advanced Imaging Science, Multimedia and Film, Chung Ang University, Seoul, South Korea
E-mail: vivek5681@wm.cau.ac.kr

Abstract: This study presents a new approach to classify human body poses by using angular constraints and
variations of body joints. Although different classifications of the poses have been previously made, the
proposed approach attempts to create a more comprehensive, accurate and extensible classification by
integrating all possible poses based on angles of movement in human joints. The angular variations in all body
joints can determine any possible poses. The joint angles from the body axis are computed in the three-
dimensional space. In order to train and classify the pose in an automated manner, support vector machines
(SVMs) were used. Experiments were carried out on both benchmark (CMU dataset) and in-house simulated
(POSER dataset) poses to evaluate the performance of the proposed classification scheme.

1 Introduction

The human body, an intricate biological structure, is capable of making a wide gamut of poses with both innate and acquired flexibility and mobility. The number of poses a single individual can create is uncountable. Therefore a well-structured, comprehensive algorithm for pose classification is essential in many fields of application, such as intelligent humanoid robots, human–computer interaction (HCI), behaviour analysis, multimedia applications, sports science etc. In computer vision and robotics, human pose has been represented by the three-dimensional (3D) orientation of a collection of landmark points. This effort to mimic the human system, however, is not sufficient to accurately and fully describe a pose generated by the human body's flexibility and mobility. In this paper, we present a physiology-based method which integrates visual information with a model of feasible poses based on the body joint angles. It has been established by medical experts that every individual joint has a different degree of mobility, which depends on several factors such as age, gender and innate or acquired body composition [1]. However, we disregard these variations in order to provide the general range of joint angles in different dimensions and consequently classify a group of poses based on the collection of joint-angle data. In the proposed classification scheme, the human body is represented by a total of 23 joints and the angular movements of the target joints are defined along three orthogonal axes. Through experiments, the minimum and maximum angles of each joint along each axis have been obtained. These 23 joints were divided into three levels according to their role and flexibility in pose formation. Each of these three levels is further divided into major and minor joints, which are defined by the level of a joint's contribution to the overall shape or posture of the pose. A major joint is primarily responsible for, or associated with, the pose, and even a slight difference in this type of joint angle may significantly change the resulting pose. A minor joint is less critical in defining a pose, and variations in its angles will not change the general category of the pose. For example, when classifying a running pose, the major joints are the hips, knees and shoulders, because even slight changes in these joint angles may alter the pose to walking, running, sprinting or galloping. The minor joints in this example are the neck and spinal joints, because changes in these joints will not affect the general classification of the pose.

IET Comput. Vis., 2010, Vol. 4, Iss. 1, pp. 12–24
doi: 10.1049/iet-cvi.2008.0086
© The Institution of Engineering and Technology 2009
A brief overview of human anatomy and joint movement variations is found in [2, 3]. Recent works on pose estimation from still images using deformable templates and shape contexts have been proposed in [4]. Appearance-based methods, which are associated with global features, can represent the overall human appearance instead of a specific body component. Similar global feature-based methods have been proposed using wavelet features and support vector machine (SVM) classifiers [5, 6]. Since these algorithms extract the entire human body as a whole without locating individual body parts, they are more suitable for applications such as pedestrian detection. Part-based approaches, which detect and estimate people by various body parts (torso, limbs and head), have been proposed in [7, 8]. An alternative approach for pose estimation first locates the body joints and then computes the pose using inverse kinematics [9]. A 3D pose estimation method matches features of an input image to a large set of labelled images containing all the possible body poses. In this method different image features have been used, including edges, silhouette outlines and shape context descriptors.

The paper is organised as follows. In Section 2 we present the relationship between human poses and joint angles. In Section 3 the hierarchical pose classification method is presented. In Section 4 an in-depth analysis of part-based joint modelling using SVM and POSER is presented. Sections 5 and 6, respectively, summarise experimental results and conclude the paper.

2 Understanding human pose in relation to joint angles

Human pose, in biological terms, is defined as the position and orientation of the human body, with an infinite number of different variations allowed by the skeletal framework connected by joints, muscles, ligaments and other types of body tissue. The skeletal system has the important role of protecting and shaping the human body, but it is the body joints, the locations where one bone connects to another, that produce the mobility and versatility of body shape. In this section the types and characteristics of different joints are discussed with respect to their application to the proposed pose classification scheme. Of the three major types of joints (fibrous, cartilaginous and synovial), synovial joints are responsible for moving and posturing, and thus for creating a pose. Of the synovial joints, the ball-and-socket joints are arguably the most complicated, allowing radial movements of the limbs in all directions. Located in the shoulder and hip, ball-and-socket joints are formed when a spherical head of one bone fits into a round socket in another. These multi-axial joints allow movement in all directions, as well as rotation of the bone. Hinge joints are simpler in structure and use a different mechanism. Located in the elbows, knees, wrists and ankles, hinge joints move like the mechanical hinges on a door, swinging up and down in one dimension. In this type of joint, a cylindrical end of one bone fits into a trough-shaped surface of another bone and the angular movement is restricted to a single plane. There are also pivot joints in the neck and forearms that allow rotation around a centre axis. Although a hinge joint moves the elbow in a single direction, it is the pivot joint that allows the radius to rotate over the ulna, creating a twisting motion. However, in this paper we will ignore the rotational movement of the pivot joints because they do not produce a significant pose variation in the context of pose shapes. Also, for the classification, we will assume the neck joint, which is technically a compound of a number of small joints, to be a ball-and-socket joint because of its functional similarity. Three different synovial joints are depicted in Fig. 1.

Figure 1 Schematic of three synovial joints
a Ball and socket joint
b Hinge joint
c Pivot joint

By using 23 joint points, we simplify the body image to a set of points and at the same time represent the overall detailed variation in joint movements of the body. The 23 landmark points include end points, ball-and-socket joints, hinge joints and vertebral joint points. Each landmark point is labelled with a letter that represents its joint type and a number, so that the corresponding joint is easily recognised. For example, the end points indicate the landmark points of the head, hands and feet. The set of joints, denoted by J, can be defined as

J = {Ball & Socket, Hinge, End, Vertebrae}
  = {B, H, E, V}
  = {(B1, ..., B5), (H1, ..., H8), (E1, ..., E5), (V1, ..., V5)}   (1)

The two hands are labelled E1 and E2, the two feet E3 and E4, and the head E5. They are usually responsible for the interacting poses of a person. Ball-and-socket joints are labelled with an initial B: the two shoulders are B1 and B2, the two hip points B3 and B4, and the neck point B5. They are mainly responsible for the locomotive poses. There are a total of eight hinge points with the initial H: the elbows are H1 and H2, the wrists H3 and H4, the knees H5 and H6, and finally the ankles H7 and H8. These joints
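The labelling scheme in (1) maps naturally onto a small data structure. The following sketch (Python; the descriptive key names are ours, not the paper's) encodes the 23 landmark points by joint type:

```python
# Hypothetical encoding of the 23 landmark points in (1).
# B = ball-and-socket, H = hinge, E = end point, V = vertebral joint.
JOINTS = {
    "B": ["B1_shoulder_L", "B2_shoulder_R", "B3_hip_L", "B4_hip_R", "B5_neck"],
    "H": ["H1_elbow_L", "H2_elbow_R", "H3_wrist_L", "H4_wrist_R",
          "H5_knee_L", "H6_knee_R", "H7_ankle_L", "H8_ankle_R"],
    "E": ["E1_hand_L", "E2_hand_R", "E3_foot_L", "E4_foot_R", "E5_head"],
    "V": ["V1", "V2", "V3", "V4", "V5"],  # spinal joints, moved as one unit
}

def total_landmarks(joints):
    """Count all landmark points across the four joint types."""
    return sum(len(v) for v in joints.values())
```

Counting the entries recovers the 5 + 8 + 5 + 5 = 23 landmark points used throughout the paper.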

account for pushing, pulling and picking actions, and are again responsible for interacting in daily life. The five vertebral or spinal joints (V) are responsible for supporting the posture in any action performance. Wherever the spinal joints (V) are involved we have considered the movement of the spine as a whole, since they possess a limited range and it would be inaccurate and tedious to analyse each joint separately. Fig. 2 illustrates the 23 landmark points interconnected to create a human pose. Each of the different joints in Fig. 2 is denoted by its angular mobility in the X-, Y- and Z-axis. We have also indicated the inward rotation (InR), outward rotation (OutR), right rotation (RR) and left rotation (LR) for the corresponding set of joints.

Figure 2 Human body joints representation and their maximum and minimum angular variation

3 Hierarchical pose representation

The pose classification using joint angles brings us one step closer to the actual interpretation and recognition of a human pose by machines and computers. However, joint angles alone are not sufficient for the recognition stage, where a pose has to be interpreted into one of the several categories given in Table 1. The joint angles can be integrated with objects, people, environmental factors and even an a priori stored database for the final interpretation. If a machine or a robot has to interpret and recognise a pose, it needs to have a database with all the poses, whose size could reach up to several terabytes. If there is a request, the robot has to efficiently retrieve the required pose by searching through the database with heavy computation. Similar drawbacks can be encountered in other applications. Hence, we propose a hierarchical pose representation which can significantly reduce the redundancy and complexity of the interpretation process based on the joint types described above and their angular variations. We generalise the concept of the joint angles to represent any plausible pose using a hierarchical structure with four different types of joints. We define those types as B, H, E and V, which are classified into three hierarchy levels given as

Level 1 = L1 = {B, H}
Level 2 = L2 = {E}
and
Level 3 = L3 = {V}   (2)

Level 1 (L1) joints cover basic poses and are involved in almost every pose that the human body can represent. Any pose classification using the proposed method begins with L1 joints, and the inference results are combined with other levels to make a final interpretation of a particular test pose. Major poses, such as standing, walking, running, jumping and jogging, can be interpreted by L1 joints alone, since they fall into the category of primary poses which depend only on joints B and H. More examples of poses associated with joint angles are shown in Table 1. Level 2 (L2) joints cover functional and end joints (E). E joints are primarily responsible for a finer representation of the given pose. For example, poses of carrying baggage or drinking water cannot be interpreted by L1 joints alone. In these cases, L2 joints can give specific information about the action, which is completely defined by L1 (pose) + L2 (action). Level 3 (L3) joints cover the vertebrae or spinal joints. V joints contribute to the posture of a person performing a particular action. L3 joints play a less significant role in defining the pose than L1 and L2 joints. Poses involving L3 joints are bending actions, including picking, mopping etc. Since L3 joints are involved in only a small number of regular poses, a large change or deviation in their joint angles usually represents an active or hostile pose, as in sports, fighting etc. Based on the above observations, the classification of a particular pose can be expressed as follows. The classification starts with each individual L1 joint. It then moves towards multiple other joints at the lower levels L2 and L3 for a specific pose. For a normal pose, for example walking, only L1 joints are involved. But for representing a person walking and hand-carrying an object, L2 joints are additionally used. At the same time, for a person walking and pushing an object, all three types of joints, from L1 (walk), L2 (object) and L3 (push or pull), are involved.

In the first case the classification will identify the B and H joints associated with the legs as the major joints, and hence the pose is classified as 'walking' regardless of the B and H joints of the arms. In the second example, variation in the E joints can represent an interacting pose. In the final example,

Table 1 Angular range variation of body joints and sample poses (all angles in degrees)

Body joints                              Pose examples       X                      Y                          Z
(E2–E5)                                  grip/hold           0                      0                          75
B5                                       head shake          0                      B(−80, 80)                 0
                                         nod                 0                      0                          (120, 180)
                                         bow                 0                      0                          (90, 180)
(H1–H4)                                  lift object         0                      0                          (30, 150)
(V1–V5)                                  bend                0                      0                          (170, 180)
(E2–E5), (H1–H4)                         beckoning           0                      0                          E(0, 75), H(60, 120)
(B1–B2), (H1–H4)                         pick, lift          B(0, 90)               0                          B(−30, 90), H(0, 150)
                                         kick                B(0, 45)               0                          B(−30, 90), H(−100, 0)
                                         walk                0                      0                          B(−15, 30), H(−30, 45)
(B1–B2), (V1–V5)                         bend down           0                      0                          B(30, 210), V(−10, 10)
                                         bend side           B(0, 45), V(−10, 10)   0                          0
                                         turn around         0                      B(−80, 80) and V(−75, 75)  0
(E2–E5), (B1–B2), (H1–H4)                throw, push, pull   B(0, 150)              0                          E(0, 75), B(−30, 120), H(0, 120)
(B1–B2), (H1–H4), (V1–V5)                bend and lift       0                      0                          B(30, 180), H(30, 180) and V(170, 190)
(B1–B2), (H1–H4), (V1–V5), (E2–E4)       throw, push, pull   B(0, 150), V(−10, 10)  0                          E(0, 75), B(−30, 120), H(0, 120) and V(−10, 10)
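Reference ranges such as those in Table 1 support the feasibility check described in Section 3: a measured angle outside a pose's reference interval rules that pose out immediately. A minimal sketch, assuming the Z-axis walking row of Table 1 and a simple (joint type, angle) input format of our own choosing:

```python
# Z-axis reference ranges for the walking pose (Table 1):
# shoulders/hips B in (-15, 30) degrees, elbows/knees H in (-30, 45).
WALK_RANGES = {"B": (-15.0, 30.0), "H": (-30.0, 45.0)}

def within_range(joint_type, angle, ranges):
    """True if the angle falls inside the pose's reference interval
    for that joint type; unlisted types are treated as unconstrained."""
    lo, hi = ranges.get(joint_type, (-180.0, 180.0))
    return lo <= angle <= hi

def plausible_walk(angles):
    """angles: iterable of (joint_type, z_angle) pairs for the tested joints."""
    return all(within_range(t, a, WALK_RANGES) for t, a in angles)
```

A shoulder angle of 60 degrees, say, eliminates the walking hypothesis without consulting any pose database.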

when a V joint is also in motion, the corresponding pose is classified as an actively or rigorously interacting pose. If the robot interprets joint angles using a vision sensor, pre-specified constraints on joint angles can immediately eliminate unrealistic poses. Furthermore, these constraints can eliminate redundant poses by considering variations of the major and minor joints. The final interpretation checks for interaction with any objects, environment, people or another isolated pose. In order to reduce erroneous interpretations, the robot can still use a database for the final query, but a much smaller database than in the existing methods.

4 Pose modelling and classification

Based on the result of the hierarchical pose representation we can classify and model poses in an effective manner.

4.1 Classification by the SVM

As the first step of pose modelling, the acquired pose data are trained by using an SVM, which provides the classification results. The SVM has become an increasingly popular non-parametric methodology for developing classification models. The aim of the SVM-based classification is to provide a computationally efficient way of learning the given set of hierarchically classified poses. Such a learning process finds a generalised separating hyperplane in a higher-dimensional feature space.

Suppose a training set is given as

S = {(x_i, y_i)}_{i=1}^N ⊆ (X × Y)^N   (3)

where x_i ∈ X ⊆ R^n (the n-dimensional real space) and y_i ∈ Y = {−1, 1} represent the input vector and the target variable, respectively. A kernel function is defined as

k(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩   (4)

where φ maps the input space X into another high-dimensional feature space F. Hyperplanes, denoted by (w, b), where w ∈ R^n and b ∈ R, consist of all x ∈ X satisfying ⟨w, x⟩ + b = 0 and are used to separate the classes. For given data it is possible to find a hyperplane that maximises the margin, which represents the minimal distance from the hyperplane to the given

points. The problem can thus be formulated as

minimise ‖w‖²/2 subject to y_i(⟨w, x_i⟩ + b) ≥ 1   (5)

More details on the tradeoff constants and the iterative optimisation can be found in [10]. For each motion category, training and test datasets are prepared separately. To reduce the computational burden we selected almost 80% of the data for training and the remaining 20% for testing. For each category a test pose was classified against the training poses of the same motion category. The tradeoff constant C, the number of runs R and the linear kernel calculation for the SVM were tested several times. In almost all cases no more than 40 runs and 1550 kernel evaluations were required. The training was carried out in batches of motion categories, including interactive poses, locomotive poses, physically active poses, behavioural poses etc. The motion capture data were represented as

B = {b_1(d_1), b_2(d_2), ...}
H = {h_1(d_1), h_2(d_2), ...}   (6)

where b_1 and b_2 represent specific body parts and d_1 and d_2 represent the dimensions of their joint angles. Similar conventions can be applied to the E and V joints. Fig. 3 shows the chart connecting the motion capture data with the proposed hierarchical joint representation. The 23 landmark points for each joint location were linked to 29 body parts whose dimensions vary in the range [1, 6]. The L1 hierarchy for the joint angles of the right hand (RH) is represented as

RH_L1 = {B1, H1, H3}
      = {clavicle, humerus, radius}
      = {d(2), d(3), d(1)}   (7)

where the joints B1, H1 and H3 of the L1 hierarchy correspond to the clavicle, humerus and radius, with dimensions two, three and one, respectively. Similarly, the level 2 (L2) and level 3 (L3) hierarchies for the end joints (RH) and the spinal joints can be represented as

RH_L2 = {E2}
      = {wrist, hand, fingers, thumb}
      = {d(1), d(3), d(1), d(2)}   (8)

and

L3 = {V1, V2, V3, V4, V5}
   = {root, lower back, upper back, thorax}
   = {d(6), d(3), d(3), d(3)}   (9)

Based on the proposed learning framework we can model the remaining body joints of the left hand (LH), right leg (RL) and left leg (LL) for the L1 and L2 hierarchies. Once we have modelled the motion data we are ready to train the modelled data using the SVM and test for classification accuracy.
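For the linear kernel used in the experiments, the feature map φ in (4) is the identity, so k(x_i, x_j) reduces to an inner product. The sketch below (plain Python, with illustrative toy data) builds the Gram matrix of (4) and checks the margin constraint of (5) for a candidate hyperplane (w, b):

```python
def linear_kernel(xi, xj):
    """k(xi, xj) = <xi, xj> for the identity feature map phi."""
    return sum(a * b for a, b in zip(xi, xj))

def gram_matrix(X):
    """Kernel (Gram) matrix over a training set, eq. (4)."""
    return [[linear_kernel(a, b) for b in X] for a in X]

def satisfies_margin(w, b, X, Y):
    """Check the constraint of (5): y_i(<w, x_i> + b) >= 1 for all i."""
    return all(y * (linear_kernel(w, x) + b) >= 1 for x, y in zip(X, Y))
```

For a toy separable set X = [(2, 0), (−2, 0)] with labels Y = [1, −1], the hyperplane w = (1, 0), b = 0 satisfies every margin constraint; the actual optimisation that maximises the margin is left to the solver cited in [10].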

Figure 3 Graphical representation of various body parts and their respective angular dimensions
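The dimension bookkeeping of (6)–(9), charted in Fig. 3, amounts to summing the joint-angle dimensions d(·) over the parts of a hierarchy. A sketch using the right-hand values from (7) and (8):

```python
# Body-part dimensions from eqs. (7) and (8):
# RH L1: clavicle d(2), humerus d(3), radius d(1);
# RH L2: wrist d(1), hand d(3), fingers d(1), thumb d(2).
RH_L1 = {"clavicle": 2, "humerus": 3, "radius": 1}
RH_L2 = {"wrist": 1, "hand": 3, "fingers": 1, "thumb": 2}

def feature_dimension(parts):
    """Total joint-angle feature dimension for a set of body parts."""
    return sum(parts.values())
```

feature_dimension(RH_L1) gives 6 and feature_dimension(RH_L2) gives 7; combining hierarchies for training simply concatenates their feature vectors.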

The pre-processing steps of a feature extraction algorithm for joint angles or pose estimates are outside the scope of this paper, and we assume they are given a priori or manually estimated. We focus on classifying a given pose based on its hierarchy and joint angles from the training data. In the proposed method a given pose can have a single semantic interpretation (walk, run), multiple semantic interpretations (walk + shake hands, jump + sit) or poses defined by actions including physical activities or sports (basketball, soccer, dance). It has been found that it is possible to classify any pose with acceptably high accuracy using proper training and selection of the hierarchy level. The training data for the SVM basically consist of a collection of poses from a specific motion category. We classify the entire collection of poses based on their motion and applications for simpler pattern representation using the SVM. Each pose is represented by several frames with joint angle information at the three hierarchical levels as

P_{i=1}^m = {F_{i=1}^k, L(D)}_{i=1}^m   (10)

where P^m represents the pose for a particular motion category m that is, in our case, given by

P_{i=1}^m = {Locomotive, Interactive, Behavioural, Sports}_{i=1}^m   (11)

F_{i=1}^k represents the k frames and L represents the joint angles for a particular hierarchy, or a combination of two or more hierarchies, with feature dimension D given as

L(D) ⊂ {L1(D1), L2(D2), L3(D3)}   (12)

where L(D) represents a specific level or a combination of two or more levels, and D1, D2 and D3, respectively, represent the dimensions of the first, second and third levels. Based on the above definition, the training is carried out at each level for various poses in different motion categories. Detailed specifications of the training data are included in the experimental results section. Once we have trained the pose data, the input test pose data can be similarly coordinated with the training data.

Once the training and test data have been prepared we still need to estimate the optimum levels of the hierarchy. For a particular test pose, the selection process estimates the variance along the frames for each particular joint. The level with maximum variance is selected in the classification process to improve the speed and classification accuracy. Variances of joint angles at different levels for different poses are shown in Fig. 4.

4.2 Modelling by POSER

After classification, the commercial software POSER is used for reconstructing realistic poses and extracting 3D joint information from the simulated pose. By using the POSER software for poses associated with multiple joint variations, we can classify the corresponding set of joints of a particular pose into major and minor joints. A joint falls into the major category if a particular pose cannot be validated without this type of joint. On the other hand, a minor joint describes the details of the shape, but does not validate a pose. A major joint is responsible for obtaining the abstract posture of the given pose, but a minor joint usually defines an action or a function of a pose. Hence the minor joints are very important for interpreting or recognising a pose. In the case of a person standing and brushing teeth, for example, the major joints define the standing pose whereas the minor joints define the brushing action. Based on this concept, the possible combinations of poses are summarised in Fig. 5. Given input joint angles we can obtain the major and minor joints of the pose, and interpret the corresponding posture and action in the above-mentioned manner. The standing posture and brushing action are then combined into a pose involving L1–L2 hierarchy joints: L1 represents the standing posture and L2 represents the brushing-teeth action.

The angles of all movable joints have been estimated along the X-, Y- and Z-axis, where the Y-axis is along the standing direction of the body. For a lying-down pose, the Y-axis is not vertical and is inclined or tilted towards the horizontal direction. In this case joint angles can still be estimated by tilting the Y-axis by 90°. Similar cases include exercising, swimming and crawling, where the Y-axis is tilted from the vertical position. Also, the spatial configuration of the knee and elbow joints (H) depends entirely on the hip and shoulder joints (B). Similar exceptions also exist for the wrists and ankles (E), but they are less significant, as in the two cases of the forearm motion in dumbbell lifting or wiping a horizontal surface. The range of angular variation is the same in both cases, but one involves rotation about the Z-axis, whereas the other is about the Y-axis.

In another type of pose, called the intermediate pose, one has to use two axes for the same lifting and wiping motion. A possible solution would be to detach the limbs at the B joints and vertically align the hand-arm/thigh. This would indeed make the H joints purely uni-axial in nature. In Table 1, we assume the above-mentioned case, and the angles at the H joints depend on the corresponding B joints. Wherever the motion of a B joint covers all three axes, we use only two axes to define it, since any axis can be obtained from another by rotating 90° in the appropriate direction. In the case of lifting objects with a straight back, for instance, the shoulder joints can rotate about any of the three axes, depending on the height of the target object. This motion can be broken down into: (i) aligning the shoulder to account for the object height (rotation about the X-axis) and (ii) reaching out and lifting the object to the target position (rotation about the Y- or Z-axis). Note that the range for step (ii) is the same regardless of the axis. To reduce the redundancy among similar poses that differ in the overall orientation of the body (lying and standing), one can 'bind' the Y-axis to the spine, and include two angles to define the orientation of the coordinate frame.


Figure 4 Variance of joint angles against number of joints for different poses
a right leg, b left leg, c right hand, d left hand, e head and neck, f back, g right fingers, h left fingers, i right toes, j left toes
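The per-level variances plotted in Fig. 4 drive the level selection of Section 4: the hierarchy whose joint angles vary most across the frames of a test pose is used for classification. A sketch under an assumed data layout of our own (each level maps to per-joint angle sequences over the frames):

```python
def variance(xs):
    """Population variance of one joint's angle sequence over the frames."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def select_level(frames_by_level):
    """frames_by_level maps a level name ('L1', 'L2', 'L3') to a list of
    joint-angle sequences; the level with the largest mean per-joint
    variance is selected for classification."""
    score = {
        lvl: sum(variance(seq) for seq in seqs) / len(seqs)
        for lvl, seqs in frames_by_level.items()
    }
    return max(score, key=score.get)
```

For a walking clip, the L1 (B and H) angle sequences would dominate the variance and be selected, matching the behaviour reported for the walking rows of Table 3.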


Figure 5 Pose classification using body parts

5 Experimental results

Experiments for hierarchical pose classification were carried out on two datasets: (i) a benchmark CMU motion capture database and (ii) a POSER-simulated pose database. The CMU database [18] consists of 140 subjects with approximately 3000 trials, which were used in the SVM modelling and classification. The characteristics of the CMU dataset are given in Table 2. The POSER dataset is relatively smaller than the CMU dataset, but incorporates more variety to emphasise the importance of the hierarchical representation of joint angles.

Table 2 Characteristics of the CMU dataset

Pose type                                Training data   Testing data   Number of subjects
locomotive poses                         7800            3400           28
physical activities and sports           9400            5800           19
human interactive poses                  12 900          3400           23
common behaviours and expressive poses   14 500          3700           17

5.1 SVM classification results

The SVM classification results for locomotive poses (walk, run and jump), physical activities (dance, martial arts), sports (basketball, soccer), interactive poses and common behaviour poses are shown in Tables 3–6. Each pose is classified based on its joint angles at the different hierarchy levels. For a given pose the maximum variance along particular joint columns is used for hierarchy selection. The group of joints with maximum variance is assigned to be the major joints.

For the SVM classification the hierarchy levels are chosen in the order of maximum to minimum variance. For locomotive poses we define the walking, running and jumping poses as the most fundamental poses. In the case of the walking pose, the significant joint variations are confined to the L1 joints alone, as shown in Fig. 5, so we can classify it accurately using only arm and leg movements. In the case of the running and jumping poses, joint variations occur at all

Table 3 Classification accuracy for locomotive poses

Pose type     L1      L1 + L2   L1 + L2 + L3   MC
walk          100     100       100            NA
run           99.25   99.25     99.94          jog
jump          100     100       100            NA
run + jog     93.76   99.1      100            run
walk + run    89.91   99.7      92.8           jog, run
walk + jump   73.89   85.69     94.26          run, jog

Table 4 Classification accuracy for physical activities and sports poses

Pose type      L1      L1 + L2   L1 + L2 + L3   MC
basketball     97.4    100       100            boxing
boxing         97.13   100       99.94          basketball, martial arts
dance          90.48   100       100            martial arts
soccer         88.7    100       100            martial arts, dance
martial arts   69.2    81.4      92.8           dance
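The accuracy and misclassification (MC) columns of Tables 3–6 can be computed from raw predictions in the obvious way; the following sketch (Python, toy labels) mirrors that bookkeeping:

```python
from collections import Counter

def accuracy_and_mc(true_labels, predicted):
    """Per-pose accuracy (%) and the most common wrong label (MC),
    as tabulated in Tables 3-6; MC is 'NA' when nothing is misclassified."""
    correct = sum(t == p for t, p in zip(true_labels, predicted))
    wrong = [p for t, p in zip(true_labels, predicted) if t != p]
    mc = Counter(wrong).most_common(1)[0][0] if wrong else "NA"
    return 100.0 * correct / len(true_labels), mc
```

For example, four running trials of which one is predicted as jogging yield an accuracy of 75.0 with MC "jog", matching the way the run row of Table 3 is read.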


Table 5 Classification accuracy for human interactive poses

Pose type                L1      L1 + L2   L1 + L2 + L3   MC
walk + shake hands       100     100       NA             NA
walk + throw + catch     77.33   87.43     96.95          shake hands, hand gestures
squat + pull + stand     65      73.2      90.2           walk, shake hands
sit + hug + hold hands   80.4    96.6      96.8           walk, catch, hand gestures
walk + hand gestures     76.2    100       NA             walk
push + pull + slant      52.8    93        99.4           throw, squat

Table 6 Classification accuracy for common behavioural and expressive poses

Pose type   L1      L1 + L2   L1 + L2 + L3   MC
laugh       98.56   100       NA             drink
drink       93.02   99.36     NA             laugh
wash        98.08   100       NA             mop
bend        83.26   91.23     97.64          mop, wash
mop         78.69   92.36     99.4           bend, wash

levels, but the variation in the L1 joints is significantly different between them. Classification results for walking poses are accurate at all levels, as shown in Table 3. However, when two primary locomotive poses occur simultaneously with a slow transition period, such as walk + run, walk + jog and walk + jump, the misclassification (MC) at the L1 joints increases, but improvements at the other hierarchies were noticed, as shown in Table 3. In the rightmost column of the table we provide the MC results of the SVM classifier. These entries represent the false interpretations of the SVM classifier for a given pose. In the case of the running pose the misclassified pose is the jogging pose, and similar results are presented for the other pose categories. In the case of the physical activities and sports given in Table 4, the MC results increase because of similar variations in the L1 hierarchy joints. But we can see the improvement in classification accuracy with the extension to the L2 and L3 joints. For basketball several ball-dribbling poses were tested and it was found that L1 classification was sufficient. Similar results were obtained by using only the L1 hierarchy for the boxing, dancing and soccer poses. The classification accuracy was reduced in the L1 hierarchy for the dancing and soccer poses, with misinterpretation between dance and soccer: some dancing poses were misinterpreted as soccer and some soccer poses were misinterpreted as dance. However, with the L1 + L2 hierarchy we achieved higher classification accuracy because of the additional information from the E joints. Tables 5 and 6, respectively, summarise the classification results for poses that include common representations of daily poses. The variance levels indicate that interactive poses generally involve all three joint levels for a particular pose.

5.2 POSER classification results

In this section we graphically illustrate several test poses based on the proposed classification scheme. Test poses with different joint angles were generated using the POSER software [19], which is an interactive 3D design tool for portraying various human poses. Fig. 6 shows selected poses described in Tables 1 and 2 using POSER, together with the results of joint angle estimation for some test poses. The joint angles estimated from the test poses are compared with the reference values in Table 1. In Fig. 6a the simulated walking pose pattern is shown with the estimated joint angles. The first inference can be made from the joint angles at the lower part of the body. In this case joints B3 and B4 form an angle of 45.5°, which represents either a standing or a sitting pose. When we consider the next set of joint angles, H5 and H6, for the same pose, these angles are almost in line with the reference axis at 170° and 173.4°, which excludes the inference of a sitting pose, where H5 and H6 would have been much smaller. As a result we can safely assume that this is the standing pose.

Further inference on the pose can be made from the joint angles H1 and H2, by which we can decide whether it is an interactive or a self-active pose. In all experiments we have used only the joint angles that are visible from a particular view. The proposed algorithm is based on the assumption that a pose can be inferred from a minimum set of joints. Fig. 6 shows similar results for various poses. Smaller values of H5 and H6 usually represent a sitting pose. Fig. 6g represents a person sitting and talking on a cell phone. The angle of joint H1 clearly classifies the pose as an interactive one. According to the proposed classification method the pose in Fig. 6g would be classified as a combined sitting and interaction pose. With additional knowledge about the object the pose could be defined more specifically. Fig. 6d represents a shaking-hand pose. The angles of joints B3, B4, H5 and H6 first classify it as a standing pose and the angles of joints B1, H1 then classify
for human interactive and behavioural/expressive poses. it as an interaction pose. In a similar way, we have a
These are the most important classes of poses because they combined standing and interacting pose that can be

20 IET Comput. Vis., 2010, Vol. 4, Iss. 1, pp. 12– 24


& The Institution of Engineering and Technology 2009 doi: 10.1049/iet-cvi.2008.0086
www.ietdl.org

Figure 6 Different poses generated by POSER using many body joints


a Walking
b Shaking hand
c Chatting
d Combined sitting and shaking hand
e Sitting
f Running
g Interacting
h Sword fighting
i Threatening
j Ball catching
k Exercising
l Dancing
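The joint-angle inference illustrated by Fig. 6 (lower-torso angles distinguish standing from sitting, and arm joints flag interaction) can be sketched as a small rule-based classifier. This is a minimal illustrative sketch, not the paper's implementation: the joint names B3, B4, H5, H6, H1 follow the paper's notation, but the thresholds below are assumptions.

```python
# Hedged sketch of the part-based joint-angle inference.
# Thresholds are illustrative assumptions, not the paper's exact values.

def infer_pose(angles):
    """Infer a coarse pose label from a dict of joint angles (degrees)."""
    labels = []
    # Lower torso first: hip-level joints H5/H6 near 170 degrees mean the
    # legs are almost in line with the reference axis (standing); much
    # smaller values imply a bent hip (sitting).
    if angles["H5"] > 150 and angles["H6"] > 150:
        labels.append("standing")
    else:
        labels.append("sitting")
    # Upper torso next: a raised arm joint H1 suggests an interactive pose.
    if angles["H1"] > 60:
        labels.append("interaction")
    return "+".join(labels)

# Hypothetical angle sets mimicking Fig. 6a (walking) and Fig. 6g (phone call).
walking = {"B3": 45.5, "B4": 45.5, "H5": 170.0, "H6": 173.4, "H1": 20.0}
phone   = {"B3": 90.0, "B4": 90.0, "H5": 95.0,  "H6": 100.0, "H1": 80.0}
print(infer_pose(walking))  # standing
print(infer_pose(phone))    # sitting+interaction
```

The lower-torso-first ordering mirrors the cascade described in the text: coarse locomotive state is decided before arm joints refine the pose.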


Table 7 Pose inference results using proposed joint-based classification

Test pose     Classification result             Probable pose
standing      standing                          standing
running       stand + slant                     running
fighting      stand + slant + stretch           throwing, fighting
hand shake    sit + interaction                 hand shake, pick object
fight         stand + arm raise                 exercise, fight
gun fight     sit + stretch + object            daily activity
kicking       stand + stretch + interaction     soccer, fight, exercise, jump
dancing       sit + stretch + arms wide         interaction, dancing
crawling      stand + slant + bend              hiding, crouching
exercise      stand + stretch + arms wide       exercise
arms around   sit + one arm stretch             interaction
jumping       stand + stretch                   exercise, jump
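The inference step behind Table 7, mapping a combination of part-level labels to a set of probable poses, can be represented as a simple lookup. The entries below are taken from Table 7; the dictionary structure itself is an illustrative assumption, not the paper's data structure.

```python
# Hedged sketch: part-level classification results -> candidate poses.
# Entries copied from Table 7; structure is an illustrative assumption.

PROBABLE_POSES = {
    ("stand",): ["standing"],
    ("stand", "slant"): ["running"],
    ("stand", "slant", "stretch"): ["throwing", "fighting"],
    ("sit", "interaction"): ["hand shake", "pick object"],
    ("stand", "stretch"): ["exercise", "jump"],
}

def probable_poses(classification):
    """Map a sequence of part-level labels to candidate poses (empty if unknown)."""
    return PROBABLE_POSES.get(tuple(classification), [])

print(probable_poses(["stand", "slant"]))      # ['running']
print(probable_poses(["sit", "interaction"]))  # ['hand shake', 'pick object']
```

As the table shows, several classification results map to more than one probable pose; disambiguation then relies on additional context about the surroundings.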

interpreted as a shaking-hand pose with additional information about the surroundings.

Several special poses have additionally been studied for analysing the proposed classification results. In the case of special activity-related poses we explain the joint-based classification method. We first use part-based joint variations with the lower torso of the body. Using the joint angles we classify the pose as standing, sitting, running or walking, as well as estimating the slant angle with respect to the reference axis. From there we move towards the upper torso, which is more complicated than the former. Here we obtain the angles of joints B1 and B2 to estimate the position of the arms. In the same way, the angles of joints H1 and H2 represent a more specific function associated with the arm. Fig. 6h represents a rather unusual pose generated by POSER. This special pose

Figure 7 Pose recognition using joint angles


a Matching results example
b Sample training pattern example
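The matching between input poses and training patterns illustrated in Fig. 7 is driven by how far the input's joint lengths and angles deviate from the training-set statistics. A minimal sketch of such a similarity score follows; interpreting the deviation as a standard-score distance is my assumption, and all feature values below are hypothetical.

```python
# Hedged sketch of a joint-matching score: deviations of an input pose's
# joint lengths and angles from training-set mean/std, combined into one
# normalised distance. All statistics here are hypothetical examples.
import math

def match_score(input_features, train_mean, train_std):
    """Smaller score = closer match; features are joint angles and lengths."""
    score = 0.0
    for f, m, s in zip(input_features, train_mean, train_std):
        score += ((f - m) / s) ** 2  # squared standard-score deviation
    return math.sqrt(score)

# Hypothetical three-feature pose: two joint angles (deg) and one limb length.
mean = [170.0, 45.0, 0.40]
std  = [5.0, 4.0, 0.05]
print(round(match_score([172.0, 44.0, 0.41], mean, std), 3))  # ≈ 0.512
```

Dividing by the per-feature standard deviation keeps angle features (tens of degrees) from dominating length features (fractions of a unit) in the combined score.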


requires all joints for complete representation. According to the part-based classification, the lower joint angles with reference to the Y-axis would classify it as a slanting pose with respect to the vertical axis. The amount of slant is obtained using the angle of the spine, V. Hence it is classified as a combined standing, slanting and stretching pose. If a person catches a ball or fights with a sword, the corresponding figures would represent a suitable scenario which can be inferred with additional information about the surroundings. Table 7 summarises several inference results using the proposed classification algorithm together with the actual poses. From this table we can confirm that pose classification using joint angles can be very effective for various pose interpretation applications.

By using the joint angles we found that a pose could be recognised with or without a sufficient amount of data. The measure of the similarity between joints can be estimated using the standard deviations of the lengths and angles in the input image from those of the training set. For poses that do not exist in the database, a new pose is generated directly from the estimated features, as shown in Fig. 7. The first row shows matching results between inputs and training patterns. The second row shows sample training poses generated using joint features. The computational load of the proposed joint-based method is far less than that of existing recognition-based algorithms. With the joint angle-based classification method the various poses can be modelled with a smaller number of training sets. In the case of a standing pose, by using joint angle classification we can efficiently exclude the training sets containing sitting or walking poses.

6 Conclusions

In this paper we proposed a new hierarchical pose classification scheme based on joint angles in the human body. The main motivation of the proposed work is to provide an efficient approach to pose interpretation in machines and computers. The hierarchical nature of the classification makes the proposed method less vulnerable to false interpretations. The range of joint angles has been interpreted using test experiments. The classification tree, along with several examples, has been presented to show the efficiency of the proposed approach. Experimental interpretation was carried out on several test poses generated by POSER and on a common CMU benchmark dataset using the SVM. The proposed algorithm can be considered a significant milestone in various potential applications involving human pose, such as humanoid robots, HCI, multimedia and entertainment, to name a few. Future work on related topics will focus on applying the proposed classification model to a system and further improving its performance. Some false pose interpretations using joint angles will also be dealt with in an analytical way.

7 Acknowledgments

This work was supported by the Seoul Future Content Convergence (SFCC) Cluster established by the Seoul R&D Program and the Ministry of Knowledge Economy, Korea, under the HNRC (Home Network Research Center) ITRC (Information Technology Research Centre) support program; by the Korea Science and Engineering Foundation (KOSEF) grant funded by the Korea government; and by the Industrial Technology Development program of the Ministry of Knowledge Economy (MKE) of Korea.

8 References

[1] PIERCE A., PIERCE R.: 'Expressive movement: posture and action in daily life, sports and the performing arts' (Da Capo, 2001, New edn.)

[2] MOESLUND T., GRANUM E.: 'A survey of computer vision-based human motion capture', J. Comput. Vis. Image Underst., 2001, 81, pp. 231-238

[3] MOLINA-TANCO L., HILTON A.: 'Realistic synthesis of novel human movements from a database of motion capture examples'. Proc. IEEE Workshop on Human Motion, 2000

[4] ELGAMMAL A., LEE C.: 'Inferring 3D body pose from silhouettes using activity manifold learning', Proc. Int. Conf. Comput. Vis. Pattern Recognit., 2004, 2, pp. 681-688

[5] ARIE J., WANG Z., PANDIT P., RAJARAM S.: 'Human activity recognition using multi dimensional indexing', IEEE Trans. Pattern Anal. Mach. Intell., 2002, 24, (9), pp. 1091-2005

[6] MORI G., MALIK J.: 'Recovering 3D human body configurations using shape contexts', IEEE Trans. Pattern Anal. Mach. Intell., 2006, 28, (7), pp. 1052-1062

[7] MARIEB E.: 'Essentials of human anatomy and physiology' (Pearson Inc., 2002, 8th edn.)

[8] CAMPBELL N., REECE J.: 'Biology' (Pearson Inc., 2002, 6th edn.)

[9] CHAPPELLE O., HAFFNER P., VAPNIK V.: 'Support vector machines for histogram based image classification', IEEE Trans. Neural Netw., 1999, 10, (5), pp. 1055-1064

[10] CRISTIANINI N., TAYLOR J.: 'An introduction to support vector machines and other kernel-based learning methods' (Cambridge University Press, 2000)

[11] BELONGIE S., MALIK J., PUZICHA J.: 'Shape matching and object recognition using shape contexts', IEEE Trans. Pattern Anal. Mach. Intell., 2002, 24, (4), pp. 509-522

[12] MITTAL A., ZHAO L., DAVIS L.: 'Human body pose estimation using silhouette shape analysis'. Proc. Advanced Video and Signal Based Surveillance Conf., 2003, pp. 263-270


[13] GRUMMAN K., SHAKHNAROVICH G., DARRELL T.: 'Inferring 3D structures with statistical image based model', Proc. Int. Conf. Comput. Vis., 2003, 1, pp. 941-647

[14] UDE A., MAN C., RILEY M., ATKENSON C.: 'Automatic generation of kinematics models for the conversion of human motion capture into humanoid robot motion'. Proc. IEEE Conf. Humanoid Robots, Cambridge, MA, 2000

[15] LEE M., COHEN I.: 'A model-based approach for estimating human 3D poses in static images', IEEE Trans. Pattern Anal. Mach. Intell., 2006, 28, (6), pp. 305-317

[16] YANG J., HUANG Q., PHENG Z., ZHANGE L., SHI Y., ZHAO X.: 'Capturing and analyzing of human motion for designing humanoid motion', Proc. IEEE Conf. Inf. Acquis., 2005, 5, pp. 332-338

[17] CUCCHIARA R., GRANA C., PRATI A.: 'Probabilistic posture classification for human behavior analysis', IEEE Trans. Syst. Man Cybern., 2005, 35, (1), pp. 42-55

[18] http://www.coedu.usf.edu/behavior/bares.htm

[19] POSER, http://www.e-frontier.com

[20] BAUBY C., KUO A.: 'Active control of lateral balance for human walking', Int. J. Biomech., 2000, 33, pp. 1433-1440

[21] KUMAR P., SENGUPTA K., RANGANATH S.: 'Real time detection and recognition of human profiles using inexpensive desktop cameras'. Proc. Int. Conf. Pattern Recognition (ICPR), 2000, pp. 1096-1099
