ABSTRACT
In this paper, we present an interactive dancing game based on motion capture technology. We address the problem of real-
time recognition of the user’s live dance performance in order to determine the interactive motion to be rendered by a
virtual dance partner. The real-time recognition algorithm is based on a human body partition indexing scheme with
flexible matching to determine the end of a move as well as to detect unwanted motion. We show that the system can
recognize the live dance motions of users with good accuracy and render the interactive dance move of the virtual partner.
Copyright © 2011 John Wiley & Sons, Ltd.
KEYWORDS
interactive dancing game; motion capture; real-time motion recognition
*Correspondence
Liqun Deng, Computer Science and Technology, University of Science and Technology of China, Hefei, China.
E-mail: dlqun@mail.ustc.edu.cn
dance move and its corresponding reactive move is predefined. In the game, the dance performance of a user is captured live, preprocessed, and recognized in real time by a novel online classifier. At the same time, according to the recognition result, the appropriate interactive dance motion of the virtual dance partner is promptly determined and rendered. This gives the user the impression that he/she is dancing in collaboration with the virtual partner. The primary contributions of this paper include the following:

- We propose a novel approach to recognize dance motions in real time. We develop an online human motion recognizer based on a human body partition scheme. We present our flexible matching method and several rules which contribute to the good performance on both continuous motion recognition and unwanted motion detection.

- We implement an interactive dancing game system that can be used for dance training and entertainment. Experiments and the user study demonstrated that our classifier is effective in online motion recognition, and the proposed system is well accepted by the investigated users.

The rest of this paper is organized as follows: the next section briefly reviews the related work. Next, the real-time motion classifier is described and the interactive dancing game application is illustrated in the following two sections, respectively. The performance evaluation section presents the experimental results and user studies. The conclusion and future work are provided in the final section.

2. RELATED WORK

There have been many efforts towards the offline analysis of a single person's motion. Kovar et al. [6] created the novel concept of the motion graph, which organizes mocap data as a directed graph and was demonstrated to be efficient in generating different locomotion styles for animation. Lee et al. [7] proposed to precompute the reactive motions for avatars from a large collection of motion data so as to reduce the time delay during the interactive control of avatars in the animation.

Synchronizing motion with music is another important aspect. Shiratori et al. [8] and Kim et al. [9] identified appropriate dance motions by matching them with the input music. Alankus et al. [10] proposed an automatic approach to synthesize dance motions with musical rhythms.

Lee et al. [11] worked on the real-time control of avatars with mocap data. In their approach, the motion data in the database are preprocessed both by a Markov process in the lower layer and a clustering technique in the higher layer.

Various classification techniques have been applied to recognizing motions. Li et al. [12] applied singular value decomposition (SVD) for feature extraction on motions and proposed a new eigen-feature-based similarity measure to classify them. Tormene et al. [13] proposed a new variant of dynamic time warping named open-end DTW (OE-DTW), which allows matching incomplete time series patterns with complete ones. As another extension of DTW, continuous dynamic programming (CDP) was also utilized for human gesture recognition [14]. Liang et al. [2] recognized motions recorded by accelerometers using a continuous hidden Markov model (HMM) based classifier.

In our prior work [15], we addressed the problem of continuously recognizing the dance moves of a long dance sequence, but did not consider the real-time issue. The existing methods above are not suitable for our case, in which a recognition decision is required each time a few new frames are input, instead of waiting for the end of the entire pattern before starting the recognition process. We need a fast method that does not require too much training data.

Our recognition method is motivated by the success of indexing techniques applied to motion retrieval [16–18], which are efficient and fast in searching a large dataset for a query motion. For example, Chiu et al. [16] proposed to
230 Comp. Anim. Virtual Worlds 2011; 22:229–237 ß 2011 John Wiley & Sons, Ltd.
DOI: 10.1002/cav
L. Deng et al. Real-time mocap dance recognition
partition a human skeletal model into nine body parts and construct an index map for each of the body parts through self-organizing map (SOM) clustering. These maps are then used for querying a motion in a long motion sequence. Two advantages of body-partition-based retrieval are: (1) reducing the computation cost by partitioning whole-body motions of high dimension into a set of body-part motions of low dimension, and (2) avoiding the disharmony that usually occurs in dance between different body parts.

We use a similar scheme, and extend it from motion retrieval to real-time motion recognition. In addition, each time a new block of input motion is recognized, DTW is used to further extract its temporal alignment with the corresponding template motion, and thus to decide the exact interactive motion clip.

3. REAL-TIME MOTION CLASSIFIER

The framework of our proposed classifier is shown in Figure 1. It is divided into the indexing and recognition stages, which are presented in the following subsections.

3.1. Indexing Stage

3.1.1. Motion Representation.
In this prototype system, we consider the Agogo dance. We assume that this dance is composed of 19 classes of dance moves of different difficulty levels. Some moves are symmetric, in which the male and female dancers' motions are the same, while in other moves the two dancers' motions are completely different but collaborative. More information about our dance moves is given in our prior work [15]. For each class, 15 trials are pre-captured from five subjects using an optical system. The durations of the motion clips range from 85 to 360 frames. Each motion is measured by a set of 3D rotations of 20 body joints (see Figure 2(a)).

On the other hand, to facilitate a higher-level description of the motions, the joints of the skeletal model are grouped into five partitions: torso, left upper limb, right upper limb, left lower limb, and right lower limb, as shown in Figure 2(b). Hence each motion is represented by five disjoint sub-motion matrices corresponding to the body partitions along the frame time, which are clustered and indexed separately.

3.1.2. Clustering Motions With SOM.
For each of the C classes, we select K trials as the training source and build the index structure (in our system, C = 19, K = 5). Thus there are K × C sub-motions for each body partition. These sets of sub-motions are clustered using a SOM-based approach.

For the technical theory of SOM, we refer readers to the book by Duda et al. [19]. In our case, for each set of sub-motions specified by a certain body part, we collect all the corresponding frames and cluster them using a two-step procedure. The first step is to train a SOM with an existing SOM toolbox [20], and take the weight vectors of the SOM nodes as cluster centers. During SOM training, the initial SOM parameters are set to the defaults of the toolbox [21]. After this step, we find that most of the resulting clusters are not or rarely indexed by the training frames, so the second step iteratively refines the result. We first discard the clusters that are not or rarely indexed (fewer than 40 times in our case), then resize the SOM map according to the number of remaining cluster centers and retrain on the frames. This step is repeated until all the resulting cluster centers are well indexed by the training frames. The rationale behind this step is to reduce the noise in the result caused by irregular training frames.

Figure 3 shows an example of projecting motions into the clusters. Following the notation of Wu et al. [17], the resulting sequences of cluster IDs are known as motion strings, and each motion trial is transformed into five motion strings, with each frame represented by a vector of five elements as the projections of the body parts.
Figure 2. (a) Human skeletal model. (b) Five body partitions of a skeletal model.
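As a concrete illustration of the projection step above, the sketch below maps each frame's per-partition pose vector to the ID of its nearest cluster center, yielding one motion string per body part. This is a minimal sketch under our own assumptions: the names (`motion_to_strings`, `part_slices`) are hypothetical, and a plain Euclidean nearest-center lookup stands in for the trained SOM projection.

```python
from math import dist  # Euclidean distance (Python 3.8+)

def nearest_center(vec, centers):
    """Return the index (cluster ID) of the center closest to vec."""
    return min(range(len(centers)), key=lambda i: dist(vec, centers[i]))

def motion_to_strings(frames, part_slices, part_centers):
    """Project a motion onto per-part cluster centers.

    frames       : list of per-frame feature vectors (joint rotations).
    part_slices  : dict part_name -> slice selecting that part's features.
    part_centers : dict part_name -> list of trained cluster-center vectors.
    Returns one cluster-ID string (list of ints) per body part.
    """
    strings = {part: [] for part in part_slices}
    for frame in frames:
        for part, sl in part_slices.items():
            strings[part].append(nearest_center(frame[sl], part_centers[part]))
    return strings

# Toy example: 1-D "torso" and "left arm" features, two centers each.
slices = {"torso": slice(0, 1), "left_arm": slice(1, 2)}
centers = {"torso": [[0.0], [1.0]], "left_arm": [[0.0], [2.0]]}
frames = [[0.1, 1.9], [0.9, 0.2], [1.1, 2.1]]
print(motion_to_strings(frames, slices, centers))
# {'torso': [0, 1, 1], 'left_arm': [1, 0, 1]}
```

In the paper's setting, each frame would thus contribute one cluster ID per body partition, and the five ID sequences together form the indexed representation of a trial.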
The final sizes of the five SOM cluster sets are 10, 28, 33, 15, and 19, respectively.

3.1.3. Building Indexing Maps.
This step aims to build five index maps corresponding to the torso, left upper limb, left lower limb, right upper limb, and right lower limb, respectively.

First, for each motion class, the K trials are combined into a template model (TM). Let L be the largest length among the K trials. We scale the trials to contain L frames by uniform scaling [22] and combine them (see Figure 4 for an example). Thus the resulting model also has L frames, and each frame is composed of five items with respect to the five body parts, each item being a unification of the corresponding K values. Second, the C TMs are used to build the index maps. Each map is actually a hash table consisting of two parts: entries and content. The number of entries is exactly the number of SOM centers of the corresponding body part generated in the previous subsection, and the entries are specified by the cluster IDs (see Figure 5). The content of the entries is filled by traversing the TMs; that is, if the torso of the i-th frame of model m contains ID j, then a new pair (m, i) is added to the content of the j-th entry of the torso map. The average numbers of pairs per entry in the five index maps are 120, 102, 88, 217, and 139, respectively.

3.2. Recognition Stage

In this subsection, we apply the index maps to real-time motion recognition. We propose a flexible matching scheme to search for the match of a query motion among the template models.
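To make the index-map structure concrete, the sketch below builds a hash table mapping each cluster ID to its (model, frame) pairs, in the spirit of the torso-map description above, and adds a simple per-model hit counter. The `vote` helper is our own illustrative assumption; the paper's actual matching rules (flexible matching and end-point detection) are more elaborate than this.

```python
from collections import defaultdict

def build_index_map(template_strings):
    """Build one index map for a single body part.

    template_strings : dict model_id -> motion string (one cluster ID
                       per frame of that template model).
    Returns dict cluster_id -> list of (model_id, frame) pairs.
    """
    index_map = defaultdict(list)
    for model_id, string in template_strings.items():
        for frame, cluster_id in enumerate(string):
            index_map[cluster_id].append((model_id, frame))
    return dict(index_map)

def vote(index_map, query_ids):
    """Count, per template model, how many index hits the query frames score."""
    counts = defaultdict(int)
    for cid in query_ids:
        for model_id, _frame in index_map.get(cid, []):
            counts[model_id] += 1
    return dict(counts)

# Toy torso motion strings for two template models.
tms = {"move_1": [0, 0, 1, 2], "move_2": [2, 2, 1, 1]}
imap = build_index_map(tms)
print(imap[1])                # [('move_1', 2), ('move_2', 2), ('move_2', 3)]
print(vote(imap, [0, 1, 1]))  # {'move_1': 4, 'move_2': 4}
```

A query frame's cluster ID then costs one hash lookup per body part, which is what makes the per-block recognition decision cheap enough for real-time use.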
is considered as reaching the end point. If the length of the currently recognized motion is larger than a fixed length minLen, it is considered a meaningful dance move; otherwise it is considered an unwanted move (in our system, k = 2, minLen = 60).

After the end point of a recognized move is detected, the input motion sequence is segmented and the counts for all the TMs are reset to zero, hence restarting the recognition process.

3.2.4. Extracting Temporal Alignment.
After the previous step, the label of the input motion is determined. However, this does not guarantee that the alignment between the input motion and the recognized template motion is exact, which is important for the system to produce smooth interactive motions.

We employ OE-DTW [13] to align the motions. One advantage of OE-DTW over traditional DTW is that it does not require prior knowledge of the length of the motion to be matched, and its distance is determined by the last column of the distance table, so it is suitable for online applications. Suppose Q is an online motion clip and R is a template motion; Figure 7 shows an example of OE-DTW. The current input Q matches the prefix R1...I of R.

For each move class, among the K trials, the trial with minimal average DTW distance is chosen as the template motion of that class. The real-time input frames are continuously aligned with the template motions, and the result is then used to decide the exact interactive motion clip to animate the avatar in real time.

4. INTERACTIVE DANCING GAME

We have implemented the interactive dancing game system. The application provides two modes: the training mode and the freestyle mode.

The training mode demonstrates pre-captured dance motions so that users can watch and learn how they should dance with the virtual partner. An example scenario is shown in Figure 8(a).

In the freestyle mode, the user is allowed to dance freely in a given time period, and his/her dance is captured live and processed by the game system in real time (see Figure 8(b)). The music related to the dance is also played. At the same time, the virtual partner performs the interactive motions to make the user feel more immersed in the virtual environment. At the top of the screen, as shown in Figure 8(b), several messages are displayed. The message "interactive move: 4 (25%)" means that the current input motion is recognized as motion class 4 and that the user has completed 25% of this move. For each run, we allow the user to dance for 20 seconds, and the remaining time is shown on the screen. When a meaningful dance move is detected, one mark is awarded to the user.
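The open-end alignment used in Section 3.2.4 can be sketched as follows. This is a minimal OE-DTW over 1-D sequences with absolute-difference cost, assuming the standard DTW recurrence; real use would operate on joint-rotation frame vectors, and the function name `oe_dtw` is ours.

```python
def oe_dtw(query, ref):
    """Open-end DTW: align a (possibly incomplete) query against a
    complete reference; return (best_distance, matched_ref_length).

    The full DTW table is computed, but instead of reading only the
    final cell, the minimum over the last row (the whole query against
    every prefix of the reference) gives the open-end distance.
    """
    n, m = len(query), len(ref)
    INF = float("inf")
    # dtw[i][j] = cost of aligning query[:i] with ref[:j]
    dtw = [[INF] * (m + 1) for _ in range(n + 1)]
    dtw[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(query[i - 1] - ref[j - 1])
            dtw[i][j] = cost + min(dtw[i - 1][j],      # insertion
                                   dtw[i][j - 1],      # deletion
                                   dtw[i - 1][j - 1])  # match
    # Open end: best alignment of the whole query to ANY prefix of ref.
    best_j = min(range(1, m + 1), key=lambda j: dtw[n][j])
    return dtw[n][best_j], best_j

dist_val, matched = oe_dtw([1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0, 5.0])
print(dist_val, matched)  # 0.0 3 — the query matches the first 3 ref frames
```

The returned `matched_ref_length` is what yields the percent-completion figure shown to the user (e.g., "25%" when the query aligns to the first quarter of the template).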
5. PERFORMANCE EVALUATION
Figure 8. (a) Interactive dance game in training mode, where the right avatar dances the input moves and the left avatar dances the interactive moves. (b) A snapshot of the freestyle mode. The inset at the bottom right shows the real-time dancing of a real user.
provided. The subjects were required to give their marks on a five-level Likert scale (1 means strongly disagree and 5 means strongly agree) for each of the questions. The last column of Table 2 shows the average mark as well as the statistical significance of each question. Since a high mark represents positive feedback and vice versa, we can see that our system impressed the subjects greatly. The marks for questions 3, 4, and 5 support that the live dances are accurately recognized, and those for questions 6, 7, and 8 suggest that the system is well designed and interacts nicely with users. The marks for questions 1 and 2, on the whole, indicate that the system achieves good performance and impresses users.

6. CONCLUSIONS AND FUTURE WORK

This paper presented an interactive dancing game based on mocap technology. We proposed a novel approach to handle the real-time recognition of the user's dance motion based on human body partition indexing. The matching was flexible in identifying the end of a move and detecting unwanted motion. Experiments showed that our proposed method has good performance on both isolated and continuous motion recognition, and positive feedback was obtained from the subjects in the user study. However, our classifier is trained with the trials of all motion classes as a whole; hence, if a new motion class is added to the system, the whole training procedure needs to be redone.

As future work, we will consider introducing more kinds of dances into the system to increase its complexity and variety. Also, since different users have different dance styles and progress at different rates when using the system for learning, personalizing the system is a promising direction. We may combine this work with our prior work [5] to produce more meaningful feedback to the user regarding his/her dance performance.

ACKNOWLEDGEMENTS

The work described in this paper was fully supported by a grant from the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CityU 1165/09E).

REFERENCES

1. Chai J, Hodgins JK. Performance animation from low-dimensional control signals. ACM Transactions on Graphics 2005; 24(3): 686–696.
2. Liang X, Li Q, Zhang X, Zhang S, Geng W. Performance driven motion choreographing with accelerometers. Computer Animation and Virtual Worlds 2009; 20: 89–99.
3. Shin HJ, Lee J, Shin SY, Gleicher M. Computer puppetry: an importance-based approach. ACM Transactions on Graphics 2001; 20(2): 67–94.
4. Magnenat-Thalmann N, Protopsaltou D, Kavakli E. Learning how to dance using a web 3D platform. In ICWL '07: Lecture Notes in Computer Science, Vol. 4823, 2008; 1–12.
5. Chan JCP, Leung H, Tang JKT, Komura T. A virtual reality dance training system using motion capture technology. IEEE Transactions on Learning Technologies, 17 Aug. 2010. <http://doi.ieeecomputersociety.org/10.1109/TLT.2010.27>
6. Kovar L, Gleicher M, Pighin F. Motion graphs. ACM Transactions on Graphics 2002; 21(3): 473–482.
7. Lee J, Lee KH. Precomputing avatar behavior from human motion data. In SCA '04: Proceedings of the 2004 ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2004; 79–87.
8. Shiratori T, Nakazawa A, Ikeuchi K. Dancing-to-music character animation. Computer Graphics Forum 2006; 25(3): 449–458.
9. Kim JW, Fouad H, Sibert JL, Hahn JK. Perceptually motivated automatic dance motion generation for music. Computer Animation and Virtual Worlds 2009; 20(2–3): 184–375.
10. Alankus G, Bayazit AA, Bayazit OB. Automated motion synthesis for dancing characters. Computer Animation and Virtual Worlds 2005; 16(3–4): 259–271.
11. Lee J, Chai J, Reitsma PSA, Hodgins JK, Pollard NS. Interactive control of avatars animated with human motion data. ACM Transactions on Graphics 2002; 21(3): 491–500.
12. Li C, Zheng SQ, Prabhakaran B. Segmentation and recognition of motion streams by similarity search. ACM Transactions on Multimedia Computing, Communications, and Applications 2007; 3(3): Article 16.
13. Tormene P, Giorgino T, Quaglini S, Stefanelli M. Matching incomplete time series with dynamic time warping: an algorithm and an application to post-stroke rehabilitation. Artificial Intelligence in Medicine 2009; 45(1): 11–34.
14. Mori A, Uchida S, Kurazume R, Taniguchi R, Hasegawa T, Sakoe H. Early recognition and prediction of gestures. In ICPR '06: Proceedings of the International Conference on Pattern Recognition, 2006; 560–563.
15. Deng LQ, Leung H, Gu NJ, Yang Y. Automated recognition of sequential patterns in captured motion streams. In WAIM '10: Lecture Notes in Computer Science, Vol. 6184, 2010; 250–261.
16. Chiu C, Chao S, Wu M, Yang S, Lin H. Content-based retrieval for human motion data. Journal of Visual Communication and Image Representation 2004; 15: 446–466.
17. Wu S, Wang Z, Xia S. Indexing and retrieval of human motion data by a hierarchical tree. In VRST '09: Proceedings of the 16th ACM Symposium on Virtual Reality Software and Technology, 2009; 207–214.