
Proceedings Chapter

Towards a Minimal Representation of Affective Gestures (Extended Abstract)

GLOWINSKI, Donald, et al.

Reference
GLOWINSKI, Donald, et al. Towards a Minimal Representation of Affective Gestures (Extended
Abstract). In: 2015 International Conference on Affective Computing and Intelligent
Interaction (ACII). IEEE, 2015.

DOI: 10.1109/ACII.2015.7344616

Available at:
http://archive-ouverte.unige.ch/unige:98097

Disclaimer: layout of this document may differ from the published version.
2015 International Conference on Affective Computing and Intelligent Interaction (ACII)

Towards a Minimal Representation of Affective Gestures (Extended Abstract)

Donald Glowinski, Marcello Mortillaro, Klaus Scherer
Swiss Center for Affective Sciences, University of Geneva, CH-1205, Switzerland
Email: donald.glowinski@unige.ch

Nele Dael
Institute of Psychology, University of Lausanne, CH-1015 Lausanne

Gualtiero Volpe, Antonio Camurri
Casa Paganini - InfoMus (DIBRIS), University of Genoa, IT-16145 Genoa

Abstract—How efficiently can affective information be decoded when computational resources and sensor systems are limited? This paper presents a framework for the analysis of affective behavior starting from a reduced amount of visual information related to human upper-body movements. The main goal is to identify a minimal representation of emotional displays based on non-verbal gesture features. The GEMEP (Geneva Multimodal Emotion Portrayals) corpus was used to validate this framework. Twelve emotions expressed by ten actors form the selected data set of emotion portrayals. Visual tracking of head and hand trajectories was performed from a frontal and a lateral view. Postural/shape and dynamic expressive gesture features were identified and analyzed. A feature reduction procedure was carried out, resulting in a four-dimensional model of emotion expression that effectively grouped emotions according to their valence (positive, negative) and arousal (high, low). These results show that emotionally relevant information can be measured from the dynamic qualities of gesture. The framework was implemented as software modules (plug-ins) extending the EyesWeb XMI Expressive Gesture Processing Library and was tested as a component of a multimodal search engine in collaboration with Google within the EU-ICT I-SEARCH project.

Keywords—emotion; automatic body features extraction

I. INTRODUCTION

Multimodal interfaces able to capture non-verbal expressive and affective features are needed to support novel multimedia applications, the Future Internet, and user-centric media. For example, in networked user-centric media the expressive, emotional dimension is becoming particularly significant, and the need is emerging for the extraction, analysis, encoding, and communication of a small, yet highly informative, amount of information characterizing the expressive-emotional behavior of people using or wearing mobile devices. In this context, upper-body movements are of particular interest. Head and hand movements are most often employed to express one's affect, and human observers' attention spontaneously focuses on this body region when attempting to infer others' emotions [1]. This predominance of upper-body movements in emotional communication is reinforced by current computer interfaces and practice.

This paper presents a framework for the analysis of affective behavior starting from a reduced amount of visual information related to human upper-body movements. The motivation is to embed it in user-centric, networked media applications, including future mobile devices, characterized by low computational resources and limited sensor systems. We start from the hypothesis that, even with a reduced amount and quality of visual information, it is still possible to classify a significant amount of the affective behavior expressed by a human. A small set of visual features is extracted from two consumer video cameras (25 fps), based on head and hand positions and velocities. Neither facial expressions nor fine-grained hand gestures (e.g., finger movements) are considered. Starting from behavioral studies, postural and dynamic expressive motor features, considered relevant for the communication of emotional content, are extracted from the low-level kinematic features of the head and hands.

II. BACKGROUND

The body is an important source of information for affect recognition. A large trend of research concerns facial expression, gesture, and activity recognition (see, for example, the IEEE Face and Gesture conferences). Many techniques have been suggested to analyze hand and head gestures over the years (HMMs, CRFs, neural networks, DBNs). However, our approach is different, since we focus on the non-verbal expressive content rather than on the specific sign or gesture being performed. This section reviews the state of the art, considering the analysis of full-body features. We then detail recent interest in the analysis of upper-body movements. The section ends with the presentation of a conceptual framework to investigate higher-level, qualitative aspects of gesture performance in order to describe behavior expressivity.

A. From Face to Body

Compared to the facial expression literature, attempts at recognizing affective body movements are few. Pioneering work by Ekman suggested that people make greater use of the face than of the body for judgments of emotion in others [2]. However, results from psychology suggest that body movements do constitute a significant source of affective information [3]. Yet, body gesture presents more degrees of freedom than facial expression: an unlimited number of body postures and gestures, with combinations of movements of various body parts and postural attitudes, are possible.



No standard coding scheme, equivalent to the FACS for facial expressions, exists to decompose any possible bodily movement expression into elementary components. Various systems were proposed by psychologists [4] or inspired by dance notation and theories [5], but none of them reached the consensus achieved by Ekman's system for the analysis of facial expression.

Existing studies on full-body movement use coarse-grained posture features (e.g., leaning forward, slumping back) or low-level physical features of movements (kinematics). Bianchi et al. [6] formalized a general description of posture based on angles and distances between body joints and used it to create an affective posture recognition system that maps the set of postural features into affective categories using an associative neural network. Other approaches have exploited the dynamics of gestures, referring to the few psychological studies reporting that temporal dynamics play an important role in interpreting emotional displays [7]. Kapur et al. [8] showed that very simple statistical measures of motion dynamics (e.g., velocity and acceleration) are sufficient for successfully training automatic classifiers (e.g., SVMs and decision-tree classifiers). The role of kinematic features has been further established by the study of Bernhardt and Robinson [9]. Further developing the motion-captured knocking motions from Pollick [10], they developed a computational approach to extract affect-related dynamic features. Velocity, acceleration, and jerk, measured for each joint composing the skeletal structure of the arm, proved successful in the automatic recognition of the expressed affects (neutral, anger, happy, and sad).

B. Upper-Body Movements

Another trend of studies focused on upper-body movements. Two-handed gestures accompanying speech in particular have been mainly investigated and revealed the role of hand illustrators, i.e., hand movements accompanying, illustrating, and accentuating the verbal content of utterances [1]. As pointed out by Freedman, besides gait, hand illustrators are the most frequent, regularly occurring, quantifiable bits of overt behavior available for the objective study of expressivity [11]. Research on sign language production revealed that small, detailed motions are performed in and around the face and upper-body region, where the receiver (looking at the signer's face) can naturally observe gesture in high acuity [12]. On the basis of hand position and trajectory patterns, four classes of gesture were individuated using Hidden Markov Models (HMMs): hand clapping, hands over the head, 'Italianate' gestures (following a repetitive sinusoidal pattern), and lifts of the hand, whose combinations proved successful for the classification of six basic emotions. Gunes et al. [13] proposed a bi-modal system that recognizes twelve affective states. Starting from the frontal view of sitting participants, their motion was estimated using optical flow. Hand, head, and shoulder regions were tracked and analyzed (e.g., how the centroid, rotation, length, width, and area of each region changed over time), and classification was performed based on SVMs.

C. Expressive Gesture Analysis

Camurri et al. developed a qualitative approach to human full-body movement for affect recognition [14]. Starting from low-level physical measures of the video-tracked whole-body silhouette, they identified motion features such as the overall amount of motion computed with silhouette motion images, the degree of contraction and expansion of the body computed using its bounding region, or the motion fluency computed on the basis of the variation of the overall amount of motion over time. This framework of vision-based bodily expression analysis was used for a number of multimodal interactive applications, in particular in the performing arts [15].

Most of the works listed above attempt to recognize a small set of prototypic expressions of basic emotions like happiness and anger. The few exceptions include a tentative effort to detect more complex states such as frustration [16]. Our work, on the one hand, reinforces the results obtained in previous research by showing that a limited number of gesture features, chosen among those already identified in psychological studies, seems sufficient for accurate automatic recognition of emotion. On the other hand, it (i) extends the set of emotions taken into account (a set including twelve emotions is considered, whereas most studies in the literature usually refer to the six emotions commonly referred to as basic emotions), (ii) introduces features at a higher level with respect to the kinematic features used in most studies, and (iii) considers a larger data set of emotion portrayals (120 portrayals) from an internationally validated corpus, the GEMEP corpus.

III. EXPERIMENTAL FRAMEWORK

Our framework aims at individuating a minimal and efficient representation of emotional displays based on non-verbal gesture features, analyzing affective behavior starting from low-resolution visual information related to human upper-body movement. It builds on the methodology adopted and the results obtained in our previous research on expressive gesture. According to this framework, analysis is accomplished by three subsequent layers of processing, ranging from low-level physical measures (e.g., position, speed, and acceleration of body parts), toward overall gesture features (e.g., motion fluency, impulsiveness), and up to high-level information describing semantic properties of gestures (affect, emotion, attitudes). The modules composing the framework refer to the conceptual layers presented above (see Figure 1):

• Module 1 computes low-level motion features, i.e., the 3D positions and kinematics of the video-tracked hands and head.

• Module 2 computes a vector of higher-level expressive gesture features, including five sets of features aimed at explaining the qualitative aspects of movement.

• Module 3 reduces the dimensionality of the data while highlighting salient patterns in the dataset.

A. The Expressive Features Extraction Module

Considering the literature mentioned above, our analysis is based on the position and the dynamics of the hands and the head. Extraction of expressive features from human movement is carried out using the EyesWeb XMI Expressive Gesture Processing Library, extended by the modules described in this and the following sections. A skin color tracking algorithm is used to extract the blobs of the head and of the two hands (see Figure 2). A three-dimensional position of each blob's center of mass (centroid) is obtained from the two synchronized video cameras (frontal and lateral).
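As an illustration only (the paper's implementation consists of EyesWeb XMI plug-ins, not the code below), the blob-extraction step could be sketched in Python with OpenCV roughly as follows; the YCrCb skin thresholds and the morphological clean-up are assumptions, not the values used by the authors:

    import cv2
    import numpy as np

    def track_blobs(frame_bgr, max_blobs=3):
        """Centroids of the largest skin-coloured regions (head and two hands)."""
        ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb)
        mask = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))    # rough skin range (assumed)
        mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
        n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
        order = np.argsort(stats[1:, cv2.CC_STAT_AREA])[::-1] + 1   # skip background label 0
        return [tuple(centroids[i]) for i in order[:max_blobs]]

Pairing the frontal and lateral centroids of each blob then yields the 3D positions from which velocity and acceleration are derived by frame differencing.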



Fig. 1. The experimental framework and the evaluation component.

Fig. 2. From left to right: the original image, the three blobs (extracted by the skin color tracking algorithm), and the bounding triangle (used to extract contraction/expansion and symmetry features). Two cameras (frontal and lateral view) are used to record the emotion displays; a single time-of-flight camera can provide the same information.

1) Expressive Features: starting from the 3D position, velocity, and acceleration of the head and the hands, as described in the previous section, five categories of expressive features are automatically extracted. They are described in the following subsections.

Energy. A first expressive feature concerns the overall energy spent by the user during the performance, approximated by the total amount of displacement of the three extremities. A study by Wallbott revealed that the judged amount of movement activity is an important factor in differentiating emotions [3], [14]. The overall motion energy (activation) at time frame f is estimated as follows. Let $v_l(f)$ denote the modulus of the velocity of each limb $l$ (the head and the two hands in our case) at time frame $f$, Eq. (1):

$v_l(f) = \sqrt{\dot{x}_l(f)^2 + \dot{y}_l(f)^2 + \dot{z}_l(f)^2}$   (1)

We define $E_{tot}$, an approximation of the body kinetic energy, as the weighted sum of the limbs' kinetic energies, Eq. (2):

$E_{tot}(f) = \frac{1}{2} \sum_{l=1}^{n} m_l \, v_l(f)^2$   (2)

where the $m_l$ are approximations of the masses of the head and hands, whose values are computed from biometric anthropometric tables [17].
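A minimal sketch of Eqs. (1)-(2), assuming head and hand trajectories sampled at 25 fps and placeholder mass values standing in for the anthropometric-table entries of [17]:

    import numpy as np

    FPS = 25.0                                                   # video frame rate used in the paper
    MASSES = {"head": 4.0, "left_hand": 0.5, "right_hand": 0.5}  # placeholder kg values (assumed)

    def limb_speed(traj_xyz):
        """|v_l(f)| of Eq. (1) from a (T, 3) trajectory, via finite differences."""
        vel = np.gradient(np.asarray(traj_xyz, dtype=float), 1.0 / FPS, axis=0)
        return np.linalg.norm(vel, axis=1)

    def total_energy(trajs):
        """E_tot(f) of Eq. (2); trajs maps limb name -> (T, 3) trajectory."""
        return 0.5 * sum(MASSES[l] * limb_speed(t) ** 2 for l, t in trajs.items())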
Spatial extent: the bounding triangle. We further consider a bounding triangle relating the three blob centroids of the hands and the head (see Figure 3). The dynamics of the perimeter of such a bounding triangle approximates the space occupied by the head and the hands in the frontal view. The use of space, in terms of judged expansiveness or spatial extension of movements, has been regarded by Wallbott as another relevant indicator for distinguishing between active and passive emotions [3]. Mehrabian pointed out how the degree of openness of the arm arrangement (ranging from a closed-arm position to a moderately or extremely open-arm position) characterizes the accessibility and liking of a communicator's posture from the receiver's point of view [18].
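A sketch of the spatial-extent feature under the same assumptions (per-frame 2D centroids of head and hands in the frontal view); the feature actually used is the evolution of this perimeter over time:

    import numpy as np

    def triangle_perimeter(head_xy, lhand_xy, rhand_xy):
        """Perimeter of the head-hands bounding triangle in the frontal plane."""
        a, b, c = (np.asarray(p, dtype=float) for p in (head_xy, lhand_xy, rhand_xy))
        return (np.linalg.norm(a - b) + np.linalg.norm(b - c) + np.linalg.norm(c - a))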
Fig. 3. From left to right, four key-frames of a portrayal representing elation; from top to bottom, the frontal view, the profile view, and the 3D representation of the displacement of the head and the two hands in space, with the bounding triangle relating the three blobs.

Smoothness/jerkiness. In general, "smoothness" is synonymous with "having small values of high-order derivatives". Since Flash and Hogan, movement jerk - the third derivative of movement position - has frequently been used as a descriptor of the smoothness of a movement [19], [20], [21]. A minimum-jerk model was successfully applied by Pollick et al., together with other motion features, to specify the arousal level communicated by a moving arm expressing basic emotion categories [10].

978-1-4799-9953-8/15/$31.00 ©2015 IEEE 500


Curvature is computed for each hand trajectory by Eq. (3):

$k = \frac{\dot{x}\,\ddot{y} - \dot{y}\,\ddot{x}}{(\dot{x}^2 + \dot{y}^2)^{3/2}}$   (3)

where $\dot{x}, \ddot{x}$ and $\dot{y}, \ddot{y}$ are respectively the first- and second-order derivatives of the trajectory of one hand in its horizontal and vertical components.
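A sketch of Eq. (3) for a sampled hand trajectory, using finite differences for the derivatives; the small epsilon guarding against division by zero on still frames is an implementation detail not specified in the paper:

    import numpy as np

    def curvature(x, y, dt=1.0 / 25.0, eps=1e-9):
        """k of Eq. (3) from the horizontal and vertical hand coordinates."""
        dx, dy = np.gradient(np.asarray(x, float), dt), np.gradient(np.asarray(y, float), dt)
        ddx, ddy = np.gradient(dx, dt), np.gradient(dy, dt)
        denom = (dx ** 2 + dy ** 2) ** 1.5
        return (dx * ddy - dy * ddx) / (denom + eps)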
Symmetry. Lateral asymmetry of emotion expression has long been studied in facial expressions, resulting in valuable insights about a general hemispheric dominance in the control of emotional expression [22]. Hand symmetry is measured in two ways. We first compute the spatial symmetry of the hands with respect to the vertical and horizontal axes. Horizontal asymmetry ($SI_{horizontal}$) is computed from the position of the barycenter and the left and right edges of the bounding triangle that relates the head and the two hands, Eq. (4):

$SI_{horizontal} = \frac{\left|\,|x_B - x_L| - |x_B - x_R|\,\right|}{|x_R - x_L|}$   (4)

where $x_B$ is the x coordinate of the barycenter, $x_L$ the x coordinate of the left edge of the bounding triangle, and $x_R$ the x coordinate of its right edge. Vertical asymmetry ($SI_{vertical}$) is computed as the difference between the y coordinates of the hands. A first measure of spatial symmetry ($SI_{spatial}$) results from the ratio of the horizontal and vertical measures, Eq. (5):

$SI_{spatial} = \frac{SI_{horizontal}}{SI_{vertical}}$   (5)
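A sketch of Eqs. (4)-(5); the vertical term is taken here as the absolute difference of the hands' y coordinates, since the paper does not specify a normalization for it:

    def symmetry_indices(x_bary, x_left, x_right, y_lhand, y_rhand, eps=1e-9):
        """SI_horizontal (Eq. 4), SI_vertical, and SI_spatial (Eq. 5)."""
        si_h = abs(abs(x_bary - x_left) - abs(x_bary - x_right)) / (abs(x_right - x_left) + eps)
        si_v = abs(y_lhand - y_rhand)       # vertical asymmetry: hand height difference (assumed absolute)
        return si_h, si_v, si_h / (si_v + eps)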
A second measure of symmetry is based on the ratio between the geometric entropies of the two hand trajectories. The geometric entropy provides information on how much the trajectory followed by a gesture is spread/dispersed over the available space [23]. The geometric entropy $H$ associated with a hand's oscillation is computed on the frontal plane (XY) relative to the interlocutor's point of view, Eq. (6):

$H = \ln \frac{2 L_P}{c}$   (6)

where $L_P$ is the path length and $c$ is the perimeter of the convex hull around the path. The second measure of symmetry ($SI_{spread}$) thus considers how similar the spreads of the two hand trajectories are. $SI_{spread}$ is computed as the ratio between $H_{left\,hand}$ and $H_{right\,hand}$, Eq. (7):

$SI_{spread} = \frac{H_{left\,hand}}{H_{right\,hand}}$   (7)
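A sketch of Eqs. (6)-(7) using SciPy's convex hull (for 2D point sets, ConvexHull.area returns the hull perimeter):

    import numpy as np
    from scipy.spatial import ConvexHull

    def geometric_entropy(traj_xy):
        """H = ln(2 * L_P / c), Eq. (6), for a (T, 2) frontal-plane trajectory."""
        pts = np.asarray(traj_xy, dtype=float)
        path_len = np.sum(np.linalg.norm(np.diff(pts, axis=0), axis=1))
        hull_perimeter = ConvexHull(pts).area       # in 2D, .area is the perimeter
        return np.log(2.0 * path_len / hull_perimeter)

    def si_spread(traj_left, traj_right):
        """SI_spread = H_left / H_right, Eq. (7)."""
        return geometric_entropy(traj_left) / geometric_entropy(traj_right)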
Forward-backward leaning of the head. Head movement is relied on as an important feature for distinguishing between various emotional expressions. Recent studies from the music field highlighted the importance of head movements in the communication of the emotional intentions of the player [24]. The amount of forward and backward head leaning for each portrayal is measured by the velocity of the head displacement along its z component (depth), Eq. (8):

$Head_{leaning}(f) = \dot{z}_{head}(f)$   (8)

2) Dynamic Features: the expressive features presented above are computed either at each video frame (shape features, e.g., the bounding triangle) or on a sliding window with a one-sample step (dynamic features), thus providing a time series of values at the same sampling rate as the input video. A further processing step consists of analyzing the temporal profiles of these time series in order to obtain information on their temporal dynamics (e.g., the coefficient of dispersion, or the ratio between the maximum value in the time series and the duration of its largest peak). Each emotion portrayal can thus be analyzed and automatically annotated with a 25-feature vector, including the standard statistical values (mean, standard deviation) of the expressive features listed above plus the dynamic features of energy.
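The text only partly specifies how the 25-feature portrayal vector is assembled, so the sketch below is indicative: it concatenates the mean and standard deviation of each expressive time series and adds two dynamic descriptors for energy (a coefficient of dispersion and a peak-height-to-peak-duration ratio, with the peak taken as the span above half the maximum, which is an assumption):

    import numpy as np

    def dynamic_descriptors(series, dt=1.0 / 25.0):
        """Assumed temporal-profile descriptors of one time series."""
        s = np.asarray(series, dtype=float)
        dispersion = s.std() / (abs(s.mean()) + 1e-9)
        above = s >= 0.5 * s.max()                   # samples assigned to the largest peak
        peak_duration = max(above.sum() * dt, dt)
        return dispersion, s.max() / peak_duration

    def portrayal_vector(feature_series):
        """Concatenate mean/std of every feature plus the energy dynamics."""
        vec = []
        for name, series in feature_series.items():
            vec += [np.mean(series), np.std(series)]
            if name == "energy":
                vec += list(dynamic_descriptors(series))
        return np.array(vec)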
B. The Dimension Reduction Module

The third module performs a dimension reduction procedure (Principal Component Analysis, PCA) to reduce the dimensionality of the data and to obtain an appropriate set of independent features to be used as a minimal representation of emotion portrayals. In order to determine the number of components to retain, the module can apply two complementary stopping rules to the features [26]: parallel analysis (randomly permuted data, critical value at the 95th percentile) [27] and the minimum average partial correlation statistic [28].
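A sketch of this step: standardize the portrayal vectors, run PCA, and retain the components whose eigenvalues exceed the 95th-percentile threshold from Horn's parallel analysis [27]; the minimum-average-partial rule [28] is omitted here for brevity:

    import numpy as np
    from sklearn.decomposition import PCA

    def n_components_parallel_analysis(X, n_perm=1000, percentile=95, seed=0):
        """Horn's parallel analysis on a (portrayals x features) matrix X."""
        rng = np.random.default_rng(seed)
        Z = (X - X.mean(axis=0)) / X.std(axis=0)
        real_eig = np.sort(np.linalg.eigvalsh(np.cov(Z.T)))[::-1]
        perm_eig = np.empty((n_perm, X.shape[1]))
        for i in range(n_perm):
            Zp = np.column_stack([rng.permutation(col) for col in Z.T])
            perm_eig[i] = np.sort(np.linalg.eigvalsh(np.cov(Zp.T)))[::-1]
        threshold = np.percentile(perm_eig, percentile, axis=0)
        return int(np.sum(real_eig > threshold))

    # usage on the 120 x 25 feature matrix X (names assumed):
    # k = n_components_parallel_analysis(X)
    # scores = PCA(n_components=k).fit_transform((X - X.mean(0)) / X.std(0))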
IV. EVALUATION OF THE MINIMAL REPRESENTATION OF EMOTION DISPLAYS

To evaluate the efficiency of the proposed minimal representation of emotion displays, the experimental framework was applied to the GEMEP corpus.

A. The GEMEP Corpus of Emotion Portrayals

We have chosen a corpus of emotion portrayals (the GEMEP corpus) to tune and validate the framework architecture. The Geneva Multimodal Emotion Portrayal corpus (GEMEP; [29]), developed at the University of Geneva, is general in terms of the number of portrayals and the variety of emotions. It consists of more than 7000 audio-video emotion portrayals, representing 18 emotions portrayed by 10 actors in an interactive setting with a professional theatre director. All recordings are standardized in terms of technical and environmental features such as light, distance, and video-camera settings, except for clothing. They also have similar time durations. Our analyses are based on a selected subset of 120 portrayals representing a full within-subjects 12 (emotions) x 10 (actors) design. These portrayals have been validated by extensive ratings that ensured high believability (assessed as the perceived capacity of the actor to communicate a natural emotion impression), reliability (inter-rater agreement), and recognizability (accuracy scores) of the encoded emotion [29]. The GEMEP corpus was previously employed in two studies on emotion recognition, [30] and [31].



Our study stands out by (i) formulating a systematic approach to produce a gesture-based description of emotion expressions based on head and hand movements, (ii) considering a larger subset of 120 portrayals representing a much wider spectrum of emotions (12), and (iii) including both frontal and side views in the analysis.

TABLE I. Cluster membership of the 12 emotion classes; each cell denotes a portrayal.

B. Dimension Reduction and Two-step Clustering

Four components were retained that accounted for 56% of the variance. The first component (29.4% of explained variance) roughly corresponds to motion activity (the amount of energy), the second component (10.3%) to the temporal and spatial excursion of movement, the third component (9.4%) to the spatial extent and postural symmetry, and the fourth component (6.9%) to motion discontinuity and jerkiness (is the movement irregular?). These results are in line with previous research findings [3], [18], [32]. The complex original set of expressive gesture features is thus best summarized by a four-dimensional scheme characterizing affect-related movement.

In order to validate the effectiveness of the features in representing emotion portrayals, we applied clustering techniques, an unsupervised learning method, to the PCA-based data to classify each portrayal as belonging to a single cluster. To determine the optimal number of clusters automatically and to avoid capitalizing on chance, we applied bootstrapping techniques. We generated one thousand resamples of the original dataset, each obtained by random sampling with replacement from the original dataset. The most frequent solution resulting from this procedure applied to the GEMEP data set was a four-cluster solution (73.6% of the bootstrapped samples).
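The paper uses a two-step clustering procedure; the sketch below only illustrates the bootstrap logic for choosing the number of clusters, substituting k-means and a silhouette criterion (both are assumptions, not the authors' algorithm):

    import numpy as np
    from collections import Counter
    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    def bootstrap_n_clusters(scores, k_range=range(2, 9), n_boot=1000, seed=0):
        """Count how often each candidate k wins over bootstrap resamples."""
        rng = np.random.default_rng(seed)
        wins = Counter()
        for _ in range(n_boot):
            idx = rng.choice(len(scores), size=len(scores), replace=True)
            sample = scores[idx]
            sil = {k: silhouette_score(sample,
                                       KMeans(n_clusters=k, n_init=10,
                                              random_state=0).fit_predict(sample))
                   for k in k_range}
            wins[max(sil, key=sil.get)] += 1
        return wins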
C. Results

We investigate how much information is retained in the minimal representation obtained from the application of the experimental framework to the GEMEP corpus. The four obtained clusters were compared with the original twelve emotion sets (ground truth) by creating a 12 (emotions) x 4 (clusters) contingency table. As detailed in Table I, the results showed that the majority of emotion portrayals are clearly associated with one of the four clusters. Specifically, the valence quality of an emotion, usually referred to as the pleasantness-unpleasantness dimension, and arousal were equally appropriate to explain the cluster outcomes. Each of the four clusters represents one quadrant of the two-dimensional structure proposed by [33]. Only one portrayal out of the 120 (actor 2 portraying hot anger) was classified under cluster 1 instead of cluster 2.
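A sketch of this comparison against the ground-truth categories: building the 12 x 4 contingency table and reading off the dominant cluster per emotion (pandas is an implementation choice, not part of the original framework):

    import pandas as pd

    def cluster_vs_emotion(emotion_labels, cluster_ids):
        """12 (emotions) x 4 (clusters) contingency table and dominant cluster."""
        table = pd.crosstab(pd.Series(emotion_labels, name="emotion"),
                            pd.Series(cluster_ids, name="cluster"))
        return table, table.idxmax(axis=1)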
D. Cluster Profiles

Fig. 4. The four clusters (blue, red, green, and purple lines) with their corresponding values of the four principal components: (1st PC) motion activity (lozenge), (2nd PC) temporal and spatial excursion of movement (square), (3rd PC) spatial extent and postural symmetry (triangle), (4th PC) motion discontinuity and jerkiness (circle); the y axis corresponds to the estimated marginal means of the z-scores.

Figure 4 shows that cluster 1 (high positive) groups portrayals with high loadings on the first and second principal components (PCs), referring respectively to motion activity and to the temporal and spatial excursion of movement. The movement pattern in this group is one of very high activity and large movement excursion compared to the other clusters. Values of symmetry and especially of motion discontinuity or jerkiness (the 3rd and 4th PCs, respectively) are, on the contrary, relatively lower, referring to asymmetric and slightly discontinuous movements. Cluster 2 (high negative) identifies portrayals whose movements are also characterized by relatively high loadings on the first and second principal components, yet much less accentuated than in cluster 1. This second cluster also clearly distinguishes itself by a high jerkiness score, referring to movements that unfold in a discontinuous way. Cluster 3 (low positive) groups portrayals characterized by low motor activity and a reduced spatio-temporal excursion of movement with respect to the portrayals aggregated in clusters 1 and 2. Portrayals grouped in cluster 4 (low negative) are characterized by low activity, spatio-temporal excursion, and jerkiness in movement execution, yet display a symmetry score similar to the one observed in clusters 1 and 2.

The current results show that the twelve emotions addressed in this study can be distinguished on the basis of four components of head and hand movement features. The emotion categories, grouped according to their position in the arousal/valence dimensions, presented a high internal homogeneity (low intra-class variability) with respect to the four cluster types. Cluster overlap on symmetry (see Figure 4; the estimated means of the four clusters converge around 0) confirms that information on the dynamics of movement performance is required to disambiguate results obtained on the sole basis of postural information [34], [35].



Following [36], the most typical portrayal of each cluster was identified as the one closest, in terms of Euclidean distance, to the cluster centroid in the principal-component coordinate system. Snapshots of the frontal and side views of the four selected portrayals are displayed in Figure 5.
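A sketch of this selection step, assuming `scores` holds the PCA coordinates of the 120 portrayals and `cluster_ids` their cluster assignments (both names are hypothetical):

    import numpy as np

    def typical_portrayals(scores, cluster_ids):
        """Index of the portrayal closest to its cluster centroid (Euclidean)."""
        typical = {}
        for c in np.unique(cluster_ids):
            members = np.where(cluster_ids == c)[0]
            centroid = scores[members].mean(axis=0)
            dists = np.linalg.norm(scores[members] - centroid, axis=1)
            typical[int(c)] = int(members[np.argmin(dists)])
        return typical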

Fig. 5. The four typical portrayals selected for each cluster, identified as the closest to their cluster centroid. From top to bottom: amusement, hot anger, interest, and cold anger, portrayed by actors 10, 6, 1, and 4, respectively. From left to right, the four key-frames of each portrayal.

It is worth pointing out that the cluster profiles conform to the literature in non-verbal behavior research, developmental psychology, and clinical and dance movement studies. According to [32], [3], a high amount of movement activity and variability clearly characterizes high-arousal emotions such as those contained in clusters 1 and 2. High negative emotions grouped in cluster 2 are further related to motion discontinuity ("jerkiness"), confirming previous results by [10].

V. CONCLUSION

This paper introduces an experimental framework and algorithms for the automated real-time extraction of a reduced set of features that are candidates for characterizing affective behavior. The results can be used to develop techniques for lossy compression of emotion portrayals from a reduced amount of visual information related to human upper-body movements and gestures.

In order to evaluate the effectiveness of the experimental framework and algorithms, they were applied to extract a minimal representation from the GEMEP corpus, which was then used as input to unsupervised machine-learning techniques. The analysis suggested that meaningful groups of emotions, related to the four quadrants of the valence/arousal space, could be distinguished. Further, these emotions were characterized by homogeneous movement patterns, as emerged from the cluster analysis. This is a remarkable result given the low quality and limited amount of information.

In addition, the identification of multiple clusters within the bodily modality extends findings from [37], while adding spatio-temporal features of movement into the equation. In particular, the four-dimensional scheme proposed in the system for characterizing expressive behavior confirmed that dynamic aspects are complementary to postural and gesture shape-related information. These results also corroborate the recent view that bodily expressions of emotion constitute a relevant source of nonverbal emotion information [38].

In particular, the current results suggest that many of the distinctive cue parameters that distinguish the bodily expression of different emotions may be dynamic rather than categorical in nature - concerning movement quality rather than specific types of gestures, as they have been studied in linguistically oriented gesture research or work on nonverbal communication [1]. In consequence, a paradigm shift towards greater emphasis on the dynamic quality and unfolding of expressive behavior may help to provide a breakthrough in our ability to analyze and synthesize the mapping of emotion specificity into gestural, facial, and vocal behavior - with obvious benefits for leading-edge affective computing applications.

Fig. 6. The EU-ICT I-SEARCH project: an EyesWeb XMI experimental application implementing the minimal representation of emotion described in this paper. The snapshot shows the real-time extraction of Activity, Spatial Extent, Symmetry, and Jerkiness from a single-camera video input.

This research aims at contributing to concrete applications in affective computing, multimodal interfaces, and user-centric media, exploiting non-verbal affective behavior captured by a limited amount of visual information. The behavioral data were acquired by means of two video cameras (see Figure 6); the same data could now be obtained by a single video camera, e.g., the emerging 3D webcams (time-of-flight cameras, MS Kinect, or stereo webcams), confirming the validity of such a minimalist, yet efficient, approach to affective computing.



REFERENCES

[1] D. McNeill, Hand and Mind: What Gesture Reveals about Thought. Chicago: University of Chicago Press, 1992.
[2] P. Ekman and W. Friesen, "Head and body cues in the judgment of emotion: A reformulation," Perceptual and Motor Skills, vol. 24, no. 3 PT 1, pp. 711–724, 1967.
[3] H. Wallbott, "Bodily expression of emotion," European Journal of Social Psychology, vol. 28, pp. 879–896, 1998.
[4] A. Bobick, "Movement, activity and action: the role of knowledge in the perception of motion," Philosophical Transactions of the Royal Society B: Biological Sciences, vol. 352, no. 1358, pp. 1257–1265, 1997.
[5] R. Laban and L. Ullmann, The Mastery of Movement. Boston, MA: Plays, Inc., 1971.
[6] N. Bianchi-Berthouze and A. Kleinsmith, "A categorical approach to affective gesture recognition," Connection Science, vol. 15, no. 4, pp. 259–269, 2003.
[7] B. de Gelder, "Towards the neurobiology of emotional body language," Nature Reviews Neuroscience, vol. 7, no. 3, pp. 242–249, 2006.
[8] A. Kapur, A. Kapur, N. Virji-Babul, G. Tzanetakis, and P. Driessen, "Gesture-based affective computing on motion capture data," Lecture Notes in Computer Science, vol. 3784, p. 1, 2005.
[9] D. Bernhardt and P. Robinson, "Detecting affect from non-stylised body motions," Lecture Notes in Computer Science, vol. 4738, p. 59, 2007.
[10] F. Pollick, H. Paterson, A. Bruderlin, and A. Sanford, "Perceiving affect from arm movement," Cognition, vol. 82, no. 2, pp. 51–61, 2001.
[11] N. Freedman, "Hands, words and mind: On the structuralization of body movements during discourse and the capacity for verbal representation," in Communicative Structures and Psychic Structures: A Psychoanalytic Interpretation of Communication, pp. 109–132, 1977.
[12] P. Siple, Understanding Language Through Sign Language Research. Academic Press, 1978.
[13] H. Gunes and M. Piccardi, "Automatic temporal segment detection and affect recognition from face and body display," IEEE Transactions on Systems, Man, and Cybernetics, Part B, vol. 39, no. 1, pp. 64–84, Feb. 2009.
[14] A. Camurri, I. Lagerlöf, and G. Volpe, "Recognizing emotion from dance movement: Comparison of spectator recognition and automated techniques," International Journal of Human-Computer Studies, vol. 59, pp. 213–225, July 2003.
[15] A. Camurri, G. Volpe, G. De Poli, and M. Leman, "Communicating expressiveness and affect in multimodal interactive systems," IEEE Multimedia, pp. 43–53, 2005.
[16] A. Kapoor, W. Burleson, and R. Picard, "Automatic prediction of frustration," International Journal of Human-Computer Studies, vol. 65, no. 8, pp. 724–736, 2007.
[17] W. Dempster and G. Gaughran, "Properties of body segments based on size and weight," American Journal of Anatomy, vol. 120, no. 1, pp. 33–54, 1967.
[18] A. Mehrabian, Nonverbal Communication. Aldine, 2007.
[19] T. Flash and N. Hogan, "The coordination of arm movements: an experimentally confirmed mathematical model," Journal of Neuroscience, vol. 5, no. 7, pp. 1688–1703, 1985.
[20] A. Hreljac, "The relationship between smoothness and performance during the practice of a lower limb obstacle avoidance task," Biological Cybernetics, vol. 68, no. 4, pp. 375–379, 1993.
[21] J. Yan, "Effects of aging on linear and curvilinear aiming arm movements," Experimental Aging Research, vol. 26, no. 4, pp. 393–407, 2000.
[22] C. Roether, L. Omlor, and M. Giese, "Lateral asymmetry of bodily emotion expression," Current Biology, vol. 18, no. 8, pp. 329–330, 2008.
[23] P. Cordier, M. Mendes France, J. Pailhous, and P. Bolon, "Entropy as a global variable of the learning process," Human Movement Science, vol. 13, no. 6, pp. 745–763, 1994.
[24] S. Dahl and A. Friberg, "Visual perception of expressiveness in musicians' body movements," Music Perception, vol. 24, no. 5, pp. 433–454, 2007.
[25] A. Camurri, B. Mazzarino, and G. Volpe, "Expressive interfaces," Cognition, Technology & Work, vol. 6, no. 1, pp. 15–22, 2004.
[26] P. Peres-Neto, D. Jackson, and K. Somers, "How many principal components? Stopping rules for determining the number of non-trivial axes revisited," Computational Statistics and Data Analysis, vol. 49, no. 4, pp. 974–997, 2005.
[27] J. Horn, "A rationale and test for the number of factors in factor analysis," Psychometrika, vol. 30, no. 2, pp. 179–185, 1965.
[28] W. Velicer, "Determining the number of components from the matrix of partial correlations," Psychometrika, vol. 41, no. 3, pp. 321–327, 1976.
[29] T. Banziger and K. Scherer, "Introducing the Geneva Multimodal Emotion Portrayal (GEMEP) corpus," in Blueprint for Affective Computing: A Sourcebook. Oxford, England: Oxford University Press, 2010, pp. 271–294.
[30] G. Castellano and M. Mancini, "Analysis of emotional gestures for the generation of expressive copying behaviour in an embodied agent," LNCS 5085, pp. 193–198, 2009.
[31] S. Vieillard and M. Guidetti, "Children's perception and understanding of (dis)similarities among dynamic bodily/facial expressions of happiness, pleasure, anger, and irritation," Journal of Experimental Child Psychology, vol. 102, no. 1, pp. 78–95, 2009.
[32] M. Meijer, "The contribution of general features of body movement to the attribution of emotions," Journal of Nonverbal Behavior, vol. 13, no. 4, pp. 247–268, 1989.
[33] J. Russell, J. Bachorowski, and J. Fernandez-Dols, "Facial and vocal expressions of emotion," Annual Reviews in Psychology, vol. 54, no. 1, pp. 329–349, 2003.
[34] N. Bianchi-Berthouze, P. Cairns, A. Cox, C. Jennett, and W. Kim, "On posture as a modality for expressing and recognizing emotions," in Workshop on the Role of Emotion in HCI, 2006.
[35] G. Castellano, M. Mortillaro, A. Camurri, G. Volpe, and K. Scherer, "Automated analysis of body movement in emotionally expressive piano performances," Music Perception, vol. 26, no. 2, pp. 103–119, 2008.
[36] E. Braun, B. Geurten, and M. Egelhaaf, "Identifying prototypical components in behaviour using clustering algorithms," PLoS ONE, vol. 5, no. 2, p. e9361, 2010.
[37] K. R. Scherer, "The dynamic architecture of emotion: Evidence for the component process model," Cognition and Emotion, vol. 23, no. 7, pp. 1307–1351, 2009.
[38] A. Vinciarelli, M. Pantic, and H. Bourlard, "Social signals, their function, and automatic analysis: a survey," in Proceedings of the 10th International Conference on Multimodal Interfaces. New York, NY: ACM, 2008, pp. 61–68.

