are presented with a series of tasks, which can be solved by using their skills in reading directions, measuring distances and identifying symbols on a map. For instance, a sample task would be "Go North West for 550 meters and find a museum". Figure 1 gives the architecture whereby the robot uses sensors to determine the user's affect and learner model, and adapts its interaction accordingly through the Interaction Manager [8]. The system generates personalised pedagogical feedback that is aligned with the user's affect; for example, the robot gives hints and encouraging remarks if the user is getting frustrated. Here, we discuss a perception study that has informed the generation of this affective feedback as part of the Multimodal Output Manager module. All modules and other related resources are available from the Emote project website.

have less semantic content than spoken language. Questions, therefore, arise as to whether verbal utterances with similar intended valence provide the necessary context to align with the intended valence of the sound emblem; and, ultimately, whether sound emblems, when combined with verbal utterances, can be used to shape the emotional climate of the interaction between robots and humans.

4. METHOD
120 participants (28 female, M_age = 31.02, SD_age = 8.03) from 35 countries were recruited on CrowdFlower's online platform2. Each participant listened to and rated a set of 5 utterances randomly picked from the set of 30 stimuli and was paid approximately US$1.50 on completion of each set. A total of approximately 5,500 ratings were collected.
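The per-participant stimulus assignment described above can be sketched as follows (a hypothetical reconstruction for illustration only; the function and constant names are ours and are not taken from the released Emote modules):

```python
import random

NUM_STIMULI = 30   # size of the full stimulus set
SET_SIZE = 5       # utterances rated per set

def draw_rating_set(rng: random.Random) -> list[int]:
    """Pick one participant's set of stimulus IDs, without repeats."""
    return rng.sample(range(NUM_STIMULI), SET_SIZE)

rng = random.Random(0)
stimulus_set = draw_rating_set(rng)
print(stimulus_set)  # five distinct IDs in the range 0..29
```

`random.sample` draws without replacement, so a participant never rates the same stimulus twice within a set.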
Figure 2: Digital version of the Affect Grid adapted from [27] as a single-item measure of valence and arousal. In this example, a judgment (indicated by the cross) has been made on the 9x9 grid, indicating a response of ‘7’ for valence and ‘8’ for arousal.

Figure 3: Sound selection with BEST-IDs of positive and negative stimuli from the BEST dataset as a function of their valence and arousal ratings. Error bars represent the standard error of the mean (SEM).
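The grid-to-score mapping illustrated in Figure 2 is simple enough to state in code (a minimal sketch; the function name and the 1-based coordinate convention are our assumptions, not part of the published instrument):

```python
def affect_grid_rating(col: int, row: int, size: int = 9) -> tuple[int, int]:
    """Map a judgment on a size x size Affect Grid to (valence, arousal).

    col runs 1..size left-to-right (displeasure -> pleasure) and
    row runs 1..size bottom-to-top (sleepiness -> high arousal),
    so each score is simply the 1-based coordinate on its axis.
    """
    if not (1 <= col <= size and 1 <= row <= size):
        raise ValueError("judgment must lie on the grid")
    return col, row

# The example judgment in Figure 2: valence 7, arousal 8.
valence, arousal = affect_grid_rating(7, 8)
```

Because the grid is a single item, one click yields both dimensions at once, which is what makes it attractive for rapid repeated ratings.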
Figure 4: Mean ratings on a 7-point Likert scale for ‘Pleasantness’, ‘Politeness’ and ‘Naturalness’. Means and standard deviations are given.
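The summary statistics reported in Figure 4 are plain per-condition means and standard deviations of the 7-point ratings; with hypothetical ratings (the numbers below are illustrative only, not the study's data) they would be computed as:

```python
from statistics import mean, stdev

# Illustrative ratings only -- not the study's data.
ratings = {
    "utterance only": [6, 5, 7, 6, 5],
    "sound before":   [4, 3, 5, 4, 4],
    "sound after":    [4, 5, 3, 4, 3],
}

for condition, scores in ratings.items():
    print(f"{condition}: M = {mean(scores):.2f}, SD = {stdev(scores):.2f}")
```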
With regard to Hypotheses II and III, for all three dimensions the ANOVA showed a main effect (p < 0.05) of the ‘Position Factor’ (3 levels: ‘sound before’, ‘sound after’, ‘utterance only’)4. The DTK post-hoc comparison shows that, for all three dimensions, the significant contrasts (p < 0.05) are between ‘utterance only’ vs. ‘sound after’ and between ‘utterance only’ vs. ‘sound before’, but not between ‘sound after’ vs. ‘sound before’. Therefore, for our participants, the stimuli presented in the ‘utterance only’ condition were perceived as more pleasant, natural and polite than those in which the utterance was paired with a sound emblem in either position. In other words, adding an emblem to an utterance (be it positive or negative) significantly reduces its ‘Pleasantness’, ‘Naturalness’ and ‘Politeness’. For positive feedback, adding an emblem in either position does not increase the valence (i.e. make it more positive); in fact, surprisingly, it has the opposite effect, making the feedback be perceived as significantly more negative on all three dimensions (p < 0.05).

6. DISCUSSION AND FUTURE WORK
Our study suggests that, even if every attempt is made to make negative feedback sound pleasant and polite, it will still be deemed less so than positive feedback. This work can therefore be used to inform the design of robotic multimodal output, allowing the system designer to be aware of the emotional valence induced by the various types of feedback and to define strategies to mitigate any negative consequences. Whilst the results concerning the use of sound emblems are somewhat unexpected, it may be that the binary categorisation into negative/positive is too coarse, and that a more fine-grained and sophisticated pairing is required in terms of utterance content and length along with the intended valence of the sound emblem.

In addition, it is possible that users are so unused to hearing sound emblems of this type in everyday conversation that they were not sure how to judge them. This lack of general consensus on how to interpret the emblems is reflected by the standard deviations given in Figure 4, which are greater for those utterances containing a sound emblem. Furthermore, it should be noted that, while participants in the present study were informed that they would be listening to robot speech, they were given no concrete visual representation of the robot. Therefore, it is possible that basic differences in perceived ‘Naturalness’ between ‘with sound emblem’ and ‘without sound emblem’ stimuli may have been partially responsible for the overall reductions in ‘Pleasantness’ for the composite stimuli. In an embodied presentation situation, the sounds might thus have been perceived as more congruent with the TTS than was the case in the present online study.

One advantage of sound emblems is that they are language independent. However, further studies would need to investigate cultural differences along the lines of the studies reported in [5]. In addition, future work involves repeating the study with the target age group of children and with the physical embodiment of the robot.

In the Emote system for a robotic tutor, the multimodal output is currently categorised as binary (negative or positive), and the choice of this feedback is triggered by the inferred affect of the user in terms of four broad affective states: frustrated, excited, bored and neutral. Future work would involve making the system adaptive at a more fine-grained level, both in terms of the affect it infers from the user and the multimodal output it generates. The study described here moves the field closer to such highly sensitive affective systems.

7. ACKNOWLEDGMENTS
This work was supported by the European Commission (EC) and was funded by the EU FP7 ICT-317923 project Emote. The authors are solely responsible for the content of this publication. It does not represent the opinion of the EC, and the EC is not responsible for any use that might be made of data appearing therein. We would like to acknowledge the other Emote partners for their collaboration in developing the system described here.

4 F(2, 238) = 28.88 for ‘Pleasantness’; F(2, 238) = 22.17 for ‘Politeness’; and F(2, 238) = 32.24 for ‘Naturalness’.
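As a sanity check, the upper-tail p-values implied by these F statistics can be recovered directly from the F distribution. The sketch below implements the regularized incomplete beta function with the standard continued-fraction expansion so that only the standard library is needed; with SciPy available, `scipy.stats.f.sf(F, 2, 238)` gives the same values. All function names here are ours.

```python
import math

def _betacf(a, b, x, max_iter=200, eps=3e-12):
    """Continued fraction for the regularized incomplete beta function
    (modified Lentz's method, as in Numerical Recipes)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) >= tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # even step of the continued fraction
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) >= tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) >= tiny else tiny
        h *= d * c
        # odd step of the continued fraction
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) >= tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) >= tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_sf(f_stat, dfn, dfd):
    """Upper-tail p-value of the F(dfn, dfd) distribution."""
    x = dfd / (dfd + dfn * f_stat)
    return _betainc(dfd / 2.0, dfn / 2.0, x)

for name, f_stat in [("Pleasantness", 28.88),
                     ("Politeness", 22.17),
                     ("Naturalness", 32.24)]:
    print(f"{name}: F(2, 238) = {f_stat}, p = {f_sf(f_stat, 2, 238):.1e}")
```

All three p-values come out far below 0.001, consistent with the reported p < 0.05 main effects.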
8. REFERENCES
[1] R. Black. Bebot - Robot Synth. Normalware, 2014.
[2] G. Castellano, A. Paiva, A. Kappas, R. Aylett, H. Hastie, W. Barendregt, F. Nabais, and S. Bull. Towards empathic virtual and robotic tutors. In Proc. of AIED, 2013.
[3] N. Dethlefs, H. Cuayahuitl, H. Hastie, V. Rieser, and O. Lemon. Cluster-based prediction of user ratings for stylistic surface realisation. In Proc. of EACL, 2014.
[4] P. Ekman and W. V. Friesen. The repertoire of nonverbal behavior: Categories, origins, usage, and coding. Semiotica, 1(1):49–98, 1969.
[5] H. Elfenbein and N. Ambady. On the universality and cultural specificity of emotion recognition: a meta-analysis. 128(2):203–235, 2002.
[6] A. C. Graesser, K. Wiemer-Hastings, P. Wiemer-Hastings, and R. Kreuz. AutoTutor: A simulation of a human tutor. Cognitive Systems Research, 1(1):35–51, 1999.
[7] C. E. Izard. The Psychology of Emotions. Springer Science & Business Media, 1991.
[8] S. Janarthanam, H. Hastie, A. Deshmukh, R. Aylett, and M. E. Foster. A reusable interaction management module: Use case for empathic robotic tutoring. In Proc. of SemDial, 2015.
[9] E.-S. Jee, Y.-J. Jeong, C. H. Kim, and H. Kobayashi. Sound design for emotion and intention expression of socially interactive robots. Intelligent Service Robotics, 3(3):199–206, 2010.
[10] H. G. Johnson, P. Ekman, and W. V. Friesen. Communicative body movements: American emblems. Semiotica, 15(4):335–354, 1975.
[11] A. Kappas, D. Kuester, P. Dente, and C. Basedow. Simply the B.E.S.T.! Creation and validation of the Bremen Emotional Sounds Toolkit. Poster presented at the 1st International Convention of Psychological Science, Amsterdam, the Netherlands, 2015.
[12] W. Killgore. The Affect Grid: A moderately valid, nonspecific measure of pleasure and arousal. Psychological Reports, 83(2):639–642, 1998.
[13] A. N. Kluger and A. DeNisi. The effects of feedback interventions on performance: A historical review, a meta-analysis, and a preliminary feedback intervention theory. 119:254–284, 1996.
[14] T. Komatsu. Can we assign attitudes to a computer based on its beep sounds? In Proc. of the Affective Interactions: The Computer in the Affective Loop Workshop at Intelligent User Interface 2005, 2005.
[15] T. Komatsu, K. Kobayashi, S. Yamada, K. Funakoshi, and M. Nakano. Augmenting expressivity of artificial subtle expressions (ASEs): Preliminary design guideline for ASEs. In Proc. of the 5th Augmented Human International Conference, 2014.
[16] T. Komatsu and S. Yamada. How does the agents' appearance affect users' interpretation of the agents' attitudes: Experimental investigation on expressing the same artificial sounds from agents with different appearances. Intl. Journal of Human–Computer Interaction, 27(3):260–279, 2011.
[17] M. K. Lau. DTK: Dunnett-Tukey-Kramer Pairwise Multiple Comparison Test Adjusted for Unequal Variances and Unequal Sample Sizes.
[18] I. Leite, G. Castellano, A. Pereira, C. Martinho, and A. Paiva. Modelling empathic behaviour in a robotic game companion for children: an ethnographic study in real-world settings. In Proc. of HRI, 2012.
[19] F. Mairesse and M. Walker. Personality generation for dialogue. In 45th Annual Meeting of the Association for Computational Linguistics (ACL), 2007.
[20] J. J. Ohala. The acoustic origin of the smile. The Journal of the Acoustical Society of America, 68(S1):S33–S33, 1980.
[21] N. K. Person, R. J. Kreuz, R. A. Zwaan, and A. C. Graesser. Pragmatics and pedagogy: Conversational rules and politeness strategies may inhibit effective tutoring. Journal of Cognition and Instruction, 13(2):161–188, 1995.
[22] R. W. Picard, S. Papert, W. Bender, B. Blumberg, C. Breazeal, D. Cavallo, T. Machover, M. Resnick, D. Roy, and C. Strohecker. Affective learning — a manifesto. BT Technology Journal, 22(4):253–269, 2002.
[23] R. Read and T. Belpaeme. How to use non-linguistic utterances to convey emotion in child-robot interaction. In Proc. of the Seventh Annual ACM/IEEE International Conference on Human-Robot Interaction, pages 219–220. ACM, 2012.
[24] R. Read and T. Belpaeme. Situational context directs how people affectively interpret robotic non-linguistic utterances. In Proc. of the 2014 ACM/IEEE International Conference on Human-Robot Interaction, pages 41–48. ACM, 2014.
[25] J. A. Russell. A circumplex model of affect. Journal of Personality and Social Psychology, 39(6):1161–1178, 1980.
[26] J. A. Russell. Emotion, core affect, and psychological construction. Cognition and Emotion, 23(7):1259–1283, 2009.
[27] J. A. Russell, A. Weiss, and G. A. Mendelsohn. Affect Grid: a single-item scale of pleasure and arousal. Journal of Personality and Social Psychology, 57(3):493, 1989.
[28] Y. I. Russell and F. Gobet. Sinuosity and the Affect Grid: A method for adjusting repeated mood scores. Perceptual and Motor Skills, 114(1):125–136, 2012.
[29] K. R. Scherer. On the symbolic functions of vocal affect expression. Journal of Language and Social Psychology, 7(2):79–100, 1988.
[30] K. R. Scherer. Affect bursts. In Emotions: Essays on Emotion Theory, pages 161–196, 1994.
[31] M. Schröder. Experimental study of affect bursts. Speech Communication, 40(1):99–116, 2003.
[32] P. Sharp. Behavior modification in the secondary school: A survey of students' attitudes to rewards and praise. 9:109–112, 1985.
[33] M. Tielman, M. Neerincx, J.-J. Meyer, and R. Looije. Adaptive emotional expression in robot-child interaction. In Proc. of the 2014 ACM/IEEE International Conference on Human-Robot Interaction, pages 407–414. ACM, 2014.
[34] M. Yik, J. A. Russell, and J. H. Steiger. A 12-point circumplex structure of core affect. Emotion, 11(4):705–731, 2011.