Research Article
Multimodal Affective Computing to Enhance the User
Experience of Educational Software Applications
Received 19 January 2018; Revised 8 July 2018; Accepted 5 August 2018; Published 13 September 2018
Copyright © 2018 Jose Maria Garcia-Garcia et al. This is an open access article distributed under the Creative Commons
Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
Affective computing is becoming more and more important as it enables us to extend the possibilities of computing technologies by incorporating emotions. In fact, the detection of users’ emotions has become one of the most important aspects of affective computing. In this paper, we present an educational software application that incorporates affective computing by detecting the users’ emotional states to adapt its behaviour to the emotions sensed. In this way, we aim to increase users’ engagement and keep them motivated for longer periods of time, thus improving their learning progress. To prove this, the application has been assessed with real users: the performance of a set of users using the proposed system has been compared with that of a control group that used the same system without emotion detection. The outcomes of this evaluation show that our proposed system, incorporating affective computing, produced better results than the one used by the control group.
Finally, we have assessed the application to prove that including emotion detection in the implementation of educational software applications considerably improves users’ performance.

The rest of the paper is organized in the following sections: In Section 2, some background concepts and related works are presented. In Section 3, we describe the educational software application we have developed, enhanced with affective computing-related technologies. Section 4 shows the evaluation process carried out to prove the benefits of the system developed. Finally, Section 5 presents some conclusions and final remarks.

2. Background Concepts and Related Works

In this section, a summary of the background concepts of affective computing and related technologies is put forward. We provide a comparison among the different ways of detecting emotions together with the technologies developed in this field.

2.1. Affective Computing. Rosalind Picard used the term “affective computing” for the first time in 1995 [11]. This technical report established the first ideas on the field. The aim was not to answer questions such as “what are emotions?,” “what causes them?,” or “why do we have them?,” but to provide a definition of some terms in the field of affective computing.

As stated before, the term “affective computing” was finally set in 1997 as “computing that relates to, arises from, or deliberately influences emotion or other affective phenomena” [1]. More recently, we can find affective computing defined as the study and development of systems and devices that can recognize, interpret, process, and simulate human affects [4]; in other words, any form of computing that has something to do with emotions. Due to this strong relation with emotions, their correct detection is the cornerstone of affective computing. Even though each type of technology works in a specific way, all of them share a common core, since an emotion detector is, fundamentally, an automatic classifier.

The creation of an automatic classifier involves collecting information, extracting the features which are important for our purpose, and finally training the model so it can recognize and classify certain patterns [12]. Later, we can use the model to classify new data. For example, if we want to build a model to extract the emotions of happiness and sadness from facial expressions, we have to feed the model with pictures of people smiling, tagged with “happiness,” and pictures of people frowning, tagged with “sadness.” After that, when it receives a picture of a person smiling, it identifies the emotion shown as “happiness,” while pictures of people frowning will return “sadness” as a result.
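As a minimal sketch of this train-then-classify workflow, the following example uses scikit-learn, a library cited later in this paper [33]. The feature extraction step is replaced here by synthetic feature vectors; in a real system, each vector would be the facial-landmark features extracted from a tagged picture.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in for data collection: synthetic "smiling" and "frowning"
# feature vectors drawn around two different centers.
rng = np.random.default_rng(0)
smiling = rng.normal(loc=1.0, scale=0.3, size=(50, 8))
frowning = rng.normal(loc=-1.0, scale=0.3, size=(50, 8))
X = np.vstack([smiling, frowning])
y = ["happiness"] * 50 + ["sadness"] * 50

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = SVC()                       # a support vector machine classifier
model.fit(X_train, y_train)         # training: learn the patterns
print(model.score(X_test, y_test))  # accuracy on unseen examples

new_picture_features = rng.normal(loc=1.0, scale=0.3, size=(1, 8))
print(model.predict(new_picture_features))  # -> ["happiness"]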
Humans express their feelings through several channels: facial expressions, voices, body gestures and movements, and so on. Even our bodies experience visible physical reactions to emotions (breath and heart rate, pupil size, etc.).

Because of the high potential of knowing how the user is feeling, this kind of technology (emotion detection) has experienced an outburst in the business sector. Many technology companies have recently emerged, focused exclusively on developing technologies capable of detecting emotions from specific input. In the following sections, we present a brief review of each kind of affective information channel, along with some existing technologies capable of detecting this kind of information.

2.2. Emotion Detection Technologies. This section presents a summary of the different technologies used to detect emotions, considering the various channels from which affective information can be obtained: emotion from speech, emotion from text, emotion from facial expressions, emotion from body gestures and movements, and emotion from physiological states [13].

2.2.1. Emotion from Speech. The voice is one of the channels used to gather emotional information from the user of a system. When a person starts talking, they generate information in two different channels: primary and secondary [14]. The primary channel is linked to the syntactic-semantic part of the locution (what the person is literally saying), while the secondary channel is linked to the paralinguistic information of the speaker (tone, emotional state, and gestures). For example, someone says “That’s so funny” (primary channel) with a serious tone (secondary channel). By looking at the information of the primary channel, the message received is that the speaker finds something funny; by looking at the information received through the secondary channel, the real meaning of the message is worked out: the speaker is lying or being sarcastic.

Four technologies in this category can be highlighted: Beyond Verbal [15], Vokaturi [16], EmoVoice [17], and Good Vibrations [18]. Table 1 shows the results of the comparative study performed on the four analyzed technologies.
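None of these commercial services publishes its internals, but the kind of secondary-channel (paralinguistic) features they build on, such as pitch and loudness, can be extracted with the open-source librosa library. The sketch below uses a synthetic tone as a stand-in for a recorded utterance; mapping the features to an emotion label would still require a trained classifier, as described in Section 2.1.

import numpy as np
import librosa

# Stand-in input: one second of a synthetic 200 Hz tone; in a real
# system this would be the user's recorded utterance.
sr = 22050
t = np.linspace(0, 1.0, sr, endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 200 * t)

f0 = librosa.yin(y, fmin=80, fmax=400, sr=sr)  # pitch contour (Hz)
energy = librosa.feature.rms(y=y)[0]           # loudness over time

features = {
    "pitch_mean": float(np.mean(f0)),     # monotone speech may signal boredom
    "pitch_std": float(np.std(f0)),       # high variability: excitement
    "energy_mean": float(energy.mean()),  # raised voice: anger or joy
}
print(features)  # input vector for an emotion classifier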
2.2.2. Emotion from Facial Expressions. As in the case of speech, facial expressions reflect the emotions that a person may be feeling. Eyebrows, lips, nose, mouth, and face muscles: they all reveal the emotions we are feeling. Even when a person tries to fake some emotion, their own face is still telling the truth. The technologies used in this field of emotion detection work in an analogous way to the ones used with speech: detecting a face, identifying the crucial points in the face which reveal the emotion expressed, and processing their positions to decide what emotion is being detected.

Some of the technologies used to detect emotions from facial expressions are Emotion API (Microsoft Cognitive Services) [19], Affectiva [20], nViso [21], and Kairos [22]. Table 2 shows a comparative study.

As far as the results are concerned, every tested technology showed considerable accuracy. However, several conditions (reflection on glasses and bad lighting) mask important facial gestures, generating wrong results.
For example, an expression of pain, in a situation in which the eyes and/or brows cannot be seen, can be detected as a smile by these technologies (because of the stretched, open mouth).

As far as time is concerned, Emotion API and Affectiva show similar times to scan an image, while Kairos takes much longer to produce a result. Besides, the amount of values returned by Affectiva provides much more information to developers, and it is easier to interpret the emotion that the user is showing than when we just have the weights of six emotions, for example. Also remarkable is the availability of Affectiva, which provides free services to those dedicated to research and education or producing less than $1,000,000 yearly.
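To illustrate the first step of this pipeline, the sketch below detects a face with OpenCV, a library cited later in this paper [34]; locating the crucial facial points and classifying the emotion would be done by a trained model on the cropped region. A blank frame stands in for a real webcam capture.

import cv2
import numpy as np

# Stand-in input: a blank frame; in practice this would be a webcam
# capture of the user's face.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Step 1: detect faces with a pretrained Haar cascade shipped with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    face_region = gray[y:y + h, x:x + w]
    # Steps 2-3 would locate the crucial facial points inside this
    # region and feed their positions to a trained emotion classifier.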
2.2.3. Emotion from Text. There are certain situations in which the communication between two people, or between a person and a machine, does not have the visual component inherent to face-to-face communication. In a world dominated by telecommunications, words are powerful allies to discover how a person may be feeling. Although emotion detection from text (also referred to as sentiment analysis) must face more obstacles than the previous technologies (spelling errors, languages, and slang), it is another source of affective information to be considered. Since emotion detection from text analyzes the words contained in a message, the process to analyze a text takes some more steps than the analysis of a face or a voice. There is still a model that needs to be trained, but now the text must be processed in order to use it to train the model [23]. This processing involves tasks of tokenization, parsing and part-of-speech tagging, lemmatization, and stemming, among others. Four technologies in this category are Tone Analyzer [24], Receptiviti [25], BiText [26], and Synesketch [27].
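These preprocessing steps can be illustrated with NLTK, the Python natural language toolkit cited later in this paper [32]. A minimal sketch, with an invented example sentence:

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("punkt")                       # tokenizer models
nltk.download("averaged_perceptron_tagger")  # part-of-speech tagger
nltk.download("wordnet")                     # lemmatizer dictionary

text = "I loved the game, but the last level was so frustrating!"
tokens = nltk.word_tokenize(text)  # tokenization
tagged = nltk.pos_tag(tokens)      # part-of-speech tagging

lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(tok.lower()) for tok in tokens]
# with the POS tags, the right lemma can be chosen, e.g.
# lemmatizer.lemmatize("loved", pos="v") -> "love"
print(tagged)
print(lemmas)  # the cleaned tokens then feed the sentiment model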
Due to the strong presence of social media and written communication in current society, this field is, along with emotion detection from facial expressions, one of the most attractive fields to companies: posts on social media, messages sent to a “Complaints” section, and so on. Companies which can know how their customers are feeling have an advantage over companies which cannot. Table 3 shows a comparative study of some of the key aspects of each technology. It is remarkable that, as far as text is concerned, most of the companies offer a demo or trial version on their websites, while companies working on face or voice recognition are less transparent in this respect. Regarding their accuracy, the four technologies have yielded good values. On the one hand, BiText has proved to be the simplest one, as it only reports whether the emotion detected is good or bad. This way, the error threshold is wider and produces fewer wrong results. On the other hand, Tone Analyzer has proved to be less clear in its conclusions when the text does not contain some specific key words.

As far as the completeness of results is concerned, Receptiviti has been the one giving the most information, revealing not only affective information but also personality-related information. The main drawback is that all these technologies (except Synesketch) are paid services and may not be accessible to everyone. Since Synesketch is not as powerful as the rest, it will require an extra effort to be used.

2.2.4. Emotion from Body Gestures and Movement. Even though people do not use body gestures and movement to communicate information in an active way, their body is constantly conveying affective information: tapping with the foot, crossing the arms, tilting the head, changing position many times while seated, and so on. Body language reveals what a person is feeling in the same way our voice does.
However, this field is quite new, and there is no clear understanding about how to create systems able to detect emotions from body language. Most researchers have focused on facial expressions (over 95 per cent of the studies carried out on emotion detection have used faces as stimuli), almost ignoring the rest of the channels through which people reveal affective information [28].

Despite the newness of this field, there are several proposals focused on recognizing emotional states from body gestures, and these results are used for other purposes. Experimental psychology has already demonstrated how certain types of movements are related to specific emotions [29]. For example, people experiencing fear will turn their bodies away from the point which is causing that feeling; people experiencing happiness, surprise, or anger will turn their bodies towards the point causing that feeling.

Since there are no technologies available for emotion detection from body gestures, there is no consensus about the data we need to detect emotions in this way. Usually, experiments on this kind of emotion detection use frameworks (for instance, SSI) or technologies to detect the body of the user (for instance, Kinect), so the researchers are responsible for elaborating their own models and schemes for the emotion detection. These models are usually built around the joints of the body (hands, knees, neck, head, elbows, and so on) and the angle between the body parts that they interconnect [30], as sketched below, but in the end, it is up to the researchers.
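As an illustration of the kind of feature such joint-based models compute, the sketch below derives the angle at one joint (the elbow) from three tracked points, such as those reported by a body-tracking device like Kinect; the coordinates are invented for illustration.

import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the segments b->a and b->c."""
    ba = np.asarray(a) - np.asarray(b)
    bc = np.asarray(c) - np.asarray(b)
    cos_angle = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

# illustrative 3D coordinates (meters) for shoulder, elbow, and wrist
shoulder, elbow, wrist = (0.0, 1.4, 0.1), (0.2, 1.1, 0.1), (0.5, 1.2, 0.1)
print(joint_angle(shoulder, elbow, wrist))  # one feature for the model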
2.2.5. Emotion from Physiological States. Physiologically speaking, emotions originate in the limbic system. Within this system, the amygdala generates emotional impulses which create the physiological reactions associated with emotions: electric activity in the face muscles, electrodermal activity (also called galvanic skin response), pupil dilatation, breath and heart rate, blood pressure, brain electric activity, and so on. Emotions leave a trace on the body, and this can be measured with the right tools.

Nevertheless, the information coming directly from the body is harder to classify, at least with the category system used in other emotion detection technologies. When working with physiological signals, the best option is to adopt a classification system based on a dimensional approach [25]. An emotion is not just “happiness” or “sadness” anymore, but a state determined by various dimensions, like valence and arousal. It is because of this that the use of physiological signals is usually reserved for research and studies, for example, related to autism. There are no emotion detection services available for this kind of detection based on physiological states, although there are plenty of sensors to read these signals.
In a recent survey on mobile affective computing [31], the authors make a thorough review of the current literature on affect recognition through smartphone modalities and show the current research trends towards mobile affective computing. Indeed, the special capacities of mobile devices open new research challenges in the field of affective computing that we aim to address in the mobile version of the system proposed.

Finally, we can also find available libraries to be used in different IDEs (integrated development environments) supporting different programming languages. For instance, NLTK, in Python [32], can be used to analyze natural language for sentiment analysis. Scikit-learn [33], also in Python, provides efficient tools for data mining and data analysis with machine learning techniques. Lastly, OpenCV (Open Source Computer Vision Library) [34] supports C++, Python, and Java interfaces on most operating systems. It is designed for computer vision and allows the detection of elements caught by the camera in real time to analyze the facial points detected according, for instance, to the Facial Action Coding System (FACS) proposed by Ekman and Rosenberg [35]. The data gathered could be subsequently processed with the scikit-learn tool.

The affective information gathered from the different channels can then be combined to estimate the user’s state and react to it, as in the following pseudocode:

aff_information = get_affective_information()
# aff_information = {"face": [...], "voice": [...], "mimic": [...]}

stress_flags = {"face": 0.0, "voice": 0.0, "mimic": 0.0}
# values from 0 to 1 indicating the stress levels detected

# update the stress level estimated from each emotion recognition channel
for er_channel, measures in aff_information.items():
    measure_stress(er_channel, measures, stress_flags)

if (stress_flags["face"] > 0.6 and
        stress_flags["voice"] < 0.3 and
        stress_flags["mimic"] < 0.1):
    ...  # reaction to affective state A

if (stress_flags["face"] < 0.1 and
        stress_flags["voice"] < 0.1 and
        stress_flags["mimic"] < 0.5):
    ...  # reaction to affective state B

3. Modifying the Behaviour of an Educational Software Application Based on Emotion Recognition

Human interaction is, by definition, multimodal [36]. Unless the communication is done through phone or text, people can see the face of the people they are talking to, listen to their voices, see their body, and so on. Humans are, at this

[...]
teachers of the primary school provided us within the school premises.

4.2. Evaluation Metrics. The system was measured considering three types of metrics: effectiveness, efficiency, and satisfaction, that is, the users’ subjective reactions when using the system. Effectiveness was measured by considering the task completion percentage, error frequency, and frequency of assistance offered to the child. Efficiency was measured by calculating the time needed to complete an activity, specifically, the mean time taken to achieve the activity. Besides, some other aspects were also considered, such as the number of attempts needed to successfully complete a level, the number of keystrokes, and the number of times a key was pressed too fast as an indicative signal of nervousness.

Finally, satisfaction was measured with the System Usability Scale (SUS), slightly adapted for teenagers and kids [39]. This questionnaire is composed of ten items related to the system usage. The users had to indicate their degree of agreement or disagreement on a 5-point scale.
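For reference, a SUS questionnaire is scored with the standard formula: odd-numbered (positively worded) items contribute their response minus 1, even-numbered (negatively worded) items contribute 5 minus their response, and the sum is scaled by 2.5 to a 0-100 range. A sketch:

def sus_score(responses):
    """responses: list of ten answers on the 1-5 scale, in item order."""
    assert len(responses) == 10
    total = sum(r - 1 if i % 2 == 0 else 5 - r  # i=0 is item 1 (odd)
                for i, r in enumerate(responses))
    return total * 2.5

print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0, best possible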
4.3. Experimental Design. After several considerations regarding the evaluation process for games used in learning environments [40], the following features were established:

(i) Research Design. The sample of participants was divided into two groups of the same size, one of them being the control group. This control group tested the application implemented without emotion detection, and hence without modifying the behaviour of the application in real time according to the child’s emotions. This one was called the System 2 group. The other group tested the prototype implemented with emotion detection, which adapted its behaviour by modifying the pace of the game and the difficulty level according to the emotions detected in the user, in such a way that if the user becomes bored, the system increases the pace of the game and the difficulty level, and, on the contrary, if the user becomes stressed or nervous, the system decreases the speed of the game and the difficulty level. This one was called the System 1 group. By doing this, it can be shown how using emotion detection to dynamically vary the difficulty level of an educational software application influences the performance and user experience of the students.

(ii) Intervention. The test was conducted in the premises of the primary school, in a quiet room where just the participants (two at a time) using System 1 and System 2 and the evaluators were present. We prepared two laptops of similar characteristics, one of them running System 1 with the version of the application implemented with emotion recognition and the other laptop running System 2 with the version of the application without emotion recognition.

The whole evaluation process was divided into two parts:

(i) Introduction to the Test. At the beginning of the evaluation, the procedure was explained to the sixteen children at a time, and the game instructions for the different levels were given.

(ii) Performing the Test. Kids were called in pairs to the room where the laptops running System 1 and System 2 were prepared. None of the children knew what system they were going to play with. At the end of the evaluation sessions, the sixteen children completed the SUS questionnaire. Researchers were present all the time, ready to assist the participants and clarify doubts when necessary. When a participant finished the test, they returned to their classroom and called the next child to go in the evaluation room.

To keep the results of each participant fully independent, the sixteen users were introduced in the database of the prototype with the key “evalX,” where “X” is a number. Users with an odd “X” used System 1, while those with an even “X” were assigned to System 2 (control group).

The task that the participants had to perform was to play the seven levels of the prototype, each level including a platform game and a reading-out-loud exercise. The data collected during the evaluation sessions were subsequently analyzed, and the outcomes are described next.

4.4. Evaluation Outcomes and Discussion. Although participants with System 1 needed, on average, a bit more time per level to finish (76.18 seconds against 72.7), we could appreciate an improvement in the performance of the participants using System 1, as most of them made fewer than 5 mistakes in the last level, while only one of the control group users of System 2 made fewer than 5 mistakes.

Figure 4 shows the evolution of the average number of mistakes, which increases in the control group (System 2) from level 4 onwards. Since the game adapts its difficulty (in System 1), after detecting a peak of mistakes in the fourth level (as a sign of stress, detected as a combination of negative feelings found in the facial expression and the way the participant used the keyboard), the difficulty level was reduced. This adaptation made the next levels easier to play for participants using System 1, which was reflected in less mental effort. Since participants using System 2 did not have this feature, their average performance got worse.

On average, participants using System 1 needed 1.33 attempts to finish each level, while participants using System 2 needed 1.59, about 20% more. Also, the ratio of mistakes to total keystrokes was higher in the case of System 2 users (19% against the 12% from users of System 1). Likewise, System 2 users asked for help more often (13 times) than System 1 users (10 times). In future experimental activities, the sample size will be increased in order to obtain more valuable data.

The evaluation was carried out as a between-subjects design with emotion recognition as the independent variable (using or not using emotion recognition features) and

[...]
[Table: SUS questionnaire scores per participant; recoverable values: 100, 92.5, 87.5, 82.5, 80; mean 89.06.]
However, some researchers are working on this issue so that physiological signals can be used like the face or the voice. In a not-too-distant future, reading the heartbeat of a person with just a mobile phone with Bluetooth may not be as crazy as it may sound.

The previous technologies analyze the impact of an emotion on our bodies, but what about our behaviour? A stressed person usually tends to make more mistakes. In the case of a person interacting with a system, this will be translated into faster movements through the user interface, more mistakes when selecting elements or typing, and so on. This can be logged and used as another indicator of the affective state of a person.
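A sketch of how such a behavioural indicator could be logged, in the spirit of the keystroke metric used in our evaluation (Section 4.2). The thresholds and the weighting are illustrative assumptions, not the prototype’s actual implementation.

import time

class BehaviourLogger:
    def __init__(self, fast_gap=0.15):
        self.fast_gap = fast_gap  # seconds; a shorter gap counts as "too fast"
        self.last_key_time = None
        self.keystrokes = 0
        self.fast_keystrokes = 0
        self.mistakes = 0

    def on_key(self, correct):
        """Call on every keystroke; correct=False means a wrong key."""
        now = time.monotonic()
        self.keystrokes += 1
        if (self.last_key_time is not None
                and now - self.last_key_time < self.fast_gap):
            self.fast_keystrokes += 1
        if not correct:
            self.mistakes += 1
        self.last_key_time = now

    def stress_indicator(self):
        """0..1 score mixing nervous typing speed and error ratio."""
        if self.keystrokes == 0:
            return 0.0
        return (0.5 * (self.fast_keystrokes / self.keystrokes)
                + 0.5 * (self.mistakes / self.keystrokes))

logger = BehaviourLogger()
for correct in [True, True, False, True, False]:  # simulated keystrokes
    logger.on_key(correct)
print(logger.stress_indicator())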
None of these technologies is perfect. Humans can see each other and estimate how other people are feeling within milliseconds, and with a small error threshold, but these technologies can only try to figure out how a person is feeling according to some input data. To get more accurate results, more than one input is required, so multimodal systems are the best way to guarantee results with the highest levels of accuracy.

In this paper, we present an educational software application that incorporates affective computing by detecting the users’ emotional states to adapt its behaviour to the emotions detected. Assessing this application in comparison with another version without emotion detection, we can conclude that the user experience and performance are higher when including a multimodal emotion detection system. Since the system is continuously adapting itself to the user according to the emotions detected, the level of difficulty adjusts much better to their real needs.

On the basis of the outcomes of this research, new challenges and possibilities in other kinds of applications will be explored; for example, we could “stress” a user in a game if the emotions detected show that the user is bored. The application could even dynamically introduce other elements to engage the user in the game. What is too simple bores a user, whereas what is too complex causes anxiety. Changing the behaviour of an application dynamically according to the user’s emotions, and also according to the nature of the application, increases the satisfaction of the user and helps them decrease the number of mistakes.

As future work, among other things, we aim to improve the mobile aspects of the system and explore further the challenges that the sensors offered by mobile devices bring about regarding emotion recognition, especially in educational settings.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research work has been partially funded by the regional project of JCCM with reference SBPLY/17/180501/000495, by the Scholarship Program granted by the Spanish Ministry of Education, Culture and Sport, and by the predoctoral fellowship with reference 2017-BCL-6528, granted by the University of Castilla-La Mancha. We would also like to thank the teachers and pupils from the primary school “Escolapios” who collaborated in the assessment of the system.

References

[1] R. W. Picard, Affective Computing, MIT Press, Cambridge, MA, USA, 1997.
[2] E. Johnson, R. Hervás, C. Gutiérrez, T. Mondéjar, and J. Bravo, “Analyzing and predicting empathy in neurotypical and nonneurotypical users with an affective avatar,” Mobile Information Systems, vol. 2017, Article ID 7932529, 11 pages, 2017.
[3] S. Koelstra, C. Muhl, M. Soleymani et al., “DEAP: a database for emotion analysis using physiological signals,” IEEE Transactions on Affective Computing, vol. 3, no. 1, pp. 18–31, 2012.
[4] R. Kaliouby, “We need computers with empathy,” Technology Review, vol. 120, no. 6, p. 8, 2017.
[5] S. L. Marie-Sainte, M. S. Alrazgan, F. Bousbahi, S. Ghouzali, and A. W. Abdul, “From mobile to wearable system: a wearable RFID system to enhance teaching and learning conditions,” Mobile Information Systems, vol. 2016, Article ID 8364909, 10 pages, 2016.
[6] M. Li, Y. Xiang, B. Zhang, and Z. Huang, “A sentiment delivering estimate scheme based on trust chain in mobile social network,” Mobile Information Systems, vol. 2015, Article ID 745095, 20 pages, 2015.
[7] B. Ovcjak, M. Hericko, and G. Polancic, “How do emotions impact mobile services acceptance? A systematic literature review,” Mobile Information Systems, vol. 2016, Article ID 8253036, 18 pages, 2016.
[8] P. Williams, “Emotions and consumer behavior,” Journal of Consumer Research, vol. 40, no. 5, pp. viii–xi, 2014.
[9] E. Andrade and D. Ariely, “The enduring impact of transient emotions on decision making,” Organizational Behavior and Human Decision Processes, vol. 109, no. 1, pp. 1–8, 2009.
[10] R. W. Picard, “Affective computing: challenges,” International Journal of Human-Computer Studies, vol. 59, no. 1-2, pp. 55–64, 2003.
[11] R. W. Picard, “Affective computing,” Tech. Rep. 321, M.I.T. Media Laboratory, Perceptual Computing Section, Cambridge, MA, USA, 1995.
[12] I. Morgun, Types of Machine Learning Algorithms, 2015.
[13] J. García-García, V. Penichet, and M. Lozano, “Emotion detection: a technology review,” in Proceedings of the XVIII International Conference on Human Computer Interaction, Cancún, México, September 2017.
[14] S. Casale, A. Russo, G. Scebba, and S. Serrano, “Speech emotion classification using machine learning algorithms,” in Proceedings of the IEEE International Conference on Semantic Computing 2008, pp. 158–165, Santa Monica, CA, USA, August 2008.
[15] Beyond Verbal, “Beyond verbal–the emotions analytics,” May 2017, http://www.beyondverbal.com/.
[16] Vokaturi, May 2017, https://vokaturi.com/.
[17] T. Vogt, E. André, and N. Bee, “EmoVoice—a framework for online recognition of emotions from voice,” in Perception in Multimodal Dialogue Systems, E. André, L. Dybkjær, W. Minker, H. Neumann, R. Pieraccini, and M. Weber, Eds., Springer, Berlin, Heidelberg, Germany, 2008.