Research Article
Multimodal Affective Computing to Enhance the User
Experience of Educational Software Applications
Received 19 January 2018; Revised 8 July 2018; Accepted 5 August 2018; Published 13 September 2018
Copyright © 2018 Jose Maria Garcia-Garcia et al. This is an open access article distributed under the Creative Commons
Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
Affective computing is becoming more and more important as it enables us to extend the possibilities of computing technologies by incorporating emotions. In fact, the detection of users’ emotions has become one of the most important aspects of affective computing. In this paper, we present an educational software application that incorporates affective computing by detecting the users’ emotional states to adapt its behaviour to the emotions sensed. In this way, we aim to increase users’ engagement and keep them motivated for longer periods of time, thus improving their learning progress. To prove this, the application has been assessed with real users: the performance of a set of users using the proposed system has been compared with that of a control group that used the same system without emotion detection. The outcomes of this evaluation show that our proposed system, incorporating affective computing, produced better results than the one used by the control group.
Finally, we have assessed the application to prove that including emotion detection in the implementation of educational software applications considerably improves users’ performance.

The rest of the paper is organized in the following sections: In Section 2, some background concepts and related works are presented. In Section 3, we describe the educational software application we have developed, enhanced with affective computing-related technologies. Section 4 shows the evaluation process carried out to prove the benefits of the system developed. Finally, Section 5 presents some conclusions and final remarks.

2. Background Concepts and Related Works

In this section, a summary of the background concepts of affective computing and related technologies is put forward. We provide a comparison among the different ways of detecting emotions together with the technologies developed in this field.

2.1. Affective Computing. Rosalind Picard used the term “affective computing” for the first time in 1995 [11]. This technical report established the first ideas on the field. The aim was not to answer questions such as “what are emotions?,” “what causes them?,” or “why do we have them?,” but to provide a definition of some terms in the field of affective computing.

As stated before, the term “affective computing” was finally set in 1997 as “computing that relates to, arises from, or deliberately influences emotion or other affective phenomena” [1]. More recently, we can find affective computing defined as the study and development of systems and devices that can recognize, interpret, process, and simulate human affects [4]; in other words, any form of computing that has something to do with emotions. Due to this strong relation with emotions, their correct detection is the cornerstone of affective computing. Even though each type of technology works in a specific way, all of them share a common core, since an emotion detector is, fundamentally, an automatic classifier.

The creation of an automatic classifier involves collecting information, extracting the features which are important for our purpose, and finally training the model so it can recognize and classify certain patterns [12]. Later, we can use the model to classify new data. For example, if we want to build a model to extract the emotions of happiness and sadness from facial expressions, we have to feed the model with pictures of people smiling, tagged with “happiness,” and pictures of people frowning, tagged with “sadness.” After that, when it receives a picture of a person smiling, it identifies the emotion shown as “happiness,” while pictures of people frowning will return “sadness” as a result.
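As a minimal sketch of this train-then-classify workflow, the following example uses scikit-learn, a library cited later in this paper [33]. The feature extraction step is replaced here by synthetic feature vectors; in a real system, each vector would be the facial-landmark features extracted from a tagged picture.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Stand-in for data collection: synthetic "smiling" and "frowning"
# feature vectors drawn around two different centers.
rng = np.random.default_rng(0)
smiling = rng.normal(loc=1.0, scale=0.3, size=(50, 8))
frowning = rng.normal(loc=-1.0, scale=0.3, size=(50, 8))
X = np.vstack([smiling, frowning])
y = ["happiness"] * 50 + ["sadness"] * 50

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = SVC()                       # a support vector machine classifier
model.fit(X_train, y_train)         # training: learn the patterns
print(model.score(X_test, y_test))  # accuracy on unseen examples

new_picture_features = rng.normal(loc=1.0, scale=0.3, size=(1, 8))
print(model.predict(new_picture_features))  # -> ["happiness"]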
Humans express their feelings through several channels: facial expressions, voices, body gestures and movements, and so on. Even our bodies experience visible physical reactions to emotions (breath and heart rate, pupil size, etc.).

Because of the high potential of knowing how the user is feeling, this kind of technology (emotion detection) has experienced an outburst in the business sector. Many technology companies have recently emerged, focused exclusively on developing technologies capable of detecting emotions from specific input. In the following sections, we present a brief review of each kind of affective information channel, along with some existing technologies capable of detecting this kind of information.

2.2. Emotion Detection Technologies. This section presents a summary of the different technologies used to detect emotions, considering the various channels from which affective information can be obtained: emotion from speech, emotion from text, emotion from facial expressions, emotion from body gestures and movements, and emotion from physiological states [13].

2.2.1. Emotion from Speech. The voice is one of the channels used to gather emotional information from the user of a system. When a person starts talking, they generate information in two different channels: primary and secondary [14]. The primary channel is linked to the syntactic-semantic part of the locution (what the person is literally saying), while the secondary channel is linked to the paralinguistic information of the speaker (tone, emotional state, and gestures). For example, someone says “That’s so funny” (primary channel) with a serious tone (secondary channel). By looking at the information of the primary channel, the message received is that the speaker finds something funny; by looking at the information received through the secondary channel, the real meaning of the message is worked out: the speaker is lying or being sarcastic.

Four technologies in this category can be highlighted: Beyond Verbal [15], Vokaturi [16], EmoVoice [17], and Good Vibrations [18]. Table 1 shows the results of the comparative study performed on the four analyzed technologies.
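None of these commercial services publishes its internals, but the kind of secondary-channel (paralinguistic) features they build on, such as pitch and loudness, can be extracted with the open-source librosa library. The sketch below uses a synthetic tone as a stand-in for a recorded utterance; mapping the features to an emotion label would still require a trained classifier, as described in Section 2.1.

import numpy as np
import librosa

# Stand-in input: one second of a synthetic 200 Hz tone; in a real
# system this would be the user's recorded utterance.
sr = 22050
t = np.linspace(0, 1.0, sr, endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 200 * t)

f0 = librosa.yin(y, fmin=80, fmax=400, sr=sr)  # pitch contour (Hz)
energy = librosa.feature.rms(y=y)[0]           # loudness over time

features = {
    "pitch_mean": float(np.mean(f0)),     # monotone speech may signal boredom
    "pitch_std": float(np.std(f0)),       # high variability: excitement
    "energy_mean": float(energy.mean()),  # raised voice: anger or joy
}
print(features)  # input vector for an emotion classifier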
2.2.2. Emotion from Facial Expressions. As in the case of speech, facial expressions reflect the emotions that a person may be feeling. Eyebrows, lips, nose, mouth, and face muscles: they all reveal the emotions we are feeling. Even when a person tries to fake some emotion, their own face is still telling the truth. The technologies used in this field of emotion detection work in an analogous way to the ones used with speech: detecting a face, identifying the crucial points in the face which reveal the emotion expressed, and processing their positions to decide what emotion is being detected.

Some of the technologies used to detect emotions from facial expressions are Emotion API (Microsoft Cognitive Services) [19], Affectiva [20], nViso [21], and Kairos [22]. Table 2 shows a comparative study.

As far as the results are concerned, every tested technology showed considerable accuracy. However, several conditions (reflection on glasses and bad lighting) mask important facial gestures, generating wrong results.
For example, an expression of pain, in a situation in which the eyes and/or brows cannot be seen, can be detected as a smile by these technologies (because of the stretched, open mouth).

As far as time is concerned, Emotion API and Affectiva show similar times to scan an image, while Kairos takes much longer to produce a result. Besides, the amount of values returned by Affectiva provides much more information to developers, and it is easier to interpret the emotion that the user is showing than when we just have the weights of six emotions, for example. Also remarkable is the availability of Affectiva, which provides free services to those dedicated to research and education or producing less than $1,000,000 yearly.
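To illustrate the first step of this pipeline, the sketch below detects a face with OpenCV, a library cited later in this paper [34]; locating the crucial facial points and classifying the emotion would be done by a trained model on the cropped region. A blank frame stands in for a real webcam capture.

import cv2
import numpy as np

# Stand-in input: a blank frame; in practice this would be a webcam
# capture of the user's face.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

# Step 1: detect faces with a pretrained Haar cascade shipped with OpenCV.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    face_region = gray[y:y + h, x:x + w]
    # Steps 2-3 would locate the crucial facial points inside this
    # region and feed their positions to a trained emotion classifier.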
2.2.3. Emotion from Text. There are certain situations in which the communication between two people, or between a person and a machine, does not have the visual component inherent to face-to-face communication. In a world dominated by telecommunications, words are powerful allies to discover how a person may be feeling. Although emotion detection from text (also referred to as sentiment analysis) must face more obstacles than the previous technologies (spelling errors, languages, and slang), it is another source of affective information to be considered. Since emotion detection from text analyzes the words contained in a message, the process to analyze a text takes some more steps than the analysis of a face or a voice. There is still a model that needs to be trained, but now the text must be processed in order to use it to train the model [23]. This processing involves tasks of tokenization, parsing and part-of-speech tagging, lemmatization, and stemming, among others. Four technologies in this category are Tone Analyzer [24], Receptiviti [25], BiText [26], and Synesketch [27].
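These preprocessing steps can be illustrated with NLTK, the Python natural language toolkit cited later in this paper [32]. A minimal sketch, with an invented example sentence:

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("punkt")                       # tokenizer models
nltk.download("averaged_perceptron_tagger")  # part-of-speech tagger
nltk.download("wordnet")                     # lemmatizer dictionary

text = "I loved the game, but the last level was so frustrating!"
tokens = nltk.word_tokenize(text)  # tokenization
tagged = nltk.pos_tag(tokens)      # part-of-speech tagging

lemmatizer = WordNetLemmatizer()
lemmas = [lemmatizer.lemmatize(tok.lower()) for tok in tokens]
# with the POS tags, the right lemma can be chosen, e.g.
# lemmatizer.lemmatize("loved", pos="v") -> "love"
print(tagged)
print(lemmas)  # the cleaned tokens then feed the sentiment model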
Due to the strong presence of social media and written communication in current society, this field is, along with emotion detection from facial expressions, one of the most attractive fields to companies: posts on social media, messages sent to a “Complaints” section, and so on. Companies which can know how their customers are feeling have an advantage over companies which cannot. Table 3 shows a comparative study of some of the key aspects of each technology. It is remarkable that, as far as text is concerned, most of the companies offer a demo or trial version on their websites, while companies working on face or voice recognition are less transparent in this respect. Regarding their accuracy, the four technologies have yielded good values. On the one hand, BiText has proved to be the simplest one, as it only reports whether the emotion detected is good or bad. This way, the error threshold is wider and produces fewer wrong results. On the other hand, Tone Analyzer has proved to be less clear in its conclusions when the text does not contain some specific key words.

As far as the completeness of results is concerned, Receptiviti has been the one giving the most information, revealing not only affective information but also personality-related information. The main drawback is that all these technologies (except Synesketch) are paid services and may not be accessible to everyone. Since Synesketch is not as powerful as the rest, it will require an extra effort to be used.

2.2.4. Emotion from Body Gestures and Movement. Even though people do not use body gestures and movement to communicate information in an active way, their body is constantly conveying affective information: tapping with the foot, crossing the arms, tilting the head, changing position many times while seated, and so on. Body language reveals what a person is feeling in the same way our voice does.
However, this field is quite new, and there is no clear understanding about how to create systems able to detect emotions from body language. Most researchers have focused on facial expressions (over 95 per cent of the studies carried out on emotion detection have used faces as stimuli), almost ignoring the rest of the channels through which people reveal affective information [28].

Despite the newness of this field, there are several proposals focused on recognizing emotional states from body gestures, and these results are used for other purposes. Experimental psychology has already demonstrated how certain types of movements are related to specific emotions [29]. For example, people experiencing fear will turn their bodies away from the point which is causing that feeling; people experiencing happiness, surprise, or anger will turn their bodies towards the point causing that feeling.

Since there are no technologies available for emotion detection from body gestures, there is no consensus about the data we need to detect emotions in this way. Usually, experiments on this kind of emotion detection use frameworks (for instance, SSI) or technologies to detect the body of the user (for instance, Kinect), so the researchers are responsible for elaborating their own models and schemes for the emotion detection. These models are usually built around the joints of the body (hands, knees, neck, head, elbows, and so on) and the angle between the body parts that they interconnect [30], as sketched below, but in the end, it is up to the researchers.
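As an illustration of the kind of feature such joint-based models compute, the sketch below derives the angle at one joint (the elbow) from three tracked points, such as those reported by a body-tracking device like Kinect; the coordinates are invented for illustration.

import numpy as np

def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the segments b->a and b->c."""
    ba = np.asarray(a) - np.asarray(b)
    bc = np.asarray(c) - np.asarray(b)
    cos_angle = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

# illustrative 3D coordinates (meters) for shoulder, elbow, and wrist
shoulder, elbow, wrist = (0.0, 1.4, 0.1), (0.2, 1.1, 0.1), (0.5, 1.2, 0.1)
print(joint_angle(shoulder, elbow, wrist))  # one feature for the model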
2.2.5. Emotion from Physiological States. Physiologically speaking, emotions originate in the limbic system. Within this system, the amygdala generates emotional impulses which create the physiological reactions associated with emotions: electric activity in the face muscles, electrodermal activity (also called galvanic skin response), pupil dilatation, breath and heart rate, blood pressure, brain electric activity, and so on. Emotions leave a trace on the body, and this can be measured with the right tools.

Nevertheless, the information coming directly from the body is harder to classify, at least with the category system used in other emotion detection technologies. When working with physiological signals, the best option is to adopt a classification system based on a dimensional approach [25]. An emotion is not just “happiness” or “sadness” anymore, but a state determined by various dimensions, like valence and arousal. It is because of this that the use of physiological signals is usually reserved for research and studies, for example, related to autism. There are no emotion detection services available for this kind of detection based on physiological states, although there are plenty of sensors to read these signals.
In a recent survey on mobile affective computing [31], the authors make a thorough review of the current literature on affect recognition through smartphone modalities and show the current research trends towards mobile affective computing. Indeed, the special capacities of mobile devices open new research challenges in the field of affective computing that we aim to address in the mobile version of the system proposed.

Finally, we can also find available libraries to be used in different IDEs (integrated development environments) supporting different programming languages. For instance, NLTK, in Python [32], can be used to analyze natural language for sentiment analysis. Scikit-learn [33], also in Python, provides efficient tools for data mining and data analysis with machine learning techniques. Lastly, OpenCV (Open Source Computer Vision Library) [34] supports C++, Python, and Java interfaces on most operating systems. It is designed for computer vision and allows the detection of elements caught by the camera in real time to analyze the facial points detected according, for instance, to the Facial Action Coding System (FACS) proposed by Ekman and Rosenberg [35]. The data gathered could be subsequently processed with the scikit-learn tool.

The affective information gathered from the different channels can then be combined to estimate the user’s state and react to it, as in the following pseudocode:

aff_information = get_affective_information()
# aff_information = {"face": [...], "voice": [...], "mimic": [...]}

stress_flags = {"face": 0.0, "voice": 0.0, "mimic": 0.0}
# values from 0 to 1 indicating the stress levels detected

# update the stress level estimated from each emotion recognition channel
for er_channel, measures in aff_information.items():
    measure_stress(er_channel, measures, stress_flags)

if (stress_flags["face"] > 0.6 and
        stress_flags["voice"] < 0.3 and
        stress_flags["mimic"] < 0.1):
    ...  # reaction to affective state A

if (stress_flags["face"] < 0.1 and
        stress_flags["voice"] < 0.1 and
        stress_flags["mimic"] < 0.5):
    ...  # reaction to affective state B

3. Modifying the Behaviour of an Educational Software Application Based on Emotion Recognition

Human interaction is, by definition, multimodal [36]. Unless the communication is done through phone or text, people can see the face of the people they are talking to, listen to their voices, see their body, and so on. Humans are, at this

[...]
teachers of the primary school provided us within the school premises.

4.2. Evaluation Metrics. The system was measured considering three types of metrics: effectiveness, efficiency, and satisfaction, that is, the users’ subjective reactions when using the system. Effectiveness was measured by considering the task completion percentage, error frequency, and frequency of assistance offered to the child. Efficiency was measured by calculating the time needed to complete an activity, specifically, the mean time taken to achieve the activity. Besides, some other aspects were also considered, such as the number of attempts needed to successfully complete a level, the number of keystrokes, and the number of times a key was pressed too fast as an indicative signal of nervousness.

Finally, satisfaction was measured with the System Usability Scale (SUS), slightly adapted for teenagers and kids [39]. This questionnaire is composed of ten items related to the system usage. The users had to indicate their degree of agreement or disagreement on a 5-point scale.
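For reference, a SUS questionnaire is scored with the standard formula: odd-numbered (positively worded) items contribute their response minus 1, even-numbered (negatively worded) items contribute 5 minus their response, and the sum is scaled by 2.5 to a 0-100 range. A sketch:

def sus_score(responses):
    """responses: list of ten answers on the 1-5 scale, in item order."""
    assert len(responses) == 10
    total = sum(r - 1 if i % 2 == 0 else 5 - r  # i=0 is item 1 (odd)
                for i, r in enumerate(responses))
    return total * 2.5

print(sus_score([5, 1, 5, 1, 5, 1, 5, 1, 5, 1]))  # 100.0, best possible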
4.3. Experimental Design. After several considerations regarding the evaluation process for games used in learning environments [40], the following features were established:

(i) Research Design. The sample of participants was divided into two groups of the same size, one of them being the control group. This control group tested the application implemented without emotion detection, and hence without modifying the behaviour of the application in real time according to the child’s emotions. This one was called the System 2 group. The other group tested the prototype implemented with emotion detection, which adapted its behaviour by modifying the pace of the game and the difficulty level according to the emotions detected in the user, in such a way that if the user becomes bored, the system increases the pace of the game and the difficulty level, and, on the contrary, if the user becomes stressed or nervous, the system decreases the speed of the game and the difficulty level. This one was called the System 1 group. By doing this, it can be shown how using emotion detection to dynamically vary the difficulty level of an educational software application influences the performance and user experience of the students.

(ii) Intervention. The test was conducted in the premises of the primary school, in a quiet room where just the participants (two at a time) using System 1 and System 2 and the evaluators were present. We prepared two laptops of similar characteristics, one of them running System 1 with the version of the application implemented with emotion recognition and the other laptop running System 2 with the version of the application without emotion recognition.

The whole evaluation process was divided into two parts:

(i) Introduction to the Test. At the beginning of the evaluation, the procedure was explained to the sixteen children at a time, and the game instructions for the different levels were given.

(ii) Performing the Test. Kids were called in pairs to the room where the laptops running System 1 and System 2 were prepared. None of the children knew what system they were going to play with. At the end of the evaluation sessions, the sixteen children completed the SUS questionnaire. Researchers were present all the time, ready to assist the participants and clarify doubts when necessary. When a participant finished the test, they returned to their classroom and called the next child to go in the evaluation room.

To keep the results of each participant fully independent, the sixteen users were introduced in the database of the prototype with the key “evalX,” where “X” is a number. Users with an odd “X” used System 1, while those with an even “X” were assigned to System 2 (control group).

The task that the participants had to perform was to play the seven levels of the prototype, each level including a platform game and a reading-out-loud exercise. The data collected during the evaluation sessions were subsequently analyzed, and the outcomes are described next.

4.4. Evaluation Outcomes and Discussion. Although participants with System 1 needed, on average, a bit more time per level to finish (76.18 seconds against 72.7), we could appreciate an improvement in the performance of the participants using System 1, as most of them made fewer than 5 mistakes in the last level, while only one of the control group users of System 2 made fewer than 5 mistakes.

Figure 4 shows the evolution of the average number of mistakes, which increases in the control group (System 2) from level 4 onwards. Since the game adapts its difficulty (in System 1), after detecting a peak of mistakes in the fourth level (as a sign of stress, detected as a combination of negative feelings found in the facial expression and the way the participant used the keyboard), the difficulty level was reduced. This adaptation made the next levels easier to play for participants using System 1, which was reflected in less mental effort. Since participants using System 2 did not have this feature, their average performance got worse.

On average, participants using System 1 needed 1.33 attempts to finish each level, while participants using System 2 needed 1.59, about 20% more. Also, the ratio of mistakes to total keystrokes was higher in the case of System 2 users (19% against the 12% from users of System 1). Likewise, System 2 users asked for help more often (13 times) than System 1 users (10 times). In future experimental activities, the sample size will be increased in order to obtain more valuable data.

The evaluation was carried out as a between-subjects design with emotion recognition as the independent variable (using or not using emotion recognition features) and

[...]
[Table: SUS questionnaire scores per participant; recoverable values: 100, 92.5, 87.5, 82.5, 80; mean 89.06.]
However, some researchers are working on this issue so that physiological signals can be used like the face or the voice. In a not-too-distant future, reading the heartbeat of a person with just a mobile phone with Bluetooth may not be as crazy as it may sound.

The previous technologies analyze the impact of an emotion on our bodies, but what about our behaviour? A stressed person usually tends to make more mistakes. In the case of a person interacting with a system, this will be translated into faster movements through the user interface, more mistakes when selecting elements or typing, and so on. This can be logged and used as another indicator of the affective state of a person.
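A sketch of how such a behavioural indicator could be logged, in the spirit of the keystroke metric used in our evaluation (Section 4.2). The thresholds and the weighting are illustrative assumptions, not the prototype’s actual implementation.

import time

class BehaviourLogger:
    def __init__(self, fast_gap=0.15):
        self.fast_gap = fast_gap  # seconds; a shorter gap counts as "too fast"
        self.last_key_time = None
        self.keystrokes = 0
        self.fast_keystrokes = 0
        self.mistakes = 0

    def on_key(self, correct):
        """Call on every keystroke; correct=False means a wrong key."""
        now = time.monotonic()
        self.keystrokes += 1
        if (self.last_key_time is not None
                and now - self.last_key_time < self.fast_gap):
            self.fast_keystrokes += 1
        if not correct:
            self.mistakes += 1
        self.last_key_time = now

    def stress_indicator(self):
        """0..1 score mixing nervous typing speed and error ratio."""
        if self.keystrokes == 0:
            return 0.0
        return (0.5 * (self.fast_keystrokes / self.keystrokes)
                + 0.5 * (self.mistakes / self.keystrokes))

logger = BehaviourLogger()
for correct in [True, True, False, True, False]:  # simulated keystrokes
    logger.on_key(correct)
print(logger.stress_indicator())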
None of these technologies is perfect. Humans can see each other and estimate how other people are feeling within milliseconds, and with a small error threshold, but these technologies can only try to figure out how a person is feeling according to some input data. To get more accurate results, more than one input is required, so multimodal systems are the best way to guarantee results with the highest levels of accuracy.

In this paper, we present an educational software application that incorporates affective computing by detecting the users’ emotional states to adapt its behaviour to the emotions detected. Assessing this application in comparison with another version without emotion detection, we can conclude that the user experience and performance are higher when including a multimodal emotion detection system. Since the system is continuously adapting itself to the user according to the emotions detected, the level of difficulty adjusts much better to their real needs.

On the basis of the outcomes of this research, new challenges and possibilities in other kinds of applications will be explored; for example, we could “stress” a user in a game if the emotions detected show that the user is bored. The application could even dynamically introduce other elements to engage the user in the game. What is too simple bores a user, whereas what is too complex causes anxiety. Changing the behaviour of an application dynamically according to the user’s emotions, and also according to the nature of the application, increases the satisfaction of the user and helps them decrease the number of mistakes.

As future work, among other things, we aim to improve the mobile aspects of the system and explore further the challenges that the sensors offered by mobile devices bring about regarding emotion recognition, especially in educational settings.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research work has been partially funded by the regional project of JCCM with reference SBPLY/17/180501/000495, by the Scholarship Program granted by the Spanish Ministry of Education, Culture and Sport, and by the predoctoral fellowship with reference 2017-BCL-6528, granted by the University of Castilla-La Mancha. We would also like to thank the teachers and pupils from the primary school “Escolapios” who collaborated in the assessment of the system.

References

[1] R. W. Picard, Affective Computing, MIT Press, Cambridge, MA, USA, 1997.
[2] E. Johnson, R. Hervás, C. Gutiérrez, T. Mondéjar, and J. Bravo, “Analyzing and predicting empathy in neurotypical and nonneurotypical users with an affective avatar,” Mobile Information Systems, vol. 2017, Article ID 7932529, 11 pages, 2017.
[3] S. Koelstra, C. Muhl, M. Soleymani et al., “DEAP: a database for emotion analysis using physiological signals,” IEEE Transactions on Affective Computing, vol. 3, no. 1, pp. 18–31, 2012.
[4] R. Kaliouby, “We need computers with empathy,” Technology Review, vol. 120, no. 6, p. 8, 2017.
[5] S. L. Marie-Sainte, M. S. Alrazgan, F. Bousbahi, S. Ghouzali, and A. W. Abdul, “From mobile to wearable system: a wearable RFID system to enhance teaching and learning conditions,” Mobile Information Systems, vol. 2016, Article ID 8364909, 10 pages, 2016.
[6] M. Li, Y. Xiang, B. Zhang, and Z. Huang, “A sentiment delivering estimate scheme based on trust chain in mobile social network,” Mobile Information Systems, vol. 2015, Article ID 745095, 20 pages, 2015.
[7] B. Ovcjak, M. Hericko, and G. Polancic, “How do emotions impact mobile services acceptance? A systematic literature review,” Mobile Information Systems, vol. 2016, Article ID 8253036, 18 pages, 2016.
[8] P. Williams, “Emotions and consumer behavior,” Journal of Consumer Research, vol. 40, no. 5, pp. viii–xi, 2014.
[9] E. Andrade and D. Ariely, “The enduring impact of transient emotions on decision making,” Organizational Behavior and Human Decision Processes, vol. 109, no. 1, pp. 1–8, 2009.
[10] R. W. Picard, “Affective computing: challenges,” International Journal of Human-Computer Studies, vol. 59, no. 1-2, pp. 55–64, 2003.
[11] R. W. Picard, “Affective computing,” Tech. Rep. 321, M.I.T. Media Laboratory, Perceptual Computing Section, Cambridge, MA, USA, 1995.
[12] I. Morgun, Types of Machine Learning Algorithms, 2015.
[13] J. García-García, V. Penichet, and M. Lozano, “Emotion detection: a technology review,” in Proceedings of the XVIII International Conference on Human Computer Interaction, Cancún, México, September 2017.
[14] S. Casale, A. Russo, G. Scebba, and S. Serrano, “Speech emotion classification using machine learning algorithms,” in Proceedings of the IEEE International Conference on Semantic Computing 2008, pp. 158–165, Santa Monica, CA, USA, August 2008.
[15] Beyond Verbal, “Beyond verbal–the emotions analytics,” May 2017, http://www.beyondverbal.com/.
[16] Vokaturi, May 2017, https://vokaturi.com/.
[17] T. Vogt, E. André, and N. Bee, “EmoVoice—a framework for online recognition of emotions from voice,” in Perception in Multimodal Dialogue Systems, E. André, L. Dybkjær, W. Minker, H. Neumann, R. Pieraccini, and M. Weber, Eds., Springer, Berlin, Heidelberg, Germany, 2008.