This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TAFFC.2018.2885304, IEEE Transactions on Affective Computing.
1949-3045 (c) 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING
Abstract—Textual emotion detection is a challenge in computational linguistics and affective computing, as it involves the discovery of all associated emotions expressed in a given piece of text. It becomes even more difficult when applied to conversation transcripts, as there arises a need to model the spoken utterances between speakers while keeping in mind the context of the entire conversation. In this paper, we propose a semisupervised multi-label method of predicting emotions from conversation transcripts. The corpus contains conversational quotes extracted from movies. A small number of them are annotated, whereas the rest are used for unsupervised training. The word2vec word-embedding method has been used to build an emotion lexicon from the corpus and then embed the utterances into vector representations. A deep-learning autoencoder is then used to discover the underlying structure of the unsupervised data. We fine-tune the learned model based on labeled training data and measure its performance on a test set. The experimental results suggest that the method is effective and only slightly less effective than human annotators.

Index Terms—Emotion recognition, semisupervised learning, multilabel, word2vec, autoencoder.

I. INTRODUCTION

what causes the said changes, what the final state is, and what kind of actions result from the said final state. A survey on "Trend analysis in social networking using opinion mining" [4] predicted the need for emotion detection in streaming data and live chat.

Conversational text is also more challenging than other types of text. For short text samples, such as news headlines [1] or tweets [5], the expression of emotion generally depends on the words being used. Meanwhile, for longer text, the grammatical structure and syntactic variables such as negations, embedded sentences, and type of sentence (question, exclamation, command, or statement) play a part in expressing emotions [6]. Identifying emotions in conversational text is also very different from doing so in paragraphs, because there is often more than one party in a conversation. Each party takes turns, called utterances, to express different ideas and emotions, thus making an impact on the other party's emotions. Therefore, to detect emotions in conversational text, one needs to monitor not only the current utterance but also the previous utterances, as well as the context of the entire conversation [7].
One of the most obvious clues for identifying emotions in conversation is the choice of words, as suggested in [17]. Therefore, [18], [19] proposed databases of English lemmas giving information about the affective meanings of words. These works inspired the use of emotional lexicons in emotion analysis, including the NRC Emotion Lexicon [2] and WordNet-Affect [20]. However, these lexicons are small and therefore lack coverage, and they work only on short-text domains, such as tweets and news headlines, where the recognition of emotion depends heavily on words. They relied on a set of specific seed emotional words and then expanded the lexicon by finding all their synonyms and antonyms. The lexicons are often annotated in a multiclass manner, which means that one lexical item can only be associated with a specific emotion. However, in reality, one word might express different emotions in different situations. Their methods not only oversimplify the complex nature of emotions but also remove the collocation connections between the lexical items. As a result, the approaches of previous lexicons show inflexibility when applied to other domains, such as emotion analysis of customer reviews, restaurant reviews, and personal blogs, because they do not take into account the different meanings of words (word senses) in different contexts [21]. In this paper, despite the fact that we build our lexicon from the IMDb movie quotes domain, the lexicon is automatically extracted from the corpus through word embedding and can be reproduced in any other domain without much effort.

Most of the previous work concentrated on narrowing down the complexity of the problem by focusing on only a small set of emotions that barely involved three or four emotional states [22], [23]. While such approaches may have succeeded in particular problems and domains, they lacked the capacity to predict all genres of emotions in different kinds of text. Another work by [24] performed a multiclass classification of dialogue data sourced from Twitter in Japanese. The researchers automatically labeled the obtained dialogues using emotional expression clues, which is similar to our use of an emotional word lexicon. That work assumed that one tweet might portray only one emotion. Although this may be the case in short exchanges of tweets, it is not applicable to real-life conversations, where the number of characters per utterance is not limited and the context information complicates the process of emotion analysis.

Existing multi-label classification methods can be classified into either the problem-transformation category or the algorithm-adaptation category. The former group of methods transforms the multi-label task into multiple single-label classifications. They use the multi-class approach of one-vs-rest but ignore the correlations among labels, similar to the binary relevance method [25]. The less common approach in this group is the Label Powerset (LP) method, which produces new labels from every possible combination of the original labels. This approach considers label correlations but also produces a very large number of label subsets with very few examples of each. Random k-Labelsets (RAkEL) [26] follows LP; however, it trains the LP classifiers with only k random labelsets from all possible subsets. RAkEL takes advantage of the label correlations and at the same time manages to avoid the problem of a large number of subsets that traditional LP methods face.

Algorithm-adaptation methods include the multi-label extensions of decision trees, support vector machines, neural networks, and so on [27]. Meka's² implementation of the deep back-propagation neural network (DBPNN) [28] is also capable of handling multi-label problems. The method focuses on using multiple layers of restricted Boltzmann machines (RBMs) to create an autoencoder for pre-training. After that, the whole network is fine-tuned using back-propagation of error derivatives. In our study, we implement a similar adaptation of a multi-label neural network using a stack of fully connected layers and also incorporate the correlations between labels by directly modifying the loss function.

Using Plutchik's basic emotions, [29] proposed a simple bag-of-words approach and fine-tuned RAkEL for multi-label classification of movie reviews. We delve further and work on conversation data, where the exchange between the characters and the context of the entire dialogue are of great importance. The closest example to our work is [3] on paragraphs and documents, which tried to improve the sentence-level prediction of some special emotions that, owing to data sparseness and inherent multi-label classification, were very difficult to predict. These researchers incorporated label dependency and context dependency into their graph model to achieve the goal. However, their work is for paragraphs in Chinese. In our case, we take advantage of an autoencoder to capture an abstract representation of the context information.

III. CORPUS

The IMDb quotes corpus is newly published and is updated frequently. It includes approximately 2,107,863 utterances (turns in conversation) from 117,425 movies. There are several reasons for us to choose the IMDb corpus. First, movies give the annotators more input than just text, with video and audio, which helps them provide more accurate annotations. Second, movies tend to evoke many emotions among the viewers [30]. Third, there are various genres of movies, such as romance, sci-fi, and documentary. These genres might be equivalent to real-life domains, for example, everyday conversations or scientific exchanges and debates. Lastly, unlike other types of art, such as literature or theatre plays, moviemakers often try to give their dialogues as much real-life feeling as possible [31]. In the work of [32] and [33], it was concluded that spoken language in movies resembles spontaneous spoken language, which supports our choice of a movie corpus to imitate real-life conversations.

To produce labeled data, we sample quotes from a few chosen famous movies. We then try to match the quotes with subtitle files to get the times of the corresponding scenes in the movies. Annotators are given a description of the basic emotions and dyads and then asked to watch the scenes with

² http://meka.sourceforge.net/#about
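The problem-transformation methods described above can be made concrete with a short Python sketch of the label powerset transformation and RAkEL-style random k-labelsets. This is a generic illustration of the technique, not the implementation of any cited system; the function names and toy labels are hypothetical.

```python
from itertools import combinations
import random

def label_powerset(y_sets):
    # Map each distinct combination of labels to a new single class id,
    # so a multi-label task becomes one multi-class task.
    classes = {}
    transformed = []
    for labels in y_sets:
        key = frozenset(labels)
        if key not in classes:
            classes[key] = len(classes)
        transformed.append(classes[key])
    return transformed, classes

def random_k_labelsets(labels, k, m, seed=0):
    # RAkEL-style step: draw m random size-k subsets of the label space;
    # an LP classifier would then be trained on each subset.
    rng = random.Random(seed)
    all_subsets = list(combinations(sorted(labels), k))
    return rng.sample(all_subsets, m)

y = [{"joy"}, {"joy", "trust"}, {"anger"}, {"joy", "trust"}]
yt, classes = label_powerset(y)
# Distinct label combinations get distinct ids; repeated ones share an id.
```

Note how the powerset mapping preserves label correlations (joy+trust is its own class), while the number of classes can explode combinatorially, which is exactly the problem the random k-labelsets sampling mitigates.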
distribution of emotions in our gold-standard data. While emotions such as anger, trust, and sadness are annotated very frequently, the annotators rarely label anticipation. There might be two reasons for this: the first is the subtlety of the emotion itself, and the second is the choice of movie genres used in our data. However, both seem to be interesting points to investigate in our future research. On average, our data has 1.41 emotions per utterance.

IV. PROPOSED METHOD

A. Emotional words to vectors

Using a lexicon is proven to provide significant improvements in identifying the emotion conveyed by a word [2]. Therefore, in our case, we build a new lexicon in which each lexical item displays not only its association with Plutchik's basic emotions but also the strength of the association. We combine the word2vec features and calculated emotion features to form a hybrid vector representation of a lexicon item as follows:

1) word2vec features: Using word2vec, we generate the embeddings of all words available in the corpus. With the embeddings, the cosine similarity between each word and the primary emotion words is calculated in the form of equation 1. In our work, the embedding of a word is a 96-dimensional vector.

sim(word, e_i) = (vec(word) · vec(e_i)) / (||vec(word)||_2 ||vec(e_i)||_2)    (1)

2) Emotion features: By contrast, we define the primary emotions and dyads proposed in Plutchik's theories as the emotional vectors of our lexicon and give them initial values. Different levels of intensity of emotional words are also considered. Each lexical item in the lexicon has a vector of values on every axis of the basic emotions: joy-sadness, fear-anger, trust-disgust, and surprise-anticipation. We manually assign the primary emotions a value vector of 1, 0, or -1, and others 1.5, 0.5, -0.5, or -1.5, depending on the intensity of the emotion according to Plutchik's theory. For example, "joy" comes from the axis of joy-sadness; thus, its vector is [1,0,0,0], while the vector for "sadness" is [-1,0,0,0]. The word

[Fig. 3. Intensity and polarity of basic emotions.]

For the dyads and other words, we calculate the similarity between these words and all primary emotions e_i. We assume that the higher the similarity, the closer the emotional state of the word to the primary emotion. The emotional vector of one word is the averaged result of all primary emotion vectors multiplied by the similarity weights, as in equation 2. Since there are four axes of basic emotions in Plutchik's theory, the result of this step is a four-dimensional vector of emotion features.

vec(word) = (Σ_{i=1}^{n} sim(word, e_i) × vec(e_i)) / n    (2)

The final embedding is the concatenation of the word2vec-generated vectors and the newly calculated emotional vectors. In this research, our embedding is a 100-dimensional vector, 96 dimensions of which are generated with word2vec. The other four are generated using the preceding steps.

B. Visualization of the lexicon

Our lexicon consists of 181,276 lexical words, which is much larger than most previous lexicons proposed by other researchers. The NRC Emotion Lexicon [2] and WordNet-Affect [20] contain 25,000 entries and 2,876 synsets, respectively. Figure 4a is a visualization of the top 5,000 popular lexical items and some of the basic emotions. The projection is done directly on the 100-dimensional vectors using the matplotlib library³ and the t-SNE algorithm from the scikit-learn package⁴ for dimension reduction. Despite the fact that the visualization is done by reducing the number of dimensions of each lexical item to only two, some interesting results can be observed in figure 4.

³ https://matplotlib.org

From the figure, we can see that except for the pair of fear and anger, opposite basic emotions are located quite far from each other (subfigure 4a), which is the desirable outcome for the lexicon. Interestingly, in a small cluster of subfigure 4b, we observe three basic emotions: joy, fear, and anger. The surrounding lexical items, while appearing to be random at first, somehow seem more relevant later. Words
such as pain, rage, evil, and curse are close to anger; pride, happiness, and beauty are close to joy. The dyad guilt, which according to Plutchik's theory is a combination of joy and fear (subfigure 4b), is also present in this small cluster. In the cluster of trust (subfigure 4c), we see lexical items that suggest agreement, such as nods, agreed, and appreciate. The results show that our lexicon learns the association between lexical items and the basic emotions and dyads.

[Fig. 4. Visualization of the lexicon in two dimensions using matplotlib and scikit-learn's t-SNE. The opposite emotions are often far from each other, while lexical items with similar meanings are close, as in 4a. However, because we reduced the number of embedding dimensions from the original 100 to only 2, some clusters are mixed together, as in 4b. (a) Embedding of the top 5,000 most frequent lexical items (small dots) and the basic emotions, both annotated in red with arrows; we notice some clusters are overlapped by basic emotions. (b) Items with the most similarity to the basic emotions joy, fear, and anger; dyads and intensified/weakened emotions are boxed and annotated in blue; rage is the intensification of anger, and guilt is a combination of joy and fear. (c) Items most similar to the basic emotion trust, such as nods, appreciate, together, and agreed.]

for each utterance in a conversation, we also have to vectorize the previous utterance and the entire conversation to capture the contextual information (Figure 5). As a result, the vector representation of an utterance is now a 300-dimensional vector, the concatenated product of the utterance itself and the above-mentioned contextual information. This representation is then fed to the input layer of the neural network in the following sections.
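A toy end-to-end sketch of this pipeline — equations 1 and 2 plus the 300-dimensional input construction — might look like the following. The data is random, and the pooling of word vectors into an utterance vector by averaging is an assumption for illustration; the paper does not prescribe that exact step.

```python
import numpy as np

def cosine_sim(a, b):
    # Equation 1: cosine similarity between two embeddings.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def emotion_features(word_vec, emo_vecs, emo_axes, n_axes=4):
    # Equation 2: average the primary-emotion axis vectors, each weighted
    # by the word's similarity to that primary emotion.
    out = np.zeros(n_axes)
    for e_vec, axis_vec in zip(emo_vecs, emo_axes):
        out += cosine_sim(word_vec, e_vec) * axis_vec
    return out / len(emo_vecs)

def hybrid_embedding(word_vec, emo_vecs, emo_axes):
    # 96-d word2vec features + 4-d emotion features = 100-d lexicon entry.
    return np.concatenate([word_vec,
                           emotion_features(word_vec, emo_vecs, emo_axes)])

def utterance_vector(tokens, lexicon):
    # Pool the 100-d lexicon entries of an utterance's words into one vector
    # (averaging is an illustrative assumption, not the paper's exact method).
    vecs = [lexicon[t] for t in tokens if t in lexicon]
    return np.mean(vecs, axis=0) if vecs else np.zeros(100)

def model_input(current, previous, conversation, lexicon):
    # Concatenate current utterance, previous utterance, and the entire
    # conversation into the 300-d vector fed to the network's input layer.
    return np.concatenate([utterance_vector(current, lexicon),
                           utterance_vector(previous, lexicon),
                           utterance_vector(conversation, lexicon)])

# Hypothetical toy data: 8 primary emotions with random 96-d embeddings and
# axis vectors such as [1,0,0,0] for joy and [-1,0,0,0] for sadness.
rng = np.random.default_rng(0)
emo_vecs = [rng.normal(size=96) for _ in range(8)]
emo_axes = [np.eye(4)[i // 2] * (1.0 if i % 2 == 0 else -1.0) for i in range(8)]
lexicon = {w: hybrid_embedding(rng.normal(size=96), emo_vecs, emo_axes)
           for w in ["i", "love", "you", "who", "are"]}
x = model_input(["i", "love", "you"], ["who", "are", "you"],
                ["who", "are", "you", "i", "love", "you"], lexicon)
```

The resulting `x` has 3 × 100 = 300 dimensions, matching the input size described above.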
which states that if a basic emotion is labeled by at least three among five annotators, we accept it as a true label for the utterance.

Evaluation metrics: In our study, two common evaluation metrics, which have been popularly used in multi-label classification problems [3], [38], are employed to measure the performance of our system. Let Y_i be the set of true labels for a given instance i, and Y′_i be the set of labels predicted by a system. Let N be the total number of instances; then:

1) Hamming score, or accuracy in multi-label classification, gives the degree of similarity between the ground-truth set of labels and the predicted set of labels.

Hamming score = (1/N) Σ_{i=1}^{N} |Y_i ∩ Y′_i| / |Y_i ∪ Y′_i|    (5)

2) F1-measure: the harmonic mean of precision and recall. In our study, we have given equal importance to precision and recall.

F1 = (2 × precision × recall) / (precision + recall)    (6)

Precision is the fraction of correctly predicted labels among all the predicted labels in the set.

Precision = (1/N) Σ_{i=1}^{N} |Y_i ∩ Y′_i| / |Y′_i|    (7)

Recall is the fraction of correctly predicted labels among all the true labels in the set.

Recall = (1/N) Σ_{i=1}^{N} |Y_i ∩ Y′_i| / |Y_i|    (8)

B. Experimental results

To evaluate our system, a comparison must be made against other systems. We replicated the works of others and applied them to our new corpus. [29] is a similar work that also used Plutchik's theory of basic emotions and worked on multi-label data without considering the intensity of the emotion labels. That study achieved an F1 score of 45.6 on its own dataset of 629 sentences of user-generated movie reviews. Both studies used Meka's RAkEL method and a bag-of-words approach, which serves as the first baseline. However, it would be unfair to apply [29]'s system to our corpus and make comparisons, since it is fine-tuned specifically for their corpus. Their study considers neither the emotions of each sentence nor the contextual information. Therefore, the second system is Meka's DBPNN, which is said to have better performance than RAkEL [39]. We consider RAkEL and DBPNN to be state-of-the-art systems for multi-label classification of emotions in text.

To our knowledge, the most important baseline is human annotation. To obtain this baseline, we calculated the evaluation metrics based on the average agreement score between each annotator and the gold standard, as mentioned in section III. We evaluated the performance of our system under different settings: autoencoder semisupervised learning, unsupervised self-learning, and supervised learning using labeled data only. Figure 7 compares the performance of our system to the baselines.

vs. bag-of-words approaches: Our system, under different settings, performed remarkably better than the simple approaches using Meka's DBPNN and RAkEL. The supervised method used the same dataset as the other two methods, and none of them took advantage of the unsupervised data. Yet, the supervised method remarkably outperformed the two methods by 14 and 19 points in Hamming score and F1 score, respectively. We believe that the context features and our emotion lexicon were a deciding factor.

Our system, autoencoder vs. self-learn: We can clearly see that the autoencoder has better performance than the self-learn method. While both are semisupervised methods, self-learn uses supervised data to produce a model. This model tries to classify the unsupervised data and assimilates the obtained results to retrain. Naturally, when the size of the unsupervised data becomes larger than the supervised data, the model starts dealing with more and more unseen examples, and the performance drops. Figure 8 confirms this explanation.

Our system, semisupervised vs. supervised: The performance of the autoencoder is much better than that of the supervised method. Our system first learns and tries to imitate the enormous number of unsupervised examples. During the retraining process, it figures out the connections between the concepts that it has imitated and the true results. Therefore, it has a better understanding of the data and makes better predictions than when using only the supervised examples.

vs. human annotator: This is the most important baseline, which explains how well our system performs in comparison with a human. Please note that this evaluation of the human annotators is the average agreement between each annotator and the gold-standard data (decided by the majority rule as discussed in the earlier section). Our system's performance is slightly worse than that of the human annotators, by 4 points in Hamming score and 6 points in F1 measure. However, we should also take into consideration that the input for human annotators is movies with full video, sound signals, and transcript texts, while the input for our system is only the transcripts. We acknowledge that the different inputs for the annotators and our system may affect the results. However, as the purpose of this research is to make a system that can identify emotions in text messages, we did not incorporate features from other modalities.

In short, our system with autoencoder semisupervised learning performs much better than the existing methods, not only because of the emotion lexicon that we built and its way of extracting contextual information but also because of its ability to use largely available unsupervised data. When relying only on textual data, our system performs slightly worse than human annotators, for whom full movie clips serve as the input.

C. Publication of the data and possibility of replicating the work in other domains

The annotated data and the emotion lexicon will soon be published in the author's GitHub repository.

The movie conversation domain is very close to real-life settings. Therefore, we believe that the model and lexicon
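As a concrete check of the example-based metrics in equations 5–8, here is a small Python sketch (an illustration of the formulas, not the paper's evaluation code; the toy label sets are hypothetical):

```python
def multilabel_scores(true_sets, pred_sets):
    # Example-based metrics of equations 5, 7, and 8, averaged over the
    # N instances; assumes every true and predicted label set is non-empty.
    n = len(true_sets)
    ham = sum(len(t & p) / len(t | p) for t, p in zip(true_sets, pred_sets)) / n
    prec = sum(len(t & p) / len(p) for t, p in zip(true_sets, pred_sets)) / n
    rec = sum(len(t & p) / len(t) for t, p in zip(true_sets, pred_sets)) / n
    f1 = 2 * prec * rec / (prec + rec)  # equation 6
    return ham, prec, rec, f1

true_y = [{"joy", "trust"}, {"anger"}]
pred_y = [{"joy"}, {"anger", "sadness"}]
ham, prec, rec, f1 = multilabel_scores(true_y, pred_y)
# Each instance shares 1 label out of 2 in the union, so the Hamming
# score is 0.5 even though precision and recall differ per instance.
```

Note that the Hamming score of equation 5 rewards partial overlap between the predicted and true sets, which is why it is used as "accuracy" in the multi-label setting.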
Fig. 7. Evaluation of the system: 1) human annotators, 2) our system using autoencoder, 3) our system using self-learning, 4) our supervised system, 5) RAkEL, and 6) DBPNN.
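The self-learning baseline compared in the figure follows a generic self-training loop: train on the labeled data, pseudo-label the unlabeled pool, and retrain on both. The sketch below is purely illustrative — the majority-vote "classifier" is a hypothetical stand-in chosen only to keep the loop runnable, not the paper's network:

```python
from collections import Counter

def fit_majority(xs, ys):
    # Toy stand-in classifier: always predict the most frequent training label.
    return Counter(ys).most_common(1)[0][0]

def self_train(labeled_x, labeled_y, unlabeled_x, rounds=3):
    # Generic self-learning loop: pseudo-label the unlabeled pool with the
    # current model, then retrain on labeled + pseudo-labeled data. As the
    # unlabeled pool grows relative to the labeled set, pseudo-label errors
    # compound, matching the performance drop discussed in the text.
    model = fit_majority(labeled_x, labeled_y)
    for _ in range(rounds):
        pseudo = [model for _ in unlabeled_x]  # majority model predicts one label
        model = fit_majority(labeled_x + unlabeled_x, labeled_y + pseudo)
    return model

label = self_train(["a", "b", "c"], ["joy", "joy", "anger"], ["d", "e"])
```

The toy run also shows the failure mode: pseudo-labels reinforce whatever the initial model already believed, here the majority label.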
REFERENCES

[2] S. Mohammad, "Portable features for classifying emotional text," in Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Association for Computational Linguistics, 2012, pp. 587–591.
[3] S. Li, L. Huang, R. Wang, and G. Zhou, "Sentence-level emotion classification with label and context dependence," in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Beijing, China: Association for Computational Linguistics, July 2015, pp. 1045–1053. [Online]. Available: http://www.aclweb.org/anthology/P15-1101
[4] S. Dave and H. Diwanji, "Trend analysis in social networking using opinion mining: a survey," 2015.
[5] J. Bollen, H. Mao, and A. Pepe, "Modeling public mood and emotion: Twitter sentiment and socio-economic phenomena," 2011.
[6] G. Collier, Emotional Expression. Psychology Press, 2014.
[7] D.-A. Phan, H. Shindo, and Y. Matsumoto, "Multiple emotions detection in conversation transcripts," PACLIC 30, p. 85, 2016.
[8] R. Plutchik, "A general psychoevolutionary theory of emotion," Theories of Emotion, vol. 1, pp. 3–31, 1980.
[9] ——, "The nature of emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice," American Scientist, vol. 89, no. 4, pp. 344–350, 2001.
[10] M. Li, Q. Lu, Y. Long, and L. Gui, "Inferring affective meanings of words from word embedding," IEEE Transactions on Affective Computing, vol. 8, no. 4, pp. 443–456, 2017.
[11] R. Řehůřek and P. Sojka, "Software framework for topic modelling with large corpora," in Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks. Valletta, Malta: ELRA, May 2010, pp. 45–50, http://is.muni.cz/publication/884893/en.
[12] P. Ekman, W. V. Friesen, M. O'Sullivan, A. Chan, I. Diacoyanni-Tarlatzis, K. Heider, R. Krause, W. A. LeCompte, T. Pitcairn, P. E. Ricci-Bitti et al., "Universals and cultural differences in the judgments of facial expressions of emotion," Journal of Personality and Social Psychology, vol. 53, no. 4, p. 712, 1987.
[13] R. A. Calvo and S. Mac Kim, "Emotions in text: dimensional and categorical models," Computational Intelligence, vol. 29, no. 3, pp. 527–543, 2013.
[14] L.-C. Yu, J. Wang, K. R. Lai, and X.-j. Zhang, "Predicting valence-arousal ratings of words using a weighted graph method," in Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), vol. 2, 2015, pp. 788–793.
[15] L.-C. Yu, L.-H. Lee, S. Hao, J. Wang, Y. He, J. Hu, K. R. Lai, and X. Zhang, "Building Chinese affective resources in valence-arousal dimensions," in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2016, pp. 540–545.
[16] J. Posner, J. A. Russell, and B. S. Peterson, "The circumplex model of affect: An integrative approach to affective neuroscience, cognitive development, and psychopathology," Development and Psychopathology, vol. 17, no. 03, pp. 715–734, 2005.
[17] S. M. Mohammad and P. D. Turney, "Emotions evoked by common words and phrases: Using Mechanical Turk to create an emotion lexicon," in Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text. Association for Computational Linguistics, 2010, pp. 26–34.
[18] A. B. Warriner, V. Kuperman, and M. Brysbaert, "Norms of valence, arousal, and dominance for 13,915 English lemmas," Behavior Research Methods, vol. 45, no. 4, pp. 1191–1207, 2013.
[19] M. M. Bradley and P. J. Lang, "Affective norms for English words (ANEW): Instruction manual and affective ratings," Citeseer, Tech. Rep.
[20] C. Strapparava, A. Valitutti et al., "WordNet-Affect: an affective extension of WordNet," Citeseer, 2004.
[21] B. Heredia, T. M. Khoshgoftaar, J. Prusa, and M. Crawford, "Cross-domain sentiment analysis: an empirical investigation," in Information Reuse and Integration (IRI), 2016 IEEE 17th International Conference on. IEEE, 2016, pp. 160–165.
[22] S. K. D'Mello, S. D. Craig, J. Sullins, and A. C. Graesser, "Predicting affective states expressed through an emote-aloud procedure from AutoTutor's mixed-initiative dialogue," International Journal of Artificial Intelligence in Education, vol. 16, no. 1, pp. 3–28, 2006.
[23] C. Yang, K. H.-Y. Lin, and H.-H. Chen, "Emotion classification using web blog corpora," in Web Intelligence, IEEE/WIC/ACM International Conference on. IEEE, 2007, pp. 275–278.
[24] T. Hasegawa, N. Kaji, N. Yoshinaga, and M. Toyoda, "Predicting and eliciting addressee's emotion in online dialogue," in ACL (1), 2013, pp. 964–972.
[25] K. Brinker, J. Fürnkranz, and E. Hüllermeier, "A unified model for multilabel classification and ranking," in Proceedings of ECAI 2006: 17th European Conference on Artificial Intelligence, August 29–September 1, 2006, Riva del Garda, Italy. IOS Press, 2006, pp. 489–493.
[26] G. Tsoumakas and I. Vlahavas, "Random k-labelsets: An ensemble method for multilabel classification," Machine Learning: ECML 2007, pp. 406–417, 2007.
[27] G. Tsoumakas and I. Katakis, "Multi-label classification: An overview," International Journal of Data Warehousing and Mining, vol. 3, no. 3, 2006.
[28] G. Hinton and R. Salakhutdinov, "Reducing the dimensionality of data with neural networks," Science, vol. 313, no. 5786, pp. 504–507, 2006.
[29] L. Buitinck, J. Van Amerongen, E. Tan, and M. de Rijke, "Multi-emotion detection in user-generated reviews," in Advances in Information Retrieval. Springer, 2015, pp. 43–48.
[30] J. T. Hancock, K. Gee, K. Ciaccio, and J. M.-H. Lin, "I'm sad you're sad: emotional contagion in CMC," in Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work. ACM, 2008, pp. 295–298.
[31] S. I. Rauma, "Cinematic dialogue, literary dialogue, and the art of adaptation: dialogue metamorphosis in the film adaptation of The Green Mile," 2004.
[32] P. Forchini, "Movie language revisited."
[33] I. V. Serban, R. Lowe, L. Charlin, and J. Pineau, "A survey of available corpora for building data-driven dialogue systems," CoRR, vol. abs/1512.05742, 2015. [Online]. Available: http://dblp.uni-trier.de/db/journals/corr/corr1512.html#SerbanLCP15
[34] J. Cohen, "Kappa: Coefficient of concordance," Educ. Psych. Measurement, vol. 20, p. 37, 1960.
[35] R. Artstein and M. Poesio, "Inter-coder agreement for computational linguistics," Computational Linguistics, vol. 34, no. 4, pp. 555–596, 2008.
[36] R. Socher, J. Pennington, E. H. Huang, A. Y. Ng, and C. D. Manning, "Semi-supervised recursive autoencoders for predicting sentiment distributions," in Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 2011, pp. 151–161.
[37] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, "TensorFlow: Large-scale machine learning on heterogeneous systems," 2015, software available from tensorflow.org. [Online]. Available: http://tensorflow.org/
[38] S. Godbole and S. Sarawagi, "Discriminative methods for multi-labeled classification," in Advances in Knowledge Discovery and Data Mining. Springer, 2004, pp. 22–30.
[39] P. Fernandez-Gonzalez, C. Bielza, and P. Larranaga, "Multidimensional classifiers for neuroanatomical data," in ICML Workshop on Statistics, Machine Learning and Neuroscience (Stamlins 2015), 2015.