1 Introduction

To enable communication about emotions, there exists a set of various emotion names, for instance those labeled as basic emotions by Ekman (1992) or Plutchik (2001) (anger, fear, joy, sadness, disgust, surprise, trust, anticipation). While such psychological models influence natural language processing and emotion categorization approaches, the choice of emotion concepts is context-dependent.

Figure 1: An example of the application of NLI to ZSL emotion classification. Given the premise "I won a trip to Greece in a competition", three hypotheses represent the emotions (joy, anger, sadness). The representation of joy is entailed and therefore predicted.

Proceedings of the 29th International Conference on Computational Linguistics, pages 6805–6817, October 12–17, 2022.
…Buechel et al., 2021).

We take a different, more direct route to obtaining classifiers for discrete emotion categories which are not known at system development time, namely zero-shot learning (ZSL). One instantiation of such ZSL systems is via natural language inference models (ZSL-NLI), in which the inference task needs to perform reasoning (Yin et al., 2019). Consequently, the idea of implementing ZSL-NLI models is not by exemplification and optimizing a classifier, but by developing appropriate natural language class name representations which we refer to as prompts. We see an example of the application of an NLI model to ZSL emotion classification in Figure 1 – the NLI model needs to decide if the premise (which corresponds to the instance to be classified) entails the hypothesis (a prompt which represents the class label). This paradigm raises the question (which we answer in this paper) of how to formulate the emotion prompt and how much the design choice of the prompt needs to fit the dataset.

Manually developing intuitive templates based on human introspection may be the most natural method to produce prompts. In this paper, we provide manually created templates to probe emotion classification in an NLI-ZSL setup and we analyze whether prompts are language-register dependent across various corpora (tweets, event descriptions, blog posts). To accomplish this aim, we perform experiments on an established set of emotion datasets with three NLI models and we show that (1) prompts are indeed corpus-specific and the differences follow the same pattern across different pretrained NLI models, (2) an ensemble of multiple prompts behaves more robustly across corpora, and (3) the representation of the emotion concept as part of the textual prompt is an important element, benefiting from representations with synonyms and related concepts instead of just the emotion name. Our code is publicly available at https://github.com/fmplaza/zsl_nli_emotion_prompts.

2 Related Work

2.1 Emotion Classification

Emotion analysis has become a major area of research in NLP which comprises a variety of tasks, including emotion stimulus or cause detection (Li et al., 2021; Doan Dang et al., 2021) and emotion intensity prediction (Mohammad and Bravo-Marquez, 2017; Köper et al., 2017). The task of emotion classification has received most attention in recent years (Bostan and Klinger, 2018; Mohammad et al., 2018a; Plaza-del-Arco et al., 2020, i.a.).

Emotion classification aims at mapping textual units to an emotion category. The categories often rely on psychological theories such as those proposed by Ekman (1992) (anger, fear, sadness, joy, disgust, surprise) or the model of Plutchik (2001) (adding trust and anticipation). However, neither are all these basic emotions relevant in all domains, nor are they sufficient. For instance, in the education field, D'mello and Graesser (2007) found boredom, confusion, flow, frustration, and delight to be more relevant than fear or disgust. Sreeja and Mahalaksmi (2015) reveal that emotions such as love, hate, and courage are necessary to model the emotional perception of poetry. Bostan et al. (2020) identify annoyance, guilt, pessimism, or optimism to be important to analyze news headlines.

A strategy to avoid the specification of discrete categories is the use of dimensional spaces that consider valence, arousal, and dominance (VAD, Russell and Mehrabian, 1977). Smith and Ellsworth (1985) claim that this model does not represent important differences between emotions and propose an alternative dimensional model based on cognitive appraisal, which has recently been used for text analysis (Hofmann et al., 2020; Stranisci et al., 2022; Troiano et al., 2023). Independent of the classification or regression approach, nearly all recently proposed systems rely on transfer learning from general language representations. We refer the reader to recent shared task surveys for a more comprehensive overview (Mohammad et al., 2018b; Tafreshi et al., 2021; Plaza-del Arco et al., 2021).

2.2 Zero-shot Learning

Zero-shot learning (ZSL) aims at performing predictions without having seen labeled training examples specific to the concrete task. Zero-shot methods typically work by associating seen and unseen classes using auxiliary information, which encodes observable distinguishing properties of instances (Xian et al., 2019). In NLP, the term is used predominantly either to refer to cross-lingual transfer to languages that have not been seen at training time (change of the language), or to predict classes that have not been seen (change of the labels, Wang et al., 2019). Our work falls into the second category.
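The label-as-hypothesis framing illustrated in Figure 1 can be sketched as plain premise–hypothesis pair construction. The function name and default template below are illustrative (the template wording follows the figure), not a fixed API:

```python
# Sketch: turn candidate class labels into NLI hypotheses and pair
# each with the input text (the premise). An NLI model then scores
# each pair; the label whose hypothesis is entailed is predicted.

def build_pairs(premise, labels, template="This person feels {}."):
    """Pair the premise with one templated hypothesis per candidate label."""
    return [(premise, template.format(label)) for label in labels]

pairs = build_pairs("I won a trip to Greece in a competition",
                    ["joy", "anger", "sadness"])
print(pairs[0])
# → ('I won a trip to Greece in a competition', 'This person feels joy.')
```

No labeled emotion examples are needed at training time; only the template and the label names define the task.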
Various approaches exist to perform zero-shot text classification. One approach represents labels in an embedding space (Socher et al., 2013; Sappadla et al., 2016; Rios and Kavuluru, 2018, i.a.). A model is trained to predict the respective embedding vectors for categorical labels. At test time, the embeddings of novel labels need to be known and will be assigned if the distance between the predicted embedding and the label embedding is small. This method suffers from the hubness problem, that is, when the semantic label embeddings are close to each other, the projection of labels to the semantic space forms hubs (Radovanović et al., 2010).

Another approach is to use transformer language models to classify if a label embedding is compatible with an instance embedding (Brown et al., 2020). To this end, no labeled examples are provided in the training phase, but an instruction in natural language is given to the model to interpret the label class (the prompt). An instance of this approach is Task-Aware Representations (TARS, Halder et al., 2020), which separates the instance text and the class label by the special separator token [SEP] in BERT (Devlin et al., 2019).

An alternative is to treat ZSL as textual entailment. Following this approach, Yin et al. (2019) propose a sequence-pair classifier that takes two sentences as input (a premise and a hypothesis) and decides whether they entail or contradict each other. They study various formulations of the labels as hypotheses and evaluate the method in various NLP tasks including topic detection, situation detection, and emotion classification. In their evaluation, emotion classification turns out to be most challenging. Another study that conducted prompt engineering in NLI models proposes probabilistic ZSL ensembles for emotion classification (Basile et al., 2021). The authors experiment with the same prompts as Yin et al. (2019) and aggregate the predictions of multiple NLI models using Multi-Annotator Competence Estimation (MACE), a method developed for modelling crowdsourced annotations. …onyms and related concepts helps its interpretation in the NLI models.

3 Methods

In this section, we explain how we apply NLI for ZSL emotion classification and propose a collection of prompts to contextualize and represent the emotion concept in different corpora. In addition, we propose a prompt ensemble which is more robust across corpora.

3.1 Natural Language Inference for Zero-shot Emotion Classification

The NLI task is commonly defined as sentence-pair classification in which two sentences are given: a premise $s_1$ and a hypothesis $s_2$. The task is to learn a function $f_{\text{NLI}}(s_1, s_2) \rightarrow \{E, C, N\}$, in which $E$ expresses the entailment of $s_2$ by $s_1$, $C$ denotes a contradiction, and $N$ is a neutral output.

We treat ZSL emotion classification as a textual entailment problem, but represent each label under consideration with multiple prompts, in contrast to Yin et al. (2019). Given a sentence to be classified $x$ (the premise) and an emotion $e$, we have a function $g(e)$ that generates a set of prompts (hypotheses) out of the class $e \in E$ (with $E$ being the set of emotions under consideration). Under the assumption of an NLI model $m$, which calculates the entailment probability $p_m(\gamma, x)$ for some emotion representation $\gamma \in g(e)$, we assign the average entailment probability across all emotion representations as

$$\bar{p}_m^g(e, x) = \frac{1}{|g(e)|} \sum_{\gamma \in g(e)} p_m(\gamma, x)$$

for a particular prompt generation method $g$. The classification decision is

$$\hat{e}_x^g = \operatorname*{arg\,max}_{e \in E} \bar{p}_m^g(e, x)$$
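The averaging-and-argmax scheme of §3.1 can be sketched directly. The NLI model p_m is stubbed here with a toy scoring function; in practice it would be an entailment classifier, and all names are illustrative:

```python
# Sketch of the §3.1 scoring scheme: average the entailment probability
# over all prompts generated for an emotion, then pick the emotion with
# the highest average. The NLI model below is a toy stub.

def p_bar(e, x, g, p_m):
    """Average entailment probability over prompts gamma in g(e)."""
    prompts = g(e)
    return sum(p_m(gamma, x) for gamma in prompts) / len(prompts)

def classify(x, emotions, g, p_m):
    """Classification decision: argmax over emotions of the averaged score."""
    return max(emotions, key=lambda e: p_bar(e, x, g, p_m))

# Toy stand-ins for the prompt generator g and the NLI model p_m:
g = lambda e: [f"This person feels {e}.", f"This text expresses {e}."]
p_m = lambda gamma, x: 0.9 if "joy" in gamma and "won" in x else 0.1

print(classify("I won a trip to Greece", ["joy", "anger", "sadness"], g, p_m))
# → joy
```

Any prompt generation method g can be swapped in without retraining, which is what makes the comparison of prompt families in the experiments possible.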
Table 1: Emotion prompts. Words in italics represent placeholders for the emotion concept representation.

Figure 2: Results of Experiment 1. Comparison of prompts across NLI models and emotion datasets.
…with two Intel Xeon Silver 4208 CPUs at 2.10GHz and 192GB RAM as main processors, and six NVIDIA GeForce RTX 2080Ti GPUs (with 11GB each).

4.2 Results

In order to answer the research questions formulated in this study, we conduct different ZSL-NLI emotion classification experiments.

4.2.1 Experiment 1: Are NLI models behaving the same across prompts?

With the first experiment, we aim at observing if different NLI models behave robustly across emotion datasets and prompts. We use each model described in §4.1.2 with each emotion representation that is not a set of multiple prompts but consists of a single prompt, namely Emo-Name, Expr-Emo, Feels-Emo, and WN-Def. We evaluate each model using all datasets (§4.1.1).

Figure 2 (and Table 6 in the Appendix) shows the results. Each plot shows the performance of one NLI model on the three emotion datasets using the four prompts. We see that the performances follow the same patterns across NLI models and emotion datasets. Emo-Name is the best-performing prompt for TEC, Expr-Emo for ISEAR, and Feels-Emo for BLOGS. The lowest performance is achieved with WN-Def. The most successful NLI model across the prompts is DeBERTa, followed by BART and RoBERTa.

Therefore, NLI models do behave robustly across prompts. Particularly low performance can be observed with WN-Def. This finding is in line with previous research (Yin et al., 2019): these definitions may be suboptimal choices; for instance, sadness is represented via "This person expresses emotions experienced when not in a state of well-being". This is ambiguous since not being in a state of well-being may also be associated with other negative emotions such as anger or fear. Interestingly, the best-performing emotion representation on TEC is Emo-Name, which resembles the annotation procedure of just using an emotion-related hashtag for labeling. Similarly, Expr-Emo shows the best performance for the self-reports of ISEAR ("This text expresses") and Feels-Emo on BLOGS ("This person feels"). These subtle differences in the prompt formulations indicate that there are particular factors in the dataset that influence the interpretation of the prompt, for instance, the annotation procedure, the data selection, or the language register employed in the corpus, and therefore they affect the interpretation of the emotion by the NLI-ZSL classifier.

4.2.2 Experiment 2: Should we use synonyms for emotion representation?

In this experiment, we aim at observing whether the incorporation of synonyms in the prompt helps the emotion interpretation. Instead of considering only the emotion name, we use six close emotion synonyms (see Emo-S, Expr-S, Feels-S in Table 7 in the Appendix).³ This leads to six prompts for each emotion. For simplicity, we now only consider DeBERTa, which showed the best performance in the previous experiment.

Figure 3 (and Table 6 in the Appendix) shows the results of each context with just the emotion…

³ We assume that larger numbers might show better performance in general, but this set of six synonyms focuses on close, unambiguous synonyms which undoubtedly represent the emotion in most contexts. We evaluate the impact of larger sets with the EmoLex approach.
Figure 3: Results of Experiment 2. Comparison of prompts including synonym emotion representations across three emotion datasets (TEC, BLOGS, and ISEAR) using the DeBERTa model.

Figure 4: Results of Experiment 3. Comparison of the prompt individual models and the proposed ensemble models along with the non-zsl experiments.
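The ensemble models compared in Figure 4 combine several prompt sets rather than committing to one. A minimal sketch of such a combination, averaging per-method scores before the argmax, is given below; this is one plausible reading of the idea, not the paper's exact implementation, and the stub NLI model is illustrative:

```python
# Sketch: combine several prompt-generation methods g in G by averaging
# their per-emotion scores, then take the argmax over emotions.

def ensemble_classify(x, emotions, methods, p_m):
    """Pick the emotion with the highest score averaged over methods."""
    def score(e):
        per_method = []
        for g in methods:
            prompts = g(e)
            # average over this method's prompts
            per_method.append(sum(p_m(p, x) for p in prompts) / len(prompts))
        # average over methods
        return sum(per_method) / len(per_method)
    return max(emotions, key=score)

methods = [lambda e: [e],                              # Emo-Name-style
           lambda e: [f"This person feels {e}."],      # Feels-Emo-style
           lambda e: [f"This text expresses {e}."]]    # Expr-Emo-style
p_m = lambda prompt, x: 0.8 if "joy" in prompt else 0.2  # stub NLI model

print(ensemble_classify("I won a trip", ["joy", "anger"], methods, p_m))
# → joy
```

Because no single prompt wins on every corpus, averaging over methods trades a little peak performance for robustness, which matches the ensemble behavior reported in the experiments.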
…in interpreting emotions that are clearly expressed in the text, but vary performance-wise when the emotion is implicitly communicated.

4.2.4 Experiment 4: Are synonyms sufficient? Would it be even more useful to use more representations of emotions?

In Experiment 2 we found that the use of synonyms is beneficial in some cases (ISEAR and BLOGS). This raises the question if more terms that represent the emotion would lead to an even better performance. We evaluate this setup with the EmoLex model introduced above, in which each emotion concept is represented with a set of prompts, where each prompt is a concept from an emotion lexicon. Notably, in this prompt-generating method, emotions are not only represented by abstract emotion names or synonyms, but in addition with (sometimes concrete) concepts, like "gift" or "tears".

Table 5 shows the performance of the DeBERTa model using the EmoLex concepts (d-emolex), next to the ensemble results. The additional concepts, which cover a wide range of topics associated with the respective emotions, particularly help in the BLOGS corpus, which is the one resource that has been manually annotated in a traditional manner. This manual annotation process might include complex inference by the annotators to infer an emotion category, instead of only using single words.

We presented an analysis of various prompts for NLI-based ZSL emotion classification. The prompts that we chose were motivated by the various particularities of the corpora: single emotions for TEC (tweets), "The person feels"/"The text expresses" for BLOGS (blogs) and ISEAR (events). In addition, we represented the emotions with emotion names, synonyms, definitions, or with the help of lexicons. Our experiments across these datasets showed that, to obtain superior performance, the prompt needs to fit the corpus well – we did not find one single prompt that works well across different corpora. To avoid the requirement of manually selecting a prompt, we therefore devised an ensemble model that combines multiple sets of prompts. This model is more robust and is nearly on par with the best individual prompt. In addition, we found that representing the emotion concept more diversely with synonyms or lexicons is beneficial, but again corpus-specific.

Our work raises a set of future research questions. We have seen that the oracle ensemble showed a good performance, illustrating that the various prompts provide complementary information. This motivates future research regarding other combination schemes, including learning a combination based on end-to-end fine-tuned NLI models.

We have further seen that including more concepts with the help of a dictionary helps in one
corpus, but not across corpora; however, synonyms consistently help. This raises the question about the right trade-off between many, but potentially inappropriate, noisy concepts and hand-selected, high-quality concepts. A desideratum is an automatic subselection procedure, which removes concepts that might decrease performance and only keeps concepts that are "compatible" with the current language register and annotation method. Ideally, this procedure would not make use of annotated data, because that would limit the advantages of ZSL.

The main limitation of our current work is that we manually designed the prompts under consideration, based on the corpora we used for evaluation. This is a bottleneck in model development, which should either be supported by a more guided approach that supports humans in developing prompts, or by an automatic model that is able to generate prompts based on the language register and concept representation in the dataset.

Acknowledgements

We thank Enrica Troiano and Laura Oberländer for discussions on the topic of emotion analysis. Roman Klinger's work is supported by the German Research Council (DFG, project number KL 2869/1-2). Flor Miriam Plaza-del-Arco and María-Teresa Martín Valdivia have been partially supported by the LIVING-LANG project (RTI2018-094653-B-C21) funded by MCIN/AEI/10.13039/501100011033 and ERDF "A way of making Europe", and a grant from the Ministry of Science, Innovation and Universities of the Spanish Government (FPI-PRE2019-089310).

References

Angelo Basile, Guillermo Pérez-Torró, and Marc Franco-Salvador. 2021. Probabilistic Ensembles of Zero- and Few-Shot Learning Models for Emotion Classification. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 128–137, Held Online. INCOMA Ltd.

Laura Ana Maria Bostan, Evgeny Kim, and Roman Klinger. 2020. GoodNewsEveryone: A Corpus of News Headlines Annotated with Emotions, Semantic Roles, and Reader Perception. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 1554–1566, Marseille, France. European Language Resources Association.

Laura-Ana-Maria Bostan and Roman Klinger. 2018. An Analysis of Annotated Corpora for Emotion Classification in Text. In Proceedings of the 27th International Conference on Computational Linguistics, pages 2104–2119, Santa Fe, New Mexico, USA. Association for Computational Linguistics.

Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language Models are Few-Shot Learners. In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6–12, 2020, virtual.

Sven Buechel and Udo Hahn. 2017. EmoBank: Studying the Impact of Annotation Perspective and Representation Format on Dimensional Emotion Analysis. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers, pages 578–585, Valencia, Spain. Association for Computational Linguistics.

Sven Buechel, Luise Modersohn, and Udo Hahn. 2021. Towards Label-Agnostic Emotion Embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 9231–9249, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Sidney D'mello and Arthur Graesser. 2007. Mind and body: Dialogue and posture for affect detection in learning environments. In Proceedings of the 2007 Conference on Artificial Intelligence in Education: Building Technology Rich Learning Contexts That Work, pages 161–168, NLD. IOS Press.

Bao Minh Doan Dang, Laura Oberländer, and Roman Klinger. 2021. Emotion Stimulus Detection in German News Headlines. In Proceedings of the 17th Conference on Natural Language Processing (KONVENS 2021), Düsseldorf, Germany. German Society for Computational Linguistics & Language Technology.

Paul Ekman. 1992. An argument for basic emotions. Cognition and Emotion, 6(3-4):169–200.

Thomas Haider, Steffen Eger, Evgeny Kim, Roman Klinger, and Winfried Menninghaus. 2020. PO-EMO: Conceptualization, Annotation, and Modeling of Aesthetic Emotions in German and English Poetry. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 1652–1663, Marseille, France. European Language Resources Association.

Kishaloy Halder, Alan Akbik, Josip Krapac, and Roland Vollgraf. 2020. Task-Aware Representation of Sentences for Generic Text Classification. In Proceedings of the 28th International Conference on Computational Linguistics, pages 3202–3213, Barcelona, Spain (Online). International Committee on Computational Linguistics.

Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. 2021. DeBERTa: Decoding-enhanced BERT with Disentangled Attention. In International Conference on Learning Representations.

Jan Hofmann, Enrica Troiano, Kai Sassenberg, and Roman Klinger. 2020. Appraisal Theories for Emotion Classification in Text. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain (Online). International Committee on Computational Linguistics.

Maximilian Köper, Evgeny Kim, and Roman Klinger. 2017. IMS at EmoInt-2017: Emotion Intensity Prediction with Affective Norms, Automatically Extended Resources and Deep Learning. In Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 50–57, Copenhagen, Denmark. Association for Computational Linguistics.

Angeliki Lazaridou, Georgiana Dinu, and Marco Baroni. 2015. Hubness and Pollution: Delving into Cross-Space Mapping for Zero-Shot Learning. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 270–280, Beijing, China. Association for Computational Linguistics.

Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 7871–7880, Online. Association for Computational Linguistics.

Xiangju Li, Wei Gao, Shi Feng, Yifei Zhang, and Daling Wang. 2021. Boundary Detection with BERT for Span-level Emotion Cause Analysis. In Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pages 676–682, Online. Association for Computational Linguistics.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2020. RoBERTa: A Robustly Optimized BERT Pretraining Approach.

Winfried Menninghaus, Valentin Wagner, Eugen Wassiliwizky, Ines Schindler, Julian Hanich, Thomas Jacobsen, and Stefan Koelsch. 2019. What are aesthetic emotions? Psychological Review, 126(2):171–195.

George A. Miller. 1995. WordNet: A lexical database for English. Communications of the ACM, 38(11):39–41.

Saif Mohammad. 2012. #Emotional tweets. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics – Volume 1: Proceedings of the main conference and the shared task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012), pages 246–255.

Saif Mohammad and Felipe Bravo-Marquez. 2017. Emotion Intensities in Tweets. In Proceedings of the 6th Joint Conference on Lexical and Computational Semantics (*SEM 2017), pages 65–77, Vancouver, Canada. Association for Computational Linguistics.

Saif Mohammad, Felipe Bravo-Marquez, Mohammad Salameh, and Svetlana Kiritchenko. 2018a. SemEval-2018 Task 1: Affect in tweets. In Proceedings of The 12th International Workshop on Semantic Evaluation, pages 1–17, New Orleans, Louisiana. Association for Computational Linguistics.

Saif Mohammad, Felipe Bravo-Marquez, Mohammad Salameh, and Svetlana Kiritchenko. 2018b. SemEval-2018 Task 1: Affect in tweets. In Proceedings of The 12th International Workshop on Semantic Evaluation, pages 1–17, New Orleans, Louisiana. Association for Computational Linguistics.

Saif M. Mohammad and Peter D. Turney. 2013. Crowdsourcing a word–emotion association lexicon. Computational Intelligence, 29(3):436–465.

Sungjoon Park, Jiseon Kim, Seonghyeon Ye, Jaeyeol Jeon, Hee Young Park, and Alice Oh. 2021. Dimensional Emotion Detection from Categorical Emotion. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 4367–4380, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.

Flor Miriam Plaza-del Arco, Salud M. Jiménez Zafra, Arturo Montejo Ráez, M. Dolores Molina González, Luis Alfonso Ureña López, and María Teresa Martín Valdivia. 2021. Overview of the EmoEvalEs task on emotion detection for Spanish at IberLEF 2021. Procesamiento del Lenguaje Natural, 67(3):273–294.

Flor Miriam Plaza-del-Arco, Carlo Strapparava, L. Alfonso Ureña-López, and M. Teresa Martín-Valdivia. 2020. EmoEvent: A Multilingual Emotion Corpus based on different Events. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 1492–1498, Marseille, France. European Language Resources Association.

Robert Plutchik. 2001. The Nature of Emotions: Human emotions have deep evolutionary roots, a fact that may explain their complexity and provide tools for clinical practice. American Scientist, 89(4):344–350.

Daniel Preoţiuc-Pietro, H. Andrew Schwartz, Gregory Park, Johannes Eichstaedt, Margaret Kern, Lyle Ungar, and Elisabeth Shulman. 2016. Modelling Valence and Arousal in Facebook posts. In Proceedings of the 7th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 9–15, San Diego, California. Association for Computational Linguistics.

Miloš Radovanović, Alexandros Nanopoulos, and Mirjana Ivanović. 2010. Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data. Journal of Machine Learning Research, 11(86):2487–2531.

Anthony Rios and Ramakanth Kavuluru. 2018. Few-Shot and Zero-Shot Multi-Label Learning for Structured Label Spaces. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 3132–3142, Brussels, Belgium. Association for Computational Linguistics.

James A. Russell and Albert Mehrabian. 1977. Evidence for a three-factor theory of emotions. Journal of Research in Personality, 11(3):273–294.

Prateek Veeranna Sappadla, Jinseok Nam, Eneldo Loza Mencía, and Johannes Fürnkranz. 2016. Using Semantic Similarity for Multi-Label Zero-Shot Classification of Text Documents. In Proceedings of the 23rd European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning (ESANN-16), Bruges, Belgium. d-side publications.

Klaus R. Scherer and Harald G. Wallbott. 1997. The ISEAR Questionnaire and Codebook. Geneva Emotion Research Group.

Craig A. Smith and Phoebe C. Ellsworth. 1985. Patterns of cognitive appraisal in emotion. Journal of Personality and Social Psychology, 48(4):813.

Richard Socher, Milind Ganjoo, Christopher D. Manning, and Andrew Y. Ng. 2013. Zero-Shot Learning through Cross-Modal Transfer. In Proceedings of the 26th International Conference on Neural Information Processing Systems – Volume 1, NIPS'13, pages 935–943, Red Hook, NY, USA. Curran Associates Inc.

P.S. Sreeja and G.S. Mahalaksmi. 2015. Applying vector space model for poetic emotion recognition. Advances in Natural and Applied Sciences, 9(6 SE):486–491.

Marco Antonio Stranisci, Simona Frenda, Eleonora Ceccaldi, Valerio Basile, Rossana Damiano, and Viviana Patti. 2022. APPReddit: a Corpus of Reddit Posts Annotated for Appraisal. In Proceedings of the Language Resources and Evaluation Conference, pages 3809–3818, Marseille, France. European Language Resources Association.

Shabnam Tafreshi, Orphee De Clercq, Valentin Barriere, Sven Buechel, João Sedoc, and Alexandra Balahur. 2021. WASSA 2021 Shared Task: Predicting Empathy and Emotion in Reaction to News Stories. In Proceedings of the Eleventh Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, pages 92–104, Online. Association for Computational Linguistics.

Enrica Troiano, Laura Oberländer, and Roman Klinger. 2023. Dimensional Modeling of Emotions in Text with Appraisal Theories: Corpus Creation, Annotation Reliability, and Prediction. Computational Linguistics, 49(1).

Enrica Troiano, Sebastian Padó, and Roman Klinger. 2019. Crowdsourcing and Validating Event-focused Emotion Corpora for German and English. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy. Association for Computational Linguistics.

Wei Wang, Vincent W. Zheng, Han Yu, and Chunyan Miao. 2019. A Survey of Zero-Shot Learning: Settings, Methods, and Applications. ACM Transactions on Intelligent Systems and Technology, 10(2).

Adina Williams, Nikita Nangia, and Samuel Bowman. 2018. A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 1112–1122, New Orleans, Louisiana. Association for Computational Linguistics.

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander Rush. 2020. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Online. Association for Computational Linguistics.

Yongqin Xian, Christoph H. Lampert, Bernt Schiele, and Zeynep Akata. 2019. Zero-Shot Learning—A Comprehensive Evaluation of the Good, the Bad and the Ugly. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(9):2251–2265.

Wenpeng Yin, Jamaal Hay, and Dan Roth. 2019. Benchmarking Zero-shot Text Classification: Datasets, Evaluation and Entailment Approach. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 3914–3923, Hong Kong, China. Association for Computational Linguistics.

A Experiment Results

Table 6: Results from the set of prompts across emotion datasets (TEC, BLOGS, and ISEAR) and NLI models. We report macro-average precision (P), macro-average recall (R), and macro-average F1 (F1) for each model (r: RoBERTa, b: BART, d: DeBERTa, d-synonyms: DeBERTa using synonyms as prompts). In cases where no experiments have been performed, we use '—'. Figures 2 and 3 in the paper depict these experiments.