Abstract—Emotions are essential for constructing social relationships between humans and interactive systems. Although emotional and empathetic dialogue generation methods have been proposed for dialogue systems, appropriate dialogue involves not only mirroring emotions and always being empathetic but also complex factors such as context. This paper proposes Emotion Regulation Chat (ER-Chat), an end-to-end dialogue framework for emotion regulation. Emotion regulation is concerned with actions taken to approach appropriate emotional states. Learning the appropriate emotion and intent when responding on the basis of the dialogue context enables the generation of more human-like dialogue. We conducted automatic and human evaluations to demonstrate the superiority of ER-Chat over the baseline systems. The results show that the inclusion of emotion and intent prediction mechanisms enables the generation of dialogues with greater fluency, diversity, emotion awareness, and emotion appropriateness, which are greatly preferred by humans.
Fig. 2. Framework structure diagram of the proposed method, ER-Chat.
regulation by fine-tuning the multitasking of the emotion and intent prediction mechanism. Fig. 2 shows an overview diagram of the proposed framework.

3.1 Dataset
Several dialogue corpora have been published for open-domain neural dialogue generation. Rashkin et al. [8] published the EmpatheticDialogues dataset to train and evaluate empathetic dialogue systems. This dataset has 24,850 empathetic dialogues based on 32 equally distributed emotions. However, while this dataset is one of the best corpora for the generation of emotional dialogue, it only contains emotion labels for the entire context. Emotion regulation is shown in Fig. 1, where listeners interact with different emotions and intents in their sentences. Therefore, we augmented this dataset with emotion and intent labels. To minimize human effort, we built a classifier and performed automatic annotations. Each sentence thus contains two labels, namely emotion and intent. The details of the labels and heat maps for each emotion and intent are shown in Fig. 3.

Emotion Label
Ekman's Universal Emotions Theory [40] is often used for emotion labeling. However, in this study, we implemented a classifier based on GoEmotions [41] for more detailed emotion labeling. GoEmotions has 27 categories and a neutral label based on 58,000 comments sourced from Reddit. However, due to the disproportionate number of emotion labels in the dataset, we selected nine emotions and neutral using the hierarchical clustering proposed by Demszky et al. [41]. Based on the official implementation1 of the pre-trained BERT base model, ten emotion labels (Joy, Caring, Admiration, Gratitude, Approval, Surprise, Fear, Sadness, Anger, Neutral) were assigned to each sentence in the EmpatheticDialogue dataset. The classification performance was reasonable, with an accuracy of 65.1 and a macro-F1 score of 65.2.

Intent Label
Welivita et al. [42] proposed an intent-based taxonomy of human social conversations, for which datasets are publicly available. They manually annotated 15 different intents for the EmpatheticDialogue dataset, eventually obtaining 8 high-frequency intents (Questioning, Acknowledging, Consoling, Agreeing, Encouraging, Sympathizing, Suggesting, Wishing) and Neutral. We trained a BERT-based classifier and achieved automatic intent labeling. The classification performance was also reasonable, with an accuracy of 85.4 and a macro-F1 score of 85.6.

3.2 Methodology
Our proposed approach is a text-to-text dialogue framework for emotion regulation. ER-Chat is a multitasking extension of the Transformer-based T5 model with added modules for emotion and intent predictions. T5 is a deep neural network with high flexibility and performance that allows one pre-trained model to be transfer-trained for diverse tasks by redefinition of the text-to-text problem format. Learning based on real-world knowledge through the T5 architecture enables the generation of responses with appropriate emotions and intent while inheriting human-like response generation capabilities. The data for the training process are formulated as follows:

D = {C, R, E, I},  (1)

1. https://github.com/google-research/google-research/tree/master/goemotions

Fig. 3. Heatmap of the emotion and intent labels of sentences in the EmpatheticDialogue dataset.
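To make the text-to-text formulation of Eq. (1) concrete, the sketch below shows one plausible way to serialize a training pair from D = {C, R, E, I}. The "respond:" prefix, the separator token, and the emotion/intent tags are illustrative assumptions, not the authors' exact format.

```python
from typing import List, Tuple

# Hypothetical serialization of one training example D = {C, R, E, I}
# into a (source, target) text pair for T5 fine-tuning. The "respond:"
# prefix, the </s> separator, and the <emotion>/<intent>/<response>
# tags are assumptions for illustration only.
def build_example(context: List[str], response: str,
                  emotion: str, intent: str) -> Tuple[str, str]:
    source = "respond: " + " </s> ".join(context)          # C: dialogue context
    target = (f"<emotion> {emotion} <intent> {intent} "    # E, I: labels
              f"<response> {response}")                    # R: gold response
    return source, target

src, tgt = build_example(
    ["I'm leaving tomorrow to go on vacation!",
     "Hey that's awesome! Where ya goin?"],
    "We are going to Daytona and Orlando for 10 days.",
    "Joy", "Acknowledging")
```

Because T5 treats every task as string-to-string, placing the predicted emotion and intent tags in the target string is one way a single model can be trained on response generation and both prediction tasks jointly.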
2232 IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, VOL. 13, NO. 4, OCTOBER-DECEMBER 2022
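As a concrete picture of the annotation format produced in Section 3.1 (one emotion and one intent label per sentence), here is a toy keyword-based stand-in. The real system fine-tunes BERT classifiers; the cue words below are invented purely for illustration.

```python
# Toy stand-in for the BERT-based labelers of Section 3.1. The real
# pipeline fine-tunes BERT classifiers; the keyword cues below are
# invented solely to illustrate the per-sentence (emotion, intent)
# annotation added to the EmpatheticDialogue dataset.
EMOTION_CUES = {"grateful": "Gratitude", "scared": "Fear", "sad": "Sadness"}
INTENT_CUES = {"?": "Questioning", "sorry": "Consoling"}

def annotate(sentence: str) -> dict:
    low = sentence.lower()
    # Fall back to the Neutral label when no cue matches.
    emotion = next((label for cue, label in EMOTION_CUES.items() if cue in low),
                   "Neutral")
    intent = next((label for cue, label in INTENT_CUES.items() if cue in low),
                  "Neutral")
    return {"sentence": sentence, "emotion": emotion, "intent": intent}
```

Each utterance in the augmented dataset thus carries exactly two labels, which is the structure the multitask prediction heads are trained on.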
TABLE 1
Examples of Generated Text Using the Baselines and ER-Chat on EmpatheticDialogue Dataset Contexts

Context 1
Speaker: I'm leaving tomorrow to go on vacation!
Listener: Hey that's awesome! Where ya goin?
Speaker: We are going to Daytona and Orlando for 10 days.
  MoEL: That sounds exciting ! I hope you are happy for you !
  EmpDG: I have never had a couple of nostalgia.
  T5-base: That sounds like a lot of fun!
  ER-Chat: Those are awesome! I'm going to Florida.

Context 2
Speaker: I have always more trust on my girlfriend.
  MoEL: That is good. What are you going to do?
  EmpDG: I am sure you are feeling better now.
  T5-base: Why?
  ER-Chat: I agree. That's a good thing, but trust can be hard to find.

Context 3
Speaker: My favorite team blew a big lead in the playoffs. It was like drinking sour milk.
Listener: im a browns fan i feel that pain
Speaker: I hate when teams choke. It makes me sick.
  MoEL: I agree, I am sure you are doing it !
  EmpDG: I am so sorry to hear that. It is hard to be a pain.
  T5-base: I would be so mad!
  ER-Chat: yea I don't blame you for it.

Context 4
Speaker: I can't help feel stressed about money.
  MoEL: You should have a good relationship.
  EmpDG: oh no ! that is a shame.
  T5-base: Why is that?
  ER-Chat: Why? Is it the money you have that keeps you busy?
BERTScore [49]: BERTScore is used as a measure of the fluency of the model output. The tokens in the generated and gold standard sentences are each converted to vector representations using BERT. The similarity between the generated and gold standard sentences is determined by the weighted cosine similarity of these distributed representations. We use the F1-score, the harmonic mean of precision and recall, as the metric.

4.3.2 Human Evaluation
To evaluate the response quality of the proposed method, human evaluations with annotations by subjects were performed. To ensure quality, we conducted two types of experiments, namely an annotation evaluation based on a Likert scale using five evaluation indices and an A/B test evaluation. As text data for evaluation, 100 dialogue contexts corresponding to the test data were extracted from the dataset. Of these, half were used for the Likert scale evaluation experiment and the remaining half were used for the A/B test. Two hundred subjects were recruited using Amazon Mechanical Turk,3 a crowdsourcing service. The subjects were Master users and therefore experts at using Amazon Mechanical Turk; all of them live in the U.S. and are fluent in English. The age of the subjects ranged from 20 to 60 years; there were 114 males and 84 females, and two subjects preferred not to tell us their genders. A reward of ten dollars was given per person, and the response time was approximately one hour. Examples of generated text used during the manual evaluations are shown in Table 1.

The five evaluation indices are defined as follows:
Fluency: How linguistically intelligible is the response?
Relevance: How relevant is the response to the context and user?
Empathy: How empathetic is the response?
Emotion Awareness: How aware is the Bot regarding the user's emotion?
Emotion Appropriateness: How appropriate is the response in dealing with emotions?
Fluency, Relevance, and Empathy are also used in the human evaluations of the baseline method, in addition to Emotion Awareness and Emotion Appropriateness, which are indices assessed to clarify whether the dialogue generation was directed toward emotion regulation.

Responses were generated by the proposed method and the three baseline methods for each of the 50 dialogue context samples. To ensure fairness, each response was presented in random order, and the subjects annotated all responses on a five-point Likert scale (1 = very bad, 2 = bad, 3 = neutral, 4 = good, 5 = very good). A sample screen of the Likert scale test is shown in Fig. 4.

A/B Test
To further solidify the human evaluations, the A/B test was conducted to compare the dialogues with their contexts. This test demonstrates the superiority of the proposed method by directly comparing the responses generated by ER-Chat with those of the baseline methods. Fifty annotators were asked to choose among 1. ER-Chat is better, 2. Both are equal, and 3. (MoEL, EmpDG, T5) is better for a sample of 50 randomly ordered contexts and responses.

TABLE 2
Result of Auto/Human Evaluation

4.4 Results
4.4.1 Automatic Evaluation
The left side of Table 2 details the results of the automatic evaluations. Compared to the baseline methods, the output text of ER-Chat had higher Distinct-2 and BERTScore values. However, for Perplexity and Distinct-1, the highest score was obtained by the ablation test or a baseline. As BERTScore is an index used to evaluate the similarity between the generated and gold standard sentences, the proposed method was observed to allow for more emotional responses than the existing methods by regulating emotions against the gold standard response. Further, Distinct-N comprises indices that evaluate the diversity of the generated dialogues; in terms of these indices, the performance of the proposed method was definitely superior, indicating that it allows more consistent and diverse outputs. Since these differences were negligible, it cannot be concluded that the proposed method significantly outperformed the baseline SoTA method, but we showed results comparable to the SoTA method.

4.4.2 Human Evaluation
The right side of Table 2 shows the results of the human evaluations. All indicators, namely Fluency, Empathy, Relevance, Emotion Awareness, and Emotion Appropriateness, show that our proposed method outperforms the baseline methods. In particular, because ER-Chat and T5-base use pre-trained language models, they significantly outperform the Transformer-based MoEL and EmpDG for all metrics. Similarly, the two emotional indices, namely Emotion Awareness and Emotion Appropriateness, were above those of the baseline, thus confirming that the output text of the proposed method included appropriate emotions.

We conducted repeated measures ANOVA tests on the proposed method and the three baseline assessments based on the aforementioned results. The results showed significant differences (p < 0.01) for all five metrics: Fluency, Empathy, Relevance, Emotion Awareness, and Emotion Appropriateness. Therefore, multiple comparisons were conducted using Tukey's test as a post-hoc comparison, and significant differences (p < 0.01) were found for all five indicators between ER-Chat and each of MoEL and EmpDG. Furthermore, no significant differences were found in the multiple comparisons between ER-Chat and T5-base. The reason for this may be that T5-base also uses a pre-trained model, which the proposed method could not significantly outperform due to dependencies such as vocabulary usage.

Table 3 shows the results of the A/B test performed based on the results of the automatic and human evaluations. When subjects were asked to annotate which approach was better, the results showed that the proposed approach was much better than the baseline methods, similar to the results of the Likert scale test. Slightly better results were also obtained in comparison with those from T5-base. We conducted a paired t-test on the results of this A/B test and found a significant difference (p < 0.01) for ER-Chat versus MoEL and ER-Chat versus EmpDG. Although no significant differences were observed for ER-Chat versus T5-base, the proposed method outperformed all the baseline methods in the percentage of A/B test selections.

TABLE 3
Result of A/B Test in Human Evaluation

Models                   Win      Tie      Lose
ER-Chat versus MoEL      65.30%   23.50%   11.20%
ER-Chat versus EmpDG     72.08%   18.16%    9.76%
ER-Chat versus T5-base   40.04%   25.44%   34.52%

4.5 Case Study
We conducted a case study to identify the context important for emotion regulation. Table 4 shows the input contexts and emotion scenarios from the original dataset sampled from EmpatheticDialogue, the ER-Chat responses to these, and the gold standard responses, which denote the actual response text in the dataset.
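The Distinct-N diversity indices used in the automatic evaluation above can be computed as the ratio of unique n-grams to total n-grams over the generated responses. A minimal sketch (whitespace tokenization is an assumption; the exact tokenizer is not specified in the text):

```python
from typing import List

def distinct_n(responses: List[str], n: int) -> float:
    """Ratio of unique n-grams to total n-grams across all responses.

    Higher values indicate more diverse generation; n=1 gives
    Distinct-1 and n=2 gives Distinct-2.
    """
    total = 0
    unique = set()
    for text in responses:
        tokens = text.split()  # assumed tokenization
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0
```

For example, distinct_n(["a b a b"], 1) yields 0.5, since only two of the four unigrams are unique; repetitive generators are penalized in exactly this way.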
KATAYAMA ET AL.: ER-CHAT: A TEXT-TO-TEXT OPEN-DOMAIN DIALOGUE FRAMEWORK FOR EMOTION REGULATION 2235
TABLE 4
Examples of Context and Response in the EmpatheticDialogue Dataset for a Case Study

Case 1
Scenario: Lonely
Context:
Speaker: I was going to go to the movies with a bunch of friends. But they all bailed on me and now I have to go alone.
Listener: If the movie is worth it you will never miss them.
Speaker: I hope so, but I am really missing them right now. I didn't want to go alone.
ER-Chat: You should go with a group of friends if you don't mind me asking. Its a good feeling.
Gold: I understand. Its tough these days.

Case 2
Scenario: Grateful
Context:
Speaker: Im so grateful for my family due to an incident with my friend.
Listener: Oh really? What happened with your family and friend then?
Speaker: My friend lost her parents and siblings in a fire and their family was very close to mine. So seeing her go through the grief made me realize how thankful I should be.
ER-Chat: Wow, that is so sad to hear. I have a friend who is a firefighter and I am grateful for her.
Gold: Oh wow, that's awful. Stuff like that really teaches you to appreciate what you've got and to not take it for granted.

Case 3
Scenario: Annoyed
Context:
Speaker: I have a friend that has a child that is unruly and does bad things. One day at our home, he grabbed a game and sat on it, instantly breaking it! It was expensive and this really irked me, especially because she doesn't discipline him when he does bad things.
ER-Chat: Oh my god. That sucks! Its a hard situation for him to be in.
Gold: oh man. i'm all about discipline! I don't like spoiled bratty kids.

Case 4
Scenario: Proud
Context:
Speaker: i'm so proud of my son! He learned a couple of words this week. He now knows how to say please
Listener: That's wonderful, they say children have a certain gift for picking up languages.
Speaker: he actually speaks 2 languages. I teach him english and russian
ER-Chat: That's amazing, I bet he is a very smart kid.
Gold: Wow! That's impressive and a lasting skill for your child to develop.
Case 1 describes a scenario in which the speaker is feeling lonely when going to the movies alone. The ER-Chat response in this situation suggests inviting a friend. This response scenario is more like a human conversation, something that cannot be generated by lonely scenario detection and empathy alone. Thus, this is one example of how ER-Chat can achieve emotion regulation.

Case 2 describes a scenario in which the speaker is grateful to his family. The ER-Chat response not only conveys the grateful emotion but also uses words such as "firefighter" to capture the context. Thus, the response can reliably take context into account even in multi-turn dialogues. The response is as empathetic as the gold standard, and natural responses could be generated.

Case 3 describes a scenario in which a person is annoyed with a friend's child. ER-Chat's response is similar to the gold standard's, with an anger emotion and agreeing intent. In this context, dialogue systems in which the response is always positive will appease or calm the speaker. However, ER-Chat facilitates dialogue by responding with the same emotion as the speaker. Thus, ER-Chat generates responses with appropriate emotion and intent, considering the context.

Case 4 describes a scenario in which a person is proud of his son. ER-Chat responds with the same surprise emotion and acknowledging intent as the gold standard and generates appropriate dialogue based on context. As mentioned above, there is no correct answer for generating appropriate dialogue. As a result, the generated text often differs from the gold standard response as well, but most of the responses generated by ER-Chat are natural and are among the appropriate responses. Therefore, it is essential to use the information on emotions and intents to generate more human-like dialogues.

5 DISCUSSION FOR FUTURE WORKS

Emotion and Intent Limitation
In this research, we used the emotion taxonomy of GoEmotions [41]. However, it must be noted that other methods, such as Russell's circumplex model [50], which treats emotions as continuous values rather than independent categories [51], have been proposed. In addition, we used existing methods for the intent taxonomy [42]; however, this approach is limited as it cannot perform exact labeling. Furthermore, our approach does not entirely remove noise because of the automatic labeling of emotions and intent using BERT. Therefore, expressions of emotion and intention could be handled in a more detailed manner by using vector representations or continuous values.

Context Limitations
It is necessary to estimate the appropriate emotion from contextual information, such as user behavior and sentence content, for emotion regulation. Therefore, if this dialogue
system is to be implemented as a real application, the long-term context, such as dialogue history, should be considered. In addition, the generation of open-domain dialogue is a difficult task with no correct answer. It can be extended to persona models [20], [21], since different people have their own preferences and personalities for dialogue responses.

Ethical Problems
Dialogue systems with emotion regulation can be applied to mental health and counselling situations in the future. However, the text generated from an end-to-end model is a black box, and therefore the responses generated may be unintended. The problem of unethical dialogue and behavior exists not only for dialogue systems but for all humanoid artificial intelligence systems. Methods such as online learning that can perform sequential learning should be considered, including methods personalized for the desired users. Privacy is another issue of concern. A long-term consideration of context and preferences means that one's emotional state is no longer private. Therefore, we expect to be able to run such agents on devices that apply embedded machine learning techniques to protect user privacy while performing context processing entirely locally.

6 CONCLUSION
This article proposes ER-Chat, a text-to-text dialogue generation framework for emotion regulation. The proposed method enables dialogue generation by fine-tuning the T5 model to predict the emotions and intents expressed by listeners based on the context of their dialogues. The evaluation consists of automatic evaluations using Perplexity, BERTScore, and Distinct-N as the evaluation measures for dialogue generation, a Likert scale test with 100 subjects, and an A/B test with 100 subjects based on five evaluation metrics: Fluency, Relevance, Empathy, Emotion Awareness, and Emotion Appropriateness. The results showed that ER-Chat achieved performance comparable to that of the SoTA method on the dialogue generation metrics and outperformed the baseline methods in the human evaluation. By applying the proposed dialogue system to real applications, we believe that it may be possible to realize a dialogue system that is human-like and builds social relationships by storing past dialogues and other information from users over long periods of time.

REFERENCES
[1] M. Huang, X. Zhu, and J. Gao, "Challenges in building intelligent open-domain dialog systems," ACM Trans. Inf. Syst., vol. 38, no. 3, pp. 1–32, 2020.
[2] R. W. Picard, Affective Computing. Cambridge, MA, USA: The MIT Press, 2000.
[3] H. Prendinger, J. Mori, and M. Ishizuka, "Using human physiology to evaluate subtle expressivity of a virtual quizmaster in a mathematical game," Int. J. Hum.-Comput. Stud., vol. 62, no. 2, pp. 231–245, 2005.
[4] H. Prendinger and M. Ishizuka, "The empathic companion: A character-based interface that addresses users' affective states," Appl. Artif. Intell., vol. 19, no. 3/4, pp. 267–285, 2005.
[5] H. Zhou, M. Huang, T. Zhang, X. Zhu, and B. Liu, "Emotional chatting machine: Emotional conversation generation with internal and external memory," in Proc. AAAI Conf. Artif. Intell., 2018, Art. no. 90.
[6] C. Huang, O. R. Zaiane, A. Trabelsi, and N. Dziri, "Automatic dialogue generation with expressed emotions," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics: Hum. Lang. Technol., 2018, pp. 49–54.
[7] X. Sun, X. Chen, Z. Pei, and F. Ren, "Emotional human machine conversation generation based on SeqGAN," in Proc. 1st Asian Conf. Affect. Comput. Intell. Interaction, 2018, pp. 1–6.
[8] H. Rashkin, E. M. Smith, M. Li, and Y.-L. Boureau, "Towards empathetic open-domain conversation models: A new benchmark and dataset," in Proc. 57th Annu. Meeting Assoc. Comput. Linguistics, 2019, pp. 5370–5381.
[9] Q. Li, H. Chen, Z. Ren, P. Ren, Z. Tu, and Z. Chen, "EmpDG: Multi-resolution interactive empathetic dialogue generation," in Proc. 28th Int. Conf. Comput. Linguistics, 2020, pp. 4454–4466.
[10] J. J. Gross, "The emerging field of emotion regulation: An integrative review," Rev. Gen. Psychol., vol. 2, no. 3, pp. 271–299, Sep. 1998.
[11] S. L. Koole, "The psychology of emotion regulation: An integrative review," Cogn. Emotion, vol. 23, no. 1, pp. 4–41, Jan. 2009.
[12] C. Reeck, D. R. Ames, and K. N. Ochsner, "The social regulation of emotion: An integrative, cross-disciplinary model," Trends Cogn. Sci., vol. 20, no. 1, pp. 47–63, Jan. 2016.
[13] N. M. Thompson, A. Uusberg, J. J. Gross, and B. Chakrabarti, "Empathy and emotion regulation: An integrative account," Prog. Brain Res., vol. 247, pp. 273–304, May 2019.
[14] C. Raffel et al., "Exploring the limits of transfer learning with a unified text-to-text transformer," J. Mach. Learn. Res., vol. 21, pp. 1–67, 2020.
[15] L. Shang, Z. Lu, and H. Li, "Neural responding machine for short-text conversation," in Proc. 53rd Annu. Meeting Assoc. Comput. Linguistics 7th Int. Joint Conf. Natural Lang. Process., 2015, pp. 1577–1586. [Online]. Available: https://aclanthology.org/P15-1152
[16] I. Sutskever, O. Vinyals, and Q. Le, "Sequence to sequence learning with neural networks," in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 3104–3112.
[17] A. Vaswani et al., "Attention is all you need," in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 5998–6008.
[18] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of deep bidirectional transformers for language understanding," 2018, arXiv:1810.04805.
[19] A. Radford et al., "Language models are unsupervised multitask learners," OpenAI Blog, vol. 1, no. 8, 2019, Art. no. 9.
[20] J. Li, M. Galley, C. Brockett, G. Spithourakis, J. Gao, and W. B. Dolan, "A persona-based neural conversation model," in Proc. 54th Annu. Meeting Assoc. Comput. Linguistics, 2016, pp. 994–1003.
[21] Q. Liu et al., "You impress me: Dialogue generation via mutual persona perception," in Proc. 58th Annu. Meeting Assoc. Comput. Linguistics, 2020, pp. 1417–1427.
[22] A. W. Li, V. Jiang, S. Y. Feng, J. Sprague, W. Zhou, and J. Hoey, "ALOHA: Artificial learning of human attributes for dialogue agents," in Proc. AAAI Conf. Artif. Intell., 2020, pp. 8155–8163.
[23] N. Asghar, P. Poupart, J. Hoey, X. Jiang, and L. Mou, "Affective neural response generation," in Proc. Eur. Conf. Inf. Retrieval, 2018, pp. 154–166.
[24] P. Colombo, W. Witon, A. Modi, J. Kennedy, and M. Kapadia, "Affect-driven dialog generation," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics: Hum. Lang. Technol., 2019, pp. 3734–3743.
[25] Z. Song, X. Zheng, L. Liu, M. Xu, and X.-J. Huang, "Generating responses with a specific emotion in dialog," in Proc. 57th Annu. Meeting Assoc. Comput. Linguistics, 2019, pp. 3685–3695.
[26] Y.-P. Ruan and Z. Ling, "Emotion-regularized conditional variational autoencoder for emotional response generation," IEEE Trans. Affective Comput., to be published, doi: 10.1109/TAFFC.2021.3073809.
[27] M. Firdaus, H. Chauhan, A. Ekbal, and P. Bhattacharyya, "More the merrier: Towards multi-emotion and intensity controllable response generation," in Proc. AAAI Conf. Artif. Intell., 2021, pp. 12821–12829.
[28] F. Cui, H. Di, L. Shen, K. Ouchi, Z. Liu, and J. Xu, "Modeling semantic and emotional relationship in multi-turn emotional conversations using multi-task learning," Appl. Intell., vol. 52, pp. 4663–4673, 2022.
[29] L. Shen and Y. Feng, "CDL: Curriculum dual learning for emotion-controllable response generation," in Proc. 58th Annu. Meeting Assoc. Comput. Linguistics, 2020, pp. 556–566.
[30] Z. Lin, A. Madotto, J. Shin, P. Xu, and P. Fung, "MoEL: Mixture of empathetic listeners," in Proc. Conf. Empir. Methods Natural Lang. Process. 9th Int. Joint Conf. Natural Lang. Process., 2019, pp. 121–132.
[31] N. Majumder et al., "MIME: MIMicking emotions for empathetic response generation," in Proc. Conf. Empir. Methods Natural Lang. Process., 2020, pp. 8968–8979.
[32] R. Zandie and M. H. Mahoor, "EmpTransfo: A multi-head transformer architecture for creating empathetic dialog systems," in Proc. 33rd Int. Flairs Conf., 2020.
[33] H.-J. Choi and Y.-J. Lee, "Deep learning based response generation using emotion feature extraction," in Proc. IEEE Int. Conf. Big Data Smart Comput., 2020, pp. 255–262.
[34] N. Lubis, S. Sakti, K. Yoshino, and S. Nakamura, "Positive emotion elicitation in chat-based dialogue systems," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 27, no. 4, pp. 866–877, Apr. 2019.
[35] Y. Liang, F. Meng, Y. Zhang, Y. Chen, J. Xu, and J. Zhou, "Infusing multi-source knowledge with heterogeneous graph neural network for emotional conversation generation," in Proc. AAAI Conf. Artif. Intell., 2021, pp. 13343–13352.
[36] Z. Ma, R. Yang, B. Du, and Y. Chen, "A control unit for emotional conversation generation," IEEE Access, vol. 8, pp. 43168–43176, 2020.
[37] W. Wei et al., "Emotion-aware chat machine: Automatic emotional response generation for human-like emotional interaction," in Proc. 28th ACM Int. Conf. Inf. Knowl. Manage., 2019, pp. 1401–1410.
[38] J. J. Gross, H. Uusberg, and A. Uusberg, "Mental illness and well-being: An affect regulation perspective," World Psychiatry, vol. 18, no. 2, pp. 130–139, 2019.
[39] K. Niven, P. Totterdell, and D. Holman, "A classification of controlled interpersonal affect regulation strategies," Emotion, vol. 9, no. 4, 2009, Art. no. 498.
[40] P. Ekman et al., "Universals and cultural differences in the judgments of facial expressions of emotion," J. Pers. Social Psychol., vol. 53, no. 4, 1987, Art. no. 712.
[41] D. Demszky, D. Movshovitz-Attias, J. Ko, A. Cowen, G. Nemade, and S. Ravi, "GoEmotions: A dataset of fine-grained emotions," 2020, arXiv:2005.00547.
[42] A. Welivita and P. Pu, "A taxonomy of empathetic response intents in human social conversations," in Proc. 28th Int. Conf. Comput. Linguistics, 2020, pp. 4886–4899. [Online]. Available: https://aclanthology.org/2020.coling-main.429
[43] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," 2014, arXiv:1412.6980.
[44] A. Holtzman, J. Buys, L. Du, M. Forbes, and Y. Choi, "The curious case of neural text degeneration," 2019, arXiv:1904.09751.
[45] A. Fan, M. Lewis, and Y. Dauphin, "Hierarchical neural story generation," in Proc. 56th Annu. Meeting Assoc. Comput. Linguistics, 2018, pp. 889–898. [Online]. Available: https://aclanthology.org/P18-1082
[46] K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, "BLEU: A method for automatic evaluation of machine translation," in Proc. 40th Annu. Meeting Assoc. Comput. Linguistics, 2002, pp. 311–318.
[47] C.-W. Liu, R. Lowe, I. Serban, M. Noseworthy, L. Charlin, and J. Pineau, "How NOT to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation," in Proc. Conf. Empir. Methods Natural Lang. Process., 2016, pp. 2122–2132. [Online]. Available: https://www.aclweb.org/anthology/D16-1230
[48] J. Li, M. Galley, C. Brockett, J. Gao, and W. B. Dolan, "A diversity-promoting objective function for neural conversation models," in Proc. Conf. North Amer. Chapter Assoc. Comput. Linguistics: Hum. Lang. Technol., 2016, pp. 110–119.
[49] T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi, "BERTScore: Evaluating text generation with BERT," 2019, arXiv:1904.09675.
[50] J. A. Russell, "A circumplex model of affect," J. Pers. Soc. Psychol., vol. 39, no. 6, 1980, Art. no. 1161.
[51] S. M. Mohammad, "Sentiment analysis: Detecting valence, emotions, and other affectual states from text," in Emotion Measurement. Amsterdam, Netherlands: Elsevier, 2016, pp. 201–237.

Shin Katayama received the BA degree in environment and information studies, in 2018, and the MS degree in media and governance from Keio University, in 2020. He is currently working toward the PhD degree with the Graduate School of Engineering, Nagoya University. His current research interests include human-computer interaction, ubiquitous computing systems, and affective computing.

Shunsuke Aoki received the BEng degree from Waseda University, Tokyo, Japan, the MS degree from The University of Tokyo, Tokyo, Japan, and the PhD degree from Carnegie Mellon University, Pittsburgh, Pennsylvania. After working as a researcher with Carnegie Mellon University and Nagoya University, Japan, he is currently an assistant professor with the National Institute of Informatics, Tokyo, Japan. His research interests include cyber-physical systems, vehicular communications, and multi-robot coordination.

Takuro Yonezawa received the PhD degree in media and governance from Keio University, in 2010. He is an associate professor with the Graduate School of Engineering, Nagoya University, Japan. His research interests are at the intersection of distributed systems, human-computer interaction, and sensor/actuator technologies. He is a member of IPSJ, IEICE, and ACM.

Tadashi Okoshi received the BA degree in environmental information, in 1998, and the master of media and governance, in 2000, from Keio University, the MS degree in computer science, in 2006, from Carnegie Mellon University, and the PhD degree in media and governance, in 2015, from Keio University. He is an associate professor with the Faculty of Environment and Information Studies, Keio University. His recent research interests are human attention-awareness and management in ubiquitous computing and cyber-physical systems. At Keio University and Carnegie Mellon University, he has been working on mobile and ubiquitous computing systems, actively joining several R&D projects.

Jin Nakazawa (Member, IEEE) received the bachelor's degree from the Faculty of Policy Management, in 1998, and the master's degree and the doctor of philosophy degree in media and governance from Keio University, in 1998 and 2000. He is a professor with the Faculty of Environment and Information Studies, Keio University, Japan. His research interests include middleware systems, distributed systems, ubiquitous computing systems, and life-logging. He is a member of IEICE and IPSJ.

Nobuo Kawaguchi received the BE, ME, and PhD degrees in computer science from Nagoya University, Japan, in 1990, 1992, and 1995, respectively. From 1995, he was an associate professor with the Department of Electrical and Electronic Engineering and Information Engineering, School of Engineering, Nagoya University. Since 2009, he has been a professor with the Graduate School of Engineering, Nagoya University. His research interests are in the areas of recognition of human activity, smart environmental systems, and ubiquitous communication systems.

For more information on this or any other computing topic, please visit our Digital Library at www.computer.org/csdl.