
EmoTransGAN: Transformer-based Generative Adversarial Learning for Multimodal Emotion Detection in Text and Images

I. INTRODUCTION

In human-computer interaction, emotion detection plays a pivotal role, finding applications across diverse domains such as sentiment analysis in social media and facilitating human-robot collaboration. As communication becomes more multimodal, integrating textual and visual information becomes imperative for a holistic understanding of emotions. This research proposal puts forth the concept of "EmoTransGAN," an innovative approach harnessing Transformer-based Generative Adversarial Learning. EmoTransGAN aims to excel in multimodal emotion detection by seamlessly incorporating textual and visual data.

II. RELATED WORK

This section highlights some of the critical research in recent years on the topic of sentiment analysis in text generation models.

A. Text generation

In recent years, there has been a notable surge in the utilization of deep neural networks, such as RNN [1], LSTM [3], and CNN [5], demonstrating significant prowess in generative text generation. Vu & Le [1] introduced a novel approach by incorporating text topics alongside input text to employ RNN for text generation. Their methodology led to a remarkable 23% accuracy improvement on a Vietnamese news dataset compared to a baseline model lacking text generation capabilities. Similarly, Masum et al. [2] harnessed bi-directional RNN to generate news headlines in Bengali, achieving success by training the network on Bengali newspapers' text. In another exploration, Li & Zhang [3] conducted research comparing the text generation performance of LSTM and its variants, particularly emphasizing long text sequences. Their findings highlighted LSTM's superior performance across various metrics, including BERT and BLEURT. Akhtar et al. [6] optimized GRU for generating Bengali sentences, showcasing the adaptability of different recurrent neural network architectures. While RNN, LSTM, and CNN have been extensively employed for text generation, their effectiveness diminishes with the complexity of long sentence dependencies. Recent advancements have seen a shift towards more sophisticated approaches leveraging attention mechanisms [7], including transformers [8], BERT [24], and GPTs [12]. For a comprehensive understanding of these evolving text-generation techniques, readers are encouraged to explore the systematic mapping study on text-generation methods [15].

B. GAN-based sentiment text generation

This section presents an overview of text generation models in the literature, particularly those employing Generative Adversarial Networks (GANs). Yu et al. introduced the SeqGAN model [19], addressing a limitation of GANs related to generating sequences of discrete tokens. The authors conducted synthetic and real-world data experiments, evaluating SeqGAN's performance across domains such as Chinese poems, Obama's political speeches, and music. Guo et al. proposed the LeakGAN model [20], which is specifically designed for generating lengthy texts. Unlike traditional GAN evaluation, where generated text is assessed as natural or fake post-generation, the authors advocated for intermediate assessment during text generation. They argued that this approach is convenient for generating extended texts. Evaluation metrics, namely Negative Log-Likelihood (NLL) and BLEU scores, were employed. In synthetic data experiments, LeakGAN's performance, assessed using the NLL metric, surpassed that of Maximum Likelihood Estimation (MLE), SeqGAN, and RankGAN. The authors compared performance using BLEU scores on the EMNLP 2017 WMT dataset, where LeakGAN consistently outperformed the other models.

Che et al. introduced the MaliGAN text generation model in their research [21], addressing the instability of Generative Adversarial Networks (GANs) during backpropagation in the training process. The authors assessed MaliGAN's performance in poetry generation using BLEU scores and Perplexity (PPL) metrics. Comparative evaluations with Maximum Likelihood Estimation (MLE) and SeqGAN models indicated MaliGAN's superior performance. Li et al. proposed the JSDGAN model for text generation in their study [22]. Performance comparisons with MLE, SeqGAN, RankGAN, and LeakGAN models were conducted using NLL and BLEU score metrics. The results demonstrated that JSDGAN outperformed the other models on synthetic data, achieving an overall NLL score of 5.50. Evaluation using BLEU scores on Chinese poems, COCO image captions, and Obama's political speech datasets further supported the superiority of the JSDGAN model. Shaikh et al. [14] tackled the issue of data imbalance in text sequence generation algorithms, employing an LSTM-based text generation model alongside GPT-2 for synthetic data generation. Their study focused on highly imbalanced datasets from diverse domains, revealing a 17% improvement in results compared to similar deep learning models. The authors concluded that LSTM performs well in generating synthetic text at the sentence level, while GPT-2 excels at the paragraph or document level. Building on the findings of [14], this study extends the exploration by utilizing pre-trained category GANs for generating sentiment-specific synthetic samples in the education domain, deviating from generic text generation models. There are many Generative Adversarial Learning models for single-mode sentiment analysis, but very little work on multimodal emotion detection.

III. PROBLEM STATEMENT

The current methods employed for emotion detection struggle to capture the intricate and multifaceted nature of human emotions, particularly within multimodal communication. Existing strategies tend to emphasize textual or visual information, overlooking the potential synergy of combining both modalities. This inherent limitation presents a substantial obstacle in accurately recognizing and interpreting emotions across diverse human-computer interaction scenarios. Consequently, a critical need exists for a sophisticated and unified solution, exemplified by EmoTransGAN, aiming to elevate the accuracy and comprehensiveness of multimodal emotion detection. Such advancements promise to enhance the effectiveness of applications spanning from sentiment analysis in social media to facilitating human-robot collaboration.

IV. RESEARCH OBJECTIVES

The main goals of this study include:
1) Creating a Transformer-based framework for detecting emotions across multiple modalities by integrating both textual and visual features.
2) Utilizing Generative Adversarial Learning to improve the model's capacity to produce lifelike and emotionally expressive representations.
3) Assessing the effectiveness of EmoTransGAN on established benchmark datasets for multimodal emotion detection and conducting a comparative analysis with current methodologies.

V. METHODOLOGY

The outlined methodology encompasses the following pivotal stages:

A. Data Acquisition and Processing

Acquire multimodal datasets that encompass both textual and visual elements, annotated with emotional labels. Process the data to ensure uniformity and relevance for subsequent training.

B. Design of Transformer-based Model

Formulate a hybrid Transformer-based architecture adept at handling both text and image inputs. Develop attention mechanisms to capture intermodal dependencies and associations.

C. Incorporation of Generative Adversarial Learning

Integrate a Generative Adversarial Network (GAN) module into the model to elevate the generation of emotionally expressive representations. Adversarial training will enhance the model's capability to generate authentic and diverse emotional features.

D. Training and Assessment

Train the EmoTransGAN model using the acquired datasets and assess its performance through established metrics for multimodal emotion detection. Conduct a comparative analysis against existing state-of-the-art methods to gauge the efficacy of the proposed methodology.

VI. EXPECTED CONTRIBUTIONS

The anticipated contributions of this research comprise:
1) The introduction of an innovative EmoTransGAN model for multimodal emotion detection, harnessing the capabilities of Transformer-based architectures and Generative Adversarial Learning.
2) Revelations regarding the advantages of integrating textual and visual information, leading to enhanced accuracy and context-awareness in emotion detection.
3) Empirical proof showcasing the superior performance of EmoTransGAN compared to current methods, substantiated by thorough evaluations on widely recognized benchmark datasets.

VII. SIGNIFICANCE OF THE RESEARCH

The successful creation of EmoTransGAN is poised to propel the field of multimodal emotion detection, offering a refined comprehension of human emotions in textual and visual scenarios. This study holds significance for diverse applications such as human-computer interaction, sentiment analysis, and affective computing. It contributes to the development of systems that are not only more empathetic but also adept at understanding context.

VIII. CONCLUSION

This research proposal delineates the inspiration, goals, methodology, anticipated contributions, and significance of the proposed EmoTransGAN model designed for multimodal emotion detection. By amalgamating the capabilities of Transformer-based architectures and Generative Adversarial Learning, the objective of this research is to extend the limits of emotion detection systems and play a role in advancing the creation of AI models that are more intricate and attuned to context.

REFERENCES

[1] Vu, D. & Le, A. Topic-guided RNN model for Vietnamese text generation. Research In Intelligent And Computing In Engineering: Select Proceedings Of RICE 2020. pp. 827-834 (2021)
[2] Masum, A., Majedul Islam, M., Abujar, S., Sorker, A. & Hossain, S. Bengali news headline generation on the basis of sequence to sequence learning using bi-directional RNN. Soft Computing Techniques And Applications: Proceeding Of The International Conference On Computing And Communication (IC3 2020). pp. 491-501 (2021)
[3] Li, L. & Zhang, T. Research on text generation based on LSTM. Int. Core J. Eng. 7, 525-535 (2021)
[4] Song, L., Wang, A., Su, J., Zhang, Y., Xu, K., Ge, Y. & Yu, D. Structural information preserving for graph-to-text generation. ArXiv Preprint ArXiv:2102.06749. (2021)
[5] Logeswaran, L., Lee, H. & Bengio, S. Content preserving text generation with attribute controls. Advances In Neural Information Processing Systems 31. (2018)
[6] Akhtar, N., Shazol, K., Rahman, R. & Yousuf, M. Bangla text generation
using bidirectional optimized gated recurrent unit network. Proceedings
Of International Conference On Trends In Computational And Cognitive
Engineering. pp. 103-112 (2018)
[7] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A., Kaiser, Ł. & Polosukhin, I. Attention is all you need. Advances In Neural Information Processing Systems. pp. 5998-6008 (2017)
[8] Guo, B., Wang, H., Ding, Y., Wu, W., Hao, S., Sun, Y. & Yu, Z.
Conditional text generation for harmonious human-machine interaction.
ACM Transactions On Intelligent Systems And Technology (TIST). 12,
1-50 (2021)
[9] Nadakuduti, S. & Enciso-Rodríguez, F. Advances in genome editing
with CRISPR systems and transformation technologies for plant DNA
manipulation. Frontiers In Plant Science. 11 pp. 2267 (2021)
[10] Xu, J. & Shinden, K. Table caption generation in scholarly documents
leveraging pre-trained language models. 2021 IEEE 10th Global Con-
ference On Consumer Electronics (GCCE). pp. 963-966 (2021)
[11] Rebuffel, C., Soulier, L., Scoutheeten, G. & Gallinari, P. A hierarchical
model for data-to-text generation. Advances In Information Retrieval.
pp. 65 (2020)
[12] Fatima, N., Imran, A., Kastrati, Z., Daudpota, S. & Soomro, A. A
systematic literature review on text generation using deep neural network
models. IEEE Access. 10 pp. 53490-503 (2022)
[13] Lin, K., Li, D., He, X., Zhang, Z. & Sun, M. Adversarial ranking for
language generation. ArXiv Preprint ArXiv:1705.11001. (2017)
[14] Shaikh, S., Daudpota, S., Imran, A. & Kastrati, Z. Towards improved
classification accuracy on highly imbalanced text dataset using deep
neural language models. Applied Sciences. 11, 869 (2021)
[15] Sindhu, I., Daudpota, S., Badar, K., Bakhtyar, M., Baber, J. & Nu-
runnabi, M. Aspect-based opinion mining on student’s feedback for
faculty teaching performance evaluation. IEEE Access. 7 pp. 108729-
741 (2019)
[16] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F. & Nishliu, E. Aspect-
based opinion mining of students’ reviews on online courses. Proceed-
ings Of The 2020 6th International Conference On Computing And
Artificial Intelligence. (2020)
[17] Koufakou, A., Gosselin, J. & Guo, D. Using data mining to extract
knowledge from student evaluation comments in undergraduate courses.
2016 International Joint Conference On Neural Networks (IJCNN). pp.
3138-3142 (2016)
[18] Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C., Ng, A. &
Potts, C. Recursive deep models for semantic compositionality over a
sentiment treebank. Proceedings Of The 2013 Conference On Empirical
Methods In Natural Language Processing. pp. 1631-1642 (2013)
[19] Yu, L., Zhang, W., Wang, J. & Yu, Y. Seqgan: Sequence generative ad-
versarial nets with policy gradient. Proceedings Of The AAAI Conference
On Artificial Intelligence. 31 (2017)
[20] Guo, J., Lu, S., Cai, H., Zhang, W., Yu, Y. & Wang, J. Long text
generation via adversarial training with leaked information. Proceedings
Of The AAAI Conference On Artificial Intelligence. 32 (2018)
[21] Che, T., Li, Y., Zhang, R., Hjelm, R., Li, W., Song, Y. & Bengio,
Y. Maximum-likelihood augmented discrete generative adversarial net-
works. ArXiv Preprint ArXiv:1702.07983. (2017)
[22] Li, Z., Xia, T., Lou, X., Xu, K., Wang, S. & Xiao, J. Adversarial discrete
sequence generation without explicit neural networks as discriminators.
The 22nd International Conference On Artificial Intelligence And Statis-
tics. pp. 3089-3098 (2019)
[23] Nie, W., Narodytska, N. & Patel, A. Relgan: Relational generative
adversarial networks for text generation. International Conference On
Learning Representations. (2018)
[24] Xu, J., Ren, X., Lin, J. & Sun, X. Dp-gan: Diversity-promoting genera-
tive adversarial network for generating informative and diversified text.
ArXiv Preprint ArXiv:1802.01345. (2018)
[25] Montahaei, E., Alihosseini, D. & Baghshah, M. Dgsan: Discrete gener-
ative self-adversarial network. Neurocomputing. 448 pp. 364-379 (2021)
[26] Lu, S., Yu, L., Feng, S., Zhu, Y. & Zhang, W. Cot: Cooperative training
for generative modeling of discrete data. International Conference On
Machine Learning. pp. 4164-4172 (2018)
[27] Wang, K. & Wan, X. Sentigan: Generating sentimental texts via mixture
adversarial networks. IJCAI. pp. 4446-4452 (2018)
[28] Liu, Z., Wang, J. & Liang, Z. Catgan: Category-aware generative
adversarial networks with hierarchical evolutionary learning for category
text generation. Proceedings Of The AAAI Conference On Artificial
Intelligence. 34 pp. 8425-8432 (2020)
