I. INTRODUCTION
In human-computer interaction, emotion detection plays a pivotal role, with applications across diverse domains such as sentiment analysis in social media and human-robot collaboration. As communication becomes increasingly multimodal, integrating textual and visual information is essential for a holistic understanding of emotions. This research proposal puts forth "EmoTransGAN," an approach built on Transformer-based Generative Adversarial Learning. EmoTransGAN aims to excel in multimodal emotion detection by jointly incorporating textual and visual data.

II. RELATED WORK
This section highlights key recent research on sentiment analysis in text generation models.

A. Text generation
In recent years, there has been a notable surge in the use of deep neural networks such as RNN [1], LSTM [3], and CNN [5], which have demonstrated strong performance in text generation. Vu & Le [1] introduced an approach that feeds text topics alongside the input text into an RNN for text generation. Their method achieved a 23% accuracy improvement on a Vietnamese news dataset compared to a baseline model lacking text generation capabilities. Similarly, Masum et al. [2] used a bi-directional RNN to generate Bengali news headlines, training the network on text from Bengali newspapers. In another study, Li & Zhang [3] compared the text generation performance of LSTM and its variants, with particular emphasis on long text sequences. Their findings highlighted LSTM's superior performance on various metrics, including BERT and BLEURT. Akhtar et al. [6] optimized a GRU network for generating Bengali sentences, showcasing the adaptability of different recurrent neural network architectures. While RNN, LSTM, and CNN models have been used extensively for text generation, their effectiveness diminishes as long-range dependencies within sentences become more complex. Recent advancements have seen a shift towards more sophisticated approaches that leverage attention mechanisms [7], including transformers [8], BERT [24], and GPTs [12]. For a comprehensive overview of these evolving text generation techniques, readers are encouraged to consult the systematic mapping study on text generation methods [15].

B. GAN-based sentiment text generation
This section presents an overview of text generation models in the literature, particularly those employing Generative Adversarial Networks (GANs). Yu et al. introduced the SeqGAN model [19], addressing a limitation of GANs in generating sequences of discrete tokens. The authors conducted experiments on synthetic and real-world data, evaluating SeqGAN's performance across domains such as Chinese poems, Obama's political speeches, and music. Guo et al. proposed the LeakGAN model [20], which is specifically designed for generating long texts. In traditional GAN evaluation, generated text is judged as natural or fake only after generation is complete. The authors instead advocated intermediate assessment during text generation, arguing that this approach is better suited to generating extended texts. Negative Log-Likelihood (NLL) and BLEU scores were used as evaluation metrics. In synthetic data experiments, LeakGAN outperformed Maximum Likelihood Estimation (MLE), SeqGAN, and RankGAN on the NLL metric. The authors also compared performance using BLEU scores on the EMNLP 2017 WMT dataset, where LeakGAN consistently outperformed the other models.
Che et al. introduced the MaliGAN text generation model [21], focusing on reducing the instability of GANs during backpropagation in the training process. The authors assessed MaliGAN's performance in poetry generation using BLEU scores and Perplexity (PPL). Comparative evaluations against MLE and SeqGAN indicated MaliGAN's superior performance. Li et al. proposed the JSDGAN model for text generation [22] and compared its performance with MLE, SeqGAN, RankGAN, and LeakGAN using NLL and BLEU metrics. JSDGAN outperformed the other models on synthetic data, achieving an overall NLL score of 5.50. BLEU evaluations on Chinese poems, COCO image captions, and Obama's political speech datasets further supported its superiority. Shaikh et al. [14] tackled the issue of data imbalance in text sequence generation algorithms, employing an LSTM-based text generation model alongside GPT-2 to generate synthetic data. Their study focused on highly imbalanced datasets from diverse domains and reported a 17% improvement in results compared to similar deep learning models. The authors concluded that LSTM performs well in generating synthetic text at the sentence level, while GPT-2 excels at the paragraph or document level. Building on the findings of [14], this study extends the exploration by utilizing pre-trained category GANs to generate sentiment-specific synthetic samples in the education domain, rather than relying on generic text generation models. Many Generative Adversarial Learning models exist for single-modality sentiment analysis, but very little work addresses multimodal emotion detection.

III. PROBLEM STATEMENT
Current emotion detection methods struggle to capture the intricate and multifaceted nature of human emotions, particularly in multimodal communication. Existing strategies tend to emphasize either textual or visual information, overlooking the potential synergy of combining both modalities. This limitation is a substantial obstacle to accurately recognizing and interpreting emotions across diverse human-computer interaction scenarios. Consequently, there is a critical need for a sophisticated, unified solution such as EmoTransGAN that improves the accuracy and comprehensiveness of multimodal emotion detection. Such advancements promise to enhance the effectiveness of applications ranging from sentiment analysis in social media to human-robot collaboration.

IV. RESEARCH OBJECTIVES
The main goals of this study are:
1) Creating a Transformer-based framework for detecting emotions across multiple modalities by integrating both textual and visual features.
2) Utilizing Generative Adversarial Learning to improve the model's capacity to produce realistic and emotionally expressive representations.
3) Assessing the effectiveness of EmoTransGAN on established benchmark datasets for multimodal emotion detection and conducting a comparative analysis with current methodologies.

V. METHODOLOGY
The methodology comprises the following key stages:

A. Data Acquisition and Processing
Acquire multimodal datasets that contain both textual and visual elements annotated with emotion labels. Preprocess the data to ensure consistency and relevance for subsequent training.

B. Design of Transformer-based Model
Formulate a hybrid Transformer-based architecture capable of handling both text and image inputs. Develop attention mechanisms to capture intermodal dependencies and associations.

C. Incorporation of Generative Adversarial Learning
Integrate a Generative Adversarial Network (GAN) module into the model to improve the generation of emotionally expressive representations. Adversarial training will enhance the model's capability to generate authentic and diverse emotional features.

D. Training and Assessment
Train the EmoTransGAN model on the acquired datasets and assess its performance using established metrics for multimodal emotion detection. Conduct a comparative analysis against existing state-of-the-art methods to gauge the efficacy of the proposed methodology.

VI. EXPECTED CONTRIBUTIONS
The anticipated contributions of this research are:
1) The introduction of EmoTransGAN, a novel model for multimodal emotion detection that harnesses the capabilities of Transformer-based architectures and Generative Adversarial Learning.
2) Insights into the advantages of integrating textual and visual information, leading to improved accuracy and context-awareness in emotion detection.
3) Empirical evidence demonstrating the superior performance of EmoTransGAN compared to current methods, substantiated by thorough evaluations on widely recognized benchmark datasets.

VII. SIGNIFICANCE OF THE RESEARCH
The successful development of EmoTransGAN is poised to advance the field of multimodal emotion detection, offering a more refined understanding of human emotions in textual and visual scenarios. This study is significant for diverse applications such as human-computer interaction, sentiment analysis, and affective computing. It contributes to the development of systems that are not only more empathetic but also better at understanding context.

VIII. CONCLUSION
This research proposal outlines the motivation, goals, methodology, anticipated contributions, and significance of the proposed EmoTransGAN model for multimodal emotion detection. By combining the capabilities of Transformer-based architectures and Generative Adversarial Learning, this research aims to push the limits of emotion detection systems and contribute to the creation of AI models that are more nuanced and attuned to context.

REFERENCES
[1] Vu, D. & Le, A. Topic-guided RNN model for Vietnamese text generation. Research In Intelligent And Computing In Engineering: Select Proceedings Of RICE 2020. pp. 827-834 (2021)
[2] Masum, A., Majedul Islam, M., Abujar, S., Sorker, A. & Hossain, S. Bengali news headline generation on the basis of sequence to sequence learning using bi-directional RNN. Soft Computing Techniques And Applications: Proceeding Of The International Conference On Computing And Communication (IC3 2020). pp. 491-501 (2021)
[3] Li, L. & Zhang, T. Research on text generation based on LSTM. Int. Core J. Eng. 7, 525-535 (2021)
[4] Song, L., Wang, A., Su, J., Zhang, Y., Xu, K., Ge, Y. & Yu, D. Structural information preserving for graph-to-text generation. ArXiv Preprint ArXiv:2102.06749. (2021)
[5] Logeswaran, L., Lee, H. & Bengio, S. Content preserving text generation with attribute controls. Advances In Neural Information Processing Systems 31. (2018)
[6] Akhtar, N., Shazol, K., Rahman, R. & Yousuf, M. Bangla text generation using bidirectional optimized gated recurrent unit network. Proceedings Of International Conference On Trends In Computational And Cognitive Engineering. pp. 103-112 (2018)
[7] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. et al. Attention is all you need. Advances In Neural Information Processing Systems. pp. 5998-6008 (2017)
[8] Guo, B., Wang, H., Ding, Y., Wu, W., Hao, S., Sun, Y. & Yu, Z. Conditional text generation for harmonious human-machine interaction. ACM Transactions On Intelligent Systems And Technology (TIST). 12, 1-50 (2021)
[9] Nadakuduti, S. & Enciso-Rodríguez, F. Advances in genome editing with CRISPR systems and transformation technologies for plant DNA manipulation. Frontiers In Plant Science. 11 pp. 2267 (2021)
[10] Xu, J. & Shinden, K. Table caption generation in scholarly documents leveraging pre-trained language models. 2021 IEEE 10th Global Conference On Consumer Electronics (GCCE). pp. 963-966 (2021)
[11] Rebuffel, C., Soulier, L., Scoutheeten, G. & Gallinari, P. A hierarchical model for data-to-text generation. Advances In Information Retrieval. pp. 65 (2020)
[12] Fatima, N., Imran, A., Kastrati, Z., Daudpota, S. & Soomro, A. A systematic literature review on text generation using deep neural network models. IEEE Access. 10 pp. 53490-503 (2022)
[13] Lin, K., Li, D., He, X., Zhang, Z. & Sun, M. Adversarial ranking for language generation. ArXiv Preprint ArXiv:1705.11001. (2017)
[14] Shaikh, S., Daudpota, S., Imran, A. & Kastrati, Z. Towards improved classification accuracy on highly imbalanced text dataset using deep neural language models. Applied Sciences. 11, 869 (2021)
[15] Sindhu, I., Daudpota, S., Badar, K., Bakhtyar, M., Baber, J. & Nurunnabi, M. Aspect-based opinion mining on student's feedback for faculty teaching performance evaluation. IEEE Access. 7 pp. 108729-741 (2019)
[16] Kastrati, Z., Arifaj, B., Lubishtani, A., Gashi, F. & Nishliu, E. Aspect-based opinion mining of students' reviews on online courses. Proceedings Of The 2020 6th International Conference On Computing And Artificial Intelligence. (2020)
[17] Koufakou, A., Gosselin, J. & Guo, D. Using data mining to extract knowledge from student evaluation comments in undergraduate courses. 2016 International Joint Conference On Neural Networks (IJCNN). pp. 3138-3142 (2016)
[18] Socher, R., Perelygin, A., Wu, J., Chuang, J., Manning, C., Ng, A. & Potts, C. Recursive deep models for semantic compositionality over a sentiment treebank. Proceedings Of The 2013 Conference On Empirical Methods In Natural Language Processing. pp. 1631-1642 (2013)
[19] Yu, L., Zhang, W., Wang, J. & Yu, Y. SeqGAN: Sequence generative adversarial nets with policy gradient. Proceedings Of The AAAI Conference On Artificial Intelligence. 31 (2017)
[20] Guo, J., Lu, S., Cai, H., Zhang, W., Yu, Y. & Wang, J. Long text generation via adversarial training with leaked information. Proceedings Of The AAAI Conference On Artificial Intelligence. 32 (2018)
[21] Che, T., Li, Y., Zhang, R., Hjelm, R., Li, W., Song, Y. & Bengio, Y. Maximum-likelihood augmented discrete generative adversarial networks. ArXiv Preprint ArXiv:1702.07983. (2017)
[22] Li, Z., Xia, T., Lou, X., Xu, K., Wang, S. & Xiao, J. Adversarial discrete sequence generation without explicit neural networks as discriminators. The 22nd International Conference On Artificial Intelligence And Statistics. pp. 3089-3098 (2019)
[23] Nie, W., Narodytska, N. & Patel, A. RelGAN: Relational generative adversarial networks for text generation. International Conference On Learning Representations. (2018)
[24] Xu, J., Ren, X., Lin, J. & Sun, X. DP-GAN: Diversity-promoting generative adversarial network for generating informative and diversified text. ArXiv Preprint ArXiv:1802.01345. (2018)
[25] Montahaei, E., Alihosseini, D. & Baghshah, M. DGSAN: Discrete generative self-adversarial network. Neurocomputing. 448 pp. 364-379 (2021)
[26] Lu, S., Yu, L., Feng, S., Zhu, Y. & Zhang, W. CoT: Cooperative training for generative modeling of discrete data. International Conference On Machine Learning. pp. 4164-4172 (2018)
[27] Wang, K. & Wan, X. SentiGAN: Generating sentimental texts via mixture adversarial networks. IJCAI. pp. 4446-4452 (2018)
[28] Liu, Z., Wang, J. & Liang, Z. CatGAN: Category-aware generative adversarial networks with hierarchical evolutionary learning for category text generation. Proceedings Of The AAAI Conference On Artificial Intelligence. 34 pp. 8425-8432 (2020)