Multimodal NLP for Sentiment Analysis
BY
Kunal Jadhav
(TECO-B39)
CERTIFICATE
This is to certify that Kunal Jadhav from Third Year Computer Engineering has successfully completed his seminar work titled "Multimodal NLP for Sentiment Analysis" at Pimpri Chinchwad College of Engineering and Research, Ravet, in partial fulfillment of the bachelor's degree in engineering.
Abstract
Artificial Intelligence (AI) and Machine Learning (ML) have emerged as indispensable tools in advancing the field of Multimodal Natural Language Processing (NLP). The integration of AI and ML techniques has revolutionized how machines interpret and comprehend diverse forms of human communication, transcending the boundaries of traditional unimodal NLP. In the domain of Multimodal NLP, AI and ML algorithms play a pivotal role in processing and fusing information from various modalities such as text, images, audio, and video. Through advanced neural network architectures, including Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Transformer models, machines are equipped to learn complex patterns and relationships between different modalities. This enables them to decipher nuanced context, emotions, and intent embedded within multimodal data.
Acknowledgments
It gives me great pleasure to present the seminar report on 'Multimodal NLP for Sentiment Analysis'.
I would like to take this opportunity to thank my internal guide, Mrs. Shrinika Inamdar, for giving me all the help and guidance I needed. I am really grateful to her for her kind support; her valuable suggestions were very helpful.
Finally, my special thanks to Dr. H. U. Tiwari for providing various resources, such as a laboratory with all the needed software platforms and a continuous Internet connection, for our seminar.
Kunal Jadhav
(T.E. Computer Engg.)
Contents
1 Introduction
1.1 Seminar Idea
1.2 Motivation of the Seminar
1.3 Introduction Part
2 Literature Survey
3 Methodology/Proposed system
3.1 Architecture
4 Results
6 Plagiarism Report
7 References
List of Figures
3.1 Multimodal-nlp
List of Tables
Chapter 1
Introduction
• The idea enables machines to understand not just language cues but also the visual and audio environment; this developing field broadens the boundaries of traditional NLP.
Chapter 2
Literature Survey
Chapter 3
Methodology/Proposed system
3.1 Architecture
A multimodal NLP architecture for sentiment analysis is a system that uses a combination of text, audio, and video information to identify and understand the sentiment of a piece of content. Multimodal architectures can improve the accuracy of sentiment analysis by taking into account the complementary information found in the different modalities. For example, a model might use facial expressions to identify the sentiment of a speaker even when their words are neutral or ambiguous, or it might use a speaker's tone of voice to identify the sentiment of a conversation even when the words themselves are positive.
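The idea above can be sketched as a simple late-fusion scheme: each modality contributes its own sentiment score, and the scores are combined before a final label is assigned. This is only an illustrative sketch, not the report's actual system; the per-modality scores, weights, and threshold below are hypothetical placeholders.

```python
def fuse_sentiment(text_score, audio_score, video_score,
                   weights=(0.5, 0.25, 0.25)):
    """Weighted late fusion of per-modality sentiment scores.

    Each score lies in [-1, 1]; the fused score is their weighted sum.
    """
    scores = (text_score, audio_score, video_score)
    return sum(w * s for w, s in zip(weights, scores))


def label(score, threshold=0.1):
    """Map a fused score to a sentiment label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Neutral words ("it was fine"), but an upbeat tone and a smiling face:
fused = fuse_sentiment(text_score=0.0, audio_score=0.6, video_score=0.8)
print(label(fused))  # positive
```

The example shows how the audio and video channels can override an ambiguous text signal, which is exactly the complementarity described above.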
Chapter 4
Results
Chapter 6
Plagiarism Report
Date: 2023-10-02
Plagiarised: 0% | Unique: 100%
Words: 630
Characters: 4892
Keywords: Sentiment analysis
Introduction:
Multimodal Natural Language Processing (NLP) is a cutting-edge method for comprehending human communication by fusing textual data with other modalities such as images, videos, and audio. By enabling machines to understand not just language cues but also the visual and audio environment, this developing field broadens the boundaries of traditional NLP. By combining the contextual information offered by several modalities, multimodal NLP seeks to improve the precision of language understanding, sentiment analysis, and emotion detection. Its spectrum of applications includes virtual assistants, social media analysis, multimedia content interpretation, and medical diagnostics.
Objectives:
The motivation/objective for multimodal natural language processing (NLP) using AI and machine learning lies in
leveraging the synergy between diverse data modalities, such as text, images, audio, and more. By integrating these
modalities, researchers and practitioners aim to enhance the understanding, interpretation, and generation of human
language, which in turn can lead to a range of benefits and advancements:
1. Enhanced Contextual Understanding: Combining textual and visual information allows for a deeper contextual
understanding of language. Visual cues can provide additional context, helping AI models better comprehend ambiguous
text, idiomatic expressions, and references.
2. Robustness in Real-World Applications: In real-world scenarios, communication often involves multiple modalities.
Multimodal NLP equips AI systems to handle the complexity of real-life conversations, which often include text, voice,
images, and gestures.
3. Richer User Interaction: Integrating speech, text, and visual inputs can create more natural and immersive user
interactions with AI systems, making interfaces more user-friendly and accessible, especially for individuals with different
communication preferences.
4. Cross-Modal Data Fusion: By integrating different modalities, AI models can take advantage of complementary
information, improving the overall accuracy of tasks such as information retrieval, summarization, translation, and question
answering.
5. Enabling New Applications: Multimodal NLP opens the door to innovative applications such as image captioning,
video description, interactive chatbots with visual understanding, automatic video content generation, and more.
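Item 4 above, cross-modal data fusion, can also be done early rather than late: per-modality feature vectors are joined into one representation before any classifier sees them. A minimal sketch, assuming each modality has already been encoded into a numeric feature vector (the dimensions here are arbitrary placeholders):

```python
def early_fuse(text_feats, audio_feats, image_feats):
    """Early fusion by concatenating per-modality feature vectors
    into a single joint representation for a downstream classifier."""
    return list(text_feats) + list(audio_feats) + list(image_feats)


# Hypothetical encoded features: 2-d text, 1-d audio, 3-d image.
joint = early_fuse([0.1, 0.9], [0.3], [0.7, 0.2, 0.5])
print(len(joint))  # 6
```

In practice each modality's encoder (e.g. a Transformer for text, a CNN for images) would produce these vectors; the point of the sketch is only that fusion can happen at the feature level, letting a single model exploit complementary information across modalities.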
Chapter 7
References