Certificate
It is certified that the work contained in the project report entitled “Hero, Villain
and Victim: Dissecting harmful memes for Semantic role labelling of entities” by the
following students has been carried out under my supervision and that this work has not
been submitted elsewhere for a degree.
This project report entitled “Hero, Villain and Victim: Dissecting harmful
memes for Semantic role labelling of entities” submitted by the group 21 is approved
for the degree of Bachelor of Technology.
Head of Department
Dr. Uma Sheshadri
Professor
Dept. of Computer Science & Engineering
IIIT Dharwad
Declaration
05 May 2022
IIIT Dharwad
We certify that this written submission truly describes our ideas in our own words and
that, where others' ideas or words have been used, we have properly cited and referenced
the original sources. We certify that all sources used in the preparation of this report have
been properly and accurately acknowledged. We further declare that we have adhered to all
the principles of academic honesty and integrity and have not misrepresented, faked,
fabricated or falsified any idea, data, fact or source. We acknowledge that any infringement
of the foregoing will result in disciplinary action by the Institute, as well as legal action
from the sources that were not correctly referenced or from whom permission was not
obtained when required.
Acknowledgements
05 May 2022
IIIT Dharwad
Despite the efforts we have put into this project, it would not have been feasible without
the kind support and assistance of many individuals and organisations. We would like
to express our heartfelt gratitude to each and every one of them.
We would like to sincerely thank Asst. Prof. Dr. Sunil Saumya and Mr. Shankar
Biradar for their kind cooperation, encouragement, guidance, and constant supervision,
as well as for providing important project information and for their support in finishing
the project.
We would also like to express our gratitude towards our parents and the faculty members of
IIIT Dharwad for their kind cooperation and encouragement, which helped us in the
completion of this project.
Abstract
The identification of good and evil through representations of heroism, villainy and
victimhood, i.e., role labelling of entities, has recently piqued the scientific community's
interest. Owing to the massive growth in the circulation of memes, the amount of
objectionable content is increasing at an astounding rate, producing a stronger
interest in addressing this issue and examining memes for content moderation. Framing
can be used to categorize the entities engaged in a meme as heroes, villains, victims or
others, so that readers may better anticipate and understand their behaviours and
attitudes as characters. In this report we discuss the different pre-processing techniques
used, along with four approaches to role label the entities of a meme as hero, villain,
victim or other through techniques such as Named Entity Recognition (NER), sentiment
analysis and image captioning. We took up this project as part of the Shared
Task@Constraint 2022 competition, organized by Indraprastha Institute of Information
Technology Delhi (IIIT-Delhi) and collocated with ACL 2022. We are pleased to report that
the first two approaches discussed here were accepted for the competition,
securing eighth position with an F1-score of 23.855 and enabling us to publish
a research paper at ACL 2022.
Shared Task@Constraint 2022: https://lcs2.iiitd.edu.in/CONSTRAINT-2022/
Published research paper: https://aclanthology.org/2022.constraint-1.3.pdf
Table of Contents
Acknowledgements
Abstract
1 Introduction
2 Literature Review
4 Prerequisites
4.1 VADER Sentiment
4.2 Wu-Palmer Similarity
4.3 Wu-Palmer Similarity Calculation
4.4 Image Captioning
4.4.1 Architecture of Inception-v3 Model
4.4.2 The Final Inception v3 Model
4.4.3 Architecture of Long Short-Term Memory (LSTM)
4.4.4 Image Captioning using Inception-v3 and LSTM
5 Methodology
5.1 Data Pre-Processing
5.1.1 Entity Sentence Linking
5.1.2 Creation of Role-dictionaries
6 Results
References
Chapter 1
Introduction
Easy access to the internet and technology has attracted today's youth to social media.
These applications offer a large platform for users to communicate with others and share
their thoughts and opinions. With these advantages comes a disadvantage: many people
exploit the platform to spread offensive content on social media under the guise of
freedom of expression [1]. This incendiary material is usually directed towards a single
person, a small group of people, a religious group, or a community.
People create offensive content and aggressively spread it over social media [2, 3].
This type of information is created for many purposes, including commercial and political
benefit [4, 5]. Such communication can disturb societal harmony and spark riots. It also
has the potential to cause a negative psychological impact on readers and to harm
people's emotions and behavior [6, 7]. As a result, identifying such content is crucial.
Further, researchers, politicians, and investors are working to build a reliable method
for dissecting the dangerous memes present on the internet.
Framing allows a communication source to portray and describe a problem within a
"field of meaning" by employing conventional narrative patterns and cultural references
[8]. By connecting with readers' existing knowledge, cultural narratives, and moral
standards, framing helps to construct events [9]. It can portray the characters in a story
as heroes, villains, or victims, making it easier for the audience to anticipate and
comprehend their attitudes, beliefs, decisions, and actions.
Narrative frames can be found in various media, including memes, films, literature,
and the news. Narrators use emotionality to plainly distinguish between good and evil
through vivid descriptions of victimization, heroism, and villainy, which is a major
feature of popular storytelling culture [10]. Positive adjectives are used to portray
heroes, whereas negative terms depict victims and villains.
Chapter 2
Literature Review
Studies on detecting harmful memes using various multi-modal frameworks and large
datasets are crucial to research on hate speech, offense and online harm detection
([25], [22]). Some studies suggest that additional cues involving common-sense
knowledge [29], semantic entities, and protected categories [30], along with other meta
information, can be explored to characterize the online harm conveyed by memes at
various levels of granularity.
Since memes are a mix of picture and text data, it is equally vital in role detection
to obtain context from both the image and the text. (Kun et al., 2022) [31] attempted
to get the context of the image by making use of celebrity face detection using Giphy's
GitHub repository, followed by a sub-image detector using YOLOv5. Their work, based on
utilizing a mix of ensembled models such as DeBERTa [32], RoBERTa [33], ViLT [34] and
EfficientNetB7 [35], shows that memes can be studied from both the image and text
context. (Singh et al., 2022) [36] gave a new way of approaching the text data by
formulating the problem as a Multiple Choice Question Answering (MCQA) task. (Zhou et
al., 2022) [37] leveraged a Visual Commonsense Reasoning (VCR) framework along with
some ensemble models to get greater context from the image.
Although many of the approaches are theoretical, hardly any attempts have been made
to role label the entities that had been exalted, demonized, or victimized ([38]).
Instead, studies were conducted to see how satire delivered through internet memes
affects brand image ([39]). We tried a different approach based on sentiment and
lexicons to associate sentiment polarity for role labelling.
Giphy celebrity detector: https://github.com/Giphy/celeb-detection-oss
YOLOv5: https://github.com/ultralytics/yolov5
Chapter 3
• Victim: Entity which is victimized by the negative impact of someone else's
actions.
Constraint shared task: https://constraint-lcs2.github.io/
CodaLab competition: https://codalab.lisn.upsaclay.fr/competitions/906
3.2 Dataset Description
Figure 3.1: Examples of entities portrayed as heroes, villains, victims and others within
memes.
Each item of the train and validation datasets contains an image representing the
meme and its metadata, which contains the pre-extracted OCR text along with its entities
mapped to the Hero, Villain, Victim & Other categories. A sample from the train &
validation datasets can be seen in Figure 3.2a. Similarly, each item of the test dataset
contains an image representing the meme and its metadata, i.e., the pre-extracted OCR
text along with the entities. A sample from the test dataset is shown in Figure 3.2b.
The dataset contains a total of 6933 memes organized into three parts: train,
validation and test sets. It is well balanced domain-wise, with 3381 COVID-19 and 3552
US Politics memes. It is split into 5552 train samples, 650 validation samples and 731
test samples. Table 3.1 gives a detailed domain-wise distribution of the dataset.
Table 3.1: Domain-wise (COVID-19, US Politics) distribution of the dataset across the
train, validation and test splits.

3.3 Exploratory Data Analysis
The category-wise distribution shown in Figure 3.3 shows that 78% of the entities
sit in the other category and only 3% in the hero category. The top-50-words entity
wordcloud in Figure 3.4 gives insights about the most repeated entities across the
dataset. The sentiment polarity score distribution plot shown in Figure 3.5a confirms
that most of the OCR text data is neutral, with a sentiment polarity ranging between
-0.025 and 0.0249. A detailed distribution across different features is shown in
Figure 3.5.
Figure 3.5: Distributions over the OCR text: (a) sentiment polarity score, (b) parts of
speech, (e) average word length, (f) top 50 unigrams, (g) top 50 bigrams, (h) top 50
trigrams.
Chapter 4
Prerequisites
4.3 Wu-Palmer Similarity Calculation
The similarity score lies in (0, 1], i.e., 0 < similarity-score ≤ 1. The similarity
score can never be zero because the depth of the LCS (Least Common Subsumer) is never
zero (the depth of the root of the taxonomy is one).
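The calculation can be sketched on a toy taxonomy (the node names below are invented for illustration): the score is 2 × depth(LCS) divided by the sum of the two concepts' depths, and since the root sits at depth 1, the score is always positive.

```python
# Toy taxonomy as a child -> parent map; "entity" is the root (depth 1).
parent = {"entity": None, "animal": "entity", "dog": "animal", "cat": "animal"}

def ancestors(node):
    """Path from the node up to the root, node first."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def depth(node):
    # Root depth is 1, so every depth (and hence every LCS depth) is > 0.
    return len(ancestors(node))

def lcs(a, b):
    """Least Common Subsumer: the deepest ancestor shared by both nodes."""
    shared = set(ancestors(b))
    for node in ancestors(a):        # ordered deepest-first from a
        if node in shared:
            return node

def wup_similarity(a, b):
    return 2 * depth(lcs(a, b)) / (depth(a) + depth(b))

print(wup_similarity("dog", "cat"))  # 2*2 / (3+3) ≈ 0.667
```

In practice the same formula is applied over the full WordNet taxonomy (e.g. via NLTK's `wup_similarity`) rather than a hand-made parent map.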
4.4 Image Captioning

4.4.1 Architecture of Inception-v3 Model
Inception-v3 is an improved version of the base model Inception-v1, which was introduced
as GoogLeNet in 2014; as the name suggests, it was developed by a team at Google.
The Inception-v3 model was released in 2015; it has a total of 42 layers and
a lower error rate than its predecessors. Let us look at the different optimizations
that make the Inception-v3 model better. The major features of the Inception-v3 model
are:
The implementation of the above features and optimizations was done as shown in
Figures-(4.3,4.4,4.5)
The goal of using an auxiliary classifier is to improve the convergence of very deep
neural networks. In very deep networks, the auxiliary classifier is primarily employed
to tackle the vanishing gradient problem.
In the early stages of the training, the auxiliary classifiers made no difference. How-
ever, in the end, the network with auxiliary classifiers outperformed the network without
them in terms of accuracy.
As a result, the auxiliary classifiers in the Inception-V3 model architecture operate
as a regularizer.
In total, the Inception-v3 model is made up of 42 layers, which is a bit higher than
the previous Inception-v1 and v2 models, but the efficiency of this model is really
impressive. The components of the Inception-v3 model are shown in Figure 4.6.
After performing all the optimizations, the final Inception-v3 model is as shown in
Figure 4.7.
4.4.3 Architecture of Long Short-Term Memory (LSTM)
The basic difference between the architectures of RNNs and LSTMs is that the hidden
layer of an LSTM is a gated unit or gated cell. It consists of four layers that interact
with one another to produce the output of the cell along with the cell state; these
two are then passed on to the next hidden layer. Unlike RNNs, which have only a
single tanh neural-net layer, LSTMs comprise three logistic sigmoid gates and
one tanh layer, as shown in Figure 4.8. Gates were introduced to limit the
information that is passed through the cell. They determine which part of the information
will be needed by the next cell and which part is to be discarded. A gate's output lies
in the range 0 to 1, where '0' means 'reject all' and '1' means 'include all'.
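The interaction of the three sigmoid gates and the tanh layer can be sketched for a single scalar LSTM cell; the weight layout below is a simplification for illustration, not any particular library's parametrization.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_cell(x, h_prev, c_prev, w):
    """One scalar LSTM step. w maps each layer name to (w_x, w_h, bias)."""
    # Three logistic sigmoid gates, each squashed into (0, 1):
    # 0 means 'reject all', 1 means 'include all'.
    f = sigmoid(w["f"][0] * x + w["f"][1] * h_prev + w["f"][2])  # forget gate
    i = sigmoid(w["i"][0] * x + w["i"][1] * h_prev + w["i"][2])  # input gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h_prev + w["o"][2])  # output gate
    # One tanh layer proposing a candidate cell update.
    g = math.tanh(w["g"][0] * x + w["g"][1] * h_prev + w["g"][2])
    c = f * c_prev + i * g          # new cell state
    h = o * math.tanh(c)            # output of the cell
    return h, c                     # both are passed to the next hidden layer

w = {k: (0.5, 0.5, 0.0) for k in "fiog"}
h, c = lstm_cell(x=1.0, h_prev=0.0, c_prev=0.0, w=w)
```

Real implementations vectorize this over weight matrices, but the gating logic is exactly the four interacting layers described above.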
Chapter 5
Methodology
During the course work, we proposed two frameworks based on two different
methods. In the first method, we perform entity recognition and then sentiment analysis
(VADER sentiment). The second method performs entity recognition followed by
Wu-Palmer similarity [57] to calculate the similarity scores of the entities with each
of the roles, i.e., hero, villain, victim and other. Image captioning was then utilized
to derive more context from the memes and to aid both frameworks in the task of role
labelling of entities.
5.1 Data Pre-Processing

5.1.1 Entity Sentence Linking

We link each entity in a particular meme to its corresponding parts of the sentence
(the words surrounding the entity) present in the OCR text of that respective meme. We
make the fair assumption that words nearer to an entity weigh more in its role
assignment than words farther from it.
So first, we search for the entity's occurrence in the OCR sentences. Then, using a
window approach (i.e., selecting the n words occurring before the entity and the n words
occurring after it), we create a sub-part of that sentence. By doing this over the whole
OCR of that respective meme, we create a list of sub-sentences, one for each entity
present in that particular meme, as shown in Figure 5.1.
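A minimal sketch of this windowing step follows; the helper name is illustrative, and it simplifies to single-word, first-occurrence matching (the default n = 3 matches the window size reported in the experiments).

```python
def link_entity(ocr_text, entity, n=3):
    """Return the sub-sentence of the n words before and after the entity.

    Simplified sketch: single-word window matching, case-insensitive,
    ignoring trailing punctuation. A full pipeline would also handle
    multi-word entities and repeated occurrences.
    """
    words = ocr_text.split()
    target = entity.lower()
    for i, w in enumerate(words):
        if w.lower().strip(",.!?") == target:
            start = max(0, i - n)             # n words before the entity
            end = min(len(words), i + 1 + n)  # n words after the entity
            return " ".join(words[start:end])
    return ""  # entity absent from the OCR text

print(link_entity("the vaccine saved millions of lives worldwide", "vaccine"))
# -> "the vaccine saved millions of"
```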
5.1.2 Creation of Role-dictionaries

People use framing to contextualize events by connecting with readers' prior knowledge,
cultural narratives, and moral values. It helps to present the agents involved in a
story as heroes, villains, victims, or none, so that readers can more easily anticipate
and comprehend the attitudes, beliefs, decisions, and actions of the agents portrayed.
Framing theory concludes that generally positive phrases are used to characterise
heroes, whereas negative terms are used to depict victims and villains. In general,
heroes represent bravery, great accomplishments, or other noble attributes, whereas
villains represent malicious intent, planning, and other undesirable characteristics.
Thus, iterating through the whole OCR, we curated three word dictionaries, namely the
hero dictionary, the villain dictionary and the victim dictionary, as shown in Figure
5.2. The hero dictionary contains terms of positive sentiment which would generally be
used to represent the role of heroes, while the villain and victim dictionaries contain
words of negative sentiment which are generally used to represent villains and victims
respectively.
For each word in the English dictionary (the WordNet dictionary), we calculate its
similarity score (using Wu-Palmer similarity) with all the words present in the hero
dictionary, the villain dictionary and the victim dictionary respectively to determine
its similarity with the roles of hero, villain and victim, and normalize those scores
to create the similarity dictionary.
To determine a word's similarity score with a given role, we calculate that word's
similarity score against all the words in the corresponding role dictionary and take
the normalized sum as the similarity score; this is done in turn with the hero, villain
and victim dictionaries.
In the similarity dictionary, for each word the first entry represents the similarity
score with the hero role, the second entry the score with the villain role, and the
third entry the score with the victim role, as shown in Figure 5.3.
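The construction of one similarity-dictionary entry can be sketched as follows; the three tiny role dictionaries and the string-overlap stand-in for Wu-Palmer similarity are illustrative assumptions, not the curated dictionaries used in the report.

```python
import difflib

def word_similarity(a, b):
    # Stand-in for WordNet Wu-Palmer similarity, keeping the sketch
    # self-contained; any similarity in (0, 1] would slot in here.
    return difflib.SequenceMatcher(None, a, b).ratio()

# Illustrative (hand-invented) role dictionaries.
role_dicts = {
    "hero":    ["brave", "noble", "saviour"],
    "villain": ["corrupt", "cruel", "evil"],
    "victim":  ["oppressed", "suffering", "harmed"],
}

def similarity_entry(word):
    """[hero, villain, victim] scores for one word, normalized to sum to 1."""
    raw = {role: sum(word_similarity(word, t) for t in terms)
           for role, terms in role_dicts.items()}
    total = sum(raw.values())
    return [raw[r] / total for r in ("hero", "villain", "victim")]

entry = similarity_entry("bravery")
```

Running this over every WordNet word against the hand-curated dictionaries yields the full similarity dictionary of Figure 5.3.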
5.2 Methods and Models

5.2.1 Framework-I
1. For each entity given in a particular meme, identify the words close to the entity
(i.e., its surrounding words) through entity sentence linking.
2. Perform sentiment analysis to determine the polarity of these words, thus making
out the sentiment attributed to the entity.
3. Use the sentiment polarity to role label the entities according to the proposed
semantic classes.
After performing entity sentence linking, we determine the sentiment score of the
words (sub-sentences) linked with an entity; we do this for all the entities mentioned
in that particular meme. To do this, we calculate the sentiment (i.e., word polarity)
of each word using a standard toolkit like VADER-Sentiment (as it has a huge vocabulary
of word polarities), thus getting a polarity for each word which ranges between [-1, 1]
(i.e., very negative to very positive). These sentiment polarities are then summed up for
each sentence. Finally, the sentiment polarities for each sentence are normalized and
then averaged to get the overall sentiment ascribed to the entity [58].
Since the hero role is associated with positive words carrying positive sentiment,
entities with positive sentiment attributed to them are role labelled as hero.
Similarly, victims and villains are associated with negative words carrying negative
sentiment, so entities with negative sentiment attributed to them are role labelled as
villain or victim based on the magnitude of the negativity. If the words (sub-sentences)
have no polarity, they neither glorify, vilify nor victimize any entity and are thus
semantically similar to the class "other", as described in Figure 5.4.
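The polarity-to-role mapping above can be sketched as follows; the miniature lexicon and the cut-off values are invented stand-ins for VADER's full vocabulary and for thresholds that would need tuning.

```python
# Invented stand-in lexicon: word -> polarity in [-1, 1].
LEXICON = {"saved": 0.8, "brave": 0.7, "destroyed": -0.8,
           "corrupt": -0.7, "suffering": -0.3, "poor": -0.2}

def entity_sentiment(sub_sentence):
    """Average word polarity of the entity's linked sub-sentence."""
    scores = [LEXICON.get(w, 0.0) for w in sub_sentence.lower().split()]
    return sum(scores) / len(scores) if scores else 0.0

def role_from_polarity(p, neutral=0.05, strong_negative=-0.3):
    if p > neutral:
        return "hero"            # positive sentiment glorifies
    if p < strong_negative:
        return "villain"         # strongly negative sentiment vilifies
    if p < -neutral:
        return "victim"          # mildly negative sentiment victimizes
    return "other"               # no polarity: neither glorified nor vilified

print(role_from_polarity(entity_sentiment("the doctor saved lives")))  # hero
```

With VADER itself, `entity_sentiment` would be replaced by `SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`.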
VADER-Sentiment: https://pypi.org/project/vaderSentiment/
5.2.2 Framework-II
1. For each entity given in a particular meme, identify the words close to the entity
(i.e., its surrounding words) through entity sentence linking.
2. Determine the resemblance of these words to the words used to describe heroes,
villains, and victims by curating word sets or dictionaries for each role.
3. Role label the entities by analyzing their similarity scores with those of hero,
villain, and victim. If the scores are zero or almost the same, assign the "other" class.
After performing entity sentence linking, we create three dictionaries, one each for
hero, villain, and victim, containing the words or terms similar to them. Then, using a
method like Wu-Palmer similarity, we calculate the similarity score of each word from
the entity-sentence-linking step with the hero, villain and victim dictionaries, which
were crafted by hand by going through the whole OCR, to create the similarity dictionary
(Figure 5.3).
The similarity score for each entity is then determined by summing the similarity
scores of all the words found in its sub-sentences. This is normalized to get the overall
similarity of a particular entity with the roles of hero, villain, victim, and other.
Using these similarity scores, we assign an entity to the role whose similarity score is
the highest [58]. For example, if an entity has the highest similarity score with the
hero role, we role label that entity as hero, and similarly for all the other roles. If
the similarity scores with each of the roles are almost equal or zero, we assign it to
the class "other" in the proposed role assignment approach, as described in Figure 5.5.
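The aggregation and tie handling can be sketched as follows; the similarity-dictionary entries and the tie margin are illustrative assumptions, not values from the report.

```python
# Illustrative similarity-dictionary entries: word -> [hero, villain, victim].
SIMILARITY_DICT = {
    "saved":   [0.7, 0.1, 0.2],
    "corrupt": [0.1, 0.8, 0.1],
    "poor":    [0.2, 0.1, 0.7],
}
ROLES = ("hero", "villain", "victim")

def assign_role(linked_words, tie_margin=0.05):
    """Sum per-role scores over the linked words, normalize, pick the max."""
    totals = [0.0, 0.0, 0.0]
    for word in linked_words:
        for i, score in enumerate(SIMILARITY_DICT.get(word, (0.0,) * 3)):
            totals[i] += score
    norm = sum(totals)
    if norm == 0.0:
        return "other"                       # all scores zero
    shares = sorted(t / norm for t in totals)
    if shares[-1] - shares[-2] < tie_margin:
        return "other"                       # scores almost the same
    return ROLES[totals.index(max(totals))]

print(assign_role(["the", "saved", "lives"]))   # hero
print(assign_role(["completely", "unseen"]))    # other
```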
Wu-Palmer similarity: https://arxiv.org/ftp/arxiv/papers/1310/1310.8059.pdf
Figure 5.5: Framework-II pipeline: entity sentence linking produces the linked terms for
each entity in the meme's entity list; similarity scores are then calculated (using the
Wu-Palmer method) against the hero, villain, victim and other dictionaries; and finally
roles are assigned from the similarity scores.
5.2.3 Framework-III
1. For each meme image, generate the image caption and combine the generated caption
with the initial OCR of the respective meme.
2. For each entity given in a particular meme, identify the words close to the entity
(i.e., its surrounding words) through entity sentence linking.
3. Perform sentiment analysis to determine the polarity of these words, thus making
out the sentiment attributed to the entity.
4. Use the sentiment polarity to role label the entities according to the proposed
semantic classes.
To gain greater context from the image, we generate captions from the images (memes)
using machine learning models, namely Inception-v3 along with an LSTM as shown in
Figure 4.11, and use them to supplement the OCR.
Then, after performing entity sentence linking, we determine the sentiment scores of
the words (sub-sentences) linked with each entity using VADER-Sentiment and role label
the entities from their sentiment polarity exactly as in Framework-I [58], as described
in Figure 5.6.
5.2.4 Framework-IV
1. For each meme image, generate the image caption and combine the generated caption
with the initial OCR of the respective meme.
2. For each entity given in a particular meme, identify the words close to the entity
(i.e., its surrounding words) through entity sentence linking.
3. Determine the resemblance of these words to the words used to describe heroes,
villains, and victims by curating word sets or dictionaries for each role.
4. Role label the entities by analyzing their similarity scores with those of hero,
villain, and victim. If the scores are zero or almost the same, assign the "other" class.
To gain greater context from the image, we generate captions from the images (memes)
using machine learning models, namely Inception-v3 along with an LSTM as shown in
Figure 4.11, and use them to supplement the OCR.
Then, after performing entity sentence linking, we create the hero, villain and victim
dictionaries, build the similarity dictionary using Wu-Palmer similarity, and role label
each entity by its highest normalized similarity score exactly as in Framework-II [58],
as described in Figure 5.7.
Chapter 6
Results
For the competition, teams were ranked based on macro F1-Score across all the
classes. The suggested method and model secured the eighth position in the competi-
tion for the task of dissecting harmful memes for Semantic role-labeling of entities. Table
6.2 shows the rankings of various teams, and the performance of the proposed system is
indicated in bold letters. The output for a meme from the test sample is shown in Figure
6.1. The figure contains both Framework-I and Framework-II generated role labels.
The model performs well in the role labeling task. However, in some cases it
underperforms in identifying the categories due to the difficulty of capturing some
of the attributes or traits related to the roles. As a result, the overall system's
macro F1-score has been low, at 23.855, as shown in Table 6.1. In addition, the
ensembling of multiple NLP sub-tasks has also contributed to the decrease in the
system's F1-score. The system's performance can be further improved by modeling those
NLP sub-tasks in the proposed methods with better parameters, which could potentially
increase the score.
Table 6.2: Competition rankings by macro F1-score (the proposed system is Team IIITDWD).

Rank  Team                           Macro F1-Score
1     Shiroe                         58.671
2     jayeshbanukoti                 56.005
3     c1pher                         55.240
4     zhouziming                     54.707
5     smontariol                     48.483
6     zjl123001                      46.177
7     amanpriyanshu                  31.943
8     Team IIITDWD (fharookshaik)    23.855
9     rabindra.nath                  23.717
The current system implementations use NLP techniques such as entity recognition,
sentiment analysis, and word sets and dictionaries, along with some machine learning,
all of which have shown promising results in the role labeling task. Across all classes,
the existing system implementation produced a good F1-score. However, as the model
is based on simple proximity measures, it has issues when dealing with OCR text that
contains composite grammatical structures such as indirect speech, passive voice, etc.
In this experiment, the n-word window size used for data processing is n = 3. As a
result, there is potential for various future changes to increase the system's
performance.
We also aim to implement image feature recognition on memes with the goal of
recognising facial traits or emotions that can be utilised to determine the meme’s senti-
ment in cases when OCR is unable to do so.
Further, in future experiments and add-ons, we plan to leverage state-of-the-art
(SOTA) machine learning models such as SVMs to discover distinct sentiment polarity
boundaries for the various sub-tasks, enhancing their working and thereby improving
the system's role labeling performance.
References
[4] J. Goodwin, J. M. Jasper, and F. Polletta, “Passionate politics: Emotions and social
movements,” University of Chicago Press, 2009.
[5] S. Biradar, S. Saumya, and A. Chauhan, “Combating the infodemic: Covid-19 in-
duced fake news recognition in social media networks,” Complex & Intelligent Sys-
tems, pp. 1–13, 2022.
[6] S. Stieglitz and L. Dang-Xuan, “Emotions and information diffusion in social me-
dia—sentiment of microblogs and sharing behavior,” Journal of management infor-
mation systems, 2013.
[7] S. Biradar, S. Saumya, and A. Chauhan, “Hate or non-hate: Translation based hate
speech identification in code-mixed hinglish data set,” in 2021 IEEE International
Conference on Big Data (Big Data). IEEE, 2021, pp. 2470–2475.
[9] M. C. Green, “Transportation into narrative worlds: The role of prior knowledge and
perceived realism,” Discourse processes.
[10] D. Gomez-Zara, M. Boon, and L. Birnbaum, “Detection of roles in news articles us-
ing natural language techniques,” 23rd International Conference on Intelligent User
Interfaces, 2018.
[15] S. Vosoughi, D. Roy, and S. Aral, “The spread of true and false news online,” Sci-
ence, vol. 359, no. 6380, pp. 1146–1151, 2018.
[16] S. MacAvaney, H.-R. Yao, E. Yang, K. Russell, N. Goharian, and O. Frieder, “Hate
speech detection: Challenges and solutions,” PloS one, vol. 14, no. 8, p. e0221152,
2019.
[21] Y. Zhou and Z. Chen, “Multimodal learning for hateful memes detection,” 2021
IEEE International Conference on Multimedia & Expo Workshops (ICMEW), pp.
1–6, 2021.
[24] S. Zidani and R. Moran, “Memes and the spread of misinformation: Establishing
the importance of media literacy in the era of information disorder,” Teaching Media
Quarterly, vol. 9, no. 1, 2021.
[28] H. Fujita and A. Selamat, “Hate crime on twitter: Aspect-based sentiment analysis
approach,” in Advancing Technology Industrialization Through Intelligent Software
Methodologies, Tools and Techniques: Proceedings of the 18th International Con-
ference on New Trends in Intelligent Software Methodologies, Tools and Techniques
(SoMeT_19), vol. 318. IOS Press, 2019, p. 284.
[31] L. Kun, J. Bankoti, and D. Kiskovski, “Logically at the constraint 2022: Multimodal
role labelling,” in Proceedings of the Workshop on Combating Online Hostile
Posts in Regional Languages during Emergency Situations. Dublin, Ireland:
Association for Computational Linguistics, May 2022, pp. 24–34. [Online].
Available: https://aclanthology.org/2022.constraint-1.4
[32] P. He, X. Liu, J. Gao, and W. Chen, “Deberta: Decoding-enhanced bert with disen-
tangled attention,” 2021.
[33] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettle-
moyer, and V. Stoyanov, “Roberta: A robustly optimized bert pretraining approach,”
2019.
[34] W. Kim, B. Son, and I. Kim, “Vilt: Vision-and-language transformer without con-
volution or region supervision,” 2021.
[35] M. Tan and Q. V. Le, “Efficientnet: Rethinking model scaling for convolutional
neural networks,” 2020.
[42] I. Feinerer and K. Hornik, wordnet: WordNet Interface, 2020, r package version
0.1-15. [Online]. Available: https://CRAN.R-project.org/package=wordnet
[45] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan, “Show and tell: Lessons learned
from the 2015 mscoco image captioning challenge,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, vol. 39, no. 4, pp. 652–663, 2017.
[47] J.-Y. Pan, H.-J. Yang, P. Duygulu, and C. Faloutsos, “Automatic image captioning,”
in 2004 IEEE International Conference on Multimedia and Expo (ICME) (IEEE
Cat. No.04TH8763), vol. 3, 2004, pp. 1987–1990 Vol.3.
[50] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the
inception architecture for computer vision,” in Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), June 2016.
[51] M. Sundermeyer, R. Schlüter, and H. Ney, “Lstm neural networks for language mod-
eling,” in Thirteenth annual conference of the international speech communication
association, 2012.
[52] Y. Yu, X. Si, C. Hu, and J. Zhang, “A Review of Recurrent Neural Networks:
LSTM Cells and Network Architectures,” Neural Computation, vol. 31, no. 7, pp.
1235–1270, 07 2019. [Online]. Available: https://doi.org/10.1162/neco_a_01199
[53] S. Wang and J. Jiang, “Learning natural language inference with LSTM,” CoRR,
vol. abs/1512.08849, 2015. [Online]. Available: http://arxiv.org/abs/1512.08849
[54] J. A. Alzubi, R. Jain, P. Nagrath, S. Satapathy, S. Taneja, and P. Gupta, “Deep image
captioning using an ensemble of cnn and lstm based deep neural networks,” Journal
of Intelligent & Fuzzy Systems, vol. 40, no. 4, pp. 5761–5769, 2021.
[55] Hartatik, H. Al Fatta, and U. Fajar, “Captioning image using convolutional neural
network (cnn) and long-short term memory (lstm),” in 2019 International Seminar
on Research of Information Technology and Intelligent Systems (ISRITI), 2019, pp.
263–268.
[56] P. Shah, V. Bakrola, and S. Pati, “Image captioning using deep neural architectures,”
in 2017 International Conference on Innovations in Information, Embedded and
Communication Systems (ICIIECS), 2017, pp. 1–4.
[57] S. Bird, E. Klein, and E. Loper, “Natural language processing with Python: analyzing
text with the natural language toolkit,” O’Reilly Media, Inc, 2009.