1 Introduction
The COVID-19 pandemic is considered the defining global public health crisis of our time and the greatest challenge the world has faced since World War II. COVID-19, a contagious disease caused by a coronavirus, had caused more than 75 million confirmed cases and 1.7 million deaths across the world as of December 2020¹. Unfortunately, misinformation about COVID-19 has accelerated the spread of the disease and sown chaos among people. At the Munich Security Conference held on February 15, 2020, World Health Organization (WHO) Director-General Tedros Adhanom Ghebreyesus [2] stated that the world was fighting not only a pandemic but also an infodemic. We should therefore address the challenge of fake news detection to stop the spread of COVID-19 misinformation.
Since the global pandemic affects everyone, a broad public is seeking information about COVID-19, and their safety is threatened by adversarial agents invested in spreading fake news for economic and political reasons. Moreover, in medical and public health matters it is hard for information to be fully valid and factual, leading to inconsistencies that fake news makes worse. This difficulty
¹ https://www.worldometers.info/coronavirus/
2 Gundapu and Mamidi
2 Related Work
Fake News Detection: Fake news can be defined as inaccurate and misleading information that is spread knowingly or unknowingly [3]. Recognizing the spread of false information such as rumors, fake news, propaganda, hoaxes, spear phishing, and conspiracy theories is an essential task for natural language processing [4]. Gartner's research [5] predicted that by 2022, most people in advanced economies would consume more false information than truthful information.
Many automated misinformation detection architectures have been developed to date. Rohit et al. [6] provided an extensive survey of fake news detection on various online social networks. Ghorbani et al. [7] presented an inclusive overview of recent studies related to misinformation. Furthermore, they described the impact of misleading information, reviewed state-of-the-art fake news detection systems, and explored the available disinformation detection datasets. The majority of fake news detection models are built with supervised machine learning algorithms that classify data as misleading or not [8]. This supervised classification is performed by comparing the user's input text against previously compiled corpora containing genuine and misleading information [9].
Aswini et al. [10] proposed a deep learning architecture with various word embeddings for the Fake News Challenge (FNC-1) dataset². They developed the
² http://www.fakenewschallenge.org/
architecture to accurately predict the stance between a given pair of news headlines and the corresponding article body. On the same FNC-1 dataset, Sean et al. [11] developed TalosComb, a weighted average of two models: TalosCNN, a convolutional neural network with pre-trained word2vec embeddings, and TalosTree, a gradient-boosted decision tree model using SVD, word-count, and TF-IDF features. By analyzing the relationship between a news headline and the corresponding article, Heejung et al. [12] designed a Bidirectional Encoder Representations from Transformers (BERT) model to detect misleading news articles.
3 Dataset Description
Table 1 shows the corpus size and label distribution; the labels in each dataset are roughly balanced. Table 2 shows some examples from the English COVID-19 fake news detection dataset. Figures 1(a) and 1(b) illustrate word clouds of the most frequent words in the real and fake data points after stop-word removal. In Figure 1(a), we can see unique words that often occur in real-labeled data points but rarely in Figure 1(b), such as "covid19", "discharged", "confirmed", "testing", "indiafightscorona", and "indiawin"; meanwhile, Figure 1(b) shows unique words that frequently appear in the fake articles but rarely in the real-labeled data points, including "coronavirus", "kill", "muslim", "hydroxychloroquine", "china", and "facebook post". These frequent words provide important signals for differentiating real data points from fake ones.
4 Methodology
In this section, we present our transformer-based ensemble model, which is trained and tuned on the dataset reported in the previous section. We compare our
approach with various machine learning (ML) and deep learning (DL) models
with different word embeddings. The full code of our system architecture can be found on GitHub4.
In the preprocessing step, we forward each tokenized tweet through a pipeline that eliminates noise in the fake news dataset by removing or normalizing unnecessary tokens. The preprocessing pipeline includes the following subparts:
1. Emoticon Conversion: In this step, we converted each emoji in the tweet to text. Example: 😷 → "face with medical mask"
2. Handling of Hashtags: We identified hashtag tokens by the pound (#) sign and split them on digits or capital letters. Example: #IndiaFightsCorona → "India Fights Corona"
3. Stemming: We removed inflectional morphemes such as "ed", "est", "s", and "ing" from the token stems. Example: confirmed → "confirm" + "-ed"
4. Text Cleaning: We used this step to remove irrelevant data: punctuation marks, digits, and non-ASCII glyphs from the tweet.
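The pipeline steps above can be sketched as follows. This is a minimal illustration rather than the authors' code: the emoji map, helper names, and regex rules are assumptions, and the stemming step is omitted (a stemmer such as NLTK's PorterStemmer could fill it in).

```python
import re

# Illustrative sketch of the preprocessing pipeline; the emoji map and
# regexes are assumptions, and stemming (step 3) is omitted for brevity.
EMOJI_MAP = {"\U0001F637": "face with medical mask"}  # extend as needed

def split_hashtag(tag):
    # "#IndiaFightsCorona" -> "India Fights Corona" (split on capitals/digits)
    return " ".join(re.findall(r"[A-Z]+[a-z]*|[a-z]+|\d+", tag.lstrip("#")))

def preprocess(tweet):
    for emo, text in EMOJI_MAP.items():                  # 1. emoticon conversion
        tweet = tweet.replace(emo, " " + text + " ")
    tweet = re.sub(r"#\w+", lambda m: split_hashtag(m.group()), tweet)  # 2. hashtags
    tweet = re.sub(r"[^A-Za-z\s]", " ", tweet)           # 4. drop punctuation/digits/non-ASCII
    return re.sub(r"\s+", " ", tweet).strip().lower()

print(preprocess("#IndiaFightsCorona 120 confirmed \U0001F637"))
# -> india fights corona confirmed face with medical mask
```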
For the classical ML models, we experimented with the following feature representations:
1. Word-level, n-gram-level, and character-level TF-IDF vectors with a feature matrix size of 100,000.
2. English GloVe [23] word embeddings with a dimension of 300.
3. TF-IDF weighted averaging of GloVe embeddings. The fake news vector construction is described below.
Tweet_vector = ( Σ_{i=1}^{N} tf-idf(token_i) × GloVe(token_i) ) / N        (1)
In the above formula, N is the total number of tokens in the input fake news tweet, and token_i is the i-th token in the input text. After analyzing the results, TF-IDF weighted averaging gave better results than the standard TF-IDF.
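Equation (1) can be implemented directly; in this sketch, `tfidf` and `glove` are stand-ins for a fitted TF-IDF weighting and a loaded 300-dimensional GloVe table, both assumptions for illustration.

```python
import numpy as np

# Sketch of Eq. (1): TF-IDF weighted averaging of GloVe vectors.
# `tfidf` maps token -> weight, `glove` maps token -> embedding vector;
# both are illustrative stand-ins for the real fitted model and table.
def tweet_vector(tokens, tfidf, glove, dim=300):
    total = np.zeros(dim)
    for tok in tokens:                      # sum tf-idf(token_i) * GloVe(token_i)
        total += tfidf.get(tok, 0.0) * glove.get(tok, np.zeros(dim))
    return total / max(len(tokens), 1)      # divide by N, the token count
```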
LSTM: We used the Long Short-Term Memory (LSTM) [24] architecture with two different pre-trained word embeddings, GloVe and FastText [25]. LSTM is a type of Recurrent Neural Network (RNN) that can solve the long-term dependency problem, and it is well suited to sequence classification.
We converted the input data points into word vectors using the pre-trained word embeddings. These word vectors are passed as input to the LSTM layers: we stacked two LSTM layers of hidden size 128, one after another, with a dropout of 0.25. The final time step's output is treated as the input data point representation and passed to a dense layer for fake news detection.
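A minimal PyTorch sketch of this stacked-LSTM classifier (two layers, hidden size 128, dropout 0.25, last time step into a dense layer); the vocabulary size is a placeholder, and in practice the embedding matrix would be initialized from GloVe or FastText rather than learned from scratch.

```python
import torch
import torch.nn as nn

# Sketch of the stacked LSTM classifier: two LSTM layers, hidden size 128,
# dropout 0.25, last-time-step output fed to a dense layer. The vocabulary
# size is a placeholder; real embeddings would come from GloVe/FastText.
class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=300, hidden=128, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, num_layers=2,
                            dropout=0.25, batch_first=True)
        self.fc = nn.Linear(hidden, classes)

    def forward(self, x):                  # x: (batch, seq_len) token ids
        out, _ = self.lstm(self.embed(x))  # out: (batch, seq_len, hidden)
        return self.fc(out[:, -1, :])      # last time step -> class logits
```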
BiLSTM with Attention: Not all tokens in the input text contribute equally to its representation, so we leverage a word attention [26] mechanism to capture each token's influence on the input data point. We built this attention mechanism on top of BiLSTM layers.
The sequence of word vectors is passed through a BiLSTM layer, which contains one forward and one backward LSTM layer. An attention mechanism applied to the output of the BiLSTM layer produces a dense vector, which is forwarded to a fully connected network.
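A PyTorch sketch of this model, under the assumption that the attention scores come from a single learned linear layer over each time step (one common form of word attention); the layer sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Sketch of BiLSTM with word attention: attention weights over time steps
# produce a weighted sum of BiLSTM states (the "dense vector" in the text).
# The single-linear-layer scorer and all sizes are illustrative assumptions.
class BiLSTMAttention(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=300, hidden=128, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.att = nn.Linear(2 * hidden, 1)    # scores each time step
        self.fc = nn.Linear(2 * hidden, classes)

    def forward(self, x):                       # x: (batch, seq_len) token ids
        h, _ = self.bilstm(self.embed(x))       # (batch, seq, 2*hidden)
        w = F.softmax(self.att(h), dim=1)       # attention weights over seq
        context = (w * h).sum(dim=1)            # weighted sum -> dense vector
        return self.fc(context)                 # class logits
```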
CNN: We explored a Convolutional Neural Network (CNN) [27] model for misinformation detection. The model consists of an embedding layer, a convolution layer with three convolutions, a max-pooling layer, and a fully connected network. In the embedding layer, the input texts are converted into an n×d sequence matrix, where n is the length of the input data point and d is the word embedding dimension. In the convolution layer, the sequence matrix is fed through three 1D convolutions with kernel sizes 3, 4, and 5, each with 128 filters. The convolution layer's output is max-pooled over time and concatenated to obtain the input data point representation. The output of the max-pooling layer is passed to a fully connected network with a softmax output layer.
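The CNN described above can be sketched in PyTorch as follows; the vocabulary size is a placeholder, and the final softmax is left to the loss function, as is idiomatic in PyTorch.

```python
import torch
import torch.nn as nn

# Sketch of the text CNN: 1D convolutions with kernel sizes 3, 4, 5 and
# 128 filters each, max-pooled over time, concatenated, then classified.
class TextCNN(nn.Module):
    def __init__(self, vocab_size=10000, emb_dim=300, filters=128, classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            [nn.Conv1d(emb_dim, filters, k) for k in (3, 4, 5)])
        self.fc = nn.Linear(3 * filters, classes)

    def forward(self, x):                      # x: (batch, seq_len) token ids
        e = self.embed(x).transpose(1, 2)      # (batch, emb_dim, seq_len)
        pooled = [c(e).relu().max(dim=2).values for c in self.convs]
        return self.fc(torch.cat(pooled, dim=1))  # softmax applied in the loss
```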
This section describes the three transformer models BERT, ALBERT, and XLNet, explored both individually and as an ensemble. These models outperformed the other ML and DL algorithms. We implemented them using HuggingFace5, a PyTorch transformer library. The hyperparameters of the three models are described in Table 1.
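A generic way to ensemble the three fine-tuned models is soft voting, i.e. averaging their predicted class probabilities and taking the argmax; this sketch shows the idea and is not necessarily the exact combination rule used here.

```python
import numpy as np

# Generic soft-voting ensemble: average the per-class probabilities of the
# three fine-tuned models (e.g. BERT, ALBERT, XLNet) and take the argmax.
# A standard scheme, not necessarily the paper's exact combination rule.
def ensemble_predict(prob_lists):
    # prob_lists: list of (n_samples, n_classes) probability arrays, one per model
    avg = np.mean(prob_lists, axis=0)
    return avg.argmax(axis=1)
```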
network, so it takes into account the entire text passage to comprehend the
meaning of each token.
The BERT implementation has two steps: pre-training and fine-tuning. In the first step, the model is trained on unlabeled data over various pre-training tasks, using a dataset in a particular language or a larger corpus spanning multiple languages. In the second step, all the initialized parameters are fine-tuned using labeled data from the downstream task.
We fine-tuned the pre-trained BERT (Base) model for our COVID-19 fake news detection task. The BERT-base model contains 12 layers of encoder blocks with 12 bidirectional self-attention heads, consumes sequences of up to 512 tokens, and emits a sequence of hidden-vector representations. We added one additional output layer on top of the BERT model to calculate the conditional probability over the output classes, either fake or real. See Figure 1 for the fine-tuned BERT model.
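The added output layer can be sketched as a linear classifier over BERT's 768-dimensional pooled [CLS] representation; the encoder itself (e.g. loaded with HuggingFace's `BertModel.from_pretrained`) is omitted, so `cls_vec` below is a stand-in for its pooled output.

```python
import torch
import torch.nn as nn

# Sketch of the classification head added on top of BERT-base: a dropout
# plus linear layer over the 768-d pooled [CLS] vector, yielding the
# conditional probability over {real, fake}. The pretrained encoder is
# omitted; `cls_vec` stands in for its pooled output.
class FakeNewsHead(nn.Module):
    def __init__(self, hidden=768, classes=2, dropout=0.1):
        super().__init__()
        self.drop = nn.Dropout(dropout)
        self.out = nn.Linear(hidden, classes)

    def forward(self, cls_vec):                     # cls_vec: (batch, hidden)
        return self.out(self.drop(cls_vec)).softmax(dim=-1)

probs = FakeNewsHead()(torch.randn(4, 768))         # (batch, 2) probabilities
```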
XLNet: The XLNet model was trained on a huge dataset using permutation language modeling. This technique is one of the main differences between BERT and XLNet: it uses permutations of the factorization order to learn from the forward and backward directions at the same time. We used the pre-trained XLNet model from HuggingFace, then fine-tuned it with a maximum sequence length of 128 to fit our fake news detection dataset.
ALBERT: Modern language models increase the model size and number of parameters when pre-training natural language representations. This often improves many downstream tasks, but in some cases training becomes harder due to memory limitations and long training times. To address these problems, the self-supervised model ALBERT (A Lite BERT) [31] uses parameter-reduction techniques to increase model speed and lower memory consumption. We used ALBERT for our misinformation detection problem, where it achieved better performance than the DL models.
6 Conclusion
In this paper, we presented various algorithms to combat the global infodemic; the transformer-based algorithms performed better than the others. We submitted these models to the shared task on COVID-19 fake news detection in English at the CONSTRAINT 2021 workshop.
Fake news is an increasingly significant and difficult problem, particularly in an unanticipated situation like the COVID-19 pandemic. Leveraging state-of-the-art classical and advanced NLP models can help address COVID-19 fake news detection and other global health emergencies. In future work, we intend to explore other contextualized embeddings, such as FLAIR and ELMo, to build a better fake news detection system.
References
1. Patwa, P., Bhardwaj, M., Guptha, V., Kumari, G., Sharma, S., PYKL, S., Das, A., Ekbal, A., Akhtar, S., Chakraborty, T.: Overview of CONSTRAINT 2021 Shared Tasks: Detecting English COVID-19 Fake News and Hindi Hostile Posts. (2021). In: Proceedings of the First Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situation (CONSTRAINT). Springer.
2. Datta, R., Yadav, K., Singh, A., Datta, K., Bansal, A.: The infodemics of COVID-19 amongst healthcare professionals in India. Med. J. Armed Forces India, vol. 76, no. 3, pp. 276–283, Jul. 2020.
3. Chen, X., Sin, S.J.: ’Misinformation? What of it?’ Motivations and individual dif-
ferences in misinformation sharing on social media. In: ASIST (2013)
4. Thorne, J., Vlachos, A.: Automated Fact Checking: Task formulations, methods
and future directions. In: COLING (2018)
5. Titcomb, J., Carson, J.: Fake news: What exactly is it – and how can you spot it? www.telegraph.co.uk.
6. Kaliyar, R., Singh, N.: Misinformation Detection on Online Social Media-A Survey.
(2019). 1-6. 10.1109/ICCCNT45670.2019.8944587.
7. Zhang, X., Ghorbani, A.: An overview of online fake news: Characterization, de-
tection, and discussion. (2020). Inf. Process. Manag., 57, 102025.
8. Khan, J.Y., Khondaker, M.T., Iqbal, A., Afroz, S.: A Benchmark Study on Machine
Learning Methods for Fake News Detection. (2019). ArXiv, abs/1905.04749.
9. Elhadad, M., Li, K.F., Gebali, F.: A Novel Approach for Selecting Hybrid Features
from Online News Textual Metadata for Fake News Detection. In: Proc. 3PGCIC,
Antwerp, Belgium, 2019, pp. 914–925.
10. Thota, A., Tilak, P., Ahluwalia, S., Lohia, N.: Fake News Detection: A Deep Learn-
ing Approach. (2018). SMU Data Science Review: Vol. 1 : No. 3 , Article 10.
11. Sean, B., Doug, S., Yuxi, P.: Talos Targets Disinformation with Fake News Challenge Victory. (2017). Available online: https://blog.talosintelligence.com/2017/06/talos-fake-news-challenge.html
12. Jwa, H., Oh, D., Park, K., Kang, J., Lim, H.: exBAKE: Automatic Fake News De-
tection Model Based on Bidirectional Encoder Representations from Transformers
(BERT). (2019). Applied Sciences, 9, 4062.
13. Shahi, G.K., Nandini, D.: FakeCovid - A Multilingual Cross-domain Fact Check
News Dataset for COVID-19. (2020). ArXiv, abs/2006.11343.
14. Zhou, X., Mulay, A., Ferrara, E., Zafarani, R.: ReCOVery: A Multimodal Reposi-
tory for COVID-19 News Credibility Research. (2020). In: Proceedings of the 29th
ACM International Conference on Information & Knowledge Management.
15. Cui, L., Lee, D.: CoAID: COVID-19 Healthcare Misinformation Dataset. (2020).
ArXiv, abs/2006.00885.
16. Memon, S.A., Carley, K.M.: Characterizing COVID-19 Misinformation Communi-
ties Using a Novel Twitter Dataset. (2020). ArXiv, abs/2008.00791.
17. Li, Y., Jiang, B., Shu, K., Liu, H.: MM-COVID: A Multilingual and Multi-
modal Data Repository for Combating COVID-19 Disinformation. (2020). ArXiv,
abs/2011.04088.
18. Al-Rakhami, M.S., Al-Amri, A.M.: Lies Kill, Facts Save: Detecting COVID-19
Misinformation in Twitter. (2020). IEEE Access, 8, 155961-155970.
19. Vijjali, R., Potluri, P., Kumar, S., Teki, S.: Two Stage Transformer Model for
COVID-19 Fake News Detection and Fact Checking. (2020).
20. Hossain, T., Logan IV, R.L., Ugarte, A., Matsubara, Y., Young, S., Singh, S.: COVIDLies: Detecting COVID-19 Misinformation on Social Media. (2020). NLP4COVID@EMNLP.
21. Patwa, P., Sharma, S., Pykl, S., Guptha, V., Kumari, G., Akhtar, M.S., Ekbal, A., Das, A., Chakraborty, T.: Fighting an Infodemic: COVID-19 Fake News Dataset. (2020). ArXiv, abs/2011.03327.
22. Elhadad, M.K., Li, K., Gebali, F.: Detecting Misleading Information on COVID-19.
(2020). IEEE Access, 8, 165201-165215.
23. Pennington, J., Socher, R., Manning, C.D.: Glove: Global Vectors for Word Rep-
resentation. (2014). In: EMNLP.
24. Hochreiter, S., Schmidhuber, J.: Long Short-Term Memory. (1997). Neural Com-
putation, 9, 1735-1780.
25. Bojanowski, P., Grave, E., Joulin, A., Mikolov, T.: Enriching Word Vectors with
Subword Information. (2017). Transactions of the Association for Computational
Linguistics, 5, 135-146.
26. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser,
L., Polosukhin, I.: Attention is All you Need. (2017). NIPS.
27. LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. (2015). Nature, 521(7553),
pp.436-444.
28. Devlin, J., Chang, M., Lee, K., Toutanova, K.: BERT: Pre-training of Deep Bidi-
rectional Transformers for Language Understanding. (2019). NAACL-HLT.
29. Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., Le, Q.V.: XL-
Net: Generalized Autoregressive Pretraining for Language Understanding. (2019).
NeurIPS.
30. Dai, Z., Yang, Z., Yang, Y., Carbonell, J., Le, Q.V., Salakhutdinov, R.:
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context.
(2019). ACL.
31. Lan, Z., Chen, M., Goodman, S., Gimpel, K., Sharma, P., Soricut, R.: ALBERT:
A Lite BERT for Self-supervised Learning of Language Representations. (2020).
ArXiv, abs/1909.11942.