
Fact-Checking In Complex Networks


A hybrid textual-social based approach

Othman El houfi
ETIS Lab, UMR 8051, CY Cergy Paris University, ENSEA, CNRS, France
othmanelhoufi@gmail.com

Dimitris Kotzinos
ETIS Lab, UMR 8051, CY Cergy Paris University, ENSEA, CNRS, France
Dimitrios.Kotzinos@cyu.fr

Abstract—As false information and fake news continue propagating throughout the internet and social networks, the need for fact-checking operations arises, with notable examples in politics (such as the 2016 USA Elections) and public health (COVID-19). A number of solutions have been proposed to deal with this problem and limit the spread of false information, both manual and automatic. Undoubtedly, the manual approaches taken by websites such as PolitiFact.com, FactCheck.org and Snopes.com are not a viable long-term solution: put simply, disinformation is increasing and human fact-checkers simply do not scale up at the same rate. This paper presents our contributions, which include: (i) an automated solution for fact-checking using state-of-the-art Language Models (LMs) and five well-known datasets containing annotated claims/tweets, used to fine-tune each LM and classify a given claim through its textual context; and (ii) a custom architecture, the Hybrid Fake News Classifier, that utilizes both textual context and social context by combining LMs and Graph Auto-Encoders (GAEs). We show that fine-tuning a LM with the correct settings can achieve high accuracy and F1-scores, better than the majority of fact-checking methods that exist today. Moreover, the Hybrid Fake News Classifier achieves better than state-of-the-art accuracy and F1-score, which shows that we can use the social interactions of users in a network as an additional key attribute next to textual claims for fake-news detection.

Index Terms—Fact-Checking, Fake-news, Natural Language Processing, Multimodal Learning, Pre-trained Language Model, Fine-tuning, Text understanding, Natural Language Inferencing, Complex Networks, Social Graphs, Graph AutoEncoder.
I. INTRODUCTION & RELATED WORK

From a social and psychological perspective, humans have been proven irrational and vulnerable when differentiating between true and fake news. Social media platforms have accelerated the spread of news, augmenting the problem. This has given rise to research that aims to identify fake news and understand its spreading and diffusion. In the computer science literature, different approaches have been proposed, mainly classified as:
• Knowledge-based Fake News Detection [1]: aims to assess news authenticity by comparing the knowledge extracted from to-be-verified news content with known facts, a process also called fact-checking.
• Style-based Fake News Detection [2]: focuses on the style of writing, i.e. the form rather than the meaning.
• Propagation-based Fake News Detection [3]: a principled way to characterize and understand hierarchical propagation network features, based on a statistical comparative analysis of these features, at both micro-level and macro-level, for fake and true news.
• Credibility-based Fake News Detection [4]: the information about the authors of news articles can indicate news credibility and help detect fake news.

Although these approaches exist, numerous efforts for fake news classification separate the content from the context:
• Language model-based approaches [5]: these methods use LMs for fact-checking, with zero-shot LMs showing improvements over a random baseline.
• Hierarchical propagation-based approaches [3]: these methods focus on the correlation between news and characteristics of a social network, analyzing the micro-level and macro-level propagation network features.

Our approach combines elements of the above. It first utilizes pre-trained Language Models (LMs) to solve the classification task, i.e. fact-checking claims. The model assesses with a degree of confidence the truthfulness or falseness of a claim given as input by exploiting LMs that were already trained on large-scale textual databases such as Wikipedia, CCNews, etc.; it is not intended for real-time use. Additionally, we use a Graph Auto-encoder, which analyzes Twitter users by exploiting the social properties of the network (user information and user-to-user interactions) without taking into consideration the textual claim itself. A Graph Auto-Encoder (GAE) is an unsupervised deep learning model specifically designed for graph-structured data. It aims to capture the latent representation of nodes in the graph while preserving the graph's topological structure and node interactions. The GAE consists of an encoder that maps nodes to embeddings in the latent space and a decoder that reconstructs the graph from the latent embeddings. In essence, GAEs learn a compressed representation of the graph that can be used for various graph analysis tasks, like link prediction, node classification, or community detection.
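For concreteness, the standard (variational) graph auto-encoder formulation from the literature can be summarized as follows; this is the textbook form of the model, not notation taken from this paper:

```latex
% Encoder: node features X and adjacency matrix A are mapped to latent codes Z
Z = \mathrm{GCN}_{\mathrm{enc}}(X, A)
% Decoder: the adjacency matrix is reconstructed by an inner product of latent codes
\hat{A} = \sigma\!\left( Z Z^{\top} \right)
% Variational training objective: reconstruction term plus a KL regularizer
\mathcal{L} = \mathbb{E}_{q(Z \mid X, A)}\!\left[ \log p(A \mid Z) \right]
            - \mathrm{KL}\!\left[ q(Z \mid X, A) \,\|\, p(Z) \right]
```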

The output of the graph encoder is a 16-dimensional vector in the latent space; we call it the "Social Embedding". Then, the textual input is encoded by a Transformer-based Language Model (e.g., BERT) to get the "Textual Embedding", a 768-dimensional vector. Finally, we concatenate both vectors and pass the result through a Feed Forward Network to classify the claim, thus building a custom classifier that uses LMs and Graph Auto-encoders; we call it the Hybrid Fake-News Classifier. Testing the new classifier provides improved results and opens up new possibilities for correctly and automatically classifying fake news.

II. DATASETS

We used the following diverse collection of datasets with different properties to minimize biased solutions (Table I). The FEVER dataset [6] comprises 185,445 claims generated from altered Wikipedia sentences and labeled as Supported, Refuted, or NotEnoughInfo. The LIAR dataset [7] contains 12.8K manually labeled short statements from PolitiFact with six levels of truthfulness: true, mostly-true, half-true, barely-true, false, and pants-fire. The MultiFC dataset [8] consists of 34,918 claims from 26 fact-checking websites, labeled by human journalists; we mapped the original labels to a smaller class set for consistency. The COVID-19 rumor dataset [9] features over 6,000 annotated COVID-19 claims from social media platforms, labeled either Real or Fake. The ANTi-Vax dataset [10] provides more than 15K tweets related to COVID-19 vaccinations, annotated by medical experts as misinformation or general COVID-19 vaccine tweets. The FakeNewsNet dataset [11] is a multi-dimensional repository with datasets containing news content and social context, where each piece of news is labeled either Real or Fake.

TABLE I
SUMMARY OF DATASETS USED IN THE STUDY

| Dataset        | Domain    | # of labels | Size          | Training/Validation/Testing | Classes                       |
|----------------|-----------|-------------|---------------|-----------------------------|-------------------------------|
| FEVER          | General   | 3           | 185,445       | 145,449 / 9,999 / 9,999     | Supports, Refutes, NEI        |
| Liar           | Political | 6           | 12,834        | 10,268 / 1,283 / 1,283      | True, False, MostlyTrue, etc. |
| MultiFC        | General   | 5           | 34,918        | 21,823 / 1,364 / 1,365      | True, False, MostlyTrue, etc. |
| COVID-19 rumor | Health    | 2           | 8,000+        | 6,420 / 1,070 / 1,070       | Real, Fake                    |
| ANTi-Vax       | Health    | 2           | 15,000+       | 8,702 / 1,864 / 1,865       | MisInfo, NotMisInfo           |
| FakeNewsNet    | General   | 2           | see Table III | see Table III               | Real, Fake                    |
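To make the label consolidation concrete, it can be expressed as a simple lookup; the groupings below are a hypothetical illustration, since the paper does not list its exact mapping:

```python
# Hypothetical consolidation of fine-grained truthfulness labels into two
# classes for the 2-label experiments; the paper's concrete mapping may differ.
LABEL_MAP = {
    "true": "True", "mostly-true": "True", "half-true": "True",
    "barely-true": "False", "false": "False", "pants-fire": "False",
}

def consolidate(example: dict) -> dict:
    """Map a fine-grained truthfulness label to the binary class set."""
    example["label"] = LABEL_MAP[example["label"]]
    return example
```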
III. PROPOSED SOLUTIONS AND RESULTS

Here we describe the initial experiments using LMs and our main contribution, the Hybrid Fake-news Classifier.

A. Transformer-based Language Models

The architecture of the Transformer models is based on the Attention Mechanism, enabling a better overall understanding of context. We chose the following set of seven pre-trained LMs: BERT-base-uncased [12], RoBERTa-base [13], DistilBERT-base-uncased [14], XLNet-base-cased [15], ALBERT-base-v2 [16], BigBird-RoBERTa-base [17], and ConvBERT-base [18]. Despite their subtle differences, these LMs are all based on the Transformer architecture and were pre-trained on large-scale textual databases. By fine-tuning them on our specific datasets, we aim to investigate their capability as fact-checkers.

1) Experimental Setup: We fine-tuned each LM using pre-processed and cleaned datasets, experimenting both with the original number of labels and with a reduction to two labels (True and False), to evaluate their strength on well-defined versus overlapping class boundaries. The COVID-19 rumor and ANTi-Vax datasets already use only two labels; for the FEVER dataset, the Not Enough Information (NEI) records were excluded. Hyperparameters were carefully selected after multiple runs, and while not necessarily optimal, they serve the purpose of this research. The LMs were implemented using the HuggingFace API and monitored via the Weights & Biases API.
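A minimal sketch of this fine-tuning setup with the HuggingFace API; the model name, column names, and hyperparameter values below are illustrative assumptions rather than the tuned values from the experiments:

```python
from transformers import (AutoModelForSequenceClassification,
                          AutoTokenizer, Trainer, TrainingArguments)
from datasets import load_dataset

# Any of the seven studied LMs can be swapped in by name.
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Assumed layout: CSV splits with a "claim" text column and an integer "label".
dataset = load_dataset("csv", data_files={"train": "train.csv", "validation": "val.csv"})
dataset = dataset.map(
    lambda ex: tokenizer(ex["claim"], truncation=True, padding="max_length"),
    batched=True)

args = TrainingArguments(
    output_dir="out",
    num_train_epochs=3,               # illustrative, not the tuned value
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    report_to="wandb",                # monitoring via Weights & Biases, as in the paper
)
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["validation"])
trainer.train()
```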
2) Evaluation metrics: In all our experiments we rely on the usual classification metrics: accuracy, recall, precision, and F1-score.
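These metrics can be computed with scikit-learn and, for instance, passed to the trainer sketched above as its compute_metrics callback:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    """Accuracy, precision, recall, and (macro) F1 computed from model logits."""
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="macro", zero_division=0)
    return {"accuracy": accuracy_score(labels, preds),
            "precision": precision, "recall": recall, "f1": f1}
```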
3) Results & Discussion: As reported in Table II, which only considers the best-performing LM in each experiment (dataset and number of labels), the overall results prove the validity of our method in some cases and its ineffectiveness in others.

For instance, we get good results for 2-label classification, as the boundaries are well defined and there is no overlap between claims; it is therefore expected for the LMs to perform well in this case. It is important to mention that because the COVID-19 rumor and ANTi-Vax datasets are topic-specific and do not exhibit high variance in language and topics, the LMs performed proficiently. In this case we can deploy our LMs and rely entirely on their claim assessment.

For the FEVER dataset, the results are intriguing: this dataset has a high topic variance, and yet the LMs performed impressively in comparison to the MultiFC and LIAR datasets. It is difficult to explain the difference in performance in this case, but it may be justified by the fact that all the LMs were pre-trained on Wikipedia and the FEVER dataset contains claims that were extracted directly from Wikipedia. In addition, the language used in news reporting carries a specific style of writing that sometimes contains only keywords or an unspecific spatiotemporal context, which can render classification extremely hard; for example, the claim "About 99% of rape allegations are fabricated." is vague, as it doesn't contain sufficient information to deliver a reliable assessment. This language style can be found in the MultiFC and LIAR datasets. Furthermore, we observe that incrementing the number of labels makes the classification task considerably harder. For the FEVER dataset with the original number of labels, our approach yields better results than most of the traditional methods that involve external information retrieval modules, as well as Facebook AI's model [5], which used BERT-large (340 million parameters and 24 encoding layers) and achieved 0.57 in accuracy and macro F1-score; this makes our LM more efficient to train, store, and deploy. Nevertheless, when taking into consideration a third category of claims, Not Enough Information, the scores drop considerably, making the classification task much harder for the LMs. The same goes for the MultiFC and LIAR datasets.

TABLE II
LMs CLASSIFICATION RESULTS OF THE BEST LM IN EACH EXPERIMENT.

| Dataset | Metric   | 2-labels | 3-labels | 5-labels | 6-labels |
|---------|----------|----------|----------|----------|----------|
| FEVER   | accuracy | 0.81     | 0.64     | -        | -        |
|         | f1-score | 0.81     | 0.63     | -        | -        |
| MultiFC | accuracy | 0.72     | -        | 0.50     | -        |
|         | f1-score | 0.64     | -        | 0.40     | -        |
| Liar    | accuracy | 0.69     | -        | -        | 0.31     |
|         | f1-score | 0.61     | -        | -        | 0.30     |
| Covid19 | accuracy | 0.98     | -        | -        | -        |
|         | f1-score | 0.98     | -        | -        | -        |
| ANTiVax | accuracy | 0.99     | -        | -        | -        |
|         | f1-score | 0.99     | -        | -        | -        |

TABLE III
PROPERTIES OF THE FAKENEWSNET DATASET AFTER TRANSFORMING EACH ENTRY TO A HOMOGENEOUS GRAPH

| Dataset    | # of graphs | # of nodes | # of edges | # of features | avg node degree |
|------------|-------------|------------|------------|---------------|-----------------|
| Politifact | 534         | 127,677    | 258,445    | 388           | 2.02            |
| Gossipcop  | 5,468       | 122,272    | 237,491    | 388           | 1.94            |

[Fig. 1. Hybrid social-textual context architecture.]

It is noteworthy that all LMs performed relatively close to each other across all experiments: the margin of difference is 0.01 to 0.07 in accuracy and 0.01 to 0.14 in F1-score. The lowest scores are attributed to ALBERT, presumably because this LM has the lowest number of parameters (12 million) in comparison to the other LMs. On the other hand, ConvBERT had the highest scores in most runs, especially as the number of labels increases, which may be explained by the subtle difference in its architecture (span-based dynamic convolution). Alongside ConvBERT, RoBERTa and BERT perform within a margin of 0.01 of ConvBERT in both accuracy and F1-score. We should keep in mind, however, that a large number of variables controls the outcome of the experiments; for instance, changing the hyperparameters can cause minor or highly noticeable differences in the performance of our LMs. When the networks are large, deep, and complex, explaining and interpreting the results becomes an exceedingly harder task.
the original annotated article from Politifact or Gossipcop
B. Hybrid fake-news classifier

As we discovered in Section III-A, LMs can be a powerful tool for classifying claims, especially if the topic is narrow and well written. LMs struggle with broad and diverse topics written in a short, attractive way, as in the LIAR dataset (extracted from PolitiFact). So, we explore the idea of enhancing a LM by using the social diffusion of information, meaning that people interact in a particular way when they come across fake content on social media. On Twitter, a tweet promoting fake news shows a spike of engagements (comments, likes, retweets, etc.) moments after it is published; in contrast, real news receives moderate engagement [19].

With that in mind, we proposed a novel architecture that utilizes both the social interactions of Twitter users around a claim and the textual context of that claim by combining LMs and Graph Auto-encoders. It has three main components (a sketch of how they fit together follows the list):
• A Graph Auto-encoder: takes the interactions around a given tweet as a graph, learns an interpretable latent representation for that graph, and outputs a "Social Embedding".
• A Language Model: takes the textual claim as an input and encodes it to a vector, i.e. the "Textual Embedding".
• A Feed Forward Network: takes the Social and Textual embeddings as an input and outputs the class to which the claim belongs, i.e. Fake or Real.
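A minimal PyTorch sketch of this wiring; the hidden widths of the feed-forward head and the mean-pooling of node latents into a single graph vector are our assumptions, since the paper specifies only the 768- and 16-dimensional embeddings, the four linear layers, and the two output classes:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class HybridFakeNewsClassifier(nn.Module):
    """Concatenates a 768-d textual embedding with a 16-d social embedding."""

    def __init__(self, graph_encoder, lm_name="bert-base-uncased"):
        super().__init__()
        self.lm = AutoModel.from_pretrained(lm_name)   # textual encoder
        self.graph_encoder = graph_encoder             # pre-trained VGAE encoder
        self.ffn = nn.Sequential(                      # 4 linear layers (widths assumed)
            nn.Linear(768 + 16, 256), nn.ReLU(),
            nn.Linear(256, 64), nn.ReLU(),
            nn.Linear(64, 16), nn.ReLU(),
            nn.Linear(16, 2),                          # Fake / Real logits
        )

    def forward(self, input_ids, attention_mask, x, edge_index):
        # Textual embedding: the [CLS] token representation of the claim.
        textual = self.lm(input_ids=input_ids,
                          attention_mask=attention_mask).last_hidden_state[:, 0]
        # Social embedding: node latents of the interaction graph, mean-pooled
        # into one 16-d graph vector (pooling strategy assumed).
        z = self.graph_encoder(x, edge_index)          # [num_nodes, 16]
        social = z.mean(dim=0, keepdim=True).expand(textual.size(0), -1)
        return self.ffn(torch.cat([textual, social], dim=-1))
```

In the actual pipeline the graph encoder would be the pre-trained VGAE encoder described below.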
learning the representation of heterogeneous graphs using
With that in mind, we proposed a novel architecture that
GAEs is not an easy task, we convert all heterogeneous graphs
utilizes the social interactions of Twitter users around a claim
to homogeneous graphs only by changing the node type of the
and the textual context of that claim by combining both LMs
root since it is the only one different from other nodes. We
and Graph Auto-encoders, it has three main components:
transformed it to a dummy user profile like the rest of nodes
• A Graph Auto-encoder: takes the interactions around with the article title as a profile description. In Table III-B we
a given tweet as a graph and learns an interpretable report the number of graphs, nodes and edges after sampling,
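Under the structure just described, a single entry can be assembled as a PyTorch Geometric graph roughly as follows; the sentence encoder for profile descriptions is not named in the paper, so the 384-dimensional model below is a placeholder assumption, and the users and counts are made up for illustration:

```python
import torch
from torch_geometric.data import Data
from torch_geometric.utils import to_undirected
from sentence_transformers import SentenceTransformer

# Placeholder encoder: any model producing 384-d vectors matches the stated size.
profile_encoder = SentenceTransformer("all-MiniLM-L6-v2")

def user_features(profile, followers, following, tweets, verified):
    """384-d profile embedding + 4 account statistics = 388 node features."""
    text_vec = torch.tensor(profile_encoder.encode(profile), dtype=torch.float)
    stats = torch.tensor([followers, following, tweets, float(verified)])
    return torch.cat([text_vec, stats])              # shape: [388]

# Depth 0: dummy user carrying the article title as its profile description;
# depth 1: author of a tweet mentioning the article; depth 2: a retweeter.
x = torch.stack([
    user_features("Article title here", 0, 0, 0, False),
    user_features("bio of the tweet author", 120, 80, 3400, True),
    user_features("bio of a retweeter", 15, 300, 900, False),
])
edge_index = to_undirected(torch.tensor([[0, 1], [1, 2]]).t())
graph = Data(x=x, edge_index=edge_index, y=torch.tensor([1]))  # e.g. 1 = Fake
```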

So, we end up with heterogeneous graphs, and since learning the representation of heterogeneous graphs with GAEs is not an easy task, we convert all heterogeneous graphs to homogeneous ones simply by changing the node type of the root, since it is the only node type that differs from the others: we transform the root into a dummy user profile, like the rest of the nodes, with the article title as its profile description. In Table III we report the number of graphs, nodes, and edges after sampling and cleaning the FakeNewsNet dataset and transforming each entry to a homogeneous graph as described above.

The architecture we designed is composed of two encoders and a feed forward network. The first encoder is a LM, used to encode the news article title: it takes a textual input, passes it through all the Transformer blocks, and outputs a 768-dimensional vector representing the Textual Context of the article title. The second encoder is a Variational Graph Auto-encoder (VGAE). We chose this encoder based on the hypothesis that user interactions on social media regarding a piece of news follow a deep pattern that can be detected, learned, and used for inference. Users on Twitter, for example, engage more with a tweet promoting fake news within a short time frame; the engagement can take the form of sarcasm, denial, approval, a retweet, or a simple like. For this reason, we implemented a GAE to analyze and embed the structure of users' social interactions: this encoder takes a homogeneous graph previously built from the FakeNewsNet dataset and outputs a 16-dimensional vector representing the Social Context around the annotated news article. The VGAE is composed of a graph convolutional network (GCN) as encoder and a simple inner product as decoder. The GCN can encode both the graph's structure and the features of the nodes, thus combining structural and semantic relatedness between users. The training of the VGAE for graph representation learning is an independent step performed before inserting it into the architecture; during this step we combine the Gossipcop and Politifact datasets to ensure diversity in representation learning.
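This pre-training step can be sketched with PyTorch Geometric's built-in VGAE, which matches the GCN-encoder / inner-product-decoder design described above; the hidden width, learning rate, and KL weighting follow the library's common example rather than the paper's exact settings:

```python
import torch
from torch_geometric.nn import GCNConv, VGAE

class GCNEncoder(torch.nn.Module):
    """Two GCN stages producing the mean and log-std of a 16-d latent code."""
    def __init__(self, in_channels=388, hidden=64, latent=16):
        super().__init__()
        self.conv = GCNConv(in_channels, hidden)
        self.conv_mu = GCNConv(hidden, latent)
        self.conv_logstd = GCNConv(hidden, latent)

    def forward(self, x, edge_index):
        h = self.conv(x, edge_index).relu()
        return self.conv_mu(h, edge_index), self.conv_logstd(h, edge_index)

model = VGAE(GCNEncoder())          # the inner-product decoder is VGAE's default
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)  # assumed settings

def train_step(graph):
    model.train()
    optimizer.zero_grad()
    z = model.encode(graph.x, graph.edge_index)
    # Edge reconstruction loss plus the KL regularizer (scaled per node).
    loss = model.recon_loss(z, graph.edge_index) + model.kl_loss() / graph.num_nodes
    loss.backward()
    optimizer.step()
    return float(loss)
```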
The embeddings of the two encoders are concatenated to form a 784-dimensional vector that is then fed to a feed forward network with four linear layers to solve the downstream task, i.e. fake-news classification. For comparability, we also built a simple Graph Classifier that uses only the VGAE encoder plus a FFN, as well as a LM Classifier that uses BERT plus a FFN. We then experiment with each model separately (Graph Classifier, LM Classifier, and Hybrid Classifier) on both the Politifact and Gossipcop datasets.

2) Results: As before, we use the usual classification metrics: accuracy, recall, precision, and F1-score. It is worth mentioning that experimenting on both of the FakeNewsNet datasets shows the capability of a given model to classify not only the political claims found in Politifact but also the diverse claims from different domains found in Gossipcop. This dataset is considered the benchmark for fake-news classifiers.

TABLE IV
CLASSIFICATION RESULTS OF LM VS. GRAPH VS. HYBRID MODELS FOR THE FAKENEWSNET DATASET.

| Model             | Politifact accuracy | Politifact f1-score | Gossipcop accuracy | Gossipcop f1-score |
|-------------------|---------------------|---------------------|--------------------|--------------------|
| BERT-classifier   | 0.81                | 0.79                | 0.76               | 0.72               |
| Graph-classifier  | 0.66                | 0.61                | 0.73               | 0.51               |
| Hybrid-classifier | 0.89                | 0.89                | 0.80               | 0.74               |
| Related work      |                     |                     |                    |                    |
| SAFE [20]         | 0.73                | 0.72                | 0.77               | 0.77               |
| CSI [21]          | 0.76                | 0.76                | 0.75               | 0.75               |
| word2vec+MLP [22] | 0.76                | 0.76                | 0.84               | 0.85               |
| GNN-CL [23]       | 0.62                | 0.62                | 0.95               | 0.95               |
| GCNFN [24]        | 0.83                | 0.84                | 0.96               | 0.96               |

As reported in Table IV, the Hybrid Classifier, which utilizes both the Social and the Textual context around a given claim, is an improvement over the Graph Classifier and the LM Classifier alone, as it increases both accuracy and F1-score. We observe that the results of the LM Classifier decrease from one dataset to the other even though we use more data on Gossipcop, meaning that adding more data to the learning phase does not improve this model; it may even get worse, because diverse claims (spanning more than one topic or theme) can weaken the LM, as we saw in Section III-A3. On the other hand, as the data grows and gets more diverse (from Politifact to Gossipcop), the Graph Classifier performs better. This is because having more user-interaction data implies strongly connected, larger graphs that contain more social context around the given claim; in addition, the graph embedding does not rely on the textual context, only on user information, making it more robust to topic-changing claims. This model performs poorly when the number of examples in the dataset is not significant, as we can see in the case of Politifact.

A downside to the Hybrid architecture we propose is that learning the graph representations, i.e. training the Graph Auto-encoders, takes on average 20x more time than the Transformer-based approach, although this also depends on the number of graphs as well as the number of nodes per graph. In fairness, it would also take an immense amount of time to pre-train a LM like BERT; once both are pre-trained, training the feed forward network or MLP requires only several minutes.

These models can also be improved by increasing the size of the dataset. During this work, due to limitations of the Twitter API, we could only sample a limited, weakly representative subset of the relevant tweets. By retrieving all of the tweets, the graphs become richer and more inter-connected, which can improve the representation learning of the VGAE model, as it only reaches a loss of 0.8 for now. By increasing the number of nodes in each graph, the embedding will be more accurate and representative. We can also improve the accuracy by fine-tuning the BERT LM before plugging it into the hybrid architecture (see Section III-A3). Analyzing and embedding the entire textual article, or the tweet discussions around a claim, instead of just the title (which can be confusing or deceiving, as it is short and specifically engineered to attract people's attention) would give a better context understanding, while a whole tweet discussion (comments and quotes) would give a better social sentiment of the news. Additionally, we could replace the [CLS] special token of the LM input with the Social Embedding in order to enrich the LM architecture itself with social context next to the textual claim, an idea we call Social Context Vector Injection.
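As one possible (hypothetical) realization of this Social Context Vector Injection idea, the social embedding could be projected to the LM hidden size and written into the [CLS] slot of the input embeddings; this is only one reading of the proposal, not the paper's implementation:

```python
import torch.nn as nn

def inject_social_context(lm, input_ids, attention_mask, social_vec,
                          proj: nn.Linear):  # proj maps 16 -> LM hidden size (e.g. 768)
    """Hypothetical sketch: overwrite the [CLS] input embedding with a projected
    social embedding before the sequence enters the Transformer blocks."""
    embeds = lm.get_input_embeddings()(input_ids).clone()  # [batch, seq, hidden]
    embeds[:, 0, :] = proj(social_vec)                     # replace the [CLS] slot
    return lm(inputs_embeds=embeds, attention_mask=attention_mask)
```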
IV. CONCLUSION & FUTURE WORK

In this study, we explored the capabilities of LMs to be fine-tuned and utilized for fact-checking claims. We have shown the effectiveness of pre-trained LMs as entities capable of storing knowledge, rather than relying on the external information retrieval modules adopted by traditional approaches. Our experiments yield results that surpass most of the existing fact-checking methods, both traditional and LM-based. We also built a custom hybrid architecture that takes advantage of both the Textual Content and the Social Context around a claim by combining LMs and GAEs. Our experiments showed that this hybrid architecture performs better for fake-news classification and that multimodal learning can give us a better and more robust understanding of fake news based on its textual and social characteristics. We plan to experiment with more complex combinations of the produced embeddings and to add different GNNs to the mix.

REFERENCES

[1] Anton Chernyavskiy, Dmitry Ilvovsky, and Preslav Nakov. WhatTheWikiFact: Fact-checking claims against Wikipedia. arXiv preprint arXiv:2105.00826, 2021.
[2] Piotr Przybyla. Capturing the style of fake news. In Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
[3] Kai Shu, Deepak Mahudeswaran, Suhang Wang, and Huan Liu. Hierarchical propagation networks for fake news detection: Investigation and exploitation. In Proceedings of the International AAAI Conference on Web and Social Media, volume 14, pages 626–637, 2020.
[4] Niraj Sitaula, Chilukuri K. Mohan, Jennifer Grygiel, Xinyi Zhou, and Reza Zafarani. Credibility-based fake news detection. In Disinformation, Misinformation, and Fake News in Social Media, pages 163–182. Springer, 2020.
[5] Nayeon Lee, Belinda Z. Li, Sinong Wang, Wen-tau Yih, Hao Ma, and Madian Khabsa. Language models as fact checkers? arXiv preprint arXiv:2006.04102, 2020.
[6] Rami Aly, Zhijiang Guo, Michael Sejr Schlichtkrull, James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Oana Cocarascu, and Arpit Mittal. FEVEROUS: Fact extraction and verification over unstructured and structured information. 2021.
[7] William Yang Wang. "Liar, liar pants on fire": A new benchmark dataset for fake news detection. arXiv preprint arXiv:1705.00648, 2017.
[8] Isabelle Augenstein, Christina Lioma, Dongsheng Wang, Lucas Chaves Lima, Casper Hansen, Christian Hansen, and Jakob Grue Simonsen. MultiFC: A real-world multi-domain dataset for evidence-based fact checking of claims. arXiv preprint arXiv:1909.03242, 2019.
[9] Parth Patwa, Shivam Sharma, Srinivas Pykl, Vineeth Guptha, Gitanjali Kumari, Md Shad Akhtar, Asif Ekbal, Amitava Das, and Tanmoy Chakraborty. Fighting an infodemic: COVID-19 fake news dataset. 2021.
[10] K. Hayawi, S. Shahriar, M. A. Serhani, I. Taleb, and S. S. Mathew. ANTi-Vax: a novel Twitter dataset for COVID-19 vaccine misinformation detection. Public Health, 2022.
[11] Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, and Huan Liu. FakeNewsNet: A data repository with news content, social context and spatiotemporal information for studying fake news on social media. arXiv preprint arXiv:1809.01286, 2018.
[12] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[13] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
[14] Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108, 2019.
[15] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R. Salakhutdinov, and Quoc V. Le. XLNet: Generalized autoregressive pretraining for language understanding. Advances in Neural Information Processing Systems, 32, 2019.
[16] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942, 2019.
[17] Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, et al. Big Bird: Transformers for longer sequences. Advances in Neural Information Processing Systems, 33:17283–17297, 2020.
[18] Zi-Hang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, and Shuicheng Yan. ConvBERT: Improving BERT with span-based dynamic convolution. Advances in Neural Information Processing Systems, 33:12837–12848, 2020.
[19] Van-Hoang Nguyen, Kazunari Sugiyama, Preslav Nakov, and Min-Yen Kan. FANG: Leveraging social context for fake news detection using graph representation. ACM, October 2020.
[20] Xinyi Zhou, Jindi Wu, and Reza Zafarani. SAFE: Similarity-aware multi-modal fake news detection. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 354–367. Springer, 2020.
[21] Natali Ruchansky, Sungyong Seo, and Yan Liu. CSI: A hybrid deep model for fake news detection. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 797–806, 2017.
[22] Yingtong Dou, Kai Shu, Congying Xia, Philip S. Yu, and Lichao Sun. User preference-aware fake news detection. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2051–2055, 2021.
[23] Yi Han, Shanika Karunasekera, and Christopher Leckie. Graph neural networks with continual learning for fake news detection from social media. arXiv preprint arXiv:2007.03316, 2020.
[24] Federico Monti, Fabrizio Frasca, Davide Eynard, Damon Mannion, and Michael M. Bronstein. Fake news detection on social media using geometric deep learning. arXiv preprint arXiv:1902.06673, 2019.
