2023 IEEE Ninth International Conference on Big Data Computing Service and Applications (BigDataService) | 979-8-3503-3379-4/23/$31.00 ©2023 IEEE | DOI: 10.1109/BigDataService58306.2023.00025
Abstract—As false information and fake news continue propagating throughout the internet and social networks, the need for fact-checking operations arises, with notable examples in politics (such as the 2016 USA Elections) and public health (COVID-19). A number of solutions, both manual and automatic, have been proposed to deal with this problem and limit the spread of false information. Undoubtedly, the manual approaches used on websites such as PolitiFact.com, FactCheck.org and Snopes.com aren't a viable long-term solution: put simply, disinformation is increasing and human fact-checkers simply don't scale up at the same rate. This paper presents our contributions, which include: (i) An automated solution for fact-checking using state-of-the-art Language Models (LMs) and five well-known datasets containing annotated claims/tweets to fine-tune each LM and classify a given claim through textual context; and (ii) A custom architecture, the Hybrid Fake News Classifier, that utilizes both textual context and social context by combining LMs and Graph Auto-Encoders (GAEs). We show that fine-tuning an LM with the correct settings can achieve high accuracy and F1-scores, better than the majority of fact-checking methods that exist today. Moreover, the Hybrid Fake News Classifier achieves better than SOTA accuracy and F1-score, which shows that we can use the social interactions of users in a network as an additional key attribute next to textual claims for fake-news detection.

Index Terms—Fact-Checking, Fake-news, Natural Language Processing, Multimodal Learning, Pre-trained Language Model, Fine-tuning, Text understanding, Natural Language Inferencing, Complex Networks, Social Graphs, Graph AutoEncoder.

I. INTRODUCTION & RELATED WORK

From a social and psychological perspective, humans have been proven irrational and vulnerable when differentiating between true and fake news. Social media platforms have accelerated the spread of news, augmenting the problem. This has given rise to research that aims to identify fake news and understand its spreading and diffusion. In the computer science literature, different approaches have been proposed, mainly classified as:

• Knowledge-based Fake News Detection [1]: a method that aims to assess news authenticity by comparing the knowledge extracted from to-be-verified news content with known facts, also called fact-checking.
• Style-based Fake News Detection [2]: focuses on the style of writing, i.e. the form rather than the meaning.
• Propagation-based Fake News Detection [3]: a principled way to characterize and understand hierarchical propagation network features, based on a statistical comparative analysis over these features (both micro-level and macro-level) of fake and true news.
• Credibility-based Fake News Detection [4]: the information about the authors of news articles can indicate news credibility and help detect fake news.

Although these approaches exist, numerous efforts for fake news classification separate the content from the context:

• Language model-based approaches [5]: These methods use LMs for fact-checking, with zero-shot LMs showing improvements over random baseline LMs.
• Hierarchical propagation-based approaches [3]: These methods focus on the correlation between news and the characteristics of a social network, analyzing the micro-level and macro-level propagation network features.

Our approach combines elements of the above and firstly utilizes pre-trained Language Models (LMs) to solve the classification task, i.e. fact-checking claims. The model assesses, with a degree of confidence, the truthfulness or falseness of a claim given as an input by exploiting LMs that were already trained on large-scale textual databases such as Wikipedia, CCNews, etc., and is not intended for real-time use. Additionally, we use a Graph Auto-Encoder, which analyzes Twitter users by exploiting the social properties (user information and user-to-user interactions) of the network without taking into consideration the textual claim itself. A Graph Auto-Encoder (GAE) is an unsupervised deep learning model specifically designed for graph-structured data. It aims to capture the latent representation of nodes in the graph while preserving the graph's topological structure and node interactions. The GAE consists of an encoder that maps nodes to embeddings in the latent space and a decoder that reconstructs the graph from the latent embeddings. In essence, GAEs learn a compressed representation of the graph that can be used for various graph analysis tasks, like link prediction, node classification, or community detection. The output of the graph encoder is a 16-dimensional vector in the latent space; we call it the "Social Embedding". Then, the textual input is encoded by a Transformer-based Language Model (e.g., BERT) so as
Authorized licensed use limited to: VIT University. Downloaded on March 08,2024 at 10:57:01 UTC from IEEE Xplore. Restrictions apply.
TABLE II
LMs CLASSIFICATION RESULTS OF THE BEST LM IN EACH EXPERIMENT

                          Number of labels
Dataset    Metric      2-labels  3-labels  5-labels  6-labels
FEVER      accuracy    0.81      0.64      -         -
FEVER      f1-score    0.81      0.63      -         -
MultiFC    accuracy    0.72      -         0.50      -
MultiFC    f1-score    0.64      -         0.40      -
Liar       accuracy    0.69      -         -         0.31
Liar       f1-score    0.61      -         -         0.30
Covid19    accuracy    0.98      -         -         -
Covid19    f1-score    0.98      -         -         -
ANTiVax    accuracy    0.99      -         -         -
ANTiVax    f1-score    0.99      -         -         -
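Table II reports accuracy and F1-score for 2 to 6 labels, but the text does not say how F1 is averaged across labels. As a hedged illustration only, the plain-Python sketch below computes accuracy together with macro-averaged and support-weighted F1; which averaging produced the reported scores is an open assumption, and the function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Fraction of predictions equal to the gold label."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_label(y_true, y_pred, label) -> float:
    """One-vs-rest F1 for a single label (0.0 when precision and recall are both 0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-label F1 over every label seen in either list."""
    labels = set(y_true) | set(y_pred)
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)


def weighted_f1(y_true, y_pred) -> float:
    """Per-label F1 weighted by how often each label occurs in y_true."""
    support = Counter(y_true)
    total = sum(support.values())
    return sum(f1_for_label(y_true, y_pred, lab) * n for lab, n in support.items()) / total


# Toy usage: three "real" gold labels, one "fake".
y_true = ["real", "real", "real", "fake"]
y_pred = ["real", "real", "fake", "fake"]
print(accuracy(y_true, y_pred), round(macro_f1(y_true, y_pred), 3), round(weighted_f1(y_true, y_pred), 3))
# 0.75 0.733 0.767
```

Even on this imbalanced toy input the two F1 averages differ, so stating the averaging scheme matters when comparing scores across papers.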
TABLE III
PROPERTIES OF THE FAKENEWSNET DATASET AFTER TRANSFORMING EACH ENTRY TO A HOMOGENEOUS GRAPH
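Table III summarizes the graphs produced by the transformation described in the experimental setup of Section III-B: one depth-two tree per article whose root is rewritten as a dummy user, so every node carries the same 388 features (a 384-dimensional description embedding plus four profile values). The sketch below illustrates that construction under stated assumptions: `User`, `Tweet` and `build_homogeneous_graph` are hypothetical names, `embed_text` is a hash-based placeholder rather than the sentence encoder used by the authors (not named in this section), the dummy root receives zero counts, and each user occurrence becomes its own tree node.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Tuple

EMBED_DIM = 384  # profile-description embedding size stated in the text


@dataclass
class User:
    description: str
    followers: int
    following: int
    tweet_count: int
    verified: bool


@dataclass
class Tweet:
    author: User
    # Users who liked or retweeted this tweet (depth-2 leaves).
    interactors: List[User] = field(default_factory=list)


def embed_text(text: str, dim: int = EMBED_DIM) -> List[float]:
    """Hypothetical placeholder for a real sentence encoder: a deterministic
    pseudo-embedding built from SHA-256 digests. It carries no semantics."""
    values: List[float] = []
    counter = 0
    while len(values) < dim:
        digest = hashlib.sha256(f"{counter}:{text}".encode("utf-8")).digest()
        values.extend(b / 255.0 for b in digest)
        counter += 1
    return values[:dim]


def node_features(user: User) -> List[float]:
    """384-dim description embedding + 4 profile values = 388 features."""
    return embed_text(user.description) + [
        float(user.followers),
        float(user.following),
        float(user.tweet_count),
        1.0 if user.verified else 0.0,
    ]


def build_homogeneous_graph(
    article_title: str, tweets: List[Tweet]
) -> Tuple[List[List[float]], List[Tuple[int, int]]]:
    """Root article (depth 0) -> tweet authors (depth 1) -> likers/retweeters
    (depth 2). The root becomes a dummy user whose description is the article
    title, so all nodes share one feature layout (a homogeneous graph).
    Undirected edges are stored once, as (parent, child) index pairs."""
    root = User(article_title, 0, 0, 0, False)  # assumed zero counts for the dummy root
    features = [node_features(root)]
    edges: List[Tuple[int, int]] = []
    for tweet in tweets:
        author_id = len(features)
        features.append(node_features(tweet.author))
        edges.append((0, author_id))
        for user in tweet.interactors:
            child_id = len(features)
            features.append(node_features(user))
            edges.append((author_id, child_id))
    return features, edges


# Usage: one article, two tweets; carol appears twice, so she becomes two nodes.
alice = User("news junkie", 120, 80, 5000, False)
bob = User("sports fan", 30, 40, 200, False)
carol = User("journalist", 9000, 300, 15000, True)
x, edges = build_homogeneous_graph("Some article title", [Tweet(alice, [bob, carol]), Tweet(carol)])
print(len(x), len(x[0]), len(edges))  # 5 388 4
```

Counting `len(x)` and `len(edges)` per article and aggregating over the dataset yields the kind of graph, node and edge statistics that Table III reports.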
It's noteworthy that all LMs performed relatively close to each other during all experiments. The margin of difference is 0.01 to 0.07 in accuracy and 0.01 to 0.14 in F1-score. The lowest scores are attributed to ALBERT, presumably because this LM has the lowest number of parameters (12 million) in comparison to the other LMs. On the other hand, ConvBERT had the highest scores in most runs, especially when the number of labels increases, which may be explained by the subtle difference in its architecture (span-based dynamic convolution). Alongside ConvBERT, we find RoBERTa and BERT, which in some runs perform better than ConvBERT by a margin of 0.01 in both accuracy and F1-score. We should, however, keep in mind that a large number of variables controls the outcome of the experiments; for instance, changing the hyperparameters can cause minor or highly noticeable differences in the performance of our LMs. When the networks are large, deep and complex, explaining and interpreting the results becomes an exceedingly hard task.

B. Hybrid fake-news classifier

As we discovered in Section III-A, LMs can be a powerful tool when it comes to classifying claims, especially if the topic is narrow and the claims are well written. LMs struggle with broad and diverse topics written in a short, attention-grabbing way, as in the LIAR dataset (extracted from Politifact). We therefore explore the idea of enhancing an LM with the social diffusion of information, based on the observation that people interact in a particular way when they come across fake content on social media. On Twitter, a tweet promoting fake news has a spike of engagements (comments, likes, retweets, etc.) moments after it is published; in contrast, real news has moderate engagement [19].

With that in mind, we propose a novel architecture that utilizes the social interactions of Twitter users around a claim and the textual context of that claim by combining LMs and Graph Auto-Encoders. It has three main components:

• A Graph Auto-Encoder: takes the interactions around a given tweet as a graph, learns an interpretable latent representation for that graph, then outputs a "Social Embedding".
• A Language Model: takes the textual claim as an input and encodes it into a vector, i.e. the "Textual Embedding".
• A Feed Forward Network: takes the Social and Textual embeddings as an input and outputs the class to which the claim belongs, i.e. Fake or Real.

1) Experimental setup: Before we dive into the technical details of the original architecture that we designed, it is necessary to first understand the structure of the data. As mentioned in Section II, the FakeNewsNet dataset contains annotated articles extracted from Politifact and Gossipcop; in addition, each article has a list of Tweet IDs. We exported from Twitter all related tweets, together with the user information of each tweet's author and of its likers/retweeters (profile description, follower/following/tweet counts, and whether or not the user is verified), then transformed the data into a tree-like graph dataset with a depth of two. At the top of the tree we find the original annotated article from Politifact or Gossipcop (depth=0); linked to this root node are all users who published tweets mentioning the article (depth=1), and in the leaves all users who interacted (liked or retweeted) with those tweets (depth=2). At the end of this process, we are left with a set of heterogeneous undirected graphs, each graph representing one annotated article and the relevant user interactions. The user information extracted before represents the features of each node: the user profile description is embedded in a 384-dimensional vector that is then concatenated with four more values (follower, following and tweet counts, and whether the user is verified), giving us a feature matrix of size [number of nodes x 388].

Since learning the representation of heterogeneous graphs using GAEs is not an easy task, we convert all heterogeneous graphs to homogeneous graphs simply by changing the node type of the root, since it is the only node that differs from the others: we transform it into a dummy user profile like the rest of the nodes, with the article title as its profile description. In Table III we report the number of graphs, nodes and edges after sampling,
cleaning the FakeNewsNet dataset and transforming each entry to a homogeneous graph as previously described.

The original architecture that we designed is composed of two encoders and a feed forward network. The first encoder is an LM, which we use to encode the news article title: it takes a textual input that goes through all Transformer blocks and outputs a 768-dimensional vector representing the Textual Context of that article title. The second encoder is a Variational Graph Auto-Encoder (VGAE). We chose to add this encoder based on the hypothesis that users' interactions on social media regarding a piece of news follow a deep pattern that can be detected, learned and inferred. Users on Twitter, for example, engage more with a tweet promoting fake news within a short time frame; this engagement can take the form of sarcasm, denial, approval, a retweet or a simple like. For this reason, we implemented a GAE to analyze and embed the structure of users' social interactions: this encoder takes a homogeneous graph previously built from the FakeNewsNet dataset and outputs a 16-dimensional vector representing the Social Context around that annotated news article. The VGAE is composed of a graph convolutional network (GCN) as encoder and a simple inner product as decoder. The GCN can encode both the graph's structure and the features of the nodes, thus combining structural and semantic relatedness between users. During the training of the VGAE on graph representation learning, which is an independent step performed before inserting it into the architecture, we combine the Gossipcop and Politifact datasets to ensure diversity in representation learning.

The embeddings of the two encoders are concatenated to form a 784-dimensional vector (768 + 16) that is then used as an input to a feed forward network with four linear layers in order to solve the downstream task, i.e. fake-news classification. For a fair comparison, we built a simple Graph Classifier that uses only the VGAE (encoder part) + FFN, as well as an LM Classifier that uses BERT + FFN. We then experiment with each model separately (Graph Classifier, LM Classifier, and Hybrid Classifier) on both the Politifact and Gossipcop datasets.

2) Results: As before, we use the usual classification metrics: accuracy, recall, precision, and F1-score. It is worth mentioning that experimenting on both FakeNewsNet datasets shows the capability of a given model to classify not only political claims found in Politifact but also diverse claims from different domains found in Gossipcop. FakeNewsNet is considered a benchmark for fake-news classifiers.

As reported in Table IV, the Hybrid Classifier, which utilizes both the Social and Textual context around a given claim, is an improvement over both the Graph Classifier and the LM Classifier alone, as it increases accuracy and F1-score. We can observe that the results of the LM Classifier decrease from one dataset to the other even though we use more data on Gossipcop, meaning that adding more data to the learning phase won't necessarily improve the performance of this model; it may even get worse, because diverse claims (more than one topic or theme) can render the LM weak, as we saw in Section III-A3. On the other hand, as data grows and gets more diverse (from Politifact to Gossipcop), the accuracy of the Graph Classifier improves (from 0.66 to 0.73). This is due to the fact that having more user interaction data implies having strongly connected big graphs that contain more social context around the given claim; in addition, the graph embedding doesn't rely on the textual context, only on user information, thus making it more robust to topic-changing claims. This model performs poorly when the number of examples in the dataset is small, as we can see in the case of Politifact.

TABLE IV
CLASSIFICATION RESULTS OF LM VS. GRAPH VS. HYBRID MODELS FOR FAKENEWSNET DATASET

                       Politifact             Gossipcop
Model                  accuracy  f1-score     accuracy  f1-score
BERT-classifier        0.81      0.79         0.76      0.72
Graph-classifier       0.66      0.61         0.73      0.51
Hybrid-classifier      0.89      0.89         0.80      0.74
Related work
SAFE [20]              0.73      0.72         0.77      0.77
CSI [21]               0.76      0.76         0.75      0.75
word2vec+MLP [22]      0.76      0.76         0.84      0.85
GNN-CL [23]            0.62      0.62         0.95      0.95
GCNFN [24]             0.83      0.84         0.96      0.96

A downside to the Hybrid architecture we propose is that learning the graph representations, i.e. training the Graph Auto-Encoders, takes on average 20x more time than the Transformer-based approach, although this also depends on the number of graphs as well as the number of nodes in a graph. In fairness, it would also take an immense amount of time to pre-train LMs like BERT. However, if both are already pre-trained, training the feed forward network (MLP) requires only several minutes.

These models can also be improved by increasing the size of the dataset. During this work, due to limitations of the Twitter API, we sampled only a weakly representative subset of the relevant tweets. By retrieving all of the tweets, the graphs would become richer and more inter-connected, which could improve the representation learning of the VGAE model, as it only reaches a loss of 0.8 for now. By increasing the number of nodes in each graph, the embedding will be more accurate and representative. We can also improve the accuracy by fine-tuning the BERT LM before plugging it into the hybrid architecture (see Section III-A3). Analyzing and embedding the entire textual article or the tweet discussions around a claim, instead of just the title (which can be confusing or deceiving, as it is short and specifically engineered to attract people's attention), should give a better understanding of the context, while a whole tweet discussion (comments and quotes) would give a better picture of the social sentiment around the news. Additionally, the [CLS] special token of the LM input could be replaced with the Social Embedding in order to enrich the architecture itself with social context next to the textual claim, i.e. Social Context Vector Injection.

IV. CONCLUSION & FUTURE WORK

In this study, we explored the capabilities of LMs to be fine-tuned and utilized for fact-checking claims. We have shown the effectiveness of pre-trained LMs as an entity capable of storing knowledge, rather than implementing modules for external information retrieval as adopted by traditional approaches. Our experiments conclusively yield results that surpass most of
the existing fact-checking methods, both traditional and LM-based. We also built a custom hybrid architecture that takes advantage of both the Textual Content and the Social Context around a claim by combining LMs and GAEs. Our experiments showed that this hybrid architecture performs better for fake-news classification and that multimodal learning has the ability to give us a better and more robust understanding of fake news based on its textual and social characteristics. We plan to experiment with more complex combinations of the produced embeddings and to add different GNNs to the mix.

REFERENCES

[1] Anton Chernyavskiy, Dmitry Ilvovsky, and Preslav Nakov. WhatTheWikiFact: Fact-checking claims against Wikipedia. arXiv preprint arXiv:2105.00826, 2021.
[2] Piotr Przybyla. Capturing the style of fake news. In Proceedings of the AAAI Conference on Artificial Intelligence, 2020.
[3] Kai Shu, Deepak Mahudeswaran, Suhang Wang, and Huan Liu. Hierarchical propagation networks for fake news detection: Investigation and exploitation. In Proceedings of the International AAAI Conference on Web and Social Media, volume 14, pages 626–637, 2020.
[4] Niraj Sitaula, Chilukuri K Mohan, Jennifer Grygiel, Xinyi Zhou, and Reza Zafarani. Credibility-based fake news detection. In Disinformation, Misinformation, and Fake News in Social Media, pages 163–182. Springer, 2020.
[5] Nayeon Lee, Belinda Z Li, Sinong Wang, Wen-tau Yih, Hao Ma, and Madian Khabsa. Language models as fact checkers? arXiv preprint arXiv:2006.04102, 2020.
[6] Rami Aly, Zhijiang Guo, Michael Sejr Schlichtkrull, James Thorne, Andreas Vlachos, Christos Christodoulopoulos, Oana Cocarascu, and Arpit Mittal. FEVEROUS: Fact extraction and verification over unstructured and structured information. 2021.
[7] William Yang Wang. "Liar, liar pants on fire": A new benchmark dataset for fake news detection. arXiv preprint arXiv:1705.00648, 2017.
[8] Isabelle Augenstein, Christina Lioma, Dongsheng Wang, Lucas Chaves Lima, Casper Hansen, Christian Hansen, and Jakob Grue Simonsen. MultiFC: A real-world multi-domain dataset for evidence-based fact checking of claims. arXiv preprint arXiv:1909.03242, 2019.
[9] Parth Patwa, Shivam Sharma, Srinivas Pykl, Vineeth Guptha, Gitanjali Kumari, Md Shad Akhtar, Asif Ekbal, Amitava Das, and Tanmoy Chakraborty. Fighting an infodemic: COVID-19 fake news dataset. 2021.
[10] K. Hayawi, S. Shahriar, M.A. Serhani, I. Taleb, and S.S. Mathew. Anti-vax: a novel Twitter dataset for COVID-19 vaccine misinformation detection. Public Health, 2022.
[11] Kai Shu, Deepak Mahudeswaran, Suhang Wang, Dongwon Lee, and Huan Liu. FakeNewsNet: A data repository with news content, social context and spatialtemporal information for studying fake news on social media. arXiv preprint arXiv:1809.01286, 2018.
[12] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[13] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692, 2019.
[14] Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108, 2019.
[15] Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. XLNet: Generalized autoregressive pretraining for language understanding. Advances in Neural Information Processing Systems, 32, 2019.
[16] Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. ALBERT: A lite BERT for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942, 2019.
[17] Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, et al. Big Bird: Transformers for longer sequences. Advances in Neural Information Processing Systems, 33:17283–17297, 2020.
[18] Zi-Hang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, and Shuicheng Yan. ConvBERT: Improving BERT with span-based dynamic convolution. Advances in Neural Information Processing Systems, 33:12837–12848, 2020.
[19] Van-Hoang Nguyen, Kazunari Sugiyama, Preslav Nakov, and Min-Yen Kan. FANG. ACM, Oct. 2020.
[20] Xinyi Zhou, Jindi Wu, and Reza Zafarani. SAFE: Similarity-aware multi-modal fake news detection. In Pacific-Asia Conference on Knowledge Discovery and Data Mining, pages 354–367. Springer, 2020.
[21] Natali Ruchansky, Sungyong Seo, and Yan Liu. CSI: A hybrid deep model for fake news detection. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pages 797–806, 2017.
[22] Yingtong Dou, Kai Shu, Congying Xia, Philip S Yu, and Lichao Sun. User preference-aware fake news detection. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 2051–2055, 2021.
[23] Yi Han, Shanika Karunasekera, and Christopher Leckie. Graph neural networks with continual learning for fake news detection from social media. arXiv preprint arXiv:2007.03316, 2020.
[24] Federico Monti, Fabrizio Frasca, Davide Eynard, Damon Mannion, and Michael M Bronstein. Fake news detection on social media using geometric deep learning. arXiv preprint arXiv:1902.06673, 2019.