Professional Documents
Culture Documents
Abstract— Since there are so many users on social media, who has also impacted the 2016 U.S. presidential election cam-
are not qualified to report news, fake news has become a major paign [1], [12]. In the 5 months before the election, 30 million
problem in recent years. Therefore, it is crucial to identify and tweets were accumulated. A 25% of these tweets contained
restrict the dissemination of false information. Numerous deep
learning models that make use of natural language processing biased or fake news. Most of this fake news was circulated by
have yielded excellent results in the detection of fake news. Trump supporters, who also used it to influence the activities
bidirectional encoder representations from transformers (BERT), of Clinton supporters and assist Trump in winning the elec-
based on transfer learning, is one of the most advanced models. tion [13]. The researchers’ perspectives on the significance of
In this work, the researchers have compared the earlier studies determining if the information is accurate are also changing.
that employed baseline models versus the research articles where
the researchers used a pretrained model BERT for the detection As the content creator wants to deceive the reader, they will
of fake news. The literature analysis revealed that utilizing write the news in a way that makes it difficult for the reader to
pretrained algorithms is more effective at identifying fake news tell the difference between fake and real news, which makes
because it takes less time to train them and yields better results. the detection of fake news challenging [11]. Furthermore, the
Based on the results noted in this article, the researchers have information gleaned through social media is vast, frequently
advised the utilization of pretrained models that have already
been taught to take advantage of transfer learning, which anonymous, user-generated, and noisy. These difficulties have
shortens training time and enables the use of large datasets, led to the need to use more sophisticated techniques for
as well as a reputable model that performs well in terms of detecting false news [11].
precision, recall, as well as the minimum number of false positive A few studies [5], [6], [9] focused on the false news
and false negative outputs. As a result, the researchers created an identification field and displayed good accuracy. Despite this,
improved BERT model, while considering fine-tuning it to meet
the demands of the fake news identification assignment. To obtain most of these studies employ both deep learning and traditional
the most accurate representation of the input text, the final layer machine learning methods. In this study, the researchers have
of this model is also unfrozen and trained on news texts. The compared the pretrained bidirectional encoder representations
dataset used in the study included 23 502 articles of fake news from transformers (BERT) model with the baseline models
and 21 417 items of actual news. This dataset was downloaded for detecting false news. BERT is a deep learning method
from the Kaggle website. The results of this study demonstrated
that the proposed model showed a better performance compared based on transformers and uses transfer learning. It has
with other models, and achieved 99.96% and 99.96% in terms been developed by Google for natural language processing
of accuracy and F1 score, respectively. (NLP) [2]. A sizable corpus of unlabeled text has been used
Index Terms— Bidirectional encoder representations from to pretrain BERT. The training texts consisted of Book Corpus
transformers (BERT), fake news, pretrained model, social (800 million words) and the entirety of Wikipedia, totaling
networks. 2 500 000 000 words. The model was trained for performing
two tasks, i.e., next sentence prediction (NSP) and masked
language modeling (MLM) [2]. BERT uses the Transformers
I. I NTRODUCTION
to learn how words in a text relate to one another in context.
Authorized licensed use limited to: GMR Institute of Technology. Downloaded on August 28,2023 at 07:05:17 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
the BERT model to achieve the highest accuracy, F1 score, model showed better performance, with regard to the accuracy
precision, recall, and least error rate in terms of fake news rate and F1 score (0.904 and 0.928, respectively), using the
detection. PolitiFact dataset. The model did not perform as well on the
The main contributions of this study are listed shortly in second dataset as it did on the first, indicating that it did not
these points. behave consistently across different datasets. Additional side
1) Improving a big dataset that has long text for each information was needed to find the comprehensible remarks
record. that will strengthen the model.
2) Utilizing the transfer learning and pretrained models to To categorize false Bengali text in a dataset,
reduce the training time and make use of a large dataset. Mugdha et al. [6] examined various supervised models.
3) Enhancing a BERT model which takes into consideration The best performance was displayed by the Gaussian Naive
the sequential nature of the input data and the contextual Bayes (GNB) algorithm, which uses an extra tree classifier
meaning of the texts. to select the best features and text features based on term
4) Achieving a high recall, precision, and the least amount frequency–inverse document frequency (TF-IDF). This
of false positive and false negative results. model fared better than the other models and displayed an
The remaining study is organized in the following manner. accuracy rate of 87.42%. However, their study was restricted
Section II presents a discussion of a few related studies. to using only the news headlines, not the entire article.
In Section III, the researchers have discussed the gaps existing Another study [7] presented a complete analysis to assess
in the studies and the implications of this study. In Section IV, the effectiveness of 23 supervised machine learning models.
the researchers have discussed the methodology used in this The results indicated that the decision tree generated the
study. Section V has presented the results of all experiments best results. They established that deep learning models
conducted in this study. Finally, Section VI has mentioned the outperform the conventional machine learning models.
conclusions of the study. Singh [8] employed three models—the CNN, recurrent
neural network (RNN), and artificial neural network—to illus-
trate the advantages of deep learning. He utilized four forms to
II. R ELATED W ORK represent the vector space, i.e., Doc2Vec, Word2Vec, TF–IDF,
The impact of fake news on people and society makes it and one-hot vector encoding. The models were analyzed using
one of the most essential research topics [5], [10]. To dis- two datasets, i.e., LIAR and Kaggle, and the TF–IDF vector
tinguish fake news from legitimate news, the researchers representation showed a better performance using the Kaggle
in [3] extracted information from texts using machine learning dataset. The dataset only contains short sentences; hence it is
models and techniques. They evaluated their dataset using impossible to generalize the findings to all sentence lengths.
K -nearest neighbor, Naive Bayes, support vector machine The two classification models and user-level security proto-
(SVM), random forest, and XGBoost models. Though their col comprise three components of HRSP. The classification of
models performed well, they were not effective with large contents is one of the models. This methodology aims to cat-
datasets. In addition, the context of the statements was not egorize the content into three categories: hate speech, benign
considered. The context meaning of sentences or even the speech, and inappropriate speech. The seven machine learning
individual words is a critical determinant in any NLP process algorithms—KNN, J48, SVM, NB, R-F, logistic regression
as it impacts the overall meaning of the texts and could confuse (LR), and D-Table—were used in their study. They used a
the classification model [11], [12]. For instance, the term dataset of 22 000 tweets to test their algorithms, and their
“book” denotes something to read in the statement “I have overall accuracy was 84.4%. The sole drawback of this study
read a book this weekend.” Whereas it also means to “reserve is that it only analyses the textual data, although online social
a flight” in the statement, “I have to book a flight before next networks contain a plethora of data.
Friday.” Most studies on the topic of false news identification
A more complex model called FNDNet was proposed by rely on conventional machine learning and deep learning
Kaliyar et al. [4]. FNDNet is a system that uses deep convo- methods. However, they are not strong enough. To save
lutional neural networks (CNNs) to identify false information training time and develop more reliable and trustworthy
that circulated during the 2016 U.S. Presidential Election. models, researchers began to focus on pretrained models.
They applied the GLoVe word embedding technique and com- To identify false information during the COVID-19 epidemic,
pared their model to numerous baseline models, including ran- some researchers [2] presented a transformer-based model
dom forest (RF), decision tree, multinomial Naive Bayes, and for processing the explainable natural language using the
K -nearest neighbor. When compared with the baseline models, BERT model and its modifications. They used SHAP (Shapley
their model performed exceptionally well. Despite this, their Additive exPlanations) to enhance the model’s functionality.
strategy does not rely on context, content, or temporal-level To explain the results of the BERT model, SHAP assigns a
data. significant value to each characteristic that is associated with
To learn the order and correlation between the news content a certain prediction. They tested their model using two sizable
and to explain the prediction, Shu et al. [5] suggested a datasets and noted outstanding accuracy results of 0.972 and
model based on the sentence-comment and co-attention. Their 0.938. However, most COVID-19 statements that might be
model, which was created using BiLSTM, was applied to a confirmed are not included in their study, which suggests that
sizeable dataset acquired from PolitiFact and GossipCop. This these datasets may be deceptive and not accurately reflect the
Authorized licensed use limited to: GMR Institute of Technology. Downloaded on August 28,2023 at 07:05:17 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
population. In addition, the study is restricted to identifying originally written by the native speakers. The ARBERT model
false information regarding COVID-19, and other pieces of yielded the best results, with an E score of 98.9%, using their
knowledge are not used to predict the behavior of the model. dataset. Their only drawback was a small dataset.
The dissemination of false information is not limited to the
English language; Arabic is also affected. Due to a lack of III. C ONCEPTUAL C OMPARISON
resources and databases, detecting fake news in Arabic is a
difficult undertaking [18], [19]. Harrag and Djahli [20] used After comparing the different models, the following per-
the Arabic Fact-Checking and Stance Detection Corpus and spectives were derived in this study: 1) because fake news
said it is the only dataset that they could find in the Ara- has such a negative influence on people and society, it must
bic language and supports the fake news detection problem. be prevented from spreading readily through social media
Another issue in detecting fake news in the Arabic language is networks; 2) researchers must use more reliable techniques to
the dialect variety. Hanen Himdi et al. [21] had to collect their intercept any erroneous or misleading information that might
dataset in three different dialects to deal with this problem. be put on social media before users see it; 3) to train a
The AraCOVID19-MFH dataset was published and manu- model from scratch and produce a decent model with an
ally annotated by Ameur and Aliane [22] for detecting fake accurate output, a large dataset and a lot of time are required
news. They employed three baseline pretrained models in for understanding the good correlation between the input
addition to two transformer models that were further pretrained characteristics; and 4) while several studies on false news
on the COVID-19 dataset to familiarize them with COVID-19 identification have performed well in terms of accuracy and
terms, and then they applied them to the goal of identifying F1 scores, their studies were limited to short texts and small
fake news. The model that was trained on COVID-19 words datasets because they used conventional machine learning and
produced the greatest results, with a weighted F-score of deep learning models. Based on these perspectives and the
95.78%. Even though the dataset is small and could only conclusions derived from other studies, the researchers in this
be used for identifying the false news in COVID-19 news, study strived to take advantage of transfer learning and use
it showed high accuracy. pretrained models, which reduces training time and enables
For identifying bogus news in Arabic, Yahya et al. [23] the use of large datasets, for generating a reliable model that
compared the performance of the transformer-based (ArBert, displays a high precision, recall, and the least amount of false
ArElectra, AraBERT v1, AraBERT v2, AraBERT v02, positive and false negative results.
QARiB, and MarBert) and neural network-based [CNN, RNN, Table I describes, in brief, all related studies based on the
and gated recurrent unit (GRU)] models. They utilized four dataset that was used, either big or small, representing the
datasets for evaluating the different models. They demon- population or not and whether it made the model more weak
strated that the transformers-based models showed a bet- or robust. It is clear from Table I that most studies were made
ter performance than the conventional neural networks. The on small datasets where the model’s behavior is not predicted
transformer-based QARiB model showed the highest F1 score or not efficient with big datasets. Moreover, some studies are
of 95% and displayed the best performance. Despite the limited to a specific type of datasets. In addition, some datasets
impressive outcome, their dataset was limited and contained have noise or are not clean.
repeated tweets. It also has noise and unclassifiable texts. Table II shows a brief description of the models used in
Mahlous and Al-Laith [24] proposed a novel dataset for every study, the accuracy rate that was achieved and the
detecting fake news. After cleaning, the dataset included limitations of every study. It seems that most models have high
1537 manually annotated tweets and 34 529 automatically accuracy or F-score rate but it lacks reliance on the content,
annotated tweets. Only machine learning algorithms, including context, or temporal-level information.
Naive Bayes, LR, RF bagging model, multilayer perceptron After comparing the various concepts, a new BERT model
(MLP), SVM, and eXtreme Gradient Boosting Model (XGB), has been proposed in this study for fulfilling the study
were used to evaluate the datasets. The LR classifier showed objectives.
a better performance than other classifiers for both datasets,
with regard to the F1 score of 87.8% for the manually- IV. M ETHODOLOGY
annotated dataset, using the TF–IDF feature; and 93.3% for the
automatically annotated dataset with the count vector as the A. Dataset and Preprocessing Steps
feature. Although the results were excellent, the dataset used The study’s dataset was downloaded from the Kaggle web-
was very small and included different dialects, which made site [16]. The downloaded folder included two files, one of
it difficult to determine the good features for classification. which contained Fake news, and the second folder contained
In addition, it was affected due to language mistakes. True news. Both files included four columns: the Title column
For detecting fake news, Nassif et al. [25] assembled their with article titles, the Text column with the articles themselves,
dataset. They also utilized a different dataset that was released the Subject column with the article’s subject, and the Date
by Kaggle and was translated from English to Arabic using the column with the publication date for every article.
Google Translate feature. To examine both datasets, they used The dataset must be preprocessed before being used in the
eight transformer-based models designed for Arabic. Their experiments. Each file contains data without assigned labels,
findings demonstrated that working with the translated text therefore to identify false news, a column of individual values
was not similar in comparison to working on data that was is added to the fake news file. In addition, a column of zeros is
Authorized licensed use limited to: GMR Institute of Technology. Downloaded on August 28,2023 at 07:05:17 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
TABLE I TABLE II
L IMITATIONS OF THE S TUDIES BASED ON T HEIR D ATASETS L IMITATIONS OF THE S TUDIES BASED ON THE M ODELS
T HAT W ERE U SED AND T HEIR A CCURACY R ATES
Authorized licensed use limited to: GMR Institute of Technology. Downloaded on August 28,2023 at 07:05:17 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
Authorized licensed use limited to: GMR Institute of Technology. Downloaded on August 28,2023 at 07:05:17 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
updated during the training and have the values of the same
weights), followed by its training using news articles to obtain
the best accurate representation of the input text and 2) two
different splitting criteria are employed; criterion 1 used 70%
of the data for training, 15% for validation, and 15% for
testing. The second criterion uses 70% for training, while 30%
is used for testing. To find the ideal model parameters, the
trials are repeated numerous times. It has been discovered that,
for both splitting criteria, using two epochs and ten batches,
respectively, and unfreezing the final layer is optimum [15].
The best results are displayed in Table III.
V. R ESULTS
This study aimed to differentiate between detailed fake news Fig. 4. Confusion matrix on testing data.
and real news. As was previously indicated in the technique
section, an improved BERT model was employed for this
purpose. BERT is the model of choice for conducting experi-
ments as it considers the context of the input text. In addition,
by initializing the model parameters with the values of all
BERT parameters, we can increase the performance without
starting from scratch.
As we care about the accuracy of the model to measure
how much it is accurate in getting the right prediction for each
instance, we have to make sure to measure the model ability
to find the positive class and how much it is accurate when
classifying it as positive which achieved by using the F1 score
measurement. F1 score is the harmonic mean of the precision
and recall and it is the most suitable metric when the data
is imbalanced, since the accuracy is not a good one because
the number of positive class is much smaller or larger than Fig. 5. ROC curve on testing data.
the negative class. The researchers determined the model’s
performance using a variety of indicators. All indicators show
great performance, indicating that the model is reliable enough Fig. 4 depicts the confusion matrix for testing data. The false
to identify bogus news with high accuracy. Both splitting negative value was seen to be 0 in Fig. 4, which indicated that
criteria have achieved excellent results after training with the the model could properly classify all the positive values and
settings listed in Section IV, where the first one achieved the allocated them to the appropriate class. Five false positives
values of 0.27%, 99.94%, 99.94%, 99.94%, and 99.944% for indicate that just five occurrences of the negative class were
loss, accuracy, precision, recall, and F1 score, respectively. incorrectly classified as belonging to a positive class by the
The second splitting criterion presented values of 0.31%, model. This suggested that only five out of the entire dataset’s
99.96%, 99.96%, 99.97%, and 99.96% for loss, accuracy, 44 898 examples were wrongly identified, providing further
precision, recall, and F1 score, respectively. Results are shown proof of the model’s robustness.
in Table IV. As can be seen, the results are extremely close Fig. 5 displays the receiver operating characteristic (ROC)
to one another. The best F1 score and accuracy rate of curve which is used to evaluate the performance of binary
99.96% and 99.96%, respectively, were noted in this study, classification algorithms and provides a graphical representa-
which indicated that the proposed model outperformed the tion of the classifier’s performance, rather than a single value.
previously-mentioned models. The results indicate that this ROC curve is constructed by plotting the true positive rate
model has a very low error percentage and a higher accuracy (TPR), or Sensitivity, against the false positive rate (FPR),
for both the splitting criteria, which makes it very trustworthy or (1-Specificity). The true positive rate is the proportion of
and reliable. the instances that were correctly predicted to be positive out
Authorized licensed use limited to: GMR Institute of Technology. Downloaded on August 28,2023 at 07:05:17 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
TABLE V
C OMPARISON B ETWEEN THE R ESULTS AND T HOSE
P RESENTED IN THE L ITERATURE
Fig. 6. Comparison of the results noted in this study and those published in
earlier studies.
Authorized licensed use limited to: GMR Institute of Technology. Downloaded on August 28,2023 at 07:05:17 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
[2] J. Ayoub, X. J. Yang, and F. Zhou, “Combat COVID-19 infodemic using [15] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training
explainable natural language processing models,” Inf. Process. Manage., of deep bidirectional transformers for language understanding,” 2018,
vol. 58, no. 4, Jul. 2021, Art. no. 102569. arXiv:1810.04805.
[3] J. C. S. Reis, A. Correia, and F. Murai, A. Veloso, and F. Benevenuto, [16] Fakenews1. Accessed: May 25, 2022. [Online]. Available:
“Supervised learning for fake news detection,” IEEE Intell. Syst., vol. 34, https://www.kaggle.com and www.kaggle.com/datasets/kruzes1/
no. 2, pp. 76–81, Mar./Apr. 2019. fakenews1
[4] R. K. Kaliyar, A. Goswami, P. Narang, and S. Sinha, “FNDNet—A deep [17] S. Qin, J. Zhu, J. Qin, W. Wang, and D. Zhao, “Recurrent attentive
convolutional neural network for fake news detection,” Cogn. Syst. Res., neural process for sequential data,” 2019, arXiv:1910.09323.
vol. 61, pp. 32–44, Jun. 2020. [18] K. Darwish, W. Magdy, and T. Zanouda, “Improved stance prediction
[5] K. Shu, L. Cui, S. Wang, D. Lee, and H. Liu, “Defend: Explainable in a user similarity feature space,” in Proc. IEEE/ACM Int. Conf. Adv.
fake news detection,” in Proc. 25th ACM SIGKDD Int. Conf. Knowl. Social Netw. Anal. Mining, Jul. 2017, pp. 145–148.
Discovery Data Mining, 2019, pp. 395–405. [19] T. Elsayed et al., “Overview of the CLEF-2019 CheckThat! Lab: Auto-
[6] S. B. S. Mugdha, S. M. Ferdous, and A. Fahmin, “Evaluating matic identification and verification of claims,” in Proc. Int. Conf. Cross-
machine learning algorithms for Bengali fake news detection,” in Lang. Eval. Forum Eur. Lang. Lugano, Switzerland: Springer, 2019,
Proc. 23rd Int. Conf. Comput. Inf. Technol. (ICCIT), Dec. 2020, pp. 301–321.
pp. 1–6. [20] F. Harrag and M. K. Djahli, “Arabic fake news detection: A fact checking
[7] F. A. Ozbay and B. Alatas, “Fake news detection within online social based deep learning approach,” ACM Trans. Asian Low-Resource Lang.
media using supervised artificial intelligence algorithms,” Phys. A, Stat. Inf. Process., vol. 21, no. 4, pp. 1–34, Jul. 2022.
Mech. Appl., vol. 540, Feb. 2020, Art. no. 123174. [21] H. Himdi, G. Weir, F. Assiri, and H. Al-Barhamtoshy, “Arabic fake
[8] L. Singh, “Fake news detection: A comparison between available deep news detection based on textual analysis,” Arabian J. Sci. Eng., vol. 47,
learning techniques in vector space,” in Proc. IEEE 4th Conf. Inf. pp. 10453–10469, Feb. 2022.
Commun. Technol. (CICT), Dec. 2020, pp. 1–4. [22] M. S. H. Ameur and H. Aliane, “AraCOVID19-MFH: Arabic COVID-19
[9] M. B. Yassein, S. Aljawarneh, and Y. Wahsheh, “Hybrid real-time multi-label fake news & hate speech detection dataset,” Proc. Comput.
protection system for online social networks,” Found. Sci., vol. 25, no. 4, Sci., vol. 189, pp. 232–241, Jan. 2021.
pp. 1095–1124, 2019. [23] M. Al-Yahya, H. Al-Khalifa, H. Al-Baity, D. AlSaeed, and A. Essam,
[10] X. Zhang and A. A. Ghorbani, “An overview of online fake news: “Arabic fake news detection: Comparative study of neural networks
Characterization, detection, and discussion,” Inf. Process. Manage., and transformer-based approaches,” Complexity, vol. 2021, pp. 1–10,
vol. 57, no. 2, Mar. 2020, Art. no. 102025. Apr. 2021.
[11] S. Raza and C. Ding, “Fake news detection based on news content and [24] A. R. Mahlous and A. Al-Laith, “Fake news detection in Arabic tweets
social contexts: A transformer-based approach,” Int. J. Data Sci. Anal., during the COVID-19 pandemic,” Int. J. Adv. Comput. Sci. Appl., vol. 12,
vol. 13, pp. 335–362, Jan. 2022. no. 6, pp. 1–10, 2021.
[12] E. Amer, K.-S. Kwak, and S. El-Sappagh, “Context-based fake news [25] A. B. Nassif, A. Elnagar, O. Elgendy, and Y. Afadar, “Fake news
detection model relying on deep learning models,” Electronics, vol. 11, detection in Arabic tweets during the COVID-19 pandemic,” Neural
no. 8, p. 1255, Apr. 2022. Comput. Appl., vol. 12, no. 6, pp. 121–129, Sep. 2021.
[13] A. Bovet and H. A. Makse, “Influence of fake news in Twitter during [26] Y. Wu et al., “Google’s neural machine translation system: Bridging the
the 2016 U.S. Presidential election,” Nature Commun., vol. 10, no. 1, gap between human and machine translation,” 2016, arXiv:1609.08144.
pp. 1–14, Dec. 2019. [27] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving
[14] A. Vaswani et al., “Attention is all you need,” in Proc. Adv. Neural Inf. language understanding with unsupervised learning,” OpenAI, USA,
Process. Syst., vol. 30, 2017, pp. 1–11. Tech. Rep., 2018.
Authorized licensed use limited to: GMR Institute of Technology. Downloaded on August 28,2023 at 07:05:17 UTC from IEEE Xplore. Restrictions apply.