
Fake News Detection using Machine Learning


Vookanti Anurag Reddy1, CH. Vamsidhar Reddy2, Dr. R. Lakshminarayanan3
1, 2 SRM Institute of Science and Technology, Kattankulathur
3 Assistant Professor, SRM Institute of Science and Technology, Kattankulathur

Abstract: This project applies NLP (Natural Language Processing) techniques to detect 'fake news', that is, misleading news stories that come from non-reputable sources. A model based only on a count vectorizer (using word tallies) or a TF-IDF (Term Frequency-Inverse Document Frequency) matrix (word tallies weighted by how often the words are used in other articles in the dataset) can only get so far, because such models do not consider important qualities like word ordering and context. It is entirely possible for two articles with similar word counts to be completely different in meaning. The data science community has responded by taking action against the problem: there is a Kaggle competition called the "Fake News Challenge", and Facebook is employing AI to filter fake news stories out of users' feeds. Combatting fake news is a classic text classification project with a straightforward proposition: is it possible to build a model that can differentiate between "real" news and "fake" news? The proposed work assembles a dataset of both fake and real news and employs a Naive Bayes classifier to create a model that classifies an article as fake or real based on its words and phrases.
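The limitation noted above can be seen in a minimal sketch, assuming scikit-learn is available; the two example sentences are invented for illustration. They differ only in word order, and therefore in meaning, yet receive identical count vectors.

# Minimal sketch: a pure word-tally representation ignores word order.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the dog bites the man", "the man bites the dog"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # ['bites' 'dog' 'man' 'the']
print(X.toarray())                         # both rows are [1 1 1 2]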
I. INTRODUCTION
These days fake news creates a range of problems, from sarcastic articles to fabricated stories and planned government propaganda in some outlets. Fake news and a lack of trust in the media are growing problems with huge ramifications for our society. Obviously, a purposely misleading story is "fake news", but lately the discourse on social media has been changing its definition: some now use the term to dismiss facts that run counter to their preferred viewpoints. The importance of disinformation within American political discourse received weighty attention, particularly following the American presidential election. The term 'fake news' became common parlance for the issue, particularly to describe factually incorrect and misleading articles published mostly for the purpose of making money through page views. This paper seeks to produce a model that can accurately predict the likelihood that a given article is fake news. Facebook has been at the epicenter of much critique following media attention. It has already implemented a feature to flag fake news on the site when a user sees it, and it has said publicly that it is working on distinguishing these articles in an automated way. Certainly, this is not an easy task. A given algorithm must be politically unbiased, since fake news exists on both ends of the spectrum, and must also give equal balance to legitimate news sources on either end of the spectrum. In addition, the question of legitimacy is a difficult one. However, in order to solve this problem, it is first necessary to understand what fake news is. It is then necessary to look into how techniques from the fields of machine learning and natural language processing can help us detect fake news.

II. WORKING
A. Existing System
There exists a large body of research on machine learning methods for deception detection, most of it focused on classifying online reviews and publicly available social media posts. Particularly since late 2016, during the American presidential election, the question of determining 'fake news' has also been the subject of particular attention within the literature. Conroy, Rubin, and Chen outline several approaches that seem promising for the aim of correctly classifying misleading articles. They note that simple content-related n-grams and shallow part-of-speech (POS) tagging have proven insufficient for the classification task, often failing to account for important context information. Rather, these methods have been shown to be useful only in tandem with more complex methods of analysis. Deep syntax analysis using Probabilistic Context Free Grammars (PCFG) has been shown to be particularly valuable in combination with n-gram methods. Feng, Banerjee, and Choi are able to achieve 85%-91% accuracy in deception-related classification tasks using online review corpora. Feng and Hirst implemented a semantic analysis looking at 'object: descriptor' pairs for contradictions with the text, on top of Feng's initial deep syntax model, for additional improvement. Rubin and Lukoianova analyze rhetorical structure using a vector space model with similar success. Ciampaglia et al. employ language pattern similarity networks that require a pre-existing knowledge base.


B. Disadvantages
1) It is not possible to determine whether the given data is real or fake.
2) The amount of fake data will increase.

C. Proposed System
In this paper, a model is built based on a count vectorizer or a TF-IDF matrix (i.e., word tallies relative to how often the words are used in other articles in the dataset). Since this problem is a kind of text classification, implementing a Naive Bayes classifier is a natural choice, as this is a standard approach for text-based processing. The actual goal in developing the model is choosing the text transformation (count vectorizer vs. TF-IDF vectorizer) and choosing which type of text to use (headlines vs. full text). The next step is to extract the most useful features for the count vectorizer or TF-IDF vectorizer. This is done by using the n most frequent words and/or phrases, choosing whether to lower-case, removing stop words (common words such as "the", "when", and "there"), and only keeping words that appear at least a given number of times in the text dataset.
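To make the steps above concrete, the following is a minimal end-to-end sketch of such a pipeline using scikit-learn. The file name "news.csv" and its "text" and "label" columns are placeholders assumed for illustration, not artifacts of the original work; swapping TfidfVectorizer for CountVectorizer, or the text column for a headline column, reproduces the comparisons described above.

# Minimal sketch of the proposed TF-IDF + Naive Bayes pipeline.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

df = pd.read_csv("news.csv")  # hypothetical dataset with 'text' and 'label' columns
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

# Feature-extraction choices discussed above: lower-casing, stop-word removal,
# uni-/bi-grams, and a minimum document frequency for a term to be kept.
vectorizer = TfidfVectorizer(lowercase=True,
                             stop_words="english",
                             ngram_range=(1, 2),
                             min_df=5)
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# Multinomial Naive Bayes is the standard baseline for count/TF-IDF features.
clf = MultinomialNB()
clf.fit(X_train_vec, y_train)

print("accuracy:", accuracy_score(y_test, clf.predict(X_test_vec)))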

D. Flow Of The Project

E. Advantages
1) It is possible to determine whether the given data is real or fake.
2) The amount of fake data will decrease.

III. RESULTS & DISCUSSIONS


IV. CONCLUSION
With an increasing focus of academic researchers and practitioners alike on the detection of online misinformation, the current investigation allows for two key conclusions. First, computational linguistics can aid in identifying fake news in an automated manner well above the chance level. The proposed linguistics-driven approach suggests that, to differentiate between fake and genuine content, it is worthwhile to look at the lexical, syntactic, and semantic levels of the news item in question. The developed system's performance is comparable to that of humans on this task, with an accuracy of up to 76%. Nevertheless, while linguistic features seem promising, we argue that future efforts on misinformation detection should not be limited to these. They should also include meta features (e.g., number of links to and from an article, comments on the article), features from different modalities (e.g., the visual makeup of a website using computer vision approaches), and embrace the increasing potential of computational approaches to fact verification (Thorne et al., 2018). Thus, future work might explore how hybrid decision models consisting of both fact verification and data-driven machine learning judgments can be integrated. Second, we showed that it is possible to build resources for the fake news detection task by combining manual and crowdsourced annotation approaches. Our paper presented the development of two datasets using these strategies and showed that they exhibit linguistic properties related to deceptive content. Furthermore, unlike other available fake news datasets, our dataset consists of actual news excerpts instead of short statements containing fake news information. Finally, with the current investigation and dataset, we encourage the research community and practitioners to take on the challenge of tackling misinformation.


REFERENCES
[1] Majed AlRubaian, Muhammad Al-Qurishi, Mabrook Al-Rakhami, Sk Md Mizanur Rahman, and Atif Alamri. 2015. A Multistage Credibility Analysis Model
for Microblogs. In Proceedings of the 2015 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2015 (ASONAM '15),
Jian Pei, Fabrizio Silvestri, and Jie Tang (Eds.). ACM, New York, NY, USA, 1434-1440. DOI: http://dx.doi.org/10.1145/2808797.2810065
[2] Majed Alrubaian, Muhammad Al-Qurishi, A Credibility Analysis System for Assessing Information on Twitter, IEEE Transactions on Dependable and Secure Computing, 1-14. DOI: http://dx.doi.org/10.1109/TDSC.2016.2602338
[3] Manish Gupta, Peixiang Zhao, Jiawei Han, 2012. Evaluating Event Credibility on Twitter, Proceedings of the 2012 SIAM International Conference on Data Mining, 153-164, DOI: http://epubs.siam.org/doi/abs/10.1137/1.9781611972825.14
[4] Krzysztof Lorek, Jacek Suehiro-Wiciński, Michał Jankowski-Lorek, Amit Gupta, Automated credibility assessment on Twitter, Computer Science, 2015, Vol. 16(2), 157-168, DOI: http://dx.doi.org/10.7494/csci.2015.16.2.157
[5] Ballouli, Rim El, Wassim El-Hajj, Ahmad Ghandour, Shady Elbassuoni, Hazem M. Hajj and Khaled Bashir Shaban. 2017. “CAT: Credibility Analysis of
Arabic Content on Twitter.” WANLP@EACL (2017).
[6] Alina Campan, Alfredo Cuzzocrea, Traian Marius Truta, 2017. Fighting Fake News Spread in Online Social Networks: Actual Trends and Future Research
Directions, IEEE International Conference on Big Data (BIGDATA), 4453-4457
[7] Carlos Castillo, Marcelo Mendoza, and Barbara Poblete. 2011. Information credibility on twitter. In Proceedings of the 20th international conference on World wide web (WWW '11). ACM, New York, NY, USA, 675-684. DOI: http://dx.doi.org/10.1145/1963405.1963500
[8] Muhammad Abdul-Mageed, Mona Diab, Sandra Kübler,2014. SAMAR: Subjectivity and sentiment analysis for Arabic social media, Computer Speech &
Language, Elsevier BV, ISSN: 0885-2308, V.28, Issue: 1, 20- 37, DOI: http://dx.doi.org/10.1016/j.csl.2013.03.001
[9] Niall J. Conroy, Victoria L. Rubin, and Yimin Chen. 2015. Automatic deception detection: methods for finding fake news. In Proceedings of the 78th ASIS&T Annual Meeting: Information Science with Impact: Research in and for the Community (ASIST '15). American Society for Information Science, Silver Springs, MD, USA, Article 82, 4 pages.
[10] Granik, M., & Mesyura, V. 2017. Fake news detection using naive Bayes classifier. 2017 IEEE First Ukraine Conference on Electrical and Computer
Engineering (UKRCON), 900-903.
[11] Hertz, J., Palmer, R.G., Krogh, A.S. 1990. Introduction to the theory of neural computation, Perseus Books. ISBN 0-201-51560-1
[12] Thandar M., Usanavasin S. 2015 Measuring Opinion Credibility in Twitter. In: Unger H., Meesad P., Boonkrong S. (eds) Recent Advances in Information and
Communication Technology 2015. Advances in Intelligent Systems and Computing, vol 361. Springer, Cham
[13] Aker, A., Bontcheva, K., Liakata, M., Procter, R., & Zubiaga, A. 2017. Detection and Resolution of Rumours in Social Media: A Survey. CoRR,
abs/1704.00656.
[14] W. Vorhies, Using algorithms to detect fake news - The state of the art, 2017, [online] Available: http://www.datasciencecentral.com/profiles/blogs/using-algorithms-to-detect-fake-news-the-state-of-the-art
[15] Ehsanfar, Abbas and Mo Mansouri. 2017. “Incentivizing the dissemination of truth versus fake news in social networks.” 2017 12th System of Systems
Engineering Conference (SoSE), 1-6.
[16] Hal Berghel. 2017. Alt-News and Post-Truths in the "Fake News" Era. Computer 50, 4 (April 2017), 110-114. DOI: https://doi.org/10.1109/MC.2017.104
[17] Buntain, C., & Golbeck, J. 2017. Automatically Identifying Fake News in Popular Twitter Threads. 2017 IEEE International Conference on Smart Cloud
(SmartCloud), 208-215
