
FAKE NEWS DETECTOR

Muhammad Hassan Ur Rehman, Muhammad Huzaifa, Sufyan A. Siddiqui, Taber Bin Zameer

Abstract--Fake news is propaganda or manipulated news spread across the
internet with the objective of damaging a person, agency or organization.
Most fake news detection systems rely on the linguistic features of the
news. We first take a dataset that contains both fake and real news and
conduct various tests to build a fake news detector, using machine
learning and deep learning techniques to classify the data. The problem
has drawn interest from researchers around the globe, who work on
deception detection mechanisms to address it. The goal is to introduce a
mechanism that is automatic, robust, trustworthy, efficient and accurate.
From the dataset we build a classifier that decides whether a news item
is fake or real. We use three different models: Naive Bayes, Passive
Aggressive Classifier and LSTM. The best-performing model was LSTM, which
gave the most accurate predictions.

I. INTRODUCTION

The name "fake news" is short and not clearly understood, yet in today’s
world we are surrounded by thousands of lies, from which many conflicts
and rumors arise. This is an area where we believe we must proceed very
carefully, because identifying the truth is complicated. We have
developed a project that can determine whether a news item is true or
false. Fake news is becoming a threat to society. It is generated to
attract viewers and for advertising purposes. We have developed a fake
news detector to protect people from lies, because lies spread hate
around the globe. Although the problem of fake news is not new, detecting
it is considered a difficult task, because people tend to believe
appealing but misleading information and there is little control over the
spread of deceptive content.

II. LITERATURE REVIEW

The author of [1] used a dataset from Kaggle to measure the accuracy of
fake news detection. He tested various models, including Naïve Bayes,
Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM),
K-nearest neighbor and Random Forest. After testing these models, the CNN
model was found to be the most accurate, with 98.3% accuracy.

The authors of [2] incorporated sentiment analysis to further improve
accuracy on existing datasets. A merged dataset was formed from three
existing datasets in order to test the accuracy. tf-idf was found to be
the best text processing technique. Furthermore, the accuracy of the
Naïve Bayes and Random Forest models was compared, and Random Forest was
found to be more accurate.

The authors of [3] prepared a Fact Database containing articles on
various topics. The input was checked against the Fact Database and the
relevant article was picked. They then used Bidirectional
Multi-Perspective Matching (BiMPM) to determine whether the given
proposition was in the context of the article. The result was based on
this calculation.

The authors of [4] used a dataset from BuzzFeed News which contained
Facebook posts from three major news sources. The posts were labelled by
BuzzFeed News as true, false, mostly true, mostly false, or a mixture of
true and false. A naïve Bayes classifier was then trained and tested on
the dataset. Using this approach, they achieved an accuracy of about 74%.

III. METHODOLOGY

a) Platform and Technologies:

• Jupyter Notebook
• Flask
• Docker

b) Dataset:

To detect fake or real news, we first need labelled real and fake data on
which to build the model that will detect the news. We use a pre-built
dataset of news labelled fake/real from Kaggle.

c) Preprocessing:

In this phase, we preprocess the data by removing extra spaces, special
characters and punctuation, and by lowercasing the text, to simplify it
for classification.

d) Model/Training:

A model is then selected according to the classification task. Here we
use two models, “Naive Bayes” and “Passive Aggressive Classifier”.

e) Naïve Bayes:

This is one of the simplest approaches to classification: a probabilistic
approach is used, with the assumption that all features are conditionally
independent given the class label.

Confusion Matrix using tf-idf vectorizer

Confusion Matrix using count vectorizer

f) Passive Aggressive Classifier:

The Passive Aggressive algorithm is an online algorithm, ideal for
classifying massive streams of data (e.g. Twitter). It is easy to
implement and very fast. It works by taking an example, learning from it
and then throwing it away.

Confusion Matrix using tf-idf vectorizer

Confusion Matrix using count vectorizer

g) Testing:

Finally comes the testing process, in which we test whether news is fake
or real. Here we take user input and the model predicts the class of the
news based on the nature of the words used.

The process is also demonstrated in the diagram below.

Data flow diagram:
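
The "take an example, learn from it, throw it away" behaviour of the Passive Aggressive Classifier can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the PA-I variant of the update rule, labels encoded as +1 (fake) and -1 (real), a whitespace bag-of-words in place of a real vectorizer, and a made-up four-item stream. The names `featurize`, `PassiveAggressive` and `partial_fit` are hypothetical.

```python
from collections import Counter


def featurize(text):
    """Bag-of-words counts for one document (stand-in for a vectorizer)."""
    return Counter(text.lower().split())


class PassiveAggressive:
    """Binary PA-I classifier; labels are +1 (fake) and -1 (real)."""

    def __init__(self, C=1.0):
        self.C = C    # assumed cap on the step size (aggressiveness)
        self.w = {}   # sparse weight vector: word -> weight

    def score(self, x):
        return sum(self.w.get(word, 0.0) * count for word, count in x.items())

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def partial_fit(self, x, y):
        """Learn from a single example; the caller can then discard it."""
        loss = max(0.0, 1.0 - y * self.score(x))   # hinge loss
        norm_sq = sum(count * count for count in x.values())
        if loss == 0.0 or norm_sq == 0:
            return                                  # passive: no change
        tau = min(self.C, loss / norm_sq)           # aggressive: step size
        for word, count in x.items():
            self.w[word] = self.w.get(word, 0.0) + tau * y * count


# Toy stream (invented for illustration), processed one example at a time.
stream = [
    ("shocking miracle cure doctors hate", 1),
    ("government publishes annual budget report", -1),
    ("shocking secret celebrities hate", 1),
    ("city council approves budget report", -1),
]

clf = PassiveAggressive()
for text, label in stream:
    clf.partial_fit(featurize(text), label)

pred_fake = clf.predict(featurize("shocking miracle secret"))
pred_real = clf.predict(featurize("annual budget report"))
print(pred_fake, pred_real)
```

Because only the weight vector is kept between examples, memory use does not grow with the length of the stream, which is what makes this family of algorithms suitable for large data streams.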

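
Taken together, the preprocessing, Naive Bayes training, confusion matrix and testing steps of the methodology can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not the code used in the project: the cleaning rule keeps only ASCII letters and digits, the classifier is a multinomial Naive Bayes over raw word counts with add-one smoothing (a simplification of the count-vectorizer setup), and the example texts are invented. The names `preprocess`, `NaiveBayes` and `confusion_matrix` are hypothetical.

```python
import math
import re
from collections import Counter


def preprocess(text):
    """Lowercase, replace punctuation/special characters, collapse spaces."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s]", " ", text)   # assumption: ASCII text only
    return re.sub(r"\s+", " ", text).strip()


class NaiveBayes:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.n_docs = len(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(preprocess(text).split())
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        words = preprocess(text).split()
        best_class, best_log_prob = None, -math.inf
        for c in self.classes:
            total = sum(self.word_counts[c].values()) + len(self.vocab)
            # log P(c) + sum of log P(word | c): words are treated as
            # conditionally independent given the class label.
            log_prob = math.log(self.doc_counts[c] / self.n_docs)
            for w in words:
                log_prob += math.log((self.word_counts[c][w] + 1) / total)
            if log_prob > best_log_prob:
                best_class, best_log_prob = c, log_prob
        return best_class


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Toy data, invented for illustration (the paper uses a Kaggle dataset).
train_texts = [
    "SHOCKING!!! Miracle cure that doctors hate",
    "You won't BELIEVE this secret trick",
    "Parliament passes the annual budget bill",
    "Central bank keeps interest rates unchanged",
]
train_labels = ["FAKE", "FAKE", "REAL", "REAL"]
test_texts = ["Shocking secret cure revealed",
              "Bank announces annual interest rates"]
test_labels = ["FAKE", "REAL"]

model = NaiveBayes().fit(train_texts, train_labels)
predictions = [model.predict(t) for t in test_texts]   # "user input" step
matrix = confusion_matrix(test_labels, predictions, ["FAKE", "REAL"])
print(predictions, matrix)
```

In the testing step, a single call such as `model.predict(user_text)` plays the role of classifying the text entered by the user.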
IV. CASE STUDY

The interface and the interaction with the application are as
straightforward as possible, without anything fancy going on. The user is
simply provided with three textboxes.

a) URL:

This textbox is filled with the URL of the news article that you want to
analyze. Users can provide any source of news; it is not limited to news
sites.

b) Title:

As the news title plays a vital part in our training dataset, we also
require the user to enter the title of the news article.

c) Content:

This is the most important part of the input. It provides the text on
which we carry out our classification.

On pressing the Analyze button, the application shows you the results.

V. CONCLUSION:

Projects like this fake news detector are very rare, although fake news
is a bone of contention for all of us. It is not important that the
project we made be 100 percent accurate, but we can say that it is a
small contribution to the world from our side. We faced many problems
while making this project because it requires a complete dataset of real
and fake news. We can say that such a project remains unimplemented,
because no complete version is available for use on the internet; only
small-scale projects of this type are available. We think it would be a
great contribution from our side if we succeed in making a complete
practical version of it.

ACKNOWLEDGMENT

This project would not have been possible without the contribution of all
the members, especially Taber and Hassan (students of NED University),
with their assistance and instructions throughout the project.

We are also grateful to Ms. Zainab (lecturer at NED University) for
motivating us to work hard on this project for research purposes and for
helping us find resources.

REFERENCES

[1] R. K. Kaliyar, "Fake News Detection Using A Deep Neural Network," in
2018 4th International Conference on Computing Communication and
Automation (ICCCA), 2018.

[2] B. Bhutani, N. Rastogi, P. Sehgal and A. Purwar, "Fake News Detection
Using Sentiment Analysis".

[3] K.-h. Kim and C.-s. Jeong, "Fake News Detection System using Article
Abstraction".

[4] M. Granik and V. Mesyura, "Fake News Detection Using Naive Bayes
Classifier," in IEEE First Ukraine Conference on Electrical and Computer
Engineering (UKRCON), 2017.

[5] J. Straub, Gurmeet, N. Snell and T. Traylor, "Classifying Fake News
Articles Using Natural Language Processing to Identify In-Article
Attribution as a Supervised Learning Estimator," in IEEE 13th
International Conference on Semantic Computing (ICSC), 2019.

BIOGRAPHY

Muhammad Huzaifa Shuja (SE-093):

Muhammad Huzaifa Shuja is an undergraduate student in the Software
Engineering Department of NED University of Engineering. He is interested
in electronics, especially Arduino. In this project he made a significant
contribution to building the frontend of the application.

Sufyan Ahmed Siddiqui (SE-060):

Sufyan Ahmed Siddiqui is an undergraduate student in the Software
Engineering Department of NED University of Engineering. He is interested
in C++ programming and networking. In this project his most notable work
was developing the front end with Huzaifa and connecting it to the
backend through Flask.

Taber Bin Zameer (SE-082):

Taber Bin Zameer is an undergraduate student in the Software Engineering
Department of NED University of Engineering. He is interested in machine
learning and data science. He worked on training and testing different
models.

Muhammad Hassan Ur Rehman (SE-062):

Muhammad Hassan Ur Rehman is an undergraduate student in the Software
Engineering Department of NED University of Engineering. He is a Java
developer and also has experience working in the field of Artificial
Intelligence. In this project he worked with Taber to train and test
models and select the best one for prediction.
