
Maharana Pratap Engineering College

COMPUTER SCIENCE AND ENGINEERING

Final Year Project

(KCS 071)
POWERPOINT PRESENTATION

FAKE NEWS DETECTION


SUPERVISED BY-
MR. ABHISHEK SINGH SENGAR (Asst. Professor, CSE)

PRESENTED BY-
SHAQUIB RAZA
SYED SADIQ MEHDI
SHAMREZ KHAN
SAIF KHAN
ABSTRACT

 Today, any individual can post news, verified or not, on social media platforms
like Twitter or Facebook communities and other microblogging websites.

 The idea of this project is to detect fake news using Python and
machine learning algorithms.
MODULES :

Hardware : 1) 4 GB RAM
           2) i3 processor
           3) 500 MB disk space

Software : 1) Anaconda
           2) Python

Frontend : 1) Command Prompt

Backend :  1) Python

Data Set : The LIAR dataset, which contains three
           files in .tsv format for test, train
           and validation.
Dataset :
 The data source used for this project is the LIAR dataset, which contains three
files in .tsv format for test, train and validation. Below is a short
description of the data files used for this project.
 LIAR: A BENCHMARK DATASET FOR FAKE NEWS DETECTION
 William Yang Wang, "Liar, Liar Pants on Fire": A New Benchmark
Dataset for Fake News Detection, in Proceedings of the 55th
Annual Meeting of the Association for Computational Linguistics (ACL
2017), short paper, Vancouver, BC, Canada, July 30-August 4, ACL.
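Since the LIAR files are plain tab-separated text, they can be read with Python's standard csv module. A minimal sketch, assuming a label column followed by a statement column (the sample rows below are invented for illustration):

```python
import csv
import io

# Stand-in for the contents of a LIAR .tsv file; the real files have
# more columns, but we assume label and statement come first here.
sample = (
    "false\tSays the economy shrank last year.\n"
    "mostly-true\tCrime is down statewide.\n"
)

rows = list(csv.reader(io.StringIO(sample), delimiter="\t"))
labels = [row[0] for row in rows]       # e.g. "false", "mostly-true"
statements = [row[1] for row in rows]   # the news statements themselves

print(labels)
```

For the actual files, `io.StringIO(sample)` would be replaced by `open("train.tsv", newline="")`.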
File descriptions :
DataPrep.py : This file contains all the pre-processing functions needed to
process the input documents and texts. First we read the train, test and
validation data files, then perform pre-processing steps such as
tokenizing and stemming.
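The tokenizing and stemming steps can be sketched with the standard library alone; the real file would typically use a library such as NLTK, and the crude suffix-stripping stemmer below is only a simplified stand-in:

```python
import re

def tokenize(text):
    # Lowercase the text and split on any non-alphabetic characters.
    return [t for t in re.split(r"[^a-z]+", text.lower()) if t]

def stem(token):
    # Crude suffix stripping as a stand-in for a real stemmer.
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    return [stem(t) for t in tokenize(text)]

print(preprocess("Politicians repeatedly posting unverified claims"))
```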

FeatureSelection.py : In this file we perform feature extraction and
selection using the scikit-learn Python library.
For feature selection, we use methods such as a simple bag-of-words and
n-grams, followed by term-frequency weighting such as tf-idf.
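In the project itself this weighting comes from scikit-learn's vectorizers; what tf-idf actually computes can be illustrated with a small standard-library sketch (the two example documents are made up):

```python
import math
from collections import Counter

docs = [
    ["fake", "news", "spreads", "fast"],
    ["real", "news", "travels", "slow"],
]

def tf_idf(term, doc, docs):
    # Term frequency: how often the term occurs in this document.
    tf = Counter(doc)[term] / len(doc)
    # Inverse document frequency: terms in fewer documents weigh more.
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / df)
    return tf * idf

# "news" appears in every document, so its idf (and tf-idf) is zero,
# while "fake" is distinctive to the first document.
print(tf_idf("news", docs[0], docs))
print(tf_idf("fake", docs[0], docs))
```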
Classifier.py : Here we build all the classifiers for fake news
detection. The extracted features are fed into different classifiers:
Naive Bayes, Logistic Regression, Linear SVM, Stochastic Gradient Descent
and Random Forest, all from sklearn. Each of the extracted feature sets
was used with all of the classifiers. After fitting each model, we
compared the F1 scores and checked the confusion matrices.
After fitting all the classifiers, the 2 best performing models were
selected as candidate models for fake news classification.
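The F1-score and confusion-matrix comparison can be sketched with plain Python (the project uses sklearn's `f1_score` and `confusion_matrix` for the real thing; the label vectors below are invented):

```python
# Binary labels: 1 = fake, 0 = real (example vectors are made up).
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # fake caught
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # real flagged as fake
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # fake missed
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # real passed

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print([[tn, fp], [fn, tp]])  # confusion matrix
print(f1)
```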

Prediction.py : Our finally selected, best performing classifier was Logistic
Regression, which was then saved to disk under the name final_model.sav.
Once you clone this repository, this model is copied to the user's machine and
used by prediction.py to classify fake news.
 It takes a news article as input from the user; the model then produces the
final classification output, which is shown to the user along with the
probability of truth.
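Saving a fitted model to a .sav file and reloading it in prediction code is typically done with pickle; a fitted sklearn model pickles the same way as the stand-in dict used here to keep the sketch self-contained:

```python
import os
import pickle
import tempfile

# Stand-in for a fitted sklearn model (a real one pickles identically).
model = {"name": "LogisticRegression", "threshold": 0.5}

path = os.path.join(tempfile.mkdtemp(), "final_model.sav")
with open(path, "wb") as f:
    pickle.dump(model, f)       # what the training code would do

with open(path, "rb") as f:
    loaded = pickle.load(f)     # what prediction.py would do

print(loaded["name"])
```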
PROJECT FLOW
 We start by selecting a data set that can be used for detecting fake news. We
selected the LIAR data set, which contains 13 columns; we use the first two columns
for our testing phase. The data set is split into training and testing sets for the
classification of news. The first column holds the news headline and the second
column the label class. The data is then pre-processed: stemming, tokenizing and
checking for null values. Next we perform feature extraction and selection using
Python libraries, with methods such as a simple bag of words, n-grams and tf-idf
(term frequency-inverse document frequency). POS tagging and word2vec could be used
for further feature extraction. We then build classifiers for predicting fake news.
All of the features extracted above are fed into the classifiers, and we use sklearn
to implement and compare the selected models: Naive Bayes, Logistic Regression,
Linear SVM, Stochastic Gradient Descent and Random Forest. Once the models are
fitted, we compare the F1 score of each model and check the confusion matrix. The
last step is selecting the final model, the best performing of them all, which in
our case is Logistic Regression as it has the best F1 score.
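The train/test split at the start of this flow can be sketched with the standard library; the 80/20 ratio and the sample rows are assumptions for illustration (the project itself could use sklearn's `train_test_split`):

```python
import random

# (headline, label) pairs; the rows themselves are invented examples.
rows = [(f"headline {i}", "false" if i % 2 else "true") for i in range(10)]

random.seed(0)                 # reproducible shuffle
random.shuffle(rows)
split = int(0.8 * len(rows))   # assumed 80/20 train/test ratio
train, test = rows[:split], rows[split:]

print(len(train), len(test))
```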
Conclusion

 This project has shown the implementation of five algorithms, in
combination with n-grams and tf-idf, for the purpose of finding fake
news. By performing the above experiment and coding all of the
classifiers, we found that Random Forest with n-grams and Logistic
Regression with n-grams have similar performance. We can validate this
with the F1 scores and the confusion matrices that we generated for each
model. We selected the Logistic Regression with n-grams model, saved it,
and use it for prediction.