FAKE NEWS DETECTION
BY
• ADARSH LENIN
• ATHUL P
• BIMAL MURALI
• NIDHIN PHILIP ALEX
INTRODUCTION
• What is Fake News?
• Fake news, a type of yellow journalism, covers news items that may be hoaxes and that spread mainly
through social media and other online media. It is often published to promote or impose certain
ideas, frequently in service of a political agenda. Such items may contain false or
exaggerated claims, may be amplified by recommendation algorithms, and can leave users inside a filter
bubble.
• Fake News Detection in Python
• In this project, we use natural language processing techniques and machine learning
algorithms from the scikit-learn library in Python to classify news articles as real or fake.
FLOWCHART
PREREQUISITES
• PYTHON
• FLASK
• HTML
• CSS
FLASK
• Flask is a web framework: a Python module that lets you develop web
applications easily. It has a small, easy-to-extend core. It is a
microframework that does not include an ORM (Object-Relational Mapper) or
similar features.
• It does provide features such as URL routing and a template engine. It is a WSGI
web application framework.
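As a rough illustration of URL routing, here is a minimal sketch of a Flask app. The `/predict` route, the `news` form field, and the returned JSON are assumptions made for this example, not the project's actual interface.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    # Hypothetical form field: the news text submitted from an HTML page.
    text = request.form.get("news", "")
    # Placeholder response only; a real app would call the trained model here.
    return jsonify({"received_characters": len(text)})
```

During development, an app like this can be started with `app.run(debug=True)` or with the `flask run` command.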
PACKAGES
NUMPY
NumPy, which stands for Numerical Python, is a library consisting of
multidimensional array objects and a collection of routines for
processing those arrays. Using NumPy, mathematical and logical
operations on arrays can be performed.
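A small sketch of mathematical and logical operations on a multidimensional array. The values are made up for illustration.

```python
import numpy as np

# A 3x2 array of made-up scores.
scores = np.array([[0.2, 0.8],
                   [0.9, 0.1],
                   [0.4, 0.6]])

doubled = scores * 2            # mathematical operation, applied element-wise
high = scores > 0.5             # logical operation, yields a boolean array
row_max = scores.max(axis=1)    # largest value in each row
```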
PANDAS
Pandas is an open-source Python package that is widely used
for data science, data analysis, and machine learning tasks. It is built
on top of another package, NumPy, which provides support
for multi-dimensional arrays. It is used for creating and storing data
frames.
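A minimal sketch of creating a data frame and preparing a text column. The column names (`title`, `author`, `label`) and the rows are assumptions for illustration, not the project's dataset.

```python
import pandas as pd

# Made-up rows; label 1 = fake, 0 = real (an assumed encoding).
news = pd.DataFrame({
    "title": ["Aliens land in city", "Council approves budget", "Miracle cure found"],
    "author": ["J. Doe", None, "A. Smith"],
    "label": [1, 0, 1],
})

# Replace missing values with empty strings so text columns can be joined.
news = news.fillna("")

# Combine author and title into a single text column for later processing.
news["content"] = news["author"] + " " + news["title"]
```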
REGULAR EXPRESSION
A regular expression is a sequence of characters that
forms a search pattern. It can be used to check whether a
string contains the specified search pattern.
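A short sketch using Python's built-in `re` module, first to check for a pattern and then to strip everything except letters. The helper name `keep_letters` and the sample headline are hypothetical.

```python
import re


def keep_letters(text):
    """Replace every non-letter character with a space and lowercase the result."""
    return re.sub(r"[^a-zA-Z]", " ", text).lower()


headline = "BREAKING: 5 aliens land!!"

# Check whether the string contains the word "aliens".
has_aliens = re.search(r"\baliens\b", headline) is not None

# Clean the headline, then split on whitespace to get the words.
words = keep_letters(headline).split()
```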
STOPWORDS
The stopwords in the NLTK library are the most common words in a language, such as "the", "is", and "and".
They are words that do not help describe the topic of your
content and add little value to a paragraph.
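A sketch of stopword removal. To keep the example runnable without downloading a corpus, it uses a tiny hand-written stand-in set; with NLTK, the list would normally come from `nltk.corpus.stopwords.words('english')` after running `nltk.download('stopwords')`. The function name `remove_stopwords` is hypothetical.

```python
# Stand-in for the NLTK English stopword list (assumption for illustration only).
STOP_WORDS = {"the", "is", "a", "of", "in", "and", "to"}


def remove_stopwords(text):
    """Lowercase the text, split on whitespace, and drop stopwords."""
    return [word for word in text.lower().split() if word not in STOP_WORDS]


kept = remove_stopwords("The mayor is a friend of the press")
```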
PORTERSTEMMER
The Porter stemming algorithm (or 'Porter stemmer') is a process for
removing the commoner morphological and inflexional endings from words
in English. It reduces a word to its stem (root form), so related words such as "running" and "runs" map to the same token.
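A brief sketch using NLTK's `PorterStemmer`, which needs no corpus download. The word list is made up for illustration.

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()

# Related word forms that should collapse to shared stems.
words = ["running", "runs", "connected", "connection"]
stems = [stemmer.stem(word) for word in words]
```

Note that a stem is not always a dictionary word; the goal is only that related forms end up identical.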
TFIDFVECTORIZER
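Term frequency-inverse document frequency (TF-IDF) is a weighting scheme
that a text vectorizer uses to transform text into a usable numeric vector. It combines two
concepts, Term Frequency (TF) and Inverse Document Frequency (IDF). The
term frequency is the number of occurrences of a specific term in
a document. The inverse document frequency lowers the weight of terms
that appear in many documents.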
TRAIN-TEST SPLIT
The train-test split is used to estimate the performance of machine
learning algorithms in prediction-based
applications. The data is divided into a training set used to fit the model
and a test set held back for evaluation. It is a fast and easy procedure
that lets us compare the model's predictions with the known labels.
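A sketch of splitting data with scikit-learn's `train_test_split`. The toy data, the 80/20 ratio, and the `random_state` value are assumptions chosen for illustration.

```python
from sklearn.model_selection import train_test_split

# Ten toy samples with alternating labels.
X = [[i] for i in range(10)]
y = [0, 1] * 5

# Hold back 20% of the samples for testing; stratify keeps the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=2
)
```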
LOGISTIC REGRESSION
Logistic Regression is a Machine Learning classification algorithm that is
used to predict the probability of a categorical dependent variable. In
logistic regression, the dependent variable is a binary variable that
contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.).
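A minimal sketch of fitting scikit-learn's `LogisticRegression` on made-up one-dimensional data with binary labels, then reading both the predicted probabilities and the predicted classes.

```python
from sklearn.linear_model import LogisticRegression

# Toy data: small values belong to class 0, large values to class 1.
X = [[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(X, y)

new_points = [[1.5], [9.5]]
proba = model.predict_proba(new_points)  # one row per point: [P(class 0), P(class 1)]
pred = model.predict(new_points)         # the more probable class for each point
```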
ACCURACY SCORE
The accuracy_score function in scikit-learn calculates accuracy as either the
fraction or the count of correct predictions. Mathematically,
it is the ratio of the sum of true positives and true negatives to
all the predictions.
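A short sketch of `accuracy_score` on made-up labels, showing both the fraction and the count of correct predictions. In terms of the confusion matrix, accuracy = (TP + TN) / (TP + TN + FP + FN).

```python
from sklearn.metrics import accuracy_score

# Made-up true labels and predictions; 6 of the 8 predictions match.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

fraction = accuracy_score(y_true, y_pred)                 # 6 / 8 = 0.75
count = accuracy_score(y_true, y_pred, normalize=False)   # 6 correct predictions
```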