
VIGNAN’S INSTITUTE OF INFORMATION TECHNOLOGY, VISAKHAPATNAM

Department of MCA

DETECTING FAKE NEWS USING MACHINE LEARNING AND DEEP LEARNING ALGORITHMS

Under the Esteemed Guidance of
Mr. K. Leela Prasad, M.Tech
Assistant Professor

Presented by
M. Pavan Kumar
18L35F0009
III MCA
VIIT(A)
OUTLINE

• ABSTRACT
• INTRODUCTION
• EXISTING SYSTEM
• PROPOSED SYSTEM
• SYSTEM DESIGN
• SYSTEM REQUIREMENTS
• ALGORITHMS
• IMPLEMENTATION
• CONCLUSION & FUTURE SCOPE
• REFERENCES
ABSTRACT

• Social media interaction, especially news spreading across the network, is a major source of information nowadays.

• Twitter, being one of the most well-known real-time news sources, has also become one of the most influential mediums for broadcasting news.

• Fake news has a great impact on business and commerce.

• This work proposes a model for recognizing fake news messages in Twitter posts by learning to predict accuracy ratings, with the goal of automating fake news detection on Twitter datasets.
INTRODUCTION

• A close inspection of present-day tweets shows that false news spreads through humans more often than genuine news does. Lies travel around us faster and more widely than the truth in all spheres of information, and their effects are more dangerous and alarming. Tweets come in several kinds, such as issues concerning a government, trending topics around the world, mental abuse, urban legends, and events during disasters. What is more worrying is that, as studies have claimed, it is not just bots pouring out the majority of these misrepresentations; specific individuals, usually ordinary users, perform a large share of this activity. In this regard, verified users and those with many followers were not more often the center of spreading misinformation from corrupted posts. Fake news on social media, which can go viral like a rocket in no time, can cause great havoc to society, individuals, and the country.
Scope and objective

• To implement an efficient tool for the analysis of fake news that can use supervised and semi-supervised learning for better accuracy of results.

• The goal of this project is to determine the effectiveness and limitations of language-based techniques for detecting fake news through the use of machine learning algorithms.

• The outcome of this project should determine how much can be achieved in this task by analyzing patterns contained in the text alone, blind to outside information about the world.
Existing System

• A number of features have been proposed and evaluated in view of their impact on the accuracy of the classifier.

• Part-of-speech (POS) tags and Linguistic Inquiry and Word Count (LIWC) features are used in combination with unigrams and bigrams.
Disadvantages :

• In the existing work, the system uses only semi-supervised learning.

• It performs only sentiment-oriented text classification and never detects fake news.
Proposed System

• In the proposed system, our main focus is on feature engineering: if we tune the existing features or add new ones, the detection of fake news can become much more accurate. Drawing on psychological research on false news, we find that the word length of a tweet statement can be a useful feature, as unauthenticated news content carries a lot of title words and fictional statements. So, we added a new feature, word length, which is the count of words in a tweeted statement excluding any links, dates, or other indicators.
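
A minimal sketch of how such a word-length feature could be computed is shown below; the cleaning rules, column names, and sample tweet are assumptions for illustration, not the project's actual code.

```python
import re

import pandas as pd


def word_length_feature(tweet: str) -> int:
    """Count the words in a tweet after stripping links, dates and similar indicators."""
    text = re.sub(r"http\S+|www\.\S+", " ", tweet)                  # drop links
    text = re.sub(r"\b\d{1,2}[/-]\d{1,2}[/-]\d{2,4}\b", " ", text)  # drop simple dates
    text = re.sub(r"[@#]\w+", " ", text)                            # drop mentions/hashtags
    return len(text.split())


# Hypothetical usage on a tiny dataframe of tweets
df = pd.DataFrame({"tweet": ["BREAKING!!! Aliens landed http://fake.example 12/12/2020"]})
df["word_length"] = df["tweet"].apply(word_length_feature)
print(df[["tweet", "word_length"]])
```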
Advantages:

• The system is fast and effective due to semi-supervised and supervised learning.

• It focuses on content-based approaches to the news. As features, we have used count vectors, TF-IDF, and n-grams.
System Architecture
SYSTEM REQUIREMENTS

Software Requirements:
• Operating System: Windows 10

• Language: Python (IDLE)

Hardware Requirements:

• System: Intel i5

• Hard Disk: 1 TB

• RAM: 4 GB
Algorithms Used

• Support Vector Machine

• Naive Bayes

• Logistic Regression

• Random Forest
Support Vector Machine
• A support vector machine (SVM) is a machine learning algorithm that performs supervised learning for classification or regression of data groups.

• In machine learning, supervised learning systems are given both input data and the desired output data, which are labeled for classification. The classification provides a learning basis for future data processing.

• SVM is one of the most popular supervised learning algorithms and is used for both classification and regression problems. However, it is primarily used for classification problems in machine learning.

• The goal of the SVM algorithm is to create the best line or decision
boundary that can segregate n-dimensional space into classes so that we
can easily put the new data point in the correct category in the future.
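
As an illustrative sketch (not the project's exact code), the following trains a linear SVM on TF-IDF features with scikit-learn; the tiny inline dataset and labels are assumed.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Assumed toy data: 1 = fake, 0 = genuine
texts = [
    "aliens built the pyramids last week",
    "city council approves new budget",
    "miracle cure found doctors hate it",
    "stock markets closed higher today",
]
labels = [1, 0, 1, 0]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)

# A linear kernel learns the separating hyperplane (decision boundary) in feature space
clf = SVC(kernel="linear")
clf.fit(X, labels)

print(clf.predict(vectorizer.transform(["doctors hate this miracle cure"])))
```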
Logistic Regression
• Logistic regression is a classification algorithm used to assign observations to a discrete set of classes. Unlike linear regression, which outputs continuous numeric values, logistic regression transforms its output using the logistic function to return a probability value.

• It is a supervised learning classification algorithm used to predict the probability of a target variable.

• The nature of the target or dependent variable is dichotomous, which means there are only two possible classes.
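
A minimal scikit-learn sketch of binary (dichotomous) logistic regression is given below; the numeric toy features, such as word length and exclamation count, are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed toy features (word length, exclamation count) and labels (1 = fake, 0 = genuine)
X = np.array([[25, 4], [12, 0], [30, 6], [10, 1]])
y = np.array([1, 0, 1, 0])

model = LogisticRegression()
model.fit(X, y)

# The logistic (sigmoid) function maps the linear score to a probability of each class
print(model.predict_proba([[28, 5]]))  # [[P(genuine), P(fake)]]
print(model.predict([[28, 5]]))        # hard class label
```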
Naive Bayes

• The Naive Bayes classifier is one of the simplest and most effective classification algorithms; it helps build fast machine learning models that can make quick predictions.

• Some popular applications of the Naive Bayes algorithm are spam filtering, sentiment analysis, and classifying articles.

• During training, it converts the given dataset into frequency tables from which the class probabilities are estimated.
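
The sketch below shows a Multinomial Naive Bayes classifier built on word-count (frequency) features with scikit-learn; the sample messages and labels are assumed.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Assumed sample messages: 1 = spam/fake-like, 0 = normal
texts = [
    "win a free prize now",
    "meeting rescheduled to monday",
    "free money click here",
    "project report attached",
]
labels = [1, 0, 1, 0]

# The word counts play the role of the frequency tables from which probabilities are estimated
counts = CountVectorizer()
X = counts.fit_transform(texts)

nb = MultinomialNB()
nb.fit(X, labels)
print(nb.predict(counts.transform(["click here for a free prize"])))
```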


Random Forest
• Random forest is a supervised learning algorithm used for both classification and regression. The random forest algorithm creates decision trees on data samples, gets a prediction from each of them, and finally selects the best solution by means of voting. It is an ensemble method that performs better than a single decision tree because it reduces over-fitting by averaging the results.
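
A short sketch with scikit-learn's RandomForestClassifier illustrates many trees voting on a prediction; the toy feature matrix and labels are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed toy features (e.g. a sentiment score and word length) and labels (1 = fake, 0 = genuine)
X = np.array([[0.9, 25], [0.1, 12], [0.8, 31], [0.2, 9], [0.7, 27], [0.3, 11]])
y = np.array([1, 0, 1, 0, 1, 0])

# 100 decision trees are grown on bootstrap samples; their votes are combined
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X, y)

print(forest.predict([[0.85, 29]]))        # majority vote of the trees
print(forest.predict_proba([[0.85, 29]]))  # fraction of trees voting for each class
```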
Implementation
• Count Vector - A count vector represents the corpus as a document-term matrix in which each row represents a document of the corpus, each column represents a term of the corpus, and each cell holds the frequency count of a particular term in a particular document.

• TF-IDF - TF-IDF captures how frequent a term is within a document relative to how common it is across the whole corpus, assigning a metric value to represent the importance of that term. It is widely and frequently used in text mining. This weight is a statistical measure used to assess how essential a word is to a document in a collection or corpus.

• N-Gram - An n-gram is a sequence of tokens. In the context of computational linguistics, these tokens are usually words, though they can be characters or subsets of characters. The n simply refers to the number of tokens in the sequence.
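
The following sketch contrasts the three feature representations on a tiny assumed corpus using scikit-learn.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["fake news spreads fast", "real news spreads slowly"]  # assumed toy corpus

# Count vector: rows = documents, columns = terms, cells = raw term counts
count_vec = CountVectorizer()
print(count_vec.fit_transform(corpus).toarray())
print(count_vec.get_feature_names_out())

# TF-IDF: same shape, but counts are re-weighted by how rare each term is across the corpus
tfidf_vec = TfidfVectorizer()
print(tfidf_vec.fit_transform(corpus).toarray())

# N-grams: tokens taken in sequences; here unigrams and bigrams (n = 1 and n = 2)
ngram_vec = CountVectorizer(ngram_range=(1, 2))
ngram_vec.fit(corpus)
print(ngram_vec.get_feature_names_out())
```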
Determining output

• Confusion Matrix: A confusion matrix is a table that is often used to describe the performance of a classification model (or “classifier”) on a set of test data for which the true values are known. It allows the performance of an algorithm to be visualized.
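
A brief sketch of computing a confusion matrix and accuracy with scikit-learn, using assumed true and predicted labels for a held-out test set:

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Assumed labels: 1 = fake, 0 = genuine
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# With classes ordered [0, 1], rows are true classes and columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
print("Accuracy:", accuracy_score(y_true, y_pred))
```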
Confusion matrix of algorithms
Comparing Accuracy
Conclusion

• We analyzed an automated model for verifying news extracted from Twitter, which provides general solutions for data collection and analytical modeling towards fake news recognition.

• Machine learning can achieve good results on a problem as critical as the worldwide spread of fake news. Accordingly, the results of this study suggest that systems like this can come in very handy and be effectively used to tackle this critical issue.

• The dataset in this analysis is evaluated with machine learning based statistical algorithms such as Support Vector Machines (SVM), Naive Bayes (NB), Logistic Regression (LR), and Random Forest (RF). In this analysis, SVM performs best as a classification technique.
REFERENCES

1. Conroy, Niall, Rubin, Victoria & Chen, Yimin (2015). Automatic Deception Detection: Methods for Finding Fake News. USA.

2. Ball, L. & Elworthy, J. (2014). Journal of Marketing Analytics, 2: 187. https://doi.org/10.1057/jma.2014.15

3. Lu, T.C., Yu, T. & Chen, S.H. (2018). Information Manipulation and Web Credibility. In: Bucciarelli, E., Chen, S.H., Corchado, J. (eds) Decision Economics: In the Tradition of Herbert A. Simon's Heritage. DCAI 2017. Advances in Intelligent Systems and Computing, vol 618. Springer, Cham.

4. Rubin, Victoria, Conroy, Niall, Chen, Yimin & Cornwell, Sarah (2016). Fake News or Truth? Using Satirical Cues to Detect Potentially Misleading News. DOI: 10.18653/v1/W16-0802.
