

A
SEMINAR REPORT
ON
“Fake News Detection Using Machine
Learning”
By
Ms. Dhas Prietysudha Machhindranath (PRN No: 2141018)

T.Y. BTech (Computer Engineering), Sem-VI

Department of Computer Engineering

Government College of Engineering,


Jalgaon - 425001
[An Autonomous Institute of Government of Maharashtra]
[2023-24]
A
Seminar Report
On
“Fake News Detection Using Machine
Learning”
In partial fulfilment of requirements for the degree of
Bachelor of Technology
In
Computer Engineering
Submitted By:

Ms. Dhas Prietysudha Machhindranath (PRN: 2141018)


Guided by:

Prof. D.V. Chaudhari

Department of Computer Engineering


Government College of Engineering,
Jalgaon - 425001
[An Autonomous Institute of Government of Maharashtra]
[2023-24]
Department of Computer Engineering
Government College of Engineering,
Jalgaon

CERTIFICATE

This is to certify that the seminar entitled “Fake News Detection Using Machine
Learning” has been carried out by Ms. Dhas Prietysudha Machhindranath (PRN:
2141018) for Subject Seminar (CO361U) under the guidance of Prof. D.V. Chaudhari
in partial fulfilment of the requirements for the degree of Bachelor of Technology
in Computer Engineering of Government College of Engineering, Jalgaon during the
academic year 2023-24 (Semester VI).
Date:
Place: Jalgaon

Seminar Guide: Prof. D.V. Chaudhari
Seminar Coordinator: Prof. T.K. Gawali
HOD: Prof. D.V. Chaudhari
Principal: Dr. Suhas S. Gajre
ACKNOWLEDGEMENT

At the outset, I wish to express my deep sense of gratitude to my guide,
Prof. D.V. Chaudhari, Head of the Department of Computer Engineering,
for his constant encouragement and guidance. It is only due to his
illuminating supervision that it became possible for me to study the
whole process in detail.

My sincere thanks to my friends; I am very thankful to them for
inspiring me to ensure and retain the quality of this seminar report.
ABSTRACT

We all use multiple social media platforms such as Facebook,
WhatsApp, Instagram, and Twitter for sharing information and
media. Because media and information are shared at such a large
scale, a great deal of fake news and rumours travels along with
them, and it is often difficult to tell fake news apart from real
news. This project focuses on detecting fake news and the methods
used to do so. In this project, we implement a few machine learning
techniques, namely the Random Forest classifier, Logistic Regression,
and Decision Trees. These techniques help in providing good and
accurate results.
CHAPTER 1
INTRODUCTION

People are spending more and more time interacting on social media,
as the wide adoption of smartphones makes access available
almost anytime and anywhere, which is not the case with traditional
media. In addition, social media facilitates interaction with friends,
families, and even strangers, be it through comments, discussions,
or simply like and dislike buttons.
Unlike rumours, fake news is, by definition, always false and, thus, can
be seen as a type of disinformation. Therefore, credible and reliable
sources of information are needed so that the public does not fall prey
to the intentions of those interested in manipulating reality.

1.1 DEFINITION
Fake news detection refers to the process of identifying and
distinguishing misinformation, disinformation, or fabricated content from
genuine and accurate information. It involves employing various
techniques, such as content analysis, source verification, user
engagement analysis, and machine learning algorithms, to assess the
credibility, authenticity, and reliability of news articles, posts, or
multimedia content.

Key points:-
1. Identification of Misinformation
2. Preservation of Trust
3. Limitation of Spread
4. Protection of Public Discourse
5. Prevention of Harm
6. Enhancement of Media Literacy
7. Support for Fact-Checking
8. Informing Policy
9. Algorithmic Improvement
1.2 APPLICATION

1. Social Media Platforms: Social media companies can implement fake news
detection algorithms to flag or remove false information from their platforms.
They can also provide users with tools to report suspicious content.

2. News Organizations: News outlets can use fake news detection techniques to
verify the authenticity of information before publishing or sharing it. Fact-
checking teams within newsrooms can employ these methods to ensure the
accuracy of their reporting.

3. Educational Institutions: Schools and universities can incorporate lessons on


media literacy and fake news detection into their curricula. Teaching students
how to critically evaluate information can empower them to navigate the
digital landscape more effectively.

4. Fact-Checking Organizations: Fact-checking organizations play a crucial role


in debunking false claims and verifying the accuracy of information. They can
leverage fake news detection tools and methodologies to support their
investigative work.

5. Research and Development: Ongoing research and development efforts are


essential for advancing fake news detection techniques. Researchers can
explore new algorithms, datasets, and methodologies to improve the
effectiveness of detection systems.

6. International Collaboration: Fake news is a global problem that requires
international cooperation. Governments, organizations, and researchers from
different countries can collaborate to share resources, best practices, and
expertise in fake news detection.
1.3 FUTURE SCOPE
The future scope of fake news detection is vast and promising, driven by
technological advancements, evolving media landscapes, and the increasing
sophistication of misinformation tactics. Here are some key areas of future
development:

1. Advanced Machine Learning Techniques
2. Multimodal Analysis
3. Real-Time Detection
4. User-Centric Approaches
5. Cross-Platform Integration
6. Privacy-Preserving Techniques
7. Public Awareness and Education

How does this topic fit?

This technology can support the following purposes:
1. Providing Information and Resources
2. Assisting with Research and Development
3. Educational Support
4. Improving Detection Systems
5. Analyzing Social Media Content
6. Addressing Concerns and Misinformation
CHAPTER 2
METHODOLOGY

To detect fake news using machine learning, the process typically
begins with gathering a dataset of labelled news articles,
distinguishing between fake and genuine sources. The collected text
data then undergoes pre-processing, involving cleaning and
tokenization to prepare it for analysis. Here are the steps to follow:-

2.1. Data Collection:-


Data collection means pooling data by scraping, capturing, and loading it from
multiple sources, including offline and online sources. Furthermore, all
datasets have flaws. This is why data preparation is so crucial in the machine
learning process.
In short, data preparation is a series of processes for making your dataset
more machine-learning-friendly. In a broader sense, data preparation also
entails determining the best data collection mechanism.
Data collection allows you to capture a record of past events so that you can
use data analysis to find recurring patterns. From those patterns, you build
predictive models using machine learning algorithms that look for trends and
predict future changes.
Types of data collected

1. Audio Collection:- Automatic speech recognition technologies must


be trained with multilingual audio data of various types and associated with
different scenarios, to help machines recognize the intents and nuances of
human speech.

2. Image & Video Collection:- Computer vision systems and other AI


solutions that analyse visual content need to account for a wide variety of

scenarios. Large volumes of high-resolution images and videos that are

accurately annotated provide the training data that is necessary for the

computer to recognize images with the same level of accuracy as a human.
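The pooling-and-labelling step described above can be sketched with pandas. This is a minimal illustration only: the two in-memory tables below stand in for hypothetical scraped sources, whereas a real project would load full labelled corpora (datasets such as ISOT ship genuine and fake articles in separate files).

```python
import pandas as pd

# Hypothetical stand-ins for two scraped/downloaded sources:
# one table of genuine articles, one of fabricated ones.
real = pd.DataFrame({"text": [
    "government announces new education policy",
    "central bank holds interest rates steady",
]})
fake = pd.DataFrame({"text": [
    "celebrity endorses miracle cure doctors hate",
    "secret memo proves moon landing was staged",
]})

# Attach class labels, pool both sources, and shuffle the result so
# downstream train/test splits are not ordered by class.
real["label"] = "real"
fake["label"] = "fake"
dataset = pd.concat([real, fake], ignore_index=True)
dataset = dataset.sample(frac=1, random_state=42).reset_index(drop=True)

print(sorted(dataset["label"].value_counts().to_dict().items()))
# → [('fake', 2), ('real', 2)]
```

The fixed `random_state` makes the shuffle reproducible, which matters when comparing models later on the same split.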

2.2. Data Pre-processing:-


In the realm of fake news detection, data pre-processing serves as a crucial
preliminary step aimed at refining raw text data into a structured and
analysable format.
This process begins with the cleansing of the text, where extraneous elements
such as HTML tags, punctuation, and special characters are removed to ensure
data integrity.
Stop word removal follows, eliminating common words that carry little
semantic meaning, thus reducing noise in the data. Additionally, stemming and
lemmatization techniques are applied to reduce words to their base forms,
enhancing the efficiency of subsequent analyses. Special cases, such as URLs or
numeric values, are also addressed appropriately.
Lastly, imbalanced data, a common challenge in this domain, is managed
through techniques like oversampling or undersampling to ensure equitable
representation of fake and genuine news articles.
Through meticulous pre-processing, raw text data is refined into a
standardized and refined form conducive to effective analysis and model
training for fake news detection.
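A minimal pre-processing sketch in plain Python, covering the cleansing, lowercasing, tokenization, and stop-word removal steps above. The hand-picked stop-word set is illustrative only; in practice, libraries such as NLTK or spaCy supply full stop-word lists, stemmers, and lemmatizers.

```python
import re

# Tiny illustrative stop-word set; real pipelines use a library-provided list.
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in", "it"}

def preprocess(text: str) -> list[str]:
    # Strip HTML tags left over from scraping
    text = re.sub(r"<[^>]+>", " ", text)
    # Remove punctuation, digits, and special characters
    text = re.sub(r"[^a-zA-Z\s]", " ", text)
    # Lowercase and tokenize on whitespace
    tokens = text.lower().split()
    # Drop stop words that carry little semantic meaning
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("<p>The Senate passed 3 bills today!</p>"))
# → ['senate', 'passed', 'bills', 'today']
```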

2.3. Feature Extraction:-


Feature extraction in machine learning for fake news detection involves
identifying and extracting relevant characteristics or attributes from the text
data that can be used to distinguish between genuine and fake news articles.

1. Word Frequency: One common approach is to represent each news


article as a vector of word frequencies, where each dimension corresponds to a
unique word in the vocabulary, and the value represents the frequency of that
word in the article.
2. TF-IDF Scores: Term Frequency-Inverse Document Frequency (TF-IDF) is
another popular method for feature extraction. It considers not only the
frequency of a word in a document but also its rarity across all documents.
Words with high TF-IDF scores in fake news articles may be considered
important features.
3. N-grams: N-grams are contiguous sequences of N words in a text
document. Extracting N-grams as features captures not only individual words
but also sequences of words, which can provide additional context and
information for distinguishing between genuine and fake news.
4. Word Embeddings: Word embeddings are dense vector representations
of words in a continuous vector space, learned from large text corpora. They
capture semantic relationships between words and can be used as features in
machine learning models for fake news detection.
5. Linguistic Features: Linguistic features such as sentence length, readability
scores, sentiment analysis, and grammatical structures can also be extracted
from text data and used as features for distinguishing between genuine and fake
news.
6. Source Reputation: In addition to text-based features, features related to
the source of the news article, such as website reputation, author credibility,
and domain characteristics, can also be extracted and incorporated into the
feature set for fake news detection.
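The word-frequency, TF-IDF, and N-gram ideas above can be combined in a single step with scikit-learn's `TfidfVectorizer`, which builds a vocabulary of unigrams and bigrams and weights each term by its TF-IDF score. The three documents below are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "government announces new policy on education reform",
    "shocking miracle cure that doctors do not want you to see",
    "parliament debates the new education budget",
]

# ngram_range=(1, 2) extracts both single words and two-word sequences;
# stop_words="english" drops common words with little semantic meaning.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
X = vectorizer.fit_transform(docs)

print(X.shape)  # one row per document, one column per learned term
```

The resulting sparse matrix `X` is the feature representation that the classifiers in the next step consume.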
2.4. Model Selection:-
The goal is to select a model that effectively captures the underlying
patterns and characteristics of the data. Several factors guide this choice:

1. Nature of the Data: Understanding the characteristics of the dataset is


crucial. Factors such as data size, dimensionality, imbalance between classes,
and noise levels can influence the choice of model.

2. Complexity of the Task: The complexity of the fake news detection task can
vary depending on factors such as the diversity of news sources, the subtlety
of fake news, and the presence of misleading information. More complex
tasks may require models with greater capacity to learn intricate patterns.

3. Interpretability: Some models, such as logistic regression or decision trees,


offer straightforward interpretations of their predictions, making them
suitable for scenarios where explainability is important. In contrast, deep
learning models may provide superior performance but are often considered
black-box models.

4. Performance Metrics: The selection of an appropriate evaluation metric


plays a role in model selection. Metrics such as accuracy, precision, recall, F1
score, and area under the ROC curve (AUC-ROC) provide insights into
different aspects of model performance and can guide the selection process.

5. Computational Resources: Models vary in terms of computational


complexity and resource requirements. Deep learning models, for example,
may require significant computational resources for training and inference
compared to simpler models like logistic regression.
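A sketch of comparing the interpretable candidates discussed above (logistic regression, decision tree, random forest) on an invented toy corpus, using 2-fold cross-validation. Scores on such tiny data are illustrative only; the texts and labels are made up.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# Invented toy corpus: 1 = fake, 0 = real
texts = [
    "miracle cure doctors hate", "aliens built the pyramids last year",
    "secret memo proves election was staged", "celebrity endorses magic diet pill",
    "government announces new education policy", "central bank holds interest rates",
    "parliament passes infrastructure bill", "court rules on trade dispute",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Candidate models spanning the interpretability/capacity trade-off
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

for name, model in models.items():
    # Each pipeline vectorizes the raw text, then fits the classifier
    pipeline = make_pipeline(TfidfVectorizer(), model)
    scores = cross_val_score(pipeline, texts, labels, cv=2)
    print(f"{name}: mean accuracy {scores.mean():.2f}")
```

Wrapping the vectorizer and classifier in one pipeline ensures the TF-IDF vocabulary is refit on each training fold, avoiding information leakage from the test fold.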
2.5. Evaluation:-
Evaluation is a critical step in assessing the performance of a fake news
detection system. It involves measuring how well the system identifies and
distinguishes between genuine and fake news articles. Here's an explanation of
evaluation in this context:

1. Metrics: Various evaluation metrics are used to assess the performance


of fake news detection systems.
These metrics include:-
Accuracy: The proportion of correctly classified instances out of the total
instances.
Precision: The ratio of true positive predictions to the total number of
positive predictions, indicating the system's ability to correctly identify
fake news.
Recall: The ratio of true positive predictions to the total number of actual
positive instances, measuring the system's ability to capture all instances of
fake news.

2. Cross-Validation: Cross-validation techniques, such as k-fold cross-


validation, are commonly used to evaluate the generalization performance
of fake news detection models. In k-fold cross-validation, the dataset is
divided into k subsets, and the model is trained and tested k times, with each
subset used as the test set once.

3. Train-Test Split: In a train-test split evaluation, the dataset is divided into


training and testing sets. The model is trained on the training set and
evaluated on the separate testing set to assess its performance on unseen
data.
4. Confusion Matrix: A confusion matrix provides a detailed breakdown of
the model's predictions, showing the true positive, true negative, false
positive, and false negative predictions. It is useful for understanding the
types of errors made by the model.

5. Bias and Fairness: Evaluation should also consider potential biases in the
dataset and the model's predictions. It is essential to assess whether the
model performs consistently across different demographic groups and does
not exhibit discriminatory behaviour.

6. Qualitative Assessment: Qualitative assessment involves examining


misclassified instances and understanding the reasons behind
mispredictions. It can provide insights into areas for improvement and guide
future model iterations.
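The train-test split, metric computation, and confusion matrix described above can be sketched as follows; the corpus and labels are invented for illustration, so the reported scores carry no real meaning.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Invented toy corpus: 1 = fake, 0 = real
texts = [
    "miracle cure doctors hate revealed",
    "secret memo proves election was staged",
    "aliens built the pyramids claims insider",
    "celebrity endorses magic diet pill",
    "government announces new education policy",
    "central bank holds interest rates steady",
    "parliament passes infrastructure bill",
    "court rules on trade dispute today",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hold out 25% of the data; stratify keeps the fake/real ratio in both splits
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0)

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("accuracy:", accuracy_score(y_test, y_pred))
print("f1:", f1_score(y_test, y_pred, zero_division=0))
print("confusion matrix:\n", confusion_matrix(y_test, y_pred))
```

The confusion matrix breaks the predictions down into true/false positives and negatives, making the error types discussed in point 4 directly visible.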

Hence, these are the steps to follow while detecting fake news.
CHAPTER 3
WORKFLOW

1. Data Collection
2. Data Pre-processing
3. Feature Extraction
4. Model Selection
5. Model Training
6. Evaluation
7. Fine-Tuning and Optimization
8. Deployment
9. Monitoring and Maintenance
CHAPTER 4
ALGORITHM
1. Data Collection: Gather a large dataset of labelled news articles, with each
article marked as either "fake" or "real".

2. Pre-processing: Clean and pre-process the text data by:


Tokenizing words and phrases
Removing stop words and punctuation
Stemming or Lemmatizing words
Converting to lowercase
Removing special characters and digits

3. Feature Extraction: Convert the text data into numerical features that can be
used by machine learning algorithms. Some common techniques include:
Bag-of-Words (BoW): represents each article as a frequency distribution
of words
Term Frequency-Inverse Document Frequency (TF-IDF): weights word
frequencies by their importance across the dataset
Word Embeddings (e.g., Word2Vec, GloVe): represents words as vectors in a
high-dimensional space

4. Model Selection: Choose a suitable machine learning algorithm based on the


dataset and features. Common choices include:
Supervised learning algorithms like:
+ Logistic Regression
+ Decision Trees
+ Random Forest
+ Support Vector Machines (SVM)
Deep learning algorithms like:
+ Convolutional Neural Networks (CNN)
+ Recurrent Neural Networks (RNN)
+ Long Short-Term Memory (LSTM) networks

5. Model Training: Train the selected model on the labelled dataset, using a portion
of the data for training and the rest for validation.

6. Model Evaluation: Evaluate the performance of the trained model on a separate


test dataset, using metrics like:
Accuracy
Precision
Recall
F1-score
ROC-AUC score

7. Deployment: Deploy the trained model in a production environment, where it


can be used to classify new, unseen news articles as "fake" or "real".
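Steps 1-7 can be strung together into a minimal end-to-end sketch using scikit-learn; the training corpus below is invented and far too small for real use, where a full labelled dataset would be trained on instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative training set; a real deployment would train on a
# full labelled corpus.
train_texts = [
    "miracle cure doctors hate revealed",
    "secret memo proves election was staged",
    "aliens built the pyramids claims insider",
    "government announces new education policy",
    "central bank holds interest rates steady",
    "parliament passes infrastructure bill",
]
train_labels = ["fake", "fake", "fake", "real", "real", "real"]

# One pipeline covers feature extraction (TF-IDF) and classification
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_texts, train_labels)

def classify(article: str) -> str:
    """Label a new, unseen article as 'fake' or 'real' (deployment step)."""
    return model.predict([article])[0]

print(classify("central bank announces interest rates decision"))
```

In production, the fitted pipeline would be serialized (e.g. with joblib) and served behind an API so new articles can be classified on demand.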

Some popular machine learning libraries for fake news detection include:

scikit-learn (Python)

TensorFlow (Python)
NLTK (Python)
spaCy (Python)
CHAPTER 5
CONCLUSION
The fundamental purpose of this study is to mitigate one of social media’s
drawbacks: the rapid distribution of fake news, which frequently
misinforms people, produces false impressions and harms society. In recent
years, an increasing number of methods for automatically detecting false
news have been presented in the literature. The datasets and a set of
explicit classes are two critical aspects that have a substantial influence on
the accuracy of the existing models. Therefore, this study proposed several
models for the detection of fake news which have been highlighted as
follows.

• The study of fake news detection was carried out using the ISOT and LIAR
datasets with real and fake news.
• Initially, the proposed technique selects important feature terms relying on
the parts of speech (POS) in the textual information, and then uses
sentiment analysis to estimate users’ control variables for opinions using
lexicon-based scoring analysis.
• For improving classification-based false news identification, a data
imputation preparation approach is presented. This approach is based on
the utilisation of data imputation techniques to handle missing values in a
dataset.
• Subsequently, the term frequency and inverse document frequency (TF-IDF)
were used for the extraction of useful features from the datasets to help
the detection accuracy.
• Finally, the fake news was detected using multiple classification models.
First, for multiclass prediction and robust prediction of the class of a
text, the Naïve Bayes model was used. Second, the passive-aggressive
classifier trained the model incrementally, and eventually a deep neural
network was used to increase the efficiency of fake news detection.
CHAPTER 6
REFERENCES

1. Ahmad I, Yousaf M, Yousaf S, Ahmad M. Fake news detection using machine
learning ensemble methods. Complexity. 2020;2020:1-11.

2. Akinyemi B, Adewusi O, Oyebade A. An improved classification model for
fake news detection in social media. Int J Inf Technol Comput Sci.
2020;12(1):34-43. doi:10.5815/ijitcs.2020.01.05.

3. Alonso MA, Vilares D, Gómez-Rodríguez C, Vilares J. Sentiment analysis for
fake news detection. Electronics (Basel). 2021;10(11):1348.
doi:10.3390/electronics10111348.

4. Zhang S, Pan J, Wu J, Zhu X. Detecting fake news for reducing
misinformation risks using analytics approaches. In: Proceedings of the 51st
Hawaii International Conference on System Sciences; 2018.

5. Patel M, Gupta V, Joshi S. Fake news detection using machine learning
techniques: a systematic literature review. Journal of Physics: Conference
Series. 2018;1000(1):012038.

6. Castillo C, Mendoza M, Poblete B. Information credibility on Twitter. In:
Proceedings of the 20th International Conference on World Wide Web; 2011.
p. 675-684.

7. Tacchini E, Ballarin G, Della Vedova ML, Moret S. Some like it hoax:
automated fake news detection in social networks. In: Italian Conference on
Computational Linguistics; 2017. p. 318-327. Springer, Cham.
