
FAKE NEWS DETECTION

Submitted by

Jatin Garg (21BCS10747)
Shubhankar (21BCS10627)
Harsh Gupta (21BCS10721)

in partial fulfillment for the award of the degree of

BACHELOR OF ENGINEERING
IN COMPUTER SCIENCE ENGINEERING

Chandigarh University
NOVEMBER 2023
BONAFIDE CERTIFICATE

Certified that this project report “FAKE NEWS DETECTION” is the


bonafide work of “Jatin Garg, Shubhankar, Harsh Gupta” who carried out
the project work under my/our supervision.

Project Guide

Malik Muzamil Ishaq

ASSISTANT PROFESSOR

Department of CSE

Chandigarh University
TABLE OF CONTENTS

Abstract
Chapter 1. Introduction
1.1 Identification of Client and Need
1.2 Relevant Contemporary Issues
1.3 Task Identification
Chapter 2. Design Flow/Process
2.1 Evaluation & Selection of Specifications/Features
2.2 Design Constraints
Chapter 3. Results Analysis and Validation
3.1 Introduction
3.2 Sample Code
Chapter 4. Conclusion and Future Work
APPENDIX
REFERENCES
ABSTRACT

In recent years, with the rapid growth of online social networks, fake news created for commercial and political purposes has appeared in large numbers and spread widely online. Through deceptive wording, such stories easily mislead social network users, and they have already had a substantial effect on offline society. An important goal in improving the trustworthiness of information in online social networks is to identify fake news in a timely manner. This paper investigates the principles, methodologies and algorithms for detecting fake news articles, creators and subjects in online social networks, and evaluates the corresponding performance. The accuracy of information on the Internet, especially on social media, is an increasingly important concern, but the scale of web data hampers our ability to identify, evaluate and correct such "fake news" on these platforms. In this paper, we propose a method for fake news detection and ways to apply it on Facebook, one of the most popular online social media platforms. The method uses a Naive Bayes classification model to predict whether a post on Facebook will be labelled as real or fake. The results may be improved by applying several techniques that are discussed in the paper. The results obtained suggest that the fake news detection problem can be addressed with machine learning methods.
CHAPTER 1
INTRODUCTION
1.1 Identification of Client and Need:

These days fake news is creating different problems, from sarcastic articles to fabricated stories and planted government propaganda in some outlets. Fake news and lack of trust in the media are growing problems with huge ramifications for our society. A purposely misleading story is obviously "fake news", but lately the discourse on social media is changing the definition; some now use the term to dismiss facts that run counter to their preferred viewpoints.

1.2 Relevant contemporary issues:

Fake news detection faces a range of ongoing challenges in the ever-evolving landscape of information dissemination. One of the most pressing issues is the emergence of deepfake content, where advanced AI technologies are used to create highly convincing fake videos and audio recordings. These deepfakes can deceive both automated systems and human observers, making it increasingly difficult to differentiate real news from fabricated content.

The rapid spread of misinformation on social media platforms is another critical concern. Social media provides a fertile ground for the quick dissemination of false information, often without proper fact-checking. The challenge lies in identifying and countering fake news within the context of social media, where news articles can go viral within minutes.

Adversarial attacks pose another contemporary challenge. Malicious actors actively try to subvert fake news detection systems by tweaking news articles to evade algorithmic detection. This means that fake news detection models need to be robust and adaptive to changing attack strategies.

Understanding context is also vital in fake news detection. Fake news often relies on taking information out of context, so improving the models' ability to understand the context in which news is presented is crucial.
1.3 Task Identification:

1. Data Collection and Preparation:
 Identify sources for real and fake news articles.
 Collect and curate a dataset of news articles with labels indicating their authenticity.
 Preprocess the data by cleaning and formatting the text, handling missing data, and converting it into a suitable format for machine learning.
2. Feature Engineering:

Define the features that the model will use for fake news detection.
These can include text-based features (e.g., TF-IDF, word
embeddings), metadata features (e.g., publication date, source
credibility), and more advanced features (e.g., sentiment analysis,
readability scores).
3. Model Selection and Development:
 Choose the machine learning or deep learning models that will be used for fake news detection.
 Develop and fine-tune these models to achieve optimal performance. Experiment with different algorithms and approaches.
4. Training and Testing Data Split:

Split the dataset into training and testing sets to train the model and
evaluate its performance.
5. Model Evaluation:
 Use appropriate evaluation metrics (e.g., accuracy, precision, recall, F1-score) to assess the model's performance.
 Fine-tune hyperparameters and make necessary adjustments based on the evaluation results.
6. Adversarial Testing:
 Implement adversarial testing to evaluate the model's robustness against adversarial attacks or attempts to deceive the system.
7. Interpretability and Explainability:
 If applicable, incorporate methods for making model predictions more interpretable and explainable, allowing users to understand why the model classified a news article as fake.
8. User Interface (Optional):
 If the project includes an end-user component, design and develop a user-friendly interface for users to interact with the fake news detection system.
9. Deployment and Integration:
 Deploy the fake news detection system in a real-world setting, which may involve integration with a website, app, or platform.
10. Continuous Monitoring and Updating:
 Implement mechanisms for regularly updating the model with new data to adapt to evolving fake news trends.
 Monitor the system's performance and address any issues or updates as needed.
11. Ethical Considerations:
 Continuously assess and address ethical considerations, such as bias, fairness, and privacy, throughout the project.
12. Regulatory Compliance:
 Ensure that the project complies with relevant regulations and legal requirements for data handling and content moderation.
13. User Education and Training:
 Develop educational materials and training for end-users or administrators to effectively use and manage the fake news detection system.
14. Documentation and Reporting:
 Create comprehensive documentation for the project, including a detailed report on the methodologies used, challenges faced, and results obtained.
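The core pipeline in tasks 2-5 can be sketched with scikit-learn; the toy articles, labels, and the choice of a Naive Bayes model below are illustrative assumptions, not the project's actual dataset or final configuration:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, f1_score

# Placeholder articles and labels (1 = fake, 0 = real) for illustration.
texts = ["markets rise on strong jobs report", "aliens secretly run the senate",
         "rainfall expected over the weekend", "miracle pill cures all disease",
         "council approves the new budget", "celebrity revealed to be a robot",
         "new school opens downtown", "moon landing was filmed in a studio"]
labels = [0, 1, 0, 1, 0, 1, 0, 1]

# Task 2: feature engineering -- turn raw text into TF-IDF vectors.
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(texts)

# Task 4: split into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=42)

# Task 3: train a model (Naive Bayes here, as in the abstract).
model = MultinomialNB()
model.fit(X_train, y_train)

# Task 5: evaluate with standard metrics.
y_pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print("f1      :", f1_score(y_test, y_pred, zero_division=0))
```

The same skeleton accommodates the later tasks: the fitted vectorizer and model are what would be deployed, monitored, and periodically retrained on new data.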
CHAPTER 2
DESIGN FLOW/PROCESS
2.1 Evaluation & Selection of Specifications/Features:
There exists a large body of research on machine learning methods for deception detection, most of it focused on classifying online reviews and publicly available social media posts. Particularly since the late-2016 American presidential election, the question of determining 'fake news' has also been the subject of particular attention within the literature. Conroy, Rubin, and Chen outline several approaches that seem promising towards the aim of correctly classifying misleading articles. They note that simple content-related n-grams and shallow part-of-speech tagging have proven insufficient for the classification task, often failing to account for important context information; rather, these methods have been shown useful only in tandem with more complex methods of analysis. Deep syntax analysis using probabilistic context-free grammars has been shown to be particularly valuable in combination with n-gram methods. Feng, Banerjee, and Choi are able to achieve 85%-91% accuracy in deception-related classification tasks using online review corpora.

In this paper a model is built based on a count vectorizer or a TF-IDF matrix (i.e., word tallies relative to how often the words are used in other articles in the dataset). Since this problem is a kind of text classification, implementing a Naive Bayes classifier is a natural choice, as this is standard for text-based processing. The actual goal lies in choosing the text transformation (count vectorizer vs. TF-IDF vectorizer) and which type of text to use (headlines vs. full text). The next step is to extract the most useful features for the count or TF-IDF vectorizer. This is done by using the n most frequent words and/or phrases, optionally lower-casing, removing stop words (common words such as "the", "when", and "there"), and only using words that appear at least a given number of times in the dataset.
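These vectorizer options map directly onto scikit-learn parameters: stop-word removal, lower-casing, and the minimum-occurrence threshold correspond to stop_words, lowercase, and min_df. A sketch (the sample headlines are invented for illustration):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

headlines = [
    "The president signs the new bill",
    "When aliens landed there was panic",
    "The new bill divides the senate",
]

# Count vectorizer: raw word tallies, stop words removed, lowercased,
# keeping only terms appearing in at least min_df documents.
count_vec = CountVectorizer(stop_words="english", lowercase=True, min_df=1)
counts = count_vec.fit_transform(headlines)

# TF-IDF vectorizer: same options, but counts are re-weighted by how
# rare each term is across the whole corpus.
tfidf_vec = TfidfVectorizer(stop_words="english", lowercase=True, min_df=1)
tfidf = tfidf_vec.fit_transform(headlines)

print(sorted(count_vec.vocabulary_))   # vocabulary after stop-word filtering
print(counts.shape, tfidf.shape)       # (documents, vocabulary size)
```

Raising min_df implements the "appears at least a given number of times" rule described above; an ngram_range argument would add phrases as features.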
2.2. Design Constraints:

1. Data Quality and Availability: The quality and availability of labelled datasets for training and testing fake news detection models can be a significant constraint. Limited or biased data can affect the model's accuracy and generalizability.
2. Resource Limitations: Constraints related to computational resources, such
as available hardware, memory, and processing power, can impact the
choice of machine learning algorithms and the scale of data that can be
processed.
3. Privacy and Legal Compliance: Fake news detection may involve
analyzing and storing large datasets, which can raise privacy concerns and
legal requirements, especially when dealing with user-generated content.
4. Real-Time Processing: If the system needs to operate in real-time, it
imposes constraints on processing time and may require more efficient
algorithms to make quick decisions.
5. Scalability: Designing a system that can handle a large volume of data and
user requests is a constraint, especially if the system is expected to operate
at scale.
6. Cost Constraints: Budget limitations can restrict the use of expensive
machine learning models, hardware, or software tools.
7. Ethical Considerations: Ensuring that the fake news detection system
operates ethically and without introducing biases is a constraint that should
guide model development and decision-making.
8. Regulatory Constraints: Compliance with local and international
regulations related to data handling, privacy, and content moderation is
crucial, which can limit certain data processing or storage methods.
9. Interoperability: If the system needs to integrate with other software or
platforms, ensuring compatibility and interoperability with existing systems
can be a constraint.
10. Model Explainability: The need to provide explanations for model predictions can be a constraint, as it may require using interpretable models or implementing explainability techniques, which could affect the model's accuracy.
11. Human Resources: The availability of skilled personnel for tasks such as data annotation, model development, and system maintenance can be a constraint.
12. User Education and Training: If the project includes user-facing components, constraints related to user education and training must be considered, as users may need to understand how to interact with the system effectively.
13. Data Diversity: Limited diversity in the data, particularly with respect to language or topics, can constrain the model's ability to detect fake news across various contexts.
14. Crisis and Disaster Scenarios: If the system is designed for crisis management, constraints associated with the availability of data and resources during high-stress situations need to be considered.
15. Performance Metrics: Choosing appropriate evaluation metrics and understanding their limitations is a constraint, as the choice of metrics can affect how the system is perceived to perform.
CHAPTER 3.
RESULTS ANALYSIS AND VALIDATION
3.1 INTRODUCTION

Data Use

In this project we use several packages; to load and read the dataset we use pandas. With pandas we can read the .csv file, display the shape of the dataset, and display the dataset in a well-formed table. We will be training and testing the data; since we use supervised learning, the data is labelled. With the training and testing data and labels we can apply different machine learning algorithms, but before computing predictions and accuracies the data needs preprocessing: null values, which are not readable, must be removed from the dataset, and the data must be converted into vectors by normalizing and tokenizing it so that it can be understood by the machine. The next step is to generate visual reports from this data, using Python's Matplotlib and scikit-learn. These libraries help us present the results as histograms, pie charts or bar charts.
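A minimal sketch of this loading step with pandas; the file name and columns below are stand-ins, since the report does not fix the exact dataset path:

```python
import pandas as pd

# Write a tiny stand-in CSV so the sketch is self-contained; in the real
# project this would be the collected news dataset (file name assumed).
with open("news.csv", "w") as f:
    f.write("text,label\n")
    f.write("markets rally on policy news,REAL\n")
    f.write("aliens endorse candidate,FAKE\n")

df = pd.read_csv("news.csv")
print(df.shape)           # (rows, columns) of the dataset
print(df.head())          # first few rows, to inspect the data
print(df.isnull().sum())  # null values to handle before training
```

Rows flagged by isnull() would be dropped or imputed before vectorization, as described above.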

Preprocessing

The dataset used is split into a training set and a testing set: Dataset I contains 3256 training and 814 testing examples, and Dataset II contains 1882 training and 471 testing examples. Cleaning the data is always the first step: undesirable tokens are removed from the dataset, which helps in mining the useful information. Data collected online often contains unwanted characters such as stop words and digits, which hinder detection. Cleaning removes these language-independent entities and lets us integrate logic that can improve the accuracy of the identification task.
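The cleaning step described above (removing digits, punctuation, and stop words) might look like the following; the stop-word list here is a tiny illustrative subset, not the full list used in practice:

```python
import re
import string

# Tiny illustrative stop-word list; a real pipeline would use a full one.
STOP_WORDS = {"the", "a", "is", "in", "on", "when", "there"}

def clean_text(text):
    """Lowercase, strip digits and punctuation, and drop stop words."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)  # remove digits
    text = text.translate(str.maketrans("", "", string.punctuation))
    tokens = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(tokens)

print(clean_text("BREAKING: The market fell 300 points, when panic hit!"))
# -> breaking market fell points panic hit
```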

Feature Extraction

Feature extraction is the process of selecting a subset of relevant features for use in model construction. Feature extraction methods help to create an accurate predictive model by selecting the features that give better accuracy. When the input data to an algorithm is too large to be handled, and much of it is redundant, the input is transformed into a reduced representation called a feature vector, and the desired task is performed on this reduced representation instead of the full-size input. Feature extraction is applied to the raw data before any machine learning algorithm is run on the transformed data in feature space.
Naive Bayes

 A supervised learning algorithm based on a probabilistic classification technique.
 It is a powerful and fast algorithm for predictive modelling.
 In this project, we have used the Multinomial Naive Bayes classifier.

Support Vector Machine (SVM)

 SVMs are a set of supervised learning methods used for classification and regression.
 Effective in high-dimensional spaces.
 Uses a subset of training points (the support vectors), so it is also memory efficient.

Logistic Regression

 A linear model for classification rather than regression.
 The expected values of the response variable are modelled based on a combination of values taken by the predictors.
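The three classifiers above can be tried behind the same TF-IDF transformation; the toy texts and labels below are placeholders, and per the abstract Multinomial Naive Bayes was the classifier actually used in this project:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labelled data (placeholder text); 1 = fake, 0 = real.
texts = ["markets rise on jobs report", "aliens control the senate",
         "rainfall expected this weekend", "miracle pill cures everything",
         "council approves new budget", "celebrity is secretly a robot"]
labels = [0, 1, 0, 1, 0, 1]

# The three model families discussed above, each behind the same
# TF-IDF transformation.
models = {
    "naive_bayes": MultinomialNB(),
    "svm": LinearSVC(),
    "logistic_regression": LogisticRegression(),
}

scores = {}
for name, clf in models.items():
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(texts, labels)
    scores[name] = pipe.score(texts, labels)   # training accuracy only
    print(name, "training accuracy:", scores[name])
```

On real data the comparison would of course use a held-out test set rather than training accuracy.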

3.2 SAMPLE CODE

1. Fibonacci
public class Fibonacci {
    // Returns the nth Fibonacci number using bottom-up dynamic programming.
    public static int fib(int n) {
        if (n <= 1) return n;  // base cases; also avoids dp[1] when n == 0
        int[] dp = new int[n + 1];
        dp[0] = 0;
        dp[1] = 1;
        for (int i = 2; i <= n; i++) {
            dp[i] = dp[i - 1] + dp[i - 2];
        }
        return dp[n];
    }

    public static void main(String[] args) {
        int n = 6;
        System.out.println(fib(n));  // prints 8
    }
}
The nth Fibonacci number can be efficiently calculated using dynamic programming via the method defined in the "Fibonacci" class in this Java code. A synopsis of the code is provided below:
The "Fibonacci" class's static method "fib" uses dynamic programming to determine the nth Fibonacci number.
An integer array "dp" of size "n + 1" is initialized to hold the Fibonacci numbers, with "n" being the input index to compute.
It sets the base cases "dp[0]" = 0 and "dp[1]" = 1, since these are the first two Fibonacci numbers.
The code then iterates from "i = 2" to "n", computing each Fibonacci number as the sum of the preceding two and storing it in the "dp" array.

2. Coin Change
public class coinchange {
    // Prints the dynamic programming table for visualization.
    public static void print(int[][] dp) {
        for (int i = 0; i < dp.length; i++) {
            for (int j = 0; j < dp[0].length; j++) {
                System.out.print(dp[i][j] + " ");
            }
            System.out.println();
        }
    }

    // Counts the ways to form 'sum' from the given coin denominations
    // (each coin may be used any number of times).
    public static int coinchange(int[] coins, int sum) {
        int n = coins.length;
        int[][] dp = new int[n + 1][sum + 1];
        for (int i = 0; i < n + 1; i++) {
            dp[i][0] = 1;  // one way to make sum 0: choose nothing
        }
        for (int j = 1; j < sum + 1; j++) {
            dp[0][j] = 0;  // no way to make a positive sum with no coins
        }
        for (int i = 1; i < n + 1; i++) {
            for (int j = 1; j < sum + 1; j++) {
                if (coins[i - 1] <= j) {
                    // include coin i-1 (stay on row i) or exclude it
                    dp[i][j] = dp[i][j - coins[i - 1]] + dp[i - 1][j];
                } else {
                    dp[i][j] = dp[i - 1][j];
                }
            }
        }
        print(dp);
        return dp[n][sum];
    }

    public static void main(String[] args) {
        int[] coins = {1, 2, 3};
        int sum = 5;
        System.out.println(coinchange(coins, sum));  // prints 5
    }
}
To address the Coin Change problem, the code defines a class named "coinchange".
It offers a method to "print" the contents of a 2D array, which is useful for visualizing dynamic programming tables.
The "coinchange" method determines how many different ways there are to use the specified coin denominations to form a target sum.
It uses a 2D array called "dp" to hold intermediate results, where "dp[i][j]" denotes the number of ways to form sum "j" from the first "i" coin denominations.
To count the ways to reach the target sum, the code initializes the base cases, iterates over coins and sums, and fills the "dp" table. After printing the table for visualization, it returns the result.
The main method shows how to use the "coinchange" function on a small example.

3. Knapsack
public class knapsack01 {
    // Prints the dynamic programming table for visualization.
    public static void print(int[][] dp) {
        for (int i = 0; i < dp.length; i++) {
            for (int j = 0; j < dp[0].length; j++) {
                System.out.print(dp[i][j] + " ");
            }
            System.out.println();
        }
    }

    // Returns the maximum value achievable with the given item values,
    // weights, and knapsack capacity W (each item used at most once).
    public static int Knap(int[] val, int[] wt, int W) {
        int n = val.length;
        int[][] dp = new int[n + 1][W + 1];
        for (int i = 0; i < dp.length; i++) {
            dp[i][0] = 0;  // zero capacity -> zero profit
        }
        for (int j = 0; j < dp[0].length; j++) {
            dp[0][j] = 0;  // no items -> zero profit
        }
        for (int i = 1; i < n + 1; i++) {
            for (int j = 1; j < W + 1; j++) {
                int v = val[i - 1];
                int w = wt[i - 1];
                if (w <= j) {
                    int incProfit = v + dp[i - 1][j - w];  // take item i-1
                    int excProfit = dp[i - 1][j];          // skip item i-1
                    dp[i][j] = Math.max(incProfit, excProfit);
                } else {
                    dp[i][j] = dp[i - 1][j];
                }
            }
        }
        print(dp);
        return dp[n][W];
    }

    public static void main(String[] args) {
        int[] val = {15, 14, 10, 45, 30};
        int[] wt = {2, 5, 1, 3, 4};
        int W = 7;
        System.out.println(Knap(val, wt, W));  // prints 75
    }
}
To address the 0/1 Knapsack problem, the code constructs a class named "knapsack01".
It offers a method to "print" a 2D array's contents, which is commonly used to visualize dynamic programming tables.
Given the item values, weights, and a maximum capacity, the "Knap" method determines the highest value that can be obtained in the 0/1 Knapsack problem.
To store intermediate results, it first initializes a 2D array called "dp", where "dp[i][j]" denotes the maximum value achievable with the first "i" items and a knapsack capacity of "j".
The code decides whether to include or exclude each item depending on the weight limit, setting the base cases to 0 and filling the "dp" table accordingly. The entry "dp[n][W]", where "n" is the number of items, stores the highest value that can be achieved.

4. Matrix Multiplication
public class MatrixMultiplication {
    // Returns the minimum number of scalar multiplications needed to
    // multiply the matrix chain Ai..Aj, where matrix Ak has dimensions
    // arr[k-1] x arr[k].
    public static int mcm(int[] arr, int i, int j) {
        if (i == j) {
            return 0;  // a single matrix needs no multiplication
        }
        int ans = Integer.MAX_VALUE;
        for (int k = i; k <= j - 1; k++) {
            int cost1 = mcm(arr, i, k);      // Ai..Ak   => arr[i-1] x arr[k]
            int cost2 = mcm(arr, k + 1, j);  // Ak+1..Aj => arr[k] x arr[j]
            int cost3 = arr[i - 1] * arr[k] * arr[j];  // final product cost
            ans = Math.min(ans, cost1 + cost2 + cost3);
        }
        return ans;
    }

    public static void main(String[] args) {
        int[] arr = {1, 2, 3, 4, 3};
        int n = arr.length;
        System.out.println(mcm(arr, 1, n - 1));  // prints 30
    }
}
This Java code creates a class called "MatrixMultiplication" and uses a recursive method to determine the minimal cost of matrix chain multiplication. The code is described briefly as follows:
A static method called "mcm" in the "MatrixMultiplication" class recursively determines the minimum number of scalar multiplications needed to multiply a chain of matrices.
The method accepts an array "arr" representing the dimensions of the matrices, and indices "i" and "j" that specify the current range of matrices to be multiplied.
To divide the matrix chain into smaller subproblems, the "mcm" algorithm loops over the possible partition points ("k"). It computes the cost of multiplying the matrices on the left and right sides of the split, plus the cost of multiplying the two resulting submatrices, and keeps the partition with the lowest cost.
The code returns the smallest cost of matrix chain multiplication, corresponding to the best way of parenthesizing the matrices for the lowest total cost.
In the main method, the code finds the minimal cost of multiplying the matrices whose dimensions are given in the array, illustrating how to use the "mcm" function. For larger matrix chains this code may not be the most efficient approach, because the plain recursion recomputes subproblems; memoization or a bottom-up table would avoid that.
CHAPTER 4.
CONCLUSION AND FUTURE WORK

Many people consume news from social media instead of traditional news media. However, social media has also been used to spread fake news, which has negative impacts on individuals and society. In this paper, a model for fake news detection using machine learning algorithms has been presented. This model takes news events as input and, based on Twitter reviews and classification algorithms, predicts the probability of the news being fake or real.

The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis the feasibility study of the proposed system is carried out, to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential. This study is carried out to check the economic impact that the system will have on the organization. The amount of funds that the company can pour into the research and development of the system is limited, and the expenditures must be justified. The developed system was well within budget, which was achieved because most of the technologies used are freely available; only the customized products had to be purchased.
REFERENCES
[1]. Parikh, S. B., & Atrey, P. K. (2018, April). Media-Rich Fake News Detection: A
Survey. In 2018 IEEE Conference on Multimedia Information Processing and Retrieval
(MIPR) (pp. 436-441). IEEE.

[2]. Conroy, N. J., Rubin, V. L., & Chen, Y. (2015, November). Automatic deception
detection: Methods for finding fake news. In Proceedings of the 78th ASIS&T Annual
Meeting: Information Science with Impact: Research in and for the Community (p.
82). American Society for Information Science.

[3]. Helmstetter, S., & Paulheim, H. (2018, August). Weakly supervised learning for
fake news detection on Twitter. In 2018 IEEE/ACM International Conference on
Advances in Social Networks Analysis and Mining (ASONAM) (pp. 274-277). IEEE.
[4]. Stahl, K. (2018). Fake News Detection in Social Media.

[5]. Della Vedova, M. L., Tacchini, E., Moret, S., Ballarin, G., DiPierro, M., & de
Alfaro, L. (2018, May). Automatic Online Fake News Detection Combining Content
and Social Signals. In 2018 22nd Conference of Open Innovations Association
(FRUCT) (pp. 272-279). IEEE.

[6]. Tacchini, E., Ballarin, G., Della Vedova, M. L., Moret, S., & de Alfaro, L. (2017).
Some like it hoax: Automated fake news detection in social networks. arXiv preprint
arXiv:1704.07506.

[7]. Shao, C., Ciampaglia, G. L., Varol, O., Flammini, A., & Menczer, F. (2017). The
spread of fake news by social bots. arXiv preprint arXiv:1707.07592, 96-104.

[8]. Chen, Y., Conroy, N. J., & Rubin, V. L. (2015, November). Misleading online
content: Recognizing clickbait as false news. In Proceedings of the 2015 ACM on
Workshop on Multimodal Deception Detection (pp. 15-19). ACM.
