Submitted by
Jatin Garg (21BCS10747)
Shubhankar (21BCS10627)
Harsh Gupta (21BCS10721)
ENGINEERING IN COMPUTER SCIENCE ENGINEERING
Chandigarh University
NOVEMBER 2023
BONAFIDE CERTIFICATE
Project Guide
ASSISTANT PROFESSOR
Department of CSE
Chandigarh University
TABLE OF CONTENTS
Abstract
Chapter 1.
1.1
1.2
1.3
Chapter 2.
2.1
2.2
Chapter 3.
3.1
3.2
Chapter 4
APPENDIX
REFERENCES
ABSTRACT
In recent years, with the rapid growth of online social networks, fake news created for various commercial and political purposes has appeared in large volumes and spread widely in the online world. Through deceptive wording, online social network users are easily misled by such fake news, which has already had a tremendous effect on offline society. An important goal in improving the trustworthiness of information in online social networks is to identify fake news in a timely manner. This paper investigates the principles, methodologies, and algorithms for detecting fake news articles, creators, and subjects in online social networks, and evaluates the corresponding performance. The accuracy of information on the Internet, and especially on social media, is an increasingly important concern, but the scale of web data hampers our ability to identify, evaluate, and correct such data, the so-called "fake news," on these platforms. In this paper, we propose a method for fake news detection and show how it can be applied to Facebook, one of the most popular online social media platforms. The method uses a Naive Bayes classification model to predict whether a post on Facebook should be labeled as real or fake. The results may be improved by applying several techniques discussed in the paper. The results obtained suggest that the fake news detection problem can be addressed with machine learning methods.
CHAPTER 1
INTRODUCTION
1.1 Identification of Client and Need:
These days fake news causes problems ranging from satirical articles being mistaken for real reporting to fabricated stories and planted government propaganda in some outlets. Fake news and lack of trust in the media are growing problems with huge ramifications for our society. A purposely misleading story is obviously "fake news," but lately the discourse on social media has been changing the term's definition: some now use it to dismiss facts that run counter to their preferred viewpoints.
Understanding context is also vital in fake news detection. Fake news often relies on taking information out of context, so improving a model's ability to understand the context in which news is presented is crucial.
1.3 Task Identification:
Define the features the model will use for fake news detection. These can include text-based features (e.g., TF-IDF, word embeddings), metadata features (e.g., publication date, source credibility), and more advanced features (e.g., sentiment analysis, readability scores).
3. Model Selection and Development:
Split the dataset into training and testing sets to train the model and evaluate its performance.
5. Model Evaluation:
Ensure that the project complies with relevant regulations and legal requirements for data handling and content moderation.
13. User Education and Training:
In this paper a model is built based on a count vectorizer or a TF-IDF matrix (i.e., word counts relative to how often the words are used in the other articles in the dataset). Since this problem is a kind of text classification, implementing a Naive Bayes classifier is a good fit, as it is a standard approach for text-based processing. The actual goal lies in choosing the text transformation (count vectorizer vs. TF-IDF vectorizer) and choosing which type of text to use (headlines vs. full text). The next step is to extract the most useful features for the count vectorizer or TF-IDF vectorizer. This is done by selecting the n most-used words and/or phrases, optionally lower-casing the text, and, most importantly, removing stop words, which are common words such as "the," "when," and "there," and keeping only those words that appear at least a given number of times in the text dataset.
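Since the report's other code samples are in Java, the classification step described above can be sketched in the same language. This is a minimal multinomial Naive Bayes over raw word counts; the class name and the two-document training corpus are illustrative assumptions, not part of the actual pipeline, which would train on the full article dataset after the vectorization step.

```java
import java.util.*;

// Minimal multinomial Naive Bayes over word counts with Laplace smoothing.
// Illustrative sketch only: a real run would train on the full dataset.
public class NaiveBayesSketch {
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // label -> word -> count
    private final Map<String, Integer> totalWords = new HashMap<>();              // label -> total words seen
    private final Map<String, Integer> docCounts = new HashMap<>();               // label -> number of documents
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    public void train(String label, String text) {
        wordCounts.putIfAbsent(label, new HashMap<>());
        for (String w : text.toLowerCase().split("\\W+")) {
            if (w.isEmpty()) continue;
            wordCounts.get(label).merge(w, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
        docCounts.merge(label, 1, Integer::sum);
        totalDocs++;
    }

    // Returns the label with the highest log-probability for the given text.
    public String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log(docCounts.get(label) / (double) totalDocs); // class prior
            for (String w : text.toLowerCase().split("\\W+")) {
                if (w.isEmpty()) continue;
                int count = wordCounts.get(label).getOrDefault(w, 0);
                // Laplace smoothing so unseen words do not zero out the probability.
                score += Math.log((count + 1.0) / (totalWords.get(label) + vocabulary.size()));
            }
            if (score > bestScore) { bestScore = score; best = label; }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("fake", "shocking secret cure doctors hate this trick");
        nb.train("real", "government announces new budget policy report");
        System.out.println(nb.classify("shocking trick doctors hate")); // prints fake
    }
}
```

The same log-probability comparison carries over unchanged when the raw counts are replaced by the TF-IDF features discussed above.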
2.2. Design Constraints:
Data Use
In this project we use several packages; to load and read the dataset we use pandas. With pandas we can read the .csv file, display the shape of the dataset, and display the dataset itself in a readable form. We then train and test the data; because we use supervised learning, the data is labeled. With the training and testing data and their labels we can run different machine learning algorithms, but before computing predictions and accuracies the data needs to be preprocessed: the null values, which are not readable, must be removed from the dataset, and the data must be converted into vectors by normalizing and tokenizing it so that it can be understood by the machine. The next step is to generate visual reports from this data using Python's Matplotlib library and scikit-learn; these libraries let us present the results in the form of histograms, pie charts, or bar charts.
Preprocessing
The dataset is split into a training set and a testing set: Dataset I contains 3256 training samples and 814 testing samples, and Dataset II contains 1882 training samples and 471 testing samples, respectively. Cleaning the data is always the first step: words that do not help in mining useful information are removed from the dataset. Data collected online often contains undesirable tokens such as stop words and digits, which hinder detection. Cleaning removes these language-independent entities and integrates logic that can improve the accuracy of the identification task.
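In keeping with the Java samples elsewhere in this report, the cleaning step above can be sketched as follows. The stop-word list here is a small illustrative subset, not the full list a real pipeline would use.

```java
import java.util.*;
import java.util.stream.Collectors;

// Sketch of the cleaning step: lower-case the text, strip digits and
// punctuation, split into tokens, and drop stop words.
public class TextCleaner {
    // Illustrative subset; a real pipeline would use a full stop-word list.
    private static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "a", "an", "is", "when", "there", "and", "of", "to"));

    public static List<String> clean(String text) {
        return Arrays.stream(text.toLowerCase()
                        .replaceAll("[^a-z\\s]", " ") // digits and punctuation become spaces
                        .split("\\s+"))
                .filter(w -> !w.isEmpty() && !STOP_WORDS.contains(w))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(clean("There is a SHOCKING secret: 7 doctors hate this!"));
        // prints [shocking, secret, doctors, hate, this]
    }
}
```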
Feature Extraction
Logistic Regression
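The report names Logistic Regression as one of its classifiers; a minimal sketch in Java follows, matching the report's other code samples. The single feature, the toy data, and the plain batch-free gradient descent here are illustrative assumptions; the actual pipeline would feed TF-IDF feature vectors to the model.

```java
// Minimal logistic regression trained by per-sample gradient descent on the
// log-loss. Illustrative only: the real pipeline would use TF-IDF features.
public class LogisticRegressionSketch {
    private final double[] w; // weights; the last entry is the bias term

    public LogisticRegressionSketch(int numFeatures) {
        w = new double[numFeatures + 1];
    }

    private static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Probability that the example belongs to class 1 ("fake").
    public double predict(double[] x) {
        double z = w[w.length - 1]; // bias
        for (int i = 0; i < x.length; i++) z += w[i] * x[i];
        return sigmoid(z);
    }

    // Gradient-descent updates over the dataset, repeated for 'epochs' passes.
    public void train(double[][] X, int[] y, double lr, int epochs) {
        for (int e = 0; e < epochs; e++) {
            for (int r = 0; r < X.length; r++) {
                double err = predict(X[r]) - y[r]; // gradient of the log-loss
                for (int i = 0; i < X[r].length; i++) w[i] -= lr * err * X[r][i];
                w[w.length - 1] -= lr * err;
            }
        }
    }

    public static void main(String[] args) {
        // Toy data: one feature (e.g. a sensational-word score); label 1 = fake.
        double[][] X = {{0.1}, {0.2}, {0.8}, {0.9}};
        int[] y = {0, 0, 1, 1};
        LogisticRegressionSketch model = new LogisticRegressionSketch(1);
        model.train(X, y, 0.5, 2000);
        System.out.println(model.predict(new double[]{0.85}) > 0.5); // prints true
    }
}
```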
1. Fibonacci

public class Fibonacci {
    // Bottom-up dynamic programming: dp[i] holds the i-th Fibonacci number.
    public static int fib(int n) {
        if (n <= 1) return n; // base cases; also guards the array accesses below
        int dp[] = new int[n + 1];
        dp[0] = 0;
        dp[1] = 1;
        for (int i = 2; i <= n; i++) {
            dp[i] = dp[i - 1] + dp[i - 2];
        }
        return dp[n];
    }

    public static void main(String args[]) {
        int n = 6;
        System.out.println(fib(n)); // prints 8
    }
}
This Java code calculates the nth Fibonacci number efficiently with dynamic programming, using the method defined in the "Fibonacci" class. A synopsis of the code is provided below:
The static method "fib" of the "Fibonacci" class determines the nth Fibonacci number via dynamic programming.
An integer array "dp" of size "n + 1" is initialized to hold the Fibonacci numbers, where "n" is the index of the number to be found.
The base cases "dp[0]" = 0 and "dp[1]" = 1 are set, since these are the first two Fibonacci numbers.
The code then iterates from "i = 2" to "n", computing each Fibonacci number as the sum of the preceding two and storing it in the "dp" array.
2. Coin Change

public class coinchange {
    // Prints the dp table, one row per line.
    public static void print(int dp[][]) {
        for (int i = 0; i < dp.length; i++) {
            for (int j = 0; j < dp[0].length; j++) {
                System.out.print(dp[i][j] + " ");
            }
            System.out.println();
        }
    }

    // Counts the ways to form 'sum' from the given coin denominations,
    // where each denomination may be used any number of times.
    public static int coinchange(int coins[], int sum) {
        int n = coins.length;
        int dp[][] = new int[n + 1][sum + 1];
        for (int i = 0; i < n + 1; i++) {
            dp[i][0] = 1; // one way to make sum 0: take no coins
        }
        for (int j = 1; j < sum + 1; j++) {
            dp[0][j] = 0; // no way to make a positive sum with no coins
        }
        for (int i = 1; i < n + 1; i++) {
            for (int j = 1; j < sum + 1; j++) {
                if (coins[i - 1] <= j) {
                    // either include the i-th coin or exclude it
                    dp[i][j] = dp[i][j - coins[i - 1]] + dp[i - 1][j];
                } else {
                    dp[i][j] = dp[i - 1][j];
                }
            }
        }
        print(dp);
        return dp[n][sum];
    }

    public static void main(String[] args) {
        int coins[] = {1, 2, 3};
        int sum = 5;
        System.out.println(coinchange(coins, sum)); // prints 5
    }
}
The code defines a class named "coinchange" to address the Coin Change problem.
It provides a method to "print" the contents of a 2D array, which is useful for visualizing dynamic programming tables.
The "coinchange" method determines how many different ways the specified coin denominations can be combined to reach a target sum.
It uses a 2D array called "dp" to hold intermediate results, where "dp[i][j]" is the number of ways to form sum "j" from the first "i" coin denominations.
The code initializes the base cases, iterates over the coins and sums, and fills the "dp" table to count the ways of making the target sum. After printing the table for visualization, it returns the result.
The main method demonstrates the "coinchange" function by computing the count for a small example.
3. Knapsack

public class knapsack01 {
    // Prints the dp table, one row per line.
    public static void print(int dp[][]) {
        for (int i = 0; i < dp.length; i++) {
            for (int j = 0; j < dp[0].length; j++) {
                System.out.print(dp[i][j] + " ");
            }
            System.out.println();
        }
    }

    // 0/1 Knapsack: maximum value achievable with capacity W, where
    // dp[i][j] is the best value using the first i items at capacity j.
    public static int Knap(int val[], int wt[], int W) {
        int n = val.length;
        int dp[][] = new int[n + 1][W + 1];
        for (int i = 0; i < dp.length; i++) {
            dp[i][0] = 0; // zero capacity -> zero value
        }
        for (int j = 0; j < dp[0].length; j++) {
            dp[0][j] = 0; // no items -> zero value
        }
        for (int i = 1; i < n + 1; i++) {
            for (int j = 1; j < W + 1; j++) {
                int v = val[i - 1];
                int w = wt[i - 1];
                if (w <= j) {
                    int incprofit = v + dp[i - 1][j - w]; // take item i
                    int excprofit = dp[i - 1][j];         // skip item i
                    dp[i][j] = Math.max(incprofit, excprofit);
                } else {
                    int excprofit = dp[i - 1][j]; // item too heavy to take
                    dp[i][j] = excprofit;
                }
            }
        }
        print(dp);
        return dp[n][W];
    }

    public static void main(String[] args) {
        int val[] = {15, 14, 10, 45, 30};
        int wt[] = {2, 5, 1, 3, 4};
        int W = 7;
        System.out.println(Knap(val, wt, W)); // prints 75
    }
}
The code defines a class named "knapsack01" to address the 0/1 Knapsack problem.
It provides a method to "print" a 2D array's contents, commonly used to visualize dynamic programming tables.
Given item values, weights, and a maximum capacity, the "Knap" method determines the highest value that can be obtained in the 0/1 Knapsack problem.
To store intermediate results it initializes a 2D array called "dp," where "dp[i][j]" denotes the maximum value achievable with the first "i" items and a knapsack capacity of "j."
Setting the base cases to 0, the code fills the "dp" table by deciding, within the weight limit, whether to include or exclude each item. The entry "dp[n][W]," where "n" is the number of items, stores the highest value that can be achieved.
4. Matrix Multiplication

import java.util.*;

public class MatrixMultiplication {
    // Minimum number of scalar multiplications needed to multiply the
    // chain Ai..Aj, where matrix Ak has dimensions arr[k-1] x arr[k].
    public static int mcm(int arr[], int i, int j) {
        if (i == j) {
            return 0; // a single matrix needs no multiplication
        }
        int ans = Integer.MAX_VALUE;
        for (int k = i; k <= j - 1; k++) {
            int cost1 = mcm(arr, i, k);     // Ai...Ak   => arr[i-1] x arr[k]
            int cost2 = mcm(arr, k + 1, j); // Ak+1...Aj => arr[k] x arr[j]
            int cost3 = arr[i - 1] * arr[k] * arr[j]; // cost of the final product
            int finalcost = cost1 + cost2 + cost3;
            ans = Math.min(ans, finalcost);
        }
        return ans;
    }

    public static void main(String[] args) {
        int arr[] = {1, 2, 3, 4, 3};
        int n = arr.length;
        System.out.println(mcm(arr, 1, n - 1)); // prints 30
    }
}
This Java code creates a class called "MatrixMultiplication" and uses a recursive method to determine the minimal cost of matrix chain multiplication. The code is described briefly as follows:
A static method called "mcm" in the "MatrixMultiplication" class recursively determines the minimum number of scalar multiplications needed to multiply a chain of matrices.
The method accepts an array "arr" representing the dimensions of the matrices, and indices "i" and "j" that specify the current range of matrices to be multiplied.
To divide the matrix chain into smaller subproblems, the "mcm" algorithm loops over the possible partition points "k". It computes the cost of multiplying the chains on the left and right sides of the split, plus the cost of multiplying the two resulting submatrices, and takes the minimum over all partitions.
The code returns the smallest cost of the matrix chain multiplication, corresponding to the best way of parenthesizing the matrices.
In the main method, the code builds an array of matrix dimensions and finds the minimal cost of multiplying the matrices, illustrating how to use the "mcm" function. Because it takes a purely recursive approach, this code may not be the most efficient method for larger matrix chains.
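As that last remark notes, the plain recursion recomputes the same subproblems many times. A memoized variant, sketched below as an addition to the original code, caches each (i, j) answer so it is computed only once, reducing the exponential recursion to O(n^3) work.

```java
import java.util.*;

// Memoized variant of mcm: caches each (i, j) subproblem in 'memo'
// so it is solved only once.
public class MatrixMultiplicationMemo {
    public static int mcm(int arr[], int i, int j, int memo[][]) {
        if (i == j) return 0;                      // single matrix: no cost
        if (memo[i][j] != -1) return memo[i][j];   // reuse the cached answer
        int ans = Integer.MAX_VALUE;
        for (int k = i; k <= j - 1; k++) {
            int cost = mcm(arr, i, k, memo) + mcm(arr, k + 1, j, memo)
                     + arr[i - 1] * arr[k] * arr[j];
            ans = Math.min(ans, cost);
        }
        memo[i][j] = ans;
        return ans;
    }

    public static void main(String[] args) {
        int arr[] = {1, 2, 3, 4, 3};
        int n = arr.length;
        int memo[][] = new int[n][n];
        for (int[] row : memo) Arrays.fill(row, -1); // -1 marks "not yet computed"
        System.out.println(mcm(arr, 1, n - 1, memo)); // prints 30
    }
}
```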
CHAPTER 4.
CONCLUSION AND FUTURE WORK
Many people now consume news from social media instead of traditional news media. However, social media has also been used to spread fake news, which has negative impacts on individuals and society. In this paper, a model for fake news detection using machine learning algorithms has been presented. The model takes news events as input and, based on Twitter reviews and classification algorithms, predicts the percentage likelihood of the news being fake or real.
The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a very general plan for the project and some cost estimates. During system analysis, the feasibility study of the proposed system is carried out to ensure that the proposed system is not a burden to the company. For feasibility analysis, some understanding of the major requirements for the system is essential. This study checks the economic impact that the system will have on the organization. The funds that the company can pour into the research and development of the system are limited, so the expenditures must be justified. The developed system was thus kept well within budget, which was achieved because most of the technologies used are freely available; only the customized products had to be purchased.