Introduction
Literature Review
Methodology or Algorithm Used
Problem
Research Gap
Related Work
Method and Results
Strong Point
Weak Point
The product model will be tested on unseen data and the results plotted;
accordingly, the final product will be a model that detects and classifies
fake articles and can be integrated with any system for future use.
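A minimal sketch of this evaluation step is shown below, assuming scikit-learn and matplotlib; the `model`, `X_test`, and `y_test` names and the label names are placeholders, not details from the original work.

```python
# Hypothetical sketch: evaluating a trained fake-news classifier on unseen
# data and plotting the results. `model`, `X_test`, and `y_test` stand in
# for the trained model and a held-out split.
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, accuracy_score

def evaluate_on_unseen(model, X_test, y_test):
    preds = model.predict(X_test)
    print(f"Accuracy on unseen data: {accuracy_score(y_test, preds):.4f}")
    # Plot a confusion matrix so misclassifications are visible at a glance.
    ConfusionMatrixDisplay.from_predictions(
        y_test, preds, display_labels=["real", "fake"]
    )
    plt.show()
```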
Problem:
This research addresses the problem of effective prediction of cardiovascular
diseases through machine learning. These diseases affect the heart or blood
vessels of humans, and prediction rests on a monopolized system of measuring,
metering, diagnosing, and controlling the factors responsible for the spread
of the disease, such as high blood pressure, smoking, diabetes, body mass
index (BMI), cholesterol, age, family history, etc.
Cardiovascular disease causes the highest number of deaths globally.
Therefore, early prediction of these kinds of diseases is very important, so
that precautionary measures can be taken before something serious happens.
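As an illustration only, a classifier over the risk factors named above might look like the sketch below; the CSV file, column names, and target column are assumptions for illustration, not details from the reviewed paper.

```python
# Illustrative sketch: predicting cardiovascular disease from the risk
# factors listed above. The file "heart_disease.csv" and its column names
# are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("heart_disease.csv")  # hypothetical dataset
features = ["blood_pressure", "smoking", "diabetes", "bmi",
            "cholesterol", "age", "family_history"]
X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["has_cvd"], test_size=0.2, random_state=42)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Held-out accuracy: {clf.score(X_test, y_test):.3f}")
```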
The methodology of this paper was inspired by the SLR (Systematic Literature
Review) guidelines. The three steps of this methodology used to review the
work from 2010 to 2021 are planning, implementation, and reporting. Planning
covers the problem statement, objective, and protocol; implementation covers
quality assessment, data extraction, and the search keywords, queries, and
procedures; reporting covers synthesizing the data and analyzing it
critically. These are the SLR guidelines this paper followed to review the
past decade's work. As mentioned above, these studies were comprehensively
reviewed from five different aspects: collection sites; types and
characteristics of datasets; data pre-processing and data sampling
techniques; feature types, feature selection, and feature extraction
techniques; and ML algorithm utilization and performance evaluation metrics.
The dataset for this project was built with a mix of both real and fake
news. The entire dataset amounted to 44,898 news articles out of which
23,481 were fake news and 21,417 were real news. The sources of real
and fake news include Yahoo News, AOL, Reuters, Bloomberg, USA NewsFlash,
Truth-Out, Controversial Files, and so on. To extract important content from
the crawled pages, I used two strategies. The first was to reduce noise by
removing insignificant and irrelevant information such as images, tables,
headers, footers, special symbols, and navigation bars. With this, I was able
to extract most of the important information across many web pages. Since
each website has its own layout and parameters, a one-size-fits-all strategy
would have failed, so I leveraged a generic approach.
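A sketch of this generic noise-removal step is given below, assuming BeautifulSoup as the parser; the tag names mirror the elements described above, though the original implementation may differ.

```python
# Sketch of generic noise removal for crawled pages: drop images, tables,
# headers, footers, and navigation bars, then keep the remaining text.
from bs4 import BeautifulSoup

def extract_main_text(html: str) -> str:
    soup = BeautifulSoup(html, "html.parser")
    # Remove insignificant and irrelevant elements.
    for tag in soup(["img", "table", "header", "footer", "nav",
                     "script", "style"]):
        tag.decompose()
    # Collapse the remaining text into whitespace-normalized content.
    return " ".join(soup.get_text(separator=" ").split())
```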
The collected data was processed using various text preprocessing
measures, as explained later and stored in CSV files. The real and fake
data were then merged and shuffled to get a CSV file containing a
consolidated, randomized dataset.
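The merge-and-shuffle step could look like the sketch below, assuming pandas and two CSV files with a text column; the file names and label encoding are illustrative.

```python
# Sketch of the merge-and-shuffle step: label the two sources, concatenate,
# shuffle with a fixed seed, and save the consolidated randomized dataset.
import pandas as pd

real = pd.read_csv("real_news.csv").assign(label=0)
fake = pd.read_csv("fake_news.csv").assign(label=1)

merged = pd.concat([real, fake], ignore_index=True)
merged = merged.sample(frac=1, random_state=42).reset_index(drop=True)
merged.to_csv("consolidated_news.csv", index=False)
```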
4.1 Real News
For fake news I also used Kaggle’s ‘Getting Real about Fake News’
dataset. The CSV file with data was available off the shelf for use, and
I had to perform only minimal text processing on this data. The total number
of fake news articles, as mentioned above, is around 23,481.
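The minimal processing on this off-the-shelf CSV might amount to something like the sketch below; the file name, column name, and cleaning rules are assumptions.

```python
# Sketch of minimal text processing on the off-the-shelf Kaggle CSV:
# lowercase the text and strip non-alphanumeric characters.
import pandas as pd

fake = pd.read_csv("fake.csv")  # hypothetical file name
fake["text"] = (fake["text"].fillna("")
                            .str.lower()
                            .str.replace(r"[^a-z0-9\s]", " ", regex=True))
print(len(fake))  # roughly 23,481 fake articles
```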
In the modern era, the majority of tasks are done online. Newspapers that
were earlier preferred as hard copies are now being substituted by
applications like Facebook and Twitter, with news articles read online;
WhatsApp forwards are also a major source. The growing problem of fake news
only makes things more complicated, as it tries to change or hamper people's
opinion of and attitude toward digital technology. Thus, to address this
challenge, I have developed a fake news detection system that takes input
from the user and classifies it as true or fake. To implement this, various
data preprocessing and vectorization steps and machine learning techniques
have to be used. The model is trained on an appropriate dataset from Kaggle,
and its performance is evaluated using various performance measures. The
best model, i.e., the model with the highest accuracy, is used to classify
news headlines or articles. As evident above, for the static search my best
model came out to be Logistic Regression, with an accuracy of 98.50%.
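A minimal sketch of such a preprocessing, vectorization, and training pipeline is shown below. TF-IDF vectorization and the specific hyperparameters are assumptions; the source confirms only that vectorization, Logistic Regression, and a Kaggle dataset were used.

```python
# Sketch of the train-and-evaluate pipeline: TF-IDF features feeding a
# Logistic Regression classifier, scored on a held-out split.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

df = pd.read_csv("consolidated_news.csv")  # merged dataset from earlier
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42)

pipeline = make_pipeline(
    TfidfVectorizer(stop_words="english", max_df=0.7),
    LogisticRegression(max_iter=1000),
)
pipeline.fit(X_train, y_train)
print(f"Test accuracy: {pipeline.score(X_test, y_test):.4f}")
```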
Cont….
… performance of logistic regression, which then gave an accuracy of
98.80%. Hence, if a user feeds a particular news article or its headline
into my model, there is a 98.80% chance that it will be classified according
to its true nature.
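Feeding a user-supplied headline to the trained pipeline could then look like this usage sketch, continuing from the previous snippet; the label encoding (1 = fake) is an assumption.

```python
# Usage sketch: classify a user-supplied headline with the trained pipeline.
headline = input("Enter a news headline or article: ")
label = pipeline.predict([headline])[0]
proba = pipeline.predict_proba([headline])[0].max()
print(f"Predicted: {'fake' if label == 1 else 'true'} "
      f"(confidence {proba:.2%})")
```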