Abstract
Fake news detection is a critical yet challenging problem in Natural Language Processing (NLP). The
rapid rise of social networking platforms has greatly increased access to information, but it has also
accelerated the spread of fake news. Given the massive amount of Web content, automatic fake news
detection is a practical NLP problem that online content providers need to solve. We will use a fake
news dataset from Kaggle or the FakeNews Corpus available on the web. The labeled dataset will allow us
to train our algorithm. We will follow a step-by-step procedure to train the algorithm so that it
classifies news articles correctly.
After importing the dataset, we will split it into a training set and a test set.
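The split step could look roughly like the sketch below. The column names `text` and `label`, the 75/25 ratio, and the tiny made-up rows are all assumptions for illustration; the real Kaggle or FakeNews Corpus files may be laid out differently.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Tiny made-up stand-in for the labeled dataset (assumed columns: text, label).
df = pd.DataFrame({
    "text": [
        "aliens secretly run the city council",
        "council approves new library budget",
        "miracle pill cures every disease",
        "health agency publishes flu report",
        "celebrity replaced by a clone",
        "local team wins football final",
        "scientists admit gravity is a hoax",
        "central bank keeps rates unchanged",
    ],
    "label": ["FAKE", "REAL", "FAKE", "REAL", "FAKE", "REAL", "FAKE", "REAL"],
})

# Hold out 25% of the rows for testing; stratify keeps the FAKE/REAL ratio
# the same in both parts, and random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"],
    test_size=0.25, random_state=42, stratify=df["label"],
)

print(len(X_train), "training rows,", len(X_test), "test rows")
```

Stratifying is a design choice rather than a requirement; it mainly matters when one class is much rarer than the other.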
Next, we will remove stop words: frequently used words, such as articles and conjunctions (a, an, the,
and, etc.), that do not help distinguish whether an article is fake.
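As a minimal sketch of stop-word removal, the snippet below filters tokens against a small hand-written list. The list and the helper name `remove_stop_words` are assumptions made to keep the example self-contained; in the project itself, the fuller English list shipped with NLTK (`nltk.corpus.stopwords`) would likely take its place.

```python
import re

# Small illustrative stop-word list (an assumption, not NLTK's full list).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "to", "is", "are", "was"}

def remove_stop_words(text):
    """Lowercase the text, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [token for token in tokens if token not in STOP_WORDS]

print(remove_stop_words("The senator is hiding an alien in the basement"))
# ['senator', 'hiding', 'alien', 'basement']
```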
We will then build word counts and use TF-IDF vectors to weight each word by its importance, so that
these weights can later be used for detection.
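To illustrate the difference between raw counts and TF-IDF weights, here is a small sketch using scikit-learn's `CountVectorizer` and `TfidfVectorizer` with default settings. The three documents are made up. A word that appears in fewer documents ("aliens") receives a higher weight than one that appears in several ("mayor").

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Made-up mini corpus, for illustration only.
docs = [
    "aliens endorse local mayor",
    "mayor opens new library",
    "library budget approved by council",
]

# Raw term counts: one row per document, one column per vocabulary word.
count_vec = CountVectorizer()
counts = count_vec.fit_transform(docs)

# TF-IDF: the same layout, but counts are rescaled so that words occurring
# in many documents are down-weighted.
tfidf_vec = TfidfVectorizer()
tfidf = tfidf_vec.fit_transform(docs)

# vocabulary_ maps each word to its column index; sort words by that index.
terms = sorted(tfidf_vec.vocabulary_, key=tfidf_vec.vocabulary_.get)

# Show the non-zero weights for the first document.
first_row = tfidf[0].toarray().ravel()
for term, weight in zip(terms, first_row):
    if weight > 0:
        print(f"{term}: {weight:.3f}")
```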
After extracting the feature names from our count vectors and TF-IDF vectors, we will apply various
machine learning models and also use a stacked ensemble (multiple models combined) to improve our
results.
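One possible shape for the stacked ensemble is sketched below with scikit-learn's `StackingClassifier`. The choice of base models (Naive Bayes and logistic regression), the logistic-regression meta-model, `cv=3`, and the toy headlines are all assumptions; the models the project actually compares may differ.

```python
from sklearn.ensemble import StackingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up toy headlines and labels, only to make the sketch runnable.
texts = [
    "aliens secretly control the city council",
    "miracle pill cures every disease overnight",
    "celebrity replaced by clone, insiders claim",
    "moon landing footage was filmed in a basement",
    "scientists admit gravity is a hoax",
    "secret tunnel links parliament to volcano lair",
    "city council approves new library budget",
    "health agency publishes annual flu report",
    "local team wins regional football final",
    "university opens new engineering building",
    "central bank keeps interest rates unchanged",
    "parliament passes updated road safety law",
]
labels = ["FAKE"] * 6 + ["REAL"] * 6

# Base models each make a prediction; the final estimator learns how to
# combine those predictions (obtained via 3-fold cross-validation).
stack = StackingClassifier(
    estimators=[
        ("nb", MultinomialNB()),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    final_estimator=LogisticRegression(),
    cv=3,
)

# Vectorize the text and feed the TF-IDF features into the stack.
model = make_pipeline(TfidfVectorizer(), stack)
model.fit(texts, labels)

predictions = model.predict(
    ["council approves road budget", "aliens admit clone hoax"]
)
print(predictions)
```

With so few examples the predictions carry no real meaning; the point is only how the pieces fit together.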
Finally, we will analyse the differences between our models and their metrics. We will further discuss
how the trained models can be improved to achieve better accuracy.
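For comparing models, accuracy and a confusion matrix are a common starting point. The sketch below computes both with scikit-learn on hand-written label lists; `y_true` and `y_pred` are made-up values, not results from this project.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Made-up true labels and model predictions, for illustration only.
y_true = ["FAKE", "FAKE", "REAL", "REAL", "REAL", "FAKE"]
y_pred = ["FAKE", "REAL", "REAL", "REAL", "FAKE", "FAKE"]

# Fraction of predictions that match the true label (4 of 6 here).
accuracy = accuracy_score(y_true, y_pred)

# Rows are true classes, columns are predicted classes, in the order given.
matrix = confusion_matrix(y_true, y_pred, labels=["FAKE", "REAL"])

print(f"accuracy: {accuracy:.3f}")
print(matrix)
# accuracy: 0.667
# [[2 1]
#  [1 2]]
```

The off-diagonal cells show which kind of mistake a model makes (fake flagged as real, or the reverse), which accuracy alone hides.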
Tools
Python 3.6
Libraries -
● pandas
● NumPy
● NLTK
● sklearn.feature_extraction
● sklearn.model_selection
● Matplotlib
● itertools