DATASET ANALYSIS USING KEYWORD SEARCHING IN TWITTER DATA

PRAGYA KATHPALIA, PRAGYA SHARMA, PRATIBHA, SHANTANU SINGH

Ms. KIRTI JAIN

INDERPRASTHA ENGINEERING COLLEGE, GHAZIABAD,

Dr. A P J ABDUL KALAM TECHNICAL UNIVERSITY LUCKNOW

ABSTRACT - This project addresses the problem of dataset and sentiment analysis[1]
using keywords in Twitter; that is, classifying tweets according to the sentiment
expressed in them: positive, negative, or neutral. Twitter is an online microblogging
and social networking platform which allows users to write short status updates of a
maximum length of 140 characters. It is a rapidly expanding service with over 200
million registered users, of which 100 million are active users and half of them log
on to Twitter daily, generating nearly 250 million tweets per day. Given this volume
of usage, we hope to obtain a reflection of public sentiment by analyzing the
sentiments expressed in the tweets. Analyzing public sentiment is important for many
applications, such as firms trying to find out the response to their products in the
market, predicting political elections, and predicting socioeconomic phenomena like
the stock exchange. This project aims to develop a functional classifier for accurate
and automatic sentiment classification of an unknown tweet stream.

Keywords- Sentiment analysis (SA), Twitter, Machine Learning (ML), Naive Bayes (NB)[3],
Real-Time Analysis

1. INTRODUCTION
Sentiment analysis[1] in the domain of microblogging is a relatively new research
topic, so there is still a lot of room for further research in this area. A decent
amount of related prior work has been done on sentiment analysis[1] of user reviews,
documents, web blogs/articles, and general phrase-level sentiment analysis. These
differ from Twitter mainly because of the limit of 140 characters per tweet, which
forces the user to express an opinion compressed into a very short text. The best
results in sentiment classification have been reached using supervised learning
techniques such as Naive Bayes[3] and Support Vector Machines, but the manual
labeling required for the supervised approach is very expensive. Some work has been
done on unsupervised and semi-supervised approaches, and there is a lot of room for
improvement. Researchers testing new features and classification techniques often
compare their results only to baseline performance. There is a need for proper and
formal comparisons between the results obtained with different features and
classification techniques, in order to select the best features and the most efficient
classification techniques for particular applications.

Being extremely interested in everything related to Machine Learning, we found this
independent project a great occasion to take the time to learn and confirm our
interest in the field. The ability to make estimations and predictions, and to let
machines learn by themselves, is both powerful and nearly limitless in terms of
application possibilities. Machine Learning can be used in finance, medicine, and
almost everywhere else. That is why we decided to conduct a project around Machine
Learning.

This project was motivated by the desire to investigate the sentiment analysis[1]
field of machine learning, since it allows us to approach natural language processing,
which is a very active topic. Following our previous experience of classifying short
pieces of music according to their emotion, we applied the same idea to tweets and
tried to determine which are positive and which are negative with the help of keyword
searching.

2. DEFINITION AND MOTIVATION
Sentiment analysis[1] can be defined as a process that automates the mining of
attitudes, opinions, views and emotions from text, speech, tweets and database sources
through Natural Language Processing (NLP). Sentiment analysis[1] involves classifying
opinions in text into categories like "positive", "negative" or "neutral". It is also
referred to as subjectivity analysis, opinion mining, and appraisal extraction. An
example of the terminology of sentiment analysis[1] is given below:

Statement: “The food in this cafe was bad!”

Here, in this statement:
object: <cafe>
feature: <food>
opinion: <bad>
polarity[4]: <negative>

Immense quantities of user-generated web-based social networking content are
continuously being produced in the form of surveys, online journals, comments,
discussions, pictures, and recordings.

Twitter is a microblogging platform where users post 'tweets' that are broadcast to
their followers or sent to another user. As of 2016, Twitter had more than 313 million
active users in a given month, including 100 million daily users.

Twitter has of late been the subject of much scrutiny, as tweets frequently express
users' sentiment on controversial
issues. In the social media context, sentiment analysis[1] and opinion mining are
highly challenging tasks because of the enormous amount of information generated by
humans and machines.
Sentiment analysis[1] helps organizations determine customers’ likes and dislikes
about products and company image. In addition, it plays a vital role in analyzing data
of industries and organizations to aid them in making business decisions.

3. OBJECTIVE(S)
Twitter sentiment analysis[1] allows you to listen to your customers and understand
what they need. By introducing dataset and sentiment analysis[1] tools into your
workflows, you can automatically organize unstructured information (which includes
Twitter data) in real time, at scale, and accurately:

● Scalability: Analyze hundreds or thousands of tweets mentioning your brand and
automate manual tasks. Easily scale sentiment analysis[1] tools as your data grows and
gain valuable insights on the go.
● Real-Time Analysis: Twitter sentiment analysis[1] is essential for monitoring sudden
shifts in customer moods, detecting if complaints are on the rise, and for taking
action before problems escalate. With sentiment analysis, you can monitor brand
mentions on Twitter in real time and gain actionable insights.
● Consistent Criteria: Avoid inconsistencies that stem from several agents tagging
data against different criteria. Instead, train a machine learning model to perform
sentiment analysis, using one set of rules, on all your Twitter data, so results are
consistent.

4. METHODOLOGY

4.1 Dataset: The tweepy module is used to obtain our dataset. First, import all the
required packages and initialize the token and key variables. OAuth essentially allows
the user, via an authentication provider with which they have previously successfully
authenticated, to give another website/service a limited-access authentication token
for authorization to additional resources. After getting access to Twitter data, we
create a file to save all the tweets in. Then we create a filter that extracts tweets
based on certain words that are mentioned, that is, tweets that contain the words
relevant to our project. In this way our dataset is obtained.

4.2 Algorithm Used: Naive Bayes Algorithm[3] - The algorithm is named after the famous
statistician Thomas Bayes, who proposed Bayes' theorem. The Naive Bayes classifier
assumes that all the attributes are conditionally independent of each other given the
class. In this algorithm, the conditional probability of each attribute with respect
to a certain class label is calculated. A new document is classified using the product
of these probabilities (equivalently, the sum of their logarithms) for each class. The
classification framework is briefly discussed as follows: Bayes' rule describes the
probability of an event based on prior knowledge of the occurrence of another event
related to it. The probability of occurrence of event A given that event B has already
occurred is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

and, likewise, the probability of event B given that event A has already occurred is

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

Using both these equations, we can rewrite them collectively as

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$
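As a minimal sketch of how Bayes' rule, P(A|B) = P(B|A) P(A) / P(B), combines with the conditional-independence assumption to classify a tweet, the Python below trains a word-count Naive Bayes model on a toy corpus. The class name `NaiveBayesSketch`, the add-one smoothing, and the four training sentences are illustrative assumptions, not the configuration used in this project.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSketch:
    """Word-count Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)        # number of documents per class
        self.word_counts = defaultdict(Counter)    # word frequencies per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(word | class) for every class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior P(class)
            for word in text.lower().split():
                if word in self.vocab:             # words never seen in training are skipped
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training data (assumed for illustration).
train_texts = ["good great app", "love this great phone",
               "bad awful service", "hate this bad app"]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSketch().fit(train_texts, train_labels)
print(model.predict("great phone"))
print(model.predict("awful service"))
```

The shared denominator P(B) is dropped because it is the same for every class, and logarithms are summed instead of multiplying probabilities to avoid numerical underflow on longer texts.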

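For the dataset step of Section 4.1, the keyword filter and the file that stores the matching tweets could look roughly like the sketch below. It assumes the tweets have already been fetched (for example through tweepy) and are available as dictionaries with a `text` field; the function names, the JSON-lines file format, and the sample keywords are hypothetical choices made for illustration.

```python
import json
import os
import re
import tempfile


def matches_keywords(text, keywords):
    """True if any keyword appears in the text as a whole word, ignoring case."""
    words = set(re.findall(r"[a-z0-9_']+", text.lower()))
    return any(k.lower() in words for k in keywords)


def save_matching_tweets(tweets, keywords, path):
    """Append tweets that mention a keyword to `path`, one JSON object per line."""
    kept = 0
    with open(path, "a", encoding="utf-8") as out:
        for tweet in tweets:
            if matches_keywords(tweet["text"], keywords):
                out.write(json.dumps(tweet) + "\n")
                kept += 1
    return kept


# Stand-in for tweets returned by the Twitter API (assumed shape).
sample_tweets = [
    {"id": 1, "text": "This app is a #scam"},
    {"id": 2, "text": "Nice weather today"},
    {"id": 3, "text": "FAKE reviews everywhere"},
]
demo_path = os.path.join(tempfile.mkdtemp(), "tweets.jsonl")
print(save_matching_tweets(sample_tweets, ["scam", "fake"], demo_path))
```

Matching on whole words rather than substrings keeps a keyword such as "scam" from also selecting "scammers"; whether that is the desired behavior depends on the keyword list chosen for the project.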
5. RESULTS

The application is a web-based application that detects whether or not an application
is fake by retrieving the tweets posted by users. The application generates a result
from the meaningful tweets, which is categorized into[5]:
1. POSITIVE
2. WEAKLY POSITIVE
3. STRONGLY POSITIVE
4. NEUTRAL
5. NEGATIVE
6. WEAKLY NEGATIVE
7. STRONGLY NEGATIVE

For example: [figure]

Tweets fetched: [figures 1 and 2]

According to the percentage, both the user and the admin can determine the
authenticity of the application. The positive and negative keywords are first stored
in a lexicon array; meaningful tweets are then retrieved using NLP, classified with
the naive Bayes theorem[3], and the final decision is made by a decision tree. The
application helps users judge the authenticity of an application, and the result is
grounded in reality because the tweets were written by users themselves.
Sentiment analysis[1] is undoubtedly one of the fastest-growing research areas, and
it has been used by thousands of people to build algorithms, apps, etc. The application
is based on an array that holds both positive and negative keywords.
Naive Bayes[3] is used to classify the tweets, and the polarity[4] of the tweets is
then obtained with the help of a decision tree. A decision tree is a pictorial
representation of nodes and leaves that gives a clear view of the final analysis of
the application made by users.
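The keyword lexicon and the seven result categories can be illustrated with the short Python sketch below. The lexicon entries, the polarity formula, and the 0.3 / 0.6 thresholds are assumptions chosen for illustration; the paper does not state its actual keywords or cut-off values. The chain of threshold tests in `categorize` plays the role of a very small hand-written decision tree.

```python
from collections import Counter

# Hypothetical lexicon; the paper does not list its keywords.
POSITIVE = {"good", "great", "love", "useful", "genuine"}
NEGATIVE = {"bad", "fake", "scam", "hate", "useless"}


def polarity(tweet):
    """Score in [-1, 1]: (positive hits - negative hits) / total hits."""
    words = [w.strip(".,!?#@\"'").lower() for w in tweet.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def categorize(score):
    """Map a polarity score to one of the seven categories (assumed thresholds)."""
    if score == 0:
        return "NEUTRAL"
    side = "POSITIVE" if score > 0 else "NEGATIVE"
    magnitude = abs(score)
    if magnitude <= 0.3:
        return "WEAKLY " + side
    if magnitude <= 0.6:
        return side
    return "STRONGLY " + side


def category_percentages(tweets):
    """Percentage of tweets that fall into each category."""
    counts = Counter(categorize(polarity(t)) for t in tweets)
    return {category: 100.0 * n / len(tweets) for category, n in counts.items()}


tweets = [
    "Love this app, great and useful!",
    "Total scam, fake app.",
    "Good app, great design, love it, but bad support",
    "Just installed it today",
]
print(category_percentages(tweets))
```

A report of this kind gives the per-category percentages from which a user or admin could judge the application, in the spirit of the results described above.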
Twitter sentiment analysis[1] comes under the category of text and opinion mining. It
focuses on analyzing the sentiments of tweets and feeding the data to a machine
learning model to train it and then check its accuracy, so that the model can be used
in the future according to the results. It comprises steps like data collection, text
preprocessing, sentiment detection, sentiment classification, and training and testing
the model. This research topic has evolved during the last decade, with models reaching
an accuracy of almost 85%-90%. But it still lacks diversity in the data. Along with
this, it faces many application issues with slang and abbreviated word forms. Many
analyzers do not perform well when the number of classes is increased. Also, it is
still untested how accurate the model will be for topics other than the one under
consideration. Hence sentiment analysis[1] has a very bright scope for development in
the future.

6. REFERENCES

[1] M,S and Rajashree R, "Sentiment Analysis in Twitter using Machine Learning Techniques", 4th ICCCNT 2013, Tiruchengode, India. IEEE.

[2] B. Kotsiantis, "Supervised Machine Learning: A Review of Classification Techniques" (2007).

[3] Pablo Gamallo, Marcos Garcia, "Citius: A Naive-Bayes Strategy for Sentiment Analysis on English Tweets", 8th International Workshop on Semantic Evaluation (SemEval 2014), Dublin, Ireland, Aug 23-24 2014, pp. 171-175.

[4] P. D. Turney, "Thumbs up or thumbs down?: semantic orientation applied to unsupervised classification of reviews", in Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pp. 417-424, Association for Computational Linguistics, 2002.

[5] S. Batra and D. Rao, "Entity based sentiment analysis on Twitter", Stanford University, 2010.
