
DATASET ANALYSIS USING KEYWORD SEARCHING IN TWITTER DATA

PRAGYA KATHPALIA, PRAGYA SHARMA, PRATIBHA, SHANTANU SINGH


Ms. KIRTI JAIN

INDERPRASTHA ENGINEERING COLLEGE, GHAZIABAD,

Dr. A P J ABDUL KALAM TECHNICAL UNIVERSITY, LUCKNOW

Abstract - This project addresses the problem of dataset and sentiment analysis using keywords in Twitter, that is, classifying tweets according to the sentiment expressed in them: positive, negative, or neutral. Twitter is an online microblogging and social networking platform which allows users to write short status updates of a maximum length of 140 characters. It is a rapidly expanding service with over 200 million registered users, of which 100 million are active users and half of them log on to Twitter daily, generating nearly 250 million tweets per day. Due to this large amount of usage, we hope to obtain a reflection of public sentiment by analyzing the sentiments expressed in the tweets. Analyzing public sentiment is important for many applications, such as firms trying to find out the response to their products in the market, predicting political elections, and predicting socioeconomic phenomena like the stock exchange. This project aims to develop a functional classifier for accurate and automatic sentiment classification of an unknown tweet stream.

Keywords -
Introduction

Sentiment analysis in the domain of microblogging is a relatively new research topic, so there is still a lot of room for further research in this area. A decent amount of related prior work has been done on sentiment analysis of user reviews, documents, web blogs/articles, and general phrase-level sentiment analysis. These differ from Twitter mainly because of the limit of 140 characters per tweet, which forces the user to express an opinion compressed into a very short text. The best results in sentiment classification have been achieved using supervised learning techniques such as Naive Bayes and Support Vector Machines, but the manual labeling required for the supervised approach is very expensive. Some work has been done on unsupervised and semi-supervised approaches, and there is a lot of room for improvement.

Various researchers testing new features and classification techniques often just compare their results to baseline performance. There is a need for proper and formal comparisons between the results obtained with different features and classification techniques, in order to select the best features and the most efficient classification techniques for particular applications.

Being extremely interested in everything related to Machine Learning, we found the independent project a great occasion that gave us the time to learn and confirm our interest in this field. The fact that we can make estimations and predictions, and give machines the ability to learn by themselves, is both powerful and limitless in terms of application possibilities. Machine Learning can be used in finance, medicine, and almost everywhere else. That is why we decided to conduct a project around Machine Learning.

This project was motivated by the desire to investigate the sentiment analysis field of machine learning, since it allows us to approach natural language processing, which is a very active topic. Following our previous experience of classifying short pieces of music according to their emotion, we applied the same idea to tweets and tried to determine which ones are positive or negative, with the help of keyword searching.

Objective(s) of the project

Twitter sentiment analysis allows you to listen to your customers and understand what they need. By introducing dataset and sentiment analysis tools into your workflows, you can automatically organize unstructured information (including Twitter data) in real time, at scale, and accurately:
● Scalability: Analyze hundreds or thousands of tweets mentioning your brand and automate manual tasks. Easily scale sentiment analysis tools as your data grows and gain valuable insights on the go.
● Real-Time Analysis: Twitter sentiment analysis is essential for monitoring sudden shifts in customer moods, detecting whether complaints are on the rise, and taking action before problems escalate. With sentiment analysis, you can monitor brand mentions on Twitter in real time and gain actionable insights.
● Consistent Criteria: Avoid inconsistencies that stem from several agents tagging data against different criteria. Instead, train a machine learning model to perform sentiment analysis, using one set of rules, on all your Twitter data, so that results are consistent.

Related Work

Here is a list of research outcomes which show that improving Twitter sentiment analysis is an important step that needs to be taken:
1. Sentiment Analysis Of Twitter Data [1]
Remarks: According to this paper, the sentiments expressed are significantly high, which shows there is a need to improve Twitter sentiment analysis.
2. Supervised Machine Learning [2]
Remarks: This paper describes various supervised machine learning techniques. The authors hope that the references cited will cover the major theoretical issues, guide research in interesting directions, and suggest possible bias combinations that have yet to be explored.
3. Study Of Twitter Sentiment Analysis Using Machine Learning Algorithm Python [3]
Remarks: This research topic has evolved during the last decade, with models reaching an efficiency of almost 85-90%.

Methodology/Algorithm Used

Naive Bayes Algorithm - The algorithm is named after the famous statistician Thomas Bayes, who proposed Bayes' theorem. The naive Bayes classifier applies this theorem under the assumption that all the attributes are conditionally independent of each other given the class. In this algorithm, the conditional probability of each attribute with respect to a certain class label is calculated. A new document is assigned to the class for which the product of the class prior and these conditional probabilities is largest; in practice, the sum of their logarithms is used to avoid numerical underflow. The classification framework is briefly discussed as follows. Bayes' rule describes the probability of an event based on prior knowledge of the occurrence of another event related to it. The probability of occurrence of event A given that event B has already occurred is

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$

and, likewise, the probability of event B given that event A has already occurred is

$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$

Using both these equations, we can rewrite them collectively as

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$

Result

The application is a web-based application used to detect whether or not an application is fake, by retrieving the tweets posted by users. The application generates a result, which is the outcome of the meaningful tweets, and it is categorized into:
1. POSITIVE
2. WEAKLY POSITIVE
3. STRONGLY POSITIVE
4. NEUTRAL
5. NEGATIVE
6. WEAKLY NEGATIVE
7. STRONGLY NEGATIVE

For example, according to the percentage of tweets in each category, both the user and the admin can determine the authenticity of the application.

The positive and negative keywords are first stored in a lexicon array; then meaningful tweets are retrieved using NLP, they are classified using the naive Bayes theorem, and finally the decision is made by a decision tree. The application will help users to determine the authenticity of an application, and the result reflects genuine opinion because the users themselves have posted the tweets.

Sentiment analysis is undoubtedly one of the fastest-growing research areas. It has been used by thousands of people to build algorithms, apps, etc.

The application is based on an array which holds both positive and negative keywords. Naive Bayes is used for the classification of tweets, and the polarity of the tweets is then obtained with the help of a decision tree, a tree-structured representation of decision nodes and leaves which gives a clear picture of the final analysis of the application done by users.

Twitter sentiment analysis comes under the category of text and opinion mining. It focuses on analyzing the sentiments of tweets, feeding the data to a machine learning model to train it, and then checking its accuracy so that the model can be used in the future according to the results. It comprises steps such as data collection, text preprocessing, sentiment detection, sentiment classification, training, and testing the model. This research topic has evolved during the last decade, with models reaching an efficiency of almost 85-90%. However, it still lacks diversity in the data. Along with this, it has many application issues with slang and the short forms of words. Many analyzers do not perform well when the number of classes is increased. Also, it has not yet been tested how accurate the model will be for topics other than the one under consideration. Hence, sentiment analysis has a very bright scope for development in the future.
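The keyword-lexicon and naive Bayes steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the keyword sets, the toy training tweets, and the helper names (`tokenize`, `is_meaningful`, `NaiveBayes`, `classify_stream`, `percentages`) are all hypothetical; only three classes (positive, negative, neutral) are used instead of the seven categories listed; tweets containing no lexicon keyword are assumed to be neutral; and the decision-tree step is not reproduced. The classifier is a multinomial naive Bayes with add-one smoothing that picks the class maximizing $\log P(c) + \sum_i \log P(w_i \mid c)$, which is Bayes' rule in log space with the denominator dropped because it is the same for every class.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical keyword lexicon ("lexicon array"); the paper's actual word lists are not given.
POSITIVE_KEYWORDS = {"good", "great", "love", "genuine", "works", "excellent"}
NEGATIVE_KEYWORDS = {"bad", "fake", "scam", "hate", "broken", "worst"}
LEXICON = POSITIVE_KEYWORDS | NEGATIVE_KEYWORDS


def tokenize(text):
    """Lowercase a tweet, drop URLs and @mentions, and split it into word tokens."""
    text = re.sub(r"https?://\S+|@\w+", " ", text.lower())
    return re.findall(r"[a-z']+", text)


def is_meaningful(tokens):
    """Keyword search: a tweet is 'meaningful' if it contains any lexicon keyword."""
    return any(t in LEXICON for t in tokens)


class NaiveBayes:
    """Multinomial naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.n_docs = len(labels)
        return self

    def log_score(self, tokens, c):
        """log P(c) + sum of log P(w | c): Bayes' rule in log space, without P(tweet)."""
        score = math.log(self.class_counts[c] / self.n_docs)
        denom = sum(self.word_counts[c].values()) + len(self.vocab)
        for t in tokens:
            if t in self.vocab:  # words never seen in training are ignored
                score += math.log((self.word_counts[c][t] + 1) / denom)
        return score

    def predict(self, tokens):
        return max(self.class_counts, key=lambda c: self.log_score(tokens, c))


def classify_stream(model, tweets):
    """Label each tweet; tweets with no lexicon keyword are assumed to be neutral."""
    labels = []
    for tweet in tweets:
        tokens = tokenize(tweet)
        labels.append(model.predict(tokens) if is_meaningful(tokens) else "neutral")
    return labels


def percentages(labels):
    """Share of tweets per class, rounded to one decimal place."""
    counts = Counter(labels)
    return {label: round(100.0 * n / len(labels), 1) for label, n in counts.items()}


# Toy, hand-labelled training tweets (hypothetical data for illustration only).
train = [
    ("this app is great and works well", "positive"),
    ("love it, genuine and excellent", "positive"),
    ("total scam, this app is fake", "negative"),
    ("worst app ever, broken and bad", "negative"),
]
model = NaiveBayes().fit([tokenize(t) for t, _ in train], [label for _, label in train])

stream = ["Great app, works fine", "Looks fake to me, a scam", "Downloaded it yesterday"]
labels = classify_stream(model, stream)
print(labels)               # ['positive', 'negative', 'neutral']
print(percentages(labels))  # {'positive': 33.3, 'negative': 33.3, 'neutral': 33.3}
```

Working in log space replaces the product of many small word probabilities with a sum, which avoids floating-point underflow on longer texts. Mapping scores onto the weak/strong levels listed in the Result section would need additional thresholds that the paper does not specify.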
