cex-stirling
Introduction
Twitter is a free microblogging website on which users create a personal profile and write
short posts, called tweets, comprising their opinions on any topic. Twitter was the
brainchild of Jack Dorsey in 2006; his original idea was to create Twitter as an SMS
platform. Twitter can be used on many platforms and devices and is accessible to its
users at almost all times. Tweets are limited to 140 characters, are written by the user,
and are permanent, searchable and public. Anyone can search tweets on Twitter,
whether they are a member or not. Consumers can use sentiment analysis of tweets to
research products they are considering buying before deciding to purchase
them. Gathering all this information together can be very useful from a marketing
point of view, as it can be used to see public opinion of a product and of the company
behind it. It can also be used to establish any
refined feelings. The authors of this report decided to analyse the sentiments in tweets to
improve marketing, so in the experiment tweets are broken into three different groups:
neutral, positive and negative. For the purpose of the research the neutral tweets were not
included. Twitter messages have many different attributes, which are broken into the following:
Length: a tweet can be at most 140 characters long, with the average tweet being 14 to 78
characters long.
Data Availability: the number of tweets that are available is in the millions, which
Language Model: The frequency of spelling mistakes and the use of slang words will
Domain: The number of different topics which could be written about was taken into
consideration.
The authors used unigrams, bigrams, and parts of speech as features. Unigrams were the
simplest feature for retrieving information from a tweet. The researchers found
that the machine learning algorithms achieved better results than the keyword baseline. Bigrams
were used by the researchers to capture phrases such as “not good”. The researchers found it
difficult to work with unigrams and bigrams on their own, so they decided to combine them.
They found that this improved the accuracy of the results. The last feature they
used was parts of speech, in the form of POS tags, because the same word
may have different meanings. For example, the word ‘over’ may have a negative meaning,
while it may also be a noun. POS tags were not found to be useful, while
unigrams and bigrams were found to be useful for finding tweets with certain sentiments in them which
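
To make the unigram and bigram features described above more concrete, the following is a minimal Python sketch of how both could be counted together for a single tweet. It is an illustration only, not the researchers' implementation: the function names tokenize, ngrams and extract_features, the simple regular-expression tokenizer, and the example tweet are all assumptions made for this sketch.

```python
import re
from collections import Counter


def tokenize(tweet):
    """Lowercase a tweet and split it into word tokens.

    Assumption: a deliberately simple tokenizer that keeps only letters
    and apostrophes; real tweets would need more careful handling.
    """
    return re.findall(r"[a-z']+", tweet.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a space-joined string."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(tweet):
    """Count unigrams and bigrams together in one bag of features."""
    tokens = tokenize(tweet)
    return Counter(ngrams(tokens, 1) + ngrams(tokens, 2))


# Hypothetical example tweet, used only to show the combined features.
features = extract_features("The battery is not good")
print(sorted(features))
```

In this sketch the unigram "good" on its own looks positive, while the bigram "not good" keeps the negation attached to it, which is the kind of phrase the bigram feature is meant to capture.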
Part of Speech (POS): “A software tool that labels words as one of several categories
to identify the word’s function in a given language. In the English language, words
fall into one of eight or nine parts of speech. POS categories include; noun, verb,