NAME: OLATUNJI FATAI ABIODUN

MATRIC NUMBER: 20001571


PROJECT TOPIC: APPLICATION OF LONG SHORT-TERM MEMORY
FOR SENTIMENT ANALYSIS OF COVID-19 TWEETS

CHAPTER TWO
2.1 Introduction

Sentiment analysis is the automated process of identifying and classifying subjective information in text data. This might be an opinion, a judgment, or a feeling about a particular topic or product feature.

The most common type of sentiment analysis is 'polarity detection', which involves classifying statements as positive, negative or neutral.

Sentiment analysis uses Natural Language Processing (NLP) to make sense of human language, and machine learning to automatically deliver accurate results.
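
As a minimal illustration of polarity detection, the sketch below uses the open-source TextBlob library (the choice of library is an assumption of this sketch; any polarity scorer could be substituted) to score short statements and map each score to a polarity label:

from textblob import TextBlob  # assumed installed: pip install textblob

statements = [
    "I love how quickly the vaccines were developed!",
    "The lockdown rules are confusing and frustrating.",
    "New cases were reported in three cities today.",
]

for text in statements:
    # TextBlob returns a polarity score in [-1.0, 1.0]
    polarity = TextBlob(text).sentiment.polarity
    if polarity > 0:
        label = "positive"
    elif polarity < 0:
        label = "negative"
    else:
        label = "neutral"
    print(f"{label:>8}: {text}")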

Natural Language Processing (NLP) is a field of Artificial Intelligence (AI) that makes human language intelligible to machines. NLP combines the power of linguistics and computer science to study the rules and structure of language, and create intelligent systems (run on machine learning and NLP algorithms) capable of understanding, analyzing, and extracting meaning from text and speech.

Natural Language Processing (NLP) allows machines to break down and interpret human language. It's at the core of tools we use every day – from translation software, chatbots, spam filters, and search engines, to grammar correction software, voice assistants, and social media monitoring tools.

NLP is used to understand the structure and meaning of human language by analyzing different aspects like syntax, semantics, pragmatics, and morphology. Then, computer science transforms this linguistic knowledge into rule-based, machine learning algorithms that can solve specific problems and perform desired tasks.
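
As a small, hedged sketch of this linguistic analysis, the snippet below uses the NLTK library (assumed installed, with its tokenizer and tagger models downloaded) to split a sentence into tokens and label each token's part of speech, one basic form of syntactic analysis:

import nltk

# One-time downloads of the tokenizer and part-of-speech tagger models.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "NLP allows machines to break down and interpret human language."
tokens = nltk.word_tokenize(sentence)  # split the text into word tokens
tagged = nltk.pos_tag(tokens)          # attach a part-of-speech tag to each token
print(tagged)  # e.g. [('NLP', 'NNP'), ('allows', 'VBZ'), ...]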

Machine learning (ML) is a branch of artificial intelligence (AI) that enables computers to self-learn and improve over time without being explicitly programmed. In short, machine learning algorithms are able to detect and learn from patterns in data and make their own predictions.

In traditional programming, someone writes a series of instructions so that a computer can transform input data into a desired output. Instructions are mostly based on an IF-THEN structure: when certain conditions are met, the program executes a specific action.

Machine learning, on the other hand, is an automated process that enables machines to solve problems and take actions based on past observations.
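
The contrast can be made concrete in a few lines of Python. The first function below encodes hand-written IF-THEN rules; the second approach lets scikit-learn's logistic regression learn a decision rule from a tiny, purely hypothetical set of past observations:

# Traditional programming: explicit IF-THEN rules written by a person.
def rule_based_label(text: str) -> str:
    if "good" in text or "love" in text:
        return "positive"
    return "negative"

# Machine learning: the model infers its own rules from labeled examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["good service", "love this product", "awful delay", "terrible support"]
labels = ["positive", "positive", "negative", "negative"]

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(texts, labels)  # learn patterns from past observations
print(rule_based_label("good phone"))             # -> positive
print(model.predict(["what a good experience"]))  # -> ['positive']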

Long Short-Term Memory (LSTM) networks are a specific type of Recurrent Neural Network (RNN) that is very effective in dealing with long sequence data and learning long-term dependencies. Long short-term memory (LSTM) units or blocks are part of a recurrent neural network structure. Recurrent neural networks are made to utilize certain types of artificial memory processes that can help these artificial intelligence programs to more effectively imitate human thought.

The recurrent neural network uses long short-term memory blocks to provide context for the way the program receives inputs and creates outputs. The long short-term memory block is a complex unit with various components such as weighted inputs, activation functions, inputs from previous blocks and eventual outputs.

The unit is called a long short-term memory block because the program is using a structure founded on short-term memory processes to create longer-term memory. These systems are often used, for example, in natural language processing. The recurrent neural network uses the long short-term memory blocks to take a particular word or phoneme, and evaluate it in the context of others in a string, where memory can be useful in sorting and categorizing these types of inputs.

In general, LSTM is a well-established and widely used building block in modern recurrent neural networks.
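
As a hedged sketch of how LSTM blocks are used for tweet polarity in practice, the Keras model below embeds token ids, passes the sequence through an LSTM layer, and ends with a sigmoid output; the vocabulary size, sequence length and unit counts are illustrative assumptions, not values fixed by this project:

import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 10_000  # distinct tokens kept after tokenization (assumed)
MAX_LEN = 50         # tokens per padded tweet (assumed)

model = tf.keras.Sequential([
    layers.Input(shape=(MAX_LEN,)),
    layers.Embedding(VOCAB_SIZE, 64),       # token id -> 64-dimensional vector
    layers.LSTM(64),                        # memory blocks read the sequence in order
    layers.Dense(1, activation="sigmoid"),  # probability the tweet is positive
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()

Training such a model on labeled tweets is then a single model.fit(...) call over padded token sequences.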

Twitter boasts 330 million monthly active users (Ying Lin, 2020), which allows businesses to reach a broad audience and connect with customers without intermediaries. On the downside, there's so much information that it's hard for brands to quickly detect negative social mentions that could harm their business.

2.2 Review of Related Work

2.2.1 Sentiment Analysis and Opinion Mining from Social Media

Due to the huge growth of social media on the web, opinions extracted from these media are used by individuals and organizations for decision making. Each site contains a large amount of opinionated text, which makes it challenging for the user to read and extract information (G. U. Vasanthakumar, et al., 2016). This problem can be overcome by using sentiment analysis techniques. The main objective of sentiment analysis is to mine the sentiments and opinions expressed in user-generated reviews and classify them into different polarities. The output is the data annotated with sentiment labels. Machine learning techniques are widely used for sentiment classification (N. Godbole, M. Srinivasaiah, and S. Skiena, 2016). For a specific domain D, sentiment data consist of pairs Xi and Yi, denoting that data Xi has polarity Yi. If the overall sentiment expressed in Xi is positive, then Yi is +1, else -1. Labelled sentiment data is thus a pair {Xi, Yi} of sentiment text and its corresponding sentiment polarity. If Xi is not assigned any polarity Yi, then it is unlabelled sentiment data. In supervised sentiment classification methods, classifiers are trained using labeled data from a particular domain. Semi-supervised classification methods combine unlabeled data with a few labeled training examples to construct the classifier (S. Li, C.-R. Huang, G. Zhou, and S. Y. M. Lee, 2010).
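
In code, this notation simply pairs each text Xi with a polarity Yi; the toy lists below (the example texts are hypothetical) show labeled and unlabeled sentiment data side by side:

# Labeled sentiment data: pairs {Xi, Yi} with Yi = +1 (positive) or -1 (negative).
labeled_data = [
    ("The staff were helpful and kind", +1),
    ("Worst experience I have ever had", -1),
]

# Unlabeled sentiment data: texts Xi with no polarity Yi assigned yet.
unlabeled_data = [
    "The parcel arrived on Tuesday",
    "They changed the return policy again",
]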

Applications: A variety of information, in the form of news blogs, tweets, etc., is available on social media about different products. Sentiment analysis can summarize that data and give a score that represents the opinion it expresses, which customers can use depending on their needs. There are a number of applications of sentiment analysis and opinion mining, in areas such as finance, politics, business and public actions. In the business domain, sentiment analysis is used to detect customers' interest in a product. In the political domain, sentiment analysis is used to get clarity on a politician's position, and opinion mining is also used to gauge public interest in rules newly introduced by the government.

Motivation: The current trend is to look for opinions and sentiments in the product reviews that are available at large scale on social media. Before making a decision, we tend to look at the sentiment analysis results of the opinions given by different users, which helps any customer form an opinion on that product. As the data are available at large scale, it is a laborious process to look into all the user opinions; hence, sentiment analysis is required. The main objective of sentiment analysis is to classify the sentiment into different categories. Fig 2.1 shows the overall architecture of sentiment analysis.

Document level, sentence level and aspect level are the different levels of sentiment classification. Classifying each document into positive or negative classes is called document-level sentiment classification. While expressing the sentiment of a document, this type of classifier assumes that the document contains the opinion of the user about a single object. Aspect-level sentiment analysis classifies the opinions in a document assuming that opinions are expressed about different aspects in the document.

Sentiment classifiers designed using data from one domain may not work with high accuracy if they are used to classify data from a different domain. One of the main reasons is that the sentiment words of one domain can differ from those of another. Thus, domain adaptation is required to bridge the gaps between domains. The domain used to train the classifier is called the source domain, and the domain to which the trained classifier is applied is called the target domain. The advantage of this method is that little or no labeled data of the target domain is needed, where labeled data is costly and it is infeasible to manually label the reviews for each domain type. This type of classification is called cross-domain sentiment classification. Heterogeneous domain adaptation is required when domains of different dimensions are input to the topic-adaptive sentiment classifier.


Fig 2.1: Architecture of Sentiment Analysis

Sentiment classifiers can be broadly classified into machine learning based and lexicon based. Machine learning approaches use machine learning algorithms, which can work in supervised, semi-supervised or unsupervised learning settings. Supervised learning methods give more accurate results compared to semi-supervised and unsupervised learning methods, but they require labeled data, which is expensive and time-consuming to obtain. A semi-supervised approach is EasyAdapt++ (EA++), which builds on EasyAdapt (EA); whereas EA requires labeled data from both the source and target domains, EA++ also uses unlabeled data from the target domain, which results in superior performance, theoretically and empirically, over EA, and hence it can be efficiently used for preprocessing (J. Jiang and C. Zhai, 2007). The lexicon-based approach utilizes a sentiment lexicon to analyze the sentiments in a review, and can use a dictionary or a corpus to classify the sentiment words. Due to the shortage of labeled data, a single classifier can be designed to classify reviews from different domains. But a classifier designed to classify data from one domain may not work efficiently on another domain. This is due to domain-specific words, which are different for every domain.

Support vector machines and Naive Bayes classifiers are the most important classifiers in the machine learning approach. Support vector machines classify data by finding hyperplanes that separate it into different classes. The Naive Bayes classifier is a probabilistic classifier based on Bayes' theorem and a strong independence assumption between the features.
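
The sketch below places both classifiers side by side using scikit-learn, with LinearSVC as the hyperplane-finding SVM and MultinomialNB as the probabilistic Naive Bayes model; the four training reviews are hypothetical stand-ins for a real corpus:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

reviews = ["excellent phone, superb battery", "camera is fantastic",
           "battery died in a day, awful", "screen cracked, poor build"]
polarity = [+1, +1, -1, -1]

svm = make_pipeline(TfidfVectorizer(), LinearSVC())    # separating hyperplane
nb = make_pipeline(TfidfVectorizer(), MultinomialNB()) # Bayes' theorem + independence

svm.fit(reviews, polarity)
nb.fit(reviews, polarity)
print(svm.predict(["superb camera"]), nb.predict(["awful battery"]))

Because both models here are trained on phone reviews only, their vocabulary is domain-specific, which is exactly why such a classifier may transfer poorly to, say, hotel reviews.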

2.2.2 Sentiment Analysis as a Service: A Social Media-Based Sentiment Analysis Framework

Social media platforms, i.e., social information services such as Facebook, Twitter, etc., have emerged as a source of free public data (A. Dingli, et al., 2015). During any emergency event, a large number of users rapidly generate and share data using social information services. Thus, these users can be interpreted as human sensors, i.e., social sensors (S. Takeshi, et al., 2010). The data generated by social sensors has two beneficial features:

a. It is composed of the subjective information (e.g., sentiments and opinions) of social sensors.

b. It contains the spatio-temporal information of social sensors.

Sentiment analysis facilitates extracting and understanding human dynamics such as behaviors, trends, attitudes and emotions from this subjective information (J. Guerrero, et al., 2015). In addition, the spatio-temporal information in the social sensors' data provides a promising opportunity to gain insights into human activities based on geographical locations (M. Hwang, et al., 2013). Thus, combining both features can help in understanding sentiments and emotions across various geo-locations.

Despite the several benefits of social information services, they come with some serious challenges. Social information services often contain a lot of noise, i.e., irrelevant and unnecessary data. Moreover, there are diverse types of social information services available online. These services provide various features and impose different limitations (e.g., text length) on data sharing. As a result, social information services have diverse data characteristics such as size, quality, etc. Thus, various types of social information services require different mechanisms (e.g., tools and algorithms) for extracting useful information. Although there are several online tools available for sentiment analysis, they focus only on general-purpose search and analysis. Moreover, many online tools are dedicated to a single information service. Thus, end users may need to use multiple tools in an ad-hoc manner. Using various tools is time consuming and provides inconsistent views of the social sensors' data (S. Wan and C. Paris, 2014).

In the paper, 'Sentiment Analysis as a Service' (SAaaS) is proposed as a framework that abstracts sentiments from multiple social information services, analyses them and transforms them into useful information, and delivers the result as a service. The authors classify social information services by using various properties of the social sensors' data. SAaaS uses this classification to dynamically compose services for noise removal, geo-tagging (e.g., location extraction) and sentiment extraction. Finally, the results are presented in various formats, i.e., maps and charts.

SAaaS uses a generic information composition approach to compose the social sensors' data as a service from multiple sources for sentiment analysis. Traditional approaches do not consider the different types and characteristics of social information services for sentiment analysis. On the contrary, SAaaS takes into account different properties such as data size, type, etc., and dynamically composes appropriate services for sentiment analysis. In particular, the authors focus on the domain of disease surveillance. However, the framework is not limited to disease surveillance and can be applied to other domains where sentiment analysis is applicable. The main contributions of this work are as follows:


● A service framework that exploits the spatio-temporal properties of social information services to monitor epidemic outbreaks via sentiment analysis. The framework includes a new service model for composite and component services for sentiment analysis.

● A classification of social information services based on the social sensors' data properties, and a classification-driven service composition mechanism to compose services for sentiment analysis.

● A new service quality model to evaluate the quality of social information services.

2.3 Social Media and Crisis Events

During a crisis, whether natural or man-made, people tend to spend relatively more time on social media than normal. As the crisis unfolds, social media platforms such as Facebook and Twitter become an active source of information (Imran M, et al., 2015), because these platforms break the news faster than official news channels and emergency response agencies (Imran M, et al., 2020). During such events, people usually carry on informal conversations by sharing their safety status, querying about their loved ones' safety status, and reporting ground-level scenarios of the event (Imran M, et al., 2015). This continuous creation of conversations on such public platforms leads to the accumulation of a large amount of socially generated data, ranging from hundreds of thousands to millions of items (Kalyanam J, et al., 2016). With proper planning and implementation, social media data can be analyzed and processed to extract situational information that can be further used to derive actionable intelligence for an effective response to the crisis. The situational information can be extremely beneficial for first responders and decision-makers in developing strategies that provide a more efficient response to the crisis.

In recent times, the most used social media platforms for informal communication have been Facebook, Twitter, Reddit, etc. Amongst these, Twitter, the microblogging platform, has a well-documented Application Programming Interface (API) for accessing the data (tweets) available on its platform. Therefore, it has become a primary source of information for researchers working in the Social Computing domain. Earlier works have shown that tweets related to a specific crisis can provide better insights about the event. In the past, millions of tweets specific to crisis events such as the Nepal Earthquake, India Floods, Pakistan Floods, Palestine Conflict, Flight MH370, etc., have been collected and made available (Imran M, et al., 2016). Such Twitter data have been used in designing machine learning models for classifying unseen tweets into various categories such as community needs, volunteering efforts, loss of lives, and infrastructure damage. The classified tweet corpora can be

a. trimmed or summarized and sent to the relevant department for further analysis,

b. used for sketching alert-level heat maps based on the location information contained within the tweet metadata or the tweet body.

Similarly, Twitter data can also be used for identifying the flow of fake news. If misinformation and unverified rumors are identified before they spread across everyone's news feed, they can be flagged as spam or taken down.

Further, in-depth textual analyses of Twitter data can help

a. discover how positively or negatively a geographical region is expressing itself about a crisis,

b. understand the dissemination processes of information throughout a crisis.


2.4 Novel Coronavirus (COVID-19)

As of July 17, 2020, the number of novel coronavirus (COVID-19) cases across the world had reached more than thirteen million, and the death toll had crossed half a million (Worldometer, 2020). States and countries worldwide are trying their best to contain the spread of the virus by initiating lockdowns, and even curfews in some regions. As people are bound to work from home, social distancing has become the new normal. With the increase in the number of cases, the pandemic's seriousness has made people more active in expressing themselves on social media. Multiple terms specific to the pandemic have been trending on social media for months now. Therefore, Twitter data can prove to be a valuable resource for researchers working in the thematic areas of Social Computing, including but not limited to sentiment analysis, topic modeling, behavioral analysis, fact-checking and analytical visualization.

Large-scale datasets are required to train machine learning models or perform any kind of analysis. The knowledge extracted from small datasets and region-specific datasets cannot be generalized because of limitations in the number of tweets and geographical coverage. Therefore, this paper introduces a large-scale COVID-19-specific English-language tweets dataset, hereinafter termed the COV19 Tweets Dataset. As of July 17, 2020, the dataset has more than 310 million tweets and is available at IEEE DataPort (Lamsal R, 2020). The dataset gets a new release every day. The dataset's geo version, the GeoCOV19Tweets Dataset, is also made available (Lamsal R, 2020). As per the stats reported by the IEEE platform, the datasets (Lamsal R, 2020) have been accessed over 74.5k times, collectively, worldwide.


2.5 Artificial Intelligence

Artificial intelligence enables computers and machines to mimic the perception, learning, problem-solving, and decision-making capabilities of the human mind.

In computer science, the term artificial intelligence (AI) refers to any human-like intelligence exhibited by a computer, robot, or other machine. In popular usage, artificial intelligence refers to the ability of a computer or machine to mimic the capabilities of the human mind—learning from examples and experience, recognizing objects, understanding and responding to language, making decisions, solving problems—and combining these and other capabilities to perform functions a human might perform, such as greeting a hotel guest or driving a car.

After decades of being relegated to science fiction, today AI is part of our everyday lives. The surge in AI development is made possible by the sudden availability of large amounts of data and the corresponding development and wide availability of computer systems that can process all that data faster and more accurately than humans can. AI is completing our words as we type them, providing driving directions when we ask, vacuuming our floors, and recommending what we should buy or binge-watch next. And it's driving applications—such as medical image analysis—that help skilled professionals do important work faster and with greater success.

As common as artificial intelligence is today, understanding AI and AI terminology can be difficult because many of the terms are used interchangeably; and while they are actually interchangeable in some cases, they aren't in other cases. What's the difference between artificial intelligence and machine learning? Between machine learning and deep learning? Between speech recognition and natural language processing? Between weak AI and strong AI? This section will try to help sort through these and other terms and explain the basics of how AI works.

2.5.1 Artificial Intelligence, Machine Learning, and Deep Learning

The easiest way to understand the relationship between artificial intelligence (AI), machine learning, and deep learning is as follows:

Think of artificial intelligence as the entire universe of computing technology that exhibits anything remotely resembling human intelligence. AI systems can include anything from an expert system—a problem-solving application that makes decisions based on complex rules or if/then logic—to something like the equivalent of the fictional Pixar character Wall-E, a computer that develops the intelligence, free will, and emotions of a human being.

Machine learning is a subset of AI applications that learns by itself. It actually reprograms itself, as it digests more data, to perform the specific task it's designed to perform with increasingly greater accuracy.

Deep learning is a subset of machine learning applications that teaches itself to perform a specific task with increasingly greater accuracy, without human intervention.


Fig 2.2: Artificial Intelligence

2.5.2 Machine Learning

Machine learning applications (also called machine learning models) are based on a neural network, which is a network of algorithmic calculations that attempts to mimic the perception and thought process of the human brain. At its most basic, a neural network consists of the following:

a. An input layer, where data enters the network.

b. At least one hidden layer, where machine learning algorithms process the inputs and apply weights, biases, and thresholds to them.

c. An output layer, where the various conclusions, in which the network has varying degrees of confidence, emerge.

Fig 2.3: Basic Neural Network
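
A minimal sketch of this three-part structure in Keras (the layer sizes are illustrative assumptions, not values prescribed anywhere in this chapter):

import tensorflow as tf
from tensorflow.keras import layers

network = tf.keras.Sequential([
    layers.Input(shape=(4,)),               # input layer: data enters the network
    layers.Dense(8, activation="relu"),     # hidden layer: weights, biases, thresholds
    layers.Dense(3, activation="softmax"),  # output layer: a confidence per class
])
network.compile(optimizer="adam", loss="categorical_crossentropy")
network.summary()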

Machine learning models that aren't deep learning models are based on artificial neural networks with just one hidden layer. These models are fed labeled data, i.e., data enhanced with tags that identify its features in a way that helps the model identify and understand the data. They are capable of supervised learning (i.e., learning from labeled data under human supervision), such as periodic adjustment of the algorithms in the model.

2.5.3 Deep Learning

Deep learning models are based on deep neural networks—neural networks with multiple hidden layers, each of which further refines the conclusions of the previous layer. This movement of calculations through the hidden layers to the output layer is called forward propagation. Another process, called backpropagation, identifies errors in calculations, assigns them weights, and pushes them back to previous layers to refine or train the model.

Fig 2.4: Deep Neural Network
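
The hedged sketch below stacks several hidden layers and trains on random placeholder data; calling fit() runs forward propagation to compute outputs and backpropagation to push errors back through the layers. All shapes and sizes here are illustrative assumptions:

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

x = np.random.rand(256, 20).astype("float32")  # 256 fake samples, 20 features each
y = np.random.randint(0, 2, size=(256, 1))     # fake binary labels

deep_net = tf.keras.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(64, activation="relu"),    # hidden layer 1
    layers.Dense(32, activation="relu"),    # hidden layer 2: refines layer 1
    layers.Dense(16, activation="relu"),    # hidden layer 3: refines layer 2
    layers.Dense(1, activation="sigmoid"),  # output layer
])
deep_net.compile(optimizer="adam", loss="binary_crossentropy")
deep_net.fit(x, y, epochs=2, batch_size=32, verbose=0)  # forward pass + backprop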

While some deep learning models work with labeled data, many can work with unlabeled data—and lots of it. Deep learning models are also capable of unsupervised learning—detecting features and patterns in data with the barest minimum of human supervision.

A simple illustration of the difference between deep learning and other machine learning is the difference between Apple's Siri or Amazon's Alexa (which recognize your voice commands without training) and the voice-to-type applications of a decade ago, which required users to "train" the program (and label the data) by speaking scores of words to the system before use. But deep learning models power far more sophisticated applications, including image recognition systems that can identify everyday objects more quickly and accurately than humans.
