A18 CU6051NA A2 CW Coursework 16034872 Anjil Shrestha PDF

Module Code & Module Title
CU6051NA - Artificial Intelligence
Assessment Weightage & Type

80% Individual Coursework
Year and Semester

2018-19 Autumn
Student Name: Anjil Shrestha
London Met ID: 16034872
College ID: sity1c117018
Assignment Due Date: 11th February, 2019
Assignment Submission Date: 11th February, 2019
I confirm that I understand my coursework needs to be submitted online via Google Classroom under the
relevant module page before the deadline in order for my assignment to be accepted and marked. I am fully
aware that late submissions will be treated as non-submission and a mark of zero will be awarded.
Table of contents
1. Introduction
1.1. AI, ML, NLP & Sentiment Analysis
1.2. Problem Domain
2. Background
2.1. Sentiment analysis and its approaches
2.1.1. Approaches
2.2. Research works done on Sentiment Analysis
2.3. Current applications of Sentiment analysis
3. Solution
3.1. Approach to solving the problem
3.2. Explanation of the AI algorithm
3.3. Pseudocode
3.4. Flowchart
3.5. Development
3.6. Achieved result
3.6.1. Home page
3.6.2. Training progress
3.6.3. Sentiment prediction page
3.6.4. Test for positive sentiment and result
3.6.5. Test for negative sentiment and result
3.6.6. Visualization
4. Conclusion
4.1. Analysis of the work done
4.2. Solution addressing the real-world problems
4.3. Further work
5. References
Table of figures
Figure 1 Machine learning types (Morgan, 2018)
Figure 2 Relation between AI, NLP, ML and Sentiment Analysis
Figure 3 Sentiment Analysis Overview
Figure 4 Different approaches on Sentiment Analysis
Figure 5 Bayes Theorem
Figure 6 Flowchart of algorithm
Figure 7 Home Page for training the model
Figure 8 Training progress page
Figure 9 Sentiment prediction page
Figure 10 Positive sentiment test
Figure 11 Positive sentiment result
Figure 12 Negative sentiment test
Figure 13 Negative sentiment result
Figure 14 Visualization tab
Figure 15 Total reviews on particular course
Figure 16 Total positive, negative and neutral reviews
Table of tables
Table 1 Labeled training data
Table 2 Bag of words
Table 3 All libraries and tools used
Table 4 Libraries used for data pre-processing
Table 5 Library used for splitting training and testing set
Table of Abbreviation
I. AI – Artificial Intelligence
II. ML – Machine Learning
III. NLP - Natural Language Processing
IV. NLTK – Natural language tool kit
V. SVM – Support Vector Machine
VI. CNN - Convolutional Neural Network
VII. RNN – Recurrent Neural Network
VIII. RNTN – Recursive Neural Tensor Net
CU6051NA Artificial Intelligence
1. Introduction
1.1. AI, ML, NLP & Sentiment Analysis
Artificial Intelligence is the ability of a machine or software to perceive its environment
and take actions that resemble human behavior and have a high chance of success. AI is not
a system in itself but is implemented in a system that has the ability to learn and solve
problems. (Sharma, 2018)
AI is a broad field of study that is incorporated in a variety of technologies, and machine
learning is one of them. Machine learning is the subfield of Artificial Intelligence that allows
software applications to automatically learn and improve from experience without being explicitly
programmed. Machine learning focuses on constructing algorithms that receive input data and
use statistical analysis to predict an output, updating that output as new data becomes
available. (Rouse, 2018) (Reese, 2017)
The primary goal of machine learning is to allow computers to learn automatically, without
human intervention or explicit programming. The process involves searching data for patterns
and adjusting program actions accordingly. Machine learning comprises supervised learning,
unsupervised learning and reinforcement learning, each of which has a different process for
training on data and fitting the model.
Figure 1 Machine learning types (Morgan, 2018)
Another technology that AI has incorporated is Natural Language Processing (NLP). NLP is
a fundamental element of AI for communicating with an intelligent system using natural
language. Some well-known applications of NLP are speech recognition, text translation and
sentiment analysis. In essence, NLP is about building a system that can understand human
language. For a machine to understand a language, it must first learn how to do so, and this
is where machine learning is used within NLP. (BOUKKOURI, 2018) (Expertsystem, 2018)
Sentiment analysis is one of the applications of NLP and is the process of determining whether
a piece of writing is positive, negative or neutral. It is a text classification task that aims
to estimate the sentiment polarity of a body of text based solely on its content, i.e. the text
is assigned a value that says whether the expressed opinion is positive (polarity=1), negative
(polarity=0), or neutral. For a machine to extract sentiment from pieces of text, it needs to be
trained using a pre-labeled dataset of positive, negative and neutral content. This means that
techniques from both NLP and ML are required for a system to perform sentiment analysis.
Figure 2 Relation between AI, NLP, ML and Sentiment Analysis

1.2. Problem Domain
Due to the growth of the internet, data is now generated on such a scale that going through
each piece of it is humanly impossible. In business, data is very useful for uncovering
problems, and analyzing it helps to plan the next steps for improving the business. One of the
most important parts of a business is taking account of public opinion and customer feedback
on its brands and services. With huge volumes of customer feedback, it becomes hard to
determine whether the services are flourishing or whether customers dislike a service or
product. Public opinion on a particular product is what makes that product improve over time,
and it is very challenging to determine whether opinions are positive or negative when they
arrive in huge numbers. (Stecanella, 2017) (Gupta, 2017)
Coursera is a large online learning platform. It provides thousands of courses and has
thousands of viewers, who are its customers. Viewers leave feedback on their learning
experience, and this feedback also runs into the thousands. Determining by hand whether each
piece of feedback is positive, negative or neutral is humanly impossible at that volume.
Feedback is very important because it allows the performance of a particular course to be
tracked and informs further business decisions. Sentiment analysis can be used to identify and
extract subjective information, which helps the business to understand the social sentiment
around its courses.

2. Background
2.1. Sentiment analysis and its approaches
Sentiment analysis is not a straightforward procedure; many factors determine the sentiment
of a speech or a text. Text information can be categorized into two main types: facts and
opinions. Opinions are of two types: direct and comparative. Direct opinions give an opinion
about an entity directly, for example “This course is helpful”. In comparative opinions the
opinion is expressed by comparing one entity with another, for example “The teaching method
of course A is better than that of course B”. Opinions collected in raw form can be given
structure with the help of sentiment analysis systems. (Stecanella, 2017)
There are various types of sentiment analysis. Some systems focus on polarity (positive,
negative, neutral), while others detect feelings and emotions or identify intentions. The
polarity of a text is associated with particular feelings such as anger, sadness or worry
(i.e. negative feelings) or happiness, love or enthusiasm (i.e. positive feelings). Lexicons
and machine learning algorithms are used to detect feelings and emotions in texts. Relying on
lexicons alone is tricky, because the way people express their emotions varies a lot, and so do
the lexical items they use.
Figure 3 Sentiment Analysis Overview

2.1.1. Approaches
Many methods and algorithms have been introduced to extract sentiment from texts.
Computational linguistics is a vast field, and research is still ongoing to improve the
accuracy that these methods provide. Sentiment analysis systems are classified as follows:
Figure 4 Different approaches on Sentiment Analysis
2.1.1.1. Rule-based systems
In this approach, a set of rules is defined, in some kind of scripting language, that
identifies subjectivity, polarity, or the subject of an opinion. The inputs that may be used
in this approach include classic NLP techniques like tokenization, part-of-speech tagging,
stemming and parsing, and other resources such as lexicons. (Stecanella, 2017)
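A minimal sketch of a rule-based system is shown below. It assumes a tiny hand-made lexicon and a single negation rule; the word lists and the function name rule_based_sentiment are illustrative assumptions and are not taken from the project.

```python
import re

# Hypothetical mini-lexicon; a real system would use a much larger resource.
POSITIVE_WORDS = {"helpful", "useful", "good", "great", "love"}
NEGATIVE_WORDS = {"boring", "useless", "bad", "waste", "poor"}
NEGATIONS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def rule_based_sentiment(text):
    """Score a text with two rules: lexicon lookup and negation flipping."""
    score = 0
    negate = False
    for token in tokenize(text):
        if token in NEGATIONS:
            negate = True  # flip the polarity of the next lexicon word
            continue
        polarity = (token in POSITIVE_WORDS) - (token in NEGATIVE_WORDS)
        if polarity:
            score += -polarity if negate else polarity
            negate = False
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(rule_based_sentiment("This course is helpful"))       # positive
print(rule_based_sentiment("Not useful, a waste of time"))  # negative
print(rule_based_sentiment("The course starts in May"))     # neutral
```

Every rule has to be written and maintained by hand, which is the main cost of this approach compared with learning from labeled data.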
2.1.1.2. Automatic systems
This approach relies on machine learning techniques to learn from data. The task is modeled
as a classification problem in which a classifier is fed a text and returns the corresponding
sentiment, e.g. positive, negative or neutral. The classifier is implemented in two steps. In
the first step, a model is trained with training samples to associate a particular input with
the corresponding output: pairs of feature vectors and tags (e.g. positive, negative, or
neutral) are fed into the machine learning algorithm to generate a model. The second step is
the prediction process, in which unseen text inputs are transformed into feature vectors by
the feature extractor and fed into the model to generate the predicted tags. The classification
algorithms most widely used in supervised learning are Naïve Bayes, Logistic Regression,
Support Vector Machines and Neural Networks. (Walaa Medhat, 2014)
2.1.1.3. Hybrid systems
This approach combines the best of both rule-based and automatic systems. Combining the two
approaches can improve the accuracy and precision of the result.
2.2. Research works done on Sentiment Analysis
Much research has been carried out on sentiment analysis. In one survey, Pang and Lee describe
the existing techniques and approaches for opinion-oriented information retrieval. Their
survey includes material on the summarization of evaluative text and on broader issues
regarding privacy, manipulation, and economic impact that the development of opinion-oriented
information-access services gives rise to. (Bo Pang, 2008)
In another study, the authors used web blogs to construct corpora for sentiment analysis,
using the emoticons assigned to blog posts as indicators of users’ mood. SVM and CRF learners
were used to classify sentiment at the sentence level. Additionally, several strategies were
investigated to determine the overall sentiment of a document. The study concluded that the
winning strategy is to take the sentiment of the last sentence of the document as the
sentiment of the whole document. (Changhua Yang, 2007)
Alec Go and team performed sentiment classification using Twitter to collect training data.
Various classifiers were trained on a corpus constructed from positive and negative samples
identified by emoticons. Among the classifiers used, the Naïve Bayes classifier obtained the
best result, with accuracy of up to 81% on their test set, but the method performed poorly
when used with three classes (“negative”, “positive” and “neutral”). (Alec Go, 2009)

Alexander Pak and Patrick Paroubek used Twitter as a corpus for sentiment analysis and
opinion mining. Their paper covers procedures for the automatic collection of a corpus and
approaches to performing linguistic analysis of the collected corpus. They further used the
corpus to build a sentiment classifier that is able to determine the polarity (positive,
negative and neutral) of a document. (Alexander Pak, 2008)
2.3. Current applications of Sentiment analysis
Sentiment analysis has become a key tool for making sense of data at a time when 2.5
quintillion bytes of data are generated every day. It has helped companies to gain key
insights and automate all kinds of processes and analytics for improving their business.
Sentiment analysis is used for various purposes. For a company that manufactures different
types of products, sentiment analysis helps to track the performance of a product in the
market by collecting sentiment from customer feedback and reviews.
Sentiment analysis is applied in various areas. Some common ones are:
• Brand Monitoring
• Customer Support
• Customer Feedback
• Product Analytics
• Market Research and Analysis
• Workforce Analytics & Voice of the Employee
• Spam filtering

3. Solution
3.1. Approach to solving the problem
Taking account of the above research and explanations, it is clear that sentiment analysis
can be used in various areas such as:
• Brand Monitoring
• Customer Support
• Customer Feedback
• Product Analytics, etc.
The ideal way to address the above areas is to use machine learning techniques and
algorithms, incorporating some NLP techniques in data preprocessing. Supervised learning is
the preferred approach for the task of predicting sentiment. Kaggle holds many datasets for
sentiment analysis, and for this particular task the labeled dataset of Coursera course
reviews is used as the training dataset. There are many algorithms available to fit the model
with. Neural network models include RNN, CNN, RNTN etc., and non-neural models include Naïve
Bayes, SVM, FastText and Deepforest. For the given task, Naïve Bayes is the algorithm used for
predicting sentiment. It was chosen as the classifier for the following reasons: (Gupta, 2018)
(Shailendra Singh Kathait, 2017)
- Highly practical method
- Frequently used for working with natural language text documents
- Naïve because of the strong independence assumption it makes
- Probabilistic model
- Fast, accurate and reliable

3.2. Explanation of the AI algorithm
Naïve Bayes is a probabilistic algorithm that uses probability theory and Bayes’ theorem to
predict the sentiment of a text. The probability of each tag for a given text is calculated,
and the output is the tag with the highest probability. In probability theory, Bayes’ rule
describes the probability of an event based on prior knowledge of conditions that might be
related to that event. (Stecanella, 2017)
Figure 5 Bayes Theorem
P(A|B) – posterior
P(A) – prior
P(B) – evidence
P(B|A) – likelihood
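For reference, the standard statement of Bayes’ theorem in terms of the four quantities labeled above is:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}
```

In sentiment classification, A is a tag (positive or negative) and B is the text. Because the evidence P(B) is the same for every tag, it is enough to compare P(B | A) * P(A) across tags.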
The first step in the Naïve Bayes algorithm is creating a frequency table of word
frequencies. Every document is treated as a set of the words it contains, ignoring word order
and sentence construction. The training text can be represented using the bag-of-words
approach, in which each sentence is split into words and the number of times each word occurs
in that sentence is counted. For example:
Training data                    Label
Helpful course and materials.    +
Boring.                          -
Don’t waste time in this.        -
Useful materials and content.    +
Helped a lot. Thanks             +
Table 1 Labeled training data
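The bag-of-words step can be sketched in plain Python for the five reviews in Table 1. The helper names tokenize and to_vector are illustrative assumptions; in the project itself this job is done by a library vectorizer.

```python
import re
from collections import Counter

# The five labeled reviews from Table 1.
REVIEWS = [
    ("Helpful course and materials.", "+"),
    ("Boring.", "-"),
    ("Don't waste time in this.", "-"),
    ("Useful materials and content.", "+"),
    ("Helped a lot. Thanks", "+"),
]


def tokenize(text):
    """Lowercase, drop apostrophes, and keep alphabetic tokens only."""
    cleaned = text.lower().replace("'", "").replace("’", "")
    return re.findall(r"[a-z]+", cleaned)


# Vocabulary of unique words, in order of first appearance.
vocabulary = []
for text, _ in REVIEWS:
    for token in tokenize(text):
        if token not in vocabulary:
            vocabulary.append(token)


def to_vector(text):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]


print(len(vocabulary), vocabulary)
for text, label in REVIEWS:
    print(to_vector(text), label)
```

Each review becomes a fixed-length vector of counts, one position per vocabulary word, which is the form a classifier can be trained on.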

Vocabulary of unique words ignoring case and punctuation:
(helpful, course, and, materials, boring, don’t, waste, time, in, this, useful, content, helped, lot,
thanks, a)
Review                         helpful course and materials boring don’t waste time in this useful content helped lot thanks a Label
Helpful course and materials.  1       1      1   1         0      0     0     0    0  0    0      0       0      0   0      0 +
Boring.                        0       0      0   0         1      0     0     0    0  0    0      0       0      0   0      0 -
Don’t waste time in this.      0       0      0   0         0      1     1     1    1  1    0      0       0      0   0      0 -
Useful materials and content.  0       0      1   1         0      0     0     0    0  0    1      1       0      0   0      0 +
Helped a lot. Thanks           0       0      0   0         0      0     0     0    0  0    0      0       1      1   1      1 +
Table 2 Bag of words
To predict whether a review is positive or negative, Bayes’ theorem can be used.
Let’s take a review: “I dont like it.”
The two quantities P(“I dont like it” | +) * P(+) and P(“I dont like it” | -) * P(-) are
compared to decide whether the given review is positive or negative.
As we are using the Naïve Bayes algorithm, we assume every word in a sentence is independent
of the others, so we no longer look at entire sentences but at individual words.
So, for P(“I dont like it” | +) * P(+) we write P(+) * P(I | +) * P(don’t | +) * P(like | +)
* P(it | +), and for P(“I dont like it” | -) * P(-) we write P(-) * P(I | -) * P(don’t | -)
* P(like | -) * P(it | -).
Each word probability is estimated with add-one (Laplace) smoothing, so that a word that never
occurs in a class does not make the whole product zero:
P(word | class) = (count of word in class + 1) / (total words in class + vocabulary size)
The positive reviews contain 12 words in total, the negative reviews contain 6, and the
vocabulary has 16 words.
For positive:
P(+) = 3/5 = 0.6
P(I | +) = (0+1)/(12+16) = 0.0357
P(don’t | +) = (0+1)/(12+16) = 0.0357
P(like | +) = (0+1)/(12+16) = 0.0357
P(it | +) = (0+1)/(12+16) = 0.0357
y+ = P(+) * P(I | +) * P(don’t | +) * P(like | +) * P(it | +) = 0.6 * (1/28)^4 ≈ 9.76 × 10^-7
For negative:
P(-) = 2/5 = 0.4
P(I | -) = (0+1)/(6+16) = 0.0455
P(don’t | -) = (1+1)/(6+16) = 0.0909
P(like | -) = (0+1)/(6+16) = 0.0455
P(it | -) = (0+1)/(6+16) = 0.0455
y- = P(-) * P(I | -) * P(don’t | -) * P(like | -) * P(it | -) = 0.4 * (1/22)^3 * (2/22) ≈ 3.42 × 10^-6
As the value of y- is greater than y+, the review is classified as negative. This is how Bayes’
theorem is used in the Naïve Bayes classifier.
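The worked example can be checked with a short from-scratch sketch. It trains on the five reviews of Table 1 and scores “I dont like it” with add-one (Laplace) smoothing, which is what the +1 in each numerator and the vocabulary size in each denominator amount to. The function names train and score are illustrative assumptions, not the project's code.

```python
import re
from collections import Counter

TRAINING = [
    ("Helpful course and materials.", "+"),
    ("Boring.", "-"),
    ("Don't waste time in this.", "-"),
    ("Useful materials and content.", "+"),
    ("Helped a lot. Thanks", "+"),
]


def tokenize(text):
    """Lowercase, drop apostrophes, and keep alphabetic tokens only."""
    cleaned = text.lower().replace("'", "").replace("’", "")
    return re.findall(r"[a-z]+", cleaned)


def train(samples):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for text, label in samples:
        class_counts[label] += 1
        tokens = tokenize(text)
        word_counts.setdefault(label, Counter()).update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def score(text, label, class_counts, word_counts, vocabulary):
    """Return P(label) times the product of smoothed P(word | label)."""
    total_docs = sum(class_counts.values())
    total_words = sum(word_counts[label].values())
    result = class_counts[label] / total_docs
    for token in tokenize(text):
        count = word_counts[label][token]
        result *= (count + 1) / (total_words + len(vocabulary))
    return result


class_counts, word_counts, vocabulary = train(TRAINING)
review = "I dont like it"
scores = {
    label: score(review, label, class_counts, word_counts, vocabulary)
    for label in class_counts
}
prediction = max(scores, key=scores.get)
print(scores)
print("Predicted label:", prediction)
```

The negative score comes out larger, so the review is labeled negative. With longer texts the product of many small probabilities can underflow, so practical implementations usually add log-probabilities instead of multiplying.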
To increase the performance of this classifier, some NLP techniques are applied. They are
listed below:
- Removing stopwords
- Tokenization.
- Ignoring case and punctuation
- Strip white space.
- Remove numbers and other characters
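The steps listed above can be sketched as a single cleaning function. The stopword set here is a short hypothetical stand-in; NLTK and Scikit-learn ship much fuller English lists.

```python
import re

# Hypothetical short stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "the", "is", "in", "this", "it", "of", "to"}


def preprocess(text):
    """Apply the listed cleaning steps and return the remaining tokens."""
    text = text.lower()                    # ignore case
    text = re.sub(r"[^a-z\s]", " ", text)  # drop punctuation, numbers, other characters
    tokens = text.split()                  # tokenize and strip extra white space
    return [t for t in tokens if t not in STOPWORDS]  # remove stopwords


print(preprocess("  This course is REALLY helpful, 10/10!!  "))
# ['course', 'really', 'helpful']
```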

The Naïve Bayes classifier can be implemented effectively in Python, which provides many
libraries for data pre-processing, NLP and machine learning. The libraries used are listed
below:
- Pandas
- NumPy
- Scikit-learn
- NLTK
- Regex
A predictive model will be built using these Python libraries, and the end product will be a
web app built using the Flask framework.

3.3. Pseudocode
Import necessary libraries (pandas, sklearn, nltk tools)
Collect labeled training datasets
Read dataset and separate sentiment text and its sentiment label.
    dataframe = pandas.read_csv("training data")
    X = dataframe.sentimentText
    y = dataframe.sentimentLabel
Split X and y into training and testing set
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
Perform data pre-processing using CountVectorizer.
    Remove stopwords.
    Tokenization.
    Ignoring case and punctuation
    Strip white space.
    Remove numbers and other characters
    vectorizer = CountVectorizer(stop_words="english")
    X_train = vectorizer.fit_transform(X_train)
    X_test = vectorizer.transform(X_test)
Train the model on training set
    model = naive_bayes.MultinomialNB()
    model.fit(X_train, y_train)
Make the prediction on testing set
    my_test_data = ['This is really good', 'This was bad']
    my_vectorizer = vectorizer.transform(my_test_data)
    model.predict(my_vectorizer)
Compare actual response value with the predicted response value.
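The pseudocode above can be turned into a small runnable sketch with Scikit-learn. The ten reviews and their labels below are an invented toy stand-in for the Kaggle dataset (1 = positive, 0 = negative), so the accuracy printed says nothing about the real model.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy stand-in for the labeled review dataset (assumption, not project data).
reviews = [
    "Helpful course and materials", "Boring", "Do not waste time in this",
    "Useful materials and content", "Helped a lot, thanks", "Really good course",
    "Bad and useless lectures", "Great content, loved it", "Terrible, very boring",
    "Excellent and useful",
]
labels = [1, 0, 0, 1, 1, 1, 0, 1, 0, 1]

# Hold out 20% of the reviews for testing.
X_train, X_test, y_train, y_test = train_test_split(
    reviews, labels, test_size=0.2, random_state=1
)

# Lowercasing and tokenization are CountVectorizer defaults.
vectorizer = CountVectorizer(stop_words="english")
X_train_counts = vectorizer.fit_transform(X_train)  # learn vocabulary on training data only
X_test_counts = vectorizer.transform(X_test)        # reuse the same vocabulary

model = MultinomialNB()
model.fit(X_train_counts, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test_counts))
print("Accuracy on the held-out reviews:", accuracy)

new_reviews = ["This is really good", "This was bad"]
predictions = model.predict(vectorizer.transform(new_reviews))
print(predictions)
```

The vectorizer is fitted on the training split only and then reused for the test split and for new text, so that every input is mapped onto the same vocabulary the model was trained with.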

3.4. Flowchart
Figure 6 Flowchart of algorithm

3.5. Development
As supervised learning is the selected approach, the development process starts by collecting
a labeled dataset for sentiment prediction. Below are all the tools and libraries used for
developing this project:
Tools/libraries       Description
1.) Pandas            A Python package for data manipulation and analysis.
2.) re (Regex)        The Python module for regular expressions, or simply regex. A regex is a
                      sequence of characters used for finding strings or sets of strings with a
                      specialized syntax.
3.) NLTK              A Python platform for working with natural language. It contains many
                      libraries for classification, tokenization, stemming, etc.
4.) Scikit-learn      A Python library with many supervised and unsupervised algorithms. The
                      Naïve Bayes algorithm from this library is used for training the model,
                      and its feature extraction method “CountVectorizer” is used in this
                      project for extracting features.
5.) Flask framework   A microframework for Python, used to develop the web app for training the
                      model and predicting sentiment.
6.) Bootstrap         A free and open-source web framework containing HTML- and CSS-based design
                      templates, used to create the web app for this project.
7.) Highcharts.js     A charting library written in pure JavaScript, used to display the total
                      reviews of courses.
Table 3 All libraries and tools used
For better understanding, the development process of training the model and predicting
sentiment is divided into the following steps:

1. Dataset Collection
A dataset containing user ratings and reviews of particular courses was obtained from
Kaggle.com, a large online community of data scientists and machine learners. Many datasets
are published there, and the dataset used in this project is one of them.
2. Data pre-processing
In this process the dataset, in CSV form, is imported and a dataframe is created. The data
in the dataframe contains many unwanted and unsupported characters, such as reviews written
in other languages. These are removed using regex. Rows with an empty review are also
removed.
Tools and libraries used in this process:
Tools/libraries   Purpose
1.) Pandas        Used to import the CSV file and create a multidimensional data structure.
2.) re (Regex)    Used to find reviews in the dataset that do not match the provided pattern
                  and remove them from the dataframe, i.e. to filter unwanted characters.
Table 4 Libraries used for data pre-processing
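The filtering in this step can be sketched with the re module alone. The pattern below is an assumption about what counts as a supported review (ASCII letters, digits, white space and common punctuation), and the rows are invented; the project's actual pattern is not given in this report.

```python
import re

# Assumed pattern: letters, digits, white space, and common punctuation only.
SUPPORTED = re.compile(r"[A-Za-z0-9\s.,!?()\-:;']+")


def filter_reviews(rows):
    """Drop rows whose review is empty or contains unsupported characters."""
    kept = []
    for review, rating in rows:
        review = (review or "").strip()
        if review and SUPPORTED.fullmatch(review):
            kept.append((review, rating))
    return kept


# Hypothetical (review, rating) rows for illustration.
rows = [
    ("Great course!", 5),
    ("", 4),                       # empty review
    ("Muy buen curso, señor", 5),  # contains a non-ASCII character
    (None, 3),                     # missing review
    ("Too slow; not useful.", 1),
]
print(filter_reviews(rows))
# [('Great course!', 5), ('Too slow; not useful.', 1)]
```

With a dataframe, the same test could be applied column-wise, for example through the pandas string method Series.str.match.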
3. Separate reviews and corresponding labels

After data pre-processing and filtering are complete, the reviews and their corresponding
labels are extracted and prepared for the next process by creating two variables, one from
the filtered review column of the prepared dataframe and another from the label column.

4. Split the prepared data into training and testing set

In this process the training dataset and testing dataset are separated with a test size of 0.2.
The tools and libraries used in this process are tabulated below:
Tools/libraries                          Purpose
1.) train_test_split from Scikit-learn   Used for separating the training data and testing data.
Table 5 Library used for splitting training and testing set
5. Fit transform the review data into CountVectorizer
After the training and testing data are separated, the next task is to prepare the training
data for predictive modeling. The CountVectorizer class of the Scikit-learn library is used
for feature extraction from the text data. Stopwords are removed and all text is converted to
lower case in this process.
6. Fit the training data set into MultinomialNB model.
After feature extraction is completed, the next step is to fit the predictive model. A Naïve
Bayes classifier is used as the predictive model, specifically the multinomial variant
(MultinomialNB).
7. Test data to calculate prediction

After the model has been trained, the test data is used to calculate the accuracy score.
Different texts are also tried out to observe the resulting sentiment predictions.
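The accuracy score is the fraction of test reviews whose predicted label matches the actual label. A minimal sketch, using invented labels rather than the project's results:

```python
def accuracy(actual, predicted):
    """Fraction of test items whose predicted label equals the actual label."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy of an empty test set")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical labels for five test reviews (1 = positive, 0 = negative).
y_test = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 0]
print(accuracy(y_test, y_pred))  # 0.8
```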

The above is the complete explanation of the development steps and the tools/libraries used.
It explains the process of creating a sentiment prediction model using a machine learning
algorithm. To make this trained model more usable, a web app is developed using the Flask
framework. The front end of the web app is developed with the Bootstrap framework. For
visualization of the total reviews on particular courses, Highcharts.js is used. A bar
diagram is displayed which shows the total number of reviews made on each course.

3.6. Achieved result

Running the test.py Python script starts the system. On startup the home page is displayed.
The workflow of the web app is described below in chronological order with related
screenshots of the program.
3.6.1. Home page
Figure 7 Home Page for training the model
This is the home page displayed when the web app starts. It gives the option to train the
model with either of two datasets. On clicking the train button, data preprocessing and model
training take place in the backend. The sentiment.py script performs all of the data
preprocessing and training.

3.6.2. Training progress
Figure 8 Training progress page
This page is displayed after the training process begins. In the backend, data pre-processing
is carried out and the model is trained using the Naïve Bayes classifier.

3.6.3. Sentiment prediction page
Figure 9 Sentiment prediction page
This page is displayed after the model has been trained. The measured accuracy is displayed
with the label “Accuracy Score”. To predict the sentiment of a text, the text is entered in
the text box and submitted to obtain the result.

3.6.4. Test for positive sentiment and result
Figure 10 Positive sentiment test
Figure 11 Positive sentiment result
A text with positive sentiment is entered to check whether the system predicts correctly.
The tested text is “I love this course”. On clicking the submit button, the result is
displayed below the button. The result is as expected: the system correctly predicted the
sentiment of the text.

3.6.5. Test for negative sentiment and result
Figure 12 Negative sentiment test
Figure 13 Negative sentiment result
A text with negative sentiment is entered to check whether the system predicts correctly.
The tested text is “boaring useless course”. On clicking the submit button, the result is
displayed below the button. The result is as expected: the system correctly predicted the
sentiment of the text.

3.6.6. Visualization
Figure 14 Visualization tab
On clicking the floating navigation button, a horizontal navigation bar opens, which offers
an option to display the visualization page.
Figure 15 Total reviews on particular course
This is the visualization page, which displays a bar diagram showing the total number of
reviews made on particular courses. The bar diagram shows data for 12 randomly chosen courses.

Figure 16 Total positive, negative and neutral reviews
Below the bar diagram, the total numbers of positive, negative and neutral reviews are
displayed. This data is extracted from the dataset.

4. Conclusion
4.1. Analysis of the work done
Due to the increase in computational power and developments in big data, the field of AI is
flourishing; it has brought revolutionary changes to current technologies and has not yet
reached its full extent. This report gives a short explanation of AI, highlighting its impact
on other fields. Making a machine or software smart can be achieved through different machine
learning approaches, and how machine learning techniques achieve this is explained in this
report. Making a machine understand natural language and act accordingly is one of the
ultimate goals of AI, and different machine learning algorithms have made this possible to
some extent. NLP and its different applications are described briefly in this report. For a
business to succeed, it has to monitor many aspects, including customer reviews, customer
feedback and brand perception, and this report highlights how this can be achieved by
implementing machine learning algorithms. Sentiment analysis is introduced in the introduction
of this report, with an analysis of the approaches it takes to tackle different problem
domains.
Sentiment analysis can be approached in different ways, and these approaches are
explained thoroughly in the background section of this report. Some research works conducted
on sentiment analysis are included, and the procedures followed and the results of that
research are highlighted.
From the machine learning classifiers available for text classification, the Naïve Bayes
classifier was selected for sentiment analysis. The reasoning behind this selection is
included in this report. The Naïve Bayes classifier uses Bayes' theorem to predict the
sentiment. How the theorem is used to predict the sentiment of a text is explained for each
step of the algorithm. A worked example in this report demonstrates how the sentiment of a
short review can be predicted using Bayes' theorem. Pseudocode and a flowchart of the
algorithm are included in the report and can be used during actual implementation of the
algorithm.
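To make the Bayes' theorem step concrete, the following is a minimal sketch in Python of a
multinomial Naïve Bayes classifier with add-one (Laplace) smoothing. It is an illustration
under stated assumptions, not the implementation used in this project: the four training
reviews, the two label names and the function names `tokenize`, `train` and `predict` are
hypothetical, and scores are summed in log space to avoid numeric underflow.

```python
import math
import re
from collections import Counter

# Hypothetical toy training data (not the project's dataset).
TRAINING_DATA = [
    ("I love this course", "positive"),
    ("great and useful content", "positive"),
    ("boring useless course", "negative"),
    ("I do not like it", "negative"),
]


def tokenize(text):
    """Lowercase the text and keep alphabetic words (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def train(data):
    """Count documents per label and word occurrences per label."""
    label_counts = Counter()
    word_counts = {}
    vocabulary = set()
    for text, label in data:
        label_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in tokenize(text):
            counts[word] += 1
            vocabulary.add(word)
    return label_counts, word_counts, vocabulary


def predict(text, model):
    """Return the label with the highest log posterior score."""
    label_counts, word_counts, vocabulary = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Start from the log prior, log P(label).
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Laplace smoothing: (count + 1) / (words in class + vocabulary size).
            numerator = word_counts[label][word] + 1
            denominator = total_words + len(vocabulary)
            score += math.log(numerator / denominator)
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TRAINING_DATA)
print(predict("I love this course", model))
print(predict("boring useless course", model))
```

Because of the add-one term, a word that never occurs in a class still receives a small
non-zero probability, so a single unseen word does not force the whole product to zero.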

4.2. Solution addressing the real-world problems
Sentiment analysis has become a key tool for making sense of data in a world where 2.5
quintillion bytes of data are generated every day. It has helped companies gain key insights
and automate all kinds of processes and analytics to improve their business. Sentiment
analysis is used for various purposes. In a company that manufactures different types of
products, sentiment analysis helps track the performance of each product in the market by
collecting sentiments from customer feedback and reviews. (Stecanella, 2017)
Sentiment analysis has empowered all kinds of market research and competitive analysis.
Whether exploring a new market, anticipating future trends or keeping an edge on the
competition, sentiment analysis can make a real difference. It makes this possible by
analyzing the product reviews of a brand and comparing them with those of competitors,
comparing sentiment across international markets, and so on. (Stecanella, 2017)
Sentiment analysis can be used to monitor social media. Tweets and Facebook posts can be
analyzed over a period of time to see the sentiment of a particular audience. This gives
deep insight into the current market status of a product. It helps prioritize actions
and track trends over time. (Stecanella, 2017)
For any type of public service, such as a trolley bus service or a free water service, the
feedback and opinions of the public are crucial. Surveys can be conducted to collect this
feedback. Sentiment analysis can be performed on these surveys to identify how well the
services are benefiting people and to understand the changes required to improve the
existing services.
These are only some of the real-world areas that sentiment analysis can benefit or is already
benefiting. It can be applied to many other aspects of business, from brand monitoring to
product analytics, from customer service to market research. Leading brands are able
to work faster and more accurately by incorporating sentiment analysis into their existing
systems and analytics.

4.3. Further work
This report has only touched the surface of sentiment analysis. Predicting a sentiment
accurately requires the combined use of rule-based approaches, such as lexicons, and
automatic approaches, i.e. machine learning. Naïve Bayes is a basic model, but its
performance can be improved with different data pre-processing techniques, to the point of
matching more advanced methods. Techniques such as lemmatization, stemming, N-grams,
TF-IDF, Laplace correction, and the handling of emoticons, negation and dictionaries can
significantly increase the accuracy score. (Ray, 2017) (Giulio Angiani, 2015)
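Two of these pre-processing techniques, negation handling and N-grams, can be sketched in a
few lines of Python. This is a minimal illustration under stated assumptions, not part of
the implemented system: the `NEGATION_WORDS` list, the `NOT_` prefix convention and the
function names are hypothetical choices, and the negation word itself is dropped once the
following token has been marked.

```python
import re

# Hypothetical, deliberately small list of negation words.
NEGATION_WORDS = {"not", "no", "never", "dont", "don't"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def mark_negation(tokens):
    """Drop each negation word and prefix the token that follows it with 'NOT_'."""
    marked = []
    negate_next = False
    for token in tokens:
        if token in NEGATION_WORDS:
            negate_next = True
            continue
        marked.append("NOT_" + token if negate_next else token)
        negate_next = False
    return marked


def ngrams(tokens, n):
    """Return all n-grams in the token list, each joined with a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = mark_negation(tokenize("I don't like it."))
features = tokens + ngrams(tokens, 2)
print(features)
```

With features like these, "like" and "NOT_like" are counted as different words, so a
classifier such as Naïve Bayes can learn opposite sentiments for them.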
Data visualization is very important because it makes analytics visible, which helps in
grasping difficult concepts and identifying new patterns. Sentiments for different products
can be compared using charts such as pie charts and line graphs. This is very useful for
companies to track product performance, identify necessary changes and gain all kinds of
insights. Sentiment visualization is therefore another prospect that could further increase
the usefulness of sentiment analysis.

5. References
Alec Go, R. B. L. H., 2009. Twitter Sentiment Classification using Distant Supervision,
Stanford: s.n.
Alexander Pak, P. P., 2008. Twitter as a Corpus for Sentiment Analysis and Opinion
Mining. In: France: Orsay Cedex, pp. 1321-1326.
Bo Pang, L. L., 2008. Opinion mining and sentiment analysis. 2 ed. s.l.:Foundations and
Trends in Information Retrieval.
Boukkouri, H. E., 2018. Medium.com. [Online]
Available at: https://medium.com/data-from-the-trenches/text-classification-the-first-step-toward-nlp-mastery-f5f95d525d73
[Accessed 29 January 2019].
Changhua Yang, K. H.-Y. L. a. H.-H. C., 2007. Emotion classification using web blog
corpora. In: Washington: s.n., pp. 275-278.
Expertsystem, 2018. Expertsystem. [Online]
Available at: https://www.expertsystem.com/examples-natural-language-processing-systems-artificial-intelligence/
Giulio Angiani, L. F. T. F. P. F. E., 2015. A Comparison between Preprocessing Techniques
for Sentiment Analysis in Twitter, Parma: s.n.
Gupta, S., 2017. Towards Data Science. [Online]
Available at: https://towardsdatascience.com/sentiment-analysis-concept-analysis-and-applications-6c94d6f58c17
Gupta, S., 2018. Paralleldots. [Online]
Available at: https://blog.paralleldots.com/data-science/breakthrough-research-papers-and-models-for-sentiment-analysis/
[Accessed 2 February 2019].
Morgan, J., 2018. Differencebetween. [Online]
Available at: http://www.differencebetween.net/technology/differences-between-supervised-learning-and-unsupervised-learning/
Ray, S., 2017. Analyticsvidhya. [Online]
Available at: https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/
Reese, H., 2017. TechRepublic. [Online]
Available at: https://www.techrepublic.com/article/understanding-the-differences-between-ai-machine-learning-and-deep-learning/
Rouse, M., 2018. TechTarget. [Online]
Available at: https://searchenterpriseai.techtarget.com/definition/AI-Artificial-Intelligence
Shailendra Singh Kathait, S. T. A. B. V. K. S., 2017. Intelligent System for
Analyzing Sentiments of Feedback. Volume 8, pp. 588-594.
Sharma, A., 2018. Geeksforgeeks. [Online]
Available at: https://www.geeksforgeeks.org/difference-between-machine-learning-and-artificial-intelligence/
Stecanella, B., 2017. Monkeylearn. [Online]
Available at: https://monkeylearn.com/blog/practical-explanation-naive-bayes-classifier/
Walaa Medhat, A. H. H. K., 2014. ScienceDirect. [Online]
Available at: https://www.sciencedirect.com/science/article/pii/S2090447914000550

