
CU6051NA - Artificial Intelligence

20% Individual Coursework

2019-20 Autumn

Student Name: Renish Gautam

London Met ID: 17031035

College ID: np01cp4a170052

Assignment Due Date: 13th January 2020

Assignment Submission Date: 13th January 2020

I confirm that I understand my coursework needs to be submitted online via Google Classroom under the relevant

module page before the deadline in order for my assignment to be accepted and marked. I am fully aware that late

submissions will be treated as non-submission and a mark of zero


Contents
1. Introduction
1.1. Explanation of the AI concept chosen
1.1.1. Sentiment Analysis
1.2. Explanation/introduction of the chosen problem domain/topic
2. Background
2.1. Sentiment Analysis and its approaches
2.1.1. Approaches
2.2. Research works done on Sentiment Analysis
2.3. Current applications of Sentiment analysis
Social Media Monitoring
McDonalds vs. Burger King
3. Solution
3.1. Explanation of the proposed solution/approach to solving the problem
3.2. Explanation of the AI algorithm
3.3. Pseudocode
3.4. Flowchart
4. Conclusion
4.1. Analysis of the work done
4.2. Solution addressing the real-world problems
4.3. Further work
5. Bibliography

Table of Figures

Figure 1: Different Approaches on sentiment analysis
Figure 2: Social Media Monitoring
Figure 3: McDonalds vs. Burger King
Figure 4: Bayes' Theorem
Figure 5: Flowchart Diagram
CU6051NI Artificial Intelligence

1. Introduction
Artificial intelligence (AI) is the simulation of human intelligence processes by machines,
especially computer systems. It is the ability of a digital computer to perform tasks commonly
associated with intelligent beings. The term is frequently applied to the project of developing
systems endowed with the intellectual processes characteristic of humans, such as ability to reason,
discover meaning, generalize, or learn from past experience. Despite continuing advances in
computer processing speed and memory capacity, there are as yet no programs that can match
human flexibility over wider domains or in tasks requiring much everyday knowledge. On the
other hand, some programs have attained the performance levels of human experts and
professionals in performing certain specific tasks, so that artificial intelligence in this limited sense
is found in applications as diverse as medical diagnosis, computer search engines, and voice or
handwriting recognition. While the huge volume of data that’s being created on a daily basis would
bury a human researcher, AI applications that use machine learning can take that data and quickly
turn it into actionable information (Cambria, 2017). AI has recently become so widespread that we
often do not realize we are using it, for example on social networking sites such as Facebook,
YouTube and Instagram, which show content based on our interests. Moreover, Google's AI
supports image recognition, the voice assistant on Android devices, and more. Hence, AI is a
wide-ranging branch of computer science concerned with building smart machines (Pozzi, 2016).

Machine learning is the science of getting a computer to act without being explicitly programmed.
It is an application of AI. Deep learning is a subset of machine learning that, in very simple terms,
can be thought of as the automation of predictive analytics. Such computer programs are allowed
to learn, modify, develop and grow by themselves when introduced to new data. The process of
machine learning begins with observations of data, such as direct experience or instruction, in
order to look for patterns in the data and make better decisions in the future based on the data
provided. There are four types of machine learning algorithms:

• Supervised learning: Here, the data sets are labeled so that patterns can be detected and
used to label new data sets.
• Unsupervised learning: Here, data sets are not labeled and are sorted according to
similarities and differences.
• Semi-supervised learning: Here, techniques such as self-training, multi-view learning, and
self-ensembling are included. Self-training uses a model's own predictions on unlabeled
data to add to the labeled data set.
• Reinforcement learning: Here, data sets are not labeled but, after performing an action or
several actions, the AI system is given feedback. (theappsolutions, 2020)

However, machine learning remains a relatively 'hard' problem, particularly when existing
algorithms and models must be adapted to work well for a new application.

1.1. Explanation of the AI concept chosen


Social media today contains rapidly changing information generated by millions of users that can
dramatically affect an individual's reputation or that of an organization. This shows the
importance of sentiment analysis. YouTube, as a unique platform, is multimodal and contains a
social graph and discussions between people with various opinions. Those opinions may be
positive, negative or neutral. The YouTube API is not effective at ordering comments by
relevance, although it claims to do so. As a result, the most relevant comments do not align with
the top comments at all, and they are not even sorted by likes or replies. I therefore found it
important for the community to conduct sentiment analysis research on YouTube comments.

1.1.1. Sentiment Analysis


Sentiment analysis is the process of analyzing online pieces of writing to determine the emotional
tone they carry. In other words, sentiment analysis is the automated process of classifying online
text data as positive, neutral or negative, giving businesses the opportunity to gain a deeper
understanding of how customers perceive their product, brand or service. Sentiment analysis is
currently a topic of great interest and development because it has many practical applications.
Companies use sentiment analysis to automatically analyze survey responses, product reviews,
social media comments and the like, in order to gain valuable insights about their brands, products
and services. Sentiment analysis helps data analysts in large businesses to collect public opinion,
conduct complex market research, monitor brand and product reputation, analyze comments and
understand the end-user experience. (Miner, 2019) Sentiment analysis provides some answers
about what the most important issues are, at least from the perspective of customers.


Because sentiment analysis can be automated, decisions can be made based on a significant
amount of data rather than on plain intuition, which is not always right. (Hardy, 2020)

Basic sentiment analysis of text follows a straightforward process. First, the text document is
broken down into its component parts, such as sentences, phrases, tokens and parts of speech.
Next, each sentiment-bearing phrase and component is identified. A sentiment score is then
assigned to each of these phrases and components. Optionally, the scores can be combined for
multi-layered sentiment analysis. (lexalytics, 2020)
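
The process described above can be sketched in a few lines of Python. This is only a minimal illustration, not the method of any particular tool: the lexicon, its score values and all function names are assumptions made for this example.

```python
import re

# Hypothetical toy lexicon: word -> sentiment score (assumed values, for illustration only).
LEXICON = {"good": 1.0, "great": 2.0, "love": 2.0,
           "bad": -1.0, "terrible": -2.0, "hate": -2.0}

def split_sentences(text):
    """Break a document into sentences on '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]

def tokenize(sentence):
    """Lower-case a sentence and split it into word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())

def score_sentence(sentence):
    """Sum the lexicon scores of the sentiment-bearing tokens in one sentence."""
    return sum(LEXICON.get(token, 0.0) for token in tokenize(sentence))

def score_document(text):
    """Return the per-sentence scores and their combined document score."""
    scores = [score_sentence(s) for s in split_sentences(text)]
    return scores, sum(scores)

def label(score):
    """Map a numeric score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

sentence_scores, total = score_document("I love this video. The audio is bad!")
print(sentence_scores, total, label(total))
```

Here the first sentence scores positively and the second negatively, and the two scores are merged into one document-level score. A real lexicon would be far larger and would also need to handle negation and intensifiers.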

1.2. Explanation/introduction of the chosen problem domain/topic

For many people, YouTube is a place to watch music videos, comedy shows, how-to guides,
recipes, hacks and more. YouTube can be a great space for teens to discover things they like. It is
a growing platform with a simple video-sharing service on which users can watch, like, share and
comment on videos, and upload their own. A YouTuber's main challenge is to collect all the
relevant comments on a single video and summarize the overall response to it. This is very time
consuming. By using sentiment analysis, a YouTuber can easily learn what viewers think without
spending a lot of time. However, not every person's comment on a video is the same, and
different kinds of emotion are attached to comments. Some people may react badly to any type of
disagreement, while others may even thrive on it. Sentiment analysis is used to determine the
sentiment of each comment.

At times, YouTube comments can be so toxic that they attack people personally, including their
religion and gender. About 500 million comments are deleted. Many YouTubers have complained
about the effect that hate comments have had on their videos. This toxicity seems to have a
serious impact on how many people engage in conversation and discourages some from engaging
in online conversation altogether. As a result, online platforms struggle to facilitate connections
effectively, resulting in many small groups


2. Background
2.1. Sentiment Analysis and its approaches
Sentiment analysis is not a straightforward procedure, because various factors determine the
sentiment of speech or text. Text information can typically be divided into two main types: facts
and opinions. Opinions are of two types: direct and comparative. Direct opinions give an
opinion about an entity directly, whereas comparative opinions compare one entity with
another. (Jadav, 2017)

There are numerous types of sentiment analysis. The most important are systems that focus on
polarity (positive, negative, neutral) and systems that detect feelings and emotions or identify
intentions. Emotions such as disappointment, frustration or anxiety (i.e. negative feelings) or
joy, affection or excitement (i.e. positive feelings) are correlated with the polarity of a text.
Machine learning algorithms and lexicons are used to detect emotions and feelings in text.
Relying on lexicons is tricky, because the way people express their emotions varies greatly,
and so do the lexical items they use.

2.1.1. Approaches
Many methods and algorithms have been introduced to extract sentiment from text.
Computational linguistics is such a large field that research is still ongoing to improve the
accuracy these methods provide. Sentiment analysis systems are classified as follows:

• Rule-based: In this approach, a set of rules defined in some form of scripting language
identifies the subjectivity, polarity, or subject of an opinion. The inputs to this method can
include classic NLP techniques such as tokenization, part-of-speech tagging, stemming
and parsing, and other resources such as lexicons. (Monkey Learn, 2020)

• Automatic: This approach relies on machine learning techniques to learn from data. The
task is modeled as a classification problem in which a classifier is fed a text and returns
the corresponding sentiment, e.g. positive, negative or neutral. The first step is training,
in which the model learns to associate a specific input with the respective output from
the training samples. Pairs of feature vectors and tags (e.g. positive, negative, or neutral)
are fed into the machine learning algorithm to generate a model. The second step is
prediction, in which the feature extractor transforms unseen text inputs into feature
vectors. When those feature vectors are fed into the model, the predicted tags are
generated. Naïve Bayes, Logistic Regression, Support Vector Machines and Neural
Networks are commonly used supervised classification algorithms. (Monkey Learn, 2020)

• Hybrid: The concept of hybrid methods is very intuitive: just combine the best of both
worlds, the rule-based and the automatic approaches. Combining both approaches can
usually improve accuracy and precision. (Monkey Learn, 2020)
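
To make the feature-extraction step of the automatic approach more concrete, the following sketch turns texts into word-count feature vectors and pairs them with their tags. It assumes a simple bag-of-words representation, which is only one possible choice of feature extractor; the function names and sample texts are hypothetical.

```python
from collections import Counter

def build_vocabulary(texts):
    """Collect every distinct lower-cased word in the training texts, in sorted order."""
    return sorted({word for text in texts for word in text.lower().split()})

def to_feature_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in the text; unknown words are ignored."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

# Hypothetical labeled training samples.
training_texts = ["good video", "bad video", "good good song"]
training_tags = ["positive", "negative", "positive"]

vocabulary = build_vocabulary(training_texts)
training_vectors = [to_feature_vector(t, vocabulary) for t in training_texts]

# (feature vector, tag) pairs are what the learning algorithm is trained on.
training_pairs = list(zip(training_vectors, training_tags))

# At prediction time the same extractor is applied to unseen text.
unseen_vector = to_feature_vector("good new video", vocabulary)
print(vocabulary)
print(training_pairs)
print(unseen_vector)
```

The important design point is that the vocabulary is built from the training texts only, so unseen words such as "new" simply do not contribute to the feature vector.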

Figure 1: Different Approaches on sentiment analysis


2.2. Research works done on Sentiment Analysis


Much research has been conducted on sentiment analysis. Some of the research papers
and journal articles studied are as follows:

In the journal article Sentiment Analysis of YouTube Comments on Koha Open Source
Software Videos, Lambodara Parabhoi and Payel Saha conducted sentiment analysis on a
total of 404 comments on Koha ILS videos on a YouTube channel. The main objective of
the project was to determine whether the comments were positive, negative or neutral.
The article discusses using the Naïve Bayes algorithm for sentiment analysis. The authors
used the Parallel Dots API and Google Spreadsheet with the AYLIEN Text Analysis API.
The sentiment analysis covered categories such as intention, subjectivity and sentiment,
emotion, and word frequency. (Parabhoi & Saha, 2018)

In another study, Joe Timoney, Adarsh Raj and Brian Davis conducted sentiment analysis
on comments extracted from YouTube song videos. 250 song titles were gathered and a
total of 100 comments were extracted from these videos. Various classification
approaches such as Naïve Bayes and Decision Trees, cross-validation techniques and
evaluation metrics were discussed. Two machine learning algorithms were tested: Naïve
Bayes and Decision Trees. The accuracy obtained was 79% using Naïve Bayes and
86.09% using Decision Trees. (Timoney et al., 2019)

In the third study, the authors propose a Natural Language Processing (NLP) based
sentiment analysis approach for user comments on YouTube. They demonstrate the
effectiveness of the scheme through data-driven experiments, measured by the accuracy
of finding popular and high-quality videos. The NLP process consists of four steps:
comment collection and preprocessing, generation of data sets, sentiment measures, and
video rating. (Bhuiyan et al., 2017)


2.3. Current applications of Sentiment analysis


Social Media Monitoring
Social media monitoring is one way businesses currently use sentiment analysis. With sentiment
analysis, the data can be automatically sorted into positive, neutral and negative categories. This
allows the customer service team to respond quickly to urgent complaints from disgruntled
customers.

Figure 2: Social Media Monitoring


McDonalds vs. Burger King

Figure 3: McDonalds vs. Burger King

The application above performs sentiment analysis comparing McDonalds and Burger King. We
can see a massive spike in positive sentiment for Burger King. At the same time, McDonalds was
hit with a wave of negative sentiment.


3. Solution
3.1. Explanation of the proposed solution/approach to solving the problem
Taking the above research and explanations into account, it is clear that sentiment analysis
can be used for various purposes, such as:

• Brand Monitoring
• Customer Support
• Customer Feedback
• Product Analytics, etc.

Among the many approaches to sentiment analysis, supervised learning is preferable for the
task of predicting the sentiment of YouTube comments. Among the many supervised learning
algorithms, Naïve Bayes is the one chosen for predicting the sentiment. Kaggle is used to
gather training datasets of YouTube comments.

Reasons for choosing Naïve Bayes are listed below:

• Fast
• Requires less training data
• Highly scalable
• It can make probabilistic predictions
• It is easy to implement
• It works more efficiently than other algorithms if the independence assumption holds.
(educba, 2020)


3.2. Explanation of the AI algorithm


Naïve Bayes is a probabilistic algorithm based on Bayes' Theorem, with an assumption of
independence between predictors. In simple terms, a Naive Bayes classifier assumes that the
presence of a particular feature in a class is unrelated to the presence of any other feature.

For example, a fruit may be considered an apple if it is red, round, and about 3 inches in diameter.
Even if these characteristics depend on each other or on the existence of the other characteristics,
all of them are treated as contributing independently to the probability that this fruit is an apple,
which is why the method is called 'Naive'.

A Naive Bayes model is simple to build and especially useful for very large data sets. Along with
its simplicity, Naive Bayes is considered to outperform even highly sophisticated classification
methods. (Ray, 2017)

Bayes' Theorem provides a way of calculating the posterior probability P(c|x) from P(c), P(x)
and P(x|c). Look at the equation underneath:

P(c|x) = P(x|c) · P(c) / P(x)

Figure 4: Bayes' Theorem

Here,

• P(c|x) is the posterior probability of class (c, target) given predictor (x, attributes).
• P(c) is the prior probability of class.
• P(x|c) is the likelihood which is the probability of predictor given class.
• P(x) is the prior probability of predictor.
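
With several predictors x1, ..., xn (here, the words of a comment), the independence assumption lets the posterior be scored as P(c) · P(x1|c) · ... · P(xn|c), since P(x) is the same for every class and can be dropped when comparing classes. The sketch below illustrates this on a tiny hand-made data set. It is not a production implementation: the sample comments are invented, and the use of add-one (Laplace) smoothing and log-probabilities is an assumption added here to avoid zero probabilities and numerical underflow.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (text, class). Returns class counts, per-class word counts, vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, c in samples:
        words = text.lower().split()
        class_counts[c] += 1
        word_counts[c].update(words)
        vocabulary.update(words)
    return class_counts, word_counts, vocabulary

def log_scores(text, class_counts, word_counts, vocabulary):
    """Return log(P(c) * product of P(x_i|c)) for each class c; P(x) is omitted."""
    total_docs = sum(class_counts.values())
    scores = {}
    for c in class_counts:
        score = math.log(class_counts[c] / total_docs)      # log prior, log P(c)
        total_words = sum(word_counts[c].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue                                     # skip words never seen in training
            # Likelihood P(word|c) with add-one smoothing (an assumption of this sketch).
            likelihood = (word_counts[c][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        scores[c] = score
    return scores

def predict(text, model):
    """Pick the class with the highest posterior score."""
    scores = log_scores(text, *model)
    return max(scores, key=scores.get)

# Hypothetical labeled comments.
samples = [
    ("this is really good", "positive"),
    ("really great video", "positive"),
    ("this was bad", "negative"),
    ("really bad audio", "negative"),
]
model = train(samples)
print(predict("this is really good", model))
print(predict("this was bad", model))
```

Each word contributes its own factor to the score regardless of the other words in the comment, which is exactly the 'naive' independence assumption described above.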

3.3. Pseudocode
Import necessary libraries

Collect labeled training datasets

Read dataset and separate sentiment text and its sentiment label.

dataframe = pandas.read_csv("training data")

text = sentiment text column of dataframe

label = sentiment label column of dataframe

Split sentiment text and sentiment label into training and testing sets

text_train, text_test, y_train, y_test = train_test_split(text, label, test_size=0.2, random_state=1)

Perform data pre-processing

Remove stopwords.

Tokenization.

Ignoring case and punctuation

Strip white space.

Remove numbers and other characters

Convert the pre-processed text into feature vectors (a bag-of-words vectorizer is assumed here)

vectorizer = CountVectorizer()

X_train = vectorizer.fit_transform(text_train)

Train the model on training set

model = naive_bayes.MultinomialNB()

model.fit(X_train, y_train)


Make the prediction on testing set

my_test_data = ['This is really good', 'This was bad']

my_vectorizer = vectorizer.transform(my_test_data)

model.predict(my_vectorizer)

Compare the actual response values with the predicted response values.
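
The pre-processing and comparison steps of the pseudocode can be illustrated with a short, self-contained Python sketch. It uses only the standard library; the stop-word list is a small made-up sample, and the function names preprocess and accuracy are hypothetical helpers introduced for this example rather than part of any library.

```python
import re

# Small illustrative stop-word list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "was", "this", "it", "to", "and", "of"}

def preprocess(comment):
    """Lower-case, drop punctuation and digits, strip white space, tokenize, remove stop words."""
    text = comment.lower()                    # ignore case
    text = re.sub(r"[^a-z\s]", " ", text)     # remove numbers, punctuation and other characters
    tokens = text.split()                     # tokenize; split() also discards extra white space
    return [t for t in tokens if t not in STOPWORDS]

def accuracy(actual, predicted):
    """Fraction of predicted labels that match the actual labels."""
    matches = sum(1 for a, p in zip(actual, predicted) if a == p)
    return matches / len(actual)

print(preprocess("  This video is REALLY good!!! 10/10 "))

# Hypothetical actual and predicted labels for four test comments.
actual = ["positive", "negative", "positive", "negative"]
predicted = ["positive", "negative", "negative", "negative"]
print(accuracy(actual, predicted))
```

Accuracy is only one way of comparing actual and predicted responses; the choice of metric in the final application may differ.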

3.4. Flowchart

Figure 5: Flowchart Diagram


4. Conclusion

4.1. Analysis of the work done


This document includes a study of Artificial Intelligence. We learned that AI comprises
various concepts, including Machine Learning, Deep Learning and Neural Networks.
Machine Learning is a subset of AI, which includes NLP as one of its subtypes. We
learned that sentiment analysis is an important application of AI that automatically
classifies text as positive or negative. In this assignment we have briefly introduced and
analyzed the topic of sentiment analysis. An application will be developed for analyzing
the sentiment of YouTube comments.

4.2. Solution addressing the real-world problems


From the research above, we can conclude that sentiment analysis is an important tool for
improving human life. Sentiment analysis of YouTube comments will help YouTubers to
learn the preferences of their viewers and increase their revenue. With accurate sentiment
analysis, YouTube administrators can help prevent cybercrime by deleting offensive
comments and protect the privacy of video creators. Further, it can help YouTubers to
improve their content.

4.3. Further work


In this coursework we have conducted research on various topics in AI. We have
understood the general concepts of NLP and ML, and of sentiment analysis. As further
work, we will develop a working application that conducts sentiment analysis on YouTube
comments collected from a dataset. After coding, final documentation will be written to
further explain the steps and methods used in development.


5. Bibliography
Bhuiyan, H., Ara, J., Bardhan, R. & Islam, R. (2017) Retrieving YouTube Video by Sentiment
Analysis on User Comment. Proc. of the 2017 IEEE International Conference on Signal and
Image Processing Applications, p.478.

Cambria, E. (2017) A Practical Guide to Sentiment Analysis (Socio-Affective Computing). 1st ed.
Springer. p.196.

educba. (2020) Sentiment Analysis in Social Media [Online]. Available from:
https://www.educba.com/sentiment-analysis-social-media/ [Accessed 2020].

Hardy, J. (2020) Social Media Today [Online]. Available from:
https://www.socialmediatoday.com/content/introduction-sentiment-analysis [Accessed 2020].

Jadav, S. (2017) Sentiment Analysis: A Review. Scientific Journal of Impact Factor (SJIF): 4.72,
p.962.

lexalytics. (2020) Sentiment Analysis Explained [Online]. Available from:
https://www.lexalytics.com/technology/sentiment-analysis [Accessed 2020].

Miner, C. (2019) What is Sentiment Analysis? [Online]. Available from:
https://callminer.com/blog/sentiment-analysis-examples-best-practices/ [Accessed 30 April
2019].

Monkey Learn. (2020) Sentiment Analysis [Online]. Available from:
https://monkeylearn.com/sentiment-analysis/ [Accessed 1 January 2020].

Parabhoi, L. & Saha, P. (2018) Sentiment Analysis of YouTube Comments on Koha Open Source
Software Videos. International Journal of Library and Information Studies, 8, p.102.

Pozzi, F.A. (2016) Sentiment Analysis in Social Networks. 1st ed. Morgan Kaufmann. p.284.


Ray, S. (2017) 6 Easy Steps to Learn Naive Bayes Algorithm with codes in Python and R [Online].
Available from:
https://www.analyticsvidhya.com/blog/2017/09/naive-bayes-explained/?fbclid=IwAR1-5mSCWS8WwOHc3B6OJPy8-R73G3OqTxDWn42c528CoOZO2jw5BQYXmSM
[Accessed 11 September 2017].

theappsolutions. (2020) 4 TYPES OF MACHINE LEARNING ALGORITHMS [Online]. Available
from: https://theappsolutions.com/blog/development/machine-learning-algorithm-types/
[Accessed 13 January 2020].

Timoney, J., Raj, A. & Davis, B. (2019) Nostalgic Sentiment Analysis of YouTube Comments for
Chart Hits of the 20th Century. Maynooth: Dept. of Computer Science, Maynooth University.
