OFFICIAL (CLOSED) \ NON-SENSITIVE

Natural Language
Processing (NLP)
Mr Hew Ka Kian
hew_ka_kian@rp.edu.sg

Similarity
• Word vectors, also called word embeddings, are numerical representations of individual words,
built so that words used in similar contexts have similar values. In this way we can compare
words, and derive context, mathematically.
• As a result, the word vector for "lion" will be closer in value to "cat" than to "dandelion".
• In spaCy, the .similarity() method of Doc, Span and Token objects exposes the vector relationship
• Words that often appear in the same context will have a high similarity score, even when
their meanings are opposite.
• ‘like’ and ‘love’ have a high similarity score
• ‘like’ and ‘hate’ also have a high similarity score, because they are often used in the same
context although they have opposite meanings
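
One common way to compare two word vectors is cosine similarity. The sketch below is an illustration only: the three-dimensional vectors are made up for this example (real embeddings have hundreds of dimensions), and cosine_similarity is a helper written here, not a library function.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors invented for illustration; they are NOT real embeddings.
toy_vectors = {
    "lion": [0.9, 0.8, 0.1],
    "cat": [0.8, 0.9, 0.2],
    "dandelion": [0.1, 0.2, 0.9],
}

lion_cat = cosine_similarity(toy_vectors["lion"], toy_vectors["cat"])
lion_dandelion = cosine_similarity(toy_vectors["lion"], toy_vectors["dandelion"])

# With these toy values, "lion" is closer to "cat" than to "dandelion".
print(lion_cat > lion_dandelion)  # prints True
```

A score near 1 means the vectors point in almost the same direction; a score near 0 means they are unrelated.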

Sentiment Analysis
• The NLTK VADER module scores a text in three categories: negative, neutral and positive.
• It also gives a compound score that is easy to understand. It ranges from -1 (most negative)
to 1 (most positive): less than 0 means negative and more than 0 means positive.
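• Example code:
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')  # one-time download of the VADER lexicon
sid = SentimentIntensityAnalyzer()
a = 'This was a good movie.'

# Scores for every category, plus the compound score
sid.polarity_scores(a)
# {'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.4404}

# The compound score alone
sid.polarity_scores(a)['compound']
# 0.4404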

Student Activity
Exercise B:
• Create a function get_compound_score() that returns the
'compound' sentiment score of the text passed in as its parameter
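
One possible shape for this function is sketched below. The optional analyzer parameter is an assumption added here so that a single SentimentIntensityAnalyzer can be reused across calls; the exercise itself only asks for the text parameter.

```python
def get_compound_score(text, analyzer=None):
    """Return the 'compound' sentiment score of text."""
    if analyzer is None:
        # Imported lazily; needs nltk and the 'vader_lexicon' resource.
        from nltk.sentiment.vader import SentimentIntensityAnalyzer
        analyzer = SentimentIntensityAnalyzer()
    # polarity_scores() returns a dict with 'neg', 'neu', 'pos' and 'compound'
    return analyzer.polarity_scores(text)['compound']
```

Calling get_compound_score('This was a good movie.') should give the same compound value as calling polarity_scores() directly. When scoring many texts, passing in one shared analyzer avoids rebuilding it on every call.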

Student Activity
Exercise B:
• Create a 'compound' column in df containing the compound score of
the text in the 'review' column.

Student Activity
Exercise B:
• Create a function named get_sentiment(compound) that returns
'pos' if the compound parameter is greater than 0, and 'neg' otherwise
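
A minimal sketch of this rule follows. Note that, as the exercise is worded, a compound score of exactly 0 is labelled 'neg'.

```python
def get_sentiment(compound):
    """Map a compound score to 'pos' (above 0) or 'neg' (0 or below)."""
    return 'pos' if compound > 0 else 'neg'

print(get_sentiment(0.4404))  # prints pos
print(get_sentiment(-0.3))    # prints neg
```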

Student Activity
Exercise B:
• Create a column 'sentiment' in df that shows 'pos' or 'neg' based on
the score in the 'compound' column.

Student Activity
Exercise B:
• Show only the rows where our NLP guessed the sentiment wrongly
(the 'label' column and 'sentiment' column values do not match)
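
Selecting mismatched rows can be done with a boolean mask in pandas. The sketch below assumes df is a pandas DataFrame; the four rows are made up for illustration and stand in for the real review data.

```python
import pandas as pd

# Made-up rows for illustration only; the real df comes from the exercise data.
df = pd.DataFrame({
    'label': ['pos', 'neg', 'pos', 'neg'],
    'sentiment': ['pos', 'pos', 'neg', 'neg'],
})

# Keep only the rows where the true label and the predicted sentiment differ
wrong = df[df['label'] != df['sentiment']]
print(wrong)
```

With these sample rows, only the second and third rows (index 1 and 2) are kept.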

Problem-Simple_Chatbot
It is your turn:
1. Modify the code to make it more interactive. Write a
function chat() that does the following when called:
1.1 It continually asks the user for input
1.2 It ends when the user types "exit"
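
The loop in steps 1.1 and 1.2 could look like the sketch below. The respond parameter is hypothetical: it stands in for whatever function the existing chatbot code uses to produce a reply. The ask and show parameters, and the case-insensitive check for "exit", are also assumptions added here.

```python
def chat(respond, ask=input, show=print):
    """Keep asking the user for input until they type "exit"."""
    while True:
        message = ask("You: ")
        # Stop on "exit", ignoring case and surrounding spaces (an assumption)
        if message.strip().lower() == "exit":
            break
        show(respond(message))

# Example call with a placeholder reply function:
# chat(lambda message: "You said: " + message)
```

Passing ask and show as parameters lets the loop be tried out with scripted input instead of the keyboard.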

Problem-Simple_Chatbot
It is your turn:
• Enhance the chatbot so that it also responds to the user's greetings,
such as "Good morning", with "It is a good day, enjoy!"

Problem-Simple_Chatbot
It is your turn:
• Check the user's tone. If the user sounds displeased, respond with
"You sound displeased, please cool down."

Problem-Simple_Chatbot
It is your turn:
• Set a similarity threshold. If no known statement matches the user's
input with a similarity score equal to or above the threshold, tell the
user to rephrase.
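
The threshold check can be sketched as below. Everything named here is an assumption for illustration: the known dictionary, the best_reply function, the 0.7 default threshold, and text_similarity, which uses the standard library's difflib as a stand-in scorer. In the chatbot itself, the score would come from the word-vector similarity instead.

```python
from difflib import SequenceMatcher

def text_similarity(a, b):
    """Stand-in scorer: character-based ratio from 0 to 1 (not word vectors)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def best_reply(user_text, known, threshold=0.7, score=text_similarity):
    """Return the reply of the closest known statement, or ask to rephrase."""
    best_statement = max(known, key=lambda s: score(user_text, s))
    if score(user_text, best_statement) >= threshold:
        return known[best_statement]
    return "Sorry, I do not understand. Please rephrase."

# Hypothetical statements and replies for illustration
known = {
    "what is your name": "I am a simple chatbot.",
    "how are you": "I am fine, thank you.",
}

print(best_reply("What is your name", known))
print(best_reply("zzzz qqqq", known))
```

A higher threshold makes the chatbot ask for a rephrase more often; a lower one makes it more willing to guess.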
