744-Andrei Suiu CodeCamp Presentation-Building A chatbot-NLP Pipeline and Dependency Parsing

Building a chatbot: NLP pipeline and dependency parsing
By: Andrei Şuiu
meetup.com/IASI-AI/
facebook.com/AI.in.Iasi/
What Is a Chatbot?
Chat robots are computer programs
powered by rules and sometimes
artificial intelligence, that mimic
conversation with people via a chat
interface.
Applications:
● Legal consultancy
● HR services
● Customer Services
● Call centres
● Banks
● Restaurants
● Travel Services & Hotels
● Medical services
History: first chatbot
ELIZA, created from 1964 to 1966 at the MIT AI Laboratory by Joseph Weizenbaum.
Applications: virtual lawyer
https://donotpay-search-master.herokuapp.com
DoNotPay - a chatbot that provides free legal advice using AI, created by British entrepreneur Joshua Browder. It can assist with writing letters and filling out forms.
By June of 2016, DoNotPay had successfully contested 160,000 parking tickets - a 64% success rate - and earlier this year, Browder added capabilities to assist asylum seekers in the US, UK and Canada.
Now, the bot is able to assist with over 1,000 different legal issues in all 50 US states and across the UK.
Applications: Human Resources
Help new employees to learn & find:
● Kitchen, coffee
● Main internal company policies
● Printer/xerox
● Company structure
● Main business processes
● etc.
Perception of chatbots
● You can think of a bot as just another user
● A bot can be invited to a group and post messages with the help of keywords
● Bots can have many of the same qualities as their human counterparts:
○ names
○ profile photos
○ can be direct messaged or mentioned
○ can post messages or initiate conversation
○ upload files, etc...
Perception of chatbots: responsibility
Why use chatbots?
● Human language is a natural way to give commands and ask questions
● A single point of navigation that offers contextual and personalized information
● Chatbots give you the opportunity to serve more clients with fewer human resources
● Chatbots are often faster and more cost-effective than their human counterparts
Messaging platforms are opening their APIs
Applications
● Legal consultancy
● HR services
● Customer Services (Emag)
○ Cross-selling/up-selling, help with purchase decisions
○ Handle objections personally, get customer feedback
○ Offer discount codes
○ Deliver shipping notifications, out-of-stock notifications
● Call centres
● Banks (Livia from BT)
● Restaurants
● Travel Services & Hotels (Uber chatbot)
● Medical services
Building a chatbot
The key for a bot to communicate efficiently with humans is its ability to understand their intentions, extract the relevant information from those intentions and, of course, take the relevant action based on that information.
One of the main concerns of NLP is extracting intentions and other relevant information from text.
Intention identification
Below I propose a simple method for identifying some types of intentions.
Generally, you'll get a Unicode string out of the user input, whether it is typed on a keyboard or generated by a speech recognition engine from the audio stream received over a phone line. We'll use a technique called semantic role labeling.
Semantic role labeling is an NLP task that consists of detecting the semantic arguments associated with the predicate/verb of a sentence and classifying them into their specific roles.
This is an important step towards making sense of the meaning of a sentence.
Semantic Role Labeling
Can we figure out that these sentences have the same meaning?
● Gates sold Microsoft stock to Google.
● Google bought Microsoft stock from Gates.
● The Microsoft stock was sold to Google by Gates.
● The Microsoft stock was purchased by Google from Gates.
The predicates sold, bought, and purchased represent an event.
Semantic roles express the abstract role that the arguments of a predicate can take in an event.
Gates - the agent that sells
Google - the agent that buys
Microsoft stock - the object being transacted
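A minimal sketch of how different predicates can be mapped onto the same roles, assuming a parser has already attached each argument to the verb with a grammatical relation label. The names ROLE_MAP and label_roles, the role names seller/buyer/goods, and the relation labels nmod:to and nmod:from are illustrative assumptions, not part of the original method.

```python
# Hypothetical role lexicon: for each verb lemma, which grammatical
# relation carries which semantic role.
ROLE_MAP = {
    "sell": {"nsubj": "seller", "dobj": "goods", "nmod:to": "buyer"},
    "buy": {"nsubj": "buyer", "dobj": "goods", "nmod:from": "seller"},
    "purchase": {"nsubj": "buyer", "dobj": "goods", "nmod:from": "seller"},
}


def label_roles(lemma, arguments):
    """Translate {relation: phrase} for one verb into {role: phrase}."""
    mapping = ROLE_MAP[lemma]
    return {mapping[rel]: phrase for rel, phrase in arguments.items() if rel in mapping}


# "Gates sold Microsoft stock to Google."
sold = label_roles("sell", {"nsubj": "Gates", "dobj": "Microsoft stock", "nmod:to": "Google"})
# "Google bought Microsoft stock from Gates."
bought = label_roles("buy", {"nsubj": "Google", "dobj": "Microsoft stock", "nmod:from": "Gates"})

print(sold == bought)  # both sentences describe the same event
```

Once both sentences are reduced to the same role dictionary, the bot can treat them as the same event regardless of which verb was used.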
NLP Pipeline
An example of an NLP pipeline for role labeling:
raw text → sentence tokenization → tokenization → PoS-tagging → lemmatization → dependency parsing → role labeling
NLP Pipeline
Sentence tokenization
How would you split sentences in a text?
We know that the periods in Mr. Smith and Google Inc. do not mark sentence boundaries.
● A period may denote an abbreviation, a decimal point, an ellipsis (...), or an email address - not the end of a sentence.
● About 47% of the periods in the Wall Street Journal corpus denote abbreviations.
And sometimes sentences can start with non-capitalized words:
i is a good variable name.
And some sentences are not separated by periods!
Sentence Boundary Disambiguation: you can use the sentence splitter from Stanford CoreNLP for Java (it runs on top of PTBTokenizer), or the Punkt Sentence Tokenizer from NLTK for Python.
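To make the problem concrete, here is a deliberately naive splitter using only the Python standard library. It is a sketch, not a replacement for Punkt or CoreNLP: the ABBREVIATIONS set is a hypothetical, tiny list, and the rule wrongly keeps going when an abbreviation really does end a sentence.

```python
import re

# Hypothetical, deliberately tiny abbreviation list.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "inc.", "e.g.", "i.e."}


def split_sentences(text):
    """Split after ., ! or ? followed by whitespace, unless the last
    token before the break is a known abbreviation."""
    sentences, start = [], 0
    for match in re.finditer(r"[.!?]+\s+", text):
        candidate = text[start:match.end()].strip()
        last_token = candidate.split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # the period belongs to an abbreviation
        sentences.append(candidate)
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


text = "Mr. Smith works at Google Inc. in Iasi. i is a good variable name. It costs 3.5 dollars!"
for sentence in split_sentences(text):
    print(sentence)
```

The decimal point in 3.5 is left alone because it is not followed by whitespace, and the lowercase sentence start is handled because the rule never looks at capitalization.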
NLP Pipeline
Word tokenization
How would you split words in a sentence?
We don't want to lose the negative particle.
And usually, punctuation marks are not part of the words!
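A small regular-expression sketch of both points, assuming Penn Treebank style splitting where the negative particle n't becomes its own token. TOKEN_PATTERN and tokenize are hypothetical names; the pattern only handles the plain ASCII apostrophe and does nothing special for other clitics such as 's.

```python
import re

TOKEN_PATTERN = re.compile(
    r"n't"            # the negative particle as its own token
    r"|\w+(?=n't)"    # the word part directly before n't (do + n't)
    r"|\w+"           # an ordinary word
    r"|[^\w\s]"       # any single punctuation mark
)


def tokenize(sentence):
    """Return the tokens of one sentence, keeping n't and punctuation separate."""
    return TOKEN_PATTERN.findall(sentence)


print(tokenize("I don't like it, really!"))
```

Keeping n't as a separate token means later stages can still see that the sentence is negated.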
NLP Pipeline
PoS Tagging
A Part-Of-Speech tagger is a piece of software that reads text in some language and assigns a part of speech, such as noun, verb or adjective, to each word (and other tokens). Computational applications generally use more fine-grained POS tags, like 'proper-noun-plural' or 'verb-past-gerund'.
Taggers usually use PoS abbreviations like:
● NN - noun, singular
● NNPS - proper noun, plural
● VBZ - verb, 3rd person singular present
● JJR - adjective, comparative
● RBS - adverb, superlative
PoS taggers usually perform tokenization and lemmatization at the same time.
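The fine-grained tags above follow the Penn Treebank convention, where the first two letters give the coarse category. A short sketch of collapsing them back, with PENN_TAGS, COARSE_PREFIXES and coarse_category as hypothetical names and only the five tags from the list included:

```python
# Only the tags listed above; the full Penn Treebank tag set is larger.
PENN_TAGS = {
    "NN": "noun, singular",
    "NNPS": "proper noun, plural",
    "VBZ": "verb, 3rd person singular present",
    "JJR": "adjective, comparative",
    "RBS": "adverb, superlative",
}

COARSE_PREFIXES = (("NN", "noun"), ("VB", "verb"), ("JJ", "adjective"), ("RB", "adverb"))


def coarse_category(tag):
    """Map a fine-grained tag to its coarse part of speech by prefix."""
    for prefix, category in COARSE_PREFIXES:
        if tag.startswith(prefix):
            return category
    return "other"


for tag, description in PENN_TAGS.items():
    print(tag, "-", description, "->", coarse_category(tag))
```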
NLP Pipeline
Lemmatization & Stemming
The goal of both stemming and lemmatization is to reduce inflectional forms and sometimes
derivationally related forms of a word to a common base form. For instance:
● am, are, is → be
● car, cars, car's, cars' → car
Stemming usually refers to a crude heuristic process that chops off the ends of words in the hope of
achieving this goal correctly most of the time, and often includes the removal of derivational affixes.
Lemmatization usually refers to doing things properly with the use of a vocabulary and
morphological analysis of words, normally aiming to remove inflectional endings only and to return
the base or dictionary form of a word, which is known as the lemma.
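The difference can be sketched with two toy functions. Both are illustrative assumptions: SUFFIXES is a hypothetical suffix list, not a real stemming algorithm such as Porter's, and LEMMA_TABLE stands in for the full vocabulary and morphological analysis a real lemmatizer uses.

```python
# Hypothetical toy data for illustration only.
LEMMA_TABLE = {
    "am": "be", "are": "be", "is": "be",
    "cars": "car", "car's": "car", "cars'": "car",
}
SUFFIXES = ("'s", "s'", "ing", "ed", "s")


def crude_stem(word):
    """Chop off the first matching suffix, with no knowledge of the vocabulary."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 2:
            return word[: -len(suffix)]
    return word


def lemmatize(word):
    """Look the word up in the vocabulary; unknown words are returned unchanged."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


for w in ("cars", "car's", "is", "bus"):
    print(w, "stem:", crude_stem(w), "lemma:", lemmatize(w))
```

The stemmer gets cars right but turns bus into bu and cannot relate is to be; the lookup handles irregular forms but only for words it knows.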
NLP Pipeline
Dependency parsing
A dependency parse connects words according to their relationships. It generates a directed acyclic graph in which each node is a word that depends on its parent, and each edge is labeled with the relationship.
[Figure: example of a graph generated by the Stanford CoreNLP parser]
Dependency parsing
Another way to represent dependencies. Note the root relation.
The quick brown fox jumps over the lazy dog.
● root(ROOT-0, jumps-5)
● det(fox-4, The-1) - determiner
● det(dog-9, the-7) - determiner
● amod(fox-4, brown-3) - adjectival modifier
● amod(dog-9, lazy-8) - adjectival modifier
● nsubj(jumps-5, fox-4) - nominal subject: the proto-agent of a clause
● case(dog-9, over-6) - the case relation is used for any preposition in English
● amod(fox-4, quick-2) - an adjectival modifier of a nominal is any adjective that serves to modify the meaning of the nominal
● nmod(jumps-5, dog-9) - the nominal modifier relation is used for nominal modifiers of nouns or clausal predicates
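The relation(head-index, dependent-index) notation is easy to load into a program. The following sketch parses those lines into tuples and walks the graph; Dependency, LINE_PATTERN, parse_dependencies and children are hypothetical helper names, and the input is just the text of the list above rather than live parser output.

```python
import re
from collections import namedtuple

Dependency = namedtuple("Dependency", "relation head head_index dependent dependent_index")

# Matches e.g. "nsubj(jumps-5, fox-4)" anywhere in a line.
LINE_PATTERN = re.compile(r"([\w:]+)\((\S+)-(\d+), (\S+)-(\d+)\)")


def parse_dependencies(lines):
    """Turn 'rel(head-i, dep-j)' lines into Dependency tuples."""
    deps = []
    for line in lines:
        match = LINE_PATTERN.search(line)
        if match:
            rel, head, hi, dep, di = match.groups()
            deps.append(Dependency(rel, head, int(hi), dep, int(di)))
    return deps


def children(deps, head_index):
    """Return (relation, word) pairs for everything attached to one head."""
    return [(d.relation, d.dependent) for d in deps if d.head_index == head_index]


PARSE = [
    "root(ROOT-0, jumps-5)", "det(fox-4, The-1)", "det(dog-9, the-7)",
    "amod(fox-4, brown-3)", "amod(dog-9, lazy-8)", "nsubj(jumps-5, fox-4)",
    "case(dog-9, over-6)", "amod(fox-4, quick-2)", "nmod(jumps-5, dog-9)",
]
deps = parse_dependencies(PARSE)
root = next(d for d in deps if d.relation == "root")
print(root.dependent, children(deps, root.dependent_index))
```

Starting from the root and following children gives the verb and its arguments, which is the information role labeling needs.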
Word sense disambiguation
Consider the following sentences:
● I suspect that he is the offender.
● I suspect the truthfulness of his words.
● I suspect Osama to be the terrorist.
● Osama is the main suspect.
Word sense disambiguation: using Ontologies
Hyponym - a word or phrase whose semantic field is included within that of another word, its hyperonym or hypernym. In simpler terms, a hyponym shares a type-of relationship with its hypernym.
Verb hyponymy is also called troponymy.
WordNet is a large lexical database of English, and because it has hypernym/hyponym relationships among the synsets, it can be used as a lexical ontology.
An ontology is a formal naming and definition of the types, properties, and interrelationships of the entities that really or fundamentally exist for a particular domain of discourse.
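The type-of relationship can be sketched as a chain of parent links. The HYPERNYM table below is a hypothetical miniature ontology invented for illustration, not WordNet data; in practice the same queries can be run against WordNet itself, for example through NLTK's wordnet corpus reader.

```python
# Hypothetical miniature ontology: each term points to its hypernym.
HYPERNYM = {
    "poodle": "dog", "dog": "canine", "canine": "mammal",
    "cat": "feline", "feline": "mammal", "mammal": "animal",
}


def hypernym_chain(term):
    """Follow type-of links upward and return all ancestors, nearest first."""
    chain = []
    while term in HYPERNYM:
        term = HYPERNYM[term]
        chain.append(term)
    return chain


def is_a(term, ancestor):
    """True if 'term' is a (direct or indirect) hyponym of 'ancestor'."""
    return ancestor in hypernym_chain(term)


def lowest_common_hypernym(a, b):
    """Return the most specific term that both a and b are a type of."""
    ancestors_b = set([b] + hypernym_chain(b))
    for term in [a] + hypernym_chain(a):
        if term in ancestors_b:
            return term
    return None


print(hypernym_chain("poodle"))
print(lowest_common_hypernym("poodle", "cat"))
```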
Ontologies
Word sense disambiguation: Verb frames
In WordNet, word meanings are represented by synonym sets called synsets - lists of synonymous word forms that are interchangeable in some context. Examples: (suspect, surmise), (suspect, distrust, mistrust), (suspect, believe_to_be_guilty)
Each verb synset contains a list of generic sentence frames illustrating the types of simple sentences in which the verbs in the synset can be used. There are 35 frames in total. Some examples:
● Something ----s (Ex: vegetate)
● Somebody ----s (Ex: respire, sleep)
● It is ----ing (Ex: rain, snow)
● Somebody ----s VERB-ing (Ex: continue/proceed/keep, avoid)
● Somebody ----s something (Ex: manipulate, wave, insufflate)
● Something ----s Adjective/Noun (Ex: become/go/get)
● Somebody ----s something to somebody (Ex: dedicate, delegate/depute)
Word sense disambiguation
nsubj - nominal subject: the proto-agent of a clause.
ccomp - clausal complement of a verb: a dependent clause with an internal subject which functions like an object of the verb.
dobj - direct object: the entity that is acted upon by the subject.
Synset: suspect, surmise
    Verb frames: Somebody ----s something; Somebody ----s that CLAUSE
    Hypernym synset: guess, venture, pretend, hazard
    Sentence: I suspect that he is the offender.
Synset: suspect, distrust, mistrust
    Verb frames: Somebody ----s somebody; Somebody ----s something
    Hypernym synset: disbelieve, discredit
    Sentence: I suspect the truthfulness of his words.
Synset: suspect, believe_to_be_guilty
    Verb frames: Somebody ----s somebody to INFINITIVE; Somebody ----s that CLAUSE
    Hypernym synset: think, opine, suppose, imagine, reckon, guess
    Sentence: I suspect Osama to be the terrorist.
1. suspect = suppose
2. suspect = believe to be guilty
3. suspect = disbelieve
4. suspect = accused/defendant
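One way to connect the verb frames to the dependency parse is a handful of rules over the relations attached to the word suspect. This is a rough sketch under stated assumptions: choose_sense is a hypothetical function, the rule order is a simplification of the frames above, and it assumes the parser attaches the infinitive in "to be the terrorist" with an xcomp relation.

```python
def choose_sense(pos, relations):
    """Pick a sense of 'suspect' from its coarse part of speech and the
    labels of the dependents attached to it."""
    relations = set(relations)
    if pos == "noun":
        return "accused/defendant"        # Osama is the main suspect.
    if "dobj" in relations and "xcomp" in relations:
        return "believe to be guilty"     # Somebody ----s somebody to INFINITIVE
    if "ccomp" in relations:
        return "suppose"                  # Somebody ----s that CLAUSE
    if "dobj" in relations:
        return "disbelieve"               # Somebody ----s something
    return None                           # no rule applies


print(choose_sense("verb", {"nsubj", "ccomp"}))
print(choose_sense("verb", {"nsubj", "dobj"}))
print(choose_sense("verb", {"nsubj", "dobj", "xcomp"}))
print(choose_sense("noun", {"nsubj", "cop", "det", "amod"}))
```

Frames shared by several synsets, such as Somebody ----s something, stay ambiguous under rules this simple, so a real system would need further evidence such as the hypernym information.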
Handling diathesis
Passive voice vs Active voice: dobj ↔ nsubjpass, nsubj ↔ nmod:agent
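The mapping above can be applied as a small relabeling step before role labeling. In this sketch, PASSIVE_TO_ACTIVE and normalize_voice are hypothetical names, the input is a simplified {relation: word} dictionary for one verb, and the auxpass and nmod:to labels are assumed to follow Stanford-style parser output.

```python
# Passive relation label -> active-voice equivalent.
PASSIVE_TO_ACTIVE = {"nsubjpass": "dobj", "nmod:agent": "nsubj"}


def normalize_voice(relations):
    """Rewrite the dependents of a passive verb as if the clause were active.
    Active clauses (no nsubjpass) are returned unchanged."""
    if "nsubjpass" not in relations:
        return dict(relations)
    return {
        PASSIVE_TO_ACTIVE.get(rel, rel): word
        for rel, word in relations.items()
        if rel != "auxpass"  # the passive auxiliary carries no role
    }


# "Gates sold Microsoft stock to Google."
active = {"nsubj": "Gates", "dobj": "stock", "nmod:to": "Google"}
# "The Microsoft stock was sold to Google by Gates."
passive = {"nsubjpass": "stock", "auxpass": "was", "nmod:agent": "Gates", "nmod:to": "Google"}

print(normalize_voice(passive) == active)
```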
Dependency parsing pitfalls: syntactical ambiguity
Ben sees John climbing the mountain with his telescope.
His telescope was installed on the mountain last month.
The telescope is heavy and John gets tired.
His telescope helps Ben see John from 10 km away.
So whose telescope is it?
Mary sees John climbing the mountain with his telescope.
His telescope helps Mary see John from 10 km away.
So whose telescope is it?
Anaphora/cataphora is the use of an expression whose interpretation depends upon another expression in context (its antecedent or postcedent).
These are types of coreference.
Take a look at coreference resolution techniques.
Interested in learning more about chatbots, artificial intelligence or machine learning? Join the IAȘI AI meetups and workshops, the newest technical community in Iași, with more than 250 members.
