• You: Hello
• Op: Hi. This is Railway Enquiry.
• You: What is the status of train 2803?
• Op: It's right on time. The train will leave CST at 5:45 pm. Is there anything else I can assist you with?
• You: No, thank you.
• Op: You are welcome. Indian Railways wishes you a nice and happy journey.
History
• ELIZA
o Developed in the 1960s
o Looks for pronouns and verbs
o ‘You’ becomes ‘I’ and vice versa
o User: You are a dork.
o ELIZA: What makes you think I am a dork?
• PARRY (1972)
o Simulated a paranoid schizophrenic
History (contd.)
• RACTER
o 1984
o Generated English-language prose at random
o Generated a book: The Policeman's Beard Is Half-Constructed
History (contd.)
• ALICE
o 1994
o Three-time Loebner Prize winner
o Richard Wallace
o http://www.alicebot.org/
Talk Outline
• How do chat bots work?
• Applications
o Intelligent Tutoring Systems
o Other applications here
• Conclusion
How Do Chatbots Work?
• They don't have a good understanding of conversation
• They are based on pattern matching
• Chatterbots have a set of input and output rules
• They recognize cue words from the user and respond with a pre-calculated response
• For example:
Human: "I am feeling very worried today."
Chatterbot: "Why are you feeling worried lately?"
• Other chatterbots learn through user interactions
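A cue-word chatterbot of this kind can be sketched in a few lines of Python. The rules below are hypothetical stand-ins for a real rule set, just to illustrate the input/output-rule idea:

```python
import re

# Hypothetical cue-word rules: (regex cue, response template).
RULES = [
    (re.compile(r"i am feeling (.+?)\.?$", re.I), "Why are you feeling {0}?"),
    (re.compile(r"i am (\w+)", re.I), "How long have you been {0}?"),
    (re.compile(r"\b(hello|hi)\b", re.I), "Hello. How are you today?"),
]

def respond(utterance: str) -> str:
    """Return the pre-calculated response for the first matching cue."""
    for pattern, template in RULES:
        m = pattern.search(utterance)
        if m:
            return template.format(m.group(1))
        # fall through to the next rule if this cue is absent
    return "Please tell me more."

print(respond("I am feeling very worried today."))
# -> Why are you feeling very worried today?
```

Note how the fallback line keeps the conversation going even when no cue word matches, which is exactly what gives such bots their illusion of understanding.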
ELIZA
• One of the first chatterbots, designed in 1966
• Parodies a therapist, rephrasing statements of the user and posing them back as questions
• Works by simple parsing and substitution of keywords into canned phrases
• People get emotionally caught up by ELIZA's confident replies, forgetting that it's a machine
Example of Conversation
• An example of ELIZA's conversation with a patient:
– Human: Well, my boyfriend made me come here.
ELIZA: Your boyfriend made you come here?
Human: He says I'm depressed much of the time.
ELIZA: I am sorry to hear you are depressed.
Human: It's true. I'm unhappy.
ELIZA: Do you think coming here will help you not to be unhappy?
• Able to elicit emotional responses from users even though it was not programmed to do so
• Demonstrates the ELIZA effect
Jabberwacky
• No fixed rules and principles programmed into it
• Learns language and context through human interaction: it stores all conversations and comments, which are used to find appropriate responses
• Problems with this approach:
– Continuous changing of subject and conversation
– May respond in a bad-tempered and rude manner
• Was designed to pass the Turing test and is a winner of the Loebner Prize contest
ALICE Chatbot System
• ALICE (Artificial Linguistic Internet Computer Entity) is inspired by ELIZA
• Applies heuristic pattern-matching rules to the input to converse with the user
• ALICE is composed of two parts:
– Chatbot engine
– Language model
• Language models are stored in AIML (Artificial Intelligence Mark-up Language) files
Structure of AIML
• AIML consists of data objects which are made up of units called topics and categories
• A topic has a name attribute and categories associated with it
• Categories consist of a pattern and a template and are the basic unit of knowledge
• A pattern consists of only words, spaces, and the wildcard symbols _ and *
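As an illustration of that structure, a minimal, hypothetical AIML file with one topic and one category might look like this:

```xml
<aiml version="1.0">
  <topic name="TRAINS">
    <category>
      <pattern>WHAT IS THE STATUS OF TRAIN *</pattern>
      <template>Train <star/> is right on time.</template>
    </category>
  </topic>
</aiml>
```

The `<star/>` element substitutes whatever input matched the `*` wildcard back into the template.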
Types of ALICE/AIML Categories
• Atomic categories: patterns with no wildcard symbols
• Default categories: patterns containing the wildcard symbols _ or *
• Recursive categories: templates that use <srai> to reduce synonyms and rephrasings to a single canonical pattern
ALICE Pattern Matching Algorithm
Starting at the root of the pattern tree, with X the first word of the input:
1. If the node has a child "_", follow it and try to match every remaining suffix of the input; if no match, then:
2. Return to the node and look for a child labeled with the exact word X; if found, follow it and try to match the rest of the input; if no match, then:
3. Return to the node and look for a child "*"; follow it and try every suffix of the input after X. If no match is found, go back up to the parent node and put X back at the head of the input.
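The matching itself can be sketched as a recursive walk over a tokenized pattern. This is a simplified illustration of AIML-style wildcard matching, not the actual ALICE implementation; here `_` and `*` each absorb one or more words, and the function only reports whether a single pattern matches:

```python
def match(pattern, words):
    """Does the tokenized AIML-style pattern match the input words?
    `_` and `*` each match one or more words; a full matcher would try
    `_`-branches first, then exact words, then `*`-branches."""
    if not pattern:
        return not words
    head, rest = pattern[0], pattern[1:]
    if head in ("_", "*"):
        # Let the wildcard absorb 1..len(words) input words.
        return any(match(rest, words[i:]) for i in range(1, len(words) + 1))
    return bool(words) and words[0] == head and match(rest, words[1:])

print(match("WHAT IS THE STATUS OF TRAIN *".split(),
            "WHAT IS THE STATUS OF TRAIN 2803".split()))  # -> True
```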
Dialogue Corpus Training Dataset
ALICE tries to mimic real human conversations. Training ALICE to mimic 'real' human dialogues and conversational rules proceeds in the following ways.
Introduction to Information Retrieval
Introducing Information Retrieval and Web Search
Information Retrieval
▪ Information Retrieval (IR) is finding material (usually
documents) of an unstructured nature (usually text)
that satisfies an information need from within large
collections (usually stored on computers).
[Figure: the information-retrieval loop. A user's info need ("info about removing mice without killing them") is formulated (or misformulated?) as the query "how trap mice alive", which a search engine runs against the document collection; the results drive query refinement.]
Term-document incidence matrices
Incidence vectors
▪ So we have a 0/1 vector for each term.
▪ To answer the query: take the vectors for Brutus, Caesar, and Calpurnia (complemented), and bitwise AND them:
▪ 110100 AND 110111 AND 101111 = 100100
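As a quick sketch, the same Boolean AND can be computed on Python integers used as bit vectors; the vectors are the ones from the slide, and NOT Calpurnia is taken within 6 bits:

```python
# Incidence vectors from the slide, as binary literals
# (leftmost bit = first play in the collection).
brutus    = 0b110100
caesar    = 0b110111
calpurnia = 0b010000

# Brutus AND Caesar AND NOT Calpurnia, masked to 6 bits.
answer = brutus & caesar & (~calpurnia & 0b111111)
print(format(answer, "06b"))  # -> 100100
```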
Answers to query
▪ Antony and Cleopatra, Act III, Scene ii
Agrippa [Aside to DOMITIUS ENOBARBUS]: Why, Enobarbus,
When Antony found Julius Caesar dead,
He cried almost to roaring; and he wept
When at Philippi he found Brutus slain.
Bigger collections
▪ Consider N = 1 million documents, each with about
1000 words.
▪ Avg 6 bytes/word including spaces/punctuation
▪ 6GB of data in the documents.
▪ Say there are M = 500K distinct terms among these.
The Inverted Index
The key data structure underlying modern IR
Inverted index
▪ For each term t, we must store a list of all documents
that contain t.
▪ Identify each doc by a docID, a document serial number
▪ Can we use fixed-size arrays for this?
Inverted index
▪ We need variable-size postings lists
▪ On disk, a continuous run of postings is normal and best
▪ In memory, can use linked lists or variable-length arrays
▪ Some tradeoffs in size / ease of insertion
▪ The dictionary maps each term to its postings list; postings are sorted by docID (more later on why)
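A dictionary-and-postings structure of this kind can be sketched directly, using the two Julius Caesar fragments from the lecture as the collection:

```python
from collections import defaultdict

def build_index(docs):
    """docs: docID -> text. Returns each term's postings list sorted by docID."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            postings[token].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

# The two Julius Caesar fragments used in the lecture (punctuation stripped).
docs = {
    1: "I did enact Julius Caesar I was killed in the Capitol Brutus killed me",
    2: "So let it be with Caesar The noble Brutus hath told you Caesar was ambitious",
}
index = build_index(docs)
print(index["brutus"], index["caesar"])  # -> [1, 2] [1, 2]
```

A real indexer would of course stream postings to disk rather than hold a set per term in memory.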
Indexer steps:
Documents → Tokenizer → token stream: Friends Romans Countrymen
→ Linguistic modules → modified tokens: friend roman countryman
→ Indexer → inverted index:
friend → 2, 4
roman → 1, 2
countryman → 13, 16
Doc 1: I did enact Julius Caesar: I was killed i' the Capitol; Brutus killed me.
Doc 2: So let it be with Caesar. The noble Brutus hath told you Caesar was ambitious.
The dictionary stores terms and their counts, with pointers into the postings lists. IR system implementation questions:
• How do we index efficiently?
• How much storage do we need?
Query processing with an inverted index
Introduction to Information Retrieval Sec. 1.3
2 4 8 16 32 64 128 Brutus
1 2 3 5 8 13 21 34 Caesar
30
The merge
▪ Walk through the two postings simultaneously, in time linear in the total number of postings entries
Brutus → 2 4 8 16 32 64 128
Caesar → 1 2 3 5 8 13 21 34
Intersection → 2 8
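The linear-time merge described above, sketched in Python:

```python
def intersect(p1, p2):
    """Intersect two postings lists sorted by docID in O(len(p1) + len(p2))."""
    answer, i, j = [], 0, 0
    while i < len(p1) and j < len(p2):
        if p1[i] == p2[j]:
            answer.append(p1[i])  # docID in both lists
            i += 1
            j += 1
        elif p1[i] < p2[j]:
            i += 1  # advance the pointer with the smaller docID
        else:
            j += 1
    return answer

brutus = [2, 4, 8, 16, 32, 64, 128]
caesar = [1, 2, 3, 5, 8, 13, 21, 34]
print(intersect(brutus, caesar))  # -> [2, 8]
```

This is why postings are kept sorted by docID: the merge never needs to back up.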
Phrase queries and positional indexes
Phrase queries
▪ We want to be able to answer queries such as
“stanford university” – as a phrase
▪ Thus the sentence “I went to university at Stanford”
is not a match.
▪ The concept of phrase queries has proven easily
understood by users; one of the few “advanced search”
ideas that works
▪ Many more queries are implicit phrase queries
▪ For this, it no longer suffices to store only
<term : docs> entries
Extended biwords
▪ Parse the indexed text and perform part-of-speech-tagging
(POST).
▪ Bucket the terms into (say) Nouns (N) and
articles/prepositions (X).
▪ Call any string of terms of the form NX*N an extended
biword.
▪ Each such extended biword is now made a term in the
dictionary.
▪ Example: catcher in the rye
N X X N
▪ Query processing: parse it into N’s and X’s
▪ Segment query into enhanced biwords
▪ Look up in index: catcher rye
Example positional posting for the term be:
<be: 993427;
1: 7, 18, 33, 72, 86, 231;
2: 3, 149;
4: 17, 191, 291, 430, 434;
5: 363, 367, …>
Which of docs 1, 2, 4, 5 could contain "to be or not to be"?
Proximity queries
▪ LIMIT! /3 STATUTE /3 FEDERAL /2 TORT
▪ Again, here, /k means “within k words of”.
▪ Clearly, positional indexes can be used for such
queries; biword indexes cannot.
▪ Exercise: Adapt the linear merge of postings to
handle proximity queries. Can you make it work for
any value of k?
▪ This is a little tricky to do correctly and efficiently
▪ See Figure 2.12 of IIR
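One simple (though not the most efficient) way to handle /k within a single document, given the sorted position lists of two terms, is a brute-force check. This is a sketch for intuition only, not the optimized linear merge of IIR Figure 2.12:

```python
def within_k(pos1, pos2, k):
    """True if some position in pos1 is within k words of some position in pos2.
    O(len(pos1) * len(pos2)); IIR Fig. 2.12 gives a merge-based version."""
    return any(abs(a - b) <= k for a in pos1 for b in pos2)

# Hypothetical position lists for two terms in one document.
print(within_k([7, 18, 33], [17, 191], 3))  # -> True, since |18 - 17| <= 3
```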
Rules of thumb
▪ A positional index is 2–4 times as large as a non-positional index
Combination schemes
▪ These two approaches can be profitably combined
▪ For particular phrases (“Michael Jackson”, “Britney
Spears”) it is inefficient to keep on merging positional
postings lists
▪ Even more so for phrases like “The Who”
▪ Williams et al. (2004) evaluate a more sophisticated
mixed indexing scheme
▪ A typical web query mixture was executed in ¼ of the time
of using just a positional index
▪ It required 26% more space than having a positional index
alone
Word Classes and Part-of-Speech (POS) Tagging
CS 4705
Julia Hirschberg
Garden Path Sentences
Word Classes
Some Examples
Defining POS Tagging
The task: assign each word in a sequence its tag.
WORDS: the, koala, put, the, keys, on, the, table
TAGS: N, V, P, DET
Applications for POS Tagging
• Speech synthesis pronunciation
– lead (noun) vs. lead (verb)
– INsult inSULT
– OBject obJECT
– OVERflow overFLOW
– DIScount disCOUNT
– CONtent conTENT
• Parsing: e.g. Time flies like an arrow
– Is flies an N or V?
• Word prediction in speech recognition
– Possessive pronouns (my, your, her) are likely to be followed by
nouns
– Personal pronouns (I, you, he) are likely to be followed by verbs
• Machine Translation
Closed vs. Open Class Words
Open Class Words
• Nouns
– Proper nouns
• Columbia University, New York City, Arthi
Ramachandran, Metropolitan Transit Center
• English capitalizes these
• Many have abbreviations
– Common nouns
• All the rest
• German capitalizes these.
– Count nouns vs. mass nouns
• Count: Have plurals, countable: goat/goats, one goat, two
goats
• Mass: Not countable (fish, salt, communism) (?two fishes)
• Adjectives: identify properties or qualities of
nouns
– Color, size, age, …
– Adjective ordering restrictions in English:
• Old blue book, not Blue old book
– In Korean, adjectives are realized as verbs
• Adverbs: also modify things (verbs, adjectives,
adverbs)
– The very happy man walked home extremely slowly
yesterday.
– Directional/locative adverbs (here, home, downhill)
– Degree adverbs (extremely, very, somewhat)
– Manner adverbs (slowly, slinkily, delicately)
– Temporal adverbs (Monday, tomorrow)
• Verbs:
– In English, take morphological affixes (eat/eats/eaten)
– Represent actions (walk, ate), processes (provide, see),
and states (be, seem)
– Many subclasses, e.g.
• eats/V ⇒ eat/VB, eat/VBP, eats/VBZ, ate/VBD,
eaten/VBN, eating/VBG, ...
• Reflect morphological form & syntactic function
How Do We Assign Words to Open or
Closed?
• Nouns denote people, places and things and can
be preceded by articles? But…
My typing is very bad.
*The Mary loves John.
• Verbs are used to refer to actions, processes, states
– But some are closed class and some are open
I will have emailed everyone by noon.
• Adverbs modify actions
– Is Monday a temporal adverbial or a noun?
Closed Class Words
• Idiosyncratic
• Closed class words (Prep, Det, Pron, Conj, Aux,
Part, Num) are generally easy to process, since we
can enumerate them….but
– Is it a particle or a preposition?
• George eats up his dinner/George eats his dinner up.
• George eats up the street/*George eats the street up.
– Articles come in 2 flavors: definite (the) and indefinite
(a, an)
• What is this in ‘this guy…’?
Choosing a POS Tagset
Penn Treebank Tagset
Using the Penn Treebank Tags
Tag Ambiguity
Tagging Whole Sentences with POS is Hard
How Do We Disambiguate POS?
• Many words have only one POS tag (e.g. is, Mary,
very, smallest)
• Others have a single most likely tag (e.g. a, dog)
• Tags also tend to co-occur regularly with other
tags (e.g. Det, N)
• In addition to conditional probabilities of words P(w_i | w_{i-1}), we can look at POS likelihoods P(t_i | t_{i-1}) to disambiguate sentences and to assess sentence likelihoods
Some Ways to do POS Tagging
• Rule-based tagging
– E.g. the EngCG ENGTWOL tagger
• Transformation-based tagging
– Learned rules (statistical and linguistic)
– E.g., the Brill tagger
• Stochastic, or probabilistic, tagging
– HMM (Hidden Markov Model) tagging
Rule-Based Tagging
Start with a POS Dictionary
• she: PRP
• promised: VBN,VBD
• to: TO
• back: VB, JJ, RB, NN
• the: DT
• bill: NN, VB
• Etc… for the ~100,000 words of English
Assign All Possible POS to Each Word
She/PRP promised/{VBN,VBD} to/TO back/{VB,JJ,RB,NN} the/DT bill/{NN,VB}
She promised to back the bill
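Given the POS dictionary from the earlier slide, the full lattice of candidate taggings for this sentence can be enumerated (a sketch for intuition; real taggers score paths rather than enumerate them):

```python
from itertools import product

# POS dictionary from the slides.
lexicon = {
    "she": ["PRP"], "promised": ["VBN", "VBD"], "to": ["TO"],
    "back": ["VB", "JJ", "RB", "NN"], "the": ["DT"], "bill": ["NN", "VB"],
}

sentence = "she promised to back the bill".split()
# Cartesian product over each word's possible tags.
candidates = list(product(*(lexicon[w] for w in sentence)))
print(len(candidates))  # -> 16, i.e. 1 * 2 * 1 * 4 * 1 * 2 tag sequences
```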
Apply Rules Eliminating Some POS
EngCG ENGTWOL Tagger
ENGTWOL Tagging: Stage 1
• First Stage: Run words through FST morphological
analyzer to get POS info from morph
• E.g.: Pavlov had shown that salivation …
Pavlov PAVLOV N NOM SG PROPER
had HAVE V PAST VFIN SVO
HAVE PCP2 SVO
shown SHOW PCP2 SVOO SVO SV
that ADV
PRON DEM SG
DET CENTRAL DEM SG
CS
salivation N NOM SG
ENGTWOL Tagging: Stage 2
• Second Stage: Apply NEGATIVE constraints
• E.g., Adverbial that rule
– Eliminate all readings of that except the one in It isn’t
that odd.
Given input: that
If
(+1 A/ADV/QUANT) ; if next word is adj/adv/quantifier
(+2 SENT-LIM) ; followed by E-O-S
(NOT -1 SVOC/A) ; and the previous word is not a verb like
consider which allows adjective
complements (e.g. I consider that odd)
Then eliminate non-ADV tags
Else eliminate ADV
Transformation-Based (Brill) Tagging
Transformation-Based Tagging
• Basic Idea: Strip tags from a tagged corpus and try to learn them by rule application
– For untagged text, first initialize with the most probable tag for each word
– Change tags according to the best rewrite rule, e.g. "if word-1 is a determiner and word is a verb then change the tag to noun"
– Compare to the gold standard
– Iterate
• Rules are created via rule templates, e.g. of the form "if word-1 is an X and word is a Y then change the tag to Z"
– Find the rule that applies correctly to most tags and apply it
– Iterate on the newly tagged corpus until a threshold is reached
– Return the ordered set of rules
• NB: Rules may make errors that are corrected by later rules
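The rewrite step can be sketched as follows, using a hypothetical one-rule format loosely modeled on the slide's example "if word-1 is a determiner and word is a verb then change the tag to noun":

```python
def apply_rule(tags, from_tag, to_tag, prev_tag):
    """Change from_tag to to_tag wherever the previous tag is prev_tag."""
    out = list(tags)
    for i in range(1, len(out)):
        if out[i] == from_tag and out[i - 1] == prev_tag:
            out[i] = to_tag
    return out

# Initial most-probable tags for a determiner-noun pair tagged DT, VB by mistake.
tags = ["DT", "VB"]
print(apply_rule(tags, from_tag="VB", to_tag="NN", prev_tag="DT"))  # -> ['DT', 'NN']
```

TBL learning then just searches over instantiations of such templates for the rule that fixes the most errors.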
Sample TBL Rule Application
• 2 parts to a rule
– Triggering environment
– Rewrite rule
• The range of triggering environments of templates (from Manning & Schütze 1999:363):
[Table: nine schemas; each schema marks which of the surrounding tag positions t_{i-3} … t_{i+3} the rule's trigger tests.]
TBL Tagging Algorithm
TBL Issues
Methodology: Error Analysis
• Confusion matrix over tags (e.g. VB, TO, NN)
– Which tags did we most often confuse with which other tags?
– How much of the overall error does each confusion account for?
More Complex Issues
• RDF triples
• Freebase
• Semi-supervised learning
• Unsupervised learning
• Distant supervision
Using Patterns to Extract Relations
• Hearst patterns
• lexico-syntactic pattern
Relation Extraction via Supervised Learning
Feature-based supervised relation classifiers.
Example:
• American Airlines, a unit of AMR, immediately matched the move,
spokesman Tim Wagner said
• Syntactic structure
• syntactic path
• Constituent paths between M1 and M2
NP ↑ NP ↑ S ↑ S ↓ NP
• Dependency-tree paths
Airlines ←subj matched ←comp said →subj Wagner
Neural supervised relation classifiers
• TACRED relation extraction dataset
• SpanBERT
• hand-labeled data
• noisy-or
Distant Supervision for Relation Extraction
• Distant supervision
Unsupervised Relation Extraction
• open information extraction or Open IE
• For example, the ReVerb system extracts a relation from a sentence s in 4
steps:
• 1. Run a part-of-speech tagger and entity chunker over s
• 2. For each verb in s, find the longest sequence of words w that start with a
verb and satisfy syntactic and lexical constraints, merging adjacent matches.
• 3. For each phrase w, find the nearest noun phrase x to the left which is not a
relative pronoun, wh-word or existential “there”. Find the nearest noun
phrase y to the right.
• 4. Assign confidence c to the relation r = (x,w, y) using a confidence
classifier and return it
Evaluation of Relation Extraction
• Supervised
• Semi-supervised
• Unsupervised
• Tuples
• The estimated precision Pˆ is
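The formula itself did not survive extraction; the standard estimate (judging a random sample of the system's output tuples by hand) is presumably what is intended:

```latex
\hat{P} = \frac{\#\,\text{correct relations in the sample}}{\#\,\text{relations in the sample}}
```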
Extracting Times
• Temporal Expression Extraction
• Temporal Normalization
• Template Filling
• using the tf-idf values and spelling out the dot product as a sum of
products:
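The equation following this line was lost in extraction; spelled out as a sum of products, the dot product between query q and document d over their tf-idf values is presumably the standard one:

```latex
q \cdot d \;=\; \sum_{t} \text{tf-idf}(t, q) \times \text{tf-idf}(t, d)
```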
Inverted Index
• It consists of two parts:
• Dictionary
• Postings
Evaluation of Information-Retrieval Systems
• Precision and recall are then defined as:
• precision-recall curve
• interpolated precision
• mean average precision
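The definitions promised above were lost in extraction; the standard IR forms are:

```latex
\text{Precision} = \frac{\#(\text{relevant items retrieved})}{\#(\text{retrieved items})}
\qquad
\text{Recall} = \frac{\#(\text{relevant items retrieved})}{\#(\text{relevant items})}
```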
IR-based Factoid Question Answering
• IR-based QA
• retrieve and read
• reading comprehension
Entity Linking
• Entity linking
• Wikification
Gated Units, Layers and Networks
• Add gate
• Output gate
Thank you