Modular Comprehension

[Figure: pipeline of language-understanding modules: sound waves → Acoustic/Phonetics → words → Syntax → parse trees → Semantics → literal meaning → Pragmatics → meaning (contextualized)]

Ambiguity

• Natural language is highly ambiguous and must be disambiguated.
  – I saw the man on the hill with a telescope.
  – I saw the Grand Canyon flying to LA.
  – Time flies like an arrow.
  – Horse flies like a sugar cube.
  – Time runners like a coach.
  – Time cars like a Porsche.
• Many jokes rely on the ambiguity of language:
  – Groucho Marx: One morning I shot an elephant in my pajamas. How he got into my pajamas, I'll never know.
  – She criticized my apartment, so I knocked her flat.
  – Noah took all of the animals on the ark in pairs. Except the worms, they came in apples.
  – Policeman to little boy: "We are looking for a thief with a bicycle." Little boy: "Wouldn't you be better using your eyes?"
  – Why is the teacher wearing sun-glasses? Because the class is so bright.

• Ambiguity is the primary difference between natural and computer languages.
• Formal programming languages are designed to be unambiguous, i.e. they can be defined by a grammar that produces a unique parse for each sentence in the language.
• Programming languages are also designed for efficient (deterministic) parsing, i.e. they are deterministic context-free languages (DCFLs).
  – A sentence in a DCFL can be parsed in O(n) time, where n is the length of the string.
Syntactic Parsing

• Produce the correct syntactic parse tree for a sentence.

Context Free Grammars (CFG)

• N, a set of non-terminal symbols (or variables)
• Σ, a set of terminal symbols (disjoint from N)
• R, a set of productions or rules of the form A → β, where A is a non-terminal and β is a string of symbols from (Σ ∪ N)*
• S, a designated non-terminal called the start symbol
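The four components above fit in a tiny data structure. A minimal sketch in Python; the grammar fragment below (for "book that flight") is illustrative, not the lecture's exact rule set:

```python
# A CFG as (N, Sigma, R, S): RULES maps each non-terminal to the
# list of right-hand sides (productions A -> beta) it can rewrite to.
RULES = {
    "S":       [["NP", "VP"], ["VP"]],
    "NP":      [["Pronoun"], ["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP":      [["Verb", "NP"]],
    "Pronoun": [["I"]],
    "Det":     [["that"]],
    "Noun":    [["flight"]],
    "Verb":    [["book"]],
}
START = "S"                                   # S
NONTERMINALS = set(RULES)                     # N
TERMINALS = {"I", "that", "flight", "book"}   # Sigma, disjoint from N

def expand_leftmost(sentential_form):
    """Yield every sentential form reachable by rewriting the
    leftmost non-terminal with one of its productions."""
    for i, sym in enumerate(sentential_form):
        if sym in NONTERMINALS:
            for beta in RULES[sym]:
                yield sentential_form[:i] + beta + sentential_form[i + 1:]
            return  # only the leftmost non-terminal is expanded

# One leftmost derivation step from the start symbol:
print(list(expand_leftmost([START])))  # [['NP', 'VP'], ['VP']]
```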
Spurious Ambiguity

• Most parse trees of most NL sentences make no sense.

Parsing

• Given a string of terminals and a CFG, determine if the string can be generated by the CFG.
  – Also return a parse tree for the string
  – Also return all possible parse trees for the string
• Must search the space of derivations for one that derives the given string.
  – Top-Down Parsing: Start searching the space of derivations from the start symbol.
  – Bottom-Up Parsing: Start searching the space of reverse derivations from the terminal symbols in the string.
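The top-down strategy described above can be written as a short recursive-descent parser. A sketch under an assumed toy grammar for "book that flight" (the rule set is illustrative):

```python
RULES = {
    "S":       [["NP", "VP"], ["Aux", "NP", "VP"], ["VP"]],
    "NP":      [["Pronoun"], ["Det", "Nominal"]],
    "Nominal": [["Noun"]],
    "VP":      [["Verb"], ["Verb", "NP"]],
}
LEXICON = {"book": "Verb", "that": "Det", "flight": "Noun"}

def parse(symbol, words, pos):
    """Top-down: try to derive a prefix of words[pos:] from `symbol`.
    Yields (tree, next_pos) for each successful analysis."""
    if symbol in RULES:  # non-terminal: try each production in order
        for rhs in RULES[symbol]:
            for children, end in parse_seq(rhs, words, pos):
                yield (symbol, children), end
    elif pos < len(words) and LEXICON.get(words[pos]) == symbol:
        yield (symbol, [words[pos]]), pos + 1  # pre-terminal matches a word

def parse_seq(symbols, words, pos):
    """Derive a sequence of symbols left to right."""
    if not symbols:
        yield [], pos
        return
    for tree, mid in parse(symbols[0], words, pos):
        for rest, end in parse_seq(symbols[1:], words, mid):
            yield [tree] + rest, end

words = "book that flight".split()
# Keep only analyses that consume the whole input:
trees = [t for t, end in parse("S", words, 0) if end == len(words)]
print(len(trees))  # 1
```

Expansions of S → NP VP and S → Aux NP VP fail on the first word ("book" is only a Verb here) and the search backtracks, exactly like the X-marked dead ends in the slides that follow.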
Top Down Parsing

[Figures: step-by-step top-down search for "book that flight". Expansions of S → NP VP (via NP → Pronoun, NP → ProperNoun, NP → Det Nominal) and S → Aux NP VP all fail on "book" (marked X); the search then tries S → VP, expanding VP → Verb and VP → Verb NP over "book", "that", …]
Top Down Parsing / Bottom Up Parsing

[Figures: the top-down search succeeds with S → VP, VP → Verb NP, Verb = "book", NP → Det Nominal, Det = "that", Nominal → Noun = "flight". Then a step-by-step bottom-up parse of the same string: partial structures such as Nominal → Noun and Nominal → Nominal PP are built over "flight", NP → Det Nominal over "that flight", and candidate analyses (e.g. "book" as a Noun, or a VP PP attachment) are proposed and pruned (marked X) until VP → Verb NP and S → VP succeed.]
Syntactic Parsing & Ambiguity

• Just produces all possible parse trees.
• Does not address the important issue of ambiguity resolution.

Statistical Parsing

• Statistical parsing uses a probabilistic model of syntax in order to assign probabilities to each parse tree.
• Provides a principled approach to resolving syntactic ambiguity.
• Allows supervised learning of parsers from treebanks of parse trees provided by human linguists.
• Also allows unsupervised learning of parsers from unannotated text, but the accuracy of such parsers has been limited.
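Under a PCFG, the probability a model assigns to a tree is the product of the probabilities of the rules used at its internal nodes. A minimal sketch using the example grammar from the slides that follow (tree encoding and the missing-rule handling for pre-terminals are assumptions):

```python
# Rule probabilities from the slides' example PCFG of English.
PROB = {
    ("S", ("NP", "VP")): 0.9,   ("S", ("VP",)): 0.1,
    ("NP", ("Det", "A", "N")): 0.5,
    ("NP", ("NP", "PP")): 0.3,  ("NP", ("PropN",)): 0.2,
    ("A", ()): 0.6,             ("A", ("Adj", "A")): 0.4,
    ("PP", ("Prep", "NP")): 1.0,
    ("VP", ("V", "NP")): 0.7,   ("VP", ("VP", "PP")): 0.3,
}

def tree_prob(tree):
    """P(tree) = product over internal nodes of P(rule at that node).
    A tree is (label, children); a leaf word is a bare string."""
    if isinstance(tree, str):
        return 1.0
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    if (label, rhs) not in PROB:  # pre-terminal over a word, e.g. (V, ['liked'])
        return 1.0
    p = PROB[(label, rhs)]
    for c in children:
        p *= tree_prob(c)
    return p

# "John liked the dog": S -> NP VP, NP -> PropN, VP -> V NP, NP -> Det A N, A -> eps
t = ("S", [("NP", [("PropN", ["John"])]),
           ("VP", [("V", ["liked"]),
                   ("NP", [("Det", ["the"]), ("A", []), ("N", ["dog"])])])])
print(round(tree_prob(t), 6))  # 0.9 * 0.2 * 0.7 * 0.5 * 0.6 = 0.0378
```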
Sentence Probability / Three Useful PCFG Tasks

[Figure: observation likelihood with the example PCFG of English:

  S → NP VP   0.9      NP → Det A N  0.5      A → ε       0.6      VP → V NP   0.7
  S → VP      0.1      NP → NP PP    0.3      A → Adj A   0.4      VP → VP PP  0.3
                       NP → PropN    0.2      PP → Prep NP 1.0

The model compares O1 = "The dog big barked." (rejected, X) against O2 = "The big dog barked": P(O2 | English) > P(O1 | English)? A PCFG parser can also return the most probable parse, e.g. for "John liked the dog in the pen".]
• What is the most probable derivation (parse tree) for a sentence?

Supervised PCFG Training

• If parse trees are provided for training sentences, a grammar and its parameters can all be estimated directly from counts accumulated from the tree-bank (with appropriate smoothing).

[Figure: a tree bank containing sentences such as "John liked the dog in the pen." and "John put the dog in the pen." with their parse trees is fed to supervised PCFG training, which outputs the example grammar with its rule probabilities (S → NP VP 0.9, S → VP 0.1, NP → Det A N 0.5, NP → NP PP 0.3, NP → PropN 0.2, A → ε 0.6, …).]
Estimating Production Probabilities

• The set of production rules can be taken directly from the set of rewrites in the treebank.
• Parameters can be directly estimated from frequency counts in the treebank.

PCFG: Maximum Likelihood Training

• Given a set of sentences, induce a grammar that maximizes the probability that this data was generated from this grammar.
• Assume the number of non-terminals in the grammar is specified.
• Only need an unannotated set of sequences generated from the model. Does not need correct parse trees for these sentences. In this sense, it is unsupervised.
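The supervised case above amounts to relative-frequency estimation: P(A → β) = count(A → β) / count(A). A sketch (the two-tree toy treebank is illustrative, not WSJ data):

```python
from collections import Counter

def count_rules(tree, counts):
    """Accumulate (lhs, rhs) counts over every internal node of a
    (label, children) tree; leaf words are bare strings."""
    label, children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    counts[(label, rhs)] += 1
    for c in children:
        if not isinstance(c, str):
            count_rules(c, counts)

def mle_probs(treebank):
    """P(A -> beta) = count(A -> beta) / count(A), no smoothing."""
    counts = Counter()
    for tree in treebank:
        count_rules(tree, counts)
    lhs_totals = Counter()
    for (lhs, _), n in counts.items():
        lhs_totals[lhs] += n
    return {rule: n / lhs_totals[rule[0]] for rule, n in counts.items()}

# Toy treebank: "John ran." parsed as S -> NP VP, and "run" as S -> VP.
treebank = [
    ("S", [("NP", [("PropN", ["John"])]), ("VP", [("V", ["ran"])])]),
    ("S", [("VP", [("V", ["run"])])]),
]
probs = mle_probs(treebank)
print(probs[("S", ("NP", "VP"))])  # 0.5  (S rewrote to NP VP in 1 of 2 cases)
```

A real estimator would add smoothing, as the slide notes, so that rules unseen in the treebank do not get probability zero.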
[Figures: the example PCFG parses "John put the dog in the pen.": one candidate tree attaches "in the pen" as a PP under the VP, the other (marked X) attaches the PP inside the NP; the grammar's rule probabilities decide between the two analyses.]
Treebanks

• English Penn Treebank: Standard corpus for testing syntactic parsing; consists of 1.2 M words of text from the Wall Street Journal (WSJ).
• Typical to train on about 40,000 parsed sentences and test on an additional standard disjoint test set of 2,416 sentences.
• Chinese Penn Treebank: 100K words from the Xinhua news service.
• Other corpora exist in many languages; see the Wikipedia article "Treebank".

First WSJ Sentence

( (S
    (NP-SBJ
      (NP (NNP Pierre) (NNP Vinken) )
      (, ,)
      (ADJP
        (NP (CD 61) (NNS years) )
        (JJ old) )
      (, ,) )
    (VP (MD will)
      (VP (VB join)
        (NP (DT the) (NN board) )
        (PP-CLR (IN as)
          (NP (DT a) (JJ nonexecutive) (NN director) ))
        (NP-TMP (NNP Nov.) (CD 29) )))
    (. .) ))
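Bracketed treebank trees like this can be read with a few lines of recursive parsing. A sketch, run on a shortened (illustrative) version of the Pierre Vinken tree:

```python
import re

def read_tree(s):
    """Parse one Penn-Treebank-style bracketed string into nested
    (label, children) tuples; leaf tokens become bare strings."""
    tokens = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0
    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = ""
        if tokens[pos] not in "()":   # optional node label
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:                     # leaf word
                children.append(tokens[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)
    return node()

def words(tree):
    """Left-to-right terminal yield of the tree."""
    label, children = tree
    out = []
    for c in children:
        out += [c] if isinstance(c, str) else words(c)
    return out

wsj = "( (S (NP-SBJ (NNP Pierre) (NNP Vinken)) (VP (MD will) (VB join)) (. .)))"
tree = read_tree(wsj)
print(words(tree))  # ['Pierre', 'Vinken', 'will', 'join', '.']
```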
[Example parse evaluation: Recall = 10/12 = 83.3%, Precision = 10/12 = 83.3%, F1 = 83.3%]
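Figures like these come from PARSEVAL-style evaluation: the labeled constituents of the proposed parse are compared against those of the gold-standard tree. A sketch of the metric computation (the constituent sets below are illustrative, chosen to reproduce the 10/12 numbers):

```python
def parseval(gold, proposed):
    """Constituents are (label, start, end) spans. Precision: fraction of
    proposed constituents present in gold; recall: fraction of gold
    constituents that were proposed; F1: their harmonic mean."""
    correct = len(gold & proposed)
    precision = correct / len(proposed)
    recall = correct / len(gold)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# 12 gold spans, 12 proposed spans, 10 in common -> 83.3% across the board.
gold = {("C%d" % i, i, i + 1) for i in range(12)}
proposed = {("C%d" % i, i, i + 1) for i in range(10)} | {("X", 0, 5), ("Y", 1, 6)}
p, r, f1 = parseval(gold, proposed)
print("%.1f%% %.1f%% %.1f%%" % (100 * p, 100 * r, 100 * f1))  # 83.3% 83.3% 83.3%
```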
Ambiguity Resolution is Required for Translation

• Syntactic and semantic ambiguities must be properly resolved for correct translation:
  – "John plays the guitar." → "John toca la guitarra."
  – "John plays soccer." → "John juega el fútbol."
• An apocryphal story is that an early MT system gave the following results when translating from English to Russian and then back to English:
  – "The spirit is willing but the flesh is weak." ⇒ "The liquor is good but the meat is spoiled."
  – "Out of sight, out of mind." ⇒ "Invisible idiot."

Word Sense Disambiguation (WSD) as Text Categorization

• Each sense of an ambiguous word is treated as a category.
  – "play" (verb)
    • play-game
    • play-instrument
    • play-role
  – "pen" (noun)
    • writing-instrument
    • enclosure
• Treat the current sentence (or preceding and current sentence) as a document to be classified.
  – "play":
    • play-game: "John played soccer in the stadium on Friday."
    • play-instrument: "John played guitar in the band on Friday."
    • play-role: "John played Hamlet in the theater on Friday."
  – "pen":
    • writing-instrument: "John wrote the letter with a pen in New York."
    • enclosure: "John put the dog in the pen in New York."
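Casting WSD as text categorization means any bag-of-words classifier applies. A minimal Naïve Bayes sketch; the training data echoes the slide's "play" examples, while the tokenization and add-one smoothing are assumed choices:

```python
import math
from collections import Counter, defaultdict

def train(labeled_sentences):
    """Per-sense word counts and sense priors from (sense, sentence) pairs."""
    word_counts = defaultdict(Counter)
    sense_counts = Counter()
    for sense, sentence in labeled_sentences:
        sense_counts[sense] += 1
        word_counts[sense].update(sentence.lower().split())
    return word_counts, sense_counts

def classify(sentence, word_counts, sense_counts):
    """Pick the sense maximizing log P(sense) + sum log P(word | sense),
    with add-one smoothing over the training vocabulary."""
    vocab = {w for c in word_counts.values() for w in c}
    best, best_lp = None, float("-inf")
    for sense in sense_counts:
        lp = math.log(sense_counts[sense] / sum(sense_counts.values()))
        total = sum(word_counts[sense].values())
        for w in sentence.lower().split():
            lp += math.log((word_counts[sense][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = sense, lp
    return best

data = [
    ("play-game", "John played soccer in the stadium on Friday"),
    ("play-instrument", "John played guitar in the band on Friday"),
    ("play-role", "John played Hamlet in the theater on Friday"),
]
wc, sc = train(data)
print(classify("Mary played drums with the band", wc, sc))  # play-instrument
```

With a three-sentence training set the decision hinges entirely on "band"; real WSD systems train on many labeled contexts per sense.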
Learning Algorithms

• Naïve Bayes
  – Binary features
• K Nearest Neighbor
  – Simple instance-based algorithm with k=3 and Hamming distance
• Perceptron
  – Simple neural-network algorithm
• C4.5
  – State-of-the-art decision-tree induction algorithm
• PFOIL-DNF
  – Simple logical rule learner for Disjunctive Normal Form
• PFOIL-CNF
  – Simple logical rule learner for Conjunctive Normal Form
• PFOIL-DLIST
  – Simple logical rule learner for decision-lists of conjunctive rules

Learning Curves for WSD of “line”

[Figure: learning curves comparing the above algorithms on WSD of "line".]
Discussion of Learning Curves for WSD of “line”

• Naïve Bayes and Perceptron give the best results.
• Both use a weighted linear combination of evidence from many features.
• Symbolic systems that try to find a small set of relevant features tend to overfit the training data and are not as accurate.
• The nearest-neighbor method, which weights all features equally, is also not as accurate.
• Of the symbolic systems, decision lists work the best.

Other Syntactic Tasks
Part Of Speech (POS) Tagging

• Annotate each word in a sentence with a part-of-speech.

  I   ate the spaghetti with meatballs.
  Pro V   Det N         Prep N

  John saw the saw and decided to   take it  to   the table.
  PN   V   Det N   Con V       Part V    Pro Prep Det N

• Useful for subsequent syntactic parsing and word sense disambiguation.

Phrase Chunking

• Find all non-recursive noun phrases (NPs) and verb phrases (VPs) in a sentence.
  – [NP I] [VP ate] [NP the spaghetti] [PP with] [NP meatballs].
  – [NP He] [VP reckons] [NP the current account deficit] [VP will narrow] [PP to] [NP only # 1.8 billion] [PP in] [NP September].
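Even the simplest data-driven tagger illustrates the task: assign each word the tag it bears most often in training data (a common POS-tagging baseline, not a method from the slides). A sketch with an illustrative one-sentence training set:

```python
from collections import Counter, defaultdict

def train_tagger(tagged_sentences):
    """Most-frequent-tag baseline: map each word (lowercased) to the
    tag it carries most often in the training data."""
    counts = defaultdict(Counter)
    for sent in tagged_sentences:
        for word, tag in sent:
            counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def tag(sentence, model, default="N"):
    """Tag each word; unknown words fall back to a default tag."""
    return [(w, model.get(w.lower(), default)) for w in sentence.split()]

train_data = [[("I", "Pro"), ("ate", "V"), ("the", "Det"),
               ("spaghetti", "N"), ("with", "Prep"), ("meatballs", "N")]]
model = train_tagger(train_data)
print(tag("I ate the meatballs", model))
# [('I', 'Pro'), ('ate', 'V'), ('the', 'Det'), ('meatballs', 'N')]
```

The baseline cannot handle the "saw ... saw" example, where the same word needs different tags in one sentence; that is what sequence models address.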
Textual Entailment Problems from PASCAL Challenge

[Table: TEXT / HYPOTHESIS / ENTAILMENT example pairs from the PASCAL challenge.]
Anaphora Resolution / Co-Reference

• Determine which phrases in a document refer to the same underlying entity.
  – John put the carrot on the plate and ate it.
  – Bush started the war in Iraq. But the president needed the consent of Congress.
• Some cases require difficult reasoning.
  – Today was Jack's birthday. Penny and Janet went to the store. They were going to get presents. Janet decided to get a kite. "Don't do that," said Penny. "Jack has a kite. He will make you take it back."

Ellipsis Resolution

• Frequently words and phrases are omitted from sentences when they can be inferred from context.
  – "Wise men talk because they have something to say; fools, because they have to say something." (Plato)
Question Answering

• Directly answer natural language questions based on information presented in a corpus of textual documents (e.g. the web).
  – When was Barack Obama born? (factoid)
    • August 4, 1961
  – Who was president when Barack Obama was born?
    • John F. Kennedy
  – How many presidents have there been since Barack Obama was born?
    • 9

Text Summarization

• Produce a short summary of a longer document or article.
  – Article: With a split decision in the final two primaries and a flurry of superdelegate endorsements, Sen. Barack Obama sealed the Democratic presidential nomination last night after a grueling and history-making campaign against Sen. Hillary Rodham Clinton that will make him the first African American to head a major-party ticket. Before a chanting and cheering audience in St. Paul, Minn., the first-term senator from Illinois savored what once seemed an unlikely outcome to the Democratic race with a nod to the marathon that was ending and to what will be another hard-fought battle, against Sen. John McCain, the presumptive Republican nominee….
  – Summary: Senator Barack Obama was declared the presumptive Democratic presidential nominee.
Machine Translation

• Translate a sentence from one natural language to another.
  – Hasta la vista, bebé ⇒ Until we see each other again, baby.

Conclusions

• The need for disambiguation makes language understanding difficult.
• Levels of linguistic processing:
  – Syntax
  – Semantics
  – Pragmatics
• CFGs can be used to parse natural language but produce many spurious parses.
• Statistical learning methods can be used to:
  – Automatically learn grammars from (annotated) corpora.
  – Compute the most likely interpretation based on a learned statistical model.