
Natural Language Processing

Ambiguity

NLP Tasks
• NLP applications require several NLP analyses:
– Word tokenization
– Sentence boundary detection
– Part-of-speech (POS) tagging
• to identify the part-of-speech (e.g. noun, verb) of each word
– Named Entity (NE) recognition
• to identify proper nouns (e.g. names of persons, locations,
organizations; domain terminology)
– Parsing
• to identify the syntactic structure of a sentence
– Semantic analysis
• to derive the meaning of a sentence

1. Part-Of-Speech (POS) Tagging
• POS tagging is a process of assigning a POS or lexical
class marker to each word in a sentence (and all
sentences in a corpus).

Input: the lead paint is unsafe


Output: the/Det lead/N paint/N is/V unsafe/Adj
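
As a rough illustration of the input/output pair above, the sketch below implements a most-frequent-tag baseline: each word receives the tag it was seen with most often. The `TAG_COUNTS` table and its numbers are invented for this example and do not come from any real corpus.

```python
# Hypothetical tag counts, as if collected from a small hand-tagged corpus.
# The numbers are made up purely for illustration.
TAG_COUNTS = {
    "the": {"Det": 100},
    "lead": {"N": 7, "V": 3},
    "paint": {"N": 6, "V": 4},
    "is": {"V": 50},
    "unsafe": {"Adj": 5},
}


def tag(sentence):
    """Tag each whitespace-separated word with its most frequent tag."""
    tagged = []
    for word in sentence.lower().split():
        counts = TAG_COUNTS.get(word)
        # Unknown words get a placeholder tag instead of a guess.
        best = max(counts, key=counts.get) if counts else "UNK"
        tagged.append(f"{word}/{best}")
    return " ".join(tagged)


print(tag("the lead paint is unsafe"))
```

This baseline ignores context entirely, so a word such as "lead" always gets the same tag whether it is used as a noun or a verb.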

Syntactic Analysis - Grammar
• sentence -> noun_phrase, verb_phrase
• noun_phrase -> proper_noun
• noun_phrase -> determiner, noun
• verb_phrase -> verb, noun_phrase
• proper_noun -> [mary]
• noun -> [apple]
• verb -> [ate]
• determiner -> [the]
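
The grammar above is small enough to run directly. The following is a minimal sketch of a top-down (recursive-descent) parser over exactly these rules; the function names `parse`, `expand` and `parse_sentence` are chosen for this example. It works here because the grammar has no left-recursive rules.

```python
# The rules from the slide, written as: nonterminal -> list of right-hand sides.
GRAMMAR = {
    "sentence": [["noun_phrase", "verb_phrase"]],
    "noun_phrase": [["proper_noun"], ["determiner", "noun"]],
    "verb_phrase": [["verb", "noun_phrase"]],
    "proper_noun": [["mary"]],
    "noun": [["apple"]],
    "verb": [["ate"]],
    "determiner": [["the"]],
}


def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way `symbol` derives tokens[pos:next_pos]."""
    if symbol not in GRAMMAR:  # a terminal: must match the next word
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, tokens, pos, [])


def expand(symbol, rhs, tokens, pos, children):
    """Match the remaining right-hand-side symbols one after another."""
    if not rhs:
        yield (symbol, children), pos
        return
    for child, nxt in parse(rhs[0], tokens, pos):
        yield from expand(symbol, rhs[1:], tokens, nxt, children + [child])


def parse_sentence(text):
    """Return all parse trees that cover the whole input."""
    tokens = text.lower().split()
    return [tree for tree, end in parse("sentence", tokens, 0) if end == len(tokens)]


print(parse_sentence("mary ate the apple"))
```

Each tree is a nested `(label, children)` tuple; an input the grammar does not cover, such as "mary ate", yields an empty list.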
2. Named Entity Recognition (NER)
• NER processes a text and identifies the named entities in a
sentence
– e.g. “U.N. official Ekeus heads for Baghdad.”
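
One of the simplest ways to approximate NER is a gazetteer (name list) lookup. The sketch below applies that idea to the example sentence; the `GAZETTEER` entries and the labels ORG, PER and LOC are assumptions made for this illustration, and practical systems usually rely on trained sequence models rather than a fixed list.

```python
# Hypothetical gazetteer: surface form -> entity type (assumed labels).
GAZETTEER = {
    "U.N.": "ORG",
    "Ekeus": "PER",
    "Baghdad": "LOC",
}


def find_entities(sentence):
    """Return (mention, type) pairs for tokens found in the gazetteer."""
    entities = []
    for token in sentence.split():
        # Try the raw token first so abbreviations like "U.N." keep their
        # periods, then retry with trailing punctuation stripped.
        for candidate in (token, token.rstrip(".,;:!?")):
            if candidate in GAZETTEER:
                entities.append((candidate, GAZETTEER[candidate]))
                break
    return entities


print(find_entities("U.N. official Ekeus heads for Baghdad."))
```

A lookup like this cannot handle unseen names or multi-word mentions, which is one reason NER is treated as a learning problem.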

3. Shallow Parsing
• Shallow (or Partial) parsing identifies the (base) syntactic phrases in
a sentence.

[NP He] [V saw] [NP the big dog]
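
A minimal sketch of how such base phrases might be found from POS-tagged input: pronouns form an NP on their own, and an optional determiner followed by adjectives and one or more nouns is grouped into an NP. The tag names (`Pron`, `Det`, `Adj`, `N`, `V`) and this NP pattern are simplifying assumptions for the example.

```python
def chunk(tagged):
    """Group (word, tag) pairs into flat, non-overlapping base chunks."""
    chunks, i = [], 0
    while i < len(tagged):
        word, tag = tagged[i]
        if tag == "Pron":
            chunks.append(("NP", [word]))
            i += 1
        elif tag in ("Det", "Adj", "N"):
            j = i
            if tagged[j][1] == "Det":  # optional determiner
                j += 1
            while j < len(tagged) and tagged[j][1] == "Adj":  # any adjectives
                j += 1
            k = j
            while k < len(tagged) and tagged[k][1] == "N":  # one or more nouns
                k += 1
            if k > j:  # a noun was found, so this span is a base NP
                chunks.append(("NP", [w for w, _ in tagged[i:k]]))
                i = k
            else:  # no noun follows: leave the word as its own chunk
                chunks.append((tag, [word]))
                i += 1
        else:
            chunks.append((tag, [word]))
            i += 1
    return chunks


def show(chunks):
    """Render chunks in bracket notation."""
    return " ".join(f"[{label} {' '.join(words)}]" for label, words in chunks)


tagged = [("He", "Pron"), ("saw", "V"), ("the", "Det"), ("big", "Adj"), ("dog", "N")]
print(show(chunk(tagged)))
```

Unlike a full parser, the chunker never builds nested structure; it only marks flat phrase boundaries.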

• After NEs are identified, dependency parsing is often applied to
extract the syntactic/dependency relations between the NEs.

[PER Bill Gates] founded [ORG Microsoft].

Dependency relations (head: found):
nsubj(found, Bill Gates)
dobj(found, Microsoft)


4. Information Extraction (IE)

• Identify specific pieces of information (data) in an
unstructured or semi-structured text
• Transform unstructured information in a corpus of texts
or web pages into a structured database (or templates)
• Applied to various types of text, e.g.
– Newspaper articles
– Scientific articles
– Web pages
– etc.

Source: J. Choi, CSE842, MSU
Bridgestone Sports Co. said Friday it had set up a joint venture in Taiwan with a
local concern and a Japanese trading house to produce golf clubs to be supplied
to Japan.
The joint venture, Bridgestone Sports Taiwan Co., capitalized at 20 million new
Taiwan dollars, will start production in January 1990 with production of 20,000
iron and “metal wood” clubs a month.
Template filling:

TIE-UP-1
Relationship: TIE-UP
Entities: “Bridgestone Sports Co.”
          “a local concern”
          “a Japanese trading house”
Joint Venture Company: “Bridgestone Sports Taiwan Co.”
Activity: ACTIVITY-1
Amount: NT$20000000

ACTIVITY-1
Activity: PRODUCTION
Company: “Bridgestone Sports Taiwan Co.”
Product: “iron and ‘metal wood’ clubs”
Start Date: DURING: January 1990
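
To make the idea of template filling concrete, here is a small sketch that fills two slots of the activity template with hand-written regular expressions. The patterns, the `Activity` class and the hard-coded "PRODUCTION" label are assumptions for this one article; they are not a general extraction method.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Second sentence of the article above (quotes around 'metal wood' simplified).
ARTICLE = (
    "The joint venture, Bridgestone Sports Taiwan Co., capitalized at "
    "20 million new Taiwan dollars, will start production in January 1990 "
    "with production of 20,000 iron and 'metal wood' clubs a month."
)

# Hand-written patterns for this example only.
COMPANY = re.compile(r"The joint venture, ([^,]+),")
START_DATE = re.compile(r"start production in ([A-Z][a-z]+ \d{4})")


@dataclass
class Activity:
    activity: str
    company: Optional[str]
    start_date: Optional[str]


def fill_activity(text):
    """Fill the template slots that the patterns can find; others stay None."""
    company = COMPANY.search(text)
    start = START_DATE.search(text)
    return Activity(
        activity="PRODUCTION",  # assumed fixed for this sketch
        company=company.group(1) if company else None,
        start_date=start.group(1) if start else None,
    )


print(fill_activity(ARTICLE))
```

The structured record is what would end up as a row in the target database; unmatched slots are left empty rather than guessed.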

But NLP is very hard
• Understanding natural languages is hard …
because of inherent ambiguity
• Engineering NLP systems is also hard …
because of:
– Huge amount of data resources needed (e.g.
grammar, dictionary, documents to extract
statistics from)
– Computational complexity (intractable) of
analyzing a sentence

Why is NL Understanding hard?
• Natural language is extremely rich in form and structure,
and very ambiguous.
– How to represent meaning,
– Which structures map to which meaning structures.
• One input can mean many different things. Ambiguity
can be at different levels.
– Lexical (word level) ambiguity -- different meanings of words
– Syntactic ambiguity -- different ways to parse the sentence
– Interpreting partial information -- how to interpret pronouns
– Contextual information -- context of the sentence may affect the
meaning of that sentence.
• Many inputs can mean the same thing.
• Interaction among components of the input is not clear.
Ambiguity (1)

“Get the cat with the gloves.”

Ambiguity

Knowledge of Language
• Phonology – concerns how words are related to the sounds that
realize them.

• Morphology – concerns how words are constructed from more
basic meaning units called morphemes. A morpheme is the primitive
unit of meaning in a language.

• Syntax – concerns how words can be put together to form correct
sentences and determines what structural role each word plays in
the sentence and what phrases are subparts of other phrases.

• Semantics – concerns what words mean and how these meanings
combine in sentences to form sentence meaning. The study of
context-independent meaning.

Phonology
• Red and Read
• Flower and Flour
• I and Eye
• Write and Right
• Knows and Nose
• Hear and Here
• Weight and Wait
• A part and Apart
• Piece and Peace
• ate and eight

What is Morphology?

-ing

bat bats rat rats


write writer browse browser
Morphology for NLP
• Machine Translation
– Analyze → Transfer → Generate: the source word is analyzed into
features (Noun, Direct Case, Plural), and the target word is
generated from the same features

• Information Retrieval
– goose and geese are two words referring to the same root goose
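
The retrieval point above can be sketched as follows: documents and queries are both reduced to a root form before indexing, so a query for "geese" also finds documents containing "goose". The `IRREGULAR` table and the plural-stripping rule are deliberately crude assumptions; real morphological analyzers are far more thorough.

```python
from collections import defaultdict

# Tiny table of irregular plurals (illustrative, not exhaustive).
IRREGULAR = {"geese": "goose", "mice": "mouse"}


def lemma(word):
    """Map a word to a rough root form."""
    w = word.lower()
    if w in IRREGULAR:
        return IRREGULAR[w]
    # Crude regular-plural rule: strip a final "s" (but not "ss").
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        return w[:-1]
    return w


def build_index(docs):
    """Build an inverted index from root form to the set of document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[lemma(word)].add(doc_id)
    return index


def search(index, query):
    """Look up a one-word query after reducing it to its root form."""
    return index.get(lemma(query), set())


DOCS = {1: "the geese flew south", 2: "a goose chased the bats"}
INDEX = build_index(DOCS)
print(search(INDEX, "geese"))
```

Without the `lemma` step, the query "geese" would match only document 1.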
Morphemes
• Smallest meaning bearing units constituting a word
Classes of Morphology

• Inflection??
• Derivation??
Knowledge of Language (cont.)
• Pragmatics – concerns how sentences are used in different
situations and how use affects the interpretation of the sentence.

• Discourse – concerns how the immediately preceding sentences
affect the interpretation of the next sentence. For example,
interpreting pronouns and interpreting the temporal aspects of the
information.

• World Knowledge – includes general knowledge about the world.
What each language user must know about the other’s beliefs and
goals.

Ambiguity (2)
“I made her duck”
1. I cooked waterfowl for her benefit (to eat)
2. I cooked waterfowl belonging to her
3. I created the duck she owns
4. I caused her to quickly lower her head or body
5. I used magic and turned her into a duck.
• duck – morphologically and syntactically ambiguous:
noun or verb.
• her – syntactically ambiguous: dative or possessive.
• make – semantically ambiguous: cook or create.
• make – syntactically ambiguous:
– Transitive – takes a direct object.
– Ditransitive – takes two objects.
– Takes a direct object and a verb.

Ambiguity is Pervasive
• Phonetics: all of these sound like “I made her duck”
– I mate or duck
– I’m eight or duck
– Eye maid; her duck
– Aye mate, her duck
– I maid her duck
– I’m aid her duck
– I mate her duck
– I’m ate her duck
– I’m ate or duck

• Lexical category (part-of-speech)
– “duck” as a noun or a verb
• Lexical Semantics (word meaning)
– “duck” as an animal or a plaster duck statue
• Compound nouns
– e.g. “dog food”, “Intelligent design scores …”
• Syntactic ambiguity

“I saw a man on the hill with a telescope”

• [But semantics can sometimes help disambiguate]

“I saw a man on the hill with a hat”
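
To see how quickly syntactic ambiguity grows, the sketch below counts the parse trees for the telescope sentence with a CKY-style chart. The toy grammar (in Chomsky normal form) and the lexicon are assumptions written for this example only.

```python
from collections import defaultdict

# Toy binary rules: (parent, left child, right child).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # PP attaches to a noun phrase
    ("PP", "P", "NP"),
]
LEXICON = {
    "i": ["NP"], "saw": ["V"], "a": ["Det"], "the": ["Det"],
    "man": ["N"], "hill": ["N"], "telescope": ["N"], "hat": ["N"],
    "on": ["P"], "with": ["P"],
}


def count_parses(sentence):
    """Count distinct parse trees rooted in S using a CKY chart."""
    words = sentence.lower().split()
    n = len(words)
    chart = defaultdict(int)  # (start, end, symbol) -> number of trees
    for i, word in enumerate(words):
        for tag in LEXICON.get(word, []):
            chart[(i, i + 1, tag)] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for split in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    l = chart.get((start, split, left), 0)
                    r = chart.get((split, end, right), 0)
                    if l and r:
                        chart[(start, end, parent)] += l * r
    return chart.get((0, n, "S"), 0)


print(count_parses("I saw a man on the hill with a telescope"))
```

Under this grammar the sentence has 5 parses, and replacing "telescope" with "hat" gives the same count: the grammar alone cannot tell the two apart, so choosing the sensible reading requires semantic knowledge.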


• NLU VS NLG
Syntax + Semantics

Topics: Linguistics
• Word-level processing
• Syntactic processing
• Lexical and compositional semantics
• Discourse structure

Natural Language Understanding
Words

Morphological Analysis
Morphologically analyzed words (another step: POS tagging)
Syntactic Analysis
Syntactic Structure
Semantic Analysis
Context-independent meaning representation
Discourse Processing
Final meaning representation

Natural Language Generation
Meaning representation
Utterance Planning
Meaning representations for sentences
Sentence Planning and Lexical Choice
Syntactic structures of sentences with lexical choices
Sentence Generation
Morphologically analyzed words
Morphological Generation
Words

Different Levels of Linguistic Analysis
• Phonology
– Speech audio signal to phonemes
• Morphology
– Inflection (e.g. “I”, “my”, “me”; “eat”, “eats”, “ate”, “eaten”)
– Derivation (e.g. “teach”, “teacher”, “nominate”, “nominee”)
• Syntax
– Part-of-speech (noun, verb, adjective, preposition, etc.)
– Phrase structure (e.g. noun phrase, verb phrase)
• Semantics
– Meaning of a word (e.g. “book” as a bound volume or an
accounting ledger) or a sentence
• Discourse
– Meaning and inter-relation between sentences

Topics: Techniques

• Finite-state methods
• Context-free methods
• Probabilistic models
• Neural network models
• Supervised machine learning methods

Process Pipeline
• Phonology
• Morphology
• Syntax
• Semantics
• Pragmatics
• Discourse

Each kind of knowledge has associated with it an encapsulated
set of processes that make use of it. Interfaces are defined that
allow the various levels to communicate. This often leads to a
pipeline architecture.

Morphological Processing → Syntactic Analysis → Semantic Interpretation → Context

Dealing with Ambiguity
Four possible approaches:
1. Formal approaches -- Tightly coupled
interaction among processing levels;
knowledge from other levels can help decide
among choices at ambiguous levels.
2. Pipeline processing that ignores ambiguity as
it occurs and hopes that other levels can
eliminate incorrect structures.
3. Probabilistic approaches based on making
the most likely choices
4. Don’t do anything, maybe it won’t matter
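
As one illustration of the probabilistic approach (item 3), the sketch below uses the Viterbi algorithm to pick the most likely tag sequence for a sentence under a hidden Markov model. All probabilities in `START`, `TRANS` and `EMIT` are invented toy numbers, not estimates from data.

```python
TAGS = ["Det", "N", "V", "Adj"]

# Toy model parameters (made up for this example).
START = {"Det": 0.6, "N": 0.3, "V": 0.05, "Adj": 0.05}
TRANS = {
    "Det": {"N": 0.7, "Adj": 0.25, "V": 0.05},
    "N": {"N": 0.4, "V": 0.4, "Adj": 0.1, "Det": 0.1},
    "V": {"Det": 0.4, "N": 0.2, "Adj": 0.3, "V": 0.1},
    "Adj": {"N": 0.8, "Adj": 0.1, "V": 0.05, "Det": 0.05},
}
EMIT = {
    "Det": {"the": 1.0},
    "N": {"lead": 0.5, "paint": 0.5},
    "V": {"lead": 0.3, "paint": 0.3, "is": 0.4},
    "Adj": {"unsafe": 1.0},
}


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy model."""
    # best[t][tag] = (probability of the best path ending in tag at t, previous tag)
    best = [{}]
    for tag in TAGS:
        best[0][tag] = (START.get(tag, 0.0) * EMIT[tag].get(words[0], 0.0), None)
    for t in range(1, len(words)):
        best.append({})
        for tag in TAGS:
            prob, prev = max(
                (best[t - 1][p][0] * TRANS[p].get(tag, 0.0), p) for p in TAGS
            )
            best[t][tag] = (prob * EMIT[tag].get(words[t], 0.0), prev)
    # Follow the back-pointers from the best final tag.
    last = max(TAGS, key=lambda tag: best[-1][tag][0])
    path = [last]
    for t in range(len(words) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    return list(reversed(path))


print(viterbi("the lead paint is unsafe".split()))
```

The ambiguous words "lead" and "paint" are resolved by the surrounding tags rather than in isolation, which is the main difference from choosing each word's most frequent tag.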

Models and Algorithms
• By models we mean the formalisms that are used to
capture the various kinds of linguistic knowledge we
need.
• Algorithms are then used to manipulate the knowledge
representations needed to tackle the task at hand.

Various Algorithms
• In particular..
– State-space search
• To manage the problem of making choices during processing
when we lack the information needed to make the right choice
– Dynamic programming
• To avoid having to redo work during the course of a
state-space search
• Examples: CKY, Earley, Minimum Edit Distance, Viterbi, Baum-Welch
– Classifiers
• Machine learning based classifiers that are trained to make
decisions based on features extracted from the local context
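
Minimum edit distance is a compact example of the dynamic programming idea listed above: each cell of a table stores the cheapest way to turn a prefix of one string into a prefix of the other, so no subproblem is solved twice. The sketch below assumes unit cost for insertions and deletions and takes the substitution cost as a parameter named `sub_cost` (a name chosen for this example).

```python
def min_edit_distance(source, target, sub_cost=1):
    """Minimum edit distance with insert/delete cost 1 and substitution cost `sub_cost`."""
    n, m = len(source), len(target)
    # dist[i][j] = cost of turning source[:i] into target[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete every character of the source prefix
    for j in range(1, m + 1):
        dist[0][j] = j  # insert every character of the target prefix
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same = source[i - 1] == target[j - 1]
            dist[i][j] = min(
                dist[i - 1][j] + 1,  # deletion
                dist[i][j - 1] + 1,  # insertion
                dist[i - 1][j - 1] + (0 if same else sub_cost),  # match or substitution
            )
    return dist[n][m]


print(min_edit_distance("intention", "execution"))
```

The table has (n+1) × (m+1) cells and each is filled once, so the running time is proportional to the product of the two string lengths instead of growing exponentially as a naive search over edit sequences would.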

