Module 2 – Natural Language Processing
Paulo Gomes

DEI – FCTUC, 2006/2007

Paulo Gomes ATAI 06/07 1


Module Overview
• Introduction to Natural Language Processing (NLP)

• Morphological Analysis

• Syntactic Analysis

• Semantic Analysis

• Applications
Introduction to NLP
• Example:
– “But I have promises to keep, and miles to go before I sleep.”
[Miller, 2001]
– Counting the dictionary definitions of each word, and the number of
possible sense combinations accumulated so far:

Word        Definitions   Combinations
But         11            11
I           3             33
have        16            528
promises    7             3696
to          21            77616
keep        17            1319472
and         5             6597360
miles       5             32986800
to          21            692722800
go          29            20088961200
before      10            200889612000
I           3             602668836000
sleep       6             3616013016000
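Each entry in the Combinations column is the running product of the Definitions column. A minimal Python sketch that reproduces the column, with the counts copied from the table (the variable names are illustrative):

```python
from itertools import accumulate
from operator import mul

# (word, number of dictionary definitions), copied from the table
sense_counts = [
    ("But", 11), ("I", 3), ("have", 16), ("promises", 7), ("to", 21),
    ("keep", 17), ("and", 5), ("miles", 5), ("to", 21), ("go", 29),
    ("before", 10), ("I", 3), ("sleep", 6),
]

# Running product: number of sense combinations after each word
combinations = list(accumulate((n for _, n in sense_counts), mul))

for (word, n), total in zip(sense_counts, combinations):
    print(f"{word:<10}{n:>4}{total:>16}")
```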



Introduction to NLP
• How do humans manage this exponential
number of possible meanings?
• Answer – filtering by:
– Lexical Knowledge
– Syntactic Knowledge
– Semantic Knowledge
– Pragmatic Knowledge
–…



Introduction to NLP
• Another Example:
– Dave Bowman: “Open the pod bay doors, HAL”.
– HAL: “I'm sorry Dave, I'm afraid I can't do that”.
• from “2001: A Space Odyssey”

– What HAL must do to say this:
• Phonetics → signal analysis:
– Sound → Symbols → Reasoning → Symbols → Sound



Introduction to NLP
• What HAL must do to say this:
– Morphology → understand words:
• “pod bay doors” – meaning of words.
• “I’m” – contraction of “I am”, a morphological phenomenon.
– Syntax → understand the combination of words:
• “Open the pod bay doors, HAL” – is a valid sentence, with
clear roles for words.



Introduction to NLP
• What HAL must do to say this:
– Semantics → understand the meaning of sentences:
• “Open the pod bay doors, HAL” – identifying that this is a
request, and what is being requested.
– Pragmatics/Discourse → understand the request and take action
accordingly:
• “Open the pod bay doors, HAL” – a request for an action
that HAL does not want to perform.



Introduction to NLP
• Knowledge Categories:
– Phonetics and Phonology → language sounds.
– Morphology → words.
– Syntax → structural relations between words.
– Semantics → meaning of words and sentences.
– Pragmatics → how the language is used.
– Discourse → linguistic units larger than a single sentence
(anaphora, metaphor …).



Introduction to NLP
• Main difference between Natural Languages (NL) and Formal
Languages (FL):

Symbols in FL are unambiguous, whereas symbols in NL are often ambiguous.



Morphological Analysis

“The University of Coimbra is 700 years old.”

• Tokenization:
– The | University | of | Coimbra | is | 700 | years | old | .

– Problem:
• “University | of | Coimbra” is split into three tokens, but they
refer to the same entity!
• Identification of Compound Names (Named Entity Recognition).
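A minimal Python sketch of this step, assuming a regular-expression tokenizer and a hand-written list of known compound names (real Named Entity Recognition is much harder than a list lookup; `tokenize` and `merge_entities` are illustrative names):

```python
import re

def tokenize(text):
    """Split text into word, number and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

def merge_entities(tokens, entities):
    """Merge token sequences that match a known multi-word name."""
    merged, i = [], 0
    while i < len(tokens):
        for name in entities:
            parts = name.split()
            if tokens[i:i + len(parts)] == parts:
                merged.append(name)      # keep the name as one token
                i += len(parts)
                break
        else:
            merged.append(tokens[i])     # no name starts here
            i += 1
    return merged

sentence = "The University of Coimbra is 700 years old."
tokens = tokenize(sentence)
merged = merge_entities(tokens, ["University of Coimbra"])
print(" | ".join(tokens))
print(" | ".join(merged))
```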



Morphological Analysis

“The University of Coimbra is 700 years old.”

• Elimination of stop words:


– The | University | of | Coimbra | is | 700 | years | old | .

– University of Coimbra | is | 700 | years | old
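A minimal Python sketch of stop-word elimination. The stop list is an assumption, chosen so that the output matches the slide; real stop lists are much longer and usually also contain words such as "of" and "is":

```python
import string

# Hypothetical stop list (assumption, not taken from any real resource)
STOP_WORDS = {"the", "a", "an"}

def is_punctuation(token):
    """True when every character of the token is a punctuation mark."""
    return all(ch in string.punctuation for ch in token)

def remove_stop_words(tokens):
    """Drop stop words and punctuation-only tokens."""
    return [t for t in tokens
            if t.lower() not in STOP_WORDS and not is_punctuation(t)]

tokens = ["The", "University of Coimbra", "is", "700", "years", "old", "."]
filtered = remove_stop_words(tokens)
print(" | ".join(filtered))
```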



Morphological Analysis

“The University of Coimbra is 700 years old.”

• Stemming or Morphological Analysis:


– University of Coimbra | is | 700 | years | old

– University of Coimbra | be | 700 | year | old



Morphological Analysis
• Stemming or Morphological Analysis:
– Plurals:
• Years → year
– Verb forms:
• Has → have
– Compounding:
• Bookkeeper → book + keeper
– Word derivation:
• Prefixes:
– Unbuckle → buckle
• Suffixes:
– Shortness → short
• Circumfixes (very rare in English):
– enlighten
• Infixes (very rare in English):
– Piperidine
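A minimal Python sketch of two of these operations: reducing inflected forms with a lookup table plus a plural rule, and stripping derivational affixes. The tables cover only the examples in this module and are assumptions, not a real morphological analyzer:

```python
# Irregular forms need a lookup table; regular plurals can use a rule.
IRREGULAR = {"is": "be", "has": "have"}
PREFIXES = ("un",)       # e.g. unbuckle -> buckle
SUFFIXES = ("ness",)     # e.g. shortness -> short

def lemma(word):
    """Reduce an inflected word to its base form (very rough)."""
    lower = word.lower()
    if lower in IRREGULAR:
        return IRREGULAR[lower]
    if lower.endswith("s") and not lower.endswith("ss") and len(lower) > 3:
        return word[:-1]                 # years -> year
    return word

def strip_affixes(word):
    """Remove one known prefix and one known suffix, if present."""
    w = word.lower()
    for p in PREFIXES:
        if w.startswith(p) and len(w) > len(p) + 2:
            w = w[len(p):]
    for s in SUFFIXES:
        if w.endswith(s) and len(w) > len(s) + 2:
            w = w[:-len(s)]
    return w

tokens = ["University of Coimbra", "is", "700", "years", "old"]
lemmas = [lemma(t) for t in tokens]
print(" | ".join(lemmas))
print(strip_affixes("Unbuckle"), strip_affixes("Shortness"))
```

Rules this crude over-apply easily (for example, "bus" is protected only by the length check), which is why practical systems combine rules with a lexicon.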



Morphological Analysis

• Word/term identification.

• Finding the morphemes (constituents) of a word.



Syntactic Analysis
The | University of Coimbra | is | 700 | years | old | .

• Part-Of-Speech (POS) tagging:


– Identification of word lexical classes.

– The → determiner (DT)


– University of Coimbra → singular proper noun (NNP)
– is → verb 3rd person singular present (VBZ)
– 700 → cardinal number (CD)
– years → plural noun (NNS)
– old → adjective (JJ)
– . → sentence-final punctuation (.)
(ReBuilder TextToDiagram, OpenNLP)
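A minimal Python sketch of a POS tagger, assuming a tiny hand-written lexicon plus fallback rules. It only illustrates the input and output of the step; it is not how OpenNLP or any statistical tagger works:

```python
import re

# Toy lexicon covering only the example sentence (Penn Treebank tags)
LEXICON = {"the": "DT", "is": "VBZ", "old": "JJ", ".": "."}

def pos_tag(tokens):
    """Tag each token by lexicon lookup, then by simple fallback rules."""
    tagged = []
    for tok in tokens:
        if tok.lower() in LEXICON:
            tag = LEXICON[tok.lower()]
        elif re.fullmatch(r"\d+", tok):
            tag = "CD"                   # cardinal number
        elif tok[:1].isupper():
            tag = "NNP"                  # capitalized -> proper noun
        elif tok.endswith("s"):
            tag = "NNS"                  # plural noun
        else:
            tag = "NN"                   # default: singular noun
        tagged.append((tok, tag))
    return tagged

tokens = ["The", "University of Coimbra", "is", "700", "years", "old", "."]
tagged = pos_tag(tokens)
print(" | ".join(f"{w} [{t}]" for w, t in tagged))
```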
Syntactic Analysis
• Penn Treebank project Tags:



Syntactic Analysis
The [DT] | University of Coimbra [NNP] | is [VBZ] | 700 [CD]
| years [NNS] | old [JJ] | .

• Parsing:
– Full Parsing
• define the sentence structure using a parse tree.
– Shallow Parsing
• define the sentence structure using chunks (flat,
non-overlapping phrases).



Syntactic Analysis
The [DT] | University of Coimbra [NNP] | is [VBZ] | 700 [CD]
| years [NNS] | old [JJ] | .

• Full Parsing:

                                 S
              NP                                 VP
                                                  NP          ADJP
The/DT University of Coimbra/NNP   is/VBZ   700/CD years/NNS   old/JJ



Syntactic Analysis
The [DT] | University of Coimbra [NNP] | is [VBZ] | 700 [CD]
| years [NNS] | old [JJ] | .

• Shallow Parsing:
• [NP The/DT University of Coimbra/NNP ]
• [VP is/VBZ ]
• [NP 700/CD years/NNS ]
• [ADJP old/JJ ]
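A minimal Python sketch of shallow parsing, assuming each POS tag maps to a single chunk type and that neighbouring tokens of the same type form one chunk. This is a strong simplification (the `CHUNK_OF` table is an assumption covering only the example), but it reproduces the chunks listed above:

```python
from itertools import groupby

# Assumed mapping from POS tag to chunk type (example sentence only)
CHUNK_OF = {"DT": "NP", "NNP": "NP", "NN": "NP", "NNS": "NP", "CD": "NP",
            "VBZ": "VP", "JJ": "ADJP"}

def chunk(tagged):
    """Group consecutive tokens whose tags map to the same chunk type."""
    chunks = []
    for label, group in groupby(tagged, key=lambda wt: CHUNK_OF.get(wt[1])):
        if label is not None:            # unmapped tags (e.g. ".") are skipped
            chunks.append((label, list(group)))
    return chunks

tagged = [("The", "DT"), ("University of Coimbra", "NNP"), ("is", "VBZ"),
          ("700", "CD"), ("years", "NNS"), ("old", "JJ"), (".", ".")]
chunks = chunk(tagged)
for label, words in chunks:
    print(f"[{label} " + " ".join(f"{w}/{t}" for w, t in words) + " ]")
```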



Syntactic Analysis

• Identification of word/term POS class (POS tagging).

• Identification of sentence structure (parsing):
– Full Parsing
– Shallow Parsing



Semantic Analysis
• Syntax-Driven Semantic Analysis:
– Example:
• “Vegetarians eat fruit.”

      NP          VP
      NN       VB    NN
Vegetarians   eat   fruit

∀x,y : vegetarian(x) ∧ fruit(y) ⇒ eats(x,y)



Semantic Analysis
• Syntax-Driven Semantic Analysis:
– Principle of compositionality:
• The meaning of a sentence can be composed from the
meaning of its parts.
Natural Language Text → Parser → Parse Tree → Semantic Analysis →
Semantic Representation

Parse Tree:
             S
      NP          VP
      NN       VB    NN
Vegetarians   eat   fruit
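A minimal Python sketch of the principle of compositionality, assuming the parse tree is stored as nested tuples and that a small lexicon (the hypothetical `PREDICATE` table) gives the predicate name of each word. The meaning of each node is built only from the meanings of its children:

```python
# Parse tree as nested tuples: (label, child, ...), with words as leaves
tree = ("S",
        ("NP", ("NN", "Vegetarians")),
        ("VP", ("VB", "eat"), ("NN", "fruit")))

# Hypothetical lexicon: word -> predicate name
PREDICATE = {"Vegetarians": "vegetarian", "eat": "eats", "fruit": "fruit"}

def meaning(node):
    """Compute the meaning of a node from the meanings of its children."""
    label, *children = node
    if label in ("NN", "VB"):            # leaf: look up the predicate
        return PREDICATE[children[0]]
    if label == "NP":                    # NP -> NN
        return meaning(children[0])
    if label == "VP":                    # VP -> VB NN
        verb, obj = (meaning(c) for c in children)
        # The VP meaning is a function still waiting for the subject
        return lambda subj: f"∀x,y : {subj}(x) ∧ {obj}(y) ⇒ {verb}(x,y)"
    if label == "S":                     # S -> NP VP
        subj, vp = (meaning(c) for c in children)
        return vp(subj)
    raise ValueError(f"unknown label: {label}")

formula = meaning(tree)
print(formula)
```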



Semantic Analysis
• Lexical Semantics

“Bank”
What does it mean?

– Difference between the symbol (lexeme) and the meaning of the
symbol.

– Study of linguistic phenomena.


Semantic Analysis
• Lexical Semantics
– Homonymy:
• Same lexeme, different meanings (e.g. Bank).
• Choosing the intended meaning is the task of Word Sense
Disambiguation.
– Synonymy:
• Different lexemes, same meaning (e.g. Price & Cost).
– Hyponymy:
• One lexeme denotes a subclass of another lexeme (e.g.
Car & Vehicle).
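A minimal Python sketch of Word Sense Disambiguation for a homonym such as "bank", using a simplified gloss-overlap heuristic in the style of the Lesk algorithm. The two senses and their glosses are assumptions written for this example:

```python
# Hypothetical two-sense inventory for "bank"; the glosses are illustrative
SENSES = {
    "bank/finance": "institution that accepts deposits and lends money",
    "bank/river": "sloping land beside a river or lake",
}

def disambiguate(context, senses=SENSES):
    """Pick the sense whose gloss shares the most words with the context."""
    context_words = set(context.lower().split())

    def overlap(sense):
        return len(context_words & set(senses[sense].split()))

    return max(senses, key=overlap)      # ties go to the first sense

print(disambiguate("she deposits money at the bank"))
print(disambiguate("they fished from the bank of the river"))
```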
Semantic Analysis
• Lexical Semantics
– Meronymy:
• One lexeme denotes a part of what another lexeme denotes
(e.g. Wheel & Car).
– Other linguistic phenomena:
• Metaphor
• Anaphora
• Metonymy
• …



Semantic Analysis
• Lexical Semantics:
– to analyze the meaning of words and
semantic relations between them.
• Lexeme:
– an individual entity in the lexicon.
• Lexicon:
– the finite list of expressions used in the
language to express meaning.



Semantic Analysis
• Thesaurus:
– Organizes lexemes and their meanings
(senses or synsets), along with the semantic
relations between senses.
• WordNet
– Online thesaurus:
http://wordnet.princeton.edu/perl/webwn
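A minimal Python sketch of a thesaurus as a data structure: each sense (synset) lists its lexemes and its relations to other senses. All entries are toy data written for this example, not WordNet content, and the function names are illustrative:

```python
# Toy thesaurus: synset id -> lexemes, hypernym (superclass) and parts
SYNSETS = {
    "car.n": {"lemmas": ["car", "automobile"], "hypernym": "vehicle.n",
              "parts": ["wheel.n"]},
    "vehicle.n": {"lemmas": ["vehicle"], "hypernym": None, "parts": []},
    "wheel.n": {"lemmas": ["wheel"], "hypernym": None, "parts": []},
    "price.n": {"lemmas": ["price", "cost"], "hypernym": None, "parts": []},
}

def synonyms(a, b):
    """Synonymy: both lexemes appear in the same synset."""
    return any(a in s["lemmas"] and b in s["lemmas"] for s in SYNSETS.values())

def is_hyponym(child, parent):
    """Hyponymy: follow hypernym links upward from child."""
    current = SYNSETS[child]["hypernym"]
    while current is not None:
        if current == parent:
            return True
        current = SYNSETS[current]["hypernym"]
    return False

def is_meronym(part, whole):
    """Meronymy: part is listed among the parts of whole."""
    return part in SYNSETS[whole]["parts"]

print(synonyms("price", "cost"))
print(is_hyponym("car.n", "vehicle.n"))
print(is_meronym("wheel.n", "car.n"))
```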



Semantic Analysis
• Definition:
– the process whereby meaning representations are
composed and assigned to linguistic input.

Natural Language Text + Parsing Structure → Semantic Analysis →
Formal Representation (text meaning)



NLP Applications
• Question Answering (Q&A)
– Brain boost:
http://www.brainboost.com/

Natural Language Question → Web Search (Google, ...) → Web →
Resulting Web Pages → Answer extraction



NLP Applications
• Machine Translation

– Google language tools:

http://www.google.com/language_tools?hl=en



NLP Applications
• Conversation Systems:

– Eliza

http://www-ai.ijs.si/eliza-cgi-bin/eliza_script

– A page about Chatterbots

http://www.simonlaven.com/
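A minimal Python sketch of an ELIZA-style conversation system: each rule pairs a pattern with a response template that reuses part of the user's input. The two rules are assumptions written for this example; the original ELIZA script is far larger and also swaps pronouns:

```python
import re

# Hypothetical rules: (pattern, response template)
RULES = [
    (re.compile(r"\bI am (.+)", re.I), "Why do you say you are {0}?"),
    (re.compile(r"\bI need (.+)", re.I), "Why do you need {0}?"),
]
DEFAULT = "Please tell me more."

def reply(utterance):
    """Answer with the first matching rule, or with a default prompt."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(match.group(1))
    return DEFAULT

print(reply("I am tired."))
print(reply("Hello"))
```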



NLP Applications
• Text Mining
[Figure: text mining interface with a query, the selected documents,
the concept distribution in the selected documents, and the temporal
distribution of the selected documents]



NLP Applications
• Information retrieval from Databases

Natural Language Question → SQL (SELECT) → Database →
Resulting Table(s) → Answer Extraction and Formatting
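A minimal Python sketch of this pipeline, assuming a single question template ("How old is ...?") and an in-memory SQLite database. The table, its columns and the `answer` function are hypothetical; a real system needs parsing and semantic analysis rather than one regular expression:

```python
import re
import sqlite3

# In-memory database with one hypothetical table; the row reflects the
# example sentence used earlier in this module.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE institution (name TEXT, age_years INTEGER)")
conn.execute("INSERT INTO institution VALUES ('University of Coimbra', 700)")

QUESTION = re.compile(r"how old is (?:the )?(.+?)\??$", re.I)

def answer(question):
    """Question -> SQL SELECT -> resulting row -> formatted answer."""
    match = QUESTION.match(question.strip())
    if not match:
        return "Sorry, I cannot answer that."
    name = match.group(1)
    # Parameterized SELECT: the name is passed as data, not pasted into SQL
    row = conn.execute(
        "SELECT age_years FROM institution WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return f"I have no data about {name}."
    return f"{name} is {row[0]} years old."

print(answer("How old is the University of Coimbra?"))
```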



NLP Applications
• Document Management:
– Categorization, Clustering, Summarization

[Figure: document management interface with an ontology, concept
selection, a document list, and information about the selected document]



NLP Applications
• Document/Email Routing and Filtering

Email1, Email2, ..., EmailN and News1, News2, ..., NewsN →
Information Routing System (Ontology + User profiles) →
User 1, User 2, ..., User N
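A minimal Python sketch of routing, assuming each user profile is just a set of keywords and a document is delivered to every user who shares at least one word with it. The profiles are hypothetical, and the ontology shown in the diagram is not modelled here:

```python
# Hypothetical user profiles: each user is described by a set of keywords
PROFILES = {
    "User 1": {"parsing", "semantics"},
    "User 2": {"football", "results"},
}

def route(document, profiles=PROFILES):
    """Return the users whose profile shares a word with the document."""
    words = set(document.lower().split())
    return [user for user, keywords in profiles.items() if words & keywords]

print(route("Workshop on shallow parsing"))
print(route("Football results tonight"))
print(route("Weather forecast"))
```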



NLP Applications
• Other Applications:
– Query Expansion
– Web Mining
– Named Entity Recognition
– Word Sense Disambiguation
– Human Computer Interfaces
– Natural Language Generation
–…



Next Modules
• Case-Based Reasoning (CBR)
• Planning
• Knowledge Discovery
• Intelligent Systems for Knowledge Management
• Ontologies
• Semantic Web
• Affective Computing
• Exploration of Unknown Environments and Map Constructions
The End

• Questions?
