
Natural Language Processing

About the Course


• Course Title: Natural Language Processing
• Course Code: CMIT 6122
• Credit Hour: 3
• ECTS Credits: 5
• Contact Hours (per week): Lecture – 2 Lab - 2
Course Objective
• To introduce the fundamental concepts and ideas
in natural language processing.
• To understand both algorithms for processing
linguistic information and the computational
properties of natural languages.
• In particular, the course covers computational
morphology, syntax, and semantics from both a
linguistic and an algorithmic perspective, aiming to
keep up with current research areas.
Course Contents
• Chapter One: Introduction:
– Natural Language Processing Concepts
– Ambiguity and Uncertainty in Language
• Chapter Two: Regular Expressions and
Finite-state Automata
– Regular Expression
– Finite State Automata
Course Contents…
• Chapter Three: Words and Transducers
– Introduction
– Finite State Morphological Parsing
– Transducers and Orthographic Rules
• Chapter Four: Language Modeling
– N-gram
– Word Classes and Parts of Speech Tagging
– Hidden Markov and Maximum Entropy Models
Course Contents…
• Chapter Five: Syntax
– Formal Grammar
– Parsing with Context Free Grammars
– Statistical Parsing
– Language and Complexity
• Chapter Six: Semantics
– Representing Meaning
– Computational Semantics
Course Contents…
• Chapter Seven: Machine Learning for
Natural Language Processing
– Supervised and Unsupervised Machine learning
– Bayesian Networks
• Current Research Issues in NLP
– Phonetic
– Morphology
– Syntax
– Semantics
– ….
References
• Daniel Jurafsky and James H. Martin: Speech and
Language Processing, Prentice-Hall, 2006. (textbook)
• Christopher Manning and Hinrich Schütze: Foundations
of Statistical Natural Language Processing, MIT Press,
1999.
• Eugene Charniak: Statistical Language Learning, MIT
Press, 1996.
• Robert Dale, Hermann Moisl and Harold Somers:
Handbook of Natural Language Processing, 2000.
References…
• Lucja M. Iwanska and Stuart C. Shapiro: Natural
Language Processing and Knowledge
Representation, MIT Press, 2000.
• Frederick Jelinek: Statistical Methods for Speech
Recognition, MIT Press, 1998.
• Roland R. Hausser: Foundations of Computational
Linguistics: Human-Computer Communication in
Natural Language, Springer-Verlag, 2001.
Evaluation
• Assignments 20%
• Mini Project 30%
• Final Exam 50%
============================
Natural Language
• Natural Language:
– The system of communication in speech and
writing
– In contrast to artificial languages
• programming languages and
• mathematical notations
– A language is not random
• It is governed by rules,
• even though those rules are hard to pin down
explicitly.
Natural Language Processing
• The goal is to get computers to perform useful tasks
involving human language, tasks like
– enabling human-machine communication,
– improving human-human communication, or
– useful processing of text or speech.
• For example,
– Conversational agents
Conversational Agent - HAL
• HAL (Heuristically programmed Algorithmic computer)
– Dave Bowman:
“Open the pod bay doors, HAL”
– HAL:
“I'm sorry Dave, I’m afraid I can’t do that.”

Stanley Kubrick and Arthur C. Clarke,
screenplay of “2001: A Space Odyssey”
What it takes to have a HAL-like computer
• Understanding Human Input:
– Speech Recognition
– Natural Language Understanding
• Ability to communicate information comparable to
humans:
– Natural Language Generation
– Speech Synthesis
• Information Retrieval:
– Finding out where needed textual resources reside
• Information Extraction:
– Extraction of pertinent facts from textual resources
• Inference:
– Drawing conclusions based on known facts
Knowledge in Speech & Language Processing
• Techniques that process spoken and written human
language.
• Such processing necessarily uses knowledge of a language.
• Example: the Unix wc command:
– Counts the bytes and the number of lines that a text file contains.
– Also counts the number of words contained in a file.
• Counting words requires knowledge of what it means to be a word.
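The word-counting point can be sketched in a few lines. The whitespace rule below is a deliberately naive assumption about what a word is; deciding whether, say, "isn't" or "New York" is one word or two already requires linguistic knowledge that this sketch lacks.

```python
# Minimal line/word/byte count in the spirit of Unix wc.
# Splitting on whitespace is one (naive) definition of "word".
def wc(text: str):
    lines = text.count("\n")            # like wc -l
    words = len(text.split())           # whitespace-delimited tokens
    nbytes = len(text.encode("utf-8"))  # like wc -c
    return lines, words, nbytes

print(wc("Open the pod bay doors, HAL.\n"))  # (1, 6, 29)
```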
Knowledge in Speech & Language Processing
• HAL ⇦ Dave:
– Requires analysis of the audio signal:
• Recovering the exact sequence of words that Dave is
saying. (signal to text)
• Analysis of additional information that determines the
meaning of that sequence of words. (understanding the
meaning of the word sequence)
• HAL ⇨ Dave
– Requires the ability to generate an audio signal that can be
recognized:
• Phonetics,
• Phonology,
• Synthesis, and
• Syntax
Knowledge in Speech & Language Processing
– HAL must have knowledge of morphology in order to capture
the information about the shape and behavior of words in
context:
– the way words break down into component parts that carry
meanings like singular versus plural.
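The singular-versus-plural point can be sketched as a single toy suffix rule. This one rule is an assumption for illustration only; real morphological parsing uses finite-state transducers (Chapter Three) and handles far more than a final "s".

```python
# Toy morphological parse: split a word into a stem plus a
# number feature. The single "-s" rule is a deliberate
# oversimplification (it mishandles "glass", "mice", ...).
def parse_plural(word: str):
    if word.endswith("s") and len(word) > 2:
        return (word[:-1], "+PL")   # doors -> door +PL
    return (word, "+SG")

print(parse_plural("doors"))  # ('door', '+PL')
print(parse_plural("door"))   # ('door', '+SG')
```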
Knowledge in Speech & Language Processing

• Beyond individual words:
– HAL must know how to analyze the structure of Dave’s
utterance.
• REQUEST: HAL, open the pod bay door
• STATEMENT: HAL, the pod bay door is open
• QUESTION: HAL, is the pod bay door open?
– HAL must use similar structural knowledge to properly string
together the words that constitute its response (syntax):
• I’m I do, sorry that afraid Dave I’m can’t.
Knowledge in Speech & Language Processing

• Knowing the words and syntactic structure of what Dave
said does not tell HAL much about the nature of his
request.
– Knowledge of the meanings of the component words is
required (lexical semantics).
– Knowledge of how these components combine to form
larger meanings is also required (compositional semantics).
Knowledge in Speech & Language Processing

• Despite its bad behavior, HAL knows enough to be
polite to Dave (pragmatics).
– Direct Approach:
• No
• No, I won’t open the door.
– Embellishment:
• I’m sorry
• I’m afraid
– Indirect Refusal: I can’t
– Direct Refusal: I won’t.
Knowledge in Speech & Language Processing

• Instead of simply ignoring Dave’s request, HAL chooses
to engage in a structured conversation relevant to
Dave’s initial request.
– HAL’s correct use of the word “that” in its answer to
Dave’s request is a simple illustration of the kind of
between-utterance device common in such conversations.
– Correctly structuring such conversations requires knowledge
of discourse conventions.
Knowledge in Speech & Language Processing

• In the following question:
– How many states were in the United States that
year?
– One needs to know what “that year” refers to.
Summary
• Phonetics and Phonology:
– The study of linguistic sounds
• Morphology:
– The study of the meaningful components of words.
• Syntax:
– The study of the structural relationships between words.
• Semantics:
– The study of meaning
• Pragmatics:
– The study of how language is used to accomplish goals.
• Discourse:
– The study of linguistic units larger than a single utterance.
Ambiguity
• Most if not all tasks in speech and language
processing can be viewed as resolving ambiguity.
• Example:
– I made her duck.
• Possible interpretations:
1. I cooked waterfowl for her.
2. I cooked waterfowl belonging to her.
3. I created the (plaster?) duck she owns.
4. I caused her to quickly lower her head or body.
5. I waved my magic wand and turned her into an
undifferentiated waterfowl.
Ambiguity
• These different meanings are caused by a number
of ambiguities.
1. First, the words duck and her are morphologically or
syntactically ambiguous in their part-of-speech.
• Duck can be a verb or a noun, while
• her can be a dative pronoun or a possessive
pronoun.
Duck (webster.com)
duck noun, often attributive \ˈdək\, plural ducks
(or, in sense 1, plural duck)
a : any of various swimming birds (family Anatidae, the duck family) in which the neck and legs are short,
the feet typically webbed, the bill often broad and flat, and the sexes usually different from each
other in plumage
b : the flesh of any of these birds used as food
2 : a female duck — compare DRAKE
3 chiefly British : DARLING —often used in plural but singular in construction
4 : PERSON, CREATURE

duck verb
Definition of DUCK
transitive verb
1 : to thrust under water
2 : to lower (as the head) quickly : BOW
3 : AVOID, EVADE <duck the issue>
intransitive verb
1 a : to plunge under the surface of water
b : to descend suddenly : DIP
2 a : to lower the head or body suddenly : DODGE
b : BOW, BOB
3 a : to move quickly
b : to evade a duty, question, or responsibility
Her (webster.com)
her adjective \(h)ər, ˈhər\
Definition of HER
: of or relating to her or herself especially as possessor, agent, or object of an

action <her house> <her research>


her pronoun
objective form of SHE
dative pronoun
— used to refer to a certain woman, girl, or female animal as the object of a verb or a
preposition ▪ Tell her I said hello. ▪ Did you invite her? ▪ I gave the book to her. ▪ a gift
for her ▪ The dress fits her sister as well as her

possessive pronoun
▪ I gave her book back to her.
Ambiguity
2. Second, the word make is semantically ambiguous; it
can mean create or cook.
3. Finally, the verb make is syntactically ambiguous in a
different way.
◆ Make can be transitive, that is, taking a single direct object (2),
or
◆ it can be ditransitive, that is, taking two objects (5), meaning
that the first object (her) got made into the second object
(duck).
◆ Finally, make can take a direct object and a verb,
meaning that the object (her) got caused to perform the
verbal action (duck).
Approaches for Disambiguation
• Models and algorithms serve as ways to resolve, or
disambiguate, these ambiguities.
• For example:
– deciding whether duck is a verb or a noun can be solved
by part-of-speech tagging.
– deciding whether make means “create” or “cook” can
be solved by word sense disambiguation.
• Resolving part-of-speech and word sense
ambiguities are two important kinds of lexical
disambiguation.
– A wide variety of tasks can be framed as lexical
disambiguation problems. For example,
• A text-to-speech synthesis system reading the word lead needs
to decide whether it should be pronounced as in lead pipe or as
in lead me on.
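The part-of-speech decision for duck can be caricatured as a single context rule. Both the rule and the word list below are invented for illustration; a real tagger is learned from annotated data (for example with the hidden Markov models of Chapter Four).

```python
# Toy lexical disambiguation: pick a part of speech for the
# ambiguous word "duck" from the word to its left.
# The rule and word set are hypothetical, not a real tagger.
def tag_duck(prev_word: str) -> str:
    if prev_word in {"her", "his", "the", "a"}:
        return "NOUN"   # "I made her duck" = the bird
    return "VERB"       # "I made her duck" = lower her head

print(tag_duck("her"))  # NOUN -- yet "her" is itself ambiguous,
print(tag_duck("you"))  # VERB    so the rule cannot be certain
```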
Approaches for Disambiguation
• By contrast, deciding whether
– her and duck are part of the same entity (as in (1) or (4)) or
– are different entities (as in (2)) is an example of syntactic
disambiguation and can be addressed by probabilistic parsing.
• Ambiguities that don’t arise in this particular example (like
whether a given sentence is a statement or a question) will
also be resolved by speech act interpretation.
Models and Algorithms
• The various kinds of knowledge described in the
last sections can be captured through
– the use of a small number of formal models, or
theories.
• Fortunately, these models and theories are all
drawn from the standard toolkits of
– computer science,
– mathematics, and
– linguistics
• Among the most important models are
– State machines,
– Rule systems,
Models and Algorithms
– Logic,
– Probabilistic models, and
– Vector-space models.
• These models, in turn, lend themselves to a small
number of algorithms, among the most important
of which are
– state space search algorithms such as
• dynamic programming, and
– machine learning algorithms such as
• classifiers and
• EM (Expectation-Maximization) and other learning
algorithms
State Machines
• In their simplest formulation, state machines are
formal models that consist of
– states,
– transitions among states, and
– an input representation.
• Some of the variations of this basic model that we
will consider are
– deterministic and non-deterministic finite-state
automata and
– finite-state transducers.
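The three ingredients listed above, states, transitions among states, and an input representation, can be shown in a minimal deterministic automaton. The pattern it recognizes (one b, at least two a's, then !) is a toy choice for illustration.

```python
# A deterministic finite-state automaton in its simplest form.
# transitions[state][symbol] gives the next state; missing
# entries mean the input is rejected.
def accepts(s: str) -> bool:
    transitions = {
        0: {"b": 1},
        1: {"a": 2},
        2: {"a": 3},
        3: {"a": 3, "!": 4},   # loop on extra a's
    }
    state = 0
    for ch in s:
        state = transitions.get(state, {}).get(ch)
        if state is None:
            return False       # no transition defined: reject
    return state == 4          # 4 is the sole accepting state

print(accepts("baaa!"))  # True
print(accepts("ba!"))    # False
```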
Grammars
• Closely related to these models are their
declarative counterparts:
– formal rule systems.
• Among the more important ones we will consider
are
– regular grammars and
– regular relations,
– context-free grammars,
– feature-augmented grammars, as well as
– probabilistic formulations
• State machines and formal rule systems are the main tools used
when dealing with knowledge of phonology, morphology, and
syntax.
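A context-free grammar is exactly the declarative counterpart described above: rules alone, no transition machinery. The four-rule fragment below is invented for illustration; expanding the start symbol derives sentences licensed by the rules.

```python
# A toy context-free grammar as a declarative rule system.
# Nonterminals map to lists of possible expansions; any symbol
# without rules is a terminal (a word).
import random

grammar = {
    "S":  [["NP", "VP"]],
    "NP": [["HAL"], ["the", "door"]],
    "VP": [["opens", "NP"]],
}

def generate(symbol="S"):
    """Derive a word list by recursively expanding symbol."""
    if symbol not in grammar:          # terminal: emit the word
        return [symbol]
    out = []
    for sym in random.choice(grammar[symbol]):
        out.extend(generate(sym))
    return out

print(" ".join(generate()))  # e.g. "HAL opens the door"
```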
Logic
• The third model that plays a critical role in capturing
knowledge of language is logic.
– first order logic, also known as the predicate calculus, as well as
– such related formalisms as
• lambda-calculus,
• feature-structures, and
• semantic primitives.
• These logical representations have traditionally been used
for modeling semantics and pragmatics,
• although more recent work has focused on more robust
techniques drawn from non-logical lexical semantics
Probabilistic Models
• Probabilistic models are crucial for capturing every kind of
linguistic knowledge.
• Each of the other models (state machines, formal rule
systems, and logic) can be augmented with probabilities.
• For example the state machine can be augmented with
probabilities to become
– the weighted automaton or
– Markov model.
• We will spend a significant amount of time on hidden
Markov models or HMMs, which are used everywhere in
the field, in
– part-of-speech tagging,
– speech recognition,
– dialogue understanding,
– text-to-speech, and
– machine translation.
Probabilistic Models
• The key advantage of probabilistic models is
– their ability to solve the many kinds of ambiguity
problems that we discussed earlier;
– almost any speech and language processing problem
can be recast as:
• “given N choices for some ambiguous input, choose the most
probable one”.
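The recast form "given N choices for some ambiguous input, choose the most probable one" is just an argmax over candidate analyses. The candidate senses and probabilities below are invented numbers for illustration, not estimates from any corpus.

```python
# Disambiguation as argmax: pick the candidate analysis with
# the highest probability. The probabilities are invented.
def disambiguate(candidates: dict) -> str:
    return max(candidates, key=candidates.get)

senses_of_make = {"create": 0.35, "cook": 0.55, "cause": 0.10}
print(disambiguate(senses_of_make))  # cook
```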
Vector-Space Models
• Finally, vector-space models, based on linear algebra,
underlie information retrieval and many treatments of word
meanings.
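The linear-algebra core can be sketched with term-count vectors compared by cosine similarity, the standard relevance measure in information retrieval. The two short "documents" are invented for illustration.

```python
# Vector-space sketch: texts become term-count vectors, and
# cosine similarity (dot product over the norms) measures
# how close their word distributions are.
import math
from collections import Counter

def cosine(a: str, b: str) -> float:
    va, vb = Counter(a.split()), Counter(b.split())
    dot = sum(va[w] * vb[w] for w in va)
    na = math.sqrt(sum(c * c for c in va.values()))
    nb = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (na * nb)

print(round(cosine("open the pod bay door",
                   "the pod bay door is open"), 3))  # 0.913
```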
• Processing language using any of these models typically
involves a search through a space of states representing
hypotheses about an input.
– In speech recognition, we search through a space of phone
sequences for the correct word.
– In parsing, we search through a space of trees for the syntactic
parse of an input sentence.
– In machine translation, we search through a space of translation
hypotheses for the correct translation of a sentence into another
language.
• For non-probabilistic tasks, such as those involving state machines, we use
well-known graph algorithms such as depth-first search.
• For probabilistic tasks, we use heuristic variants such as
– best-first and
– A* search, and rely on dynamic programming algorithms for
computational tractability.
Classifiers & Sequence Models
• For many language tasks, we rely on machine
learning tools like
– classifiers and
– Sequence models.
• Classifiers like
– decision trees,
– support vector machines,
– Gaussian Mixture Models and
– logistic regression are commonly used.
• Sequence Models:
– A Hidden Markov Model is one kind of sequence
model; other models are
– Maximum Entropy Markov Models or
– Conditional Random Fields.
Cross-Validation
• Another tool related to machine learning is
methodological:
• the use of distinct training and test sets, statistical
techniques like cross-validation, and careful
evaluation of our trained systems.
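The distinct-training-and-test-set idea can be sketched as k-fold cross-validation: the data is split k ways, and each fold serves once as the test set for a system trained on the rest. The label data and the trivial majority-label "learner" below are invented for illustration.

```python
# Methodological sketch: k-fold cross-validation, so a system
# is never evaluated on examples it was trained on.
from collections import Counter

def k_fold(data, k=3):
    """Yield (train, test) splits for k-fold cross-validation."""
    for i in range(k):
        test = data[i::k]                               # every k-th item
        train = [x for j, x in enumerate(data) if j % k != i]
        yield train, test

labels = ["NOUN", "NOUN", "NOUN", "VERB", "NOUN", "NOUN"]
accuracies = []
for train, test in k_fold(labels):
    majority = Counter(train).most_common(1)[0][0]      # toy "model"
    accuracies.append(sum(y == majority for y in test) / len(test))
print(round(sum(accuracies) / len(accuracies), 3))      # 0.833
```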
Reading Assignment
• Brief History
