
Introduction to Natural Language Processing (CSE 5321)

Lecture 07: Approaches to NLP

Department of Computer Science and Engineering


Adama Science and Technology University

Teshome M. Bekele

2020/21, Semester I
Introduction Representing Linguistic Knowledge
Models Rule-Based vs Statistical Approaches
Algorithms Mathematical Foundations

Representing Linguistic Knowledge

• In the preceding lectures, we have been building up knowledge about the linguistic
structures of English and Amharic at different levels.

• NLP requires a set of rules to represent knowledge about the linguistic structures.

• Depending on how rules are acquired, approaches to NLP can be rule-based or statistical.

♦ Rule-based approach – rules are written manually

♦ Statistical approach – rules are acquired from large corpora

Department of Computer Science and Engineering, ASTU Lecture 07: Approaches to NLP 2/35

Rule-Based vs Statistical Approaches

Rule-Based Approaches
• Requires linguistic expertise
• No frequency information
• More brittle and slower
• Often more precise
• Error analysis is usually easier

Statistical Approaches
• Not much linguistic expertise required
• Based on frequency information
• Robust and quick
• Generalized model built from corpora
• Error analysis is often difficult

Hybrid Approaches
• Both rule-based and statistical approaches have their own pros and cons.

• Thus, rule-based and statistical approaches are usually combined to benefit from their
synergy.

♦ This gives rise to hybrid approaches.


Mathematical Foundations

• There are well-established mathematical foundations for both rule-based and statistical
approaches to NLP.

• The various kinds of knowledge representations of natural languages can be captured
through the use of a small number of formal mathematical models or theories.

• Models and theories applied in NLP are all drawn from the standard toolkit of computer
science, mathematics and linguistics.

• These mathematical models, in turn, lend themselves to a small number of algorithms.


Mathematical Foundations: Commonly Used Models and Algorithms

Mathematical Models Commonly Used in NLP


• State machines

• Formal rule systems

• Logic-based models

• Probabilistic models

• Vector space models

Algorithms Commonly Used in NLP

• State space search algorithms

• Machine learning algorithms

State Machines
Introduction Formal Rule Systems
Models Logic-Based Models
Algorithms Vector Space Models
Probabilistic Models

State Machines

• State machines are widely used in NLP for modeling phonology, morphology and syntax.

• State machines are formal models that consist of states, transitions among states, and
an input representation.
♦ States – represent the set of properties of an abstract machine
♦ Transitions – represent jumps from one state to another
♦ Inputs – sequences of symbols or letters that can be read by the machine

• A machine with a finite number of states is called a finite state machine (FSM).

• An FSM has two special kinds of states: a start state and one or more final states.
[Figure: an FSM over states S0, S1, S2 with a start state (S0), a final state (S2), and transitions labeled with the input symbols 0 and 1.]

• There are two types of FSMs: finite state automata and finite state transducers.


State Machines: Finite State Automata

• A finite state automaton (FSA) is a finite state machine that accepts only a given set of
strings (a language).

• FSA can be deterministic or non-deterministic.

• In a deterministic FSA, every state has exactly one transition for each possible input symbol.

♦ Example: A deterministic FSA that determines whether a binary string contains
an even number of 0's.

[Figure: a deterministic FSA over states S0, S1, S2; an ε-move leads from the start state S0 into the accepting loop, the symbol 0 toggles between the two remaining states, and 1 loops on each.]

♦ Strings accepted by this deterministic FSA are: ε, 1, 11, 111, 00, 010,
1010, 10110, etc.
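The deterministic FSA above can be sketched as a transition table plus a small driver loop. The state names and transitions below are assumptions chosen to match the diagram's behavior (accept exactly the strings with an even number of 0's):

```python
# Sketch of a deterministic FSA accepting binary strings with an even
# number of 0's. S0 means "even number of 0's seen so far" and is both
# the start state and the final state (an assumption matching the
# accepted strings listed in the text).

DFA = {
    "start": "S0",
    "accept": {"S0"},
    "delta": {
        ("S0", "0"): "S1",  # an odd number of 0's seen
        ("S0", "1"): "S0",
        ("S1", "0"): "S0",  # back to an even number of 0's
        ("S1", "1"): "S1",
    },
}

def accepts(dfa, string):
    """Run the DFA over the input and report acceptance."""
    state = dfa["start"]
    for symbol in string:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["accept"]
```

For example, accepts(DFA, "010") and accepts(DFA, "10110") hold, while accepts(DFA, "0") does not.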


State Machines: Finite State Automata

• In a non-deterministic FSA, an input symbol can lead to one, more than one, or no
transition from a given state.

♦ Example: A non-deterministic FSA that determines whether a binary string
contains an even number of 0's or an even number of 1's.
[Figure: a non-deterministic FSA; ε-moves from the start state S0 lead to two branches, one (S1, S2) tracking an even number of 0's and the other (S3, S4) tracking an even number of 1's.]

♦ Strings accepted by this non-deterministic FSA are: ε, 1, 11, 111, 00,
010, 1010, 10110, 011, 11011, 1010101, etc.
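Simulating a non-deterministic FSA means tracking the set of states reachable at each step, including ε-moves. The sketch below mirrors the two-branch machine described here; the exact state names and transitions are assumptions made to match the diagram:

```python
# Sketch of a non-deterministic FSA (with epsilon-moves) accepting binary
# strings with an even number of 0's OR an even number of 1's.

EPS = ""  # the epsilon (empty) symbol

NFA = {
    "start": "S0",
    "accept": {"S1", "S3"},  # S1: even-0's branch, S3: even-1's branch
    "delta": {
        ("S0", EPS): {"S1", "S3"},  # choose a branch non-deterministically
        ("S1", "1"): {"S1"}, ("S1", "0"): {"S2"},
        ("S2", "1"): {"S2"}, ("S2", "0"): {"S1"},
        ("S3", "0"): {"S3"}, ("S3", "1"): {"S4"},
        ("S4", "0"): {"S4"}, ("S4", "1"): {"S3"},
    },
}

def eps_closure(nfa, states):
    """All states reachable from `states` via epsilon-moves alone."""
    stack, closed = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa["delta"].get((s, EPS), ()):
            if t not in closed:
                closed.add(t)
                stack.append(t)
    return closed

def accepts(nfa, string):
    """Subset simulation: track every state the NFA could be in."""
    current = eps_closure(nfa, {nfa["start"]})
    for symbol in string:
        moved = set()
        for s in current:
            moved |= nfa["delta"].get((s, symbol), set())
        current = eps_closure(nfa, moved)
    return bool(current & nfa["accept"])
```

For example, "011" is accepted via the even-1's branch even though its number of 0's is odd.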


State Machines: Finite State Automata

Word Recognition
• FSAs can be used to recognize words in a language.

• Examples:

♦ Single word recognition

[Figure: FSAs recognizing single words, either letter by letter (ሰ-በ-ረ through states S0..S3; w-a-l-k through states S0..S4) or with a single arc labeled by the whole word (ሰበረ; walk).]


State Machines: Finite State Automata

Word Recognition
♦ Recognition of multiple words

[Figure: FSAs recognizing multiple words by sharing common prefixes: ሰበረ, ሰበቀ, and ሰበብ share the stem ሰበ followed by ረ, ቀ, or ብ; internal, eternal, ethical, ethiopia, and ethanol share branching prefixes in-/e-/eth- with endings such as -tern-al, -ic-al, -opia, and -anol.]


State Machines: Finite State Automata

Word Recognition
♦ Recognition of multiple words (for instance, Amharic pronouns: Eኔ, Eኛ, Aንተ,
Aንቺ, Eናንተ, Eስዎ, Eርስዎ, Eሱ, Eርሱ, Eሷ, Eርሷ, Eሳቸው, Eርሳቸው, Eነሱ, Eነርሱ)


[Figure: a single FSA over states S0..S6 recognizing all of these pronouns by sharing common substrings such as E, ር, Aን, ነ, ስዎ, ሱ, ሷ, and ሳቸው.]


State Machines: Finite State Automata

Modeling Morphology
• One word and multiple inflections

[Figure: FSAs attaching multiple inflectional suffixes to a single stem: walk + {-s, -ed, -ing}; ሰበር + {ኧን, ኧህ, ኣት, ኧው, ኣቸው, ኧኝ, ኧሽ, ኣችሁ, ኣችሁት, ...}.]


State Machines: Finite State Automata

Modeling Morphology
• Multiple words and multiple inflections
[Figure: FSAs with multiple stems sharing the same inflectional suffixes: {jump, walk, help, ...} + {-s, -ed, -ing}; {ማረክ, ሰበር, ገደል, ...} + {ኧን, ኧህ, ኣት, ኧው, ኣቸው, ኧኝ, ኧሽ, ኣችሁ, ኣችሁት, ...}.]


State Machines: Finite State Automata

Modeling Morphology
• One word and multiple inflections with affixes

[Figure: an FSA combining prefixes {Eንዲ, Eንዳይ, ከሚ, ሊ, የሚ, ...} with the stem ሰብር and suffixes {ኧን, ህ, ኣት, ኧው, ኣቸው, ብን, በት, ለት, ባቸው, ...} across states S0..S3.]


State Machines: Finite State Automata

Modeling Morphology
• Multiple words and multiple inflections with affixes

[Figure: the same prefix-stem-suffix FSA with multiple stems {ማርክ, ሰብር, ገድል, ...} between the prefix and suffix states.]


State Machines: Finite State Automata

Modeling Morphology
• Marking part-of-speech

[Figure: an FSA marking part-of-speech by attaching derivational suffixes such as -ion, -y, -cate, -ism, -er, and -ist to a word, over states S0..S5.]


State Machines: Finite State Automata

Modeling Morphology
• Marking part-of-speech

[Figure: the same FSA with its states labeled by part-of-speech categories (N, Adj, V) instead of S0..S5.]


State Machines: Finite State Automata

Automatically Learning Morphology


• Collect words in a large corpus and compile into a trie data structure:

... walk walked walking walks wall walls want wanted wanting
wants warn warned warning warns ...

[Figure: a trie over these words, sharing the prefixes wal-, wan-, and war- and branching into the suffixes -ed, -ing, and -s (plus wall/walls).]
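A sketch of compiling a word list into a trie (prefix tree) as nested dictionaries; the representation and the end-of-word marker are assumptions for illustration:

```python
# Sketch: compiling a word list into a trie, the first step toward
# discovering stems and suffixes automatically. Each node is a dict from
# character to child node; the key "$" marks the end of a word.

def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root

def contains(trie, word):
    """Check whether a complete word was inserted into the trie."""
    node = trie
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return "$" in node

words = ["walk", "walked", "walking", "walks",
         "want", "wanted", "wanting", "wants"]
trie = build_trie(words)
```

Words sharing a prefix (walk, walked, ...) share a path from the root, which is exactly what makes shared stems visible.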


State Machines: Finite State Automata

Automatically Learning Morphology


[Figure: a trie compiled from Amharic word forms such as Eንደሚሰብረው, Eንደሚሰብሩበት, ..., Eንደማይገድል; the trie shares the prefix Eንደ, the branches ሚ and ማይ, the stems ሰብር and ገድል, and the suffixes ኧው, Uበት, Uባቸው, and Uት.]


State Machines: Finite State Automata

Automatically Learning Morphology


• Identify frequent suffix trees

[Figure: the trie over the walk/want/warn forms with its frequent suffix subtree highlighted.]

Discovered Morphology
• Stems with a common suffix tree:
♦ walk
♦ want
♦ warn
• Morphemes in the frequent suffix tree:
♦ ε
♦ -ed
♦ -s
♦ -ing
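A naive version of this discovery step can be sketched as follows: group related word forms, take the longest common prefix of each group as the stem, and count whatever remains as suffixes. The grouping by the first four letters is a crude assumption made only so the example runs; a real learner would find groups from the trie itself:

```python
# Sketch of naive morphology discovery: shared prefixes become stems,
# and frequently recurring remainders become suffix morphemes.

from collections import Counter

def common_prefix(forms):
    """Longest prefix shared by every form in the group."""
    prefix = forms[0]
    for form in forms[1:]:
        while not form.startswith(prefix):
            prefix = prefix[:-1]
    return prefix

words = ["walk", "walked", "walking", "walks",
         "want", "wanted", "wanting", "wants",
         "warn", "warned", "warning", "warns"]

# Crude grouping assumption: forms sharing their first four letters
# belong to the same stem.
groups = {}
for w in words:
    groups.setdefault(w[:4], []).append(w)

suffix_counts = Counter()
for forms in groups.values():
    stem = common_prefix(forms)
    for form in forms:
        suffix_counts[form[len(stem):]] += 1
```

On this toy list the counter surfaces exactly the morphemes listed above: ε, -ed, -ing, and -s, each shared by all three stems.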


State Machines: Finite State Automata

Automatically Learning Morphology

[Figure: the Amharic trie with its frequent suffix subtree highlighted.]

Discovered Morphology
• Stems with a common suffix tree:
♦ ሰብር
♦ ገድል
• Morphemes in the frequent suffix tree:
♦ ε
♦ -ኧው
♦ -Uበት
♦ -Uባቸው
♦ -Uት
• Other affixes:
♦ Eንደ-
♦ -ሚ-
♦ -ማይ-


State Machines: Finite State Transducers

• Finite state transducers (FSTs) are extensions of finite state automata (FSA) that can
generate outputs.
• FSTs can be considered as:
♦ Recognizer: a machine that takes a pair of strings as input and outputs
“accept” if the string-pair is in the string-pair language, and
“reject” if it is not.
♦ Generator: a machine that outputs pairs of strings of the language; the
output is a “yes” or “no” plus a pair of output strings.
♦ Translator: a machine that reads a string and outputs another string.
♦ Set relater: a machine that computes relations between sets.

[Figure: three equivalent ways of drawing the input/output relations of the same FST between states S0 and S1, using transitions such as a:b, a:ba, b:b, and b:ε.]
N.B.: Identical input/output pairs can be written using one symbol, e.g. “b:b” → “b”.
The symbol ε represents the empty string.


State Machines: Finite State Transducers

• Like FSAs, FSTs can be deterministic (called sequential transducers) or non-deterministic
with respect to their input.
• In a sequential transducer, each state has at most one outgoing transition per input
symbol.
♦ However, sequential transducers may have nondeterministic output.
♦ Thus, multiple outgoing transitions with one output symbol may occur.

• Depending on the type of accepted input and produced output, FSTs can be:
♦ String-to-string transducers: produce strings as outputs.
♦ String-to-weight transducers: produce weights as outputs.

• The weights in string-to-weight transducers in most cases represent probabilities.


♦ Thus, string-to-weight transducers are also known as weighted automata or
probabilistic automata.
♦ In addition to the output weights of the transitions, string-to-weight
transducers are provided with initial and final weights (to the initial and final
states, respectively).


State Machines: Finite State Transducers

[Figure (left): a sequential string-to-string transducer with nondeterministic output, with transitions such as a:b, a:ba, and b:ε between states S0 and S1; input “aab” produces “bbab”.]
[Figure (right): a sequential string-to-weight transducer with an initial weight of 4 on S0, a final weight of 1 on S2, and weighted transitions a/2, b/3, and b/5; input “ab” produces 4+2+3+1 = 10.]

State Machines: Finite State Transducers

Two-Level Morphology
• In the finite-state morphology paradigm, a word is represented as a correspondence
between a lexical level and the surface level.
♦ Lexical level represents a concatenation of morphemes making up a word.
♦ Surface level represents the concatenation of letters which make up the actual
spelling of the word.
• Morphological parsing is the process of building a structured representation of words by
breaking them down into their component morphemes. For example:
♦ “bigger” is morphologically parsed as “big+ADJ+COMPARATIVE”.
♦ “lower” is morphologically parsed as “low+ADJ+COMPARATIVE”.
♦ “ተማሪዎች” is morphologically parsed as “ተማሪ+N+PLURAL”.
• Thus, a morphological parser is used to identify the correspondence between the lexical
level and the surface level.
♦ For example, the lexical level representation for the surface level word “lower” is
“low+ADJ+COMPARATIVE”.
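As an illustration only, the surface-to-lexical mapping can be mocked as a lookup table over the examples above; a real parser would compute this correspondence with an FST rather than a hard-coded dictionary:

```python
# Sketch: a toy morphological "parser" as a surface-to-lexical lookup.
# The entries mirror the examples in the text; everything else about a
# real FST-based parser is omitted.

LEXICON = {
    "bigger":  "big+ADJ+COMPARATIVE",
    "lower":   "low+ADJ+COMPARATIVE",
    "ተማሪዎች":  "ተማሪ+N+PLURAL",
}

def parse(surface):
    """Map a surface form to its lexical-level analysis, if known."""
    return LEXICON.get(surface)
```

The point of the FST formulation is precisely to avoid such a table: the lexical and surface levels are related by rules, so unseen forms can still be parsed.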


State Machines: Finite State Transducers

Two-Level Morphology
• Two-level morphology is an important application of FSTs to morphological
representation and parsing.

[Figure: FSTs aligning the lexical and surface levels. The lexical string ተ ማ ሪ +N ε +PLU corresponds to the surface string ተ ማ ሪ ε ዎ ች (states S0..S6). The lexical string b i g ε +ADJ ε +COMP corresponds to the surface string b i g g ε e r, with a parallel path mapping l o w to l o w e r (states S0..S9).]

• FSTs can also be used to implement spelling rules applied during inflection of words.


Formal Rule Systems

• Formal Rule Systems are formalisms used to define languages using formal grammars.
♦ Formal grammar is a set of formation rules for strings in a formal language.
♦ Formal grammars and languages are studied under Formal Language Theory.
• There are two types of grammars:
♦ Generative grammar: gives a set of rules that will correctly predict which
combinations of words will form grammatical sentences.
♦ Analytic grammar: gives a set of rules to analyze a given string and determine whether it belongs to the language.
• Formal Rule Systems are widely used in NLP to model:
♦ Phonology
♦ Morphology
♦ Syntax
• Review [Lecture 03] for further details.


Logic-Based Models

• Logic-Based Models are formalisms used to define languages using mathematical logic.
• Logic models commonly used in NLP are:
♦ First-order logic/predicate calculus
♦ λ-calculus
♦ Semantic primitives
• Logic-Based Models are widely used in NLP to model:
♦ Semantics
♦ Pragmatics
• Review [Lecture 04] for further details.


Vector Space Models

• A vector space is a mathematical structure used to model natural language concepts
using the theory of linear algebra.
• Vector Space Models are widely used in the following NLP applications:
♦ Latent Semantic Analysis
♦ Information Filtering
♦ Information Retrieval
♦ Indexing
• Review [Lecture 04] and [Lecture 08] for further details.


Probabilistic Models

• Probabilistic Models are statistical models constructed based on the theory of probability.
• The basic concepts of probability that are commonly applied in NLP are:
♦ A Priori Probability: p(e) – the chance that e happens.
♦ Conditional Probability: p(f |e) – the chance of f given e.
♦ Joint Probability: p(e,f ) – the chance of e and f both happening.
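These three quantities can be estimated from counts over a corpus of co-occurring events; the toy (e, f) pairs below are invented purely for illustration:

```python
# Sketch: estimating a priori, joint, and conditional probabilities from
# counts of (e, f) event pairs in a toy "corpus".

from collections import Counter

pairs = [("rain", "umbrella"), ("rain", "umbrella"), ("rain", "coat"),
         ("sun", "hat"), ("sun", "umbrella"), ("sun", "hat")]

n = len(pairs)
count_e = Counter(e for e, _ in pairs)
count_ef = Counter(pairs)

p_e = count_e["rain"] / n                  # a priori p(e)
p_ef = count_ef[("rain", "umbrella")] / n  # joint p(e, f)
p_f_given_e = p_ef / p_e                   # conditional p(f | e) = p(e, f) / p(e)
```

The last line is the chain rule in miniature: p(e, f) = p(e) · p(f | e), the identity that underlies most probabilistic NLP models.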
• Each of the other NLP models can be augmented with probabilities.
♦ State machines augmented with probabilities become weighted automata or
Markov models.
♦ Context-Free Grammars (CFGs) augmented with probabilities become
Probabilistic Context-Free Grammars (PCFGs).
• Probabilistic models are the most commonly used models in NLP.
♦ They are widely used in the following NLP applications: Part-of-Speech Tagging,
Speech Recognition, Handwriting Recognition, Text-to-Speech Conversion,
Machine Translation, Disambiguation, etc.
• The use of probabilistic models in NLP is described in [Lecture 08] and [Lecture 09].

Introduction
State Space Search Algorithms
Models
Machine Learning Algorithms
Algorithms

State Space Search Algorithms

• A state space is the set of states of a problem that we can reach by applying operators
to a state of the problem to get a new state.
♦ Natural language problems are often modeled as a state space.
• A state space has some common properties:
♦ Complexity, where the branching factor is important
♦ Structure of the space:
– Directionality of arcs
– Tree
– Rooted graph
• The state space for a given problem is usually huge, and as a result, state space
searching requires efficient strategies.
• Dynamic programming is one of the most commonly used strategies.
♦ Dynamic programming is a method for solving complex state space search
problems by breaking them down into many simpler and much easier ones.
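As a concrete instance of dynamic programming, the classic minimum edit distance computation fills a table in which each cell reuses the solutions of smaller subproblems; this sketch is illustrative and not tied to a specific lecture example:

```python
# Sketch: minimum edit distance via dynamic programming. The cell d[i][j]
# holds the cheapest way to turn a[:i] into b[:j], built from the three
# smaller subproblems (deletion, insertion, substitution).

def edit_distance(a, b):
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i          # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j          # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[m][n]
```

Without the table, the same recursion would recompute overlapping subproblems exponentially many times; storing them is what makes the search tractable.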


State Space Search Algorithms

• In NLP problems, among the most important algorithms that employ dynamic
programming strategy are:
♦ Viterbi Algorithm
ƒ Used for finding the most likely sequence of hidden states in Hidden
Markov Models (HMMs).
♦ Chart Parsing Algorithms
ƒ Partial hypothesized results are stored in a structure called chart.
ƒ Used for parsing strings that belong to Context Free Grammars (CFGs).
• There are two types of chart parsers.
♦ Earley Parser: employs a top-down parsing approach.
♦ Cocke-Younger-Kasami (CYK): employs a bottom-up parsing approach.
• In general, state space search algorithms are used in the following NLP applications:
♦ Speech Recognition
♦ Parsing
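A minimal sketch of the Viterbi algorithm over a toy HMM follows; every state name and probability below is invented for illustration:

```python
# Sketch of the Viterbi algorithm: dynamic programming over an HMM, where
# V[t][s] is the probability of the best hidden-state path that ends in
# state s after emitting the first t+1 observations.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best path probability, best hidden-state sequence)."""
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            V[t][s] = prob
            back[t][s] = prev
    # Trace the best path backwards from the most probable final state.
    best_prob, last = max((V[-1][s], s) for s in states)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.insert(0, back[t][path[0]])
    return best_prob, path

# Toy two-state tagger: N = noun, V = verb (all numbers invented).
states = ("N", "V")
start_p = {"N": 0.6, "V": 0.4}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.8, "V": 0.2}}
emit_p = {"N": {"fish": 0.6, "swim": 0.1},
          "V": {"fish": 0.3, "swim": 0.7}}
```

Like chart parsing, Viterbi stores partial hypothesized results so that each best sub-path is computed once rather than re-derived for every full path that extends it.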


Machine Learning Algorithms

• Machine learning refers to a system capable of automatically learning from experience,
training, analytical observation, and other means.
♦ Results in a system that can continuously self-improve and thereby exhibit
efficiency and effectiveness.

[Figure: supervised learning workflow; labeled training examples feed a machine learning algorithm, which produces a prediction rule that assigns a predicted classification to each new example.]

• Among the most important machine learning algorithms used in NLP are:
♦ Classifiers
♦ Expectation-Maximization (EM) algorithms


Machine Learning Algorithms

• The goal of classifiers is to categorize a given input into a fixed set of categories based
on the training model.
♦ Classifiers commonly used in NLP are Decision Trees, Support Vector Machines,
Gaussian Mixture Models, etc.
• The EM algorithm is an efficient iterative procedure to compute the Maximum Likelihood
(ML) estimate in the presence of missing or hidden data.
• In ML estimation, we wish to estimate the model parameter(s) for which the observed
data are the most likely.
• Each iteration of the EM algorithm consists of two processes:
♦ The expectation (E)-step: the missing data are estimated given the observed
data and current estimate of the model parameters.
♦ The maximization (M)-step: the likelihood function is maximized under the
assumption that the missing data are known.
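The E- and M-steps can be sketched on the classic two-coin problem, where the identity of the coin used in each flipping session is the hidden data; the observed counts and initial guesses below are invented for illustration:

```python
# Sketch of one EM iteration for the two-coin problem. Each session of 10
# flips was produced by coin A or coin B, but we never observe which.
# E-step: soft-assign each session to the coins; M-step: re-estimate each
# coin's head probability from those soft assignments.

from math import comb

def em_step(sessions, flips, theta_a, theta_b):
    heads_a = tails_a = heads_b = tails_b = 0.0
    for heads in sessions:
        tails = flips - heads
        # E-step: posterior probability that coin A produced this session.
        like_a = comb(flips, heads) * theta_a**heads * (1 - theta_a)**tails
        like_b = comb(flips, heads) * theta_b**heads * (1 - theta_b)**tails
        w_a = like_a / (like_a + like_b)
        heads_a += w_a * heads
        tails_a += w_a * tails
        heads_b += (1 - w_a) * heads
        tails_b += (1 - w_a) * tails
    # M-step: maximum-likelihood update given the soft assignments.
    return heads_a / (heads_a + tails_a), heads_b / (heads_b + tails_b)

sessions = [5, 9, 8, 4, 7]    # heads observed in each 10-flip session
theta_a, theta_b = 0.6, 0.5   # initial guesses
for _ in range(20):
    theta_a, theta_b = em_step(sessions, 10, theta_a, theta_b)
```

Iterating the two steps drives the estimates toward parameters under which the observed head counts are most likely, which is exactly the ML objective described above.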
• The Baum-Welch algorithm is a special case of the EM algorithm that is used as the
standard algorithm for HMM training.
• In general, Machine Learning Algorithms are widely used in the development of NLP
applications such as Document Classification, Disambiguation, Speech Recognition,
Machine Translation, Optical Character Recognition, etc.

TOC: Course Syllabus

Previous: Disambiguation

Current: Approaches to NLP

Next: Applications of NLP
