Translation
Pasindu Nivanthaka Tennage
Computer Science and Engineering
University of Moratuwa
Content
1. Overview
2. Language Model
3. Translation Model - Word Based translation
4. Translation Model - Phrase Based translation
5. Decoder
6. Evaluation
Overview
[Pipeline figure: a German-English parallel corpus trains the Translation Model; monolingual English text trains the Language Model; the decoding algorithm combines both to produce the output translation.]
Language Model
Word order can be modelled in two ways:
1. Parsing
2. Sequence Models
Parsing
[Parse-tree figure: S -> NP VP, with N, N, and V as the leaf categories]
Example (chain rule):
p(John loves Mary .) =
p(John | <s>) * p(loves | <s> John) * p(Mary | <s> John loves)
* p(. | <s> John loves Mary) * p(</s> | <s> John loves Mary .)
N-gram models:
-> Unigram
-> Bigram
-> Trigram
Example Bigram p(wi | wi-1):
p(John loves Mary .) =
p(John | <s>) * p(loves | John) * p(Mary | loves) * p(. | Mary) * p(</s> | .)

Example Trigram p(wi | wi-2 wi-1):
p(John loves Mary .) =
p(John | <s> <s>) * p(loves | <s> John) * p(Mary | John loves)
* p(. | loves Mary) * p(</s> | Mary .)
Parameter Estimation
Maximum likelihood estimation -> counting problem:
p(wi | wi-1) = count(wi-1 wi) / count(wi-1)
p(wi | wi-2 wi-1) = count(wi-2 wi-1 wi) / count(wi-2 wi-1)
Estimating from text
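As a minimal sketch, the counting estimate above can be implemented directly; the toy corpus and the `<s>`/`</s>` boundary tokens below are illustrative assumptions:

```python
from collections import Counter

def train_bigram_mle(sentences):
    """Estimate bigram probabilities p(w_i | w_{i-1}) by counting."""
    unigrams, bigrams = Counter(), Counter()
    for words in sentences:
        tokens = ["<s>"] + words + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    # p(cur | prev) = count(prev cur) / count(prev)
    return lambda cur, prev: bigrams[(prev, cur)] / unigrams[prev]

corpus = [["John", "loves", "Mary"], ["John", "loves", "books"]]
p = train_bigram_mle(corpus)
print(p("loves", "John"))  # 1.0  ("John" is always followed by "loves")
print(p("Mary", "loves"))  # 0.5  ("loves" is followed by "Mary" half the time)
```

Note that any bigram absent from the corpus gets probability zero here, which is exactly the problem smoothing addresses below.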
PROBLEM: What if a word or n-gram never appears in the training corpus? Its count is zero, so the whole sentence probability becomes zero.
SOLUTION: Subtract some probability mass from the seen n-grams and distribute it
among the unseen ones - SMOOTHING
Smoothing techniques:
1. Backoff
2. Stupid backoff (Google)
3. Linear interpolation
Backoff
If the trigram was never seen, back off to the bigram estimate; if the bigram is also unseen, back off to the unigram.

Stupid Backoff
s(wi | wi-2 wi-1) = count(wi-2 wi-1 wi) / count(wi-2 wi-1)   if count(wi-2 wi-1 wi) > 0
                  = LAMBDA * s(wi | wi-1)                    otherwise, where LAMBDA = 0.4
(uses relative frequencies rather than true probabilities; only works well for a large corpus)
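A small sketch of the stupid-backoff recursion; the counts below are made-up toy values, and a real model would be trained on a very large corpus:

```python
from collections import Counter

# Toy n-gram counts (illustrative assumptions, not real data).
trigrams = Counter({("work", "in", "Malta"): 2})
bigrams  = Counter({("work", "in"): 5, ("in", "Malta"): 3, ("in", "Rome"): 1})
unigrams = Counter({"work": 5, "in": 9, "Malta": 3, "Rome": 1})
total = sum(unigrams.values())
LAMBDA = 0.4

def s(w, context):
    """Stupid-backoff score s(w | context); context is a tuple of 0-2 words."""
    if len(context) == 2:
        c = trigrams[(*context, w)]
        if c > 0:
            return c / bigrams[context]
        return LAMBDA * s(w, context[1:])   # back off to the bigram
    if len(context) == 1:
        c = bigrams[(context[0], w)]
        if c > 0:
            return c / unigrams[context[0]]
        return LAMBDA * s(w, ())            # back off to the unigram
    return unigrams[w] / total

print(s("Malta", ("work", "in")))  # 0.4   (trigram seen: 2/5)
print(s("Rome", ("work", "in")))   # 0.4 * 1/9 (unseen trigram, backs off)
```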
Linear Interpolation
Example:
p_interp(Malta | work in) = λ1 * p(Malta | work in) + λ2 * p(Malta | in) + λ3 * p(Malta)
λ1 + λ2 + λ3 = 1
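The interpolation is a one-line weighted sum; the lambda weights and component probabilities below are assumed values (in practice the weights are tuned on held-out data):

```python
def interpolate(p_tri, p_bi, p_uni, lambdas=(0.6, 0.3, 0.1)):
    """Linearly interpolate trigram, bigram, and unigram estimates.
    The lambda weights must sum to 1."""
    l1, l2, l3 = lambdas
    assert abs(l1 + l2 + l3 - 1.0) < 1e-9
    return l1 * p_tri + l2 * p_bi + l3 * p_uni

# Hypothetical component estimates for p(Malta | work in):
print(interpolate(0.4, 0.2, 0.05))  # 0.6*0.4 + 0.3*0.2 + 0.1*0.05 = 0.305
```

Unlike backoff, interpolation always mixes in the lower-order estimates, even when the trigram was observed.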
Translation Model: Word-Based Models
Word translation, aka lexical translation
Requires a dictionary
Estimated by counting in a word-aligned parallel corpus, e.g. if count(Haus) = 10,000:
p(house | Haus) = count(Haus translated as house) / count(Haus)
Lexical translation probability distribution
p_f : e -> p_f(e)
Given a foreign word f, p_f returns a probability for each choice of English translation e,
indicating how likely that translation is.
T-tables (translation tables):
e t(e|f)
thee 0.7
thy 0.2
tho 0.1
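A t-table can be estimated by counting aligned word pairs; the sketch below assumes the word alignments are already given (the foreign word "du" and the pair counts are illustrative assumptions):

```python
from collections import Counter, defaultdict

def t_table(aligned_pairs):
    """Build lexical translation probabilities t(e | f) from a list of
    (foreign_word, english_word) aligned pairs."""
    pair_counts = Counter(aligned_pairs)
    f_counts = Counter(f for f, _ in aligned_pairs)
    table = defaultdict(dict)
    for (f, e), c in pair_counts.items():
        table[f][e] = c / f_counts[f]   # t(e|f) = count(f, e) / count(f)
    return table

pairs = [("du", "thee")] * 7 + [("du", "thy")] * 2 + [("du", "tho")]
print(t_table(pairs)["du"])  # {'thee': 0.7, 'thy': 0.2, 'tho': 0.1}
```

When alignments are not given, they are learned jointly with the t-table (e.g. by EM in the IBM models); the counting step above is the core of each EM iteration.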
Alignment function
Maps each English output word at position i to a German input word at position j:
a : i -> j
Problem: some words in the English output may have no counterpart among the words of the
source input. A special NULL token is added to the input, and such words are aligned to NULL.
Number of alignments
Example: each of the l_e output words can align to any of the l_f input words or to NULL,
giving (l_f + 1)^l_e possible alignment functions.
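Assuming each output word aligns to exactly one input word or to NULL, the number of possible alignment functions is easy to compute, and it grows exponentially in the output length:

```python
def num_alignments(l_f, l_e):
    """Each of the l_e output words aligns to one of the l_f input words
    or to NULL, giving (l_f + 1) ** l_e alignment functions."""
    return (l_f + 1) ** l_e

print(num_alignments(2, 3))   # 27
print(num_alignments(10, 10)) # 25937424601 -- exhaustive search is infeasible
```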
Translation Model: Phrase-Based Translation
NOTE: a "phrase" here is not a linguistic phrase; it can be any contiguous sequence of words.
1. Words are not the best atomic unit for translation (due to frequent one-to-many
mappings).
2. Translating word groups instead of single words helps to resolve translation
ambiguities.
Mathematical definition
Phrase translation probability p(e | f), e.g.:
e          p(e|f)
of course  0.4
naturally  0.3
Learning a phrase translation table
Consistency
A phrase pair (f, e) is consistent with an alignment A if all words f1 f2 f3 ... of f that
have alignment points in A have them with words of e, and vice versa.
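The consistency check above can be sketched as a scan over the alignment links; the spans and alignment below are made-up examples, and the extra "at least one alignment point" requirement is the convention commonly added in phrase extraction:

```python
def consistent(f_span, e_span, alignment):
    """Check whether the phrase pair given by f_span=(f_start, f_end) and
    e_span=(e_start, e_end) (inclusive word indices) is consistent with a
    word alignment, given as a set of (f_index, e_index) link pairs."""
    f_start, f_end = f_span
    e_start, e_end = e_span
    has_point = False
    for f, e in alignment:
        inside_f = f_start <= f <= f_end
        inside_e = e_start <= e <= e_end
        if inside_f != inside_e:   # a link crosses the phrase boundary
            return False
        if inside_f and inside_e:
            has_point = True
    return has_point               # require at least one alignment point

# Alignment links (f_index, e_index): 0-0, 1-1, 2-2
A = {(0, 0), (1, 1), (2, 2)}
print(consistent((0, 1), (0, 1), A))  # True
print(consistent((0, 1), (0, 2), A))  # False: link (2, 2) crosses the boundary
```

Phrase extraction enumerates all span pairs and keeps exactly the consistent ones as phrase-table entries.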
Decoder
The task of the decoder is to find the best-scoring translation according to these formulae.
Evaluation
2. Automatic evaluation
a. Speed
b. Size (can it be used on mobile phones)
c. Integration and usability