Machine Translation

A Presentation by:
Julie Conlonova,
Rob Chase,
and Eric Pomerleau
Overview
 Language Alignment System
 Datasets
 Sentence-aligned sets for training (e.g., the Hansards Corpus and the European Parliament Proceedings Parallel Corpus)
 A word-aligned set for testing and evaluation, to measure accuracy and precision
 Decoding
Language Alignment
 Goal: Produce a word-aligned set from a sentence-aligned dataset
 First step on the road toward Statistical Machine Translation
 Example Problem:
 The motion to adjourn the House is now deemed to have been adopted.
 La motion portant que la Chambre s'ajourne maintenant est réputée adoptée.
IBM Models 1 and 2
-Kevin Knight, A Statistical MT Tutorial Workbook, 1999
 Each can be used on its own to produce a word-aligned dataset.
 Both are trained with the EM algorithm.
 Model 1 produces T-values based on normalized fractional counting of corresponding words.
 Model 2 additionally uses A-values for “reverse distortion probabilities”: probabilities based on the positions of the words.
Training Data
 European Parliament Proceedings Parallel Corpus 1996-2003
 Aligned Languages:
 English - French
 English - Dutch
 English - Italian
 English - Finnish
 English - Portuguese
 English - Spanish
 English - Greek
Training Data cont.
 Eliminated
 Misaligned sentences
 Sentences with 50 or more words
 XML tags
 Symbols and numerical characters other than commas and periods
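The filtering rules above can be sketched roughly as follows. This is only an illustration, not the presenters' code: the function name `clean_pair`, the two regular expressions, and whitespace tokenization are assumptions, and detecting misaligned pairs is reduced here to dropping pairs where one side ends up empty.

```python
import re

# Hypothetical patterns; the slides do not say how the cleaning was implemented.
TAG_RE = re.compile(r"<[^>]+>")             # XML tags such as <P> or <SPEAKER ID=1>
SYMBOL_RE = re.compile(r"[^\w\s,.]|[\d_]")  # everything except letters, whitespace, commas, periods

def clean_pair(src, tgt, max_words=50):
    """Return (src_tokens, tgt_tokens), or None if the pair should be dropped."""
    cleaned = []
    for sentence in (src, tgt):
        sentence = TAG_RE.sub(" ", sentence)     # strip XML tags
        sentence = SYMBOL_RE.sub(" ", sentence)  # strip symbols and digits
        tokens = sentence.split()
        # Drop pairs with an empty side or with 50 or more words on either side.
        if not tokens or len(tokens) >= max_words:
            return None
        cleaned.append(tokens)
    return cleaned[0], cleaned[1]

pair = clean_pair("<P> The House met at 10 a.m.!", "La Chambre s'ajourne, maintenant.")
```

Note that a literal reading of the rule also removes apostrophes, so "s'ajourne" is split into two tokens; whether that is desirable depends on the language pair.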
Ideally…
http://www.cs.berkeley.edu/~klein/cs294-5
Bypassing Interlingua: Models I-III
 Variables contributing to the probability of a sentence:
 Correlation between words in the source/target languages
 Fertility of a word
 Correlation between order of words in source sentence and order of words in target
A Translation Matrix
        Rob   Cat   is    Dog
Rob     1     0     0     0
Gato    0     1     0     0
es      0     0     .5    0
esta    0     0     .5    0
Perro   0     0     0     1
Building the Translation Matrix: Starting from alignments
 Find the sentence alignment
 If a word in the source aligns with a word in the target, then increment the corresponding entry of the translation matrix.
 Normalize the translation matrix
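If word alignments were already known, the counting and normalizing steps could look like the sketch below. The input format (token lists plus a list of index pairs) and the name `build_translation_matrix` are assumptions made for illustration.

```python
from collections import defaultdict

def build_translation_matrix(aligned_pairs):
    """aligned_pairs: iterable of (src_tokens, tgt_tokens, links),
    where links is a list of (src_index, tgt_index) pairs (hypothetical format)."""
    counts = defaultdict(lambda: defaultdict(float))
    for src, tgt, links in aligned_pairs:
        for i, j in links:
            counts[src[i]][tgt[j]] += 1.0  # increment the matrix entry
    matrix = {}
    for s, row in counts.items():
        total = sum(row.values())
        # Normalize so the entries for each source word sum to 1.
        matrix[s] = {t: c / total for t, c in row.items()}
    return matrix

matrix = build_translation_matrix([
    (["Rob", "is", "tall"], ["Rob", "es", "alto"], [(0, 0), (1, 1), (2, 2)]),
    (["Cat", "is", "here"], ["Gato", "esta", "aqui"], [(0, 0), (1, 1), (2, 2)]),
])
```

With these two toy sentences, "is" is split evenly between "es" and "esta", matching the .5 entries in the translation matrix slide.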
Can’t find alignments
 Most sentences in the Hansards corpus are 60 words long. Many run over 100 words.
 100^100 possible alignments
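To get a feel for that number: if each of 100 target words may align to any of 100 source words, the count is 100 to the power 100. The helper name `alignment_count` is made up for this note, and it ignores the extra NULL word that the IBM models normally add.

```python
def alignment_count(source_len, target_len):
    # Each target word independently picks one source word.
    return source_len ** target_len

n = alignment_count(100, 100)
digits = len(str(n))  # number of decimal digits in the count
```

Python integers are arbitrary precision, so the value itself is easy to compute; enumerating that many alignments is what is out of reach.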
Counting
 Rob is a boy. Rob es nino.
 Rob is tall. Rob es alto.
 Eric is tall. Eric es alto.
… …
Base counts on co-occurrence, weighting
based on sentence length.
Iterative Convergence
 Use the Expectation Maximization algorithm
 Creates translation matrix

        Rob   Is    Tall  boy
Rob     .66   .33   .25   .25
es      .30   .66   .25   .25
alto    .2    .05   .5    0
nino    .2    .05   0     .5
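A minimal sketch of EM for IBM Model 1, in the spirit of the fractional counting described in Knight's workbook, run on the three toy sentence pairs from the Counting slide. It is an illustration under assumptions (no NULL word, uniform initialization, a fixed number of iterations, the name `train_model1`), not the presenters' implementation, and it will not reproduce the exact numbers in the table above.

```python
from collections import defaultdict

def train_model1(pairs, iterations=20):
    """pairs: list of (english_tokens, foreign_tokens).
    Returns a dict t[(f, e)] approximating t(f | e)."""
    f_vocab = {f for _, fs in pairs for f in fs}
    # Start uniform over every co-occurring word pair.
    t = {(f, e): 1.0 / len(f_vocab) for es, fs in pairs for f in fs for e in es}
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: spread each foreign word's unit count over the English words.
        for es, fs in pairs:
            for f in fs:
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / z
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: normalize the fractional counts per English word.
        t = {(f, e): c / total[e] for (f, e), c in count.items()}
    return t

pairs = [
    ("Rob is a boy".split(), "Rob es nino".split()),
    ("Rob is tall".split(), "Rob es alto".split()),
    ("Eric is tall".split(), "Eric es alto".split()),
]
t = train_model1(pairs)

def best_translation(e):
    # Most probable foreign word for English word e under the trained table.
    return max((f for (f, e2) in t if e2 == e), key=lambda f: t[(f, e)])
```

Calling `best_translation("tall")` picks "alto" on this data: "es" co-occurs with "is" in every sentence, so EM gradually assigns "es" to "is" and leaves "alto" for "tall".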
Distorting the Sentence
 Word order changes between languages
 How is a sentence with 2 words distorted?
 How is a sentence with 3 words distorted?
 How is a sentence with …
To keep track of this information we use…
A tesseract!
 (A quadruply nested default dictionary)
 This could be a problem if there are more than 100 words in a sentence.
 100x100x100x100 = too big for RAM and takes too much time
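One way such a quadruply nested default dictionary might be built in Python is sketched below. The key order (source length, target length, target position, source position) and the helper names are assumptions; the slides do not show the actual structure.

```python
from collections import defaultdict

def nested_defaultdict(depth):
    """Build a defaultdict nested `depth` levels deep with float leaves."""
    if depth == 1:
        return defaultdict(float)
    return defaultdict(lambda: nested_defaultdict(depth - 1))

def count_entries(table, depth=4):
    """Count the leaf values actually stored in the nested table."""
    if depth == 1:
        return len(table)
    return sum(count_entries(child, depth - 1) for child in table.values())

# Hypothetical key order: a[l][m][j][i]
a = nested_defaultdict(4)
a[3][3][1][2] += 0.5

worst_case = 100 ** 4  # entries if every combination up to length 100 is filled
```

Entries are created lazily, so only combinations that are touched use memory. Note that merely reading a missing key on a `defaultdict` also inserts it, which can quietly grow the table toward the 100x100x100x100 worst case.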
Broad Look at MT
 “The translation process can be
described simply as:
1. Decoding the meaning of the source text, and
2. Re-encoding this meaning in the target
language.”
- “Translation Process”, Wikipedia, May 2006
Decoding
 How to go from the T-matrix and A-matrix
to a word alignment?
 There are several approaches…

Viterbi
 If only doing alignment, much smaller memory and time requirements.
 Returns optimal path.
 T-Matrix probabilities function as the “emission” matrix
 A-Matrix probabilities account for the positioning of words
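For Models 1 and 2 the most probable alignment factorizes per target word, so the best path can be found by choosing, for each foreign position, the source position with the highest T-value times A-value. The sketch below assumes flat dictionaries keyed by `(f, e)` and `(i, j, l, m)`; those layouts and the name `viterbi_align` are hypothetical.

```python
def viterbi_align(es, fs, t, a=None):
    """Return, for each foreign position j, the index i of the best English word.
    t[(f, e)] holds T-values; a[(i, j, l, m)] holds A-values (Model 2).
    With a=None the positions are ignored, as in Model 1."""
    l, m = len(es), len(fs)
    alignment = []
    for j, f in enumerate(fs):
        def score(i):
            position = 1.0 if a is None else a.get((i, j, l, m), 0.0)
            return t.get((f, es[i]), 0.0) * position
        alignment.append(max(range(l), key=score))
    return alignment

t_values = {("Rob", "Rob"): 0.9, ("es", "is"): 0.8, ("es", "tall"): 0.3, ("alto", "tall"): 0.7}
links = viterbi_align(["Rob", "is", "tall"], ["Rob", "es", "alto"], t_values)
```

Because each position is decided independently, the cost is proportional to the product of the two sentence lengths, which fits the point above about small memory and time requirements when only aligning.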
Decoding as a Translator
If no translated sentence is supplied, the program can act as a stand-alone translator instead of a word aligner.
However, while the Viterbi algorithm with pruning runs quickly when decoding alignments, the run time skyrockets when translating.
Greedy Hill Climbing
Knight & Koehn, What’s New in Statistical Machine Translation, 2003
 Best first search
 2-step look ahead to avoid getting stuck in most probable local maxima
Beam Search
 Optimization of Best First Search with heuristics and “beam” of choices
 Exponential tradeoff when increasing the “beam” width
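A minimal beam-search sketch, assuming a monotone word-by-word translator (no reordering) scored by a translation table and a bigram language model. The names `beam_decode`, `t_table`, `bigram`, the `<s>` start symbol, and the floor probability are all hypothetical and chosen only for illustration.

```python
import heapq
import math

def beam_decode(source, t_table, bigram, beam_width=3, floor=1e-6):
    """source: list of source words.
    t_table[word]: dict of target word -> translation probability.
    bigram[(prev, word)]: language-model probability (floor if unseen)."""
    beam = [(0.0, ["<s>"])]  # (log probability, hypothesis so far)
    for word in source:
        expanded = []
        for logp, hyp in beam:
            for target, p_t in t_table[word].items():
                p_lm = bigram.get((hyp[-1], target), floor)
                expanded.append((logp + math.log(p_t) + math.log(p_lm), hyp + [target]))
        # Keep only the beam_width best partial translations.
        beam = heapq.nlargest(beam_width, expanded, key=lambda h: h[0])
    best_logp, best = beam[0]
    return best[1:], best_logp

t_table = {"es": {"is": 0.6, "it": 0.4}, "alto": {"tall": 0.7, "high": 0.3}}
bigram = {("<s>", "is"): 0.5, ("<s>", "it"): 0.5, ("is", "tall"): 0.6, ("it", "high"): 0.1}
words, logp = beam_decode(["es", "alto"], t_table, bigram, beam_width=2)
```

With `beam_width=1` this reduces to a greedy best-first choice at every step; wider beams keep more alternatives alive at the cost of more work per word.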
Other Decoding Methods
 Finite State Transducer
 Mapping between languages based on a finite automaton
 Parsing
 String to Tree Model
Problem: One to Many
Necessary to take all alignments over a certain probability in order to capture the “probability that e has fertility at least a given value”
Al-Onaizan, Curin, Jahr, et al., Statistical Machine Translation, 1999
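Taking every link above a probability cutoff, rather than only the single best one, might be sketched like this. The threshold value, the `(f, e)` key layout, and the name `links_above_threshold` are assumptions for illustration.

```python
def links_above_threshold(es, fs, t, threshold=0.2):
    """For each English position i, list every foreign position j whose
    T-value t[(f, e)] exceeds the threshold (one-to-many links allowed)."""
    links = {}
    for i, e in enumerate(es):
        links[i] = [j for j, f in enumerate(fs) if t.get((f, e), 0.0) > threshold]
    return links

t_values = {("ne", "not"): 0.45, ("pas", "not"): 0.45, ("ne", "do"): 0.05}
links = links_above_threshold(["do", "not"], ["ne", "pas"], t_values)
```

Here "not" keeps links to both "ne" and "pas", which a single-best decoder would miss; the cutoff trades extra recall against spurious links.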

Results
 Study done in 2003 on word alignment error rates in Hansards corpus:
 Model 2:
 29.3% on 8K training sentence pairs
 19.5% on 1.47M training sentence pairs
 Optimized Model 6:
 20.3% on 8K training sentence pairs
 8.7% on 1.47M training sentence pairs
Och and Ney, A Systematic Comparison of Various Statistical Alignment Models, 2003
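The error rates above are alignment error rates (AER) as defined by Och and Ney: with sure links S, possible links P (S is a subset of P), and predicted links A, AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). A small sketch, assuming links are stored as sets of (source index, target index) tuples; the function name is made up here.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|), with S treated as part of P."""
    possible = possible | sure  # sure links always count as possible
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / (len(predicted) + len(sure))

predicted = {(0, 0), (1, 1), (2, 3)}
sure = {(0, 0), (1, 1)}
possible = {(2, 2)}
aer = alignment_error_rate(predicted, sure, possible)
```

Lower is better: a prediction that contains exactly the sure links scores 0.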
Expected Accuracy
70% overall
 Language performance:
 Dutch
 French
• Italian, Spanish, Portuguese
 Greek
 Finnish
Possible Future Work
 Given more time, we would’ve implemented IBM Model 3
 Additionally uses n, d, and p parameters for weighted alignments:
 N, fertility: the number of words produced by one word
 D, distortion
 P, parameter involving words that aren’t involved directly
 Invokes Model 2 for scoring
Another Possible Translation Scheme
 Example-Based Machine Translation
 Translation-by-Analogy
 Can sometimes achieve better than the “gist” translations from other models
Why Is Improving Machine
Translation Necessary?
A Chinese to English Translation
The End
Are there any
questions/comments?
