
Machine Translation

A Presentation by:
Julie Conlonova,
Rob Chase,
and Eric Pomerleau
Overview

 Language Alignment System


 Datasets
 Sentence-aligned sets for training (e.g. the
Hansards Corpus, European Parliament
Proceedings Parallel Corpus)
 A word-aligned set for testing and evaluation
to measure accuracy and precision
 Decoding
Language Alignment

 Goal: Produce a word-aligned set from
a sentence-aligned dataset
 First step on the road toward Statistical
Machine Translation
 Example Problem:
 The motion to adjourn the House is now
deemed to have been adopted.
 La motion portant que la Chambre s'ajourne
maintenant est réputée adoptée.
IBM Models 1 and 2
-Kevin Knight, A Statistical MT Tutorial Workbook, 1999

 Each capable of being used to produce a
word-aligned dataset separately.
 EM Algorithm
 Model 1 produces T-values based on
normalized fractional counting of
corresponding words.
 Additionally, Model 2 uses A-values for
“reverse distortion probabilities” –
probabilities based on the positions of the
words
Training Data
 European Parliament Proceedings Parallel
Corpus 1996-2003
 Aligned Languages:
 English - French
 English - Dutch
 English - Italian
 English - Finnish
 English - Portuguese
 English - Spanish
 English - Greek
Training Data cont.
 Eliminated
 Misaligned sentences
 Sentences with 50 or more words
 XML tags
 Symbols and numerical characters other than
commas and periods
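
The filtering steps above could be sketched roughly as follows. This is a minimal sketch, not the presenters' actual preprocessing: the names `clean_sentence` and `clean_pairs` are hypothetical, symbols are replaced by spaces (an assumption), and "misaligned" is approximated as "one side is empty after cleaning" because the slides do not say how misalignment was detected.

```python
import re

MAX_WORDS = 50                            # sentences with 50 or more words are dropped
TAG_RE = re.compile(r"<[^>]+>")           # XML tags such as <P> or <SPEAKER ID=1>
DROP_RE = re.compile(r"[^\w\s,.]|[\d_]")  # symbols and digits, keeping commas and periods


def clean_sentence(text):
    """Strip XML tags, symbols, and digits, then collapse whitespace."""
    text = TAG_RE.sub(" ", text)
    text = DROP_RE.sub(" ", text)
    return " ".join(text.split())


def clean_pairs(pairs):
    """Keep only sentence pairs that survive all filters."""
    kept = []
    for src, tgt in pairs:
        src, tgt = clean_sentence(src), clean_sentence(tgt)
        if not src or not tgt:
            continue  # assumption: an empty side signals a misaligned pair
        if len(src.split()) >= MAX_WORDS or len(tgt.split()) >= MAX_WORDS:
            continue
        kept.append((src, tgt))
    return kept
```

Replacing symbols with spaces rather than deleting them keeps elided forms such as "s'ajourne" as two tokens; whether that is desirable depends on the language pair.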
Ideally…

http://www.cs.berkeley.edu/~klein/cs294-5
Bypassing Interlingua: Models I-III

 Variables contributing to the probability
of a sentence:
Correlation between words in the
source/target languages
Fertility of a word
Correlation between order of words in
source sentence and order of words
in target
A Translation Matrix
         Rob   Cat   is    Dog
Rob      1     0     0     0
Gato     0     1     0     0
es       0     0     .5    0
esta     0     0     .5    0
Perro    0     0     0     1
Building the Translation Matrix: Starting
from alignments

 Find the sentence alignment


 If a word in the source aligns with a word
in the target, then increment the
translation matrix.
 Normalize the translation matrix
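
When word alignments are already known, the count-and-normalize procedure can be sketched as below. This is an illustrative sketch only: the function name `build_translation_matrix` and the `(source_words, target_words, links)` input format are assumptions, and normalization is done per source word so that each source word's translations sum to 1.

```python
from collections import defaultdict


def build_translation_matrix(aligned_pairs):
    """aligned_pairs: iterable of (source_words, target_words, links),
    where links is a list of (i, j) meaning source[i] aligns with target[j]."""
    counts = defaultdict(lambda: defaultdict(float))
    for src, tgt, links in aligned_pairs:
        for i, j in links:
            counts[src[i]][tgt[j]] += 1.0  # increment the matrix cell

    # Normalize so each source word's row sums to 1.
    matrix = {}
    for s, row in counts.items():
        total = sum(row.values())
        matrix[s] = {w: c / total for w, c in row.items()}
    return matrix


data = [
    (["Rob", "is", "tall"], ["Rob", "es", "alto"], [(0, 0), (1, 1), (2, 2)]),
    (["Rob", "is", "here"], ["Rob", "esta", "aqui"], [(0, 0), (1, 1), (2, 2)]),
]
matrix = build_translation_matrix(data)
```

Here "is" is seen once with "es" and once with "esta", so each receives probability .5.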
Can’t find alignments

 Most sentences in the Hansards corpus
are about 60 words long, and many
are over 100.
 100^100 possible alignments
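
To see where the count comes from: if each of the 100 target words may link to any one of 100 source words, the number of alignments is 100 raised to the 100th power. A quick check, ignoring any NULL word (the helper name `alignment_count` is hypothetical):

```python
def alignment_count(source_len, target_len):
    """Each target word picks one source word independently."""
    return source_len ** target_len


n = alignment_count(100, 100)
digits = len(str(n))  # 100**100 == 10**200, which has 201 digits
```

Enumerating that many alignments is out of the question, which is why counts have to be estimated some other way.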
Counting
 Rob is a boy. Rob es nino.
 Rob is tall. Rob es alto.
 Eric is tall. Eric es alto.
… …
Base counts on co-occurrence, weighting
based on sentence length.
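
One way to read "co-occurrence, weighting based on sentence length" is that every target word spreads a single unit of count evenly over the source words of its sentence. The sketch below assumes that reading; the name `cooccurrence_counts` and the tokenized toy corpus are illustrative.

```python
from collections import defaultdict


def cooccurrence_counts(pairs):
    """pairs: list of (source_words, target_words)."""
    counts = defaultdict(lambda: defaultdict(float))
    for src, tgt in pairs:
        weight = 1.0 / len(src)  # longer sentences give weaker evidence per pair
        for f in tgt:
            for e in src:
                counts[e][f] += weight
    return counts


corpus = [
    ("Rob is a boy".split(), "Rob es nino".split()),
    ("Rob is tall".split(), "Rob es alto".split()),
    ("Eric is tall".split(), "Eric es alto".split()),
]
counts = cooccurrence_counts(corpus)
```

With this weighting, "is" and "es" collect 1/4 + 1/3 + 1/3 because they co-occur in all three pairs, while "Eric" and "nino" never co-occur and stay at zero.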
Iterative Convergence
 Use Expectation Maximization algorithm
 Creates translation matrix

        Rob   Is    Tall  boy
Rob     .66   .33   .25   .25
es      .30   .66   .25   .25
alto    .2    .05   .5    0
nino    .2    .05   0     .5
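
The iteration can be sketched as a small EM loop in the style of IBM Model 1: start from uniform T-values, collect normalized fractional counts, renormalize, and repeat. This is a minimal sketch under stated assumptions, not the presenters' code: there is no NULL word, the name `train_model1` is hypothetical, and the values it produces will not match the matrix above exactly.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """pairs: list of (source_words, target_words).
    Returns t[(f, e)], an estimate of P(target word f | source word e)."""
    src_vocab = {e for src, _ in pairs for e in src}
    tgt_vocab = {f for _, tgt in pairs for f in tgt}
    # Uniform starting point: every target word is equally likely.
    t = {(f, e): 1.0 / len(tgt_vocab) for e in src_vocab for f in tgt_vocab}

    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        # E-step: split each target word's count among the source words.
        for src, tgt in pairs:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalize so each source word's T-values sum to 1.
        t = {(f, e): count[(f, e)] / total[e] for (f, e) in t}
    return t


corpus = [
    ("Rob is a boy".split(), "Rob es nino".split()),
    ("Rob is tall".split(), "Rob es alto".split()),
    ("Eric is tall".split(), "Eric es alto".split()),
]
t = train_model1(corpus)
```

On this toy corpus the most probable translation of "tall" comes out as "alto", since "es" is increasingly explained by "is".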
Distorting the Sentence
 Word order changes between languages
 How is a sentence with 2 words distorted?
 How is a sentence with 3 words distorted?
 How is a sentence with …

To keep track of this information we use…


A tesseract!

 (A quadruply nested default
dictionary)
 This could be a problem if there
are more than 100 words in a
sentence.
 100x100x100x100 = too big for
RAM and takes too much time
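
A quadruply nested default dictionary can be built as shown below. The key order (source position, target position, source length, target length) is an assumption made for illustration, as is the name `make_tesseract`.

```python
from collections import defaultdict


def make_tesseract():
    """Four levels of defaultdict; missing cells read as 0.0."""
    return defaultdict(
        lambda: defaultdict(
            lambda: defaultdict(
                lambda: defaultdict(float))))


a = make_tesseract()
a[2][1][3][3] += 0.5  # source position 2, target position 1, lengths 3 and 3

# Worst case if every index can range over 100 values:
dense_cells = 100 ** 4            # 100,000,000 cells
dense_bytes = dense_cells * 8     # 800,000,000 bytes as raw 8-byte floats
```

Only the cells that are actually touched get stored, but a fully populated table would need about 800 MB for the raw floats alone, before any dictionary overhead.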
Broad Look at MT
 “The translation process can be
described simply as:
1. Decoding the meaning of the source text, and
2. Re-encoding this meaning in the target
language.”
- “Translation Process”, Wikipedia, May 2006
Decoding
 How to go from the T-matrix and A-matrix
to a word alignment?

 There are several approaches…


Viterbi

 If only doing alignment, much smaller
memory and time requirements.
 Returns optimal path.
 T-Matrix probabilities function as the
“emission” matrix
 A-Matrix probabilities concerned with
the positioning of words
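
The presenters' Viterbi code is not shown. Under Model 2 the score of an alignment is a product of one T-value and one A-value per target word, so the most probable alignment can be chosen one target position at a time. The sketch below assumes that formulation; `best_alignment`, the flat tuple keys, and the uniform A-values are illustrative choices.

```python
def best_alignment(src, tgt, t, a):
    """Return (source index, target index) links maximizing t * a per target word.
    t[(f, e)] is P(f | e); a[(i, j, l, m)] is the positional probability."""
    l, m = len(src), len(tgt)
    alignment = []
    for j, f in enumerate(tgt):
        def score(i):
            return t.get((f, src[i]), 0.0) * a.get((i, j, l, m), 0.0)
        alignment.append((max(range(l), key=score), j))
    return alignment


src = "Rob is tall".split()
tgt = "Rob es alto".split()
t = {
    ("Rob", "Rob"): .66, ("es", "Rob"): .30, ("alto", "Rob"): .2,
    ("Rob", "is"): .33, ("es", "is"): .66, ("alto", "is"): .05,
    ("Rob", "tall"): .25, ("es", "tall"): .25, ("alto", "tall"): .5,
}
# Assumption: uniform positional probabilities, so only T-values decide.
a = {(i, j, 3, 3): 1.0 / 3 for i in range(3) for j in range(3)}
alignment = best_alignment(src, tgt, t, a)
```

With these T-values each target word links to the source word in the same position.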
Decoding as a Translator

Without a translated sentence as input,
the program can act as a stand-alone
translator instead of a word aligner.

However, while the Viterbi algorithm runs
quickly with pruning when decoding an
alignment, the run time skyrockets when
translating.
Greedy Hill Climbing
Knight & Koehn, What’s New in Statistical Machine Translation, 2003

 Best first search


 2-step look ahead to avoid getting stuck in
most probable local maxima
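
The effect of a 2-step look-ahead can be illustrated with a generic hill climber. This is not the decoder from Knight & Koehn: the function `hill_climb`, the integer states, and the toy scores are all hypothetical, chosen only to show how looking two moves ahead escapes a local maximum.

```python
def hill_climb(start, neighbors, score, lookahead=2):
    """Move to the best state reachable within `lookahead` steps
    until no such state improves on the current one."""
    current, current_score = start, score(start)
    while True:
        frontier, candidates = [current], []
        for _ in range(lookahead):
            frontier = [n for state in frontier for n in neighbors(state)]
            candidates.extend(frontier)
        if not candidates:
            return current
        best = max(candidates, key=score)
        if score(best) <= current_score:
            return current  # nothing better within reach
        current, current_score = best, score(best)


# Toy landscape: state 1 is a local maximum, state 3 is the global one.
toy_scores = {0: 0, 1: 2, 2: 1, 3: 5}
score = lambda x: toy_scores.get(x, -1)
neighbors = lambda x: [x - 1, x + 1]

stuck = hill_climb(0, neighbors, score, lookahead=1)
escaped = hill_climb(0, neighbors, score, lookahead=2)
```

With a 1-step look-ahead the search stops at state 1; with 2 steps it sees past the dip at state 2 and reaches state 3.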
Beam Search
Knight & Koehn, What’s New in Statistical Machine Translation, 2003

 Optimization of Best First Search with
heuristics and “beam” of choices
 Exponential tradeoff when increasing the
“beam” width
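
A minimal beam search sketch, assuming a word-by-word, left-to-right translation with no reordering. The candidate table, the `lm_score` function (a stand-in for a language model returning log-scores), and the name `beam_search` are hypothetical; a real decoder would also handle fertility and distortion.

```python
import math


def beam_search(src, candidates, lm_score, beam_width=3):
    """Keep only the `beam_width` best partial translations at each step."""
    beam = [(0.0, [])]  # (log-score, target words so far)
    for e in src:
        expanded = []
        for logp, words in beam:
            prev = words[-1] if words else None
            for f, p in candidates[e].items():
                new_score = logp + math.log(p) + lm_score(prev, f)
                expanded.append((new_score, words + [f]))
        expanded.sort(key=lambda h: h[0], reverse=True)
        beam = expanded[:beam_width]  # prune everything outside the beam
    return beam[0][1]


candidates = {
    "Rob": {"Rob": 1.0},
    "is": {"es": 0.5, "esta": 0.5},
    "tall": {"alto": 1.0},
}


def lm_score(prev, word):
    # Toy stand-in for a language model: prefers "es alto".
    return 0.0 if (prev, word) == ("es", "alto") else -1.0


translation = beam_search("Rob is tall".split(), candidates, lm_score, beam_width=2)
```

A wider beam keeps more hypotheses alive at each step, which costs more time and memory but lowers the risk of pruning the best translation early.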
Other Decoding Methods
Knight & Koehn, What’s New in Statistical Machine Translation, 2003

 Finite State Transducer


 Mapping between languages based on a finite
automaton
 Parsing
 String to Tree Model
Problem: One to Many

Necessary to take all alignments over a
certain probability in order to capture the
“probability that e has fertility at least a
given value”

Al-Onaizan, Curin, Jahr, et al., Statistical Machine Translation, 1999
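
Taking every link above a probability cutoff, rather than only the single best one, lets one source word align to several target words. A minimal sketch, where the function name `links_above_threshold`, the cutoff of 0.3, and the toy T-values are assumptions:

```python
def links_above_threshold(src, tgt, t, threshold=0.3):
    """Return all (source index, target index) links whose T-value
    meets the threshold, allowing one-to-many alignments."""
    links = []
    for i, e in enumerate(src):
        for j, f in enumerate(tgt):
            if t.get((f, e), 0.0) >= threshold:
                links.append((i, j))
    return links


# Toy example: English "not" maps to both French "ne" and "pas".
t = {("ne", "not"): 0.45, ("pas", "not"): 0.45, ("ne", "do"): 0.05}
links = links_above_threshold(["do", "not"], ["ne", "pas"], t)
```

Here "not" keeps both of its links while the weak "do"/"ne" pairing is dropped.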


Results

 Study done in 2003 on word alignment
error rates in Hansards corpus:
 Model 2–
 29.3% on 8K training sentence pairs
 19.5% on 1.47M training sentence pairs

 Optimized Model 6 –
 20.3% on 8K training sentence pairs
 8.7% on 1.47M training sentence pairs
Och and Ney, A Systematic Comparison of Various Statistical Alignment
Models, 2003
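
The alignment error rate used by Och and Ney compares predicted links A against hand-annotated "sure" links S and "possible" links P (with S contained in P): AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). A small sketch of that formula, with a hypothetical function name and toy link sets:

```python
def alignment_error_rate(predicted, sure, possible):
    """AER = 1 - (|A & S| + |A & P|) / (|A| + |S|).
    Links are (source index, target index) pairs; assumes A and S are not both empty."""
    predicted, sure = set(predicted), set(sure)
    possible = set(possible) | sure  # sure links are always possible
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / (len(predicted) + len(sure))


gold = [(0, 0), (1, 1)]
perfect = alignment_error_rate(gold, gold, gold)
half = alignment_error_rate([(0, 0), (1, 2)], gold, gold)
```

A perfect alignment scores 0.0; getting one of two links wrong, with no extra possible links, scores 0.5.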
Expected Accuracy

70% overall
 Language performance:
 Dutch
 French
• Italian, Spanish, Portuguese
 Greek
 Finnish
Possible Future Work

 Given more time, we would’ve implemented IBM
Model 3
 Additionally uses n, p, and d parameters for weighted
alignments:
 N, fertility: number of words produced by one word
 D, distortion
 P, parameter for spurious words (words with no direct counterpart)
 Invokes Model 2 for scoring
Another Possible Translation Scheme

 Example-Based Machine Translation


 Translation-by-Analogy

 Can sometimes achieve better than the “gist”
translations from other models
Why Is Improving Machine
Translation Necessary?
A Chinese to English Translation
The End
Are there any
questions/comments?
