Alberto Massidda
Who we are
● Founded in 2001;
● Branches in Milan, Rome and London;
● Market leader in enterprise-ready solutions based on Open Source tech;
● Expertise:
○ Open Source
○ DevOps
○ Search
https://creativecommons.org/licenses/by-nc-sa/3.0/
Outline
1. Statistical Machine Translation
2. Neural Machine Translation
3. Domain Adaptation
4. Zero-shot translation
5. Unsupervised Neural MT
Statistical Machine Translation
Translation as the recovery of a ciphered message through probability laws:
● “È troppo tardi” ✓
● “Tardi troppo è” ✗
Two quantities drive the model: P(e) and P(f|e).
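By Bayes' rule, decoding searches for the English sentence e that maximizes the product of the two:

ê = argmax_e P(e|f) = argmax_e P(e) · P(f|e)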
Where do these numbers come from?
Language model
P(e) comes from a Language model, a machine that assigns scores to
sentences, estimating their likelihood.
1. Record every sentence ever said in English (say, 1 billion sentences).
2. If the sentence "how's it going?" appears 76,413 times in that database, then
we say:
P("how's it going?") = 76,413 / 1,000,000,000 ≈ 7.6 × 10⁻⁵
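As a minimal sketch of this counting idea, assuming a toy corpus in place of the billion-sentence database (real language models decompose sentences into n-grams with smoothing, or use neural networks, precisely because exact-match counts give unseen sentences probability zero):

from collections import Counter

# Toy corpus standing in for "every sentence ever said in English".
corpus = [
    "how's it going?",
    "how's it going?",
    "it is too late",
]

counts = Counter(corpus)
total = len(corpus)

def sentence_prob(sentence: str) -> float:
    # Relative frequency: occurrences of the exact sentence / corpus size.
    # Unseen sentences get probability 0, which is why real LMs
    # back off to n-gram statistics or neural estimates.
    return counts[sentence] / total

print(sentence_prob("how's it going?"))  # 2/3 in this toy corpus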
Translation model
Next we need to worry about P(f|e), the probability of a foreign string f (here,
Italian) given an English string e.
EN: I've never seen anything like that!
IT: Non ho mai visto nulla di simile!
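A hedged sketch of how word-level translation probabilities can be estimated from such parallel data, in the spirit of IBM Model 1 EM training (the three-pair corpus is invented for illustration; real systems run EM over millions of sentence pairs and add alignment and phrase models on top):

from collections import defaultdict

# Tiny invented parallel corpus (English -> Italian).
pairs = [
    ("the house".split(), "la casa".split()),
    ("the book".split(), "il libro".split()),
    ("a book".split(), "un libro".split()),
]

# t[f][e] approximates P(f|e) at the word level; start uniform.
t = defaultdict(lambda: defaultdict(lambda: 1.0))

for _ in range(10):  # EM iterations
    count = defaultdict(lambda: defaultdict(float))
    total = defaultdict(float)
    for en, it in pairs:
        for f in it:
            # E-step: spread each Italian word's mass over the
            # English words that could have generated it.
            norm = sum(t[f][e] for e in en)
            for e in en:
                frac = t[f][e] / norm
                count[f][e] += frac
                total[e] += frac
    # M-step: renormalize expected counts into probabilities.
    for f in count:
        for e in count[f]:
            t[f][e] = count[f][e] / total[e]

print(round(t["casa"]["house"], 3))  # well above 0.5: EM separated "casa" from "the"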
But a feed-forward network is not suitable for mapping the temporal dependencies
between words. We need an architecture that can explicitly map sequences.
Recurrent network
Neural language model
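A minimal numpy sketch of the recurrent update behind a neural language model, with toy sizes and random untrained weights (a real model learns these with backpropagation through time):

import numpy as np

vocab = ["<s>", "how's", "it", "going?"]
V, H = len(vocab), 8   # vocabulary size, hidden size

rng = np.random.default_rng(0)
Wxh = rng.normal(scale=0.1, size=(H, V))  # input -> hidden
Whh = rng.normal(scale=0.1, size=(H, H))  # hidden -> hidden: the recurrence
Why = rng.normal(scale=0.1, size=(V, H))  # hidden -> next-word logits

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

h = np.zeros(H)
for word in ["<s>", "how's", "it"]:
    x = np.zeros(V)
    x[vocab.index(word)] = 1.0       # one-hot encoding of the current word
    h = np.tanh(Wxh @ x + Whh @ h)   # hidden state carries the whole history
    p_next = softmax(Why @ h)        # distribution over the next word

print(dict(zip(vocab, p_next.round(3))))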
Encoder-Decoder architecture
Given a source sentence f and a target sentence e:
[Figure: an attention-based encoder-decoder generating "IL CAMERIERE PRESE I
PIATTI" ("the waiter took the plates"). Encoder states h and decoder states g
are drawn per position; at each decoding step the attention weights over the
five source words sum to 1 and peak (≈ 0.7) on a different source position.]
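A compact numpy sketch of the attention step the figure illustrates, with invented sizes and random states (it shows only how the decoder mixes encoder states into a context vector, not a full trained model):

import numpy as np

rng = np.random.default_rng(0)
H, src_len = 8, 5          # hidden size; five source words, as in the figure

# Encoder states h_1..h_5 (one row per source word) and the current
# decoder state g_t; in a real model both come from trained RNNs.
h = rng.normal(size=(src_len, H))
g = rng.normal(size=H)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

scores = h @ g             # dot-product score for each source position
weights = softmax(scores)  # the 0.7 / 0.1 / 0.05 rows of the figure: sum to 1
context = weights @ h      # weighted sum of encoder states fed to the decoder

print(weights.round(3), weights.sum())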
Each corpus builds a latent semantic space. Similar languages build similar spaces.
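Because similar languages induce similarly shaped spaces, one embedding space can often be rotated onto another with a simple linear map. Here is a hedged numpy sketch of the orthogonal Procrustes alignment used in unsupervised cross-lingual embedding work, with random toy vectors standing in for the two languages:

import numpy as np

rng = np.random.default_rng(0)
d, n = 4, 6                      # embedding dimension, dictionary size

# Toy "source language" vectors; a hidden rotation Q plays the role
# of the shared geometry between two similar languages.
X = rng.normal(size=(n, d))
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
Y = X @ Q                        # corresponding "target language" vectors

# Orthogonal Procrustes: the rotation W minimizing ||XW - Y|| has a
# closed form from the SVD of X^T Y.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

print(np.allclose(X @ W, Y))     # True: the two spaces align exactly here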
https://github.com/google/seq2seq
A general-purpose encoder-decoder framework for TensorFlow
https://github.com/awslabs/sockeye
A seq2seq framework with a focus on NMT, based on Apache MXNet
http://www.statmt.org/
Old-school statistical MT reference site
Q&A