
Synthesis on ‘Arabic Dialect Handling in Hybrid Machine Translation’

Dialectal Arabic can be handled by a hybrid machine translation system that uses a decoding algorithm to normalize non-standard dialectal Arabic into Modern Standard Arabic (MSA). The paper supports this claim by measuring and comparing machine translation results in terms of BLEU (bilingual evaluation understudy) with and without normalization.

As described in the paper “Transferring Egyptian Colloquial Dialect into MSA”, the main challenge is the lack of conventional syntactic structures, which differ from those of the standard written language, together with the noisy text found in broadcast transmissions and web content.

A hybrid machine translation system combines rule-based and statistical approaches so that each mitigates the weaknesses of the other, in particular the rare words that the statistical approach cannot handle. Preprocessing helps remove unknown words and textual noise, and the rule-based engine's parser gives access to the Lexical Functional Grammar (LFG) system. LFG analyses are richly annotated with syntactic and semantic information, which is then processed by the decoding algorithm.
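As a rough illustration of the kind of preprocessing step mentioned above, the sketch below strips textual noise (URLs, diacritics, character elongation) from raw Arabic input before it reaches the translation engines. The regular expressions and rules are assumptions chosen for illustration, not the paper's actual pipeline.

```python
import re

# Hypothetical noise-cleaning step; the paper's actual preprocessing rules are not specified here.
ARABIC_DIACRITICS = re.compile(r"[\u064B-\u0652]")   # tanween, fatha, damma, kasra, shadda, sukun
URL_PATTERN = re.compile(r"https?://\S+")
REPEATED_CHARS = re.compile(r"(.)\1{2,}")            # runs of three or more identical characters

def clean_text(line: str) -> str:
    """Remove obvious textual noise before translation."""
    line = URL_PATTERN.sub(" ", line)                 # drop web addresses
    line = ARABIC_DIACRITICS.sub("", line)            # drop short-vowel marks
    line = REPEATED_CHARS.sub(r"\1", line)            # collapse character elongation
    return " ".join(line.split())                     # normalize whitespace

print(clean_text("شوووف الرابط https://example.com دلوقتي"))
```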

The decoding algorithm operates on functional constraints: parse trees annotated with attribute-value pairs that specify part of speech, grammatical information, semantic disambiguation information and argument structure in the source and target text, allowing a deeper syntactic and semantic analysis.
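To make the idea of attribute-value pairs concrete, a minimal sketch is given below. The feature names and the nested-dictionary representation are assumptions for illustration; they are not the paper's actual LFG feature inventory.

```python
# Hypothetical LFG-style feature structure for one clause.
# Feature names (PRED, TENSE, SUBJ, ...) are illustrative, not taken from the paper.
f_structure = {
    "PRED": "write<SUBJ, OBJ>",        # predicate with its argument structure
    "TENSE": "past",
    "SUBJ": {"PRED": "student", "POS": "noun", "NUM": "sg", "DEF": "+"},
    "OBJ":  {"PRED": "letter",  "POS": "noun", "NUM": "sg", "DEF": "-"},
}

def satisfies(fs: dict, path: list, value: str) -> bool:
    """Check one functional constraint, e.g. SUBJ.NUM = sg."""
    node = fs
    for key in path:
        node = node.get(key, {})
    return node == value

print(satisfies(f_structure, ["SUBJ", "NUM"], "sg"))   # True
```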

The statistical translation model (STM) is used alongside the rule-based approach in order to exploit these functional constraints. In this paper, the STM is a phrase-based approach similar to the alignment template approach; it rests on probability theory, which lets it learn word-occurrence probabilities from example text using a combination of standard n-gram language models and structural language models.
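The sketch below shows, in simplified form, how an n-gram language model of the kind mentioned here learns word-occurrence probabilities from example text. The maximum-likelihood bigram estimate is a generic textbook formulation, not the specific model used in the paper.

```python
from collections import Counter

# Toy bigram language model with maximum-likelihood estimates.
# Real systems use higher-order n-grams with smoothing; this is only a sketch.
corpus = [
    "the student wrote the letter",
    "the student read the letter",
]

unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(tokens[:-1])
    bigrams.update(zip(tokens[:-1], tokens[1:]))

def bigram_prob(prev: str, word: str) -> float:
    """P(word | prev) estimated by relative frequency."""
    return bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0

print(bigram_prob("the", "student"))   # 0.5 in this toy corpus
```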

The main contribution of this paper is the normalization of words from the local dialects through three components: a character-based normalizer that uses simple rules to convert a word into the most similar MSA word, a non-dialect morphological analyzer, and a dialect-specific morphological analyzer built on hand-crafted rules describing the general morphology of the different dialects, which may yield multiple candidate outputs.
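As a rough sketch of what character-based rules can look like, the example below applies a few common Arabic orthographic normalizations (alef variants, final ya, ta marbuta) to map dialectal spellings toward MSA-like surface forms. The specific rule set is an assumption for illustration and is not the rule set used in the paper.

```python
# Hypothetical character-level normalization rules; the paper's actual rules are not reproduced here.
CHAR_RULES = {
    "أ": "ا",   # hamza-on-alef  -> bare alef
    "إ": "ا",   # hamza-under-alef -> bare alef
    "آ": "ا",   # alef madda     -> bare alef
    "ى": "ي",   # alef maqsura   -> ya
    "ة": "ه",   # ta marbuta     -> ha (a common, lossy normalization choice)
}

def normalize_word(word: str) -> str:
    """Map a dialectal spelling toward an MSA-like surface form, character by character."""
    return "".join(CHAR_RULES.get(ch, ch) for ch in word)

print(normalize_word("إزيك"))
```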

Dialect normalization increased the BLEU score on the broadcast and web content compared to the non-normalized baseline, both when normalization was run on the training and test data and when it was run on the test data only, the latter reaching about 42% with the HMT approach.
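For completeness, the sketch below shows how such a with/without-normalization comparison can be scored with BLEU using NLTK. The sample sentences and the smoothing choice are assumptions for illustration; they are not the paper's evaluation data.

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Toy comparison of two system outputs against the same references.
# References and hypotheses are placeholders, not data from the paper.
references = [[["the", "meeting", "starts", "now"]]]          # one list of references per segment
baseline   = [["meeting", "start", "now"]]                    # output without normalization
normalized = [["the", "meeting", "starts", "now"]]            # output with dialect normalization

smooth = SmoothingFunction().method1
print("baseline BLEU:  ", corpus_bleu(references, baseline, smoothing_function=smooth))
print("normalized BLEU:", corpus_bleu(references, normalized, smoothing_function=smooth))
```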

The HMT approach has proved to be the best classical method; however, with the advent of neural networks, neural language models are now the preferred approach. As for dialect normalization, it would remain helpful for dealing with colloquial Arabic dialects such as Moroccan.
