Synthesis on ‘Transferring Egyptian Colloquial Dialect into Modern Standard Arabic’

Arabic dialects are becoming increasingly important relative to Modern Standard
Arabic (MSA), since they are the most widely spoken varieties across the Arab world
and predominate on social media, which is now a primary source of data. Most of this
written colloquial text is in the Egyptian colloquial dialect, which is considered the
most widely understood and used dialect throughout the Arab world.

Among the challenges the paper addresses are the spelling of colloquial words, which
are often written in romanized (Latin) letters, their distortion and deviation from
MSA, the lack of syntactic rules, and the rapid rate of lexical expansion.

To preprocess dialectal Arabic, colloquial words can be converted into their
corresponding MSA words by means of morphological analysis and lexical acquisition
of colloquial words.

The paper proposes several solutions to these challenges: detecting romanized words
in the input and transliterating them into Arabic script; normalizing words, for
example by removing repeated characters; using an existing, mature MSA lexicon,
“Buckwalter” (an Arabic morphological analyzer that breaks an input Arabic word down
into three elements: prefix, stem and suffix), to build a rule-based lexical transfer
approach; and using empirical, corpus-based techniques from example-based machine
translation (EBMT).
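
The detection and normalization steps can be illustrated with a minimal Python
sketch. It is not the paper's implementation: the function names, the use of Latin
letters as the sign of a romanized word, and the threshold of three or more repeated
characters are all assumptions made for illustration. Transliteration itself is left
out, because the summary does not give the letter mapping.

```python
import re

# Assumption: a run of 3 or more identical characters is an elongation
# (for emphasis) and can be collapsed to a single character.
_REPEAT = re.compile(r"(.)\1{2,}")

# Assumption: a token containing any Latin letter is treated as romanized.
_LATIN = re.compile(r"[A-Za-z]")


def collapse_repeats(word: str) -> str:
    """Collapse runs of 3+ identical characters into one character."""
    return _REPEAT.sub(r"\1", word)


def is_romanized(word: str) -> bool:
    """Return True if the word contains at least one Latin letter."""
    return bool(_LATIN.search(word))
```

A double letter is left untouched, so legitimate doubled characters survive; only
longer runs are collapsed. A real system would need a more careful rule, since
collapsing to a single character can also remove a genuine double letter.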

The transfer between the Egyptian Arabic dialect and MSA can be either one-to-one or
one-to-many, which led the paper's authors to extend Buckwalter's lexicon with extra
fields, such as ID and segmentType, that identify each word segment and its new
position. A mapping table encodes the mapping rules from Egyptian Arabic to MSA,
using the values of the lexicon's ID field. The table has three fields: the source
colloquial word, the target MSA word and the mapping mode. For example, the
colloquial word "امتى" maps to the MSA word "متى"; this rule is represented in the
mapping table by an entry with the values: source colloquial interrogative ID=79831,
target MSA interrogative ID=64063 and mapping mode=0.
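
The lexicon and mapping table described above can be sketched as a small Python data
structure. This is only an illustration under assumptions: the class and field names
are hypothetical, the segment type value "stem" is assumed, and the summary does not
say what mapping mode 0 means, so the mode is stored but not interpreted. Only the two
IDs and the word pair come from the example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexiconEntry:
    entry_id: int       # the added ID field
    surface: str        # the word segment as written
    segment_type: str   # the added segmentType field (value set is assumed)


@dataclass(frozen=True)
class MappingRule:
    source_id: int      # ID of the colloquial lexicon entry
    target_id: int      # ID of the MSA lexicon entry
    mode: int           # mapping mode; semantics not given in the summary


# The two entries and one rule from the example in the text.
LEXICON = {
    79831: LexiconEntry(79831, "امتى", "stem"),  # "stem" is an assumption
    64063: LexiconEntry(64063, "متى", "stem"),
}
MAPPING_TABLE = [MappingRule(source_id=79831, target_id=64063, mode=0)]


def to_msa(word: str) -> str:
    """Return the MSA form of a colloquial word, or the word itself if no rule applies."""
    ids_by_surface = {entry.surface: entry.entry_id for entry in LEXICON.values()}
    source_id = ids_by_surface.get(word)
    for rule in MAPPING_TABLE:
        if rule.source_id == source_id:
            return LEXICON[rule.target_id].surface
    return word
```

Calling `to_msa("امتى")` looks up ID 79831, finds the matching rule and returns the
surface form of entry 64063. Keeping the rules keyed by ID rather than by spelling
mirrors the table layout described in the text, where the rule refers to lexicon
entries rather than to raw strings.
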
These techniques reuse and extend existing Arabic morphological analysis resources.
In addition, they can be applied to other colloquial Arabic dialects, such as
Moroccan.
