Professional Documents
Culture Documents
Paper4 PDF
Paper4 PDF
24
Proceedings of the Conference on Language & Technology 2009
च CA Chay र RA Ray
ज JA
Jeem व VA " Wow
प
PA Pay
उ ◌ु ُ ʊ
25
Proceedings of the Conference on Language & Technology 2009
Here we are writing down few sample words by Similar sound character problem always occurs due
reading those a reader can have an idea of difference in to multiple Urdu characters against one Hindi
writing style of both languages. character, as can be seen in table 1. For the same
/
दक
ु ानदार -.
reason, the wrong selection of character has often
found for words that end on “”ﮦ.
नमःते 0123.
टमाटर 4
3 (2) तुम खाना हमेशा बहुत अCछा बनाती
हो
3. Issues in Hindi-Urdu conversion @, / /
8:a5
?a
; aABa
=>3:a
.
;a<
The paper discusses the issues in Hindi to Urdu
conversion that are remained unsolved in CRULP’s In (2), for example, word “
=>3:” is written with “”
and Malik’s system. To identify these problems we instead of “)”. Table 4 gives the list of same sound
made a small survey of Hindi text available at [3] and Urdu characters.
[4]. We transliterated the Hindi text to Urdu using
CRULP’s Hindi to Urdu transliterator. The problems
Table 4: List of same sound Urdu characters
identified in the converted text are listed. We explored
Malik’s solution to find whether his algorithm and
Sounds Urdu Default
structures have solution of these problems. It was
Characters list Characters for
found that most of the problems are not solved by his
for each sound Transliteration
model too.
Character
The identified issues are of three types. The first type
of issues has unsolved problems in character by t sound C
character transliteration of Hindi text into Urdu script. s sound a&C a$C % $
But there are issues that are beyond the scope of
character to character transliteration. There are z sound aDC aEC aFC G D
26
Proceedings of the Conference on Language & Technology 2009
27
Proceedings of the Conference on Language & Technology 2009
Observe that the word “A T 4`a ” has short vowel 3.2.3. Persio-Arabic words. Urdu’s vocabulary is
ending. CRULP’s transliterator has not dealt with this comprised of many Persio-Arabic words. These
issue whereas Malik [2] has presented his idea on a borrowings have their own writing conventions which
particular problem but this problem is also present in Urdu speakers generally follow.
his system. When Hindi speakers write persio-arabic words they
Hindi and Urdu share a basic common vocabulary don’t follow the writting scheme of these borrowings.
that includes pronouns and auxiliaries. Errors in Issues have been seen in the words that have silent-
transliteration occur because of different writing Alif. Hindi speakers neglect silent-alif part in writing
conventions which have been followed for words that because of the unawareness with the correct form of the
are being used by both language speakers from same word. The lack in familiarity with the correct form of
set of vocabulary. As shown in (8): a pronoun “9” is the words is depicted as in (11):
transliterated as “0”.
(11) मS Nफ़लहाल अपने घर जा रहा हूँ
/ ,
(8) ये मरिगला क@ पहाNड़याँ हH 8:a
b a
a4;a0?a!
B6ST aJ>V
J>ba
Ba5ac
T 4Va
T 0 Problem-word in example (11) is “!
B6ST ” and its
, correct form is “!
fKa5S”.
This issue is often found in words “9”, “9T ”, “9”
/ Another problem occurs with the persio-arabic
and “)"”. First three in pronunciations produce [e]
words with “O” sound. When “O” is followed by [u]
sound that’s why they are written with “”. However,
/ sound, they are mostly written with short vowel (ُ)
the pronoun “)"” is exception. It has been found in two (equivalent of Pesh) in Hindi text. This issue is
different writing styles in today’s Hindi text i.e. “""” discussed in detail in 3.2.2
3.2.2. Short vowel for Vao. Words adopted by Hindi 3.3. Multi-morphemes without space
speakers from Urdu (Persio-Arabic) vocabulary have
issues like wrong marking of “ ”وwith short vowel pesh If we analyze Hindi text we would find out frequent
(ُ◌). This problem does not occur for all instances of use of multi-morphemes words. These words are
vao in Urdu. typically associated with the different problem-groups.
Below we have discussed these groups and have
morphologically correlated Hindi and Urdu grammars.
(9) ^या हुआ?
/
Zad:a
>
28
Proceedings of the Conference on Language & Technology 2009
3.3.3. kar/ke after verb. Both Hindi and Urdu Close Class Words Open Class Words
languages include conjunctions. These conjunctions Hindi Urdu Hindi Urdu
are basically used to join two words or phrases دوارا ﮯ ِاِﮩ !"ر
together. Urdu language considers these conjunctions رن$ ہ ﮯ%و '(%را )'
as individual word forms thus they are written
separately. While in Hindi content conjunctions “kar” /
There is a difference in the use of pronoun “)"”. In
and “ke” are joined with the preceding verb.
Urdu, it / is used for both singular and plural. But in
Hindi “)"” is used for third person singular and “"” is
(16) Nफर याद चलकर मेरे पास ^य? आती है
used as third person plural.
0:a5*a8>a$
a4>Va4h6 a
a4;T
29
Proceedings of the Conference on Language & Technology 2009
30
Proceedings of the Conference on Language & Technology 2009
5. Conclusion
After reviewing the output of existing Hindi to Urdu
transliteration system, we find that there are issues that
are still needed to be solved. The main issues for a
better transliteration are: slight difference in both
languages script, difference in spelling of the same
words due to unawareness with correct form or due to
diverse writing style, alteration in original forms of the
words (commonly being used by both languages
chroniclers) due to variation in a pronunciation of the
words. We suggested post processing of the
transliterator output and solved issues caused by
writing convention differences, by consulting specific
word lists.
31