Korean-Vietnamese Neural Machine Translation System with Korean Morphological Analysis and Word Sense Disambiguation

Abstract:
Although deep neural networks have recently led to great achievements in machine translation
(MT), developing Korean-Vietnamese MT systems still poses several challenges. Because Korean
is a morphologically rich language and Vietnamese is an analytic language, neither has clear
word boundaries. The high rate of homographs in Korean causes word ambiguities, which cause
problems in neural MT (NMT). In addition, because Korean-Vietnamese is a low-resource language
pair, no adequate, freely available Korean-Vietnamese parallel corpus exists for training
translation models. In this paper, we manually built a lexical semantic network tailored to the
characteristics of Korean and used it as the knowledge base for our Korean morphological
analysis and word sense disambiguation system, UTagger. We also constructed a large
Korean-Vietnamese parallel corpus, applying the state-of-the-art Vietnamese word segmentation
method RDRsegmenter to the Vietnamese texts and UTagger to the Korean texts. Finally, we built a
bidirectional Korean-Vietnamese NMT system based on the attention-based encoder-decoder
architecture. The experimental results indicate that UTagger and RDRsegmenter can significantly
improve the performance of the Korean-Vietnamese NMT system, which achieves 27.79 BLEU and
58.77 TER in the Korean-to-Vietnamese direction and 25.44 BLEU and 58.72 TER in the reverse
direction.
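To make the reported BLEU figures easier to interpret, here is a minimal sketch of the standard corpus-level BLEU definition: clipped n-gram precision for n = 1 to 4, their geometric mean, and a brevity penalty, scaled to 0-100. This is an illustration under stated assumptions, not the evaluation code used in the paper: it assumes one reference per sentence, whitespace-split tokens, and no smoothing. The abstract does not say how the text was tokenized for scoring, and BLEU on Vietnamese changes depending on whether it is computed over syllables or over segmented words, so this sketch will not reproduce the numbers above. TER is not covered.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus-level BLEU (0-100), assuming one reference per sentence.

    hypotheses, references: lists of token lists, aligned by index.
    """
    matches = [0] * max_n  # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n   # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            ref_counts = ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision makes BLEU zero
    # Geometric mean of the n-gram precisions, computed in log space.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the reference overall.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


ref = "tôi đang học tiếng Hàn".split()
print(corpus_bleu([ref], [ref]))  # identical output scores 100.0
```

Because the sketch is unsmoothed, a short hypothesis that shares no 4-gram with its reference scores 0. In practice, published scores are usually computed with established tools such as sacreBLEU so that tokenization and settings can be reported and compared.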
