Workshop: 2005/11/02 (Wed)

W3C Workshop on Internationalizing SSML

SSML Extension for Korean

Sang-Jin Kim
sangjin@icu.ac.kr

Contents
• Characteristics of Korean
• SSML Extension for Chinese Characters in Korean
• SSML Extension for Homograph Words in Korean
• Conclusion

Characteristics of Korean
• Hangul, the Korean alphabet
  • Consists of forty letters: 21 vowels (including 13 diphthongs) and 19 consonants
• Syllable structure
  • V, CV, VC, and CVC (C: consonant, V: vowel)
• Eojeol, the space-delimited word phrase, is different from a phrase in English
• Korean is completely different from Japanese except for its grammatical structure
• Korean is completely different from Chinese, although it has borrowed many Chinese words and some Chinese characters

• Vowels in Hangul
  • Monophthong vowels classified according to tongue position and height

• Consonants in Hangul
  • Consonants classified according to place and manner of articulation

SSML Extension for Chinese Characters in Korean
• Chinese characters in Korean
  • Modern Korean and Japanese use many Chinese characters, but the characters are pronounced differently
  • The same character is represented differently depending on the country
  • Simplified character forms are not used in Korea
  • Text can be written with Korean characters (Hangul) alone
  • It is not unusual to use Chinese characters as well
  • The pronunciation of both forms is exactly the same
• Chinese characters in Korean TTS
  • The input text for a text-to-speech (TTS) system has to be converted into a phonetic list
  • If Chinese characters are mixed with Korean characters, they have to be replaced with their Korean readings
  • Not all Chinese characters are used; instead, the Korean government recommends a list of frequently used Chinese characters, and this list contains 2,000 characters
  • The Korean TTS system needs to use this list and the characters' Korean pronunciations, since these pronunciations differ from those in Chinese and Japanese
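
With standard SSML 1.0 markup, one way to hand a TTS engine the Korean reading of mixed-in Chinese characters is the existing `sub` element, whose `alias` attribute supplies the text to be spoken. The fragment below is a minimal sketch of that approach (our own illustration, not taken from the slides), using 學校 ("school"), which is read 학교 in Korean. Because every occurrence has to be annotated by hand, this quickly becomes impractical for a 2,000-character list, which is one motivation for a lexicon-based approach.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="ko">
  <!-- sub replaces the written Chinese characters with their Korean (Hangul) reading -->
  <!-- "학교에 갑니다." = "(I) go to school." -->
  <sub alias="학교">學校</sub>에 갑니다.
</speak>
```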
• SSML Extension for Chinese Characters in Korean
  • The same Chinese characters are pronounced differently depending on the country

<lexicon xml:lang="ko" uri="http://www.multilingual.org/lexicon.file"/>

<lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_freq_KR.file"/>
<lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_technical.file"/>

<lexicon xml:lang="ja-KR" uri="http://www.multilingual.org/Chinese_lexicon_JP.file"/>

<lexicon xml:lang="cn-KR" uri="http://www.multilingual.org/Chinese_lexicon_CN.file"/>
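
To show where the proposed attribute would sit, here is a sketch of a complete document. It is hypothetical: `xml:lang` on `lexicon` is the extension proposed in this talk and is not valid SSML 1.0, and the sentence is our own example. In SSML 1.0, `lexicon` elements must be children of `speak` and appear before any other elements or text; presumably the engine would then look up the Chinese characters in the text in the Korean-reading (ko-CN) lexicon.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="ko">
  <!-- Proposed extension (not SSML 1.0): xml:lang on lexicon -->
  <!-- General Korean lexicon -->
  <lexicon xml:lang="ko" uri="http://www.multilingual.org/lexicon.file"/>
  <!-- Korean readings of frequently used Chinese characters -->
  <lexicon xml:lang="ko-CN" uri="http://www.multilingual.org/Chinese_lexicon_freq_KR.file"/>
  <!-- 學校 would be read 학교 via the ko-CN lexicon, with no per-word markup -->
  學校에 갑니다.
</speak>
```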

SSML Extension for Homograph Words in Korean
• Homograph words in Korean
  • Same spelling, different pronunciation, different meaning
  • The difference is in duration (vowel length)
• SSML Extension for Homograph Words in Korean
  • The only difference between these words is the duration of their pronunciation
  • It is necessary to give duration information to a TTS system for these kinds of words
  • Although the SSML Recommendation provides the "say-as" and "sub" elements, these elements cannot handle this problem successfully
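
A short sketch makes the limitation concrete (our illustration; the word choice is an assumption, not from the slides). The syllable 밤 means "night" when pronounced short and "chestnut" when pronounced long. `sub` can only replace text with other text, and both readings share one spelling, so the `alias` carries no duration information; `say-as` is aimed at formats such as dates and numbers rather than vowel length.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="ko">
  <!-- "I ate chestnuts at night": first 밤 = night (short), second 밤 = chestnut (long) -->
  <!-- Both aliases are the same Hangul string, so the engine still cannot tell them apart -->
  <sub alias="밤">밤</sub>에 <sub alias="밤">밤</sub>을 먹었다.
</speak>
```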
• SSML Extension for Homograph Words in Korean
  • We suggest a "tone" element for this problem
  • Three attribute values for the tone element, 'long', 'short', and 'default', would be enough for Korean
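
As an illustration of how the proposed element might be written, consider 밤, which means "night" when pronounced short and "chestnut" when pronounced long. This is a sketch only: `tone` is the extension proposed here, not an SSML 1.0 element, so a standard SSML 1.0 processor would not recognize it; the attribute name `type` follows the conclusion of this talk, and the sentence is our own example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="ko">
  <!-- Hypothetical tone element: type is "long", "short", or "default" -->
  <!-- "I ate chestnuts at night": 밤 (night) is short, 밤 (chestnut) is long -->
  <tone type="short">밤</tone>에 <tone type="long">밤</tone>을 먹었다.
</speak>
```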

Conclusion
• SSML Extension for Chinese Characters in Korean
  • The lexicon element doesn't support the "xml:lang" attribute
  • We suggest the attributes xml:lang="ko", xml:lang="ko-CN", xml:lang="ja-KR", and xml:lang="cn-KR"

• SSML Extension for Homograph Words in Korean
  • The "say-as" and "sub" elements cannot handle the homograph problem successfully
  • We suggest a "tone" element
  • Attribute values type="long", type="short", and type="default" would be enough for Korean