Professional Documents
Culture Documents
Introduction
Multiple pronunciation problem Same word but different pronunciations Newton: /nju:tn/ v.s. /nu:tn/ Same spelling but different pronunciations (homograph) refuse: /r'fju:z/ v.s. /'refju:s/
<?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-GB"> <lexeme> <grapheme>Newton</grapheme> <phoneme>nju:tn</phoneme> <phoneme>nu:tn</phoneme> </lexeme> <lexeme> <grapheme>refuse</grapheme> <phoneme> r'fju:z </phoneme> <phoneme>'refju:s</phoneme> </lexeme> </lexicon>
The Value Networking Company
2/9
Effective in speech synthesis Only in multiple pronunciations for same orthography Not in homograph problem refuse: verb/r'fju:z/ v.s. noun/'refju:s/ No information for ASR systems.
The Value Networking Company
5/9
Text
Prosody Analysis
Waveform production
Speech
Pronunciation Dictionary
morpheme POS Pronunciation
6/9
Proposals
Proposal 1: POS attribute of phoneme element Optional attribute Proposal 2: POS element Lexeme element contain optional POS elements.
<?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US"> <lexeme> <grapheme>refuse</grapheme> <phoneme> r'fju:z </phoneme> <pos> verb </verb> </lexeme> </lexicon>
POS values: language-specific Type: allow vendor-specific POS type? Outstanding POS set: Penn Treebank, Sejong project (Korean)
The Value Networking Company
9/9
Conclusion
No element or attribute for resolving multiple pronunciations In current SSML, PLS POS information can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems. Can reduce the search time in a large vocabulary recognition system. Can be effective in agglutinative language. Proposals POS element POS attribute