The Usage of Part-Of-Speech For Resolving Multiple Pronunciations in SSML

Position Paper for W3C Workshop on Internationalizing SSML
The Usage of Part-Of-Speech for Resolving Multiple Pronunciations in SSML

2005. 11. 3. Myoung-Wan Koo and Du-Seong Chang KT/KAIT
Introduction
Multiple pronunciation problem Same word but different pronunciations Newton: /nju:tn/ v.s. /nu:tn/ Same spelling but different pronunciations (homograph) refuse: /r'fju:z/ v.s. /'refju:s/
<?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-GB"> <lexeme> <grapheme>Newton</grapheme> <phoneme>nju:tn</phoneme> <phoneme>nu:tn</phoneme> </lexeme> <lexeme> <grapheme>refuse</grapheme> <phoneme> r'fju:z </phoneme> <phoneme>'refju:s</phoneme> </lexeme> </lexicon>
The Value Networking Company
2/9
Multiple pronunciation in SSML&PLS

SSML The Speech Synthesis Markup Language Specification Version 1.0 Pronunciation information in SSML Phoneme element Lexicon element PLS Pronunciation Lexicon Specification Version 1.0 Pronunciation information in PLS Phoneme element Prefer attribute They doesnt fully support the pronunciation lexicon for multiple pronunciations and agglutinative language. Part-Of-Speech information is needed
3/9 The Value Networking Company
Pronunciation information in PLS (1/2)

Pronunciation Lexicon Specification Version 1.0/Feb 2005/W3C Voice Browser Working Group It allow interoperable specification of pronunciation information for either ASR and TTS engines within voice browsing applications. It is expected to handle multiple pronunciation. Example of PLS
<?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" xmlns=http://www.w3.org/2005/01/pronunciation-lexicon alphabet="ipa" xml:lang="en-US"> <lexeme> <grapheme>tomato</grapheme> <phoneme> tmeiou</phoneme> </lexeme> </lexicon>
Pronunciation information in PLS (2/2)

Prefer attribute of phoneme element Give one pronunciation high priority among pronunciation candidates.
<?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-GB"> <lexeme> <grapheme>Newton</grapheme> <phoneme prefer="true">nju:tn</phoneme> <phoneme>nu:tn</phoneme> </lexeme> </lexicon>
Effective in speech synthesis Only in multiple pronunciations for same orthography Not in homograph problem refuse: verb/r'fju:z/ v.s. noun/'refju:s/ No information for ASR systems.
5/9
Typical Korean TTS system structure

Structural Information Morphemes, POS Phonemes, POS Phonemes, Prosody
Text
Morphological Grapheme-toAnalyzer Phoneme

Morpheme Dictionary
morpheme POS1 POS2
Prosody Analysis
Waveform production
Speech
Pronunciation Dictionary
morpheme POS Pronunciation
6/9
POS for resolving multiple pronunciations

POS information can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems. The word refuse can have two different pronunciations depending on pos information. Proposal: POS attribute
<?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US"> <lexeme> <grapheme>refuse</grapheme> <phoneme pos=verb> r'fju:z </phoneme> </lexeme> <lexeme> <grapheme>refuse</grapheme> <phoneme pos=noun>'refju:s</phoneme> </lexeme> </lexicon>
POS information for LVCSR

Large vocabulary continuous speech recognition of agglutinative language Basic unit is morpheme (pseudo-morpheme) for reducing the vocabulary size. Many homographs in the recognition dictionary. POS information help system to get a proper pronunciation in a dictionary as well as to resolve multiple pronunciations in some words. It reduce the search time since POS information could cut the wrong word connection in the first stage, not in the semantic interpretation stage.
Proposals
Proposal 1: POS attribute of phoneme element Optional attribute Proposal 2: POS element Lexeme element contain optional POS elements.
<?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US"> <lexeme> <grapheme>refuse</grapheme> <phoneme> r'fju:z </phoneme> <pos> verb </verb> </lexeme> </lexicon>
POS values: language-specific Type: allow vendor-specific POS type? Outstanding POS set: Penn Treebank, Sejong project (Korean)
9/9
Conclusion
No element or attribute for resolving multiple pronunciations In current SSML, PLS POS information can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems. Can reduce the search time in a large vocabulary recognition system. Can be effective in agglutinative language. Proposals POS element POS attribute

The Usage of Part-Of-Speech For Resolving Multiple Pronunciations in SSML

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

The Usage of Part-Of-Speech For Resolving Multiple Pronunciations in SSML

Uploaded by

Copyright:

Available Formats

Position Paper for W3C Workshop on Internationalizing SSML

The Usage of Part-Of-Speech for Resolving Multiple Pronunciations in SSML

Multiple pronunciation in SSML&PLS

Pronunciation information in PLS (1/2)

Pronunciation information in PLS (2/2)

Typical Korean TTS system structure

Morphological Grapheme-toAnalyzer Phoneme

The Value Networking Company

POS for resolving multiple pronunciations

POS information for LVCSR

8/9 The Value Networking Company

10/9 The Value Networking Company

You might also like