You are on page 1of 10

Position Paper for W3C Workshop on Internationalizing SSML

The Usage of Part-Of-Speech for Resolving Multiple Pronunciations in SSML


2005. 11. 3. Myoung-Wan Koo and Du-Seong Chang KT/KAIT

Introduction
Multiple pronunciation problem Same word but different pronunciations Newton: /nju:tn/ v.s. /nu:tn/ Same spelling but different pronunciations (homograph) refuse: /r'fju:z/ v.s. /'refju:s/
<?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-GB"> <lexeme> <grapheme>Newton</grapheme> <phoneme>nju:tn</phoneme> <phoneme>nu:tn</phoneme> </lexeme> <lexeme> <grapheme>refuse</grapheme> <phoneme> r'fju:z </phoneme> <phoneme>'refju:s</phoneme> </lexeme> </lexicon>
The Value Networking Company

2/9

Multiple pronunciation in SSML&PLS


SSML The Speech Synthesis Markup Language Specification Version 1.0 Pronunciation information in SSML Phoneme element Lexicon element PLS Pronunciation Lexicon Specification Version 1.0 Pronunciation information in PLS Phoneme element Prefer attribute They doesnt fully support the pronunciation lexicon for multiple pronunciations and agglutinative language. Part-Of-Speech information is needed
3/9 The Value Networking Company

Pronunciation information in PLS (1/2)


Pronunciation Lexicon Specification Version 1.0/Feb 2005/W3C Voice Browser Working Group It allow interoperable specification of pronunciation information for either ASR and TTS engines within voice browsing applications. It is expected to handle multiple pronunciation. Example of PLS
<?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" xmlns=http://www.w3.org/2005/01/pronunciation-lexicon alphabet="ipa" xml:lang="en-US"> <lexeme> <grapheme>tomato</grapheme> <phoneme> tmeiou</phoneme> </lexeme> </lexicon>
4/9 The Value Networking Company

Pronunciation information in PLS (2/2)


Prefer attribute of phoneme element Give one pronunciation high priority among pronunciation candidates.
<?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-GB"> <lexeme> <grapheme>Newton</grapheme> <phoneme prefer="true">nju:tn</phoneme> <phoneme>nu:tn</phoneme> </lexeme> </lexicon>

Effective in speech synthesis Only in multiple pronunciations for same orthography Not in homograph problem refuse: verb/r'fju:z/ v.s. noun/'refju:s/ No information for ASR systems.
The Value Networking Company

5/9

Typical Korean TTS system structure


Structural Information Morphemes, POS Phonemes, POS Phonemes, Prosody

Text

Morphological Grapheme-toAnalyzer Phoneme


Morpheme Dictionary
morpheme POS1 POS2

Prosody Analysis

Waveform production

Speech

Pronunciation Dictionary
morpheme POS Pronunciation
6/9

The Value Networking Company

POS for resolving multiple pronunciations


POS information can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems. The word refuse can have two different pronunciations depending on pos information. Proposal: POS attribute
<?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US"> <lexeme> <grapheme>refuse</grapheme> <phoneme pos=verb> r'fju:z </phoneme> </lexeme> <lexeme> <grapheme>refuse</grapheme> <phoneme pos=noun>'refju:s</phoneme> </lexeme> </lexicon>
7/9 The Value Networking Company

POS information for LVCSR


Large vocabulary continuous speech recognition of agglutinative language Basic unit is morpheme (pseudo-morpheme) for reducing the vocabulary size. Many homographs in the recognition dictionary. POS information help system to get a proper pronunciation in a dictionary as well as to resolve multiple pronunciations in some words. It reduce the search time since POS information could cut the wrong word connection in the first stage, not in the semantic interpretation stage.

8/9 The Value Networking Company

Proposals
Proposal 1: POS attribute of phoneme element Optional attribute Proposal 2: POS element Lexeme element contain optional POS elements.
<?xml version="1.0" encoding="UTF-8"?> <lexicon version="1.0" xmlns="http://www.w3.org/2005/01/pronunciation-lexicon" alphabet="ipa" xml:lang="en-US"> <lexeme> <grapheme>refuse</grapheme> <phoneme> r'fju:z </phoneme> <pos> verb </verb> </lexeme> </lexicon>

POS values: language-specific Type: allow vendor-specific POS type? Outstanding POS set: Penn Treebank, Sejong project (Korean)
The Value Networking Company

9/9

Conclusion
No element or attribute for resolving multiple pronunciations In current SSML, PLS POS information can reduce the overhead of resolving multiple pronunciations in ASR and TTS systems. Can reduce the search time in a large vocabulary recognition system. Can be effective in agglutinative language. Proposals POS element POS attribute

10/9 The Value Networking Company

You might also like