
A Simple Malay Speech Synthesizer Using
Syllable Concatenation Approach
Nur-Hana SAMSUDIN, Sabrina TIUN and TANG Enya Kong
Computer-Aided Translation Unit
School of Computer Sciences
Universiti Sains Malaysia
Penang, Malaysia
{nursham, sab, enyakong}@cs.usm.my

Abstract: A Malay speech synthesizer system is discussed. This paper covers the available Malay speech synthesis systems, the underlying structure of our system, a brief description of its crucial modules, a general evaluation of the system, the proposed enhancements and the future work on Malay text-to-speech systems at the Computer-Aided Translation Unit (UTMK). The objective is to highlight how our system works and how its performance can be improved. We also outline our future direction in text-to-speech research at UTMK.

Index Terms: Speech synthesis, syllable concatenation, Malay language, distortion.

I. INTRODUCTION

Over the last several years, speech technology has advanced considerably. We can build systems that interact through speech: the system listens to what was said, computes or does something, and then speaks back using spoken language generation. Speech systems can be divided into two broad categories: speech synthesis and speech recognition. Speech synthesis is the process of generating spoken language from arbitrary text, whereas speech recognition is the process of converting spoken language into a required form, such as text or a command.

Speech synthesis in Malaysia is still mostly at the research level, and different research centres focus on different speech synthesis approaches. For example, the Research and Development Team at the Asia Pacific Institute of Information Technology (APIIT) is developing a visual text-to-speech system named AVISS. The Department of Electrical Engineering, Faculty of Engineering, at Universiti Kebangsaan Malaysia (UKM) focuses on digital speech synthesis, and here at the Computer-Aided Translation Unit (UTMK), School of Computer Science, Universiti Sains Malaysia, we focus on pre-recorded speech synthesis.

The three main approaches to developing speech synthesis systems are [1]:
• articulatory synthesis;
• formant synthesis; and
• concatenative synthesis.

Articulatory synthesis simulates speech based on the movement of the human articulators, while formant synthesis models the pole frequencies of the speech signal, which are used to determine the parameters necessary to synthesize a desired utterance based on a set of rules. Concatenative synthesis stores pre-recorded speech segments, which are later retrieved and replayed to produce synthesized speech.

The type of unit stored as pre-recorded speech segments depends on the developers' preference and on which unit best represents a particular language for the speech synthesis system. These units can be classified into two types: context dependent units such as phonemes, syllables and words, and context independent units such as diphones, triphones and demisyllables [2]. Another representation is the unit selection database. In this type of database, different types of units are available and there are several instances of an utterance [3].

In this paper, we focus on concatenative synthesis using syllable units. We first give an overview of the available Malay speech synthesis systems and of our Malay Speech Synthesizer, including its architecture, the methods used to develop its modules and the performance of the system. We also discuss how developing the system can help to improve our techniques for solving synthesis problems, and our future plans at UTMK.

II. BACKGROUND

A. Existing Malay Synthesizer using Syllable Concatenation

A variety of Malay speech synthesis systems based on concatenation have been proposed. For example, in the Say It! system [4], the segmenting technique is to select the longest phoneme sequence and compare the selected sequence against the available syllable database. If a match occurs, the sequence is taken out and considered a syllable unit. Otherwise, the last phoneme in the
sequence is eliminated and the comparison is done again with the reduced phoneme sequence. The process is repeated until a match is found in the database. This technique is simple to implement and produces quick results, but words can also be segmented wrongly [4].

In another synthesizer implementation [5], the whole system was implemented by adapting a previously developed Arabic synthesizer system. The segmentation technique takes into account the linguistic and phonetic analysis of the standard Malay sound system plus Malay loan words from Arabic. The approach compares the structure of the syllables, using three Malay syllable structures as the basis of segmentation. These syllable structures are based on the sequence of consonants (C) and vowels (V):
• CV
• CVC
• VC
The phonetic transcription is tagged with consonant-vowel labels. Syllable segmentation is then done by matching against the Malay syllable structures stated above. Based on the tests carried out, the system has been shown to produce acceptable results [5].

B. Malay Speech Synthesizer Architecture

Here at UTMK, we have implemented systems by adapting modules to existing synthesizer engines that were developed for other languages (American English and Indonesian), and we have also developed our own speech synthesis system using our own speech synthesizer engine. We found that adapting a system built on another language's synthesizer engine causes the synthesized speech to sound foreign. Hence it is important for us to have our own synthesizer¹ as a starting point towards a robust speech synthesis engine. This is only a preliminary stage of Malay Speech Synthesizer development at UTMK. This exploratory study has uncovered the problems that arise during the development of the synthesizer. It helped us realize what needs to be considered to produce a good speech synthesizer, such as the phonetic analysis context, the linguistic analysis context and the synthesizer approach itself. It has also opened the door to future enhancement and to implementing a better and more robust system.

¹ The system is named Malay Speech Synthesizer. It was developed between 2002 and 2003 by Nur-Hana Samsudin and Morshidah Yazid.

The architecture of our system is shown in Figure 1.

Figure 1: Malay Speech Synthesizer Architecture [6]
[Diagram: Raw Text → Text Normalization → Syllable Parser → Syllable Concatenation → Synthesized Speech. Supporting resources shown in the diagram: an abbreviation, acronym and symbol dictionary beside Text Normalization; a Malay database beside Syllable Parser; a syllable database beside Syllable Concatenation.]

Raw Text is the input of the system. Any typed text is normalized in the Text Normalization module. The Syllable Parser segments the normalized text into syllable units according to Malay rules. Syllable Concatenation combines the sound files of the syllable units to produce synthesized speech.

III. SYLLABLE CONCATENATION APPROACH

The Malay Speech Synthesizer system was implemented using C++ and Visual Basic. It uses a concatenative approach with syllable units. There are three stages in producing synthesized speech.

A. Text Normalization

The input text may not contain only words. It may also contain other types of characters, such as numbers, abbreviations and acronyms, which this system treats as symbols. This module converts the symbol input into readable text.

B. Syllable Parser

The Syllable Parser extracts syllables from each normalized word and arranges them in sequence based on Malay phonological rules. According to the Malay Speech Synthesizer implementation [6], there are four syllable structures in the Malay sound system. These are:
• CV
• VC
• CVC
• V
Although these structures are available, the synthesizer might still not be able to segment all words correctly. This is because there are many loan words in the Malay vocabulary. The origins of these loan words also vary, and include English, Arabic, Sanskrit, Indian and Javanese. To make sure the system can also segment loan words correctly, we have applied a set of rules so that as many loan words as possible are segmented correctly while, of course, all Malay words are still segmented correctly.

This module has a few sub-modules to ensure that the desired output is produced. Some of the sub-modules, with short descriptions, are:

1) ParseSequence: This sub-module receives the input text as a string of characters and converts the string into consonant or vowel symbols.

2) PronounceAI: Since we do not have a morphological analyzer, we have to distinguish the suffix <i> in a word. We keep a collection of words ending in <ai> in which <ai> is pronounced as a diphthong and not separately as /a/ + /i/.

3) ParseSyllabification: The syllable segmentation module is implemented here.

4) ParseDiphtong: This module binds each Malay diphthong into one vowel unit. The Malay diphthongs are <ai>, <au> and <oi>.

5) ConvertEnd: If a sequence of <ai>, <au>, <ia>, <iu>, <ua> or <ui> in a word is parsed into different syllables, <w> or <y> needs to be inserted, as stated by the Malay phonological rules. This is also known as glide insertion.

C. Syllable Concatenation

This module receives a list of syllable segments that has been arranged according to the raw text. Based on this list, the Syllable Concatenation module concatenates the sounds in sequence and finally plays the result, which is the synthesized speech.

The system is capable of converting Malay language text into Malay “baku” synthesized speech. It should also be capable of segmenting the syllables of both non-loan words and loan words.

IV. SYSTEM PERFORMANCE

With our rules, syllable segmentation has shown acceptable results. The following are examples of segmentation for different types of words in the Malay language.

• Malay origin word
Word: belalang ‘grasshopper’
Parser converter: CVCVCVC (<ng> is bound as one; /ŋ/)
Parser segmenter: CV. CV. CVC.
Synthesizer: /bə/ + /la/ + /laŋ/

• English loan word
Word: struktur ‘structure’
Parser converter: CCCVCCVC
Parser segmenter: CCCVC. CVC.
Synthesizer: /struk/ + /tur/

• Arabic loan word
Word: maghrib ‘when the sun falls in the horizon’
Parser converter: CVCCVC (<gh> is bound as one; /ɣ/)
Parser segmenter: CVC. CVC.
Synthesizer: /maɣ/ + /rib/

Segment concatenation has also produced reasonably intelligible synthetic speech. However, distortion still occurs between concatenated segments, and hence implementing wave modification to handle the distortion is one of the options for obtaining more natural speech. Based on an informal perceptibility test, it is easy to determine what is being said if the input consists of words and short sentences. However, the spoken text is harder to comprehend when the text is of paragraph length. This perceptibility difficulty is more significant without a prosody modification algorithm.

V. IMPROVING THE SYNTHESIZER

Indeed, developing this preliminary system enabled us to discover the criteria that need attention [6], especially during the analysis and
design of the system. These are the proposed improvements.

Firstly, a thorough phonetic analysis needs to be done. Since we would like the system to be able to speak all types of Malay words, we have to study the phonological rules for both Malay words and loan words and make sure they can be adapted well in the system.

Secondly, formal grapheme-to-phoneme conversion needs to be done at an early stage of development to avoid chaos in the system implementation. This is also important to make sure the design is well structured and can easily be re-engineered later.

In addition, the synthesizer must have an additional module, the smoothing algorithm. Smoothing is important to ensure that the concatenation between syllables has as little distortion as possible and that mismatches between words are reduced as much as possible.

We also need to re-record the speech segments. More attention needs to be paid to the frequency and amplitude range as well as the duration of each pre-recorded segment. This is important to ensure that only slight modification of the wave is necessary; otherwise it will cause other distortion problems. It is also important to choose the right environment, such as the recording room, and the equipment used for recording, to increase the sound quality.

Each of the improvement plans is ongoing in parallel at UTMK.

VI. FUTURE WORK

At UTMK, we have a prototype engine in which the units for concatenative synthesis are phonemes. Comparing the quality of the results, the syllable unit does produce better results. However, with no wave modification, distortion will occur, and this may lead to speech of low naturalness and perhaps low intelligibility. On the other hand, if we implement a wave modification algorithm, distortion will still occur. The main point is that once the waveform is modified, distortion will exist. It is just a matter of how significant the distortion is to human perception. If the modified sound's value range, such as the pitch, does not differ much from the original pitch value, then the distortion is not very significant (assuming the other modified attributes correspond properly with the original sound).

Realizing these facts, we are going to implement another concatenative synthesizer approach that uses a unit selection database as its unit. There will be a few candidates of the same utterance with different parameter values and different types of units, such as phonemes and syllables. This approach is still at the research level.

VII. CONCLUSION

A simple Malay speech synthesizer using syllable units was presented. The synthesized speech is reasonably intelligible but quite unnatural. The development of the system has raised many characteristics that require deeper understanding. We were able to identify the problems and propose methods to improve the synthesizer. It has also opened our minds to a further step, which is to develop a synthesizer using a unit selection database with small wave modification implemented in it.

ACKNOWLEDGEMENT

The authors thank everyone at the Computer-Aided Translation Unit (UTMK), especially the other Speech Team members at UTMK: Dr Bali Ranaivo, Tan Tien Ping and Dr Chuah Choy Kim. We would also like to express our gratitude to UTMK's staff for the information on the Malay sound system rules, and to Morshidah Yazid as another developer of the system.

REFERENCES

[1] S. Lemmetty, “A Review of Speech Synthesis Technology”, Master Thesis, Department of Electrical and Communication Engineering, Helsinki University of Technology, Helsinki, Finland, March 1999.
[2] X. Huang, A. Acero and H.-W. Hon, “Spoken Language Processing: A Guide to Theory, Algorithm and System Development”, New Jersey: Prentice Hall, 2001.
[3] A.W. Black and N. Campbell, “Optimising Selection of Units from Speech Databases for Concatenative Synthesis”, Proceedings of Eurospeech ’95, Madrid, Spain, September 1995, pp. 581-584.
[4] A.M. Zeki and N. Azizah, “A Speech Synthesizer for Malay Language”, National Conference on Research and Development in Computer Science, Selangor, Malaysia, October 2001.
[5] Y.A. El-Imam and Z.M. Don, “Text-to-Speech Conversion of Standard Malay”, International Journal of Speech Technology, no. 3, pp. 129-146, 2000.
[6] N.-H. Samsudin, “A Malay Speech Synthesizer”, Final Year Undergraduate Project Report, School of Computer Science, Universiti Sains Malaysia, Penang, Malaysia, March 2003.

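The longest-match segmentation described for the Say It! system in Section II.A can be illustrated with a short sketch. This is not the code of [4]: the function name, the `max_len` limit and the toy syllable inventory are assumptions made for illustration, and plain letters stand in for phonemes.

```python
def longest_match_segment(phonemes, syllable_db, max_len=5):
    """Greedy longest-match segmentation against a syllable inventory."""
    syllables = []
    start = 0
    while start < len(phonemes):
        # Begin with the longest candidate, then drop the last phoneme on each miss.
        end = min(len(phonemes), start + max_len)
        while end > start and "".join(phonemes[start:end]) not in syllable_db:
            end -= 1
        if end == start:
            raise ValueError(f"no syllable in the database matches at position {start}")
        syllables.append("".join(phonemes[start:end]))
        start = end
    return syllables


# Hypothetical toy inventory; a real database would hold every recorded syllable.
SYLLABLE_DB = {"be", "la", "lang", "bel", "a", "ma", "kan"}

print(longest_match_segment(list("makan"), SYLLABLE_DB))     # ['ma', 'kan']
print(longest_match_segment(list("belalang"), SYLLABLE_DB))  # ['bel', 'a', 'lang']
```

With this toy inventory the second word is split as bel + a + lang rather than be + la + lang, because the greedy search accepts the longer entry "bel" first. This is one way the wrong segmentation mentioned in Section II.A can arise.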

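The consonant-vowel conversion and syllable segmentation steps of Section III.B, and the three worked examples of Section IV, can be approximated by the sketch below. The segmentation rule used here is a simplifying assumption and not the rule set of [6]: one consonant between two vowels opens the next syllable, and with two or more the first one closes the previous syllable. The digraph list is also assumed (the paper names only <ng> and <gh>), and diphthong binding and glide insertion are left out.

```python
VOWELS = set("aeiou")
# Letter pairs treated as a single consonant (assumed list).
DIGRAPHS = ("ng", "ny", "sy", "kh", "gh")


def to_units(word):
    """Split a word into sound units, binding each digraph into one unit."""
    units, i = [], 0
    while i < len(word):
        if word[i:i + 2] in DIGRAPHS:
            units.append(word[i:i + 2])
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def cv_pattern(units):
    """Label each unit as a consonant (C) or a vowel (V)."""
    return "".join("V" if u in VOWELS else "C" for u in units)


def syllabify(word):
    """Segment a word into syllables around its vowels (simplified rule)."""
    units = to_units(word.lower())
    pattern = cv_pattern(units)
    nuclei = [i for i, label in enumerate(pattern) if label == "V"]
    if not nuclei:
        return ["".join(units)]
    starts = [0]
    for prev, nxt in zip(nuclei, nuclei[1:]):
        gap = nxt - prev - 1  # number of consonants between two vowels
        # 0 or 1 consonants: the next syllable starts right after the vowel.
        # 2 or more: the first consonant closes the previous syllable.
        starts.append(prev + 1 if gap <= 1 else prev + 2)
    bounds = starts + [len(units)]
    return ["".join(units[a:b]) for a, b in zip(bounds, bounds[1:])]


for word in ("belalang", "struktur", "maghrib"):
    print(word, cv_pattern(to_units(word)), syllabify(word))
# belalang CVCVCVC ['be', 'la', 'lang']
# struktur CCCVCCVC ['struk', 'tur']
# maghrib CVCCVC ['magh', 'rib']
```

The sketch reproduces the patterns and syllable boundaries listed in Section IV for these three words. Other loan words would likely need the additional rules the paper refers to.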
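The Syllable Concatenation stage of Section III.C joins pre-recorded syllable sounds in sequence. A minimal sketch using Python's standard `wave` module is given below. The file layout (one `<syllable>.wav` file per syllable in a single directory) and the function name are assumptions, and the original system was written in C++ and Visual Basic, so this illustrates the idea rather than the actual implementation.

```python
import os
import wave


def concatenate_syllables(syllables, sound_dir, out_path):
    """Join the WAV files of the given syllables, in order, into one WAV file."""
    params = None
    frames = []
    for syllable in syllables:
        # Assumed layout: one recording per syllable, named "<syllable>.wav".
        path = os.path.join(sound_dir, syllable + ".wav")
        with wave.open(path, "rb") as segment:
            p = segment.getparams()
            fmt = (p.nchannels, p.sampwidth, p.framerate)
            if params is None:
                params = p
            elif fmt != (params.nchannels, params.sampwidth, params.framerate):
                raise ValueError(f"{syllable}.wav has a different audio format")
            frames.append(segment.readframes(segment.getnframes()))
    if params is None:
        raise ValueError("no syllables to concatenate")
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for chunk in frames:
            out.writeframes(chunk)
```

A call such as `concatenate_syllables(["be", "la", "lang"], "syllables", "belalang.wav")` would write the joined recording, where the directory and file names are placeholders. The segments are joined as they are, with no smoothing or prosody modification, so the distortion at syllable boundaries discussed in Sections IV to VI would remain.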