Article
4 authors
All content following this page was uploaded by Sabrina Tiun on 02 June 2014.
Here in UTMK, we have implemented systems by adapting modules to existing synthesizer engines developed for other languages (American English and Indonesian), and we have also developed our own speech synthesis system using our own synthesizer engine. We found that adapting a synthesizer engine built for another language causes the synthesized speech to sound foreign. Hence it is important for us to have our own synthesizer¹ as a starting point towards a robust speech synthesis engine. This is only a preliminary stage of Malay Speech Synthesizer development in UTMK. This exploratory study has uncovered the problems that arise during the development of the synthesizer. It helped us realize what needs to be considered to produce a good speech synthesizer, in the phonetic and linguistic analysis as well as in the synthesis approach itself. It has also opened the door to enhancing the system and implementing a better and more robust one.

¹ The system name is Malay Speech Synthesizer. It was developed between 2002 and 2003 by Nur-Hana Samsudin and Morshidah Yazid.

The architecture of our system may be presented as follows.

Malay Speech Synthesizer was implemented using C++ and Visual Basic. It uses a concatenative approach with syllables as the unit. Synthesized speech is produced in three stages:

A. Text Normalization
The input text may contain more than words. There could be other types of characters, such as numbers, abbreviations and acronyms, which we treat as symbols in this system. This module converts these symbols into readable text.

B. Syllable Parser
The Syllable Parser extracts the syllables from each normalized word and arranges them in sequence according to Malay phonological rules. According to the Malay Speech Synthesizer implementation [6], there are four syllable structures in the Malay sound system:
• CV
• VC
• CVC
• V

Although these structures are already defined, the synthesizer might still not be able to segment all words correctly. This is because there are many loan words in the Malay vocabulary, and their origins vary: English, Arabic, Sanskrit, Indian languages and Javanese. To make sure the system can also segment loan words, we have applied a set of rules so that as many loan words as possible are segmented correctly, while all Malay words must, of course, still be segmented correctly.

This module has several sub-modules to ensure the desired output is produced. These are some of the sub-modules, with a short description of each:

1) ParseSequence: This sub-module receives the input text as a string of characters and converts the string into consonant (C) or vowel (V) symbols.

2) PronounceAI: Since we do not have a morphological analyzer, we have to distinguish the suffix <i> in a word. We keep a collection of words ending in <ai> in which <ai> is pronounced as a diphthong rather than separately as /a/ + /i/.

3) ParseSyllabification: The syllable segmentation is implemented in this sub-module.

4) ParseDiphtong: This sub-module binds each Malay diphthong into a single vowel unit. The Malay diphthongs are <ai>, <au> and <oi>.

5) ConvertEnd: If a sequence <ai>, <au>, <ia>, <iu>, <ua> or <ui> in a word is split across different syllables, <w> or <y> needs to be inserted, as stated by the Malay phonological rules. This is also known as glide insertion.

C. Syllable Concatenation
This module receives a list of syllable segments arranged according to the raw text. Based on this list, the Syllable Concatenation module concatenates the sounds in sequence and finally plays the result, which is the synthesized speech.

The system is capable of converting Malay text into Malay “baku” (standard) synthesized speech. It should also be capable of segmenting the syllables of both non-loan words and loan words.

IV. SYSTEM PERFORMANCE
With our rules, syllable segmentation has shown acceptable results. The following are examples of segmentation for different types of words in the Malay language.

• Malay origin word
  Word: belalang ‘grasshopper’
  Parser converter: CVCVCVC (<ng> is bound as one; /ŋ/)
  Parser segmenter: CV. CV. CVC.
  Synthesizer: /bə/ + /la/ + /laŋ/

• English loan word
  Word: struktur ‘structure’
  Parser converter: CCCVCCVC
  Parser segmenter: CCCVC. CVC.
  Synthesizer: /struk/ + /tur/

• Arabic loan word
  Word: maghrib ‘when the sun sets below the horizon’
  Parser converter: CVCCVC (<gh> is bound as one; /ɣ/)
  Parser segmenter: CVC. CVC.
  Synthesizer: /maɣ/ + /rib/

Segment concatenation has also produced reasonably intelligible synthetic speech. However, distortion still occurs between concatenated segments, so implementing wave modification to handle the distortion is one option for obtaining more natural speech. Based on an informal perceptibility test, it is easy to determine what is being said when the input consists of words and short sentences. However, the spoken output is harder to comprehend when the text is paragraph length. This perceptibility difficulty is more pronounced without a prosody modification algorithm.
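As a rough illustration of the Syllable Parser behaviour reported above, the following Python sketch binds digraphs into single units, converts a word into its C/V pattern and splits it into syllables. It is a hypothetical sketch, not the original C++/Visual Basic code: the function names `to_units`, `cv_pattern` and `segment`, the digraph list (only <ng> and <gh>), the choice to bind <ai>, <au> and <oi> only at the end of a word, and the segmentation rule (zero or one consonant between two vowels starts the next syllable; with two or more, the first consonant closes the preceding syllable) are all assumptions.

```python
# Hypothetical sketch of the Syllable Parser steps (not the original
# C++/Visual Basic implementation). Assumptions: only <ng> and <gh> are
# bound as single consonants, diphthongs are bound only at the end of a
# word, and the segmentation rule below stands in for the real rule set.

DIGRAPHS = ("ng", "gh")          # bound as one consonant: /ŋ/, /ɣ/
DIPHTHONGS = ("ai", "au", "oi")  # Malay diphthongs
VOWELS = set("aeiou")


def to_units(word: str) -> list[str]:
    """Split a word into sound units, binding digraphs and word-final diphthongs."""
    word = word.lower()
    units: list[str] = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS or (pair in DIPHTHONGS and i + 2 == len(word)):
            units.append(pair)
            i += 2
        else:
            units.append(word[i])
            i += 1
    return units


def cv_pattern(units: list[str]) -> str:
    """Map each unit to 'V' (vowel or diphthong) or 'C' (consonant)."""
    return "".join("V" if u[0] in VOWELS else "C" for u in units)


def segment(word: str) -> list[str]:
    """Split a word into syllables from its C/V pattern.

    Between two vowels: zero or one consonant -> boundary right after the
    first vowel (V.V, V.CV); two or more -> the first consonant closes the
    preceding syllable (VC.CV, VC.CCV).
    """
    units = to_units(word)
    vowels = [i for i, p in enumerate(cv_pattern(units)) if p == "V"]
    if not vowels:
        return ["".join(units)]
    starts = [0]
    for prev, nxt in zip(vowels, vowels[1:]):
        gap = nxt - prev - 1  # number of consonant units between the vowels
        starts.append(prev + 1 if gap <= 1 else prev + 2)
    starts.append(len(units))
    return ["".join(units[a:b]) for a, b in zip(starts, starts[1:])]


if __name__ == "__main__":
    for w in ("belalang", "struktur", "maghrib"):
        print(w, cv_pattern(to_units(w)), ".".join(segment(w)))
    # belalang CVCVCVC be.la.lang
    # struktur CCCVCCVC struk.tur
    # maghrib CVCCVC magh.rib
```

For the three example words, the sketch reproduces the parser converter patterns and the parser segmenter splits (CV. CV. CVC., CCCVC. CVC., CVC. CVC.). It works on spelling only: producing phonemic forms such as /bə/ would need a separate grapheme-to-phoneme step, and glide insertion (ConvertEnd) and the PronounceAI word list are not modelled.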
V. IMPROVING THE SYNTHESIZER
Indeed, developing this preliminary system enabled us to discover the criteria that need attention [6], especially during the analysis and design of the system. These are the proposed improvements.

Firstly, a thorough phonetic analysis needs to be done. Since we would like the system to be able to speak all types of Malay words, we have to study the phonological rules for both Malay words and loan words and make sure they are incorporated into the system well.

Secondly, formal grapheme-to-phoneme conversion rules need to be defined at an early stage of development to avoid chaos in the system implementation. This is also important to ensure that the design is well structured and can easily be re-engineered later.

In addition, the synthesizer needs an extra module: a smoothing algorithm. Smoothing is important to keep the distortion at the concatenation points between syllables as small as possible, and also to reduce mismatches between words as much as possible.

We also need to re-record the speech segments. More attention needs to be paid to the frequency and amplitude range, as well as the duration, of each pre-recorded segment. This is important to ensure that only slight modification needs to be applied to the waveform; otherwise it will cause other distortion problems. It is also important to choose the right environment, such as the recording room, and the right recording equipment, to increase sound quality.

Each of these improvement plans is being carried out in parallel at UTMK and is currently ongoing.

VI. FUTURE WORK
In UTMK, we have a prototype engine in which the units for concatenative synthesis are phonemes. Comparing the quality of the results, the syllable unit indeed produced better results. However, without wave modification, distortion will occur, which may lead to low naturalness and perhaps low intelligibility. On the other hand, if we implement a wave modification algorithm, distortion will still occur. The main point is that once the waveform is modified, distortion will exist; it is just a matter of how significant the distortion is to human perception. If the range of the modified sound's values, such as pitch, does not differ much from the original pitch values, then the distortion is not very significant (assuming the other modified attributes correspond properly to the original sound).

Realizing these facts, we are going to implement another concatenative synthesis approach that uses a unit selection database. There will be several candidates of the same utterance with different parameter values and different types of units, such as phonemes and syllables. This approach is still at the research stage.

VII. CONCLUSION
A simple Malay speech synthesizer using syllable units was presented. The synthesized speech is reasonably intelligible but quite unnatural. The development of the system has raised many issues that require deeper understanding. We were able to identify the problems and propose methods to improve the synthesizer. It has also opened our minds to a further step, which is to develop a synthesizer using a unit selection database with small wave modifications implemented in it.

ACKNOWLEDGEMENT
The authors thank all the people at the Computer-Aided Translation Unit (UTMK), especially the other Speech Team members at UTMK: Dr Bali Ranaivo, Tan Tien Ping and Dr Chuah Choy Kim. We would also like to express our gratitude to UTMK's staff for the information on the Malay sound system rules, and to Morshidah Yazid as the other developer of the system.

REFERENCES
[1] S. Lemmetty, “A Review of Speech Synthesis Technology”, Master's Thesis, Department of Electrical and Communication Engineering, Helsinki University of Technology, Helsinki, Finland, March 1999.
[2] X. Huang, A. Acero and H.-W. Hon, “Spoken Language Processing: A Guide to Theory, Algorithm and System Development”, New Jersey: Prentice Hall, 2001.
[3] A. W. Black and N. Campbell, “Optimising Selection of Units from Speech Databases for Concatenative Synthesis”, Proceedings Eurospeech ’95, Madrid, Spain, September 1995, pp. 581–584.
[4] A. M. Zeki and N. Azizah, “A Speech Synthesizer for Malay Language”, National Conference on Research and Development in Computer Science, Selangor, Malaysia, October 2001.
[5] Y. A. El-Imam and Z. M. Don, “Text-to-Speech Conversion of Standard Malay”, International Journal of Speech Technology, no. 3, pp. 129–146, 2000.
[6] N.-H. Samsudin, “A Malay Speech Synthesizer”, Final Year Undergraduate Project Report, School of Computer Science, Universiti Sains Malaysia, Penang, Malaysia, March 2003.