
SOFTWARE DREAMS AND TALKING MACHINES



Speech and Speech Recognition

Resources on the World Wide Web

[The following materials were gathered from the web during May 1996.
This article was periodically updated from 1993 to 1996.]

Author: Andrew Hunt

This is a REPORT. SUPERADAPTOID does not REVIEW products that have not been
personally evaluated by DEMONSTRATION.

Report Presented By: SUPERADAPTOID

The following article outlines the scope of the Comp.Speech FAQ postings: institutional,
research, and business resources available on the web. These business and product
listings are not complete; however, they represent the Better-Of-The-Best of lists.

COMP.SPEECH FAQ POSTING - PART 3/3

Text By Andrew Hunt

FAQ SECTION 5 - SPEECH SYNTHESIS

SpeechLinks: Speech Synthesis


Q5.1: What is speech synthesis?
Q5.2: How can speech synthesis be performed?
Q5.3: References/Books on Synthesis
Q5.4: Speech Synthesis on the WWW
Q5.5: Speech Synthesis Software/Hardware

Q5.1: WHAT IS SPEECH SYNTHESIS?

Speech synthesis is the task of transforming written input to spoken output. The input
can either be provided in a graphemic/orthographic or a phonemic script, depending on
its source.

file:///F|/summary/temp/GAR/SOFTWARE_DREAMS_AND_TALKING.HTM (1 of 25)2004/12/10 02:42:38 •.•


Could someone provide a more informative description?

Q5.2: PERFORMING SPEECH SYNTHESIS

There are several algorithms, and the choice depends on the task they are used for. The
simplest approach is to record a person speaking the desired phrases. This is useful if
only a restricted set of phrases and sentences is needed, e.g. announcements in a train
station or schedule information by phone. The quality depends on how the recording is
done.
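Here is a minimal sketch of the recorded-phrase approach. It assumes a hypothetical inventory of prerecorded clips; the phrase keys, file names and the `announcement` template are invented for illustration. An announcement is just a playlist of whole recordings, so anything outside the recorded set cannot be spoken at all.

```python
# Hypothetical inventory of prerecorded phrases (keys and file names are illustrative).
RECORDINGS = {
    "the train to": "the_train_to.wav",
    "london": "london.wav",
    "leeds": "leeds.wav",
    "departs from platform": "departs_from_platform.wav",
    "3": "three.wav",
    "4": "four.wav",
}


def announcement(destination: str, platform: int) -> list[str]:
    """Return the playlist of clips for one fixed announcement template."""
    phrases = ["the train to", destination.lower(), "departs from platform", str(platform)]
    missing = [p for p in phrases if p not in RECORDINGS]
    if missing:
        # A restricted inventory cannot say anything it was not recorded saying.
        raise KeyError(f"no recording for: {missing}")
    return [RECORDINGS[p] for p in phrases]


print(announcement("Leeds", 4))
```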

More sophisticated, but lower in quality, are algorithms which split the speech into
smaller pieces. The smaller the units, the fewer of them are needed, but the quality also
decreases. A frequently used unit is the phoneme, the smallest unit of sound that
distinguishes one word from another. Western European languages have about 35-50
phonemes, depending on the language, i.e. there are 35-50 individual recordings. The
problem is joining them, because fluent speech requires smooth transitions between the
elements. Intelligibility is therefore lower, but the memory required is small.

A solution to this dilemma is to use diphones. Instead of cutting at the transitions, the
cut is made at the center of each phoneme, leaving the transitions themselves intact.
With N phonemes there are at most N*N diphones (e.g. 20*20 = 400; 35-50 phonemes give
roughly 1,200-2,500, fewer in practice because not every pair occurs), and the quality
increases.
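The diphone idea can be shown with a short sketch. A phoneme string is padded with a silence symbol at each end and cut into overlapping pairs, so each unit spans the transition from the middle of one phoneme to the middle of the next. The phoneme symbols and the `_` silence marker are assumptions for illustration; real systems use their own phone sets.

```python
def to_diphones(phonemes: list[str], silence: str = "_") -> list[tuple[str, str]]:
    """Split a phoneme sequence into diphones (transition units).

    Each diphone runs from the centre of one phoneme to the centre of the
    next, so the sequence is padded with silence at both ends.
    """
    padded = [silence] + phonemes + [silence]
    return list(zip(padded, padded[1:]))


def max_inventory(n_phonemes: int) -> int:
    """Upper bound on the diphone inventory: every ordered pair of phonemes."""
    return n_phonemes * n_phonemes


# "hello" in a rough SAMPA-like transcription (illustrative only).
print(to_diphones(["h", "@", "l", "oU"]))
print(max_inventory(20), max_inventory(35), max_inventory(50))
```

The bound ignores silence units and the many pairs that never occur in a given language, so real inventories are smaller than N*N.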

The longer the units become, the more of them there are, but the quality increases
along with the memory required. Other widely used units are half-syllables, syllables,
words, or combinations of them, e.g. word stems and inflectional endings.

Q5.3: REFERENCES/BOOKS ON SYNTHESIS

BOOKS AND PAPERS

* Douglas O'Shaughnessy, Speech Communication: Human and Machine, Addison-Wesley
Series in Electrical Engineering: Digital Signal Processing, 1987.

* D. H. Klatt, "Review of Text-To-Speech Conversion for English", Journal of the
Acoustical Society of America (JASA), Vol. 82, pp. 737-793, 1987.

* G. Bailly & C. Benoit (Eds.), "Talking Machines: Theories, Models and Designs",
Elsevier: North Holland.

* I. H. Witten. Principles of Computer Speech, London: Academic Press, Inc., 1982.

* W.B. Kleijn and K.K. Paliwal (Eds.), Speech Coding and Synthesis, Elsevier,
Amsterdam, 1995.

* John Allen, Sharon Hunnicutt and Dennis H. Klatt, "From Text to Speech: The MITalk
System", Cambridge University Press, 1987.
* Survey of the State of the Art in Human Language Technology, report edited by Ronald
A. Cole et al., with a section on Text-to-Speech Technologies.

BIBLIOGRAPHIES AND REFERENCE LISTS

WWW-searchable online bibliography for Phonetics and Speech Technology with more
than 8000 entries.
Provided by Institut fur Phonetik at Johann Wolfgang Goethe-Universitat
Frankfurt.

Computational Speech Processing


Speech Analysis, Recognition, Understanding, Compression, Transmission,
Coding, Synthesis; Text to Speech Systems, Speech to Tactile Displays, Speaker
Identification, Prosody Processing: BIBLIOGRAPHY, by Conrad F. Sabourin, 1994,
2 volumes, 1187 pp., ISBN 2-921173-21-2, INFOLINGUA inc., P.O. Box 187 Snowdon,
Montreal, H3X 3T4, Canada.
See also: http://gomer.mlink.net/infolingua.html

Q5.4: SPEECH SYNTHESIS ON THE WWW

Most of the following are links to WWW pages with demonstrations of speech
synthesis. Plenty more links are included in the detailed list of speech synthesis
software/hardware in Q5.5.

Speech Synthesis "Museum" URL: http://www.cs.bham.ac.uk/~jpi/synth/museum.html

Maintained by Jon Iles (j.p.iles@cs.bham.ac.uk) at the University of Birmingham.


Information and speech samples for

YorkTalk
Loughborough Sound Images
University of Birmingham - FDFS
Eurovocs
DECtalk
AT&T Bell Labs Synthesiser

S.W.A.Ll.C. - Welsh Synthesis from CSTR


All-Prosodic Speech Synthesis - IPOX
Orator from Bellcore

Pavarobotti
WWW demo of the Pavarobotti synthesis technology developed at the National
Center for Voice and Speech

Say...
WWW demo of the rsynth speech synthesis software. The WWW capability was
implemented by Axel Belinfante.

Musee sonore de la synthese de la Parole en francais


Speech synthesis examples from a series of French language speech
synthesisers plus links to other speech synthesis demo pages.

ICP-Grenoble
CNET-Lannion (with TD-PSOLA)
KTH-Stockholm
Universite-Mons - several versions
AT&T Bell Laboratories Voices
WWW interface to the demo of the Laureate speech synthesis system (not yet
commercially available). This link may be valid, but it returns odd error messages.

ORATOR from Bellcore


Online demo of the ORATOR system developed at Bellcore.

SVOX from TIK, ETH in Zurich


Demo of German speech synthesis from Institut fur Technische Informatik und
Kommunikationsnetze.

Multi-Lingual TTS from Gerhard-Mercator University, Duisburg


Synthesis in German, English or Japanese.

TMH: Institutionen for Taloverforing och Musikakustik, Kungliga Tekniska Hogskolan


Synthesis in Swedish, Finnish, Norwegian, Icelandic, Danish, British and American
English, French, German, Italian, Spanish, Latin American Spanish and Greek.

Examples of several types of speech synthesis.


Articulatory Synthesis by HyperASY. SineWave Synthesis. Gestural Computational
Model. Pattern Playback system of the 1940s!

BeSTspeech from Berkeley Speech Technologies, Inc., (BST)

Eurovocs Multilingual Speech Synthesis


Based on Lernout and Hauspie technology.

HADIFIX German Speech Synthesis


Provided by the Institut fur Kommunikationsforschung und Phonetik, Universitat
Bonn.

Centigram's TruVoice Demo


Allows control of speech rate, pitch and other prosodic characteristics.

Institute of Phonetic Sciences


Links to lots of on-line speech synthesis demonstrations provided by the Institute
of Phonetic Sciences of the Faculty of Arts of the University of Amsterdam.

Yahoo page on speech generation

Q5.5: SPEECH SYNTHESIS SOFTWARE/HARDWARE

Please email any updates, corrections or additions to the following list. The range of
commercially available synthesis software is growing rapidly so any help in keeping up
to date will be appreciated.

Other lists of speech synthesis software on the WWW include:

Kevin Lenzo's list of Macintosh Speech Resources and Apps

Speech Toys Speech Synthesis Information

IN THE FAQ...

The following speech synthesis software/hardware is described in the comp.speech
FAQ.

AsTeR
BeSTspeech from Berkeley Speech Technologies, Inc., (BST)
TheBigMouth
Creative TextAssist and TextAssist API
CSRE: Computerized Speech Research Environment
DECtalk: Text-to-Speech from Digital
Eloquence
Emacspeak - A Speech Output Subsystem For Emacs
Eurovocs
HADIFIX
Infovox Product Range
IPOX: All Prosodic Speech Synthesis Architecture
JSRU
Klatt-style synthesiser
KPE80 - A Klatt Synthesiser and Parameter Editor
"learph": Trainable text-to-phoneme software by Antonio Lucca
Lernout and Hauspie Text-To-Speech (3 products)
Lernout and Hauspie Text-To-Speech Windows SDK
Macintosh Speech Output Applications
MacinTalk
Monologue for Windows from First Byte
Narrator Translator Library
Narrator
TextToSpeech Kit (NeXT)
Orator from Bellcore
PAM - A Text-To-Speech Application
ProVerbe Speech Engine for Windows
ProVoice Developer's Speech Toolkit from First Byte
RC Systems V8600/V8601 Text to Speech synthesizers
rsynth
SENSYN speech synthesizer
SGI Developers Toolbox Synthesiser
SIMTEL
Sound Bytes Developer's Kit
spchsyn.exe
Speak
Speech Manager and PlainTalk
Text to Phoneme Program 1
Text to phoneme program 2
Text to phoneme program 3
Tinytalk
TrueTalk
TruVoice from Centigram
WinSpeech

AsTeR

Platform: UNIX
Description:
TTS front-end program which encodes structural information about documents in the
synthesized speech. For more information see:
http://www.research.digital.com/CRL/personal/raman/aster/aster-toplevel.html

Operation requirements: Lisp: Lucid, clisp


Contact: T. V. Raman
WWW page
Email: raman@adobe.com

BeSTspeech from Berkeley Speech Technologies, Inc., (BST)

Platform: ?
Description: BeSTspeech reads ASCII text with no vocabulary limits. Available for Dutch,
English (male and female), French, German, Italian, Portuguese, Spanish, Arabic,
Cantonese, Japanese, Korean, Malay, Mandarin and Russian.

Price: ?
Contact: Berkeley Speech Technologies, Inc.
2246 Sixth Street, Berkeley, California 94710, USA
Ph: (510) 841-5083, Fax: (510) 841-5093
Email: webmaster@bst.com
WWW

TheBigMouth - a Text to Speech Program

Platform: NeXT
Description: Text to speech program based on concatenation of pre-recorded speech
segments. NeXT equivalent of "Speak" for Suns.
Availability: try NeXT archive sites such as sonata.cc.purdue.edu.

Creative TextAssist

Platform: Windows
Description: Based on DECtalk speech synthesis. A detailed technical description of
TextAssist is provided on the Creative WWW pages.

Availability: Creative TextAssist is bundled with most (all?) Creative Sound Blaster
audio cards.

Contact: Creative Labs, Inc.


Address, phone, email etc. unknown
WWW
Info

Creative TextAssist API

Platform: Windows
Description: The TextAssist API (TAAPI) is created for Microsoft Windows 3.1x and
Windows 95 developers who intend to develop 16-bit Text-to-Speech software
applications using Creative's TextAssist speech engine. It supports direct control of
speech output characteristics, concurrent playback of text-to-speech and wave files,
foreign language support, speech synchronization, and exception dictionaries. It also
includes a voice editing tool for creating new custom voices, a Visual Basic Custom
Control for high-level text-to-speech support in Visual Basic and other languages and
some sample programs.

Availability: The TextAssist API is released to registered developers at no cost.


Contact: WWW

CSRE: Computerized Speech Research Environment

Platform: PC
Description: CSRE is a software system which includes an implementation of the
Klatt speech synthesizer. See the CSRE entry in Q1.9 and the AVAAZ WWW pages for
more detail.

Contact: AVAAZ Innovations Inc.


P.O.Box 8040, 1225 Wonderland Rd. N, London, Ontario, CANADA, N6G 2B0
Ph: +1-519-472-7944 , Fax: +1-519-472-7814
Email: info@avaaz.com
WWW

DECtalk Speech Synthesis

Platform: Windows NT, Alpha with Digital UNIX and RS232 ports
Description:
Converts ordinary text into natural-sounding, intelligible speech. Provides
personalized voices, and extensive user controls. DECtalk technology is available
for the following packaging options.

DECtalk PC card option:


An industry-standard ISA/EISA bus card implementation that can be integrated
with any Intel 486 processor-based system running DOS or Windows.
Applications can be interfaced to the bus via a DOS Terminate and Stay Resident
(TSR) driver or a Windows Dynamic Link Library (DLL). This option is available
with an external speaker with volume control and headphone jack.

DECtalk Express external package:


An external, portable package that you can plug in to any PC or serial port. The
external package includes a built-in speaker and headphone jack, plus combined
on/off and volume controls and a rechargeable battery pack.

DECtalk Software solution:


Software-only text to speech for Alpha or Intel systems running Windows NT or
Alpha systems running Digital UNIX. Provides complete speech synthesis
capabilities so developers can enhance applications with DECtalk technology.
DECtalk Software output can be directed to audio devices, into WAVE files, or into
memory buffers.

Pricing: DECtalk-Speech-Synthesis
More Information:
Digital Equipment Corporation WWW pages:
Ph: 1-800-DIGITAL

DECtalk Software

Platform: Digital UNIX and Windows NT


Description:
DECtalk converts standard ASCII text into natural, intelligible speech. Speech
output through any audio device is supported by Microsoft Video for Windows or
Multimedia Services for Digital UNIX. An API gives developers direct access to
text-to-speech functions. Provides nine voice personalities (4 female, 4 male, 1
child). Provides punctuation and tonal control, supports customized
pronunciation of trade jargon and acronyms. Common programming interface
works with both Alpha and Intel platforms.

More Information:
Digital Equipment Corporation WWW pages:
DECtalk Software page:
Ph: 1-800-DIGITAL

Eloquence

Platform: Windows, Solaris, SunOS, SGI, RS/6000


Description:
Software based text-to-speech package. Generates waveforms completely
algorithmically instead of by concatenating waveforms, for maximum flexibility
and naturalism. For instance, when the user requests a deeper voice, the software
simulates a larger vocal tract, instead of simply pitch-shifting samples.
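The textbook uniform-tube approximation shows why a longer vocal tract gives a deeper voice. It models a tube closed at the glottis and open at the lips, with resonances F_n = (2n - 1)c / 4L. This is a standard acoustic-phonetics simplification, not Eloquence's actual model. The speed of sound and the tract lengths below are assumed typical values.

```python
SPEED_OF_SOUND = 350.0  # m/s, approximate value for warm, moist air (assumed)


def tube_formants(length_m: float, count: int = 3) -> list[float]:
    """Resonances (Hz) of a uniform tube closed at one end: F_n = (2n-1)c/4L."""
    return [(2 * n - 1) * SPEED_OF_SOUND / (4 * length_m) for n in range(1, count + 1)]


print(tube_formants(0.175))  # typical adult male tract, about 17.5 cm
print(tube_formants(0.20))   # a longer tract: every formant moves down
```

Lengthening the tube lowers every formant together. Pitch-shifting recorded samples, by contrast, moves F0 and the formants in a way no real vocal tract would.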

Uses high-level linguistic parsing, which obviates the need for a huge dictionary.
Handles numbers, acronyms, currency, etc. Includes a set of annotation symbols,
for placing stress on particular words, expressing excitement/boredom, etc. Also
allows phonetic input. Support for Windows DLL.

Produces male and female voices for General American English. Dialects under
development include Alabama, Brooklyn, and Boston.

Price:
Flexible license agreements on application.
Availability:
Eloquent Technology, Inc.
2389 North Triphammer Road
Ithaca, NY 14850
Ph: (607) 266-7020 Fax: (607) 266-7030
Email: eti@plab.dmll.cornell.edu

Emacspeak - A Speech Output Subsystem For Emacs

Platform: UNIX, Emacs


Description:
Emacspeak is a speech output system that will allow someone who cannot see to
work directly on a UNIX system. Emacspeak is built on top of Emacs. With
emacspeak loaded, Emacs provides spoken feedback for everything you do.
Emacspeak currently supports the new DECtalk Express speech synthesizer, as
well as older versions of the DECtalk, e.g. the MultiVoice. See the Emacspeak
WWW page, the Emacspeak FAQ or the Emacspeak distribution for additional
details.

Requirements:
Requires GNU FSF Emacs 19 (version 19.23 or later) and TCLX 7.3B (Extended
TCL) to run Emacspeak.
Availability:
Not known at this time (web sites are gone)

Contact: T. V. Raman, raman@adobe.com

Eurovocs

Platform: Various - RS232 Connection


Description:
Eurovocs is a stand-alone text-to-speech synthesizer which uses the text-to-
speech technology of Lernout and Hauspie Speech Products. Available for Dutch,
French, German and American English with other languages planned for release
soon. One Eurovocs device can support two different languages. Eurovocs can
be connected to any computer via a standard serial interface (RS232). It supports
personal dictionaries, generation of DTMF tones, and pronunciation of special
character sequences such as digit strings, telephone-numbers, date and time
indications, abbreviations, alphanumeric strings etc.
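Reading digit strings such as telephone numbers is a typical text-normalization step. The sketch below is minimal. It assumes English digit names and reads every run of digits one digit at a time. A real front end would first decide whether a string is a phone number, a quantity or a date.

```python
import re

DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def spell_digit_strings(text: str) -> str:
    """Replace each run of digits with its digits read out one by one."""
    return re.sub(r"\d+",
                  lambda m: " ".join(DIGIT_NAMES[int(d)] for d in m.group()),
                  text)


print(spell_digit_strings("Call 264 33 97"))
```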

Contact:
Technologie & Revalidatie
Postbus 128, B-9000 Gent, Belgium
Ph: +32-9-264 33 97, Fax: +32-9-264 35 94
E-mail: noe@elis.rug.ac.be WWW page:

HADIFIX

Platform: Windows
Description:
German speech synthesis system developed at the Institute for Communications
Research and Phonetics, University of Bonn. Provides conversion of input text to
phonemes, automatic prediction of stress, phrasing and pitch, and speech
generation by concatenation of small units of natural speech. Demisyllables and
similar units are used; they comprise all consonants before the vowel and the
beginning of the vowel (initial demisyllable) or the end of the vowel and the
following consonants (final demisyllable). For example, the word 'Strolch' is
formed by concatenating 'Stro' and 'olch'.
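The demisyllable split described above can be sketched on spelling alone. The sketch assumes a one-syllable word and treats a, e, i, o, u, y (plus ä, ö, ü) as vowel letters. HADIFIX itself works on phonemes, so this only illustrates the idea.

```python
VOWELS = set("aeiouyäöüAEIOUYÄÖÜ")  # assumed vowel letters


def demisyllables(word: str) -> tuple[str, str]:
    """Split a one-syllable word into initial and final demisyllables.

    Both halves share the vowel: the initial part is the onset consonants plus
    the vowel, the final part is the vowel plus the coda consonants.
    """
    start = next((i for i, ch in enumerate(word) if ch in VOWELS), None)
    if start is None:
        raise ValueError(f"no vowel letter in {word!r}")
    end = start
    while end < len(word) and word[end] in VOWELS:
        end += 1
    return word[:end], word[start:]


print(demisyllables("Strolch"))  # ('Stro', 'olch')
```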

Demo:
Windows demo software available. Limited to synthesis of one short text (text.txt)
at a time, with limitations on the speech output format as well. 1.3 MB file.
WWW page
On-line demo

Infovox Product Range

Description:
Multilingual Text-to-speech systems, languages available: American English,
British English, German, French, Spanish, Italian, Swedish, Norwegian, Icelandic,
Danish and Finnish.

Product name:INFOVOX 500, PC BOARD


Product description: Half length expansion board for IBM PC, XT, AT, PS/2 model
30 or compatible personal computers. The board can also be connected via the
serial port. Language and control program for downloading into RAM or mounted
on EPROMs.

Platform: for IBM PC, XT, AT, PS/2 model 30 or compatible


❍ Delivered standard interface: MS DOS I/O driver

Product name: INFOVOX 600, OEM BOARD


Product description: OEM board built with CMOS ICs. Language and control
program are stored in on-board fixed memory.

Platform: any, Interface: 9-pole D-SUB (RS 232-C) 300-9600 Baud.


❍ Delivered standard interfaces: MS DOS I/O driver and interface to Apple Speech Manager.

Product name: INFOVOX 700, DESKTOP UNIT
Product description: Desktop unit with built in Infovox 600 to be connected to any
computer or terminal via an RS 232-C serial interface. Built in loudspeaker and
rechargeable battery for 4 hours of use, and control knobs for continuous control of
speech volume and speed.

Platform: any

❍ Delivered standard interfaces: MS DOS I/O driver and interface to Apple Speech Manager.

Product name: INFOVOX 650, OEM BOARD
Product description: OEM board built with CMOS ICs. Language and control
program are stored in on-board memory.
❍ Platform: any, Interface: 9 pole D-SUB (RS 232-C) 300-9600 Baud

❍ Delivered standard interfaces: MS DOS I/O driver and interface to Apple Speech Manager.

Product name: INFOVOX 750, DESKTOP UNIT
Product description: Desktop unit with built in Infovox 650 to be connected to any
computer or terminal via an RS 232-C serial interface. Built in loudspeaker and
rechargeable battery for 5 hours of use, and a control knob for continuous control of
speech volume.

Platform: any

❍ Delivered standard interfaces: MS DOS I/O driver and interface to Apple Speech Manager.

Product name: Infovox 210, software for Apple Macintosh
Product description: Software based text-to-speech conversion. Produces 16 bit
and 8 bit sound. Delivered on 3.5" diskettes with a user lexicon and complete
documentation.

Platform: Apple Macintosh with minimum 68030, 33 MHz microprocessor.


❍ Delivered standard interfaces: Standard interface to Apple Speech manager

Product name: Infovox 220, software for Microsoft Windows.


Product description: Software based text-to-speech conversion. Produces 16 bit
sound and conforms to the Microsoft Windows multimedia standard MCI. Delivered
on 3.5" diskettes with a user lexicon and complete documentation.

❍ Platform: IBM compatible PC with minimum 486, 25 MHz microprocessor.


❍ Delivered standard interfaces: Standard interface to Microsoft Windows 3.1
and sound boards supporting Microsoft Windows multimedia driver for
audio.
Contact:
Telia Promotor Infovox AB
TTS Sales Division
P.O. Box 2069
S-171 02 Solna, Sweden
Ph: +46 8 764 35 00 Fax: +46 8 735 78 76
email: tts-sales@infovox.se

IPOX: All Prosodic Speech Synthesis Architecture

Description:
IPOX is an experimental, all-prosodic speech synthesizer, developed by Arthur
Dirksen and John Coleman. IPOX is freely available (after registration) for
evaluation and non-profit research purposes.

Requirements:
PC (preferably a fast 486) running Windows 3.1 or higher. Sound output requires a
16-bit Windows-compatible sound card.
Availability: By WWW

JSRU

Platform: UNIX and PC


Cost:
100 pounds sterling (from academic institutions and industry)
Description:
A "C" version of the JSRU system, Version 2.3 is available. It's written in Turbo C
but runs on most Unix systems with very little modification. A Form of Agreement
must be signed to say that the software is required for research and development
only.

Contact:
Dr. E.Lewis eric.lewis@bristol.ac.uk

Klatt-style synthesiser

Platform: Unix
Cost: Free
Description:
Software posted to comp.speech in late 1992.
Availability:
By ftp from the comp.speech ftp site

KPE80 - A Klatt Synthesiser and Parameter Editor

Platform: Unix
Description:
The KPE80 program provides a graphical interface for the implementation of the
Klatt 1980 formant synthesiser written by Jon Iles and Nick Ing-Simmons. It was
inspired by IGE, a piece of code written by Rob Fletcher.
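The building block of a Klatt-style formant synthesiser is a second-order digital resonator, one per formant. The sketch below is minimal and uses the resonator coefficients given in Klatt's 1980 paper: C = -exp(-2*pi*BW*T), B = 2*exp(-pi*BW*T)*cos(2*pi*F*T), A = 1 - B - C. The sampling rate, formant frequency, bandwidth and impulse-train source in the example are assumed values. This is not the KPE80 code.

```python
import math


def klatt_resonator(signal, freq_hz, bw_hz, fs_hz):
    """Filter `signal` through one Klatt (1980) digital resonator.

    y[n] = A*x[n] + B*y[n-1] + C*y[n-2], scaled for unity gain at 0 Hz.
    """
    t = 1.0 / fs_hz
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c
    y1 = y2 = 0.0  # the two previous output samples
    out = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y1, y2 = y, y1
    return out


# A 100 Hz impulse train (a crude glottal source) through a 500 Hz formant.
fs = 10000
source = [1.0 if n % 100 == 0 else 0.0 for n in range(fs // 10)]
vowelish = klatt_resonator(source, freq_hz=500, bw_hz=60, fs_hz=fs)
```

A full synthesiser cascades or parallels several such resonators, one per formant, and updates F and BW from the parameter tracks every few milliseconds.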

Technical Desc.:
It consists of an X-Window interface and version 3.03 of the synthesiser code.
The interface allows users to display and edit Klatt parameters using a graphical
display which includes the time-amplitude waveform of both the original speech
and its synthetic copy, and some signal analysis facilities. Most of the work in
choosing the parameter values to produce the synthetic copy has to be done by
the user. KPE will estimate the fundamental frequency contour from an original
token; this estimate will need to be amended where errors occur. It is possible to
specify the formant trajectories with some precision by overlaying the appropriate
formant frequency parameter tracks on the spectrogram of the target waveform. A
number of facilities exist to help in the refinement of parameter values: original
and synthetic waveforms can be compared aurally, spectrally, and
spectrographically using built-in speech analysis facilities.
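This entry does not describe KPE's own F0 estimation method. To illustrate the general idea, here is a minimal autocorrelation estimator. It picks the lag, within an assumed 50-400 Hz search range, at which a frame best matches a shifted copy of itself. Estimators of this kind can make octave errors on real speech, which is why a user has to correct the contour by hand.

```python
import math


def estimate_f0(frame, fs_hz, fmin_hz=50.0, fmax_hz=400.0):
    """Return an F0 estimate (Hz) from the autocorrelation peak of `frame`."""
    min_lag = int(fs_hz / fmax_hz)
    max_lag = min(int(fs_hz / fmin_hz), len(frame) - 1)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return fs_hz / best_lag


fs = 8000
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(800)]
print(estimate_f0(tone, fs))
```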

File formats:
KPE will read RIFF (.wav) files and SFS files. (SFS is a suite of speech-signal
processing programs available free from Phonetics and Linguistics, UCL.)
Availability:
❍ KPE for SunOS 4.1.3 (statically compiled libraries)

❍ KPE for Linux (statically compiled libraries)

❍ The source code (needs gcc and SUIT to compile)

❍ A postscript overview of KPE

❍ The SFS distribution

See also: Public domain Klatt-style speech synthesis code.


Contact: Andrew Simpson
Department of Phonetics and Linguistics, University College London
Wolfson House, 4 Stephenson Way, London NW1 2HE
Email: a.simpson@ucl.ac.uk
WWW page

"learph": Trainable text-to-phoneme software by Antonio Lucca

Platform: UNIX
Description: Experimental software which learns text to phoneme translation from
examples using decision-tree-like data structures. It is based on the assumption that
each letter can correspond to different phoneme strings depending on the context.
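learph's actual decision-tree structures are not documented here. The sketch below is a much simpler illustration of learning context-dependent letter-to-phoneme mappings from examples. For each letter, it counts which phoneme appears given the following letter. It backs off to the letter alone when that context was never seen. It assumes training pairs already aligned one phoneme symbol per letter, with an empty string for silent letters; the tiny training set and the symbols are made up.

```python
from collections import Counter, defaultdict

END = "#"  # marks "no following letter"


def train(examples):
    """Learn phoneme counts per (letter, next letter) and per letter alone.

    `examples` holds (word, phonemes) pairs aligned one phoneme per letter;
    silent letters map to the empty string "".
    """
    by_context = defaultdict(Counter)
    by_letter = defaultdict(Counter)
    for word, phones in examples:
        for i, (letter, phone) in enumerate(zip(word, phones)):
            nxt = word[i + 1] if i + 1 < len(word) else END
            by_context[(letter, nxt)][phone] += 1
            by_letter[letter][phone] += 1
    return by_context, by_letter


def predict(word, model):
    """Most frequent phoneme in the most specific context seen in training."""
    by_context, by_letter = model
    phones = []
    for i, letter in enumerate(word):
        nxt = word[i + 1] if i + 1 < len(word) else END
        counts = by_context.get((letter, nxt)) or by_letter.get(letter)
        # Letters never seen in training are passed through unchanged.
        phones.append(counts.most_common(1)[0][0] if counts else letter)
    return phones


model = train([("cat", ["k", "a", "t"]), ("cut", ["k", "V", "t"]),
               ("cent", ["s", "E", "n", "t"]), ("cell", ["s", "E", "l", ""])])
print(predict("cup", model))   # ['k', 'V', 'p']
print(predict("cell", model))  # ['s', 'E', 'l', '']
```

The right-hand context is what lets "c" come out as /k/ before "u" but /s/ before "e", and lets the final "l" of "cell" be silent.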

Availability: Examples and source are available on the WWW


Contact: Antonio Lucca: lucca@ghost.dsi.unimi.it

Lernout & Hauspie Text-to-Speech (3 products)

Lernout & Hauspie have three TTS products. They are functionally similar but differ in
hardware implementation and in the other details described below.
L&H tts2000/T: TTS for the Telephony and Telecommunications Market
L&H tts2000/M: TTS for the Computer and Multimedia Market
L&H tts3000/C: TTS for the Business and Consumer Electronics Market
Description:
Text to Speech (TTS) software based on parameterized segment concatenation
(diphones, triphones and tetraphones) algorithms. Available for US English,
German, Dutch, French, Spanish (Castilian), Italian and Korean.
General features include:
❍ The control of volume, speech rate and speech pitch.

❍ The use of control sequences to customize TTS output (adding pauses, using
phonetic input, etc.).
❍ Switching between languages at run time.

❍ A personal vocabulary editor is available for building exception dictionaries.

❍ Readout modes: letter by letter, word by word or sentence by sentence.

❍ Input formats: orthographic input, phonetic input, phonetic input with prosodic information.
tts2000/T
❍ Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit linear PCM.

❍ Sampling Frequency: 8kHz

❍ Single channel platform examples: SHARP SH7000, ARM6/ARM7, Intel i960, TI TMS320C31, AT&T DSP3210

❍ Multi-channel platform examples: TI TMS320C31, AT&T DSP3210

tts2000/M
❍ Output formats: 8/16 bit wave format, 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit linear PCM.

❍ Sampling Frequency: 8/10/11.025 kHz

❍ Single processor platform examples: ARM6/ARM7, Intel 386/486/Pentium, Motorola 68040

❍ Two processor platform examples: {Intel 386/486/Pentium or Motorola 68030} and {ADI ADSP21XX or Motorola 5600X or TI TMS320C25/20C5X}

tts3000/C
❍ Output formats: 8 bit mu-law PCM, 8 bit A-law PCM, 16 bit linear PCM.

❍ Sampling Frequency: 10kHz

❍ Single processor platform examples: SHARP SH7000, ARM6/ARM7, Intel i960, TI TMS320C31, AT&T DSP3210

❍ Two processor platform examples: {SHARP SH7000 or ARM6/ARM7 or Intel 386EX or Motorola 683XX} and {ADI ADSP21XX or Motorola 5600X or TI TMS320C25/C5X or TI TSP50C10}
See also: L&H Windows TTS SDK
More Information: on the Lernout & Hauspie WWW pages
Price: Unknown
Contact: Lernout & Hauspie Speech Products
800 West Cummings Park, Suite 3100
Woburn, MA 01801, USA
Tel: (617) 932 4118
Fax: (617) 932 9209
Email: sales@lhs.com
WWW
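The "8 bit mu-law PCM" output format listed for these products is the companding scheme of ITU-T G.711. Each 16-bit linear sample is compressed to one byte, so 8 kHz output needs 8000 x 8 = 64 kbit/s. The encoder below is a minimal sketch that follows the widely used G.711 reference algorithm (bias 0x84, clip at 32635). It is not L&H code.

```python
BIAS = 0x84   # added so every segment has a leading 1 bit
CLIP = 32635  # largest magnitude that still fits after adding the bias


def linear_to_ulaw(sample: int) -> int:
    """Encode one signed 16-bit linear PCM sample as an 8-bit mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(-sample if sample < 0 else sample, CLIP) + BIAS
    # Exponent (segment): position of the highest set bit, counted from bit 7.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # G.711 transmits the bitwise complement of sign|exponent|mantissa.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


samples = [0, 1000, -1000, 32767, -32768]
print([hex(linear_to_ulaw(s)) for s in samples])
```

A-law, the other 8-bit format listed, uses a different segment layout and bit inversion, so it needs its own encoder.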

Lernout & Hauspie Text-to-Speech Windows SDK

Platform: IBM-Compatible
Description: The L&H Text-to-Speech software developers kit is able to integrate text-to-
speech technology with your own or existing PC applications under Microsoft Windows
3.1. This software converts written text into clear, human-sounding
synthetic speech.

Requirements:
❍ IBM-compatible PC 386 DX/33, 8 MB RAM

❍ MS DOS 5.0 and MS Windows 3.1 (or higher)

❍ SoundBlaster compatible sound board.

See also: L&H TTS Products


More Information: on the Lernout & Hauspie WWW pages
Price: Unknown
Contact: Lernout & Hauspie Speech Products
800 West Cummings Park, Suite 3100
Woburn, MA 01801, USA
Tel: (617) 932 4118
Fax: (617) 932 9209
Email: sales@lhs.com, WWW page

Macintosh Speech Output Applications

A comprehensive list of Macintosh Speech Applications is provided by Kevin Lenzo at
CMU.
The Apple Speech WWW Site has some useful information.

MacinTalk

Platform: Macintosh
Cost: Free
Description: Formant based speech synthesis. There is also a program called "tex-edit"
which apparently can pronounce English sentences reasonably using Macintalk.

Note: MacinTalk doesn't run reliably on Macintoshes with new sound hardware under the
latest OS (System 7.1 w/HUD 2.0). More recent software is listed above.
Availability:
By anonymous ftp from many archive sites (try searching with Archie if you can).
tex-edit is on many of the same sites.
❍ http://www.riken.go.jp/archives/mac/umich/sound/speech/00index.txt

❍ http://jumbo.com/util/mac/speech/

This article by my friend Denise Lance will give you some ideas on the more modern
speech offerings of Apple/Macintosh. When you have finished reading the article (there
are some appropriate notes to read) you can also download English_Text-to-Speech
from there.

Monologue for Windows from First Byte

Description:
Monologue is a software program that reads text from the clipboard in Windows
16 or 32 bit applications. It can be found as a bundled product with many sound
cards and multimedia general purpose computer systems. Monologue can add
the element of speech to virtually any text oriented application. Any
pronounceable combination of letters and numbers will be spoken clearly. It can
be applied to tasks such as eyes-free proofreading, data verification (e.g.
spreadsheets), reading E-mail and more. User-changeable parameters provide
control over the sound quality by allowing for changes in pitch, and the speed of
speech. An exception dictionary saves preferred pronunciation of words and
abbreviations.

Monologue Win32 now includes support for the Microsoft SAPI. Monologue male
"SpeechFonts" are available for US English, British English, German, French,
Latin American Spanish and Italian. A US English Female SpeechFont is also
available. For more detailed information and examples go to the First Byte WWW
pages.

Availability: Currently bundled with many sound cards and multimedia general purpose
computer systems. For pricing, licensing details, and release information see the First
Byte WWW pages or email info@firstbyte.davd.com.

See also: ProVoice Developer's Speech Toolkit from First Byte


Contact: First Byte
19840 Pioneer Ave., Torrance, CA 90503
Ph: 310-793-0610 Fax: 310-793-0611
Email: info@firstbyte.davd.com or WWW page

Narrator Translator Library

Platform: Amiga
Description:
A replacement for the Commodore-supplied "translator.library" which is a part of
the Narrator speech synthesis package. It implements multi-lingual text-to-speech
for an Amiga. The library allows the user to specify which language the text to be
spoken should be treated as. This can be done by setting the default language or by
including markup codes in the text, in a similar way to LaTeX or HTML, e.g.
"\french{Bonjour}". There is currently support for American English, British
English, Swedish, Maori, Finnish, German, Icelandic, Klingon, Polish, Italian, and
Welsh.
Availability:
The library (but not source) is available by anonymous ftp from Aminet
More Information: is available on the WWW
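Language markup of this kind can be illustrated with a small parser that splits text into (language, text) segments. It assumes only the "\language{...}" form shown in the example above, with no nesting. The default language name "american" and the function name are hypothetical. This is not the library's own code.

```python
import re

# Assumed markup form: a backslash, a lowercase language name, then {text}.
MARKUP = re.compile(r"\\([a-z]+)\{([^}]*)\}")


def split_languages(text: str, default: str = "american") -> list[tuple[str, str]]:
    """Split text into (language, text) segments; unmarked text uses `default`."""
    segments = []
    pos = 0
    for m in MARKUP.finditer(text):
        if m.start() > pos:
            segments.append((default, text[pos:m.start()]))
        segments.append((m.group(1), m.group(2)))
        pos = m.end()
    if pos < len(text):
        segments.append((default, text[pos:]))
    return segments


print(split_languages(r"Hello \french{Bonjour} world", default="english"))
```

Each segment would then be handed to the letter-to-phoneme rules for its own language.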

Narrator

Platform: Amiga
Description:
Formant-based speech synthesis. Includes an English-to-phoneme translation
library and a SPEAK: pseudo-device for speech output.
Hardware: Standard Amiga hardware
Availability: Part of AmigaOS
See Also: The Narrator Translator Library

TextToSpeech Kit

Platform: NeXT Computers
Description:
The TextToSpeech Kit does unrestricted conversion of English text to
synthesized speech in real-time. The user has control over speaking rate, median
pitch, stereo balance, volume, and intonation type. Text of any length can be
spoken, and messages can be queued up, from multiple applications if desired.
Real-time controls such as pause, continue, and erase are included.
Pronunciations are derived primarily by dictionary look-up. The Main Dictionary
has nearly 100,000 hand-edited pronunciations which can be supplemented or
overridden with the User and Application dictionaries. A number parser handles
numbers in any form. A letter-to-sound knowledge base provides pronunciations
for words not in the Main or customized dictionaries. Dictionary search order is
under user control. Special modes of text input are available for spelling and
emphasis of words or phrases. The actual conversion of text to speech is done by
the TextToSpeech Server. The Server runs as an independent task in the
background, and can handle up to 50 client connections.
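
The lookup strategy described above (dictionaries searched in a user-controlled
order, with letter-to-sound rules as the fallback) can be sketched in Python as
follows. The dictionary contents, the phoneme notation, and the deliberately naive
`letter_to_sound` function are hypothetical placeholders; the Kit's real rules and
programming interface are not described in this listing.

```python
from typing import Callable, Dict, Sequence

Dictionary = Dict[str, str]


def letter_to_sound(word: str) -> str:
    # Hypothetical stand-in for a rule-based letter-to-sound module:
    # it simply spells the word out letter by letter.
    return " ".join(word.lower())


def pronounce(word: str,
              dictionaries: Sequence[Dictionary],
              fallback: Callable[[str], str] = letter_to_sound) -> str:
    """Search dictionaries in the given order; use the fallback if none match."""
    key = word.lower()
    for dictionary in dictionaries:
        if key in dictionary:
            return dictionary[key]
    return fallback(word)
```

Because the search order is just the order of the `dictionaries` argument, placing
a user dictionary before the main dictionary lets its entries override the main
pronunciations, which mirrors the behaviour the description attributes to the User
and Application dictionaries.
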

Misc:
The TextToSpeech Kit comes in two packages: the Developer Kit and the User Kit.
The Developer Kit enables developers to build and test applications which
incorporate text-to-speech. It includes the TextToSpeech Server, the
TextToSpeech Object, the pronunciation editor PrEditor, several example
applications, phonetic fonts, example source code, and developer documentation.
The User Kit provides support for applications which incorporate text-to-speech.
It is a subset of the Developer Kit.

Hardware:
Uses standard NeXT Computer hardware.
Cost:
❍ TextToSpeech User Kit: $175 CDN ($145 US)

❍ TextToSpeech Developer Kit: $350 CDN ($290 US)

❍ Upgrade from User to Developer Kit: $175 CDN ($145 US)

Availability: Trillium Sound Research
1500, 112 - 4th Ave. S.W., Calgary, Alberta, Canada, T2P 0H3
Tel: (403) 284-9278 Fax: (403) 282-6778
Order Desk: 1-800-L-ORATOR (US and Canada only)
Email: TTSInfo@trillium.ab.ca

Orator Text-to-Speech Synthesizer

Platform: Sun SPARC, DECstation 5000. Written in C, and therefore portable to other
UNIX platforms; successful ports include HP, IBM RS/6000, and PC-Unix (Linux).
Description:
Sophisticated speech synthesis package. Has text preprocessing (for
abbreviations, numbers), acronym rules, and human-like spelling routines.
Natural-sounding synthesis based on demisyllable concatenation. Has high
accuracy for pronunciation of names of people, places and businesses in
America; good accuracy for English text; rules for stress and intonation marking;
various methods of user control and customization at most stages of processing.

A new version of the ORATOR system is under development. Both ORATOR and
this new "ORATOR II" system are capable of general text synthesis. The ORATOR
II system has a more natural-sounding voice.
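
Concatenative synthesis, such as ORATOR's demisyllable approach, builds an utterance
by joining stored speech units end to end. The Python sketch below shows only the
simplest joining step, a linear crossfade over a few overlapping samples using plain
lists of floats. It is an illustrative assumption, not Bellcore's method: a real
system also needs unit selection and prosody (pitch and duration) modification.

```python
from typing import List, Sequence


def concatenate(units: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Join waveform units, linearly crossfading `overlap` samples at each join."""
    if overlap < 0:
        raise ValueError("overlap must be non-negative")
    output: List[float] = []
    for unit in units:
        samples = list(unit)
        # The overlap cannot exceed the length of either side of the join.
        n = min(overlap, len(output), len(samples))
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit rises across the join
            j = len(output) - n + i
            output[j] = (1.0 - w) * output[j] + w * samples[i]
        output.extend(samples[n:])
    return output
```

Each join shortens the result by `overlap` samples, so three 10-sample units joined
with a 4-sample overlap produce 10 + 6 + 6 = 22 samples.
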

Hardware: Runs on common SPARC or DECstation workstations, using their internal
audio output capability. At least 16 MB of memory is recommended.

More detailed information plus examples of ORATOR synthesis are available on the
ORATOR WWW pages

Misc 1: A free demo cassette is available.

Misc 2: Examples of Orator are also available on the University of Birmingham Speech
Synthesis "Museum" WWW site (see Q5.4).

Availability and Pricing: Contact Bellcore's Licensing Office
Tel: 1-800-521-CORE (521-2673)
Fax: 1-908-336-2559
Email to Anthony Lindsey: alin1@panix.com

PAM - A Text-To-Speech Application

Platform: Windows
Description:
PAM is a talking personal assistant and text reader application. It uses the
ProVoice TTS package. PAM will verbally advise about appointments and
reminder messages at specified times during the day. It can read text files,
clipboard text, and text sent in DDE messages. Using the full verbal interface,
PAM can be used by visually challenged individuals. Shareware - thirty day free
trial.

Requirements: Any Windows sound card, plus speakers or headphones.
Memory: 4 MB minimum, 8 MB recommended.
A more complete description is available on the JTS homepage
Availability:
The shareware and associated files can be downloaded by ftp
Price: $US40 for the registered version.
Contact: Tom Slemko, tslemko@islandnet.com
JTS Micro Consulting Ltd
10931 Lytton Road, RR#4
Ladysmith, B.C., Canada, V0R 2E0

ProVerbe Speech Engine for Windows (95 and NT)

Description: The ProVerbe Speech Engine produces natural-sounding speech from
written text. Naturalness is achieved by using the TD-PSOLA process from CNET
(France Telecom's research laboratory), which is based on the concatenation of
elementary speech units (including diphones). Supported languages are British
English, German, French, and Spanish. For multi-channel applications, Elan
Informatique also provides hardware platforms. Elan Informatique provides an SDK
reference document (sdken.exe, a self-extracting compressed archive in WinWord6
format).

Demo versions:
❍ Telephone demonstration: +33-61 17 6701

❍ Anonymous ftp
The directory includes the following demos.
❍ PVBSEDP.zip: French male voice (4.3MB)

❍ PVBFRF.zip: French female voice (4.6MB)

❍ PVBSPA.zip: Spanish male voice (4.6MB)

❍ PVBGER.zip: German male voice (14.0MB)

❍ PVBENG.zip: English male voice (9.9MB)

The directory also includes synthesis samples for a French male voice, French female
voice, English male voice, and a German male voice. The readme file in the directory
describes the memory requirements for the demos.
A CD-ROM with all these demonstrations is available. To request it, please email Elan
Informatique.

Contact: Elan Informatique
4 rue Jean Rodier, 31400 TOULOUSE FRANCE
Contact person: Pierre Delrat
Phone: +33-61-36-0777 Fax: +33-61-36-0770
BBS: +33-61-36-0788
E-mail: 101346.465@compuserve.com
Anonymous FTP

ProVoice Developer's Speech Toolkit from First Byte

Platform: ProVoice Developer's Toolkits are available for DOS, Windows 3.1, Windows
95, Windows NT, OS/2, and Macintosh.
Description:
ProVoice allows programmers to add synthesized speech to their applications.
Your program passes text strings to the ProVoice speech engine that translates
text into audible speech. Male and/or female "SpeechFonts" are available for
several languages: English, French, German, British English, Italian, and
Spanish.

ProVoice converts text to speech in two phases using a set of phonetic
translation and pronunciation rules. First, the software analyzes and translates
text into "sound descriptors", a phonetic language with the pitch, duration, and
amplitude codes needed to produce stress patterns in phrases and sentences.
Rules are used to analyze words, numbers, and punctuation. The second phase
converts the intermediate phonetic language into speech signals; algorithms
blend the distinct speech signals into smooth-flowing, continuous, clear speech.
Real-time synchronization of mouth movement and word boundaries allows
animation of a graphical talking character, or highlighting of displayed text
as it is spoken.
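
The two-phase structure described above can be illustrated with a toy Python
pipeline. Everything here is a hypothetical simplification: phase one maps each
letter to a placeholder "phoneme" and assigns pitch, duration, and amplitude using
two common prosodic rules (pitch falling across the phrase and a lengthened final
unit); phase two renders each descriptor as a plain sine tone. The names
`SoundDescriptor`, `phase_one`, and `phase_two` and all numeric values are
assumptions; ProVoice's actual rules, descriptor format, and signal generation are
not documented here.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class SoundDescriptor:
    phoneme: str       # placeholder unit (here simply a letter)
    pitch_hz: float
    duration_s: float
    amplitude: float


def phase_one(text: str, start_pitch: float = 140.0,
              end_pitch: float = 100.0) -> List[SoundDescriptor]:
    """Toy analysis: one descriptor per letter, with falling pitch and final lengthening."""
    letters = [c for c in text.lower() if c.isalpha()]
    descriptors: List[SoundDescriptor] = []
    for i, letter in enumerate(letters):
        frac = i / max(len(letters) - 1, 1)
        pitch = start_pitch + (end_pitch - start_pitch) * frac  # declination
        duration = 0.08
        if i == len(letters) - 1:
            duration *= 1.5  # phrase-final lengthening
        descriptors.append(SoundDescriptor(letter, pitch, duration, 0.5))
    return descriptors


def phase_two(descriptors: List[SoundDescriptor], rate: int = 8000) -> List[float]:
    """Toy rendering: each descriptor becomes a sine tone at its pitch."""
    samples: List[float] = []
    for d in descriptors:
        n = int(round(d.duration_s * rate))
        samples.extend(d.amplitude * math.sin(2 * math.pi * d.pitch_hz * k / rate)
                       for k in range(n))
    return samples
```

Keeping the intermediate descriptors as plain data is what makes the split useful:
the same descriptor list could drive a different synthesis back end, or be used to
time animation and text highlighting.
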

Necessary tools and examples are provided for programmers to use the
ProVoice speech technology, including installation instructions, extensive
sample programs, and complete documentation. In addition, sample code is
provided on disk to illustrate speech programming techniques.

Note 1: First Byte will perform custom work for embedded systems.

Note 2: ProVoice Windows includes support for the Microsoft SAPI. It will speak
through any Windows-supported wave audio device.

Note 3: Distribution of ProVoice for commercial use is subject to execution of a
Commercial Product Distribution License Agreement.

For more detailed information and examples go to the First Byte WWW page.
See also: Monologue for Windows from First Byte

Price and Availability:
Contact: First Byte
19840 Pioneer Ave., Torrance, CA 90503
Ph: 310-793-0610, Fax: 310-793-0611
Email: info@firstbyte.davd.com or WWW page.

RC Systems V8600/V8601 Text to Speech synthesizers

Platform 1: IBM PC (ISA card).
Platform 2: Interface to PC/104 standard microcontrollers.
Platform 3: Standalone (or embedded) through RS232, parallel printer port, or processor
bus.
Description: Converts plain ASCII text to speech. Programmable voices, pitch, rate,
volume, etc. Built-in DTMF and tone generators.
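
DTMF (touch-tone) signalling, which the V8600 generates in hardware, encodes each
key as the sum of two sine tones: one from a low-frequency row group and one from a
high-frequency column group. The Python sketch below uses the standard DTMF
frequency table to generate the samples for one key; the sample rate, duration, and
equal amplitudes are illustrative assumptions, and the sketch says nothing about how
RC Systems implemented its generator.

```python
import math
from typing import List, Tuple

# Standard DTMF keypad: row (low-group) and column (high-group) frequencies in Hz.
ROWS = (697, 770, 852, 941)
COLS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str) -> Tuple[int, int]:
    """Return the (low, high) frequency pair for a single keypad character."""
    if len(key) != 1:
        raise ValueError(f"expected a single key, got {key!r}")
    for r, row in enumerate(KEYPAD):
        c = row.find(key)
        if c >= 0:
            return ROWS[r], COLS[c]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_tone(key: str, duration_s: float = 0.1, rate: int = 8000) -> List[float]:
    """Sum of the two sine tones for `key`, each at half amplitude."""
    low, high = dtmf_frequencies(key)
    n = int(round(duration_s * rate))
    return [0.5 * math.sin(2 * math.pi * low * k / rate)
            + 0.5 * math.sin(2 * math.pi * high * k / rate)
            for k in range(n)]
```

For example, key "5" combines 770 Hz and 1336 Hz.
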
Price: $151-$299 US (qty 1)
Contact: RC Systems
1609 England Avenue, Everett, WA 98203, USA
Ph: (206) 355-3800 Fax: (206) 355-1098
Europe: +44181 539-0285

rsynth

Platform: Various (including Solaris 2.3, SunOS 4.1.3, HP-UX, SGI IRIX 4.x, Linux)
Description: Public domain text-to-speech system assembled from a variety of sources.
It supports CMU and BEEP format dictionaries (as described in Q1.10) and now utilises
stress marks in the dictionary when synthesising intonation.
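
In the CMU Pronouncing Dictionary format, each vowel phoneme carries a stress digit
(0 = unstressed, 1 = primary stress, 2 = secondary stress), which is the kind of
dictionary stress information described above. The Python sketch below only
extracts that stress pattern from one dictionary line; the function name
`parse_cmu_entry` is hypothetical, and how rsynth maps stress onto pitch is not
shown or assumed here.

```python
from typing import List, Tuple


def parse_cmu_entry(line: str) -> Tuple[str, List[str], List[int]]:
    """Split a CMU-style line into word, phonemes (stress digit removed), and vowel stresses."""
    word, *phones = line.split()
    phonemes: List[str] = []
    stresses: List[int] = []
    for phone in phones:
        if phone[-1].isdigit():
            # Vowels end in a stress digit: record it and strip it from the phoneme.
            stresses.append(int(phone[-1]))
            phonemes.append(phone[:-1])
        else:
            phonemes.append(phone)
    return word, phonemes, stresses
```

For a line such as `HELLO  HH AH0 L OW1`, the stress pattern is `[0, 1]`, marking
the second syllable as the one to receive the primary accent.
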

Price: Free
Misc: Axel Belinfante has implemented a WWW rsynth demo
Availability: anonymous ftp #1 or anonymous ftp #2

SENSYN speech synthesizer

Platform: PC, Mac, Sun, and NeXT
Rough Cost: $300
Description:
This formant synthesizer produces speech waveform files based on Klatt's
KLSYN88 synthesizer. It is intended for laboratory and research use. Note that
this is NOT a text-to-speech synthesizer: it creates speech sounds from a large
number of input variables (formant frequencies, bandwidths, glottal pulse
characteristics, etc.) and would be used as one component of a TTS system.
Includes full source code.
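
The basic building block of a Klatt-style formant synthesizer is a second-order
digital resonator controlled by a formant frequency F and a bandwidth BW. The
Python sketch below follows the resonator difference equation from Klatt's 1980
cascade/parallel synthesizer design: with sample period T, C = -exp(-2*pi*BW*T),
B = 2*exp(-pi*BW*T)*cos(2*pi*F*T), and A = 1 - B - C. It is a single resonator
for illustration, not SENSYN's code, and the 10 kHz sample rate is an assumption.

```python
import math
from typing import Iterable, List


def resonator(signal: Iterable[float], freq_hz: float, bw_hz: float,
              rate: int = 10000) -> List[float]:
    """Klatt-style second-order resonator: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""
    t = 1.0 / rate
    c = -math.exp(-2.0 * math.pi * bw_hz * t)
    b = 2.0 * math.exp(-math.pi * bw_hz * t) * math.cos(2.0 * math.pi * freq_hz * t)
    a = 1.0 - b - c  # normalises the gain at 0 Hz to exactly 1
    y1 = y2 = 0.0    # previous two outputs
    out: List[float] = []
    for x in signal:
        y = a * x + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out
```

A formant synthesizer chains several such resonators (one per formant) in cascade,
or runs them in parallel with separate gains, and drives them with a glottal
source; narrowing `bw_hz` makes the resonance ring longer and peak more sharply.
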

Availability: Sensimetrics Corporation
64 Sidney Street, Cambridge MA 02139.
Fax: (617) 225-0470; Tel: (617) 225-2442.
Email: sensimetrics@sens.com

SGI Developers Toolbox Synthesiser

Platform: SGI
Description: The SGI Developer Toolbox 4.0 CDROM contains a basic public domain
text-to-speech program in the publics/speak directory. The directory includes man
pages and source.

Availability: on the SGI Developer Toolbox 4.0 CDROM

SIMTEL

A wide range of speech related software, sound-blaster software and signal processing
software for PCs is available on SimTel and its mirror sites. It can be obtained by ftp
from:

● ftp://www.cdrom.com/pub/simtelnet/msdos/sound/

Note: Voicemaker - The archives include the program Voicemaker, which synthesises
speech.

GOOD HUNTING AND ENJOY!
