


You talk - but what does it type?

Most people are familiar with the idea of using a keyboard to type up a document, send an e-mail or play games on a computer. Automatic speech recognition software can also be used to do these tasks; as the adverts say, you talk, it types. But can this work when the user has a speech difficulty such as dysarthria? Peter Roberts, Malcolm Joyce and Claire Philpott find out...

Read this if you want to find out how clients can benefit from new communication tools, how technology can be adapted to enable wider access, and how to choose software with appropriate features.

left to right: Malcolm Joyce, Claire Philpott, Peter Roberts

Over the past 50 years, computer technology for automatic speech recognition has advanced significantly. Low-cost commercial products are now available with good performance in continuous speech recognition for people without speech difficulties. Such software can give a new dimension to the ability to use computers, including office tools, internet access and even games.
We have been investigating the viability of automatic speech recognition use by people with dysarthric speech difficulties. The key issues are the capability of the software to adapt to the particular characteristics of the dysarthric speech, and the degree to which the recognition can tolerate an increased variability of certain characteristics of the dysarthric speech (Thomas-Stonell et al, 1998; Blaney & Wilson, 2000).
From limited recognition of a small vocabulary of single words or digits in the 1950s, speech research moved to new concepts in the matching of input speech with stored speech databases in the 1990s. Statistical modelling and matching methods enabled the development of a series of viable products, which gave continuous speech recognition with progressively increasing vocabularies. Newer product technologies with what are called Hidden Markov models, together with neural nets, allow effective and rapid user training of the inbuilt speech databases. This personalisation gives a high level of recognition accuracy (95 per cent) across many speakers.
Earlier research with people with dysarthria was restricted by technology in terms of both cost and performance. However, even with the early template-based solutions, successful recognition of dysarthric speech could be achieved, with closely limited vocabulary and tailoring of the speech model (Ahmed, 1985).
Ferrier et al (1995) showed good recognition in trials with one, and then ten, dysarthric speakers, using early software. A panel of listeners also scored the intelligibility of the speakers. The work demonstrated the value and effectiveness of the Hidden Markov model learning/training process, where recognition accuracy increased from 30 per cent to 90 per cent over three sessions. The poorer speakers displayed more variability in their speech, including that caused by fatigue. This demanded longer automatic speech recognition training sessions but, ultimately, levels of recognition similar to those for moderate or mild dysarthria were achieved. Further investigation of the causes and characteristics of this variability would be useful, especially with larger numbers of speakers.
Doyle et al (1997) also endorsed the effectiveness of the automatic speech recognition learning with the newer software products (six speakers: two mild, two moderate, two severe), especially for severe dysarthria where, after six sessions, the automatic speech recognition was still displaying learning improvements.
Increasing viability
In the UK, research continues with STARDUST at Sheffield University and various projects at Frenchay Hospital, Bristol. Our current work (Roberts, 2002) has further endorsed the increasing viability of these software packages, certainly for mild and some moderate dysarthria, and we have developed specific guidelines to help the dysarthric user and their carers in the most effective configuration and use of the facilities.
The computers and operating systems now available are well established and give good value for money. As they are being marketed in a competitive situation, designers are continually looking to give equivalent and improved facilities. This means there is considerable ongoing change and update to the products, with the risk of compatibility problems and changes to appearance and procedures. Different versions, or releases, of software do change aspects that are of particular relevance to the dysarthric user. A good example is the improvement in flexibility of the IBM speech training/enrolment process from release 8 to release 10. Further software releases may include changes to the advantage of people with dysarthria. However, a progressively increasing capability has a downside of potentially increased complexity of use or compatibility problems that may be a particular disadvantage for disabled users. Users need to maintain an awareness of these possibilities when acquiring particular systems and ensure the use of appropriate professional advice as necessary.
It is especially important that users and carers avoid the frustration and errors of an impatient, unprepared start, and build up an understanding of the features of automatic speech recognition systems. A range of self-training tools is included with the software, and specialist training is also available.
In our research we used proprietary software packages from IBM (ViaVoice) and ScanSoft (Dragon Naturally Speaking) with Pentium 3 and 4 computer processors, and Microsoft Windows 98SE, 2000 and XP. The conditions of the various speakers involved in the research are shown in table 1.
Table 1 Speakers involved in the research
patient ID   severity   M/F      condition
mild1        mild       male     Kennedy syndrome, slowly progressive
mild2        mild       male     cerebellar ataxia, slowly progressive
moderate1    mod        male     cerebellar ataxia, slowly progressive
moderate2    mod        male     stroke, static
moderate3    mod        male     multiple systems atrophy, slowly progressive
moderate4    mod        male     Kennedy syndrome, slowly progressive
moderate5    mod        male     cerebellar ataxia, slowly progressive
moderate6    mod        female   stroke, static
moderate7    mod        male     cerebellar ataxia, slowly progressive
moderate8    mod        male     stroke, improving
severe1      severe     male     cerebral palsy, static
severe2      severe     male     head injury, static
severe3      severe     male     stroke, static
control1     control    male     -
control2     control    female   -
control3     control    male     -
In the initial analysis (figure 1), people with mild and some with moderate conditions were able to make some use of the software. However, all of these moderates had recognition performances of less than 50 per cent. Also, the remaining moderates and the severes could not use the systems because they were not able to enrol (the initial process to enable the systems to recognise the specific voice). However, as you will see, we were able to follow a series of procedures that improved on this initial recognition performance.
Figure 1 Initial recognition performance
Satisfactory user speech input was achieved with the small headset and microphone supplied with the packages. The instructions included in the package guide the fitting and positioning of the headset. These microphones gave good tolerance of the effects of ambient noise and the acoustic characteristics of the surroundings. Some trials with hand-held or desk-mounted microphones were less successful, as the potentially improved quality of the microphone was offset by increased susceptibility to the effects of the surroundings. We therefore recommend use of the supplied microphones.
Once the software is installed, an initial set-up process is carried out. This enables the adjustment, usually automatic, of microphone loudness levels according to the particular user.
Unique user set
At this stage a unique user set of information is initiated. A name is chosen for this user, and subsequently re-selected as required in the future. The variability of some dysarthric speakers can usefully be managed and accommodated by the user identification, as there is nothing stopping one user progressively establishing him/herself with more than one user information set. For example, if the person's voice has differences between early morning and later in the day, with fatigue affecting loudness or pitch, then two user names would be set up. Careful choice of names for each user set helps management and avoids later confusion.

As initial set-up proceeds, the user is asked to progressively introduce samples of their voice, to train the software in its particular characteristics. This is usually called the enrolment process.
This area of set-up gave difficulty to some dysarthric users. Early versions
(such as release 8 of IBM ViaVoice) demanded complete and correct reading of
the prompting texts on the screen, but some versions (including release 10, and
all versions of Naturally Speaking that we examined) allowed the user to skip
sections that were proving difficult, and progressed to a point where sufficient
data was achieved. We therefore recommend the more flexible enrolment.
Where enrolment was still proving difficult (even after trying the various
alternative texts provided) we had the possibility of changing the actual text
to be read. Figure 2 provides an indication of where this helped as shown by
the data labelled tailored enrol.
When the speech recognition system is being used, the software is continually
referring back to a vocabulary database held in the system. This database is
generally a large and reasonably complete vocabulary as supplied with the
system and is adequate for typical use. Associated features that can be of
value to dysarthric users are:
1. Different variants such as US or UK English can be selected.
2. Different supplementary databases such as legal or medical can be invoked,
according to the application area of the users.
3. It is straightforward to add vocabulary. Words and phrases, specific to users,
are added either by specifying (and speaking) particular items, or by letting
the software analyse typical documents relevant to the user.
4. As the system is being used, the opportunity is available for correcting mistakes
in recognition. When this is done as instructed, it has the effect of improving
accuracy of subsequent recognition.
However, where dysarthria is significantly affecting the ability of the software
to recognise the speech, an alternative strategy may be of value. If the vocab-
ulary database is deliberately reduced to a minimum set of words used by the
speaker in particular restricted circumstances, it can give the automatic
speech recognition a better chance of selecting appropriate matches to the
spoken word. Figure 2 indicates where this helped, as shown by the data
labelled reduced vocab, but it does place an equivalent constraint on the
use to which the software can be effectively applied.
All the systems examined are supplied with a series of commands built in to the vocabulary, which carry out specific actions rather than simply counting as dictation words. The obvious needs are commands for speech dictation, such as newline, uppercase, full stop and so on. Further commands are especially useful for navigation, including on internet pages when using a web browser, such as back, forward, move down or close. These features are of potential value to disabled users, especially if keyboard use is difficult or limited.
With certain versions of the automatic speech recognition packages (usually
the higher specification / cost variants) it is possible for the user to add their
own commands and macros / shortcuts. These can be especially valuable to
disabled users, where a command can save considerable effort and time. For example, myaddress1 could insert the speaker's full name and address automatically. Alternatively, contact1 could generate all the text concerning a wish to meet or make contact at a particular address and time.
You will need to examine closely specific details of all these features in the
specifications or manuals for the package in question.
Ongoing improvement
All the systems have a range of features to enable ongoing improvement of
the vocabulary databases as the system is being used. As more recognition is
being carried out, especially in the creation of letters and documents, the
correction of mistakes / misrecognitions can be set to build up improvement
to recognition. Documents being produced can also be used to build up better
automatic speech recognition knowledge of the speaker.
In general, you have to deliberately activate the above features, and cor-
rections to recognition mistakes have to be carried out in a particular way.
Specific documents have to be triggered to contribute to vocabulary update.
We recommend the use of these features where practical for the users.
Also, as users become more familiar with the speech input capability, they can
progressively integrate the automatic speech recognition with other applications
on their computer. Spreadsheets, internet browsers, and even graphics packages
can be speech enabled to give facilities especially valuable to a disabled user.
If you have a significant level of technical competence with computers, you
can make further changes and tailoring to enrolment and vocabulary. Be
warned, though, that incorrect changes at this level could disrupt the operation
of the software and the computer system as a whole, and can cause consid-
erable difficulty and frustration.
1. Changes to enrolment
For some dysarthric users, the available enrolment texts were not usable.
There was difficulty with certain words, or difficulty maintaining concentration
and interest in an enrolment process that was potentially very tiring. This
meant that enrolment was difficult, and in some cases not possible.
As an extreme test of this issue, we changed the enrolment text to a series
of much simpler texts used by speech and language therapists in the course
of their work. Figure 2 shows how this enabled successful enrolment for
speakers designated moderate5 and moderate6. The simplified enrolment
text also gave a better recognition performance for moderates 1, 2, 3 and 8,
although there was probably a limiting of the range of trained speech.
In another test, the text was changed to a passage more familiar to the
user, enabling a more relaxed and representative enrolment. Speaker desig-
nated mild2 in figure 2 shows the improvement in this case.
The change to enrolment text was carried out on the ScanSoft Dragon Naturally Speaking packages, versions 5 and 6. Initial searching is required to find the file containing an existing enrolment text (a datan.bin file); a series of cut, paste and re-save operations in the Microsoft Notepad editor then replaces it with the alternative text.
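For those comfortable with this kind of file change, the cut-and-paste operation can also be scripted. The sketch below is an illustration only, not part of any package: it assumes (as we found in the versions examined) that the enrolment text is held as readable text in the file, and the function name, path and encoding shown are our own assumptions for a particular installation. Always keep a backup, as an incorrect change can disrupt the software.

```python
# Illustrative sketch only: swap the enrolment text in the enrolment
# file for a simpler passage. Assumes the file holds plain, readable
# text; the path and encoding are assumptions, not package
# documentation. A backup of the original is made first.
import shutil
from pathlib import Path

def replace_enrolment_text(enrol_file: Path, new_text: str) -> Path:
    """Back up enrol_file, then overwrite its contents with new_text."""
    backup = enrol_file.with_name(enrol_file.name + ".bak")
    shutil.copy2(enrol_file, backup)  # keep the original safe
    enrol_file.write_text(new_text, encoding="latin-1")
    return backup

# Hypothetical usage (the actual location depends on the installation):
# replace_enrolment_text(Path(r"C:\...\datan.bin"), "The sun was shining...")
```

If the result misbehaves, restoring the .bak copy returns the package to its previous state.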
When these changes to enrolment are used, it is also important to watch for
any potential omissions in the vocabulary and to use the automatic speech
recognition vocabulary editor to fill any omissions relevant for the typical usage.
2. Changes to vocabulary
Automatic speech recognition systems come with various editors to enable
basic and more involved adjustment to vocabulary. You should consult the rel-
evant instruction manuals for more information.
To evaluate the potential benefit of reducing vocabulary, we re-established the speakers designated moderate6, moderate7 and moderate8 with a vocabulary limited to the simple texts. This meant that, when using the software, the choices of vocabulary and phrases likely to be spoken by that speaker were deliberately bounded. The recognition tests were then also limited to those words and phrases.
Figure 2 shows the improvements in recognition accuracy achieved, but at
the cost of the heavily reduced scope of usage that is a consequence of the
limited vocabulary. For some dysarthric users, this situation may be regarded
as a benefit and improvement.
Figure 2 Enhanced recognition performance

The vocabulary reduction in this test instance was achieved by using an enrolment option with an empty vocabulary, for example in Naturally Speaking, which is intended for specialist users to add specific vocabularies.
Some items of basic vocabulary are defined in the system so that they cannot
be deleted. We then had to ensure that the words of vocabulary needed for
the targeted use were added by analysing the relevant text documents.
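Preparing the list of words to add need not be done by hand. The packages provide their own document-analysis tools; purely as an illustration, a short helper script (hypothetical, not part of any package) can pull out the unique words in a user's typical documents so they can be reviewed before being added through the vocabulary editor.

```python
# Hypothetical helper: collect the unique words appearing in a user's
# sample documents, ready for review and addition through the ASR
# package's own vocabulary editor.
import re
from pathlib import Path

def unique_words(*doc_paths: str) -> list[str]:
    """Return the sorted, lower-cased unique words in the documents."""
    words: set[str] = set()
    for p in doc_paths:
        text = Path(p).read_text(encoding="utf-8", errors="ignore")
        words.update(w.lower() for w in re.findall(r"[A-Za-z']+", text))
    return sorted(words)
```

Run over the simple texts used for tailored enrolment, this gives a checklist of exactly the vocabulary the reduced set must contain.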
We have summarised guidelines for the use of automatic speech recognition
by dysarthric speakers in figure 3. Our work suggests that current commer-
cially available automatic speech recognition products can be viable for mild
or moderate dysarthric users. This applies, to a reasonable extent, even to the basic lowest-cost options of the software (around £30).
Peter Roberts and Malcolm Joyce are based at Lancaster University, Lancaster
LA1 4YR. Claire Philpott is a speech and language therapist with Morecambe
Bay NHS Primary Care Trust, Lancaster LA1 4JT.
Please note that we give guidelines in the context of the software examined,
and to a level of features considered appropriate. This should not be taken
as a definitive recommendation, or a criticism, of any specific manufacturers
package. The manufacturers and distributors involved are very willing to give
help and advice to disabled users, in order to assist their use of the products.
For the quickest response, contact them by email rather than telephone.
Ahmed, W.W. (1985) Computer recognition of cerebral palsy speech. Proceedings of the Speech Tech Conference, 205-209.
Blaney, B. & Wilson, J. (2000) Acoustic variability in dysarthria and computer speech recognition. Clinical Linguistics & Phonetics 14(4): 307-327.
Doyle, P.C., Lepper, H.A., Kotler, A., Thomas-Stonell, N., O'Neil, C., Dylke, M. & Rolls, K. (1997) Dysarthric speech: a comparison of computerised speech recognition and listener intelligibility. Journal of Rehabilitation Research & Development 34(3): 309-316.
Ferrier, L.J., Shane, H.C., Ballard, H.F., Carpenter, T. & Benoit, A. (1995) Dysarthric speakers' intelligibility and speech characteristics in relation to computer speech recognition. Augmentative & Alternative Communication 11: 165-174.
Holmes, J. & Holmes, W. (2001) Speech Synthesis and Recognition. Taylor & Francis, ISBN 0 748 408576.
Roberts, P.E. (2002) Speech recognition technology for dysarthric speech. In: Advances in Communications and Software Technologies, pp243-248. WSEAS, ISBN 960 8052 71 8.
Thomas-Stonell, N., Kotler, A-L., Lepper, H.A. & Doyle, P.C. (1998) Computerised speech recognition: influence of intelligibility and perceptual consistency on recognition accuracy. Augmentative & Alternative Communication 14: 51-56.
AbilityNet at Warwick UK, tel 01926 312847.
Lancaster University, contact
Step by Step
Work on the StepByStep software for people with aphasia is
progressing. The developers welcome contact from anyone interested
in becoming a beta tester.
Jane Mortley, Steps Consulting Limited, e-mail
Get animated
New items in the 2004 Don Johnston Special Needs sourcebook
include PCS Animations, aimed at helping pupils to learn about verb
meanings and tenses.
The animated Picture Communication Symbols include washing dishes
and brushing your teeth. They can be imported into other programs
such as Speaking Dynamically Pro, Clicker 4 and BuildAbility.
PCS Animations is £78+VAT, tel. 01925 256500.
Stammering Research
The British Stammering Association is to publish an on-line
international journal, dedicated to the furtherance of research into
stammering. The editor is Peter Howell of the Department of
Psychology at University College London.
Stammering Research, see
2004 Directory
The Contact a Family Directory of Specific Conditions and Rare
Disorders 2004 includes 30 new entries. Every entry contains a
medical description of the condition with details of inheritance
patterns and pre-natal diagnosis, and relevant support organisations.
Print edition £35, tel. 0207 608 8700, CD-ROM £88.13 (single user),
Do I consider both positive and negative effects when planning change?
Do I read instructions to ensure I make the best use of a product?
Do I access help in the voluntary and academic sectors?
Figure 3 Guideline summary
Key points
use the microphones provided, and follow set-up procedures carefully
ensure adequate preparation and understanding of the features
before starting
take particular care and patience with enrolment
set up more than one user per person to manage variability
improve the vocabulary database
balance features and changes against potential complexity of use.
Specialist adaptations
try simplified enrolment to aid initial set up
modify the vocabulary database in a way appropriate to the user.
Afasic Helpline
A leaflet outlining the services offered by the Afasic Helpline includes
the opening times, recruitment of volunteers and the helpline
complaints procedure.
The confidential service provides support and information to everyone,
particularly parents, to enable them to secure appropriate help for
their children with speech, language and communication needs.
Copies from Afasic, tel. 020 7490 9410.
BSL software
Let's Sign & Write is a new development to support British Sign Language (BSL) in education as a separate and equal language.
Over 700 BSL graphics, in both plain line drawings and full colour, can be used with Widgit's Writing with Symbols software, or in other programs such as Microsoft Word and desktop publishing programs.
You can use them to create individualised materials, either with signs alone, or with symbols, finger spelling and text.
A guide book includes information on supporting signers, a glossary of the signs and ideas for creating resources.
Let's Sign & Write by Cath Smith is published by Widgit Software, tel. 01223 425558, £35 single user, £45 single geographical site.
Understanding Asperger's syndrome
The author of Autism and Creativity: Is there a link between autism in men and exceptional ability? hopes his book will help people with autism be better understood and included in a society in which everyone has strengths and weaknesses.
Michael Fitzgerald (Henry Marsh Professor of Child and Adolescent Psychiatry at Trinity College Dublin) suggests that several high profile men, including Socrates, Lewis Carroll, Keith Joseph and Eamon de Valera, could have had Asperger's syndrome.
Pub. Brunner-Routledge, £29.99.
