Initial enrolment
SPEECH & LANGUAGE THERAPY IN PRACTICE SPRING 2004 +8
As initial set-up proceeds, the user is requested to progressively introduce
samples of their voice to train the software in the particular characteristics
of their voice. This is usually called the enrolment process.
This area of set-up gave difficulty to some dysarthric users. Early versions
(such as release 8 of IBM ViaVoice) demanded complete and correct reading of
the prompting texts on the screen, but some versions (including release 10, and
all versions of Naturally Speaking that we examined) allowed the user to skip
sections that were proving difficult, and progressed to a point where sufficient
data was achieved. We therefore recommend the more flexible enrolment.
Where enrolment still proved difficult, even after trying the various
alternative texts provided, we could change the actual text to be read.
Figure 2 indicates where this helped, as shown by the data labelled
'tailored enrol'.
When the speech recognition system is being used, the software is continually
referring back to a vocabulary database held in the system. This database is
generally a large and reasonably complete vocabulary as supplied with the
system and is adequate for typical use. Associated features that can be of
value to dysarthric users are:
1. Different variants such as US or UK English can be selected.
2. Different supplementary databases such as legal or medical can be invoked,
according to the application area of the users.
3. It is straightforward to add vocabulary. Words and phrases, specific to users,
are added either by specifying (and speaking) particular items, or by letting
the software analyse typical documents relevant to the user.
4. As the system is being used, the opportunity is available for correcting mistakes
in recognition. When this is done as instructed, it has the effect of improving
accuracy of subsequent recognition.
However, where dysarthria is significantly affecting the ability of the software
to recognise the speech, an alternative strategy may be of value. If the
vocabulary database is deliberately reduced to a minimum set of words used by
the speaker in particular restricted circumstances, it can give the automatic
speech recognition a better chance of selecting appropriate matches to the
spoken word. Figure 2 indicates where this helped, as shown by the data
labelled 'reduced vocab', but it does place an equivalent constraint on the
use to which the software can be effectively applied.
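The effect of a reduced vocabulary can be illustrated with a toy sketch. It is
not how the commercial packages work internally: text similarity from Python's
standard difflib module stands in for acoustic matching, and the word lists and
the misheard token "wata" are invented purely for illustration.

```python
from difflib import get_close_matches

# Hypothetical word lists; a real recogniser's vocabulary is far larger.
FULL_VOCAB = ["what", "data", "waiter", "water", "help", "toilet"]
REDUCED_VOCAB = ["water", "help", "toilet"]


def best_match(heard, vocabulary):
    """Return the vocabulary word most similar to `heard`, or None if the list is empty."""
    # cutoff=0.0 makes every word a candidate; n=1 keeps only the closest one.
    matches = get_close_matches(heard, vocabulary, n=1, cutoff=0.0)
    return matches[0] if matches else None


heard = "wata"  # stand-in for an imprecise production of "water"
full_result = best_match(heard, FULL_VOCAB)
reduced_result = best_match(heard, REDUCED_VOCAB)
print(full_result, reduced_result)
```

With the full list a similar-looking competitor is chosen instead of "water";
with the reduced list the intended word is selected, at the cost that only
those few words can ever be recognised.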
All the systems examined are supplied with a series of commands built into
the vocabulary, which carry out specific actions rather than just counting
as dictation words. The obvious needs are for dictation commands such as
'newline', 'uppercase', 'fullstop' and so on. Further commands are especially
useful for navigation, including on Internet pages when using a web browser,
like 'back', 'forward', 'move down items' or 'close'. These features are of
potential value to disabled users, especially if keyboard ability is
difficult or limited.
With certain versions of the automatic speech recognition packages (usually
the higher specification / cost variants) it is possible for the user to add their
own commands and macros / shortcuts. These can be especially valuable to
disabled users, where a command can save considerable effort and time. For
example 'myaddress1' could insert the speaker's full name and address
automatically. Alternatively 'contact1' could generate all the text concerning
a wish to meet or contact at a particular address and time.
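The idea behind such macros can be sketched as a simple lookup table. This is
only an illustration of the concept, not the macro facility of any particular
package; the function name expand_macros and the stored texts are hypothetical.

```python
# Hypothetical macro table: spoken command name -> text to insert.
MACROS = {
    "myaddress1": "A. N. Other, 1 Example Street, Lancaster",
    "contact1": "I would like to arrange a meeting. Please contact me.",
}


def expand_macros(dictated, macros=MACROS):
    """Replace every word that is a macro name with its stored text."""
    return " ".join(macros.get(word, word) for word in dictated.split())


print(expand_macros("Please reply to myaddress1"))
```

A single spoken word here stands in for a whole line of text, which is where
the saving in effort comes from.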
You will need to examine closely specific details of all these features in the
specifications or manuals for the package in question.
Ongoing improvement
All the systems have a range of features to enable ongoing improvement of
the vocabulary databases as the system is being used. As more recognition is
being carried out, especially in the creation of letters and documents, the
correction of mistakes / misrecognitions can be set to build up improvement
to recognition. Documents being produced can also be used to build up better
automatic speech recognition knowledge of the speaker.
In general, you have to deliberately activate the above features, and
corrections to recognition mistakes have to be carried out in a particular way.
Specific documents have to be triggered to contribute to vocabulary update.
We recommend the use of these features where practical for the users.
Also, as users become more familiar with the speech input capability, they can
progressively integrate the automatic speech recognition with other applications
on their computer. Spreadsheets, internet browsers, and even graphics packages
can be speech enabled to give facilities especially valuable to a disabled user.
If you have a significant level of technical competence with computers, you
can make further changes and tailoring to enrolment and vocabulary. Be
warned, though, that incorrect changes at this level could disrupt the operation
of the software and the computer system as a whole, and can cause
considerable difficulty and frustration.
1. Changes to enrolment
For some dysarthric users, the available enrolment texts were not usable.
There was difficulty with certain words, or difficulty maintaining concentration
and interest in an enrolment process that was potentially very tiring. This
meant that enrolment was difficult, and in some cases not possible.
As an extreme test of this issue, we changed the enrolment text to a series
of much simpler texts used by speech and language therapists in the course
of their work. Figure 2 shows how this enabled successful enrolment for
speakers designated moderate 5 and moderate 6. The simplified enrolment
text also gave a better recognition performance for moderates 1, 2, 3 and 8,
although there was probably a limiting of the range of trained speech.
In another test, the text was changed to a passage more familiar to the
user, enabling a more relaxed and representative enrolment. The speaker
designated mild 2 in Figure 2 shows the improvement in this case.
The change to enrolment text was carried out on the Scansoft/Dragon
Naturally Speaking packages, versions 5 and 6. It requires an initial search
to find the file containing an existing enrolment text (a datan.bin file),
followed by a series of cut, paste and re-save operations in the Microsoft
Notepad editor to replace it with the alternative text.
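Because incorrect changes at this level can disrupt the software, it is prudent
to copy the file before editing it. The following is a minimal sketch of such a
safeguard, not part of the procedure we used; the function names are
hypothetical and the path of the enrolment file must be found on your own system.

```python
import shutil
from pathlib import Path


def backup_file(path):
    """Copy `path` to a sibling file with '.bak' appended and return the copy's path."""
    source = Path(path)
    backup = source.with_name(source.name + ".bak")
    shutil.copy2(source, backup)  # copy2 also preserves file timestamps
    return backup


def restore_file(path):
    """Overwrite `path` with its '.bak' copy, undoing edits made since the backup."""
    source = Path(path)
    backup = source.with_name(source.name + ".bak")
    shutil.copy2(backup, source)
    return source
```

Call backup_file on the enrolment text file before opening it in the editor,
and restore_file on the same path if the edited version causes problems.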
When these changes to enrolment are used, it is also important to watch for
any potential omissions in the vocabulary and to use the automatic speech
recognition vocabulary editor to fill any omissions relevant for the typical usage.
2. Changes to vocabulary
Automatic speech recognition systems come with various editors to enable
basic and more involved adjustment to vocabulary. You should consult the
relevant instruction manuals for more information.
To evaluate the potential benefit of reducing vocabulary, we re-established the
speakers designated moderate 6, 7 and 8 with a vocabulary limited to the simple
texts. This meant that, when using the software, the choices of vocabulary and
phrases likely to be spoken by that speaker were deliberately bounded. The
recognition tests were then also limited to those words and phrases.
Figure 2 shows the improvements in recognition accuracy achieved, but at
the cost of the heavily reduced scope of usage that is a consequence of the
limited vocabulary. For some dysarthric users, this situation may be regarded
as a benefit and improvement.
Figure 2 Enhanced recognition performance (legend: standard use, tailored
enrol, reduced vocab, cannot enrol)
The vocabulary reduction in this test instance was achieved by using an
enrolment option with empty vocabulary, for example in Naturally Speaking,
which is intended for specialist users to add specific vocabularies.
Some items of basic vocabulary are defined in the system so that they cannot
be deleted. We then had to ensure that the words of vocabulary needed for
the targeted use were added by analysing the relevant text documents.
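The check for omitted vocabulary can be sketched as follows. This is an
illustration only, assuming the relevant documents are available as plain text;
the packages' own vocabulary editors do this work in practice, and the function
names and sample text are hypothetical.

```python
import re
from collections import Counter


def words_in(text):
    """Return a Counter of the lower-cased words found in `text`."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def missing_words(text, vocabulary):
    """Return, in sorted order, the words used in `text` that `vocabulary` lacks."""
    known = {word.lower() for word in vocabulary}
    return sorted(set(words_in(text)) - known)


sample = "Please bring my tablets. Please bring water."
print(missing_words(sample, ["please", "bring", "my"]))
```

The words reported as missing are the ones that would need to be added before
the reduced vocabulary could cover the targeted use.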
We have summarised guidelines for the use of automatic speech recognition
by dysarthric speakers in figure 3. Our work suggests that current
commercially available automatic speech recognition products can be viable for
mild or moderate dysarthric users. This applies, to a reasonable extent, to
even the basic lowest cost options of the software (around £30).
Peter Roberts and Malcolm Joyce are based at Lancaster University, Lancaster
LA1 4YR. Claire Philpott is a speech and language therapist with Morecambe
Bay NHS Primary Care Trust, Lancaster LA1 4JT.
Please note that we give guidelines in the context of the software examined,
and to a level of features considered appropriate. This should not be taken
as a definitive recommendation, or a criticism, of any specific manufacturer's
package. The manufacturers and distributors involved are very willing to give
help and advice to disabled users, in order to assist their use of the products.
For the easiest response, contact them by email rather than telephone.
References
Ahmed, W.W. (1985) Computer recognition of cerebral palsy speech. Proc
speech tech conf, 205-209.
Blaney, B. & Wilson, J. (2000) Acoustic variability in dysarthria and computer
speech recognition. Clinical Linguistics & Phonetics, 14(4): 307-327.
Doyle, P.C., Lepper, H.A., Kotler, A., Thomas-Stonell, N., Oneil, C., Dylke, M. & Rolls,
K. (1997) Dysarthric speech: a comparison of computerised speech recognition
and listener intelligibility. Jour rehabilitation research & dev 34(3): 309-316.
Ferrier, L.J., Shane, H.C., Ballard, H.F., Carpenter, T. & Benoit, A. (1995)
Dysarthric speakers' intelligibility and speech characteristics in relation to computer
speech recognition. Augmentative & Alternative Communication 11: 165-174.
Holmes, J. & Holmes, W. (2001) Speech Synthesis and Recognition. Taylor &
Francis, ISBN 0 748 408576.
Roberts, P.E. (2002) Speech Recognition Technology for Dysarthric Speech
(pp243-248). In: Advances in Communications and Software Technologies,
WSEAS, ISBN 960 8052 71 8.
Thomas-Stonell, N., Kotler, A-L., Lepper, H.A. & Doyle, P.C. (1998) Computerised
speech recognition: influence of intelligibility and perceptual consistency on
recognition accuracy, Augmentative & Alternative Communication 14: 51-56.
Resources
AbilityNet at Warwick UK, tel 01926 312847 (http://www.abilitynet.co.uk)
Lancaster University, contact pe.roberts@lancaster.ac.uk
Step by Step
Work on the StepByStep software for people with aphasia is
progressing. The developers welcome contact from anyone interested
in becoming a beta tester.
Jane Mortley, Steps Consulting Limited, e-mail
jpmortley@btinternet.com.
Get animated
New items in the 2004 Don Johnston Special Needs sourcebook
include PCS Animations, aimed at helping pupils to learn about verb
meanings and tenses.
The animated Picture Communication Symbols include washing dishes
and brushing your teeth. They can be imported into other programs
such as Speaking Dynamically Pro, Clicker 4 and BuildAbility.
PCS Animations is £78+VAT, tel. 01925 256500.
Stammering Research
The British Stammering Association is to publish an on-line
international journal, dedicated to the furtherance of research into
stammering. The editor is Peter Howell of the Department of
Psychology at University College London.
Stammering Research, see www.stammering.org/research.html.
2004 Directory
The Contact a Family Directory of Specific Conditions and Rare
Disorders 2004 includes 30 new entries. Every entry contains a
medical description of the condition with details of inheritance
patterns and pre-natal diagnosis, and relevant support organisations.
Print edition £35, tel. 0207 608 8700, CD-ROM £88.13 (single user),
www.cafamily.org.uk.
Reflections
Do I consider both positive and negative effects when planning change?
Do I read instructions to ensure I make the best use of a product?
Do I access help in the voluntary and academic sectors?
Figure 3 Guideline summary
Key points
use the microphones provided, and follow set-up procedures carefully
ensure adequate preparation and understanding of the features before starting
take particular care and patience with enrolment
set up more than one user per person to manage variability
improve the vocabulary database
balance features and changes against potential complexity of use.
Specialist adaptations
try simplified enrolment to aid initial set up
modify the vocabulary database in a way appropriate to the user.
Afasic Helpline
A leaflet outlining the services offered by the Afasic Helpline includes
the opening times, recruitment of volunteers and the helpline
complaints procedure.
The confidential service provides support and information to everyone,
particularly parents, to enable them to secure appropriate help for
their children with speech, language and communication needs.
Copies from Afasic, tel. 020 7490 9410.
BSL software
Let's Sign & Write is a new development to support British Sign
Language (BSL) in education as a separate and equal language.
Over 700 BSL graphics, in both plain line drawings and full colour, can
be used with Widgit's Writing with Symbols software, or in other
programs such as Microsoft Word and desk top publishing programs.
You can use them to create individualised materials, either with signs
alone, or with symbols, finger spelling and text.
A guide book includes information on supporting signers, a glossary
of the signs and ideas for creating resources.
Let's Sign & Write by Cath Smith is published by Widgit Software, tel. 01223
425558, www.widgit.com, £35 single user, £45 single geographical site.
Understanding Asperger's syndrome
The author of Autism and Creativity: Is there a link between
autism in men and exceptional ability? hopes his book will help
people with autism be more understood and included in a society in
which everyone has strengths and weaknesses.
Michael Fitzgerald (Henry Marsh Professor of Child and Adolescent
Psychiatry at Trinity College Dublin) suggests that several high profile
men, including Socrates, Lewis Carroll, Keith Joseph and Eamon de
Valera, could have had Asperger's syndrome.
Pub. Brunner-Routledge, £29.99.