
First Impressions of ViaVoice

Continuous Dictation Software from IBM

by Roger Fletcher

Since speech recognition software for computers became available three years ago, the
Holy Grail for all manufacturers has been continuous speech dictation. The earlier versions
required the speaker to insert a short pause of at least one tenth of a second between words.
Although for many people this was an easy skill to acquire, it nevertheless posed a
psychological barrier for others. In only three years we have moved from an isolated word
dictation model requiring a dedicated adapter to a continuous speech model using an
industry standard sound card.
My own experience has been limited to IBM products, and my experience of continuous
speech products to the beta version of IBM ViaVoice UK English.
Believing that the only fair way to test new software is to use it in a real situation, I
recently used it to translate the Chinese radiation protection regulations signed by Premier Li
Peng himself. This amounted to a total of over 4,000 English words in translation, which I
accomplished in less than a day.
However, the experience was not without its problems, one of which was shedding the
ingrained habit of inserting pauses between words. Whenever I paused in this way, little extra
words were inserted in the gaps! When I spoke at normal conversational speed, the accuracy improved
greatly, but every time I paused to think, the problem occurred again. This was clearly going
to be a major problem with translation.
It then occurred to me that I had raced through the enrolment at great speed in order to be
able to begin the translation. Wondering if this had been a significant factor in the
performance, I later re-enrolled and tried dictating again. The problem had largely
disappeared! It is therefore clear that the manner of enrolment is of crucial importance, and
some guidance may be useful here. In the UK English version, there are 473 sentences to
read. This can be accomplished in two stages, 100 sentences followed by the remaining 373.
I recommend reading the sentences at a moderate and comfortable speed while taking care
to pronounce the words correctly. Clearly if we pronounce “and” as in Rock ’n’ Roll, we can
expect recognition problems.
However, it is likely that any continuous dictation product will be more sensitive to, for
example, an intake of breath at the beginning of a sentence. I then discovered that the old
style of isolated word speech also worked for difficult passages where much thought was
required. What did cause the insertion of extra words was hesitation and lingering over a
word. The program has learnt during enrolment how much time a particular word takes, and
if we take three times as long while thinking, it will clearly try to interpret this as several
words.
As an indication of performance, I am dictating this article at normal conversational speed,
and the first error occurred with the name of Li Peng, which is not entirely surprising. Nor is
it surprising that the names IBM and ViaVoice should appear correctly, but it is worth noting
that the Holy Grail was also recognised and correctly capitalised.
I have subsequently done several industrial chemistry translations, and the word urea
caused problems, coming out as either “your ear” or even worse “your rear.” However, after
a few corrections in context, the probability of correct recognition increased significantly.
Other delightful examples were “force for us” instead of “phosphorus” and, not
unreasonably, “potassium van a date” for “potassium vanadate.” After all, the previous
isolated speech models knew what constituted a word because of the silence before and after
the utterance, but continuous speech programs have no such obvious clue. Under the
circumstances, I am very pleasantly surprised at just how well it manages this task,
particularly with general vocabulary, which has already been thoroughly analysed in the
process of creating the program.
Before doing the chemical translations mentioned above, I decided to give the so-called
Vocabulary Expander a workout. I opened the Expander window and from the File menu
opened a text file I had created by concatenating half a dozen previous chemical translations.
I clicked on the Analyze menu item, and a drop-down list appeared showing the words in the
translation file that were not in the ViaVoice vocabulary. I selected those I wished to add by clicking with
the mouse while holding down the Ctrl key. After clicking the Add button, I was prompted
to record pronunciations of the added words as required. In the space of half an hour, I
added some 200 words.
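The analysis step described above, listing the words in a text that are missing from a known vocabulary, can be sketched roughly as follows. This is only an illustration in Python and not IBM's implementation: the function name out_of_vocabulary, the sample sentence, and the sample vocabulary are all hypothetical, and the sketch does nothing about recording pronunciations.

```python
import re


def out_of_vocabulary(text, known_words):
    """Return the words in text that are absent from known_words.

    Words are compared case-insensitively and reported once each,
    in the order in which they first appear.
    """
    known = {w.lower() for w in known_words}
    seen = set()
    missing = []
    # A "word" here is a run of letters, optionally joined by ' or -.
    for word in re.findall(r"[A-Za-z]+(?:['-][A-Za-z]+)*", text):
        key = word.lower()
        if key not in known and key not in seen:
            seen.add(key)
            missing.append(key)
    return missing


# Hypothetical sample data, not taken from ViaVoice.
sample_text = "Urea and potassium vanadate react; urea is recovered."
sample_vocabulary = ["and", "is", "potassium", "react", "recovered"]

candidates = out_of_vocabulary(sample_text, sample_vocabulary)
print(candidates)  # ['urea', 'vanadate']
```

Running a function of this kind over a file of earlier translations would give a candidate list comparable to the one from which I selected words to add.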
One final word about system requirements: the minimum specified by IBM is a Pentium
166 MMX with 32 MB RAM for Windows 95 or 48 MB RAM for Windows NT. I have
been using it on a Pentium Pro 200 with 32 MB RAM and Windows NT. While processing
the enrolment, I kept receiving messages about low system resources, but it limped through
to the end. After upgrading to 64 MB RAM, performance was very noticeably faster.
I have been an enthusiastic advocate of dictation software ever since it appeared, while
many others, doubtless correctly, regarded it as an immature technology. With that
experience behind me, I can now say that dictating general texts with ViaVoice is, in
comparison with previous versions, a much more pleasant and relaxed affair—and at one
tenth of the original price, very much better value for money. However, I have lingering
doubts as to whether the continuous speech product is yet a complete replacement for the
isolated word model when dictating highly technical texts.

Editor’s Note: Click here for the review of another dictation software package appearing in
this issue of the Translation Journal.

© Copyright Gabe Bokor 1997


URL: http://accurapid.com/journal/02dict1.htm
Updated 09/22/97
