VICUG-L Archives

Visually Impaired Computer Users' Group List

VICUG-L@LISTSERV.ICORS.ORG

Subject:
From:
Jamal Mazrui <[log in to unmask]>
Reply To:
VICUG-L: Visually Impaired Computer Users' Group List
Date:
Mon, 25 May 1998 13:41:08 -0800
Content-Type:
text/plain
Parts/Attachments:
text/plain (1785 lines)
Voice input technology is increasingly becoming a mainstream
commodity, which has ramifications for people with disabilities.
Below are the two issues to date (current first) of a consumer-
oriented newsletter on voice input/speech recognition, published
by the Center for Accessible Technology (http://www.el.net/CAT).

Regards,
Jamal

----------
 Voice Input Updates

produced by the Center for Accessible Technology

The Voice Input Update is a series of at least four quarterly
updates on the issues of voice recognition and people with
disabilities. The first year of the project is made possible by
a grant from the California Consumer Protection Foundation.

Articles are written by Paul Hendrix and Jane Berliss-Vincent.
Please request permission to reprint any part of this newsletter
by contacting the Center.

Voice Input Update #2 Spring 1998

Articles:

  * PowerSecretary: Mac Voice Input
  * Vendor Interviews: Clair Calhoon and Daniel Newman
  * Internet Surfing with SurfTalk and VoiceType Connection
  * Kurzweil's VOICEPlus
  * VoicePower has Limits
  * Writing vs Dictation (the personal experience of one staff
    person)
  * Current PC Voice Dictation Products (listing of
    requirements, features, cost)
  * Voice Input Demonstrations at CforAT

Voice Input Update #1 Winter 1997-98

Articles:

  * User Interview: Mark Hendrix
  * Discrete vs Continuous Speech
  * Focus on Dragon Systems
  * Hype Hype Hooray (Claims vs Reality)

Staying Current

People with disabilities have huge hopes for operating their
computers simply by speaking, hopes that may not match the
reality of current technology. Research has shown that
realistic expectations are the strongest predictor of success
for people with disabilities using voice input technology. As
voice input systems become more numerous, and commercial
advertising for these products becomes more pervasive, it
becomes increasingly important for consumers and those assisting
them to have clear information about this technology. Confusing
or misleading information can lead to consumer frustration and
failure even for those people for whom voice input may really be
the best solution.

This newsletter is part of a project of the Center for
Accessible Technology, made possible by a grant from the
California Consumer Protection Foundation. Our goal is to try to
reduce frustration for consumers with disabilities and prevent
the costly disappointment that occurs when a system is purchased
with inaccurate expectations. In addition to this newsletter, we
will be posting information on the Internet at
el.net/CAT/Vrnews, and offering special voice input
demonstrations. We encourage you to contact us to share
questions, comments, and information.

Voice Input Today

In no other area of assistive technology has recent development
been as dramatic as in the area of speech recognition. Having a
computer distinguish between different sounds and interpret them
as identifiable words is a very complex technical problem, and
one that requires powerful processors and large amounts of
computer memory. Recent advances in computer technology have
enabled developers of speech recognition products to achieve
results previously impossible on any but the largest mainframe
computers.

We have seen an amazing drop in the cost of speech recognition
products. Software superior to that costing several thousands of
dollars a few years ago is now available for a little over $100;
one DragonDictate program has dropped from $700 to $140 in the
past six months.

Although these advances have been remarkable, they can also be
confusing. A wider range of products is now available, with a
confusing range of capabilities, and the characteristics of
these programs are rapidly changing. In addition to being able
to understand the differences and similarities between the
various offerings, it remains important to recognize what can
and cannot be accomplished with a given program, or with speech
technology in general. Although speech recognition is becoming
more effective and easier to use in many areas, it is still not
necessarily a transparent, "out-of-the-box" solution for every
user.

----------
                  PowerSecretary: Mac Voice Input

Other than Apple's own PlainTalk software, which provided some
limited voice commands in the Macintosh operating system, and
some specialized single application programs (see article on
voice controlled Web browsers), PowerSecretary is the only voice
input product for the Macintosh. This software can be made to
work satisfactorily, although its features are limited. It is
cumbersome to set up and train, is not well supported, and
offers less for significantly more money than similar
Windows-based products.

In General

This review refers to PowerSecretary version 2.0.7, which we
tested with the supplied preamplifier and a Shure SM10A headset.
Although Dragon Systems claims that the software will run on a
33MHz 68040 processor, it requires a separate 16-bit sound card
on all but the AV model 68040-based Macintoshes. Our experience
was that operation on anything but a PowerPC was too sluggish;
the faster the processor, the better. The software requires
System 7.5 or later and takes up 25 MB of hard disk space; 32
MB of RAM is recommended. Dragon claims it will run under OS
8, but that use of the desktop picture feature of that system
version will cause it to crash.

PowerSecretary comes in three versions. The Personal Edition is
a single application dictation program with a 30,000 word active
vocabulary. It works with either Microsoft Word, WordPerfect,
ClarisWorks, or FileMaker Pro, and costs $199.00. The Power
Edition, which we tested, works with "most Macintosh
applications" and theoretically offers hands-free control of the
computer; it costs $395.00 and features a 60,000 word active
vocabulary. There is also a Medical version, which adds a
specialized medical vocabulary for $695.00.

As originally developed by Articulate Systems, PowerSecretary did
not work very well. It was subsequently purchased by Dragon
Systems, which improved its recognition accuracy considerably.
However, Dragon Systems does not appear very committed to the
support or development of this orphan product, and has announced
no plans to develop a Macintosh version of their newer
continuous speech programs.

PowerSecretary is a discrete speech system, which means that the
user must pause between words or utterances. The program's
operation strongly resembles DragonDictate, on which it was
based. Unfortunately, the program lacks many of the features
that make DragonDictate successful. Some of these problems
result from the limitations of the Macintosh operating system,
while some seem to proceed from poor program design and
lackluster support.

Awkward Interface

An example of the latter is the process for creating a new
user's voice file. The initial user's file is created
automatically. To create a file for another user, it is
necessary to hold the shift key down while double-clicking on
the program icon. There is a user menu displayed when the
program is running, but it appears to have no function; you
cannot use it to create a new user or to switch between users.
The program will always start by using the voice file of the
last user. To switch to another user, it is necessary to quit
PowerSecretary and double-click on the voice file of the desired
user. This is simply poor program design; there is no need for
such basic functions to be hidden from the user in this
non-intuitive way.

Similarly, the process of adjusting the microphone volume is
needlessly obscure and complicated. Unlike other voice input
programs that feature a utility to test and adjust the input
volume levels, PowerSecretary requires that these settings be
entered by hand, as numerical values from 1 to 100, in the
Preferences. Here is where the advantages of the Macintosh
should come into play. Unlike a Windows-based machine, which
could be using any one of a number of sound cards, a setting
that works on one PowerPC 6400/180 should work just as well on
the next. All Dragon need do is test each of the Macintosh
models for the appropriate settings and provide the user with a
list. Unfortunately, Dragon has not done this; the manual refers
you to the Dragon Systems Web site, which has settings for only
a few Macintosh models.

One of our test machines, a PowerBook 1400, was not listed at
all. The next closest model appeared to be the PowerBook 5300,
which was listed with a recommended volume setting of 79 with
the pre-amp in the "Hi" position. This did not work on the 1400.
The Dragon support technician did not know what the proper
setting should be, but suggested we "try" a lower setting. After
some experimentation, we found that a volume level of 15 worked
well, but this was basically a trial and error process, since
there is no way to test the settings other than by attempting
some dictation. Since this cannot occur until after a laborious
training process, the user runs the risk of spending hours
training the program only to find that the microphone has been
improperly set the entire time. Even a proper volume adjustment
might not be sufficient. We found that we also had to adjust the
"Minimum Amplitude Threshold" in the Recognition Preferences in
order to train individual words; otherwise, the program claimed
that the utterances were "too soft" (no matter how loudly we
yelled), although the volume level was adequate for reasonably
accurate dictation. Again, there is no utility for adjusting
this feature; you must try different values until one works.

There are some desirable aspects of the "hand adjustment"
features of this program. For example, you can adjust how long
the user can take to say a word (1 to 20 seconds), and adjust
the time you must pause between words. This fine tuning could be
very helpful for some users with dysarthric speech, who may have
difficulty with recognition because they break multi-syllabic
words into separate utterances.

Voice File Training

A major drawback of PowerSecretary is the process of training
the user's voice file. It is easily the most frustrating and
protracted of any voice input product. The user is first asked
to read a sentence, and the program then decides whether the
user fits one of two user models. A training sequence then
begins, which requires the user to first read a number of
commands, and then 91 sentences of up to 26 words each:
approximately 2,000 words total. Any pronunciation of a word
which does not "match the user model" requires that the word be
read again; misrecognition of one of the many words in one of
the sample sentences will result in the entire sentence having
to be read again. Since most of the "sentences" are weird,
senseless tongue-twisters, like: "Carbon aluminum aerospace
material changes liquid absorption unlike other standard goods,"
it is almost guaranteed that they will have to be recited over
and over again. The manual indicates that the training will take
45 minutes to an hour; you will be doing well to accomplish it
in twice that time.

Once the initial training has been completed, the program's
accuracy continues to improve over time, if the user properly
corrects misrecognized words. It is also possible to
individually train misrecognized words using a Vocabulary
Manager that contains an alphabetical list of words and
commands, which can be trained or edited.

Operation

Once a voice file has been created, the program works fairly
well. Immediately after the initial training, we dictated a 114
word sample (the first paragraph of the last issue of this
newsletter) as quickly as possible while pausing between words.
This was accomplished in 2.23 minutes with 42 errors; a similar
test after the preliminary training of DragonDictate (on a
233MHz Pentium II with 64 MB RAM) took 2.30 minutes with 38
recognition errors. Reading in the same paragraph while using
the program's correction features, resulting in an error-free
finished product, took almost exactly the same time (5 minutes,
or 23 words per minute) on PowerSecretary and DragonDictate.
This is acceptable first-time performance for a discrete speech
system. Recognition accuracy improves as the programs are used.
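The throughput figures above can be double-checked with a little
arithmetic. Here is a minimal Python sketch of that calculation;
the function names are illustrative, not part of any product
mentioned in this review:

```python
def words_per_minute(words, minutes):
    """Dictation throughput in words per minute."""
    return words / minutes

def error_percentage(errors, words):
    """Percentage of dictated words that were misrecognized."""
    return errors / words * 100

# Figures from the PowerSecretary test above: a 114-word sample,
# corrected to an error-free result in about 5 minutes.
print(round(words_per_minute(114, 5)))      # about 23 wpm, corrected
# Raw first pass: 114 words in 2.23 minutes with 42 errors.
print(round(words_per_minute(114, 2.23)))   # about 51 wpm, uncorrected
print(round(error_percentage(42, 114)))     # about 37% misrecognized
```

Note how the correction process roughly halves effective
throughput, which is why 23 words per minute counts as acceptable
first-time performance for a discrete speech system.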

During program operation, a window appears which contains a
microphone status indicator (on, off, and a recording volume
level) and a numbered list of the user's utterances. If a word
is misrecognized, a correction window can be brought up, which
contains a number of alternative words that the program "thinks
you might have said." If the correct word is displayed in the
list, it can be selected by speaking its number ("Choose 3"). If
the desired word is not in the list at all, it can be spelled,
using the international communication alphabet ("alpha, bravo,
charlie," etc.). Usually, the desired word will appear in the
list after one or two letters have been spelled out, since the
program just needs a hint, it already knows what the word
"sounds like." This correction box can be always present, set to
appear after a certain time lapse, or set to appear after the
user says "correct that." Recognition errors that are not
immediately noticed can be corrected by selecting the
misrecognized word from the numbered "history" list.

As with all discrete speech systems, this is a slow process. The
correction mechanism is complex and can be confusing for the
inexperienced user. At best, the process interrupts your train
of thought while writing. It is also a problematic interface for
people with visual difficulties, since it requires them to be
able to see the display to determine whether the computer
recognized their words correctly, although PowerSecretary does
allow adjustment of the font in the status and correction
windows.

PowerSecretary requires thorough training to use properly. Given
the painfully protracted enrollment process, it is important to
properly create and preserve a good voice file. This means that
it is crucial to set the program up and perform the initial
training carefully. Additionally, monitoring the program's
performance is important because the voice file is gradually
refined over time, as the program monitors what the user says in
relation to what the user accepts as correct. This can
eventually result in very good recognition accuracy, but
requires care in operation, since if a recognition error occurs
which is not corrected, the program assumes that it was correct,
and modifies the voice file accordingly. If numerous mistakes
are made, and not corrected, the quality of the voice file will
deteriorate, and recognition accuracy will become poorer and
poorer.

Hands-Free Problems

The primary performance shortcomings of PowerSecretary lie in
its "hands-free" features. Verbal control of the mouse is
accomplished by giving directional commands ("move up", "move
left", "stop"), which are slow and cumbersome. Most
Windows-based voice input programs do no better, but the Windows
operating system is less dependent on the mouse than the
Macintosh. In Windows it is possible to control the operating
system entirely with the vocal equivalents of keyboard commands.

This mouse-dependent feature of the Macintosh makes the command
and control features of PowerSecretary somewhat awkward. Most
voice input programs have some provision for creating macros -
strings of words or computer commands that are associated with,
and activated by, some voice command. PowerSecretary
incorporates a number of pre-made macros for operating Macintosh
programs, and contains a perfectly serviceable utility to
associate macros with voice commands. However, the creation of
the computer command macros themselves relies on Apple's complex
and obscure AppleScript scripting language. Since the Macintosh
does not lend itself readily to keyboard control, there is no
way around this. Voice commands for Macintosh keyboard shortcuts
(like Command-P for Print) can be easily created, but, unlike
with Windows, you cannot easily make a voice command to simply
display the items in a menu since there is no keyboard
equivalent to this mouse action.

As the PowerSecretary manual states, you can easily write macros
"once you become familiar with the AppleScript commands." This
requires familiarity with things like: "vDoKey [58,56,0]" or
"vLaunch [file "Disk:FrameMaker:TestPlan"]"- exactly the kind of
thing most people bought Macintoshes to avoid. More
significantly, the AppleScript language itself does not allow
complete control of the Macintosh; certain control panels and
features like hierarchical menus are simply not accessible with
AppleScript - and, therefore, not accessible with
PowerSecretary, unless activated by the awkward, voice
controlled mouse.

In the End . . .

The primary question for the consumer is how important the
Macintosh platform is to you, since comparable Windows
programs are more fully featured, easier to set up and train,
and cost nearly $300.00 less. If you are determined to use a
Macintosh, are willing to use a discrete speech system, can
accept limitations on the program's "hands-free" capabilities,
and have the patience and technical facility to get it working
properly, PowerSecretary is a perfectly usable voice input
system. If your primary interest is text dictation,
PowerSecretary's theoretical maximum of 35-40 wpm is not very
impressive compared to the 150 wpm achievable with continuous
speech programs like ViaVoice and NaturallySpeaking.
Unfortunately, there are no continuous speech programs for the
Macintosh on the horizon; for the foreseeable future,
PowerSecretary is it.

----------
                         Vendor Interview:

                 Clair Calhoon and Daniel Newman

Clair Calhoon is the owner and president of WorkLink - ADA
Solutions, a Berkeley store that stocks many computer items of
use to persons with disabilities and is one of the largest
Dragon vendors in the country (http://www.worklink.net/). Daniel
Newman is the owner of Berkeley Voice Solutions, a computer
consulting firm specializing in speech input hardware, software,
and training (http://pcvoice.com/). We asked both vendors a
series of questions about speech input.

1. Speech input systems are becoming widely available through
mainstream computer stores, often in "economy" models. Is speech
input becoming a consumer product that no longer requires
training?

C. Calhoon: No. Unfortunately, the media is giving the
impression that people can use speech input systems without
getting training, and people therefore think the ability to use
these systems just jumps into their brain. This may be the case
for some people who are computer literate and can make
inferences from the documentation provided with the program.
However, documentation is almost non-existent, and if people
don't get training they may not realize the real potential of
the system and consequently believe they've been stuck with a
bad product. It would be as if typewriters came on the market
for the first time; if people weren't trained to use them,
they'd think typewriters were just a piece of junk. Speech input
systems have a lot of features that are not documented. Some
trainers may not even know the whole range of capabilities of
the product.

D. Newman: Yes, speech input is definitely a consumer product,
and yes, it still requires tutoring. First, use of almost any
software can benefit from instruction - word processors,
databases, browsers. A minority of users can figure out the
whole product on their own. A majority of users can figure out
maybe 20% of functionality, which may be enough to do what they
need. The remaining users need some way to learn the extra 80%
of the program. They may get some functionality but not the full
performance depth of the product unless they get help from a
friend or an experienced instructor or they know computers. If
speech input is being used as an access tool, often 20%
knowledge is not enough to make the speech input program
useable. If the user has no upper limb use, what they can learn
easily will not give them the ability to use the computer
hands-free.

2. In your experience, how much training is required for
consumers to effectively use speech input? How is this training
time affected by factors such as age, previous computer
experience, etc.?

D. Newman: The key determinants of success are computer
experience, extent of disability, and desire to reach a level of
proficiency. Speech input will be easiest for experienced
computer users who know Windows 95, are familiar with word
processing and could use it if they could type, and only have to
learn the speech input program. It's harder for someone who
needs to use page layout programs, Email, spreadsheets, etc.,
since these programs have an additional level of complexity and
require more "command and control" - more mouse and keyboard
emulation. It's most difficult for people with little or no
computer experience, including those with DOS-only experience
since speech input is mostly based on Windows. Speech input
programs are built on the assumption that the user already knows
how to use the keyboard, the operating system, and the
applications.

C. Calhoon: There isn't a rule of thumb. Everyone, no matter how
computer literate, can use some intense instruction in use of
speech input systems, and this becomes more important for people
with limited computer experience. Age is not as important as the
ability to gain information from whatever print documentation
exists. Still, anyone who has less than 2-3 hours of training is
going to be frustrated, and the user needs to spend 2-3 weeks
working intensively with the program before they can become
comfortable with it. This is in contrast to the way these
products are marketed, which implies that no learning is
involved.

3) Do your customers who have some ability to use the keyboard
and mouse currently prefer discrete programs with hands-free use
such as Dragon Dictate, or continuous speech programs with more
limited mouse and keyboard emulation such as Naturally Speaking?

C. Calhoon: It's an incorrect assumption that Dragon Dictate is
totally hands-free; this can only be accomplished by use of
macros. In addition, the latest version of Naturally Speaking
can be used almost hands-free, but you cannot dictate text such
as file names; you need to input these by hand or by using
Dragon Dictate. The new Naturally Speaking Deluxe comes bundled
with Dragon Dictate Classic, and users can switch back and forth
between the programs using voice commands. Naturally Speaking is
currently our largest seller. Demand for Dragon Dictate has
dropped off significantly.

D. Newman: Continuous speech is better for almost all users.
NaturallySpeaking is the most versatile continuous speech
program currently available as far as hands-free ability. With
NaturallySpeaking, users can do dictation, editing, command and
control mostly hands-free. Keystrokes are only needed for about
5-10% of functions - saving files, doing some things on the
Internet, etc. If users need 100% hands-free access, it's
necessary to use NaturallySpeaking and DragonDictate at the same
time because DragonDictate will eliminate the need for those
extra keystrokes. DragonDictate alone isn't really an effective
solution for anyone because it's more difficult to learn and is
slower, although it might be useful for people with older
computers that can't run NaturallySpeaking.

4) In the future, what would you like to see speech input
systems do that they don't do now, or do better than they do now?

D. Newman: The mainstream of speech input product development is
focused on three main areas. The first is speech-independence,
or the ability for the system to recognize anyone's voice
without being trained. This isn't particularly useful for
disabled users, because those who are using speech input are
motivated to do the training in the first place. The second area
is accuracy. Current speech recognition engines break down the
voice into phonemes and sub-phonemes. The engines being
developed can recognize larger parts of speech, such as
syllables. This will be of definite benefit to disabled users,
because there will be fewer recognition errors and the programs
will be easier to use. Finally, there will be a redesign of
programs and operating systems to be more amenable to speech
input. For example, some programs now have "button bars" for
executing program operations. To access them, DragonDictate has
to mimic keyboard and mouse commands. But within the next year,
programs like Quicken and Netscape will come with a voice
interface option where items might have visual reminders of the
correct speech input command for executing the program
operation. This will come soon because it doesn't require a
technological breakthrough, just a design change. This will be
significant for disabled users, especially those with limited
computer experience.

C. Calhoon: I'd hope that systems become so sophisticated that
training will not be necessary. This will be accomplished
through faster computers, better recognition algorithms, and
more parsing features. I saw continuous speech recognition
demonstrated on a DOS system in 1992. It worked perfectly but
took about 5 times longer than it would on a current computer
with current recognition capabilities. With the advent of
computers with more RAM and better sound cards, speed has
improved. In addition, most speech input programs now have
better linguistic analysis so that more context clues are used.
The accuracy of Naturally Speaking is actually reduced if you
pause after words because it analyzes context. It will
accurately handle the sentence, "James Wright writes right now"
if you speak at a natural rate, but if you speak slowly it will
lose its ability to pick up contextual clues. In addition, there
need to be educated trainers in schools and colleges.
Children could be more creative, more verbally fluent if they
are taught to use speech input at an early age by people who
know what they're doing.

5) Any other comments?

C. Calhoon: Some negative reviews in computer magazines about
speech input are doing a great disservice to people with
disabilities such as repetitive strain injury, amputees, and
quadriplegics. Some of these reviewers are pooh-poohing the
current technology and showing ignorance of the potentials of
these programs. Potential consumers should take these mainstream
reviews with a grain of salt, realizing that the information
included may be incorrect or superficial.

D. Newman: A lot of people ask about the effect of speech input
on the voice. Preventive maintenance - doing vocal exercises,
drinking plenty of water - is important and can prevent vocal
strain, especially when using discrete speech
products. This strain seems to be lessened for users of
continuous speech products, which is a breakthrough.

People should temper their expectations of speech input
technology in proportion to their computer experience. New users
may be able to use speech input to write and edit documents, but
not to do fancy formatting. People who already know how to do
the formatting will be able to do it using speech input.

Speech input doesn't make the computer easier to use. Having
said that, it's still an exciting time for speech input. A year
ago, the best speech input technology cost $1,700; now there are
entry level programs available for less than $100. The price of
computers has also dropped by half. These improvements and price
breaks are aimed at the general market, but disabled users have
already benefited from them, and this will continue. However,
training is still extremely important, whether it takes the form
of diligent reading of the instruction manual, having a friend
provide assistance, or hiring someone as a trainer. If you just
plug in the program and start to use it, your experience won't
be as good as if you put in some time to teach yourself or get
some help.

----------
      Internet Surfing with SurfTalk and VoiceType Connection

If your primary need for voice access to the computer is Web
surfing, two specialized programs provide verbal control of a
Web browser for Macintosh and Windows 95. These programs only
work with the Netscape browser and do not provide general
computer control. However, within these limitations, the
programs work well, and are priced at only $15.

SurfTalk

SurfTalk is an application program that works with Apple's
PlainTalk speech recognition utility. This means that PlainTalk
must be installed on your computer. PlainTalk requires a PowerPC
and a PlainTalk-compatible microphone. PlainTalk can be
downloaded, for free, from Apple's web site
(www.speech.apple.com/ptk/#etts). A free demo of SurfTalk, which
operates for 15 days, can be downloaded from www.surftalk.com.

SurfTalk (and the underlying PlainTalk) are "speaker
independent." This means that they are "one-size-fits-all" voice
products, designed to recognize "standard" North American
English speakers without recognition training. This makes them
simple to use; however, because no training is possible, the
programs may not recognize people whose speech is heavily
accented or otherwise "non-standard."

SurfTalk has a fairly limited command set. You can go forward
and back, scroll up and down, stop, go home, reload a page, and
add a bookmark. If you have the PlainTalk text-to-speech utility
installed, you can also turn on speech feedback that will
audibly tell you what SurfTalk is doing ("ready," "accessing
page", etc.). The program can be set to ignore your voice unless
you are holding down a "talk" key on the keyboard, ignore your
voice unless you preface your commands with "Computer" (or some
other word), or try to interpret everything you say as a
SurfTalk command.

The program has no provision for entering text, so it
cannot be used hands-free if you need to enter an Internet
address or a search term in a search engine. This limits its
utility severely for some users, although if, like some people,
you generally access a few specific pages - like an on-line news
service - these can simply be bookmarked.

The real strength of this program is that bookmarks and
hypertext links can be activated simply by speaking their names.
For example, if there is a link on the page called "Services we
offer", you can simply say "See Services we offer" and that link
will be selected and activated. This resolves one of the most
significant problems of Macintosh web browsers - activating a
link without using the mouse.

VoiceType Connection for Netscape

VoiceType Connection is a little utility program based on IBM's
VoiceType/SimplySpeaking speech recognition technology. Designed
as an add-on for these programs, it is also usable by itself.
Like SurfTalk, it works in Netscape 3, and can be obtained at
(www.software.ibm.com/workgroup/voicetyp/vtconn.html).

VoiceType Connection (VTC) works similarly to SurfTalk. Basic
browser commands can be activated by voice, and links can be
selected and activated by reading their names. It is a good deal
more fully featured, and more complex as well. When the Web
browser is running, VTC displays a window listing links that can
be activated by voice. Green text in the window can be spoken
immediately, while grey text is not in the program's vocabulary
and must be trained. The grey words do not need to be trained if
the link contains enough green words for VTC to figure out which
link you mean when you say its name.
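
The rule described here can be sketched as follows. This is an
illustrative reconstruction of the behavior the article reports,
not IBM's code; the function and variable names are invented.

```python
# Sketch of VTC's green/grey rule: a link needs no training if its
# in-vocabulary ("green") words are enough to pick it out from
# every other link on the page.

def speakable_without_training(link, all_links, vocabulary):
    green = [w for w in link.lower().split() if w in vocabulary]
    if not green:
        return False   # every word is grey; the link must be trained
    # The green words must match this link and no other.
    matches = [l for l in all_links
               if all(w in l.lower().split() for w in green)]
    return matches == [link]

vocabulary = {"services", "we", "offer", "contact"}
links = ["Services we offer", "Contact Zzyzx", "Foo Bar"]
```

Under this rule "Contact Zzyzx" would be speakable even though
"Zzyzx" is out of vocabulary, because "contact" alone identifies it.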

Toolbar and directory bar buttons in Netscape can be activated
by saying "Press <name of button>". Most (but not all) of the
menu bar commands also have voice commands associated with them.
The bookmark menu can be activated by voice and bookmarks
selected by name, or a separate set of "VoiceMarks" can be
created. Netscape News and Mail are controllable, but text
cannot be dictated into a mail message unless you also have one
of IBM's dictation products, such as VoiceType or
SimplySpeaking. VTC will also control the mouse, with "up, down,
left, right, run, and walk" commands.

Although it is not possible to dictate text with VTC alone, the
program does allow you to dictate numbers, characters, and
symbols, one by one. Although clearly impractical for e-mail or
chat rooms, this would be adequate for entering an address in
the "Go To Location" dialog box, or entering a search term in a
search engine.

Summary

For negligible cost, these two programs offer significant
hands-free capability for Web browsing. VoiceType Connection for
Netscape provides fairly complete voice access to the Internet,
although its breadth of features makes its operation somewhat
complex to learn. Although SurfTalk does not offer complete Web
access, since text cannot be entered, it could be a helpful tool
for a user whose Internet access needs are limited, or who can
enter text with another assistive technology, such as switch
scanning. Indeed, except for the text entry component, SurfTalk
offers better Web access features than the only other Macintosh
voice input system, PowerSecretary, which requires links to be
activated with verbal mouse commands. In either case, these
programs provide a significant level of access for a $15.00
price tag.

----------
                       Kurzweil's VOICEPlus

VOICEPlus is a speech input system from Kurzweil, which has been
involved for many years with designing voice input systems for
niche markets such as medical and legal transcriptionists.
VOICEPlus, however, is designed as a mainstream product for use
on computers running Windows 95 or Windows 3.1. We tested it on
a machine with a 200 MHz/MMX processor chip, 64 MB of RAM, and
Windows 95.

Training: Talk Now, Pay Later

Of all available speech input systems, VOICEPlus probably has
the shortest initial training time before you can start using
the product. The training starts with a standard VOICEPlus level
check, which includes the ability to set "Mike Gain," or
microphone sensitivity. A "VOICEPlus Profile" dialog box then
comes up asking you to specify your gender and age. If you
indicate that you are under 17, you may then specify if you are
under 10, between 10 and 12, or between 13 and 16 - which
demonstrates an unusual level of manufacturer awareness that
children or teens may be using the product. (However, when we
actually tried setting up this program for an 11-year-old boy,
the program issued error messages and refused to run until we
deselected the "Under 17" box.) Once a profile is selected and
you read a short list of suggestions (e.g., "Don't talk to
yourself"), you can literally go ahead and start dictating into
an application.

However, there is still training to do. After you have dictated
about 800 words, Kurzweil asks if you would like to "enroll."
Enrolling consists of speaking 400 vocabulary words (no
commands) and then waiting 1-2 hours for the program to process
your speech patterns. Since this improves accuracy, it is
puzzling why users must wait until they have spoken 800 words
(about two-thirds the length of this article) before being
encouraged to go through this
process, and even more puzzling why this enrollment option is
not automatically presented to first-time users. You may hit
"Cancel" at any time to exit the "enrollment." When you return,
the training will pick up where you left off.

The default training mode requires the user to hit a "Continue"
button to move to the next word. It is possible to select a
"train-without-confirms" option and bypass the need to press
this button. However, if you wish to correct a mispronunciation,
you need to press a "Back Up" button. There does not appear to
be a VOICEPlus command for activating this button (nor is there
apparently one for activating the "Continue" button).

Commands may be trained separately by VOICEPlus. This involves
opening an "Active Words" menu and selecting the "Train" option.
This lets you train 573 built-in commands. As always, there is
the trade-off between the time spent to train these commands and
improved recognition accuracy.

Taking Flight

Kurzweil provides a pleasant, basic tutorial program for
first-time users, whose theme (dictating instructions on folding
a paper airplane) is appropriate for kids without being
excruciating for adults. Avoid the tutor-menu option, which
supposedly lets you check on your progress within the tutorial
but provides no obvious means of letting you return to where you
left off.

When VOICEPlus is activated, a "Center" window appears. This
window contains an icon indicating the on/off status of
VOICEPlus and a Help function. It also includes a menu which
allows you to re-set the sound levels, set a variety of
recognition options and other preferences (e.g., the font size
of VOICEPlus windows), use a Recognition Wizard to fix some
common problems (such as differentiating between two words), and
check on the profile of the current user. This window is also
used to reflect the most recently spoken command or word.

VOICEPlus is a discrete speech product, which means you need to
dictate word-by-word instead of in blocks of text. When you
speak a word, the "Center" window reflects what VOICEPlus heard
you say. You can then continue or correct the word by saying
"correct-that." This brings up a "Take" window list of up to 5
possible words. You can choose a word from the list or type in
the word you want. If you want to back up and fix a problem, you
can do so within 20 words by using a "back-up" command and then
saying "move-on" to return to where you left off. Microsoft
Word, WordPerfect, and WordPro also support a "Point and Fix"
feature, which allows you to fix words within an adjustable
range of text stored in memory. This range can be between 20 and
5,000 words, but the wider the range, the more memory is eaten
up by this feature.
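
The adjustable range behaves like a fixed-size buffer of recently
dictated words, which is why widening it costs memory. A minimal
sketch of that assumed behavior (not Kurzweil's code):

```python
from collections import deque

# Sketch of a bounded correction buffer like the 20-5,000 word
# "Point and Fix" range: only the most recent `size` words remain
# available for correction; older ones fall out of memory.

class CorrectionBuffer:
    def __init__(self, size=20):
        self.words = deque(maxlen=size)   # discards oldest when full

    def dictate(self, word):
        self.words.append(word)

    def can_fix(self, word):
        return word in self.words

buf = CorrectionBuffer(size=3)
for w in ["the", "quick", "brown", "fox"]:
    buf.dictate(w)
```

With a buffer of three, "the" has already scrolled out of reach by
the time "fox" is dictated.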

There are no separate "modes" per se in VOICEPlus, but it
appears to distinguish reasonably well between dictation and
commands. VOICEPlus has built-in support for commands for a
variety of Microsoft, Corel, and Lotus products. You can use the
"Active Words" window to name and define other commands, and
then use this window's "Train" feature so that VOICEPlus will
recognize this new command when spoken.

Hands-free? Yes, but...

For users who cannot use a keyboard, there is a Keyboard window
that lets you enter words by vocally spelling with the military
alphabet (alpha, bravo, charlie, etc.). Although it isn't
documented, you can also insert non-alphanumeric characters this
way (apostrophe, asterisk, percent sign, etc.). Unfortunately,
this window does not appear to work with the "Take" window, so
if a word is not heard correctly and the "Take" window list does
not include the desired option, the expectation is still that
the user can manually type in the desired word.

Besides the Keyboard window, there is also a Mouse window, plus
a range of Mouse-related commands (click, left, etc.). The Mouse
window lets you specify a number of pixels plus one of four
directions for the mouse cursor to move. While users are likely
to acquire the ability to estimate how far the cursor moves for
a given number of pixels, this feature lacks the precision and
flexibility of Dragon's MouseGrid utility (which, for example,
allows you to move the cursor diagonally).
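
The Mouse window's movement model can be sketched as below. The
direction names and function are assumptions for illustration, not
VOICEPlus's actual commands.

```python
# Illustrative sketch of four-direction mouse movement: the user
# names a direction and a pixel count, and the cursor moves along
# one axis only.

STEPS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}

def move_cursor(pos, direction, pixels):
    dx, dy = STEPS[direction]
    # Screen coordinates: y grows downward, so "up" subtracts from y.
    return (pos[0] + dx * pixels, pos[1] + dy * pixels)

# With no diagonal command, a diagonal target takes two utterances:
pos = move_cursor((200, 200), "up", 50)
pos = move_cursor(pos, "right", 50)
```

This is the limitation noted above: a diagonal move always costs at
least two separate spoken commands.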

The documentation lists a few things that cannot be
accomplished through VOICEPlus commands alone. These include using
the control menu (which controls sizing and closing of the
window) on the "Center" window and several features connected
with first-time use. More importantly, Windows cannot be shut
down if VOICEPlus is active, so that hands-free users will
require some assistance to turn the computer off. VOICEPlus can
be put in the Windows "Start Up" group so that it will activate
automatically when the machine is turned on.

Summary

VOICEPlus is a powerful program that tries very hard to live up
to the speech input marketing cliches of "Dictate right out of
the box!" and "Totally hands-free use!" Ultimately, there are
problems with both of these claims. For accuracy's sake, you
will eventually need to go through a training period at least as
rigorous as those of competing products. Although several
hands-free features are included, there does not appear to be an
overall sense of serving users who truly have no ability to use
the keyboard or mouse at all. It's a program with a lot of
potential that with luck (and consumer input) will become more
usable in the next version.

----------
                       VoicePower Has Limits

VoicePower is a voice input product from VoiSys
(www.voisys.com). It is advertised as a product that enables you
"to control your PC using only your voice - completely
hands-free!" Although this claim is technically true, the
program's functions are extremely limited.

VoicePower works with Windows 95 or NT and comes with a headset
microphone. It requires a Windows-compatible sound card, and has
modest system requirements. It sells for approximately $50.00.

When running, VoicePower displays a small toolbar, which
contains a microphone volume indicator, five buttons that turn
on and off the various VoicePower features, and an "Options"
button, which gives the user access to the training and settings
dialog box. The five feature buttons can be activated by voice;
three of them can also be activated with a mouse click. The
"Options" button cannot be activated by voice.

"Complete hands-free control of Windows 95 or NT"

The first button on the toolbar controls the "Typing Mode." This
mode allows the letters, numbers, function keys, and some other
keyboard keys to be controlled by voice one at a time. Typing
letters requires you to use the military alphabet. For example,
to type the word "keyboard", you would activate the typing mode
and say "kilo-echo-yankee-bravo-oscar-alpha-romeo-delta." There
is no provision for dictating whole words, only letters.
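
The letter-to-word mapping behind this typing mode is the standard
NATO/ICAO spelling alphabet; the sketch below shows it, with a
function name invented for illustration.

```python
# The standard NATO/ICAO spelling alphabet used by the typing mode.
NATO = {
    "a": "alpha", "b": "bravo", "c": "charlie", "d": "delta",
    "e": "echo", "f": "foxtrot", "g": "golf", "h": "hotel",
    "i": "india", "j": "juliett", "k": "kilo", "l": "lima",
    "m": "mike", "n": "november", "o": "oscar", "p": "papa",
    "q": "quebec", "r": "romeo", "s": "sierra", "t": "tango",
    "u": "uniform", "v": "victor", "w": "whiskey", "x": "x-ray",
    "y": "yankee", "z": "zulu",
}

def spell_aloud(word):
    """Return the utterances needed to type `word` letter by letter."""
    return "-".join(NATO[ch] for ch in word.lower())
```

For "keyboard" this reproduces the eight-utterance sequence quoted
above.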

Since the Windows operating system can, for the most part, be
controlled from the keyboard, this feature might give you
"complete hands-free control of Windows." Although it would be
pretty cumbersome, you can use the program to print a document,
for example, by saying "Alt Key-foxtrot" to activate the File
menu, and "papa" to activate the Print command. However, not all
keyboard keys are made voice accessible. For instance, there are
no Control or Arrow keys, so it is difficult to see how
"complete hands-free control" could really be accomplished, even
dictating each keystroke separately.

"Never click the mouse again"

The next button on the toolbar activates the "Mouse Commands"
mode. This allows voice control of mouse movement - the user can
command the mouse to start moving in various directions, to
stop, and to click and drag. This is a fairly cumbersome way to
use a mouse, but it does work.

"Launch any Windows application on your computer by voice"

Another button activates the "Verbal Launch Pad." This feature
allows you to search through the files on your computer and
associate a voice command with an application program. This must
be done for each application you wish to launch by voice. Once
accomplished, opening the "Launch Pad" and speaking the proper
command will start that application. This is the only set of
commands the user can create.

Custom Commands

The limited control features of the program would be offset if
you could create voice macros for commonly used commands. For
example, if instead of using the verbal keyboard, you could
create a "print" command which would execute the key sequence
"Alt-F-P Enter," the program would have real utility as a
command and control interface. Unfortunately, VoicePower offers
no such macro capability.

"Surf the Internet by voice"

The most significant feature of VoicePower is a set of pre-made
macros for controlling Netscape or Internet Explorer by voice.
Many of the standard browser commands - forward, back, home,
reload - can be activated by voice. The bookmark menu can be
activated by voice, but selecting a bookmark requires the user
to say "menu down" repeatedly until the desired bookmark is
reached. The verbal typing mode, nearly worthless for general
computer operation, is adequate for the limited typing required
for browser operation, i.e. entering search terms and URLs.

The program only supports version 3 of these programs. Since
version 3 of Netscape does not allow you to move from link to
link by pressing the tab key, links must be selected in this
program by using the verbal mouse controls. This process is
sufficiently cumbersome that effective browser control with
VoicePower is really limited to Internet Explorer. Moving from
link to link in this browser is accomplished by saying "next
link" repeatedly until the desired link is highlighted, and then
saying "enter."

This is fine, as far as it goes. It pales somewhat in comparison
to IBM's VoiceType Connection for Netscape, however, which
provides all of these functions, allows you to activate links
simply by speaking their names, and is available by download
from IBM for free. VoicePower is more complicated to operate
than VoiceType Connection, which, for its part, works only with
Netscape and does not provide for mouse or keyboard control
outside of the Web browser.

"Multi-User System means no training required"

VoicePower claims to require no training of a voice file to
operate, although individual commands can be trained by
selecting them from a list and reading them several times. The
list is accessed by clicking on the Options button on the
toolbar; it cannot be activated by voice. The controls in the
training dialog box cannot be accessed by voice either, although
they can be accessed by verbal keyboard and mouse commands
(assuming that those are not the commands you need to train).

The Options button also allows adjustments to be made to the
accuracy and sensitivity of the system; again, these cannot be
directly made by voice. There is no utility for testing these
settings; the process is one of trial and error. Accuracy is
adjustable from 0 to 4000, and sensitivity from 0 to 2000.
VoiSys recommends settings of "about 1997" and "about 203"
respectively.

I found it necessary to train several commands for consistent
recognition, although the program did work fairly well out of
the box. This is hardly surprising, since the program only
recognizes some 200 utterances. A user with "non-standard"
speech, or even a pronounced accent, would have to train many
more.

In Summary

VoicePower does most of what it claims, technically speaking,
although unsophisticated buyers may think that claims like
"complete hands-free control of your PC" and "works with any
Windows program" mean more than the ability to type letter by
letter using your voice. But even these claims are not entirely
true. The fact that the keyboard emulator does not provide all
of the keyboard keys means that programs that require certain
keys cannot be controlled with VoicePower. Claiming that the
program will operate both Netscape and Internet Explorer is
disingenuous, since the inability to effectively navigate links
by voice in Netscape makes it practically worthless.

Misleading claims aside, this program would be useful for a
fairly narrow range of users. As a means of surfing the Internet
by voice, it is functional, but more sophisticated programs to
accomplish the same thing are available for free. As far as
general computer control is concerned, it costs the same as
Kurzweil's VoicePad or IBM's SimplySpeaking, which are both
vastly more sophisticated programs. Conceivably, VoicePower's
simple operation and lack of training might make it useful for
some users, but its extremely limited functionality would
seriously reduce this usefulness.

Its primary advantage is that there are no personalized user
files; anyone can walk up to the computer and use the voice
commands. This might be helpful on a publicly accessed system,
such as a library computer, where a very limited set of options
were available. Unfortunately, since there is no way to program
custom voice commands into the system, it would be of limited
use even in this setting.

----------
                       Writing vs Dictation

Some time ago, I injured my left elbow. While not a serious
injury, typing was painful, and I had a great deal of writing to
do. I was familiar with voice input software, and had a good
selection of programs and machines available to me as a staff
member of CforAT. Although I had not been using voice input
software for my day-to-day report writing, I was confident that
the transition would not be a problem.

Since I only needed to input text, I began working with a
continuous speech dictation program. With some training and
adjustment, I managed to attain a 152 wpm input rate with a
better than 98% accuracy rate, just as the manufacturer claimed.

However, when I turned to my daily writing tasks (usually
multi-page text documents) it quickly became apparent that
although I was working hard, and the software was operating
well, my productivity was very low. Although my manual typing
speed is about 60 wpm, and I was dictating at a nominal 152 wpm,
it took me several times longer to complete a report by voice
than by hand.

The problem, of course, was that I wasn't writing, I was
dictating. Reading existing text, or making up phrases to test
dictation accuracy, or just speaking, is quite a different
process than composing a document. After years of writing using
a keyboard, my compositional abilities adjusted themselves to my
typing so that I composed text in my head at about the speed I
was able to type it. Consequently, I could sit down at a
keyboard to begin composing a document, and type steadily until
the document was finished.

This does not occur when I am dictating. I will speak a phrase
(at 152 wpm), and then stop. And think about the next phrase,
and maybe revise the first phrase, since it is there on the
screen. It's hard to achieve any consistent flow of information:
hearing my voice distracts me; seeing what I have already
produced distracts me; and, ultimately, I speak differently than
I write. The compositional process bogs down in very much the
same way it did when I was first learning to write. Then I did
the same thing: I would write a phrase, stop and look at it,
revise it, cross it out, and start over, proceeding slowly in
little chunks. It took practice to be able to write fluidly, and
using voice input sometimes makes me feel like I'm starting over.

Of course, I'm not really starting over. I know how to write,
I'm just using a different technique for recording my thoughts.
It's getting rapidly better, but it does take practice. A
different part of my brain is in use. Learning how to dictate,
whether into a tape recorder or a computer, is a separate
process from learning how to use voice input software.

Obviously, if you are unable to comfortably use your hands to
type, and you need to generate text on the computer, learning to
dictate will probably not be an insurmountable hurdle. But if an
injury has interfered with your ability to do your job, and you
are planning to solve the problem with voice input, it's an
issue worth considering. The time it takes to return to
productivity may be much longer than the time it takes to learn
the software; you may have to figure in the time it takes to
learn how to dictate effectively, as well.

This is even more of an issue if you don't need to use voice
input, but are considering it because you believe it is "easier"
than typing. It may not be easier; it isn't for me, even though
the software works perfectly, and does everything the
manufacturer claims. The assumption often seems to be that no
one would type if they had a smoothly working voice input
system, but my experience suggests otherwise. It also suggests
that this is a problem that transcends the technology - that
voice input will not be a seamless substitute for typing no
matter how powerful and easy to operate the software becomes.

- Paul Hendrix,

Rehabilitation Technologist

----------
 Current PC Voice Dictation Products

DRAGON SYSTEMS PRODUCTS

DragonDictate Classic

  * Discrete speech
  * Hands-free computer control for all applications
  * 30,000 word active vocabulary
  * Text-to-speech capability
  * Minimum 486/66 processor, 16+ MB RAM
  * $150

DragonDictate Power

  * Discrete speech
  * Hands-free computer control - all applications
  * 60,000 word active vocabulary
  * Text-to-speech capability
  * Minimum 486/66 processor, 16+ MB RAM
  * $695

NaturallySpeaking Personal

  * Continuous speech
  * Dictation into a dedicated word processor
  * 30,000 word active vocabulary
  * Single user only
  * Minimum 133 MHz Pentium processor, 32-48+ MB RAM
  * $100

NaturallySpeaking Preferred

  * Continuous speech
  * Dictation into dedicated word processor or MS Word 97
  * 30,000/45,000/55,000 word active vocabulary
  * Mouse control
  * Digitized speech playback or text-to-speech
  * Minimum 133 MHz Pentium processor, 32-48+ MB RAM
  * $170

NaturallySpeaking Deluxe

  * Continuous speech
  * Dictation into dedicated word processor or MS Word 97
  * 60,000 word active vocabulary
  * Digitized speech playback or text-to-speech
  * Minimum 133 MHz Pentium processor
  * Macro creation capabilities
  * Includes DragonDictate 3.0
  * 32-48+ MB RAM
  * $695

IBM PRODUCTS

SimplySpeaking

  * Discrete speech
  * Dictation into dedicated word processor
  * 22,000 word active vocabulary
  * Minimum 100 MHz Pentium processor, 16+ MB RAM
  * $50

SimplySpeaking Gold

  * Discrete speech
  * Command & control, dictation into dedicated word
    processor or MS Word
  * 64,000 word active vocabulary
  * Digitized speech playback or text-to-speech
  * Minimum 100 MHz Pentium processor, 16-32 MB RAM
  * $49

ViaVoice

  * Continuous speech
  * Dictation into dedicated word processor or MS Word
  * 22,000 word active vocabulary
  * Digitized speech playback or text-to-speech
  * Minimum 166 MHz MMX Pentium processor
  * 16-32 MB RAM
  * $70

ViaVoice Gold

  * Continuous speech
  * Command & control, dictation into dedicated word
    processor or MS Word
  * 22,000 word active vocabulary
  * Digitized speech playback or text-to-speech
  * Minimum 166 MHz MMX Pentium processor
  * 32+ MB RAM
  * $120

KURZWEIL (LERNOUT & HAUSPIE) PRODUCTS

VoicePad

  * Discrete speech
  * Command & control and dictation into a dedicated
    word processor
  * 20,000 word active vocabulary
  * $40

VoicePlus

  * Discrete speech
  * General computer control, multiple applications,
    nearly hands-free
  * 30,000 word active vocabulary, & child voice profiles
  * Minimum Pentium processor, 24-32 MB RAM
  * $80

VoicePro

  * Discrete speech
  * General computer control, multiple applications,
    nearly hands-free
  * 60,000 word active vocabulary, & child voice profiles
  * Minimum Pentium processor, 24-32 MB RAM
  * $100

Please Note:

* The minimum systems listed are taken from the manufacturer's
specifications. User feedback has indicated that using a faster
processor and doubling the amount of RAM works best.

* Prices are current catalog/retail at time of publication and
are subject to change.

----------
                    Voice Input Demonstrations

As part of a grant provided by the California Consumer
Protection Foundation, we are providing free demonstrations of
speech input technologies on 1 or 2 Thursdays per month at the
Center. These demonstrations cover ways to use speech input as
an adjunct or replacement for keyboard and mouse use, and how to
use them most effectively. Products that we are currently
demonstrating include DragonDictate (which comes closest to
being a hands-free system), NaturallySpeaking (Dragon's latest
continuous speech product), and Kurzweil's VoicePlus (reviewed
elsewhere in this issue). We plan to add additional programs as
we acquire them.

Technology users, parents, teachers and others have been taking
advantage of these demonstrations. One teacher commented, "It
was nice to be able to bring parents...to view new technology
for themselves."

Two sets of demonstrations are given each session: one at 4 pm
and one at 5 pm. The topics covered may vary depending on the
number and specific queries of attendees. No pre-registration is
required, so just show up!

For more information, contact Jane Berliss-Vincent on Tuesdays
or Thursdays at the Center, 510-841-3224.

----------
 User Interview

Mark Hendrix has been using personal computers since 1988. He
has a B.A. in film and significant experience in commercial and
computer art, including design of brochure covers, business
cards, and multimedia works. After working at two
computer-intensive jobs, he developed tendonitis in both hands,
severely limiting his computer use. However, since April he has
been doing computer work using DragonDictate on an IBM PC
computer.

How did you use computers after you developed tendonitis and
before you started to use DragonDictate?

I had to quit one of my jobs (coordinating class waiting lists
for UC Berkeley Extension). The other job (preparing reports for
an emergency attendant program) only required 4-5 hours of
computer work per month. Otherwise, I just stopped using
computers.

How long did it take until you felt Dragon Dictate was trained
to effectively recognize your voice?

It took me about one week where I used it every day for two to
three hours per day. The major problem that I encountered during
this time was a technical glitch. My CPU [the processing unit of
the computer] sits on the floor, and the microphone jack is in a
difficult-to-reach position. So that I wouldn't have to remove
the DragonDictate headset microphone every time I wanted to move
away from the computer, I bought and started using an extension
cable. Suddenly, recognition became intermittent - sometimes
fine, sometimes not working at
all. I exhausted all other possibilities until I realized the
extension cable was causing all the problems. I ditched it, and
although I now have to take the headset off every time I leave
my computer, I haven't had recognition problems since.

What do you use DragonDictate for?

Basically for business communication with the Corel WordPerfect
word processor; in most cases, I still prefer to handwrite
personal letters. It's on my to-do list to explore using Dragon
with the Excel spreadsheet program. However, I still find Dragon
somewhat intimidating. First of all, I was a Macintosh user for
eight years. Everyone says the Windows interface is just like
the Mac, but that hasn't been my experience. I taught myself all
sorts of complicated graphics programs with no problems, but now
I have to learn application programs and Dragon Dictate and how
they work together. I know I'm smart enough to learn it, but I
get boxed up and overwhelmed by what to do first, and end up
finding ways not to use the computer.

What are the strengths and weaknesses of DragonDictate?

The strength is that I can just dictate and print letters and
text, which I can't do in any other efficient or non-painful
way. The weakness is that Dragon is very complex. There's a
pop-up list that tells you what you can say at any point, but
you have to know what you're supposed to be saying to get what
you want accomplished; there's a steep learning curve. You have
to remember commands. Macros [creating shortcuts] also seem
difficult, and the manual is vast. Another problem is that it
can be hard to correct - I've sometimes gone through the
Spell Mode to correct a word only to have Dragon not accept what
I've spelled. This is fine for someone like me who can fix the
problem by typing, but not for someone with no typing ability at
all.

What should other people know when they're planning to buy
DragonDictate?

Make sure that your computer system, including your sound card,
is powerful enough to run both Dragon Dictate and additional
applications. Be patient with yourself and realize that using
Dragon is a tricky undertaking. If you can, look beyond the
company you bought the system from and create your own support
network of Dragon users.

Mark Hendrix can be reached at [log in to unmask]

----------
 Discrete vs. Continuous Speech

One of the most significant developments in speech recognition
has been the appearance of continuous speech dictation programs.
Until recently, speech recognition has required "discrete
speech." That is, the user has to pause between each utterance,
so that the software can tell when one word ends and the next
begins. This results in fairly slow, cumbersome text dictation,
can interrupt the compositional process for some people, and may
lead to voice problems due to the unnatural mode of speech.
However, discrete speech systems can be very powerful, allowing
the user to operate the computer entirely or almost entirely by
voice, navigating through the operating system, controlling any
software, and generally functioning as a replacement or near
replacement for the keyboard. Examples of this type of software
would be DragonDictate, IBM's SimplySpeaking, or Kurzweil's
Voice. As a general rule, discrete speech programs gradually
adapt to the user's voice over time, and require some practice
to master. Substantial training is required to learn to create
and maintain a voice file and operate the program properly.
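
Why the pauses matter can be shown with a toy example. This is an
illustration only, not any product's algorithm: with silence
between utterances, word boundaries fall out of a simple energy
threshold.

```python
# Toy illustration of discrete-speech segmentation: silent gaps
# between utterances let a simple energy threshold find word
# boundaries. Continuous speech offers no such gaps to exploit.

def segment_utterances(energy, threshold=0.1):
    """Split a stream of audio-energy samples at silent gaps."""
    words, current = [], []
    for sample in energy:
        if sample > threshold:
            current.append(sample)
        elif current:          # energy fell to silence: a word ended
            words.append(current)
            current = []
    if current:
        words.append(current)
    return words

# Two bursts of speech separated by silence yield two utterances:
stream = [0.0, 0.8, 0.9, 0.7, 0.0, 0.0, 0.6, 0.5, 0.0]
```

Continuous-speech products must instead model whole phrases, which
is part of why their voice models are so much more complex.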

Continuous speech programs allow the user to dictate text into
the computer in a more natural fashion, speaking at a normal
pace without the need to pause between each word. Although it is
necessary to speak clearly, and to articulate punctuation by
saying "period" or "comma", the process of using one of these
programs is much more similar to dictating into a tape recorder.
In most cases, the program operates off of a voice model built
in to the program or created by an initial training session; the
voice file does not change over time except when corrections are
made. These programs tend to be much less complex to operate,
largely because their function is much narrower.

Continuous speech programs are designed primarily for dictation
of text. They are not designed for hands-free control of the
computer. In most cases, the user dictates into the speech
program itself, and the resulting text is "cut-and-pasted" into
another application, such as a word processor or e-mail program.
Examples of this type of software would be Dragon's
NaturallySpeaking, IBM's ViaVoice, or Lernout & Hauspie's Voice
Xpress. Some software allows for dictation directly into a word
processor, and the development trend will be towards integrating
speech input programs with other software. At the moment,
however, continuous speech programs should be viewed as
dictation programs, not as a way to "operate your computer by
voice". You cannot operate a spreadsheet program with a
continuous speech dictation program, or control the operating
system, or perform research on the Internet. It is important to
be aware of this distinction. Although they may merge in the
future, at present they perform different functions and meet
different needs.

If you are able to access the computer with the standard
keyboard to some extent, use the computer primarily for word
processing, and need to rapidly enter large amounts of text, a
continuous speech program may be ideal for you. For example, a
person who can use her hands but finds extended typing
uncomfortable, such as a person with RSI or a single handed
typist, who wants to use the computer primarily for writing,
would find this software useful. On the other hand, if you need
primarily hands-free access, or if you want to use your voice to
perform computer tasks other than word processing, such as
accounting, Internet access, or the use of specialized software,
a discrete speech program will be more suitable, despite its
more awkward dictation capabilities.

Continuous speech dictation programs are certainly easier to use
than discrete speech systems, since most people find talking to
the computer in normal speaking cadences to be more comfortable
than the forced process of halting between each word. This fact
suggests that voice input product developers will concentrate
more and more on expanding the capabilities of continuous speech
systems, at the expense of discrete speech.

This raises some concerns for those people whose speech does not
fall into standard patterns. Since discrete speech systems
analyze the sound of each utterance separately, and allow for
targeted training of individual words, they may be easier to
train for people whose speech is difficult to understand. The
complex speech models used for continuous speech programs may
make them more difficult to train for people with dysarthric
speech. We are not aware of testing of continuous speech
software that would determine how serious this issue is, but it
could be that some "improvements" in speech input technology may
render it less useful for those users whose speech is
"non-standard."

----------
 Focus on Dragon Systems

(This is the first article in a series considering the features
of specific voice input software in detail. Other manufacturers'
products will be discussed in subsequent issues, as will new
releases and developments pertaining to reviewed software.)

Dragon Systems (Newton, MA, 800-825-5897) is one of the
principal developers of consumer voice input technology. Their
discrete speech product, DragonDictate, has been one of the most
widely used programs for many years. In the summer of 1997, they
were the first vendor to release a continuous speech voice input
product, NaturallySpeaking.

DragonDictate is the only completely hands-free voice input
product; it is possible to control all aspects of the computer's
operation by voice. This makes it a desirable product for those
people whose disability prevents them from using the keyboard
and mouse at all, such as those with no use of their upper
extremities.

DragonDictate is a discrete speech program, which means that the
user must pause between each utterance. Among other things, this
allows the program to distinguish between the dictated words "go
to sleep" and the command "gotosleep", which turns off the
microphone. In addition to slowing down the dictation process,
this unnatural halting, staccato speaking style can be hard on
the vocal cords.

The software comes packaged with a noise-canceling headset
microphone, which is suitable for most users, but must be
modified or replaced with a desk-mounted microphone for those
who have difficulty putting on and removing the headset. The
software requires that the user's computer be equipped with a
16-bit sound card (preferably a Creative Labs Soundblaster),
into which the microphone is plugged. DragonDictate requires at
least a 486/66 processor and 16 MB of RAM; as a practical
matter, it functions more effectively with a Pentium processor,
and, as is true with all voice input software, its performance
is noticeably improved by faster processors and more RAM.
DragonDictate comes in two versions, one with a 30,000 word
active vocabulary and one with a 60,000 word active vocabulary.

DragonDictate has two modes for receiving input, "dictate mode"
and "command mode." In dictate mode, the program assumes that
each utterance is a word to be typed; in command mode, it
assumes that each utterance is a command to be executed. For
example, in dictate mode, saying the
word "file" will cause the letters "f-i-l-e" to be typed; in
command mode, saying the word "file" will cause the File menu to
be activated (the actual keyboard entry to achieve this would be
"ALT-F"). The necessity of switching back and forth between
these modes can be initially confusing, and adds to the
complexity of the software. However, it makes it easier for the
program to accurately recognize words. In command mode, the
options are limited to the list of commands appropriate to the
software that is currently running, and the software need not be
able to distinguish between other words. This means, for
example, that when one is using the Windows calculator, the
program needs to be able to recognize numbers and arithmetic
commands like "plus" and "divided by"; it does not need to be
able to recognize the word "elephant".
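The accuracy benefit of a restricted vocabulary can be shown
with a toy sketch (the function and scores below are invented
for illustration, not Dragon's actual code): the recognizer only
has to pick the best match from the words that are currently
legal, so confusable words are never even candidates.

```python
# Toy illustration of why a restricted command vocabulary helps:
# the recognizer picks the best match only among the words that
# are legal in the current mode.

def best_match(utterance_scores, active_vocabulary):
    """Pick the highest-scoring word among those currently allowed.

    utterance_scores: dict mapping known words to acoustic scores.
    active_vocabulary: the set of words legal in the current mode.
    """
    candidates = {w: s for w, s in utterance_scores.items()
                  if w in active_vocabulary}
    return max(candidates, key=candidates.get)

# Hypothetical acoustic scores for one ambiguous utterance:
scores = {"plus": 0.48, "bus": 0.52, "elephant": 0.10}

# Dictate mode: every word competes, and "bus" narrowly wins.
print(best_match(scores, {"plus", "bus", "elephant"}))      # bus
# Command mode in the calculator: only arithmetic words are
# legal, so the same sound is correctly heard as "plus".
print(best_match(scores, {"plus", "minus", "divided by"}))  # plus
```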

When the program is operating, a menu bar appears at the top of
the screen, displaying the DragonDictate commands, a microphone
status indicator (on, off, and a recording volume level), a box
that indicates the program currently running (and the
corresponding vocabulary set the program is prepared to
recognize), and a box that displays the program's interpretation
of your last utterance. When you speak into the microphone this
latter box displays what DragonDictate "thinks you said." As you
speak, a drop-down list appears underneath the box, which
contains a number of alternative words that the program "thinks
you might have said." If the word at the top of the list, the
program's "best guess", is correct, you accept it by speaking
the next word you want to dictate. If the word at the top of the
list is not correct, but the correct word is displayed elsewhere
in the list, you can select it by speaking its number ("Choose
3"). If the word you want is not in the list at all, you begin
to spell it, using the international communication alphabet
("alpha, bravo, charlie," etc.). Usually, the desired word will
appear in the list after one or two letters have been spelled
out; the program just needs a hint, since it already knows what
the word "sounds like".
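A toy sketch (the function and word lists are hypothetical, not
Dragon's implementation) of how a spelled prefix narrows the
list: the recognizer already has a ranked set of words that
sound right, so each spelled letter simply filters that set.

```python
# Filtering a ranked candidate list by letters spelled with the
# international communication alphabet.

NATO = {"alpha": "a", "bravo": "b", "charlie": "c", "delta": "d",
        "echo": "e", "foxtrot": "f"}

def filter_by_spelling(candidates, spoken_letters):
    """Keep only candidates matching the prefix spelled so far."""
    prefix = "".join(NATO[w] for w in spoken_letters)
    return [c for c in candidates if c.startswith(prefix)]

# Acoustically ranked guesses for an unclear utterance:
guesses = ["fair", "fare", "bear", "air", "dare"]
# Spelling just "bravo" leaves only the b-word:
print(filter_by_spelling(guesses, ["bravo"]))            # ['bear']
# Two letters ("foxtrot, alpha") still leave the homophones:
print(filter_by_spelling(guesses, ["foxtrot", "alpha"]))
```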

There is also a mechanism for correcting recognition errors that
are not immediately noticed. Saying "Oops" will cause a window
to be displayed in which you can review the last few (up to 32)
words dictated, select the appropriate word, and delete or
correct it, using the same procedure. Editing existing text can
be accomplished by moving the cursor through the document by
voice, and selecting words or phrases to be modified (e.g.,
"move up three lines, copy previous six words, move down two
lines, paste").

This user interface is functional, but it poses some problems.
It is slow, requiring the user to pause between each word, watch
the display box to be sure the computer recognized the word
properly, and scan the drop-down list if it did not. It is
complex and can be confusing for the inexperienced user; even at
best, the process interrupts your train of thought while
writing. It is also a difficult interface for people with visual
difficulties, since it requires them to be able to see the
display to determine whether the computer recognized their words
correctly.

Monitoring the program's performance is important because of the
way it learns to recognize your voice. When a new user file is
created, the software takes a voice sample by having the user read
sixteen words or phrases. This provides an initial model of the
user's voice, but not a very complete or accurate one. Over the
course of the next week or so of use, the voice file is
gradually refined, as the program monitors what the user says in
relation to what the user accepts as correct. This results in
very good recognition accuracy over time, but makes it essential
to pay attention to the program's operation, since if a
recognition error occurs which is not corrected, the program
assumes that it was correct, and modifies the voice file
accordingly. If numerous mistakes are made, and not corrected,
the quality of the voice file will deteriorate, and recognition
accuracy will become poorer and poorer. This "learn-as-you-go"
function can be turned off once an accurate voice file has been
developed, but patient, thorough correction of recognition
mistakes at the beginning is critical to the software's accurate
operation. This is one reason that proper training is so
important for this product.
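A minimal sketch (invented for illustration, not Dragon's actual
algorithm) of why uncorrected errors degrade an adaptive voice
file: the model is updated with whatever the user accepts, so an
uncorrected misrecognition is recorded as a confirmed example of
the wrong word.

```python
# A crude "learn-as-you-go" model: each acoustic pattern maps to
# counts of the words it was accepted as; recognition returns the
# most frequently accepted word for that pattern.

from collections import defaultdict

class AdaptiveModel:
    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def accept(self, sound, word):
        """Record that this sound was accepted as this word."""
        self.counts[sound][word] += 1

    def recognize(self, sound):
        words = self.counts[sound]
        return max(words, key=words.get) if words else None

model = AdaptiveModel()
model.accept("rait", "right")      # corrected early: good examples
model.accept("rait", "right")
model.accept("rait", "write")      # one uncorrected error...
print(model.recognize("rait"))     # still "right"
model.accept("rait", "write")      # ...allowed to accumulate
model.accept("rait", "write")
print(model.recognize("rait"))     # now "write": accuracy has drifted
```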

This painstaking training process can be a boon for people whose
speech is difficult to understand. Although developing a voice
file may be a slow process, people whose speech is consistent
can achieve accurate recognition, even if their speech is almost
completely unintelligible to another person. Speech must be
consistent, however; the user must make the same sound for a
word each time. People whose speech is so slow that the
syllables of multisyllabic words are separately articulated have
the most difficulty; although the program can be adjusted to
some extent, if the user's speech is too slow, it will try to
interpret each syllable as a separate word.

In general, training is essential for the proper operation of
this product. It is a complex piece of software that is able to
completely control the operation of the computer, and it has a
correspondingly complicated set of commands. There are specific
procedures for correcting errors, adding words, or changing
system parameters, which must be accurately followed; for
example, you must recognize that "Choose 3" and "Select 3" will
produce different results. It is very accurate if your voice
file is properly developed and maintained, but can be hopelessly
frustrating if the voice file is allowed to become corrupt and
recognition accuracy deteriorates. It is not a piece of software
that can be learned intuitively, or operated out of the box
simply by perusing the manual, except by the most sophisticated
user. We
generally recommend between six and eighteen hours of training
for successful use of the software; at the $50-75 per hour
charged by most vendors, this expense far outweighs the minimal
($160) cost of the software itself.

Despite its cumbersome interface, slow text dictation, and the
necessity for intensive training, this program offers a powerful
option for people with disabilities. Essentially anything that
can be accomplished with the keyboard can be accomplished with
DragonDictate. It has a powerful macro generating utility, which
allows blocks of text, computer commands, or even mouse actions
to be associated with custom voice commands. It has
text-to-speech capabilities, which allow existing text to be
read aloud to the user. It incorporates a very effective mouse
control feature which, although not really suitable for graphics
work, allows for very rapid placement of the cursor. This
function, "MouseGrid," causes a numbered grid to be superimposed
on the screen. Selecting a numbered section of the grid by voice
causes that section to be further subdivided and numbered, and
the pointer to be moved into that section. Repeating this
process allows the cursor to be positioned anywhere on the
screen, usually with three or four commands. This is far
superior to other mouse position methods, which require the
mouse to be verbally "steered" ("mouse up, mouse left, mouse
right, stop, mouse down").
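The MouseGrid idea can be sketched in a few lines (a toy model
assuming a 3x3 grid numbered left-to-right, top-to-bottom; the
function name and screen sizes are illustrative, not Dragon's
code): each spoken number selects one cell, and that cell
becomes the region for the next, finer grid.

```python
# Each 3x3 pick shrinks the active region to one ninth of its
# area, so a handful of commands pinpoints any screen location.

def mouse_grid(width, height, selections):
    """Return the (x, y) centre after a series of 3x3 grid picks.

    selections: cell numbers 1-9, read left-to-right, top-to-bottom.
    """
    x, y, w, h = 0.0, 0.0, float(width), float(height)
    for n in selections:
        row, col = divmod(n - 1, 3)  # cell 1 top-left, 9 bottom-right
        w, h = w / 3, h / 3
        x, y = x + col * w, y + row * h
    return round(x + w / 2), round(y + h / 2)

# Saying "5, 1, 9" on a 1024x768 screen:
print(mouse_grid(1024, 768, [5, 1, 9]))
```

After four picks each dimension has shrunk by a factor of 81, so
on a 1024-pixel-wide screen the final region is only about a
dozen pixels across, which matches the article's observation
that three or four commands suffice.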

DragonDictate is not a program for everyone. It requires
thorough training, a systematic, detail-oriented approach to
computer operation, and patience. It is also very powerful,
allows the operation of essentially any Windows-based software
by voice, and offers true hands-free control. It can be a very
effective tool for the subgroup of computer users for whom these
features are important.

NaturallySpeaking is Dragon Systems' continuous speech product
offering. Like DragonDictate, it comes with a headset microphone
that plugs into the computer's 16-bit sound card. Its system
requirements are higher than DragonDictate's: it requires a 133
MHz Pentium processor and 32 MB of RAM. Like DragonDictate, it
does better on faster, higher memory machines (e.g., 200 MHz
Pentium, 64 MB RAM).

Unlike DragonDictate, NaturallySpeaking is a program for
dictating text; it does not enable you to control your computer
by voice. The Personal Edition of NaturallySpeaking is
essentially a voice controlled word processor. Text is dictated
into this word processor, where it can be edited with voice
commands. The text can then be cut and pasted into another
application, such as a word processor or an e-mail program, but
the other application cannot itself be controlled by
NaturallySpeaking. NaturallySpeaking cannot be used to operate
other software, only to dictate text into its own word
processor, and accordingly does not allow for general hands-free
computer operation, although NaturallySpeaking itself can be
operated hands-free.

However, it is a much more effective tool for entering text than
DragonDictate. Most obviously, it does not require the user to
pause between each word. Although it is necessary to speak
clearly, and articulate punctuation, the user can otherwise
dictate in a normal voice, at a normal speaking pace, much like
dictating into a tape recorder. There is no need to error-check
each word as it is dictated; instead, the whole block of text
can be dictated, and errors corrected in the editing process.
This is a much faster, less cognitively interrupting way to
compose text, and does not put as much strain on the voice.

NaturallySpeaking also features improved correction and editing
functions. A misrecognized word or phrase can be corrected
merely by saying "Correct [the misrecognized word or phrase]".
The mistaken entry is then selected, wherever it is in the
document, and a drop-down list appears, presenting alternatives
that can be selected by number. If none of the alternatives are
correct, the phrase can be spelled by voice using either the
international alphabet ("alpha, bravo, charlie... ") or the
regular alphabet ("a,b,c... "), although recognition of the
international alphabet is more consistent.

A similar procedure can be used for editing. Words or phrases
can be selected by voice (e.g., "Select [the word or phrase]"),
and then modified: made bold, the font changed,
centered, cut, copied or pasted. The entire document can be
selected, and voice commands can be used to switch to another
running application, such as a word processor or e-mail program,
paste in the selected text, and switch back to NaturallySpeaking.

Because NaturallySpeaking accepts continuous speech, the
distinction present in a discrete speech program between a
dictated phrase ("go to sleep") and a command utterance
("gotosleep") does not exist; both will sound the same to the
program. As a consequence, NaturallySpeaking will sometimes
interpret a command utterance as dictated text. On the
infrequent occasions that this occurs, holding down the command
key while speaking will force the program to treat the words as
a command, while holding down the shift key while speaking will
force the words to be recognized as dictated text.

The process of training the voice file is also different from
discrete speech programs. Because NaturallySpeaking requires
samples of how the user strings words together, training
consists of reading several chapters of text into the
microphone. This can take an hour or so to accomplish, after
which the computer processes the file for 15-20 minutes.
Thereafter, its recognition accuracy is quite good, and although
it will improve incrementally as corrections are made, it does
not require the extended learning period that DragonDictate does.

Because of its more intuitive interface, and because of its
narrower capabilities, NaturallySpeaking requires much less
training in operation than DragonDictate. It is simpler to
operate but it does less. This software could be operated out of
the box by a knowledgeable computer user, and should only
require a few hours' training in its operation, even for a novice.

NaturallySpeaking is an ideal piece of software for someone who
needs to enter large amounts of text into the computer without
using their hands, but who does not otherwise need hands-free
control of the computer. It would not be a good choice for
someone who needs complete hands-free control, or for someone
who needs voice control for any purpose other than entering text.

It is possible to achieve both simple, rapid text entry and
hands-free computer control by using both DragonDictate and
NaturallySpeaking together. Version 3.0 of DragonDictate is
designed to work with NaturallySpeaking and lets you use voice
commands to switch between DragonDictate and NaturallySpeaking.
Operating both programs together requires 64 MB of RAM, and a
faster processor is well advised.

Dragon Systems has recently announced more elaborate versions of
NaturallySpeaking. The Preferred Edition allows dictation
directly into the most recent version of Microsoft Word, allows
for multiple users' voice files on the same installation of the
program, includes the MouseGrid mouse control utility, and
incorporates both text-to-speech functions and digitized
playback of the user's actual dictation. A Deluxe Edition
incorporates these features, an expandable active vocabulary,
and voice macros.

----------
 Hype Hype Hooray!

Research has shown that having realistic expectations is the
biggest predictor of the success rate of voice recognition
technology use by people with disabilities. Unfortunately, the
advertisements for voice recognition products often consist of
brief paragraphs that do not provide an accurate picture of the
actual product capabilities or the time and effort required to
make the maximum use of these capabilities. Users and VR
counselors may spend a one-time set of funds on this technology
only to have it not perform as expected, leaving the user
disappointed and frustrated and the counselor unwilling or
unable to purchase additional equipment. While there is always a
level of subjective preference (what seems an unacceptably
slow speech input rate to one user may be fine for
another), a basic understanding of what voice recognition
can and cannot do will be a valuable tool in consumer
empowerment.

As discussed in this issue, there are two primary types of voice
recognition products: discrete speech and continuous speech.
Discrete speech requires the user to pause between each word.
Continuous speech products allow the user to use a normal rate
of speech but provide only text dictation.

The following are some actual examples of vendors' product
claims, followed by what we have found to be more realistic
product expectations:

"100 to 160+ words per minute with up to 98% accuracy."
(Continuous speech product) What is almost never mentioned is
that this rate and level of accuracy are only achievable after
an initial training period that can take an hour or more, and a
user commitment to correct all errors (assuming the user is able
to properly recognize and fix their errors). Voice recognition
products work by "learning" a particular user's voice and
preferences. Just as you'd train a puppy by correcting it the
first time it does something wrong, rather than waiting until it
had developed a bad habit, you need to continuously correct
mis-recognized words to eventually approach the speed and
accuracy promised by the vendor.

In addition, many factors can affect accuracy on a given day,
including background noise or having a cold. Since voice pitch
changes during the day, even the time of day the product is used
can have an effect on accuracy.

"You can start dictating immediately!" (Discrete speech product)
This isn't entirely off base, since it doesn't say, "You can
start dictating accurately immediately!" Discrete speech
products usually have much shorter initial training periods than
continuous speech products, meaning the user can begin actual
dictation within a shorter time frame. The trade-off is not only
that the user will need to speak more slowly, but also that the
user will need to spend more time training the system while they
are dictating, pausing to select a correct word from a list or
spell the word they want. Most products will let you correct
mistakes if you don't discover them immediately, but this can
involve an even more laborious process of identifying the
incorrect word and then fixing it.

"Priceless...for those who can't type." (Continuous speech
product) Since this is a continuous speech product, the implicit
assumption is that the user is still physically able to
make some use of the keyboard and/or mouse to transfer the
dictated text into a different application, move the cursor
around the screen, etc. If the user "can't type" because of
disability, this product may be useless instead of priceless.

"Context Sensitive...Monitors and catches all like-sounding
words." This would be believable if English homonyms (words that
sound alike but are spelled differently) were always different
parts of speech, like the noun "right" and the verb "write." But
consider the two following sentences, which sound identical but
mean different things: "He stood fast, defending his rights,"
and "He stood fast, defending his rites." Given the large number
of homonyms in the English language, it is inaccurate to promise
that the voice recognition technology will always make the
correct choice for the user, no matter how context-sensitive it
may be.

In contrast to some of the foregoing claims, we would like to
commend Keyboard Alternatives, a vendor based in Santa Rosa.
Their catalog description of DragonDictate gives a more
realistic, balanced view of what can be expected from the
product. While extolling the many real advantages of this
program, the ad points out that the user should not expect to be
able to operate the computer as fast as a two-handed typist, and
that learning and mastering the program requires serious
commitment and effort.

Because of the length of training necessary, it is not usually
possible for you to try out a continuous speech product before
buying, and a brief trial run will not give you an accurate
picture of what the product will be like when it is fully
trained. However, we recommend that you find
out as much about the product as possible. Go to a demonstration
at your local dealer (or here at the Center for Accessible
Technology) and ask questions. Don't be satisfied with the
phrases the demonstrator is using; bring your own sample text
and ask him or her to read it. If possible, see how it works
with applications you will need to use. If your workspace tends
to be noisy, see how the product works when extraneous noise is
present. Note how diligently the demonstrator corrects mistakes.
Once you have a realistic picture of what voice recognition
technology can do, you will be able to make a better decision as
to whether it should be any part of a solution for your computer
access needs.

----------
End of Document
