from the New York Times
January 19, 1998
Voice Technology Appears Ready
To Recognize the Bottom Line
By DYLAN LOEB McCLAIN
When Charles Schwab, the brokerage company, decided late last year
to switch to an automated phone system for taking customer calls,
it put its trust in speech-recognition technology, which it would
have spurned a few years earlier.
Credit: Don Hogan Charles / The New York Times
At Bear Stearns's trading floor, equity stocks trader Raymond Mazzo
uses a voice activated system to control his transactions.
______________________________________________________________
Schwab was skeptical at first, because the technology, long hyped,
had often proved disappointing. "We knew American Express had been
playing with it for about 10 years and hadn't really been able to
make a go of it," said Alan Nathan, head of new products.
After a pilot program drew favorable customer response, though,
Schwab went ahead. In October, the system began handling calls in
California, Oregon, Colorado and Washington, executing mutual fund
trades and providing stock quotes.
"We thought if we could get this to do quotes, hallelujah," Nathan
said. "We did not set out as an initial goal that we wanted to do
trading. It was better than we expected."
Many businesses express similar sentiments. After decades of
unfulfilled promises, speech-recognition technology may finally be
coming of age, moving beyond directory assistance and primitive
dictation software.
Within the last year, breakthroughs in programming and faster
computers have persuaded some companies to begin using it, while
others are investing significant amounts of time and money testing
it in the most grueling conditions, like on the trading floors of
brokerage firms and exchanges.
Experts in the field caution that there is a long way to go. While
speech recognition -- recognizing the words, not the speaker -- has
been widely available since AT&T started using it for its
long-distance service almost a decade ago, the software will have
to get better, and the hardware smaller and cheaper, before people
start having conversations with their computers, their cars or
their toaster ovens. Still, the technology now appears robust
enough to support large-scale development and business
opportunities. Consider the following:
* In September, Microsoft invested $45 million in Lernout & Hauspie,
which develops and licenses speech technology. Gaston Bastiaens,
Lernout's chief executive, said the company was working with
Microsoft to integrate the technology into Microsoft's operating
systems. Already Microsoft has given its Windows NT 5.0 system the
ability to read text out loud. (The plans for the latest version
of the operating system, Windows 98, do not, however, include
speech recognition.)
* IBM has created a division with 200 employees to work on speech
recognition. Its technology, called Viavoice, has been licensed to
software developers, like Edmark Corp. and Syracuse Language
Systems, and sold to consumers as dictation software.
* Lucent Technologies has developed a phone system that simply asks
what you want, rather than asking you to choose among various options.
Within two months it promises a software kit for creating desktop
computer speech applications.
* Ficomp Systems has developed a voice-controlled system for Bear,
Stearns that allows traders to record orders and check prices. The
two companies signed an agreement last month under which Bear,
Stearns will market the system to its 2,000 independent brokers.
Other concerns are also using or testing the system, including the
Chicago Mercantile Exchange, which has tried it out in four of its
trading pits over the last year.
* Wildfire Communications, a 6-year-old company that has received
financial backing from Microsoft and others, has created a
personal electronic assistant that listens in on phone calls and
becomes active when the user calls out its name. As in "Wildfire,
what is my next appointment?"
Pacific Bell Mobile Services, a unit of SBC Communications, has
just signed an agreement to offer Wildfire's telephone assistant
to its customers in the next three months, at a likely cost of $7
to $10 a month.
This research and development is finding a receptive audience among
businesses and consumers. Brian Lewis, editor of Speech Technology
magazine, estimated that sales of speech-recognition technology
totaled about $500 million last year, a figure he predicted would
double by 2001.
Why this sudden flurry of activity? What has changed?
From a technological point of view, analysts cite two catalysts.
One was the introduction of Intel's Pentium chip in 1994, which
finally gave computers enough processing power to run
speech-recognition software quickly. The other was the introduction
of a general-purpose dictation program by Dragon Systems last June
that was better at deciphering conversational speech.
Credit: Edmark
This child is using Edmark's "Let's Go Read! An Island Adventure."
Children develop beginning reading skills, comprehension, and
vocabulary while interacting with characters. The program comes with
IBM Speech Recognition Technology so that the computer can listen,
interpret and respond when the child reads out loud.
______________________________________________________________
Speech-recognition systems work by converting sounds into
mathematical models. Every word is made up of sounds with distinct
frequencies, and when a system detects those sounds, it translates
them into numeric equivalents. It then compares this information
with a database of models, looking for a match. The newest systems
process normal conversational speech, which involves analyzing an
exponentially greater number of sounds than was possible a year
ago.
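
The matching step described above can be illustrated with a toy sketch.
It is not how any product named in this article works; commercial
recognizers rely on far more elaborate statistical models. Everything
here is an assumption made for illustration: the sample rate, the probe
frequencies, the three-word vocabulary and the function names are all
hypothetical. The sketch turns a sound into a short list of numbers (the
energy near a few frequencies) and picks the stored model closest to it.

```python
import math

SAMPLE_RATE = 8000                          # assumed, roughly telephone quality
PROBE_FREQS = [300, 600, 900, 1200, 1500]   # hypothetical analysis bands, in Hz


def synth(freqs, duration=0.05):
    """Generate a toy 'sound' as a sum of sine waves at the given frequencies."""
    n = int(SAMPLE_RATE * duration)
    return [sum(math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f in freqs)
            for t in range(n)]


def features(samples):
    """Translate a sound into numbers: relative energy near each probe frequency."""
    vec = []
    for f in PROBE_FREQS:
        re = sum(s * math.cos(2 * math.pi * f * t / SAMPLE_RATE)
                 for t, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * f * t / SAMPLE_RATE)
                 for t, s in enumerate(samples))
        vec.append(math.hypot(re, im))
    total = sum(vec) or 1.0                 # avoid dividing by zero on silence
    return [v / total for v in vec]


def recognize(samples, models):
    """Return the label of the stored model closest to the input sound."""
    vec = features(samples)
    return min(models, key=lambda label: math.dist(vec, models[label]))


# Hypothetical "database of models": one feature vector per word.
MODELS = {
    "quote": features(synth([300, 900])),
    "trade": features(synth([600, 1500])),
    "help": features(synth([1200])),
}

# A slightly off-pitch utterance should still match its nearest model.
heard = synth([310, 890])
print(recognize(heard, MODELS))             # prints: quote
```

Nearest-model matching tolerates small variations in how a word is
spoken, but conversational speech requires comparing against far more
models, which is one reason faster processors mattered.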
One factor in the rise of speech recognition is that costs have
come down. Wildfire's service, for example, cost $150 to $200 a
month per person when it was introduced three years ago because of
its dependence on expensive hardware. "Forty percent of the cost of
the system was the recognizer hardware," said Leslie Anderson, a
Wildfire spokesman. "We couldn't have more than 12 simultaneous
users on one system. Now that we have the recognizer in the
software, we can put several thousand people on one Pentium chip."
Competition has also driven costs down. When Dragon Systems
introduced its software called Naturally Speaking, it cost $700.
But after IBM started selling its Viavoice program for $99 in
August, the price of Dragon Systems' program began dropping.
Sunday, it matched IBM's price.
For some, the technology's ability to pay its way has been another
selling point. Most companies say they use speech recognition to
serve customers better, but some save money in the process. Schwab,
for example, estimates that its system handles as many calls as 300
operators would.
Even with all this interest, though, companies developing
speech-recognition technology are not necessarily cashing in.
Ficomp's product has been on the market for about 18 months, but
sales have been slow. Stephen Coryell, the company's head of
technology, said the obstacles included skepticism among traders
and the price of the system, which runs from $4,000 to $8,000 for
each trader's station.
"They come in and say, 'Gee, that looks great, but I'm not too sure
we want to go to the expense of it,"' Coryell said.
Nuance Communications, which developed Schwab's system as well as a
similar one that helps United Parcel Service customers track their
packages, is a privately held company with 55 employees that has
yet to turn a profit, said Steve Ehrlich, head of marketing.
William Osbourne, head of IBM's speech division, said that over the
years the company had invested more than $50 million developing
speech technology. It is only in the last few months, though, that
the investment has started paying off, although he would not say
how much the company has made.
Despite the recent developments in speech recognition and the
enthusiasm surrounding them, analysts and experts say the technology
still needs to get better. Even the best systems are accurate only about 95 percent of
the time. On a call to demonstrate its pilot program with USAA
Bank, for example, Lucent's system transferred the caller to
account services when directions to the bank were requested.
(Vendors are quick to point out that people do not get it right 100
percent of the time either.)
"These are first-generation products," said Jackie Fenn, an analyst
at the Gartner Group. "They will need some improvements."
Bastiaens of Lernout said the market for voice technology is just
now developing.
"When you see it becoming part of the lives of the average Joe, it
will start to move," he said.
How quickly the technology improves and is widely adopted depends
in large part on what Microsoft decides to do. If it makes a big
commitment to integrating speech recognition into its operating
systems, it would rapidly accelerate the adoption of speech in
computers and other types of machines. Right now, Microsoft
is following a go-slow approach, said Xuedong Huang, a senior
researcher at the company.
"We want to make computers so that you can really have a
conversation," he said. "It's going to take a long time -- more
than five years. Unfortunately, what people want is a real person.
There is a gap between what people want and what we can do."
Copyright 1998 The New York Times Company