VICUG-L Archives

Visually Impaired Computer Users' Group List

VICUG-L@LISTSERV.ICORS.ORG

Date: Fri, 10 Dec 2004 20:25:57 -0500
 >     AP Worldstream
 >Monday, November 29, 2004
 >
 >Synthesizing human emotions
 >
 >By Michael Stroh, Sun Staff
 >
 >Speech: Melding acoustics, psychology and linguistics, researchers teach
 >computers to laugh and sigh, express joy and anger.
 >
 >Shiva Sundaram spends his days listening to his computer laugh at him.
 >Someday, you may know how it feels.
 >
 >The University of Southern California engineer is one of a growing number
 >of researchers trying to crack the next barrier in computer speech
 >synthesis: emotion. In labs around the world, computers are starting to
 >laugh and sigh, express joy and anger, and even hesitate with natural ums
 >and ahs.
 >
 >Called expressive speech synthesis, "it's the hot area" in the field today,
 >says Ellen Eide of IBM's T.J. Watson Research Center in Yorktown Heights,
 >N.Y., which plans to introduce a version of its commercial speech
 >synthesizer that incorporates the new technology.
 >
 >It is also one of the hardest problems to solve, says Sundaram, who has
 >spent months tweaking his laugh synthesizer. And the sound? Mirthful, but
 >still machine-made.
 >
 >"Laughter," he says, "is a very, very complex process."
 >
 >The quest for expressive speech synthesis - melding acoustics, psychology,
 >linguistics and computer science - is driven primarily by a grim fact of
 >electronic life: The computers that millions of us talk to every day as we
 >look up phone numbers, check portfolio balances or book airline flights
 >might be convenient but, boy, can they be annoying.
 >
 >Commercial voice synthesizers speak in the same perpetually upbeat tone
 >whether they're announcing the time of day or telling you that your
 >retirement account has just tanked. David Nahamoo, overseer of voice
 >synthesis research at IBM, says businesses are concerned that as the
 >technology spreads, customers will be turned off. "We all go crazy when we
 >get some chipper voice telling us bad news," he says.
 >
 >And so, in the coming months, IBM plans to roll out a new commercial speech
 >synthesizer that feels your pain. The Expressive Text-to-Speech Engine took
 >two years to develop and is designed to strike the appropriate tone when
 >delivering good and bad news.
 >
 >The goal, says Nahamoo, is "to really show there is some sort of feeling
 >there." To make it sound more natural, the system is also capable of
 >clearing its throat, coughing and pausing for a breath.
 >
 >Scientist Juergen Schroeter, who oversees speech synthesis research at AT&T
 >Labs, says his organization wants not only to generate emotional speech but
 >to detect it, too.
 >
 >"Everybody wants to be able to recognize anger and frustration
 >automatically," says Julia Hirschberg, a former AT&T researcher now at
 >Columbia University in New York.
 >
 >For example, an automated system that senses stress or anger in a caller's
 >voice could automatically transfer a customer to a human for help, she
 >says.
 >The technology also could power a smart voice mail system that prioritizes
 >messages based on how urgent they sound.
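
As a rough sketch of how such detection could be wired into a phone
system (the frustration scores, threshold, and message format below are
assumptions for illustration, not any vendor's actual interface):

# Hedged sketch of the routing idea above: given a frustration score
# from some emotion detector (hypothetical here), escalate angry
# callers to a human and rank voicemail messages by urgency.

def route_call(frustration, threshold=0.7):
    """Send callers whose detected frustration crosses a threshold
    to a human agent; keep the rest in the automated system."""
    return "human_agent" if frustration >= threshold else "automated_menu"

def prioritize_voicemail(messages):
    """Sort (caller, urgency_score) pairs, most urgent first. The
    urgency score is assumed to come from an emotion detector."""
    return sorted(messages, key=lambda m: m[1], reverse=True)

# Example: a caller scored 0.85 gets escalated immediately.
print(route_call(0.85))                                  # human_agent
print(prioritize_voicemail([("Alice", 0.2), ("Bob", 0.9)]))
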
 >
 >Hirschberg is developing tutoring software that can recognize frustration
 >and stress in a student's voice and react by adopting a more soothing tone
 >or by restating a problem. "Sometimes, just by addressing the emotion, it
 >makes people feel better," says Hirschberg, who is collaborating with
 >researchers at the University of Pittsburgh.
 >
 >So, how do you make a machine sound emotional?
 >
 >Nick Campbell, a speech synthesis researcher at the Advanced
 >Telecommunications Research Institute in Kyoto, Japan, says it first helps
 >to understand how the speech synthesis technology most people encounter
 >today is created.
 >
 >The technique, known as "concatenative synthesis," works like this:
 >Engineers hire human actors to read into a microphone for several hours.
 >Then they dice the recording into short segments. Measured in
 >milliseconds, each segment is often barely the length of a single vowel.
 >
 >When it's time to talk, the computer picks through this audio database for
 >the right vocal elements and stitches them together, digitally smoothing
 >any rough transitions.
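
To make the dice-and-stitch process concrete, here is a minimal Python
sketch of concatenative synthesis; the two-entry segment table, the
waveforms, and the crossfade length are invented stand-ins for a real
engine's database of thousands of recorded units:

# Toy concatenative synthesis: look up prerecorded segments for each
# target sound, then cross-fade the joins to smooth transitions.
import numpy as np

RATE = 16000  # samples per second

# Pretend database: one candidate waveform per phoneme. Real engines
# store thousands of candidates and choose by acoustic-match cost.
segments = {
    "h": 0.05 * np.random.randn(RATE // 20),                      # noise burst
    "a": np.sin(2 * np.pi * 220 * np.arange(RATE // 10) / RATE),  # voiced vowel
}

def crossfade(a, b, overlap=160):
    """Join two segments, linearly fading across the overlap region
    (the 'digital smoothing' the article mentions)."""
    fade = np.linspace(0.0, 1.0, overlap)
    middle = a[-overlap:] * (1 - fade) + b[:overlap] * fade
    return np.concatenate([a[:-overlap], middle, b[overlap:]])

def synthesize(phonemes):
    """Stitch database segments together in the requested order."""
    out = segments[phonemes[0]]
    for p in phonemes[1:]:
        out = crossfade(out, segments[p])
    return out

audio = synthesize(["h", "a"])  # a crude "ha"
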
 >
 >Commercialized in the 1990s, concatenative synthesis has greatly improved
 >the quality of computer speech, says Campbell. And some companies, such as
 >IBM, are going back to the studio and creating new databases of emotional
 >speech from which to work.
 >
 >But not Campbell.
 >
 >"We wanted real happiness, real fear, real anger, not an actor in the
 >studio," he says.
 >
 >So, under a government-funded project, he has spent the past four years
 >recording Japanese volunteers as they go about their daily lives.
 >
 >"It's like people donating their organs to science," he says.
 >
 >His audio archive, with about 5,000 hours of recorded speech, holds samples
 >of subjects experiencing everything from earthquakes to childbirth, from
 >arguments to friendly phone chat. The next step will be using those sounds
 >in a software-based concatenative speech engine.
 >
 >If he succeeds, the first customers are likely to be Japanese auto and toy
 >makers, who want to make their cars, robots and other gadgets more
 >expressive. As Campbell puts it, "Instead of saying, 'You've exceeded the
 >speed limit,' they want the car to go, 'Oy! Watch it!'"
 >
 >Some researchers, though, don't want to depend on real speech. Instead,
 >they want to create expressive speech from scratch using mathematical
 >models.
 >That's the approach Sundaram uses for his laugh synthesizer, which made its
 >debut this month at the annual meeting of the Acoustical Society of America
 >in San Diego.
 >
 >Sundaram started by recording the giggles and guffaws of colleagues. When
 >he ran them through his computer to see the sound waves represented
 >graphically, he noticed that the sound waves trailed off as the person's
 >lungs ran out of air. It reminded him of how a weight behaves as it bounces
 >to a stop on the end of a spring. Sundaram adopted the mathematical
 >equations that explain that action for his laugh synthesizer.
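
The spring analogy corresponds to a damped oscillation, whose amplitude
decays exponentially over time. Here is a minimal Python sketch of that
envelope applied to a voiced tone; the decay rate and pitch are invented
for illustration and are not Sundaram's actual model:

# Damped-spring idea: like a weight settling on a spring, amplitude
# follows x(t) = A * exp(-d*t) * cos(w*t), so the sound trails off
# the way a laugh does as the lungs empty.
import numpy as np

RATE = 16000  # samples per second

def damped_burst(duration=1.0, pitch=180.0, decay=3.0):
    """One exhalation: a voiced tone whose loudness decays
    exponentially, analogous to damped spring motion."""
    t = np.arange(int(duration * RATE)) / RATE
    return np.exp(-decay * t) * np.cos(2 * np.pi * pitch * t)

burst = damped_burst()  # one second of sound fading toward silence
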
 >
 >But Sundaram and others know that synthesizing emotional speech is only
 >part of the challenge. Yet another is determining when and how to use it.
 >
 >"You would not like to be embarrassing," says Jurgen Trouvain, a linguist
at
 >Saarland University in Germany who is working on laughter synthesis.
 >
 >Researchers are turning to psychology for clues. Robert R. Provine, a
 >psychologist at the University of Maryland, Baltimore County, who pioneered
 >modern laughter research, says the truth is sometimes counterintuitive.
 >
 >In one experiment, Provine and his students listened in on discussions to
 >find out when people laughed. The big surprise?
 >
 >"Only 10 to 15 percent of laughter followed something that's remotely
 >jokey," says Provine, who summarized his findings in his book Laughter: A
 >Scientific Investigation.
 >
 >The one-liners that elicited the most laughter were phrases such as "I see
 >your point" or "I think I'm done" or "I'll see you guys later." Provine
 >argues that laughter is an unconscious reaction that has more to do with
 >smoothing relationships than with stand-up comedy.
 >
 >Provine recorded 51 samples of natural laughter and studied them with a
 >sound spectrograph. He found that a typical laugh is composed of expelled
 >breaths chopped into short, vowel-like "laugh notes": ha, ho and he.
 >
 >Each laugh note lasted about one-fifteenth of a second, and the notes were
 >spaced one-fifth of a second apart.
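
Those figures map directly onto synthesis parameters. Below is a
self-contained Python sketch that strings vowel-like notes together
using Provine's timings; the pitch and decay constants are invented for
illustration:

# Turning Provine's measurements into sound: vowel-like notes of
# about 1/15 second, with onsets spaced 1/5 second apart.
import numpy as np

RATE = 16000  # samples per second

def laugh_note(note_len=1/15, pitch=180.0, decay=20.0):
    """One 'ha': a short decaying tone, per Provine's note length."""
    t = np.arange(int(note_len * RATE)) / RATE
    return np.exp(-decay * t) * np.sin(2 * np.pi * pitch * t)

def laugh(num_notes=5, spacing=1/5):
    """Place laugh notes one-fifth of a second apart."""
    note = laugh_note()
    total = int((num_notes - 1) * spacing * RATE) + len(note)
    out = np.zeros(total)
    for i in range(num_notes):
        start = int(i * spacing * RATE)
        out[start:start + len(note)] += note
    return out

ha_ha = laugh()  # five notes: ha ha ha ha ha
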
 >
 >In 2001, psychologists Jo-Anne Bachorowski of Vanderbilt University and
 >Michael Owren of Cornell found more surprises when they recorded 1,024
 >laughter episodes from college students watching the films Monty Python and
 >the Holy Grail and When Harry Met Sally.
 >
 >Men tended to grunt and snort, while women generated more songlike
 >laughter.
 >When some subjects cracked up, they hit pitches in excess of 1,000 hertz,
 >roughly high C for a soprano. And those were just the men.
 >
 >Even if scientists can make machines laugh, the larger question is how
 >humans will react to machines capable of mirth and other emotions.
 >
 >"Laughter is such a powerful signal that you need to be cautious about its
 >use," says Provine. "It's fun to laugh with your friends, but I don't think
 >I'd like to have a machine laughing at me."
 >
 >
 >--------------------------------------------------------------------------------
 >
 >To hear clips of synthesized laughter and speech, visit
 >www.baltimoresun.com/computer
 >
 >The first computer speech synthesizer was created in the late 1960s by
 >Japanese researchers. AT&T wasn't far behind. To hear how the technology
 >sounded in its infancy, visit
 >http://sal.shs.arizona.edu/~asaspeechcom/PartD.html
 >
 >Today's most natural sounding speech synthesizers are created using a
 >technique called "concatenative synthesis," which starts with a prerecorded
 >human voice that is chopped up into short segments and reassembled to form
 >speech. To hear an example of what today's speech synthesizers can do, all
 >you need to do is dial 411. Or visit this AT&T demo for its commercial
 >speech synthesizer: http://www.naturalvoices.com/demos/
 >
 >Many researchers are now working on the next wave of voice technology,
 >called expressive speech synthesis. Their goal: to make machines that can
 >sound emotional. In the coming months, IBM will roll out a new expressive
 >speech technology. To hear an early demo, visit
 >http://www.research.ibm.com/tts/
 >
 >For general information on speech synthesis research, visit
 >http://www.aaai.org/AITopics/html/speech.html
 >
 >Copyright © 2004, The Baltimore Sun
 >
 >http://www.baltimoresun.com/news/health/bal-te.voice29nov29,1,550833.story?coll=bal-news-nation


VICUG-L is the Visually Impaired Computer User Group List.
To join or leave the list, send a message to
[log in to unmask]  In the body of the message, simply type
"subscribe vicug-l" or "unsubscribe vicug-l" without the quotations.
 VICUG-L is archived on the World Wide Web at
http://maelstrom.stjohns.edu/archives/vicug-l.html

