Inspired by descriptions of new technology in the recent post about Web
Speak and the Productivity Works, I was soon surfing www.labyrinten.se.
This Swedish company has a beta copy of the DAISY digital book software
and www.daisy.org reports that a portable player based on CD technology
called the Plextalk is being manufactured. So far, I have not been able
to locate any material to read, but titles apparently are available in
Japan.
Here is a description of this technology which may someday enable us to
download talking books over the Internet.
--- from www.daisy.org ---
DAISY - Digital Talking Book System
A system presentation
This material presents a radically new way to record, store,
distribute and read talking books for print impaired individuals, i.e.
people who have difficulties reading printed information.
The system is built upon digital technology, using standard personal
computers and components as its hardware platform. The system is based
on a general concept called "Digital Audio-based Information System",
or "DAISY" for short.
The new talking book system is being developed by a project funded by
the DAISY Consortium, an international group of talking book producers
such as libraries and institutions. Software and hardware development
is carried out in collaboration with several commercial partners.
Definition
The term "talking book" traditionally refers to a recording of a human
narrator’s voice, reading out printed information from a book or
other printed publication. Talking books are normally produced under
special agreements with the publishers, and are only intended to be
read by blind or other print impaired individuals. The typical talking
book is lent to the readers, not given away or sold. The user (the
reader of the talking book) gains access to the printed information by
listening to the recording, using a suitable device and means to
control the playback.
A talking book is in this regard not the same thing as a "book on
tape", which is a recording of a book produced and distributed
commercially by a publisher. These books can be bought and read by
anyone.
There are also "electronic books" these days. These are not narrated,
but consist of electronic text that can be read by using e.g. a
personal computer.
The main design goal
While developing the data structure and data handling methods for
DAISY, some of the main objectives have been:
To make the system capable of conveying the printed material’s
logical structure to the talking book reader, and to allow the user to
use that structure to navigate and read the material. This means that
it should be possible to organise and structure the data representing
the recording of the book so that it fully resembles the original.
To give the reader the same - or better - speed and flexibility in
accessing the book’s information as a fully able user has when
reading a printed book. This means fast, arbitrary access to any part
of the recording.
The system should be capable of storing the talking book data
efficiently. The goal should be that any talking book recording (even
with a length of up to 50 hours or more) could be stored and
distributed on a single mass storage medium, such as a CD-ROM disc. To
achieve this, audio data compression technology must be used.
The system should be independent of distribution media to be able to
adapt to new distribution and storage technology as it develops.
Furthermore, the system should not be fixed to any particular method
for digital audio data compression, since these technologies will also
be evolving in the future. In short, the system must be "future-safe",
since precious recording efforts must be preserved, even though the
field of digital technology is far from mature yet.
Structured audio
The key to meeting the design goals lies in the system’s ability to
intelligently and automatically divide the voice data representing the
talking book recording into manageable blocks of speech, each
representing a small, separate unit of information in the printed
original.
To carry out this task, the system uses an advanced voice analysis
technology while recording the narrator’s voice. The incoming
stream of digitised voice data is in this process broken down into
segments, based on the flow of the speech. For example, when the
narrator makes a slight pause to indicate the end of a sentence, the
recording system can use this pause to identify a unit, which is
referred to as a "phrase".
Even though a recorded phrase often corresponds to a sentence of text,
this does not need to be the case. The recording system can be set up
to detect anything from single words to whole paragraphs as phrases.
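
As an illustration only, the pause-based division described above can
be sketched as follows. This is not the DAISY recording software's
actual voice analysis, which the text does not specify; the function
name, the amplitude threshold and the pause length are all assumptions
made for the example. Samples are taken to be amplitudes between -1
and 1.

```python
def split_into_phrases(samples, silence_threshold=0.05, min_pause=3):
    """Return (start, end) index pairs of phrases; end is exclusive.

    A run of at least `min_pause` consecutive samples whose absolute
    value is below `silence_threshold` counts as a pause between phrases.
    Both parameters are hypothetical tuning knobs.
    """
    phrases = []
    start = None      # index where the current phrase began
    quiet_run = 0     # length of the current run of quiet samples
    for i, s in enumerate(samples):
        if abs(s) >= silence_threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_pause:
                # The phrase ends where the quiet run began.
                phrases.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        # Close the last phrase, dropping any trailing quiet samples.
        phrases.append((start, len(samples) - quiet_run))
    return phrases
```

Lowering min_pause makes shorter pauses count as boundaries, so the
same recording yields smaller units; raising it yields larger ones.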
The structuring process is carried out automatically during the
recording of the talking book. The playback equipment does not need to
do any such processing, but can instead make use of the structure to
allow structured access to the talking book material.
The concept of the "phrase" is central to the system, since it is the
smallest informational unit the user can access. The system’s
automatic phrase division capability is the main structuring principle
for the talking book. The digital talking book is not stored as just a
stream of digitised voice data, but as a database of small data
objects. This concept is what makes the DAISY system unique, and it is
also what makes DAISY so well suited for reading of talking books.
Structured digital audio technology offers new and efficient methods
for recording and editing in the production stage of a book.
Structured audio can be stored or distributed efficiently, either on
a mass storage medium or via network transmission. Most of all,
structured audio means that the reader can access the information in a
truly efficient way. With this technology, the talking book can become a
modern information tool.
Structured access
The DAISY book is normally structured to resemble the printed original
as closely as possible. The main navigational index is typically based
on the book’s Table of Contents (ToC), containing a number of
sections such as chapters, sub-chapters etc. The phrases that contain
the narrated speech in the "audio database" are organised into
sections that correspond to the different entries in the ToC. This
structuring is done as a part of the recording process, and means that
the reader can easily and quickly navigate the book by moving from
section to section in the ToC.
As the book’s Table of Contents is navigated, the headings are
announced by the system playing the relevant audio - i.e., the phrases
that correspond to the particular section heading. The reader is in
other words moving through a "talking table of contents". As headings
are announced, the playback position is also moved to the
corresponding location in the recording. When the user starts the
playback, the narration begins from the selected section.
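
A minimal sketch of such a talking Table of Contents, assuming each
section simply records the index of the phrase that narrates its
heading. The class and field names are hypothetical and do not come
from the DAISY data format specification.

```python
from dataclasses import dataclass


@dataclass
class Section:
    title: str         # heading text (hypothetical field)
    level: int         # 1 = chapter, 2 = sub-chapter, and so on
    first_phrase: int  # index of the phrase that narrates the heading


class TalkingToc:
    def __init__(self, sections):
        self.sections = sections
        self.current = 0
        self.position = sections[0].first_phrase

    def move(self, step):
        """Move `step` entries through the ToC, clamped to its ends.

        Returns the heading phrase to announce; the playback position
        is moved to the same phrase.
        """
        last = len(self.sections) - 1
        self.current = max(0, min(last, self.current + step))
        self.position = self.sections[self.current].first_phrase
        return self.position
```

Starting playback after a move would then simply begin at
toc.position.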
Structured access does not stop with sections and phrases, though.
Each phrase of the audio database has a unique identity which defines
both its sequential placement in the talking book recording and also
which section it belongs to. A phrase can also have other attributes,
e.g. to identify it to be the first phrase on a new page, a phrase
with a link to another phrase, a phrase with a footnote and so on. By
making use of these attributes, the talking book material can be
navigated and read in many and powerful ways. Talking book access in
the future can therefore be similar to hypermedia or multimedia
access.
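
One way to picture a phrase record with attributes is sketched below.
The field names and the attribute strings ("page_start", "footnote")
are invented for the example; the real layout is defined by the DAISY
data format specification, which is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Phrase:
    seq: int                       # sequential placement in the recording
    section: int                   # index of the section it belongs to
    attrs: frozenset = frozenset() # e.g. {"page_start"}, {"footnote"}


def with_attr(phrases, attr):
    """Return the phrases carrying `attr`, in reading order."""
    return sorted((p for p in phrases if attr in p.attrs),
                  key=lambda p: p.seq)
```

Filtering on an attribute gives a secondary index into the book, for
example a list of all page starts or all footnotes.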
The transition to DAISY
A "tape transfer" module in the DAISY recording system allows old
talking book material - stored on master tapes - to be transferred and
converted to the new digital format. The transfer process is highly
automated and can currently work at twice the normal playback speed.
In the transfer process, the same voice analysis methods are used as
in the ordinary ("live") recording process. The audio information is
automatically converted to DAISY format as it is transferred to the
recording software. To further automate the transfer process, standard
index tones on the tape can be identified and used to automatically
break the material up into sections.
After the transfer, the operator can use the system’s editing
tools to clean up the material, insert text for the section headings,
adjust the hierarchy of the talking book’s Table of Contents
etc. Page breaks and other attributes can also be defined as a part of
the editing work. The editing is efficient and flexible with full
audio feedback.
The recording system also has the ability to create analogue tape
versions of a DAISY book by means of a "Tape Manager" module. This
way, a producer will be able to provide books in both DAISY and
analogue tape format at a reasonable cost. This ability will be useful
during the transition from the traditional talking book to the DAISY
book.
Based around an open standard
The DAISY talking book system is built around the DAISY Data Format,
an open storage format specification for digital audio-based
applications, published by the DAISY Consortium.
The DAISY data format specification is available to anyone interested
in making their own implementation of it, e.g. to create their own
recording software or DAISY playback device.
The DAISY data format is suggested by the DAISY Consortium to become a
commonly accepted standard for digital talking books and structured
audio management amongst producers and libraries worldwide. This would
secure interlending of digital talking books and also help to reduce
costs for development of new systems.
The DAISY data format is advanced enough for making highly complex
talking book material, and also for creating new kinds of electronic
publications making use of digital audio. The format specification
allows for other data types - such as text, graphics etc. - to be
stored together with the voice data, linked together on the phrase
level for synchronised presentation.
Even though the data format is advanced, it is also well suited for
storing materials of less complex nature than textbooks, cookery books
etc. All kinds of publications - ranging from poetry or leisure
literature to religious texts - can actually benefit from using the
DAISY format. The readers will use one type of access device and
reading methods for all kinds of books, even though the structure and
complexity of the material may vary.
Standard PC technology
The current talking book system developed by the DAISY Consortium
consists of software running on standard IBM-compatible PCs with
multimedia capabilities (CD-ROM and audio input/output hardware). The
operating systems used are Microsoft Windows 95 and Windows NT. The PC
offers a generally available, low cost, open hardware platform, well
suited for running both DAISY recording and playback software.
As the system is based on the DAISY Data Format, the DAISY
Consortium’s system is just one of many possible implementations.
Platforms other than the PC might be introduced for recording and/or
playing back DAISY books.
The system uses CD-ROM technology today, since it is currently the
most cost-effective medium for storage of large amounts of data. Other
types of media may come into the picture in the future, as they become
commercially or practically attractive.
As the DAISY data format as such is independent of storage medium, it
is fully up to each producer to choose the storage/distribution medium
or distribution channel. However, to allow for interlending between
libraries, a common medium is of great help, and so the DAISY
Consortium currently recommends CD-ROM technology.
If and when a new storage medium or distribution channel is introduced
amongst talking book producers, there will be a need for a general
acceptance of the new technology. Devices used for talking book
reading must be equipped with appropriate hardware to be able to
access the new medium.
Efficient audio storage
By performing sophisticated audio data compression, large amounts of
voice data may be stored on a single medium. As an example, a single
CD-ROM can hold up to 50 hours of recorded narration, allowing for
large books to be stored conveniently on a single disk. Even at 20-30
hours per CD, the sound quality of the voice is very good - as good as
or better than a good analogue tape recording. The audio quality may
be further increased for books with shorter playback length.
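
To get a feel for these figures, the average bit rate implied by a
given playing time can be estimated. The sketch below assumes a CD-ROM
capacity of 650 MB, which is not stated in the text, and ignores file
system overhead and any non-audio data on the disc.

```python
def average_bitrate_kbps(capacity_mb, hours):
    """Average bit rate in kbit/s that fills `capacity_mb` in `hours`."""
    bits = capacity_mb * 1024 * 1024 * 8
    return bits / (hours * 3600) / 1000


# 650 MB is an assumed capacity, not a figure from the text.
print(round(average_bitrate_kbps(650, 50), 1))  # 50 hours per disc
print(round(average_bitrate_kbps(650, 25), 1))  # 25 hours per disc
```

Under these assumptions, 50 hours per disc corresponds to roughly
30 kbit/s and 25 hours to roughly 60 kbit/s, which is why halving the
playing time leaves room for noticeably better audio quality.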
If the producer so wishes, it is fully possible to store more than one
publication on each medium. As even more compact audio storage will be
made available in the future, this will allow for a whole collection
of titles to be stored on a single medium.
The DAISY data format is not dependent on a particular audio data
compression technology, since it needs to be future-safe. The field is
evolving rapidly, and new technologies can be accepted as they
become available. However, the DAISY data format specifies a limited
number of data formats that must be supported as a minimum capability
by DAISY-compliant equipment. These data formats include PCM
(uncompressed audio data), ADPCM and MPEG-2 layer 2 compression.
Several different sampling frequencies and bit-rates are supported, to
allow for a good balance between data size and audio quality.
A new kind of production environment
DAISY books are recorded and edited using a dedicated recording
software package. The recording software runs on a modern, powerful PC
running Windows 95 or Windows NT.
The recording is made directly from microphone to a data medium, e.g.
a hard disk, a magneto-optical (MO) disk or the like. A narrator with
moderate computer skills should be capable of handling all, or most
of, the recording, editing and production process without assistance
from a studio or computer technician.
The audio processing equipment in the recording studio can be
virtually the same as for analogue talking book recording. What is new
is the PC instead of the tape recorder. The narrator or operator will
also have a new kind of remote control for the PC, and may also make
use of a normal PC keyboard for the work.
The structured approach used in DAISY means that a recording project
can be carried out in a non-linear fashion. The book can be narrated
in any order, and the recording can be done section per section. It is
possible to define the structure of the recording project before
filling it with voice data, which means that the narrator can navigate
the recording by using a Table of Contents on the screen of the
recording PC.
The recording system is flexible enough to allow also for a more
linear method of production. The narrator can then just start
recording and create new sections while reading along. The sections
can be named and organised at a later stage to resemble the structure
of the original book’s Table of Contents.
Mistakes made during narration can very easily and quickly be
corrected by using the keyboard or the remote control. Since the
recording is done in a phrase-based fashion, it is very easy to "punch
in" at the right place to correct mistakes or to add more data.
The narrator can place different attributes on certain phrases, such
as for indication of page break, beginning of a new paragraph and so
on. This can be done by pressing a button on the remote control or a
key on the keyboard while reading. If so desired, the phrase
attributes may also be set or edited later, when the recorded data has
been stored to disk.
The recording system offers highly powerful ways of inserting,
deleting, copying and moving recorded data. Working with structured
audio is rather similar to working with text in a word processor.
DAISY reading equipment
The same DAISY book can be read on several platforms. Today, there is
playback software available for standard multimedia PCs, as well as
purpose-built reading devices.
The PC has the advantage of being useful for many other things than
just DAISY book reading, and offers advanced reading features such as
text searching, making of notes etc. The disadvantage of the PC is
that it still is rather expensive, at least if it is only going to be
used for talking book reading. Reading books on a personal computer is
also hardly going to be the preferred way of reading by all users.
The advantage of a dedicated DAISY player is that it can be made light
and small in comparison with a typical PC, and also that it can be
produced at a lower cost. Such a player allows for a wider range of
reading situations. Its limitations lie in difficulty in expanding the
hardware, e.g. by adding reading devices for new storage media, as
well as in its lack of keyboard and screen, which in turn means that
advanced reading features cannot be used.
Thanks to the DAISY data format, however, a CD-ROM containing a DAISY
book can be read on both platforms. The same book can therefore be
moved between different devices to allow for reading in different
ways. Advanced users, such as students, may have access to both a
personal computer and a purpose-built DAISY playback device.
Dramatically improved reader access
Usually, one talking book is stored on one distribution medium,
typically a CD-ROM disc. The disc is distributed to the reader by
postal services. The disc is inserted into the CD-ROM device of the PC
or dedicated playback machine. The title of the book or books on the
CD can be determined by a special command, or be automatically
presented.
The user navigates and reads the DAISY book by using its talking Table
of Contents as the main navigational index. The ToC for the talking
book normally resembles the printed material’s ToC, though this
is fully up to the producer.
By navigating amongst the section headings in the ToC, the user
selects the desired playback position. The process of finding a place
in the book is thus similar to when reading a printed book. In fact,
it is even simpler and quicker, since the talking book reader does not
have to flip the pages of the book to the correct page, but merely
selects the desired heading and starts the voice playback.
The response of the system is fast. Typically, using a standard PC
with a CD-ROM gives random access times to sections of less than one
second. Dedicated playback devices may have the same or even better
performance.
When a PC is used for reading, the user has the ability to move
directly to a particular section by searching for a text string that
appears in the heading. The typical DAISY book has a Table of Contents
in electronic text format that complements the speech, and this ToC
can be used both for on-screen presentation and for searching.
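
Searching the text ToC can be as simple as a case-insensitive
substring match over the headings. The sketch below assumes the
headings are available as a plain list of strings; how the playback
software actually stores and searches them is not described in the
text.

```python
def find_sections(headings, query):
    """Return indices of headings that contain `query`, ignoring case."""
    q = query.casefold()
    return [i for i, h in enumerate(headings) if q in h.casefold()]
```

Each returned index identifies a ToC entry, so the player can jump to
the first match and let the user step through the rest.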
When in a section, the user may browse the material in much the same
way as a printed book may be skim-read. The user may move the playback
position forwards and backwards in the material by phrase, by phrase
group or by page, depending on the degree to which the talking book has
been indexed by the producer.
If the DAISY book has been page-indexed, the talking book reader may
instantly move the playback position to the phrase that represents the
first text of a new page in the original text. This way, the DAISY
reader can flip pages and search for particular pages in relation to
the printed original.
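
Page flipping of this kind can be sketched as a lookup in a sorted
list of the phrase indices that start each page. The list and the
function names are assumptions for the example, not part of the DAISY
format.

```python
from bisect import bisect_left, bisect_right


def next_page(page_starts, position):
    """Phrase index of the first page start after `position`, or None."""
    i = bisect_right(page_starts, position)
    return page_starts[i] if i < len(page_starts) else None


def previous_page(page_starts, position):
    """Phrase index of the last page start before `position`, or None."""
    i = bisect_left(page_starts, position) - 1
    return page_starts[i] if i >= 0 else None
```

Because the list is sorted, each jump is a binary search, so the cost
of flipping a page does not grow noticeably with the length of the
book.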
The system can move very quickly between speech units within a
section. By only listening to the first few moments of each phrase, it
is an easy process to find a place in the recording by simply
listening for it. Of course, the reader can also listen to the talking
book as a continuous narration. When the playback is started, it always
starts at the beginning of a phrase. The speech can be instantaneously
stopped at any point.
Playback voice control
The narration of a talking book can be done differently depending on
who is narrating as well as the nature and purpose of the talking book
material. The DAISY system offers several ways to change the speed of
the recording, without changing the pitch of the voice. Such features
are highly desired by many advanced users, such as students. These
features can of course also be turned off to let the reader listen to
the unmodified version of the narrator’s voice. There are even
possibilities to slow down the narration relative to the original,
again without changing the pitch of the voice. This may be of
assistance to some users, e.g. when reading advanced material.
The ability to control playback speed is not an aspect of the DAISY
book as such, though the use of structured audio makes it relatively
easy to create these kinds of features in the reading software or a
playback device. Both the DAISY playback software and the dedicated
DAISY player that exist today support "Intelligent Time Compression"
or ITC for short. With ITC, the playback speed can be varied from 75%
of the original up to 300% speed, i.e. the recording can be played
back up to 3 times faster than it was recorded. Naturally, high
playback speed means that the speech will be harder to understand,
even if the pitch of the voice is not changed.
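
The effect of the speed setting on listening time is simple
arithmetic, sketched below. The code only clamps the speed factor to
the 75%-300% range quoted above and divides; it does not perform any
time compression of audio, and the names are made up for the example.

```python
MIN_SPEED, MAX_SPEED = 0.75, 3.0  # 75% to 300%, as quoted in the text


def playback_hours(recorded_hours, speed):
    """Listening time at a speed factor, clamped to the ITC range."""
    speed = max(MIN_SPEED, min(MAX_SPEED, speed))
    return recorded_hours / speed
```

A 30-hour recording would thus take 10 hours at full speed-up and 40
hours at the slowest setting.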
Other features for reading
Every phrase in the talking book - that is, the block of recorded
audio corresponding to something like a sentence in the printed book -
has a unique identification in the "audio database" that makes up a
DAISY book. This identity defines the placement of the phrase in
relation to the other phrases. It also identifies which section each
phrase belongs to.
This database-style approach allows for simple creation of links
between phrases and user data, as for example notes made by the
student in relation to a narrated textbook. Also the reader can mark
parts of special interest in the book, and then use these marked
phrases as a user-defined index for browsing of the talking book.
Bookmarks may be placed in the material at any phrase location. By a
simple command, the system can be instructed to move to a desired
bookmark. The references to the book’s voice data take up very little
storage space, and they may therefore easily be stored on a local hard
disk on the playback system, or even be transferred to diskette for
use on another PC.
A special bookmark is automatically placed by the system when the
continuous playback is stopped. This bookmark is saved with the user
data and the system can use it to automatically restore the reading
position the next time the talking book is opened.
The user data is automatically stored on disk and is linked to the
currently opened book. The next time the same book is loaded, the
system automatically loads the relevant user data.
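
Since bookmarks are only references to phrases, the user data for a
book can be a very small file. The sketch below stores it as JSON
keyed by a book identifier; the file format, the field names and the
identifier are all assumptions, as the text does not say how the
playback software stores user data.

```python
import json


def save_user_data(path, book_id, bookmarks, resume_at):
    """Write bookmarks and the automatic resume position for one book."""
    data = {
        "book": book_id,
        "bookmarks": sorted(bookmarks),  # phrase indices
        "resume_at": resume_at,          # phrase where playback stopped
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)


def load_user_data(path, book_id):
    """Return (bookmarks, resume_at) for `book_id`.

    Falls back to ([], 0) if the file is missing or belongs to
    another book.
    """
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except FileNotFoundError:
        return [], 0
    if data.get("book") != book_id:
        return [], 0
    return data["bookmarks"], data["resume_at"]
```

A file like this is small enough to copy to a diskette and carry to
another PC along with the disc.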
Dedicated playback devices cannot normally offer all those
sophisticated features. However, they typically offer page search and
bookmark handling. They also keep track of user data for the DAISY
books last read.
Future versions of the playback software may include even more
features to make up the perfect environment e.g. for students or
dyslexic readers. For example, by storing the original book’s
text in electronic format along with the audio, the user may have a
speech synthesiser or Braille display connected to the system and use
it to e.g. spell out names in the text. With this set-up, the user can
also search for certain text strings that occur in the book, which
will add yet another access method to the system.
The ability to present text on screen in full synchronisation with the
narrated audio may be of great assistance to for example readers with
dyslexia. However, even though the DAISY data format allows for this,
dual-media books will cost more to produce than pure audio books.
Also, most legal systems have copyright legislation that will make it
hard or impossible to obtain the original book’s text in
electronic format for publishing.
Flexibility in distribution
The system has been designed to be as future-safe as possible. Thus, it
is not fixed on any particular storage medium for the recorded data.
The producer can choose a suitable medium for storage of master
recordings, and then produce the distribution media when necessary.
If the same media type is used both for master and distribution, DAISY
books for lending can be created by a simple and fast copying process.
Any storage media that can store ordinary data files under operating
system control may be used as information carrier for the talking
book. Furthermore, any information distribution channel that can offer
file transfer under a commonly supported control protocol can be used
for distribution of DAISY books.
The amount of data used by a DAISY book is rather big, at least in
comparison with electronic text, which means that network transmission
will only be feasible if a lot of bandwidth can be used at a low
price. However, as networks such as the Internet are rapidly evolving,
it is likely that electronic distribution can be introduced into the
DAISY concept in a not too distant future.
For some years to come, it is highly likely that most producers
related to the DAISY Consortium will use CD-ROM disks as their
preferred distribution media. The producer would then typically use
CD-R technology to create the disks. This technology is now very
cost-effective and the production can be done at a lower cost than for
analogue cassettes. Since a single CD-ROM can replace a large number
of cassette tapes, the reduction in size and weight will bring the
costs for transportation and storage down dramatically.
As soon as new mass storage media become available and economically
favourable, they might be integrated into the system. When this
happens, the user’s reading equipment only needs to be
complemented with a device for reading the media in question. On the
production side, it will be a rather trivial matter to convert from
one type of media to another - data can just be copied between them at
no loss of audio quality.
An example of a new storage medium is DVD-ROM, which is a rapidly
evolving standard in the PC market today. The DVD-ROM disc can store
several times more data than a CD-ROM, which will allow for even the
largest talking books to be distributed on a single DVD-ROM disc.
Alternatively, the sound quality of the typical DAISY book can be
improved by using data compression technology that gives better
quality but demands more storage space.
However, before the DVD-ROM technology can become useful for DAISY
books, it needs to be negotiated amongst all parties in the DAISY
Consortium and its allies that the new medium should be supported. All
playback devices must be equipped with compatible hardware to be able
to access the new medium.
The DVD-ROM standard must be mature enough to be a safe alternative
for the future - different devices and discs must be fully compatible.
It will also be some time before DVD-ROM devices are shipped as a
cheap, standard device with multimedia PCs, as is the case for CD-ROM
today. There also remains the problem of how to produce DVD-ROM disks in
a cost-effective manner. DVD-R devices currently exist or are being
developed, and as soon as the price drops to a reasonable level, they
can be taken into use.
This page is maintained by [log in to unmask] (March 25, 1998)