VICUG-L Archives

Visually Impaired Computer Users' Group List

VICUG-L@LISTSERV.ICORS.ORG

Subject:
From:
David Poehlman <[log in to unmask]>
Reply To:
David Poehlman <[log in to unmask]>
Date:
Wed, 23 Jun 2010 10:12:22 -0400
Source URL:
http://googledocs.blogspot.com/2010/06/optical-character-recognition-ocr-in.html

Optical character recognition (OCR) in Google Docs.

Tuesday, June 22, 2010.

A couple of months ago, my co-worker, Mike, showed up at my desk with a pile of
paper, each of the yellowed sheets densely covered with an ancient-looking
typewriter font. His wife had recently discovered parts of her family chronicles
in the attic, typed up by her grandmother many years ago! Now he was wondering
if there was a way for her to continue writing the chronicles in Google Docs.

The papers sat on my desk for a while, but recently, I returned them to Mike
with a smile, cheerfully telling him that what started as my 20% project is now
ready for everyone to use - Google Docs now officially supports importing
scanned documents. What we launched as an experimental feature for the Documents
List Data API last year is now available on the upload page: check the “Convert
text from PDF or image files to Google Docs documents” option, upload your
scanned images (JPEG, GIF, PNG) or PDFs, and Google Docs will extract text and
formatting from the scans for you to edit.

For the technically curious: we’re using Optical Character Recognition (OCR)
that our friends from Google Books helped us set up. OCR works best with
high-resolution images, and not all formatting may be preserved. The original
images will be included in the new document to make it easier for you to correct
mistakes. Supported languages include English, French, Italian, German and
Spanish, with more languages and character sets on their way. We’re looking
forward to getting feedback from you while we keep improving the feature over
the coming months.
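
For readers who want to check a batch of scans before uploading, the file types and languages listed above can be captured in a small pre-upload check. This is only a sketch and not part of Google Docs: the function name, the constants, and the mapping from formats to file extensions are assumptions made for illustration.

```python
from pathlib import Path

# Formats and languages named in the post. The extension spellings
# (.jpg/.jpeg and so on) are an assumption about how the files are named.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".gif", ".png", ".pdf"}
SUPPORTED_LANGUAGES = {"English", "French", "Italian", "German", "Spanish"}


def can_convert(filename: str, language: str) -> bool:
    """Hypothetical helper: True if both the file type and the language
    are among those the post lists as supported."""
    extension = Path(filename).suffix.lower()
    return extension in SUPPORTED_EXTENSIONS and language in SUPPORTED_LANGUAGES
```

For example, can_convert("chronicle.PDF", "German") is true, while a TIFF scan or a Dutch document would be rejected by this check.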

And Mike’s scanned family chronicles have even been extended by an additional
chapter in Google Docs: his wife recently had a baby boy named James!

Posted by: Jaron Schaeffer, Software Engineer, Google Docs.


    VICUG-L is the Visually Impaired Computer User Group List.
Archived on the World Wide Web at
    http://listserv.icors.org/archives/vicug-l.html
    Signoff: [log in to unmask]
    Subscribe: [log in to unmask]
