Tuesday, May 31, 2011

30 Days With...Google Docs: Day 26

Day 26: OCR in Google Docs
There are enough features and scenarios to explore in Google Docs that 30 Days With...Google Docs only scratches the surface in some ways. For example, here we are with 25 days down and only a handful to go, and we haven't yet examined the OCR (optical character recognition) feature.

I haven't done much with OCR in recent years. I recall being highly disappointed in the concept during the early days of mainstream consumer scanners. Back then OCR was about as accurate as Google Voice speech-to-text transcription. Just as Google Voice transcription tends to yield more gibberish than coherent sentences, OCR results made more work rather than making life easier. I was apprehensive, but hopeful that the OCR in Google Docs would be better than what I remember.

Check the box in the Google Docs Upload Settings to convert text from PDF or image files.
When you upload a file or folder to Google Docs, there are two checkbox options available in the Upload Settings. The first one tells Google Docs to convert Microsoft Office file formats into native Google Docs formats. The second checkbox, however, directs Google Docs to put OCR to use and convert text found in PDF or image files into editable content.
To test it out, I uploaded a PDF file from my Documents folder. It was a 4-page PDF document weighing in at 225KB, and Google Docs managed to upload and convert it in a matter of seconds. But the million-dollar question is, "Did the OCR accurately capture the content of the file?"
The first thing I noticed when checking out my newly uploaded file is that it was now eight pages. Why? Well, Google Docs does something cool that helps you ensure the accuracy of the OCR translation: it includes the original PDF / image, followed by the editable transcript of its contents. So, each of the four pages of the original PDF was now two pages: one original image, and one page of editable content.
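If it helps to picture that layout, here is a minimal Python sketch of it. It only models the page order I saw in the converted file, not how Google Docs does the conversion; the function name `interleave_pages` and the placeholder page names are made up for illustration.

```python
def interleave_pages(page_images, transcripts):
    """Lay pages out as image 1, text 1, image 2, text 2, and so on."""
    if len(page_images) != len(transcripts):
        raise ValueError("need exactly one transcript per page image")
    converted = []
    for image, text in zip(page_images, transcripts):
        converted.append(("image", image))  # the original page, kept as a picture
        converted.append(("text", text))    # the editable OCR transcript of that page
    return converted


# Hypothetical stand-ins for the four pages of the PDF
images = [f"page-{n}.png" for n in range(1, 5)]
texts = [f"OCR text of page {n}" for n in range(1, 5)]

doc = interleave_pages(images, texts)
print(len(doc))  # 8: four original pages plus four transcript pages
```

Keeping each transcript right after its source page is what makes it easy to proofread the OCR output against the original.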
I was pleasantly surprised. The transcript of the text appears to be flawless. The OCR version loses some formatting, such as images or icons inserted in the original, but it even tries to match heading sizes and bulleted lists. All in all, the Google Docs OCR is impressive.
I will issue one small caveat. The PDF file I uploaded was a fairly straightforward text document in a fairly standard font. Your results may vary if you are trying to upload and convert unique fonts or fancy text.
Anyway, this is a cool feature that delivers what it promises. It doesn't come up for me frequently, but on those occasions where I have an image or a PDF and I want to edit the text content, I know I can use Google Docs to convert the text to something I can work with.
