05 July 2007

Google Books Open Up - A Bit

One of the problems with the otherwise laudable Google Book Project is that it's not actually providing access to the texts, just adding searchability. That's useful, but not really what we need. And since many of the books that it is scanning are in the public domain, there seems no reason not to offer full access.

Google seems to have realised this, finally:

I work on a project at Google called Google Accessible Search, which helps promote results that are more accessible to visually impaired users. Building on that work is today's release of accessible public domain works through Google Book Search. It's opening up hundreds of thousands of books to people who use adaptive technologies such as speech output, screen readers, and Braille displays.

As this notes, one of the advantages of opening up in this way is that the text may be re-purposed for adaptive technologies. Put another way, texts that remain closed, locked up behind DRM or similar, are largely denied to people who rely on those technologies - another reason why closing up knowledge in this way is ethically wrong.


Sylvia Thornley said...

I read your post with interest and felt compelled to post a response to tell you about www.ultrapedia.com - we publish the Recognized version of a scanned public domain book.

So how does the Recognized version of a book differ from the book formats already on Google Book Search? We preserve the original format of the book, including graphics and photos; so the book appears as though it was recently created on a computer, not printed centuries ago. Rather different from scanned images of a book or plain text!

We’ve been digitizing books for years as a hobby, and when I first looked at GBS, I thought it was great but….

www.ultrapedia.com is a full-text, full-retrieval search engine focused on delivering the Recognized version of a scanned book that is out of copyright, which you can read online or download in PDF format.

The system is still being developed and we’re still ironing out the kinks; for example, the books aren’t spellchecked yet. The next phase is to publish several different versions of the books; these include:

• Recognized version of the complete book - all pages – rather than single pages like those indexed at the moment.

• Recognized and layered version of the complete book.

• Recognized, layered, and spellchecked version of the complete book – the ones online at the moment aren’t spellchecked.

• Recognized version of the complete book with a copy of the original PDF embedded page-for-page in its own independent layer.

• Collated version of the original PDF and the recognized version in a twin-view window for side-by-side correction.

• Integrating our own public domain library into the mix.

Glyn Moody said...

That looks very interesting - thanks for that. So how exactly have you "partnered" with Google?