12 April 2007

Recognising Google's True Character

It's easy to become apprehensive about the massive and growing power of Google. After all, its operating plan is essentially to know everything about everything that happens online - and, as a consequence, offline. I certainly share those concerns, but it's also important to note that the company continues to make moves that contribute to the free software commons.

The latest one is pretty cool:

We're happy to announce the OCRopus OCR Project, a Google-sponsored project to develop advanced OCR technologies in the IUPR research group, headed by Prof. Thomas Breuel at the DFKI (German Research Center for Artificial Intelligence, Kaiserslautern, Germany).

The goal of the project is to advance the state of the art in optical character recognition and related technologies, and to deliver a high quality OCR system suitable for document conversions, electronic libraries, vision impaired users, historical document analysis, and general desktop use. In addition, we are structuring the system in such a way that it will be easy to reuse by other researchers in the field.

Just as important is the choice of base platform:

We are initially targeting Linux x86 and x86/64 and are developing under Ubuntu 6.10. The code should be easily portable to other Linux distributions and other platforms. If you're interested in taking responsibility for another platform, please let us know.

OCR is an area where free software still lags somewhat behind proprietary code. Google's latest gift to the community is therefore highly welcome - even if ultimately it will help Google know even more about documents, and hence about us. (Via Matt Asay.)

2 comments:

Unknown said...

I would LOVE to have some decent OCR in Ubuntu.

Glyn Moody said...

Well, that looks rather more likely now... here's hoping.