
28 February 2006

Wanted: a Rosetta for the MegaWikipedia

As I write, Wikipedia has 997,131 articles - close to the magic, if totally arbitrary, one million (if we had eleven fingers, we'd barely be halfway to the equally magic 1,771,561): the MegaWikipedia.

Except that it's not there, really. The English-language Wikipedia may be approaching that number, but there are just eight other languages with more than 100,000 articles (German, French, Italian, Japanese, Dutch, Polish, Portuguese and Swedish), and 28 with more than 10,000. Most have fewer than 10,000.

Viewed globally, then, the average Wikipedia is nowhere near a million articles.

The disparity between the holdings in different languages is striking; it is also understandable, given the way Wikipedia - and the Internet - arose. But the question is not so much Where are we? as Where do we go from here? How do we bring most of the other Wikipedias - not just five or six obvious ones - up to the same level of coverage as the English one?

Because if we don't, Wikipedia will never be that grand, freely-available summation of knowledge that so many hope for: instead, it will be a grand, freely-available summation of knowledge for precisely those who already have access to much of it. And the ones who actually need that knowledge most - and who currently have no means of accessing it short of learning another language (for which they probably have neither the time nor the money) - will be excluded once more.

Clearly, this global Wikipedia cannot be achieved simply by hoping that there will be enough volunteers to write all the million articles in all the languages. In any case, this goes against everything that free software has taught us - that the trick is to build on the work of others, rather than re-invent everything each time (as proprietary software is forced to do). This means that most of the articles in non-English tongues should be based on those in English. Not because English is "better" as a language, or even because its articles are "better": but simply because they are there, and they provide the largest foundation on which to build.

A first step towards this is to use machine translations, and the new Wikipedia search engine Qwika shows the advantages and limitations of taking this approach. Qwika lets you search in several languages through the main Wikipedias and through machine-translations of the English version. In effect, it provides a pseudo-conversion of the English Wikipedia to other tongues.

But what is needed is something more thoroughgoing, something formal - a complete system for expediting the translation of all of the English-language articles into other languages. And not just a few: the system needs to be such that any translator can use it to create new content based on the English items. The company behind Ubuntu, Canonical, already has a system that does something similar for people who are translating open source software into other languages. It's called, appropriately enough, Rosetta.

Now that the MegaWikipedia is in sight - for Anglophones, at least - it would be the perfect time to move beyond the successful but rather ad hoc approach currently taken to creating multilingual Wikipedia content, and to put the Net's great minds to work on the creation of something better - something scalable: a Rosetta for the MegaWikipedia.

What better way to celebrate what is, for all the qualifications above, truly a milestone in the history of open content, than by extending it massively to all the peoples of the world, and not just to those who understand English?