23 June 2009

Why Open Source, Clouds and Crowds Rule

The Guardian's crowd-sourcing of the initial analysis of hundreds of thousands of PDFs of MPs' expenses is fast becoming mythic. If you want some more technical details, this post is a good place to start. I was particularly struck by the following:

As well as the Guardian’s first Django joint, this was its first project with EC2, the Amazon contract-hosting service beloved by startups for its low capital costs.

Willison’s team knew they would get a huge burst of attention followed by a long, fading tail, so it wouldn’t make sense to prepare the Guardian’s own servers for the task. In any case, there wasn’t time.

“The Guardian has lead time of several weeks to get new hardware bought and so forth,” Willison said. “The project was only approved to go ahead less than a week before it launched.”

With EC2, the Guardian could order server time as needed, rapidly scaling it up for the launch date and down again afterward. Thanks to EC2, Willison guessed the Guardian’s full out-of-pocket cost for the whole project will be around £50.

As for the software, it was all open-source, freely available to the Guardian — and to anyone else who might want to imitate them. Willison hopes to organize his work in the next few weeks.

None of this happens without open source, which allows zero-cost hacks; and none of it happens without clouds, which allow immediate and low-cost scale-up. (Fifty quid? Blimey.) Bottom line: increasingly popular crowdsourcing efforts won't be happening without either.
