23 March 2006

Open Data in the Age of Exponential Science

There's a very interesting article in this week's Nature, as part of its 2020 Computing Special (which miraculously is freely available even to non-subscribers), written by Alexander Szalay and Jim Gray.

I had the pleasure of interviewing Gray a couple of years back. He's a Grand Old Man of the computing world, with a hugely impressive curriculum vitae; he's also a thoroughly charming interviewee with some extremely interesting ideas. For example:

"I believe that Alan Turing was right and that eventually machines will be sentient. And I think that's probably going to happen in this century. There's much concern that that might work out badly; I actually am optimistic about it."

The Nature article is entitled "Science in an exponential world", and it considers some of the approaching problems that the vast scaling up of Net-based, collaborative scientific endeavour is likely to bring us in the years to come. Here's one key point:

"A collaboration involving hundreds of Internet-connected scientists raises questions about standards for data sharing. Too much effort is wasted on converting from one proprietary data format to another. Standards are essential at several levels: in formatting, so that data written by one group can be easily read and understood by others; in semantics, so that a term used by one group can be translated (often automatically) by another without its meaning being distorted; and in workflows, so that analysis steps can be executed across the Internet and reproduced by others at a later date."

The same considerations apply to all open data in the age of exponential science: without common standards that allow data from different groups, gathered at different times and in varying circumstances, to be brought together meaningfully in all sorts of new ways, the openness is moot.
