Showing posts with label chemoinformatics. Show all posts

17 June 2009

The Doctor Who Model of Open Source

I often write of the way in which other domains are learning from open source and its successes. But that's not to say the traffic is all one way: increasingly, the other opens have much to *teach* open source, too.

For example, Peter Murray-Rust is one of the leading exponents of open data and open chemistry, notably through the Blue Obelisk group:


The Internet has brought together a group of chemists/programmers/informaticians who are driven by wanting to do things better, but are frustrated with the Closed systems that chemists currently have to work with. They share a belief in the concepts of Open Data, Open Standards and Open Source (ODOSOS) (but not necessarily Open Access). And they express this in code, data, algorithms, specifications, tutorials, demonstrations, articles and anything that helps get the message across.

Here's an interesting point he raised recently:

How do we sustain Open Source in a distributed world? We are facing this challenge with several of our chemical software creations/packages. People move, institutions change. Open Source does not, of itself, grow and flourish – it needs nurturing. Many packages require a lot of work before they are in a state to be usefully enhanced by the community - “throw it over the wall and it will flourish” does not work.

Many OS projects have clear governance and (at least implicitly) funded management. Examples are Apache, Eclipse, etc. Many others have the “BDFL” - Benevolent Dictator For Life with characters such as R[M]S, Linus, Guido Python, Larry Perl, etc. These command worldwide respect and they have income models which are similar to literary giants. These models don’t (yet?) work for chemistry.

Instead the Blue Obelisk community seems to have evolved a “Doctor Who” model. You’ll recall that every few years something fatal happens to the Doctor and you think he is going to die and there will never be another series. Then he regenerates. The new Doctor has a different personality, a different philosophy (though always on the side of good). It is never clear how long any Doctor will remain unregenerated or who will come after him. And this is a common theme in the Blue Obelisk.

The rest of the post fleshes out this analogy - well worth reading.

Follow me @glynmoody on Twitter or identi.ca.

03 June 2009

Why Chemical Software Will be Open Source

Here's an important post from Mr Open Chemistry, Peter Murray-Rust:


“Chemical software will be Open Source”

This statement expresses both a simple truth (Simple Future, see WP) and an aspiration (Coloured Future – Software shall be free). The latter is what I have been advocating on this blog – the moral, pragmatic, utilitarian value of Open Source. The former simply states that it will happen. IOW a betting person could lay a wager.

The heart of Peter's argument is this:

there is a particular aspect to “Chemoinformatics” - the software that supports the management of chemical compounds, reactions and their measured and computed properties:

There have been no new developments in the last decade

What I mean by this is that there have been no new algorithms or information management strategy to have come out of commercial chemoinformatics manufacturers. Chemical search, heuristic properties and fingerprints, molecule docking are “solved” problems. And advance comes from packaging, integration and parameter_tweaking/machine_learning. Only the last adds to science and since the commercial manufacturers are secretive then we can’t measure this (and I believe this to be mainly pseudoscience in its practice – you can make extravagant plans without independent assessment). So the advances from the manufacturers have been engineering – ease of use, deployability, interoperation with third-party software – but not functionality.

So the Open Source community – the Blue Obelisk – is catching up. I believe that OSCAR is already the best chemical language processing tool, that OPSIN will soon be as good as any commercial name2structure parser and that OSRA will do the same for chemical images.

What this essentially means is that chemoinformatics has become commoditised; and as history has shown us time and again, once that happens, the advantages of open source in terms of aggregated, distributed development kick in. It is proprietary software that does not scale - ironically, given the prevailing wisdom to the contrary - and which therefore always falls behind open source projects once a particular domain has matured.

This is not to say that free software never innovates, as I've discussed elsewhere; simply that open source's advantages are less clear in new sectors than in mature ones. Peter's point is that chemoinformatics in particular is ripe for open source to produce better versions of existing tools; the implication is that as successive areas of scientific software reach similar maturity, free software offerings will move in and ultimately take over.
