09 March 2008

The World's Leading Anti-Scientific Society

Science is a paradigmatically open endeavour. It proceeds by sharing knowledge freely, allowing others to build on your work. If any domain should display openness in depth, it is science. That seems to have escaped the notice of the American Chemical Society, which pompously declares itself "the world's leading scientific society", as Peter Murray-Rust explains:

CAS identifiers have come to be accepted as a primary identifier system for chemistry - thus caffeine has the CAS number [58-08-2]. This is the only number I can reliably get from CAS without paying (or having my institution or country pay). The number is semantically almost void - it cannot be worked out like an InChI. InChI and CAS serve different purposes - CAS can be related to any substance including mixtures of molecules such as kerosene - InChI is algorithmically derived from the molecular structure and does not apply to mixtures. CAS numbers are frequently used to assert what a substance is and to indicate whether two substances are the same or different. They are commonly used in supplier catalogues and on bottles.

CAS numbers are copyright CAS/ACS who have the legal right to regulate their use - as above. They would make excellent identifiers for the semantic web, except that they are closed. If I want to find out what [67-64-1] is I can only do this by paying CAS - about 6 USD for each lookup (e.g. on STN Easy). This immediately rules it out for any semantic web application which assumes that resolving links is free. Wikipedia tells me that this number corresponds to acetone (nail varnish remover) but they now do not have the freedom to do this. Similarly PubChem do not use CAS numbers as they have no right to do so. (A number of suppliers and other sources quote CAS numbers, many without explicit permission).

An identifier system for chemistry is extremely valuable (patents, safety, etc.) but can cause great problems when mistakes are made. If compounds are misordered because of mistakes in identifiers serious accidents could occur. An open system of identifiers would be highly valuable in developing the chemical semantic web and increasing quality. The closed and restrictive practices of CAS make it more difficult to create Web 2.0 applications in chemistry.

I do not believe this situation can last. Closed systems on the web cannot survive for many more years unless rigorously enforced by restrictive legal and business processes. The heads of chemistry departments who currently have no concern for informatics in the C21 will retire and a new generation of less conservative chemists will increasingly sweep away the Closed approach. Technology such as robots acting on semantic publications will make human-collected abstracts obsolete.
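One small qualification to "semantically almost void": the last digit of a CAS number is a check digit, so a number can at least be sanity-checked offline, even though that tells you nothing about which substance it names. Here is a minimal sketch in Python, assuming the commonly documented rule (weight the digits before the check digit 1, 2, 3, ... from right to left, sum them, and take the result modulo 10). The function name `is_valid_cas` is my own invention for illustration, not part of any CAS or PubChem tooling.

```python
import re

# Shape of a CAS number: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas: str) -> bool:
    """Return True if `cas` has the CAS shape and its check digit matches.

    Hypothetical helper: this only checks internal consistency. It cannot
    tell you what the substance is, which is the point made in the quote.
    """
    # Tolerate the bracketed form used in the post, e.g. "[58-08-2]".
    match = CAS_PATTERN.match(cas.strip("[] "))
    if match is None:
        return False
    body = match.group(1) + match.group(2)  # every digit except the check digit
    check = int(match.group(3))
    # Rightmost body digit gets weight 1, the next gets 2, and so on.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check


if __name__ == "__main__":
    for number in ("[58-08-2]", "[67-64-1]", "58-08-3"):
        print(number, is_valid_cas(number))
```

The two numbers quoted above, [58-08-2] and [67-64-1], both pass this check; change the final digit and they fail. Turning a valid number into a name or structure still needs a lookup table, and that is exactly the part that is closed.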

Fortunately, Peter points out that there is a solution:

The use of CAS numbers has been abandoned by organisations such as PubChem for exactly this reason. PubChem now has nearly 20 million substances. It holds records for all compounds that are likely to occur on MSDS. It’s highly respected (although ACS lobbied the US government to limit PubChem’s activities). It is part of the NIH and now - with the NIH mandate - effectively safe from the ACS. It provides a credible alternative.

We (including Wikipedia) should now switch from using CAS numbers to using PubChem IDs wherever possible. It won’t be a simple transition - certainly we shan’t find 100% overlap. But it will solve all the common substances and therefore 90%+ use of CAS numbers.

We shall need software. We and others are now developing the next generation of chemical informatics software using RDF (Resource Description Framework). RDF allows the description of ambiguities and ontologies. This will allow chemical information to be gleaned directly from authoritative sources using robots. (Of course some of the authorities are currently conservative and do not allow access to their material because of restrictive copyright and licences, but that is starting to change, even in chemistry). As information becomes more open, the CAS system will be increasingly isolated in a world of chemical commerce.

Clearly, it's time to kill off this pernicious closed CAS system, which is damaging science, by boycotting it entirely. And while we're at it, I suggest we might as well get rid of the world's leading *anti*-scientific society too. (Via Open Access News.)

Update: There seems to be some movement as far as using CAS numbers on Wikipedia, but I can't tell whether that's just a one-off, highly limited solution, or part of a larger move to make ACS knowledge freely available to all such open projects. We shall see.

2 comments:

ChemSpiderman said...

I've made two comments on CAS and the Wikipedia commentary.
http://www.chemspider.com/blog/cas-discourages-using-scifinder-to-help-curate-wikipedia-structures-and-cas-numbers.html
http://www.chemspider.com/blog/enforcing-copyright-of-cas-numbers.html

I am not ready to abandon hope that the ACS/CAS can reach a point where they recognize the value of openness, both for public relations and for their business. I believe it's easy to declare that we should just abandon CAS and their dominant position, but doing so is not so easy. The relationship between CAS numbers and their >100 years of literature/patents is deeply entrenched in their offering. They have very skilled people dealing with their systems and, other than the protectionism we judge is prevailing, their systems are good. Sure, they can be more open but let's try and achieve that with a common-good discussion rather than abandoning them. While it's easy to talk about RDF solutions and OWL and, and, and, these are solutions yet to be proven. They are valiant efforts and need to be pursued but they are yet to be proven. Also, think about the politics...if PubChem IDs prevail and damage CAS's business then CAS's initial view that PubChem would damage their business will have been validated...people will be out of jobs and all hell might break loose. I say bring the right people to the table to work through the complex business issues and do it soon.
That said, I'll acknowledge that I prefer to try and navigate the complex issues to a mutually beneficial point rather than go into attack mode preferred by others.

Glyn Moody said...

Thanks for your thoughts.

Fortunately (for me) I have the luxury of not being directly involved with any of this. I also write from the position of one who blogs extensively about these issues, and is not afraid to rush in where angels fear to tread.

I accept that there is plenty of room to negotiate and to attempt to move the ACS; I wish the best of luck to those trying to do that.

But I also think it can be useful (good cop, bad cop) to suggest more outrageously radical solutions - like chucking the ACS completely.

Speaking as an outside observer, I have been frankly disgusted (as in Tunbridge Wells) by its behaviour - not just on this occasion, but previously, too. As I said in the post, science seems a quintessentially open endeavour, and for the ACS to put money over knowledge seems unforgivable.

However, we shall see.