The World's Leading Anti-Scientific Society
Science is a paradigmatically open endeavour. It proceeds by sharing knowledge freely, allowing others to build on your work. If any domain should display openness in depth, it is science. That seems to have escaped the notice of the American Chemical Society, which pompously declares itself "the world's leading scientific society", as Peter Murray-Rust explains:
CAS identifiers have come to be accepted as a primary identifier system for chemistry - thus caffeine has the CAS number [58-08-2]. This is the only number I can reliably get from CAS without paying (or having my institution or country pay). The number is semantically almost void - it cannot be worked out like an InChI. InChI and CAS serve different purposes - CAS can be related to any substance including mixtures of molecules such as kerosene - InChI is algorithmically derived from the molecular structure and does not apply to mixtures. CAS numbers are frequently used to assert what a substance is and to indicate whether two substances are the same or different. They are commonly used in supplier catalogues and on bottles.
CAS numbers are copyright CAS/ACS who have the legal right to regulate their use - as above. They would make excellent identifiers for the semantic web, except that they are closed. If I want to find out what [67-64-1] is I can only do this by paying CAS - about 6 USD for each lookup (e.g. on STN Easy). This immediately rules it out for any semantic web application which assumes that resolving links is free. Wikipedia tells me that this number corresponds to acetone (nail varnish remover) but they now do not have the freedom to do this. Similarly PubChem do not use CAS numbers as they have no right to do so. (A number of suppliers and other sources quote CAS numbers, many without explicit permission.)
An identifier system for chemistry is extremely valuable (patents, safety, etc.) but can cause great problems when mistakes are made. If compounds are misordered because of mistakes in identifiers serious accidents could occur. An open system of identifiers would be highly valuable in developing the chemical semantic web and increasing quality. The closed and restrictive practices of CAS make it more difficult to create Web 2.0 applications in chemistry.
I do not believe this situation can last. Closed systems on the web cannot survive for many more years unless rigorously enforced by restrictive legal and business processes. The heads of chemistry departments who currently have no concern for informatics in the C21 will retire and a new generation of less conservative chemists will increasingly sweep away the Closed approach. Technology such as robots acting on semantic publications will make human-collected abstracts obsolete.
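To make the "semantically almost void" point concrete: as far as I know, the only thing you can work out from a CAS number by yourself is whether its final check digit is consistent with the rest. Here is a minimal Python sketch of that widely documented checksum rule; the function name is my own, and passing the check says nothing at all about which substance the number refers to.

```python
import re

# Shape of a CAS number: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def cas_check_digit_ok(cas: str) -> bool:
    """Return True if the last digit matches the checksum of the others.

    This only tests internal consistency. It cannot tell you what the
    substance is; that still needs a lookup in some registry.
    """
    match = CAS_PATTERN.fullmatch(cas.strip().strip("[]"))
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == int(match.group(3))


if __name__ == "__main__":
    for number in ("[58-08-2]", "[67-64-1]"):
        print(number, cas_check_digit_ok(number))
```

Both numbers quoted above pass the check, but nothing in the digits says "caffeine" or "acetone", which is exactly why resolving them depends on access to the registry.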
Fortunately, Peter points out that there is a solution:
The use of CAS numbers has been abandoned by organisations such as PubChem for exactly this reason. PubChem now has nearly 20 million substances. It holds records for all compounds that are likely to occur on MSDS. It’s highly respected (although ACS lobbied the US government to limit PubChem’s activities). It is part of the NIH and now - with the NIH mandate - effectively safe from the ACS. It provides a credible alternative.
We (including Wikipedia) should now switch from using CAS numbers to using PubChem IDs wherever possible. It won’t be a simple transition - certainly we shan’t find 100% overlap. But it will solve all the common substances and therefore 90%+ use of CAS numbers.
We shall need software. We and others are now developing the next generation of chemical informatics software using RDF (Resource Description Framework). RDF allows the description of ambiguities and ontologies. This will allow chemical information to be gleaned directly from authoritative sources using robots. (Of course some of the authorities are currently conservative and do not allow access to their material because of restrictive copyright and licences, but that is starting to change, even in chemistry). As information becomes more open, the CAS system will be increasingly isolated in a world of chemical commerce.
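To give a flavour of what an RDF description keyed on an open identifier might look like, here is a rough Python sketch that emits N-Triples as plain strings. Everything in it is an assumption for illustration: the `http://example.org/chem/` namespace, the `name` and `reportedCasNumber` predicates, and the helper functions are invented, and the PubChem CIDs in the usage lines should be checked against PubChem itself. It is not the software Peter describes.

```python
# Hypothetical namespace; a real project would use a published vocabulary.
EX = "http://example.org/chem/"


def escape_literal(text: str) -> str:
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def triple(subject: str, predicate: str, obj: str, literal: bool = False) -> str:
    """Format one N-Triples statement; obj is an IRI unless literal=True."""
    rendered = f'"{escape_literal(obj)}"' if literal else f"<{obj}>"
    return f"<{subject}> <{predicate}> {rendered} ."


def describe(cid: int, name: str, cas_numbers: list[str]) -> list[str]:
    """Describe a compound by its open ID, keeping CAS numbers as plain claims."""
    subject = f"{EX}compound/CID{cid}"
    lines = [triple(subject, EX + "name", name, literal=True)]
    # More than one reported CAS number is allowed: the ambiguity is
    # recorded rather than hidden.
    for cas in cas_numbers:
        lines.append(triple(subject, EX + "reportedCasNumber", cas, literal=True))
    return lines


if __name__ == "__main__":
    # CIDs assumed for illustration; verify them before relying on them.
    for line in describe(2519, "caffeine", ["58-08-2"]):
        print(line)
    for line in describe(180, "acetone", ["67-64-1"]):
        print(line)
```

The design choice worth noticing is that the open identifier is the subject of every statement, while a CAS number is just one more reported property that can be dropped or disputed without breaking any links.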
Clearly, it's time to kill off this pernicious closed CAS system, which is damaging science, by boycotting it entirely. And while we're at it, I suggest we might as well get rid of the world's leading *anti*-scientific society too. (Via Open Access News.)
Update: There seems to be some movement as far as using CAS numbers on Wikipedia, but I can't tell whether that's just a one-off, highly limited solution, or part of a larger move to make ACS knowledge freely available to all such open projects. We shall see.