Antony Williams (Chemspiderman) is actively involved in creating Open chemistry. Here he reveals the limitations imposed by the American Chemical Society on creating Open data.
Posted by: Antony Williams in Uncategorized. Copyright © 2008 Antony Williams
Tonight I was catching up with my Watchlist on Wikipedia for the first time in a long time and noted that a comment had been added to the Wikipedia Project: CAS Validation page. This discussion page was started to have a place to discuss a second validation of my work by other members of the WP:Chem team and especially to deal with my concerns about CAS numbers not matching the structure drawn in the Chemical Box or Drug Box. Sometimes the CAS number might be for the chloride salt but the structure would be the neutral form, for example. So, this was our discussion place. I believe there is general agreement by all participants at WP:Chem that CAS Numbers have value for the users of Wikipedia and chemists in general, so the presence of a CAS number in the boxes makes absolute sense and, of course, the correct CAS number for the structure makes sense in an encyclopedia. Therefore, validation and sourcing of CAS numbers has been pursued.
A comment from Eric Shively at CAS can be found here online at Wikipedia. He comments:
“Chemical Abstracts Service (CAS) objects to anyone encouraging the use of SciFinder® and STN® to curate third-party databases or chemical substance collections, including the one found in Wikipedia. SciFinder and STN are provided to researchers under formal license agreements, under which the researchers agree to refrain from using these tools to build databases. We urge and expect those researchers to respect the explicit terms of the agreements they have entered into. CAS is a division of the American Chemical Society. Please contact CAS if you have questions. Eric Shively, CAS, firstname.lastname@example.org Eshively (talk) 20:56, 5 March 2008 (UTC)”
It’s an interesting stance. This at a time when there is more focus on facilitating information exchange. In an environment where people are using resources such as Wikipedia to source information, one would assume that the availability of CAS numbers would actually be encouraged rather than so blatantly discouraged. It’s been said before that CAS numbers are like the phone numbers of the chemistry world, so if they were to be sourced from a vendor’s catalog would that be acceptable? And how would anybody know where they are sourced anyway? If they were sourced from a bottle of chemicals on the shelf and added to Wikipedia, is that acceptable?
Nevertheless, as Mr Shively comments, there are legal agreements in place and they are expected to be respected. Question: does every user of Scifinder read the agreement? When a large Pharma company licenses access to Scifinder for their users do they expect people to know the legalities of usage and train their users in such detail? Maybe…
As it is I am not a user of SciFinder…though I’d like to be. I think it’s an incredible resource. So, I don’t have to worry about the legal repercussions of using the system (yet). As it is I will continue my work of curating and I guess there will be a discussion now with the WP:Chem team about what to do about CAS Numbers.
PMR: I should at least thank the CAS/ACS for being so clear about their position - even though it is a simple NO. (It is usually impossible to get any replies at all from Closed Access and Closed Data publishers). In a previous post (Robert Massie on OA and PMR) I reported when Robert Massie commented on the value of Scifinder. There the issue was that Scifinder (a search engine) and the content (Chemical Abstracts) were Closed, which in my opinion limits their use in Web2.0 applications - RobertM disagreed, saying that Web2.0 and Scifinder was not a binary decision.
Here the issue is that CAS identifiers have come to be accepted as a primary identifier system for chemistry - thus caffeine has the CAS number [58-08-2]. This is the only number I can reliably get from CAS without paying (or having my institution or country pay). The number is semantically almost void - it cannot be worked out like an InChI. InChI and CAS serve different purposes - CAS can be related to any substance including mixtures of molecules such as kerosene - InChI is algorithmically derived from the molecular structure and does not apply to mixtures. CAS numbers are frequently used to assert what a substance is and to indicate whether two substances are the same or different. They are commonly used in supplier catalogues and on bottles.
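To illustrate how little a CAS number carries: the only internal structure I am aware of is that the final digit is a checksum over the preceding digits (each digit weighted by its position counted from the right, summed, modulo 10). The Python below is a minimal sketch of that check; the function names are my own and purely illustrative. It can tell you that a number is well-formed, but nothing about which substance it denotes, which is exactly the point.

```python
import re


def cas_check_digit(body_digits: str) -> int:
    """Checksum over the digits before the final hyphen.

    Each digit is weighted by its position counted from the right
    (rightmost digit has weight 1); the check digit is the sum modulo 10.
    """
    total = sum(
        int(digit) * weight
        for weight, digit in enumerate(reversed(body_digits), start=1)
    )
    return total % 10


def is_well_formed_cas(cas: str) -> bool:
    """True if `cas` looks like NNNNNNN-NN-N and its check digit matches.

    This says nothing about whether the number has been assigned,
    or to what substance.
    """
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas)
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    return cas_check_digit(body) == int(match.group(3))


if __name__ == "__main__":
    # The caffeine number quoted above.
    print("58-08-2", is_well_formed_cas("58-08-2"))
    # Same number with a deliberately wrong final digit.
    print("58-08-3", is_well_formed_cas("58-08-3"))
```

Contrast this with an InChI, where the string itself is computed from the structure and so can be regenerated and checked by anyone with the structure in hand.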
CAS numbers are copyright CAS/ACS who have the legal right to regulate their use - as above. They would make excellent identifiers for the semantic web, except that they are closed. If I want to find out what [67-64-1] is I can only do this by paying CAS - about 6 USD for each lookup (e.g. on STN Easy). This immediately rules it out for any semantic web application which assumes that resolving links is free. Wikipedia tells me that this number corresponds to acetone (nail varnish remover) but they now do not have the freedom to do this. Similarly Pubchem do not use CAS numbers as they have no right to do so. (A number of suppliers and other sources quote CAS numbers, many without explicit permission).
An identifier system for chemistry is extremely valuable (patents, safety, etc.) but can cause great problems when mistakes are made. If compounds are misordered because of mistakes in identifiers serious accidents could occur. An open system of identifiers would be highly valuable in developing the chemical semantic web and increasing quality. The closed and restrictive practices of CAS make it more difficult to create Web 2.0 applications in chemistry.
I do not believe this situation can last. Closed systems on the web cannot survive for many more years unless rigorously enforced by restrictive legal and business processes. The heads of chemistry departments who currently have no concern for informatics in the C21 will retire and a new generation of less conservative chemists will increasingly sweep away the Closed approach. Technology such as robots acting on semantic publications will make human-collected abstracts obsolete.
If CAS do not adapt to the culture of the modern web tensions will continue to increase in the chemical information arena. RobertM has already hinted that there is systematic stealing of CAS material. I do not condone this, but neither do I condone the closed control of a valuable system of identifiers.