Antony Williams (Chemspiderman) is actively involved in creating Open chemistry. Here he reveals the limitations imposed by the American Chemical Society on creating Open data.

CAS Discourages Using SciFinder to Help Curate Wikipedia Structures and CAS Numbers

Tonight I was catching up with my Watchlist on Wikipedia for the first time in a long time and noted that a comment had been added to the Wikipedia Project: CAS Validation page. This discussion page was started to have a place to discuss a second validation of my work by other members of the WP:Chem team and especially to deal with my concerns about CAS numbers not matching the structure drawn in the Chemical Box or Drug Box. Sometimes, for example, the CAS number might be for the chloride salt while the structure drawn is the neutral form. So, this was our discussion place. I believe there is general agreement among all participants at WP:Chem that CAS Numbers have value for the users of Wikipedia and for chemists in general, so the presence of a CAS number in the boxes makes absolute sense and, of course, so does having the correct CAS number for the structure in an encyclopedia. Therefore, validation and sourcing of CAS numbers has been pursued.

A comment from Eric Shively at CAS can be found here online at Wikipedia. He comments:

Chemical Abstracts Service (CAS) objects to anyone encouraging the use of SciFinder® and STN® to curate third-party databases or chemical substance collections, including the one found in Wikipedia. SciFinder and STN are provided to researchers under formal license agreements, under which the researchers agree to refrain from using these tools to build databases. We urge and expect those researchers to respect the explicit terms of the agreements they have entered into. CAS is a division of the American Chemical Society. Please contact CAS if you have questions. Eric Shively, CAS, Eshively (talk) 20:56, 5 March 2008 (UTC)

It’s an interesting stance, coming at a time when there is more focus on facilitating information exchange. In an environment where people are using resources such as Wikipedia to source information one would assume that the availability of CAS numbers would actually be encouraged rather than so blatantly discouraged. It’s been said before that CAS numbers are like the phone numbers of the chemistry world, so if they were to be sourced from a vendor's catalog would that be acceptable? And how would anybody know where they are sourced anyway? If they were sourced from a bottle of chemicals on the shelf and added to Wikipedia, is that acceptable?

Nevertheless, as Mr Shively comments there are legal agreements in place and they are expected to be respected. Question: does every user of SciFinder read the agreement? When a large Pharma company licenses access to SciFinder for their users do they expect people to know the legalities of usage and train their users in such detail? Maybe…

As it is I am not a user of SciFinder…though I’d like to be. I think it’s an incredible resource. So, I don’t have to worry about the legal repercussions of using the system (yet). As it is I will continue my work of curating and I guess there will be a discussion now with the WP:Chem team about what to do about CAS Numbers.

PMR: I should at least thank the CAS/ACS for being so clear about their position - even though it is a simple NO. (It is usually impossible to get any replies at all from Closed Access and Closed Data publishers). In a previous post (Robert Massie on OA and PMR) I reported when Robert Massie commented on the value of Scifinder. There the issue was that Scifinder (a search engine) and the content (Chemical Abstracts) were Closed, which in my opinion limits their use in Web2.0 applications - RobertM disagreed, saying that Web2.0 and Scifinder was not a binary decision.

Here the issue is that CAS identifiers have come to be accepted as a primary identifier system for chemistry - thus caffeine has the CAS number [58-08-2]. This is the only number I can reliably get from CAS without paying (or having my institution or country pay). The number is semantically almost void - it cannot be worked out like an InChI. InChI and CAS serve different purposes - CAS can be related to any substance including mixtures of molecules such as kerosene - InChI is algorithmically derived from the molecular structure and does not apply to mixtures. CAS numbers are frequently used to assert what a substance is and to indicate whether two substances are the same or different. They are commonly used in supplier catalogues and on bottles.
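Although the number says nothing about the substance, it does carry one piece of internal structure that can be checked without access to any CAS database: the final digit is a check digit. Reading the other digits from right to left and weighting them 1, 2, 3, ..., the weighted sum modulo 10 should equal that last digit. The following is a minimal Python sketch of that check; the function name `cas_check_digit_ok` is made up for illustration, and passing the check only shows that a number is well formed, not that it belongs to any particular structure.

```python
import re

# Shape of a CAS Registry Number: 2-7 digits, 2 digits, 1 check digit.
_CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)", re.ASCII)


def cas_check_digit_ok(cas: str) -> bool:
    """Return True if `cas` has the CAS number shape and a valid check digit.

    Hypothetical helper for illustration only. It does not confirm that
    the number has actually been assigned, or to which substance.
    """
    # Accept the bracketed form used in the text, e.g. "[58-08-2]".
    match = _CAS_PATTERN.fullmatch(cas.strip().strip("[]"))
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check = int(match.group(3))
    # Weight digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(int(d) * w for w, d in enumerate(reversed(body), start=1))
    return total % 10 == check


if __name__ == "__main__":
    for number in ("[58-08-2]", "67-64-1", "58-08-3"):
        print(number, cas_check_digit_ok(number))
```

For caffeine, [58-08-2], the sum is 8×1 + 0×2 + 8×3 + 5×4 = 52, and 52 mod 10 = 2, which matches the final digit. A check like this catches many transcription errors, but it cannot tell a salt from the neutral form, so it is no substitute for validating a number against a structure.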

CAS numbers are copyright CAS/ACS who have the legal right to regulate their use - as above. They would make excellent identifiers for the semantic web, except that they are closed. If I want to find out what [67-64-1] is I can only do this by paying CAS - about 6 USD for each lookup (e.g. on STN Easy). This immediately rules it out for any semantic web application which assumes that resolving links is free. Wikipedia tells me that this number corresponds to acetone (nail varnish remover) but they now do not have the freedom to do this. Similarly Pubchem do not use CAS numbers as they have no right to do so. (A number of suppliers and other sources quote CAS numbers, many without explicit permission).

An identifier system for chemistry is extremely valuable (patents, safety, etc.) but can cause great problems when mistakes are made. If compounds are misordered because of mistakes in identifiers serious accidents could occur. An open system of identifiers would be highly valuable in developing the chemical semantic web and increasing quality. The closed and restrictive practices of CAS make it more difficult to create Web 2.0 applications in chemistry.

I do not believe this situation can last. Closed systems on the web cannot survive for many more years unless rigorously enforced by restrictive legal and business processes. The heads of chemistry departments who currently have no concern for informatics in the C21 will retire and a new generation of less conservative chemists will increasingly sweep away the Closed approach. Technology such as robots acting on semantic publications will make human-collected abstracts obsolete.

If CAS do not adapt to the culture of the modern web tensions will continue to increase in the chemical information arena. RobertM has already hinted that there is systematic stealing of CAS material. I do not condone this, but neither do I condone the closed control of a valuable system of identifiers.

  1. Name (required) says:

    We are legally required to supply vendor MSDS forms to our staff. The vendors have included CAS numbers on their MSDS forms, and we keep the forms in a database. So technically, we must be in breach of our SciFinder license?

    If we get sued, I wonder whether the judge would side with the legal statutes or the contractual agreement?

    What CAS should be doing is making CAS numbers an open standard - like PDF files - that everybody can adopt.

  3. will says:

    "The vendors have included CAS numbers on their MSDS forms"

    Don't worry. CAS no.s are identifiers (i.e. descriptors or metadata). You did not obtain the numbers from CAS databases (where their terms and conditions apply) - you got them elsewhere. CAS cannot legally stop anyone from obtaining CAS no.s from a non-CAS database.

    This is because the data is just an ID number in this case. Just avoid implying that you own or are associated with the CAS no. trademark.

  4. David Gerard says:

    They're trying to have it both ways, like Pantone with colour swatches - an industry standard they can claim a monopoly rent on.

    This might be tolerable for commercial printing, but is less so in the pursuit of pure scientific knowledge. Various internal Wikimedia mailing lists have had this brought to their attention, and I've also posted it to a couple of external lists to work out what can be done about this claim (beyond the obvious, i.e. loud laughter).

    And you thought Westlaw suing over the use of *page numbers* was egregious ...

  6. Joshua Zelinsky says:

    It isn't clear to me that they are actually objecting to the use solely of the numbers. It looks like they are mainly objecting to what they see as database slurping. So if Wikipedia picks up the numbers from other sources then things should be ok, right?

  7. Physchim62 says:

    You have been misled by CAS propaganda in saying that CAS registry numbers are copyrighted. They are not. CAS owns a number of trade marks and (in the EU) database rights over the collection, but the numbers are assigned without any creativity, an essential element of copyright protection. This is exactly why CAS has to rely on such draconian license terms for access to its database, terms which would appear to breach the EU Database Directive and the competition law of many countries.

  8. Joshua, it would be better to get into a "conversation" with Mr Shively rather than a written exchange to determine exactly what the objections are, but I "think" the objection is to using the SciFinder database to validate CAS numbers listed in the ChemBox. So, when CAS numbers are inserted into a ChemBox and the question is whether or not the CAS number listed is appropriate for the drawn structure, how could it be validated? There are numerous ways: looking through public databases to see whether there is enough evidence that the CAS number and structure match, or validating against the authority directly. I have shown a number of times that registry numbers in the free access databases can be in error (see In validating against the authority database, SciFinder, one of two searches will be used: search the CAS number to validate the structure or, more appropriately, search the structure to validate the CAS Number. If the CAS number listed in the ChemBox is incorrect then one possible action would be to take the CAS number from the registry and insert it into the ChemBox. I have not done this myself (since I don't have access to SciFinder) but this is where their objections lie.

    So, the choice therefore comes down to validating against other sources (non-authority), not validating at all, or removing CAS numbers from the ChemBox. All of these are less than ideal for an encyclopedic source and the hope is that CAS/ACS will reconsider. A member(s) of the WP:CHEM team has been trying to conduct a conversation with them for quite a while to understand the concerns in detail.

  9. Could it be that CAS numbers fall outside the realm of copyrightability?


