I was very pleased to see:
ChemSpider Blog: The Entire ChemSpider Database is On Its Way to PubChem!
which describes how the Chemspider database is being offered to Pubchem as “open data”. Chemspiderman has made a valuable attempt to navigate the complexities of Open Data and recursive licences. It is technically difficult and takes us into unknown territory. For a start, it is difficult to describe what the final object is. I understand Pubchem to be a collection of links coupled to authorities, i.e. Pubchem holds links to the Chemspider compounds but does not actually hold the data. (I am not aware that Pubchem holds any data other than names and a fairly small amount of computed data, e.g. number of rotatable bonds.) It does, of course, hold the data that NIH collects through the Roadmap program. But I’d be happy to be corrected.
Chemspider repeats my suggestions for criteria for Open Data and adds:
CS: For right now I am giving up on trying to track where Open Data might end up. Based on my previous discussions with Peter Suber regarding navigating the complexities of Open Access definitions, I understand there is a need to define our own policies. I’m not going to do that here but what I will be clear with is that once the ChemSpider structure set is deposited in PubChem then we are at the mercies of THEIR data sharing policies. I believe Peter [PMR, not sure which Peter – but if me, see below] holds up PubChem as the primary example of Open Data (but maybe not). So, I believe it should be true to say that the ChemSpider structure set IS Open Data when accessed/downloaded/shared from PubChem. But I understand that will then be the PubChem data set and all association with us will likely be lost. But that is fully acceptable!
PMR: This shows the complexities. We will need to see how the data actually end up in Pubchem. But at present Pubchem holds only links to authorities. Thus if I search for aspirin I get 61 suppliers of information (search result), each entry of which links back to the supplier’s site. So any “data” (e.g. melting point) is not in Pubchem. Unless Chemspider is treated differently, I would expect that only the links would be held in Pubchem. If I am right, then accessing Chemspider through Pubchem is simply another way of accessing Chemspider.
In a comment Rich Apodaca says:
Regardless of how exactly linkage occurs, the end result would be that any third party could, independently of ChemSpider, reconstruct the entire ChemSpider compound database. By using the ChemSpider Web APIs, they could develop a parallel service that re-processes the ChemSpider analytical data and patent/primary literature data, possibly mashing up the data from other sources as well.
This sets the bar very high for Open data in chemistry. I’m not sure what to call it, but it’s a game-changer.
If Chemspider allows the direct download and re-use of their data from their site, then I also congratulate them. This is completely independent of whether the entries are linked from Pubchem. However, it will be necessary to add a licence statement to the Chemspider pages (not Pubchem) making this clear.
It may be picky, but I don’t think that Pubchem – in common with many other bioscience sites – actually gives explicit permission for re-use. Admittedly it is a work of the US government, so it should be free of copyright. There is an unspoken tradition in bioscience that data and collections are “Open” in some way, but it isn’t well spelt out.
It should be.