Open Data: publishers are the problem

The Chemspider site and blog have been making rapid and valuable progress towards Open Data. This is particularly laudable for a commercial site, since Openness in chemistry is a long way from being a proven business model and is actively resisted by many. Here is a typical tale of frustration; I comment below.
Why We Can’t Publish Scraped CrystalEye Data Yet….And Science Commons Declare a Protocol for Implementing Open Access Data
Previously I blogged about our intention to scrape CrystalEye data and publish onto ChemSpider. The original comments regarding the data on CrystalEye were as follows:

  1. pm286 Says:
    October 26th, 2007 at 7:54 am (1) All data come from Free sources – i.e. visible without a subscription. Some journals (Acta Crystallographica and RSC for example) do not copyright the data. Others like ACS add copyright notices. It is our contention, and Elsevier has agreed for its own material, that facts are not copyrightable. We have therefore extracted and transformed facts and mounted these. Where the original material (CIF) does not carry copyright we mount it on our pages – where it does we do not, but we have the transformed data. In those cases it would be possible to recreate the original CIF data in semantic form, but not the exact typographical layout, which contains meaningless whitespace. I am not aware that ACS or Elsevier have ever made statements of any kind about our Open Data efforts. You may scrape anything, but you must honour the source and the metadata and you should add the Open Data sticker. If you scrape the link (simplest) you may simply point to our site. If you scrape more data you should ensure that the integrity of the data is maintained and that if it is re-used the re-used data should still clearly show our metadata.

[PMR: Yesterday’s announcement of the CCZero licence could mean that we change from a meta-licence (“Open Data”) to an explicit CCZero licence. I will need to read the details. I don’t think it changes the arguments below.]

We have already done the work to scrape certain data from the site but have chosen to be extra careful with taking the declaration of Open Data made to all data sources. My primary worry was with the data scraped from the ACS journals. With this caution in mind I sent a letter to the copyright department at ACS as outlined here. In fact I made a couple of phone calls, sent the email about 2 more times and finally managed to talk to a nice gentleman from the ACS copyright department and brought my concerns to light. Since then we have exchanged multiple emails, spoken again on the phone and I have been told that a meeting of minds from both Washington and Ohio was being scheduled to discuss the situation. That’s 2 months after my original email.
Today I received the following email and I am excerpting from it:
“Thank you for your inquiry about the proposed use by ChemSpider of information in the CrystalEye database that has been published within certain ACS journal publications. In light of your query, we are examining the manner in which ACS published material is represented within that database as well as the nature of your proposed use, so that we can respond in an informed manner to your request.
If you will be attending the ACS National Meeting in New Orleans, perhaps we could confer with you at that time to discuss our findings and advise you appropriately?
[Communicator’s name withheld]”
What I thought was a simple question and done with the intention that ChemSpider was safe turns out not to be so simple. It could take until March 2008 to get an answer! At this stage we will not be publishing any of the CrystalEye data without confirmation from each of the publishers that this is allowed. I asked the question previously “Who gets to declare data open or not?” and even received the question “Why even offer the option of closed?” The primary reason is that we have turbulent times ahead of us around such issues of “openness” and until these are navigated I am working to keep ChemSpider “safe”. I am willing to participate, support and contribute to the evangelism of openness but am equally concerned with keeping ChemSpider alive for the close to 3000 users per day now accessing the service.
It was an interesting day to receive this email about a potential FIVE MONTH delay to a decision about Open Data especially now that Science Commons have released a Protocol for Implementing Open Access Data just yesterday. …
So, while protocols are exposed to the community by Science Commons the challenge of utilizing them now begins…I will be in communication with members of the Science Commons soon to determine how ChemSpider can fit into the model…

PMR: This is, unfortunately, completely typical. Earlier this year I wrote to Tetrahedron (an Elsevier journal) asking if they would consider posting CIFs (crystallographic data):

Request for Open publication of crystallographic data in Elsevier’s Tetrahedron

=========== Open letter to editors of Tetrahedron ==========
Professor L. Ghosez ,
Professor Lin Guo-Qiang ,
Professor T. Lectka ,
Professor S.F. Martin ,
Professor W.B. Motherwell ,
Professor R.J.K. Taylor ,
Professor K. Tomioka
Subj: Request for Open publication of crystallographic data in Tetrahedron
Dear editors,
I have recently been reviewing access to supplemental data in chemistry publications, in particular crystallographic data (“CIFs”). Many publishers (IUCr, RSC, ACS…) expose these on their websites as Open Data (for examples see: The data are acknowledged not to be copyrightable (see where your colleague Jennifer Jones (copied) has confirmed:

Dear Peter Murray-Rust
Thanks for your email. Data is not copyrighted. If you are reusing the entire presentation of the data, then you have to seek permission, otherwise, you can use the data without seeking our permission.
Yours sincerely
Jennifer Jones
Rights Assistant
Global Rights Department
Elsevier Ltd
PO Box 800
Oxford OX5 1GB
Tel: + 44 (1) 865 843830
Fax: +44 (1) 865 853333

Other Elsevier journals such as those publishing thermochemistry (see last blog post) are now actively making the supplemental data Openly available on the journal website. I am therefore asking whether Tetrahedron (and perhaps other Elsevier chemistry journals) might consider publishing their data Openly in this way and would be grateful for your views.
(This is an Open letter ( and I would like to publish your reply so please mark any confidential material as such).
Thank you for considering this

PMR: Five editors – I haven’t had the courtesy of a reply. This is not uncommon – I didn’t get replies on Open topics from Wiley or Springer (first time round) either. Either journals are not in the habit of replying – they consider ordinary scientists too low in the foodchain to merit consideration (most likely) – or they regard anything Open as a pain and want to slow it by inaction (also most likely). They have their set way of doing things – God ordained in 1972 that the world belongs to the publishers and they don’t want to see it change.
Another typical example. I was invited to write an article for Serials Review on Open Data. I asked if I could write my article in HTML and embed my own copyright material, noted as such under appropriate licence. The editorial office said that they would come back to me. It’s now past the closing date of the submission. After ca. 6 weeks I got the reply:

Facts and data are not copyrightable but the expression of data is
copyrightable. If you wish to use third-party data in a different
format within your article, including full acknowledgement to the source
of the data, then that would be acceptable. However, if you wish to
retain the expression of the data, then you will need to include
alternate diagrams within the article.

So I can use the data – IF I can get it. If I can only get a graph then I can’t unless I redraw it. Is redrawing a graph a useful activity for science – do I need to answer? The only value is that it adds some random errors to the data (or systematic ones) that would be fun to give as exercises in bad scientific practice for students. “Expression of the data” – i.e. the author’s graphs – are not re-usable.
So what’s the answer? Currently I use the “ask forgiveness, not permission” mode. And if the “owners” of the data (read “appropriators”) send the lawyers and ask for a take-down – make a huge public fuss. As the world did when Shelley Batts “stole” a graph from Wiley (Sued for 10 Data Points). And Wiley backed down. The publishers don’t like public fuss.
So a few months ago I would have advised Chemspider “go ahead”. But they ran foul of another publisher (I think it was the Royal Society of Chemistry). I never understood the details but Chemspider linked to publicly visible papers (not Open) and were asked to take the links out of the Chemspider database. This doesn’t even seem to make sense. I would have thought publishers would like people linking to their papers – maybe it was the metadata.
So I appreciate Chemspider’s wish to remain on the correct legal side of the publisher. But [the publishers’] actions destroy scientific data in the current century. Chemistry publishers [OA publishers and IUCr excepted] are actively and passively resisting the re-use of data. They copyright factual data, hide it, require take-downs, refuse to reply to reasonable letters – everything. They are simply in the way between the creator of the data and the consumer.
As I have blogged we now have an exciting project sponsored by Microsoft on eChemistry. We are going to fill repositories with data. And we are going to get that data (“not copyrightable” – see above) from any source we reasonably can. It will be available to the whole world. It will probably be stamped CCZero. CrystalEye will be in there. We shall, of course, include the source (provenance) as we really care about it and metadata. So people will know where it came from.
Why can’t the ACS reply “Yes” to Chemspider by return? Does it really make sense for chemistry publishers to be universally seen as Luddites? Because the world will sweep these restrictive practices away, and the business will have moved from the publishers to somewhere in the twenty-first century (the one we are in).


3 Responses to Open Data: publishers are the problem

  1. Just to clarify your intention with the phrase “So I appreciate Chemspider’s wish to remain on the correct legal side of the publisher. But their actions destroy scientific data in the current century.”
    When you say THEIR actions destroy scientific data in the current century….do you mean ChemSpider’s actions or the publishers’ actions? It reads as though you mean ChemSpider…thanks

  2. Regarding your comment “As I have blogged we now have an exciting project sponsored by Microsoft on eChemistry. We are going to fill repositories with data.”
    ChemSpider has just been integrated into Microsoft’s InfoMesa ( using our web services. “And we are going to get that data (“not copyrightable” – see above) from any source we reasonably can. It will be available to the whole world. It will probably be stamped CCZero. CrystalEye will be in there. We shall, of course, include the source (provenance) as we really care about it and metadata. So people will know where it came from.” We have already made the data available to PubChem as you know so the ChemSpider data are there if you want them or we can supply them directly.

  3. pm286 says:

    (1) Sorry – corrected in the text.
    (2) Understood. Pubchem is a major resource in the project. So by implication all the data in there are available, and we appreciate Chemspider’s community-centric action in making their data available. I haven’t looked in detail at what there is in Pubchem and the data are slightly hidden away. The question of what to use will depend considerably on metadata and the consistency of different sources when translated to RDF.
