Cambridge Crystallographic Data Centre disputes non-re-usability of primary data (Am. Chem. Soc charges > 100 USD to view this discussion)

I have been alerted to a discussion in the letter pages of J. Chem. Inf. Modeling (an ACS Journal). I normally read the literature through a paywall window (my home machine has no privileges and so I get a "citizen-enhanced" view of the primary literature). The enhancement is of course massively negative – I can't read most of it. For most things, if I can't read them they don't exist – an increasingly common approach. Occasionally I switch on access to the University VPN, which allows me to read the fulltext – thereby requiring the University to continue its subscription (in dollars) to this journal. Unless they use the paywall filter, academics in rich universities (which is the only real market for scholarly journals) have no idea how impoverished the world is. But many of my readers will appreciate this – they are the Scholarly Poor. And what follows can be understood by anyone – you don't have to be a chemist. Note that many research institutions do not subscribe to JCIM so I expect most readers will have a "scholarly poor lens" on what follows.

  • Earlier this year a paper was published http://pubs.acs.org/doi/abs/10.1021/ci100223t

    Data-Driven High-Throughput Prediction of the 3-D Structure of Small Molecules: Review and Progress

    Alessio Andronico, Arlo Randall, Ryan W. Benz, and Pierre Baldi*

    School of Information and Computer Sciences, Institute for Genomics and Bioinformatics and Department of Biological Chemistry, University of California, Irvine, Irvine, California 92697-3435, United States

    J. Chem. Inf. Model., 2011, 51 (4), pp 760–776 DOI: 10.1021/ci100223t Publication Date (Web): March 18, 2011 Copyright © 2011 American Chemical Society

I can't reproduce the abstract because although it was written by the authors they have signed over its ownership/copyright to ACS. (ACS in their generosity allow you to read this at the end of the link above). Note that the system is mounted at http://cosmos.igb.uci.edu/ . It contains the rubric:

Note: In as much as this Service uses data from the CSD [Cambridge Structural Database] , it has been given express permission from the CCDC [Cambridge Crystallographic Data Centre] . At the request of the CCDC, no more than 100 molecules can be uploaded to the Service at a time, and the Service ought to be used for scientific purposes only, and not for commercial benefit or gain.

Well – that was a pretty challenging paper, wasn't it? (Sorry scholarly poor, I can't tell you what it said – but trust me – or pay 35 USD).

This elicited a response from the director of the CCDC. If you read the abstract you will see their involvement. (BTW I have no relation to them except geographical proximity, and the University has declared that they don't belong to the University (for FOI) although they are listed as a department). Here is his 1-page response:

  • http://pubs.acs.org/doi/pdfplus/10.1021/ci2002523 Data-Driven High-Throughput Prediction of the 3-D Structure of Small Molecules: Review and Progress. A Response from The Cambridge Crystallographic Data Centre

    Colin R Groom* The Cambridge Crystallographic Data Centre, 12 Union Road, Cambridge CB2 1EZ, U.K.

He clearly disagrees with their contention. (Scholarly Poor, you will have to fork out another 35 USD to read this single page). [2]

And the original authors responded:

  • (http://pubs.acs.org/doi/abs/10.1021/ci200460z ) Data-Driven High-Throughput Prediction of the 3-D Structure of Small Molecules: Review and Progress. A Response to the Letter by the Cambridge Crystallographic Data Center

    Pierre Baldi

    J. Chem. Inf. Model., Just Accepted Manuscript • DOI: 10.1021/ci200460z • Publication Date (Web): 22 Nov 2011

Wow! Some strong disagreement on matters of fact. (Stop whining Scholarly Poor and pay another 35 USD to read this letter – it's nearly 2 pages!). I'll reveal that it contains phrases like "simply false". And you can read the abstract which contains the phrase "significant impediments to scientific research posed by the CCDC."

So that is a pretty damning indictment. Of the CCDC? Maybe, if you can read the letters. But certainly of the ACS. An important discussion about the freedom of re-use of the scholarly literature is hidden behind a paywall. The letters have been written by scientists and presumably reproduced verbatim by the ACS. What possible justification is there for requiring the charge of 35 USD? There is no peer review involved. But then the ACS charges 35 USD for everything, including an 8-WORD retraction notice. (It's sort of easier just to charge vast amounts of money than think what you are doing to science).

So I am in a dilemma. How do I bring this discussion to public view? Because that is what a Scholarly Society SHOULD wish. I can't expect everyone to pay 105 USD. (The part of the first paper that is involved is only two sentences). I have the following options:

  • Do nothing – this will perpetuate the injustices
  • Write summaries of the letters (absurd because it will distort the meaning)
  • Extract paragraphs and publish them under fair use. (There is no doctrine of fair use in the UK and I could be sued for any phrase extracted – I have already laid myself open to this with the phrase "simply false")
  • Urge the authors of the letters to publish them Openly. In doing so they will break the conditions of publication and lay themselves open to legal action or having subscriptions to JCIM cut off
  • Write to the editor of the Journal suggesting it would be in the public interest to publish the letters? In general editors don't reply – but I know this one. But in any case I doubt they would do it and it makes the situation worse
  • Or follow a reader's suggestion I haven't thought of

Because I am now going to continue to challenge the CCDC. I have been turned down on FOI grounds on a technicality (that the CCDC, although listed as a department of the University, isn't part of it for FOI). BTW it took the University FOI 19.8 days to work that out.

If you read the last paper (shut up and pay!) you will see that the authors quote our work on Crystaleye and suggest that it, together with the Crystallography Open Database (COD), could and now should replace the CCDC. They say (I have removed all the letter "O"s [1] to avoid direct quoting; 35 USD will tell you where the O's are meant to be):

As histry shws, thse wh stand in the way f demcracy and scientific prgress end up lsing ver the lng-run. The reactinary attitude f the CCDC staff has started t backfire by energizing academic labratries arund the wrld t find alternative slutins arund the CCDC.

I agree with the sentiments expressed. The only problem is that the authors chose to do it behind a paywall.

I shall continue my campaign to liberate "our" data from the CCDC+Wiley/Elsevier/Springer monopoly. Sancho Panza (http://en.wikipedia.org/wiki/Sancho_Panza ) is welcome to join me.

[1] http://en.wikipedia.org/wiki/The_Wonderful_O James Thurber.

[2] UPDATE: I managed to get it for free but maybe I have a cached copy?

UPDATE: It now seems that most people can get the first letter ("Editorial") for free, but I still have to pay for the UCI response.

6 thoughts on “Cambridge Crystallographic Data Centre disputes non-re-usability of primary data (Am. Chem. Soc charges > 100 USD to view this discussion)”

  1. Egon Willighagen

    Peter, do you know about the Open nature of COSMOS itself? The reply to the reply comments that being allowed to reuse, redistribute and modify software tools for structure prediction is 'essential', but I could not find the source code of COSMOS anywhere yet.

    1. pm286 Post author

      There is only a web page on which COSMOS is a selectable option. That's where I got the stuff about CCDC. I don't get the feeling that it's Open Source, which is an additional problem, but that isn't the current issue.

  2. Anthony Smith

    I've been following this exchange and had the same question as Egon. It's not at all clear that the source code for COSMOS is Open Source. If it isn't then doesn't it undermine the integrity of arguments favoring openness? Also, is access to primary data the issue here? I guess it depends on what you mean by primary data but I assume you mean the "sourceCIFs" referred to in previous blog posts. My reading of the original paper (and indeed the text from the web site you quote) is that COSMOS uses the CSD and not the sourceCIFs and I'm pretty sure I've heard you say before that you don't have an issue with restrictions on access to the CSD.

    1. pm286 Post author

      >>I’ve been following this exchange and had the same question as Egon. It’s not at all clear that the source code for COSMOS is Open Source. If it isn’t then doesn’t it undermine the integrity of arguments favoring openness?

      It weakens the argument for some people but not me. My own view is that anyone can use my/our source code and my/our data for any lawful purpose. So if people incorporate OSCAR or Crystaleye into commercial products that's fine by me. The whole question of restriction of downstream use (non-commercial) is discussed in the simultaneous discussion on NC licences on this blog. Licences and contracts are a blunt weapon when trying to control the ethics of the consumer.

      >>Also, is access to primary data the issue here? I guess it depends on what you mean by primary data but I assume you mean the “sourceCIFs” referred to in previous blog posts.

      There are two separate issues:

      1. the CCDC is the monopoly recipient of certain sourceCIFs (others are Openly available). CCDC refuse consistently to make these available and thereby set themselves up as unaccountable gatekeepers. (Their argument is simple - they need the monopoly to preserve their business). I believe that between the publishers and the CCDC there is an ethical scientific duty to make these Openly available without payment or restriction and that is what *I* want.

      2. UCI subscribe to CCDC and therefore have access to the "raw data" and the CSD system. CCDC forbid downstream publication of derivative works without their permission and in this case they have refused (i.e. the need for permission is asserted by default). If UCI/COSMOS is accessing "raw data" - coordinates and molecular geometry calculated by standard algorithms - then I would contend that they have a moral and ethical right to use it and that contractual restrictions by CCDC are unethical. If they are reproducing the CSD system then they are violating the contract. My guess is that they are not.

      In any case this shows that CCDC is on the path to self-inflicted oblivion. COSMOS can take Crystaleye or COD data and build a system independent of CCDC data. 250,000 Open structures should be quite enough. There is enough community software to create an Open equivalent of large parts of the CSD system and this can be built surprisingly quickly. Lawyers' restrictions may delay the end by a few years but at the cost of destroying any viable future. Because building a business on a restrictive monopoly of Open content requires the imperviousness of a film or music magnate.

      >>My reading of the original paper (and indeed the text from the web site you quote) is that COSMOS uses the CSD and not the sourceCIFs and I’m pretty sure I’ve heard you say before that you don’t have an issue with restrictions on access to the CSD.

      CSD are welcome to create a value-added system and market it. What they cannot do is continue a business by restricting access to the primary scientific data. I am going to pursue this by political means - I will be happy for allies - and we shall win through.

  3. Anthony Smith

    PS: I accessed the latest letter for free by logging on to the ACS Network. Don't think you need to be a member of ACS to register for this (I'm certainly not).

    1. pm286 Post author

      Thanks,
      I have had variable access as well. When I started the blogpost, the 2 letters were both charged at 35 USD. Then later I found the first letter was available. Please can other readers let us know what they see. And who knows, the ACS might also give a definitive answer. [But it is not uncommon for journals to charge for free material - due to errors]
