CCDC: Reasons why sourceCIF data must be Open

This is the text of a letter I have sent to Dr Colin Groom, Director of the Cambridge Crystallographic Data Centre. For context, read my blog posts (/pmr/ ) over the last 4 days. “sourceCIFs” are raw data created as part of a crystallographic experiment by scientists (not in the CCDC) and required by community norms as part of the scholarly publication process. Some are published Openly, but others are sent by the author or publisher to CCDC in an exclusive process. CCDC then control the further distribution of these data, which are made available either in trivial amounts (less than 0.1% of the CCDC’s holding of sourceCIFs) or through a significant financial subscription (which many institutions cannot afford). I re-emphasize that I simply wish to make Open the collection of the authors’ original data [i.e. not the file with a CCDC header and CCDC accession code].

The arguments in this letter may apply to other disciplines, wherever Open data are managed by an exclusive gatekeeper.

Colin,

At my presentation in MS89 at the IUCr meeting I presented my request to you for the raw verbatim sourceCIFs supplied to CCDC as part of the publication of scholarly articles (i.e. without any CCDC-added value). I was unaware of anyone from the CCDC being present, so you will have to accept my account. I stepped through the arguments on my blog, arguing that these data were part of the essential scholarly record, and that some publishers made their sourceCIFs available Openly. It is the sourceCIFs from publishers such as Wiley, Elsevier and Springer that are in question and for which I asked. There was no adverse comment on what I presented.

I showed your reply in the meeting and also posted it on my blog (/pmr/2011/08/28/iucr2011-reply-from-ccdc-on-restrictions-on-redistributing-cifs/ ). I summarised this as containing two main reasons why CCDC would not release the sourceCIFs:

  • “these arrangements were put in place to satisfy the demands of publishers”. I have asked the University of Cambridge for details of these arrangements, through a Freedom of Information request. The advantage of specifying FoI is that it contains explicit guidelines on the public release of contracts (http://www.ico.gov.uk/ ) and this gives you the power (and the duty) to make these contracts Open except in very special circumstances. I have also asked for the number of sourceCIFs involved. When I have a better idea of the facts from this request we will be able to judge whether any of the current publishers is acting as a block to making the sourceCIFs Open. Note that the FoI legislation requires a reply by Sept 26 at the latest and I am likely to make further comment at that stage.
  • “because the CCDC continues to rely on subscriptions to the CSD to fund its ongoing developments.” We discussed this in a conversation, and I think I can summarise your argument as: “if the sourceCIFs were open, the CCDC would lose a significant number of subscriptions” [in part because other resources based on the sourceCIFs such as Crystallography Open Database and Crystaleye could provide competition]. You argued that the CCDC was beneficial to the community (which I agree with) and that it could only continue to exist if it had a monopoly right to control the distribution of sourceCIFs (with which I profoundly disagree and now explain why).

There are ethical, moral, and political/legal reasons why basic published scientific data should be Open.

  • Moral, in that the authors of the data believe that by providing their experimental data they are providing it to the world community, whether scientific or not. If CCDC closes the sourceCIFs data, authors are deprived of their moral rights.
  • Ethical. The data in sourceCIFs are of value to the world community (for example many subscribers to CCDC are involved in medical research and need the data to help develop new drugs).
  • Political and legal. Many governments and funders are requiring that the fruits of their funding are made completely Open. If, for example, a scientist is funded by Wellcome Trust or NIH their research is expected to be Open. They publish their papers Openly according to guidelines. But for many of these papers the text is Open while the sourceCIFs are closed [they can only be obtained by request, only in small numbers, and cannot be redistributed].

More generally public opinion is strongly in favour of reform towards Openness. I give some examples, some of which will have quasi-legal compulsion.

  • On Monday 29th Aug George Monbiot published an article in the Guardian which very strongly criticized the current system of academic publishing (http://www.guardian.co.uk/commentisfree/2011/aug/29/academic-publishers-murdoch-socialist ). This has had widespread impact and very general support. I believe that the current CCDC practice of a monopoly control of academic research would fall under the same criticism. While it may not be the same scale, it is the same principle. CCDC lay themselves open to being judged in the court of public opinion and it will be difficult to show why they should not release sourceCIFs.
  • RCUK have now universally pushed for raw data to be openly available. In talking with NERC, I understand that their philosophy is that raw data should be Open and that value-adders should build on this and can create a competitive market based on the value-add, not a monopoly.
  • The value of text and data in bulk (e.g. for mining) has been highlighted by the Hargreaves report. Effectively Hargreaves is saying that copyright and other contractual restrictions are seriously harming science and that the UK should remove them. I wish the sourceCIFs to be used for data-mining in an open fashion, whereas at present the only data-mining is what CCDC permits and which has to be paid for. I have commented in /pmr/2011/08/30/open-crystallography-the-hargreaves-report-can-help-make-ccdc-data-open/ . Note that this contains suggestions from a third party as to how we should approach sourceCIFs, and I have done what I can to avoid confrontation. But the issue is public and I expect the community to make reference to Hargreaves.
  • The Information Commissioner’s Office (ICO) has taken strong action against scientists who refuse to share data (see Queen’s University Belfast and tree-ring data http://www.informath.org/apprise/a3900.htm which describes in detail how QUB fought and lost the right to keep data closed). The chairman of the UK parliament’s Science and Technology Committee stated that “data has to be made publicly available” and that “Any university or scientist that hasn’t got that message needs a total rethink of the way they do research”. I hope that CCDC do not take the same route as QUB as it is messy, ultimately pointless, and reduces standing in the community.

These are some of the recent examples of how public opinion, including government, is solidly behind releasing data. Some of these involved conflict and I am keen to avoid this. My hope is that, by the time I get a reply to my FoI request, CCDC decides that it can, after all, release the sourceCIFs as Open. If there are contractual problems with the publishers I am happy to help take them to higher authorities (as in the previous paragraph). While the publishers may not need to comply legally, I think moral pressure from government offices is likely to be effective.

I do not accept that the CCDC will suffer serious business loss. Encyclopedia Britannica has not been destroyed by Wikipedia but it is redesigning itself. Ordnance Survey has not been put out of business by OpenStreetMap. CCDC should not feel seriously challenged by Crystallography Open Database and Crystaleye. Indeed if it aligns itself with Open Crystallography it can benefit.

I am happy to advise CCDC on how to make sourceCIFs Open, e.g. by defining what is meant by Open and what needs to be done to make sure sourceCIFs are Open and can be re-distributed and re-used. There is also the issue of future sourceCIFs and I am happy to suggest processes for the future ingest of Open sourceCIFs. I stress that the Openness requirement is not negotiable; effectively it means there can be no imposition of conditions other than an Open licence (PDDL, CC0) [see the Panton Principles]. The decision must be rapid (i.e. within the 20 days of the FoI request). A promise to change in the future is, unfortunately, not acceptable.

I hope that CCDC will agree this is the right way to go, quickly.
