Talk at Int. Union of Crystallography: I ask for the availability of scientific data

I am giving an invited talk on Monday 2011-08-29 at the IUCr about our Crystaleye system and more generally about new approaches to publishing science. I’ll be blogging a lot over the next 2 days to get all my ideas into posts.


Authors’ raw data is part of the necessary scientific record (i.e. the material required to support the authors’ claims) and there is a large groundswell of opinion that it should be universally published. Some disciplines already do this, and in the long tail of science crystallography leads the field. Every paper MUST be accompanied by the crystallographic data (CIF) and all publishers require authors to make this available. (More later.)


Crystaleye (written by Nick Day) is a system that reads the CIFs from publishers’ webpages and aggregates them into a browsable and searchable knowledgebase. So far it has collected over 200,000 different CIFs and covers both organic and inorganic data. It does this for journals published by the IUCr, the Am. Chem. Soc. (ACS) and the Royal Soc. Chem. (RSC) because they publish the CIFs on their websites. Other publishers, however, do not. They send the data to the Cambridge Crystallographic Data Centre (CCDC). [Note: I have no formal connection with the CCDC.]
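
As a rough illustration of the aggregation idea only: the sketch below is not Crystaleye's code. The function names `parse_cif_items` and `build_index` are hypothetical, and the parser is deliberately naive, reading only single-line CIF data items and ignoring loops and multi-line text fields. It shows how CIF texts, once fetched, could be indexed by a tag such as `_chemical_formula_sum`.

```python
import re
from collections import defaultdict

# Matches simple single-line CIF data items, e.g. "_cell_length_a 5.431(2)".
ITEM = re.compile(r"^(_\S+)\s+(.+?)\s*$")


def parse_cif_items(text):
    """Return {data_block_name: {tag: value}} for single-line items only."""
    blocks = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.lower().startswith("data_"):
            # A new data block starts; its name follows the "data_" prefix.
            current = blocks.setdefault(line[5:], {})
        elif current is not None:
            match = ITEM.match(line)
            if match:
                # Drop surrounding quotes from quoted values.
                current[match.group(1)] = match.group(2).strip("'\"")
    return blocks


def build_index(cif_texts, tag="_chemical_formula_sum"):
    """Map each value of `tag` to the (source, block) pairs that carry it."""
    index = defaultdict(list)
    for source, text in cif_texts.items():
        for name, items in parse_cif_items(text).items():
            if tag in items:
                index[items[tag]].append((source, name))
    return dict(index)
```

Here `cif_texts` is assumed to be a dictionary from a source label (a file name or URL) to the CIF text. A real system would need a full CIF parser and would index many more fields.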


The CCDC are the only place in the world allowed to hold these CIFs and they are not on public view. They have many tens of thousands of these CIFs but normally restrict gratis access to very small numbers, normally 1 at a time, by email request. To get wider access to the CIFs you have to pay an annual subscription. This subscription also provides added downstream value, which I do not need.


I would like the raw data from major publishers such as Wiley, Elsevier and Springer, which at present are, to my knowledge, held only at the CCDC. I believe that access to them is essential to doing modern science (I will explain why later), so I have asked the director of CCDC (Dr Colin Groom) for these CIFs. Note that these are the CIFs deposited by the authors (the same kind of material that ACS, RSC, IUCr, etc. already routinely publish), not material enhanced by the CCDC.


Colin and I met yesterday and he agreed to reply to my request. We understand each other’s positions, but it will be useful for Monday’s talk to have a formal record. This is the mail I sent to Colin yesterday:


Several journals (mainly Wiley, Elsevier, Springer, Science) do not publish the authors’ electronic CIFs but instead deposit them in the CCDC. These CIFs are a major part of the primary scientific record and can be used for validation, detection of fraud and error, systematic studies, mashups, etc.

I am asking whether the CCDC is prepared to make all these available as an Open collection (e.g. under CC0) so that the community can have bulk access to these without requiring further permission.

I asked the ICSD earlier this week and they were prepared to do this for the electronic CIFs they have received as part of the deposition process. If CCDC can do the same then the complete record of published electronic data will be available.

I am talking about new approaches to publication on Monday and would like to be able to present the CCDC’s formal position and any future plans on this issue.

I am around most of the time and happy to see if we can meet
Many thanks,



I will publish his reply in full and unabridged on this blog and discuss it in my talk on Monday.


2 Responses to Talk at Int. Union of Crystallography: I ask for the availability of scientific data

  1. Pete Carroll says:

    I wonder if the Government response to the Hargreaves Review regarding data/text mining for research might be relevant?
    “Nor does the Government regard it as appropriate for certain activities of public benefit
    such as medical research obtained through text mining to be in effect subject to veto by the owners
    of copyrights in the reports of such research, where access to the reports was obtained lawfully. We
    recognise that some publishers view licensing of text mining as a legitimate commercial opportunity;
    however we are not persuaded that restricting this transformative use of copyright material is
    necessary or in the UK’s overall economic interest…
    the Government agrees with the Review’s central thesis that the widest possible
    exceptions to copyright within the existing EU framework are likely to be beneficial to the UK,
    subject to three important factors:
    • That the amount of harm to rights holders that would result in “fair compensation”
    under EU law is minimal, and hence the amount of fair compensation provided
    would be zero. This avoids market distortion and the need for a copyright levy
    system, which the Government opposes on the basis that it is likely to have adverse
    impacts on growth and inconsistent with its wider policy on tax.
    • Adherence with EU law and international treaties.
    • That unnecessary restrictions removed by copyright exceptions are not re-imposed by
    other means, such as contractual terms, in such a way as to undermine the benefits of
    the exception.
    The Government will therefore bring forward proposals in autumn 2011 for a substantial
    opening up of the UK’s copyright exceptions regime on this basis. This will include
    proposals for a limited private copying exception; to widen the exception for noncommercial
    research, which should also cover both text- and data-mining to the extent permissible under EU law..”
    The parliamentary select committee for the dept of Business Innovation & Skills is holding an inquiry on the Hargreaves Review and the Government’s response to the review. Closing date for submissions 5th September. See:
    I know time is short but it could be worth yourself or someone from the research community bringing this problem of “closed CIFs” to their attention as exemplary evidence of problems with access to data.
    PS good luck with your FOI request. You might find
    useful to you if they try a S41 or S43 exemption over release of the contracts.
