I request Elsevier to make experimental data CC0 and release crystallography from CCDC monopoly

I have sent the following email to Elsevier’s Director of Universal Access (“very passionate about expanding access to information”). In summary I request that Elsevier publish all supplementary information (past, present, and future) and that by 27th Feb she gives me an unequivocal commitment that this has happened.

Dear Director of Universal Access,

I have been invited by Columbia University, NY, to give an opening keynote at their “Managing Research Data” symposium on Feb 27th (http://library.columbia.edu/news/libraries/2013/2013-1-31_Research_Data_Sympsosium_Announced.html). Elsevier is among the sponsors, though (at my request) not of me. Among the recommendations I shall be making is that all primary research data should be published under a CC0 (or equivalent) licence, which allows anyone anywhere to do anything with it for any legal purpose without permission or negotiation: to re-use, modify, copy and repost. In my mind this is what “Universal Access” means.

This letter is to ask Elsevier, through your department, to make all supplemental data accompanying Elsevier publications, retrospective and future, available under CC0. I will treat all mail from you as public and announce your reply or replies at the symposium.

I will restrict my examples to small-molecule crystallography though the argument extends to all primary scientific data (observations, instruments, computation, etc. in all disciplines). Crystallography, through its International Union (IUCr) has pioneered the imperative to publish all primary data (diffraction, cleaning, structural solution and refinement – and more). FWIW I am privileged to sit on the IUCr’s COMCIFS committee which creates the protocols for this.

Note that other major publishers (Nature, Acta Crystallographica, ACS, RSC, etc.) have no problem making their data available in the way I have described.

This publication enables many things including:

         The verification/validation of the experiment being reported. There are many ways of doing this including reprocessing the data with new algorithms, comparison with other data sets, recomputation, etc.

         The re-use of the data to build knowledgebases both in and outside the domain. Crystallography has a century of showing the value of the re-use of data and its interpretation.

         The creation of specialist services that alert scientists to the publication of data.

As an example Nick Day in our laboratory collected 200,000 structures from the primary literature in http://wwmm.ch.cam.ac.uk/crystaleye. This resource, published under PDDL (equivalent to CC0) contains several features not found elsewhere including bondlength browsing and fragment browsing. In particular it has a unique feature of linking back to the original literature.

There are no Elsevier data in this, because Elsevier makes it impossible. Elsevier currently hides these data either behind a 42 USD paywall (Polyhedron) or within a closed agreement with The Cambridge Crystallographic Data Centre (CCDC). I have no details of this agreement (CCDC refused to respond to my FOI request) but it gives CCDC a monopoly right to be the holder of these data. CCDC sell a derivative product and only allow minuscule amounts of the data (ca. 25 structures per year) on request. This is completely inadequate for what modern information-based scientists wish to do. It leads to bad science, as the primary data cannot be reviewed and cannot be incorporated in new artifacts (CCDC forbid re-use of the data even though it is the primary scientific record).

I am therefore asking you to do the following:

         Announce that all supplemental data accompanying Elsevier papers IS licensed as CC0.

         Require the CCDC to make all primary CIF data from Elsevier publications CC0. (The author’s raw deposition, not CCDC’s derivative works)

         Extend this policy to all other experimental data published in Elsevier journals (in chemistry this would be records of synthesis, spectra, analytical data, computational chemistry, etc.). When you agree to this I can give public advice as to the best way to achieve this.

I assume your division has effective power to do this on the timescale I have indicated. Note that in our past discussions you have used phrases such as “let’s talk to your librarians”, “we are reviewing this internally”, etc. Any phrases of this sort will be interpreted as a refusal to make data CC0. Only a clear public commitment to make raw author data CC0 with target dates (e.g. within a month) and an unequivocal public letter to CCDC requiring CC0 for raw CIFs can be regarded as Universal Access to raw author data.


Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dept. of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069


5 Responses to I request Elsevier to make experimental data CC0 and release crystallography from CCDC monopoly

  1. Alicia Wise says:

    Dear Peter,
    Thank you for your message. It is rather long and involved, and its description of various events does not always align with ours, but it is an important issue that you raise and I am very happy to respond on behalf of Elsevier. Datasets are sometimes published as supplementary information to journal articles. Authors provide Elsevier with only a non-exclusive license to publish/promote these supplementary datasets and so only the authors can decide to use a CC0 license for these datasets.
    This having been said Elsevier shares your vision for open data and a future in which data are much more broadly managed, preserved, and reused for the advancement of science. Professional curation and preservation of data is, like professional publishing, neither easy nor inexpensive. The grand challenge is to develop approaches that maximise access to data in ways that are sustained over time, ensure the quality of the scientific record, and stimulate innovation.
    Here at Elsevier we:
    • believe rich interconnections between publications and scientific data are important to support our customers to advance science and health
    • work with others to identify, if needed develop, and deploy standard approaches for linking publications and data.
    • encourage authors to document their data and to deposit their data with an appropriate open data centre or service and to make their data available for reuse by others, ideally prior to publication of articles based on analysis of these data, and with a permanent standard identifier to link from the publication to the dataset.
    • recognise that scientists invest substantially in creating and interpreting data, and their intellectual and financial contributions need to be recognised and valued
    • believe data should be accompanied by the appropriate metadata to enable it to be understood and reused.
    • help to communicate the benefits of data curation and reuse for different stakeholders in the scholarly communication landscape including authors, funders, publishers, researchers, and university administrators.
    • encourage authors to cite datasets that have been used in their research and that are available for reuse via a data curation center or service.
    • deploy our expertise in certification, indexing, semantics, and linking to add value to data
    • champion the importance of long term preservation of data, and accreditation systems/standards for digital curation services.
    You and your readers might find this short video by my colleague, IJsbrand Jan Aalbersberg, of interest. It is a 5-minute flash presentation from a recent STM Innovation seminar on this topic: http://www.youtube.com/watch?v=3KuBToc4Nv0 .
    Last but not least, our policies in this space are similar to those of other publishers. There are two industry position statements that many of us adhere to, and which your readers may find of interest. They are: http://www.stm-assoc.org/2006_06_01_STM_ALPSP_Data_Statement.pdf and http://www.stm-assoc.org/2012_12_04_STM_on_Data_and_IP_For_Scholarly_Publishers.pdf
    In closing, we at Elsevier welcome your thoughts and are committed to working with researchers to realize our shared vision for open data. I will post this response to your blog comment stream as well.
    With very kind wishes,
    Alicia
    Dr Alicia Wise
    Director of Universal Access
    Elsevier | The Boulevard | Langford Lane | Kidlington | Oxford | OX5 1GB
    M: +44 (0) 7823 536 826 | E: a.wise@elsevier.com
    Twitter: @wisealic

  2. Pingback: Unilever Centre for Molecular Informatics, Cambridge - #rds2013: My reply from Elsevier on publishing supplemental data « petermr's blog

  3. Pingback: Unilever Centre for Molecular Informatics, Cambridge - #rds2013 Managing Research Data « petermr's blog
