I am speaking on 2011-08-29 on new methods of publishing crystallography, including data. I shall prepare my talk as a series of blog posts, not necessarily in the order in which they will be presented at the meeting.
I am arguing that there should be a concept of Open Crystallography to which crystallographers and other communities (not restricted to scientists) can subscribe. The idea is that published crystallographic information should be Open to everyone. Here “Open” is meant in the sense of the Open Knowledge Foundation’s Definition (http://www.opendefinition.org/):
“A piece of content or data is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and share-alike.”.
This is a crystal-clear operational definition: something either conforms completely or it does not conform. There are no intermediate positions.
Crystallography is among the most Open of disciplines. It has conducted its affairs for the benefit of the community and has pioneered the concept of publishing data, especially in the long tail of science (where millions of data are published independently). It has an Open Access journal (Acta Crystallographica E) with a very modest fee and a very large authorship (1000s of articles per year).
In the days of print journals it was extremely proactive in requiring the publication of crystallographic supporting data/information alongside text. Indeed, if the IUCr had not argued for this practice and won the argument, there would be many fewer examples of supplementary data published today.
There are several subdisciplines of crystallography. Macromolecular crystallography (proteins, nucleic acids, etc.) is supported by the PDB (http://www.pdb.org/, Protein Data Bank), which is effectively Open. People can copy entries, create derivatives, create mashups, reformat, etc. without permission.
My focus here is on chemical crystallography (small molecules). Although all supporting information must be submitted to journals, only some of them publish it visibly and Openly. High-volume publishers with Open supplemental information include:
- American Chemical Society
- International Union of Crystallography (IUCr)
- Royal Society of Chemistry
- Nature
And, by default, all Gold Open Access publishers (e.g. BMC).
In contrast, a number of publishers (Elsevier, Springer (ex BMC), Wiley/Blackwells) do not publish the supplemental information (or hide it behind paywalls). They send it (or get the author to send it) to the Cambridge Crystallographic Data Centre (CCDC). This information is hidden behind paywalls and permissionwalls. It is not open.
There is now a growing groundswell for making small-molecule crystallography completely Open. Some of us have built tools to collect Open CIFs into our own repositories or to accept donations [see /pmr/2007/12/22/update-on-open-crystallography/ for the historical position]. The largest of these are:
- Crystallography Open Database (http://www.crystallography.net/) with ca. 149,000 depositions (Saulius Gražulis)
- Crystaleye 1 (http://wwmm.ch.cam.ac.uk/crystaleye) with ca. 200,000 CIFs (Nick Day, PM-R)
Saulius and I met at this meeting and we have completely aligned objectives. We have agreed to use “Open Crystallography” as an umbrella for our efforts, and to exchange data and tools (I will explain this at the meeting). We can immediately donate our data to COD, and while most are duplicates there are clearly a number which are not.
Open Crystallography can follow the Panton Principles. [I have substituted ‘crystallography’ for ‘science’.]
By open data in crystallography we mean that it is freely available on the public internet permitting any user to download, copy, analyse, re-process, pass them to software or use them for any other purpose without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself. To this end data related to published crystallography should be explicitly placed in the public domain.
Formally, we recommend adopting and acting on the following principles:
- Where data or collections of data are published it is critical that they be published with a clear and explicit statement of the wishes and expectations of the publishers with respect to re-use and re-purposing of individual data elements, the whole data collection, and subsets of the collection. This statement should be precise, irrevocable, and based on an appropriate and recognized legal statement in the form of a waiver or license.
  When publishing data make an explicit and robust statement of your wishes.
- […] Creative Commons licenses (apart from CCZero), GFDL, GPL, BSD, etc are NOT appropriate for data and their use is STRONGLY discouraged.
  Use a recognized waiver or license that is appropriate for data.
- The use of licenses which limit commercial re-use or limit the production of derivative works by excluding use for particular purposes or by specific persons or organizations is STRONGLY discouraged. These licenses make it impossible to effectively integrate and re-purpose datasets and prevent commercial activities that could be used to support data preservation.
  If you want your data to be effectively used and added to by others it should be open as defined by the Open Knowledge/Data Definition – in particular non-commercial and other restrictive clauses should not be used.
- Furthermore, in science it is STRONGLY recommended that data, especially where publicly funded, be explicitly placed in the public domain via the use of the Public Domain Dedication and Licence or Creative Commons Zero Waiver. This is in keeping with the public funding of much scientific research and the general ethos of sharing and re-use within the scientific community.
  Explicit dedication of data underlying published science into the public domain via PDDL or CCZero is strongly recommended and ensures compliance with both the Science Commons Protocol for Implementing Open Access Data and the Open Knowledge/Data Definition.
There are probably about 300,000 small-molecule datasets (CIFs) now openly available through COD, Crystaleye and smaller collections. But there are probably a further 50,000 – 150,000 CIFs that have been published electronically but are closed behind the CCDC walls. If we can Open these (and I am awaiting CCDC’s reply) then all small-molecule crystallography becomes Open.