In previous posts I have written about the value of robotically extracting data from scientific articles. By default, Elsevier do not allow robotic extraction:
All content in this Site, including site layout, design, images, programs, text and other information (collectively, the “Content”) is the property of Elsevier and its affiliated companies or licensors and is protected by copyright and other intellectual property laws.
… and …
You may print or download Content from the Site for your own personal, non-commercial use, provided that you keep intact all copyright and other proprietary notices. You may not engage in systematic retrieval of Content from the Site to create or compile, directly or indirectly, a collection, compilation, database or directory without prior written permission from Elsevier.
The Site may contain robot exclusion headers, and you agree that you will not use any robots, spiders, crawlers or other automated downloading programs or devices to access, search, index, monitor or copy any Content
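The "robot exclusion headers" in these terms may mean a robots.txt file (the Robots Exclusion Protocol) or per-page robots directives. In either case a responsible robot checks them before fetching anything. Below is a minimal sketch of the robots.txt check, assuming Python's standard-library `urllib.robotparser`. The robots.txt content, the user-agent name `CrystalEyeBot`, the example.com URLs and the 5-second fallback delay are all hypothetical. None of them is taken from Elsevier's site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only. A real robot would
# call set_url(...) and read() to fetch the publisher's live file instead.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /science/article/
"""

# Assumed fallback pause (seconds) when robots.txt gives no Crawl-delay.
DEFAULT_DELAY = 5


def polite_policy(robots_txt: str, agent: str, url: str):
    """Return (allowed, delay_seconds) for fetching `url` as `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    allowed = parser.can_fetch(agent, url)
    # crawl_delay() returns None when no Crawl-delay line applies to this agent.
    delay = parser.crawl_delay(agent)
    return allowed, (delay if delay is not None else DEFAULT_DELAY)


# "CrystalEyeBot" is a hypothetical user-agent name.
allowed, delay = polite_policy(
    SAMPLE_ROBOTS_TXT, "CrystalEyeBot", "https://www.example.com/science/article/123"
)
print(allowed, delay)  # prints: False 10
```

With this sample file the article path is disallowed, so the robot must not fetch it. Where fetching is allowed, a well-behaved robot would also pause for `delay` seconds between requests.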
PMR: So I have written the following letter:
Subject: Permission to extract crystallographic data robotically from Elsevier publications
Dear Clare Truter,
My colleagues and I have built a repository of crystallographic information published in scientific journals. These data are factual and are not copyrighted by the original authors. Major publishers such as the International Union of Crystallography and the Royal Society of Chemistry encourage (and often demand) the publication of such data as part of the scientific record, and mount them on their sites as “supporting information” or “supplemental data”. The data are of extremely high quality. Over the last 30 years the crystallographic and chemical communities have shown that they are an essential resource for data-driven science, a concept which the NSF and JISC, among others, see as a large part of future science.
We have built robots which have analysed over 50,000 papers on publishers’ sites and extracted the crystallographic data. Note that the major publishers I have referred to do NOT require a subscription to access this information. We have agreed protocols whereby our robots run at times and frequencies that do not cause denial of service (DoS); that is, we try to be responsible.
Elsevier journals do not expose these data as public supplemental information, but I believe they are available to toll-access subscribers. I would like permission to extract crystallographic data from any Elsevier journal using robotic techniques and to make the TRANSFORMED extracted data public under a CC-BY licence (Creative Commons) or an Open Data licence from the Open Knowledge Foundation. All data so extracted would be referenced through the DOI of the article. This would allow any user (human or robot) to give a full citation, and therefore credit, to the authors and the journal.
To help the discussion, we note that facts, per se, are not copyrightable and that the authors do not claim copyright. The data are almost always direct output from an instrument. We need not store the actual documents (normally retrieved as IUCr CIF files). Our derived work is a value-added document in CML (Chemical Markup Language, an XML vocabulary), which retains none of the creative work of formatting and pagination in the original.
I am sure you will agree that this is a reasonable request and that Elsevier, as a major scientific publisher, would wish to do whatever it could to foster the birth of a new science.
I am guessing that Elsevier journals (e.g. Tetrahedron, Polyhedron) contain a total of ca. 20,000 relevant papers. Until we are able to examine them robotically I can’t be more precise. Obviously I cannot write for permission for each paper individually, so I am asking for general permission to carry out robotic extraction of crystallographic data from all Elsevier journals to which I have access through my institution. I would, of course, be happy to devise a robotic protocol that is friendly to your web server.
If you and your colleagues wish to be convinced of the value and quality of this cyberscience, please have a look at http://wwmm.ch.cam.ac.uk/crystaleye, where you can see the aggregated material from the other publishers. Although we haven’t published the results formally yet, two graduate students have carried out thousands of days’ worth of theoretical calculations on the data. We believe these calculations have led to new insights into crystal and molecular structure.
I hope that Elsevier will be excited by this new vision and that we can move rapidly towards extracting these data. Note that the robots operate daily and provide news feeds to the community about exciting new derived data.
Note that this is a public request. I have explained the reasons on my blog (http://wwmm.ch.cam.ac.uk/blogs/murrayrust?/p=432), where this letter is also posted. Since this is a matter of considerable current public interest, I request permission to post your replies. If there is material that you wish to remain confidential, please send me a separate mail indicating this, and I will honour that confidentiality.
Unilever Centre for Molecular Sciences Informatics
University of Cambridge,
Lensfield Road, Cambridge CB2 1EW, UK