#solo10: Publishers, is your data Openly re-usable – please help us

Dictated into Arcturus

A reader of this blog has raised an absolutely critical point for the Green Chain Reaction:

Anon says:

August 12, 2010 at 9:45 am

Aren't most supplementary information files freely available? There must be tons of synthetic procedures hiding in there.

From a JACS paper: "This [Supporting Information] material is available free of charge via the Internet at http://pubs.acs.org."

As regards explicit permission…good luck finding any information at all about that.

The problem is copyright, followed by contract. By default, copyright prevents you from doing anything with a document. And because it's the law, reputable institutions such as universities are, not surprisingly, absolutely clear that it must not be broken.

It has been argued that copyright law even forbids you from saving a copy of a downloaded article on your hard disk. It's almost certainly true that, in the eyes of some interpreters, you are breaking copyright if you have thousands of PDF articles. And I should stress that this is an area where almost every question is unanswered. The only responsible default answer is "you can't do that."

This is an enormous impediment to data-driven science. Enormous. By default we cannot extract or use any data from the published literature unless there is explicit permission. It's got to the stage where this problem is seriously holding back science.

You might argue that I'm being pedantic or splitting hairs. I'm afraid that's not true. Shelley Batts reproduced a single graph from a closed access article (http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=338 ) and was sent a threatening legal letter from the publisher (Wiley). There was an explosion in the blogosphere, and Wiley dropped their threats against Shelley. However, they have never explicitly given anyone permission to post closed material from any of their journals, and we must assume that they are capable of carrying out the same threats in the future, even though they now know how the blogosphere will react.

I myself have caused the University of Cambridge to be completely cut off from a publisher on two separate occasions (ACS and the Royal Society of Chemistry). In both cases what I did was completely legal, but the publisher's automatic systems assumed that I was stealing content and shut off all access for everybody at the University of Cambridge for an indefinite period. I do not know on what basis this was done, but I assume that it was for a (wrongly) interpreted infringement of the specific contract that the University of Cambridge had with the publisher.

You should note that almost all universities sign contracts with publishers which are more restrictive than copyright. These contracts are often covered by secrecy agreements, so the average scientist in the university probably has no access to the details of the contract that they may infringe. In many cases the contracts limit how much material a scientist may access and how often. Given that data-driven science requires access to large amounts of material on a frequent basis, it is almost certain that attempts to carry it out will put particular institutions in breach of contract.

I appreciate that the current business model of publishing is that closed access publishers "own – or control – the content". I personally do not agree that this is a good thing for science (in fact I believe it is highly detrimental), but I do not intend to break the law (at present) and I do not intend to cause my university to break its contract. However, this means that readers are regarded as potential content thieves, and that the publishers put in place expensive content management and access systems partly in order to police the activities of readers. For most closed-access publishers it is more important to protect content than to disseminate science, and they err on that side.

The default position, therefore, is that one cannot automatically download and re-use content on the web without permission. It is very disappointing that none of the major publishers (other than, of course, the CC-BY Open Access publishers, who are excused from all of these arguments and discussion) have changed their attitude or practice to make content available to scientists for the practice of modern science. I have on several occasions written to publishers asking whether I can use material, and almost invariably I get no reply. It is difficult for me to see the closed-access publishers as part of an alliance which is trying to improve the way that data-driven science is done.

I should also comment that Openly reusable data must lead to higher quality science, because it is easier to pick up errors. For example, our Crystaleye system (or rather Nick Day's system http://wwmm.ch.cam.ac.uk/crystaleye ) not only aggregates all publicly visible crystallographic data on publishers' web sites but also validates it as it processes it. This validation is part of our recently funded #jiscxyz bid, where we are working to develop a means of validating scientific data before it appears in print. Just recently I have been pointed at two cases from very high profile closed-access publishers where the crystallography has been inappropriately used to support what is clearly invalid science. If the data had been openly available, these errors would not have happened. There are also notable cases where the blogosphere has rightly criticized published science on the basis of invalid or incorrectly interpreted data.

Despite my campaign for greater openness in publication, I am not against a commercial publishing industry. I am in favour of a responsible publishing industry which makes efforts to innovate and support science. I am disappointed that publishers have not addressed the question of re-use of data, and I am saddened that they do not regard readers' emails and enquiries such as mine as worth replying to. It is not just me: it is now two years since ChemSpider asked ACS about the rights to re-use ACS supplemental data, and as far as I know ACS has not given a formal reply.

I am, however, an optimist, and I believe the publishers will now take this problem constructively and start to give clear information. Since data are not only valuable for citations but also necessary for the proper practice of science, I'm going to assume that the major players in chemistry will be keen to give definitive answers on this problem. If nothing else, it is actually in their best interests: being seen to be helpful, and increasing their own citations, is hardly a barrier to doing good business.

Therefore Heather and I will be preparing a set of letters to send to all the publishers through the IsItOpen tool. This allows the precise request and the precise response to be publicly visible and to act as a useful definitive record. It therefore saves both the readers and the publishers from having to continually reiterate their positions. We appreciate that in some cases it may not be possible to give complete answers, but we would certainly expect the courtesy of a timely reply.

We hope to collect all the replies from the major publishers before the Science Online meeting, and we hope that this will be a valuable contribution for the delegates and for those who are following the proceedings from a distance.

More later on what exactly free, open, gratis and libre mean.


Responses to #solo10: Publishers, is your data Openly re-usable – please help us

  1. Sarah Shreeves says:

    And publisher policies also make it difficult in areas where we do have some ability to really influence how we make scholarly work available, particularly theses and dissertations (where, Peter, I know you've done a lot of exploring). Many of our graduate students in Chemistry include articles in their dissertations that were published with ACS or other major publishers; generally the publisher allows these to be included in the dissertation but heavily limits the terms under which it is 'published'. Approximately half of the students depositing in our now mandatory ETD program at the University of Illinois have chosen to restrict access to their dissertation, in part because of these terms.

    I know there are lots of other problems with the way ETDs are shared and the format they're in - but the publisher issue is one that isn't commonly acknowledged as a problem.

