Peter Suber has picked up a point that I made at the RSC Open Access Meeting and I’m happy to address it:
[…]
The theme is “Open Data”. I’ve recently written a review of this in Elsevier’s Serials Review and it’s coming out RSN in a special issue on Open Access. It’s already on Nature Precedings. So if you want detailed aspects – a few months out of date, they are there.
Some bullet points:
- Data are different from text. Open Access generally does not support data well (I make exceptions for ultra-strong OA such as CC-BY and BBB-compliant, of the sort that PLoS and BMC provide). Green Open Access is irrelevant to Open Data (I think it makes it harder, others disagree).
[… most points snipped…]
Peter S Comment. I follow and agree with all of this, with one exception: “Green Open Access is irrelevant to Open Data (I think it makes it harder, others disagree).” I don’t understand the claim or the argument, but I imagine we’ll hear more in time. Good luck today, Peter!
PMR: [I’m happy to be corrected in anything that follows… if your comment doesn’t get through please mail pm286]
Green Open Access describes a process – primarily of an author self-archiving her “paper” to an institutional repository or her own web page. There are mechanisms for indexing repositories (e.g. Google Scholar). I’ve been through the process and here is a typical result:
-
Representation and use of Chemistry in the Global Electronic Age
P Murray-Rust, HS Rzepa, SM Tyrrell, Y Zhang – dspace.cam.ac.uk
This manuscript addresses questions of robotic access to data and its automatic
re-use, including the role of Open Access archival of data. This is a
pre-refereed preprint allowed by the publisher’s (Royal Soc. Chemistry) …
[There is an anomaly in that the RSC does not actually allow self-archiving in this way, but at the time they had publicly announced (or it had been announced) that they did. And the hassle of taking it out is even worse than the hassle of getting it in. So we agreed to let it rest, and there was a statement from RSC (Org Biomol Chem. 2005 May 21;3(10):2037.) clarifying it.]
Green Open Access results in the full-text (versions may vary) of a paper being publicly visible, indefinitely, without price barriers. There are no default permissions – Green does not per se remove any permission barriers. In particular GOA does not actively support the extraction of data (of course an author may be permitted by some publishers to allow data extraction).
GreenOA is designed to be simple. Stevan Harnad argues that it can be accomplished with “one-click”. I haven’t found this to be true for me in Cambridge/DSpace but it’s a useful mantra. The “one-click” is to upload some version of the paper (varying between pre-/post- refereeing and author/publisher version).
GreenOA does not, in general, say anything about copyright or licences. The paper may carry a publisher’s copyright notice, an author’s copyright notice or (frequently) none. There is almost never a formal licence, and almost never a formal statement of policy for re-use. Cambridge DSpace states by default “Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.” It takes a lot more than one click to override this default.
There is no explicit mention in the GreenOA upload model of items other than the “full-text”. The repositories may provide such support but – at least in the early days – the focus was entirely on full-text.
We need to remove GoldOA from the discussion. GoldOA by default may also not remove permission barriers. However, with GoldOA there is a single copy of the material – the items on the publisher’s website – and these are freely accessible to human eyes. Indefinitely. So IF the author has submitted supplemental data, that will be openly visible on the publisher’s site.
I hope we can all agree on these and I’ll start making my argument here…
======================
So by default GreenOA items are designed to be human-visible but without any support for Data, in any of upload, legal access and technical access. The primary goal of Stevan Harnad – expressed frequently to me and others – is that we should strive for 100% GOA compliance and that discussions on Open Data, licences and other matters are a distraction and are harmful to the GOA process. I suspect that many others do not take such a strong position. However, if Open Data is irrelevant or inimical to GOA then it is hard to see GOA as supportive of Open Data.
However, my main argument is that lack of support for Open Data in GOA is potentially harmful to the Open Data movement. Let’s assume that Stevan’s approach succeeds and we get 100% of papers in repositories through University mandates, funders et al. (I’ll exclude chemistry from the argument). GOA will encourage the deposition of full-text only.
So a GreenOA paper may often be a cut-down, impoverished version of what is available – for a price – on the publisher’s website. It may, and usually will, lack the supporting information (supplemental data). It will probably not reproduce any permissions that the publisher actually allows. So – if we concern ourselves with matters other than human eyeballs and full-text – it is almost certainly a poorer resource than the one on the publisher’s site.
I’m aware that I’m speculating without data. If anyone can provide figures for the provision of (a) supporting info and (b) licences/permissions in IRs it would be extremely useful. However, depositing these is a lot of extra hassle, so why would an author bother? The robots can’t (technically) search the data, so why not just point readers to the publisher’s website? It is possible that the reverse occurs – that some authors archive more data than the publisher allows. But I doubt it’s common.
So here is a challenge to the community: how many instances are there of crystallographic data (CIF) self-archived with GreenOA papers? Archiving the data is allowed. There are enough publishers (Wiley, Elsevier, Springer) who allow GreenOA. If no-one can find examples then I would again feel justified in using “irrelevant”.