Green OA and Open Data

Peter Suber has picked up a point that I made at the RSC Open Access Meeting and I’m happy to address it:

15:58 22/05/2008, Peter Suber,
Peter Murray-Rust, RSC Open Access – what I think I’m going to say, A Scientist and the Web, May 22, 2008. Peter is referring to his talk today at Open Access Publishing in the Chemical Sciences (London, May 22, 2008). Excerpt:

[…]
The theme is “Open Data”. I’ve recently written a review of this in Elsevier’s Serials Review and it’s coming out RSN in a special issue on Open Access. It’s already on Nature Precedings. So if you want detailed aspects – a few months out of date, they are there.
Some bullet points:

  • Data are different from text. Open Access generally does not support data well (I make exceptions for ultra-strong-OA such as CC-BY and BBB-compliant, of the sort that PLoS and BMC provide). Green Open Access is irrelevant to Open Data (I think it makes it harder, others disagree).

[… most points snipped…]

Peter S Comment. I follow and agree with all of this, with one exception: “Green Open Access is irrelevant to Open Data (I think it makes it harder, others disagree).” I don’t understand the claim or the argument, but I imagine we’ll hear more in time. Good luck today, Peter!

PMR: [I’m happy to be corrected in anything that follows… if your comment doesn’t get through, please mail pm286]
Green Open Access describes a process – primarily of an author self-archiving her “paper” to an Institutional repository or their own web page. There are mechanisms for indexing repositories (e.g. Google Scholar). I’ve been through the process and here is a typical result:

[There is an anomaly in that the RSC does not actually allow self-archiving in this way, but at the time they had publicly announced (or it had been announced) that they did. And the hassle of taking it out is even worse than the hassle of getting it in. So we agreed to let it rest, and there was a statement from RSC (Org Biomol Chem. 2005 May 21;3(10):2037.) clarifying it.]
Green Open Access results in the full-text (versions may vary) of a paper being publicly visible, indefinitely, without price barriers. There are no default permissions – Green does not per se remove any permission barriers. In particular GOA does not actively support the extraction of data (of course an author may be permitted by some publishers to allow data extraction).
GreenOA is designed to be simple. Stevan Harnad argues that it can be accomplished with “one-click”. I haven’t found this to be true for me in Cambridge/DSpace but it’s a useful mantra. The “one-click” is to upload some version of the paper (varying between pre-/post- refereeing and author/publisher version).
GreenOA does not, in general, say anything about copyright or licences. The paper may carry a publisher’s copyright, an author’s copyright or (frequently) none. There is almost never a formal licence. There is almost never a formal statement of policy for re-use. Cambridge DSpace states by default “Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.” It takes a lot more than one click to override this default.
There is no explicit mention in the GreenOA upload model for items other than the “full-text”. The repositories may provide such support but – at least in the early days – the focus was completely on full-text only.
We need to remove GoldOA from the discussion. GoldOA by default may also not remove permission barriers. However with GoldOA there is a single copy of the material – the items on the publisher’s website – and these are freely accessible to human eyes. Indefinitely. So IF the author has submitted supplemental data, that will be openly visible on the publisher’s site.
I hope we can all agree on these and I’ll start making my argument here…
======================
So by default GreenOA items are designed to be human-visible but without any support for Data, in any of upload, legal access and technical access. The primary goal of Stevan Harnad – expressed frequently to me and others – is that we should strive for 100% GOA compliance and that discussions on Open Data, licences and other matters are a distraction and are harmful to the GOA process. I suspect that many others do not take such a strong position. However if Open Data is irrelevant or inimical to GOA then it is hard to see GOA as supportive of Open Data.
However my main argument is that lack of support for Open Data in GOA is potentially harmful to the Open Data movement. Let’s assume that Stevan’s approach succeeds and we get 100% of papers in repositories through University mandates, funders et al. (I’ll exclude chemistry from the argument). GOA will encourage the deposition of full-text only.
So a GreenOA paper may often be a cut-down, impoverished version of what is available – for a price – on the publisher’s website. It may, and usually will, lack the supporting information (supplemental data). It will probably not reproduce any permissions that the publisher actually allows. So – if we concern ourselves with matters other than human eyeballs and fulltext – it is almost certainly a poorer resource than the one on the publisher’s site.
I’m aware that I’m speculating without data. If anyone can provide figures for the provision of (a) supporting info and (b) licences/permissions in IRs it would be extremely useful. However it is a lot of extra hassle, so why bother anyway? The robots can’t search the data (technically), so why not point readers to the publisher’s website? It is possible that the reverse occurs – that some authors archive more data than the publisher allows. But I doubt it’s common.

So my major concern is that GreenOA will lead to substandard processes for publishing scientific data. I’d be happy to find Repositories that insist on data upload. I doubt they are common.
So here is a challenge to the community: how many instances are there of crystallographic data (CIF) self-archived with GreenOA papers? Archiving the data is allowed. There are enough publishers (Wiley, Elsevier, Springer) who allow GreenOA. If no-one can find examples then again I would justify the use of “irrelevant”.
Now the more tenuous arguments.
Even if the IRs contained all the data appropriate to the publications, how do we discover it? This is anyway very difficult, and CrystalEye succeeds mainly because of the insistence of the Int. Union of Crystallography on the need to publish all supporting experimental information. By contrast many publishers do not do this for chemistry. If I want to find data then, unless there is a known data repository, I will go to the publisher’s website, not the IRs. The Transylvanian Journal of Haematology is a better place to find nocturnal data on anticoagulants than searching IRs. Firstly I don’t know that the papers are in the IRs, and they probably aren’t anyway. Secondly I don’t know that even those are indexed – maybe most are but the doubt remains. But mainly I don’t think I’ll find any data. And how, anyway, do I search thousands of repositories, when searching a small number of journals gives a much higher concentration of productive results?
And there is a deeper worry about the role of data in the mandates. GoldOA does not by default remove any permission barriers. Yet the fees for achieving GoldOA can be very high. So if a publisher agrees with a funder that the grants can pay 3000 Draculas for a publication, this need not include any data. Funders may get the idea that data-impoverished publication is the most that can or should be achieved. The current position with NIH/PMC is an example of extremely impoverished archiving. Read-but-don’t-use.
Many funders (Wellcome and, as we heard from Robert Kiley, 8 other major UK medical funders) require ultra-strong-OA for their archival. Because they care about data. And several publishers (PLoS, BMC) also insist on CC-BY. This is, of course, great for scientific data.
But it’s a long way from GreenOA.

4 Responses to Green OA and Open Data

  1. Klaus Graf says:

    (i) I am not sure if I like ultra-strong-OA as name for CC-BY …
    (ii) I am fighting for OA to heritage items (as in the Berlin declaration) and this is also a sort of call for “Open data” (for the humanities), and I don’t care whether Harnad sees that as a distraction.
    (iii) There is very little CC-BY content in IRs.
    http://www.google.de/search?as_q=dspace&hl=de&
    num=100&btnG=Google-Suche&as_epq=&as_oq=&as_eq=&
    lr=&cr=&as_ft=i&as_filetype=&as_qdr=all&as_occt=url&as_dt=i&
    as_sitesearch=&as_rights=%28cc_publicdomain%7Ccc_attribute%7C
    cc_sharealike%29.-%28cc_noncommercial%7Ccc_nonderived%29&
    safe=images
    This Google search with license filtering uses the fact that DSpace articles often (or mostly?) have dspace in the URL. It can hardly find items where the CC-license is only mentioned in the PDF (this is the case in Gold-OA, e.g. for Hindawi).
    http://www.google.de/search?hl=de&lr=&as_qdr=all&
    as_rights=(cc_publicdomain%7Ccc_attribute%7Ccc_sharealike).-(cc_noncommercial%7Ccc_nonderived)&pwst=1&
    q=allinurl:++site:dspace.mit.edu+dspace
    MIT-DSpace has 28065 items, with Google you can find 12 CC-BY (or PD) items (incl. some data presentations).
    (iv) I think it’s a question of the appropriate research infrastructure:
    Should
    * (commercial) journals (Gold-OA) make the open data available or
    * (non-commercial) standard IRs (Green-OA) or
    * (commercial or non-commercial) special IRs only for data?
    We need more experience with open data to see the pros and cons of each solution. But I don’t think one should confuse this problem with the Green/Gold-OA-discussion. One reason is that data aren’t peer reviewed (which is a main point in the Green/Gold discussion).
    [PMR I have had to edit some of the URLs as they are too long for WordPress]

  2. pm286 says:

    (1) Thanks Klaus
    (i) since “strongOA” will disappear soon, so will “ultra-strongOA”.
    (ii) This seems a very good idea. I was very pleased that I could find Minard’s classic picture of Napoleon’s march to Moscow as Open Content (on Wikipedia). We really need many more cultural items to be freely discussable and re-usable. Museums are much too possessive.
    (iii). Thanks – I am not surprised. 0.05% seems believable
    (iv). Don’t understand the first list. I want everyone to make data available
    There are a number of journals that DO review data. I am not confusing Green/Gold with OpenData. Indeed I think they are separate

  3. OA Primer for the Perplexed
    Peter Murray-Rust continues to misunderstand, and hence misrepresent OA. The picture is a lot simpler than Peter Murray-Rust makes it sound. Here’s a simple glossary:
    1. Research Data vs. Research Articles:
    Data: Research generates raw data.
    Articles: Research generates journal articles describing, analyzing and interpreting the raw data.
    Data in Articles: Sometimes articles don’t just describe but actually contain raw data.
    Articles as Data: Sometimes the articles themselves are treated as data.
    OA1 (Free Access) vs OA2 (Free Re-Use):
    OA1: Articles made accessible/useable free online for users who do not have subscription access to the journal in which they are published.
    OA2: Articles or data being made accessible/useable free online with various kinds of re-use licenses.
    (There is only one OA1 but there are several degrees of OA2, depending on which re-uses are licensed.)
    The Green vs. Gold Roads to OA
    Green OA: Authors make their articles and/or their data OA1 or OA2 by self-archiving them online.
    Gold OA: Journals make their articles OA1 or OA2.
    Green OA self-archiving by authors, mandated by their universities or funders, can in principle provide OA1 or OA2, for either articles or data or both. However, it would be difficult, resisted by many authors, and probably unjust for universities to mandate Green OA1 for data or to mandate Green OA2 for either articles or data. (Funders are in a position to mandate more.)
    Researchers may not wish to make their data either freely accessible/useable or re-usable, and they may not wish to make their articles freely re-useable. However, all researchers, without exception, want their articles freely accessible/usable (OA1).
    This is the reason Green OA1 mandates are the highest priority. Authors all want Green OA1 and they report that they will comply, willingly (see Swan studies) and actually do comply (see Sale studies) with Green OA1 mandates from their universities and funders to self-archive their articles.
    Moreover, OA1 for articles prepares the way and is likely to lead to OA1 and OA2 for data, as well as to some OA2 for articles.
    That is why Green OA1 self-archiving and Green OA1 self-archiving mandates should be assigned priority.
    Peter Murray-Rust, who is concerned exclusively with OA2 (re-useability) for both articles and data, persistently misunderstands much of this, especially the practical causal path and its attendant priorities. Here are the kinds of misunderstandings that keep recurring in Peter’s discussion of Green OA1 [translations are provided in brackets]:
    PMR: “Green Open Access [OA1 to articles] is irrelevant to Open Data [OA1 or OA2 to data] (I think it makes it harder, others disagree).”
    No, OA1 to articles is not irrelevant, either to OA1 to articles or data, nor to OA2 (licensed re-use rights) to articles and data. Nor does OA1 make it harder to achieve OA2 (for articles or data). But it would certainly make it harder to achieve Green OA1 for articles through Green OA1 mandates if we tried to insist on OA2 instead, or first.
    PMR: “There is no explicit mention in the GreenOA upload model [Green OA1 to articles] for items other than the “full-text” [data].”
    There is no “GreenOA upload model” but there is Green OA1 self-archiving of articles, and Green OA1 mandates to self-archive articles. Data and OA2 can certainly be mentioned in these mandates, but they cannot be mandated (because not all authors wish to provide OA1 to their data, or OA2 to their articles or data, whereas all authors wish to provide OA1 to their articles, even if it needs to be mandated to get them to actually do it!).
    PMR: “The primary goal of Stevan Harnad – expressed frequently to me and others – is that we should strive for 100% GOA [mandated Green OA1 to articles] compliance and that discussions on Open Data, licences and other matters [OA2 to articles, OA1 or OA2 to data] are a distraction and are harmful to the GOA process.”
    What is distracting and harmful for getting consensus and compliance on Green OA1 mandates, hence for getting OA1 to articles, is not the discussion of OA2 or of data, but the suggestion that it is not enough to mandate OA1 to articles. The time to insist on more than Green OA1 mandates is when Green OA1 is already faithfully mandated and provided, not before Green OA1 mandates have prevailed.
    PMR: “if Open Data [OA2 to data] is irrelevant or inimical to GOA [OA1 to articles] then it is hard to see GOA [OA1 to articles] as supportive of Open Data [OA2 to data].”
    Pre-emptive insistence on OA2 to data is inimical to achieving consensus and compliance on mandating OA1 to articles. Achieving OA1 to articles will certainly facilitate going on to achieve OA1 and OA2 to data as well as achieving some OA2 to articles.
    PMR: “my main argument is that lack of support for Open Data in GOA [OA2 to data and articles] is potentially harmful to the Open Data movement [OA2 to data and articles]. Let’s assume that Stevan’s approach succeeds and we get 100% of papers in repositories through University mandates, funders et. al… [This] GOA [mandates OA1 to articles] will encourage the deposition of full-text only [articles, not data]”
    Green OA1 mandates can encourage OA1 to data and OA1 and OA2 to articles and data, but they cannot mandate them, because all authors want OA1 for their articles but not all authors want OA1 for their data or OA2 for the articles and data. And pre-emptively insisting on more will only result in getting less (i.e., less consensus and compliance on OA1 for articles).
    PMR: “So my major concern is that GreenOA [OA1 to articles] will lead to substandard processes for publishing scientific data. I’d be happy to find Repositories that insist on data upload [OA1 to data].”
    I would be happy if we had 100% OA1 and OA2 to both articles and data, but I know of no way to achieve that, and certainly not directly, because it is not the case that 100% of authors want it already, in principle. But 100% of authors do want OA1 to their articles, in principle, and they can and do provide it in practice if it is mandated. And I find it hard to imagine that the universal practice of providing OA1 to articles will not strengthen the inclination to provide OA1 and OA2 to data and articles as well. On the other hand, it is easy to see why insisting pre-emptively on the latter will prevent even the former from coming into universal practice.
    PMR: “a GreenOA paper [OA1] may often be a cut-down, impoverished, version of what is available – for a price – on the publishers website. It may, and usually will, lack the supporting information (supplemental data). It will probably not reproduce any permissions that the publisher actually allows. So – if we concern ourselves with matters other than human eyeballs and fulltext – it is almost certainly a poorer resource than the one on the publisher site.”
    This point is truly perplexing. What is available on a (non-OA) publisher’s website is not even OA1, so what is the point of talking about its impoverishment to those would-be users who are not rich enough to afford the publisher’s version?
    And, yes, OA1 (free online access/use) is not OA2 (free online access/use and re-use licenses, to either article or data), because not all authors wish to provide OA2 to their articles or data, and Green OA1 mandates hence do not attempt to mandate it.
    However, data too can certainly be self-archived in Institutional Repositories (IRs), and IRs have the metadata tags for specifying re-use rights (OA2), if any, for all deposited articles and data.
    PMR: “Many funders… require ultra-strong-OA for their archival… [OA2 to articles and data] And several [Gold OA2] publishers… also insist on CC-BY [OA2 to articles]. This is, of course, great for scientific data [OA2 to data]. But it’s a long way from GreenOA [OA1 to articles].”
    Yes, some funders can and do mandate more than OA1 to articles. He who pays the piper calls the tune, so funders are in a better position to do this than universities are (and they do not need authors’ consensus or consent, as universities do). But so far that funder OA2 applies only to articles (and usually after an embargo period), not to data (though funders could in principle mandate data self-archiving too, and eventually will, I hope).
    What Gold OA publishers provide is another matter; the OA1 problem is the problem of the 90% of journals that are non-OA, not the 10% that are OA. (Moreover, most Gold OA journals provide only OA1 too, as Peter Suber has pointed out.)
    PMR: “Even if the IRs contained all the data appropriate to the publications how do we discover it?”
    If authors self-archive their data, the IRs allow them both to link with the corresponding articles and to specify the re-uses licensed.
    PMR: “GreenOA [OA1] is designed to be simple. Stevan Harnad argues that it can be accomplished with ‘one-click’.”
    No, it is not OA1 self-archiving that is one-click, it is almost-OA via the “Fair Use” Button – for deposits that are not Open Access (OA1) but Closed Access.
    The deposit of the full-text itself takes under six minutes’ worth of keystrokes, as described in Carr, L. and Harnad, S. (2005) Keystroke Economy: A Study of the Time and Effort Involved in Self-Archiving.

  4. pm286 says:

    (3) Thanks Stevan.
