A better interpretation of "green" and "gold"

In my last post I had the presumption to lecture my readership on what “green” and “gold” access mean. Hubris strikes – I got it wrong. I comment on the comments and then continue with why I think “green” is not enough:

Chris Rusbridge Says:
April 10th, 2008 at 10:45 am
I don’t think [PMR’s definition] is right at all. Wikipedia says:
“In OA self-archiving (also known as the “green” road to OA [6] [7]), authors publish in a subscription journal, but in addition make their articles freely accessible online, usually by depositing them in either an institutional repository[8] (such as the Okayama University Digital Information Repository[9]) or in a central repository[10] (such as PubMed Central)…
“…In OA publishing (also known as the “gold” road to OA [14]) authors publish in open access journals that make their articles freely accessible online immediately upon publication. Examples of OA publishers[15] are BioMed Central and the Public Library of Science.”

PMR: I agree with this and will use it in the future.

CR: In both cases (green AND gold) the permissions set the terms of what you can do. OA journals do not necessarily have licences that allow data mining.

PMR: Also agreed. In many cases OA journals have no explicit permissions at all. In these cases, and where I have the time, I engage with the editors to help them clarify the position. Sometimes they realise that they do actually wish to announce permissionFree re-use.

CR: I’m also not certain that a widely distributed set of repositories (the green road) is particularly resistant to data mining. OAI access should tell you which repositories have data of interest, and your robots can go there.

PMR: But they will not know whether they are allowed to mine the data. OAI does not mean Open Access. It means Open Archives Initiative, and the “Open” says nothing about permissions. It is extremely rare (in my experience) that material in OAI repositories carries an explicit statement about re-use. It’s possible to extract Green material from an OAI repository, re-use it, and be sued by a publisher.

Perhaps the real problem is that (a) licences offered are not those you need for the task (whether green or gold), and (b) those licences are rarely expressed in machine-readable form, even though Creative Commons have encodings to allow this. If licences were so expressed, then you could let your robots wander at will, and mine what they are allowed to!

PMR: I agree with this sentiment but in practice it is unlikely that there will be universal machine-readable licences in OAI repositories any time soon. So in practice roaming the OAI repositories is no use if I wish to re-use and redistribute the material.
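To make concrete what “mine what they are allowed to” would involve, here is a minimal sketch in Python of the check a robot might run on a repository landing page. It assumes the page marks its licence in the way Creative Commons recommends, with a link carrying rel="license"; the names `find_licences`, `may_mine` and `REUSABLE_PREFIXES` are invented for illustration, and the allow-list (CC-BY only) is my assumption, not anyone's policy. The important behaviour is the default: no explicit licence means no mining.

```python
from html.parser import HTMLParser

# Assumed allow-list: licence URL prefixes this (hypothetical) miner accepts.
# Only CC-BY is listed here; CC-BY-NC etc. deliberately do not match.
REUSABLE_PREFIXES = (
    "http://creativecommons.org/licenses/by/",
    "https://creativecommons.org/licenses/by/",
)


class LicenceLinkParser(HTMLParser):
    """Collect href values of <a> and <link> elements marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.licences = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel may hold several space-separated tokens, or be missing entirely.
        rel_tokens = (attr.get("rel") or "").lower().split()
        if "license" in rel_tokens and attr.get("href"):
            self.licences.append(attr["href"])


def find_licences(html):
    """Return every licence URL explicitly declared on the page."""
    parser = LicenceLinkParser()
    parser.feed(html)
    return parser.licences


def may_mine(html):
    """True only if a declared licence matches the allow-list."""
    return any(url.startswith(REUSABLE_PREFIXES) for url in find_licences(html))
```

A page containing `<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">` would pass; a page with no licence statement at all, which is what I mostly see in IRs, would be skipped.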

Klaus Graf Says:
April 10th, 2008 at 1:49 pm
I found it was not a good idea by Harnad to choose the same colors as in the “road” metaphor. The last comment shows it is indeed confusing.
* Green road: Self-Archiving in Repositories
* Golden road: OA Journals
* Green OA: cost-free Access (PMR in an earlier post: FREE access)
* Gold OA: Access without Permission Barriers (preferably CC-BY) – (PMR: OPEN access)
These are independent aspects. Most golden road journals (in DOAJ) are access-green, and CC-BY contents in green road IRs are access-golden.

PMR: Klaus seems to use the terms Green OA and Gold OA in the way I did, and also seems to differentiate Colour-road (how something got there) from Colour-OA (what you can do with it). This seems to conflict with ChrisR and PeterS.

Peter Suber Says:
April 10th, 2008 at 3:46 pm
Hi Peter: Chris is right. There are two distinctions here and we shouldn’t mix them up. One distinction is between green and gold OA, or between OA through repositories and OA through journals. The other is between removing price barriers alone and removing both price and permission barriers. I think you meant to say that removing price barriers is not enough – and I agree with that 100%. But green OA *can* be enough.
Some green OA removes both price and permission barriers, and some gold OA does as well. But also note the converse. Just as some (perhaps most) green OA doesn’t remove permission barriers, some (perhaps most) gold OA doesn’t either. When we work for the removal of permission barriers, we are working to improve both green and gold OA.

PMR: I accept this definition as coming from the fountain of Open truth. Now for the implications (and see if I have learnt OA-101):

“some green OA removes both price and permission barriers”. This means that authors publish in a subscription journal (i.e. you can only read it if you pay) BUT the journal allows the author to self-archive the article and release it under a licence where anyone can read it for free and anyone can redistribute it without permission. I think it happens when authors shout loud enough or for special issues, and it also happens in disciplines like computer science where everyone republishes their articles with or without permission. But in general it isn’t common and it is of very little practical use (if only because of the difficulty of discovery). It’s of no use for data-mining unless (highly unlikely) the author actually attaches CC-BY or similar.
“some gold OA does as well”. In my experience – which is limited as I am a chemist and there are essentially no examples – all major Gold OA removes permission barriers. I’m thinking of BMC and PLoS and OUP. They all have CC-BY. There are some journals which have CC-NC and I have argued the case with some, but in general this is a minor concern. So which major Gold OA journals forbid re-use? (We should exclude the awful hybrid journals which take money off authors for less than permissionFree.) If an author has paid money for OA, which journals forbid their readers to re-use the article?
“…perhaps most) green OA doesn’t remove permission barriers”. I agree with this.
“…most) gold OA doesn’t either”. I’m disappointed if this is the case.

My conclusion is that the terms Green and Gold seem to me to be highly confusing and operationally almost useless for a reader. The reader doesn’t care how the material got there – they need to know what they can do with it. For that there has to be a simple set of labels and CC-* provides that.
Finally a word about why it is essential that the NIH continues to mandate deposition in PubmedCentral. (Stevan Harnad has argued that it would be better for authors to self-archive in their institutional repositories.) Note that many authors – e.g. from industry – don’t have IRs anyway. But the main point is that it is completely impossible to discover and systematically mine this information if it is scattered across institutional repositories. Let’s assume there are ca. 60,000 articles deposited in PMC this year, and that there are ca. 10,000 institutions involved. (Even if it’s only 1000 my argument holds.) If I want these I have to set up my own list of 10,000 repositories and trawl the lot – every day – for new content. (And I want it daily.) And every other text-miner has to do the same. How do I know when a new institution publishes? I have to go to Pubmed anyway, so I might as well read the material there. And the compliance will be awful. The NIH cannot check 10,000 sites on a regular basis. In contrast, if the stuff is in PMC (or UKPMC) then I can get a single RSS feed daily which will alert me to the material that comes in. The robots have no trouble trawling this. PMC will presumably alert me to what is minable and what – thanks to the publishers – is not. So I am afraid that self-archiving is a complete non-starter.
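The arithmetic behind this is worth spelling out. The sketch below uses only the rough figures assumed above (ca. 60,000 articles a year, ca. 10,000 or 1000 institutions) plus one assumption of my own, that each source is polled once a day; the function name `polling_cost` is invented for illustration.

```python
def polling_cost(sources, articles_per_year, days=365):
    """Requests one text-miner makes per year when polling every source daily,
    and the average number of new articles each source yields per year."""
    return {
        "polls_per_year": sources * days,
        "articles_per_source": articles_per_year / sources,
    }


ARTICLES_PER_YEAR = 60_000  # rough figure assumed in the text

distributed = polling_cost(10_000, ARTICLES_PER_YEAR)  # one poll per IR per day
smaller = polling_cost(1_000, ARTICLES_PER_YEAR)       # the "only 1000" case
central = polling_cost(1, ARTICLES_PER_YEAR)           # a single daily feed

print(distributed)  # {'polls_per_year': 3650000, 'articles_per_source': 6.0}
print(smaller)      # {'polls_per_year': 365000, 'articles_per_source': 60.0}
print(central)      # {'polls_per_year': 365, 'articles_per_source': 60000.0}
```

With 10,000 repositories each miner makes 3.65 million requests a year and an average repository yields six relevant articles in that time, so nearly every daily poll returns nothing new. Even with 1000 repositories it is 365,000 requests, against 365 for one central feed.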

4 Responses to A better interpretation of "green" and "gold"

Chris Rusbridge says:

April 11, 2008 at 8:27 pm

Peter, you write “I agree with this sentiment [that machine-readable licences would let robots do their work] but in practice it is unlikely that there will be universal machine-readable licences in OAI repositories any time soon. So in practice roaming the OAI repositories is no use if I wish to re-use and redistribute the material.”
Well, you don’t have much minable stuff anywhere just now. The big guys may well have major policy drivers preventing them from doing it. IRs may be less at risk, and more under the authors’ sway. Librarianship is getting militant. Faculty are getting militant. There are machine-readable interpretations of all CC licences. I think those machine-readable licences are effectively invoked via a link to the appropriate part of the CC web site. So let’s imagine a specialist Chemistry “service” (in the OAI sense), mining the metadata from all data repositories it knows about. Your robots search this service for papers of interest, some in repositories you may not have come across before. The robots jump across to the repositories, look for the licence terms, check the machine-readable terms, and SOME OF THE TIME will be able to mine the data. This is better than now and, if some appropriate incentives are in place, has the potential to grow. It’s a long game (any approach).
Which makes me realise I don’t know what kind of reward (eg citation) an author gets for having her paper mined by your robots!

pm286 says:

April 11, 2008 at 8:40 pm

(1) Thanks Chris.
CR: Well, you don’t have much minable stuff anywhere just now.
PMR: There are 15 million Pubmed abstracts and we and our collaborators have mined 500,000 in one day. Abstracts are very useful but we need even more.
CR: Librarianship is getting militant.
PMR: I’m not an expert but I haven’t seen a concerted policy effort to change things.
CR: Faculty are getting militant.
PMR: Some are but most have no idea that a problem exists.
CR: There are machine-readable interpretations of all CC licences. I think those machine-readable licences are effectively invoked via a link to the appropriate part of the CC web site.
PMR: Yes, and I use a machine-readable CC licence. But when I look in IRs I see no licences.
CR: So let’s imagine a specialist Chemistry “service” (in the OAI sense), mining the metadata from all data repositories it knows about. Your robots search this service for papers of interest, some in repositories you may not have come across before. The robots jump across to the repositories, look for the licence terms, check the machine-readable terms, and SOME OF THE TIME will be able to mine the data. This is better than now and, if some appropriate incentives are in place, has the potential to grow. It’s a long game (any approach).
PMR: This is certainly a useful model but it hasn’t even started. It needs a domain-specific repository of metadata and we aren’t building these, except in subjects such as HEP and bioscience where they don’t use IRs anyway.
CR: Which makes me realise I don’t know what kind of reward (eg citation) an author gets for having her paper mined by your robots!

bill says:

April 12, 2008 at 6:31 am

PMC will presumably alert me to what is minable and what – thanks to the publishers – is not.
I wouldn’t presume that at all. I suspect they will do nothing of the sort, and you will be left unable to mine any of it for fear that your robot will eat something it shouldn’t have.

Pingback: Unilever Centre for Molecular Informatics, Cambridge - petermr’s blog » Blog Archive » Open Access Week - kudos to the Wellcome Trust
