Peter Suber is one of the people I most respect, though we have never met, and we’ve been having a discussion about whether the text-mining policy announced by Nature Publishing Group is “libre” or “fair-use”. [Here I discuss his comments, but add some background first].
A major problem is the use of terms, whether derived from common English usage (“fair” and “use”) or specially constructed (“libre”). In either case the meaning is never self-evident, and it is interpreted differently by different people. For “open access” there is a huge spectrum. At one end are Klaus Graf, PLoS, BMC and I, who want Open Access to mean complete adherence to the BBB declarations, which means you can do anything with the paper (including selling it) as long as you acknowledge the authorship. At the other end are publishers who charge authors (“funder-pays” in my jargon) for the privilege of having their paper readable on the publisher’s website but with no other permissions (“you may not download this paper, keep a copy, re-use it in whole or part for any purpose, put it in your repository, etc.”). In English both of these are “free”, which is highly confusing. In Open Source terms the distinction is explained as “free-as-in-speech” versus “free-as-in-beer”. To resolve the English ambiguity, the terms “libre” and “gratis” are increasingly used. Wikipedia elaborates slightly.
Peter and I agree that it is really important to get this right. It’s not just theoretical: if I misuse a publisher’s item because I think I can do something with it when they think I can’t, I’ll get a lawyer’s letter or have my institution cut off (both have happened, one of them to me, so these are not “academic” concerns). And if the UK passes HADOPI-UK, the publisher will simply ask Ofcom to have my home broadband terminated, with no appeal. (That’s why I am writing about HADOPI: it matters.)
Words generate arguments. I warn everyone in our group that when we talk about ontologies we will fight. And we do. And that’s when we are all trying to reach a common goal – a machine-implementation of human understanding. With publishing it’s worse because there are some publishers who deliberately want to make it difficult for us to use our (sorry their) content on their sites. So they have no interest in a common definition of “Open Access” and the more confusion the better. A publisher can now get funders to pay large amounts (1000-3000 USD) for a toll-free (gratis to readers) publication. So the precise meaning of the term can carry a great deal of money with it.
Some publishers, such as BMC, are quite clear: the author/funder pays and the reader can do whatever they like. It’s defined by the Creative Commons Attribution licence (CC-BY). Clear and trivial to interpret. I’ve not heard of any problems with people re-using CC-BY content.
You also have to understand that there is something called “fair use”. This is impossible to define precisely (see Wikipedia): it is country-dependent, it depends on the monetary damage to the copyright owner, it depends on the amount re-used relative to the whole, etc. What counts as fair use can only be resolved by paying lawyers huge amounts of money to fight it out in a civil court. It’s generally agreed that reproducing chunks of text to back up one’s science is legitimate and that photocopying papers for teaching is not. (Personally I disagree ethically with the latter; after all, it’s OUR content.) There is a particular problem with images, as most copyright law regards images as creative works (e.g. cartoons, street maps, photographs in museums, etc.). But what about a spectrum, created by a machine? Is that a creative work?
It would help a great deal if publishers actually said what they regarded as fair use and what other privileges of re-use the author/funder may have bought. But they don’t. It’s far more profitable to keep everyone in FUD. Librarians are now so terrified of publishers that they will always err on the side of conservatism (“I don’t know precisely what you want to do but assume you can’t”). I’ve raised this on my blog; publishers know there is a problem, yet they have made no reasonable approach to the academic community.
After all, a publisher can charge an author 100 USD for including a diagram (created by another author) in a review, when the publisher had no hand in its creation. Why give up the cash cow?
So what can a reader do without being sued?
They can write down words and data from an article in pencil and type them up on a manual typewriter. They’ve been doing this for over 100 years and no-one has objected.
They can redraw diagrams. (I remember a review I published in which I included one of my own diagrams, originally published in an ACS journal. The Royal Society of Chemistry redrew the diagram, which had ca. 1000 data points, and it was badly redrawn. How completely absurd.)
They can compile facts (like melting points, spectra, etc.) as long as they write them in cuneiform.
But when the material is electronic (which should make this process easier) the publishers absolutely forbid it. I can do text-mining by hand, but not – apparently – by machine.
Therefore, whether you approve of their motives or not:
Closed Access publishers deliberately make it difficult to re-use their information
They claim to be supporting science. They aren’t; they are supporting their shareholders or their CEO’s remuneration.
So where does the NPG text-mining issue come in? I’ve written enough, so I’ll cover that in the next post.
Just remember that we are allowed fair use, though no-one agrees on what it means.