Nature’s recent “news” article on Text and Data Mining was unacceptable [redacted]; I ask them to renounce licensing.

[See Update 2014-02-10 at end]

I have sent the following letter to Philip Campbell, Editor of Nature:

Dear Philip,

I am writing to you to protest against your biased reporting of Text and Data Mining in Nature News (part of Nature Publishing Group (NPG)) [1]. This article, which purports to be news, is effectively an attempt by the Toll-Access Scientific, Technical & Medical Publishers (TA-STM) industry to promote publisher licences as a benefit to science. It is in the same category of market-led misinformation as Science Magazine’s analysis of flawed Open Access.

Here is the true story which I ask you to publish to redress the balance.

For years Nature and other TA-STM publishers have consistently fought to prevent Text and Data Mining (TDM) solely for their financial benefit. As an example, in 2006 NPG promoted the “Open Text Mining Interface” (OTMI), which was designed to appear useful but actually jumbled the sentences (“while preserving any subscription model that funds the journals”) [2].

During the last few years publishers, including yourselves, have imposed draconian conditions restricting crawling and reuse far beyond what copyright law allows. These effectively prevent legal TDM for science, and this has killed open activity. Scientists doing TDM hide their activities for fear of being prosecuted or cut off. For example, Max Haeussler (cited in your “news”) spent two years trying to get permission from Elsevier [3] to mine for biological sequences. It is no surprise that he can be persuaded to give a positive comment now that he can “click through”. Heather Piwowar “negotiated” for months with Elsevier, which sent several executives to negotiate. I quote her (with permission): “I hate negotiating with publishers – the stress gives me hives [Urticaria]”.

Last year the European Commission attempted to pave the way for responsible TDM (“data analytics”) by bringing the publishing community together with librarians, scientists and open groups (Ross Mounce and I represented the Open Knowledge Foundation). “Licences 4 Europe” was a series of meetings in Brussels. To summarise, the TA-STM publishers were not prepared to cooperate effectively [4], and halfway through the proceedings most of the committee wrote:

“We write to express our serious and deep-felt concerns in regards to Working Group 4 on text and data mining (TDM). Despite the title, it appears the research and technology communities have been presented not with a stakeholder dialogue, but a process with an already predetermined outcome – namely that additional licensing is the only solution to the problems being faced by those wishing to undertake TDM of content to which they already have lawful access.”

The signatories came from about 40 highly responsible European scientific and scholarly organizations [4], including the Association of European Research Libraries (LIBER), UUK, the Royal Society, SURF, the Hungarian Academy of Sciences, JISC, SPARC, Research Libraries UK and the Austrian Science Fund, as well as experts on policy and intellectual property law. Despite this clear and compelling request, the TA-STM publishers held their position, and the signatories later withdrew from negotiations.

This failure of cooperation was later noted by Mme Neelie Kroes, European Commissioner for the Digital Agenda and Vice-President of the European Commission [5]:

“And, for me, the Text and Data mining Group has also shown something very clear. We need to find better ways to cope with immense data flows. They affect so many aspects of our daily lives and professional work. As the European Council put it, big data drives innovation, improves productivity, means better quality services. And scientists in particular can use these data flows for research, even for life-saving discoveries. They need every possibility to do that.

I understand the proposed initiative here by publishers is not supported by the users. And this cannot be seen as any kind of solution without agreement from that very important group of stakeholders. Now we need to seriously consider possible legislative exceptions.”

The TA-STM publishers, NPG included, have ignored Mme Kroes. The industry continues to promote licences. Elsevier’s recent announcement is not news (save for the click-through) and although previous Elsevier contracts are often secret, I suspect the click-through forfeits even more rights than before. By your complete lack of balance in failing to report any of the Licences4Europe dissension and choosing proponents who can be expected to see click-through as an advance, you are effectively marketing the licence solution under the guise of news.

My primary concern is the unacceptability of NPG using its “news organ” for self-interested promotion of the licence solution. However, I have also analyzed Elsevier’s “click-through” licence in some detail and found it directly contrary to the requirements of TDM. It is badly written and designed to stop any large-scale TDM. In my blog [6] (and several previous ones) I show that the licence legally prevents me from doing chemical TDM, as it would disadvantage Elsevier’s commercial offerings in this area. I could easily end up in court. So, I suspect, could the enthusiasts from whom you got quotes – their outputs, if done responsibly, could compete with Elsevier products. My analysis is backed by Professor Charles Oppenheim, an expert in scholarly publishing.

There will be a strong incentive for other TA-STM publishers, including, I suspect, NPG, to follow the Elsevier route. This will either result in a plethora of per-publisher click-through licences or a single, probably highly restrictive Elsevier-like licence, available through a publisher supported gateway.

At present, therefore, I am finding it hard to continue to have confidence in NPG as a responsible organization in science evaluation and communication. This is a great pity, as I have previously worked productively with you and your colleagues. Richard van Noorden had asked if he could do a story about our new initiatives in TDM (to be announced later this month) – I can’t now regard this as impartial.

I would ask you to do the following:

  • Publicly renounce the use of licences to control TDM and agree that “The right to read is the right to mine”. The Royal Society (a publisher) takes this position, so surely NPG could.
  • Commission a balanced account of the Licences4Europe story from a disinterested expert and publish it in Nature.


In two months the UK Parliament is expected to table and pass the Hargreaves recommendations for TDM, after which we will be able to carry this out legally in the UK. Since my institution subscribes to a large number of NPG journals which I have the right to read, I expect to start mining them in the near future, without further negotiations and without your further permission.


This letter will appear on my blog. I would consider it appropriate for Nature Correspondence and I request you to publish it.






[2] and

[3] My submission to the UK Government IPO




  • Update.
  • Richard van Noorden has tweeted that he is the sole author of the article. I accept his assertion and have removed the implication that he was involved in a marketing exercise. My other concerns about the unacceptability of a news article promoting NPG’s position remain.

8 Responses to Nature's recent "news" article on Text and Data Mining was unacceptable [redacted]; I ask them to renounce licensing.

  1. Mike Taylor says:

    Thanks for putting this together, Peter. It’s important stuff. I hope Nature publish it (along with a counter-point from Richard, who I know doesn’t accept everything you say here).

  2. Dear Peter,
    I’m replying as the reporter of the news article in Nature. First, I want to be very clear: I reported this article alone – with no ‘marketing’ input from Nature Publishing Group (NPG). The news team work independently, which is essential to our strong reporting. Also, as I’ll explain, I believe my article was fair, giving representation to pro and anti- sides in this debate.
    I think the issues you raise about publishers’ policies on text and data mining (TDM) are important: so important that I have reported on them four times already. I’m proud that Nature News has provided extensive coverage of TDM and its implications for researchers. I can’t speak for NPG on their TDM policies.
    Let’s dig into the detail: you suggest that the article was ‘biased reporting’ which ‘purports to be news’ and was ‘effectively an attempt … to promote publisher licenses as a benefit to science’. My article does not intend to make a case for or against publisher licenses. It is, quite simply, reporting: explaining what has happened, and how scientists reacted to Elsevier’s new policy (which was, of course, news).
    Far from a bias for publishers’ licenses, the article clearly states the objections that you raise against the license approach. The introduction says that ‘some scientists object that even as publishers roll out improved technical infrastructure and allow greater access, they are exerting tight legal controls over the way text-mining is done’. The final three paragraphs explain precisely the complaints that some researchers have with the way publishers are setting license-controls on text-mining activity, leaving the reader with Ross Mounce’s criticisms.
    On the other hand, for all you might disagree with them, it is a fact that other scientists I spoke to – including Max Haeussler, who has been very critical of Elsevier in the past – were pleased about the API and the click-through license. They told me that this would open up TDM opportunities, albeit under restrictive conditions (conditions that the article explains). I had, as you know, contacted you for your reaction too. You pointed me to your first blog (written before your more detailed analysis, which wasn’t available at the time), and I judged that Ross Mounce had already provided a voice for that view in the article.
    It is particularly bewildering that you accuse Nature of “failing to report any of the Licenses4Europe discussion”, and ask for “a balanced account of the Licenses4Europe story”.
    As far as I am aware, Nature is the *only* mainstream media venue to have reported the Licenses4Europe issues. In March 2013, I covered the clash between scientists and publishers over licenses, and in June 2013, further reported on the divisions rife in the European Commission TDM discussions. What’s more, two years ago I wrote the first media coverage of Max Haeussler’s struggles to get permission from Elsevier to text-mine for biological sequences.
    I didn’t consider that the Licenses4Europe discussion needed to be explained again in this article: for I had already explained the argument that ‘the right to read is the right to mine’, and noted that the European Commission was examining the issue. Of course, all the relevant previous coverage is linked to at the end of the story.
    Where does this discussion of bias and reporting balance leave us? Your critique helps me think carefully about how I’m reporting my stories for our readers. And your campaigning is bringing the issue to wider attention; I’ll be as interested as you are to see how NPG responds to your call for the company to ‘publicly renounce the use of licenses to control TDM’. Your examination of Elsevier’s detailed legal terms is also very useful. So, broadly, I welcome your letter.
    Except this: you have conflated your antipathy to NPG’s (and other subscription publishers’) TDM policies, with the incorrect accusations that the reporting in Nature was an attempt to promote publisher licenses, and was somehow ‘marketing … under the guise of news’. I’m pleased that you have already retracted your implication that I was involved in a marketing exercise. I hope that in future you’ll keep separate your critiques of my reporting, from your critiques of NPG policies.

    • pm286 says:

      Thank you for a detailed reply. I am in airport transit and will read it later.

    • Dear Richard,
      Now that I have had time to read up on everything, I disagree with the sentiment in the first paragraphs of your Nature News item. In fact, it feels like a punch in the face: to me, as someone who has been trying to improve scientific dissemination in the era of the internet for about 20 years now, this Elsevier step is *not* a step forward. And if you then write that researchers welcome it (despite having some criticisms), I would have expected you to put this in that context. After two decades even I am happy that some step is being taken. But at the same time, even though anything is better than nothing, if you start thinking about the contract with them, you know this license stifles innovation and cripples European industry.
      I have given up hope on some publishers. I have no grudge against NPG, and I know their novel technologies department has been doing rather innovative research. Yet OTMI never got off the ground. Many years ago I added support for it in Bioclipse, but why was it not adopted? That is the context, and that is the level to which this Elsevier news should have been compared.
      And then it feels exactly like this: a punch in the face. Their proposal is just to keep us sweet. That is fine; I did not expect anything different from Elsevier; I have grown wiser than that. But from a news item I would have expected the balance of noting that this “opening” would have been a step forward 20 years ago. All it means now is that we again have to wait another five years for something that might happen.
      I fully understand that a news item has limited space, and that the lead must captivate a wide audience. But when you write:
      “Academics: prepare your computers for text-mining.”
      They were prepared 10 years ago. This is a slap in the face for all who have wanted to do this for years.
      “Publishing giant Elsevier says that it has now made it easy for scientists to extract facts and data computationally from its more than 11 million online research papers.”
      Another slap: it was easy 20 years ago (I know, because I did it then); they made it less difficult.
      “Other publishers are likely to follow suit this year, lowering barriers to the computer-based research technique.”
      This is somewhat true, but keep in mind that the barrier was removed long ago, and for the past 20 years many publishers simply refused to make the publishing model reflect what was technically possible. Any journal could have insisted that data go directly into repositories.
      Oh, lowering barriers? True, because they could have removed the barriers they insisted on keeping in place for a rather long time. Fair enough, a weaker slap.
      “But some scientists object that even as publishers roll out improved technical infrastructure”
      For the record, I do not object to their step, but I am quite disappointed. Some details on why are in my blog:
      “and allow greater access, they are exerting tight legal controls over the way text-mining is done.”
      This is true, but another weak slap, as they are not merely exerting control: they are taking more control over something that is uncontrolled under most international law, since facts are not copyrightable. With the contract scholars sign with Elsevier, they are forced to accept copyright claims on something that is basically without copyright. That is not “exerting control” over something that already existed: it is new control.
      All that said, I fully see your intent to give a balanced view, but I am happy that some of us stand up and show that proposals like this take us further away from what science was 100 years ago. In fact, put in the context of access to knowledge in the 19th century and the copyright laws then in force, this step from Elsevier is a step backwards in scientific dissemination. If you think about it like that, I am not sure I want to celebrate this step at all.

  3. Thanks Egon for this, which I’ve just seen, and Peter for your reply on a separate post. I’ll respond next week on Peter’s second post, but just to note for now that I’m thinking carefully about these critiques of my reporting, and am pleased that they are now clearly separated from accusations of ‘marketing’. It’s very good to hear in more detail what people think about Elsevier’s policy – and the policy of publishers like NPG who are getting TDM terms added to site licenses. Also, I will be reporting again on the legal aspects of text-mining, probably in April when the UK law comes in.
    Can you leave your comments, Egon, on my story so that readers who come to it in future can see them? Or, I’ll add them myself.

  4. Pingback: Reply to Richard van Noorden - Shuttleworth Foundation

