I am continuing my analysis of Elsevier’s Terms and Conditions (TaC), which researchers must accept to carry out content-mining. The first post (https://blogs.ch.cam.ac.uk/pmr/2014/02/06/content-mining-elseviers-tdm-why-researchers-and-libraries-should-think-very-carefully-and-then-not-sign-1/) urged you to stop and think; I hope you haven’t already signed. This post suggests that what they are requiring researchers to do is probably legally meaningless in parts.

The first thing to realise is that the terms are potentially incompatible with other indications on Elsevier’s site. The TaC allow content mining of closed articles, but mining Open Access articles requires a different licence or process. Specifically (http://www.elsevier.com/about/universal-access/content-mining-policies?a=120946 ):

[Table: User License | Reuse the article in another work? | Reuse portions or extracts from the article in other works? | Make a modification of the article (e.g. translations)? | Text & data mine? | Choose a different license? | ‘Sell’ or re-use for “commercial purposes”?]

As I read this, I am NOT allowed to mine Elsevier’s “Open Access” articles published under CC-BY-NC-ND, an option that Elsevier makes it easy for authors to select (unlike Springer, which rightly forbids CC-NC for Open Access).

Now I expect that someone from Elsevier will mail me to say I have misunderstood this, which I will accept, since many Elsevier papers contain direct legal contradictions such as CC-BY and “All rights reserved” juxtaposed. But the point is that legal documents must be clear, and this one isn’t clear to me.

With this reservation I’ll take section 2:


2.1  Elsevier grants You a limited license to use the TDM Service, data, files and other materials provided by Elsevier (the “Dataset”), to use the TDM Service:

2.1.1 to continuously and automatically extract semantic entities from full-text articles retrieved through the TDM service for the purpose of recognition and classification of the relations between them and mount, load and integrate the results (the “TDM Output”) on a server used for the User’s text-mining system (i.e., not in libraries, repositories or archives) for access and use by the User or the company, institute or organization the User is affiliated with;

LIBRARIES NOTE: I cannot put my output in the University Repository (dspace@cam.ac.uk). The IR is a natural place to put valuable science. I have probably put more science into repositories than anyone else: I put 200,000 datasets in the IR nearly 10 years ago. So, simply, even with the requirements of funders, I CANNOT archive my science.

2.1.2 to distribute the TDM Output externally, which may include a few lines of query-dependent text of individual full text articles or book chapters which shall be up to a maximum length of 200 characters surrounding and including the text entity matched(“Snippets”) or bibliographic metadata,.

Here’s a typical scientific sentence (I didn’t search for it – I didn’t even have to flip the page):

Liquid chromatography was performed on an Agilent (Torrence, CA, USA) 1100 HPLC system coupled to a triple-quadropole mass spectrometer (Waters-Micromass, Manchester, UK) with a Z-spray ESI operated in positive mode source using a flow of 700 L/h nitrogen desolvated at 350 °C.

278 characters. Every single word is necessary for an accurate rendition of the science. I can’t quote this responsibly if it’s truncated. Take the “°C” off and there’s a good chance someone will read the temperature as 350 K, which is 77 °C. This happens. It’s actually scientifically irresponsible and illiterate to require truncation.
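To make the truncation concrete, here is a minimal Python sketch of what a 200-character “Snippet” does to that sentence. The function names and the choice to centre the window on the matched entity are assumptions for illustration; clause 2.1.2 does not say how the 200 characters are to be placed, and the exact character count depends on how the sentence is transcribed.

```python
# The sentence quoted above, copied verbatim (including its spelling).
SENTENCE = (
    "Liquid chromatography was performed on an Agilent (Torrence, CA, USA) "
    "1100 HPLC system coupled to a triple-quadropole mass spectrometer "
    "(Waters-Micromass, Manchester, UK) with a Z-spray ESI operated in "
    "positive mode source using a flow of 700 L/h nitrogen desolvated at 350 °C."
)

MAX_SNIPPET = 200  # the limit stated in clause 2.1.2


def snippet(text: str, entity: str, limit: int = MAX_SNIPPET) -> str:
    """Return at most `limit` characters surrounding and including `entity`.

    Assumption: the window is centred on the match and shifted back
    inside the text when it would run past either end.
    """
    start = text.find(entity)
    if start == -1:
        raise ValueError(f"entity not found: {entity!r}")
    if len(entity) > limit:
        raise ValueError("entity is longer than the snippet limit")
    spare = limit - len(entity)
    left = max(0, start - spare // 2)
    right = min(len(text), left + limit)
    left = max(0, right - limit)  # shift left if the window hit the end
    return text[left:right]


def kelvin_to_celsius(kelvin: float) -> float:
    """Convert a temperature in kelvin to degrees Celsius."""
    return kelvin - 273.15


if __name__ == "__main__":
    cut = snippet(SENTENCE, "Agilent")
    print(len(SENTENCE), len(cut))
    print(cut)
    print("unit survives:", "°C" in cut)
    # What 350 would mean if a reader assumed kelvin:
    print(round(kelvin_to_celsius(350)), "°C")
```

Matching on “Agilent” keeps the start of the sentence and loses the temperature and its unit; matching on “350” keeps the unit but loses the method. Under this sketch, no single 200-character window carries the whole statement.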

Further the TDM Output should include a Creative Commons proprietary notice in the following form:

“©Some rights reserved. This work is distributed under the terms of the CC-BY-NC Attribution-NonCommercial- 3.0, which permits non-commercial use, distribution, and reproduction in any medium, provided the original author and source are credited.”

My TDM output is FACTS. DATA. It will look something like:

<method>Liquid chromatography</method>

<equipment>Agilent (Torrence, CA, USA) 1100 HPLC system</equipment>

<flowrate units="L/h">700</flowrate>

(actually it’s much better and smarter than this…)
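As a rough illustration only, the three elements above could be produced from the quoted sentence by something like the following Python sketch. The regular expressions, the function name `extract_facts` and the wrapping `<facts>` element are all hypothetical and written for this one sentence; this is not the actual mining software, which as noted is much smarter.

```python
import re
import xml.etree.ElementTree as ET

# The sentence quoted earlier in this post, copied verbatim.
SENTENCE = (
    "Liquid chromatography was performed on an Agilent (Torrence, CA, USA) "
    "1100 HPLC system coupled to a triple-quadropole mass spectrometer "
    "(Waters-Micromass, Manchester, UK) with a Z-spray ESI operated in "
    "positive mode source using a flow of 700 L/h nitrogen desolvated at 350 °C."
)

# Hypothetical patterns, tuned by hand to this single sentence.
METHOD = re.compile(r"^(?P<method>[A-Z][a-z]+ chromatography)")
EQUIPMENT = re.compile(r"performed on an (?P<equipment>.+? HPLC system)")
FLOW = re.compile(r"flow of (?P<value>\d+(?:\.\d+)?) (?P<units>L/h)")


def extract_facts(text: str) -> ET.Element:
    """Pull method, equipment and flow rate out of `text` as XML elements."""
    root = ET.Element("facts")  # wrapper element is an assumption
    method = METHOD.search(text)
    if method:
        ET.SubElement(root, "method").text = method.group("method")
    equipment = EQUIPMENT.search(text)
    if equipment:
        ET.SubElement(root, "equipment").text = equipment.group("equipment")
    flow = FLOW.search(text)
    if flow:
        element = ET.SubElement(root, "flowrate", units=flow.group("units"))
        element.text = flow.group("value")
    return root


if __name__ == "__main__":
    print(ET.tostring(extract_facts(SENTENCE), encoding="unicode"))
```

Everything the sketch emits is a fact lifted from the text: a method name, an instrument and a number with its unit. There is no creative expression in it.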

And I am required (“SHOULD”) to copyright this.

But this is ridiculous: FACTS and DATA cannot be copyrighted. See http://www.lib.umich.edu/copyright/facts-and-data – UMich are among the world experts in this area:

Exceptions to Copyright: Facts and Data

Copyright basics
Copyright law provides protection for original creative expression that is recorded in a physical or digital form, things like literary works, music, art, and film. Copyright does not protect facts, data, or ideas though it does protect databases.

Copyright and databases
Copyright law does not apply to facts, data, or ideas. According to the U.S. Constitution, the purpose of copyright law is “to promote the progress of science and useful arts.” If copyright could grant individuals or business exclusive control of facts and ideas, it would constrain all kinds of progress, or eliminate it altogether. That is why the second section of the US Copyright Act spells out what is not protected by copyright:

It is important to remember that even if a database or compilation is arranged with sufficient originality to qualify for copyright protection, the facts and data within that database are still in the public domain. Anyone can take those facts and reuse or republish them, as long as that person arranges them in a new way. Unless they are accessible only under a contract that conditions access on limiting how the facts and data may or may not be used; any such contract would control.

And even if I wanted to copyright this as a database (perish the thought), I would contend that 3 lines of RDF do not constitute a database.

So what Elsevier are asking me to do (if I signed up) is legally absurd.

And that makes the whole TaC unacceptable. If it contains errors as glaring as requiring copyright on data, can YOU trust any of it to be legally valid?

In the next post I shall show why I (specifically PMR and some others) would breach Elsevier’s TaC as soon as I started mining…

[BTW Heather Joseph tweeted – “why is PMR the only one blogging on this topic?” Do libraries and universities and academics simply not care?]



6 Responses to #elsevier’s TDM Terms (TaC): Can they force us to copyright data? (2)

  1. Peter, I’m hoping you’re also going to summarize these analysis posts in a letter to Nature…

    • pm286 says:

      You mean challenge Richard’ van’s article? Don’t really have time – am off to AU. You are welcome to use my material with my blessing.

      • Mike Taylor says:

        Peter: you should do this. If you have time to write the blogs, you have time to write the letter to Nature. It’s important that it be done, and no-one is better placed to do it than you. (Not to mention that most potential authors of such a letter would feel awkward about writing it without your being on the authorship, now that this page and the others exist.)

  3. Marcus says:

    Personally, I think Elsevier are being ridiculous. There’s no legal justification whatsoever.

