WOSP2014: Text and Data Mining: II Elsevier's Presentation (Gemma Hersh)

WOSP2014 – http://core-project.kmi.open.ac.uk/dl2014/ – is a scholarly, peer-reviewed workshop. It consists of submitted, peer-reviewed talks and demos, plus invited talks from well-known people in the field (Lee Giles, Birger Larsen). At ContentMine we submitted three papers/demos which were peer-reviewed and accepted (and which I’ll blog later).
But there was also one Presentation which was, as I understand, neither invited nor peer-reviewed.
“Elsevier’s Text and Data Mining Policy” by Gemma Hersh
It is usually inappropriate for a manufacturer to present at a scholarly conference where the audience are effectively customers. It ends up as a product pitch, which is offensive to attendees who have paid to attend, and offensive to those whose submitted papers were rejected while product pitches are allowed.
This was one of the most unacceptable presentations I have ever seen at a scholarly workshop and I said so.
Before I add my own comments I simply record the facts. Professor Charles Oppenheim agrees that this is a factual record.

GH = Gemma Hersh (Elsevier)

CO = Prof Charles Oppenheim

GH arrived 10 mins before her presentation and left immediately afterwards. She did not stop to talk. She later tweeted that she had a meeting.

GH Presentation

(1) Elsevier’s presentation was neither an invited nor a peer-reviewed submission but appeared to have been a result of pressuring the organizers.

(2) It was a manufacturer-specific product pitch, not a scholarly presentation.

(3) It made no attempt to be balanced but presented only Elsevier’s product. In particular:

* no mention was made of Hargreaves

* no mention that it had been rejected by LIBER and associates

* no mention that the library community had walked out of Licences for Europe

(4) Elsevier said that their studies showed researchers preferred APIs. No mention was made that researchers had to sign an additional agreement

Public Discussion (no record, but ca 20 witnesses)

PMR challenged the presentation on the basis of bias and inaccuracy.

GH criticized PMR for being aggressive. She stated that it was the libraries’ fault that L4E had broken down.

PMR asked GH to confirm that if he had the right to read Elsevier material he could mine it without using Elsevier’s API

GH replied that he couldn’t.

CO told GH that PMR had a legal right to do so

GH said he didn’t and that CO was wrong

Discussion continued with no resolution. GH showed no intention of listening to PMR and CO
Later tweets from GH
@gemmahersh:
@petermurrayrust check the explanatory notes that accompany the legislation.
@gemmahersh:
@petermurrayrust would prefer a constructive chat rather than an attack though….
@gemmahersh:
@petermurrayrust very happy to keep talking but I’ve just come straight from one appointment and now have another.

PMR: I believe that any neutral observer would agree that this was roughly factually correct.
====================
PMR: now my comments…
It is completely unacceptable for a product manager to push their way into a scholarly workshop, arrive 10 minutes before their presentation, give a product pitch and leave immediately without deigning to talk to anyone.
The pitch itself was utterly one-sided, presenting Elsevier as the text-miner’s friend and failing to give a balanced view of the last several years. Those of us in the UK Hargreaves process and Licences4Europe know that STM publishers in general, and Elsevier in particular, have thrown money and people into trying to control the mining effort through licences. To give a blatantly biased presentation at a scholarly meeting rules them out as trustworthy partners.
Worse, the product pitch was false. I called her on this – I was forthright – and asked whether I could mine without Elsevier’s permission. She categorically denied this. When challenged by Professor Oppenheim she told him curtly he was wrong and Elsevier could do what they liked.
The law explicitly states that publishers cannot use terms and conditions or other contractual processes to override the right to mine for non-commercial research purposes.
So it’s a question of who do you believe:
Gemma Hersh, Elsevier or Professor Charles Oppenheim, Loughborough, Northampton, City?
(and PMR, Nottingham and Cambridge Universities)
If GH is right, then the law is pointless.
But she isn’t and it isn’t.
It gets worse. In later discussions with Chris Shillum, who takes a more constructive view, he made it clear that we had the right to mine without Elsevier’s permission as long as we didn’t sign their terms and conditions. The discussion – which I’ll cover in the next post – was useful.
He also said that Elsevier had changed their TaC several times since January, much of this as a result of my challenging them. This means:

Elsevier themselves do not agree on the interpretation of the law
Elsevier’s terms and conditions are so mutable and frequently changed that they cannot be regarded as having any force.


7 Responses to WOSP2014: Text and Data Mining: II Elsevier's Presentation (Gemma Hersh)

Jason Hoyt says:

September 17, 2014 at 6:00 am

Peter – Regarding the right to textmine by way of crawling rather than via API, I believe you are in the right here according to 2014 amendments to the UK Copyright Act. It does, however, come with some stipulations.
The relevant amendment to Article 29A of the Copyright Act is thus [article 3(2)(5)] “To the extent that a term of a contract purports to prevent or restrict the making of a copy which, by virtue of this section, would not infringe copyright, that term is unenforceable.”. http://www.legislation.gov.uk/uksi/2014/1372/regulation/3/made
In other words, if you already have the legal right to view, then your computer is allowed to copy and mine regardless of what other terms Elsevier try to impose.
However, the stipulations are that Elsevier is allowed to protect its resources for stability and security. This is outlined in 7.9.3 of the memorandum to the 2014 amendments as follows, “The exception will not provide a “right to data mine” works to which the researcher does not already have a right of access. Researchers or their institutions will still have to buy access to content if that is the rights holder’s model. Publishers will be able to impose reasonable measures to maintain stability and security of their computer networks as long as researchers are able to benefit from the exception to carry out non-commercial research.” http://www.legislation.gov.uk/uksi/2014/1385/pdfs/uksiem_20141385_en.pdf
My reading is that they theoretically could deny crawling, even if your institution already has legal access, but ONLY if they can show that your crawler is making their resources unstable. Given that crawling algorithms can be designed to throttle themselves based on available resources, and Elsevier itself could throttle over-eager crawlers, it seems you should be able to safely mine by crawling both legally and technically speaking. Additionally, if their API is somehow restricting the content (i.e. you are unable to “benefit” from it) then they need to either update the API or allow the crawling regardless of stability. Of course, you may want to get a proper legal opinion before going forward!

- pm286 says:
  
  September 17, 2014 at 7:58 am
  
  Many thanks Jason
  
David Roberts says:

September 20, 2014 at 10:16 am

I hope someone is keeping copies of the various versions of the T&C that Elsevier are posting.

- pm286 says:
  
  September 20, 2014 at 10:51 am
  
  I’m not; Elsevier matters waste too much of my time ATM.
  
Gemma Hersh and Chris Shillum says:

September 27, 2014 at 12:13 am

[Repeating our response to PMR’s later blog post on this topic]
We feel it is important to clarify the following:
1. We are fully aligned on the details of Elsevier’s text mining policy and feel that, despite your attempt at accuracy, this is a somewhat subjective and rather unproductive recap of our conversations last week.
2. Chris had the opportunity to speak with you informally and at greater length over lunch, and Gemma gave a short presentation at the invitation of colleagues from Mendeley, who helped co-organise the conference. It is clear from the workshop program at http://core-project.kmi.open.ac.uk/dl2014/ that this was neither presented as, nor intended to be, a peer-reviewed session nor a sales discussion.
3. If a single crawler behaves as you propose, then it alone would pose no stability risk to publisher systems. However it does not follow that this is sustainable when you consider that many researchers may wish to mine. We are designing services that scale globally.
4. We therefore continue to ask and recommend that you use our APIs for programmatic access to the content. You have not, to our knowledge, ever used the TDM service of which you are so critical and we invite you to try it.
5. We would welcome the opportunity to participate in a community-driven process to further refine your proposed Guidelines for Responsible Text Mining and acknowledge the more constructive tone of your later blog post. We look forward to more constructive dialogue in future.
Gemma Hersh and Chris Shillum
