Content Mining (TDM). I analyse Elsevier’s reply and ask whether I am allowed to mine Chemistry

Elsevier has replied to my last blog post on their Content Mining (TDM) facility and regulations. I am going to critique these – mainly for the benefit of Universities and policy makers/funders who might think it is a step forward. It isn’t.

First a preamble about the TA (Closed) Scholarly publishing industry. This is almost unique in that it provides an essential service on an unregulated monopoly basis. IOW the industry can do what it likes (within the law) and largely get away with it. The “customers” are the University libraries, who seem to care only about price and not what the service actually is. As long as they can “buy” (sorry “rent”) journals they largely don’t seem to care about the conditions of use (and in particular the right to carry out Content Mining). In many ways they act as internal delivery agents and first-line policing (on copyright) for the publishers. This means that the readers (both generally and with institutional subscription) have no formal voice.

Railways have to submit to scrutiny and have passenger liaison committees. So do energy providers. Ultimately they are answerable to governments as well as their shareholders.

Publishers have no regulation and have effective micromonopolies. Readers have no choice in what they read – there is no substitutability. They can either subscribe to read it or they are prevented by the paywalls. If they have access they can either mine it or they are subject to legal constraints (as in this case). When reading Elsevier’s reply remember that the only constraint on the Director of Access and Policy is that they must make money for Elsevier. Nothing else matters. Elsevier can go a very long way in upsetting its readers without losing market.

Elsevier has replied through its Directorate of Access and Policy. (This is the one acceptable feature – that there is a clear channel). It used to be called “Universal Access” but the Orwellian euphemism seems to have gone. The Director is currently Alicia Wise (who also tweets under @wisealic). I treat the Directorate in a polite manner and regard it in the same way as I regard “Customer Care” on the railways. To me its staff are people employed by Elsevier to maximise their profits by growing the market and limiting damage. They are not my collaborators and we do not share common goals. In many cases they are directly trying to make life difficult for me and other readers.


Alicia Wise says:

February 1, 2014 at 2:20 pm

Hi Peter,

Dear Director of Access and Policy,

Thank you for the reply on my blog, which I have copied in full. I comment and add QUESTIONS which I would be grateful if you would answer clearly and succinctly. Please avoid generalities. If I do not get answers within a few days, I shall announce that Elsevier have failed to answer.

We think our new text mining policy goes a long way to addressing researcher needs in respect of TDM. You raise some good questions, though, and I’d like to take this opportunity to respond to them:

PMR: Companies always assert that they address customer needs. It is an effectively empty phrase.

• Elsevier requires both institutions and individuals to sign licenses

Our objective is to provide practical support to researchers. We believe a licence-based, self-service solution removes access barriers for researchers who want to text and data mine while allowing publishers to ensure performance and quality of service for all users.

PMR: Another empty phrase.

• Elsevier is the sole author and controller of the policy – there has been no Open discussion or agreement with scholarly bodies

This new policy is the result of extensive discussions with academic institutions – we have, for example, been running pilots with a number of institutions over the course of last year to test and refine both our technology and the terms and conditions under which this access is provided.

PMR: It is relatively easy to find customers who will promote the role of a company.

QUESTION: Where are these pilots? Have any been published? Have any University, Funder or Government organizations acted to oversee them and provide an impartial opinion? (Without such evidence this is an empty claim).

• Libraries have to – individually – sign agreements with Elsevier. There are no details of these policies or whether they entail additional institutional payment. It is also possible that Institutions may be asked to give up content-mining rights in return for lower overall prices. (Libraries have universally and unilaterally given away all these rights over the last decade and support publishers in forbidding machine access to content).

There is no additional charge for this access, and it will be automatically included in all library contracts when they are renewed. Libraries who would like access immediately (perhaps their next renewal is some time away) are asked to simply send us a request and we will amend their current agreement to include this access.

PMR: I will be getting this information from libraries.

• Researchers have to register as a developer (I think) and ask permission of Elsevier for every project they wish to do. It is not clear whether permission is automatic or whether Elsevier exercise control over choice and scope of project

The process is automatic – researchers are indeed asked to register and agree to the terms, and are then automatically sent an API key. You don’t need to contact anyone at Elsevier, and we do not exercise any control over the choice and scope of research projects.

QUESTION: What are the detailed terms that researchers have to agree to? (Note: The previous terms that Elsevier asked me to sign, which restricted output and effectively forbade the mining of chemistry, were unacceptable).

• Researchers can only mine text. Images are specifically prohibited. This is useless for me – as I and colleagues are mining chemical structure diagrams.

Figure metadata (titles, captions, etc) is included in the XML returned from our APIs and may be mined as a matter of course. Due to some ambiguity about re-use rights for some of the images included in our content, we are not automatically making the images themselves available to those who self-register for our text-mining API, but do have an image retrieval API that we can make available upon request once we understand the way in which the researcher intends to use the images.

PMR: This requires the researchers to formally seek Elsevier’s permission to mine images. You also wish to decide whether you approve of my proposed use. I will therefore state this as a formal request.

QUESTION: I wish to mine all chemical diagrams in Elsevier publications and extract reactions and analyse these for novel chemical reactions. I have an institutional subscription. I will publish only facts which are uncopyrightable. I wish to analyse 100 articles a day – one every 15 minutes (which should not cause stress on your servers). Note: There are only two answers: YES and NOT-YES. Any prevarication as I have had before will be interpreted as a refusal.

• There is no indication of how current the material will be. I shall be mining the literature an hour after it appears. Will the API provide that?

Yes. The APIs provide immediate access to content – they are hooked up to the same “back end” content store as ScienceDirect.com itself.

PMR: Noted.

• The amount that can be republished is often useless (“200 characters”). I want to build corpora (impossible); vocabularies (essential to record precise words – impossible); chemical names (often > 200 characters so impossible). Figure captions (impossible).
• The researchers must commit to a CC-NC licence. This effectively kills downstream use (I shall use CC0). It also trains them into thinking CC-NC is a “good thing”. It isn’t.

We arrived at our terms in consultation with researchers, and we believe that they pose no issue in the vast majority of cases. Of course, it’s not possible to cover every situation in a general policy, so we’re always open to specific requests.

QUESTION: Which researchers did you consult? Please publish full details of their input.

PMR: Note. Yet again a subscriber has to make a specific request. Elsevier are regulating what can be accessed.

• If a researcher has a LEGITIMATE collection of papers that they wish to mine (say on their hard disk) they are forbidden. They have to go to each publisher (if this awful protocol is promoted elsewhere) and find the API and mine the individual papers. Absurd.

We recognise that an important issue for researchers is the need to deal with multiple publishers. So for us providing an API for our customers is only part of the solution – we’re also strong supporters of CrossRef’s Prospect initiative (https://prospect.crossref.org/splash/), which aims to provide a single interface to content from multiple publishers.

Interested readers can learn more here: http://www.elsevier.com/connect/elsevier-updates-text-mining-policy-to-improve-access-for-researchers

PMR: Noted.

With kind wishes,
Alicia

Dr Alicia Wise
Director of Access & Policy
Elsevier
a.wise@elsevier.com
@wisealic

Yours Sincerely

Peter Murray-Rust

6 Responses to Content Mining (TDM). I analyse Elsevier’s reply and ask whether I am allowed to mine Chemistry

  1. Alicia Wise says:

    Hi Peter,
    For further insight into our pilots, readers may be interested to read http://www.nature.com/news/elsevier-opens-its-papers-to-text-mining-1.14659
    The end-user click through license terms are available at http://integration.elsevier.com/files/TDM_click_through_agreement.pdf
    On your specific use case, i.e. “I wish to mine all chemical diagrams in Elsevier publications and extract reactions and analyse these for novel chemical reactions. I have an institutional subscription. I will publish only facts which are uncopyrightable. I wish to analyse 100 articles a day – one every 15 minutes (which should not cause stress on your servers).”
    The answer is YES. You would need to accept the end-user license, and access the full-text via our API.
    With kind wishes,
    Dr Alicia Wise
    Director of Access and Policy
    Elsevier
    a.wise@elsevier.com
    @wisealic

  2. Dear Alicia,
    what if I want to mine a mathematics paper, in which, when I read it in HTML format online, there are many, many images masquerading as mathematical symbols throughout the text? Do I need special permission? I even managed to get ScienceDirect to generate me an ePub document after multiple tries, and even there the images are rendered as images.
    Best regards,
    David Roberts

    • pm286 says:

      I am not sure that @wisealic will necessarily see this request unless you mail/tweet it.
      FWIW it has been standard in the publishing industry to turn LaTeX into GIFs or PNGs. This is, of course, awful for readers as the images don’t scale. There are solutions such as MathML (normally non-semantic) but only a fraction of publishers use them.
      I have no idea what awful process you will need to use to get these images out of the publisher but – say – 100 images per paper will be unacceptable.
