Mendeley (and other Bib Data): WHAT is Open?

Euan has commented on my enquiry to Mendeley about their "Open data". He raises important, valid points:


Euan says:

March 12, 2011 at 1:08 am

IANAL, but the data includes (or did last time I checked) many, many abstracts that definitely haven't been licensed for use in this way, scraped from PDFs or online sources of metadata like PubMed.

PMR: I do not know what data Mendeley are offering – I have specifically asked (see below). IF they are offering abstracts then IMO they are potentially breaking copyright law.

Though

  1. I'm sure most publishers look favourably on their article metadata being spread around in as many ways as possible (I'm speaking personally, not for NPG, my employer, whom I do not represent in this matter in any way)
  2. Abstracts do exist in a kind of weird grey area where nobody is sure exactly what's fair use and what isn't, and some people seem to believe that they're public domain

You are right in that it is extremely difficult to get definitive answers. I regard abstracts as "mumble" in the Yes-No-Mumble value logic.

… it *doesn't* seem to me that this means that anybody can package them up with a bunch of homegrown content under the same CC-BY license and say that the whole thing is "open".

I personally agree with this analysis. An individual or organization cannot declare that someone else's IP is Open or free of copyright. The problem is that it is difficult to determine what the IP is on things like abstracts. Publishers are extremely unhelpful in this (other than the ones who assert that they own the abstract).

Obvious example: some publishers sell their abstracts and associated metadata to commercial literature databases. The current Mendeley API license implies to me that I could put together my own, identical datasets with the same content from that source and sell it for half the price, thus cutting the original publisher out of the loop. This makes me think that a blanket one line "all data is made available under CC-BY" is insufficient.

I agree. IF the Mendeley data includes abstracts I would refuse to use the abstracts in an Open data collection.

At the very least the attribution for abstracts should be to the copyright holders – preferably the authors, otherwise the publisher – not Mendeley (Again, IANAL, I may be wrong. If somebody wants to tell me so I'll be very happy).

I've mentioned this issue on Twitter a couple of times and know that Mendeley are perfectly aware of it, but haven't ever had a response and nothing has ever changed on the license page. Jason…? Just having somebody say "we've checked it out with our lawyers and it's all fine" would be good to hear. If it is then I'm off to build my own abstract dataset to sell for $$$.

Copyright is governed by civil law in many domains, so if someone believes they own the IP then they can reasonably sue. It doesn't mean they will or won't win.

The Mendeley API is awesome and the intentions noble, but you can't cut legal corners. It won't do anybody good in the long run (at some point it'll become a problem) and, at worst, could potentially land people who've used the API in legal trouble with the *actual* intellectual property owners.

I agree. And in the OKF we are scrupulous to avoid violation of IP. It often means there are things we cannot reuse even though reuse seems "reasonable". So many of the "free" bibliography collections are not Open in that they may scrape data off other sites.

I'd like to see a little extra effort in living up to the "open data" claim by securing the relevant permissions from copyright holders, or clarifying exactly what attribution should be used for what, or separating out abstracts to be delivered under a different license… whatever would work.

So would I. I'd like to see publishers trying to help scholars re-use material rather than explicitly or implicitly preventing re-use as part of an outdated business model. It would, for example, be possible for publishers to agree that they would not claim copyright over abstracts. That doesn't remove the author copyright. It would also be possible (though the probability requires a Maxwell demon (James Clerk, not Robert)) for publishers to own the copyright of abstracts and donate it to the world as CC-BY.

Alternatively John Wilbanks saying "it's not a problem because x" would work for me too.

That depends on "x".

So this is why I have formally asked Mendeley on IsItOpenData WHAT their data are: http://www.isitopendata.org/enquiry/view/e59da4b5-1ef2-43d1-beab-60ec91196f27/ So far I have not had a response, and I hope I get one.

It would be very useful if you could answer the following question(s):

(1) What is the data? It is important to know precisely what is included and what is not.

(2) How is the data obtained? From your blurb it seems like it may have to be accessed through an API. If so, what is the nature of the API?

(3) Are there any limitations on how much data can be downloaded? If so what is the definition of the subset?

(4) Is there any guarantee that updates to your data base will be made available in the same way or is this effectively a snapshot?

(5) What is the "Creative Commons licence" that you mention?

This was answered in my blog. It is CC-BY.

(0) This is summed up in the single question: is it OKD-compliant? http://www.opendefinition.org/ "A piece of content or data is open if anyone is free to use, reuse, and redistribute it — subject only, at most, to the requirement to attribute and share-alike.".

So some final clarification.

All evidence points to core data for individual bibliographic entries being Open. The STM publishers have confirmed publicly to me that they do not regard this as copyrightable. PMC have declared that their collection of bibliographic data is Open. The core is author/journal/title/year/page/language/format etc. It does NOT cover abstracts or images.

It does NOT cover collections of bibliographic data which ARE generally agreed to be copyrightable.
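To make the distinction concrete, here is a minimal Python sketch of keeping only the core fields of a single entry and dropping everything else, abstracts included. The field names and the sample record are my own assumptions for illustration; they are not taken from Mendeley's API or any other real schema.

```python
# Hypothetical field names for the "core" of a bibliographic entry.
# These are illustrative assumptions, not any real API's schema.
CORE_FIELDS = {"authors", "journal", "title", "year", "pages", "language", "format"}


def keep_core_fields(record):
    """Return a copy of `record` holding only the core bibliographic fields.

    Anything not explicitly listed in CORE_FIELDS (for example an abstract
    or an image) is dropped, so unknown fields are excluded by default.
    """
    return {key: value for key, value in record.items() if key in CORE_FIELDS}


# A made-up record, used only to show the filtering.
record = {
    "authors": ["A. Author", "B. Author"],
    "journal": "Journal of Examples",
    "title": "An Example Paper",
    "year": 2011,
    "pages": "1-10",
    "abstract": "Text whose copyright status is unclear.",
}

core = keep_core_fields(record)
```

An allow-list is used rather than a block-list so that any field of uncertain status is left out unless someone has positively decided it is Open.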

I think that Mendeley need to add clarification to this and I am hoping to get it. Mendeley, PLEASE use the IsItOpenData enquiry to reply since that then becomes public record.

[Of course with goodwill we can solve this problem and the OKF intends to catalyse that.]



4 Responses to Mendeley (and other Bib Data): WHAT is Open?

  1. Mr. Gunn says:

    Hi Peter, Egon, et al.

    I'm getting to your Is It Open request, but it's basically the same as Jason said in his earlier comment here.

    Mendeley has said, "we checked it out with the lawyers and we've spoken to many publishers and they've all said what we're doing is OK.." Nonetheless, as with any user-submitted content, there's a possibility we'll get a DMCA takedown request that we'll have to comply with, although that hasn't happened yet. You're certainly ok using the cite data.

  2. Euan says:

    Thanks Mr Gunn -

    To be specific the potential problem is the abstract field in the document details results as in the response examples from the API docs here:

    http://dev.mendeley.com/docs/public-resources/search-details
    http://dev.mendeley.com/docs/user-specific-resources/user-library-document-details (though abstract is "null" for this particular paper)

    > DMCA

    Doing a risk assessment and deciding to worry about it if anybody does complain is perfectly reasonable from a business pov but not the same as making data open, is the point really... also AFAIK it's the developers using the material obtained through the API who would actually be infringing copyright, so it's them who should be aware of / decide how to handle the risk. Mendeley just needs to make them aware. Also the DMCA system only applies in the US and publishers / developers could be anywhere, including places where the law is even more draconian.

    I do think what Peter says about solving the issue with goodwill probably goes for everybody. In general the API is awesome, it's just making sure that it lives up to the Open label.

    • pm286 says:

      I agree with this analysis. Mendeley SHOULD post a notice alerting users to the fact that abstracts cannot be assumed to be copyright-free. If there is any other information that is also a potential problem, please flag it. It would also be helpful if the abstract and other copyrightable stuff can be identified programmatically and filtered out if required.

  3. Mr. Gunn says:

    All good points. We'll try to make it clearer regarding abstracts, as we do with the Open Access flag already.
