I have had a very rapid response to my blog post:
Jason Hoyt says:
All very good and valid questions raised there. To answer your acid test: yes, you could download and mash up our data set in any way you see fit. It is currently covered by a CC-BY license.
As you pointed out with the addition of John Wilbanks from Creative Commons to the API judging panel, we are very serious about making data accessible and agreeable to all parties.
That said, currently there is no bulk data dump option available to all. That option is available to academic researchers who want to work closely with us. The current process of using API methods is the more appropriate tool for developers desiring to build various applications for this contest.
We see the creation and usage of basic developer-friendly APIs as one of the key solutions to making science more open and more digestible by the general public. Large, raw data sets can serve a different purpose.
For serious research, ie not consumer facing apps to make science more accessible, we currently have a data set suited for collaborative filtering algorithm development (http://dev.mendeley.com/datachallenge/). We are also working on a few other large data sets that would be suited to other types of algorithm development and general research.
We will also be taking in feedback from all relevant stakeholders, including yourself, as we go forward in our agenda of making science more open.
Jason Hoyt, PhD
Chief Scientist and VP for R&D
So this is very good news, not just because 70 million pieces of data are available, but because a collection this large can make a major impact on scholarship. I don't know much about the data, but I will get myself a login and have a look.
I'm assuming that the bulk download for "academic researchers who want to work closely with us" will carry restrictions, so the open material is whatever can be got out of the API. I certainly value the API – for example, it is something that could be accessed by Chem4Word. Just as we access PubChem and OPSIN from C4W, we could also access the Mendeley API. If you want small numbers of specific bibliographic records, then APIs can be a useful way to go. Indeed, we might have a look during the hackfest.
However, there are cases where we want all the data. An API should not preclude access to the raw data. And that's where the "data" question still needs to be answered.
AIUI the data are collected silently from the activities of Mendeley users. Clearly there are data about this process (names of users, patterns of usage) which probably won't be made public. There may be users' comments – I don't know. But for me the core raw data is the bibliography, as specified in The Principles of Open Bibliographic Data. Here we have confirmation from an increasing number of sources that individual bibliographic entries/records/data are not subject to copyright. So, assuming the Mendeley data falls into this category, that is what I am talking about for raw data. Note that the author (or unfortunately the publisher) may claim copyright over material such as abstracts, and I don't know what Mendeley do about abstracts and related material.
So I still need to know what the "data" is. But assuming that it's core bibliography, a large amount of it is becoming Open. And it's not before time.
Oh, and if all of us in JISCOpenBib and related projects feel the same way, expect us to win the 10001 prize. There is a great deal we can do already.