[Quote in title is from Mark Hahnel, see below]
I have been meaning to write on this theme for some time, and more generally on the growing influence of DigitalScience in parts of the academic infrastructure. This post is sparked by a Twitter exchange (follow backwards from https://twitter.com/petermurrayrust/status/591197043579813888 ) in the last few hours, which addresses the question of whether “Figshare is Open”.
This is not an easy question and I will try to be objective. First let me say – as I have said in public – that I have huge respect and admiration for how Mark Hahnel created Figshare while a PhD student. It’s a great idea and I am delighted – in the abstract – that it gained so much traction so rapidly.
Mark and I have discussed issues of Figshare on more than one occasion and he’s done me the honour of creating a “Peter Murray-Rust” slide (http://www.slideshare.net/repofringe/figshare-repository-fringe-2013 ) where he addresses some (but not all) of my concerns about Figshare after its “acquisition” by Macmillan Digital Science (I use this term, although there are rumours of a demerger or merger). I use “acquisition” because I have no knowledge of the formal position of Figshare as a legal entity (I assume it *is* one? Figshare FAQs ) and that’s one of the questions to be addressed here.
From the FAQs:
figshare is an independent body that receives support from Digital Science. “Digital Science’s relationship with figshare represents the first of its kind in the company’s history: a community based, open science project that will retain its autonomy whilst receiving support from the division.”
However http://www.digital-science.com/products/ lists Figshare among “our products” and brands it as if it is a DigitalScience division or company. Figshare appears to have no corporate address other than Macmillan and I assume trades through them.
So this post has been catalysed by a tweet of a report from a DS employee(?) Dan Valen
John Hammersley @DrHammersley tweeted:
Such a key message: “APIs are essential (for #opendata and #openscience)” – Dan Valen of @figshare at #shakingitup15 pic.twitter.com/HDyYEaXJRn
This generated a twitter exchange about why APIs were/not essential. I shan’t explore that in detail, but my primary point is that:
If the only access to data is through a controlled API, then the data as a whole cannot be open, regardless of the openness of individual components.
There is no doubt that some traditional publishers see APIs as a way of enforcing control over the user community. Readers will remember that I had a robust discussion with Gemma Hersh of Elsevier, who stated that I could not legally mine Elsevier’s data without going through their API. She was wrong, categorically wrong, but it was clear that she and Elsevier saw, and probably still see, APIs as a control mechanism. Note that Elsevier’s Mendeley never exposed their whole data – only an API.
An API is the software contract with a webserver offering a defined service. It is often accompanied by a legal contract for the user (with some reciprocity). The definition of that service is completely in the hands of the provider. The control of that service is entirely in the hands of the provider. This leads to the following technical possibilities:
- control: The provider can decide what to offer, when, to whom, on what basis. They can vary this by date, geography or IP of user, and I have no doubt that many publishers do exactly this. In particular, there is no guarantee that the user is able to see the whole data and no guarantee that it is not modified in some way from the “original”. This is not, per se, reprehensible but it is a strong technical likelihood.
- monitoring: (“snooping”) The provider can monitor all traffic coming in from IP addresses, dwell times, number of revisits, quite apart from any cached information. I believe that a smart webserver, when coupled to other data about individuals, can deduce who the user is, where they are calling from and, with the sale of information between companies, what they have been doing elsewhere.
By default companies will do both of these. They could lead to increased revenue (e.g. Figshare could sell user data to other organizations) and increased lock-in of users. Because Figshare is one of several Digital Science products (DS words, not mine) they could know about a user’s publication record, their altmetric activity, what manuscripts they are writing, what they have submitted to the REF, what they are reading in their browser, etc. I am not asserting this is happening but I have no evidence it is not.
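To make the two technical possibilities concrete, here is a minimal toy sketch in Python. It is purely illustrative: `ToyProvider`, `api_list`, the region labels and the records are all hypothetical names invented for this example, and nothing here describes how Figshare or any real service behaves. It only shows that a provider-side API can return different views to different callers while logging each request, and that neither caller can tell from its own response whether anything was withheld.

```python
from dataclasses import dataclass, field


@dataclass
class ToyProvider:
    """Hypothetical data provider that exposes its records only via an API."""
    records: dict                                    # record id -> content
    hidden_for: dict = field(default_factory=dict)   # region -> ids withheld
    access_log: list = field(default_factory=list)   # (ip, region) per request

    def api_list(self, client_ip: str, region: str) -> dict:
        # Monitoring: every call is recorded on the provider side.
        self.access_log.append((client_ip, region))
        # Control: the provider decides what this caller is allowed to see.
        withheld = self.hidden_for.get(region, set())
        return {rid: body for rid, body in self.records.items()
                if rid not in withheld}


provider = ToyProvider(
    records={"a": "dataset A", "b": "dataset B", "c": "dataset C"},
    hidden_for={"region-2": {"c"}},   # assumption: one record withheld here
)

view_1 = provider.api_list("192.0.2.1", "region-1")
view_2 = provider.api_list("192.0.2.2", "region-2")

# Each caller receives a well-formed answer; only by comparing the two views
# (or against an independent full copy) does the difference become visible.
print(sorted(view_1), sorted(view_2), len(provider.access_log))
```

The point of the sketch is the asymmetry: the access log and the filtering rule live entirely on the provider's side, so the user has to take completeness on trust.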
Mark says, in his slides,
“it is not just about open or closed, it is about control”
and I agree. But for me the question is who controls Figshare? and is Figshare controlling us?
Figshare appears to be one of the less transparent organizations I have encountered. I cannot find a corporate structure, and the company’s address is:
C/o Macmillan Publishers Limited, Brunel Road, Basingstoke, Hampshire, RG21 6XS
I can’t find a board of directors or any advisory or governing board. So in practice Figshare is legally responsible to no-one other than UK corporate law.
You may think I am being unfair to an excellent (and I agree it’s excellent) service. But history inexorably shows that these beginnings become closed, mutating into commercial control and confidentiality. Let’s say Mark moves on? Who runs Figshare then? Or Springer buys Digital Science? What contract has Mark signed with DS? Maybe it binds Figshare to being completely run by the purchaser?
I have additional concerns about the growing influence of DigitalScience products, especially such as ReadCube, which amplify the potential for “snoop and control” – I’ll leave those to another blogpost.
Mark has been good enough to answer some of my original concerns, so here are some others to which I think an “open” (“community-based”) organization should be able to provide answers.
- who owns Figshare?
- who runs Figshare?
- Is there any governance process from outside Macmillan/DS? An advisory board?
- How tightly bound is Figshare into Macmillan/DS? Could Figshare walk away tomorrow?
- What could and what would happen to Figshare if Mark Hahnel left?
- What could and what would happen to Figshare if either/both of Macmillan / DS were acquired?
- Where are the company accounts for the last trading year?
- how, in practice, is Figshare “a community based, open science project that will retain its autonomy whilst receiving support from the (DS) division.”?
I very much hope that the answers will allay any concerns I may have had.
Open Data behind a crippled API is like a patent: yes, it’s open, but you cannot really work with it.
Interesting analogy. But with a patent the knowledge itself is Open/Free. I can collect all the patents and redistribute them and make derivative works. I just cannot make the things the patent describes. Whereas with API access to knowledge you never know what you are missing or how it has been corrupted or manipulated.
I could argue that the web interface is an API too; you can still get the data and have the Open license (for a lot of data anyway; not sure, because I think you can host closed data nowadays on figshare hosting too). Of course, and I quite agree with this, a good API can be so much more helpful. The analogy I was aiming at was that both are, indeed, Open, but actually using them is limited in both cases; though, again, in the two cases this is due to different mechanisms.
[Some markup corrected by PMR]
Hi Peter,
Firstly thank you for your kind words. I know you have long mirrored our vision, to better aid researchers, institutions and publishers to better manage and disseminate their academic research and data. The Figshare team is firmly of the belief that scholars and researchers will benefit enormously from open access to knowledge. Born digital, our business model has always been built on open principles. We are a commercial company with a business model.
Let me address your questions and clear up some confusion.
Figshare LLP is a legal partnership between myself and Macmillan Digital Science Limited. With regard to rumours of a merger or a de-merger, the future position of Digital Science was clearly documented [here by MD Timo Hannay] back in January. You will see that on completion of the necessary regulatory approvals for the business joining the joint venture between parts of Macmillan Science & Education and Springer Science + Business Media, Digital Science will remain wholly owned by its parent company – the global media company Holtzbrinck Publishing Group.
So Figshare has not been “acquired” by Digital Science, but is one of a number of portfolio businesses supported by Digital Science. Digital Science has a long-term vision to help researchers work smarter and discover more, whilst cultivating an environment that encourages independence and responsibility. The team at Digital Science has always provided us with invaluable support in the way of advice, mentorship and finance and has been instrumental in helping Figshare “gain traction so rapidly.”
We are both committed to being an agent of change in academic research. Figshare specialises in the active long-term digital preservation of data for academic research. This is not going to change.
With regards to your concerns on control and monitoring:
“It is about control” – this statement, in context is about ensuring that researchers and their institutions have complete control over how much or how little, or when they want to make their own research open to others. It is not about our control over whether or not they can publish their data. Figshare does not decide what to offer, when and to whom, on what basis.
All content hosted on Figshare can be downloaded by anyone, with no need to log in. The content can be mass downloaded or mined using the Figshare API, also available to anyone at api.figshare.com. This whole conversation came about following a [discussion] with you and your colleagues about best practice when looking to mirror all files and metadata hosted on figshare.
We are pushing for a world where academics can build on top of the cumulative research of those who have gone before. For this we believe that all content and associated metadata should be interpretable by man and machine alike. We will continue to utilize technology and workflow processes, such as the agile methodology currently implemented, in order to make responsible data management and dissemination a natural and unobtrusive part of the research life-cycle. We’re always open to discussion and ideas from the community. We will continue to iterate on our free offering as well as our client products. If something does not exist yet, it is because we haven’t had the time to make it [happen] – we prioritize things based on the community need as a whole and not specific requests of individuals.
Our community is very involved with how we progress and develop Figshare. To date there are 150 Figshare advisors who provide feedback on the development of core functionality – we’d love to have you on board. In the past 2 years, we have begun working with several publishers and more recently, institutions. This is fantastic in terms of sustainability, something that academic repositories have struggled with in general. Obviously, each client has a stake in the development of Figshare, from which viewers to add to the platform next, to specific funder mandate compliance features. A more defined advisory board is something we are looking into at the point that we have a full complement of stakeholders interested. We have spent a lot of our focus on scaling up the product and ensuring the sustainability of the data. A next obvious step is to improve [“transparency docs”] around this. As mentioned above, we appreciate feedback from all areas of academia and we will continue to push people towards open licenses and open knowledge where possible. Thanks again 🙂
Mark,
I don’t know what Peter will say about this, but I think it’s helpful and important information. I’d urge you to add it to the Figshare website as soon as you conveniently can. People need to know this stuff.
On APIs: I think that what’s happened here is that Elsevier’s insistence that their articles may be harvested ONLY via the API has given APIs a bad name among the mining community, Peter included. For myself, I agree that APIs are indeed valuable and important; but for maximum transparency, replicability and disaster-proofness, it’s important to offer bulk download as well.
Anyway, good luck as you push on with this important work. Great that we have people like Peter around to keep us all honest!
>>> On APIs: I think that what’s happened here is that Elsevier’s insistence that their articles may be harvested ONLY via the API has given APIs a bad name among the mining community, Peter included. For myself, I agree that APIs are indeed valuable and important; but for maximum transparency, replicability and disaster-proofness, it’s important to offer bulk download as well.
Open is only Open if it can be forked. The forking may be technically difficult, or may cost money to implement, but as long as it’s reasonably possible that’s fine. Comments from Mendeley seem to suggest that Mendeley is not forkable because they haven’t got the money (I may have misunderstood that). But Mendeley is not Open, never was, and never will be.
I have manually corrected some markup problems in Mark’s reply.
Immediate reactions.
Thank you. This helps.
Note that I found it difficult to find this information on the website. There is no obvious mention of 150 advisers, nor of their names. It’s kind of you to offer an advisory role to me – I will need to think about it. There was no clear indication of the partnership agreement and there is no clear indication of Figshare (or other companies) as “Digital Science Products” – their words. Is Figshare a DS product or not? If it is, then DS has, presumably, some contractual relation to the purchaser. Or is it a franchise of Figshare products through Digital Science?
Figshare and some of the other DS products represent a key part of the academic infrastructure – analogous to a transport or power system. It’s reasonable, I believe, that since much of the money comes from the public purse (directly or indirectly) that we should be assured about the integrity of the service. With the best will in the world assurances from the CEO of a non-publicly-traded company cannot be taken as absolute guarantees. That is why I believe there should be public representation in several parts of the scholarly publishing process.
I am still unclear about the legal status of Figshare. You say “Figshare LLP is a legal partnership between myself and Macmillan Digital Science Limited”. So some more questions:
Are details of this partnership public?
Is this the legal trading entity?
Can either party withdraw from this and under what conditions?
If the partnership is dissolved, who holds the assets, intellectual property and the trading rights?
P.
It would be easy for services like figshare to overcome openness concerns by putting dumps into an openly accessible archive that is controlled by a public body. This is what libraries are made for, and why some of them offer services in digital longterm preservation.
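As a rough illustration of why an independently held dump helps, here is a minimal Python sketch, assuming the archive publishes a manifest of SHA-256 checksums alongside the files. The function names (`build_manifest`, `verify_copy`) and the file names are hypothetical and chosen only for this example; it is not a description of any existing Figshare or library service. With such a manifest, anyone holding a copy can check whether it is complete and unmodified without trusting the party that served it.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of some bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict) -> dict:
    """Map each file name to the checksum of its content."""
    return {name: sha256_hex(content) for name, content in files.items()}


def verify_copy(manifest: dict, copy: dict) -> dict:
    """Compare a copy against the manifest; return name -> problem found."""
    problems = {}
    for name, expected in manifest.items():
        if name not in copy:
            problems[name] = "missing"
        elif sha256_hex(copy[name]) != expected:
            problems[name] = "altered"
    return problems


# Hypothetical dump deposited with a public archive.
original = {"a.csv": b"1,2,3\n", "b.csv": b"4,5,6\n", "c.csv": b"7,8,9\n"}
manifest = build_manifest(original)

# A copy obtained later through some other channel: one file changed, one gone.
later_copy = {"a.csv": b"1,2,3\n", "b.csv": b"4,5,60\n"}

print(verify_copy(manifest, original))
print(verify_copy(manifest, later_copy))
```

The check only works if the manifest itself is held somewhere the service provider does not control, which is exactly the role a public body could play.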
From my point of view, the case of Github is at least as relevant. Several scientists who are genuinely concerned about openness for their work rely heavily on them. With git, so their argument goes, it’s always easy to fork and copy data into other places. But there’s a huge difference between the sheer possibility to do something, and really letting someone (a digital archivist) do it.
As Egon already mentioned, it makes no fundamental difference in regard to openness if a commercially funded entity like figshare offers the data via a (public) API or via a webserver.
Thanks,
>>It would be easy for services like figshare to overcome openness concerns by putting dumps into an openly accessible archive that is controlled by a public body. This is what libraries are made for, and why some of them offer services in digital longterm preservation.
It overcomes some of the concerns of “public Figshare”. It would address in part the “Mendeley fear” of becoming closed. It’s easy to see how Figshare could offer differential services.
But my further concern is about Figshare selling private services to universities to manage public research data – which I believe may be (or is) happening. In that case the governance of the organization doing it matters greatly. I shall return to this.
>>From my point of view, the case of Github is at least as relevant. Several scientists who are genuinely concerned about openness for their work rely heavily on them. With git, so their argument goes, it’s always easy to fork and copy data into other places. But there’s a huge difference between the sheer possibility to do something, and really letting someone (a digital archivist) do it.
If it’s critical enough people will do it – there’s a cost balance. I may be naive, but if Git had succumbed to Cyberattacks last month then we would find ways of recovering.
>>As Egon already mentioned, it makes no fundamental difference in regard to openness if a commercially funded entity like figshare offers the data via a (public) API or via a webserver.
For me Open applies not only to the licence but to the process.