Open Letter to EC Commissioner Carlos Moedas (@Moedas) on Open Science and ContentMining (TDM)

Dear Commissioner Moedas,
I am an academic at the University of Cambridge, UK, determined to see published scientific knowledge brought to citizens. I also run a non-profit which does this technically, by content-mining (TDM) the complete scientific literature for facts.
I was inspired by your speech to EU2016NL yesterday [1], in which you wholeheartedly promoted the Open Science agenda for Europe. I support all of your vision, but wish specifically to urge the unrestrained development of content mining of published science as a key tool. I was delighted to see your praise for the European Bioinformatics Institute. EBI hosts Europe PubMedCentral (EPMC), a collection of the world’s published biomedical literature, and I have worked with them for over 10 years. Here is a short video showing how citizens can extract published factual Open Science on the Zika virus from EPMC in less than 5 minutes [2].
It is critical to reform copyright law in Europe. Reform must go beyond the UK’s 2014 “Hargreaves” legislation (“personal non-commercial use”). I am probably one of only two UK groups using this exception, because it is heavily weighted against us. It depends on universities allowing their staff to mine without explicit publisher permission. My anecdotal evidence is that many libraries give in to publishers, sign restrictive contracts and regulate academic access [3], thereby negating the law.
Publishing the results is then a problem, as this may break copyright. Hargreaves allows freedom of quotation, but this is untested in court. In short, we must have legal clarity.
Changing the law is not enough; we must change hearts and minds. Not enough academics actively work with citizens, and it is critical that science is equally available to conservationists, doctors, policy makers, schools, patient groups and others. This access must not be controlled, however lightly, by the current publishers. Please find ways of actively involving citizens outside academia.
There has been massive lobbying by “rightsholders” against content-mining reform. This includes spreading fear, uncertainty and doubt (FUD): that (a) mining will break servers [4]; (b) there is no demand [5]; (c) you need publisher APIs [6]; (d) only experts can do it [7]; and (e) we will use it to steal content [8]. This is an asymmetric battle. I have watched the lobbyists spend millions lobbying to water down Julia Reda’s European Parliament proposals, and to dilute and delay any reform from the Commission. To redress the balance, I offer to come to Brussels and demonstrate on my (or your) laptop the value of ContentMining (TDM) for Open Science.
Peter Murray-Rust
Reader Emeritus, University of Cambridge
[2] This video was shot in real time (5 minutes), demonstrating that any citizen can access knowledge on that timescale.
[3] A Dutch statistician (Chris Hartgerink) was mining the literature to detect scientific malpractice, and both Wiley and Elsevier wrote to the University of Tilburg (NL) to get his research stopped. The University complied with the publishers without any public comment.
[4] In Cambridge I can mine the whole daily scientific literature on my laptop in an hour. This is probably less than one millionth of the daily accesses made by other subscribers. And if there is a trusted cache, as suggested in the recent French proposals, then there is no problem of overload.
[5] Publishers have made this so difficult that no one asks (where I chronicle five wasted years trying to get anything from Elsevier).
[6] Our software can scrape publisher sites directly. Moreover, without external regulation I don’t trust any company to respect my privacy, nor to control the view of the literature presented through an API.
[7] We shall make our (Open) software available to MEP Julia Reda, and we would be delighted if you and other Commission staff used it to see how easy it is.
[8] I am a responsible citizen and have no intention of making copyrighted content available illegally. I coined the slogan “The Right to Read is the Right to Mine”. Yet I and others have been branded as potential thieves.

