#rds2013: #okfn Content-mining: Europe MUST legitimize it.

I’m on an EC committee looking at how to make content available for mining. (At least I thought that was the point – it seems it isn’t).

“Licences for Europe – A Stakeholder Dialogue”

Working Group 4: Text and Data Mining

Unfortunately I haven’t been able to attend the first meeting as I have been in Australia, but @rmounce has stood in and done a truly exceptional job. The WG is looking at licences and WG4 is on content mining. Ross reported back on Saturday and was disappointed. It seems that WG4 has been told it has no course of action other than to accept that licences are the way forward.

This is unacceptable in a democratic system. It is difficult enough for us volunteers to compete against the rich media and publisher community. If I go to Brussels I have to find the money. These WGs are monthly. That’s a huge personal cost in time and money. The asymmetry of fighting for digital rights is a huge burden. Note also the huge opportunity cost: rather than writing innovative code we have to write letters to Brussels. And that’s what we have done (I’m not on the list of signatories, but I would have been). Here’s our letter.



We write to express our serious and deep-felt concerns in regards to Working Group 4 on text and data mining (TDM). Despite the title, it appears the research and technology communities have been presented not with a stakeholder dialogue, but a process with an already predetermined outcome – namely that additional licensing is the only solution to the problems being faced by those wishing to undertake TDM of content to which they already have lawful access. Such an outcome places European researchers and technology companies at a serious disadvantage compared to those located in the United States and Asia.


The potential of TDM technology is enormous. If encouraged, we believe TDM will within a small number of years be an everyday tool used for the discovery of knowledge, and will create significant benefits for industry, citizens and governments. McKinsey Global Institute reported in 2011[1] that effective use of ‘big data’ in the US healthcare sector could be worth more than US$300 billion a year, two-thirds of which would be in the form of a reduction in national health care expenditure of about 8%. In Europe, the same report estimated that government expenditure could be reduced by €100 billion a year. TDM has already enabled new medical discoveries through linking existing drugs with new medical applications, and uncovering previously unsuspected linkages between proteins, genes, pathways and diseases[2]. A JISC study on TDM found it could reduce “human reading time” by 80%, and could increase efficiencies in managing both small and big data by 50%[3]. However, at present, European researchers and technology companies are mining the web at legal and financial risk, unlike their competitors based in the US, Japan, Israel, Taiwan and South Korea who enjoy a legal limitation and exception for such activities.

Given the life-changing potential of this technology, it is very important that the EU institutions, member state governments, researchers, citizens, publishers and the technology sector are able to discuss freely how Europe can derive the best and most extensive results from TDM technologies. We believe that all parties must agree on a shared priority, with no other preconditions – namely how to create a research environment in Europe with as few barriers as possible, in order to maximise the ability of European research to improve wealth creation and quality of life. Regrettably, the meeting on TDM on 4th February 2013 had not been designed with such a priority in mind. Instead it was made clear that additional relicensing was the only solution under consideration, with all other options deemed to be out of scope. We are of the opinion that this will only raise barriers to the adoption of this technology and make computer-based research in many instances impossible.

We believe that without assurance from the Commission that the following points will be reflected in the proceedings of Working Group 4, there is a strong likelihood that representatives of the European research and technology sectors will not be able to participate in any future meetings:

  1. All evidence, opinions and solutions to facilitate the widest adoption of TDM are given equal weighting, and no solution is ruled to be out of scope from the outset;
  2. All the proceedings and discussions are documented and are made publicly available;
  3. DG Research and Innovation becomes an equal partner in Working Group 4, alongside DGs Connect, Education and Culture, and MARKT – reflecting the importance of the needs of research and the strong overlap with Horizon 2020.

The annex to this letter sets out five important areas (international competitiveness, the value of research to the EU economy, conflict with Horizon 2020, the open web, and the extension of copyright law to cover data and facts) which were raised at the meeting but were effectively dismissed as out of scope. We believe these issues are central to any evidence-based policy formation in this area and must, as outlined above, be discussed and documented.

We would be grateful for your response to the issues raised in this letter at the earliest opportunity and have asked susan.reilly@kb.nl (Ligue des Bibliothèques Européennes de Recherche) to act as a coordinator on behalf of the signatories outlined below.



Sara Kelly, Executive Director, The Coalition for a Digital Economy

Jonathan Gray, Director of Policy and Ideas, The Open Knowledge Foundation

John McNaught, National Centre for Text Mining, University of Manchester

Aleks Tarkowski, Communia

Klaus-Peter Böttger, President, European Bureau of Library Information and Documentation Associations (EBLIDA)

Paul Ayris, President, The Association of European Research Libraries (LIBER)

Brian Hole, CEO, Ubiquity Press Ltd.

David Hammerstein, Trans-Atlantic Consumer Dialogue 


PMR: My colleagues and I are now technically able to mine the scientific literature in vast amounts. #ami2 takes about 2 seconds per page on my laptop. Given 1 year * 10 million papers * 10 pages, that’s 2.0E+8 – 200 million seconds of processing. A year is about 3.2E+7 seconds, so roughly 7 CPUs – a trivial amount – can mine and index this data at the rate it appears – and we get machine-readable tables, graphs, trees, chemistry, maps and masses else. It’s a revolution.
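
For anyone who wants to check the back-of-envelope arithmetic, here is a minimal sketch. The inputs are the rough figures quoted above (2 seconds per page, 10 million papers a year, 10 pages per paper). The function names, and the assumptions that each CPU runs flat out all year and that the work splits perfectly across CPUs, are mine for illustration and not anything measured from #ami2. On those figures the total is 2.0E+8 CPU-seconds, which works out at a little over 6 CPU-years per year of literature, so a single-digit number of CPUs either way.

```python
import math

# Rough figures from the post; all are order-of-magnitude estimates.
SECONDS_PER_PAGE = 2           # approximate #ami2 time per page on a laptop
PAPERS_PER_YEAR = 10_000_000   # assumed annual output of the literature
PAGES_PER_PAPER = 10           # assumed average paper length

# Assumption: a CPU works continuously for a 365-day year.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def cpu_seconds_per_year(papers=PAPERS_PER_YEAR,
                         pages=PAGES_PER_PAPER,
                         seconds_per_page=SECONDS_PER_PAGE):
    """Total CPU-seconds needed to process one year's worth of papers."""
    return papers * pages * seconds_per_page


def cpus_needed(total_cpu_seconds, seconds_available=SECONDS_PER_YEAR):
    """Whole CPUs needed to keep pace, assuming perfect parallelism."""
    return math.ceil(total_cpu_seconds / seconds_available)


if __name__ == "__main__":
    total = cpu_seconds_per_year()
    print(f"CPU-seconds per year of literature: {total:.1e}")
    print(f"CPU-years of work: {total / SECONDS_PER_YEAR:.2f}")
    print(f"CPUs needed to keep pace: {cpus_needed(total)}")
```

Changing any input by a factor of two still leaves the answer at a handful of machines, which is the point: the barrier is not computational.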

I am legally allowed to read these papers.

But if I try to mine them I will be sued.

The planet and humanity desperately need this data. It does not belong to “publishers”. It’s the world’s right to mine this.

