petermr's blog

A Scientist and the Web


My talk on #openaccess at University of Leicester 2014-04-04:1300 UTC

April 3rd, 2014

I’m talking tomorrow at the University of Leicester in the heart of England. Leicester is where Richard III was recently dug up in a car park. But more importantly it’s an excellent university and I have worked with its biomedical groups in the past. It’s where DNA fingerprinting was discovered by Alec Jeffreys.

Here’s our session

The international Open Access movement has been embraced by UK government and funding councils (including new HEFCE requirements) for both publications and now the data behind them. Come and meet some OA enthusiasts from within the University and influential expert visitors and find out how Open Access might benefit you personally. OA is a great way of increasing the visibility of your research which can lead to new collaborations and impacts.

You’ll be able to follow remotely and also tweet – if you have a good question, tweet it.

I don’t know what I shall talk about.

What? PMR doesn’t prepare his talks?

I don’t even know how much I shall talk and how much the others there will talk. I’ve got things at the top of my mind and so have they. If there is a small group (say fewer than 20) I like to rearrange the seats into a circle. But if it’s a tiered theatre then it becomes more of a lecture or Q and A.

I do prepare my talks. I have slides, blogposts, code, repositories. And I will flip between them as needed. I give demos. Most demos work; some don’t. I’m happy now with eduroam – the academic network – so I don’t have to go through tedious registration. (One US university – Penn State – made us install keyboard logging software to track everything we did).

I’ve asked whether this can be recorded – and it will be! When sessions aren’t based on linear slides (which I don’t normally like) it’s useful to know what I and others said and to be able to present it to others.

Michelle Brook won’t be physically there but we’re hoping to get a Skype connection so she can talk us through some of the WellcomeTrust APC data. (below).

It’s always tremendous to have a hashtag, and also to storify it. I’ll tweet it when we know.

So… what’s at the top of my mind?

  • What is Open? Open is a state of mind, not a process. Is Open Access Open? Why are we spending billions of dollars on Open Access – the answer is not trivial.
  • The Wellcome Trust APC data set. This is massively important. WT, Michelle and the community have made something wonderful. It’s something that libraries should start broadening asap. It could be the start of Open Bibliography – the “map of scholarship”. Probably the main focus of the session.
  • The Hargreaves review of copyright. Again massively important. Every library in UK should be actively preparing to take action NOW. And do not sign any agreements with any publisher till you understand this.
  • Aaron Swartz. Are you fighting for what he believed in?
  • Content Mining and the hidden wealth in University repositories. That was my initial title for this session, but events have overtaken us. I’m still hoping to have time to show machines mining content.
  • The rise of the anti-publisher. Elsevier and NPG have recently made it clear they are gearing up to restrict access to knowledge. The ACS cuts off whole Universities. Many publishers market CC-NC-ND as “what authors want”. Publishers – with their restrictions and “Universal Access” – are now between Bradbury’s firemen and Orwell’s Ministry of Truth. Where do Universities and Libraries stand? Are they working for the readers of the world or – implicitly – for publishers?
  • The future of the scientific reference is not in libraries or publishers but in the new generation – Wikipedia (where I’m speaking), Mozilla, and hopefully our own Content Mine. Now is the time to jump on the bus or you will miss the Digital Century.
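The Wellcome Trust APC spreadsheet invites exactly this kind of scrutiny. Here is a minimal sketch of the sort of per-publisher aggregation the community has been doing with it – assuming the spreadsheet has been reduced to (publisher, cost) rows; the real column names differ and the figures below are invented for illustration:

```python
from collections import defaultdict

# Toy rows standing in for the Wellcome Trust APC spreadsheet:
# (publisher, APC paid in GBP). The real dataset has ~2200 rows;
# these values are illustrative only.
payments = [
    ("Elsevier", 2100.00),
    ("Elsevier", 2300.00),
    ("ACS", 666.00),
    ("PLOS", 825.00),
]

def summarise(rows):
    """Count, total and mean APC per publisher."""
    by_publisher = defaultdict(list)
    for publisher, cost in rows:
        by_publisher[publisher].append(cost)
    return {
        pub: {"n": len(costs), "total": sum(costs), "mean": sum(costs) / len(costs)}
        for pub, costs in by_publisher.items()
    }

summary = summarise(payments)
for pub, s in sorted(summary.items(), key=lambda kv: -kv[1]["total"]):
    print(f"{pub}: {s['n']} papers, {s['total']:.2f} GBP total, {s['mean']:.2f} mean")
```

Even this trivial grouping – which any library could run over the real data – is the seed of the “map of scholarship”: who is being paid, how much, and for what.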

And what’s at the top of yours? I’m guessing…

  • How are we going to force our authors to make their publications Open Access so we can put them in the next REF?

No. It should be:

  • What is the role of the University Library in the Digital Century?

Let’s go back to Ranganathan….


I hope you are kidding. Every librarian should recite the laws till they know them by heart:

  1. Books are for use.
  2. Every reader his [or her] book.
  3. Every book its reader.
  4. Save the time of the reader.
  5. The library is a growing organism.

For “Book” read “Knowledge”. In the digital read-write age we should add some more:

  • save the time of the author
  • everyone in the world is both an author and a reader

However, with the rise of the AntiPublisher we have new laws. Four years ago I wrote Nahtanagran’s laws of modern library science:

  1. Books are for selling.
  2. Every purchaser his [or her] books.
  3. Every book its purchasers.
  4. Make money for the seller.
  5. The seller is a growing organism.

Has it got better in the last four years?

  • No. It’s got worse.

Is Open Access as currently envisaged going to make it better or worse? Think before you answer. And we’ll discuss tomorrow.




ACSGate: Pandora opens the American Chemical Society’s box and her University gets cut off

April 2nd, 2014


Pandora is a researcher (I won’t say where, or when). I don’t know her field – she may be a scientist or a librarian. She has been scanning the spreadsheet of Open Access publications paid for by the Wellcome Trust. It lists 2200 papers for which Wellcome has paid 3 million GBP, for the sole purpose of making them available to everyone in the world.

She found a paper in the journal Biochemistry (an American Chemical Society publication) and looked at the HTML version – that worked OK. She then checked whether the PDF was available – yes, that worked too.

What else can we download? After all, this is Open Access, isn’t it? And Wellcome has paid 666 GBP for this “hybrid” version (i.e. the ACS gets subscription income as well), so we aren’t going to break any laws…

The text contains various other links and our researcher follows some of them. Remember she’s a scientist, and scientists are curious. It’s their job. She finds:
<span id="hide"><a href="/doi/pdf/10.1046/9999-9999.99999">
<!-- Spider trap link --></a></span>
Since it's a bioscience paper she assumes it's about spiders and how to trap them. 

She clicks it. Pandora opens the box...

The whole university was immediately cut off from the whole of ACS publications. “Thank you”, ACS.

The ACS is stopping people spidering their site – EVEN FOR OPEN ACCESS. It wasn’t a biological spider. It was a web trap, built on the assumption that readers are, in some way, basically evil.

Now *I* have seen this message before. About seven years ago one of my graduate students was browsing 20 publications from ACS to create a vocabulary. Suddenly we were cut off with this awful message. Dead. The whole of Cambridge University. I felt really awful. I had committed a crime. And we hadn’t done anything wrong. Nor has my correspondent.

If you create Open Access publications you expect – even hope – that people will dig into them. So, ACS, remove your spider traps. We really are in Orwellian territory, where the point of Publishers is to stop people reading science.

I think we are close to the tipping point where publishers have no value except to their shareholders and a sick, broken vision of what academia is about.
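The irony is that the trap in the snippet above is mechanically detectable: the link sits inside a <span id="hide"> element, so no human reading the rendered page would ever see it. A minimal sketch (Python standard library only; the id="hide" pattern is taken from the quoted markup, not from any published ACS documentation) of how a mining tool could flag such hidden links instead of following them:

```python
from html.parser import HTMLParser

# The bogus DOI seen in the quoted markup
TRAP_DOI = "10.1046/9999-9999.99999"

class TrapLinkFinder(HTMLParser):
    """Flag links hidden inside <span id="hide"> rather than follow them."""
    def __init__(self):
        super().__init__()
        self.in_hidden_span = False
        self.trap_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span" and attrs.get("id") == "hide":
            self.in_hidden_span = True
        elif tag == "a" and self.in_hidden_span:
            self.trap_links.append(attrs.get("href"))

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_hidden_span = False

# The markup quoted above, verbatim
page = ('<span id="hide"><a href="/doi/pdf/10.1046/9999-9999.99999">'
        '<!-- Spider trap link --></a></span>')
finder = TrapLinkFinder()
finder.feed(page)
print(finder.trap_links)  # ['/doi/pdf/10.1046/9999-9999.99999']
```

Of course researchers browsing by hand get no such protection – which is exactly the problem.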

See comment from Ross Mounce:

The society (closed access) journal ‘Copeia’ also has these spider trap links in its HTML, e.g. on this contents page:

you can find

<span id="hide"><a href="/doi/pdf/10.1046/9999-9999.99999">
<!-- Spider trap link --></a></span>


I may have accidentally cut off access for all at the Natural History Museum, London once when I innocently tried this link, out of curiosity. Why do publishers ‘booby-trap’ their websites? Don’t they know us researchers are an inquisitive bunch? I’d be very interested to read a PDF that has a 9999-9999.99999 DOI string, if only to see what it contained – they can’t rationally justify cutting off access to everyone just because ONE person clicked an interesting link?

PMR: Note – it’s the SAME link as the ACS uses. So I surmise that both societies outsource their web pages to some third-party hackshop. Maybe 10.1046 is a universal anti-publisher.

PMR: It's incredibly irresponsible to leave spider traps in HTML. It's a human reaction to explore. 

Lib Dem MEP thinks Net Neutrality will be safe in Europe

April 2nd, 2014

This is an example of web democracy in action. I mailed my MEPs yesterday about Net Neutrality and here’s a detailed useful reply. It’s clear my representatives understand my concerns. This is what an Open Neutral Web allows.

Dear Mr Murray-Rust,

Thank you for emailing Andrew regarding the legislative proposal on the Telecoms Single Market and its net neutrality provisions in particular.

It is important to stress that the new legislative proposal covers a wide range of issues. It seeks to abolish roaming charges and improve rights for both users and service providers, as well as strengthening net neutrality in order to achieve a truly open internet for all.

Andrew has been in close contact with his colleagues Fiona Hall MEP and Jens Rohde MEP, who is leading on this issue for the Alliance of Liberals and Democrats in Europe (ALDE), the political group that Lib Dem MEPs belong to in the European Parliament.  Mr. Rohde is aware of concerns about a two-tiered internet and discriminatory agreements between access providers and content providers. Mr Rohde and Andrew are strong supporters of an open internet and are determined to make sure this openness is maintained.

Andrew also met recently with a net neutrality campaigner and tech professional in Cambridge to discuss some of the subtler aspects of the package and agrees with their view that the draft report to the ITRE committee must be strengthened.

Work is ongoing on a number of points but it is already clear that the final text will include a robust definition of net neutrality and will have strong language on disallowing any blocking or throttling by internet services providers, as well as ruling out any discrimination against online content or applications. The Parliament’s text will also stipulate that “specialised services” can only be provided when there is network capacity to do so and when such services are not to the detriment of general internet access. These safeguards are essential to ensure an open internet across Europe.

The ALDE group has been working on compromise amendments to strengthen these passages in the legislation and these efforts will continue into the final plenary session in Strasbourg later this year.

For your information you may wish to consult the following links:

The procedure file:

The main criticisms:

A response from the Commission:

I should emphasise that we are not taking all of the Commission’s points at face value but investigating and challenging them. When the Commission has got it wrong in the past, which is not unprecedented – particularly with regard to legislation that pertains to the internet – the Liberal group has not hesitated to vote to reject the report in its entirety. For example, we saw to it that the ACTA package was junked.

In conclusion, Andrew will be strongly supporting the principles of an open internet and net neutrality and will seek to ensure that specific amendments to the final report are passed that achieve these aims.

Thank you again for contacting Andrew about this matter. I hope this reply is of some help.

Yours sincerely,

Kilian Bourke
Caseworker to
Andrew Duff
Liberal Democrat MEP for the East of England

Our planet is dying and publisher Copyright greed is helping to kill it. Why aren’t you Angry?

April 2nd, 2014

We are all aware (or we should be) of the dire effects of human-made climate change.

Two days ago the IPCC issued the starkest warning yet.

Climate change is not something that is going to happen in the future – it is happening now.


James Lovelock, inventor of the electron capture detector used in gas chromatography, and known for the Gaia metaphor, told the world that humans will have to live in mega-city hives.

So how do we find out about this in a responsible way?

We read the IPCC report.

The Intergovernmental Panel on Climate Change (IPCC) is the leading international body for the assessment of climate change. It was established by the United Nations Environment Programme (UNEP) and the World Meteorological Organization (WMO) in 1988 to provide the world with a clear scientific view on the current state of knowledge in climate change and its potential environmental and socio-economic impacts. In the same year, the UN General Assembly endorsed the action by WMO and UNEP in jointly establishing the IPCC.

and tell others about it?

NO. You can’t tell others about it without the permission of IPCC. (The restrictions are at best CC-NC-ND).

Unless otherwise stated, the information available on this website, including text, logos, graphics, maps, images, audio clips or electronic downloads is the property of the IPCC and is protected by intellectual and industrial property laws.

You may freely download and copy the material contained on this website for your personal, non-commercial use, without any right to resell or redistribute it or to compile or create derivative works there from, subject to more specific restrictions that may apply to specific materials.

Reproduction of limited number of figures or short excerpts of IPCC material is authorized free of charge and without formal written permission provided that the original source is properly acknowledged, with mention of the complete name of the report, the publisher and the numbering of the page(s) or the figure(s). Permission can only be granted to use the material exactly as it is in the report. Please be aware that figures cannot be altered in any way, including the full legend. For media use it is sufficient to cite the source while using the original graphic or figure. In line with established Internet usage, any external website may provide a hyperlink to the IPCC website or to any of its pages without requesting permission.

For any other use, permission is required. To obtain permission, please address your request to the Secretary of the IPCC in a signed letter with all relevant details using official letterhead and fax it to: +41 22 730 8025.

THE PLANET IS DYING. And we have to write (with pen and ink) on official letterhead (this rules out most of the citizens on the planet) to try to stop it. This is Laputa in the Digital Century.

It gets worse. There are thousands of references in the report. Many are in scholarly articles behind paywalls – 40 USD for one read for one day. It would probably cost the average citizen (citizens are the people who will die) 50,000 USD to read all the referenced papers.

I wrote about this six months ago:

I came up with a suggestion then – one that any citizen could help with. Almost complete silence. I’m going to try again. And again.

I have said this before:


And now we can see that they are actually dying.

Inaction is nearly as dangerous as publisher greed.




I wrote to my MEPs to preserve Net Neutrality; You can too, it’s easy

April 1st, 2014

Net Neutrality is a battle we must win or face Digital darkness for decades or forever. Europe votes this week – YOU must tell your MEPs to vote. Here’s what I wrote; it took me 10 minutes. (I didn’t have to look up the names of the MEPs or type them in – the site does all of that.)

Dear Andrew Duff, David Campbell Bannerman, Geoffrey Van Orden, Vicky Ford, Richard Howitt, Stuart Agnew and Robert Sturdy,

Dear MEPs,
I’m writing to ask you to vote for “Net Neutrality” in the forthcoming Commission’s Telecoms Package proposal. I am a member of OpenForum Europe – a body who for many years has campaigned for clarity and Openness in IT issues in Europe. Recently we wrote to you with our arguments (copied on my blog).

I am an academic, and proud to be in Cambridge and the Eastern region which is one of the top innovation areas of the world. A free and open internet allows UK ideas and people to thrive. It was Tim Berners-Lee who created the idea of the World Wide Web on which so much has happened. It creates new businesses, new ideas, new science, challenges orthodoxy and even lets me write to you!

From our letter:

Brazil is successfully pushing its own ‘net neutrality’ law through the legislative process and it is a question of time when other countries will follow.

“The moment you let neutrality go, you lose the web as it is. You lose something essential – the fact that any innovator can dream up an idea and set up a website at some place and let it just take off from word of mouth”, said Tim Berners-Lee, the inventor of the World Wide Web.

Please take the time and interest to consider what is at stake. There is still a possibility to correct this shortcoming and introduce a text that truly safeguards the net neutrality in the EU.

Yours sincerely,

Peter Murray-Rust

Now go out and do it. Tomorrow is too late.

Scholarly Soup Kitchen welcomes new HEFCE OpenAccess repository and Hargreaves Copyright Reforms

April 1st, 2014

I’m thrilled by the following news which appeared early today. For those who don’t know the publishing industry well, The Scholarly Soup Kitchen is seen as one of the key commenters on all things scholarly and is widely read for its perspective on innovation and independence from conventional power blocs. Its CEO Ant Kenderson commented on the recent HEFCE proposals for Open Access repositories:

Yesterday the UK’s HEFCE made a tremendous stride forward when it mandated that all evaluation of its academics should take place through Open Access papers in a repository. At SSK we have consistently argued that all publication should be open and available to the whole world.  We feel that the vast fees demanded by so-called “Glamour mags” are outrageous and we should strive for zero-cost of access. The main point of the new repository is to promote the outcome of UK scholarship to the whole world for free – we hope that other countries follow suit. arXiv can publish for as little as 7 USD, and HEFCE is following suit.

We have a few minor quibbles. HEFCE still allows infinitely long embargoes and we are working with them to remove this clause. No modern publisher likes embargoes as it means people can’t read papers, and that’s the whole point of publishing, isn’t it. Also HEFCE seems to allow authors to ignore the mandate. Effectively they say “if you want to publish in journal X and it doesn’t allow for deposition in the repo then find a more suitable journal. But if you can’t find one, don’t worry we shan’t enforce the mandate”. That’s wrong and we at SSK believe that all authors should be forcibly persuaded to comply.

The other exciting news of the week is that the UK will reform copyright by 2014-06-01.   This will allow text and data mining without permission, format shifting and much more. We at SSK have always felt that copyright stifles innovation and so we welcome Hargreaves. We’re sad that it doesn’t go far enough and we at SSK have always pressed for the removal of the “non-commercial” clause. We support “the right to read is the right to mine” and want it to become universal. We’re working to persuade publishers to change their ideas and welcome open content mining.

We are delighted by the lifting of copyright on parody. Everyone should have the right to poke fun at pompous or out-of-date people and institutions. Parody, like Swift or Orwell, can change our values and liberate basic human values. Swift parodied the publishing industry which, in his time, did not publish but left all the work to others and simply added a “factor d’impact” designed by Queen Anne. Orwell lambasted the Departments of Openness and Truth, which all major publishers implemented and whose role was to create barriers for readers (“murs de payment”) and browbeat authors and reviewers into slavery for the industry.

Thank goodness those days are gone.

The full text of  Kenderson’s post can be read here.



Write to your MEPs to vote to safeguard Open Internet in Europe

March 31st, 2014

I am proud to be a Fellow of the OpenForumAcademy – which promotes openness in IT standards and procurement. We are very concerned about the pressures leading to two-tier (or many-tier) Internet access and we urge “Net Neutrality”. Read this and then write to your MEP.

Don’t know who s/he is? Or how to write?

It’s simple in the UK. Go to the site – it will tell you everything. Don’t just copy the letter below – make it a bit personal:

  • About how a free Internet generates wealth for your region.
  • About how it encourages your constituents to keep in touch with MEPs
  • About the ability to share culture across Europe

You get the idea? Now tell them how to vote.

From: Maël Brunet <>
Date: 31 March 2014 12:33
Subject: A chance to safeguard the Open Internet in Europe
To:


Dear Member of the European Parliament,

On April 3rd, you will have the opportunity to vote on the Commission’s Telecoms Package proposal. As you are surely aware, ITRE committee adopted on March 18th its report with proposed amendments for the EP. We are disappointed with the final outcome of this vote that we believe is detrimental to an open Internet and would like to take this opportunity to address this issue with you.

We are an independent, not-for-profit industry organisation that aims at promoting open and competitive ICT market. As such, we would like to draw your attention to the vague definition of ‘specialised services’ as adopted by the ITRE members in the aforementioned report. We believe that this is a dangerous loophole. In fact, this provision opens a space to use these services for exploiting the Internet in a way that is deeply detrimental to innovation and the EU citizens as the end users.

We fear that the wording as it stands would allow Internet Service Providers (ISPs) to prioritise content/application providers that can comply with the financial conditions of the ISPs. This would undoubtedly lead to service monopolies, hindering the competition as a direct consequence. In addition, the ISPs would lose any incentives to invest in the open Internet and the services thereof would slowly deteriorate. Moreover, the end-users would be trapped to use and access only services, contents and/or applications of providers that can pay a prioritised accessibility under this ‘specialised services’ loophole provision. As research indicates, we need to guarantee that investment continues to be made in the ‘open’ part of the network in order to avoid a ‘dirt road’ effect whereby ‘specialised services’ would become the norm rather than the exception.

The success of the global Internet and the World Wide Web has been built on the sole concept of openness, with access being guaranteed to all without favour to any individual, organisation or commercial company. This would not be the case any more, should the definition of ‘specialised services’ be maintained in the text as recommended in the report. We urge you not to miss this opportunity and use your mandate to ensure the full impact of advances to innovation that are introduced by the package. In this regard, we strongly welcome and support the alternative amendments to the regulation bill proposed by the ALDE, S&D, Greens/ALE and the GUE/NGL groups. Europe is at a crossroad and needs to decide whether it will maintain a leadership position in the digital age. In this very moment, Brazil is successfully pushing its own ‘net neutrality’ law through the legislative process and it is a question of time when other countries will follow.

“The moment you let neutrality go, you lose the web as it is. You lose something essential – the fact that any innovator can dream up an idea and set up a website at some place and let it just take off from word of mouth”said Tim Berners-Lee, the inventor of the World Wide Web.

Please take the time and interest to consider what is at stake. There is still a possibility to correct this shortcoming and introduce a text that truly safeguards the net neutrality in the EU.

Yours sincerely, 

Maël Brunet (Mr)

Director, European Policy & Government Relations
OpenForum Europe

UK Copyright reforms set to become Law: Content-mining, parody and much more

March 30th, 2014

I have been so busy over the last few days, and the world has changed so much, that I haven’t managed to blog one of the most significant pieces of news – the UK government has tabled its final draft of the review of copyright (the announcement is reproduced below).

This is fantastic. It is set to transform scientific knowledge. It means that scientific Facts can be extracted and published without explicit permission. The new law will give us that. I’m going to comment in detail on the content-mining legislation, but first a few important general comments:

  • UK is among the world leaders here. I understand Ireland is following, and the EU process will certainly be informed by UK. Let’s make it work so well and so valuably that it will transform the whole world.
  • This draft still has to be ratified before it becomes law on June 1st. It’s very likely to happen but could be derailed by (a) Cameron deciding to go to war (b) the LibDems split from the government (c) freak storms destroy Parliament (d) content-holder lobbyists kill the bill in underhand ways.
  • It’s not just about content-mining. It’s about copying for private re-use (e.g. CD to memory stick), and parody. Reading the list of new exceptions makes you realise how restrictive the law had become. Queen Anne in 1710 didn’t even consider format shifting between technologies. And e-books for disabled people??

So here’s guidance for the main issues in simple language:

and here are the details (I’ll be analysing the “data analytics” in detail in a later post):

And here’s the initial announcement – includes URLs to the IPO and government pages.

From: CopyrightConsultation
Sent: 27 March 2014 15:06
To: CopyrightConsultation
Subject: Exceptions to copyright law – Update following Technical Review

The Government has today laid before Parliament the final draft of the Exceptions to Copyright regulations. This is an important step forward in the Government’s plan to modernise copyright for the digital age. I wanted to take this opportunity to thank you for your response to the technical review and to tell you about the outcome of this process and documents that have been published.

As you will recall, the technical review ran from June to September 2013 and you were invited to review the draft legislation at an early stage and to provide comments on whether it achieved the policy objectives, as set out in Modernising Copyright in December 2012.

We found the technical review to be a particularly valuable process. Over 140 organisations and individuals made submissions and we engaged with a wide range of stakeholders before and after the formal consultation period. The team at the IPO have also worked closely with Government and Parliamentary lawyers to finalise the regulations.

No policy changes have been made, but as a result of this process we have made several alterations to the format and drafting of the legislation. To explain these changes, and the thinking behind them, the Government has published its response to the technical review alongside the regulations. This document sets out the issues that were raised by you and others, the response and highlights where amendments have been made.

It is common practice for related regulations such as these to be brought forward as a single statutory instrument. However, the Government is committed to enabling the greatest possible scrutiny of these changes and the nine regulations have been laid before parliament in five groups.  In deciding how to group the regulations, we have taken account of several factors, including any relevant legal interconnections and common themes. The rationale behind these groupings is set out in the Explanatory Memorandum.

The Government has also produced a set of eight ‘plain English’ guides that explain what the changes mean for different sectors. The guides explain the nature of these changes to copyright law and answer key questions, including many raised during the Government’s consultation process.  The guides cover areas including disability groups, teachers, researchers, librarians, creators, rights-holders and consumers. They also explain what users can and cannot do with copyright material.

The response to the Technical Review and the guidance can be accessed through the IPO’s website: <>.  This also provides links to the final draft regulations, explanatory memorandum and associated documents that appear on<>.

It is now for Parliament to consider the regulations, which will be subject to affirmative resolution in both Houses. If Parliament approves the regulations they will come into force on 1 June 2014.

Thank you again for your contribution.

Yours sincerely,

John Alty

Elseviergate today: LIBER says to Libraries: DONT sign Elsevier’s click-through licence for Content Mining (TDM)

March 29th, 2014

A month or so ago Elsevier published a “click-through” licence “allowing” researchers to use Elsevier content for Text-and-Data-Mining (TDM) – more widely, content mining. Nature News rejoiced and suggested everyone could start mining. I read the licence carefully and wrote several blog posts showing the great danger of anyone signing it. Effectively: DONT.

LIBER, the European association of research libraries, flagged these and said it would do a thorough analysis, which has now been published. I’ll show most of this below with my comments. It’s necessarily long, so, to summarise:


Other publishers and publisher syndication – e.g. DOI resolvers – may develop their own TDM “licences”


So here’s why (summarised)

  • The licences add additional restrictions and no freedoms
  • Researchers could find themselves in legal trouble
  • Libraries could find themselves in trouble
  • Legislation is coming in the UK and elsewhere which renders these licences unnecessary. You will simply be signing away your rights
  • Publishers’ APIs are worse than the standard access to research papers
  • You do NOT need publishers’ software. There is better software that is free and Open.

So, if an Elsevier rep approaches you with a shiny new contract with a TDM clause, strike it out. YOU have the power. Tell the world.

Now the TL;DR bit. I reproduce much of LIBER and comment.

LIBER believes that the right to read is the right to mine and that licensing will never bridge the gap in the current copyright framework as it is unscalable and resource intensive. Furthermore, as this discussion paper highlights, licensing has the potential to limit the innovative potential of digital research methods by:

  1. restricting the tools that researchers can use
  2. limiting the way in which research results can be made available
  3. impacting on the transparency and reproducibility of research results.

The full text of the discussion paper is included below or can be downloaded here.

PMR: Yes. LIBER and many others (JISC, BL, etc.) walked out of the attempt to force licences on us. My highlighting 

Over the last twelve months LIBER has devoted a considerable amount of effort to making the case for the need for changes to copyright legislation in order to allow researchers to employ digital research methods to extract facts and data from content. We believe that this will exponentially speed up scientific progress and innovation in Europe. Having explored the issue of TDM with our members and other stakeholders in the research community we have come to the conclusion that licensing will never bridge the gap in the current copyright framework as it is unscalable and resource intensive.

In the current vacuum left by a legal framework that is unfit for the digital age, and with the ensuing lack of legal clarity, it is unavoidable that libraries or researchers will have to agree to further licences for the mining of content to which they already have access. The terms of such licences, however, should be such that they reinforce the position that the right to read is the right to mine, and not impose restrictions on how researchers apply research methods or disseminate their research.

UK members should exercise particular caution when considering TDM licence terms, since an exception in UK law for text and data mining is imminent and, dependent on the wording in this new exception, TDM licence terms may undermine what researchers will be permitted to do under this update to UK copyright law. Ireland is also considering such an exception.

PMR: This has now been tabled (I shall blog it) and is substantially what has been drafted for the last year. It gives all the rights we felt we could ask for. Signing Elsevier’s contract or any other contract will simply restrict your rights.

This paper has been released in response to the recent launch of the new Elsevier text and data mining policy and API. It is understood that Science Direct licences will be amended to include language around access for TDM. Many libraries may be considering signing, or have even already signed up to the terms and conditions laid out under this new licence.

PMR: DON’T sign. Much of what libraries have signed has restricted scholarship for no gain. STOP HERE.

Other publishers may also be considering following in the footsteps of Elsevier by introducing similar terms for the licensing of text and data mining activities into their licence agreements. LIBER is concerned that some of the licence’s terms and conditions relating to content mining may be unnecessarily restrictive and that systematic and widespread adoption of such terms and conditions will severely hamper the progress and dissemination of data-driven research.


The institutional licence agreement for text and data mining

In order for a researcher within a subscribing institution to gain access to Elsevier content for the purpose of mining, it is necessary for the institution to update their licence agreement to allow text mining access. Note that within this agreement “text mining access” does not mean access to the content on the Elsevier Website that universities subscribe to. Access to content for the purpose of mining is limited to access via an API. The licence explicitly prohibits the use of robots, spiders, crawlers or other automated programs, or algorithms to download content from the website itself, which are the most common ways of performing content mining. Although the new Elsevier policy claims that it “enshrines text- and data-mining rights” in subscription agreements, in reality, under these terms, it compels institutions to agree to very restrictive conditions in order to gain very narrowly defined “access” to content for the purpose of mining.

PMR: Elsevier’s API is constructed solely to reduce the view of the content, control the way it is accessed and monitor what is done. It is not necessary and serves no beneficial purpose. (PLoS and BMC provide all that is necessary without APIs).

Access via an API

An application program interface (API) is a set of programming instructions and standards for accessing a web-based software application. In the case of the API offered by Elsevier, the API provides full-text content in XML and plain-text formats. The use of APIs for the mining of metadata is not uncommon. However, article content is much richer, potentially containing images, figures, interactive content, and videos. For researchers in many different disciplines there is as much value in the images and figures contained in the article as there is in the text. In fact, for researchers in disciplines such as the humanities, genetics, and chemistry, these may be the most valuable content elements. The Elsevier API allows access to the text only. And the access limit is an arbitrary and proportionally tiny 10,000 articles per week.

PMR: In the ContentMine we are already extracting data from images and expect to handle millions of figures a year.
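To see how restrictive a 10,000-articles-per-week cap is, a back-of-envelope calculation helps. The corpus size below is an illustrative assumption for a ScienceDirect-scale archive, not an official Elsevier figure:

```python
# Rough sketch: how long would the weekly API cap take to cover a large
# corpus? The corpus size is an assumption for illustration only.
CAP_PER_WEEK = 10_000
corpus_size = 12_000_000  # assumed ScienceDirect-scale corpus (illustrative)

weeks = corpus_size / CAP_PER_WEEK
print(f"{weeks:.0f} weeks, about {weeks / 52:.0f} years")  # 1200 weeks, about 23 years
```

Even under a generous corpus estimate, a single researcher mining at the capped rate would need decades to traverse the literature.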

Crucially, researchers develop their own tools for handling and exploiting this rich and diverse variety of content and formats. In order for students and academics to be able to perform research freely, in the way that makes sense for their own studies, they must have the freedom to interrogate, query and structure content in ways that fit with their own needs, technologies and requirements. The requirement to use pre-defined publisher technologies hampers academic freedom, learning, and data driven innovation.

PMR: Innovation is critical. Publishers have failed to innovate and held back innovation. We are innovating.

Even for those researchers for whom the API is sufficient, the licence does not guarantee sustained access to the API, as the following clause indicates:

3.4 Elsevier reserves the right to block, change, suspend, remove or disable access to the APIs and any of its services at any time.

PMR: Were you pleased when Elsevier or Nature tightened their policies on Green OA recently? They can do that on TDM.

Use of robots

The Elsevier policy expressly forbids the use of robots for content mining on the grounds that it would place too much strain on their infrastructure. Open access publishers, whose infrastructure is exposed to all web users on the open web, have reported that the demand placed on their infrastructure by robots for content mining is negligible and any increase in demand will be easy to manage. For subscription services such as those provided by Elsevier, the demand placed on their infrastructure should be even less, as only users registered at subscribing institutions will have access.

PMR: I can mine the whole literature on my laptop. That’s probably 0.00001% of daily usage. If that crashes Elsevier they shouldn’t be in the business. This argument is FUD.

Control of outputs

Under the terms and conditions of the updated licence agreement the outputs are controlled in the following ways:

1.    Outputs can contain “snippets” of up to 200 characters of the original text

This is an arbitrary limit. Because this is essentially a limit on the amount of text that can be quoted from the original source, it could potentially result in misquotation or, at the very least, an inaccurate representation of the original research.

PMR: Some chemical names are > 200 characters. Truncating these could KILL PEOPLE.
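A minimal sketch of the problem: systematic (IUPAC-style) chemical names can exceed 200 characters, and a truncated name denotes a different compound, or none at all. The name below is synthetic, constructed purely to exceed the limit:

```python
# Illustration: a 200-character "snippet" cap applied to a long systematic
# name. The name is a synthetic example, not a real compound.
name = "2,3-dihydroxy-" * 20 + "propanoic acid"  # 294 characters
snippet = name[:200]  # what the licence would permit

assert len(name) > 200
assert snippet != name  # the truncated snippet no longer names the compound
```

Any pipeline that silently clips extracted text at 200 characters risks emitting chemically wrong output.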

2.    Licensed as CC-BY-NC

In signing up to the Elsevier licence agreement, researchers are asked to agree to make their output available under a CC-BY-NC licence. The outputs of TDM are very often facts and data, which are not subject to copyright; however, the Elsevier licence agreement stipulates that this non-copyright information should be put under a licence for copyright works.

In addition, the definition of “non-commercial” is highly ambiguous and open to interpretation. In effect, a CC-BY-NC licence prevents downstream use of the results and may also put researchers who are performing research under a grant agreement that mandates that data be openly available in a difficult position. Universities are also increasingly engaging in business partnerships with private business, and are being encouraged by governments to do so. This is known as the “knowledge transfer agenda”. We recommend that universities and researchers decide before signing the Elsevier licence whether there is a possibility that the outputs of the research they wish to undertake are commercial. As facts and data are not copyrightable, LIBER’s position is that they should be made available under a CC0 licence.[1]

PMR: The only reasonable way to publish scientific Facts is CC0. We enshrined this in the Panton Principles. These are, for example, endorsed by BMC, and Cameron Neylon of PLoS is a co-author.

Registration and click-through licences

In order for an individual researcher to gain access to the Elsevier content that their institution subscribes to, he/she must register directly with the Elsevier developers portal, provide details about the research they wish to undertake, and agree to the terms of a click-through licence. LIBER is particularly concerned about making such demands of researchers for the following reasons:

1.    We want to protect the privacy of our users.

Libraries have a strong track record of putting measures in place to protect the personal details and reading habits of our patrons. By requiring researchers to register individually and to provide details of their research project, Elsevier is circumventing the protections that libraries have put in place. The reason given by Elsevier for this requirement is that the publisher needs to check the credentials of the individual accessing the content. However, in authenticating individual user accounts the institution has already established the bona fide nature of the researcher. Further verification should not be necessary. We object to data about the research being performed by our users in our institutions being collected by an external third party. It is not the job of a publisher to control, monitor and vet what research takes place at a university.

2.    We want to protect our researchers from undue liability.

Many institutions employ full-time experts to negotiate the terms and conditions of licence agreements on their behalf. This process can take months, and yet a researcher is expected to agree to the Elsevier click-through licence in a matter of seconds. The terms of this click-through licence are extremely complex, in many places unclear,[2] and could have serious downstream implications for the outputs of the research. We also note that there is no cap on liabilities for a researcher:

2.3 The User will be solely responsible for all costs, expenses, losses and liabilities incurred, and activities undertaken by the User in connection with TDM Service. [BOLD here is from LIBER]

What is more, Elsevier retain the right to amend the terms without notice, and the changes will be deemed accepted by the researcher immediately. This is unacceptable.

Many of the responsibilities that are placed on the researcher by the click-through licence will be difficult to implement in practice e.g. the licence states that copyright notices may not be changed from how they appear in the dataset. This means that in a dataset derived from 10,000 articles there may be at least 10,000 appearances of the word “copyright”. A normal way of dealing with this “noise” would be to remove these irrelevant data from the dataset, but this would contravene the terms of the licence.
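The “normal way of dealing with this noise” that the paper describes is a one-line filter, sketched below with invented example records; under the click-through licence, even this trivial cleanup would breach the terms because it alters copyright notices:

```python
# Illustrative only: filtering boilerplate copyright lines out of a mined
# dataset. The records are invented examples.
records = [
    "Copyright (c) 2014 Elsevier B.V. All rights reserved.",
    "Aspirin inhibits cyclooxygenase.",
    "Copyright (c) 2014 Elsevier B.V. All rights reserved.",
]
cleaned = [r for r in records if "copyright" not in r.lower()]
print(cleaned)  # ['Aspirin inhibits cyclooxygenase.']
```

Routine data cleaning of this kind is standard practice in text mining; a licence that forbids it makes ordinary research workflows non-compliant.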

The click-through licence also makes it impossible to ensure the transparency and reproducibility of research results as the researcher may not share the dataset used for the research project and must delete it after use. The researcher is also expressly prohibited from depositing this dataset in their institutional repository.

Lastly, the licence is silent on post-termination use of the results of content mining. The licence will be terminated if the subscribing university “does not maintain a subscription to the book and journal content in the ScienceDirect® database”. If a researcher has mined thousands of articles, how do they check that each and every one is being subscribed to? If one or many are cancelled, what does this mean for the results, categorisations and hypotheses contained in data they have invested time and effort to produce?

PMR: Can anyone suggest that these terms are good for science?


We estimate that European universities spend in the region of €2 billion a year on Scientific Technical and Medical published content, the vast majority of which is on e-journal subscriptions. The new Elsevier licence terms, and the added requirement of an additional licence for each and every researcher who wishes to mine the content, raise questions about what institutions are actually purchasing when subscribing to digital information. The implication of the Elsevier TDM policy is that institutions only purchase the right to cache, look at, print out, and do a word-search on a PDF. We believe that universities should be able to employ computers to read and analyse content they have purchased and to which they have legal access. An e-subscription fee is paid so that universities can appropriately and proportionately use the content they subscribe to. For what other purpose is a university buying access to information?

Research and innovation are best encouraged in a free-thinking and enabling environment where researchers can fully exploit the content they have access to through their library. Going forward, it is important that libraries can ensure that the scientific freedom of their researchers is not eroded, and the impact of their scientific outputs undermined, by limits imposed through licences.

[1] This licence is recommended so that reuse is not prevented under the sui generis Database Directive.

[2] Terms used in the licence such as “recognition” and “classification” (2.1.1) are unclear. Another crucial term, “integration” (3.3), has been left undefined.

PMR: In summary, the ONLY reason for Elsevier’s licence is to give them a stranglehold over this new technology. Libraries gave away authors’ rights (they should have flagged this and communally refused to let it happen).

Any library that signs a publisher’s TDM clause will destroy the new information-led science.

Even if you aren’t in the UK, it is very probable that you are legally allowed to extract facts. The only thing stopping you is the additional clause you have agreed to with the publisher.

Kill the restrictive clauses you sign with the publisher. You don’t have to.





The Wellcome Trust APC spreadsheet (Michelle Brook, Ernesto Priego and community) adds massive crowdsourced value to Open Access. YOU can help

March 27th, 2014

NOTE ADDED after first version. This is so massive that I completely forgot to mention a whole chunk of contributors including Ernesto Priego, Graham Steel (McDawg) and others. Here’s Ernesto’s blog:

where he first outlined the headline figures and plotted the amounts paid to each major publisher.  I was concentrating on the mechanics of community editing, rather than the whole community picture. So yes, anyone else involved should add info. We are a community, not a group of infighters.


Last week The Wellcome Trust published its list of ca. 2000 articles for which it had paid Article Publishing Charges (APCs). It spent about 3 million GBP.

Those publications are a valuable investment. On Monday Mark Walport told us at the EuropePMC young scientist writers awards that publishing was as valuable as test tubes. Well-communicated science is of great value. Science behind paywalls loses hugely. My rough guess is that publishing is ca 1-2% of the cost of the grant, so I’d guess this represents about 200 million GBP overall investment. [See below how to avoid the guessing].
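The rough guess above can be made explicit. Both the ~1.5% publishing fraction and the resulting figure are guesses, exactly as the text says:

```python
# PMR's back-of-envelope estimate, made explicit. The 1.5% figure is an
# assumed midpoint of the "ca 1-2%" guess in the text.
apc_spend = 3_000_000          # GBP paid by Wellcome in APCs
publishing_fraction = 0.015    # assumed: publishing is ~1.5% of grant cost

implied_research = apc_spend / publishing_fraction
print(f"{implied_research / 1e6:.0f} million GBP")  # 200 million GBP
```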

But what the Wellcome Trust list offers is just the beginning. Michelle Brook, who runs Science at the Open Knowledge Foundation, immediately saw the potential. With great energy (and loss of sleep) she coordinated volunteers to curate this list. The result is at

This isn’t the “version of record”. It’s a snapshot. Get used to the idea that in the Digital Century everything is snapshotted. There is often no “final version”. There may be intermediate versions used for specific purposes – for example checking that Elsevier has published what it got paid to publish. But everything is capable of revision and enhancement – in so many ways. I’ll give some below.

Michelle is using Google spreadsheets – which allow anyone to view the exact state of the spreadsheet. When she first prepared the spreadsheet it could be a bit confusing, because if anyone sorted a column it altered everyone’s view. But we solve that by social, not technical, means. We know who is there – they are all friends (by definition you are part of the community) – and we let each other know what we are doing.

The result is mind-blowing. It’s a human-machine synthesis of a section of scholarly publishing. So here’s a rough roll of honour:

  • Mark Walport, Robert Terry for making Wellcome the most dynamic force in Open Access and providing the funding
  • Robert Kiley and colleagues
  • Michelle Brook (and OKFN) for pulling this together and in no order (and maybe with omissions)
  • Stuart Lewis
  • Theo Andrew
  • Nic Weber
  • Jackie Proven
  • Fiona Wright
  • Stuart Lawson
  • Jenny Molloy
  • Yvonne Budden
  • SM
  • Rupert Gatti
  • Peter Murray-Rust
  • ck

That’s 13 contributors in less than a week. That’s how crowdsourcing works. About half the entries have names, so there’s lots of opportunity for you. You don’t need to have any specialist knowledge – and it’s open to all. It would make a good high-school project. Open Access Button could be involved, for example.

I think this spreadsheet has added a million GBP to Wellcome’s output.

What???!!! That’s an absurd amount to claim for 1 week of crowd sourcing. OK, I’ll revise it below…

Yes. There is 200 million GBP of investment. If no-one knows about it, its value is small (we can count people trained, buildings kept up, materials, etc.). But the major outcome of research funding, apart from people and institutions, is KNOWLEDGE.

If the knowledge is 100 million, that’s a bad investment. If it’s 200 million, it’s marginal. To be useful the knowledge must be at least 300 million. [I'll claim a multiplier of 5 for the mean of Open Knowledge and I'll write a separate post...].

So what can this spreadsheet be used for?

  • we can download all the full text and search it. ["some of this isn't CC-BY" so you can't do that... Well I'm going to mine it for Facts, and that's legal and anyway if you want to take me to court and claim that copyright stops people doing research that stops people dying I'll see you there. It's Open - Wellcome Trust has paid huge amounts of its own money and we have a moral right to that output.]. So expect the Content Mine to take this as a wonderful resource.
  • we can teach with it. For most science the publishers forbid teaching without paying them an extra ransom. Well, there’s enough here that we can find masses of useful examples for teaching: cells, sequences, species, phylogenetic trees, metabolism, chemical synthesis, etc. When you are creating teaching resources one of the first places you will look will be the WT-OKFN spreadsheet.
  • we make science better. There’s enough here to create books of recipes (how-tos), typical values, etc. We can develop FRAUD detection tools.
  • we can engage citizens. ["Hang on - you're going too far. Ordinary people can't be exposed to science". Tell that to cyclists in Cambridge - there's a paper on the "health benefits of cycling in Cambridge". I think they'll understand it. And I think they may be more knowledgeable than many paywall-only readers.]
  • we can detect papers behind paywalls. and the hints are that it’s not just Elsevier…
  • we can develop the next generation of tools. This spreadsheet is massive for developing content-mining. It’s exactly what I want. A collection of papers from all the biomedical publishers and I know I can’t be sued.
  • a teaching resource. If I were teaching Library and Information Science I would start a modern course with this spreadsheet. It’s a window onto everything that’s valuable in modern scientific information.
  • an advocacy and awareness aid.
  • a tool to fundamentally change how we communicate science. This is where the future is and it’s just the beginning. Information collected and managed by new types of organisation. The Open Knowledge Foundation. Democracy and bottom-up rather than top-down authoritarianism. If you are in conventional publishing and you don’t understand what I have just said then you are in trouble. (Unless of course you have good lawyers and rich lobbyists who can stop the world changing). We haven’t even put it into RDF yet and that will be a massive step forward.
  • a community-generator. We’ve already got 13 people in a week. That’s how OpenStreetMap started. It’s now got half a million. WT-Brook could expand to the whole of enlightened scientific communication. Think Wikipedia. Think Mozilla. Think Geograph. Think OpenStreetMap. Think mySociety. Think Crowdcrafting. Think Zooniverse. These can take off within weeks or months.


So it was silly to suggest this spreadsheet liberates a million pounds of value. I’ll be conservative and settle for ten million.