petermr's blog

A Scientist and the Web


Our Protocol for Text-mining: Preamble and “Institutionalism”; Elsevier and other publishers should take note

I have been invited by the UK Intellectual Property Office to collect information and produce a reply to the Hargreaves report on copyright reform. The particular area that Ben Hawes (IPO) and I agreed on is “text-mining” [I shall refine this term later]. We are doing this under the aegis of the Open Knowledge Foundation and with the help of their software. However it is not appropriate for the OKF, as a partner in the UK Government Transparency activity, to lobby for change so it will be an ad hoc group of identified individuals (perhaps under the label @ccess). It is probable, however, that the protocols we intend to develop will be part of the OKF activity, perhaps under the “Panton” brand.

Our group will represent that very serious harm is done to science and the use of science by the refusal to allow textmining. We shall be preparing our material completely in the open, coordinated on the OPEN-ACCESS list. Anyone can take part in the discussion and interested parties such as publishers are invited.

We shall argue that as from today all publishers know of our activity and have the opportunity to influence what we say. A major problem is that publishers make it extremely difficult for a reader to get a useful reply to any question on rights and practice. I know, however, that staff in all major publishers follow this blog. We shall concentrate on a small subset of high-profile publishers, probably limited to Wiley/Blackwell, Elsevier, Springer, Nature, AAAS (Science), PLoS, BMC and, because of my involvement in chemistry, ACS and RSC. Those organizations have the opportunity to make their views and practice known on the OPEN-ACCESS list. Any private mails on this subject will be posted to the list.

The publishers argue, from their own surveys, that the scholarly community assert that publishers are extremely helpful over text-mining and agree to a large percentage of requests (data collated by Eefke Smit, STM publishers’ association). Our group asserts the opposite – that publishers have been extremely unhelpful.

We shall also argue that the publishers “institutionally” oppose text-mining. (In the UK we have a phrase, “institutional sexism/racism/ageism, etc.”, which identifies practices and attitudes – whether conscious or not – that oppose fundamental rights. Thus the UK police have been described as “institutionally racist” and I assert that the scholarly publishing industry is “institutionally opposed” to text-mining. [If anyone has a better term please let me know]. The “glass ceiling” is a similar term.) This is reflected in the large number of barriers, whether conscious or not, that publishers put in place or leave in place that effectively prevent text-mining. Institutionalism is defined as “the collective failure of an organisation to provide an appropriate and professional service to people” and I assert that the scholarly publishing industry is almost universally guilty of this for its READERS.

I will start by stating an unpleasant but true fact: many people no longer trust the scholarly publishing industry. There have been too many assertions of “we are doing everything we can”, “I’ll get back to you”, “our marketing people will look at the problem” to trust effective action. This is “institutional” – I no longer care whether it’s deliberate or unconscious, the effect is the same.

On Wednesday I talked with Alicia Wise, Elsevier’s Director of “Universal Access”. I put my concerns to her including the unacceptable manner in which Elsevier had treated me and I asserted my rights to text-mine scholarly content. [I intend to formalise these rights in the submission to Hargreaves]. It was an informal, unplanned conversation in the presence of other people and I shall not put words into her mouth. She agreed, I believe, to treat me with professional courtesy and to respond to my points in public. She said she would mail me yesterday (she hasn’t) so I am assuming she will read this blog. I have her email and will email her.


If a publisher fails to take part in public discourse on text-mining and fails to comment on the principles and protocols we shall create on the list we shall represent them to Hargreaves as “institutionally opposed to text-mining”. If you wish to take part please make your contact details known on the list, not on this blog.

The response to Hargreaves will consist of a number of questions which (generally) require the response “YES” to be seen as helpful to the provision of text-mining. A typical one is:

  • “Do you agree that facts and data are uncopyrightable?”

The only answers are YES and “not-YES” (which will be labelled by us as “unhelpful”). The following are examples of “not-YES”:


  • Failure to reply
  • Addition of conditions (“it depends on…”)
  • “I don’t have authority to answer this question”. Sorry – that’s institutionalism. It may not be YOUR personal fault, but it is your organization’s fault
  • Promises to “get back to us” – you have two weeks max as we need a week to collate for Hargreaves. That’s a fact. So start preparing now.
  • Asserting that OPEN-ACCESS should have approached person X rather than person Y.

Any publisher who is actually well-intentioned towards textmining should be trivially able to answer the questions in half an hour. Any publisher who has to worry about them is probably guilty of institutionalism.

This is the first of several posts. I shall next address our RIGHTS and what “information-mining” covers. I may then give further examples of my own and my colleagues’ experiences of publisher institutionalism.

The list will create a protocol, which will be a draft of acceptable textmining practice by readers, subscribers and publishers.

** PUBLISHERS and STM-PUBLISHERS ** your immediate action should be to register with the OPEN-ACCESS list and make known the identity of the persons who will answer questions for Hargreaves. That can be done today (It’s a working day in most countries).

