What are the formal restrictions on text-mining?

#oscar4 #okfn #pantonpapers

A little while ago I suggested that we create whitepapers (“Panton Papers”, /pmr/2010/07/24/open-data-the-concept-of-panton-papers/ ) to help our development of open science. We’ve come up with some titles and I’ve drafted one on text-mining ( /pmr/2011/03/28/draft-panton-paper-on-textmining/ ). There’s now a useful response from Todd Vision on the open-science discussion list (http://lists.okfn.org/pipermail/open-science/2011-April/000698.html ):

Peter’s draft whitepaper on text-mining is badly needed and nicely put. I was particularly interested in this passage:


“The provision of journal articles is controlled not only by copyright but also (for most scientists) the contracts signed by the institution. These contracts are usually not public. We believe (from anecdotal evidence) that there are clauses forbidding the use of systematic machine crawling of articles, even for legitimate scientific purposes.”


We have also heard tell of the existence of such clauses, but also have not been able to secure first-hand evidence for them. It would be very nice to promote this from “anecdotal” to “documented”, and I would like here to put out a wider plea for anyone who might be able to provide the language of these contractual re[s]trictions. Alternatively, I would welcome suggestions for how we are to know what exactly we are prohibited from doing in light of the confidential nature of the contracts.


If copyright holders really wish to enforce such restrictions, it seems odd that their very existence is little more than a rumor. Can secret restrictions be legally enforced?




So I’d very much like to have authoritative evidence in this area. If anyone has first-hand evidence of the FORMAL restrictions on text-mining, please let us know on this blog or on the open-science list. Ideally this would be hard figures, but if institutions are debarred from publishing this information then I, as a reader of the material, am in an interesting position. So before poking around in my usual fashion I’d like any evidence that’s publishable without fear of lawsuits.

If not, I will have to try to find this out the hard way.





2 Responses to What are the formal restrictions on text-mining?

  1. Pete Carroll says:

    Re: Text & Data Mining
    This issue has come up several times in submissions to the Hargreaves Review on Intellectual Property & Growth (2011) (www.ipo.gov.uk)
    Submission from National Centre for Text Mining (NacTem) particularly p4
    Also British Library submission (page 12)
    I hope this is useful. Maybe these organisations, and probably JISC, can provide hard evidence of problems in licences.
