Stevan Harnad – a tireless evangelist of OA – has replied to my points. He has been consistent in arguing the logic below and I agree with the logic. The problem is that few people believe that this allows us to act as he suggests.
Stevan argues that current Green Open Access allows us to do all we wish with the exposed material without permission. However, when I spoke to several repository managers at the JISC meeting, all were clear that I could not have permission to do this with their current content. I asked “Can my robots download and mine the content in your current open access repository of theses?” – No. “Can you let me have some chemistry theses from your open access collection so I can data-mine them?” – No – you will have to ask the permission of each author individually. So Stevan’s views on what I can do seem not to be – unfortunately – widely held.
My concern was not just with material in repositories but also with material elsewhere. Some publishers allow Green Open Access posting on web sites but debar it from repositories. So the concerns remain.
The MIT repository deliberately adds technical restrictions that prevent printing of their theses, and this also technically prevents data and text mining. There are some hacks possible to get round this, but using them comes close to dishonesty and illegality.
“Derivative works” is a phrase that doesn’t work well in the data-rich subjects and we need something better. But it’s what the licenses use at present.
In data-rich subjects, linking to repositories is often of little use. I need thousands of texts held on specialist machines and accessed with high frequency and bandwidth.
My problem is not with Stevan’s views but that few others give positive support to them, particularly not the repository managers. Maybe I’m too cautious…