I have had time to reflect on http://www.repositoryfringe.org/ (the meeting of repositarians in Edinburgh). Having recently been concerned about the publishing of data (about which I shall post more later), I post my current analysis of the UK repository scene (I don’t know enough about elsewhere). I shall try to be objective, possibly constructive, but this will probably be a rather uncomfortable post. Before I start I’ll say that I have been committed in the past to working with my local repo and more generally with the repo community.
I am going to comment (>> PMR) as a working research scientist who needs a repository for (a) collaboration and (b) data publication and storage at all stages of the scientific endeavour. My comments do not necessarily extend to other disciplines or other purposes.
Here are some basic motivations for repos (http://en.wikipedia.org/wiki/Institutional_repository):
- to provide open access to institutional research output by self-archiving it; >>PMR: This hasn’t worked for science and isn’t going to. I have self-archived some of my publications pre-publication but not post-publication. Most chemistry publishers do not permit post-publication archiving; the process is complex and distracting, and I know of no cases where scientists search in IRs for post-publication material.
- to create global visibility for an institution’s scholarly research; >>PMR This is a useful function but IRs are generally poorly set up as showcases and there is so little science in most that I don’t go looking. (Why would I look at the output of the University of X? I might if they were headhunting me, but not otherwise.)
- to collect content in a single location; >>PMR This has no value for the average scientist. It is primarily (if at all) for institutional purposes such as managing the Assessment exercises.
- to store and preserve other institutional digital assets, including unpublished or otherwise easily lost (“grey”) literature (e.g., theses or technical reports). >>PMR. This is the only thing that might be useful to me *if I could discover the material easily and read it*. As an example of the non-use, Imperial College prevents anyone outside the institution reading any of their ca 1000 theses. This is not the norm, but it is impossible to answer the question “show me all UK theses”. The interfaces to the ca 200 UK IRs are hotch-potch and completely unnavigable by machine. So I agree with “store and preserve” (which is no use to most scientists in the modern world) but not “discover”.
And from Alma Swan (I exclude the topics above, teaching, measurement and showcasing):
- Providing a workspace for work-in-progress, and for collaborative or large-scale projects; >>PMR This is something I have been urging repos to do as I think it’s the only thing that would provide something of value to the average scientist. If scientists used their university system for managing their work processes and data then they would have naturally engaged. But I think repos are running out of time and I think there are existing solutions which have a trajectory and will work.
If repos wish to engage with scientists I think the only real way forward is to help create *single* domain-specific repositories. Examples of these are Dryad, Tranche, PDB and the NCBI/EBI resources. The model would involve domain scientists running the [single] repository (let’s say for computational chemistry) and one or more traditional repos managing the sustainability. Note that scientists do not, in general, care about preservation beyond a few years at most. Scientists will not and should not put data directly into their own IR: it fragments the discipline and there are no good search tools.
So I have painted a fairly stark picture for IRs and science. They aren’t working and they aren’t going to work in their current form. The only area of possible interest is theses. To do this the IRs must, across all institutions:
- Make their content Open. If the response is “it’s the student’s copyright, we can’t do anything” then we are not interested.
- Label the Open content as open (machine-readable). It is *impossible* in any repository I have visited to find specifically Open material in bulk (i.e. by machine reading). So almost all thesis and other content in UK repositories is, in effect, closed.
- Make it iterable. It should be possible to list everything in the repository systematically. Google does this but academics are usually forbidden to do so. Relying on Google to search University information is simply bottling the problem. I have floated this idea and got very little take-up, even though it could be done in a week if the community put its effort into it. I doubt they will, but would be happy to be proved wrong.
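To make the last two requests concrete, here is a minimal sketch of what "labelled as Open" and "iterable" could look like from a machine's point of view. Everything in it is an assumption for illustration: the record fields (`id`, `title`, `licence`), the sample records, and the choice of which licence URIs count as Open are invented, and do not describe any existing repository's schema or API.

```python
from typing import Iterable, Iterator

# Assumption for this sketch: these licence URIs are what we treat as "Open".
OPEN_LICENCES = {
    "http://creativecommons.org/licenses/by/3.0/",
    "http://creativecommons.org/publicdomain/zero/1.0/",
}


def iter_open(records: Iterable[dict]) -> Iterator[dict]:
    """Yield only the records that carry an explicit Open licence label.

    A record with no label, or an unrecognised one, is skipped: a machine
    cannot assume that unlabelled material is Open.
    """
    for record in records:
        if record.get("licence") in OPEN_LICENCES:
            yield record


# Made-up records standing in for a complete, systematic repository listing.
SAMPLE = [
    {"id": "thesis-1", "title": "Example thesis A",
     "licence": "http://creativecommons.org/licenses/by/3.0/"},
    {"id": "thesis-2", "title": "Example thesis B", "licence": None},
    {"id": "thesis-3", "title": "Example thesis C"},  # no label at all
]

open_ids = [record["id"] for record in iter_open(SAMPLE)]
print(open_ids)
```

The point of the sketch is that both requirements are needed together: without a complete listing there is nothing to iterate over, and without an explicit label every unlabelled item has to be treated as closed.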
On the assumption, therefore, that IRs have nothing to offer scientists either in data management or discovery, my next posts will turn to solutions from different sectors.