A very thoughtful post by Cameron Neylon about a very thoughtful talk by Andy Powell on why institutional repositories don’t work and, in their current form, won’t work. I’ll post snippets and comment:
20:52 10/06/2008,[…]
The problem with institutional repositories in their current form is that academics don’t use them. Even when they are being compelled there is massive resistance from academics. There are a variety of reasons for this: academics don’t like being told how to do things; they particularly don’t like being told what to do by their institution; the user interfaces are usually painful to navigate. Nonetheless they are a valuable part of the route towards making more research results available. I use plenty of things with ropey interfaces because I see future potential in them. Yet I don’t use either of the repositories in the places where I work – in fact they make my blood boil when I am forced to. Why?
PMR: Exactly so.
So Andy was talking about the way repositories work and the reasons why people don’t use them. He had already talked about the language problem. We always talk about ‘putting things in the repository’ rather than ‘making them available on the web’. [PMR emphasis]. He had mentioned already that the institutional nature of repositories does not map well onto the social networks of the academic users which probably bear little relationship with institutions and are much more closely aligned to discipline and possibly geographic boundaries (although they can easily be global).
But for me the key moment was when Andy asked ‘How many of you have used SlideShare?’ Half the people in the room put their hands up. Most of the speakers during the day pointed to copies of their slides on SlideShare. My response was to mutter under my breath ‘And how many of them have put presentations in the institutional repository?’ The answer to this: probably none. SlideShare is a much better ‘repository’ for slide presentations than IRs. There are more there, people may find mine, it is (probably) Google indexed. But more importantly I can put slides up with one click, it already knows who I am, I don’t need to put in reams of metadata, just a few tags. And on top of this it provides added functionality including embedding in other web documents as well as all the social functions that are a natural part of a ‘Web2.0’ site.
PMR: Yes. This is how the world actually works, not how repositarians would like it to work.
Andy was arguing for global discipline specific repositories. I would suggest that the lesson of the Web2.0 sites is that we should have data type specific repositories. Flickr is for pictures, SlideShare for presentations. In each case the specialisation enables a sort of implicit metadata and lets the site concentrate on providing functionality that adds value to that particular data type. Science repositories could win by doing the same. PDB, GenBank, SwissProt deal with specific types of data. Some might argue that GenBank is breaking under the strain of the different types and quantities of data generated by the new high throughput sequencing tools. Perhaps a new repository is required that is specially designed for this data.
PMR: Fully agreed. CrystalEye is an aggregatory for crystals. eCrystals promises to be a repository. We should have, we shall have, an open repository for chemical preparations. And an Open repository for chemical spectra. All three separate. And it would be nice
[… role of preservation snipped…]
But the key thing is that all of this should be done automatically and must not require intervention by the author. Nothing drives me up the wall more than having to put the same set of data into two subtly different systems more than once. And as far as I can see there is no need to do so. Aggregate my content automatically, wrap it up and put it in the repository, but I don’t want to have to deal with it. Even in the case of peer reviewed papers it ought to be feasible to pull down the vast majority of the metadata required. Indeed, even for toll access publishers, everything except the appropriate version of the paper. Send me a polite automated email and ask me to attach that and reply. Job done.
For this to really work we need to take an extra step in the tools available. We need to move beyond files that are simply ‘born digital’ because these files are in many ways still born. This current blog post, written in Word on the train, is a good example. The laptop doesn’t really know who I am, it probably doesn’t know where I am, and it has no context for the particular Word document I’m working on. When I plug this into the WordPress interface at OpenWetWare all of this changes. The system knows who I am (and could do that through OpenID). It knows what I am doing (writing a Blog post) and the Zemanta Firefox plugin does much better than that, suggesting tags, links, pictures and keywords.
PMR: Read Cameron’s blog and Andy’s slides. Then ask yourself: what do I want from my repository?
What I do not want is DSpace in its current form. Or ePrints. I don’t know Fedora. For me DSpace is a write-only system. I have (with Jim’s help) put 200,000+ molecules into DSpace at Cambridge. I thought this would make them available to the world. It doesn’t. Yes, someone can get one out by hand if they really want. But not the whole lot. Google doesn’t index them because they have the wrong file suffix.
I have thousands of revisions of code in Sourceforge. I can get every single revision. I share this with anyone in the world who is interested. They can contribute. The system pings me when they do. The material is safe. (I don’t know how it’s safe, but it’s safe.)
Whereas the code on my current machine is less safe. I’ve just had to change machines. What did I lose? I am still finding out. To be fair the COs have backed up everything and so I don’t think I’ll lose actual files. But I lose environment. I have to reinstall programs. Reconfigure fonts, etc.
So what do I want from my Institution? Not a repository.
I want an AUTHORITORY. And I’ll explain what that is in a future post.