As part of my analysis of what data repositories should look like, I look here at repos in general. There has been some useful feedback to my latest posts, mainly about Institutional Repositories (IRs), in the comments and on Twitter. Some people agree with me, others have suggested I have got things wrong. I am not against repositories – in fact I am strongly in favour of them. I question the role of many institutional repositories and, as readers know, I argue in favour of domain-specific repositories.
Institutional Repositories have only one thing in common – they are supported by cash and staff provided by the institution – and they are institution-centric. Here for example is Imperial College (http://eprints.imperial.ac.uk/):
Welcome to Spiral, the Digital Repository for research output of Imperial College. Spiral primarily contains full text peer-reviewed versions of journal articles and conference papers produced by academic staff of Imperial College London, as well as PhD theses by students of Imperial College London.
In fact NONE of the theses are visible to people outside Imperial. If I go to “More Information” (http://www3.imperial.ac.uk/library/find/spiral) it’s primarily about how to submit content (obviously only for Imperial people). If I go to the first item in “Chemistry” I find a “journal” which is actually a thesis from Cambridge (http://eprints.imperial.ac.uk/handle/10044/1/6193). One useful feature is the “Top twenty downloads” (http://www3.imperial.ac.uk/library/find/spiral/toptwenty).
By contrast my own University’s repository exists for a completely different purpose:
DSpace@Cambridge is the institutional repository of the University of Cambridge. The repository was established in 2003 to facilitate the deposit of digital content of a scholarly or heritage nature, allowing academics and their departments at the University to share and preserve this content in a managed environment.
I have actually uploaded ca 180,000 items. There is no download indicator so I have no indication that anyone has ever downloaded anything. (Actually I have had 1 email, which shows somebody downloaded something 2 months ago). This, not surprisingly, is demotivating.
So, as a result, I am not highly motivated to explore Imperial as it is highly Imperial-centric. And I am not very motivated to deposit things in DSpace@Cam (I continue to do so, but out of a sense of duty, rather than because I want to).
By contrast Nature Precedings (run by Nature Publishing Group) is a preprint server, and I have put papers in that, for example: http://precedings.nature.com/documents/1526/version/1 (My ill-fated paper on Open data which got buried behind the Elsevier paywall). People have read the NP offering. 11 voted for it. Now votes are not very scientific but they give me a slight warm fuzzy feeling. I would be interested to know the downloads (and I’m slightly surprised I can’t find that). The NP site is nicely presented.
The upshot is:
- I don’t want to browse the Imperial repository – it makes me feel an outsider
- I don’t want to upload to DSpace@Cam (it’s tedious and I have no evidence anyone reads it)
- I do want to upload to Nature Precedings.
So 3 months ago I sent 11-15 papers off to BioMed Central (J. Cheminformatics). I put them all in DSpace@Cam. The reviews have mainly come in and I think all the papers will get published. So I think I’ll upload them also to Nature Precedings and get a feel of what the world thinks of them. I’ll also see whether BMC have a “precedings” – if not, maybe they should.
And I submitted a lot of work last night to another repository – I stayed up till 0100 because of the excitement of doing so. It’s called Bitbucket. It’s how I make sure my code is working, high quality and available to everyone. The main motive was to increase our collaboration with the European Bioinformatics Institute (Christoph Steinbeck’s group at ChEBI).
There is no reason why IRs should not be able to appeal to sections of the community. But I think very few appeal to any more than very small groups, mainly within the institution. And if the repository is not clear about what its purpose is, then I suspect it won’t appeal to anyone.
So I’ll leave you with Ranganathan’s laws, modified for repos (authors now have a role that they did not have before):
- Repositories are for use (by machines and/or humans)
- Every entry its reader
- Every author and every reader their entry
- Save the time of the reader and the author
- The repository is an evolving organism
If, for a given community of authors and readers, you can truly answer YES to every law, then you already have a successful repository. Bitbucket has. Stackoverflow has. Wikipedia has. Dryad has. Tranche has. NCBI and EBI have. CKAN has. ArXiv has. Chemspider has. Figshare looks promising. Nature Precedings (3500 entries) continues – I would expect more.
You cannot be everything to everyone and this is where IRs generally fail. If your main purpose is to manage the REF, say so. If it is to store theses and stop the rest of the world seeing them, say so. If it’s to create collections of important digital objects, say so. And don’t do the other things unless you are sure you can make a success of them.