Build your own Institutional Repository

I have alluded to Institutional Repositories (IRs) before. Although I am an enthusiast and early adopter (having reposited 250,000 digital objects), a year ago I would have said they were still a minority activity. Not now. Universities and related HEIs are all implementing IRs, even though I suspect that in some of them the majority of staff have never heard of them (we’ll reveal details for chemistry later). As well as the (IMO seriously overhyped) commercial tools, there are several Open Source tools (EPrints, DSpace, Fedora (not the Linux one)) and responsible services (e.g. from BioMed Central). There are reasonable paths for institutions of different sizes to take the plunge. So it was nice to see on Peter Suber’s blog:

Implementing an institutional repository

Meredith Farkas has blogged some notes on Roy Tennant’s talk on institutional repositories at Internet Librarian 2006 (Monterey, October 23-25, 2006). Excerpt:

I knew that Roy would be likely to give a very practical nuts-and-bolts introduction to developing institutional repositories and I was certainly not disappointed.
Why do it?

  • Allows you to capture the intellectual output of an institution and provide it freely to others (pre-prints, post-prints, things that folks have the rights to archive). Many publishers allow authors to publish their work in archives either as a pre-print or after the fact.
  • To increase exposure and use of an institution’s intellectual capital. It can increase their impact on a field. More citations from open access and archived materials.
  • To increase the reputation of your institution.

How do you do it? …
Software options….
Key decisions

  • What types of content do you want to accept (just documents? PPT files, lesson plans, etc?)
  • How will you handle copyright?
  • Will you charge for service? Or for specific value-added services?
  • What will the division of responsibilities be?
  • What implementation model will you adopt?
  • You will need to develop a policy document that covers these issues and more.

Implementation models

  • Self archiving – ceaselessly championed by Stevan Harnad. Authors upload their own work into institutional repositories. Most faculty don’t want to do this.
  • Overlay – new system (IR) overlays the way people normally do things. Typically faculty give their work to an administrative assistant to put it on the Web. Now, the repository folks train the admin assistant to upload to the repository instead. Content is more likely to be deposited than if faculty have to do it….
  • Service provider – not a model for a large institution. Library will upload papers for faculty. The positive is that works are much more likely to be deposited. The negative is that it’s a lot of work and won’t scale….

Discovery options: Most traffic comes from Google searches, but only for repositories that are easily crawlable and have a unique URL for each document. OAI aggregators have millions and millions of records. They harvest metadata from many repositories. Some people may come direct to the repository, but most will not come there looking for something specific. Citations will drive traffic back to the repository.
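To make the harvesting step concrete, here is a minimal Python sketch of how an aggregator might request and read metadata over OAI-PMH. It is an illustration under stated assumptions, not something from the talk: the endpoint repository.example.edu, the sample record, and the helper names list_records_url and parse_records are all hypothetical. The sample response is embedded so the sketch runs without a network connection.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces used by OAI-PMH 2.0 responses and Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response, trimmed to the elements this sketch reads.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.edu:1234</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Example deposited item</dc:title>
          <dc:identifier>https://repository.example.edu/handle/1234</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
""".strip()


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return f"{base_url}?{query}"


def parse_records(xml_text):
    """Return (OAI identifier, title, document URL) for each harvested record."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        oai_id = record.findtext("oai:header/oai:identifier", namespaces=NS)
        title = record.findtext(".//dc:title", namespaces=NS)
        url = record.findtext(".//dc:identifier", namespaces=NS)
        records.append((oai_id, title, url))
    return records


if __name__ == "__main__":
    print(list_records_url("https://repository.example.edu/oai"))
    for rec in parse_records(SAMPLE_RESPONSE):
        print(rec)
```

The dc:identifier field here carries the unique per-document URL that the notes say crawlers and readers depend on. A real harvester would also need to follow resumption tokens and handle error responses, which this sketch omits.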
Barriers to success:

  • Lack of institutional commitment
  • Faculty apathy (lack of adoption and use)
  • If it is difficult to upload content, people won’t use it.
  • If you don’t implement it completely or follow through it will fail.

Strategies for Success

  • Start with early adopters and work outward.
  • Market all the time. Make presentations at division meetings and stuff
  • Seek institutional mandates
  • Provide methods to bulk upload from things already living in other databases
  • Make it easy for people to participate. Reduce barriers and technical/policy issues.
  • Build technological enhancements to make it ridiculously easy for people to upload their content….

This is a good summary. I’d add that much of the early work in IRs has come from subjects where the “fulltext” is seen as the important repositable [almost a neologism!]. We’re concerned with data, and my repositions have been computations on molecules. I also admit that even as an early adopter I don’t self-archive much. This is mainly because the publishers don’t allow me to. In some cases I cannot even read my own output on the publisher’s website, as Cambridge doesn’t subscribe to the journal online.
I have just realised what I have written! The publisher does not allow me to read my own work! We accept this?
The message from Open Scholarship was that voluntary repositing doesn’t work. There has to be explicit carrot and/or stick.
So while you are implementing your own IR make sure that you can reposit data as well as fulltext. This will be a constant theme of this blog!

