What is an Institutional Repository for?

I had a great time today at Colorado State University talking to the Library and Information Scientists. Lots of ideas, especially on the role and construction of Institutional Repositories. I am still revising my views about this and feel that the classic model (if anything 3 years old can be classic), where scholars deposit a finished digital gem, may not be the only one.

In preparing my presentation I looked around for repository models and suddenly realised I had been using one for years - Sourceforge. This is an ideal model, as long as you accept the tenet of Open Source (not the same as Open Access, but philosophically aligned). SF is a repository for computer code and provides complete version control as well as a complete collaborative environment. I make a change to the code - it gets a new version number, but I can still retrieve all previous versions. Also, any of my collaborators can make changes and I update seamlessly to include all their enhancements. So why not use the same software - SVN - to manage our repositories?

Publishing a scholarly manuscript is a complex workflow. (OK, I'm a coward and usually find a co-author who does it for me). It goes like this:

  • Author A writes a draft and circulates it to B and C (at a different institution)
  • B makes some changes
  • C makes some changes
  • A updates with B's changes
  • D (oh, yes, we can change the authorship) edits A's original manuscript
  • there are now at least 3 versions of the manuscript circulating
  • A prints them all out and tries to reconcile changes
  • etc.
  • finally F makes the finished version and sends it to the publisher
  • publisher sends reviewers' comments to F who forwards them to B and C
  • C makes changes and resends them to F
  • F sends revised draft to publisher
  • A complains that s/he didn't see the comments
  • F sends further revised draft to publisher
  • weeks pass
  • and some more
  • X mails A saying why not put m/s in IR
  • Publisher only allows deposition of author's m/s pre-submission
  • A mails F saying that publication has appeared
  • mail bounces saying F has moved
  • A tries to recover m/s from B, C and E (yes E was in it).
  • A edits the mess into what might have been sent to publisher and mails to X

BUT using SVN it's trivial - assuming there is a repository.

So we do not speak of an Institutional Repository, but of an authoring support environment (ASE or any other meaningless acronym).

A starts a project in institutional SVN.

B joins, so do C, D, E, etc.

They all edit the m/s. Everyone sees the latest version. The version sent to the publisher is annotated as such (this is trivial). All subsequent stuff is tracked automatically.

When the paper is published, the institution simply liberates the authorised version - the authors don't even need to be involved.

The attractive point of this - over simple deposition - is that the repository supports the whole authoring process.
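To make the idea concrete, here is a minimal sketch of the workflow in Python. It is a toy in-memory model, not SVN and not SVN's API: the class ManuscriptRepo, its methods and the tag name "submitted-to-publisher" are all invented for illustration. It only shows the three properties that matter here: every change gets a new revision number, every earlier revision can still be retrieved, and the version sent to the publisher can be labelled and recovered later without asking the authors.

```python
from dataclasses import dataclass, field


@dataclass
class Revision:
    number: int
    author: str
    message: str
    text: str


@dataclass
class ManuscriptRepo:
    """Toy in-memory stand-in for a version-controlled manuscript (hypothetical)."""
    revisions: list = field(default_factory=list)
    tags: dict = field(default_factory=dict)

    def commit(self, author, text, message):
        """Store a new version and return its revision number (1, 2, 3, ...)."""
        rev = Revision(len(self.revisions) + 1, author, message, text)
        self.revisions.append(rev)
        return rev.number

    def checkout(self, number=None):
        """Return the text of a given revision, or of the latest one by default."""
        if not self.revisions:
            raise LookupError("repository is empty")
        if number is None:
            number = len(self.revisions)
        if not 1 <= number <= len(self.revisions):
            raise LookupError(f"no such revision: {number}")
        return self.revisions[number - 1].text

    def tag(self, name, number=None):
        """Attach a human-readable label to a revision (latest by default)."""
        if number is None:
            number = len(self.revisions)
        self.checkout(number)  # raises LookupError if the revision does not exist
        self.tags[name] = number

    def checkout_tag(self, name):
        """Return the text of the revision carrying the given label."""
        return self.checkout(self.tags[name])


repo = ManuscriptRepo()
repo.commit("A", "Draft by A.", "first draft")
repo.commit("B", "Draft by A, revised by B.", "B's changes")
submitted = repo.commit("C", "Draft by A, revised by B and C.", "C's changes")
repo.tag("submitted-to-publisher", submitted)
repo.commit("F", "Post-review revision.", "respond to reviewers")

print(repo.checkout())                              # latest version
print(repo.checkout_tag("submitted-to-publisher"))  # what the publisher received
```

In this sketch the institution would only need the tag name to release the authorised version; nobody has to reconstruct it from old emails.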

If you want to start, set up SVN - it's easy and there are zillions of geeks who know how to do it. It's free, of course, and also very good. That's it. It's easiest if the authoring is done in LaTeX, as then the diffs are obvious. But Word will probably do fine (modern Word documents are saved as XML). Start with single authors - thesis candidates, humanities, etc.
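As a small illustration of why LaTeX diffs are obvious, the sketch below compares two drafts line by line using Python's standard difflib module. The two drafts and the file labels are made up for the example, and this is not how SVN computes its diffs internally; it simply produces the same kind of unified diff a reader would see.

```python
import difflib

# Two hypothetical drafts of the same LaTeX fragment; only one line differs.
old = r"""\section{Results}
The yield was 40\%.
The reaction was run twice.
""".splitlines(keepends=True)

new = r"""\section{Results}
The yield was 42\%.
The reaction was run twice.
""".splitlines(keepends=True)

# unified_diff yields header lines, then context lines prefixed with a space,
# removed lines prefixed with "-" and added lines prefixed with "+".
diff = list(difflib.unified_diff(old, new,
                                 fromfile="paper.tex (A)",
                                 tofile="paper.tex (B)"))
print("".join(diff), end="")
```

Because the source is plain text with one sentence per line, the single changed sentence shows up as one removed line and one added line, which is easy for a co-author to review.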


I was also honoured to have a videocast interview which CSU will make available (under CC) soon. I have a few personal observations on Open Foo, the role of publishers, of libraries, etc.


9 Responses to What is an Institutional Repository for?

  1. ojd20 says:

    3 reasons SVN wouldn't do a good job on Word (or ODF) files: -

    - The document formats usually compress the component files into an archive.

    - The markup Word generates is crufty (from what I've seen). Open Office Writer seems to be no better.

    - Diff doesn't do well at XML

  2. Egon says:

    But at least everyone will be working on the latest draft. You would need to require locks though, to allow editing the .doc one by one. SVN supports that, AFAIK.

  3. pm286 says:

    Thanks Jim and Egon,
    There are certainly problems with Word, but at least all the versions would be present. I have tried to deal with multiple authors on Word documents sent round by email and it was a disaster.

  4. baoilleach says:

    Check out writely at http://docs.google.com/. It's very impressive and seems to be built with collaboration in mind.

  5. ojd20 says:

    Good point, Noel. Google Docs copes extremely well with versioning, even if two people are editing different parts of the same document. I don't think any tool that requires pessimistic concurrency (locking) is good enough for a collaborative authoring environment.

  6. Peter Sefton says:

    Google Docs does indeed cope well with concurrent editing, but it is far from a preservation quality authoring system; the HTML it generates is awful. Some researchers at USQ have looked at Google docs.

    As others have noted, neither Word's XML nor the ODF will behave very well with standard (or even XML) zip files, as there is a lot of application logic needed to interpret the XML, but most word processors do have diff features.

    And finally, have a look at the Integrated Content Environment (ICE), an open source SVN-based collaborative multi-output publishing environment for word processing documents that we are building at USQ. We have done a lot of work to abstract away the SVN operations so that users click simple buttons like 'Sync' and 'Get changes'. We deal with conflicts by moving the conflicting doc out of the way and renaming it my-changes, then leave the offending author to sort it out.

  7. Pingback: Unilever Centre for Molecular Informatics, Cambridge - Jim Downing » Blog Archive » Repository problem solving

  8. Kate Taylor says:

    I was interested and amused to read your account of writing a paper, which ties in with mine. We are trying to put Word documents with Visio schematics under SVN, and wanted to point out that this is not entirely straightforward when edits are made, just in case you are going to try to do it!

  9. pm286 says:

    (8) Thanks Kate. I agree that Word is not easy - but even having the individual binary (or XML) documents would not be a disaster. I think we shall increasingly see XML diff tools.
