Using our own repository

Elin Stangeland from our Institutional Repository will be talking to us (my Unilever Centre colleagues) tomorrow on how to use it. Jim and I have seen her draft talk, but I’ll keep it a surprise till afterwards.
I still think there is a barrier to using IRs and I’ll explain why.
We spent some of our group meeting on Friday discussing what papers we were writing and how. As part of that we looked at how to deposit them in the IR. It’s not easy in chemistry as most publishers don’t allow simple deposition of the “publishers’ PDF”. So here’s the sort of problems we face and how to tackle them.
Firstly, every publisher has different rules. It’s appalling. I don’t actually know for certain what ACS, RSC, Springer or Wiley allow me to do. Elin has a list which suggests that I might be able to archive some of my older ACS papers, etc. This is an area where I’m meant to know things, and I don’t. (I’ve just been looking through the Springer hybrid system and I do not understand it. I literally do not know why all the articles are publicly visible, but some are Open Choice, yet Springer copyright. I would have no idea which of these can be put in an IR. Or when. Or what the re-use rights are. I may write more about this later.)
Here are some basic problems about repositing:

  • the process from starting a manuscript to final publication can take months or years
  • there are likely to be multiple authors
  • authors will appear and disappear during the process
  • manuscripts may fission or fuse
  • authors may come from different institutions

A typical example is the manuscript we are writing on the FOO project. The project has finished. The paper has 6 authors. I do not know where one of them is. There are 2 institutions and 4 departments involved. One person has been entrusted with the management of authoring. They are unlikely to be physically here when the final paper is published. The intended publisher does not support Open Access and may or may not allow self-archiving.
We have to consider at least the following versions of the article:

  1. The manuscript submitted to the publisher (normally DOC or TeX). Note that this may not be a single version as the publisher may (a) refuse it as out of scope (b) require reformatting, etc. even before review. Moreover if after a refusal the material is submitted to a subsequent journal we must remember which manuscript is which.
  2. The publisher sends the article for review and returns the reviewers’ comments. We incorporate these into a post-review manuscript. This process may be iterative – the journal may send the revision for further review. Eventually we get a manuscript that the journal accepts.
  3. We get a “galley proof” of the article which we need to correct. This may be substantially different from (2). Some of the alterations are useful, some are counterproductive (one publisher insists on setting computer code in Times Roman). There are no page numbers. We make corrections and send this back.
  4. At some stage the paper appears. We are not automatically told when – some publishers notify us, some don’t. We may not even be able to read it – this has happened.
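
One way to remember which manuscript is which is to keep a small log of versions, keyed by stage and journal. The Python sketch below is only an illustration under stated assumptions: the class names, journal names, file paths and dates are all hypothetical, and it is not an existing tool.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class Stage(Enum):
    SUBMITTED = 1      # manuscript sent to the publisher
    POST_REVIEW = 2    # revised after the reviewers' comments
    GALLEY_PROOF = 3   # publisher's proof, with our corrections
    PUBLISHED = 4      # the version that finally appears


@dataclass
class Version:
    stage: Stage
    journal: str       # hypothetical journal name
    path: str          # hypothetical location in a shared repository
    recorded: date


@dataclass
class ManuscriptLog:
    title: str
    versions: list = field(default_factory=list)

    def record(self, stage, journal, path, recorded):
        """Note that a version for this stage and journal exists at this path."""
        self.versions.append(Version(stage, journal, path, recorded))

    def latest(self, stage, journal):
        """Return the most recently recorded version for a stage and journal, or None."""
        matches = [v for v in self.versions
                   if v.stage is stage and v.journal == journal]
        return max(matches, key=lambda v: v.recorded) if matches else None


# Hypothetical history: refused by one journal, then sent to a second.
log = ManuscriptLog("FOO project paper")
log.record(Stage.SUBMITTED, "Journal A", "foo/journal-a/submitted.tex", date(2007, 1, 10))
log.record(Stage.SUBMITTED, "Journal B", "foo/journal-b/submitted.tex", date(2007, 3, 2))
log.record(Stage.POST_REVIEW, "Journal B", "foo/journal-b/post-review.tex", date(2007, 5, 20))

found = log.latest(Stage.SUBMITTED, "Journal B")
print(found.path if found else "lost")
```

The point of keying on both stage and journal is that a refusal followed by resubmission elsewhere leaves two "submitted" manuscripts, and only one of them is the candidate for deposit.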

By this stage the original person managing the authoring has left us, and so has one of the co-authors. Maybe at this stage we are allowed to reposit something. Possibly (1). The original manuscript. But the author has left – where did they keep the document? It’s lost.
This is not an uncommon scenario – I think at DCC 2005 we were informed that 30+% of authors couldn’t locate their manuscripts. Yes, I am disorganized, but so are a lot of others. It’s a complex process and I need help. There are two sorts of help – human amanuenses and robot amanuenses. I love the former. Elin has suggested how she can help me with some of my back papers. Dorothea Salo wants to have a big bucket that everyone dumps their papers in and then she sorts it out afterwards (if I have got this right). But humans don’t scale. So how can robots help?
Well, we are starting to write our papers using our own repository. Not an IR, but an SVN repository. So Nick, Joe and I will share versions of our manuscripts in the WWMM SVN repository. Joe wrote his thesis with SVN/TeX and I think Nick’s doing the same. Joe thought it was a great way to do things.
The advantage of SVN is that you have a complete version history. The only disadvantage is that it’s not easy to run between institutions. I am not a supporter of certificates. And remember that not all our authors are part of the higher education system. In fact Google documents starts to look attractive (though the versioning is not as nice as SVN).
Will it work? I don’t know. Probably not 100% – we often get bitten by access permissions, forgetting where things are, etc. But it’s worth a try.
And if I were funding repositories I would certainly put resource into communal authoring environments. If you do that, then it really is a one-click reposition instead of the half-day mess of trying to find the lost documents.


One Response to Using our own repository

  1. I don’t scale, no… but if the “big bucket” approach works, I’d have a good shot at convincing library brass to give me more resources. The workflow is simple enough, just tedious.
    The cage I’m in is that I have no resources, which means I can’t attract deposits, which means nobody’s inclined to give me any resources… you see the vicious circle. I’m ready to try anything and everything to break out.
    And some of the work is automatable over time. Once you know a particular journal’s inclinations, pushing anything from that journal out of the bucket becomes a two-second decision instead of a ten-minute slog through SHERPA and publisher websites.
    And yes, yes, YES to communal authoring environments! What right has a repository to a final product it didn’t help with in the slightest?
