Repositories: give us the tools

From Peter Sefton’s blog:

00:43 09/08/2007, Sefton
I have already mentioned this blog post lamenting the use of PDF instead of HTML in an online journal:

In short, choosing to use PDF rather than HTML tends to make the content less open than it otherwise could be. That feels wrong to me, especially for an open access journal! One could just about justify this approach for a journal destined to be published both on paper and online (though even in that case I think it would be wrong) but surely not for an online-only ‘open’ publication?
http://efoundations.typepad.com/efoundations/2007/08/open-online-jou.html

One of the commenters nails the issue:

Go find ’em a workflow that produces good HTML as well as PDF, and I’m sure they’ll sign right on.
Posted by: Dorothea Salo | August 06, 2007 at 01:54 PM

The workflow that produces good HTML as well as PDF is what we’re after with the ICE-RS project. I talked about the project in my paper for the ETD 07 conference. I use ICE to write this blog, and you get both HTML and PDF. And the e-Journal of Instructional Science and Technology (e-JIST) is published using ICE, so all its papers are available in both HTML and PDF. Anyone who wants help trying out ICE, please contact me.

Now why is that paper of mine only available in PDF at the moment?

It’s because it’s a real pain to add it to the Eprints software we use at USQ: you have to upload the HTML and all its images and so on, one at a time.

If you’re using other repository software, at least the stuff that’s commonly used in Australia, then you’re out of luck, as most of it doesn’t handle HTML at all.

It would help if the Open Access community and repository software publishers drove the adoption of HTML by making OA repositories first-class web citizens. Why isn’t it easy to put HTML into Eprints, DSpace, VITAL and Fez?

To do our bit, we’re planning to integrate ICE with Eprints, DSpace and Fedora later this year, building on the outcomes from the SWORD project. When that’s done I’ll update my papers in the USQ repository over the Atom Publishing Protocol interface that SWORD is developing.
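The contrast here is between uploading an HTML paper and each of its images one at a time and sending them as a single deposit over an Atom Publishing Protocol interface. The Python below is a minimal sketch of that idea, assuming a SWORD-style endpoint that accepts one zipped package via HTTP POST. It bundles an HTML file and its images into a zip in memory and prepares the request without sending it. The collection URL, file names and function names are hypothetical. A real repository would also require authentication and might expect a packaging header that is not shown here.

```python
import io
import urllib.request
import zipfile


def build_html_package(html_name, html_bytes, assets):
    """Bundle an HTML paper and its images into one in-memory zip file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(html_name, html_bytes)
        # Each image keeps its relative path so links in the HTML still resolve.
        for name, data in assets.items():
            zf.writestr(name, data)
    return buf.getvalue()


def build_deposit_request(collection_url, package, filename):
    """Prepare (but do not send) an AtomPub-style POST of the whole package."""
    headers = {
        "Content-Type": "application/zip",
        # Assumption: the endpoint reads the file name from Content-Disposition.
        "Content-Disposition": f"attachment; filename={filename}",
        # Slug is the AtomPub (RFC 5023) hint for naming the new resource.
        "Slug": filename,
    }
    return urllib.request.Request(
        collection_url, data=package, headers=headers, method="POST"
    )


# Hypothetical paper and endpoint, for illustration only.
package = build_html_package(
    "paper.html",
    b"<html><body><img src='images/fig1.png'/></body></html>",
    {"images/fig1.png": b"\x89PNG placeholder bytes"},
)
req = build_deposit_request(
    "https://repository.example.edu/sword/collection", package, "paper.zip"
)
# urllib.request.urlopen(req) would send it; authentication is omitted here.
```

The point of the design is that the repository receives the paper and everything it links to in one request, rather than as a series of separate uploads.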

PMR: PeterS is right. The time has come for a proper investment in tools. Filling repositories with PDFs is a very limited solution and it does nothing for data-driven science. At present if anyone asks me where they should reposit their data I’m tempted to tell them “in the Cloud” rather than in their repository.
HTML (XHTML) is the necessary first step. It will emphasize the need for structured documents, compound documents, structured document collections, etc. I’m looking forward to SWORD.
See also Peter Suber commenting on this:


  • I strongly support tools to improve the quality, handling, and professional uptake of HTML. The sooner we have HTML editions of scholarly eprints, next to or instead of PDF editions, the better. HTML and PDF files can both be OA, but HTML facilitates re-use of the content and PDF (deliberately) retards it.
  • “ICE-RS” stands for Integrated Content Environment for Research and Scholarship.

PMR: notice the little word “re-use”. Start practising how to say it. Then how to explain it. Then how to make it happen.

