Mandate Theses in HTML

Peter Sefton has put forward a cogent case that all theses should be submitted as HTML. [If you are a PDF-junkie read the sentence carefully before howling. It doesn’t say only HTML.] He wishes USQ to…

Be the first university in Australia to mandate that theses are deposited in the institutional repository in HTML, with linked data and embedded semantics as well as the standard paper-on-screen PDF file.


I’ll [PTS] start with the theses. The Open Access movement is now well established, and USQ already has a mandate (1) that all theses are to be submitted electronically and to go into ePrints when the degree is conferred. This does help to make research available to the community that paid for it, but it is such a pity that in the web age we are still stuck with the paper view of a research output. Citations are not reliably machine readable, data sets are rarely made available, and if they are, they are not linked into the thesis. And worst of all, the thesis is not made available in HTML, where it is part of the fabric of the web. Can you imagine a university getting away with a web site which was PDF only? We certainly try not to deliver courses that way (2). In most web situations PDF is considered an accessibility barrier, and yet in the repository community it’s the main game.

There are some universities around the world with XML production systems for theses, where HTML should be available, but as far as I know none of them have achieved the level of automation that we have or spent as much effort on the semantic web in a way that will be usable by candidates. This is partly because most of the efforts have used complex XML schemas which are not a good match for word processing documents, whereas we target HTML, which is a reasonably good match for a generic styled word processing document.

So why are institutions in general not mandating that theses must be available as web pages?

Well, in most places that would be because it is too hard to do. Regrettably you can’t just save as HTML from Word and expect to get repository-quality web pages, or expect any-old LaTeX file to be magically web-ready. You can read me ranting on about that in this list of delicious links about how hard it is to make HTML from word processors. But at USQ, we have a not-so-secret weapon: ICE. The Integrated Content Environment is the core university system we use here to create our long-form courseware, a lot of which is very similar in size and structure to a thesis. With Jim Downing and team at Cambridge, we have shown on the ICE-Theorem (2) project how chemical theses can be created in ICE and published to the web complete with embedded chemical semantics and everyone’s favourite, the rotating molecule (they’re taking this much further with Chem4Word, which we will try to work with as well). Jim and I will be presenting that work at Open Repositories 2009. And we have a few other sample theses that show that we can produce rich web-based theses and still have the core part delivered as a printable PDF file.

We have the systems. We know it can be done. It’s a small institution. Let’s do it.

[PMR] I’ll be presenting the power of the semantic thesis at ETD2009 next month. There is no technical reason why theses cannot be deposited as HTML. (Alongside the PDF, of course). And when that’s in place we can deposit as Word or ODT. Then we will have a truly semantic thesis.

And, just to remind you for the n’th time, the universities can make their own rules here. They do, anyway. If someone has to have a gown of colour c, speak words in a foreign language, sign impenetrable declarations, etc., surely the production of the work in HTML, which babies now learn in their cradle, is possible. The students will hardly blink if you require HTML from them.

So just do it


2 Responses to Mandate Theses in HTML

  1. rpg says:

    Hell, I did this by hand in 1996. And of course there are XML tools which can do the bulk of the work. What do people think _publishers_ do?!

  2. Ah, revival of Jmol and JChemPaint…
    I don’t like the stress of HTML very much, though. Let’s make that XHTML + other namespaces? Can ICE handle MathML? What about RDFa? Surely it does CML? How does ICE visualize CML?
