Scholarly HTML – what we are hoping for

#scholarlyhtml

We’ve already started the ScholarlyHTML event (with Peter Sefton’s presence), but tomorrow we start to ramp up with a session presented by Peter Sefton, Martin Fenner and Brian McMahon. (There’ll also be Simon Hodson and others, including me.) We then have a hackfest weekend, and Peter stays with us for the whole of next week, with more hacking taking place. So what’s it all about?

For me it’s about returning freedom-of-publication to authors. To publish what they want, and how they want.

Can’t they do that already? No. A major role of publishers is to restrict the flow of publications. Some refuse 90% of the publications they are sent. And at the other end they work hard to restrict the readership with paywalls, legal teams, etc. And a major casualty of this is the author. The author has virtually no freedom in how they publish. This is nothing new in creative arts where patrons insist on conditions unacceptable to authors and artists, but science is different. We are the patrons and the publishers are taking our money.

The problem is that this impacts on the “service” to authors. Authors get told that they have to publish in ways that suit the publisher. I’ve had personal experience of this. In one case Henry Rzepa fought our paper through the most Kafkaesque system ever devised. The more recent case was when I wanted to publish a paper using HTML.

HTML? What’s that?

Well it’s the language that the rest of the world uses to create and publish electronic material – websites, adverts … It’s universal. It was designed to communicate science (it happens to sell insurance as well, but science was its motivation). It’s easy to author. Even if you don’t like pointy brackets there are zillions of free / open tools for creating HTML. So it’s obvious that scientists should use HTML for publishing.

So I was invited by a publisher a few years back to contribute to a themed issue. I asked the editor “could I publish in HTML?” “Yes”, they said. So I created my manuscript on the basis that I could use HTML. It’s got the great advantage that you can lay things out where you want, it resizes, you can embed interactive objects (e.g. molecules), etc. I checked at regular intervals – I think I sent 50 emails for this one paper.

I came to submit it. The publisher refused. Well, not the publisher, but its publication robot. This is as friendly as a robot salesperson. There was no way I could submit this paper. I contacted the editor but was told I had to create it in Word. Converting my HTML to Word destroyed all my work. Half the figures couldn’t even be included. The final paper was a disaster.

I am not alone in wanting to publish in a plastic medium such as HTML. Many people do their slides in HTML. It’s plastic, fluid and semantic.

So this event is about returning to our basics.

HTML is completely suitable for all forms of modern scientific publication

It was good in the beginning, and now with HTML5 and various W3C and other specs and tools it’s all we need and all we should need. So Scholarly HTML is about reclaiming our right to express ourselves. It’s about authors.

Here are some of the things that HTML can do:

  • Embed a wide range of non-textual objects
  • Provide a machine-validatable specification (whether XHTML, XML, RDF or other)
  • Provide a manifest of what is being submitted
  • Act as a reading and writing environment

There is no reason why students shouldn’t write their theses in HTML. It’s more powerful than any other format and will allow the students and the examiners to agree on what has been submitted. There’s no reason why manuscripts should not be submitted in HTML, reviewed in HTML, processed and edited in HTML and read in HTML.

In a hackfest ideas arise naturally so we don’t want to be too prescriptive, but we have some initial starting points:

  • It’s about authors (not reviewers, or metric-weenies, or backroom production)
  • It’s platform- and tool-chain independent. There must be a toolchain but there can be (and are) many solutions.
  • ScholarlyHTML is declarative. Declarative means you state what something is, not how it is processed. The HTML exists independently of the tools. A molecule is a molecule regardless of how it looks. A table is a table. An author is an author. The declarative nature is probably the central technical core of ScholarlyHTML.
  • It’s Open. It comes from the community, not from a digital neo-colonialist. HTML was not just a markup language, it was a major blow for Freedom. We’ve lost some of the freedom. HTML was and is subversive technology.
  • It’s communal. HTML always envisaged communal activities but it’s taken a little while for good tools to arise. Now we have them. So HTML is publicly read/write. Wikis, blogs, shared docs all are communal HTML readwriters.
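
To make the declarative point concrete, here is a minimal sketch in Python. It is only an illustration, not an agreed ScholarlyHTML convention: the `data-type` attribute, its values, the sample documents and the `declared_items` helper are all assumptions invented for this example. The idea is that a tool reads what each element is declared to be and ignores how it is styled.

```python
from html.parser import HTMLParser

# Two renderings of the same content. The data-type attribute and its values
# are hypothetical; they are not part of any ScholarlyHTML specification.
PLAIN = """
<p>By <span data-type="author">A. N. Author</span>.</p>
<p>We studied <b data-type="molecule">benzene</b>.</p>
"""
STYLED = """
<h2 data-type="author" style="color:red">A. N. Author</h2>
<em data-type="molecule">benzene</em>
"""


class DeclarationCollector(HTMLParser):
    """Collects (declared type, text) pairs and ignores presentation."""

    def __init__(self):
        super().__init__()
        self.items = []   # finished (type, text) pairs
        self._open = []   # stack of [tag, type, text parts] for declared elements

    def handle_starttag(self, tag, attrs):
        declared = dict(attrs).get("data-type")
        if declared:
            self._open.append([tag, declared, []])

    def handle_data(self, data):
        # Text belongs to the innermost declared element, if there is one.
        if self._open:
            self._open[-1][2].append(data)

    def handle_endtag(self, tag):
        # Simplification: assumes a declared element does not contain
        # another element with the same tag name.
        if self._open and self._open[-1][0] == tag:
            _, declared, parts = self._open.pop()
            self.items.append((declared, "".join(parts).strip()))


def declared_items(html):
    """Return the declared (type, text) pairs found in an HTML string."""
    collector = DeclarationCollector()
    collector.feed(html)
    collector.close()
    return collector.items


print(declared_items(PLAIN))
print(declared_items(STYLED))
```

Both calls give the same author and molecule, even though the two documents use different tags and styling. That is the sense in which the HTML exists independently of the tools.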

And the science…

  • It depends who is there but we shall definitely have some molecules, some crystal structures, hopefully some compchem.
  • One idea is to create a toolchain for writing and assembling theses. A validatable checklist.
  • Another is to create a data-journal – probably in crystallography
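
As a rough idea of what a validatable checklist for a thesis might look like, here is a small Python sketch. Everything specific in it is an assumption made up for illustration: the required section ids, the shape of the manifest (a plain list of file names) and the `check_thesis` helper. A real toolchain would agree these with students and examiners.

```python
from html.parser import HTMLParser

# Hypothetical checklist: these section ids are assumptions for this sketch.
REQUIRED_SECTIONS = {"abstract", "methods", "references"}


class ThesisScanner(HTMLParser):
    """Records the element ids and the image files a document refers to."""

    def __init__(self):
        super().__init__()
        self.ids = set()
        self.resources = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if attributes.get("id"):
            self.ids.add(attributes["id"])
        if tag == "img" and attributes.get("src"):
            self.resources.add(attributes["src"])


def check_thesis(html, manifest):
    """Return a list of problems; an empty list means the checklist passes."""
    scanner = ThesisScanner()
    scanner.feed(html)
    scanner.close()
    problems = []
    # Every required section must be present as an element id.
    for section in sorted(REQUIRED_SECTIONS - scanner.ids):
        problems.append(f"missing section: {section}")
    # Every embedded file must be listed in the submitted manifest.
    for resource in sorted(scanner.resources - set(manifest)):
        problems.append(f"not in manifest: {resource}")
    return problems


THESIS = (
    '<section id="abstract">Summary</section>'
    '<section id="methods"><img src="fig1.png"></section>'
)
for problem in check_thesis(THESIS, manifest=["fig1.png"]):
    print(problem)
```

With this sample the only complaint is the missing references section. Because the check runs on the HTML itself, the student and the examiner can both run it and agree on what has been submitted.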

So will we change the world? The omens are good – scholarly publishing technology is so far behind what the rest of the world is doing that it cannot last in its present state. When something that makes sense comes along people will change to it. And when enough people are using it, then the rest of the world has to take notice.

So:

  • Authors have a right to author in HTML
  • There is a burden that we should do it responsibly and we’ll address this. We need conventions and styles that make processing straightforward and robust. But it’s technically possible. We are not being unreasonable.

Join us – and help to make history.

 


8 Responses to Scholarly HTML – what we are hoping for

  1. Pingback: Beyond the PDF: Some ideas for document formats and authoring tools « ptsefton

  2. Pingback: Scholarly HTML: new approaches to authoring Scientific Papers « ptsefton

  3. lou burnard says:

    in the humanities people are increasingly thinking about scholarly publishing using the text encoding initiative … a fairly well known XML vocabulary for representation of scholarly text . see http://www.tei-c.org

    • pm286 says:

      Greetings Lou,
      I used to be very familiar with TEI some years ago.
      There is no incompatibility between ScHTML and TEI-XML – we can embed islands of practice either directly or by link. TEI is an excellent example of a community of practice that can identify its material through a unique URI and give the world instructions in how to create and process TEI. ScHTML will be a means to greater awareness in both directions

  4. Claudia Koltzenburg says:

    good discussion of “HTML is completely suitable for all forms of modern scientific publication” here:
    http://friendfeed.com/claudiakoltzenburg/a4e61e20/html-is-completely-suitable-for-all-forms-of

  5. Josh says:

    The Scholarly HTML idea seems very promising. A key obstacle I see is that, at present, HTML looks horrible compared to my nice LaTeX-generated PDF, especially when math is involved. HTML/CSS/MathML/SVG, with attendant layout algorithms and renderers, simply do not provide the same visual quality as traditional systems. Until that changes, I will be reluctant to embrace Scholarly HTML in spite of its many advantages.

    • Josh says:

      For example, load up http://www.mozilla.org/projects/mathml/demo/texvsmml.xhtml in a recent version of Firefox and compare the TeX output to the MathML output.

    • pm286 says:

      >>The Scholarly HTML idea seems very promising. A key obstacle I see is that, at present, HTML looks horrible compared to my nice LaTeX-generated PDF,
      I find quite the reverse. The “prettiness” doesn’t worry me. HTML has advantages over PDF:
      * It wraps when scaled
      * It can be cut and pasted
      >>especially when math is involved. HTML/CSS/MathML/SVG, with attendant layout algorithms and renderers, simply do not provide the same visual quality as traditional systems.
      I have used your link http://www.mozilla.org/projects/mathml/demo/texvsmml.xhtml and I actually prefer the HTML because it scales beautifully, whereas the PDF just gets large and fuzzy. I expect the same is even more true with tables.
      >> Until that changes, I will be reluctant to embrace Scholarly HTML in spite of its many advantages.
      That’s a pity. You are obviously in a field where presentation is more important than formal correctness. In chemistry, for example, we are happy to make do with whatever molecular drawing tool is available and we are more concerned about whether the structure is correct than whether it is pretty.
