petermr's blog

A Scientist and the Web


Scholarly HTML – major progress


Our extended hackfest over the last 3 days has made huge progress towards Scholarly HTML. We will be posting reports on the #beyondthepdf lists and continually updating Etherpads, wikis, code pages, etc. Our current starting place is

Anyone is free to contribute – we ask only that you identify yourself. There is an FAQ. If you have a question, simply ask it there in the current style. The Etherpad keeps a complete history of ALL changes, so any edit can be rolled back.

We’ll be posting more details as fast as I and others can, but here are some basic details:

We held an introductory session in Chemistry on Friday afternoon with presentations from Martin Fenner, Peter Sefton and Brian McMahon. REALLY valuable in setting the scene. Then dinner. On Saturday about 10 physical and 6 virtual attendees started from ca 10 o’clock. Lunch at the Alma. Dinner at the Panton. Sunday: hack through morning and snack through lunch till ca 1500 – Martin and Brian leave; others continue with citations-references. Dinner at M-R house. Flop…

Physical attendees:

Peter Murray-Rust

Peter Sefton

Lezan Hawizy

Dan Hagon

Brian McMahon

Martin Fenner

Sam Adams

Mark MacGillivray

David F. Flanders

David Jessop

Cameron Neylon

Nick England


Virtual attendees:

Egon Willighagen (Stockholm Area/Sweden)

Jakob Voss (Hamburg/Germany)

Aaron Culich (San Francisco/USA)

Mark Hahnel (London/UK)

Claudia Koltzenburg (Hamburg/Germany)

Graham Steel


Outcomes (VERY brief):

Scope of ScHTML. It’s a community-based activity, re-using best practice, but with minimal entry barriers. IOW it can apply to almost every public activity in scholarship (research, education, reference, record). It is not just for submitting publications – it can manage student essays, lab notebooks, Wikipedia entries, chemical databases, etc. See the FAQs; YOU can contribute questions or suggested answers.

Technology of ScHTML. The minimum entry is simply to be able to create modern well-formed HTML5. Everything beyond that is done by evolving agreements (“conventions”). It can, if necessary, be edited by hand, though we are obviously keen to create a flexible toolchain. In practice this is often already present and a matter of identifying current good tools and good practice.
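As a rough illustration of how low that entry barrier is, here is a minimal sketch of a well-formedness check. It assumes "well-formed" is read in the XML sense (every tag closed and properly nested, as in XHTML-style HTML5), which is stricter than what HTML5 parsers tolerate. The function name and the two sample pages are made up for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample pages, invented for this sketch.
GOOD_PAGE = """<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><meta charset="utf-8"/><title>Lab notebook entry</title></head>
  <body><p>Yield was 42%.</p></body>
</html>"""

# The <p> element is never closed, so an XML parser rejects this page.
BAD_PAGE = "<html><body><p>Unclosed paragraph</body></html>"


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, i.e. all tags are
    closed and correctly nested. This is an assumption about what
    'well-formed' means here, not a full HTML5 validity check."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print("good page:", is_well_formed(GOOD_PAGE))
    print("bad page:", is_well_formed(BAD_PAGE))
```

A check like this only tests syntax. Whether a document follows a particular convention is a separate question for that convention's own tools.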

Social and political aspects. ScHTML is not owned by any one institution or person – it is owned by you. (Wikipedia is quite a good, but not perfect, analogy.) Anyone can contribute and their influence is based on the value of their contribution. If someone wants to do something new they can – and the convention structure means that they cannot “break” other parts of the effort. ScHTML is an evolving ecosystem constrained only by the acceptability of the ideas and the ability to create the tools and distribute them. ScHTML is guided by a set of principles, which are as yet only partly formed – they will form an evolvable “constitution”. We are informed by the IETF mantra “rough consensus and running code”.

Conventions. The ability to “do your own thing” is governed by “conventions”. These are sandboxes of practice identified by a unique Identifier (URI) and with a description of the convention. In a convention participants have complete freedom to create their own ScHTML infrastructure, governed only by the constraints of HTML5. If they wish to develop complex objects (e.g. scientific articles) they will have to work out how to create objects, edit them, disseminate them, search them and display them. Their success will depend on the strength of the community, the available toolset, the ability to write their own, their advocacy and the innate compulsion of their activity.
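To make the idea of URI-identified conventions concrete, here is a small sketch of how a tool might discover which conventions a document claims to use. The post does not say how a document declares its conventions, so the `data-convention` attribute, the `example.org` URIs and all names below are purely hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical document: the data-convention attribute and the
# example.org URIs are assumptions made for this sketch only.
SAMPLE = """<!DOCTYPE html>
<html>
  <body>
    <section data-convention="http://example.org/conventions/citations">
      <p>References go here.</p>
    </section>
    <div data-convention="http://example.org/conventions/molecules">
      <p>Molecule data goes here.</p>
    </div>
  </body>
</html>"""


class ConventionCollector(HTMLParser):
    """Record (tag, URI) for every element carrying the
    hypothetical data-convention attribute."""

    def __init__(self):
        super().__init__()
        self.conventions = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "data-convention" and value:
                self.conventions.append((tag, value))


def conventions_in(markup: str):
    """Return the list of (tag, convention URI) pairs found in markup."""
    collector = ConventionCollector()
    collector.feed(markup)
    collector.close()
    return collector.conventions


if __name__ == "__main__":
    for tag, uri in conventions_in(SAMPLE):
        print(f"<{tag}> follows {uri}")
```

Because each sandbox is named by its own URI, a tool that does not recognise a convention can simply skip those elements, which is one way the "cannot break other parts" property could be kept.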

Packaging. The major drawback of HTML is that it does not have a universal packaging format (e.g. a page with embedded PNGs cannot be saved or transmitted in a universally safe and recognizable way). This is one reason why DOC and PDF are often used – not for their format, but because they wrap several objects. This is currently the greatest challenge for ScHTML. Should we re-use existing formats (e.g. ePub) or develop our own? There are social and technical pluses and minuses for most ways forward.
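For illustration only, the simplest "wrap several objects" approach is a plain ZIP archive holding the page and its images. The sketch below is not a proposal for the ScHTML package format and does not produce a valid ePub (which has extra requirements such as a mimetype entry and a manifest). The file names and the placeholder PNG bytes are invented.

```python
import io
import zipfile

# Hypothetical page plus one image. The PNG bytes are only the file
# signature, a placeholder rather than a real image.
PAGE_FILES = {
    "index.html": b'<!DOCTYPE html><html><body>'
                  b'<img src="figure1.png" alt="Figure 1"/>'
                  b'</body></html>',
    "figure1.png": b"\x89PNG\r\n\x1a\n",
}


def package(files: dict[str, bytes]) -> bytes:
    """Bundle named byte strings into a single in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def unpack(blob: bytes) -> dict[str, bytes]:
    """Recover the original name -> bytes mapping from a ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


if __name__ == "__main__":
    blob = package(PAGE_FILES)
    print(len(blob), "bytes,", sorted(unpack(blob)))
```

The round trip keeps the page and its image together as one transmittable object. It leaves open the harder questions of how a reader knows which file is the entry point and which conventions apply.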

Exemplars. We shall create exemplars that are, from the start, both technically and socially ScHTML. The first activity is on Citations (References) and their management within ScHTML documents. But if you want to set up a convention – anything from molecules to mountains to midges – you can do so. The only criterion is that you adopt the very simple syntax of HTML and the convention mechanism.

Citations. We have created the first draft of a convention for citations (again, anyone can contribute). We are creating an example of how a modern scholarly document would use this convention to manage its references. We are adapting existing Open technology to create a new “reference manager” and we hope that RSN we shall start to remove the current hideous and unnecessary waste of time in “formatting” and, even worse, “reformatting” references. If you want to get “into the spirit and practice” of ScHTML, take part in this activity.
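The point about "reformatting" can be sketched in a few lines: if a reference is stored once as structured data, each display style becomes a rendering function and nothing has to be retyped. This is not the draft citation convention itself. The field names, the two styles and the placeholder reference below are all assumptions for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reference:
    """A reference held as data; the field names are hypothetical."""
    authors: tuple[str, ...]
    year: int
    title: str
    journal: str


def render_author_year(ref: Reference) -> str:
    """Render in a simple author-year style."""
    authors = "; ".join(ref.authors)
    return f"{authors} ({ref.year}). {ref.title}. {ref.journal}."


def render_numbered(ref: Reference, number: int) -> str:
    """Render in a simple numbered style."""
    authors = ", ".join(ref.authors)
    return f'[{number}] {authors}, "{ref.title}", {ref.journal}, {ref.year}.'


# Placeholder data, not a real publication.
REF = Reference(
    authors=("A. Author", "B. Author"),
    year=2011,
    title="Example title",
    journal="Journal of Examples",
)

if __name__ == "__main__":
    print(render_author_year(REF))
    print(render_numbered(REF, 1))
```

Switching a whole document from one style to the other then means calling a different function over the same data.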

Will it work?

That’s up to YOU. Wikipedia works – and it’s an excellent lead to follow. This is not a moonshot – it’s primarily about doing simple sensible things that already exist. It works best if you do it in the spirit of helping others to use and re-use your work, but it can equally well be used for private material or protected commercial activity.



