CML on ICE – towards Open chemical/scientific authoring

Because WWMM had outages, my blogging is behind. I’d written a post on Peter Sefton’s ICE (Peter and I met at ETD2007 and immediately clicked), but WWMM went to sleep and I haven’t reposted it. Peter has beaten me to it.
ICE is a content authoring tool based on Open Office. It works natively with XML and Subversion. So it adds a dramatic aspect to document authoring – versioning with full community access and collaboration (if required). For example, if Peter and I write a paper about this we’d use the ICE server at University of Southern Queensland to store the versions. And of course as it’s Open Source anyone can set one up – it would be ideal for the Blue Obelisk community to author papers with.
But what catalysed this was the possibility of authoring theses. Students are looking for imaginative approaches, and many will be happy to be early adopters of this new technology. If the domain-specific components are in XML (or other standards) it becomes easy to integrate them into ICE. And it is fantastic to be able to revert to previous versions – I find Subversion easier than Word change management, for example.
So some points from PeterS’s post:

View this page as PDF


I mentioned before that at the ETD 2007 conference I met Prof Peter Murray-Rust. [1] We’re going to collaborate on adding support for CML, the Chemical Markup Language, to ICE, so that people can write research publications that include ‘live’ data.

[1] I’m just Petermr or PMR 🙂

Here’s a quick demo of the possibilities.
I went to the amazing Crystaleye service.

PMR: This is Nick Day’s site. We’d hoped to announce it formally a week or so ago but machine problems kept us back. But we’ll get some posts out this coming week. [We thank Acta Cryst/IUCr for a summer studentship which helped greatly to get it off the ground.]

The aim of the CrystalEye project is to aggregate crystallography from web resources, and to provide methods to easily browse, search, and to keep up to date with the latest published information.

Crystaleye automatically finds descriptions of crystals in web-accessible literature, turns them into CML and builds pages like Acta Crystallographica Section B, 2007, issue 03-00.
From that page I grabbed this two dimensional image of (C6H15N4O2)2(C4H4O6-2),

PMR: Minor point: This is just the anion – there is a separate image for the cation (the 3D structure below displays the cations as well).

There’s a Java applet on the page that lets you play with the crystal in 3D. Here’s a screenshot of the 3D rendering.

There’s lots more work to be done, but I thought I’d show how easy it is to make an ICE document that shows the 2D view for print, with the 3D view for the web, via the applet. Be warned, this may not work for you. The applet refuses to load in Firefox 2 for me, but it does work in Safari on Mac OS X. If you follow the ‘view this page in PDF’ link above you’ll see just the picture.

PMR: image and applet deleted here …

What’s happening here?
My initial hack is really simple. I grab the image and paste it into ICE like any other image, but then I link it to the CML source. I wrote a tiny fragment of Python in my ICE site to go through every page, and if it finds a link to a CML file containing an image, it adds code to load the CML into the Jmol applet. This is a kind of integration-by-convention, AKA microformat.
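To make the convention concrete, here is a minimal sketch of what such a fragment could look like. It is not the actual ICE code. The function name `add_jmol_applets`, the regular expression, the applet size and the location of `JmolApplet.jar` are all assumptions made for illustration, and a real site would probably use a proper HTML parser rather than a regex.

```python
import re

# Convention: an <a> whose href ends in .cml and whose only content is an <img>.
CML_IMAGE_LINK = re.compile(
    r'<a\s[^>]*href="(?P<href>[^"]+\.cml)"[^>]*>\s*<img\b[^>]*>\s*</a>',
    re.IGNORECASE,
)

# Assumed applet markup: the jar location and the 300x300 size are placeholders.
APPLET_TEMPLATE = (
    '<applet code="JmolApplet" archive="JmolApplet.jar" width="300" height="300">'
    '<param name="load" value="{href}">'
    '</applet>'
)


def add_jmol_applets(html):
    """Return html with a Jmol applet appended after each image linked to a CML file."""

    def append_applet(match):
        # The href comes from an existing attribute, so it is reused unchanged.
        return match.group(0) + APPLET_TEMPLATE.format(href=match.group("href"))

    return CML_IMAGE_LINK.sub(append_applet, html)


if __name__ == "__main__":
    page = '<p><a href="data/mol.cml"><img src="mol.png"></a></p>'
    print(add_jmol_applets(page))
```

The linked image is left in place, so a print or PDF rendering that ignores the applet still shows the 2D picture, while links and images that don't follow the convention pass through untouched.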

The main bit of programming only took a few minutes, but sorting out where to put the CML files and the Jmol applet, and integrating the changes into this blog took ages. I ended up putting the files here on my web site which meant putting a big chunk of stuff into subversion, something that should have been done ages ago, but the version of svn that runs on my other server refuses to do large commits over HTTPS ‘cos of some SSL bug and I can’t figure out how to update it which meant switching the repository to use plain HTTP, and so on. It wasn’t made easier by me mucking around with the Airport Extreme router and our ADSL modem at the same time, halting internet access at home for a couple of hours.
To make this integration a bit more usable and robust we want to:

  • Work out a workflow that lets you keep CML files in ICE and easily drop images in to your documents, letting ICE render using the applet when it makes HTML.
  • Integrate forthcoming work from Peter & team that will provide high quality vector graphics instead of the PNG files I’m using now.

PMR: I have now hacked JUMBO so it generates SVG images of 2D (and soon 3D) molecules. Note that this then allows automatic generation of molecular images in PDF files (through FOP/SVG).

  • Investigate embedding CML in an image format such as EPS that word processors understand.
  • Generalize this approach for other e-scholarship applications. We’re working with the Alive team at USQ on this.
  • Talk to the DART & ARCHER teams.

I am also extremely keen to talk to these teams – as they are doing very similar and complementary work to our SPECTRa and SPECTRA-T projects in capturing scientific data at source.
I am impressed by the Australian commitment to Open Access, Open Data and collaborative working. ICE is an excellent example of how we can split the load. ICE likes working with the technical aspects of documents (I don’t really, though I have to). The Blue Obelisk likes working with XML in chemistry. The two components naturally come together.
This is something I have been waiting for for about 12 years. We haven’t got there yet, but we are well on the way.

This entry was posted in "virtual communities", blueobelisk, chemistry, data, etd2007, programming for scientists.

2 Responses to CML on ICE – towards Open chemical/scientific authoring

  1. C. Anthony Lewis says:

    The link to CrystalEye seems to be broken. The link text, where it’s spelt out, is okay but the actual URL of the link isn’t… I think it should be

  2. pm286 says:

    Thanks – mended – not sure whether this came from us or Peter Sefton’s post.
