CML provides semantic chemistry

[I have had to change machines temporarily, so this post is not ICE-enabled]

Henry Rzepa has commented on CML/Chem4Word:

The progress with Chem4Word is hugely impressive and important. But I would like to remind that the first steps to this were taken a long time ago now. In an article published in 2001, we [Henry, Michael Wright and PMR] set out much of the framework. I have taken the liberty of pasting the abstract of that effort here. In essence, we have moved from a very simple article markup (DocML) to the sophisticated one used by Microsoft, but the essence of transcluding interoperating and namespaced XML languages, the CML component of which is data rich, is preserved to the present.

“We report the first fully operational system for managing complex chemical content entirely in interoperating XML-based markup languages. This involves the application of version 1.0 of chemical markup language (CML 1.0) and the development of mechanisms allowing the display of CML marked up molecules within a standard web browser (Internet Explorer 5). We demonstrate how an extension to include spectra and reactions could be achieved. Integrating these techniques with existing XML compliant languages (e.g. XHTML and SVG) results in electronic documents with the significant advantages of data retrieval and flexibility over existing HTML/plugin solutions. These documents can be optimised for a variety of purposes (e.g. screen display or printing) by single XSL stylesheet transformations. An XML schema has been developed from the CML 1.0 DTD to allow document validation and the use of data links. A working online demonstration of these concepts, termed ChiMeraL, containing a range of online demonstrations, examples and CML resources such as the CML DTD and schema has been associated with this article via the supplementary material.”

I would note that IE 5 was a quite different beast from the present one, and that inevitably, our demonstrator called ChiMeraL, no longer functions! Would anyone like to offer to repair it?

Bachrach has also blogged on the topic, and a discussion is developing there.

[PMR] Henry is correct that this was the first full semantic chemistry publication (and it’s worth noting that – 8 years later – there are now publications emphasising “semantic” which go over much of the same ground). In fact the very first publication is:

http://acscinf.org/docs/meetings/216nm/216cinfabstracts.htm#33

33. THE COMPLETE CHEMICAL E-PUBLICATION.

Peter Murray-Rust, University of Nottingham, Nottingham, NG7 2RD, UK.

The development of new tools for use on the WWW is now extremely rapid. Even allowing for the current “hype” over XML (eXtensible Markup Language) and other protocols, it seems certain that most Web-based information systems will be adopting them. The announcement by major suppliers that they will be developing XML-based browsers and editors means that there will be a large number of affordable high-quality tools available very shortly. The goal is to make information available globally, in any discipline, and as easily as possible to authors and readers/users. Authors and tool developers will use documents and data from different domains that interoperate in a platform- and vendor-independent manner. The XML family of protocols will allow integrated documents and data for the first time. To support chemistry, we need to address specifically molecular problems. Although there are no agreed semantics for molecular data nor de facto standards, a starting point, Chemical Markup Language (CML) will be presented.

… which is now nearly 11 years old. At that meeting I presented an interactive semantic document for which I had to write almost all the software myself: molecular display, spectral display, tree-based navigation, etc. Most of that is now lost, although Henry had the foresight to archive some of it on a CD-ROM as part of one of his electronic conferences.

We are gratified that the design we worked out 10 years ago for CML 1.0 is still largely relevant today. CML covers the main areas of chemical publication – molecules, reactions, spectra, crystallography and compchem. These have all been extensively implemented and tested so we are clear that the design works. It is the only approach to managing all of these objects in a single schema.

Since that time there have been many enhancements to the design, some driven by experience and some by new web technology. We now make extensive use of RDF and ontologies to implement the CML dictionaries. We also regard CML as a set of microformats which can be used in an arbitrary or a controlled manner (for the latter we use the @convention attribute). There are extensive converters to and from CML and tools for display and – now – editing. There is no technical reason why it should not become the digital dialtone of the [chemical] web (Jon Bosak’s phrase for XML).
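To make the "controlled manner" concrete, here is a minimal sketch of how a program might read a small CML molecule and the convention it declares. The water fragment, the `summarise_molecule` helper and the choice of Python's standard `xml.etree.ElementTree` are assumptions made for illustration; they are not taken from Chem4Word or any CML toolkit, and the fragment has not been validated against the CML schema.

```python
import xml.etree.ElementTree as ET

# Namespace used by CML elements.
CML_NS = "http://www.xml-cml.org/schema"

# Illustrative fragment only: a water molecule that declares a convention.
CML_DOC = """\
<molecule xmlns="http://www.xml-cml.org/schema"
          xmlns:convention="http://www.xml-cml.org/convention/"
          id="m1" convention="convention:molecular">
  <atomArray>
    <atom id="a1" elementType="O"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="S"/>
    <bond atomRefs2="a1 a3" order="S"/>
  </bondArray>
</molecule>
"""


def summarise_molecule(cml_text):
    """Hypothetical helper: pull the id, convention, atoms and bonds out of one molecule."""
    root = ET.fromstring(cml_text)
    ns = {"cml": CML_NS}
    atoms = root.findall("cml:atomArray/cml:atom", ns)
    bonds = root.findall("cml:bondArray/cml:bond", ns)
    return {
        "id": root.get("id"),
        # The convention tells a consumer which rules the markup claims to follow.
        "convention": root.get("convention"),
        "elements": [atom.get("elementType") for atom in atoms],
        # atomRefs2 holds the two atom ids joined by the bond, separated by whitespace.
        "bonds": [tuple(bond.get("atomRefs2").split()) for bond in bonds],
    }


if __name__ == "__main__":
    print(summarise_molecule(CML_DOC))
```

A consumer could inspect the declared convention first and only then decide how strictly to interpret the rest of the element, which is roughly the difference between the arbitrary and the controlled use of the microformat.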

Until Chem4Word and ICE, the authoring of complete documents had to be done manually. We can now use these to create documents, include chemistry and add hyperlinks and – where there is open source code – some behaviour.
