Open Semantic Chemistry

In a reply to my post on Chem4Word, Egon Willighagen makes a valuable contribution (comment posted July 27, 2009 at 5:37 pm):

I think the cheminformatics community is seeing the value of semantics in chemical editing, and has understood that even closed-source products have shown serious evolution in this area. JChemPaint also followed the semantic path for a while, but does not have the advantage of tight integration in a production-phase editing tool that Chem4Word has. With the current market share of Word, this editor will see a quick uptake and bring semantic chemical editing to a new audience, that of organic chemists. This is positive, and anything drawn in this tool will be semantic and interoperate with other tools. That is positive too, even if many of us, like me, will not use the editor at all.

I agree (although predicting a quick uptake is an inexact science). He is also right that he will not use the tool directly. However, there are immediate spinoffs for the whole Open chemistry community, regardless of platform:

  • The system is modular. This means it does not have to be used in Word (although the benefits of creating a compound document will then be absent). There is an essentially standalone tool for the chemical manipulation of objects, which relies on WPF/XAML and C#. There is also a library of routines (.NUMBO) that is independent of everything except the C# language. To what extent C# will be a help or a hindrance in the Open chemistry world, I don't know.

  • The APIs have been designed to be largely platform- and language-independent. It's difficult to write completely independent APIs (such as CORBA IDLs), but the following signature is characteristic of the CID interface between the UI and the .NUMBO library:

    public static bool CanFlipAboutExternalAcyclicBond(
        ContextObject contextObject,
        IEnumerable<XElement> atomPointers)

The contextObject holds the complete state in CML, so a generic library (such as JUMBO) can implement these methods relatively easily. That means, inter alia, that the system can be used for batch processing of data without the need for graphics.
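As an illustration of that point, here is a minimal sketch, in Python rather than C#, of how a generic CML library might implement a CID-style check with no graphics at all. The function name `can_flip_about_external_acyclic_bond`, the use of atom ids in place of `XElement` pointers, and the rule for what counts as flippable (exactly two selected atoms, bonded to each other, with the bond not in any ring) are my assumptions for illustration; the real .NUMBO semantics may well differ.

```python
import xml.etree.ElementTree as ET

NS = {"cml": "http://www.xml-cml.org/schema"}


def bonds_from_cml(cml_text):
    """Return every bond in the CML context as an (atom_id, atom_id) pair."""
    root = ET.fromstring(cml_text)
    return [tuple(b.get("atomRefs2").split()) for b in root.iterfind(".//cml:bond", NS)]


def in_ring(bonds, a, b):
    """A bond a-b is cyclic if a can still reach b once that bond is removed."""
    adjacency = {}
    for x, y in bonds:
        if {x, y} == {a, b}:
            continue  # leave out the bond being tested
        adjacency.setdefault(x, set()).add(y)
        adjacency.setdefault(y, set()).add(x)
    seen, stack = {a}, [a]
    while stack:  # depth-first search from a
        node = stack.pop()
        if node == b:
            return True
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def can_flip_about_external_acyclic_bond(context_cml, atom_ids):
    """Hypothetical rule: two selected atoms, directly bonded, bond not in a ring."""
    if len(atom_ids) != 2:
        return False
    a, b = atom_ids
    bonds = bonds_from_cml(context_cml)
    if not any({x, y} == {a, b} for x, y in bonds):
        return False
    return not in_ring(bonds, a, b)


# Methylcyclopropane: a1-a2-a3 form the ring, a4 is the methyl carbon.
EXAMPLE = """<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="C"/>
    <atom id="a3" elementType="C"/>
    <atom id="a4" elementType="C"/>
  </atomArray>
  <bondArray>
    <bond atomRefs2="a1 a2" order="1"/>
    <bond atomRefs2="a2 a3" order="1"/>
    <bond atomRefs2="a3 a1" order="1"/>
    <bond atomRefs2="a1 a4" order="1"/>
  </bondArray>
</molecule>"""

print(can_flip_about_external_acyclic_bond(EXAMPLE, ["a1", "a4"]))  # True: acyclic bond
print(can_flip_about_external_acyclic_bond(EXAMPLE, ["a1", "a2"]))  # False: ring bond
```

Because the function takes only CML text and atom ids, the same check could be run in a loop over a directory of CML files as part of a batch job.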

Many of the components are declarative (in various flavours of XML) and hence language-independent. For example, the primary CML validation on import is done using a CML XML Schema and a Schematron validator, so the process could be ported trivially to any other language or platform through standard XML APIs.
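Real Schematron rules are XPath assertions held in an ISO Schematron document (in Python, lxml's `isoschematron` module can run them). As a standard-library-only stand-in, the sketch below keeps the rules as plain data and applies them with a tiny generic interpreter. The three rules are illustrative assumptions, not the actual CML-Lite Schematron; the point is that the rule table itself could be read just as easily from Java, C# or C.

```python
import xml.etree.ElementTree as ET

NS = {"cml": "http://www.xml-cml.org/schema"}

# Hypothetical rule table: pure data, so any language could interpret it.
RULES = [
    {"context": ".//cml:atom", "check": "has_attr", "attr": "id",
     "message": "every atom must have an id"},
    {"context": ".//cml:atom", "check": "has_attr", "attr": "elementType",
     "message": "every atom must have an elementType"},
    {"context": ".//cml:bond", "check": "refs_exist", "attr": "atomRefs2",
     "message": "bond atomRefs2 must point at existing atoms"},
]


def validate(cml_text, rules=RULES):
    """Apply each declarative rule to its context nodes; return failure messages."""
    root = ET.fromstring(cml_text)
    atom_ids = {a.get("id") for a in root.iterfind(".//cml:atom", NS)}
    errors = []
    for rule in rules:
        for node in root.iterfind(rule["context"], NS):
            value = node.get(rule["attr"])
            if rule["check"] == "has_attr":
                ok = value is not None
            elif rule["check"] == "refs_exist":
                ok = value is not None and all(r in atom_ids for r in value.split())
            else:
                raise ValueError(f"unknown check kind: {rule['check']}")
            if not ok:
                errors.append(rule["message"])
    return errors


GOOD = """<molecule xmlns="http://www.xml-cml.org/schema" id="m1">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="O"/>
  </atomArray>
  <bondArray><bond atomRefs2="a1 a2" order="2"/></bondArray>
</molecule>"""

BAD = """<molecule xmlns="http://www.xml-cml.org/schema" id="m2">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2"/>
  </atomArray>
  <bondArray><bond atomRefs2="a1 a9" order="2"/></bondArray>
</molecule>"""

print(validate(GOOD))  # []
print(validate(BAD))
```

Adding a new check means adding a row to `RULES` (and, for a genuinely new kind of check, one branch in the interpreter), rather than rewriting validation code in each language.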

XML is platform-independent (you do not have to worry about line endings, whitespace, etc.).
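The XML 1.0 specification requires parsers to normalize CR LF and lone CR line endings to a single LF, and the whitespace used to lay out CML between elements carries no chemical meaning. A minimal sketch with Python's standard ElementTree: the same two-atom fragment, written once with Unix conventions and once with Windows line endings and tab indentation, yields identical atom data. The fragment and the helper name `atoms` are illustrative.

```python
import xml.etree.ElementTree as ET

NS = {"cml": "http://www.xml-cml.org/schema"}


def atoms(cml_text):
    """Return (id, elementType) pairs; layout whitespace is simply not part of the data."""
    root = ET.fromstring(cml_text)
    return [(a.get("id"), a.get("elementType")) for a in root.iterfind(".//cml:atom", NS)]


UNIX = ('<molecule xmlns="http://www.xml-cml.org/schema">\n'
        '  <atomArray>\n'
        '    <atom id="a1" elementType="C"/>\n'
        '    <atom id="a2" elementType="O"/>\n'
        '  </atomArray>\n'
        '</molecule>\n')
# The same document with CR LF line endings and tabs instead of four spaces.
WINDOWS = UNIX.replace("\n", "\r\n").replace("    ", "\t")

print(atoms(UNIX) == atoms(WINDOWS))  # True
```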

The CML-Lite schema has been thoroughly refactored and fairly well tested, so we have a good, proven foundation for semantic chemistry.

And, above all, it will be Open. That means that the community will be able to contribute and benefit.

How can people benefit and contribute if they do not use Microsoft technology? To the extent that the chemical architecture is language-independent, we should be able to develop and refine the chemical algorithms and semantics independently of C#. At present we are hotly debating what is meant by ‘add a positive charge to an atom’, which I hinted at before. Think about the effect (i.e. the resulting formula and electron count) of each of the following:

  • add a + to the N in (CH3)3N

  • add a + to CH4

  • add a − to CH4

  • add a − to N=O

  • add a − to C6H6 (benzene)

  • add a − to Na

  • add a − to Na+

  • add a − to B in BH3

  • add a − to F in HF

  • Now consider what would happen if you had the option to add a radical (often denoted by .).

I doubt very much whether the chemistry community agrees completely on the results, other than that each probably contains a −, + and/or . glyph somewhere. But if we do not know how many electrons there are, or what the spin multiplicity is, we cannot submit the structure to a QM calculation.

For this reason I think the Open Chemistry community (and especially the Blue Obelisk community) can help systematize these declarative processes. My current position is that there are no universal valence rules, and that there needs to be a separate set of rules for each element, each with its own special cases. I suspect that much of this is implicit, and perhaps explicit, in Open Babel, CDK, JUMBO, Avogadro and other Open software. If we can extract these into a set of rules that are declarative (i.e. not expressed in a specific procedural language), then we can start to get semantic consistency in our tools.

Here are two more questions. What's the result of deleting one =O atom from each of the following:

  • CH3C(=O)CH3

  • CH3S(=O)CH3

  • CH3N(-O.)CH3

  • CH3-N(=O)

  • CH3-N(=O)=O

  • and are there any general rules?
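The charge, radical and deletion questions above can be made concrete with a small Python sketch of a declarative, per-element rule table interpreted by generic code. Everything in it is an illustrative assumption rather than CML-Lite, Open Babel, CDK or JUMBO behaviour: the table is deliberately incomplete, implicit hydrogens fill each atom up to the smallest allowed valence (similar in spirit to SMILES default valences), each radical uses up one valence, and molecules are plain dicts rather than CML for brevity. It also shows that the electron count follows directly from the formula and net charge (sum of atomic numbers minus charge), while the spin multiplicity does not.

```python
# Hypothetical declarative table: element -> formal charge -> allowed valences
# (bond orders + implicit H + radicals). Illustrative and deliberately incomplete.
VALENCE_RULES = {
    "H": {0: [1]},
    "B": {0: [3], -1: [4]},
    "C": {0: [4], 1: [3], -1: [3]},
    "N": {0: [3], 1: [4], -1: [2]},
    "O": {0: [2], 1: [3], -1: [1]},
    "F": {0: [1], 1: [2]},
    "S": {0: [2, 4, 6]},
}
ATOMIC_NUMBER = {"H": 1, "B": 5, "C": 6, "N": 7, "O": 8, "F": 9, "Na": 11, "S": 16}


def implicit_hydrogens(mol, atom_id):
    """Hydrogens needed to reach the smallest allowed valence, or None if no rule fits."""
    element, charge, radicals = mol["atoms"][atom_id]
    used = radicals + sum(order for a, b, order in mol["bonds"] if atom_id in (a, b))
    for valence in sorted(VALENCE_RULES.get(element, {}).get(charge, [])):
        if valence >= used:
            return valence - used
    return None


def formula(mol):
    """Element counts including implicit hydrogens; raises for unresolved atoms."""
    counts = {}
    for atom_id, (element, _charge, _radicals) in mol["atoms"].items():
        counts[element] = counts.get(element, 0) + 1
        h = implicit_hydrogens(mol, atom_id)
        if h is None:
            raise ValueError(f"no valence rule fits atom {atom_id} ({element})")
        if h:
            counts["H"] = counts.get("H", 0) + h
    return counts


def net_charge(mol):
    return sum(charge for _element, charge, _radicals in mol["atoms"].values())


def electron_count(counts, charge):
    """Total electrons: sum of atomic numbers minus the net charge."""
    return sum(ATOMIC_NUMBER[el] * n for el, n in counts.items()) - charge


def min_multiplicity(electrons):
    """An odd count forces at least a doublet; an even count merely permits a singlet."""
    return 2 if electrons % 2 else 1


def delete_atom(mol, atom_id):
    """Remove an atom and its bonds; neighbours regain hydrogens via the rule table."""
    atoms = {k: v for k, v in mol["atoms"].items() if k != atom_id}
    bonds = [b for b in mol["bonds"] if atom_id not in b[:2]]
    return {"atoms": atoms, "bonds": bonds}


def molecule(atoms, bonds):
    """atoms: id -> (element, formal charge, radical count); bonds: (id, id, order)."""
    return {"atoms": atoms, "bonds": bonds}


C, O = ("C", 0, 0), ("O", 0, 0)
ACETONE = molecule({"c1": C, "c2": C, "c3": C, "o": O},
                   [("c1", "c2", 1), ("c2", "c3", 1), ("c2", "o", 2)])
DMSO = molecule({"c1": C, "s": ("S", 0, 0), "c2": C, "o": O},
                [("c1", "s", 1), ("s", "c2", 1), ("s", "o", 2)])
NITROMETHANE = molecule({"c": C, "n": ("N", 0, 0), "o1": O, "o2": O},
                        [("c", "n", 1), ("n", "o1", 2), ("n", "o2", 2)])
TMA_BONDS = [("n", "c1", 1), ("n", "c2", 1), ("n", "c3", 1)]
TMA_H = molecule({"n": ("N", 1, 0), "c1": C, "c2": C, "c3": C}, TMA_BONDS)
TMA_RADICAL = molecule({"n": ("N", 1, 1), "c1": C, "c2": C, "c3": C}, TMA_BONDS)

print(formula(delete_atom(ACETONE, "o")))  # {'C': 3, 'H': 8}, i.e. propane
print(formula(delete_atom(DMSO, "o")))     # {'C': 2, 'H': 6, 'S': 1}
for m in (TMA_H, TMA_RADICAL):
    e = electron_count(formula(m), net_charge(m))
    print(e, min_multiplicity(e))
```

With this particular table, deleting the =O from CH3C(=O)CH3 gives propane and from CH3S(=O)CH3 gives dimethyl sulfide, while CH3-N(=O)=O is rejected outright because neutral N is only allowed valence 3; whether to add a pentavalent entry or insist on the charge-separated form is exactly the kind of special case the community would need to agree on. Likewise, adding a + to the N of trimethylamine gives either (CH3)3NH+ (34 electrons) or, if a radical is recorded, the radical cation (33 electrons, so at least a doublet).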


One Response to Open Semantic Chemistry

  1. I strongly support the need for a common set of chemical actions, and this is indeed what we implemented in Bioclipse. Take for example this script to ‘draw’ a molecule:
    http://gist.github.com/97248
    I’d welcome a project to set up an Open Standard for a language for editing chemical structures.
    BTW, JChemPaint does not support charges on whole molecules at this moment, and restricts itself to ‘formal charges on atoms’.
    Jim demoed Chem4Word in January, and I am impressed. In particular, I very much like the changing of representation, which allows one to share the connection table between different locations in the paper, where at some it may be merely collapsed onto a molecular mass or formula. That is certainly progress, and I would love to see the equivalent in OpenOffice and LaTeX.
