Rich Apodaca is an original member of the Blue Obelisk and has developed his own chemical authoring tool (ChemWriter, 299 USD). He’s just posted the rather enigmatic comment…
Metallocenes? Axial Chirality? Apache/MIT/BSD License? OpenOffice? GitHub?
I’m taking this to mean that he’s asking, slightly tongue-in-cheek, (a) about the power of C4W’s and CML’s representation of chemistry and (b) about the openness and community aspects of C4W. It’s actually an excellent opportunity to follow those up. Here’s a recent post…
The last installment in this series discussed the limitations in today’s molecular languages and how FlexMol is designed to overcome them. Although these limitations are clearly present theoretically, what’s the practical effect likely to be?
For the last two years, a series of articles highlighting specific examples from the current chemical literature has appeared here. Variously titled “How would your cheminformatics tool do this?”, “Can your cheminformatics tool do this?”, and “Cheminformatics Puzzler”, each entry featured an article from a mainstream chemistry journal in which SMILES, Molfile, CML, and/or InChI would be incapable of faithfully representing a centerpiece structure. The examples are taken from well-read journals in synthetic organic, natural products, and medicinal chemistry.
The purpose was not to bash these languages, but rather point to an important common set of limitations among them – a kind of groupthink if you will.
The fundamental problem with ‘standard’ molecular languages such as molfile, SMILES, InChI, and CML is their simplification of bonding and stereochemistry. Bonding is defined as an association between two atoms using two electrons. Stereochemistry is defined in terms of one or more chiral templates.
FlexMol takes a different approach. Bonding is defined in terms of systems of one or more pairs of atoms interacting with the cooperation of zero or more electrons. Stereochemistry is defined in terms of planes passing through atom pair axes.
As we shall see, this flexible system enables the faithful, lossless representation of almost any chemical substance consisting of a single, well-defined molecular entity.
There's a fundamental misunderstanding here about the role of CML (which is anything but groupthink). CML addresses the semantics of chemistry. I could reply – in the same lighthearted vein:
Zeolites? Clathrates? Block co-Polymers? HEPES buffer? transition states? gaussian logfile output? cell dimensions? multiplets? eigenvectors?
and assert that CML can deal with all of them and ChemWriter cannot.
In fact CML can deal with any of the examples that Rich has mentioned in his article because it is (a) extensible, (b) namespaced and (c) linked to ontologies. CML can add any properties to any of its primitives (atoms, bonds, etc.). It can define multicenter bonds and bonds between bonds. It has primitives for lines, planes, etc., which should be sufficient for representing any of the geometry mentioned. JUMBO can do geometry and algebra on these if required. It has a primitive for the electron. It can also hold 2D and 3D coordinates for atoms, so that it can represent the drawings of any of the species in Rich's diagrams.
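To make "extensible and namespaced" concrete, here is a minimal sketch in Python of a CML-style molecule in which a bond carries an extra annotation from a second namespace. The `flex` namespace URI and the `coordinatesTo` element are invented purely for illustration; they are not part of CML or of FlexMol. The CML element and attribute names (`molecule`, `atomArray`, `atom`, `bondArray`, `bond`, `atomRefs2`) are used as I understand the schema and should be checked against it.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"   # CML namespace URI (check against the schema)
FLEX_NS = "http://example.org/flexmol"     # hypothetical namespace, for illustration only

ET.register_namespace("cml", CML_NS)
ET.register_namespace("flex", FLEX_NS)


def q(ns, tag):
    """Return the namespace-qualified tag name that ElementTree expects."""
    return f"{{{ns}}}{tag}"


def build_molecule():
    """Build a tiny molecule: an iron atom and a C=C bond that coordinates to it."""
    mol = ET.Element(q(CML_NS, "molecule"), {"id": "m1"})
    atoms = ET.SubElement(mol, q(CML_NS, "atomArray"))
    for atom_id, element in [("a1", "Fe"), ("a2", "C"), ("a3", "C")]:
        ET.SubElement(atoms, q(CML_NS, "atom"),
                      {"id": atom_id, "elementType": element})
    bonds = ET.SubElement(mol, q(CML_NS, "bondArray"))
    bond = ET.SubElement(bonds, q(CML_NS, "bond"),
                         {"id": "b1", "atomRefs2": "a2 a3", "order": "2"})
    # Hypothetical extension: say that the bond as a whole interacts with a1.
    # A reader that knows nothing about the flex namespace can ignore this child.
    ET.SubElement(bond, q(FLEX_NS, "coordinatesTo"),
                  {"atomRef": "a1", "electrons": "2"})
    return mol


print(ET.tostring(build_molecule(), encoding="unicode"))
```

The point of the sketch is only that the extra information sits alongside the standard primitives under its own namespace, so it cannot collide with them.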
That some of these CML primitives are not used in practice is because, to be useful, there needs to be agreement between two or more people. If Rich wishes people to use FlexMol then either they all have to use his software or other vendors have to install FlexMol readers and writers. If he can show me a groundswell of users of FlexMol and if it appears useful for them to convert to CML then I'd be happy to give some pointers.
What would emerge is a set of primitives and ontology terms that is FlexMol-specific - in CML we call this a convention. There are already several conventions in CML - a typical example is the JSpecView convention for spectra. This requires that a spectrum contains data so that JSpecView can display it (in CML in general it's perfectly reasonable to have an empty spectrum). Another convention is CML-lite - a subset of primitives which are processable by default by C4W.
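A convention, in this sense, is a rule layered on top of the general language. As a rough sketch only, the check below treats a spectrum as displayable if it contains at least one non-empty `array` element. That rule is my stand-in for the idea and is not the actual JSpecView convention; the element names `spectrumData`, `xaxis`, `yaxis` and `array` are assumed and should be checked against the CML schema.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"  # CML namespace URI (check against the schema)

# A spectrum with no data: acceptable in general, but nothing to display.
EMPTY = '<spectrum xmlns="http://www.xml-cml.org/schema" id="s1"/>'

# A spectrum with x and y data arrays (element names are assumptions).
FILLED = (
    '<spectrum xmlns="http://www.xml-cml.org/schema" id="s2">'
    "<spectrumData>"
    '<xaxis><array dataType="xsd:double" size="2">1.0 2.0</array></xaxis>'
    '<yaxis><array dataType="xsd:double" size="2">10.0 20.0</array></yaxis>'
    "</spectrumData>"
    "</spectrum>"
)


def has_displayable_data(spectrum_xml):
    """Hypothetical convention check: at least one array holds some text."""
    root = ET.fromstring(spectrum_xml)
    arrays = root.iter(f"{{{CML_NS}}}array")
    return any((array.text or "").strip() for array in arrays)


for name, doc in [("empty", EMPTY), ("filled", FILLED)]:
    print(name, has_displayable_data(doc))
```

Both documents are well-formed, and only the second would satisfy the stricter, tool-specific rule.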
But because CML is semantic and because it uses ontologies, it can hold a very wide range of chemistry. If a processor does not understand some of it, then it simply passes it through without loss. Whether this changes the semantics can be decided by the ontology and, although that's at an early stage, the basic infrastructure works.
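The pass-through behaviour can be sketched with nothing more than a standard XML parser. In the example below the processor understands only `molecule`, `atomArray` and `atom`; the `x:plane` element and its namespace are made up to play the part of something it has never seen. This is an illustration of the principle, not how JUMBO or C4W are implemented.

```python
import xml.etree.ElementTree as ET

CML_NS = "http://www.xml-cml.org/schema"  # CML namespace URI (check against the schema)
KNOWN = {f"{{{CML_NS}}}{name}" for name in ("molecule", "atomArray", "atom")}

# The x:plane element and its namespace are hypothetical.
SOURCE = """<molecule xmlns="http://www.xml-cml.org/schema"
          xmlns:x="http://example.org/unknown" id="m1">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="O"/>
  </atomArray>
  <x:plane id="p1" atomRefs="a1 a2"/>
</molecule>"""


def process(xml_text):
    """Label the atoms we understand; leave everything else untouched."""
    root = ET.fromstring(xml_text)
    skipped = []
    for elem in root.iter():
        if elem.tag not in KNOWN:
            skipped.append(elem.tag)  # not understood: record it and move on
            continue
        if elem.tag == f"{{{CML_NS}}}atom":
            elem.set("title", f"atom {elem.get('id')}")
    return ET.tostring(root, encoding="unicode"), skipped


output, skipped = process(SOURCE)
print(output)
print("passed through:", skipped)
```

Note that "without loss" here refers to elements and attributes. ElementTree is free to rename namespace prefixes and, by default, drops comments, so the output is not byte-for-byte identical to the input.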
I appreciate that generations raised on FORTRAN-like formats (Mol) and implicit information (SMILES) will take time to migrate to XML-based, ontology-driven chemistry. But it's the only way forward that can cover mainstream chemistry, whether it be molecules, reactions, crystallography, nanotechnology, computation, spectra, or physical properties and their measurement.
Because chemistry is a lot more than organic molecules drawn pictorially...