Chemistry and CrystalEye – 1

We are discussing with Chemspider how some of the content in CrystalEye can be transferred to Chemspider. What Antony Williams wants is primarily the chemical identity of the substance, expressed as a connection table which can be ingested into CS, checked for uniqueness and linked back to CrystalEye (and thereby the original literature). This is straightforward for some of the entries. It’s hard for others and impossible for some. I’ll explain some of the problems here but they include:

  • we don’t know what some of the compounds are
  • we cannot give a connection table for some of the compounds
  • the algorithmic generation of the connection table may not be “chemically useful” for some compounds.

To start with, let’s observe that the chemical identity of the compound is not normally (indeed, almost never) recorded in the CIF file. Yes, the CIF dictionary has a template for connection tables, but no-one uses it. (If they did, life would be easier.)
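If you want to check a CIF for yourself, here is a minimal sketch in Python. It assumes the connection-table items in the core CIF dictionary carry the `_chemical_conn_` prefix (for example `_chemical_conn_bond_atom_1`), which is my understanding of the dictionary rather than something taken from CrystalEye. The function name and both CIF fragments are made up for illustration, and this is a naive line scan, not a real CIF parser (it would be fooled by a data name sitting inside a multi-line text field).

```python
# Prefix assumed for the core CIF dictionary's connection-table items,
# e.g. _chemical_conn_bond_atom_1 (an assumption, see the text above).
CONN_PREFIX = "_chemical_conn_"


def connection_table_tags(cif_text):
    """Return the sorted connection-table data names found in CIF text."""
    tags = set()
    for line in cif_text.splitlines():
        stripped = line.strip()
        # Data names begin with an underscore; keep only the name itself.
        if stripped.startswith("_"):
            name = stripped.split()[0].lower()
            if name.startswith(CONN_PREFIX):
                tags.add(name)
    return sorted(tags)


# Made-up fragment of the usual kind: formula and coordinates, no bonds.
TYPICAL = """\
data_example
_chemical_formula_sum 'C18 H16 O6'
loop_
_atom_site_label
_atom_site_fract_x
O1 0.1
"""

# Made-up fragment showing what an author-supplied table could look like.
WITH_TABLE = """\
data_example
loop_
_chemical_conn_bond_atom_1
_chemical_conn_bond_atom_2
_chemical_conn_bond_type
1 2 sing
"""

print(connection_table_tags(TYPICAL))     # []
print(connection_table_tags(WITH_TABLE))  # the three _chemical_conn_bond_ names
```

An empty list is what I would expect from almost every published CIF, which is why the connection table has to be computed from the coordinates instead.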
Since Acta Crystallographica Section E covers all of chemistry, and since it’s now Open Access, it’s a good place to start looking at some examples. (Note also that if you want an overview of the range of CrystalEye, just subscribe to its RSS feed and you’ll get 10-50 structures per day, with structure diagrams when possible.) Here we’ll browse the 2008-05 issue. We get a table of contents with the first structure displayed in Jmol (3D coordinates) and CDK (2D coordinates, with bondTypes and connections from JUMBO/CIF2CML). The 2D structure looks like this:

(I’m temporarily unable to post images in WordPress so cannot snapshot the Jmol. But in any case Jmol is so lovely you should really try it yourself).
Now if we click on the first entry (grey bar) in the “article” column (view) we get linked through to the Acta Cryst article. And, since it’s Open Access, I can reproduce the whole page without having to ask permission. That gives me a warm feeling, and I am sure it does the same for the team at IUCr in Chester and also the authors. Open Access is so liberating. Anyway here’s the page:
======================================================

Acta Crystallographica Section E

Structure Reports Online

Volume 64, Part 5 (May 2008)


organic compounds


bg2176 scheme

Acta Cryst. (2008). E64, o899    [ doi:10.1107/S1600536808010696 ]

8-Hydroxy-5,6,7-trimethoxy-2-phenyl-4H-chromen-4-one

J. E. Theodoro, D. Santos, H. Pérez, M. F. das G. F. da Silva and J. Ellena

Abstract: In the title compound, C18H16O6, the benzopyran group is essentially planar, with the O atoms of the substituent groups lying close to its mean plane. The molecular conformation is governed by intramolecular interactions. The crystal packing is mainly determined by one classical intermolecular hydrogen bond which gives rise to the formation of an infinite chain along the a axis.
======================================================
So everything fits. Our proposed structure (from the CIF) corresponds exactly with the structure drawn by the authors (for non-chemists: the position of the double bonds in the left-hand benzene ring is arbitrary). And the name can be translated by human or machine to give the same structure.
So in this case we’d be 100% happy to submit the connection table to aggregators such as PubChem or Chemspider. There’s a simple map between the connection table and the crystal structure.
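To make “everything fits” a bit more concrete, here is a minimal sketch of one sanity check: take a heavy-atom connection table, fill in hydrogens from normal valences, and compare the resulting formula with the C18H16O6 given in the abstract. This is not the CrystalEye or JUMBO code. The bond list is hand-built by me from the compound name, the atom labels (chromenone numbering, with an `m` suffix for methoxy carbons and a `p` suffix for the phenyl ring) are my own invention, and the valence table only covers carbon and oxygen.

```python
from collections import Counter

# Normal valences for the two heavy elements in this molecule only.
VALENCE = {"C": 4, "O": 2}

# Hand-built heavy-atom connection table (label, label, bond order),
# written in one arbitrary Kekule form. Labels are illustrative.
BONDS = [
    # pyranone ring and fused benzene ring
    ("O1", "C2", 1), ("C2", "C3", 2), ("C3", "C4", 1), ("C4", "O4", 2),
    ("C4", "C4a", 1), ("C4a", "C5", 2), ("C5", "C6", 1), ("C6", "C7", 2),
    ("C7", "C8", 1), ("C8", "C8a", 2), ("C8a", "C4a", 1), ("C8a", "O1", 1),
    # three methoxy groups and one hydroxy group
    ("C5", "O5", 1), ("O5", "C5m", 1),
    ("C6", "O6", 1), ("O6", "C6m", 1),
    ("C7", "O7", 1), ("O7", "C7m", 1),
    ("C8", "O8", 1),
    # phenyl substituent on C2
    ("C2", "C1p", 1),
    ("C1p", "C2p", 2), ("C2p", "C3p", 1), ("C3p", "C4p", 2),
    ("C4p", "C5p", 1), ("C5p", "C6p", 2), ("C6p", "C1p", 1),
]


def hill_formula(bonds):
    """Formula in Hill order, with hydrogens implied by unused valence."""
    used = Counter()
    for a, b, order in bonds:
        used[a] += order
        used[b] += order

    counts = Counter()
    for label, bond_sum in used.items():
        element = label[0]  # the first character of a label is the element
        counts[element] += 1
        counts["H"] += VALENCE[element] - bond_sum

    # Hill order: carbon, hydrogen, then the rest alphabetically.
    rest = sorted(e for e in counts if e not in ("C", "H"))
    ordered = [e for e in ["C", "H"] + rest if counts[e] > 0]
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in ordered)


print(hill_formula(BONDS))  # C18H16O6
```

A matching formula does not prove the connection table is right, but a mismatch is a cheap early warning that the bond orders or the atom list have gone wrong.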
But it’s not always that simple…
… and later posts will tell you why.
