A really useful post from Noel O’Blog about chemical depiction and structure diagram generation (SDG). Drawing the chemical structure of compounds as “2D diagrams” is often the most important way of communicating chemical information. There is a gradually growing realisation that diagrams need to be clear and use consistent conventions (I have been involved with IUPAC in this activity).
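There are two aspects, as Noel shows clearly: depiction, where the coordinates are already known, and structure diagram generation, where they are not. Taking depiction first: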
- if you know which atoms are connected, and you know where to put them on the page, then how do you draw the best/most useful diagram? Among the things you are allowed to alter are:
  - the font size and color
  - the width of lines
  - the color of bonds
  - whether hydrogen atoms are shown or not
  - how aromatic rings are drawn (double bonds or circles)
  - where charges should be located
  - how close bond lines should approach atoms
  - what happens when lines cross
  - how to depict stereochemistry
  - where exactly to position double bonds (inside rings, inside and outside, mitred, etc.)
- what you must not do is alter the position of atoms.
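The rule above (restyle freely, never move atoms) can be sketched in a few lines of Python. This is only an illustration and is not taken from any of the toolkits Noel tested: the names `Style`, `bond_segment` and `depict` are hypothetical, the output is a bare list of SVG elements, and only non-carbon atoms get a label. The coordinates are read but never modified, while line width, color, font size and the gap between a bond line and an atom label are all style settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    """Hypothetical depiction settings; none of them touch the coordinates."""
    line_width: float = 1.5
    bond_color: str = "black"
    font_size: float = 12.0
    label_gap: float = 0.25  # how far a bond line stops short of a labelled atom


def bond_segment(p, q, gap_p, gap_q):
    """Return the trimmed end points of the line for bond p-q, or None."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length = (dx * dx + dy * dy) ** 0.5
    if length == 0 or gap_p + gap_q >= length:
        return None  # atoms coincide or the gaps swallow the whole bond
    ux, uy = dx / length, dy / length
    start = (p[0] + ux * gap_p, p[1] + uy * gap_p)
    end = (q[0] - ux * gap_q, q[1] - uy * gap_q)
    return start, end


def depict(coords, symbols, bonds, style=Style(), scale=40.0):
    """Turn fixed 2D coordinates into SVG <line> and <text> elements."""
    parts = []
    for i, j in bonds:
        # Carbon atoms are unlabelled, so bonds run right up to them.
        gap_i = 0.0 if symbols[i] == "C" else style.label_gap
        gap_j = 0.0 if symbols[j] == "C" else style.label_gap
        segment = bond_segment(coords[i], coords[j], gap_i, gap_j)
        if segment is None:
            continue
        (x1, y1), (x2, y2) = segment
        parts.append(
            f'<line x1="{x1 * scale:.1f}" y1="{y1 * scale:.1f}" '
            f'x2="{x2 * scale:.1f}" y2="{y2 * scale:.1f}" '
            f'stroke="{style.bond_color}" stroke-width="{style.line_width}"/>'
        )
    for (x, y), symbol in zip(coords, symbols):
        if symbol != "C":
            parts.append(
                f'<text x="{x * scale:.1f}" y="{y * scale:.1f}" '
                f'font-size="{style.font_size}" text-anchor="middle">{symbol}</text>'
            )
    return "\n".join(parts)


# Heavy atoms of ethanol with made-up coordinates, drawn in two styles.
coords = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.0)]
symbols = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]
plain = depict(coords, symbols, bonds)
bold_blue = depict(coords, symbols, bonds, Style(line_width=3.0, bond_color="blue"))
```

The two calls produce different drawings of the same molecule from the same, untouched coordinate list, which is the whole point of keeping style separate from geometry.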
It must be clear what the compound is: correctness is more important than beauty. A major problem arises when atoms are very close together, because it can be difficult to distinguish the atoms and there are often spurious “rings”. There is no single correct answer, but it’s worth looking at some of Noel’s collection of molecules drawn by different programs. The molecules are randomly taken from PubChem (so probably don’t exercise the inorganic features). Here’s the post:
Now for some pretty pictures as well as some not so pretty. Yes, it’s the turn of the structure diagram generators (SDGs) to strut their stuff and throw some shapes. How do they perform for 100 random compounds from PubChem?
Here are my [NO’B] results for depiction and structure diagram generation […]
(0) Rich Apodaca has written an overview of Open Source SDGs.
(1) 2D coordinate generation is independent of depiction. A SDG typically has both parts but coordinates could be generated with one toolkit and depicted with another.
(2) Looking good is not the same as chemical accuracy. But looking good is important too! 🙂
[…](5) The PubChem images appear to be generated by an OpenEye product (for sure, the coordinates are). I don’t know what version.
[…](7) It is important to consider how to handle hydrogens. With OASA, I just drew all the hydrogens. This is probably not a good idea.
(10) PubChem entries with more than 1 connected component were not included in this test. (As a result, the number of molecules shown is actually less than 100.)
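Point (10) amounts to a connectivity test on the molecular graph. The post does not say how the entries were actually filtered, so the following is just one way to sketch it, assuming each entry is available as an atom count plus a list of bonded atom index pairs. The function name `count_components` is hypothetical.

```python
def count_components(n_atoms, bonds):
    """Count connected components of a molecular graph with union-find."""
    parent = list(range(n_atoms))

    def find(a):
        # Follow parent links to the root, shortening the path on the way.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for i, j in bonds:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j  # merge the two fragments

    return len({find(a) for a in range(n_atoms)})


# Ethanol heavy atoms: one fragment, so it would be kept.
ethanol = count_components(3, [(0, 1), (1, 2)])

# Sodium acetate heavy atoms: atom 0 (Na) has no bonds, so there are two
# fragments and the entry would be skipped.
sodium_acetate = count_components(5, [(1, 2), (2, 3), (2, 4)])
```

An entry would be kept for the test only when the count is exactly 1.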
So depicting structures is not impossible, but it is not completely trivial either. Structure Diagram Generation (where the coordinates are not given) is much harder, and there is often an impossible tension between accuracy, arbitrary convention, and aesthetics. Sometimes only a human can do it.
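One of the accuracy problems mentioned earlier, atoms drawn very close together, can at least be flagged automatically, whichever program produced the coordinates. The sketch below is a rough heuristic and not a method from Noel's post: it reports non-bonded atom pairs that sit closer than a chosen fraction of the median bond length. The function name `crowded_pairs` and the default `min_fraction=0.5` are assumptions made for illustration.

```python
from itertools import combinations
from math import dist
from statistics import median


def crowded_pairs(coords, bonds, min_fraction=0.5):
    """Return non-bonded atom index pairs drawn suspiciously close together."""
    if not bonds:
        return []  # no bond lengths to calibrate the threshold against
    typical = median(dist(coords[i], coords[j]) for i, j in bonds)
    threshold = min_fraction * typical
    bonded = {frozenset(bond) for bond in bonds}
    return [
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if frozenset((i, j)) not in bonded
        and dist(coords[i], coords[j]) < threshold
    ]


# A four-atom chain whose last atom has been folded back almost onto atom 1.
chain_coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 0.1)]
chain_bonds = [(0, 1), (1, 2), (2, 3)]
suspicious = crowded_pairs(chain_coords, chain_bonds)
```

A flagged pair does not prove the diagram is wrong, but it marks the places where a reader might see a bond or a ring that is not there, and where a human should take a look.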