Inorganic InChIs

Mark Winter – who has done an enormous amount to promote web-based chemistry such as WebElements – makes an important point:

  1. Mark Winter Says:
    October 18th, 2006 at 10:18 am
    OK – having carefully and rather too obviously written in InChI and SMILES strings in a story about ozone at nexus.webelements.info, and being an inorganic chemist who might want to write about a few inorganic species, I wondered how to write strings for, say, metal coordination complexes like the salt [Cr(OH2)6]Cl3. This compound is listed at PubChem at http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=104957
    but shows a nonsense structure, and not being a fluent InChI reader I therefore distrusted the InChI string on that page. I looked at the above mentioned carcinogenic potency database and found
    http://potency.berkeley.edu/chempages/COBALT%20SULFATE%20HEPTAHYDRATE.html
    where again the chemical structure drawn is nonsense and so again I have little confidence in the InChI string on that page.
    So how does one proceed for such species?

The structure in PubChem (CrCl3.6H2O) does not accurately reflect our current knowledge of the compound (though it was probably OK in 1850). It should be Cr(OH2)6(3+).3Cl-. InChI does not have any built-in chemical knowledge and simply calculates an identifier from what it is given. It sometimes points out potential valence errors (e.g. CH5), but since it is capable of representing unusual chemistry it doesn’t throw actual errors. So this particular problem is PubChem’s, not InChI’s. (Note that a small fraction of PubChem entries contain errors of many sorts – there is inconsistency in structural representation and some blatant errors. For those who like an amusing name, try CID: 27 and similar.) PubChem accepts contributions from many places and does not check chemical “validity”. (These problems are well addressed by social computing…)
There is a more difficult problem for compounds without an agreed connection table. How do we represent “glucose”? It can have an open form and four ring forms (furanose and pyranose, alpha and beta). Similarly “aluminium chloride” can be AlCl3, Al2Cl6 or Al3+.3Cl-, etc. InChI represents each of these faithfully but does not provide a means of navigating between them. And coordination compounds may be represented differently by different humans – there is clearly no simple approach here.
But InChI takes a useful intermediate approach – it can disconnect the metal from the ligands. While this reduces the amount of information in the identifier, it gives a better chance of finding isomers and alternative representations in a search – and it should then be fairly easy to sort them out.
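
To make the idea of disconnecting the metal concrete, here is a toy sketch in Python. It is not InChI's algorithm and does not use the InChI library: the atom-and-bond lists, the METALS set and all function names are invented for illustration, and charges and bond orders are ignored. It only shows that two differently drawn connection tables for the chromium salt reduce to the same set of fragments once every bond to the metal is removed.

```python
from collections import Counter

# Illustrative subset only; this is NOT the list of metals InChI uses.
METALS = {"Cr", "Al", "Co"}


def disconnect_metals(atoms, bonds):
    """Return the bonds that remain after dropping every bond to a metal."""
    return [(a, b) for a, b in bonds
            if atoms[a] not in METALS and atoms[b] not in METALS]


def fragments(atoms, bonds):
    """Return a sorted list of formulas, one per connected component."""
    parent = list(range(len(atoms)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in bonds:
        parent[find(a)] = find(b)

    groups = {}
    for i, element in enumerate(atoms):
        groups.setdefault(find(i), Counter())[element] += 1

    def formula(counts):
        # Elements in plain alphabetical order, e.g. "H2O", "CrH12O6".
        return "".join(el + (str(n) if n > 1 else "")
                       for el, n in sorted(counts.items()))

    return sorted(formula(c) for c in groups.values())


def add_water(atoms, bonds, bonded_to=None):
    """Append one H2O; optionally bond its oxygen to an existing atom."""
    o = len(atoms)
    atoms += ["O", "H", "H"]
    bonds += [(o, o + 1), (o, o + 2)]
    if bonded_to is not None:
        bonds.append((bonded_to, o))


def hexaaqua_complex():
    """Cr bonded to six waters, with three free chlorides (charges ignored)."""
    atoms, bonds = ["Cr", "Cl", "Cl", "Cl"], []
    for _ in range(6):
        add_water(atoms, bonds, bonded_to=0)
    return atoms, bonds


def hydrate_drawing():
    """An assumed 'CrCl3.6H2O' style drawing: Cr-Cl bonds and six loose waters."""
    atoms, bonds = ["Cr", "Cl", "Cl", "Cl"], [(0, 1), (0, 2), (0, 3)]
    for _ in range(6):
        add_water(atoms, bonds)
    return atoms, bonds


complex_atoms, complex_bonds = hexaaqua_complex()
hydrate_atoms, hydrate_bonds = hydrate_drawing()

# As drawn, the two connection tables give different fragments ...
connected_differ = (fragments(complex_atoms, complex_bonds)
                    != fragments(hydrate_atoms, hydrate_bonds))
# ... but after metal disconnection they coincide.
disconnected_match = (
    fragments(complex_atoms, disconnect_metals(complex_atoms, complex_bonds))
    == fragments(hydrate_atoms, disconnect_metals(hydrate_atoms, hydrate_bonds))
)
print(connected_differ, disconnected_match)
```

In this toy model both drawings collapse to one Cr, three Cl and six H2O fragments, which is why a search on the disconnected form can retrieve both; telling the hits apart afterwards still needs the fully connected representation.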

