Steve Bachrach poses an interesting question on the CHMINF-L list. I have omitted the citations and some other material – you can read the archive if necessary.
I have run into an interesting chemical problem that has led to both theoretical and applied database questions. I am hoping that some of the experts on the list can shed some light. I have been looking into the recent controversy concerning the structure of (+)-hexacyclinol. This compound was first isolated in 2002 by Graefe et al., who proposed a structure for it…. La Clair recently synthesized this structure, or at least reportedly so. [SB: By the way – do a Google search on hexacyclinol to see how the blogosphere responded to this problem.] Then Rychnovsky …proposed an alternative structure for (+)-hexacyclinol, which was subsequently synthesized and confirmed to be identical to the original natural product.
So here is first my theoretical question: How do you index such a situation? The original structure of the molecule (+)-hexacyclinol is wrong, and a subsequent one is right. So, when you query a database, which structure matches up with the name “(+)-hexacyclinol”? My guess is that it should be the correct one – but then what do you do with the old structure? Obviously, this is not the first, nor will it be the last, compound whose structure is contested. Now here is the more applied aspect. A search in SciFinder for (+)-hexacyclinol gives CA 484674-97-7, which is the original (and, we now know, wrong!) structure. Querying for the papers that have this structure returns the Graefe, La Clair and Rychnovsky papers, but not the Porco paper. But entering the “true” hexacyclinol structure and then doing a search locates 2 structures, CA 903574-41-4 and CA 903574-42-5, which look to me to be identical. Furthermore, the only paper that is linked to these “two” structures is the Rychnovsky paper. In other words, the Porco paper that reports the actual synthesis and x-ray structure of hexacyclinol does not have any hexacyclinol structure(s) (correct or not) attached to it!
(By the way, a PubChem search for hexacyclinol comes up dry, but all of the above papers are indexed in PubMed.) Any explanations?
(PMR: Yes, hexacyclinol is not very interesting except to chemists, so no-one has deposited a data collection containing it in PubChem. If synthetic chemists contribute collections of targets to PubChem I am sure PubChem will be delighted to accept them. However, many chemists are still unaware that PubChem exists.)
- What is aluminium chloride?
- What is glutamate?
- What is glucose?
These are legitimate scientific questions, and answering each one requires several assertions, linked in a mini-semantic web. That is why we need to move from a twentieth-century way of describing chemistry (as exemplified by the CA numbers) to a semantic one. There is lots of room for volunteers.
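To make the idea of linked assertions a little more concrete, here is a minimal sketch in Python of how the hexacyclinol case might be recorded. It is only an illustration: the `Assertion` class, the predicate names (`hasProposedStructure`, `confirmedBy`) and the helper functions are all made up for this example, and the CA numbers quoted above are used purely as opaque labels for the two structures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assertion:
    """One statement in the mini-semantic web (hypothetical model)."""
    subject: str    # what the statement is about
    predicate: str  # relation name, invented for this sketch
    obj: str        # the value or entity being related
    source: str     # who made the assertion


# Assertions taken from the account above; predicate names are assumptions.
ASSERTIONS = [
    Assertion("(+)-hexacyclinol", "hasProposedStructure",
              "CA 484674-97-7", "Graefe et al."),
    Assertion("(+)-hexacyclinol", "hasProposedStructure",
              "CA 903574-41-4", "Rychnovsky"),
    Assertion("CA 903574-41-4", "confirmedBy",
              "synthesis and x-ray structure", "Porco"),
]


def proposed_structures(name, assertions):
    """Every structure that any source has proposed for this name."""
    return [a.obj for a in assertions
            if a.subject == name and a.predicate == "hasProposedStructure"]


def is_confirmed(structure, assertions):
    """True if at least one assertion confirms this structure."""
    return any(a.subject == structure and a.predicate == "confirmedBy"
               for a in assertions)


def lookup(name, assertions):
    """Return all proposed structures for a name, each tagged with a status."""
    return [(s, "confirmed" if is_confirmed(s, assertions) else "unconfirmed")
            for s in proposed_structures(name, assertions)]


for structure, status in lookup("(+)-hexacyclinol", ASSERTIONS):
    print(structure, status)
```

The point of the sketch is that the name is never bound to a single structure. The old assignment stays in the record with its source, and the query decides which assertions to trust.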
I’m reminded of being shown round the British Museum of Natural History, including a room full of fish specimens in ethanol in labelled glass containers. The biologist said that some countries had asked for their specimens (their property) back, and the BM had resisted. But if it ever came to that, the BM would keep the labels: their metadata.