Chemspiderman has commented…
PMR: I have already posted on this blog that – in general – chemical structures are not right or wrong. They may be associated with other information and the chemical community as a whole decides that this association is useful or counterproductive. Please read the argument carefully.
If, for example, I write CH5, is this structure wrong? It violates the valency rule, after all. No. It’s not wrong, it just can’t be found in a bottle in most labs. It can be found in mass specs and interstellar space. There is an arrogance in the chemical informatics community that assumes the only discipline that matters is synthetic organic chemistry. In general no chemical structure that obeys the algebra is wrong. (The algebra says things like “no fractional charges on molecules” (although there can be on crystal cells) and “if A is bonded to B then B is bonded to A”.)
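The two algebra rules quoted above can be written down as a small check. This is only a sketch: the adjacency-dictionary representation and the function name `obeys_algebra` are assumptions made for illustration, not anything a particular toolkit provides, and valency is deliberately left unchecked.

```python
from fractions import Fraction


def obeys_algebra(adjacency, net_charge):
    """Return True if bonds are symmetric and the net charge is a whole number.

    adjacency maps each atom index to the set of atom indices it is bonded to
    (a hypothetical representation chosen for this sketch).
    Valency is deliberately not checked.
    """
    for a, neighbours in adjacency.items():
        for b in neighbours:
            # "if A is bonded to B then B is bonded to A"
            if a not in adjacency.get(b, set()):
                return False
    # "no fractional charges on molecules"
    return Fraction(net_charge).denominator == 1


# CH5 as written in the text: atom 0 is carbon, atoms 1 to 5 are hydrogens.
ch5 = {0: {1, 2, 3, 4, 5}, 1: {0}, 2: {0}, 3: {0}, 4: {0}, 5: {0}}

print(obeys_algebra(ch5, 0))                 # five "valencies" on carbon, still passes
print(obeys_algebra({0: {1}, 1: set()}, 0))  # one-way bond, fails
print(obeys_algebra(ch5, Fraction(1, 2)))    # fractional charge, fails
```

On this view CH5 passes: it breaks a valency convention, not the algebra.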
There are unacceptable uses in mainstream C19 organic chemistry, such as carbon with 5 “valencies”. Such structures may be deemed “wrong” by organic chemists. It was clear that when Chemspider was set up the support for inorganic compounds was almost non-existent – I pointed this out and I think the position has improved somewhat. But I don’t have time to check – I expect there are many compounds represented by discrete “connection tables” which in my view are far worse chemical sins. But I am turning my attention elsewhere.
So “Peter, I think the structure of discodermolide is wrong”. No. I think this means “liquidcarbon has drawn a structure with which s/he has associated the name ‘discodermolide’, and Chemspiderman thinks this association is incompatible with current usage.” OK. Discodermolide is a substance of relatively minor importance compared to penicillin G and THC. It has 103 hits in Pubmed, compared with 30,000 for taxol. Maybe it will become famous one day. Until then I don’t really care that liquidcarbon may have got it “wrong”.
What I do care about is that we develop a community process – not regulated by a closed commercial company or a closed learned society division – that allows us to converge towards a cluster of agreed names at any point in time. In some cases this is easy – I think we all agree what Pen-G is – in some cases this is a question of removing known errors – and Wikipedia is great for this. (BTW I made a correction to the structure of Acetyl-CoA in Wikipedia, and the wikichemists agree the structure is now “correct” – but this is a natural part of using WP and I do these things every other day).
Pubchem has got it right. It simply records what name a human or organization has attached to a connection table, and gives the reference. That is all it needs to do. We then, as a community, need to evolve a Web 2.0 mechanism for annotation that allows us to find the “right” structure rapidly.
That’s the sort of thing we shall soon start to be doing with the peer-reviewed literature – if our grant gets funded. Social computing to create consensus on data and names. All Open. All in public view. Versioned. With metadata. And until the chemical “databases” adopt C21 metadata they are largely useless in the C21. Pubchem understands this. And ChEBI, and some Blue Obelisk efforts. No-one else seems to have got the point.
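The record-and-annotate idea above (who attached which name to which structure, with a reference, so that the community can see where usage converges) might be sketched as follows. Everything here is hypothetical: the `NameAssertion` fields, the `structures_by_name` helper, and the placeholder depositors and structure labels are assumptions for illustration, not the Pubchem data model and not real data.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class NameAssertion:
    """One recorded claim: this party attached this name to this structure."""
    name: str         # the name being attached
    structure: str    # placeholder for a connection table or identifier
    asserted_by: str  # human or organization making the claim
    reference: str    # where the claim was made


def structures_by_name(assertions):
    """Tally, for each name, how many assertions point at each structure."""
    tally = defaultdict(Counter)
    for a in assertions:
        tally[a.name][a.structure] += 1
    return tally


# Placeholder records only; none of these are real depositors or structures.
records = [
    NameAssertion("discodermolide", "structure-A", "depositor-1", "ref-1"),
    NameAssertion("discodermolide", "structure-B", "depositor-2", "ref-2"),
    NameAssertion("discodermolide", "structure-B", "depositor-3", "ref-3"),
]

tally = structures_by_name(records)
print(tally["discodermolide"].most_common(1))
```

Nothing is deleted or declared “wrong” here; the minority assertion stays on record with its reference, and the tally only shows where current usage clusters.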
October 2nd, 2007 at 5:48 am […] Liquidcarbon has put up a recent blog posting about the speed with which he/she can draw structures in ChemDraw and asked for challengers. PMR has commented in Chemical SpeedDrawing. The challenge is outlined below… […]
October 2nd, 2007 at 6:21 am Peter, I think the structure of discodermolide is wrong…this is where a look-up in a reference dictionary is necessary…and I think we both support that effort. But it MUST be curated. It IS correct on Wikipedia but drawn incorrectly by liquidcarbon and everyone afterwards…
It is why I favor the scan and convert software for this…there is the version from Marc Nicklaus’ lab but I must admit that my present bias is to use CLiDE (http://www.simbiosys.ca/clide/index.html) because it can be batched and because the results appear to be so far ahead of the Open Source code at present. We do not have time to work on the Open Source support at present as ChemSpider is very distracting and we are focused on potentially using the batch processing for extracting novel structures from Open Access articles.
I put a detailed blog posting about this at: http://www.chemspider.com/blog/?p=180