CDK's Diazonamide and general thoughts on Openness

Noel O’Blog has suggested that I should use Rajarshi Guha’s CDK service to lay out the Diazonamide structure (see my post Finding chemical structures – InChIs et al., an amusement):

  1. baoilleach Says:
    September 24th, 2007 at 7:59 am
    For the record, you can compare with CDK’s SMILES to 2D at:
    http://cheminfo.informatics.indiana.edu/~rguha/code/java/cdkws/cdkws.html#sdg

PMR: so here it is:
cdk.png
PMR: I think it’s correct. Interpretable. I’d put it on the same level as the Daylight one. One message is that it is difficult for software to lay out structures with a 10-ring nucleus.
The point is that CDK is Open Source and can therefore be enhanced by the community. Daylight and the software that Pubchem uses (?Cactus?, ?Openeye?) aren’t. CDK is joint leader, and we can improve it.
A complementary approach is to start making collections of human-drawn images. The intelligible Chemspider image was hand-drawn by the PNAS authors – I don’t know how it got to Chemspider. (Personally I think it’s pretty awful – I do not like stereo bonds which are rectangular rather than wedges. Why do people use them? You only have to scale the image to corrupt this info.) So we need an Open collection of chemical structures.
This is not technically difficult but is lathered with copyright madness. Can I reproduce a chemical structure from Nature without permission? I’ve asked but they haven’t got back to me. Can I reproduce a chemical structure diagram from Wiley? I’ve asked but… … they haven’t got back to me.
It has to be fully Open. Every structure diagram has to be copyright-free and accompanied by metadata that gives provenance and alternative descriptions (names, InChIs, etc.). Is there anywhere that has chemical images that I can download that fulfils all these permissions?
I’ve found one (sorry for the layout). Here’s taxol:

Paclitaxel
β-(benzoylamino)-α-hydroxy-,6,12b-bis
(acetyloxy)-12-(benzoyloxy)-2a,3,4,4a,
5,6,9,10,11,12,12a,12b-dodecahydro-4,11-
dihydroxy-4a,8,13,13-tetramethyl-5-oxo-
7,11-methano-1H-cyclodeca(3,4)benz(1,2-b)
oxet-9-ylester,(2aR-(2a-α,4-β,4a-β,6-β,9-α
(α-R*,β-S*),11-α,12-α,12a-α,2b-α))-
benzenepropanoic acid

And there’s lots of data with it that looks like this:
taxol.png
I’ll leave you to guess where this is. Clues: It’s Open, re-usable, very highly curated, and the first place that students look. That – or a derivative – is where the world’s chemistry should reside.


8 Responses to CDK's Diazonamide and general thoughts on Openness

  1. Peter, There’s a FeedBurner subscription box on the chemspider blog at http://www.chemspider.com/blog. It appears we are covering so much similar ground that it would be good not to echo our efforts. I posted this tonight but thought it would fit here too.
    I covered the issue of taxol a few weeks ago: http://www.chemspider.com/blog/?p=64. Today Taxol came up again on a post by Peter Murray-Rust. First of all a couple of comments re the post.
    PMR commented “The intelligible Chemspider image was hand-drawn by the PNAS authors – I don’t know how it got to Chemspider. (Personally I think it’s pretty awful – I do not like stereo bonds which are rectangular rather than wedges. Why do people use them. And You only have to scale the image to corrupt this info). So we need an Open collection of chemical structures.” In case there is confusion please read the original post…the structure was grabbed from a PDF file (4 Total synthesis highlights (Annu. Rep. Prog. Chem., Sect. B: Org. Chem., 2004, 100, 91) – Royal Society of Chemistry)…it is NOT on ChemSpider. The structure was located by a search using Chemrefer, now on ChemSpider. It was not drawn by us, we’re not responsible for it and, to clarify, I don’t like it either.
    Oh, and we do have an Open Collection of chemical structures. The deposition process is under beta-testing and anyone can download the data (we will give away the entire structure collection shortly).
    Peter commented that Wikipedia is highly curated. I use it a lot. But, I am cautious…ESPECIALLY with stereochemistry. I’m trying to determine what the ACTUAL taxol structure is. My investigations suggest that one stereocenter is WRONG on the Wikipedia structure. The link to the PubChem record is therefore to the incorrect structure in theory.
    Also, the systematic name is not what I would term as anywhere near IUPAC standard: β-(benzoylamino)-α-hydroxy-,6,12b-bis(acetyloxy)-12-(benzoyloxy)-2a,3,4,4a,5,6,9,10,11,12,12a,12b-dodecahydro-4,11-dihydroxy-4a,8,13,13-tetramethyl-5-oxo-7,11-methano-1H-cyclodeca(3,4)benz(1,2-b)
    oxet-9-ylester,(2aR-(2a-α,4-β,4a-β,6-β,9-α(α-R*,β-S*),11-α,12-α,12a-α,2b-α))-benzenepropanoic acid
    By the way…the name on Drugbank is 5 beta,20-Epoxy-1,2a,4,7 beta,10 beta,13 alpha-hexahydroxytax-11-en-9-one 4,10-diacetate
    2-benzoate 13-ester with (2 R,3S)-N-benzoyl-3-phenylisoserine….hmmm…
    I would LOVE this post to get confirmation regarding what the right structure is…is Wikipedia CORRECT or Wrong? I THINK the structure on Drugbank is RIGHT. This DIFFERS from the Wikipedia structure by one stereocenter. Check out the InChIs below:
    PUBCHEM
    InChI=1/C47H51NO14/c1-25-31(60-43(56)36(52)35(28-16-10-7-11-17-28)48-41(54)29-18-12-8-13-19-29)23-47(57)40(61-42(55)30-20-14-9-15-21-30)38-45(6,32(51)22-33-46(38,24-58-33)62-27(3)50)39(53)37(59-26(2)49)34(25)44(47,4)5/h7-21,31-33,35-38,40,51-52,57H,22-24H2,1-6H3,(H,48,54)/t31-,32-,33+,35-,36+,37-,38-,40-,45+,46-,47+/m0/s1/f/h48H
    DRUGBANK
    InChI=1/C47H51NO14/c1-25-31(60-43(56)36(52)35(28-16-10-7-11-17-28)48-41(54)29-18-12-8-13-19-29)23-47(57)40(61-42(55)30-20-14-9-15-21-30)38-45(6,32(51)22-33-46(38,24-58-33)62-27(3)50)39(53)37(59-26(2)49)34(25)44(47,4)5/h7-21,31-33,35-38,40,51-52,57H,22-24H2,1-6H3,(H,48,54)/t31-,32-,33+,35-,36+,37+,38-,40-,45+,46-,47+/m0/s1/f/h48H
    Compare the STEREO layer at:
    t31-,32-,33+,35-,36+,37-,38-,40-,45+,46-,47+
    t31-,32-,33+,35-,36+,37+,38-,40-,45+,46-,47+
    and compare the stereo for stereo center 37… one is PLUS and one is MINUS. OOPS!
    I’m certainly willing to be wrong but the point is, right now, I am not sure what the right structure is. Can anyone out there confirm??? Can someone check “the” highly curated data source and tell us?
    Until then I am in full agreement with Peter regarding what Wikipedia SHOULD be “It’s Open, re-usable, very highly curated, and the first place that students look. That – or a derivative – is where the world’s chemistry should reside. ” HOWEVER, I am calling for confirmation of the structure and correction if necessary. One of either DrugBank OR PubChem, both linked from Wikipedia, is wrong.
    In terms of the comment “That – or a derivative – is where the world’s chemistry should reside”. I DO agree. We have committed to a wiki-environment for Chemistry. We are presently deciding on the appropriate wiki environment (NOT necessarily MediaWiki) to layer onto ChemSpider. Email exchanges are underway with some of the players in this domain at present – and a sincere thanks to Joerg Wegner for his support on this! With Martin Walker on our advisory group (Walkerma on Wikipedia…a very active player in this domain) we look forward to the best advice and guidance from our collaborators.

  2. Peter, There have been comments regarding the accuracy of Taxol on Wikipedia at
    http://www.chemspider.com/blog/?p=164
    Overall conclusion, to date, by two people who’ve spent some time researching the accuracy is that the structure as drawn is correct. The structure in Drugbank is consistent but the one linked in Pubchem (of many) is INcorrect. Some more research may be necessary as two points don’t make the conclusion but for now I think this might be all we get….

  3. Pingback: ChemSpider Blog » Blog Archive » Will the Correct Structure of Taxol Please Stand Up. Part 2.

  4. Pingback: ChemSpider Blog » Blog Archive » Will the Correct Structure of Taxol Please Stand Up. Part 3.

  5. Peter..I called into question the quality of the Taxol entry in Wikipedia and covered it in part 3 of my blog postings. Bottom line..the structure is correct, the entry linked in Drugbank is correct, the structure linked in PubChem is wrong. This is not a PubChem issue in many ways since the correct structure of Taxol IS in PubChem, but it is under a different ID. The systematic name is questionable.
    I expanded on the number of places the structure is correct and the places it is incorrect.
    You are correct..we need a validated database of structures and associated names available for all. STN-Easy provided access to the right information and was an excellent resource. A kind gentleman from MDL provided information too and it was correct so “professionally” curated data based on this example is of high quality. Now the question is whether a communal public effort can achieve the same level. I believe Wikipedia has the right model..and overall quality is good.

  6. The systematic name and the link to PubChem have been updated in Wikipedia now so all is now appropriate and accurate. This is the outcome of the Open Data nature of Wikipedia plus a major concern about the quality of what’s in there. Thanks to all who contributed to me resolving this issue.
    http://en.wikipedia.org/wiki/Paclitaxel

  7. Pingback: ChemSpider Blog » Blog Archive » Curators Perform Heroic Duties. They Should be Celebrated!

  8. Pingback: ChemSpider Blog » Blog Archive » Will the Correct Structure of Taxol Please Stand Up. Part 3.
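
The stereo-layer comparison in the first response above (PubChem versus DrugBank) was done by eye. As a minimal sketch, assuming only that the stereocentre parities sit in the first layer beginning with `t`, the same check can be done with plain string handling. The function names `stereo_layer` and `differing_centers` are made up for this illustration and are not part of any InChI toolkit; the two strings are copied from that response.

```python
import re

# InChI strings as quoted in the first response (PubChem and DrugBank records).
PUBCHEM = (
    "InChI=1/C47H51NO14/c1-25-31(60-43(56)36(52)35(28-16-10-7-11-17-28)"
    "48-41(54)29-18-12-8-13-19-29)23-47(57)40(61-42(55)30-20-14-9-15-21-30)"
    "38-45(6,32(51)22-33-46(38,24-58-33)62-27(3)50)39(53)37(59-26(2)49)"
    "34(25)44(47,4)5/h7-21,31-33,35-38,40,51-52,57H,22-24H2,1-6H3,(H,48,54)"
    "/t31-,32-,33+,35-,36+,37-,38-,40-,45+,46-,47+/m0/s1/f/h48H"
)
DRUGBANK = (
    "InChI=1/C47H51NO14/c1-25-31(60-43(56)36(52)35(28-16-10-7-11-17-28)"
    "48-41(54)29-18-12-8-13-19-29)23-47(57)40(61-42(55)30-20-14-9-15-21-30)"
    "38-45(6,32(51)22-33-46(38,24-58-33)62-27(3)50)39(53)37(59-26(2)49)"
    "34(25)44(47,4)5/h7-21,31-33,35-38,40,51-52,57H,22-24H2,1-6H3,(H,48,54)"
    "/t31-,32-,33+,35-,36+,37+,38-,40-,45+,46-,47+/m0/s1/f/h48H"
)


def stereo_layer(inchi: str) -> dict:
    """Return {atom number: parity} from the first /t layer, or {} if none."""
    for layer in inchi.split("/")[1:]:
        if layer.startswith("t"):
            parities = {}
            for token in layer[1:].split(","):
                # Each token is an atom number followed by a parity mark.
                match = re.fullmatch(r"(\d+)([+\-?u])", token)
                if match is None:
                    raise ValueError(f"unexpected stereo token: {token!r}")
                parities[int(match.group(1))] = match.group(2)
            return parities
    return {}


def differing_centers(inchi_a: str, inchi_b: str) -> dict:
    """Map each atom number whose parity differs to (parity_a, parity_b)."""
    a, b = stereo_layer(inchi_a), stereo_layer(inchi_b)
    return {
        n: (a.get(n), b.get(n))
        for n in sorted(a.keys() | b.keys())
        if a.get(n) != b.get(n)
    }


print(differing_centers(PUBCHEM, DRUGBANK))  # {37: ('-', '+')}
```

This only says where the two records disagree, not which one is right; that still needs checking against a trusted source, as the later responses did.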
