Open Molecular Information

Last week we had a young doctor friend staying with us and, because he was interested in infection, the conversation turned to MRSA. If you don’t know what this is, look it up in Wikipedia under MRSA (this hyperlink should work). The entry stated:

Vancomycin and teicoplanin are glycopeptide antibiotics used to treat MRSA infections

I know a fair amount about vancomycin, not least because one of my colleagues Dudley Williams was a pioneer and there is a physical molecular model in the entry hall. But I had never heard of teicoplanin. (I am not afraid to admit ignorance – I am ignorant of almost everything). So what is it?
Before I give my adventures, I’ll give an overview of a typical current process for chemical searching. This is taken from the CHMINF-L list, a highly respected forum for chemical librarians and informaticians run by Gary Wiggins from Indiana University. A list member wanted to know the structure of coenzyme A. I’ll summarise the discussion (you can read it in full in the archives):

From:         Meghan Lafferty
Subject:      Structure of coenzyme A?

Hello,
I have a faculty member who wants to make sure that she has the
correct structure of coenzyme A. When she looks it up in PubChem, the
compounds she finds list 10 related structures with Same,
Connectivity (i.e., "The molecules in this group have the same
regular chemical connectivity, ignoring isotopes and
stereochemistry."). I don't know the significance of the ranking, but
the first 5 hits of a text search for coenzyme a have the CID of 87642.
A search for coenzyme A in SciFinder (using Locate, Substance
Identifier) brings up 1 hit (CAS RN 85-61-0). I got 7 hits in a
search in Beilstein (Substance Identification, Chemical Name); the
information on all 7 was not very extensive. The CAS RNs in the
records (of the 2 or 3 that included them) were 85-61-0 (same as the
SciFinder one) and 31416-98-5 which appears to be for L-Coenzyme A.
I'm inclined to tell the faculty member that the 85-61-0 is the
correct one, but I'm not entirely sure that it's true. Can anyone
shed any light on this?
Thanks!
Meghan
_____________________________________
Meghan Lafferty
Chemistry & Chemical Engineering Librarian
Science & Engineering Library
University of Minnesota
108 Walter Library
Minneapolis MN 55455

(Note: I have included institutions to emphasize the quality of the correspondents. Non-chemists need to know that:

  • Coenzyme A is a fundamental biochemical in almost all organisms and will form part of any biochemistry degree. It is therefore not a rare or contentious substance.
  • PubChem is the NIH’s Open collection of chemical and biological information related to their Molecular Libraries initiative. It contains information on (not samples of) about 5 million compounds. The information is not peer-reviewed and PubChem gratefully accepts contributions of information from many sources including suppliers, publishers and researchers.
  • SciFinder is a tool/service created by Chemical Abstracts Service. I do not regularly use it but my colleagues do, after debate as to whether they could afford it (I do not know prices but it costs a lot). I believe it contains about 25 million compounds though many of those are biological sequences.
  • Beilstein is a commercial supplier of chemical information and has, I believe, about 6 million compounds and associated properties. Again, since I don’t use it, I can’t give figures.
  • The CAS-RN is a unique ID for each chemical substance created by Chemical Abstracts on which they claim copyright. It is very widely used as a universal identifier and many sites (but not PubChem) will list the CAS number. Whether this has been agreed with CAS in individual cases is not normally known.
  • PubChem and CAS were in dispute last year, with CAS lobbying the US congress to limit the activities of PubChem.
  • Note also that the answer is not immediately clear (this is not unusual in chemistry as there are some subtle qualifiers).
  • PubChem is free. CAS charges $6.00 to non-subscribers for the information above. Beilstein will also charge.)

Next:

From:         Dana Roth
 
Meghan: The Merck Index (#2491) gives a structural diagram.
Dana L. Roth
Millikan Library / Caltech 1-32

(The Merck Index was for many years a large physical reference volume giving structures and properties. I do not use it myself and assume it is now on CD-ROM or offered online in institutions. I assume it costs money.)

Next:

From:         Meghan Lafferty
Dana,
Thanks. It looks like the same one as in SciFinder (same CAS RN).
Meghan

Next:

From:         "Poynter, Michael"
 
Hi Meghan,
FYI - Science of Synthesis refers to Acetyl Coenzyme A (and gives a
structural diagram) here:
Seela, F.; Ramzaeva, N.; Rosemeyer, H., in Science of Synthesis, 16
(2003), p.945
DOI: 10.1055/tcsos-016(2006.1)-01192
Michael Poynter,
Thieme New York

(Note: Science of Synthesis is a large series of reviews of chemical reactions published by the commercial publisher Thieme. AFAIK the information is not Open).

From:         Jacob Zabicky
Subject:      Re: Structure of coenzyme A?

Dear Colleagues,
After trying in WOS SCI the query "ti=coenzyme a and ti=structure"
namely, articles carrying also "coenzyme a" (not necessarily because
of the split words) and "structure", the search ended with  196 hits
over the 1965-today period. Not an unwieldily number for direct
examination. The following recent  entries (from 2000 onwards) have a
chance of carrying the information (nothing to say about "acetyl
coenzyme A" and similar compounds for reconfirmation):
Shirakawa T, Takahashi Y, Wada K, et al.
Identification of variant molecules of Bacillus thermoproteolyticus
ferredoxin: Crystal structure reveals bound coenzyme A and an
unexpected [3Fe-4S] cluster associated with a canonical [4Fe-4S]
ligand motif
BIOCHEMISTRY 44 (37): 12402-12410 SEP 20 2005
(3 other references snipped)
(Note: AFAIK none of these articles are Open, i.e. it costs money to read them and you may not even get the answer.)

Next:
From: “E. Connie Powell”
Hello Meghan
Search the NCBI web site and select the Books database. Enter a search for coenzyme A. From the result select the book Biochemistry by J. M. Berg 5th edition. Select the figures tab. The second figure (figure 14.16) is the structure of coenzyme a.
Good luck E. Connie Powell
Evelyn Constance Powell
Physical and Chemical Sciences Librarian
Folsom Library Rensselaer Polytechnic Institute 110 8th Street Troy, NY
(Note: I hadn’t heard of NCBI books online – thank you Connie – and I’m impressed. This book carries a date of 2002 so it’s up to date as far as the query is concerned.)
I now try two of my own resources:

  • ChEBI. This is an Open resource run by the European Bioinformatics Institute which publishes a taxonomy of chemical substances of interest to bioscience. I search for “coenzyme A” and immediately get what I want – in machine-readable form. (I could get machine readable info from CAS and Beilstein if I paid). There is a great deal of useful information here as well.
  • Wikipedia. This resource is much-maligned as being inaccurate, created by amateurs, unsuitable for any scientist, etc. I believe that it is the future and that it will rapidly replace many reference works. (I’ll discuss in a later article my own ideas how this might happen in chemistry). So I go to Coenzyme A and find 2D and 3D structural diagrams as well as useful information about the compound.

Now… in WP I compare the structure with some of the others and I think one of the atoms has a different stereochemistry from, say, ChEBI (if true, this is serious). I don’t actually know which is right (or whether I have made a mistake). I could say “Wikipedia is probably wrong as it’s created by non-experts so I’ll ignore it” OR I can leave a note on the WP Talk page saying “I think the stereochemistry may be wrong – see my blog”. I’m optimistic that that note will be picked up by the Wikichemists and between them they will research the literature to confirm or correct the structure.
So what is the message? Firstly, searching for chemical names and structures is not always trivial, as there can be variants under the same name. There was some confusion in the discussion on the list between “coenzyme A” and “acetyl coenzyme A”. And many of the diagrams in PubChem and elsewhere don’t give stereochemistry. But assume I am an intelligent person who does not have immediate institutional access to expensive chemical resources (e.g. I am travelling). The chemical community can offer me nothing useful unless I pay for it, and some of those resources are impossible to use outside an institution. The biological community gives me 3 free resources, two of which can be seen as quality controlled and the other as almost comprehensive. I am confident that, with social computing, the quality control will be added to PubChem so that the bioscientists will have created a high-quality chemical information resource.
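To illustrate why “same connectivity” is not the same as “same structure”, here is a minimal Python sketch. It assumes the two structures are available as standard InChI strings (an identifier that nobody in the discussion above used) and simply compares their layers. The function names are hypothetical, the parsing is deliberately simplified, and the two example strings are intended to be the two enantiomers of alanine, so check them against a database before relying on them.

```python
# Simplified sketch: compare two standard InChI strings layer by layer.
# Assumption: no isotopic (/i), fixed-H (/f) or reconnected (/r) layers,
# so each layer prefix appears at most once.

CONNECTIVITY_KEYS = ("formula", "c", "h", "q", "p")  # formula, bonds, H atoms, charge
STEREO_KEYS = ("b", "t", "m", "s")                   # double-bond and tetrahedral stereo


def inchi_layers(inchi):
    """Split an InChI string into a dict of layers keyed by prefix letter."""
    if not inchi.startswith("InChI="):
        raise ValueError("not an InChI string: %r" % inchi)
    parts = inchi[len("InChI="):].split("/")
    if len(parts) < 2:
        raise ValueError("InChI has no formula layer: %r" % inchi)
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        if not part:
            continue
        if part[0] in ("i", "f", "r"):
            break  # layers beyond this point are outside this sketch
        layers[part[0]] = part[1:]
    return layers


def same_connectivity(inchi_a, inchi_b):
    """True if the two structures agree when stereochemistry is ignored."""
    a, b = inchi_layers(inchi_a), inchi_layers(inchi_b)
    return all(a.get(k) == b.get(k) for k in CONNECTIVITY_KEYS)


def same_stereo(inchi_a, inchi_b):
    """True if the stereo layers (possibly absent) also agree."""
    a, b = inchi_layers(inchi_a), inchi_layers(inchi_b)
    return all(a.get(k) == b.get(k) for k in STEREO_KEYS)


# Illustrative strings, intended to be the two enantiomers of alanine.
ENANTIOMER_A = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
ENANTIOMER_B = "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"

if __name__ == "__main__":
    print("same connectivity:", same_connectivity(ENANTIOMER_A, ENANTIOMER_B))
    print("same stereochemistry:", same_stereo(ENANTIOMER_A, ENANTIOMER_B))
```

Two records that pass the first check but fail the second are exactly the kind of pair that ends up grouped under one name, which is why a diagram without stereochemistry cannot settle which coenzyme A structure is the right one.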
Back to teicoplanin
My first visit was to PubChem. If you go there and type in “teicoplanin” you get only one entry – and that is a mixture of two compounds – the one shown is not teicoplanin. So off to ChEBI… no entry there … and to Wikipedia, which has a significant entry though without the chemical structure. I search the literature and find a link to a report on the Royal Society of Chemistry’s pages. This is Openly Accessible but I assume it is copyright so I cannot re-use the structural diagram – I leave a link instead. At some stage a Wikipedian will add a structure, I’m sure. Then I will be able to point my doctor friend at Wikipedia so he can find out what the chemical formula of his drug is…
P.


7 Responses to Open Molecular Information

  1. Egon says:

    Peter, while you can’t copy paste the image of the diagram, I can’t believe you are not able to redraw it? That is, do you mean that the UK law allows copyright on the content of chemical diagrams??

  2. C. Anthony Lewis says:

    Peter,
    I think you are correct… the structure shown on Wikipedia has a different stereochemistry for the OH group than either http://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI%3A15346 or that shown in Figure 14.16 alluded to in one of your quotes.
    This seems to be because in the Wikipedia structure there has been a rotation about the bond between the C(OH) and the C(Me2) carbon atoms. I don’t know which is correct but the structure shown for pantothenic acid (part of the Co A structure) in http://www.jbc.org/cgi/reprint/276/14/10794 has the same stereochemistry as for the OH group in Co A given in the ChEBI site.
    I made models using the good old Orbit system, bought when I was an undergraduate but still much used.
    This has been the most interesting topic on ChemInf for a while!
    All the best,

  3. Justin says:

    alternatively, you can type in “teicoplanin” and perform a google IMAGE search to solve your problem immediately.

  4. Egon says:

    Peter, if you want the article to show up on chemical blogspace, just put in a for the DOI numbers.

  5. pm286 says:

    Thanks everyone, Sorry to have delayed replying – I have now learnt how to moderate the blog!
    Egon (1) Yes – I can certainly redraw the diagram but it is quite complex. I would obviously use a modern editor such as JChemPaint to do it, but it will still take a few minutes. However it is absurd for me to have to redraw it! (I remember writing a review for the Chem Soc (==RSC) many years ago and they redrew all the diagrams quoted from other journals. What a waste of effort!)

  6. pm286 says:

    Anthony (2)
    Thanks – I’ll post this on WP talk and leave it to someone else to edit. I’ll add a note to the visible text

  7. pm286 says:

    Justin(3)
    You are correct that this will find the diagrams in Google, but the problem is copyright. I suspect that very few of these will have explicit Creative Commons or similar licenses. One of the biggest contributions we can make is to explicitly stamp our images with CC licenses.
    WP insists rightly that images are in the public domain or CC.
