#solo10 GreenChainReaction: Some chemical information is appalling quality, but does anyone care?

Typed into Arcturus

Earlier I asked which compound a patent image (EP_2050749A1/0026imgb0032.tif) represents:


“Could someone please tell me what the InChI or SMILES or CML is for this compound?” This was something of a trick question, as you have to realise that the image is corrupted. (You might have to re-read the patent to know this.)

Egon Willighagen says:

August 15, 2010 at 12:36 pm

Peter, I guess this is the compound:

http://pubchem.ncbi.nlm.nih.gov/summary/summary.cgi?cid=24888923

When seeing the image for the first time, I had the feeling a bond was too thin to show up in the rasterized image… this hit on PubChem might be a lead for further information.

Egon has solved it. There ought to be a vertical line (a chemical bond) in the formula. It’s missing. That’s because TIFFs and GIFs and PDFs destroy and corrupt information. This is a classic. Here’s a similar compound:


 

The point is that the quality of the chemical information in the patent is so poor that the image got corrupted somewhere in the processing. This is symptomatic of chemistry, where there is little communal interest in tools that create high-quality, validated information. The EPO engaged with us some years ago to introduce Chemical Markup Language, but they couldn’t convince the chemical companies, the chemical software companies, the secondary patent providers and the whole market sector to do it.
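To see how a thin bond can simply vanish from a raster image, here is a minimal Python sketch. I don’t know what the patent pipeline actually did to this image, so this assumes one plausible mechanism: a one-pixel-wide line is downsampled by discarding pixels. The function names and the 8×8 image are made up for illustration.

```python
def make_image(width, height, line_col):
    """Return a binary raster (list of rows) with a 1-pixel-wide vertical line."""
    return [[1 if x == line_col else 0 for x in range(width)]
            for _ in range(height)]


def decimate(image, factor):
    """Downsample by keeping only every `factor`-th pixel in each direction."""
    return [row[::factor] for row in image[::factor]]


def has_ink(image):
    """True if any pixel in the raster is set."""
    return any(any(row) for row in image)


# Hypothetical "bond": a vertical line in column 5 of an 8x8 image.
original = make_image(width=8, height=8, line_col=5)

# Halving the resolution keeps columns 0, 2, 4, 6, so column 5 is thrown away.
small = decimate(original, factor=2)

print("bond present before:", has_ink(original))
print("bond present after: ", has_ink(small))
```

A vector or semantic representation (CML, SMILES, InChI) has no such failure mode: the bond is stated explicitly and either it is in the file or it is not.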

Egon is trying to get the Chemoinformatics sector to do proper, Openly validatable science. I’m with him, and have been all along. But the mainstream community does not want to require details of what data were used, where they came from, how to re-run the calculations, how to re-analyse the data. Chemoinformatics is in danger of being regarded as a pseudoscience. I’ll blog more of this later, probably not until after the GCR.

Here’s the PubChem entry. Note how this is presented in scalable vectors:

 

(I have rotated this to show the similarity to the original.)


Of course you can follow the links on PubChem to see what the commercial providers of patent info provide. If you haven’t seen this sort of webpage, take time to have a look…

Category: [for same structure substances]

   DiscoveryGate ( 1 ) External ID: 24888923
   Thomson Pharma ( 1 ) External ID: 02644348

How much use is this to you?


 
