Why we need semantic chemical authoring -4

Wolfgang Robien has nicely solved the puzzle of the inconsistency between the formula and the molecular mass of chloromethyl methyl ether. Wolfgang Robien says:

  1. April 30th, 2008 at 4:25 pm
    What you read as ‘Cl’ (=chlorine) – because you expect it – is written as ‘CI’ (=carbon+iodine) …
    quite simple, but good example showing that even the most trivial checks might avoid errors. The ‘advantage’ of this error is, that no conclusion is built on it. Every misassignment in NMR-spectroscopy might have the consequence, that another assignment is based on it – making it more reliable, because of better statistical parameters.
    For a summary of more or less ‘sophisticated’ errors in NMR-spectra see:
    nmrpredict.orc.univie.ac.at/csearchlite/NMR_misinterpretation.html

PMR: Thanks Wolfgang. It is interesting how many people (including me) could not see the problem. It is so natural to communicate using visual signs that we forget that they frequently mislead. There are some more general points:

  • Wolfgang shows some examples of errors in reported spectra. I imagine that these are a very small percentage of those actually in the literature. The 10 examples break down roughly equally into scientific errors and typos. He makes the point, and I completely support him, that these are avoidable. If we use semantic tools when authoring then we can avoid many errors. It’s straightforward to check the expected NMR values of many compounds, and doing so takes a small fraction of the effort needed to actually make them. There are known ranges for many properties – the IUCr does this very well and has an extensive dictionary/ontology system for all mainstream crystallography. It’s made a huge difference to the quality of crystallography, which is generally considerably higher than that of spectra. It is not a problem of tools, it’s a problem of will.
  • It can be quite difficult to know what a compound – even a simple one – actually is if we cannot assume the semantic coherence of the document. There are ca. 1500 common compounds and substances in the ICSC collection (from which my example was taken). I’ve blogged about this before and shall return to it, but there are:
  1. names (often opaque and sometimes inconsistent)
  2. identifiers such as CAS or RTECS for which there is no authoritative free resolution system (it costs USD6 to find out what a single CAS number is)
  3. chemical formula – for which there is no standard method of reporting
  4. molecular structure with a variety of approaches mainly based on connection tables

This is unacceptable for the machine information age. The ICSC documents are important (they are about safety) yet it is enormously difficult for a machine to read them accurately. Almost every other web source of information, and many databases, are also difficult or impossible to read accurately. The only modern way forward is a combination of XML (CML and others), RDF and ontologies. You’ll be seeing more of this over the next few weeks and months, with ways of showing how it can be achieved. Yes, there is a need for new tools and demonstrators, but the main problem is one of will.
And, in passing, it’s worth noting that the problem only occurred for sighted people. It’s worth remembering that our information is not just for them.
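
To make Wolfgang’s point about trivial checks concrete, here is a minimal sketch of the kind of test a semantic authoring tool could run: parse the formula as a machine would, compute the molecular mass, and compare it with the reported value. Everything in it is an assumption made for illustration, not the ICSC’s or anyone else’s actual method: the function names, the small table of rounded atomic masses, the tolerance, and the two formula strings (one spelled with ‘Cl’, one with ‘CI’) together with a reported mass of 80.5.

```python
import re

# Rounded average atomic masses in g/mol; only the elements needed here.
# (Illustrative table, not a complete or authoritative one.)
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999, "Cl": 35.45, "I": 126.904}

# One element symbol (capital letter, optional lowercase letter) plus an optional count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")
# A whole formula is one or more such tokens and nothing else (no brackets, no charges).
_WHOLE = re.compile(r"(?:[A-Z][a-z]?\d*)+")


def parse_formula(formula):
    """Return {element symbol: count} for a simple bracket-free formula."""
    if not _WHOLE.fullmatch(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    counts = {}
    for symbol, digits in _TOKEN.findall(formula):
        counts[symbol] = counts.get(symbol, 0) + (int(digits) if digits else 1)
    return counts


def molecular_mass(formula):
    """Sum the atomic masses of all atoms in the formula."""
    total = 0.0
    for symbol, count in parse_formula(formula).items():
        if symbol not in ATOMIC_MASS:
            raise ValueError(f"no mass known for element {symbol!r}")
        total += ATOMIC_MASS[symbol] * count
    return total


def is_consistent(formula, reported_mass, tolerance=0.5):
    """True if the computed mass agrees with the reported one within the tolerance."""
    return abs(molecular_mass(formula) - reported_mass) <= tolerance


if __name__ == "__main__":
    # The two strings look almost identical to a human reader.
    for text in ("C2H5ClO", "C2H5CIO"):
        print(text, parse_formula(text), round(molecular_mass(text), 1),
              is_consistent(text, 80.5))
```

With these assumed values the ‘Cl’ spelling gives a mass of about 80.5 and passes, while the ‘CI’ spelling is read as three carbons and an iodine, gives a mass of about 184 and fails. A machine does not see what it expects to see, which is exactly why the check is worth running.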


2 Responses to Why we need semantic chemical authoring -4

  1. Today I have added 2,388 literature citations (corresponding to approx. 6,000 spectra), which have been reassigned during the database generation process or subsequent ‘correction cycles’
    for details see:
    nmrpredict.orc.univie.ac.at/csearchlite/NMR_misinterpretation.html
    I think this improvement in data quality justifies the moderate price for all the systems, where my data are available ( NMRPRedict – see: http://www.modgraph.co.uk / MestReNova – see: http://www.mestrec.com / Chemgate – see: chemgate.emolecules.com / KnowItAll – see: http://www.biorad.com ) – be sure, that any royalty paid to me goes back into the project !

  2. pm286 says:

    (1) That’s very interesting Wolfgang – do you have an idea of what percentage of spectra have errors that are significant enough to require reassignment?
