Wolfgang Robien has nicely solved the puzzle of the inconsistency in the formula and molecular mass of chloromethyl methyl ether: Wolfgang Robien Says:
PMR: Thanks Wolfgang. It is interesting how many people (including me) could not see the problem. It is so natural to communicate using visual signs that we forget that they frequently mislead. There are some more general points:
- Wolfgang shows some examples of errors in reported spectra. I imagine that these are a very small percentage of those actually in the literature. The 10 examples break down roughly equally into scientific errors and typos. He makes the point, and I completely support him, that these are avoidable. If we use semantic tools when authoring then we can avoid many errors. It’s straightforward to check the expected NMR values of many compounds, and doing so takes a small fraction of the effort needed to actually make them. There are known ranges for many properties – the IUCr does this very well and has an extensive dictionary/ontology system for all mainstream crystallography. It’s made a huge difference to the quality of crystallography, which is generally considerably higher than that of spectra. It is not a problem of tools, it’s a problem of will.
- It can be quite difficult to know what a compound – even a simple one – actually is if we cannot assume the semantic coherence of the document. There are ca. 1500 common compounds and substances in the ICSC collection (from which my example was taken). I’ve blogged about this before and shall return to it, but there are:
- names (often opaque and sometimes inconsistent)
- identifiers such as CAS or RTECS for which there is no authoritative free resolution system (it costs USD6 to find out what a single CAS number is)
- chemical formula – for which there is no standard method of reporting
- molecular structure with a variety of approaches mainly based on connection tables
This is unacceptable for the machine information age. The ICSC documents are important (they are about safety) yet it is enormously difficult for a machine to read them accurately. Almost every other web source of information, and many databases, are also difficult or impossible to read accurately. The only modern way forward is a combination of XML (CML and others), RDF and ontologies. You’ll be seeing more of this over the next few weeks and months, with ways of showing how it can be achieved. Yes, there is a need for new tools and demonstrators, but the main problem is one of will.
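To give a flavour of how little machinery a basic consistency check needs, here is a minimal Python sketch that parses a simple formula string, computes an approximate molecular mass and compares it with a reported value. Everything in it is an assumption made for illustration: the function names (`parse_formula`, `molecular_mass`, `is_consistent`), the tiny table of rounded atomic masses, the 0.5 g/mol tolerance, and the restriction to formulas without brackets, charges or isotopes. It is not how the ICSC cards or any CML tool actually do it.

```python
import re

# Approximate standard atomic masses in g/mol. Only a few elements are listed
# to keep the sketch short; a real tool would use a complete table.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "Cl": 35.45}

# A whole formula is one or more "element symbol + optional count" tokens.
_FORMULA = re.compile(r"(?:[A-Z][a-z]?\d*)+")
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Return element counts for a simple formula such as 'C2H5ClO'.

    Hypothetical helper: no brackets, charges, hydrates or isotopes.
    """
    if not _FORMULA.fullmatch(formula):
        raise ValueError(f"cannot parse formula: {formula!r}")
    counts = {}
    for symbol, number in _TOKEN.findall(formula):
        if symbol not in ATOMIC_MASS:
            raise ValueError(f"no atomic mass known for element: {symbol!r}")
        # A missing count means one atom of that element.
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def molecular_mass(formula):
    """Sum the atomic masses over all atoms in the formula."""
    return sum(ATOMIC_MASS[el] * n for el, n in parse_formula(formula).items())


def is_consistent(formula, reported_mass, tolerance=0.5):
    """True if the reported mass is within `tolerance` g/mol of the computed one."""
    return abs(molecular_mass(formula) - reported_mass) <= tolerance
```

With this in place, `parse_formula("CH3OCH2Cl")` and `parse_formula("C2H5ClO")` give the same element counts, so two differently written formulas for chloromethyl methyl ether can at least be recognised as the same composition, and `is_consistent("C2H5ClO", 80.5)` passes while a mistyped mass would be flagged. A check like this says nothing about connectivity, of course; that needs a proper structure representation.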
And, in passing, it’s worth noting that the problem only occurred for sighted people. It’s worth remembering that our information is not just for them.