Why we need semantic authoring tools in chemistry – 3

The type of problem highlighted in my recent post is a very serious one, so rather than giving the answer I want to help you discover it for yourself. Hopefully you will then have a wow! or aha! or buggerthat! moment that will help orient you to the importance of semantic tools. Persevere and you will see why I rant against PDF, and why weak OA does not normally provide high-quality semantic documents.
You need to know only a little chemistry, and I’ll explain it all below. But first the essence of the problem (it relates to methyl chloromethyl ether; you can look it up on Wikipedia, but that’s not necessary to solve the problem).
The chemical formula as given:

CH3OCH2CI

is completely incompatible with the molecular mass as given:

Molecular mass: 80.5

For those who have forgotten high-school chemistry all you need to know is:

  • Elements are defined by an unambiguous symbol. Thus “H” means hydrogen, “C” means carbon, “O” means oxygen. You can look up all the information in Wikipedia.
  • The count of each element is one, unless subscripted. Elements can be repeated. Thus CH3OH is read as one carbon, three hydrogens, one oxygen and another hydrogen. Adding them up gives one carbon, four hydrogens and one oxygen.
  • To get the molecular mass, look up the atomic mass of each element in its Wikipedia entry (or on the Blue Obelisk site), multiply by the count, and add up the results. The example above (methanol) goes: 1 carbon × 12 = 12; 4 hydrogens × 1 = 4; 1 oxygen × 16 = 16. The total is 32 (you can check in Wikipedia). Round the atomic masses to the nearest 0.5; my teaser is not a problem of decimal points.
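
For readers who would rather let a machine do the counting, the three rules above can be sketched in a few lines of Python. This is only an illustration: the names `ATOMIC_MASS`, `count_elements` and `molecular_mass` are made up for this sketch, the table holds just a handful of elements with masses rounded to the nearest 0.5, and the parser assumes simple formulae without brackets or charges.

```python
import re
from collections import Counter

# Atomic masses rounded to the nearest 0.5 (assumption: only a few
# elements are listed here; extend the table as needed).
ATOMIC_MASS = {"H": 1.0, "C": 12.0, "N": 14.0, "O": 16.0, "Cl": 35.5, "I": 127.0}

# An element symbol is one capital letter, optionally followed by one
# lowercase letter, optionally followed by a count. Case matters.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")

def count_elements(formula):
    """Add up the count of each element; repeated symbols are summed."""
    counts = Counter()
    for symbol, digits in TOKEN.findall(formula):
        counts[symbol] += int(digits) if digits else 1
    return counts

def molecular_mass(formula):
    """Multiply each atomic mass by its count and sum.

    Raises KeyError for a symbol that is not in ATOMIC_MASS.
    """
    return sum(ATOMIC_MASS[s] * n for s, n in count_elements(formula).items())

print(dict(count_elements("CH3OH")))  # {'C': 1, 'H': 4, 'O': 1}
print(molecular_mass("CH3OH"))        # 32.0
```

The program reads exactly the characters it is given, not the ones a human expects to see.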

If you do this for the puzzle compound you should discover the problem.
And you’ll see why it bears on PDF, OA, and all the rest.
If we had semantic chemical tools where the information was checked as it was entered this COULDN’T happen. Now for that we need something like a chemical plugin for Word.
Is there a good fairy out there?
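
To make the idea of "checked as it was entered" concrete, here is a minimal sketch of the sort of test such a tool might run. It is not an existing plugin: `formula_mass`, `mass_matches_formula`, the small mass table and the tolerance of 0.5 are all assumptions made for this illustration, and only simple formulae without brackets are handled.

```python
import re

# Assumption: atomic masses rounded to the nearest 0.5; only a few elements listed.
ROUNDED_MASS = {"H": 1.0, "C": 12.0, "N": 14.0, "O": 16.0, "Cl": 35.5, "I": 127.0}

def formula_mass(formula):
    """Sum rounded atomic masses over a simple formula such as 'CH3OH'."""
    total = 0.0
    for symbol, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ROUNDED_MASS[symbol] * (int(digits) if digits else 1)
    return total

def mass_matches_formula(formula, stated_mass, tolerance=0.5):
    """Return True if the stated mass agrees with the formula within tolerance."""
    return abs(formula_mass(formula) - stated_mass) <= tolerance

# Methanol entered with its own mass passes the check ...
print(mass_matches_formula("CH3OH", 32))  # True
# ... but methanol entered with the mass of ethanol (46) is flagged.
print(mass_matches_formula("CH3OH", 46))  # False
```

An authoring tool that ran a check like this on every formula and mass pair could warn the author before the document was ever published.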


4 Responses to Why we need semantic authoring tools in chemistry – 3

  1. baoilleach says:

    I guess I’m missing the point somewhere. The molecular weight seems to be correct when I use Pybel.
    >>> pybel.readstring("smi", "COCCl").molwt
    80.513500000000008

  2. pm286 says:

    (1) Yes, you are :-). Think very carefully about what you calculated the molecular mass of (CH3OCH2Cl). It wasn’t what I posted.

  3. What you read as ‘Cl’ (=chlorine) – because you expect it – is written as ‘CI’ (=carbon+iodine) …
    Quite simple, but a good example showing that even the most trivial checks might avoid errors. The ‘advantage’ of this error is that no conclusion is built on it. A misassignment in NMR spectroscopy, by contrast, may lead to another assignment being based on it, which makes the wrong assignment look more reliable because the statistical parameters improve.
    For a summary of more or less ‘sophisticated’ errors in NMR-spectra see:
    nmrpredict.orc.univie.ac.at/csearchlite/NMR_misinterpretation.html

  4. pm286 says:

    (3) Thanks Wolfgang – several people were fooled by this. If the string is pasted into a machine it becomes obvious.
    For our purposes it’s important as formula is a key field in disambiguating web resources – more of that later.
    And thanks for the NMR errors – very useful.
