Open NMR

As I have already blogged (WWMM calculation of spectra), we are hoping to provide Jean-Claude Bradley and others with an Open service to calculate NMR spectra from structure. This needs a lot of software components and a lot of glueware. With the release of FROG (not just Free, but Open) yet another problem is solved, but we aren’t quite there yet.
The calculation of spectra from NMRShiftDB is automatic because, AND ONLY BECAUSE, Christoph and Stefan have used CMLSpect to represent the data. CMLSpect allows:

  • connection table
  • atom labels
  • 3D coordinates
  • spectra
  • spectral peaks
  • assignment of peaks to atoms

All of these (except the raw spectra) are required for the calculation. Actually the connection table can be dispensed with if the hydrogen atoms are given explicitly, as they should ALWAYS be. (Implicit hydrogens have probably cost the human race thousands of wasted years through errors. There is now NO excuse for not including hydrogen atoms explicitly in files. Size of files? Rubbish. All the hydrogens in a year’s global chemistry are worth 1 day of astronomical simulation.)
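To make the linkage concrete, here is a minimal sketch of a molecule and a peak list tied together by atom labels. The element and attribute names (`atomArray`, `peakList`, `atomRefs`, `xValue`, `moleculeRef`) are assumed from common CML usage; the fragment is illustrative only, is not validated against the CML schema, and `build_cml` is a hypothetical helper, not part of JUMBO or any CML library.

```python
import xml.etree.ElementTree as ET


def build_cml(atoms, peaks):
    """Build a simplified CML-like tree.

    atoms: list of (atom_id, element, x, y, z)
    peaks: list of (peak_id, shift_ppm, [atom_id, ...])
    """
    known_ids = {atom[0] for atom in atoms}

    cml = ET.Element("cml")
    molecule = ET.SubElement(cml, "molecule", id="m1")
    atom_array = ET.SubElement(molecule, "atomArray")
    for atom_id, element, x, y, z in atoms:
        # Every atom, hydrogens included, gets an explicit label and 3D coordinates.
        ET.SubElement(atom_array, "atom", id=atom_id, elementType=element,
                      x3=f"{x:.4f}", y3=f"{y:.4f}", z3=f"{z:.4f}")

    spectrum = ET.SubElement(cml, "spectrum", id="s1", moleculeRef="m1")
    peak_list = ET.SubElement(spectrum, "peakList")
    for peak_id, shift_ppm, refs in peaks:
        # Refuse assignments that point at atoms which do not exist.
        missing = [ref for ref in refs if ref not in known_ids]
        if missing:
            raise ValueError(f"peak {peak_id} refers to unknown atoms: {missing}")
        ET.SubElement(peak_list, "peak", id=peak_id,
                      xValue=f"{shift_ppm:.2f}", atomRefs=" ".join(refs))
    return cml


# Made-up coordinates and shift, purely for illustration.
doc = build_cml(
    atoms=[("a1", "C", 0.0, 0.0, 0.0), ("a2", "H", 0.0, 0.0, 1.09)],
    peaks=[("p1", 49.9, ["a1"])],
)
print(ET.tostring(doc, encoding="unicode"))
```

The point of the sketch is that the peak carries the atom label itself, so the assignment survives whatever happens to the order of atoms in the file.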
So with NMRShiftDB we have the simple process:

  • read NMRShiftDB file
  • add hydrogens with coordinates (JUMBO does this)
  • transform to Gaussian input (XSLT makes this automatic)
  • run job (Condor makes this automatic)
  • analyze results (i.e. compare calculated and observed – Nick Day’s software is making this automatic)
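As a rough illustration of the "transform to Gaussian input" step, the sketch below writes a Gaussian job file from a list of atoms with explicit coordinates. The real pipeline does this with XSLT; Python is used here only to show the shape of the target file. The function name and the route line (method and basis set) are assumptions chosen for the example, not the settings used in the project.

```python
def gaussian_nmr_input(title, atoms, charge=0, multiplicity=1,
                       route="# B3LYP/6-31G(d) NMR"):
    """Return the text of a Gaussian input file for an NMR shielding job.

    atoms: list of (element, x, y, z) in Angstrom, hydrogens included.
    The default route line is an assumed level of theory.
    """
    lines = [
        route,                        # route section
        "",                           # blank line ends the route section
        title,                        # title section
        "",                           # blank line ends the title section
        f"{charge} {multiplicity}",   # charge and spin multiplicity
    ]
    for element, x, y, z in atoms:
        lines.append(f"{element:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines.append("")                  # blank line ends the geometry
    return "\n".join(lines) + "\n"


# Made-up geometry fragment, for illustration only.
print(gaussian_nmr_input("example job", [("C", 0.0, 0.0, 0.0),
                                         ("H", 0.0, 0.0, 1.089)]))
```

Because the atoms arrive with labels and explicit hydrogens, the order in which they are written here can be recorded and used later to map calculated shieldings back to the observed peaks.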

With the normal chemical environment this is messier:

  • read mol file
  • submit to FROG to generate 3D coordinates. Hope it hasn’t changed the order of atoms
  • convert mol file to CML
  • read list of peaks in some legacy format (?Excel)
  • try to match peaks to atoms for assignment (probably have to rely on atom ordering)
  • create peakList in CMLSpect. How?
  • combine peakList with molecule in CML
  • transform to Gaussian input (as above) and then it’s plain sailing

The problems arise because:

  • hydrogens are a problem (they are often implicit and have to be added, with coordinates)
  • mol files (and all other file formats than CML) do not have atom labels
  • there is no Open tool for assigning peaks to atoms
  • relying on atom ordering is a recipe for disaster and extremely difficult to debug
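A small sketch of why atom ordering is so dangerous: if peaks are matched to atoms by position, a tool that silently reorders the atoms produces a different, equally plausible-looking assignment with no error at all. Matching by label does not have this failure mode. The function names, labels and shift values below are all hypothetical.

```python
def assign_by_index(peaks_ppm, atoms):
    """Fragile: peak i is assumed to belong to atom i in the file."""
    return {atoms[i]["label"]: ppm for i, ppm in enumerate(peaks_ppm)}


def assign_by_label(labelled_peaks, atoms):
    """Robust: each peak names its atom, so file order is irrelevant."""
    known = {atom["label"] for atom in atoms}
    unknown = [label for label, _ in labelled_peaks if label not in known]
    if unknown:
        raise KeyError(f"peaks refer to unknown atoms: {unknown}")
    return {label: ppm for label, ppm in labelled_peaks}


atoms = [{"label": "C1"}, {"label": "C2"}, {"label": "C3"}]
# The same molecule after a tool has rewritten it in a different atom order.
reordered = [atoms[2], atoms[0], atoms[1]]

peaks_ppm = [14.1, 22.8, 170.3]  # made-up shifts
labelled = [("C1", 14.1), ("C2", 22.8), ("C3", 170.3)]

# Index-based assignment changes silently when the order changes.
print(assign_by_index(peaks_ppm, atoms) == assign_by_index(peaks_ppm, reordered))

# Label-based assignment gives the same answer either way.
print(assign_by_label(labelled, atoms) == assign_by_label(labelled, reordered))
```

Nothing in the index-based version can detect the reordering, which is exactly what makes this kind of error so hard to debug.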

So it is clear that we need a tool to couple JSpecView to a molecule in CML. The output, at least, has to be in CML because there is no other way of linking atoms to peaks.
This should be seen as one of the great (but achievable) challenges of the Blue Obelisk movement. When we get it, it will transform the way that graduate students record their peak assignment and publish their papers and THESES!


6 Responses to Open NMR

  1. Pingback: Unilever Centre for Molecular Informatics, Cambridge - petermr’s blog » Blog Archive » Open-Data-driven science and a brokering system for ONS

  2. Robert Lancashire has some examples where, when you click an atom in Jmol, the corresponding NMR peak in JSpecView gets highlighted. See

  3. Dear Peter;
    a searchable database of 16.4 million calculated C13-NMR spectra has been available for a year now on
    The spectra have been calculated for 16.4 million of the PUBCHEM structures using the CSEARCH NN approach. The search technology used is a modified SAHO approach as implemented in CSEARCH.
    If there is more interest in using this, it is no problem to upgrade the data file to the current size of the PUBCHEM collection. The calculation of approx. 40 million spectra can be done in less than one week on a 4-processor box.
    Best regards, Wolfgang Robien

  4. pm286 says:

    (3) Thank you.
    This sounds like a useful resource but it isn’t really relevant to our project where we wish to compare predicted values with validated data.

  5. pheidrias says:

    A propos “read list of peaks in some legacy format (?Excel)”, I wonder whether there is some free format (XML-type?) which represents spectral data?
    It would also be interesting for infrared spectra etc.
    Do you know of any project taking care of this?
    best regards,

  6. pm286 says:

    (5). We have developed CMLSpect that manages many types of spectra.
