petermr's blog

A Scientist and the Web


Open NMR

As I have already blogged (WWMM calculation of spectra), we are hoping to provide Jean-Claude Bradley and others with an Open service to calculate NMR spectra from structure. This needs a lot of software components and a lot of glueware. With the release of FROG (not just Free, but Open) yet another problem is solved, but we aren’t quite there yet.

The calculation of spectra from NMRShiftDB is automatic because, AND ONLY BECAUSE, Christoph and Stefan have used CMLSpect to represent the data. CMLSpect allows:

  • connection table
  • atom labels
  • 3D coordinates
  • spectra
  • spectral peaks
  • assignment of peaks to atoms

All of these (except the raw spectra) are required for the calculation. Actually the connection table can be dispensed with if the hydrogen atoms are given explicitly, as they should ALWAYS be. (Implicit hydrogens have probably cost the human race thousands of wasted years through errors. There is now NO excuse for not including hydrogen atoms explicitly in files. Size of files? Rubbish. All the hydrogens in a year’s global chemistry are worth 1 day of astronomical simulation.)
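To make the atom-to-peak link concrete, here is a minimal Python sketch. It assumes a simplified, namespace-free fragment modelled loosely on CMLSpect (real files carry the CML namespace and more attributes). The element and attribute names used here (`atom`, `peak`, `atomRefs`, `xValue`), the helper `unresolved_peak_refs` and the shift values are illustrative assumptions, not a schema reference. The sketch checks that every peak points at an atom that is actually present in the molecule.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment; an assumption for illustration only.
# The coordinates and shift values are made up.
CML_FRAGMENT = """
<cml>
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="C" x3="0.000" y3="0.000" z3="0.000"/>
      <atom id="a2" elementType="H" x3="0.000" y3="0.000" z3="1.089"/>
    </atomArray>
  </molecule>
  <spectrum id="s1">
    <peakList>
      <peak id="p1" xValue="-2.3" atomRefs="a1"/>
      <peak id="p2" xValue="0.2" atomRefs="a9"/>
    </peakList>
  </spectrum>
</cml>
"""


def unresolved_peak_refs(xml_text):
    """Return {peak id: [referenced atom ids that are not in the molecule]}."""
    root = ET.fromstring(xml_text)
    atom_ids = {atom.get("id") for atom in root.iter("atom")}
    problems = {}
    for peak in root.iter("peak"):
        # atomRefs is treated as a whitespace-separated list of atom ids.
        refs = (peak.get("atomRefs") or "").split()
        missing = [ref for ref in refs if ref not in atom_ids]
        if missing:
            problems[peak.get("id")] = missing
    return problems


print(unresolved_peak_refs(CML_FRAGMENT))
```

Peak `p2` deliberately refers to an atom that does not exist, so the check has something to report. Because the assignment is stored as an explicit reference rather than implied by position, a dangling link like this can be detected by a program.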

So with NMRShiftDB we have the simple process:

  • read NMRShiftDB file
  • add hydrogens with coordinates (JUMBO does this)
  • transform to Gaussian input (XSLT makes this automatic)
  • run job (Condor makes this automatic)
  • analyze results (i.e. compare calculated and observed – Nick Day’s software is making this automatic)
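The "transform to Gaussian input" step can be sketched in a few lines. The pipeline above does this with XSLT; the Python below is a stand-in to show the shape of the output, not the actual stylesheet. The method and basis set in the route line, the function name `gaussian_nmr_input` and the methane-like coordinates are assumptions chosen for illustration.

```python
def gaussian_nmr_input(title, atoms, charge=0, multiplicity=1,
                       route="# B3LYP/6-31G(d) NMR"):
    """Build Gaussian input text from (element, x, y, z) tuples.

    Layout: route line, blank line, title, blank line, charge and
    multiplicity, one line per atom, then a terminating blank line.
    """
    lines = [route, "", title, "", f"{charge} {multiplicity}"]
    for element, x, y, z in atoms:
        lines.append(f"{element:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines.append("")  # Gaussian expects a blank line after the geometry
    return "\n".join(lines) + "\n"


# Illustrative geometry only: an idealised methane with explicit hydrogens.
METHANE = [
    ("C", 0.0, 0.0, 0.0),
    ("H", 0.629, 0.629, 0.629),
    ("H", -0.629, -0.629, 0.629),
    ("H", -0.629, 0.629, -0.629),
    ("H", 0.629, -0.629, -0.629),
]

print(gaussian_nmr_input("methane NMR shielding", METHANE))
```

Note that the function only works because every hydrogen arrives with its own coordinates, which is why the "add hydrogens" step has to come first.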

With the normal chemical environment this is messier:

  • read mol file
  • submit to FROG to generate 3D coordinates. Hope it hasn’t changed the order of atoms
  • convert mol file to CML
  • read list of peaks in some legacy format (?Excel)
  • try to match peaks to atoms for assignment (probably have to rely on atom ordering)
  • create peakList in CMLSpect. How?
  • combine peakList with molecule in CML
  • transform to Gaussian input (as above) and then it’s plain sailing
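One possible shape for the "create peakList" and "combine" steps is sketched below in Python. It assumes the legacy peak table has already been exported as CSV with an explicit atom label for each peak, which is exactly the information that is usually missing. The column names `atom_id` and `shift_ppm`, the function `combine_molecule_and_peaks`, the namespace-free element names and the shift values are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical legacy export: one row per peak, with an explicit atom label.
LEGACY_PEAKS_CSV = """atom_id,shift_ppm
a1,18.2
a2,57.9
"""

# Simplified, namespace-free molecule (real CML carries a namespace).
MOLECULE_XML = """<molecule id="m1">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="C"/>
    <atom id="a3" elementType="O"/>
  </atomArray>
</molecule>"""


def combine_molecule_and_peaks(molecule_xml, peaks_csv):
    """Wrap a molecule and a peakList in one document, linked by atom id."""
    molecule = ET.fromstring(molecule_xml)
    atom_ids = {atom.get("id") for atom in molecule.iter("atom")}

    cml = ET.Element("cml")
    cml.append(molecule)
    spectrum = ET.SubElement(cml, "spectrum", {"id": "s1"})
    peak_list = ET.SubElement(spectrum, "peakList")

    rows = csv.DictReader(io.StringIO(peaks_csv))
    for number, row in enumerate(rows, start=1):
        atom_id = row["atom_id"]
        if atom_id not in atom_ids:
            # Fail loudly rather than write a peak that points nowhere.
            raise ValueError(f"peak {number} refers to unknown atom {atom_id!r}")
        ET.SubElement(peak_list, "peak", {
            "id": f"p{number}",
            "xValue": row["shift_ppm"],
            "atomRefs": atom_id,
        })
    return cml


print(ET.tostring(combine_molecule_and_peaks(MOLECULE_XML, LEGACY_PEAKS_CSV),
                  encoding="unicode"))
```

The sketch sidesteps the hard part, which is getting the atom label into the peak table in the first place. Once that label exists, combining the two files is mechanical.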

The problems arise because:

  • hydrogens are a problem: they are often implicit and have to be added, with coordinates
  • mol files (and every other format apart from CML) do not have atom labels
  • there is no Open tool for assigning peaks to atoms
  • relying on atom ordering is a recipe for disaster and extremely difficult to debug
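A small Python sketch shows why relying on atom ordering is so hard to debug. The atom labels, shift values and function names are made up for illustration, and the labels stand in for the true identity of each atom. When a tool writes the same atoms back in a different order, position-based assignment raises no error; it simply attaches every shift to the wrong atom.

```python
def assign_by_position(atom_ids, shifts):
    """Pair the n-th shift with the n-th atom, whatever that atom is."""
    return dict(zip(atom_ids, shifts))


def assign_by_label(atom_ids, shifts_by_label):
    """Look each atom's shift up by its label, independent of order."""
    return {atom_id: shifts_by_label[atom_id] for atom_id in atom_ids}


# Peak list recorded against the original file order (values are made up).
original_order = ["a1", "a2", "a3"]
shifts_in_file_order = [14.1, 22.8, 62.3]
shifts_by_label = dict(zip(original_order, shifts_in_file_order))

# Suppose a coordinate generator returns the same atoms in a new order.
reordered = ["a3", "a1", "a2"]

by_position = assign_by_position(reordered, shifts_in_file_order)
by_label = assign_by_label(reordered, shifts_by_label)

# Atoms whose position-based shift disagrees with the labelled one.
wrong = sorted(a for a in reordered if by_position[a] != by_label[a])
print("silently misassigned:", wrong)
```

Nothing in the position-based result looks broken: it has the right number of atoms and plausible shifts. Only the labelled version gives something to check against.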

So what is clear is that we need a tool to couple JSpecView to a molecule in CML. The output, at least, has to be in CML because there is no other way of linking atoms to peaks.

This should be seen as one of the great (but achievable) challenges of the Blue Obelisk movement. When we get it, it will transform the way that graduate students record their peak assignment and publish their papers and THESES!
