petermr's blog

A Scientist and the Web


Open NMR: Nick Day’s “final” results

Nick has more-or-less finished the computational NMR work on compounds from NMRShiftDB and we are exposing as much of the work as technically possible. Here is his interim report, some of which I trailed yesterday. The theoretical calculation (rmpw1pw91/6-31g(d,p)) involves:

  • correction for spin-orbit coupling in C-Cl (-3 ppm) and C-Br (-12 ppm)
  • averaging of chemically identical carbons (solves some, but not all conformational problems)
  • extra basis set for C and O [below]

====== Gaussian 03 ====

# rmpw1pw91/6-31g(d,p) NMR scrf(cpcm,solvent=Acetone) ExtraBasis

Calculating GIAO-shifts.

0 1

C 0
SP 1 1.00
0.05 1.00000000 1.00000000
O 0
SP 1 1.00
0.070000 1.0000000 1.0000000
====== Gaussian 03 ====
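
The two post-processing steps listed above (the halogen correction and the averaging of chemically identical carbons) can be sketched roughly as follows. This is only an illustration, not Nick's actual code: the function name, the dictionary layout and the choice to add one correction per attached halogen are all assumptions made for the example.

```python
from collections import defaultdict

# Spin-orbit corrections in ppm, as quoted in the list above.
SPIN_ORBIT_PPM = {"Cl": -3.0, "Br": -12.0}


def correct_shifts(shifts, halogen_neighbours, equivalence_class):
    """Apply halogen corrections, then average equivalent carbons.

    shifts: {atom_id: calculated shift in ppm}
    halogen_neighbours: {atom_id: ["Cl", ...]}, halogens bonded to that carbon
    equivalence_class: {atom_id: label}; carbons sharing a label are
        treated as chemically identical (hypothetical input format)
    """
    corrected = {}
    for atom, shift in shifts.items():
        # Assumption: one additive correction per attached Cl or Br.
        correction = sum(
            SPIN_ORBIT_PPM.get(element, 0.0)
            for element in halogen_neighbours.get(atom, [])
        )
        corrected[atom] = shift + correction

    # Group carbons by equivalence label; unlabelled carbons stay alone.
    groups = defaultdict(list)
    for atom in corrected:
        label = equivalence_class.get(atom)
        key = ("class", label) if label is not None else ("single", atom)
        groups[key].append(atom)

    # Every carbon in a group receives the group mean.
    averaged = {}
    for atoms in groups.values():
        group_mean = sum(corrected[a] for a in atoms) / len(atoms)
        for a in atoms:
            averaged[a] = group_mean
    return averaged
```

Averaging after the correction rather than before makes no difference when the equivalent carbons carry the same substituents, which is the only case where the averaging is chemically meaningful anyway.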

In general his/our conclusions are:

  • the major variance in the observed-calculated variate is due to “experimental” problems (“wrong” structures, misassignments)
  • significant variance from unresolved conformers and tautomers
  • small systematic effects in the offset depending on the hybridization [below]

The final variance is shown here (the interactive plot requires Firefox):


(In the interactive plot, clicking on any point brings up the structure, and the various diagnostic plots can then be loaded for that structure.) It can be seen that the sp3 carbons (left) are systematically different from the sp2 (right), and we shall be playing with the basis sets to see if we can improve this. If not, it will have to be an empirical calculation.
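
One possible reading of the "empirical" fallback is a separate constant offset for each hybridization class, fitted from the observed-minus-calculated differences. The sketch below assumes that reading; the function names and the record layout are invented for the example, and nothing here has been fitted to the real data.

```python
from statistics import mean


def hybridization_offsets(records):
    """Mean (observed - calculated) shift for each hybridization class.

    records: iterable of (hybridization, observed_ppm, calculated_ppm),
        a hypothetical layout chosen for this sketch.
    """
    by_class = {}
    for hyb, observed, calculated in records:
        by_class.setdefault(hyb, []).append(observed - calculated)
    return {hyb: mean(diffs) for hyb, diffs in by_class.items()}


def apply_offset(hyb, calculated_ppm, offsets):
    """Shift a calculated value by the offset fitted for its class."""
    # Classes that were never fitted are left uncorrected.
    return calculated_ppm + offsets.get(hyb, 0.0)
```

A per-class constant is the simplest such correction; a linear scaling per class would be the obvious next step if the offset turns out to vary with the shift itself.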

The variance can be plotted per structure in terms of absolute error (C) and intra-structure variance (RMSD). Here’s the plot for this (which obviously includes some of the variance from the systematic error above):


The sp2/sp3 scatter can be seen at the left but the main RMSD (> 3.0 ppm) is probably due to bad structures and unresolved chemistry. There are 22 points there and we’d be very grateful for informed comment.
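
For anyone wanting to reproduce the per-structure numbers, here is a minimal sketch. It assumes that C is the mean observed-minus-calculated difference within a structure and that RMSD is the root-mean-square spread of those differences about C; the post does not spell out the definitions, so treat both as assumptions. The function names are made up for the example; the 3.0 ppm threshold is the one quoted above.

```python
from math import sqrt


def structure_stats(pairs):
    """Return (C, RMSD) for one structure.

    pairs: list of (observed_ppm, calculated_ppm), one per carbon.
    Assumed definitions: C is the mean difference, RMSD is the
    root-mean-square deviation of the differences about C.
    """
    diffs = [observed - calculated for observed, calculated in pairs]
    c = sum(diffs) / len(diffs)
    rmsd = sqrt(sum((d - c) ** 2 for d in diffs) / len(diffs))
    return c, rmsd


def flag_outliers(structures, threshold=3.0):
    """Names of structures whose RMSD exceeds the threshold (ppm)."""
    return [
        name
        for name, pairs in structures.items()
        if structure_stats(pairs)[1] > threshold
    ]
```

With these definitions a structure whose shifts are all wrong by the same amount has a large C but an RMSD of zero, so a systematic offset and a genuinely bad assignment show up on different axes.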

Assuming the main outliers can be discarded for legitimate reasons (not just because we don’t like them) then I think we have the following conclusion. For molecules with:

  • one major conformation …
  • … and where there are no tautomers or we have got the major one …
  • … and where the molecule contains only C, H, B, N, O, F, Si, P, S, Cl, Br …

then the error to be expected from the calculation is in the range 1-2 ppm.

We can’t go any further without having a cleaner dataset. We’d be very interested if anyone can make one Open. But we also have some ideas about how to start building one, and we’d be interested in collaborators.

We’ve now essentially exposed all our methodology and data. OK, it wasn’t Open Notebook Science because there were times when we didn’t expose things. But from now on we shall try to do it as full Open Notebook Science. There may be some manual procedures in transferring results from the Condor system to web pages, but that’s no different from writing down observations in a notebook: there will be a few minutes between the experiment and the broadcast. And this will be an experiment where anyone can be involved.
