Open Notebook NMR – Outliers

Egon asks:

  1. Egon Willighagen Says:
    October 23rd, 2007 at 4:00 pm: Where in which wiki can I find the outliers? That would allows people to indicate problems, and possible annotate existing publications with ratings (“this article has a incorrectly assigned NMR spectrum”).

Egon and others… We met with Christoph, Nick and Jim today and talked with Henry on the phone. What we plan is roughly:

  • Nick has worked out the variance of each structure (precision), and also its offset from the origin (accuracy). Serious errors usually affect both of these. We hope we can find a set of outliers that primarily show variance, because these will be interpretable (accuracy may be due to effects we cannot see, such as machine settings).
  • There are some outliers due to known systematic errors that Henry has analysed and will correct, so we won’t be publishing these until we have made the corrections.
  • We shall then start to publish the outliers as we extract them. Some will have known problems and we shall indicate these with our error categories. These will be available for anyone to comment on and we believe we can make good educational use of this.
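To make the precision/accuracy distinction concrete, here is a minimal sketch of one way the two numbers could be computed per structure. It is not Nick's actual code: the function name, the use of the mean signed difference as the "offset" and the standard deviation as the spread, and all the shift values except the 135 calc vs. 60 obs pair are assumptions made for illustration.

```python
from statistics import mean, pstdev


def shift_error_stats(calc, obs):
    """Return (offset, spread) of the calc - obs differences for one structure.

    offset: mean signed difference in ppm (a stand-in for "accuracy")
    spread: population standard deviation in ppm (a stand-in for "precision")
    """
    if len(calc) != len(obs):
        raise ValueError("calc and obs must have the same number of peaks")
    diffs = [c - o for c, o in zip(calc, obs)]
    return mean(diffs), pstdev(diffs)


# Hypothetical 13C shifts in ppm as (calculated, observed) lists.
# Only the 135 vs. 60 pair is taken from the post; the rest is invented.
structures = {
    "well_behaved": ([20.0, 35.0, 128.0], [21.0, 34.0, 129.0]),
    "one_bad_peak": ([20.0, 35.0, 135.0], [21.0, 34.0, 60.0]),
    "uniform_shift": ([25.0, 40.0, 133.0], [20.0, 35.0, 128.0]),
}

for name, (calc, obs) in structures.items():
    offset, spread = shift_error_stats(calc, obs)
    print(f"{name}: offset={offset:+.2f} ppm, spread={spread:.2f} ppm")
```

In this toy data a single badly wrong peak inflates both the offset and the spread, whereas a uniform shift of every peak moves only the offset. That matches the remark above that serious errors usually affect both numbers.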

So here is the first and worst outlier:
10006060.PNG
The difference is enormous – 135 calc vs. 60 obs. So Henry went back to the original paper and found:
nmr1.jpg
The compound is 2a. (I am offline and it would cost 30 USD to read the paper, so I will take Henry’s word.) It is clear that the observed peak is 122.5, not 60.
We’ll be releasing further outliers as we go. Ideally these should be on a wiki and we should provide identifiers. Initially we had thought about making the whole data set available, but companies have requested all the data to compute on, which has made us think about data release strategy more carefully. We’ll have to split the data between public and private, and this will take time.

This entry was posted in data, nmr, open notebook science.
