Open NMR: contributions from the community about outliers and assignments

We are delighted at the practical and helpful contributions from members of the community in helping to understand or correct outliers in the data set we are using. This is exactly what we hoped would happen at the start of the project and it has now started to gain momentum. I list some of them below to acknowledge the help. It also highlights the need for better tools for such collaborative projects – a blog is a poor mechanism but wikis also have their failings.
To reiterate:

  • Nick has been through the dataset by hand and identified all data sets with potential misassignments or other anomalies. This has been done by comparing agreements within each set. A data set is likely to have been flagged if (a) it has a single widely outlying shift, (b) two peaks (a, b) have coordinates yb, xa (as we have shown), giving an “X”-like pattern, or (c) it has a large general scatter, considerably greater than the average.
  • Nick will post the major outliers based on RMSD. I don’t know how many there will be but I expect about 50 (hence the “20%”). These will be clickable – i.e. anyone with an SVG browser can immediately find out which peak is linked to which atom.
  • After, and only after, these have been cleaned or accepted we will try to see if there are systematic effects in the data – either the variance or the precision. We might expect the source of the data to account for much of the variation, or the date, or the field strength, or the temperature, or the solvent. Unfortunately we do not have all the metadata as it isn’t present in the CMLSpect files.
  • Finally we may be able to comment on Henry’s method. It is possible that certain functional groups have problems (Nick has some suspicions) but at present these are overwhelmed by variance from other sources in the experiment or its capture.
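
The three flagging criteria and the RMSD ranking described above could be approximated in code. What follows is only a minimal sketch in Python, not Nick's actual procedure (which was done by hand). The function names, the thresholds (outlier_ppm, swap_gain, scatter_factor) and all shift values are assumptions made up for illustration; nothing here is taken from NMRShiftDB. Each data set is assumed to be a non-empty list of (observed, calculated) shift pairs in ppm.

```python
from itertools import combinations
from math import sqrt

def rmsd(pairs):
    """Root-mean-square deviation between observed and calculated shifts (ppm)."""
    return sqrt(sum((obs - calc) ** 2 for obs, calc in pairs) / len(pairs))

def best_swap(pairs):
    """Return (gain, i, j) for the two peaks whose exchanged assignment most
    reduces the summed squared error. A gain of 0.0 means no swap helps."""
    best = (0.0, None, None)
    for i, j in combinations(range(len(pairs)), 2):
        (oi, ci), (oj, cj) = pairs[i], pairs[j]
        before = (oi - ci) ** 2 + (oj - cj) ** 2
        after = (oj - ci) ** 2 + (oi - cj) ** 2  # observed values exchanged
        if before - after > best[0]:
            best = (before - after, i, j)
    return best

def flag(pairs, mean_rmsd, outlier_ppm=10.0, swap_gain=50.0, scatter_factor=2.0):
    """Apply the three heuristics; all thresholds are hypothetical defaults."""
    flags = []
    residuals = sorted((abs(o - c) for o, c in pairs), reverse=True)
    # (a) exactly one residual exceeds the outlier threshold
    if residuals[0] > outlier_ppm and (len(residuals) == 1 or residuals[1] <= outlier_ppm):
        flags.append("single_outlier")
    # (b) exchanging two assignments removes most of the error ("X" pattern)
    gain, i, j = best_swap(pairs)
    if gain > swap_gain:
        flags.append(f"possible_swap({i},{j})")
    # (c) overall scatter well above the average over all data sets
    if rmsd(pairs) > scatter_factor * mean_rmsd:
        flags.append("large_scatter")
    return flags

# Made-up data sets: (observed, calculated) shifts in ppm.
datasets = {
    "set_A": [(45.0, 34.9), (34.4, 44.6), (20.1, 20.5), (170.2, 169.8)],
    "set_B": [(120.0, 100.0), (30.0, 30.5), (50.0, 49.5)],
    "set_C": [(25.0, 25.3), (60.0, 59.6), (130.0, 130.4)],
}
mean = sum(rmsd(p) for p in datasets.values()) / len(datasets)

# Rank data sets by RMSD, worst first, and show which heuristics fire.
for name, pairs in sorted(datasets.items(), key=lambda kv: rmsd(kv[1]), reverse=True):
    print(name, round(rmsd(pairs), 2), flag(pairs, mean))
```

A check like this can only suggest candidates. Whether a flagged set is a true misassignment still has to be confirmed against the structure and the original literature.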

So here are examples of useful comments. (I am not sure why Pachyclavulide-A is relevant – I can’t find it by name search in NMRShiftDB – but the effort is appreciated. However we are primarily looking for comments on the outliers we have identified.)

  1. Egon Willighagen Says:
    October 26th, 2007 at 1:15 am
    The first one is another misassignment. Look up the structure in the NMRShiftDB and you will see one correctly assigned and one misassigned spectrum. This kind of issue should be filed as a ‘data’ bug report at:
    http://sourceforge.net/tracker/?atid=560728&group_id=20485&func=browse
    I will do this one.
  2. Egon Willighagen Says:
    October 26th, 2007 at 1:17 am
    Filed as:
    http://sourceforge.net/tracker/index.php?func=detail&aid=1820353&group_id=20485&atid=560728
  3. Wolfgang Robien Says:
    October 26th, 2007 at 8:48 am
    Another error: Pachyclavulide-A (should be C26 instead of C27), MW=510
    Found automatically by the following procedure within CSEARCH:
    Search all unassigned methylgroups located at a ring junction. The methylgroup must be connected either with an up or down bond. As an additional condition, it can be specified if only “Q’s” are missing or if the multiplicity of missing lines can be ignored. I think a quite sophisticated check which goes into deep details of possible error sources. […]
  4. hko Says:
    October 27th, 2007 at 12:02 pm
    Misassignments NMRShiftDB (10008656-2) removed.
  5. hko Says:
    October 27th, 2007 at 5:28 pm
    Misassignments NMRShiftDB (10006416-2) removed. 45.0 and 34.4 reversed.

One Response to Open NMR: contributions from the community about outliers and assignments

  1. PMR wrote:
    So here are examples of useful comments. ( I am not sure why Pachyclavulide-A is relevant – I can’t find it by name search in NMRShiftDB – but the effort is appreciated…. )
    WR:
    http://nmrshiftdb.org -> SEARCH -> chemical name search / regular expression / ‘PACHY’
    -> 6 hits / as far as I remember 5 of them are wrong
    Why did I mention this example? It is beyond the scope of the selection criteria for your QM calculations, for several reasons:
    (1) MWT > 500
    (2) More than 20 heavy atoms
    (3) Incomplete spectrum – the reason for this incompleteness is the wrong structure itself ! 1 carbon has been ‘invented’ during data input – therefore no chemical shift is available …. found automatically by a ‘robot’ named CSEARCH – there is no way around to inspect the literature to be sure about the source of the error
    Another – maybe commercial – remark with respect to N-Methyl-piperidone:
    QM, HOSE, NN, Incr give similar results – there must be something wrong with this entry. Now there is a nice webpage on your blog showing a diagram and a short descriptive text – let's assume it takes 10 minutes to have it all online. There are 5 comments (2x EW, 1x WR, 2x HKO) – let's assume 5 minutes per comment (I am quite sure that Egon has looked it up on nmrshiftdb, which also takes some time, he has put in the link to sourceforge, then he wrote a message into the tracker, I have commented ‘off topic’ on another data error (I have checked for security reasons too), HKO has done the work and also documented it).
    Now let's add: 10 + 5×5 minutes = 35 minutes, let's say half an hour of the time of 4 highly qualified scientists; 1 hour costs maybe Euro 100.- (our salary, room rental, computer time, network traffic, etc.)
    Costs to the scientific community: Euro 50.- corresponding to approx. 70 spectra views at CHEMGATE
    or approx. 4 months accessing NMRPredict ONLINE FULL
