Nick Day’s procedure has generated the agreement – and disagreement – between observed and calculated NMR shifts. In my post “Open Notebook NMR – the good and the ugly” I highlighted one of the worst disagreements. I hesitated to say “the structure is wrong” because I am not an expert in either NMR or this group of natural products, but I would have bet on it.
Now there is general agreement that the structure is wrong:
Wolfgang Robien says (in a comment on the post above):
[… details snipped…]
(5) Application of ANY PREDICTION SOFTWARE PACKAGE should put a lot of RED and/or YELLOW markers on this entry… CSEARCH does this AUTOMATICALLY during display: an NN spectrum is calculated on the fly whenever you display an entry! (ACD does it too, according to the examples I have seen.)
Conclusion: Guys, it's all there; we only need a few drops of glue to put the pieces together.
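To make the colour-marker idea concrete, here is a minimal Python sketch. It is not CSEARCH's or ACD's actual logic: it simply flags each carbon by the absolute difference between its observed and predicted shift. The YELLOW and RED thresholds (3 and 5 ppm) and the function names are hypothetical choices for illustration.

```python
# Sketch only: thresholds are hypothetical, not taken from CSEARCH or ACD.
YELLOW_PPM = 3.0  # hypothetical "check this" threshold
RED_PPM = 5.0     # hypothetical "probably wrong" threshold


def marker(observed: float, predicted: float,
           yellow: float = YELLOW_PPM, red: float = RED_PPM) -> str:
    """Classify one carbon by |observed - predicted| in ppm."""
    deviation = abs(observed - predicted)
    if deviation >= red:
        return "RED"
    if deviation >= yellow:
        return "YELLOW"
    return "OK"


def mark_spectrum(observed, predicted):
    """Return one marker per carbon; both lists must be in the same atom order."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted shift lists differ in length")
    return [marker(o, p) for o, p in zip(observed, predicted)]


# Made-up shifts for three carbons (ppm), purely for illustration
print(mark_spectrum([170.2, 128.5, 55.0], [171.0, 124.8, 40.0]))
# ['OK', 'YELLOW', 'RED']
```

The point is that the check costs nothing once a prediction exists, which is why it can run every time an entry is displayed.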
PMR: I think we have good agreement here. The glue that is needed is between the NMR community and the publication process, ultimately to generate semantic publishing. My involvement is to try to “sell” the idea of semantic publishing and data validation to the chemical authors and to the publishers. If an author is spending 3000 USD to publish a paper, then it should not be impossible to find part of that to validate the data.
Hopefully this acts as a signal to reduce the number of wrong structures in future.
PMR: Now there is general agreement that the structure is wrong:
It’s a conclusion, based on good reasons, that this structure is wrong – you, HKO and I came to the same result independently – but “general agreement”? I hesitate to say so!
What happens when you use HOSE-code technology and this is a new compound (therefore not in the database):
Predictions are made at lower levels (not over 5 or more bonds): in my case predominantly at the 1- and 2-bond level, with 2 carbons at level 3 and 2 carbons at level 4. There is a significant difference between the predicted and the experimental values. NN gives deviations of 5–19 ppm per carbon.
What happens when you use HOSE-code technology again and this wrong compound is already in the database:
NMRShiftDB: The compound is already flagged but still used for predictions. You end up with the identical (but wrong!) spectrum (because your dataset is wrong, not because of the algorithm!); you get predictions over 6 bonds, but from only 1 reference. No warning; ranges = zero.
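The “only 1 reference, ranges = zero” situation is easy to guard against. A minimal sketch, assuming a prediction is simply the mean of the reference shifts found for one HOSE code: report the count and the range alongside the mean, and attach a warning when there are too few references for the range to mean anything. The names and the cut-off of 2 references are hypothetical; this is not NMRShiftDB's code.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class ShiftPrediction:
    mean_ppm: float
    low_ppm: float
    high_ppm: float
    n_refs: int
    warning: Optional[str]  # None when enough references were found


def summarize(reference_shifts: List[float], min_refs: int = 2) -> ShiftPrediction:
    """Summarize the reference shifts behind one predicted value (hypothetical helper)."""
    if not reference_shifts:
        raise ValueError("no reference shifts for this HOSE code")
    n = len(reference_shifts)
    warning = None
    if n < min_refs:
        # A single reference gives a zero range, which looks like certainty but is not.
        warning = f"only {n} reference(s); range is not meaningful"
    return ShiftPrediction(
        mean_ppm=mean(reference_shifts),
        low_ppm=min(reference_shifts),
        high_ppm=max(reference_shifts),
        n_refs=n,
        warning=warning,
    )


print(summarize([63.5]).warning)                 # only 1 reference(s); range is not meaningful
print(summarize([128.0, 129.0, 130.0]).warning)  # None
```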
CSEARCH & NMRPredict: You also end up with 1 reference and the same wrong values, because the prediction engine uses the identical wrong entry, BUT 9 (out of 10) carbons are highlighted as ‘suspicious’ because the deviation between NN and HOSE is fairly large (5–19 ppm) -> RED values always mean ‘Be careful, something might be wrong’.
Conclusion: We need (at least) 2 independent methods for prediction!
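The two-method check can be sketched in a few lines of Python. This assumes we already have one HOSE-code prediction and one NN prediction per carbon (how they are produced is out of scope here). The 5 ppm threshold is a hypothetical value chosen to echo the 5–19 ppm deviations quoted above, not the setting used by CSEARCH or NMRPredict.

```python
from typing import List


def suspicious_carbons(hose_ppm: List[float], nn_ppm: List[float],
                       threshold: float = 5.0) -> List[int]:
    """Indices of carbons where two independent predictions disagree by >= threshold ppm."""
    if len(hose_ppm) != len(nn_ppm):
        raise ValueError("prediction lists differ in length")
    return [i for i, (h, n) in enumerate(zip(hose_ppm, nn_ppm))
            if abs(h - n) >= threshold]


# Made-up predictions for four carbons (ppm)
hose = [30.0, 45.0, 120.0, 170.0]
nn = [31.0, 60.0, 101.0, 171.5]
print(suspicious_carbons(hose, nn))  # [1, 2]
```

If the HOSE prediction comes from a single wrong database entry, it will reproduce the wrong experimental values exactly; only the disagreement with an independent method exposes the problem.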
PMR: …. My involvement is to try to “sell” the idea of semantic publishing and data validation to the chemical authors and to the publishers. If an author is spending 3000 USD to publish a paper, then it should not be impossible to find part of that to validate the data.
You are absolutely right – sorry that I repeat my argument, but part of it is definitely new and very specific to this paper:
NMRPredict Online Full allows 25 predictions per day (background: 425,000 NMR spectra, 345,000 of them 13C, 2 methods available) for EUR 155 per year: 155 / (365 × 25) ≈ 0.017 EUR, i.e. 1.7 euro cents per prediction….
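The per-prediction arithmetic as a quick check. It assumes the full daily quota of 25 predictions is used every day of a 365-day year; with lighter use the cost per prediction is correspondingly higher. The function name is just for illustration.

```python
def cost_per_prediction_cents(annual_fee_eur: float, predictions_per_day: int,
                              days_per_year: int = 365) -> float:
    """Euro cents per prediction, assuming the daily quota is used every day."""
    return 100 * annual_fee_eur / (days_per_year * predictions_per_day)


print(f"{cost_per_prediction_cents(155, 25):.1f} euro cents per prediction")
# 1.7 euro cents per prediction
```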
Data validation:
(1) I think money can't be the limitation.
(2) The equipment used is also not cheap (NMR, LC-MS, etc.).
(3) This is the AUTHORS' obligation in the first instance.
(4) The second instance is the refereeing system – a very critical reader contribution on this topic was published in C&EN about half a year ago…
Specific to this paper: “…. The 1H and 13C NMR spectra enabled us to interprete and to determine the CONFIGURATION of the new antibiotic …..” (page 1299, lower right corner). All C signals are marked ‘S’ (singlet), but some of these carbons carry H atoms, and the numbering system (H's and C's) is inconsistent (p. 1298, upper right)…
BTW: While dealing with this problem I found a carboxyl group at 141 ppm in another article…
“NMRShiftDB: The compound is already flagged, but still used for predictions, you end up with the identical (but wrong !) spectrum (because your dataset is wrong, not because of the algorithm !), you get predictions over 6 bonds, but only 1 reference. No warning, ranges=zero.”
The NMRShiftDB generally does not do this, I believe. It requires a certain number of shifts for a HOSE-code level before it will use it. If there are too few shifts for a level, it will go one level lower. That would be less precise (larger s.d.) but more accurate (relatively lower bias).
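The fallback Egon describes can be sketched as follows. This is an illustration under stated assumptions, not NMRShiftDB's implementation: the lookup table, the placeholder strings standing in for real HOSE codes, and the minimum of 3 shifts per level are all hypothetical. The search starts at the highest (most specific) sphere and drops one level whenever there are too few reference shifts, returning the level used together with the mean, standard deviation and count.

```python
from statistics import mean, stdev
from typing import Dict, List, Optional, Tuple


def predict_shift(hose_codes: Dict[int, str],
                  shift_table: Dict[str, List[float]],
                  min_shifts: int = 3) -> Optional[Tuple[int, float, float, int]]:
    """Return (level, mean ppm, s.d. ppm, n) from the highest usable HOSE level.

    hose_codes maps sphere level -> HOSE code string for one carbon;
    shift_table maps HOSE code string -> reference shifts (both hypothetical).
    """
    if min_shifts < 2:
        raise ValueError("min_shifts must be at least 2 to estimate an s.d.")
    for level in sorted(hose_codes, reverse=True):  # most specific sphere first
        shifts = shift_table.get(hose_codes[level], [])
        if len(shifts) >= min_shifts:
            return level, mean(shifts), stdev(shifts), len(shifts)
        # too few references at this level: fall back one sphere
    return None  # no level had enough data


# Toy data: placeholder strings stand in for real HOSE codes
codes = {4: "sphere4-code", 3: "sphere3-code", 2: "sphere2-code"}
table = {
    "sphere4-code": [172.0],                 # 1 reference: skipped
    "sphere3-code": [170.0, 170.5, 171.0],   # 3 references: used
    "sphere2-code": [165.0, 168.0, 171.0, 174.0],
}
print(predict_shift(codes, table))  # (3, 170.5, 0.5, 3)
```

A lower sphere matches more reference compounds, so the s.d. tends to grow as the search falls back, which is the precision-versus-bias trade-off mentioned above.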
Egon Willighagen says: The NMRShiftDB generally does not do this, I believe.
Wolfgang Robien says:
Let’s turn your ‘believing’ into ‘knowledge’: start a web browser, go to ‘http://nmrshiftdb.org’, draw the structure and press the ‘Predict spectrum’ button….
The result can be seen on: http://nmrpredict.orc.univie.ac.at/csearchlite/Believing_and_Knowledge.html