I am very grateful to hko and Wolfgang Robien for their continued analysis of the results of Nick Day’s automated calculation of NMR chemical shifts, using the GIAO approach (parameterized by Henry Rzepa). The discussion has shown that some structures are “wrong” and rather more are misassigned.
Wolfgang Robien Says:
November 11th, 2007 at 10:01 am
we need ‘CORRECT’ data – many assignments of the early 70’s are absolutely correct and useful for comparison […]
As a consequence of your QM-calculations 10 assignment corrections and 1 structure revision within a few hundred compounds have been performed by ‘hko’ (see postings above) – this corresponds to an error rate of approx. 5% ! [PMR: In the data set we extracted from NMRShiftDB]. [… discussion of how such errors are detected snipped…]
PMR: Part of the exercise that Nick Day has undertaken was to give an objective analysis of the errors in the GIAO method. The intention was to select a data set objectively. It is extremely difficult to select a representative data set by any means, since every collection is made with some purpose in mind. We assumed that NMRShiftDB was “roughly representative” of 13C NMR (and so far this hasn’t been an issue). It could be argued that it may not have many organometallics, minerals, proteins, etc. and I suspect that our discourse is mainly about “small organic molecules”. But I don’t know. It may certainly not be representative of the scope of GIAO or HOSE codes. Again I don’t know. Having made the choice of data set, the algorithm for selecting the test data was objective and Nick has stated it (< 20 heavy atoms, no elements heavier than Cl except Br, no adjacent acyclic bonds). There may have been odd errors in implementing this (we got 2-3 compounds with adjacent acyclic bonds) but it was largely correct, and it could be re-run to remove these. We stress again that we did not know how many structures we would get and whether they would behave well in the GIAO method. In fact over 25% failed to complete the calculation. (We are continuing to find this: the atom count is not a perfect indication of how long a calculation will take, which can vary by nearly a factor of 10.)
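To make the selection rule concrete, here is a minimal sketch of such a filter. It is not Nick's code. The molecule representation (a list of heavy-atom element symbols plus a list of index pairs for heavy-atom bonds), the function names, and the small element table are all assumptions made for illustration. In particular, "adjacent acyclic bonds" is read here as "two non-ring heavy-atom bonds sharing an atom", which is only one possible interpretation.

```python
from collections import deque

# Illustrative subset of atomic numbers (assumption: extend as needed).
ATOMIC_NUMBER = {"B": 5, "C": 6, "N": 7, "O": 8, "F": 9, "Si": 14,
                 "P": 15, "S": 16, "Cl": 17, "Fe": 26, "Br": 35, "I": 53}

def element_allowed(symbol):
    """Allow elements up to Cl (Z <= 17), plus Br as the stated exception."""
    z = ATOMIC_NUMBER.get(symbol)
    return z is not None and (z <= 17 or symbol == "Br")

def bond_in_ring(n_atoms, bonds, bond):
    """A bond is cyclic if its two atoms stay connected once it is removed."""
    a, b = bond
    neighbours = {i: set() for i in range(n_atoms)}
    for i, j in bonds:
        if {i, j} != {a, b}:
            neighbours[i].add(j)
            neighbours[j].add(i)
    seen, queue = {a}, deque([a])
    while queue:  # breadth-first search from a, looking for b
        current = queue.popleft()
        if current == b:
            return True
        for nxt in neighbours[current] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return False

def has_adjacent_acyclic_bonds(n_atoms, bonds):
    """True if any atom takes part in two or more non-ring bonds."""
    count = {}
    for bond in bonds:
        if not bond_in_ring(n_atoms, bonds, bond):
            for atom in bond:
                count[atom] = count.get(atom, 0) + 1
    return any(c >= 2 for c in count.values())

def passes_selection(symbols, bonds):
    """Apply the three stated criteria to a heavy-atom graph."""
    return (len(symbols) < 20
            and all(element_allowed(s) for s in symbols)
            and not has_adjacent_acyclic_bonds(len(symbols), bonds))

# Hand-built heavy-atom graphs for illustration (hydrogens omitted).
RING = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
toluene = (["C"] * 7, RING + [(0, 6)])
ethylbenzene = (["C"] * 8, RING + [(0, 6), (6, 7)])
bromobenzene = (["C"] * 6 + ["Br"], RING + [(0, 6)])
iodobenzene = (["C"] * 6 + ["I"], RING + [(0, 6)])
```

Under this reading toluene and bromobenzene pass, ethylbenzene is rejected for its two adjacent acyclic bonds, and iodobenzene is rejected on the element rule. A real implementation would take its connectivity and ring perception from a cheminformatics toolkit and not from hand-built graphs.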
We would not claim that the remaining ca. 250 compounds were "representative". There are no organometallics, no electron-deficient compounds, no overcrowded compounds, no major ring currents, etc. (all of which are areas where we might expect GIAO to do better than some empirical methods). In fact the compounds are generally ones that we would expect connection-table-based methods to score well on, as there are few unusual groups (so the methods are well trained) and no examples where the connection table cannot describe the molecule well (e.g. Li4Me4, Fe(Cp)2, etc.).
Our current conclusion is that the variance in the experimental data is sufficiently large (even after removal of misassignments) to hide errors in the GIAO method. The method appears to give good agreement, with an RMS of ca. 2 ppm (but again we stress that the data set is not necessarily representative). If the Br/Cl correction had not been anticipated it would have been clearly visible and the exercise would have revealed it as a new effect. It is certainly possible that there are other undetected effects (especially for unusual chemistry). But for common compounds I think we can claim that the GIAO method is a useful prediction tool. It should be particularly useful where connection tables break down and here are some systems I'd like to see it exposed to:
- Fe(Cp)2, although Fe is difficult to calculate well.
- p-cyclophane (C1c(cc2)ccc2CCc(cc3)ccc3C1)
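For readers who want to reproduce the kind of RMS figure quoted above, a minimal sketch of the calculation follows. The function name and the shift values are invented for illustration and are not taken from NMRShiftDB or from Nick's results. The sketch assumes the calculated and observed shifts are already paired carbon by carbon, which is exactly where a misassignment would show up as a large deviation.

```python
import math

def rms_deviation(calculated, observed):
    """Root-mean-square deviation between paired shift lists, in ppm."""
    if len(calculated) != len(observed) or not calculated:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(c - o) ** 2 for c, o in zip(calculated, observed)]
    return math.sqrt(sum(squared) / len(squared))

# Invented 13C shifts (ppm), one calculated/observed pair per carbon.
calculated_shifts = [128.0, 21.0]
observed_shifts = [126.0, 23.0]

rms = rms_deviation(calculated_shifts, observed_shifts)  # 2.0 for these values
```

Because the deviations are squared, a single badly misassigned peak can dominate the RMS, so it is worth inspecting the largest per-carbon deviations as well as the summary figure.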
PMR: So what I would like is a representative test data set that could be used for the GIAO method. The necessary criteria are:
- It is agreed what the chemical scope is. I think we would all exclude minerals, probably all solid state, proteins, macromolecules (there are other communities which do that). But I think we should include a wide chemical range if possible.
- The data set is prepared by one or more NMR-expert groups that have no particular interest in promoting one method over another. That rules out Henry, Wolfgang, ACDLabs, and probably NMRShiftDB.
- The data set should provide experimental chemical shifts and the experts should have agreed the assignments by whatever methods are currently appropriate – these could include a group opinion. The assignments should NOT have been based on any of the potential competitive methodologies.
For a competition there would be stronger requirements – it is essential it is seen to be fair as reputation and commercial success might hang on the result.
So I make my request again. Please can anyone give me some data that we can use in an Open experiment to test (and if necessary validate/invalidate) the GIAO method? At this stage we’d be happy to take material from anyone’s collections, but it would have to be Open so that other groups have the chance to comment.
I hope someone can volunteer. If not we may have to resort to (machine) extraction of data from the current literature. Our experience with crystallography suggests that the reporting and quality of analytical data in general have improved over the last 10 years.