Open Notebook NMR – cont

We are now close to releasing the first results of the calculations – at present 300+ molecules. We think that the really major foul-ups are behind us (e.g. when all the Gaussian files failed to run because of a missing blank line). So we think it’s worth the community listening in.
To set the scene. This is part of Nick Day’s thesis work and Nick will be first author of anything to come out in the immediate future. Henry Rzepa provided much of the motivation and also the algorithms that we are now using. He first gave us an extension of the GIAO and Rychnovsky methods and then elaborated this in a further protocol, which is what we are now using. This protocol is based on his own work at Imperial, where he has computed a number of structures and gradually refined the methods. So this is his current best guess as to what we need, although there are some refinements for halogens that need to be added after the calculation.
Christoph has provided the NMR data from NMRShiftDB. Of course it comes from various sources but we shall rely on his judgement as to whether a structure is likely to be “wrong”. This is a difficult one – we cannot simply remove a structure because it doesn’t fit but he may be able to assert that there is a known problem. We may also have generic filters like the laboratory it came from.
These are the expected initial authors and we’ll see how things go. Christoph and Henry and Nick and I will have a few days to inspect the data before releasing it all. This should remove any really obvious “data errors” and also allow us to plan any further refinements. For example Henry has looked at the really glaring outlier and suggested a protocol change though we don’t think it will account for all the deviation.
People’s contributions will necessarily be recorded and so it will be clear what has been done. In the first instance I think we shall use the NMRShiftDB data and the Imperial protocol to give us an idea of the tractability of the method.
We absolutely welcome any input. We’ll be fairly focussed on a thesis-like approach for the next month or so, but may branch out. Here are some highly valuable suggestions:

  1. ChemSpiderMan Says:
    October 22nd, 2007 at 2:49 pm
    I’ve seen the discrepancies Jean-Claude is talking about many times. However, a difference of 0.2ppm in C-13 is pretty much irrelevant. Admittedly [… discussed elsewhere – PMR …]
  2. Peter – FYI ACD/Labs are ready to participate in the work as discussed: and Tony, I think this is a fantastic project and am very keen to see how accurate the QM techniques prove to be for the subset of structures that you choose from the NMRShiftDB, and then how helpful they can be in improving the accuracy of experimental shifts in this wonderful resource. For the purposes of this work, we would be willing to provide the chemical shift predictions from the ACD/Labs software if you would like to use them in your comparison. If, for instance, they prove to be accurate enough to find many of these problems without the need for time-consuming QM calculations, it may be preferable to use the faster calculation algorithms that are available in our software. It may turn out that the ACD/Labs predictions could serve as a pre-filter to define which structures need the QM calculations and which don’t. Many variations on this theme come to mind, but we won’t know which are useful until we do the work.
    Brent Lefebvre
    NMR Product Manager
    Advanced Chemistry Development, Inc.
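
The pre-filter idea in the comment above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the 3.0 ppm threshold and the toy shift values are all hypothetical, and nothing here reflects the ACD/Labs software or the protocol Nick is running. It assumes experimental and fast-predicted shifts are already paired atom by atom.

```python
from typing import Dict, List, Tuple

# (experimental shifts, fast-predicted shifts), paired atom by atom, in ppm
ShiftPair = Tuple[List[float], List[float]]


def max_abs_deviation(experimental: List[float], predicted: List[float]) -> float:
    """Largest absolute difference between paired shifts, in ppm."""
    if not experimental or len(experimental) != len(predicted):
        raise ValueError("shift lists must be non-empty and of equal length")
    return max(abs(e - p) for e, p in zip(experimental, predicted))


def prefilter(entries: Dict[str, ShiftPair], threshold_ppm: float = 3.0) -> List[str]:
    """Return IDs whose worst deviation exceeds the threshold.

    The threshold is an arbitrary placeholder, not a recommended value.
    Flagged entries would be candidates for the slower QM calculation.
    """
    return sorted(
        entry_id
        for entry_id, (exp, pred) in entries.items()
        if max_abs_deviation(exp, pred) > threshold_ppm
    )


# Invented toy data, not taken from NMRShiftDB.
toy_entries = {
    "entry_a": ([10.0, 20.0], [10.1, 20.2]),
    "entry_b": ([10.0, 50.0], [10.5, 58.0]),
}
flagged = prefilter(toy_entries)
print(flagged)
```

Whether a single worst-atom threshold is the right criterion is exactly the kind of thing we won’t know until the work is done; a mean or RMS deviation would be an equally plausible variation.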

PMR Many thanks. I think this will be extremely useful in the next phase of the program (which could be quite soon). At present Nick needs to concentrate on the Gaussian stuff, as it is fairly easy to initiate a new protocol and re-run the jobs in perhaps 2 days. The results of this will then give us an idea of where the main problems lie. If, for example, we find 5% of structures are misassigned, that is ca. 15. Not too difficult to do by hand. But if we then scale this to 20,000 in NMRShiftDB then it’s 1000 entries and we have to automate or fan out the social computing. If, however, the data error rate is 0.5% then 100 problems in NMRShiftDB is a long wet afternoon for the dedicated few.
Data quality is critical. Joe Townsend went round this loop several times before coming up with a usable protocol for filtering problems. It’s harder for NMR, but there are some tricks we may be able to play to weed out the worst.
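
The scaling arithmetic in the reply above can be checked with a few lines of Python. This is just a back-of-envelope sketch: the function name is made up for illustration, 300 stands in for the “300+ molecules” in the current set, and 20,000 is the approximate NMRShiftDB size quoted above.

```python
def expected_problem_entries(n_entries: int, error_rate: float) -> int:
    """Expected number of problem entries, rounded to the nearest integer."""
    return round(n_entries * error_rate)


PILOT_SET = 300        # roughly the current set of calculated molecules
FULL_DATABASE = 20_000 # approximate NMRShiftDB size quoted in the post

# 5% error rate: manageable by hand in the pilot, not in the full database
print(expected_problem_entries(PILOT_SET, 0.05))
print(expected_problem_entries(FULL_DATABASE, 0.05))

# 0.5% error rate across the full database
print(expected_problem_entries(FULL_DATABASE, 0.005))
```

The point of the comparison is that the same tenfold change in error rate decides whether the full database needs automation or just a few patient people.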


One Response to Open Notebook NMR – cont

  1. Peter, all that is needed to perform the calculations for comparison using the ACD/Labs NMR predictors is a download of the exact dataset Christoph provided to you (we have already had issues with comparing algorithm to algorithm but using different versions of the NMRShiftDB database…not good). Also, if Nick can send us the ID of the structure inside the NMRShiftDB this should be enough. Thanks
