Open NMR – again. Why we do it

 Chemspider (who has been doing some useful things recently with making data available and on which I shall comment separately) criticizes our work on NMR prediction by GIAO methods, and says he doesn’t “get it”. So I will continue to try to explain.

A Publication Comparing NMR Prediction Approaches


Those of you frequenting this blog might have read my highly opinionated views of what was originally entitled “Open Notebook Science NMR” (1,2). My views around that work were very strong…in fact I didn’t really “get it”. I didn’t get why GIAO approaches for NMR prediction (with all of the stated limitations) would be done to prove that you could validate NMR assignments by comparing predictions with assignments made experimentally. It’s known that NMR prediction can validate structures – it’s done on a daily basis in commercial software tools. I was involved in building tools like that for over a decade so what was to prove?

PMR: This is a scientific experiment to see whether Quantum Mechanical methods can predict NMR shifts. The emphasis is on Quantum Mechanics. Does Quantum Mechanics agree with experimental data? At heart it’s as simple as that. Many results of quantum mechanics are not observable – the wavefunction, for example. A few things are observable: the geometry, aspects of the energy, and the interaction of the wavefunction with the nucleus.
The original GIAO method had deviations from experiment. Were these deviations due to experiment or to problems with theory? Henry Rzepa thought the methodology needed improvement, so he created new basis sets. He also calibrated the effect due to spin-orbit coupling. Our research has confirmed that the spin-orbit coupling effect exists and is of a reproducible magnitude. That wasn’t clear before.
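As a rough illustration of this kind of comparison, here is a minimal Python sketch, not the code used in the project. It adds a fixed correction for each heavy atom attached to a carbon and then measures the root-mean-square deviation between calculated and observed 13C shifts. The correction values are the basis-set-dependent figures quoted from an earlier post in the comments to this entry; the function names and the example carbons are invented for illustration, and simple additivity per attached atom is an assumption.

```python
import math

# Corrections (ppm) per attached heavy atom, as quoted from an earlier post.
# They are basis-set dependent, so treat them as illustrative only.
SPIN_ORBIT_CORRECTION_PPM = {"Cl": -3.0, "S": -2.0, "Br": -12.0, "I": -28.0}


def corrected_shift(calculated_ppm, attached_heavy_atoms):
    """Add one correction per attached heavy atom to a calculated 13C shift."""
    return calculated_ppm + sum(
        SPIN_ORBIT_CORRECTION_PPM.get(element, 0.0)
        for element in attached_heavy_atoms
    )


def rms_deviation(calculated, observed):
    """Root-mean-square deviation (ppm) between two paired lists of shifts."""
    if len(calculated) != len(observed) or not calculated:
        raise ValueError("need two non-empty lists of equal length")
    squared = [(c - o) ** 2 for c, o in zip(calculated, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical carbons: (calculated ppm, attached heavy atoms, observed ppm).
# These numbers are made up to show the mechanics; they are not project data.
carbons = [
    (40.0, ["Br"], 28.5),
    (50.0, ["Cl"], 46.8),
    (25.0, [], 25.3),
]
observed = [obs for _, _, obs in carbons]
raw = rms_deviation([calc for calc, _, _ in carbons], observed)
fixed = rms_deviation([corrected_shift(calc, heavy) for calc, heavy, _ in carbons], observed)
print(f"RMS before correction: {raw:.2f} ppm, after: {fixed:.2f} ppm")
```

If the correction is real and reproducible, the RMS deviation over a large, clean data set should fall once it is applied; what remains is split between experimental error and deficiencies in the theory.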
As we continue to get better data we may or may not discover new effects. If so they may be discoveries in physics – who knows?
Physical science works in large part by comparing theories with experiment. This is the basis of PhysML, about which I may write later. The Open NMR experiment is about comparing theory and experiment. It is NOT about predicting as many structures as rapidly as possible by empirical means. It is about the fundamental ability to predict the properties of matter through quantum mechanics.
We have done exactly the same for molecular geometry. It could be argued that rather than calculating the geometry of a crystal we should simply make it and measure it. We have, for example, shown that many current QM programs are not capable of calculating crystal structures well. That’s a deficiency of the theories and the programs. By highlighting the differences we help to develop better methodology in fundamental theory. That is an unarguable approach in science.
From the practical point of view there are huge numbers of molecules that cannot be well predicted by the empirical NN or HOSE methods. Transition metal compounds. Anything that cannot be represented by a connection table (no-one has responded to my request as to how NN or HOSE would calculate molecules such as Li4Me4). The view of chemistry seen through connection tables is necessarily limited. The view through QM is not.
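To make the dependence on connection tables concrete, here is a heavily simplified Python sketch of a HOSE-style lookup. It is an assumption-laden toy, not any real HOSE implementation: it encodes only the first sphere of neighbours, ignores bond orders, rings and stereochemistry, and all function names are invented. The 96.0 ppm value for CCl4 is an approximate, illustrative figure.

```python
from collections import defaultdict
from statistics import mean


def one_sphere_code(atoms, bonds, focus):
    """Build a crude one-sphere environment code for the atom at index `focus`.

    atoms: list of element symbols; bonds: list of (i, j) index pairs.
    """
    neighbours = sorted(
        atoms[j if i == focus else i] for i, j in bonds if focus in (i, j)
    )
    return f"{atoms[focus]}-{','.join(neighbours)}"


def build_table(assigned_molecules):
    """Map each environment code to the shifts recorded for it."""
    table = defaultdict(list)
    for atoms, bonds, shifts in assigned_molecules:
        for index, shift in shifts.items():
            table[one_sphere_code(atoms, bonds, index)].append(shift)
    return table


def predict(table, atoms, bonds, focus):
    """Return the mean stored shift for the focus atom's code, or None if unseen."""
    shifts = table.get(one_sphere_code(atoms, bonds, focus))
    return mean(shifts) if shifts else None


# CCl4 as a connection table: atom 0 is the carbon, atoms 1-4 are chlorines.
ccl4_atoms = ["C", "Cl", "Cl", "Cl", "Cl"]
ccl4_bonds = [(0, 1), (0, 2), (0, 3), (0, 4)]
table = build_table([(ccl4_atoms, ccl4_bonds, {0: 96.0})])

print(one_sphere_code(ccl4_atoms, ccl4_bonds, 0))  # C-Cl,Cl,Cl,Cl
print(predict(table, ccl4_atoms, ccl4_bonds, 0))   # found in the table
print(predict(table, ["C", "Li", "Li", "Li"], [(0, 1), (0, 2), (0, 3)], 0))  # None
```

The lookup can only return what has already been measured for an environment it can encode. An environment that is absent from the table, or one that a simple list of two-centre bonds does not describe well, gives no prediction at all.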
There are also many chemical effects that can be investigated through QM. It is possible that there are clear and systematic effects due to solvation (e.g. on C=O groups). QM may be able to model these atomistically (i.e. with explicit solvent). NN cannot do this. And there are many more aspects of chemistry where NMR shifts give us a window on reality through QM calculations.
But first we have to get some believable Open Data to work with. Then we shall start to create new science.


6 Responses to Open NMR – again. Why we do it

  1. —snip—
    From the practical point of view there are huge numbers of molecules that cannot be well predicted by the empirical NN or HOSE methods. Transition metal compounds. Anything that cannot be represented by a connection table (no-one has responded to my request as to how NN or HOSE would calculate molecules such as Li4Me4).
    —snip—
    Sorry to say so, but I have the impression you have never understood the basic principle of HOSE-code technology. The basic principle is that you ‘linearize’ your molecule and build up a table holding these linear structure codes and the associated shift values. The prediction works the same way, but we look up our linear codes and extract the associated shift values. Assume CCl4 – the linear code consists of one focus atom (the central C) having 4 chlorines attached -> might look like ‘C-Cl,Cl,Cl,Cl’. My question is now: in which other molecules do you find this code? You find it in ‘CCl4’ exclusively! It has been known for about 30 years that HOSE codes work well for medium-sized to large molecules and don’t work well for small molecules (where you describe the whole molecule when going over 1 (or 2) spheres). This has been basic scientific knowledge for more than 30 years. For me it’s no surprise that nobody commented on your post …. many people understand HOSE-code technology. On the other hand, this behaviour of HOSE codes shows the necessity for other prediction techniques (like NN, Increments, PLS and QM). All these techniques have their application range – it would be a ‘clever’ idea to create a ranked hitlist from an isomer generator, when you have e.g. 1,000 structure proposals, using QM ….. a basic ability within science is to select the best method to solve a given problem. For predicting shifts of Li4Me4 QM might be best; for ranking 1,000 structure proposals I would prefer something else!
    —snip—
    QM may be able to model these atomistically (i.e. with explicit solvent). NN cannot do this. And there are many more aspects of chemistry where NMR shifts gives us a window on reality through QM calculations.
    —snip—-
    I recommend reading about NN and solvent effects on my webpage http://nmrpredict.orc.univie.ac.at/csearchlite/Solvent_dependent_predictions.html – this webpage has been online since June 5th, 2007.
    I have the impression that your posts are highly self-centered, combined with a very selective recognition of what is going on at other institutions/companies/universities – a few weeks ago I named it the ‘NIH’ syndrome, coming from ‘Not Invented Here’ …….
    ———-snip———–
    Chemspider (who has been doing some useful things recently with making data available and on which I shall comment separately) criticizes our work on NMR prediction by GIAO methods, and says he doesn’t “get it”. So I will continue to try to explain.
    ———snip————-
    Sorry, I didn’t get it either!
    ——snip—– from your post ‘835’ http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=835
    The NMREye work suggest that it should be possible to validate the actual 13C NMR values reported in a scientific experiment.
    ——snip-end—-
    When I read such a sentence, ignoring 30 years of work by many excellent people, I can only cite William Shakespeare: ‘Much Ado about Nothing’ ……

  2. Peter…you comment
    “PMR: This is a scientific experiment to see if Quantum mechanical methods can predict NMR shifts. The emphasis is on Quantum Mechanics. Does Quantum Mechanics agree with experimental data?”
    That’s known, I thought. There are tens of papers out there already where this has been shown. You’ve commented on this re. Hexacyclinol yourself, a famous example of GIAO application. So, it’s at best a repeat experiment, one conducted many times.
    The inclusion of spin-orbit coupling in GIAO NMR predictions was discussed a decade ago:
    http://www3.interscience.wiley.com/cgi-bin/abstract/5008678/ABSTRACT?CRETRY=1&SRETRY=0
    Note the abstract re. normal halogen dependence
    “Spin-orbit coupling is responsible for many heavy-atom effects on NMR chemical shifts, for example, normal halogen dependence. A simple but general model for spin-orbit-induced substituent effects has now been developed by analogy to the Fermi contact spin-spin coupling mechanism (see below). DFT calculations on some simple iodo compounds illustrate the scope and validity of the model.”
    The upfield shifts you talked about here http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=732 regarding the halogens are well-known effects. You commented “the effects can be calculated, and are somewhat basis set dependent. For our basis, Br should be corrected by -12 ppm (and approx -24 for two) and Cl by -3 ppm. S is probably -2ppm, and Iodine -28 ppm. That should probably suffice for the halogens.”
    I’ve done a lot of heavy metal NMR over the years. One chapter of my PhD focused on pressure- and temperature-dependent shifts of Co-59 (http://www3.interscience.wiley.com/cgi-bin/abstract/88511686/ABSTRACT). During that work I read a lot about the theory of chemical shifts and specifically the work of Cynthia Jameson. At the same time I was examining the temperature dependence of C-13 shifts in octyl halides …the temperature-based shift dependence of the carbon alpha to the halogen was significant for all halogens, increasing from F through to I. I recalled reading about the normal halogen dependence and its connection to spin-orbit coupling at that time.
    A quick search on google on “spin orbit normal halogen dependence” gives as a top hit:
    http://pubs.acs.org/hotartcl/cenear/980928/nmr.html
    This contains the following nugget:
    “…carbon-13 in iodomethanes, were shifted far upfield, way out of character for what was expected. This is called the normal halogen dependence.
    Eventually, in 1969, Japanese theorists determined the effect was due to spin-orbit coupling… ”
    Looks like the effect you’ve noticed was first explained about 40 years ago.
    Ain’t the web great?
    With regard to GIAO vs HOSE, Neural Net and Increment based methods, we were hoping to get a good data set for comparison out of your work. What we’ve done in the meantime, until the entire dataset is online for us to compare, is to look in the literature for GIAO NMR calculations. We’ve got over 30 publications out of the literature to date and have done the comparable predictions. I’ll blog about it soon…it’s consistent with expectations.

  3. An additional comment…I judge there are maybe only three trained NMR spectroscopists reading your blog…Wolfgang Robien, Christoph Steinbeck and myself. I might be wrong and others can speak up if my judgment is incorrect.
    That said, you might want to get someone to post questions about NMR phenomena to groups such as AMMRL (http://chemnmr.colorado.edu/ammrl/). You’re not a manager of an NMR lab but someone could post questions for you. It’ll be a lot more NMR intellect than you likely have visiting your blog. Best wishes.

  4. Wolfgang…I read your comments with interest.
    For sure there should be intellect in the project around HOSE codes. Christoph Steinbeck from NMRShiftDB visited Cambridge recently (see earlier posts on this blog site) and Christoph definitely understands NMR prediction via HOSE (http://nmrshiftdb.ice.mpg.de/portal/js_pane/P-Help;jsessionid=A784D0C356B875934D7D1EE7B6B0EBB0.tomcat2?URL=using.html#predict) and (http://nmrshiftdb.sourceforge.net/api/org/openscience/nmrshiftdb/PredictionTool.html) as well as neural nets, I’m sure. He might get involved with this a little deeper since he is now moving to Cambridge and will be working near Peter’s team.
    I think that the continued focus on this work could possibly be to validate the approach for the SPECTRA project (slide 12/17 on http://www.jisc.ac.uk/media/documents/events/2007/06/alan_tonge.pdf shows the relationship between the predicted and measured spectra). Peter has talked about SPECTRA a number of times on this blog and both Henry and Peter are members of the SPECTRA team. The original project is outlined in the document here http://www.lib.cam.ac.uk/spectra/documents/SPECTRa_edited_project_proposal.pdf.
    I think it’s an ambitious project with appropriate expectations and deliverables.
    If the involvement of the NMR prediction component that I am assuming is real, then my only contention is that forcing GIAO predictions, with all of the associated limitations, is not the best path forward in terms of throughput and performance for validation of NMR assignments. You and I both know that! I would hope for some consideration of the other well-proven approaches. And, as you say, there have been decades of work at other institutions and by specialists in this domain. I say consult with them. It IS what I offered right at the beginning of the Open Notebook Science project and was quite firmly turned down. I think it’s better for all if we function as a community.

  5. pm286 says:

    (1,2,3,4) A lot of this is repetitive and I shan’t reply in detail to material we have covered several times before. I also hope this is the last discussion at this level. I hope that our detractors are trying to understand – if so, here is a clear statement.
    1. The project was a limited 1-2 month project as part of a PhD thesis. It’s finished. The purpose of the PhD thesis (which is forgotten by the detractors) is to see what observables can be gathered from the public literature and computed by QM methods. Much of the work involves data aggregation and cleaning techniques on crystallographic data and more recently NMR. We have, for example, explored whether MOPAC can reproduce crystal structures. In some cases it can, in others it can’t. That’s research in QM and informatics.
    2. The QM aspect was a fundamental part of the thesis. It was not negotiable. However desirable or not, HOSE codes were never part of the scope – it would have been irresponsible to change the direction of the thesis, however attractive they may or may not have been. The projects, both crystallographic and NMR, explore the distribution of errors in both the observed and calculated quantities and devise informatics methods for separating them. That’s research in informatics.
    3. The infrastructure required the development of new informatics tools for metadata, provenance, versioning, etc. and the development of XML and RDF systems. Incidentally it confirmed that the design of CMLSpect was robust and could be used for this exercise (which would have been largely impossible without it). That’s research into informatics.
    4. I have never claimed that this was world-shattering science in the NMR area. It’s a small incremental advance. The GIAO method 5 years ago reported errors of over 5 ppm. The Rychnovsky work (and we have been in close correspondence with Scott) improved that, but Henry thought we could do better. So he introduced another, more expensive, basis function and we have shown that it gave a small but useful reduction in the RMS. That’s entirely appropriate for 2 months’ work in a PhD. Science often advances by a number of small advances. This is one. It tells us just a little bit more about what basis functions are required to model certain elements. (We’ve done the same for crystallography. It’s part of a larger picture). That’s incremental research in QM.
    5. You may not believe me but I worked with HOSE codes nearly 20 years ago and have written code. I understand it. It simply wasn’t in scope. There is a fundamental difference between machine-learning approaches – which are interpolative – and QM methods, which have a degree of absolute predictivity. In general machine learning gives useful tools for prediction within a known domain, but is not normally scalable outside that. (Machine-learning is well-represented in the Centre here. I understand it).
    6. Finally, and slightly sadly, this was offered as a collaborative project. We got some of the “Open Notebook” stuff wrong and we’ve put our hand up. We’d hoped for some contributions from the community and we’ve had this from a few people who have helped to confirm misassignments and other errors. (Part of the informatics research was to discover whether data errors could be discovered and this has been largely successful). But we had hoped that someone would actually make a small number of spectra available on which different methods could be tested. It would have been a useful community-building exercise. As it is we shall be likely to go down the SPECTRa route.
    So to sum up: a two-month project, part of a PhD in QM calculations and molecular informatics, which has shown unimpeachable methodology and has essentially been validated by the community. It may not be novel in the NMR field, but then it was never intended to be.
    I can’t put it any more clearly and unless there are new arguments I shan’t reply in detail. I shall, of course, post all comments – there is no censorship on the blog.

  6. Pingback: Unilever Centre for Molecular Informatics, Cambridge - petermr’s blog » Blog Archive » What is data deposition?
