I’m interested in the nmrdb database and toolkit for NMR spectra. I don’t know how long this has been going, but I have only known about it for a day. I used it to predict a spectrum (NMR Prediction through nmrdb.org), and now I’d like to know how it did it. I guessed it was based in some way on geometrical models because protons which were topologically equivalent had different chemical shifts and couplings. So I went to the home page and found:
This page allows to predict the spectrum from the chemical structure based on “Spinus”. You may find more information on the authors website.
- In J. Aires-de-Sousa, M. Hemmer, J. Gasteiger, “Prediction of 1H NMR Chemical Shifts Using Neural Networks”, Analytical Chemistry, 2002, 74(1), 80-90, most of the proton descriptors are explained. In that work they were used for the prediction of 1H NMR chemical shifts by counterpropagation neural networks.
- In Y. Binev, J. Aires-de-Sousa, “Structure-Based Predictions of 1H NMR Chemical Shifts Using Feed-Forward Neural Networks”, J. Chem. Inf. Comp. Sci., 2004, 44(3), 940-945, the development of the FFNNs and the selection of descriptors is explained.
- In Y. Binev, M. Corvo, J. Aires-de-Sousa, “The Impact of Available Experimental Data on the Prediction of 1H NMR Chemical Shifts by Neural Networks”, J. Chem. Inf. Comp. Sci., 2004, 44(3), 946-949, the use of an additional memory is described.
PMR: I missed the link to the authors’ webpage (the font is small) and thought I would start by looking at the literature references. There should be enough in the abstracts to give me a general idea. [To avoid reproducing the whole abstract – which might go beyond fair use – I have removed the past tense of the verb “to be”.] The first abstract read:
Copyright © 2001 American Chemical Society
[… authors snipped…]
Counterpropagation neural networks […] applied to the fast prediction of 1H NMR chemical shifts of CHn groups in organic compounds. The training set consisted of 744 examples of protons that […] represented by physicochemical, topological, and geometric descriptors. The selection of descriptors […] performed by genetic algorithms, and the models obtained were compared to those containing all the descriptors. The best models yielded very good predictions for an independent prediction set of 259 cases (mean absolute error for whole set, 0.25 ppm; mean absolute error for 90% of cases, 0.19 ppm) and for application cases consisting of four natural products recently described. Some stereochemical effects could be correctly predicted. A useful feature of the system resides in its ability to be retrained with a specific data set of compounds if improved predictions for related structures are required.
PMR: This gives some, but not really enough, information about the method. In particular, were 3D coordinates of the molecule generated, and if so, how? Were shifts averaged across topologically equivalent protons? So I went to the second abstract (again copyright ACS, so the full abstract is not given):
Feed-forward neural networks […] trained for the general prediction of 1H NMR chemical shifts of CHn protons in organic compounds in CDCl3. The training set consisted of 744 1H NMR chemical shifts from 120 molecular structures. The method […] optimized in terms of selected proton descriptors (selection of variables), the number of hidden neurons, and integration of different networks in ensembles. Predictions […] obtained for an independent test set of 952 cases with a mean average error of 0.29 ppm (0.20 ppm for 90% of the cases). The results […] significantly better than those obtained with counterpropagation neural networks.
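Both abstracts quote two figures: an error over the whole test set and an error "for 90% of cases". The abstracts do not say how the second figure is computed. A minimal sketch, assuming it means the mean absolute error over the 90% of cases with the smallest errors (my reading, not something the abstracts confirm), might look like this. The function names and the shift values are invented for illustration.

```python
def mean_absolute_error(predicted, observed):
    """Mean of |predicted - observed| over all cases, in ppm."""
    errors = [abs(p - o) for p, o in zip(predicted, observed)]
    if not errors:
        raise ValueError("need at least one case")
    return sum(errors) / len(errors)


def mae_best_fraction(predicted, observed, fraction=0.9):
    """MAE over the best-predicted `fraction` of cases.

    Assumption: "for 90% of cases" means the worst 10% of
    predictions are dropped before averaging.
    """
    errors = sorted(abs(p - o) for p, o in zip(predicted, observed))
    if not errors:
        raise ValueError("need at least one case")
    keep = max(1, int(len(errors) * fraction))
    return sum(errors[:keep]) / keep


# Invented shifts (ppm): nine good predictions and one outlier.
observed = [0.9, 1.2, 1.5, 2.1, 2.3, 3.4, 3.7, 4.1, 5.3, 7.2]
predicted = [1.0, 1.3, 1.6, 2.2, 2.4, 3.5, 3.8, 4.2, 5.4, 9.2]

print(mean_absolute_error(predicted, observed))
print(mae_best_fraction(predicted, observed))
```

On this reading a single badly predicted proton can dominate the whole-set figure while leaving the 90% figure almost untouched, which is presumably why both are reported.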
PMR: Still nowhere near enough information. Now I’m at home (it’s a public holiday in the UK) and I’m watching the cricket (which is absorbing). Although I could get a login to the Cambridge library I choose not to, as it gives me the position of a second-class citizen impoverished through closed access. So maybe the authors have bought ACS “Author Choice”. Unfortunately not. I will have to pay 2 * 25 USD for access to these articles. And the access only lasts for 48 hours. These are the sort of papers that can’t easily be fully digested in 2 days. I’m not sure what happens if I keep copies on my hard disk – I expect that it bursts into flame like Mission Impossible and I daren’t take the risk. BTW I can see no possible point in restricting access to 48 hours – and it’s yet another indication of the publisher treating the scientific community solely as a source of revenue. Maybe the ACS staff who read this article will enlighten us.
The point is that this type of procedure – however necessary or not to the survival of the publisher – causes great problems to the community. So the price of preserving a reader-pays publishing economy is to slow down science, encourage many scientists to avoid reading the literature and generally to reduce the coherence of the scientific process. I, for example, am unlikely ever to read these papers now.
[NMRDB note. The link I missed explained that the prediction was based on CORINA structures. I am still unclear generally about nmrdb.org – does it have spectra? how many? are they freely available? etc. The web site says very little.]