I am currently refactoring Nick Day’s code that has supported “NMREye”, the collection of Open experiments and Data that he has generated as part of his thesis and that has been trailed on this blog. One intention of this – which got lost in some of the other discussion – is to be able to see whether published results are “correct”. This is, of course, not new to us: students here developed the OSCAR toolkit for checking experimental data. The NMREye work suggests that it should be possible to validate the actual 13C NMR values reported in a scientific experiment.
Nick will take it as a compliment that I am refactoring his code. It was written on a very strict timescale: he had to write the code, and collect and analyse the results, in little more than a month. And his work has wider applicability within our group. So I am trying to design a library system that supports his ideas while being generally re-usable. This has very useful consequences for CML; the main question, as always, is “does CML support enough chemistry in a simple fashion, and can it be coded?”. Here’s an example of data from a thesis we are analyzing in the SPECTRaT project:
13C (150 MHz) d 138.4 (Ar-ipso-C), 136.7 (C-2), 136.1 (C-1), 128.3, 127.6, 127.5 (Ar‑ortho-C, Ar-meta-C, Ar-para-C), 87.2 (C-3), 80.1 (C-4), 72.1 (OCH2Ph), 69.7 (CH2OBn), 58.0 (C-5), 26.7 (C-6), 20.9 ((CH3)AC-6), 17.9 ((CH3)BC-6), 11.3 (CH3C‑2), 0.5 (Si(CH3)3).
(The “d” is a delta, but I think everything else has been faithfully copied from the Word document.) Note that OSCAR can:
- understand that this is a 13C spectrum
- extract the frequency
- identify the peak values (shifts) and identify the comments
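To make these three steps concrete, here is a minimal Python sketch of the same extraction applied to the line above. It is not OSCAR’s implementation; the names `parse_13c`, `split_top_level` and `Entry` are hypothetical, and the sketch assumes a header of the form `13C (150 MHz)`. One detail matters: several comments contain nested parentheses, e.g. `((CH3)AC-6)` and `(Si(CH3)3)`, so the splitter counts bracket depth rather than relying on a single regular expression.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Header such as "13C (150 MHz)": mass number, element symbol, spectrometer frequency.
HEADER = re.compile(r"^\s*(\d+)\s*([A-Z][a-z]?)\s*\(\s*(\d+(?:\.\d+)?)\s*MHz\s*\)")
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


@dataclass
class Entry:
    shifts: List[float]      # one or more shifts (ppm) that share the comment
    comment: Optional[str]   # text inside the parentheses that follow, if any


def split_top_level(text: str) -> List[Tuple[str, str]]:
    """Split text into ("text", s) and ("paren", s) segments.

    Nested parentheses such as "((CH3)AC-6)" stay inside one "paren" segment.
    """
    segments: List[Tuple[str, str]] = []
    buf: List[str] = []
    depth = 0
    for ch in text:
        if ch == "(":
            if depth == 0:
                segments.append(("text", "".join(buf)))
                buf = []
            else:
                buf.append(ch)
            depth += 1
        elif ch == ")" and depth > 0:
            depth -= 1
            if depth == 0:
                segments.append(("paren", "".join(buf)))
                buf = []
            else:
                buf.append(ch)
        else:
            buf.append(ch)
    segments.append(("text", "".join(buf)))
    return segments


def parse_13c(line: str) -> Tuple[str, float, List[Entry]]:
    """Return (nucleus, frequency in MHz, entries) for a reported peak list."""
    m = HEADER.match(line)
    if m is None:
        raise ValueError("no '<mass><element> (<freq> MHz)' header found")
    nucleus = m.group(1) + m.group(2)
    freq_mhz = float(m.group(3))
    entries: List[Entry] = []
    pending: List[float] = []          # shifts seen since the last comment
    for kind, segment in split_top_level(line[m.end():]):
        if kind == "text":
            # Separators and the "d" (delta) marker are skipped: only numbers count.
            pending.extend(float(x) for x in NUMBER.findall(segment))
        elif pending:
            # A comment applies to every shift collected since the previous comment;
            # a parenthesis with no preceding shift is ignored in this sketch.
            entries.append(Entry(pending, segment.strip()))
            pending = []
    if pending:
        entries.append(Entry(pending, None))
    return nucleus, freq_mhz, entries


# The line quoted above, with plain ASCII hyphens.
SAMPLE = (
    "13C (150 MHz) d 138.4 (Ar-ipso-C), 136.7 (C-2), 136.1 (C-1), "
    "128.3, 127.6, 127.5 (Ar-ortho-C, Ar-meta-C, Ar-para-C), 87.2 (C-3), "
    "80.1 (C-4), 72.1 (OCH2Ph), 69.7 (CH2OBn), 58.0 (C-5), 26.7 (C-6), "
    "20.9 ((CH3)AC-6), 17.9 ((CH3)BC-6), 11.3 (CH3C-2), 0.5 (Si(CH3)3)."
)

if __name__ == "__main__":
    nucleus, freq, entries = parse_13c(SAMPLE)
    print(nucleus, freq)
    for e in entries[:4]:
        print(e.shifts, e.comment)
```

Run as a script, this prints `13C 150.0` followed by `[138.4] Ar-ipso-C`, `[136.7] C-2`, `[136.1] C-1` and `[128.3, 127.6, 127.5] Ar-ortho-C, Ar-meta-C, Ar-para-C`: the three aromatic shifts come out as one entry because they share a single comment.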
Try to think how you would explain this to a robot and what additional information you would need. Indeed try to explain this to a non-chemist – it’s a useful exercise.
What OSCAR and the other tools cannot do yet is:
- extract the solvent (this is mentioned elsewhere in the thesis)
- understand the comments
- manage the framework symmetry group of the phenyl ring
- understand peakGroup (the aromatic ring)
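The peakGroup problem can be illustrated with the aromatic entry, where three shifts share one comment that lists three labels. Below is a minimal sketch built around a hypothetical `PeakGroup` class (not a CML element definition). It keeps the shifts and labels together rather than forcing a one-to-one assignment the author never made, enumerates the assignments that remain possible, and uses the symmetry of a monosubstituted phenyl ring (the two ortho and the two meta carbons are equivalent) to count how many carbons the group stands for.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Dict, List

# Equivalent carbons per position in a monosubstituted phenyl ring:
# the two ortho and the two meta carbons are related by the ring's symmetry.
PHENYL_MULTIPLICITY: Dict[str, int] = {"ipso": 1, "ortho": 2, "meta": 2, "para": 1}


@dataclass
class PeakGroup:
    """Hypothetical container: shifts reported together under shared labels."""
    shifts: List[float]   # observed shifts (ppm), in reported order
    labels: List[str]     # labels that apply to the group as a whole

    def possible_assignments(self) -> List[Dict[str, float]]:
        """Every one-to-one label -> shift mapping consistent with the report."""
        if len(self.shifts) != len(self.labels):
            raise ValueError("one-to-one assignment needs equal counts")
        return [dict(zip(self.labels, p)) for p in permutations(self.shifts)]


def ring_position(label: str) -> str:
    """Extract the ring position from labels such as 'Ar-ortho-C'."""
    for position in PHENYL_MULTIPLICITY:
        if f"-{position}-" in label:
            return position
    raise ValueError(f"no ring position in {label!r}")


if __name__ == "__main__":
    comment = "Ar-ortho-C, Ar-meta-C, Ar-para-C"
    group = PeakGroup([128.3, 127.6, 127.5], [s.strip() for s in comment.split(",")])
    print(len(group.possible_assignments()))   # 6
    carbons = sum(PHENYL_MULTIPLICITY[ring_position(lab)] for lab in group.labels)
    print(carbons)                              # 5
```

Together with the ipso signal at 138.4 ppm, the four reported signals account for all six ring carbons, while the report alone leaves six orderings of the ortho, meta and para shifts open.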
So the toolchain has to cover this and much more. However the open source chemistry community (in this case all Blue Obelisk) has provided most of the components. More on this later.
I hesitated for two days before writing a response to your post, but the OSCAR functionality described above deserves a reply:
————–snip —– from above
Here’s an example of data from a thesis we are analyzing in the SPECTRaT project:
13C (150 MHz) d 138.4 (Ar-ipso-C), 136.7 (C-2), 136.1 (C-1), 128.3, 127.6, 127.5 (Ar‑ortho-C, Ar-meta-C, Ar-para-C), 87.2 (C-3), 80.1 (C-4), 72.1 (OCH2Ph), 69.7 (CH2OBn), 58.0 (C-5), 26.7 (C-6), 20.9 ((CH3)AC-6), 17.9 ((CH3)BC-6), 11.3 (CH3C‑2), 0.5 (Si(CH3)3).
(The “d” is a delta, but I think everything else has been faithfully copied from the Word document.) Note that OSCAR can:
* understand that this is a 13C spectrum
* extract the frequency
* identify the peak values (shifts) and identify the comments
————–snip/end ————————
Let’s step through it item by item:
* understand that this is a 13C spectrum
No surprise, because the string ‘13C’ is in the text.
* extract the frequency
No surprise, because it is the number BEFORE the string ‘MHz’.
* identify the peak values (shifts) and identify the comments
NO surprise: the shifts are the numbers outside the parentheses, and the comments are the text within the parentheses.
I think, no, I am absolutely sure, that this functionality can be achieved with a few basic UNIX commands like ‘grep’, ‘cut’, ‘paste’, etc. What you need is the assignment of the signals to specific carbons in your structure, because this (and EXACTLY THIS) is the basis of spectrum prediction and structure verification; and before this can be done, you need the structure itself.
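The point that verification works on an assignment rather than on a bare peak list can be sketched as follows. This is a minimal Python illustration, assuming an assignment is a mapping from atom labels (such as `C-3`) to shifts in ppm. `check_assignment` and its 3 ppm tolerance are hypothetical choices, and the predicted values in the usage lines are placeholders, not output from any prediction program.

```python
from typing import Dict, List, Tuple


def check_assignment(
    observed: Dict[str, float],
    predicted: Dict[str, float],
    tolerance_ppm: float = 3.0,  # hypothetical threshold, not a published standard
) -> Tuple[float, List[str]]:
    """Compare observed and predicted shifts for the same atom labels.

    Returns the mean absolute deviation (ppm) over the labels present in both
    mappings, and the labels whose deviation exceeds tolerance_ppm.
    """
    shared = sorted(observed.keys() & predicted.keys())
    if not shared:
        raise ValueError("no atom labels in common")
    deviations = {k: abs(observed[k] - predicted[k]) for k in shared}
    mad = sum(deviations.values()) / len(shared)
    flagged = [k for k in shared if deviations[k] > tolerance_ppm]
    return mad, flagged


if __name__ == "__main__":
    # Observed values for C-3, C-4 and C-5 from the thesis line quoted above.
    observed = {"C-3": 87.2, "C-4": 80.1, "C-5": 58.0}
    # Placeholder numbers, NOT a real prediction.
    predicted = {"C-3": 85.0, "C-4": 79.5, "C-5": 64.0}
    mad, flagged = check_assignment(observed, predicted)
    print(round(mad, 2), flagged)   # 2.93 ['C-5']
```

Without the label-to-carbon assignment there is nothing to pair observed with predicted values, which is why the assignment, and the structure behind it, has to come first.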
Conclusions and Questions:
(1) Why does NMRShiftDB have only approx. 20K entries, when OSCAR is around?
(2) Why have all entries of NMRShiftDB been entered manually, when OSCAR is around?
(3) The OSCAR functionality described above corresponds to basic UNIX/LINUX commands.
Personal remarks:
(1) Please don’t take me (or the community?) for a fool.
(2) I wouldn’t even dare to waste the community’s time with the statement ‘Note that OSCAR can:’ and the subsequent table of basic UNIX/LINUX functionality.
(3) My questions about INTERNAL CROSS-CHECKS in NMRShiftDB are still not answered.
Another remark from above:
—snip——————-
The NMREye work suggests that it should be possible to validate the actual 13C NMR values reported in a scientific experiment.
——snip/end—————
You are absolutely wrong here: this has already been shown by, e.g., Grant with his increment system for 13C NMR prediction, published approx. 45 YEARS ago; Bremser and Dubois developed a coding system (the HOSE code) published in the 1970s; and afterwards neural-network (NN) technology was applied to exactly this problem. The reason for doing a prediction is usually a comparison with the experimental data in order to distinguish between structural proposals, which means verifying (or rejecting) a certain proposal … a long list of academic/commercial software is around for doing EXACTLY this job (ACD, NMRPredict, NMRShiftDB, CSEARCH, KnowItAll, SPECINFO, CHEMGATE, etc.).
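As an illustration of the increment idea, the commonly quoted base form of Grant’s alkane rule estimates the shift of a carbon as δ = −2.3 + 9.1 nα + 9.4 nβ − 2.5 nγ + 0.3 nδ ppm, where nα to nδ count the carbons one to four bonds away. The sketch below omits Grant’s steric correction terms, handles acyclic alkanes only (bond distances are computed by breadth-first search, which is unambiguous only without rings), and is not how any of the programs listed above work; the function names are hypothetical.

```python
from collections import deque
from typing import Dict, List

# Commonly quoted base parameters (ppm) of Grant's alkane increment rule;
# the steric correction terms for branched carbons are deliberately omitted.
BASE_PPM = -2.3
INCREMENT_PPM = {1: 9.1, 2: 9.4, 3: -2.5, 4: 0.3}   # alpha, beta, gamma, delta


def bond_distances(skeleton: Dict[int, List[int]], start: int) -> Dict[int, int]:
    """Breadth-first search: number of bonds from start to every carbon."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbour in skeleton[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def increment_shift(skeleton: Dict[int, List[int]], atom: int) -> float:
    """Additive estimate for one carbon of an acyclic alkane (no steric terms)."""
    shift = BASE_PPM
    for d in bond_distances(skeleton, atom).values():
        # The atom itself (d = 0) and carbons beyond delta add nothing.
        shift += INCREMENT_PPM.get(d, 0.0)
    return shift


if __name__ == "__main__":
    # n-butane as a carbon skeleton: C0-C1-C2-C3 (hydrogens implicit).
    butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
    print(round(increment_shift(butane, 0), 1))   # 13.7 (CH3)
    print(round(increment_shift(butane, 1), 1))   # 25.3 (CH2)
```

Comparing such estimates against an assigned experimental spectrum, carbon by carbon, is the kind of check the listed programs automate with far more elaborate models.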
In our internal lab slang I dare to call such a claim the ‘NIH syndrome’ (‘Not Invented Here’).
Sorry for being nasty and sarcastic! Wolfgang
PS: I can’t comment for the next 3 days; I have to invent the wheel ;-))