Nick and I sat down this morning and thought about what possible errors might arise in the “data” or “experimental” axis and also on the “predicted” axis. Some of these may overlap with Antony’s suggestions but they are independent.
An “experimental” error is one that is independent of the prediction of the effect. There are some grey areas (in the compound), but we have come up with:
- mis-reported solvent (the shifts are solvent dependent and the calculation tries to simulate this)
- variable calibration of NMR instrument (e.g. giving rise to origin shifts)
- impure compound. The sample may contain substance(s) which give rise to appreciable peaks not belonging to the title compound
- wrong compound assigned to spectrum (i.e. error in bookkeeping or drawing error)
- machine parameters (phasing, folding, field strength, etc.) varied incorrectly or reported incorrectly
- transcription errors in spectrum or peaks.
- misassignment of peaks to inappropriate atoms
- broad peaks with considerable variance leading to misreporting of mean (unlikely with 13C)
- errors in applying theory of NMR or its interpretation
- noise (including random noise and mains spikes).
- human editing of spectra including fraud
A “prediction” error is independent of the reported value for the shift. Some are theoretical, some are computer “bugs”. These include:
- mis-calculation of offset (e.g. from isotropic tensor to observed shift)
- garbling of the assignment of peaks to atoms (bug)
- corruption of connection tables (especially in adding hydrogen atoms)
- mismapping of atoms between input and output of calculation (we assume atoms come out in the order they go in – bug)
- incorrect generation of input (bug)
- program bugs in reading input and main calculation. For example, we found a really nasty bug with GAMESS: if a line overflowed 80 characters, the atom was reported but not included in the calculation.
- incorrect transformation of output to CMLSpect
- theoretical model has limitations (Henry will comment)
- oversimplified chemical model. There are several common problems:
  - only one conformer is calculated
  - symmetry is not well treated
  - tautomerism is ignored
  - isomerism (e.g. ring-chain) is ignored
  - other chemical effects (Antony mentions micelles, etc.)
There are also potential bugs on the computational side:
- inconsistent results from different machine architectures
- errors in processing and displaying the results
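The GAMESS line-overflow bug mentioned above is the kind of problem a simple pre-flight check on the generated input could catch. The following is only a sketch: the function name `overlong_lines` is ours, the 80-character limit is taken from the description above, and the sample text is made up for illustration and is not real GAMESS input.

```python
MAX_LINE = 80  # limit taken from the bug description; adjust if the program differs


def overlong_lines(text, limit=MAX_LINE):
    """Return (line_number, length) for every line longer than `limit` characters."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


# Made-up two-line sample: the second line is deliberately too long.
sample = "C 6.0 0.0 0.0 0.0\n" + "H 1.0 " + "0.000000000 " * 8 + "\n"

for number, length in overlong_lines(sample):
    print(f"line {number} is {length} characters long; its atom may be silently dropped")
```

Running such a check before submitting a job would flag suspect inputs instead of relying on the output looking plausible.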
So, we look forward to sharing this with Christoph tomorrow. Nick has prepared a range of display tools, including a filter for the errors within structures. Ideally the calculated value (y) should relate to the observed one (x) by:
y = x + eps
where eps is normally distributed. In practice we expect that we shall find
y = x + c + eps
where c varies between entries and reflects the errors in origins and solvents. We don’t know what the magnitude will be. We don’t see any need at present for
y = m*x + c + eps
where there is empirical scaling.
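As a rough illustration of the y = x + c + eps model, here is a minimal sketch of how c and the spread of eps might be estimated for a single entry. With the slope fixed at 1, the least-squares estimate of c is simply the mean of the differences y - x. The function name `fit_offset` and the shift values are invented for this example; they are not from our data.

```python
from statistics import mean, stdev


def fit_offset(observed, calculated):
    """Fit y = x + c + eps for one entry.

    Returns (c, sd), where c is the mean of (y - x) and sd is the
    sample standard deviation of the residuals around c.
    """
    if len(observed) != len(calculated) or len(observed) < 2:
        raise ValueError("need at least two paired shifts")
    diffs = [y - x for x, y in zip(observed, calculated)]
    return mean(diffs), stdev(diffs)


# Invented 13C shifts in ppm, for illustration only.
observed = [20.0, 35.0, 70.0, 128.0, 170.0]
calculated = [21.4, 36.6, 71.5, 129.3, 171.7]

c, sd = fit_offset(observed, calculated)
print(f"c = {c:.2f} ppm, sd(eps) = {sd:.2f} ppm")
```

If the residuals left after removing c still showed a trend with x, that would be the signal to reconsider the scaled form y = m*x + c + eps.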
The intra-compound comparison will highlight entries with the following features:
- high precision, high accuracy
- high precision, low accuracy (hopefully allowing identification of systematic error)
- low precision, high accuracy (maybe due to noise, though this is unlikely here)
- low precision, low accuracy (these may allow us to identify problems with various sources such as authors, machines or protocols).
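The four categories above could be assigned automatically once each entry has a set of paired shifts. The sketch below assumes one particular reading: "accuracy" is the size of the mean offset between calculated and observed shifts within a compound, and "precision" is the spread of the differences around that mean. Both cut-offs (1.0 ppm) are hypothetical placeholders, and the entries are invented.

```python
from statistics import mean, stdev

# Hypothetical cut-offs in ppm; sensible values would have to come from the data.
MAX_OFFSET = 1.0
MAX_SPREAD = 1.0


def classify(observed, calculated, max_offset=MAX_OFFSET, max_spread=MAX_SPREAD):
    """Label one entry by the spread (precision) and mean (accuracy) of y - x."""
    diffs = [y - x for x, y in zip(observed, calculated)]
    precision = "high" if stdev(diffs) <= max_spread else "low"
    accuracy = "high" if abs(mean(diffs)) <= max_offset else "low"
    return f"{precision} precision, {accuracy} accuracy"


# Invented entries: (observed shifts, calculated shifts) in ppm.
entries = {
    "entry-a": ([20.0, 70.0, 128.0], [20.1, 69.9, 128.2]),
    "entry-b": ([20.0, 70.0, 128.0], [23.0, 73.1, 130.9]),
    "entry-c": ([20.0, 70.0, 128.0], [17.0, 73.0, 128.0]),
    "entry-d": ([20.0, 70.0, 128.0], [22.0, 80.0, 129.0]),
}

for name, (obs, calc) in entries.items():
    print(name, "->", classify(obs, calc))
```

Entries landing in the "high precision, low accuracy" group are the natural candidates for a constant offset c, such as a solvent or referencing problem.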