Open Notebook NMR – variance is both experimental and theoretical

When making claims in foo-metrics and foo-informatics, it is essential to have access to both the data and the algorithms used. That’s why, for example, Peter Corbett and Sciborg colleagues are so careful in constructing their corpus. In developing our Open Notebook NMR we have to do the same for our data. As I have explained earlier, variance can come from many sources, and all of them must be examined. So here is the first pass at our raw data: a histogram of the (absolute) internal RMS within each structure, fitted to
y = x + c + eps
where x has units of 13C ppm.
It’s obvious that we should throw away the outlier, isn’t it?
But we cannot – absolutely must not – unless we have good reason to do so. Until then it contributes to the variance.
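To make the quantity in the histogram concrete, here is a minimal sketch of how a per-structure offset and RMS could be computed for the unit-slope fit y = x + c + eps. It assumes x holds the calculated shifts and y the observed shifts for one structure (the post does not say which is which); the function name `offset_and_rms` and the numbers are made up for illustration and are not the project's code or data.

```python
import math

def offset_and_rms(x, y):
    """Fit y = x + c + eps with the slope fixed at 1.

    Returns (c, rms), where c is the least-squares constant offset and
    rms is the root-mean-square of the residuals eps, both in ppm.
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    diffs = [yi - xi for xi, yi in zip(x, y)]
    # With the slope fixed at 1, the least-squares offset is the mean difference.
    c = sum(diffs) / len(diffs)
    rms = math.sqrt(sum((d - c) ** 2 for d in diffs) / len(diffs))
    return c, rms

# Illustrative (made-up) 13C shifts in ppm for a single structure.
calc = [20.0, 35.5, 128.0, 170.2]
obs = [21.0, 36.0, 130.0, 171.2]
c, rms = offset_and_rms(calc, obs)
print(f"offset c = {c:.3f} ppm, internal RMS = {rms:.3f} ppm")
```

One RMS value per structure, collected over all structures, would give a histogram of the kind described above.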

  1. baoilleach says:

    The mean and variance assume a normal distribution and are sensitive to outliers. You should use the median and the inter-quartile range. This isn’t a fudge-factor – the values of the mean and variance are misleading when looking at non-normal distributions.
    Also, why are you plotting the absolute value rather than the actual value? You are throwing away interesting information by folding +ve and -ve values on top of each other.
    This diagram is per structure. Unless you suspect that particular structures have systematic errors, you should also do one per predicted shift. Presumably, particular C atom environments are more difficult to calculate than others…?
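The commenter's point about outlier sensitivity can be illustrated with a short sketch that compares mean and variance against median and inter-quartile range, with and without a single outlier. The RMS values below are invented for illustration only, and `summarize` is a hypothetical helper; the quartiles use the default method of Python's `statistics.quantiles`, which is only one of several conventions.

```python
import statistics

def summarize(values):
    """Return classical and robust summaries of a sample."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    return {
        "mean": statistics.fmean(values),
        "variance": statistics.variance(values),  # sample variance
        "median": statistics.median(values),
        "iqr": q3 - q1,
    }

# Made-up per-structure RMS values (ppm); the last one plays the outlier.
rms_values = [1.1, 1.3, 1.2, 1.4, 1.0, 1.5, 1.3, 1.2, 9.0]

with_outlier = summarize(rms_values)
without_outlier = summarize(rms_values[:-1])

for key in ("mean", "variance", "median", "iqr"):
    print(f"{key:>8}: with={with_outlier[key]:.3f}  without={without_outlier[key]:.3f}")
```

In this toy sample the mean and variance move substantially when the outlier is dropped, while the median and inter-quartile range barely change. This is not an argument for discarding the point, only for reporting statistics that do not depend so heavily on it.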

