petermr's blog

A Scientist and the Web


Open Chemistry Data at NIST

I had a wonderful mail this morning from Steve Heller …


I am helping the NIST folks get additional GC/MS EI (electron impact only) mass spectra for their WebBook and mass spec database.

The question I have for you is would you be willing to post something on your blog suggesting it would be useful for people to donate their EI MS to the NIST folks. The WebBook is Open Data which is where the spectra would go first/initially. In addition, the spectra would also go into the NIST mass spec database to add to the existing database they provide.

NIST is in the process of setting up an arrangement with the Open Access Chemistry Central folks to do this and I wanted to see if you also would be willing to cooperate/collaborate as well.



PMR: Many of us have known the NIST webbook for many years. It was the first, and for some time the only, openly accessible chemistry resource on the web (outside bio-stuff like PDB). NIST are a US government agency whose role is – in large part – to produce standards (data, specs) for resources in science and engineering. Part of this role is to support US commerce through these activities.

The webbook has many thousands of entries for compounds. Even if you aren’t a chemist, have a look as it’s an ideal exemplar of how data should be organised. The impressive thing is that it has complete references for all data and also concentrates on error estimation. In many ways it is the gold standard of chemical data. (I agree that things like Landolt-Börnstein are very important, but in the modern web-world monographs costing thousands of dollars are increasingly dated.) And it was Steve and colleagues (especially Steve Stein) who got the InChI process started – because they had so much experience in managing data publicly, it made sense for them to promote the InChI identifier for compounds.

(In passing, NIST has also made an important contribution to our understanding of the universe by measuring the fundamental constants to incredible accuracy.)

So is NIST in CKAN – the Open Knowledge Foundation’s growing list of packages of Open Data? YES (from




The NIST Data Gateway provides easy access to NIST scientific and technical data. These data cover a broad range of substances and properties from many different scientific disciplines.


Much of the material appears to be in the public domain as it is produced by the US Federal Government, but it varies from dataset to dataset.

Note that there is some fuzziness about what is meant by openness here – the NIST pages carry “all rights reserved” and “the right to charge in future”. But Steve’s motivation is clear here and it’s part of the role of OKFN/CKAN to help determine what the rights are.

I’m also interested in the reference to Open Access Chemistry Central. This raises the very important question of where Open Data should be located. The bioscience community has shown that a mixture of (inter)governmental organizations can work extremely well, but this is less clear in chemistry at present. We are in an exploration phase, with a number of initiatives trying out models such as PubChem (gov), ChemSpider (independent/commercial), CrystalEye (academic), NIST (gov), Wikipedia Chemistry (independent), NMRShiftDB (academic), Chemistry Central (commercial/publisher) etc. I am sure there will be a need for multiple outlets – the variation in the sites above is too great for any single organization.

What is important is that this is Linked Open Data (LOD), because then it does not matter who exposes it. LOD has a number of requirements, including:

  • Open Data (not just accessible)

  • Semantic infrastructure (e.g. XML/RDF)

  • Identifier systems

  • Appropriate metadata and/or Ontologies

I’ll be talking about this at BioIT next week in Boston (where I shall meet up with Steve). I’ll be blogging more over the next two days.

In Cambridge we have just been funded by JISC to enhance our repository of chemistry data, which will include Mass Spec. I don’t know how much is EI, but our mission is to make the data Open, and where that happens we will certainly send it off to Steve. There’s a certain amount of technology needed, but between us I think we could get an excellent public prototype.

More – much more – soon.

This blogpost was prepared with ICE+OpenOffice.

