Open Chemistry Data at NIST

I had a wonderful mail this morning from Steve Heller …

Peter


I am helping the NIST folks get additional GC/MS EI (electron impact only) mass spectra for their WebBook and mass spec database.
http://webbook.nist.gov/chemistry/
and
http://www.nist.gov/srd/nist1a.htm

The question I have for you is: would you be willing to post something on your blog suggesting it would be useful for people to donate their EI MS to the NIST folks? The WebBook is Open Data, which is where the spectra would go first. In addition, the spectra would also go into the NIST mass spec database to add to the existing database they provide.
NIST is in the process of setting up an arrangement with the Open Access Chemistry Central folks to do this and I wanted to see if you also would be willing to cooperate/collaborate as well.
Cheers

Steve

PMR: Many of us have known the NIST webbook for many years. It was the first, and for some time the only, openly accessible chemistry resource on the web (outside bio-stuff like PDB). NIST are a US government agency whose role is – in large part – to produce standards (data, specs) for resources in science and engineering. Part of this role is to support US commerce through these activities.

The webbook has many thousands of entries for compounds. Even if you aren’t a chemist, have a look as it’s an ideal exemplar of how data should be organised. The impressive thing is that it has complete references for all data and also concentrates on error estimation. In many ways it is the gold standard of chemical data. (I agree that things like Landolt-Börnstein are very important, but in the modern web world monographs costing thousands of dollars are increasingly dated.) And it was Steve and colleagues (especially Steve Stein) who got the InChI process started – because they had so much experience in managing data publicly it made sense to promote the InChI identifier for compounds.

(In passing, NIST has also made an important contribution to our understanding of the universe by measuring the fundamental constants to incredible accuracy).

So is NIST in CKAN – the Open Knowledge Foundation’s growing list of packages of Open Data? YES (from http://www.ckan.net/package/read/nist)

Metadata:

Notes:

About

The NIST Data Gateway provides easy access to NIST scientific and technical data. These data cover a broad range of substances and properties from many different scientific disciplines.

Openness

Much of the material appears to be in the public domain as it is produced by the US Federal Government, but it varies from dataset to dataset.

Note that there is some fuzziness about what is meant by openness here – the NIST pages carry “all rights reserved” and “the right to charge in future”. But Steve’s motivation is clear here and it’s part of the role of OKFN/CKAN to help determine what the rights are.

I’m also interested in the reference to Open Access Chemistry Central. This raises the very important question of where Open Data should be located. The bioscience community has shown that a mixture of (inter)governmental organizations can work extremely well but this is less clear in chemistry at present. We are in exploration phase with a number of initiatives trying out models such as Pubchem (gov), Chemspider (independent/commercial), Crystaleye (academic), NIST (gov), Wikipedia Chemistry (independent), NMRShiftDB (academia), Chemistry Central (commercial/publisher) etc. I am sure there will be a need for multiple outlets – the variation in the sites above is too great for any single organization.

What is important is that this is Linked Open Data because then it does not matter who exposes it. LOD has a number of requirements including

  • Open Data (not just accessible)

  • Semantic infrastructure (e.g. XML/RDF)

  • Identifier systems

  • Appropriate metadata and/or Ontologies
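To make the list above a little more concrete, here is a minimal sketch of what one record might look like when exposed as linked data. It is only an illustration under stated assumptions: the `http://example.org/` namespace, the `inchi` property, and the licence URI are hypothetical placeholders, not anything NIST or CKAN actually publishes. The Dublin Core terms namespace and the WebBook URL are real. The InChI string serves as the identifier, the licence triple states openness explicitly, and the output is plain N-Triples (one simple RDF serialization).

```python
from urllib.parse import quote

DCTERMS = "http://purl.org/dc/terms/"   # Dublin Core terms (real namespace)
EX = "http://example.org/chem/"         # hypothetical namespace, for illustration only


def literal(text):
    """Format a Python string as an N-Triples string literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def compound_triples(inchi, name, license_uri, source_uri):
    """Return N-Triples lines describing one compound record."""
    # Identifier system: the percent-encoded InChI becomes part of the subject URI.
    subject = f"<{EX}compound/{quote(inchi, safe='')}>"
    return [
        f"{subject} <{EX}inchi> {literal(inchi)} .",
        f"{subject} <{DCTERMS}title> {literal(name)} .",
        # Open Data: the licence is stated explicitly, not left to be inferred.
        f"{subject} <{DCTERMS}license> <{license_uri}> .",
        # Metadata: where the record came from.
        f"{subject} <{DCTERMS}source> <{source_uri}> .",
    ]


triples = compound_triples(
    inchi="InChI=1S/H2O/h1H2",
    name="water",
    license_uri="http://example.org/licenses/open",  # hypothetical licence URI
    source_uri="http://webbook.nist.gov/chemistry/",
)
print("\n".join(triples))
```

Because every statement carries its own identifier, licence and source, it should not matter which site serves the triples.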

I’ll be talking about this at BioIT next week in Boston (where I shall meet up with Steve). I’ll be blogging more over the next two days.

In Cambridge we have just been funded by JISC to enhance our repository of chemistry data, which will include Mass Spec. I don’t know how much is EI, but our mission is to make the data Open, and wherever that happens we will certainly send it off to Steve. There’s a certain amount of technology needed, but between us I think we could get an excellent public prototype.

More – much more – soon.

This blogpost was prepared with ICE+OpenOffice.


5 Responses to Open Chemistry Data at NIST

  1. OT: ICE+OpenOffice– (bad font sizes…)
    But for the rest: great news!

  2. I understand NIST is a US Federal Government Agency – hence one might expect that the material they produce is exempt from copyright, and hence in the public domain. It would be great if Steve could help to clarify this! (It’s not entirely clear to me what the ‘All Rights Reserved’ notice applies to.)
    Also it would be useful if Steve or anyone else has any more download links to add!

  3. Physchim62 says:

    NIST is covered by an exemption to the usual rules concerning copyright of US federal government works (see 15 U.S.C. 290e). Hence, the selection of material to include in their collections is protected by copyright, even if the original data is public domain. This is what the “all rights reserved” notice applies to, as well as the long standing warning that they reserve the right to charge for access in the future.

  4. We have recently published a series of NIST webbook spectral data on ChemSpider to populate the SpectralGame (www.spectralgame.com). The data are NOT open but are copyrighted, and I asked the hosts directly whether we could source any of their JCAMP data. We were given permission to copy up to 30 spectra and deposit them on ChemSpider, and we did so. We stopped at 30 as requested. The data are not Open, but it would be good to see that happen, as we would grab the data and populate quickly.
    I have suggested in an email that we meet at Bio-IT World with Steve Heller to discuss his project and how we can help him. I am especially interested in your comments above “We are in exploration phase with a number of initiatives trying out models such as Pubchem (gov), Chemspider (independent/commercial)…”. I’d love to hear how ChemSpider can help and who you are exploring us with?

  5. Interesting, we are in the process of adding a simple spectral tutorial to our tutorial section. We hope to be done by summer. We are currently looking for simple spectra we can screen grab.
