I have spent a splendid day at NIST in Boulder, invited by Michael Frenkel of TRC (the Thermodynamics Research Center) – a group which captures thermochemistry data from the literature and elsewhere on behalf of the US Department of Commerce. Here’s what they do – and I’ll explain why it’s exciting.
======
Tasks
Located in Boulder, CO, TRC Group performs several functions related to providing state-of-the-art thermodynamic data:
- compiles and evaluates experimental data
- develops tools and standards for archival and dissemination of thermodynamic data, especially critically evaluated data
- develops electronic database products
- maintains a web-repository of published data in ThermoML — an XML format developed by TRC for the representation of thermodynamic data
About TRC
TRC specializes in the collection, evaluation, and correlation of thermophysical, thermochemical, and transport property data. The goals of TRC are to establish a comprehensive archive of experimental data covering thermodynamic, thermochemical, and transport properties for pure compounds and mixtures of well-defined composition, and correspondingly, to provide a comprehensive source of critically evaluated data.
Critically Evaluated Data
An important and useful aspect of our work here at TRC, and of the Physical and Chemical Properties Division of NIST as a whole, is to provide critically evaluated data. Critical evaluation is a process of analyzing all available experimental data for a given property to arrive at recommended values together with estimates of uncertainty, providing a highly useful form of thermodynamic data for our customers. The analysis is based on intercomparisons, interpolation, extrapolation, and correlation of the original experimental data collected at TRC. Data are evaluated for thermodynamic consistency using fundamental thermodynamic principles, including consistency checks between data and correlations for related properties. While automated as much as possible, this process is overseen by experts with a great deal of experience in the field of thermodynamic data. Professional staff are responsible for the evaluation of each set of data that is committed to the archive.
=====
This is the sort of foundation that the data-driven world of the future will be built on. Thermochemistry tells us how the atmosphere works, how energy can be transported, why chemical plants explode and much more.
Michael and his colleagues are typical data scientists and scholars. This was a role emphasized by the JISC/NSF meeting – we critically need the data scholars of the future but we don’t reward them. It’s not easy to get tenure by collecting and publishing data. It’s difficult to build careers for those who can program and run software projects but don’t publish in “proper peer-reviewed journals”.
So JISC/NSF suggest peer-reviewed data journals – which should be regarded in the same light as text-based publications. The intellectual endeavour can be at least as challenging.
Anyway, Michael has developed ThermoML – a markup language for thermochemistry. Isn’t this a competitor to CML? Not at all. We’ve kept in touch for several years – we’ve been on the same platforms and agreed we’d stay in regular contact. But this was the first time I’ve been able to visit.
In fact there’s a wonderful complementarity between what we are doing. We have some common problems, such as how to create declarative markup for physical science (we’re going to look at OMDOC). We also want to work out how CML can be embedded in ThermoML to manage compounds and mixtures, and how CML can use the property vocabulary of ThermoML for publications involving physical measurements. As always with markup languages we are driven by real examples, so we’ll be exchanging documents to see how easy this is, trying to create robust but flexible markup, and seeing if we can reach rough consensus and running code.
TRC has abstracted ca. 20,000 articles over 6 years. That’s a lot of manual labour, although much of it is done by willing students. Some of the work is possibly useful for OSCAR…
… However, Michael has persuaded 4 publishers to have their data converted into ThermoML and made freely available on the website. See the page with the titles. This is a great pointer to the future – if all journals did this I would have to find something else to blog about. Thanks to all at TRC.
Peter – nice example. Keep finding more of these gems.
Peter:
Nice comments. We truly enjoyed your visit and talk and are looking forward to strengthening our cooperation.
Michael