Quixote: CML is now an infrastructure for Computational Chemistry

#quixotechem #cmlcomp

We have had a fantastic two days at Daresbury developing a prototype of a repository system for computational chemistry. Although computational chemistry is a major scientific discipline, supporting bioscience, materials, chemical reactions, energy, etc., it consists of isolated programs (codes) with no common semantics or means of communication. Although hundreds of millions of dollars (or more) are spent on compchem, almost none of the data is made public. The lack of interoperability means that inter-program communication has to be created for each pair of programs, leading to an n-squared number of interconverters, most of which have no semantic model and are highly lossy. Similarly, if you want to visualize the results, you often need one viewer per code.

The Quixote-CML initiative has now changed this. There is a common semantic model based on CML. This model is democratic but controlled. A special interest group creates its own model, based on CML vocabulary, and defines the practices required to make it work. There is no “central top-down control” other than to specify how the conventions are created (but not their content) and the base vocabulary of CML.
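To put rough numbers on the interconverter problem, here is a minimal sketch. It assumes every pair of codes needs a converter in each direction, and that with a shared format each code needs only one reader and one writer. The function names and the sample values of n are illustrative, not part of Quixote.

```python
def pairwise_converters(n: int) -> int:
    """Directed converters needed when every code talks to every other code directly."""
    return n * (n - 1)


def shared_format_converters(n: int) -> int:
    """Converters needed when each code only reads and writes one shared format."""
    return 2 * n


# Illustrative values of n (number of codes); not a count of real packages.
for n in (6, 20):
    print(n, pairwise_converters(n), shared_format_converters(n))
```

The first count grows as n squared and the second only linearly, which is the practical argument for a single semantic model.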

This means that, with foresight from the conventions, software creators (such as Avogadro) can expect to read a wide range of files and display the contents without breaking. (Avogadro is much more than a visualizer: it has an “intelligent” approach to the content and, with the semantics of CML embedded, is likely to become a major resource in compchem.)

As always we set ourselves a very ambitious target: to create a searchable repository that could consume the output of any semantified code. We’ve currently written lossless parsers for NWChem, MOLCas, Turbomole, Gaussian, Dalton, and GAMESS-US. The parser technology is such that it takes a few days to write one for a mainstream code, and the result is lossless. If anyone is interested, consult us and we’ll show the way forward. It will add considerably to the public value of your output.

We are also developing tools for solid state, with Quantum Espresso as the leading code. It was exciting to see the solid state group starting to form their own convention and repurposing CML for that.

I can’t thank everyone but …

  • Jens Thomas for his enormous enthusiasm and vision. He has made this happen.
  • Paul Sherwood and colleagues at Daresbury for positive involvement and funding.
  • Pablo Echenique for his continued enthusiasm and support while not being able to travel to Daresbury, and Jorge Estrada, who makes up the second part of the Spanish team.
  • My colleagues Sam, Joe and Weerapong. They go along with the crazy deadlines and make them work.
  • Marcus Hanwell for the great belief and the enormous support from Avogadro.

And everyone else at the meeting. Whether or not you believed in all the windmills all the time, you took part with enthusiasm and made it work.

So – after 16 years of developing CML this is the first time I have felt that it has become a real fact. People are using it for many different parts of chemistry and continue to advocate it. It’s complex, but not too complex to manage. It’s large, but not too large to implement. It has created its own democracy – which is very rare in chemistry and confined to the Blue Obelisk and a few other areas. The democracy means it moves fast. Faster than many other developments. And because it is continually questioned and open to new ideas the rigour of design is encouraged. It’s now captured the spirit of many mainstream ICT activities.

The next phase is technically trivial. It’s simply to get people to publish their log-files (the output of the calculations). After they have published the work, if necessary. If people did this it would save millions of dollars of repeated calculations. It would enhance teaching and learning. It would enhance quality. You don’t have to write any code – simply make your files available.

Publishers, just require that authors of computational chemistry results make their files available. Quixote will do the rest.

Funders, just require that authors of computational chemistry results make their files available. Quixote will do the rest.

Universities and research labs, just require that authors of computational chemistry results make their files available. Quixote will do the rest. And your institutions will get greater visibility.

It is now unstoppable… the only unknown is the rate of growth.



2 Responses to Quixote: CML is now an infrastructure for Computational Chemistry

  1. Peter, is there guidance on writing parsers? I’m wondering about whether to fold this into some of the things we’re trying to do with small angle scattering analysis. It’s a bit of a leap but it would be interesting to know where the gaps are. It largely involves taking protein structures, manipulating them in some way and then simulating some experiment or property. I’ve been writing a couple of cruddy parsers for my own purposes but it would be nice to do something better.
