We have had a fantastic two days at Daresbury developing a prototype of a repository system for computational chemistry. Computational chemistry is a major scientific discipline, supporting bioscience, materials, chemical reactions, energy and more, yet it consists of isolated programs (codes) with no common semantics for communication. Hundreds of millions of dollars (or more) are spent on compchem, but almost none of the data is made public. This lack of interoperability means that inter-program communication has to be created for each pair of programs, leading to an n-squared number of interconverters, most of which have no semantic model and are highly lossy. Similarly, if you want to visualize the results, you often need one viewer per code.
The Quixote-CML initiative has now changed this. There is a common semantic model based on CML. This model is democratic but controlled. Each special interest group creates its own model, based on the CML vocabulary, and defines the practices required to make it work. There is no “central top-down control” other than specifying how the conventions are created (but not their content) and the base vocabulary of CML.
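To make the n-squared argument concrete, here is a back-of-the-envelope sketch. It assumes one directed converter for every ordered pair of codes in the pairwise world, and exactly one converter into and one out of the shared CML model per code in the common-model world. The function names are purely illustrative and not part of any Quixote or CML tooling.

```python
def pairwise_converters(n: int) -> int:
    """Directed converters needed if every code must translate directly to every other code."""
    return n * (n - 1)


def hub_converters(n: int) -> int:
    """Converters needed if every code only translates to and from one shared model (e.g. CML)."""
    return 2 * n


# The six codes mentioned below as already having lossless parsers.
codes = ["NWChem", "MOLCas", "Turbomole", "Gaussian", "Dalton", "GAMESS-US"]
n = len(codes)
print(pairwise_converters(n), hub_converters(n))  # 30 12
```

The gap widens quickly: with 20 codes the pairwise approach needs 380 converters, while the shared-model approach needs 40.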
This means that, because the conventions tell them in advance what to expect, software creators (such as Avogadro) can expect to read a wide range of files and display the contents without breaking. (Avogadro is much more than a visualizer: it takes an “intelligent” approach to the content, and with the semantics of CML embedded it is likely to become a major resource in compchem.)
As always, we set ourselves a very ambitious target: to create a searchable repository that could consume the output of any semantified code. We have currently written lossless parsers for NWChem, MOLCas, Turbomole, Gaussian, Dalton, and GAMESS-US. The parser technology is such that it takes only a few days to write one for a mainstream code, and the result is lossless. If anyone is interested, consult us and we’ll show the way forward; it will add considerably to the public value of your output.
We are also developing tools for solid state, with Quantum Espresso as the leading code. It was exciting to see the solid state group starting to form their own convention and repurposing CML for that.
I can’t thank everyone, but…
- Jens Thomas for his enormous enthusiasm and vision. He has made this happen.
- Paul Sherwood and colleagues at Daresbury for positive involvement and funding.
- Pablo Echenique for his continued enthusiasm and support while not being able to travel to Daresbury, and Jorge Estrada, who made up the second half of the Spanish team.
- My colleagues Sam, Joe and Weerapong. They go along with the crazy deadlines and make them work.
- Marcus Hanwell for his great belief and the enormous support from Avogadro.
And everyone else at the meeting. Whether or not you believed in all the windmills all the time, you took part with enthusiasm and made it work.
So, after 16 years of developing CML, this is the first time I have felt that it has become a reality. People are using it for many different parts of chemistry and continue to advocate it. It’s complex, but not too complex to manage. It’s large, but not too large to implement. It has created its own democracy, which is very rare in chemistry and found only in the Blue Obelisk and a few other areas. The democracy means it moves fast, faster than many other developments. And because it is continually questioned and open to new ideas, rigour of design is encouraged. It has now captured the spirit of many mainstream ICT activities.
The next phase is technically trivial. It’s simply to get people to publish their log files (the output of the calculations), after they have published the work if necessary. If people did this, it would save millions of dollars of repeated calculations. It would enhance teaching and learning. It would enhance quality. You don’t have to write any code; simply make your files available.
Publishers, just require that authors of computational chemistry results make their files available. Quixote will do the rest.
Funders, just require that authors of computational chemistry results make their files available. Quixote will do the rest.
Universities and research labs, just require that authors of computational chemistry results make their files available. Quixote will do the rest. And your institutions will get greater visibility.
It is now unstoppable… the only unknown is the rate of growth.