Compchem Quixote Workshop: to create the “first Open distributed repository for electronic simulations”

#quixote #xmlcml

I am delighted to announce the first Quixote Conference (http://quixote.wikispot.org/First_Quixote_Conference_-_22nd-23rd_March_2010) at Daresbury Laboratory. This is the outcome of all the work put in by the Quixote community and is:

A meeting to create the first Open distributed repository for electronic simulations

To explain a bit further: there are zillions – probably at least 10 million – computational chemistry calculations “published” each year (i.e. referred to in scholarly publications), but almost no data is publicly available. Comp chem is 50+ years old and very well understood, yet almost no data is published. [There are some collections of log files and derived data – including our own DSpace @ cam – but they amount to << 1% of what is published.]

So Quixote intends to change this. We’ve been building the components, and now we intend to bolt them together. Essentially we have the following components:

  • Lensfield/Quixote – a tool to crawl your disks for compchem
  • JUMBOConverters – tools to transform the legacy files into XML-CML
  • CMLDictionary – a formal semantic method of describing the data
  • Chempound – a repository for indexed numeric and chemical data
  • Avogadro – a flexible GUI for navigating and transforming the system
  • CompChemPub [vapourware] – a tool to collect the results into a scholarly publication, to be created during the coming hackfest

The strategy is on a per-code basis. So let’s say your code is called Foochem. Its input is something like:

  • Molecular/crystal/surface atoms and coordinates
  • Basis sets and/or pseudopotentials
  • Parameterisation (level of theory, accuracy, etc.)
  • Physical constraints (pressure, field, etc.)
  • Strategy – what to calculate (energies, frequencies, wavefunctions…) and how to do it (algorithms)

And its output should retain all this and also include:

  • History of calculation (e.g. optimisation)
  • Final calculated coordinates and electronic properties
  • Other properties
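To make the two lists above concrete, here is a minimal sketch of how one Foochem job might be held as a record. Foochem is the made-up code from this post, so every class name, field name and value below is an assumption chosen for illustration, not part of any Quixote tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# One atom: element symbol plus x, y, z coordinates (units are an assumption).
Atom = Tuple[str, float, float, float]


@dataclass
class FoochemInput:
    """Hypothetical record of what goes into a Foochem job."""
    atoms: List[Atom]                       # molecular/crystal/surface atoms
    basis_set: str                          # basis set or pseudopotential label
    level_of_theory: str                    # parameterisation
    constraints: Dict[str, float] = field(default_factory=dict)  # pressure, field, ...
    tasks: List[str] = field(default_factory=list)               # what to calculate


@dataclass
class FoochemOutput:
    """Hypothetical record of what comes out; it keeps the whole input."""
    job_input: FoochemInput
    history: List[List[Atom]] = field(default_factory=list)      # e.g. optimisation steps
    final_atoms: List[Atom] = field(default_factory=list)
    properties: Dict[str, float] = field(default_factory=dict)


# Placeholder values only; these are not real calculated results.
job = FoochemInput(
    atoms=[("O", 0.0, 0.0, 0.0), ("H", 0.0, 0.76, 0.59), ("H", 0.0, -0.76, 0.59)],
    basis_set="example-basis",
    level_of_theory="example-theory",
    tasks=["optimise", "energy"],
)
result = FoochemOutput(
    job_input=job,
    history=[job.atoms],
    final_atoms=job.atoms,
    properties={"total_energy": -1.0},
)
```

Nesting the input inside the output is one simple way to honour "retain all this": anyone who receives the output record can see exactly how the calculation was set up.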

Creating this information needs (at least):

  • A Foochem dictionary
  • A Foochem output parser
  • (possibly) a Foochem input parser
  • Some Foochem examples
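As a rough illustration of how a dictionary and an output parser fit together, here is a toy sketch. The Foochem log format, the dictionary entries, the `foochem:` and `units:` prefixes and the function name are all invented for this example. The element and attribute names (`molecule`, `atomArray`, `atom`, `property`, `scalar`, `dictRef`) follow CML conventions, but the result is not schema-validated, has no namespace declaration, and is not what JUMBOConverters actually does.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical Foochem dictionary: maps a label found in the log file
# to a dictionary entry id and its units.
FOOCHEM_DICTIONARY = {
    "TOTAL ENERGY": {"id": "foochem:totalEnergy", "units": "units:hartree"},
    "DIPOLE": {"id": "foochem:dipole", "units": "units:debye"},
}

# Assumed log lines: "ATOM O 0.0 0.0 0.0" and "TOTAL ENERGY = -1.5"
ATOM_LINE = re.compile(r"^\s*ATOM\s+([A-Z][a-z]?)\s+(\S+)\s+(\S+)\s+(\S+)\s*$")
SCALAR_LINE = re.compile(r"^\s*([A-Z ]+?)\s*=\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_foochem(log_text):
    """Turn an (invented) Foochem log into a CML-style element tree."""
    module = ET.Element("module")
    molecule = ET.SubElement(module, "molecule")
    atom_array = ET.SubElement(molecule, "atomArray")
    atom_count = 0
    for line in log_text.splitlines():
        atom = ATOM_LINE.match(line)
        if atom:
            atom_count += 1
            ET.SubElement(atom_array, "atom", {
                "id": f"a{atom_count}",
                "elementType": atom.group(1),
                "x3": atom.group(2),
                "y3": atom.group(3),
                "z3": atom.group(4),
            })
            continue
        scalar = SCALAR_LINE.match(line)
        # Only terms described in the dictionary are kept; the rest is ignored.
        if scalar and scalar.group(1) in FOOCHEM_DICTIONARY:
            entry = FOOCHEM_DICTIONARY[scalar.group(1)]
            prop = ET.SubElement(module, "property", {"dictRef": entry["id"]})
            value = ET.SubElement(prop, "scalar", {
                "dataType": "xsd:double",
                "units": entry["units"],
            })
            value.text = scalar.group(2)
    return module


if __name__ == "__main__":
    sample = "FOOCHEM v0.1\nATOM O 0.0 0.0 0.0\nTOTAL ENERGY = -1.5\n"
    print(ET.tostring(parse_foochem(sample), encoding="unicode"))
```

Because the parser only emits what the dictionary describes, both can start small and grow together: adding a new dictionary entry is enough to capture another quantity from the log.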

So we are inviting experts in various codes. So far we have NWChem, QuantumEspresso, GAMESS-UK, GAMESS-US, DALTON, Turbomole and Gaussian. We hope to create dictionaries, parsers and documentation for them. These do not need to be complete – the parsers and XML-CML can be expanded when people have time and energy, or a really boring cricket match.

It’s a hands-on meeting. You need to be reasonably proficient at running the software (i.e. you may need a few days’ preparation in advance). If you are interested, let Jens Thomas know. I think there are some places left, but that is up to Jens and colleagues at STFC.

Lots of thanks to lots of people.

 
