Our bottom-up infrastructure for Computational Chemistry is going very well. We have enthusiastic weekly meetings which show significant progress each week. I’m going to be demo’ing the concept next month in India (more later).
- The parser strategy seems to be viable. I have parsed a large NWChem file and am now turning to GAMESS-US. I’ll be using this as a tutorial in parsing infrastructure and strategy if people want to join in: http://quixote.wikispot.org/GAMESS-US_logfileReader . There is a two-phase strategy:
  - Parse raw text to XML (using CML vocabulary)
  - Semantify the XML into semantic, conformant CML
- Generating dictionaries. Compchem is the easiest of all subjects in which to create non-controversial ontologies, and by parsing (say) 10 major codes we build up an excellent picture of the syntactic discourse. We expect each code to have between 100 and 500 entries in its dictionary. This, just by itself, is a useful tool for users of the code. But it’s more than that, because it allows those terms and concepts to be mapped onto other codes. That looks very feasible, much more so than in most disciplines.
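To make the two phases and the role of the dictionary concrete, here is a minimal Python sketch. It is not the actual Quixote parser code. The log excerpt, the `label = value` line format, the `compchem:` dictionary identifiers and the function names are all assumptions made up for illustration, and the element and attribute names (`module`, `scalar`, `dictRef`, `units`, `dataType`) are only loosely modelled on CML and not validated against its schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical log excerpt; real NWChem or GAMESS-US output is far more varied.
RAW_LOG = """\
Total SCF energy = -76.026765
Nuclear repulsion energy = 9.189534
"""

# Hypothetical dictionary: raw label in the log -> (dictionary id, units).
DICTIONARY = {
    "Total SCF energy": ("compchem:scfEnergy", "hartree"),
    "Nuclear repulsion energy": ("compchem:nuclearRepulsionEnergy", "hartree"),
}

# Matches lines of the assumed form "label = value".
LINE_PATTERN = re.compile(r"^\s*(?P<label>[^=]+?)\s*=\s*(?P<value>\S+)\s*$")


def parse_raw_text(text):
    """Phase 1: turn 'label = value' lines into plain, untyped XML."""
    module = ET.Element("module")
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match is None:
            continue  # skip lines this simple pattern does not understand
        scalar = ET.SubElement(module, "scalar", title=match.group("label"))
        scalar.text = match.group("value")
    return module


def semantify(module, dictionary):
    """Phase 2: attach dictionary references, units and a data type."""
    for scalar in module.iter("scalar"):
        entry = dictionary.get(scalar.get("title"))
        if entry is None:
            continue  # unknown term: leave it untyped for later review
        dict_ref, units = entry
        scalar.set("dictRef", dict_ref)
        scalar.set("units", units)
        scalar.set("dataType", "xsd:double")
    return module


if __name__ == "__main__":
    tree = semantify(parse_raw_text(RAW_LOG), DICTIONARY)
    print(ET.tostring(tree, encoding="unicode"))
```

Keeping the phases separate means the text-matching step stays specific to one code, while the dictionary lookup is the place where terms from different codes can be mapped onto shared concepts.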
The next phase is the hard grind of writing these parsers and creating dictionaries. But as we do it we gain experience and also generate better tools.
Sam Adams brought me a present from New Mexico. It’s from the only private distillery (http://www.dqdistillery.com/). It’s palatable. But I’m afraid that my family has completely seduced me into the Scotch Malt Whisky Society (http://www.smws.co.uk/).