Several threads come together to confirm we are seeing a change in the external face of scientific computing. Not what goes on inside a program, but what can be seen from the outside. Within simple limits, what goes on inside need not affect what is visible. The natural way now for a program to interface with other programs and with humans is to use a mixture of XML and RDF. XML provides a vocabulary and a simple grammar; RDF provides the logic of the data and application.
The COST D37 group has just met in Berlin (I blogged the last meeting - COST D37 Meeting in Rome). COST is about interoperability in Comp Chem, and it's proceeding by collaborative work to fit XML/CML into FORTRAN programs - at present Dalton and Vamp. We do this by exchange visits paid for by COST, so we are looking forward to having visitors in Cambridge shortly.
It coincided roughly with Toby White's session at NeSC in Edinburgh on how to fit XML/CML into FORTRAN using his FoX library. I look forward to hearing how he got on.
And then, on Friday, we had a group meeting including outside visitors where the theme was RDF. I was very impressed by what the various members of the group had got up to - five or six mini-presentations covering molecular repositories, chemical synthesis, polymers, ontologies, natural language and term extraction. Andrew Walkingshaw showed the power of Golem, which combines XPath with RDF to make a very effective search tool. We are grateful to Talis for making their RDF engine available, and when I have some hard URLs I'll blog how this works.
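To give a flavour of what combining XPath with RDF can mean, here is a minimal sketch in Python. It is not Golem's API and not how the Talis engine works: the molecules, the `ex:computedBy` predicate, the URIs and the `search` helper are all invented for illustration, and the "triple store" is just a list of tuples. The idea is simply that the triples narrow down which documents to look at, and an XPath expression then tests the XML inside each one.

```python
import xml.etree.ElementTree as ET

# Hypothetical CML-like documents keyed by URI. The element names imitate CML,
# but the data and URIs are invented for this sketch.
DOCS = {
    "http://example.org/mol/water": (
        '<molecule id="water"><atomArray>'
        '<atom id="a1" elementType="O"/><atom id="a2" elementType="H"/>'
        '<atom id="a3" elementType="H"/></atomArray></molecule>'
    ),
    "http://example.org/mol/methane": (
        '<molecule id="methane"><atomArray>'
        '<atom id="a1" elementType="C"/><atom id="a2" elementType="H"/>'
        "</atomArray></molecule>"
    ),
    "http://example.org/mol/methanol": (
        '<molecule id="methanol"><atomArray>'
        '<atom id="a1" elementType="C"/><atom id="a2" elementType="O"/>'
        "</atomArray></molecule>"
    ),
}

# Invented (subject, predicate, object) triples standing in for an RDF store.
TRIPLES = [
    ("http://example.org/mol/water", "ex:computedBy", "ex:Dalton"),
    ("http://example.org/mol/methane", "ex:computedBy", "ex:Vamp"),
    ("http://example.org/mol/methanol", "ex:computedBy", "ex:Dalton"),
]


def subjects(triples, predicate, obj):
    """Return every subject that has the given predicate and object."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def search(docs, triples, xpath, predicate, obj):
    """Select documents via the triples, then keep those matching the XPath."""
    hits = []
    for uri in sorted(subjects(triples, predicate, obj)):
        root = ET.fromstring(docs[uri])
        if root.findall(xpath):  # ElementTree supports a limited XPath subset
            hits.append(uri)
    return hits


# Molecules computed by (hypothetical) Dalton runs that contain an oxygen atom.
result = search(DOCS, TRIPLES, ".//atom[@elementType='O']", "ex:computedBy", "ex:Dalton")
print(result)
```

A real setup would use a proper RDF engine and namespaced CML, but the division of labour is the same: RDF says which objects are of interest and how they relate, XPath digs into the structure of each one.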
The main message is that the new technologies work. Certainly well enough to support collections on the order of 100,000 objects with many triples (Andrew had ca. 10 megatriples). We are also making great progress in extracting chemistry out of free text (PDF is still awful, so please let's have Word, or even better XHTML and XML). Or LaTeX. But in any case most of the toolset is now well prototyped. More later...