Microsoft eChemistry Project and molecular repositories
Some of you may have picked up from (e.g.) the Open Grid Forum that Microsoft (Tony Hey, Lee Dirks, Savas Parastatidis) have been collaborating with Carl Lagoze (Cornell) and Herbert van de Sompel (LANL) on bringing together Chemistry and OAI-ORE – the next generation of interoperable repository technology. We are delighted that Microsoft has now agreed to fund this project. When Carl, Lee, Simon Coles (Soton) and I had lunch yesterday, Lee said I could blog this publicly. (There are still contractual details to be settled at various sites.)

In brief: Tony Hey was the architect of the UK eScience programme and then moved to Microsoft Redmond, where he has been developing approaches to Open Science (not sure if this is the correct term, but it gives the idea). For example, the project includes Open Access and permits, indeed encourages, Open Source. Carl and Herbert developed the OAI-PMH protocol, which allows repositories to expose their metadata to harvesters. They have now developed ORE (Object Re-use and Exchange), which sees the future as a large number of interoperating repositories rather than monolithic databases. (I am on the advisory board of ORE.)

There are 7-8 partners in the programme: MS, PubChem, Cornell, LANL, Lee Giles (PSU), Soton, Indiana and Cambridge. This is a really exciting development, as we shall be able to create a number of well-populated molecular repositories with heterogeneous content (everything from crystallography to Wikipedia chemicals, for example). One that we are currently developing is an RDF/CML-based repository of common chemicals – perhaps 5000 – which could serve as an amanuensis for the bench chemist or the undergraduate needing reference material. CrystalEye will be in there as well, and we shall also be "scraping" (ugly word) any material we can legally access. In this way we can hope to see the concept of the World Wide Molecular Matrix start to emerge.
Chemistry eTheses can also be reposited. We are starting to hear of universities that have mandated open theses. Chemical substructure searching across repositories will be an exciting challenge, but we have a number of ideas. We shall have openings here, so if you are interested, let us know. More later, but let me reiterate our thanks to Tony and colleagues.