I have been in Vilnius, LT, for nearly two weeks. I had hoped to blog every day, but have not managed it even once. This is because we are working flat out on developing Open Crystallography (for “small molecules”, i.e. non-macromolecules). I have masses to write (and will do so) but here is the summary:
Much small-molecule crystallography is effectively Closed and certainly not conformant to the OKF’s Open Definition. I’ve written about this several times earlier; in essence, people don’t have easy access to enough data or code. There are a lot of people, not just practising crystallographers, who want to change this. Crystallography is a central science (and this year is recognised as the International Year of Crystallography); it’s used in:
And much more.
Ten years ago Armel Le Bail set up an initiative, the Crystallography Open Database (COD), to collect and store completely Open crystallography. It’s had a lot of support in kind, and some financial support. It now has about 250,000 structures. These are being widely used. Some years ago Armel handed over the direction to Saulius Grazulis (there’s a hacek on the “z”) and I’ve been visiting Saulius and colleagues for 2 weeks.
Independently, Nick Day in our group in Cambridge built an Open Database of structures (“Crystaleye” (CY)). Like so many things (e.g. Figshare) it wasn’t planned as a world-beating database. Nick wanted these structures to validate computational methods, so he thought: why not collect every structure on the web? Then he thought, why not offer them to the world (http://wwmm.ch.cam.ac.uk/crystaleye/ ), and built a system which not only exposed the data but also calculated a huge variety of chemistry. This was possible not only because of the code we had written but also because of the huge contributions of the http://www.blueobelisk.org community. We extensively use CDK, OpenBabel, Jmol, Avogadro and many others. This meant that Crystaleye could display over 10 million computed webpages to allow people to browse and display the chemistry.
I’ve formally shut down my group at Cambridge but continue to be active in chemistry, and it would be a great pity if Crystaleye atrophied and died. Nick put many completely novel features into it. So Saulius and I planned that the two efforts would merge. COD has an emphasis on crystallography and CY’s is on chemistry, so they complement each other well.
In the time here we have tackled:
- Pulling the Crystaleye entries to Vilnius. Of the 250,000, 10,000 were unique to CY, so COD has immediately increased.
- Extracting the major chemistry routines from CY and installing them in COD-CY
- Testing the extraction of chemistry from COD-CY
- Designing novel functionality and display for the web pages
- Expanding the community that COD-CY interacts with in both directions. I’ll write more about this. COD chemistry will be a massive resource for the whole chemical community and the BlueObelisk will contribute hugely to COD-CY.
- Designing and implementing RDF for crystallography
- Turning COD-CY into one of the first small-molecule chemical resources on the Linked Open Data Cloud.
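As a rough illustration of what “RDF for crystallography” could look like, here is a minimal Python sketch that writes one structure entry as N-Triples, a simple line-based RDF serialisation. It is not the COD-CY design: the namespaces, the property names (`formula`, `cell_length_a`) and the entry id are all hypothetical placeholders, and the values are made up.

```python
# Hypothetical namespaces: neither is a real COD or Crystaleye vocabulary.
VOCAB = "http://example.org/crystal-vocab#"
ENTRY = "http://example.org/entry/"
XSD = "http://www.w3.org/2001/XMLSchema#"


def literal(value):
    """Format a Python value as an N-Triples literal."""
    if isinstance(value, float):
        # Typed literal for numeric values such as cell lengths.
        return f'"{value}"^^<{XSD}double>'
    # Plain string literal; escape backslashes first, then double quotes.
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def entry_to_ntriples(entry_id, properties):
    """Return one N-Triples line per property of a single entry."""
    subject = f"<{ENTRY}{entry_id}>"
    lines = []
    # Sort by property name so the output order is reproducible.
    for name, value in sorted(properties.items()):
        lines.append(f"{subject} <{VOCAB}{name}> {literal(value)} .")
    return lines


if __name__ == "__main__":
    # Made-up example entry, for illustration only.
    example = {"formula": "C2 H6 O", "cell_length_a": 10.0}
    for line in entry_to_ntriples("demo-0001", example):
        print(line)
```

Each output line is a subject, predicate and object terminated by a full stop, which is the kind of statement a Linked Open Data resource exposes. A real vocabulary would need agreed, resolvable URIs for the crystallographic terms.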
The group here is wonderful and the potential is huge. We are seeing how Open resources can liberate thought and action in chemistry and crystallography. There’s a commitment to being part of the world community.
I’ve particularly worked with Saulius; we’ve had many days where we have literally hacked from dawn to dusk. Saulius is an ace UNIX hacker and the infrastructure of the COD is very impressive, with a lot of Perl and shell scripts. In contrast, much of the BlueObelisk software is Java and many users run on Windows. So we’ve spent a lot of time making CY tools and JUMBO-converters run on the command line. We’ve cracked the main problems and Saulius can now run Nick’s Crystaleye ideas on the COD server.
Much more later.