Dictated into Arcturus
We've made great progress over the last week on the Green Chain Reaction. Our progress is all being recorded on the Etherpad provided by the Open Knowledge Foundation at http://okfnpad.org/solo10. Have a look! And feel free to add anything - it's very easy.
Each section here will contain an invitation to participate.
Dan and Mark have worked very hard in showing that the system works and documenting what is necessary. I am regularly feeding in new bits of code, and we are very close to having a system which will extract and analyse chemistry from published documents. The test bed is Acta Crystallographica E, which is open, with about 10,000 reactions. The main data will come from patents, and I have been modifying David Jessop's downloader and analyser so that they can be distributed.
We're now looking for computer-savvy volunteers to see if the code can be widely distributed in its current form. Please volunteer by signing up on the Etherpad.
Mat Todd and Jean-Claude Bradley have already contributed material (via their links) and we will soon be analysing this in detail. Mat's posted a number of places where we might get additional content, and some of these will be straightforward as it is clear that the content is Open. However, a number of offers, such as from Chemspider, are formally copyrighted and we will need explicit permission from the "owners" to make them open.
We'd like to hear from anyone who has chemical reaction content that we can extract and that is completely open. We'd particularly like to hear from publishers who would like to take part in this high-profile activity to show the value of open data. And we'd also like to hear from organisations, such as government agencies, who by default make their data open.
Heather Piwowar and I are creating a series of letters to enquire of data providers whether their data is open. I hope to draft the principles today in a PantonPaper, and will blog this probably in the afternoon.
Contributions on describing and taking data as open will be particularly valuable. Feel free to join Heather and me on the Etherpad. Any pointers to existing protocols and manifestos on Open data will be particularly valuable.
We've had some other offers which are much appreciated. These include links to other resources, help with the Green concept, and a lot more. You may have ideas on what is possible in the next few weeks so we'd love to hear them.
Please make other suggestions and offers of help that we may not have thought of.
I'm expecting that by the end of today I will have managed to modularise the parts of David's code which will be used in this project. They will be described in the Etherpad, and reposited in several Bitbucket projects.
Also by the end of today I expect to have drafted the principles of data extraction from scientific documents on websites. Heather and I will work on this so that we can propose these to document providers such as publishers and get their early response.
I also hope to be able to spend some time on creating semantic chemistry definitions that will result from parses. I shall do this on the Etherpad and will welcome contributions.