I’ve now asked the Royal Society of Chemistry for permission to extract factual information from the journals to which Cambridge subscribes. For background for non-chemists, the RSC has supported our research in information mining through funding summer students, and in kind for the Sciborg (EPSRC) and the CheTA (JISC) projects. For example, our Experimental Data Checker (OSCAR2) is hosted on the RSC website and very widely used for checking the quality of chemical papers before and after publication. Chemspider is a novel, volunteer-populated resource for collecting and validating chemical information (http://www.chemspider.com).
We are preparing a response to the Hargreaves report about information mining from scientific publications. As you know, we have developed a world-class set of Open Source tools for chemical information extraction, some of them with your support, for which public thanks!
We are now in the position where we can extract factual chemical information from the full text of articles with high precision and recall (OPSIN accuracy is > 99.5% and recall > 95%) and with great speed and cost-effectiveness. The University of Cambridge is a subscriber to RSC journals and we would like to begin to extract information on a systematic basis for Open scientific research. We don’t need technical help or permission from the RSC. We have copied Cambridge University Library staff.
This mail is to ask your assurance that (a) we can do this without legal/contractual barriers from RSC and (b) we shall not be cut off by RSC robots (unfortunately this happened some years ago). We wish to start immediately to show Hargreaves the benefit of information mining. They have a deadline of 2012-03-21, so we would like your agreement by 2012-03-15. All we require is:
YES: you may mine and publish factual information from RSC journals without additional payment and without restriction from legal and technical barriers.
I hope you can trust me to act responsibly in not violating copyright and in being considerate to your robots. I have set out more details and a non-exhaustive illustration of facts in /pmr/2012/03/04/information-mining-and-hargreaves-i-set-out-the-absolute-rights-for-readers-non-negotiable .
Unfortunately any other reply than YES by 2012-03-15 will be regarded as unacceptable for the purposes of Hargreaves.
You will note that we are also approaching other major publishers of chemistry. Elsevier has already publicly said we can mine their content for research, and we’ll be publishing the facts under an Open licence. This means that Chemspider (Tony Williams copied) can immediately use all this information in the Chemspider resource.
One of the immediate benefits is our collaboration with Mat Todd (Sydney), who is running an Open project for discovering novel antimalarials. The RSC publishes much high-quality research in (for example) its journal “Organic and Biomolecular Chemistry”, and Mat will be able to scan the list of factual compounds and factual data for leads to develop antimalarials.