Interaction with Chemspider

Antony Williams (Chemspiderman) has posted a useful comment on this blog under “I am still DELIGHTED with Chemspider” (May 10th, 2008) [I sometimes have trouble with permalinks to comments]. I’ll pick up some points and reply, but first I should say that there is a loose and hopefully ongoing synergy between our sites, which is not a definite project but which may from time to time turn into one.

ChemSpiderMan Says:
May 13th, 2008 at 5:56 am
Peter, I thank you for the applause regarding our implementation of licensing on ChemSpider. I also acknowledge and accept the apology you have issued publicly to John Wilbanks, ChemSpider and members of the advisory board.

PMR: This episode appears to have had a silver lining in places and I’ll blog that separately under Open Data.

I believe that some good has likely come out of the conversations over the weekend – maybe a little more confusion, maybe a little more clarification (especially around John’s “data in the public domain” comments) and maybe a few more relationships. This latter part is especially of interest to me as we work on creating a community for chemists.
Now to the outcome for ChemSpider. ChemSpider went live in March of last year with a “who knows where it will go” approach. From the moment we went live you have paid attention. However, rarely has this been with any sense of support but, rather, a framework of negativity. You have criticized our science and our intent. You have projected your judgments as truths. I have addressed these judgments many times but rarely with acknowledgment from your side. It has been a lot of work for both of us. To be clear, I have judged your efforts around Open Notebook Science for NMR similarly.

PMR: I am putting the past behind us.

ChemSpider appears to have a center spotlight now in terms of licensing and Open Data. I acknowledge these are significant parts of YOUR agenda and a key part of what you have worked on for many years. I judge your other agendas to be Open Access, Semantic Web and associated technologies. I honor your work in these areas and feel you have contributed and will continue to contribute to the ongoing shifts of Open science prevailing at present. Thank you.
Our agenda for ChemSpider is different. We are building a community for chemists (Notice the recent shift from the original vision “Building a Structure Centric Community for Chemists” as we expand out of structures only.) At present, we are doing what we can to support the needs of chemists researching structure-based information. We are integrating information. We are more than a “linkbase”. We are actively supporting Open Notebook Science. We ARE listening to our users, the community, our collaborators and our advisory group. We have delivered a valuable solution in the past year with no cost to the users, to the tax-payers, with no grants and based on the hard work of a small dedicated team of volunteers only.
[…]I believe that you judge our efforts to be in conflict with those of your WorldWide Molecular Matrix but I doubt that is true.

PMR: I don’t think they are in conflict. The WWMM is not a fixed concept and evolves. Indeed part of its philosophy was that it was a peer-to-peer system with no centre. I still believe that to be true. What unites the components is a shared sense of purpose and a shared technological infrastructure.

I will respond shortly to some historical posts regarding your call for a structure collection for your eChemistry Project with Microsoft. We are willing to help and I am open to a discussion should you wish to collaborate. I am working with the Wikipedia:Chemistry team to build a validated SDF file for the public domain and we can make this available to you.

PMR: The current position is that we would like to collect a set of common compounds which have high-quality data and which are likely to emanate from trusted sources. Wikipedia is one, Pubchem is another, MSDS are another, and there are various other Open web pages. We wish to make sure that this information is consistent (which is not necessarily the same as correct), and this is not easy. We are very happy to use Chemspider as one conduit for this, but as with all sources we have to be sure of the quality. Chemspider adds quality through human checking, and it is valuable to have an audit trail. Some of the sources (such as ICSC MSDS) are much worse than we thought. Although ICSC claims to be “peer-reviewed”, there is a significant percentage of identifiable errors, e.g. molecular masses do not match formulae and formulae do not match names. We hope to assemble information from a number of sources and use RDF to check the consistency (it can generally never check correctness).
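To make the "mass does not match formula" kind of check concrete, here is a minimal sketch in plain Python. It is not the RDF-based approach mentioned above, only an illustration of what "consistent but not necessarily correct" means. The atomic-mass table (a small rounded subset), the function names, the tolerance and the sample records are all assumptions made for this example, and the parser handles only simple formulae without brackets, charges or hydrates.

```python
import re

# Rounded standard atomic masses for a few elements (illustrative subset only).
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "Na": 22.990, "S": 32.06, "Cl": 35.45,
}

def formula_mass(formula):
    """Return the molar mass of a simple formula such as 'C5H5N'."""
    # Reject anything that is not a plain run of element symbols and counts.
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in ATOMIC_MASS:
            raise ValueError(f"element not in table: {symbol!r}")
        total += ATOMIC_MASS[symbol] * (int(count) if count else 1)
    return total

def mass_is_consistent(formula, reported_mass, tolerance=0.05):
    """True if the reported mass agrees with the formula within the tolerance."""
    return abs(formula_mass(formula) - reported_mass) <= tolerance

# Hypothetical records: (name, formula, reported molecular mass).
records = [
    ("pyridine", "C5H5N", 79.10),
    ("benzene", "C6H6", 79.10),  # deliberately wrong mass for the example
]

for name, formula, mass in records:
    if not mass_is_consistent(formula, mass):
        print(f"inconsistent: {name} {formula} reported {mass}, "
              f"computed {formula_mass(formula):.2f}")
```

A record that passes this check is only internally consistent. If a source had both the wrong formula and the matching wrong mass for a given name, the check would pass, which is why consistency is not the same as correctness.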
There is a general problem with robotic aggregation of information – it can highlight agreement or disagreement but it can also introduce noise. Thus Pubchem has a great deal of noise and there is no simple robotic means of removing it. Indeed (though I can’t find it) I think Shannon has a theorem that proves that machines cannot guarantee the correctness of information. In a similar vein, the interpretation of names can often add significant noise, not just because of the ambiguity of interpretation but also through generic use and metonymy. Peter Corbett uses the example of “a pyridine” as almost certainly meaning something other than C5H5N. I’ll probably write more later.
We are preparing our CrystalEye data in a form that can be reused by you and others. It’s harder than it looks, partly because crystallography and chemistry are not the same, and partly because there is no system of unique identifiers in the original data. But Jim has or will have tools to access it.
Best
