What if chemistry data had been open?

When people ask me for examples of why Open Data matters, I always refer them to the Openness of bioscience, or at least those parts close to the Central Dogma (DNA -> RNA -> Protein -> Structure -> Function). All those parts are Open. You can get any information that can conceivably be shoehorned into some formal description (some can't, but most can). Now Cameron Neylon has written a useful review of what we would have missed if the progenitors of bioinformatics had gone down the closed route (it nearly happened at the time of ESTs).

Picture this…



History the first…
[…historical details snipped…] Imagine a world with no GenBank, no PDB, no SwissProt, and no culture, growing out of these, of publicly funded, freely available databases of biological information like BRENDA, KEGG, etc. Would we still be living in the 90s, the 80s, or even the 70s compared to where we have got to?
History the second…

In the second half of the twentieth century synthetic organic chemistry went through an enormous technical revolution. …
There was tremendous excitement as people realised that virtually any molecule could be made, if only the methodology could be figured out. Diseases could be expected to fall as the synthetic methodology was developed to match the advances in biological understanding. The new biological databases were providing huge quantities of information that could aid in the targeting of synthetic approaches. However, it was clear that quality control was critical and that sharing quality control data was going to make a huge difference to the rate of advance. So many new compounds were being generated that it was impossible for anyone to check the quality and accuracy of characterisation data. So, in the early 80s, taking inspiration from the biological community, a coalition of scientists, publishers, government funders, and pharmaceutical companies developed public databases of chemical characterisation data with mandatory deposition policies for any published work. Agreed data formats were a problem, but relatively simple solutions were found quickly enough to solve it….
[…]
OK. Possibly a little utopian, but my point is this. Imagine how far behind we would be without GenBank, PDB, and the culture of publicly available databases that they embedded in the biological sciences. And now imagine how much further ahead chemical biology, organic synthesis, and drug discovery might have been with NMRBank, the Inhibitor Data Bank…

PMR: If only. And what makes it even more poignant is that in the 1970s the AI community developed many of their approaches around chemistry: DENDRAL, LHASA, etc. Years ahead of their time. But most AI relies on real-world knowledge, and the chemists kept that knowledge closed and starved these efforts.
Still, we now know a lot about what does and doesn't work in CompSci. So as we start to prise chemistry data out of the silos, we should be able to move very quickly…
