“Chemical software will be Open Source”
This statement expresses both a simple truth (Simple Future, see WP) and an aspiration (Coloured Future – Software shall be free). The latter is what I have been advocating on this blog – the moral, pragmatic, utilitarian value of Open Source. The former simply states that it will happen. IOW a betting person could lay a wager.
This post is simply to convince you that the simple future is inevitable. I’ve made this claim before and been taken to task by the Closed Source chemical software manufacturers. “Of course Open Source can’t be as good as us, of course volunteers can’t coordinate, of course you don’t have the developers”. So why am I so confident?
There is a great deal of chemical software – it ranges from quantum mechanics, through molecular mechanics, to docking, property calculations, “QSAR”, analytical support (instruments, data), and chemical informatics. I’m addressing just the last of these in detail and I agree that CompChem may be the slowest to change. But it will.
What are the forces?
The expectation in the community that software will be free (gratis, as in beer), combined with diminished budgets
The requirement of science that methodology should be Open and repeatable, and the necessity of justifying one’s computed conclusions. “Trust us, you have paid us a lot of money” no longer works
The Open movement in general
The growing realisation (though not the reality yet) that software development should be an activity worthy of “publication” metrics
The increasing complexity of deployed systems, meaning that SME manufacturers simply cannot maintain such a diversity of unique products.
And the evangelism of the major information manufacturers (IBM, Google and now Microsoft) that their products will increasingly be Open Source and that they will benefit from this division.
And there is a particular aspect to “Chemoinformatics” – the software that supports the management of chemical compounds, reactions and their measured and computed properties:
There have been no new developments in the last decade
What I mean by this is that no new algorithms or information management strategies have come out of the commercial chemoinformatics manufacturers. Chemical search, heuristic properties and fingerprints, and molecule docking are “solved” problems. Advances come from packaging, integration and parameter_tweaking/machine_learning. Only the last adds to science, and since the commercial manufacturers are secretive we can’t measure this (and I believe this to be mainly pseudoscience in its practice – you can make extravagant plans without independent assessment). So the advances from the manufacturers have been in engineering – ease of use, deployability, interoperation with third-party software – but not in functionality.
So the Open Source community – the Blue Obelisk – is catching up. I believe that OSCAR is already the best chemical language processing tool, that OPSIN will soon be as good as any commercial name2structure parser and that OSRA will do the same for chemical images.
KNIME and Taverna are becoming de facto workflow tools and will continue to develop. And there are many other OS tools, such as R and Weka, that are being integrated.
And when the Open Source components catch up with their commercial rivals, the community will switch. Not just academia but the pharma and chemical industries.
Because the growing community round each tool will mean that the tools are better. The science is better. No commercial company can accurately claim metrics for their software as there is no current way of measuring this.
So what role is there for the commercial sector?
A different but enormously exciting one. Where the companies provide the integration of Open Source components. Academics are not paid to integrate – companies are. Where the open deployment of components is a service worth paying for. Where the tools start to produce better science and information that can be managed – Openly – better than before. How many of us have contributed to ClosedChem property calculators? Probably only those whose system was purchased and then closed. How many of us contribute to ClosedName2Structure as opposed to OPSIN? Who would publish a bug from ClosedTheoChem, whose lawyers will send you a letter the next day (probably revoking your licence)? That’s a true case. And that’s science??
Last time I published claims of this sort I was challenged and responded – I hope – fairly. I obviously cannot review the science in closed source programs as I have to pay for them and might be sued if I benchmarked them. So it’s up to the commercial sector to justify their existence. If they make a well-argued case I might even change my analysis.