Chemical Open Source will win

Chemical software will be Open Source

This statement expresses both a simple truth (Simple Future, see WP) and an aspiration (Coloured Future: “Software shall be free”). The latter is what I have been advocating on this blog: the moral, pragmatic and utilitarian value of Open Source. The former simply states that it will happen. IOW, a betting person could lay a wager.

This post is simply to convince you that the simple future is inevitable. I’ve made this claim before and been taken to task by the Closed Source chemical software manufacturers: “Of course Open Source can’t be as good as us; of course volunteers can’t coordinate; of course you don’t have the developers.” So why am I so confident?

There is a great deal of chemical software: it ranges from quantum mechanics, through molecular mechanics, to docking, property calculations, QSAR, analytical support (instruments, data) and chemical informatics. I’m addressing just the last of these in detail, and I agree that CompChem may be the slowest to change. But it will.

What are the forces?

  • The expectation in the community that software will be free (gratis, as in beer), together with diminished budgets

  • The requirement of science that methodology should be Open and repeatable, and the necessity of justifying one’s computed conclusions. “Trust us, you have paid us a lot of money” no longer works

  • The Open movement in general

  • The growing realisation (though not the reality yet) that software development should be an activity worthy of publication metrics

  • The increasing complexity of deployed systems, meaning that SME manufacturers simply cannot maintain such a diversity of unique products.

  • And the evangelism of the major information manufacturers (IBM, Google and now Microsoft) that their products will increasingly be Open Source and that they will benefit from this decision.

And there is a particular aspect to Chemoinformatics – the software that supports the management of chemical compounds, reactions and their measured and computed properties:

There have been no new developments in the last decade

What I mean by this is that no new algorithms or information management strategies have come out of the commercial chemoinformatics manufacturers. Chemical search, heuristic properties and fingerprints, and molecule docking are solved problems. Any advance comes from packaging, integration and parameter tweaking/machine learning. Only the last adds to science, and since the commercial manufacturers are secretive we can’t measure it (and I believe it to be mainly pseudoscience in practice: you can make extravagant claims without independent assessment). So the advances from the manufacturers have been in engineering (ease of use, deployability, interoperation with third-party software) but not in functionality.

So the Open Source community, the Blue Obelisk, is catching up. I believe that OSCAR is already the best chemical language processing tool, that OPSIN will soon be as good as any commercial name2structure parser and that OSRA will do the same for chemical images.

KNIME and Taverna are becoming the de facto workflow tools and will continue to develop. And there are many other OS tools, such as R and Weka, that are being integrated.

And when the Open Source components catch up with their commercial rivals, the community will switch: not just academia but pharma and the chemical industry.

Because the growing community around each tool will make the tools better. The science will be better. No commercial company can accurately claim metrics for its software, as there is currently no way of measuring this.

So what role is there for the commercial sector?

A different but enormously exciting one: where the companies provide the integration of Open Source components (academics are not paid to integrate; companies are); where the open deployment of components is a service worth paying for; where the tools start to produce better science, and information that can be managed Openly, better than before. How many of us have contributed to ClosedChem property calculators? Probably only those whose system was purchased and then closed. How many of us contribute to ClosedName2Structure as opposed to OPSIN? Who would publish a bug in ClosedTheoChem, whose lawyers will send you a letter the next day (probably revoking your licence)? That’s a true case. And that’s science??

Last time I published claims of this sort I was challenged, and I responded, I hope fairly. I obviously cannot review the science in closed-source programs, as I have to pay for them and might be sued if I benchmarked them. So it’s up to the commercial sector to justify its existence. If they make a well-argued case I might even change my analysis.

5 Responses to Chemical Open Source will win

  2. Peter…good timing on this post as I just read: With your involvement with CDK, Blue Obelisk, CML, knowledge of Knime, Taverna and so on it was interesting to see the selection of IDBS E-notebook over maybe Bioclipse and extensions/integrations to other Open tools. My experiences with IDBS have been positive and what I know of the platform is good. I saw the comment “It is a leader in semantic chemistry technologies and in the use and development of Chemical Mark-Up Language (CML), which represents the emerging global lexicon of chemical computing. The use and ongoing development of IDBS’ chemical technology platform through this R&D collaboration will encompass many of these important new areas, including new approaches to structure authoring, drawing and representation, annotation, context-rich chemical ontologies, polymer support and the application of the electronic workbook environment as a tool to enhance research excellence. ” and look forward to seeing a commercial partner integrating some of the Unilever/Cambridge tools.
    Your post seems to suggest that your choice would be different and in a few years commercial ELNs such as IDBS’ will be irrelevant? Why the choice to purchase rather than innovate? Based on our experiences of some of the Open Source cheminformatics tools, the commercial vendors have many years’ advantage at present.

    • pm286 says:

      @ChemSpiderman Thanks.
      I differentiate between components and integrations. In general integrations are done for specific communities and purposes and I regard an ELN as an integration. Such things are often better done in the commercial sector as I hope I made clear. What I was challenging was closed source scientific components, not commercial integrations.
      Also – in case it’s unclear – an ELN and a repository are different and complementary.

  3. Based on your comment I am definitely interested to hear what components you will be integrating into the ELN. I assume that you will integrate OPSIN for name to structure, CDK maybe for generation of properties/descriptors, JMOL for 3D viewing, NMRShiftDB for NMR prediction etc. This would make sense…

    • pm286 says:

      @ChemSpiderman I can’t reveal details of the project at present and it’s probably not worth trying to guess…

