Quality is emerging in chemical software

An unplanned but very useful discussion on software quality has developed. In response to a remark of mine that there was no tradition of quality in the chemical software industry, Zsolt Zsoldos (ZZ) has responded carefully and at length, and I'll try to answer equally carefully. The discussion pings between the two blogs, but I'll copy major chunks here.
Before I start I'll re-emphasize what I said and make clear what I didn't say. I believe the chemical software industry is useful, produces useful products, and that customers value those products. What concerns me is the lack of public attention to quality.
I'm also talking only about chemistry. The debate has been widened to word processing and document formats. I'm not involved in developing these (though I work with them), and the general intention is that the work that my colleagues and I do is Open. I do not have a religious aversion to closed-source products, in word processing or in chemistry, but my point is that the quality of closed products may be difficult to measure.
By a tradition of quality I mean a communal understanding that quality matters. Quality is a broad term, and it is difficult to discuss unless it is measured.

ZZ: Quality in chemical software – the debate continues

Peter Murray Rust has responded to my previous blog post and has raised some important points to which I have to respond; see my comments section by section:

Quality in chemical software – a debate

PMR: The SimBioSys Blog has replied to my post about unit testing in a long and thoughtful post. I don't know who the individual author is, but the company sells a number of chemical software packages, many of which I recognize from Peter Johnson's research group at Leeds.
Let me introduce myself: I am Zsolt Zsoldos, Chief Scientific/Technical Officer at SimBioSys. As Peter MR has recognised correctly, some of the software we market was developed in Peter Johnson's research group at Leeds, including the Sprout de novo design software, which was my PhD project; Peter Johnson was my supervisor and is now a scientific adviser and a director on the board of SimBioSys. A number of publications listed here cover my post-PhD work at SimBioSys, as well as various presentations I have given at conferences, just to give some background on my work.
PMR:
I'm confining my remarks to "chemoinformatics" software. I exclude quantum mechanics programs (which take considerable care to publish results and test against competitors) and instrumental software (such as that for crystal structure determination and NMR). Any software which comes up against reality has to make sure it gets the right answers as far as possible. But chemoinformatics largely computes non-observables.
Reproducibility of results and robustness are not the whole story of quality. Tens of thousands of docking and QSAR studies are done each year, and many of them are published. Are they reproducible? I expect that if a different researcher in a different institution with different software ran the "same" calculation they would get different results.
I fail to see how the "tens of thousands" of docking studies can be considered to compute "non-observables" when we have tens of thousands of X-ray crystal structures to compare against. How is that less of a reality to come up against than quantum mechanics? There are experimentally measured binding affinities to compare scoring results against. What better metric does QM have? There is no exact mathematical solution of the Schrödinger equation for these systems, so all QM software computes approximations, and there is no absolute benchmark to compare against because we cannot compute the exact solutions.

PMR: ZZ addresses this below in reporting a competition, and I'll continue there.

Are the docking and QSAR study results reproducible? With eHiTS and LASSO, the answer is definitely YES! I understand that many tools on the docking/QSAR market use stochastic (read: random) methods and therefore their results are inherently unreproducible. Again, I can only speak with authority about our own software, which uses strictly deterministic and reproducible techniques. So if a different researcher in a different location runs our software on the same input, they will get the same result. However, I do not see how one could run the "same calculation" using different software. By definition, if you are using different software (which embodies the calculation) then you are not running the same calculation. I can assure you the same is true for QM software as well, for the simple floating-point error reasons I explained in a previous blog post. Any different QM implementation will necessarily perform its computation steps in a different order (something as simple as summing in a different order will suffice) and will therefore get slightly different results.
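A minimal sketch of the summation-order point, in illustrative Python (any language and any numerical code shows the same effect at some level of precision):

```python
# Summing the same floating-point numbers in different orders can give
# slightly different totals, so two correct implementations of the "same"
# calculation may disagree in the last few bits.
import random

random.seed(42)
values = [random.uniform(-1e6, 1e6) for _ in range(100000)]

forward = sum(values)                        # original order
reverse = sum(reversed(values))              # reversed order
by_magnitude = sum(sorted(values, key=abs))  # smallest magnitudes first

print(forward, reverse, by_magnitude)
print("max spread:",
      max(forward, reverse, by_magnitude) - min(forward, reverse, by_magnitude))
```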

PMR: Leaving aside the stochastic aspect – which we agree on (and which makes quality assessment much harder) – my concern is not whether a given calculation is reproducible when confined to one manufacturer's platform, but whether the results have been assessed as meaningful. Now I agree that this is not easy, but unless the manufacturers develop interoperable standards, the quality of the result is only assessable by public assessment, requiring standard data sets and standard results. I gave the example of "(total) polar surface area", which should, in principle, be computable reproducibly by all manufacturers. But only if it is defined in a manner that all agree upon. Otherwise we have as many different values as there are manufacturers. And I would contend that – unless each has a clear definition of the algorithm and the property calculated – this is a lack of quality.
As an example from another field, there is a standard way for all organizations to calculate their carbon footprint – AMEE. The same should be true for polar surface area.
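A minimal sketch of what such a shared check might look like, assuming the open-source RDKit toolkit is available; the reference table and tolerance below are hypothetical illustrations, not an accepted standard:

```python
# Compare a toolkit's polar surface area against a hypothetical "gold standard"
# reference table agreed by the community (illustrative values only).
from rdkit import Chem
from rdkit.Chem import Descriptors

reference_tpsa = {                       # SMILES -> agreed TPSA in Å² (hypothetical)
    "c1ccccc1O": 20.23,                  # phenol
    "CC(=O)Oc1ccccc1C(=O)O": 63.60,      # aspirin
}
TOLERANCE = 0.01                         # agreed numerical tolerance, Å²

for smiles, expected in reference_tpsa.items():
    mol = Chem.MolFromSmiles(smiles)
    computed = Descriptors.TPSA(mol)     # topological PSA as implemented in RDKit
    status = "OK" if abs(computed - expected) <= TOLERANCE else "MISMATCH"
    print(f"{smiles:25s} expected {expected:6.2f}  computed {computed:6.2f}  {status}")
```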

PMR:
Which manufacturers publish the source code of their algorithms? Without this the user depends completely on trust in the manufacturer.
Hmmm, very good point. Let me see, does Microsoft publish their source code? No. Then why do they have over 95% market share? They must be very trustworthy, right? Then why are they facing anti-trust trials in the US, Europe and Japan? Perhaps my example is off-topic and off-target, since PMR advocates open source over closed proprietary software and standards, like OpenOffice over MS Office and ODF over OOXML? Nope, those links prove the exact opposite, with statements like:
PMR:
The reason I currently like OOXML is that we can make it work and that we have material in Word that we can use.

My worry about Open Office (which emits ODT) is that I don’t yet believe that has reached a state where I could evangelize it without it falling over or being too difficult to install.

PMR: This example is out of scope, if only because we are talking about computational software. But I can and will answer the other points – briefly here, and in more detail later. We work with Word because it is the only useful source of chemical documents. If we find we can reliably convert OOXML to ODT we certainly will – we have some funding in that area. And since we started, MS now have a plugin which is described as emitting ODF. We shall certainly see how it behaves.

So, let's just agree that being open source does not automatically guarantee good quality and that, on the other hand, it is also possible to have good-quality software that is proprietary. I definitely see and acknowledge the quality values of open source, but in my opinion the open-source model requires a critical mass (in terms of numbers of developers and users) to achieve the "many eyes make all bugs shallow" state of Linux. Whether the user and developer base has reached that level for chemistry software is an interesting question, worthy of its own debate. Let's continue with our current debate:

PMR: I have not claimed – and will not claim – that the Open Source movement in chemistry is of higher quality than closed source. I said there was no tradition of quality. As a result of your post I will moderate that statement slightly.

PMR:
Many communities have annual software and data competitions. They use standard data sets, and different groups have to predict observables; examples are protein structure prediction and crystal structure prediction. In text-mining and information retrieval there are major competitions. They rely on standard data sets ("gold standards") against which everyone can test their software.
But in chemical software such standards are rare. If companies feel strongly about quality they should be doing something publicly: developing test cases, collaborating on the publication of Open Standard data, creating Gold Standards, and developing Ontologies – if we don't agree on what quantity we are calculating, then we are likely to get different answers.
Yes, indeed, many communities have annual software competitions, including the docking community: for example, the SAMPL competition by OpenEye, which Bio-IT World has reported on, or the CASP docking competition as published by Lang et al., J. Biomol. Screen. 2005; 10: 649-652. As for standard benchmarking data, how about the GOLD validation set, or the more recent Astex diverse validation set, specifically designed to be a high-quality benchmark set for docking, published as:

    Diverse, High-Quality Test Set for the Validation of Protein-Ligand Docking Performance.
    M. J. Hartshorn, M. L. Verdonk, G. Chessari, S. C. Brewerton, W. T. M. Mooij, P. N. Mortenson, C. W. Murray
    J. Med. Chem., 50, 726-741, 2007.
    [DOI:10.1021/jm061277y]

For binding energy estimation we have the PDB-bind database, and for enrichment studies the DUD data set at docking.org. As for community-based collaboration, I personally participated (among many others from industry and academia) in the eChemInfo "Virtual screening and docking – comparative methodology and best practice" workshop last year at Bryn Mawr College, Philadelphia. A recent special issue of the Journal of Computer-Aided Molecular Design (Vol. 22, No. 3-4, March/April 2008, pp. 131-266) has been devoted to "Recommendations for Evaluation of Computational Methods for Docking and Ligand-based Modeling". As these links demonstrate, it is unfair to say that standards, public data and collaboration do not exist in this area.
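For readers unfamiliar with the enrichment studies mentioned above, here is a minimal sketch of the metric typically computed on benchmark sets such as DUD; the ranked list below is made-up illustrative data, not a real screen:

```python
# Enrichment factor: how much better a docking scorer ranks known actives
# than random selection would, at a given fraction of the screened database.

def enrichment_factor(ranked_labels, fraction):
    """ranked_labels: 1 = active, 0 = decoy, ordered best score first."""
    n_total = len(ranked_labels)
    n_actives = sum(ranked_labels)
    n_top = max(1, round(fraction * n_total))
    hits_in_top = sum(ranked_labels[:n_top])
    # EF = (hit rate in the top x%) / (hit rate over the whole database)
    return (hits_in_top / n_top) / (n_actives / n_total)

# Illustrative ranked list from a hypothetical screen of 5 actives and 15 decoys
ranked = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
print("EF@10% =", enrichment_factor(ranked, 0.10))   # 4.0: four-fold enrichment
print("EF@25% =", enrichment_factor(ranked, 0.25))   # 2.4
```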

PMR: I agree with this, but note that many of these are very recent. So I would be prepared to say that in certain fields a tradition of quality metrics is starting to emerge. Almost all of these relate to docking into proteins and are driven, at least in part, by the tradition of competitions in protein science such as CASP, which has for many years been involved in predicting protein structure.
So I wish them well and will now exclude docking (but not QSAR) from my remarks. When there is a competition in QSAR, with open datasets, open descriptors and open algorithms (at least to the extent that it is in principle possible for a third party to implement them), then I will happily accept that quality has been addressed.
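A minimal sketch of how an entry in such a QSAR competition could be scored by any third party against an open test set of experimental values; the numbers below are made-up illustrative data:

```python
# Score a submitted set of QSAR predictions against open experimental values
# using two agreed metrics: RMSE and the coefficient of determination (R^2).
import math

experimental = [5.2, 6.8, 4.1, 7.3, 5.9, 6.2]   # e.g. measured pIC50 values
predicted    = [5.5, 6.4, 4.6, 7.0, 6.3, 5.8]   # one participant's predictions

n = len(experimental)
rmse = math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / n)

mean_exp = sum(experimental) / n
ss_res = sum((e - p) ** 2 for e, p in zip(experimental, predicted))
ss_tot = sum((e - mean_exp) ** 2 for e in experimental)
r2 = 1.0 - ss_res / ss_tot

print(f"RMSE = {rmse:.3f}   R^2 = {r2:.3f}")
```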



