It is very uncommon for commercial organizations in chemoinformatics to make any of their material Open Source (unlike many IT companies, which have contributed projects such as Eclipse and NetBeans). So I was very pleased to see an announcement of Open Source [BSD] chemoinformatics software on the Blue Obelisk list:
SiMath is Silicos’ open source library for the manipulation of data matrices and the subsequent mathematical and statistical modeling. SiMath tries to provide a simple C++ API that is data matrix centered and that covers the model building procedure from data preprocessing to training and evaluation of the model. The goal is to provide a library that can be easily integrated into standalone applications.
The rationale of SiMath is not to invent the wheel again but to integrate available open source packages and also newly implemented algorithms into one comprehensive library. Several well established libraries exist nowadays, but they all have a different interface and work with their own matrix representation. These tools are incorporated into SiMath and adapted such that their interface is consistent over all tools. For instance, all clustering algorithms are initiated by defining a set of parameters and the actual clustering is done by calling the cluster method with the data matrix.
Currently, SiMath contains modules for PCA (or SVD), matrix discretisation, SVM training and evaluation, several clustering algorithms, self-organizing map and several general mathematical utilities.
More information about SiMath and how to download the source code can be found at: http://www.silicos.com/simath.html
Silicos is a chemoinformatics-based biotechnology company empowering virtual screening technologies for the discovery of novel compounds in a variety of disease areas.
This makes sense. The technology here is common to many applications and, as Hans De Winter says, it is foolish to reinvent the wheel. These are exactly the sort of components we need in the discipline. Because they are in C++ and many of us use Java, it may make sense to expose them as Web services (REST), as the message overhead is likely to be smaller than the computational cost.
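To make the pattern from the announcement concrete (define a set of parameters, then call the cluster method with the data matrix), here is a minimal C++ sketch. I have not checked it against the SiMath headers: the names `Matrix`, `KMeansParameters` and `KMeans` are invented for illustration, and the toy k-means simply stands in for whichever clustering algorithms the library actually provides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical types for illustration only; NOT the real SiMath API.
using Matrix = std::vector<std::vector<double>>;  // rows = samples, columns = variables

struct KMeansParameters {
    std::size_t nClusters = 2;
    std::size_t maxIterations = 100;
};

class KMeans {
public:
    // Step 1: the algorithm is set up from a parameter set.
    explicit KMeans(const KMeansParameters& p) : params_(p) {}

    // Step 2: clustering is done by calling cluster() with the data matrix.
    // Returns one cluster label per row.
    std::vector<std::size_t> cluster(const Matrix& data) const {
        const std::size_t n = data.size();
        std::vector<std::size_t> labels(n, 0);
        if (n == 0 || params_.nClusters == 0) return labels;
        const std::size_t k = std::min(params_.nClusters, n);
        const std::size_t dim = data[0].size();
        for (const auto& row : data) assert(row.size() == dim);  // rectangular matrix

        // Deterministic start: the first k rows act as initial centroids.
        Matrix centroids(data.begin(), data.begin() + static_cast<std::ptrdiff_t>(k));

        for (std::size_t iter = 0; iter < params_.maxIterations; ++iter) {
            bool changed = false;
            // Assignment step: nearest centroid by squared Euclidean distance.
            for (std::size_t i = 0; i < n; ++i) {
                std::size_t best = 0;
                double bestDist = std::numeric_limits<double>::max();
                for (std::size_t c = 0; c < k; ++c) {
                    double d2 = 0.0;
                    for (std::size_t j = 0; j < dim; ++j) {
                        const double diff = data[i][j] - centroids[c][j];
                        d2 += diff * diff;
                    }
                    if (d2 < bestDist) { bestDist = d2; best = c; }
                }
                if (labels[i] != best) { labels[i] = best; changed = true; }
            }
            // Update step: move each non-empty centroid to the mean of its members.
            Matrix sums(k, std::vector<double>(dim, 0.0));
            std::vector<std::size_t> counts(k, 0);
            for (std::size_t i = 0; i < n; ++i) {
                ++counts[labels[i]];
                for (std::size_t j = 0; j < dim; ++j) sums[labels[i]][j] += data[i][j];
            }
            for (std::size_t c = 0; c < k; ++c) {
                if (counts[c] == 0) continue;
                for (std::size_t j = 0; j < dim; ++j)
                    centroids[c][j] = sums[c][j] / static_cast<double>(counts[c]);
            }
            // Stop once assignments are stable (never on the very first pass).
            if (!changed && iter > 0) break;
        }
        return labels;
    }

private:
    KMeansParameters params_;
};
```

A caller would fill in a `KMeansParameters`, construct `KMeans` from it and pass a `Matrix` to `cluster()`. Any other algorithm following the same two-step shape could then be swapped in without changing the calling code, which is the kind of consistency the announcement describes and also what would make a thin web-service wrapper straightforward.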
The Blue Obelisk mantra (Open Data, Open Source, Open Standards) welcomes contributions in any of these areas.
Having computational backends in the form of web services (WSs) is certainly handy. Something I recently set up at IU was a variety of WSs using R as the underlying computational engine: regression, classification, clustering, plots etc. The available services are listed at
http://www.chembiogrid.org/wiki/index.php/Web_Service_Infrastructure
(somewhere in the middle of the page)