Alpha and beta

From the discussion on ChemZoo:

  1. Antony Williams Says:
    April 25th, 2007 at 3:00 pm
    Hi…banana-biter from ChemZoo here… one more at the chimps’ tea party.
    ChemSpider uses third party components for the generation of certain properties. Passing over 10 million compounds through has shown a number of issues in the nature of the dataset and the applicability of the third party components to the diversity of the dataset. Feedback has been provided and the issues already addressed but passing 10 million structures through a series of prediction algorithms is not undertaken lightly and therefore will be performed in the near future.
    The system was released in beta form. Known bugs are posted at http://www.chemspider.com/KnownBugs.aspx. As commented on the website at this page “We know that more bugs will be identified based on the testing of our users and the fastest way to receive real stress testing is to make the system public and ask for feedback….and where necessary, deal with the fallout and potential mocking. We encourage you to report bugs to us, as you find them, at bugs@chemspider.com.”
    So, thanks for the feedback. The monkeys will get back to our keyboards and address the feedback. We’d welcome the feedback through our feedback page rather than via blogs. That said…you have EVERY right to be concerned about quality..we are.
    By the way, we believe that Wikipedia is a very valuable resource. That’s why we have linked up the synonyms from ChemSpider out to Wikipedia. Repeat your search on Calcium Carbonate and click on Caltrate for example to get http://en.wikipedia.org/wiki/Caltrate.
    Clearly this type of linkage also throws up errors in the linkage to Wikipedia and we’ll be optimizing shortly. In order to help build the public curation process we have put up a “Help Curate Data” link. This was always our intention…to create a chemical community around the chemical structure database…and your posting simply accelerated it. Now we hope that we’ll be joined by more monkeys on typewriters in the ChemZoo…actually, the preference is flies on the Spider web!
    By the way, in trying to get further connection to Wikipedia there is work being done right now to enable the ChemSketch drawing package to export PNG files with embedded InChI (and why not other structure data…formula, formula weight..even logP) and then searchable directly from a drawing package or through ChemSpider. http://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Chemistry/Structure_drawing_workgroup
    As well you know with your work on the Worldwide Molecular Matrix, not everything that goes live first day is perfect…and it’s why we have declared beta.

==== PMR ====
I leave it to readers to decide whether this is alpha or beta.
There is an implication in the replies that the community is invited to help clean up ChemZoo. Now ChemZoo is a commercial company, the data are not open (at least not explicitly – can anyone download the whole database, and if so what license covers it?) and the software is not open. The community is expected to do the testing for the company (I see little sign from the known bugs that the alpha test was very strict – chemical formula searches were case-insensitive and order-sensitive, for example). A great concern about “free services” is that the companies can switch them off at a moment’s notice – this happened with Chemfinder and caused serious problems for anyone who had built – say – educational resources round it. There is not, and cannot be, any promise that ChemZoo will not close its doors. In contrast PubChem is open – anyone, like us, can download the whole data.
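
To illustrate the formula-search point: element symbols are case-sensitive (Co is cobalt, CO is carbon monoxide), whereas the order in which the elements are written should not affect a match. The following is a minimal Python sketch of a comparison with those two properties. The names `parse_formula` and `same_formula` are hypothetical, the code handles only flat formulas (no brackets, charges or hydrates), and it does not check symbols against the periodic table. It is an illustration of the idea, not a description of how ChemSpider or any other system implements formula search.

```python
import re
from collections import Counter

# One element symbol (an uppercase letter, optionally followed by one
# lowercase letter) and an optional integer count.
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Return element counts for a flat formula such as 'CaCO3'.

    Case is significant: 'Co' is one cobalt, 'CO' is carbon plus oxygen.
    """
    counts = Counter()
    pos = 0
    while pos < len(formula):
        match = _TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        symbol, number = match.groups()
        counts[symbol] += int(number) if number else 1
        pos = match.end()
    return counts


def same_formula(a: str, b: str) -> bool:
    """Compare two formulas by composition, ignoring the order of elements."""
    return parse_formula(a) == parse_formula(b)
```

Under these assumptions `same_formula("CaCO3", "CO3Ca")` is true because both strings have the same element counts, while `same_formula("Co", "CO")` is false because case distinguishes the symbols.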

“As well you know with your work on the Worldwide Molecular Matrix, not everything that goes live first day is perfect…and it’s why we have declared beta.”

The WWMM was – and is – a research project. We have very carefully reported our protocol and our metrics in peer-reviewed publications (not, unfortunately, Open Access). A central tenet of our work is that nothing is perfect and that it is important to measure the errors. Nick Day has done an excellent job of this in CrystalEye.


3 Responses to Alpha and beta

  1. It’s interesting to see the dialogue that’s emerging here regarding the ChemSpider database. One related point regarding the open nature of PubChem is that, just as with any commercial system, those in control of the purse strings and the computers running the system could remove the guarantees of access from individuals, institutions, or indeed countries, on a political wind of change.
    One could imagine bizarre homeland security issues arising through which access to PubChem is closed to whole states. Think of all those hazardous chemicals cited in the database, for instance. Economic sanctions are readily imposed on whole countries by states in the name of politics; it is easy to imagine scientists in allegedly rogue states being summarily denied access on a politician’s say-so.
    More worryingly perhaps, the database itself could be “repositioned” in the name of a fundamentalist point of view that regards scientific knowledge as somehow abhorrent. We have already witnessed the problems facing science teachers in attempting to discuss evolution rationally in certain parts of the world.
    Of course, these are unlikely scenarios, but then improbable does not mean impossible.
    db

  2. pm286 says:

    1. There is a fundamental difference between a commercial closed system like ChemSpider and an Open system like PubChem. PubChem can be – and is frequently – downloaded in toto. If the US site were switched off then there would be many copies that could be used to continue. (Of course they wouldn’t continue to get the bio data.) OTOH if ChemSpider were switched off then there would be no copies.

  3. The curation process on ChemSpider has already started. Visit http://www.chemspider.com/RecordView.aspx?id=4911265 for an example. This issue is the SAME as that shown for Calcium Carbonate. The mass is calculated for the “primary component” only. This particular bug is noted at the Known Bugs page http://www.chemspider.com/KnownBugs.aspx.
    The bug is fixed but calculations have to be performed for over 10 million compounds.
    The intention is to pass curated data back to the data source providers.
