David Bradley alerted me to ChemSpider, an engine which scrapes the web for information on chemical compounds and calculates their properties. I blogged yesterday about what it did for sodium chloride.
I am slightly sorry to do this as I have had some acquaintance with the people involved, but I cannot let garbage science go uncommented. That is what peer review is about. And it can be painful.
From the ChemSpider website:
ACD/Labs Software Integrated into ChemSpider Service
Partnership brings predicted logP property values and systematic nomenclature identifiers to over 10 million chemical structures online, to support ChemZoo’s vision of creating a chemical community.
RALEIGH, North Carolina, and TORONTO, Canada, April 10, 2007—ChemZoo and Advanced Chemistry Development, Inc., (ACD/Labs) announced a collaboration that will allow integration of a number of ACD/Labs software tools to the ChemSpider service, a new online chemistry database and property prediction service provider. ACD/Labs properties will be generated and published for over 10 million chemical structures using some components of the ACD/Labs PhysChem and Nomenclature software suites.
From what I can see the spider scrapes information and passes it to the Zoo. The Zoo is filled with monkeys. (The same monkeys who are trying to write Shakespeare by hitting typewriter keys at random.) The monkeys seem to be using ACD/Labs software to calculate properties. ACD/Labs software has been around for several years and I suppose it gets some answers right, but it doesn't do very well on calcium carbonate. Here's the entry:
0953 Chemical Structure CCaO3 Molecular Weight 62.02 logP -0.809 hydrogen bond donors 2
Well, they got the chemical formula right. Calcium carbonate is marble, limestone, chalk. It’s hard and doesn’t dissolve in water. You calculate its molecular weight as follows:
Ca=40 + C=12 + 3 (O=16)
and your child will tell you that it comes to 100. (I've rounded the atomic weights to whole numbers.) The monkeys only get to 62 before they give up. I have no idea how they get this. The monkeys also tell us how many hydrogen atoms can be used to bind to other molecules. I can't see any hydrogen atoms in CCaO3, nor can you, but the monkeys found 2.
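For anyone who wants to check the sum without a child to hand, here is a minimal sketch in Python. The atomic weights are standard values rounded to three decimal places, and the names `ATOMIC_WEIGHT` and `molecular_weight` are my own, made up for illustration; this is not how ChemSpider or ACD/Labs do it.

```python
# Standard atomic weights in g/mol, rounded to three decimals (assumed values).
ATOMIC_WEIGHT = {"H": 1.008, "C": 12.011, "O": 15.999, "Ca": 40.078}


def molecular_weight(composition):
    """Sum atomic weight times atom count over every element in the formula."""
    return sum(ATOMIC_WEIGHT[element] * count
               for element, count in composition.items())


# Calcium carbonate, CCaO3: one Ca, one C, three O.
calcium_carbonate = {"Ca": 1, "C": 1, "O": 3}

mw = molecular_weight(calcium_carbonate)
hydrogens = calcium_carbonate.get("H", 0)

print(f"Molecular weight: {mw:.2f}")   # about 100.09, not 62.02
print(f"Hydrogen atoms: {hydrogens}")  # 0, so nothing to donate
```

With the decimal places included the answer is about 100.09, and the formula contains no hydrogen at all, so any non-zero count of hydrogen bond donors cannot come from this structure.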
Why am I so angry about this? Because NIST and NMRShiftDB and Joe Townsend and Nick Day work very hard at calculating molecular properties. Volker Thome in our lab has spent years calculating the properties of calcite. They are trying to show that data quality matters. The ChemZoo monkeys are destroying the value of chemical data.
There are ways of calculating molecular properties properly – it’s hard work, takes care and only applies to certain compounds. It’s hard measuring the properties – that’s what NIST does. If the spider extracted the data from NIST without their permission then it has broken copyright. And I hope NIST gets them to remove the data. It’s harder with NMRShiftDB – as a Blue Obelisk member we make our data Open for any reasonable re-use.
But giving it to the ChemMonkeys is not reasonable. The zoo should close.