ChemZoo properties : treat as dangerous

Chemspider has recently replied in depth to my concerns:

ChemSpider as a part of Web 2.0 – and what is that Web 2.0 anyways?
In this blog I am going to excerpt from another blog (and bolded to identify) regarding ChemSpider (based on my previous post it’s the way of the blogosphere) and it’s non Web 2.0 status since pages from the ChemSpider blog are being excerpted in the same way.

I shall tackle the Web 2.0 issues separately but here I am concerned that the material and services produced by Chemspider are likely to be seriously misused by students. Chemspider states:

May I use your service in my teaching class ?

Absolutely. We would especially like the academic community to benefit from the information available on ChemSpider.

Now students – especially those starting their courses – are likely to accept what they read on the web. When they access a company that sells the calculation of molecular properties they assume that the molecules, calculations and metadata are of sufficient quality for them to use. They cannot be expected to have enough judgment to assume that a large number of the answers they get will be wrong, and that the definitions and explanations of the properties are wrong or unclear. While I, as an experienced scientist in chemoinformatics, realise how suspect much of the material and services (not just from Chemspider) is, students cannot.

Here are the “definitions” of some of the properties that Chemspider/ACD provide. The standard of description, and the lack of units, metadata and experimental constraints, would be below what an undergraduate would be expected to present. I shall pick out a few examples.

PhysChem Properties (as defined by ACD/Labs):

The Partition Coefficient (LogP) is the equilibrium distribution of a solute between two liquid phases, the constant ratio of the solute’s concentration in the upper phase to its concentration in the lower phase. ACD/Labs provides acess to logP prediction through their freeware ACD/ChemSketch and their LogP addon.

PMR: The Partition Coefficient is NOT LogP, it is P.

The Distribution Coefficient (LogD) is the ratio of the amounts of solute dissolved in two immiscible liquids at equilibrium. The distribution coefficient (logD) equation accounts for all possible partition coefficients (logP) that a system can obtain. For compounds containing a single ionizable group (acid/base) there are 2 partition coefficients or a single distribution coefficient accounting for the relative concentration of each species within each of the two possible phases

PMR: these two paragraphs do not make clear what the differences between D and P (or LogD and LogP) are. Moreover there is no mention that a P (or LogP) is meaningless unless the solvents are fully identified. (I assume that the non-polar phase is octanol but I cannot find this in this document or in the calculation of properties.) As P is temperature dependent it is necessary to report the temperature, but I cannot find this in the data. (I assume it is 298K but this is not mentioned.)
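
The distinction can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ACD/Labs method: the two phases are taken to be octanol and water, only the neutral species is assumed to partition into octanol, and the function names are made up for this example. Under those assumptions P is a ratio of concentrations of the neutral species, LogP is its base-10 logarithm, and LogD additionally depends on pH and pKa.

```python
import math


def log_p(conc_octanol: float, conc_water: float) -> float:
    """LogP = log10(P), where P is the concentration ratio of the
    neutral species between octanol and water (same units in both)."""
    return math.log10(conc_octanol / conc_water)


def log_d_monoprotic_acid(logp: float, pka: float, ph: float) -> float:
    """LogD of a monoprotic acid at a given pH.

    Assumption: the ionised form stays entirely in the aqueous phase,
    so D = P / (1 + 10**(pH - pKa)).
    """
    return logp - math.log10(1.0 + 10.0 ** (ph - pka))


def log_d_monoprotic_base(logp: float, pka: float, ph: float) -> float:
    """LogD of a monoprotic base at a given pH (pKa of the conjugate acid).

    Same assumption as above: D = P / (1 + 10**(pKa - pH)).
    """
    return logp - math.log10(1.0 + 10.0 ** (pka - ph))
```

With these formulas LogD approaches LogP when the compound is almost entirely un-ionised, and falls below it as ionisation increases. None of the numbers mean anything unless the solvent pair, the temperature and the pH are reported alongside them.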

Polar Surface Area (PSA) is the measure of how much exposed polar area any two- or three-dimensional object has.

PMR: This is a fuzzy definition – the algorithm calculates a precise quantity but this gives no indication of how this is done. Different vendors will report different values for this quantity. The impression given is that the precise definition is unimportant. Note also that when this property is first displayed to the student there are no units (we try very hard to impress on students that all numeric quantities must have units).

Surface Tension is a property of liquids arising from unbalanced molecular cohesive forces at or near the surface, as a result of which the surface tends to contract and has properties resembling those of a stretched elastic membrane

PMR: Again a fuzzy definition; many educators would take it to indicate that the student did not understand surface tension.

Molar Refraction is the equation for the refractive index of a compound modified by the compound’s molecular weight and density. Also known as the Lorentz-Lorenz molar refraction.

PMR: Molar Refraction is NOT an equation. Any student writing stuff like this would get near-zero marks.
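
For comparison, molar refraction is a quantity with units, obtained from the Lorentz-Lorenz relation R = ((n² − 1)/(n² + 2)) · M/ρ. A minimal sketch follows; the function name is hypothetical and the input values for water are approximate textbook figures used only for illustration.

```python
def molar_refraction(refractive_index: float,
                     molar_mass_g_per_mol: float,
                     density_g_per_cm3: float) -> float:
    """Lorentz-Lorenz molar refraction in cm^3/mol.

    R = ((n^2 - 1) / (n^2 + 2)) * (M / rho)
    """
    n2 = refractive_index ** 2
    return (n2 - 1.0) / (n2 + 2.0) * molar_mass_g_per_mol / density_g_per_cm3


# Approximate values for liquid water near room temperature (illustrative).
r_water = molar_refraction(1.333, 18.015, 0.997)
print(f"Molar refraction of water: {r_water:.2f} cm^3/mol")
```

For water this gives roughly 3.7 cm³/mol. The result depends on the wavelength and temperature at which the refractive index and density were measured, so those should be reported with it.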

Now I appreciate that Chemspider is “beta”, which means that they want the community to correct their bugs, but it is not fair to encourage students to be part of it. For example if you want properties of “sodium hydride” it will draw a picture looking like:

Na+ HH2

It is clear that this is NOT a copy of the pubchem entry (which is the normal NaH) but that the Chemspider software (or the ACD software) has taken a correct formula and displayed it incorrectly. Verify this for yourself, but do not let students near it.

Chemspider will calculate properties for any compound, and many of these are meaningless. For example, try “prussian blue” and it will give a logP even though the stuff is an insoluble pigment. That is because the chemical formula has been represented as separated iron ions and cyanide ions. This may be useful for searching, but it is unacceptable for calculating properties.

So, in summary, you cannot rely on some of the properties calculated by Chemspider. For students that means you should not rely on any.


3 Responses to ChemZoo properties : treat as dangerous

  1. Pingback: ChemSpider Blog » Blog Archive » Is ChemSpider Dangerous for Students?

  2. Peter,
    let's see if ChemSpider will have added value over other services. They all have pros and cons, and I personally love that they respond on their blog. They are open and willing to consider and discuss things. This is already more than some others do.
    Sure, there are also some people/companies contributing even code to ‘Blue Obelisk’ projects. I personally think we should be very pragmatic and constructive here. I hope that any open discussion and information benefits any service out there. And especially young services can benefit a lot, since their infrastructure is growing and still flexible without violating too many dependent services.
    Joerg

  3. Thanks for the feedback on the definitions. I have connected with our collaborators at ACD/Labs, specifically the PhysChem product manager, and have pointed him to your comments on the blog. I will leave it to him to choose whether or not to edit the definitions.
    The display of the units for PSA on the initial search results page was an oversight since it is on EVERY other view of the results display so thanks for pointing it out. It was fixed within minutes of reading your blog.
    Regarding your observations about Prussian Blue and solubility. There’s a lot of misinformation out there for sure… http://ptcl.chem.ox.ac.uk/MSDS/IR/iron_III_ferrocyanide.html named as Prussian Blue and defined as soluble in water. However, I am going with the Wikipedia definition which talks about : “Soluble” Prussian Blue – Prussian Blue is insoluble, but it tends to form such small crystallites that colloids are common. These colloids act like solutions, for example they pass through fine filters. According to Dunbar and Heintz, these “soluble” forms tend toward compositions with the approximate formula KFe2Fe(CN)6
    Based on your multiple comments I am considering recalculating the properties having prefiltered and excluded compounds based on the following constraints:
    1) Exclude substances containing elements other than As, B, Br, C, Cl, F, Ge, H, I, N, O, P, Pb, S, Se, Si, Sn, the elements supported by ACD/PhysChem predictors.
    2) Only include single component substances – would resolve your issue with CaCO3 and Prussian Blue
    3) Exclude substances represented as a single atom
    4) Exclude structures containing isotopes
    5) Exclude radicals
    6) Exclude structures with a delocalized charge
    I welcome your comments….
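
The six constraints above could be sketched as a prefilter along the following lines. This is only an illustration: the `Substance` record, its field names and the example values are hypothetical, and in practice the flags would have to be derived from the structure by a cheminformatics toolkit. Nothing here reflects how ChemSpider or ACD/Labs actually implement the filter.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Elements listed in constraint 1 as supported by the predictors.
SUPPORTED_ELEMENTS = frozenset({
    "As", "B", "Br", "C", "Cl", "F", "Ge", "H", "I",
    "N", "O", "P", "Pb", "S", "Se", "Si", "Sn",
})


@dataclass(frozen=True)
class Substance:
    """Hypothetical summary of a structure record (assumed fields)."""
    name: str
    elements: FrozenSet[str]
    n_components: int
    n_atoms: int
    has_isotope_labels: bool = False
    is_radical: bool = False
    has_delocalized_charge: bool = False


def exclusion_reasons(s: Substance) -> List[str]:
    """Return the reasons a substance fails the six constraints.

    An empty list means properties may be calculated for it.
    """
    reasons = []
    unsupported = s.elements - SUPPORTED_ELEMENTS
    if unsupported:
        reasons.append("unsupported elements: " + ", ".join(sorted(unsupported)))
    if s.n_components != 1:
        reasons.append("not a single-component substance")
    if s.n_atoms == 1:
        reasons.append("represented as a single atom")
    if s.has_isotope_labels:
        reasons.append("contains isotopes")
    if s.is_radical:
        reasons.append("radical")
    if s.has_delocalized_charge:
        reasons.append("delocalized charge")
    return reasons


ethanol = Substance("ethanol", frozenset({"C", "H", "O"}), 1, 9)
# Component count is illustrative; any value above 1 triggers rule 2.
prussian_blue = Substance("prussian blue", frozenset({"Fe", "C", "N"}), 2, 5)

for substance in (ethanol, prussian_blue):
    print(substance.name, exclusion_reasons(substance) or "OK")
```

Returning the list of reasons, rather than a bare yes/no, would let the results page tell the user why no properties are shown for a given record.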
