I have agreed to review the CrystalEye data in Chemspider. Before reading this post you should read the background carefully (CrystalEye and Chemspider). The main points are that CrystalEye was not designed to be redistributable and that the method we were asked to use (InChI/URL/SDFile) leads to massive semantic loss and possible corruption. I have also undertaken to give an objective review and to make no judgments. This post will not endorse or criticize Chemspider per se.
I do not vouch for the accuracy of information in this post. I believe that what Chemspider had access to was connectionTable-URL pairs. It is possible to use these to download more information from our site if required.
I also stress very strongly that CrystalEye is a crystallographic site and that the relation of crystallography to chemistry is non-trivial.
I shall also assume that the CrystalEye collection in Chemspider might be discovered by someone who is not familiar with the organisation and motive of the site.
So to report… After a few minutes I found the collection of CrystalEye under (this link) – I hope it is the correct place to start. It links through to a page describing CrystalEye with further links to our homepage. It describes us as:
Organizational logo, personal photo or avatar, up to 50K in size. It will be shown on publicly accessible data source web page.
[Approved:] Yes
I do not know what “approved” means but it does not imply endorsement or approval by ourselves. (This is a general point for all aggregators.)
The page starts with the heading (This may cause problems for some readers’ browsers as it is wide):
=======================================================
ID | Structure | Empirical Formula | Molecular Weight | Monoisotopic Mass, Da | LogP | ACD/LogD (pH 5.5) | ACD/LogD (pH 7.4)
---|---|---|---|---|---|---|---
116 | | C4H9NO2 | 103.1198 | 103.063329 | | -3.15 | -3.14
=======================================================
I interpret this to mean that there are about 25000-30000 entries from CE which have been put into Chemspider. I do not know the exact number as CS can have multiple links.
The only information in this table that came from CE is the connection table (but not the depiction) and the jmol/“cell” link (see below). The ID is a Chemspider ID, not a CE link. The other columns are presumably generated by a program. There is nothing on the page to indicate what they mean. The average crystallographer visiting the site might conclude this had nothing to do with crystallography and leave at this stage (there is no link to CrystalEye from this page). I assume the data are computed LogP values, which are outside most crystallographers’ daily experience or requirements. I make no judgment on whether they are generally useful.
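As an illustration of how such columns can be derived from the connection table alone, the two mass columns for entry 116 can be reproduced from its empirical formula. This is only a sketch under stated assumptions: the function names are hypothetical, the atomic masses are commonly tabulated values that I have typed in by hand, and I do not know which program or mass table Chemspider actually uses.

```python
import re

# Assumed reference values in Da (not taken from Chemspider):
# older IUPAC standard atomic weights, and masses of the most abundant isotopes.
AVERAGE_MASS = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994}
MONOISOTOPIC_MASS = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def parse_formula(formula):
    """Split a simple formula such as 'C4H9NO2' into element counts."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def mass(formula, table):
    """Sum the per-element masses from the given table."""
    return sum(table[element] * n for element, n in parse_formula(formula).items())


if __name__ == "__main__":
    print(mass("C4H9NO2", AVERAGE_MASS))       # compare with "Molecular Weight"
    print(mass("C4H9NO2", MONOISOTOPIC_MASS))  # compare with "Monoisotopic Mass, Da"
```

Both values depend only on the formula, so they carry no crystallographic information; the LogP and LogD columns need a prediction model and cannot be checked this simply.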
The “Structure” cell contained 4 links. I cannot depict them on this blog – you will have to click. The “jmol” (sic) linked to a page with 3 links “2D” “3D” and “Cell”. Jmol (sic) is a 3D program and should not be used for 2D diagrams. The 3D link did not work in Firefox 2. In IE it appeared to create a molecule in real time, without reference to the crystallography. This is very seriously misleading as a user of a crystallographic resource would expect the molecular structure displayed to be the one in the crystal. The “cell” resource brought up Jmol and displayed the same cell information as on our site. The molecule was depicted with a spurious bond – I do not know whether this is an artefact of the crystallography or Chemspider.
The link “116” takes the reader to a page which combines all the information for this compound (GABA). In this case, but not others, there is probably agreement about its identity. I will attempt to copy salient points:
====================================
Systematic Name: | 4-aminobutanoic acid
SMILES: | O=C(O)CCCN
InChI: | InChI=1/C4H9NO2/c5-3-1-2-4(6)7/h1-3,5H2,(H,6,7)
InChIKey: | BTCSSZJGUNDROE-UHFFFAOYAC
Original Reference(s)
Gamma-aminobutyric acid (GABA) is an amino acid and the chief inhibitory neurotransmitter in the mammalian central nervous system. As such, GABA plays an important role in regulating neuronal [snipped, PMR]
====================================
The SMILES and InChI are presumably either used as the primary link for this page or computed. They are not taken from the CrystalEye site (which provides both). I do not know whether they have been verified against our site. Lower down we find:
CrystalEye Link to Record
which does what it says – links to a page on the CE site. Some people will find the linkage of the textual information on GABA and the links to the crystal structure useful.
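A very rough consistency check between the SMILES and the InChI quoted above can be made without any chemistry toolkit, by comparing the heavy-atom composition of the SMILES with the formula layer of the InChI. The sketch below is an assumption-laden illustration: the function names are hypothetical, it only handles simple SMILES (no bracket atoms, no aromatic lowercase atoms), and it says nothing about connectivity or stereochemistry. A real verification against the CrystalEye records would need a proper cheminformatics library.

```python
import re
from collections import Counter


def inchi_formula(inchi):
    """Return the formula layer: the text between the first and second '/'."""
    return inchi.split("/")[1]


def formula_counts(formula):
    """Count elements in a simple formula such as 'C4H9NO2'."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] += int(number) if number else 1
    return counts


def smiles_heavy_atoms(smiles):
    """Count heavy atoms in a simple SMILES (organic subset, no brackets)."""
    return Counter(re.findall(r"Cl|Br|[BCNOPSFI]", smiles))


def same_heavy_atoms(smiles, inchi):
    """True if SMILES and InChI agree on heavy-atom composition only."""
    heavy = formula_counts(inchi_formula(inchi))
    heavy.pop("H", None)  # hydrogens are implicit in this kind of SMILES
    return heavy == smiles_heavy_atoms(smiles)


if __name__ == "__main__":
    gaba_inchi = "InChI=1/C4H9NO2/c5-3-1-2-4(6)7/h1-3,5H2,(H,6,7)"
    print(same_heavy_atoms("O=C(O)CCCN", gaba_inchi))
```

Agreement here is a necessary condition, not a sufficient one: two different isomers would pass the same test.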
Lower down the page we find:
====================================
Names and Synonyms
Validated by Experts, Validated by Users, Non-Validated, Removed by Users, Redirected by Users, Redirect Approved by Experts
200-258-6 [EINECS/ELINCS]
Acide amino-4- butyrique [French]
butanoic acid, 4-amino-
Butyric acid, 4-amino-
g-Aminobutyric Acid
g-Amino-n-butyric Acid
Gamma aminobutyrate
gamma Aminobutyric acid
GAMMA(AMINO)-BUTYRIC ACID
gamma-Aminobutanoic acid
GAMMA-AMINO-BUTANOIC ACID
gamma-Aminobutryic acid
gamma-Aminobuttersaeure
Gamma-aminobutyric acid [JAN]
gamma-amino-n-butyric acid
omega-Aminobutyrate
Piperidate
Piperidinate
w-Aminobutyrate
.gamma.-Aminobutanoic acid
.gamma.-Aminobutyric acid
.gamma.-Amino-N-butyric acid
4-aminobutanoate
4-Aminobutanoic acid
4-aminobutyrate
4-AMINO-BUTYRATE
4-aminobutyric acid
4-amino-n-butyric acid
56-12-2 [RN]
Aminalon
AMINOBUTYRIC ACID,-4-, ALPHA
Butanoic acid, 4-amino- (9CI)
GABA
Gaballon
Gamarex
Gamastan
gamma-aminobutyrate
gamma-Aminobutyric acid
gamma-Aminobutyric acid (JAN)
gamma-Aminobutyric acid-carboxy-14C
Gammagee
Gammalon
Gammalone
Gammar
Gammasol
Gamulin
Mielogen
Mielomade
omega-Aminobutyric acid
Piperidic acid
Piperidinic acid
Reanal
w-Aminobutyric acid
4-Aminobutylate
Database ID(s)
A2129_SIGMA
A5835_SIGMA
A7463_SIGMA
AI3-26812
C00334
CCRIS 3721
CHEBI:16865
D00058
DF 468
DivK1c_000616
EPA Pesticide Chemical Code 030802
EU-0100005
KBio1_000616
KBio2_000429
KBio2_002997
KBio2_005565
KBio3_002190
KBioGR_001297
KBioSS_000429
Lopac-A-2129
MLS000028505
NCGC00015043-01
NCGC00024546-01
nchembio.78-comp12
NINDS_000616
NSC 27418
NSC27418
NSC32044
NSC45460
NSC51295
SMR000058285
SPBio_000996
Spectrum_000049
Spectrum2_001208
Spectrum3_001385
Spectrum4_000809
Tocris-0344
ZINC01532620
LogP: | ACD/LogP: -0.64 XLogP: -0.70 ALOGPS: -2.99 | # of Rule of 5 Violations: | 0 |
ACD/LogD (pH 5.5): | -3.15 | ACD/LogD (pH 7.4): | -3.14 |
ACD/BCF (pH 5.5): | 1 | ACD/BCF (pH 7.4): | 1 |
ACD/KOC (pH 5.5): | 1 | ACD/KOC (pH 7.4): | 1 |
#H bond acceptors: | 3 | #H bond donors: | 3 |
#Freely Rotating Bonds: | 4 | Polar Surface Area: | 29.54 Å² |
Index of Refraction: | 1.465 | Molar Refractivity: | 25.68 cm³ |
Molar Volume: | 92.8 cm³ | Polarizability: | 10.18 × 10⁻²⁴ cm³ |
Surface Tension: | 46.2 dyne/cm | Density: | 1.11 g/cm³ |
Flash Point: | 103.8 °C | Enthalpy of Vaporization: | 53.43 kJ/mol |
Boiling Point: | 248 °C at 760 mmHg | Vapour Pressure: | 0.00798 m |
====================================
PMR: I will have more to say on names and synonyms later but for those here I have no particular comment.
Readers should make their own judgment about the value of predicted properties. They should note that GABA is a solid at room temperature and so the concept of surface tension and several other properties is irrelevant. I do not know whether the other properties refer to the solid or liquid states but personally I would not use them for anything. I also observe that a machine reading this page (and even some humans) could easily not notice that the properties were not observed.
In general – apart from the prediction of properties – the aggregation provided for this compound is probably useful to many people, though probably not to mainstream crystallographers. It does, however, require a great deal of expert judgment to determine which properties are useful and which are seriously misleading. I would not, for example, recommend its use in undergraduate teaching.
I shall comment on two other entries later and in any case it’s a good point to break as this blog struggles with cut and paste.
With regard to the loss of stereochemistry, our initial investigations (to be confirmed) suggest that one of the software tools we use has generated the problem. Contrary to your assertion that Open Source software will replace commercial software (http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1130) and is of higher quality (and in the future it may be!), the evidence at present is that we have a long way to go.
We changed our processes to use Open Source software and, assuming no user error, this has resulted in the stereo issues you are commenting on. I’ll comment on this and your many other posts when I come back from traveling starting tomorrow. Thanks for the feedback.
Just a user tip for WordPress regarding “in any case it’s a good point to break as this blog struggles with cut and paste”: you can easily take screengrabs, upload the image and link back either to the record view or simply to a fullscreen image for the users to review. It’s a lot more attractive and gives fewer headaches than your approach, I’d suggest.
(1) Thanks – we should be able to make useful mutual progress.
(2) I used to be able to do this, but there are local reasons in the way access is set up here that prevent my uploading images.