Chem4Word – why semantics are necessary

I was asked to explain how Chem4Word and CML could encode ferrocene. I’ll start by using Wikipedia to give a clear and accurate picture. Sorry for the cut-and-paste mess.

WP: Ferrocene is the organometallic compound with the formula Fe(C5H5)2. It is the prototypical metallocene, a type of organometallic chemical compound consisting of two cyclopentadienyl rings bound on opposite sides of a central metal atom.

Other names dicyclopentadienyl iron
Identifiers
CAS number 102-54-5
PubChem 11985121
ChEBI 30672
InChI
IUPAC name

Very clear and tidy. By contrast, the entries in PubChem are a mess. That’s NOT PubChem’s fault: the problem is the non-semantic material sent by depositors. Again, I shan’t bash the depositors too hard, as they have deposited their material voluntarily. The fault lies with the awful non-semantic authoring tools they use and the absence of agreed conventions.

Chem4Word aims to raise the standard. You’ll note from the entries below that the formulae for some of these structures are grotesque (10 negative charges). C4W will give authors a clear indication of the molecular formulae and charges and encourage semantic validation.
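As a rough illustration of what such validation might look like (this is a sketch, not how Chem4Word or CML actually do it), the Python below parses a flattened formula string and flags a net charge that differs from the expected one. The string layout (element symbols with counts, then a trailing signed charge such as "-10") and the names `parse_formula` and `validate` are assumptions made for this example.

```python
import re

# Assumed plain-text layout: element symbols with optional counts,
# followed by an optional trailing net charge such as "-10" or "+2".
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")
CHARGE = re.compile(r"([+-])(\d*)$")


def parse_formula(mf):
    """Return (element counts, net charge) for a string like 'C10H2Fe-10'."""
    charge = 0
    match = CHARGE.search(mf)
    if match:
        sign = -1 if match.group(1) == "-" else 1
        charge = sign * int(match.group(2) or 1)
        mf = mf[:match.start()]  # strip the charge before reading elements
    counts = {}
    for symbol, number in TOKEN.findall(mf):
        counts[symbol] = counts.get(symbol, 0) + int(number or 1)
    return counts, charge


def validate(mf, expected_charge=0):
    """Hypothetical check: list the problems an authoring tool might flag."""
    _, charge = parse_formula(mf)
    problems = []
    if charge != expected_charge:
        problems.append(f"net charge {charge:+d}, expected {expected_charge:+d}")
    return problems


if __name__ == "__main__":
    for mf in ["C10H10Fe", "C10H10Fe-6", "C10H2Fe-10"]:
        print(mf, validate(mf) or "ok")
```

A real tool would check far more than the charge, but even this much would stop a neutral compound being deposited with ten negative charges.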

Anyway here goes. These are all the different compound IDs associated with ferrocene. I assume that all these compounds are meant to be ferrocene but their formulae are garbled by the tools – note the absurd charges. CML prevents such garbling.


Ferrotsen; Catane; FERROCENE …
Compound ID: 7611
Source: LeadScope (LS-357)
IUPAC: cyclopenta-1,3-diene; iron(2+)
MW: 186.031400 g/mol | MF: C10H10Fe


FERROCENE; Bis(.eta.-cyclopentadienyl) iron
Compound ID: 11985121
Source: NIST Chemistry WebBook (3993653726)
IUPAC: cyclopenta-1,3-diene; cyclopentane; iron
MW: 186.031400 g/mol | MF: C10H10Fe-6


FERROCENE; Di(cyclopentadienyl)iron; Bis(cyclopentadienyl)iron …
Compound ID: 10219726
Source: Sigma-Aldrich (F408_ALDRICH)
IUPAC: cyclopentane; iron
MW: 186.031400 g/mol | MF: C10H10Fe


FERROCENE
Compound ID: 504306
Source: NIST Chemistry WebBook (1113374621)
IUPAC: cyclopenta-1,3-diene; iron(2+)
MW: 186.031400 g/mol | MF: C10H10Fe


Ferrotsen; FERROCENE; Dicyclopentadienyl iron …
Compound ID: 24196050
Source: DTP/NCI (209798)
IUPAC: cyclopenta-1,3-diene; iron
MW: 177.967880 g/mol | MF: C10H2Fe-10
Tested in BioAssays: All: 3, Active: 0; BioActivity Analysis


Ferrotsen; FERROCENE; Dicyclopentadienyl iron …
Compound ID: 5150118
Source: DTP/NCI (44012)
IUPAC: cyclopenta-1,3-diene; iron(2+)
MW: 177.967880 g/mol | MF: C10H2Fe-8
Tested in BioAssays: All: 1, Active: 0; BioActivity Analysis
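One quick check on the listing above is whether each MW agrees with its own MF. The sketch below recomputes the weights from the element counts. The atomic weights are an assumption: I picked values that appear to reproduce the listed figures, and the names `ATOMIC_WEIGHT`, `ENTRIES` and `molecular_weight` are made up for this example. Charges are ignored, since they do not change the weight at this precision.

```python
# Atomic weights in g/mol. These are assumed values, chosen because they
# reproduce the MW figures quoted in the listing.
ATOMIC_WEIGHT = {"C": 12.0107, "H": 1.00794, "Fe": 55.845}

# (compound ID, element counts from the MF field, listed MW), copied from
# the six entries above.
ENTRIES = [
    (7611, {"C": 10, "H": 10, "Fe": 1}, 186.031400),
    (11985121, {"C": 10, "H": 10, "Fe": 1}, 186.031400),
    (10219726, {"C": 10, "H": 10, "Fe": 1}, 186.031400),
    (504306, {"C": 10, "H": 10, "Fe": 1}, 186.031400),
    (24196050, {"C": 10, "H": 2, "Fe": 1}, 177.967880),
    (5150118, {"C": 10, "H": 2, "Fe": 1}, 177.967880),
]


def molecular_weight(counts):
    """Sum atomic weights over the element counts."""
    return sum(ATOMIC_WEIGHT[element] * n for element, n in counts.items())


def is_consistent(counts, listed_mw, tolerance=1e-4):
    """True if the listed MW matches the weight computed from the formula."""
    return abs(molecular_weight(counts) - listed_mw) < tolerance


if __name__ == "__main__":
    for cid, counts, listed in ENTRIES:
        status = "consistent" if is_consistent(counts, listed) else "MISMATCH"
        print(cid, f"{molecular_weight(counts):.6f}", listed, status)
```

Under these assumptions every entry is internally consistent, which suggests the two DTP/NCI records really were deposited with eight hydrogens missing: the MW was computed from the garbled structure, not garbled separately.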


This entry was posted in "virtual communities", Uncategorized, XML.

One Response to Chem4Word – why semantics are necessary

  1. Peter, I was interested to see how ferrocene was handled in CrystalEye and visited: http://wwmm.ch.cam.ac.uk/crystaleye/summary/acta/b/1997/06-00/data/cr0517/cr0517sup1_cr0517b/cr0517sup1_cr0517b.cif.summary.html
    I noticed the SMILES and InChI for ferrocene listed below. Any idea why so many spaces are inserted into what’s displayed on the CrystalEye pages? It is not unique to this record.
    InChI: InChI=1/2C5H5.Fe/c2*1- 2- 4- 5- 3- 1;/h2*1- 5H;/q2*- 1;
    SMILES: [H]C=12(C3([H]) (=C9([H]) ([C-]10([H]) (C=1([H]) [Fe]235678910(C=4([H]) (C8([H]) (=C7([H]) ([C-]6([H]) (C=45([H]) ) ) ) ) ) ) ) ) )
    The InChI of course cannot represent ferrocene appropriately, but this was discussed at the InChI meeting in Salt Lake City (as a topic along with other organometallics).
