CML – semantics for pi-bonds

Rich Apodaca has asked how CML represents ferrocene. As there is no community agreement on how to do this, CML has to support all the current mainstream representations (reconciling them is an ontological task, not a semantic one). The remaining task is to represent sketch (b) in Metallome’s blog post.

The CML schema supports this by defining (i) a pi-bonded system and (ii) a bond from one or more atoms to this system. The schema asserts:

       <xsd:attributeGroup ref="atomRefs">
            <xsd:annotation>
                <xsd:documentation>
                    <h:div class="specific">This is designed for
                    multicentre bonds (as in delocalised systems or electron-deficient
                    centres). The semantics are experimental at this stage.
                    As an example, a B-H-B bond might be described as
                 <bond atomRefs="b1 h2 b2"/>.</h:div>
                </xsd:documentation>
            </xsd:annotation>
        </xsd:attributeGroup>
        <xsd:attributeGroup ref="bondRefs">
            <xsd:annotation>
                <xsd:documentation>
                 <h:div class="specific">This is designed for pi-bonds
                 and other systems where formal valence bonds are not drawn to
                 atoms. The semantics are experimental at this stage. As an example,
                 a Pt-|| bond (as the Pt-ethene bond in Zeise's salt) might
                 be described as <bond atomRefs="pt1" bondRefs="b32"/>.</h:div>
                </xsd:documentation>
            </xsd:annotation>
        </xsd:attributeGroup>

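So we’ll define a pi-bond system for the carbons of one ring (a1, a2, a3, a4, a5) and another for the carbons of the second ring (a11, a12, a13, a14, a15; atoms a6 to a10 and a16 to a20 are the hydrogens):

<bond id="b12345" atomRefs="a1 a2 a3 a4 a5"/>
<bond id="b678910" atomRefs="a11 a12 a13 a14 a15"/>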

and then bond the Fe (a0) to each separately:

<bond id="bpi1" atomRefs="a0" bondRefs="b12345"/>
<bond id="bpi2" atomRefs="a0" bondRefs="b678910"/>

Note how the use of pointers (refs) is a fundamental part of CML and makes much of the semantics tractable. Put it all together and we get:

<molecule id="mol123456789" title="ferrocene"
  xmlns='http://www.xml-cml.org/schema'>
  <formula concise="C 10 H 10 Fe 1" inline="Fe(C_5_H_5)_2_"/>
    <atomArray>
      <atom id="a0" elementType="Fe"/>
      <atom id="a1" elementType="C"/>
      <atom id="a2" elementType="C"/>
      <atom id="a3" elementType="C"/>
      <atom id="a4" elementType="C"/>
      <atom id="a5" elementType="C"/>
      <atom id="a6" elementType="H"/>
      <atom id="a7" elementType="H"/>
      <atom id="a8" elementType="H"/>
      <atom id="a9" elementType="H"/>
      <atom id="a10" elementType="H"/>
      <atom id="a11" elementType="C"/>
      <atom id="a12" elementType="C"/>
      <atom id="a13" elementType="C"/>
      <atom id="a14" elementType="C"/>
      <atom id="a15" elementType="C"/>
      <atom id="a16" elementType="H"/>
      <atom id="a17" elementType="H"/>
      <atom id="a18" elementType="H"/>
      <atom id="a19" elementType="H"/>
      <atom id="a20" elementType="H"/>
    </atomArray>
    <bondArray>
      <bond id="a1_a2" atomRefs2="a1 a2"/>
      <bond id="a2_a3" atomRefs2="a2 a3"/>
      <bond id="a3_a4" atomRefs2="a3 a4"/>
      <bond id="a4_a5" atomRefs2="a4 a5"/>
      <bond id="a5_a1" atomRefs2="a5 a1"/>
      <bond id="a1_a6" atomRefs2="a1 a6"/>
      <bond id="a2_a7" atomRefs2="a2 a7"/>
      <bond id="a3_a8" atomRefs2="a3 a8"/>
      <bond id="a4_a9" atomRefs2="a4 a9"/>
      <bond id="a5_a10" atomRefs2="a5 a10"/>
      <bond id="a11_a12" atomRefs2="a11 a12"/>
      <bond id="a12_a13" atomRefs2="a12 a13"/>
      <bond id="a13_a14" atomRefs2="a13 a14"/>
      <bond id="a14_a15" atomRefs2="a14 a15"/>
      <bond id="a15_a11" atomRefs2="a15 a11"/>
      <bond id="a11_a16" atomRefs2="a11 a16"/>
      <bond id="a12_a17" atomRefs2="a12 a17"/>
      <bond id="a13_a18" atomRefs2="a13 a18"/>
      <bond id="a14_a19" atomRefs2="a14 a19"/>
      <bond id="a15_a20" atomRefs2="a15 a20"/>
      <bond id="b12345" atomRefs="a1 a2 a3 a4 a5"/>
      <bond id="b678910" atomRefs="a11 a12 a13 a14 a15"/>
      <bond id="bpi1" atomRefs="a0" bondRefs="b12345"/>
      <bond id="bpi2" atomRefs="a0" bondRefs="b678910"/>
    </bondArray>
</molecule>
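To see why the refs make the semantics tractable, here is a minimal sketch of how a processing program might follow the pointers from each Fe pi-bond to the ring atoms it covers. It uses only Python's standard `xml.etree.ElementTree`; the function name `resolve_pi_bonds` and the trimmed molecule (Fe plus the ten ring carbons, no hydrogens or sigma bonds) are assumptions made for illustration, not part of CML or any CML toolkit.

```python
import xml.etree.ElementTree as ET

NS = {"cml": "http://www.xml-cml.org/schema"}

# Trimmed copy of the ferrocene molecule: Fe plus the ten ring carbons only.
CML = """
<molecule id="mol123456789" title="ferrocene"
  xmlns="http://www.xml-cml.org/schema">
  <atomArray>
    <atom id="a0" elementType="Fe"/>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="C"/>
    <atom id="a3" elementType="C"/>
    <atom id="a4" elementType="C"/>
    <atom id="a5" elementType="C"/>
    <atom id="a11" elementType="C"/>
    <atom id="a12" elementType="C"/>
    <atom id="a13" elementType="C"/>
    <atom id="a14" elementType="C"/>
    <atom id="a15" elementType="C"/>
  </atomArray>
  <bondArray>
    <bond id="b12345" atomRefs="a1 a2 a3 a4 a5"/>
    <bond id="b678910" atomRefs="a11 a12 a13 a14 a15"/>
    <bond id="bpi1" atomRefs="a0" bondRefs="b12345"/>
    <bond id="bpi2" atomRefs="a0" bondRefs="b678910"/>
  </bondArray>
</molecule>
"""


def resolve_pi_bonds(cml_text):
    """Hypothetical helper: map each bond carrying bondRefs to
    (its own atoms, the atoms of the multicentre bonds it points at)."""
    root = ET.fromstring(cml_text.strip())
    elements = {a.get("id"): a.get("elementType")
                for a in root.iterfind(".//cml:atom", NS)}
    bonds = {b.get("id"): b for b in root.iterfind(".//cml:bond", NS)}
    resolved = {}
    for bond_id, bond in bonds.items():
        refs = bond.get("bondRefs")
        if refs is None:
            continue  # an ordinary or multicentre bond, nothing to follow
        centre = (bond.get("atomRefs") or "").split()
        targets = [atom_id
                   for ref in refs.split()
                   for atom_id in bonds[ref].get("atomRefs").split()]
        resolved[bond_id] = (centre, targets)
    return elements, resolved


elements, resolved = resolve_pi_bonds(CML)
for bond_id, (centre, targets) in resolved.items():
    print(bond_id, [elements[a] for a in centre], "->", targets)
```

Each Fe bond resolves to five carbon ids, so a program can derive the hapticity from the data without any drawing conventions.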

This completes our tour of four different representations of ferrocene. None have implicit semantics. They can only be reconciled through ontologies, not semantics – we have to assert that some authority says that they are equivalent (or different).

If we need we can give hints to the processing program. We could add an electron count to the pi-bonds:

<bond id="b12345" atomRefs="a1 a2 a3 a4 a5">
  <electron count="5"/>
</bond>
<bond id="b678910" atomRefs="a11 a12 a13 a14 a15">
  <electron count="5"/>
</bond>

if we prefer neutral Fe and neutral Cp rings, or

<bond id="b12345" atomRefs="a1 a2 a3 a4 a5">
  <electron count="6"/>
</bond>
<bond id="b678910" atomRefs="a11 a12 a13 a14 a15">
  <electron count="6"/>
</bond>

if we want the Cp- and Fe2+ model.
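Both bookkeeping choices should give the same total electron count at the metal, and a processing program could check that from the hints. The sketch below sums the `electron` counts on the two pi-bond systems and adds the metal's valence electrons. The table `FE_VALENCE_ELECTRONS` and the helper names are assumptions for illustration; the values 8 for Fe(0) and 6 for Fe2+ are the usual electron-counting conventions, not something CML supplies.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup: valence electrons on iron in each counting model.
FE_VALENCE_ELECTRONS = {"neutral": 8, "ionic": 6}  # Fe(0) vs Fe2+


def bond_array(count):
    """Build the two pi-bond systems with the given electron count each."""
    return f"""
<bondArray>
  <bond id="b12345" atomRefs="a1 a2 a3 a4 a5">
    <electron count="{count}"/>
  </bond>
  <bond id="b678910" atomRefs="a11 a12 a13 a14 a15">
    <electron count="{count}"/>
  </bond>
</bondArray>"""


def ligand_electrons(xml_text):
    """Sum the count attribute over every electron element."""
    root = ET.fromstring(xml_text.strip())
    return sum(int(e.get("count")) for e in root.iter("electron"))


for model, count in (("neutral", 5), ("ionic", 6)):
    total = FE_VALENCE_ELECTRONS[model] + ligand_electrons(bond_array(count))
    print(model, total)  # both models print 18
```

The neutral model gives 8 + 5 + 5 and the ionic model 6 + 6 + 6, so the two descriptions agree on an 18-electron iron centre even though the per-bond hints differ.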

You may ask “how does CML search for ferrocenes”? CML doesn’t. It’s not a program, it’s a representation. For that you need CML-aware engines and that’s what the Open Source community has been developing…
… please join us and help to take us to the semantic future. It won’t happen with SD files.
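To make the division of labour concrete, here is a rough sketch of what a tiny CML-aware search might look like. It is not how any existing engine works: the helpers `sandwich` and `is_ferrocene_like` are hypothetical, the test molecules are generated without hydrogens or sigma bonds, and "ferrocene-like" is taken to mean an Fe atom pi-bonded to exactly two five-carbon multicentre bonds.

```python
import xml.etree.ElementTree as ET

NS = {"cml": "http://www.xml-cml.org/schema"}


def sandwich(title, metal, ring_size):
    """Hypothetical builder: a metal pi-bonded to two all-carbon rings."""
    ring1 = [f"a{i}" for i in range(1, ring_size + 1)]
    ring2 = [f"a{i}" for i in range(ring_size + 1, 2 * ring_size + 1)]
    atoms = "".join(f'<atom id="{a}" elementType="C"/>' for a in ring1 + ring2)
    refs1, refs2 = " ".join(ring1), " ".join(ring2)
    return (
        f'<molecule title="{title}" xmlns="http://www.xml-cml.org/schema">'
        f'<atomArray><atom id="a0" elementType="{metal}"/>{atoms}</atomArray>'
        f'<bondArray>'
        f'<bond id="r1" atomRefs="{refs1}"/>'
        f'<bond id="r2" atomRefs="{refs2}"/>'
        f'<bond id="p1" atomRefs="a0" bondRefs="r1"/>'
        f'<bond id="p2" atomRefs="a0" bondRefs="r2"/>'
        f'</bondArray></molecule>'
    )


def is_ferrocene_like(cml_text):
    """True if Fe is bonded via bondRefs to exactly two five-carbon systems."""
    root = ET.fromstring(cml_text)
    element = {a.get("id"): a.get("elementType")
               for a in root.iterfind(".//cml:atom", NS)}
    bonds = {b.get("id"): b for b in root.iterfind(".//cml:bond", NS)}
    eta5 = 0
    for bond in bonds.values():
        refs = bond.get("bondRefs")
        centre = (bond.get("atomRefs") or "").split()
        if refs is None or [element[a] for a in centre] != ["Fe"]:
            continue
        for ref in refs.split():
            ring = bonds[ref].get("atomRefs").split()
            if len(ring) == 5 and all(element[a] == "C" for a in ring):
                eta5 += 1
    return eta5 == 2


library = [
    sandwich("ferrocene", "Fe", 5),
    sandwich("bis(benzene)chromium", "Cr", 6),
    sandwich("ruthenocene", "Ru", 5),
]
hits = [ET.fromstring(m).get("title") for m in library if is_ferrocene_like(m)]
print(hits)  # ['ferrocene']
```

A predicate like this only matches the pi-bond representation; the other representations of ferrocene would each need their own matcher, or an ontology asserting that they are equivalent.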

