Egon on SMILES InChI CML and RSS

I agree with everything Egon says and add comments.
(Incidentally WordPress and Planet remove the microformats so please read his original
for the correct syntax)
On the blogs ChemBark and KinasePro, there have been some discussions on the use of SMILES, CML and InChI in Chemical Blogspace (with 70 chemistry blogs now!). Chemists seem to prefer SMILES over InChI, while there is interest in moving towards CML too. Peter commented.

PMR: 70 blogs is great. Go back a year and we’d have had ca. 10, I suspect. As I say, I’m only looking for the 5-10% who are happy to be early adopters.

Any incorporation of content other than images and free text requires some HTML knowledge, but this can be rather limited. It is up to us chemoinformaticians to write good documentation on how to do things; so here is a first go.

PMR: Yes, documentation is key as we are always being reminded! But we are also still fighting the browser technology. One of the great problems is that browsers have been a moving target for 12 years – it was almost easier to create a “plugin” in 1994 than now. How many of you can run Chime under Firefox?

Including CML in blogs and other RSS feeds
I blogged about including CML in blogs last February, and can generally refer to this article published last year: Chemical markup, XML, and the World Wide Web. 5. Applications of chemical metadata in RSS aggregators (PMID:15032525, DOI:10.1021/ci034244p). Basically, it just comes down to putting the CML code into the HTML version of your blog content, though I appreciate the need for plugins.

PMR: you should always try to create XHTML (HTML with balanced tags). Unfortunately (and most regrettably) some tools, including WordPress, can often remove end tags.
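
A quick way to see whether a tool has stripped end tags is to check that the fragment still parses as XML. The following is only a minimal sketch using Python's standard library; the function name and the sample fragments (including the class name "smiles") are made up for illustration, and HTML-only entities such as &nbsp; would be rejected by this check unless they are declared.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as XML inside a single wrapper element."""
    try:
        # Wrap in a root element, since blog content rarely has exactly one root.
        ET.fromstring(f"<div>{fragment}</div>")
        return True
    except ET.ParseError:
        return False


# Illustrative fragments (assumed, not taken from any real blog post).
balanced = '<p>Ethanol: <span class="smiles">CCO</span></p>'
stripped = '<p>Ethanol: <span class="smiles">CCO</p>'  # end tag of span removed

print(is_well_formed(balanced))
print(is_well_formed(stripped))
```

Running something like this on the published version of a post, rather than on the draft, is what would catch tags removed by the blogging software.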

Including SMILES, CAS and InChI in blogs
Including SMILES is much easier as it is plain text, and has the advantage over InChI that it is much more readable. Chris wondered on the KinasePro blog how to tag SMILES, while Paul did the same on ChemBark about CAS numbers.
PMR: SMILES shouldn’t need to be “readable” and some of it isn’t (e.g. if you have a completely disconnected structure). It is because people have got used to seeing it for many years that they don’t feel frightened. There is no way to create canonical SMILES by hand, so you have to have a tool. InChI seems more forbidding because (a) it’s new, (b) it can never be hand-authored, (c) it’s about 50% more verbose and (d) it has layers. But each of those has a positive side.
Now, users of know how to add markup to their blogs to get PostGenomic to index discussed literature, websites and conferences. Something similar is easily done for chemistry things too, as I showed in Hacking InChI support into (which was put on lower priority because of finishing my PhD). basically uses microformats, which I blogged about just a few days ago in Chemo::Blogs #2, where I suggested the use of aspirin.
And this is the way SMILES, CAS numbers and InChIs can be tagged on blogs. The element is HTML code to indicate a bit of similar content in HTML, and can, among many other things, be formatted differently than other text. However, this can also be used to add semantics in a relatively cheap, but accepted, way. Microformats are formalized just by use, so whatever we, as chemistry bloggers, use will become the de facto standard. Here are my suggestions:
[snipped see Egon’s blog]
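
Since the actual snippets are omitted here, the following is only a rough sketch of the general idea of a class-based microformat, not Egon's suggested syntax. The class names ("smiles", "inchi", "casnumber") and all function and class names are hypothetical placeholders; his post has the real ones. It shows a blogger wrapping an identifier in a span element and an aggregator pulling the tagged identifiers back out, using only Python's standard library.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical class names, assumed for illustration only.
CHEM_CLASSES = {"smiles", "inchi", "casnumber"}


def tag_identifier(kind: str, value: str) -> str:
    """Wrap a chemical identifier in a span whose class names its type."""
    if kind not in CHEM_CLASSES:
        raise ValueError(f"unknown identifier type: {kind}")
    return f'<span class="{kind}">{escape(value)}</span>'


class ChemSpanExtractor(HTMLParser):
    """Collect (type, text) pairs from spans carrying one of the classes above.

    Limitation of this sketch: spans nested inside a tagged span are not handled.
    """

    def __init__(self):
        super().__init__()
        self.found = []
        self._current = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag != "span" or self._current is not None:
            return
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in CHEM_CLASSES:
                self._current = name
                self._buffer = []
                break

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "span" and self._current is not None:
            self.found.append((self._current, "".join(self._buffer)))
            self._current = None


post = (
    "<p>Ethanol is "
    + tag_identifier("smiles", "CCO")
    + ", or as InChI: "
    + tag_identifier("inchi", "InChI=1/C2H6O/c1-2-3/h3H,2H2,1H3")
    + "</p>"
)

extractor = ChemSpanExtractor()
extractor.feed(post)
extractor.close()
print(extractor.found)
```

The point of the design is that the blogger only adds one attribute to otherwise ordinary HTML, while an indexing service can find the identifiers without guessing from free text.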
The RDFa alternative
The future, however, might use RDFa over microformats, so here are the RDFa equivalents:
[snipped see Egon’s blog]
which requires you to register the namespace xmlns:chem=”” somewhere though. Formally, the URN for this namespace needs to be formalized; Peter, would the Blue Obelisk be the platform to do this? BTW, this is more advanced, and currently does not have practical advantages over the use of microformats.
PMR: Egon is right: there is currently no clear indication of which approach will come out as the “winner”, although there is lots of Web discourse. However, for us, I suspect we would adopt both if lots of people were using them, and see which approach won.
Yes, of course we should use blueobelisk for the RDF! This has a real chance of succeeding.
Again, the message is that the rest of the world is going down this route and at some stage chemistry will follow. RDF looks just as impenetrable as InChI, DOI, and all the rest…

One Response to Egon on SMILES InChI CML and RSS

  1. One of the reasons I chose Wikispaces to post all of our UsefulChem experimental data is their policy for free accounts: Contributions to are licensed under a Creative Commons Attribution Share-Alike 2.0 License.
