Semantic Chemistry in Wikis and Blogs – a proposal

Joerg Wegner posts (Blogging chemistry means not blogging minable data):

As posted by Peter more and more chemists are blogging. And I would appreciate if those blogs would contain more chemical minable information. I think especially Rich and Egon have given some nice examples on their blogs. And beside of those blogs Wiki’s can also use chemical information (and not images!).
Anyway, those blogs are just great:

And here are some ideas what is missing on those blogs:

  • tags based on reaction names (with cross-links), reagents, products, …
  • chemical data in a native and downloadable/minable form, e.g. CML or InChI. Preferably all those data entries should have unique identifiers and tagged reaction centers! So, I prefer here rather CML than InChI.
  • more literature references using DOI or PMID

I obviously and absolutely agree. The question is what to do about it. I have been struggling with simple code on this blog (I have now cracked it), but even loading images isn’t absolutely trivial: you have to upload from a directory and then upload into the blog, whereas we would really like to cut and paste.
The commercial chemical tools are not much help. Apart from costing money, they are designed to be integrated into Word documents using (I think) ActiveX. Moreover they aren’t designed to be semantic. In the Blue Obelisk we are developing tools which are XML-CML aware and which – ultimately – will be the simple solution to this. The main challenges are:

  • discover, develop and enhance semantic wikis and blogs. Please post anything you know that actually works.
  • find simple (almost automatic) ways of embedding InChIs and CML invisibly in text. I think this would be relatively simple if we can agree on a method.
  • enhance the Blue Obelisk tools (especially CDK, JOELib and JChempaint) to provide simple chemical services for wikis and blogs
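
To make the second point concrete, here is a minimal sketch of one way an InChI could ride along invisibly with a compound name in a blog post. Nothing here is an agreed method: the `chem-inchi` class and the `data-inchi` attribute are hypothetical names chosen for illustration, and `tag_compound` is just a helper invented for this example.

```python
from html import escape


def tag_compound(name: str, inchi: str) -> str:
    """Wrap a compound name in a span that carries its InChI invisibly.

    The class name "chem-inchi" and the attribute "data-inchi" are
    hypothetical; they stand in for whatever convention is agreed on.
    """
    return '<span class="chem-inchi" data-inchi="{}">{}</span>'.format(
        escape(inchi, quote=True),  # safe inside a quoted attribute
        escape(name),               # safe as visible text
    )


if __name__ == "__main__":
    # The reader sees only "methane"; a miner can read the attribute.
    print(tag_compound("methane", "InChI=1S/CH4/h1H4"))
```

The reader sees ordinary text, while a harvesting tool only needs to look for the agreed attribute. Any similar convention would do, as long as we all use the same one.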

I came up with a novel way of doing this – what do you think? Since everything on blogs is public, we could use a communal authoring service. Let’s say we have a server – and this is something I’ll put to my colleagues – that provides a graphical authoring interface for semantic chemistry. (This would include reactions as well as compounds.) You would create the molecules you want (and we’ve got some simple ideas for accelerated graphics) on the server. It would create all the InChI and CML transparently. It would give you two URLs:

  • An image of the chemical object (like the GIF we have at present). You could either link to this or cut and paste it or both.
  • A link to semantic chemistry stored on the server. Since all our work is public and I think most of us use Creative Commons there is no loss of IPR – quite the reverse. Clicking on the link could bring up Jmol, JChempaint or whatever, without any need to add client-side functionality.
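
As a rough sketch of the two-URL idea, the server could derive both addresses from the molecule itself. Everything specific below is an assumption made for illustration: the `molecule_urls` helper, the `img/` and `mol/` paths, the file extensions, and the choice of a truncated SHA-1 of the InChI as the identifier.

```python
import hashlib
from urllib.parse import urljoin


def molecule_urls(server: str, inchi: str) -> dict:
    """Return the two URLs the proposed server might hand back.

    Hypothetical scheme: the identifier is a short hash of the InChI,
    so the same molecule always maps to the same pair of URLs.
    """
    mol_id = hashlib.sha1(inchi.encode("utf-8")).hexdigest()[:12]
    return {
        # Picture of the chemical object, to link to or copy.
        "image": urljoin(server, "img/{}.png".format(mol_id)),
        # Semantic chemistry (e.g. CML) stored on the server.
        "semantic": urljoin(server, "mol/{}.cml".format(mol_id)),
    }


if __name__ == "__main__":
    urls = molecule_urls("https://chem.example.org/", "InChI=1S/CH4/h1H4")
    print(urls["image"])
    print(urls["semantic"])
```

Deriving the identifier from the structure means two people drawing the same molecule would land on the same entry, which is what a communal repository needs.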

Note that if the server is unavailable you would still have the local PNG in the normal way. There are many advantages:

  • There would be a communal repository for any molecule. Simply linking to (say) ciclosporin would give you a template (or many templates) which the community had already drawn.
  • The molecule could be linked to other sources such as Wikipedia, Pubchem, etc. Conversely they could link to this resource. We build a communal knowledge base.
  • The server can provide services (e.g. logP from CDK or JOELib).
  • The server would have search facilities (CDK, JOELib, OpenBabel).

and we can all think of many more.
If this excites you – it may need altering – let us know.

5 Responses to Semantic Chemistry in Wikis and Blogs – a proposal

  1. I like the idea very much, especially if people can add additional ‘network’ information like PubChem, KEGG, more meta-data, link-out’s, and-so-on, …
    Best, Joerg

  2. pm286 says:

    (1) thanks for your support. I will try to work out what is easy. We are already building a molecule knowledgebase for crystallography so it should not be too difficult to add this in to the storage. The main thing will be to bolt in the editor. We have to do something along these lines anyway for our own stuff.

  3. Excellent proposal, Peter.
    I think this comes down to what we all want for quite a while already: a decent way to author complex and semantically rich chemistry documents.
    The way by which you want to achieve this is interesting and we would certainly love to be part of such a project. 🙂
    Cheers, Chris

  4. pm286 says:

    (3) Thanks Chris. We’ve started thinking about this by trying images first. There is a problem of transclusion to overcome. If the server is S, and mounts S(a.gif), and A links to this (e.g. A(img src=”S(a.gif)”)), then every load of A’s page hits S. Also, as you have seen from PlanetBO, transclusion of HTML content is a mess (the images get wiped). So I think we have to have copies, with perhaps human-activated links to the server (e.g. A(a.gif) is accompanied by href=S(a.html), which accesses the semantic resource). I’ll write again about this – that should not stop any others also trying.
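
    The copy-plus-link pattern could be sketched roughly as follows. This is only an illustration: `copy_with_link` is a made-up helper and the example address is a placeholder, not a real server.

```python
from html import escape


def copy_with_link(local_image: str, server_page: str, alt: str) -> str:
    """Build HTML that shows A's own copy of the image and links to S.

    The image loads from the blog itself, so a page view never hits the
    server; only a deliberate click follows the link to the semantic page.
    """
    return '<a href="{}"><img src="{}" alt="{}"></a>'.format(
        escape(server_page, quote=True),
        escape(local_image, quote=True),
        escape(alt, quote=True),
    )


if __name__ == "__main__":
    # Placeholder address standing in for S(a.html).
    print(copy_with_link("a.gif", "https://s.example.org/a.html", "molecule a"))
```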

