Useful chemistry thesis in RDF

I shall be using Alicia’s Open Science Thesis in Useful Chemistry as a technical demonstrator at ETD2007. I really want to show how a born-digital thesis is a qualitative step forward: completely new techniques can be used to structure, navigate and mine the information. Here’s a taster:
A chemical reaction diagram (“scheme”) is a graphic object which looks like this:
[Figure: reaction scheme from the thesis, a JPEG image (udc_scheme2.JPG)]
As you can see, this is semantically useless. A lot of work has gone into it, but none of it is useful to a machine (look closely and you’ll see it’s a JPEG). Even in the native software used to draw it, the semantics are unlikely to be easily recoverable. However, XML and RDF allow a complete representation. It took me about an hour to handcraft the topology; with decent tools it would take seconds. The complete set of reaction schemes (I counted 11 in the thesis) can easily be converted to a single RDF file which looks something like this:
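# the example.org namespace URIs below are placeholders, not the real ones
@prefix uc: <http://example.org/usefulChemistry#> .
@prefix pmr: <http://example.org/pmr#> .
uc:scheme1_1 pmr:isA pmr:reactionScheme .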
uc:scheme1_1 pmr:hasA uc:rxn1_1a .
uc:scheme1_1 pmr:hasA uc:rxn1_1b .
uc:rxn1_1a pmr:hasReactant uc:comp1 .
uc:rxn1_1a pmr:hasReactant uc:comp2 .
uc:rxn1_1a pmr:hasReactant uc:comp3 .
uc:rxn1_1a pmr:hasReactant uc:comp4 .
uc:rxn1_1a pmr:hasProduct uc:comp5 .
uc:rxn1_1b pmr:hasReactant uc:comp5 .
uc:rxn1_1b pmr:hasProduct uc:comp6 .
(uc: refers to the usefulChemistry namespace, pmr: to mine).
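A minimal Python sketch of that conversion, assuming the schemes have already been transcribed by hand into a simple dictionary. The `SCHEME` layout and the function names are hypothetical, not the tooling used for this post. It reproduces the ten triples above as plain lines; a complete Turtle file would also need `@prefix` declarations for `uc:` and `pmr:`.

```python
# Hypothetical hand transcription of scheme 1.1; the dict layout is an
# assumption made for illustration only.
SCHEME = {
    "id": "scheme1_1",
    "reactions": {
        "rxn1_1a": {"reactants": ["comp1", "comp2", "comp3", "comp4"],
                    "products": ["comp5"]},
        "rxn1_1b": {"reactants": ["comp5"], "products": ["comp6"]},
    },
}


def scheme_to_triples(scheme):
    """Turn a scheme description into (subject, predicate, object) triples
    using the uc:/pmr: prefixed names from the post."""
    s = "uc:" + scheme["id"]
    triples = [(s, "pmr:isA", "pmr:reactionScheme")]
    # the scheme "has a" link to each of its reactions
    for rxn_id in scheme["reactions"]:
        triples.append((s, "pmr:hasA", "uc:" + rxn_id))
    # each reaction lists its reactants, then its products
    for rxn_id, rxn in scheme["reactions"].items():
        r = "uc:" + rxn_id
        triples += [(r, "pmr:hasReactant", "uc:" + c) for c in rxn["reactants"]]
        triples += [(r, "pmr:hasProduct", "uc:" + c) for c in rxn["products"]]
    return triples


def to_turtle_lines(triples):
    """Serialise each triple as one 'subject predicate object .' line."""
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    for line in to_turtle_lines(scheme_to_triples(SCHEME)):
        print(line)
```

Because Python dictionaries keep insertion order, the output lines come out in the same order as the hand-written listing.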
There are many Open Source tools for graphing this; here is part of the output of one from the W3C:
[Figure: part of the W3C tool’s graph of scheme 1.1 (scheme1.png)]
Here you can see that reaction 1.1a has four reactants (compounds 1, 2, 3 and 4) and one product (compound 5). Compound 5 is the reactant for another reaction, 1.1b (clipped here to avoid display problems on the blog). The complete picture for the whole thesis looks like this:
[Figure: graph of all the reactions in the thesis (reactions1.png)]
and (assuming you have a large screen) you can see immediately which reactions each compound is involved in.
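The same information can be pulled out of the triples without a picture. Below is a minimal plain-Python sketch, assuming the triples are held as (subject, predicate, object) strings; the function names are hypothetical. `involvement` lists the reactions each compound takes part in and its role, and `downstream` follows reactant-to-product links (compound 5 links reaction 1.1a to 1.1b) with a breadth-first search to find everything made from a given compound.

```python
from collections import defaultdict, deque

# Reaction triples from scheme 1.1, as (subject, predicate, object) strings.
TRIPLES = [("uc:rxn1_1a", "pmr:hasReactant", f"uc:comp{i}") for i in range(1, 5)] + [
    ("uc:rxn1_1a", "pmr:hasProduct", "uc:comp5"),
    ("uc:rxn1_1b", "pmr:hasReactant", "uc:comp5"),
    ("uc:rxn1_1b", "pmr:hasProduct", "uc:comp6"),
]

ROLES = {"pmr:hasReactant": "reactant", "pmr:hasProduct": "product"}


def involvement(triples):
    """Map each compound to the (reaction, role) pairs it takes part in."""
    index = defaultdict(list)
    for s, p, o in triples:
        if p in ROLES:
            index[o].append((s, ROLES[p]))
    return dict(index)


def downstream(compound, triples):
    """Breadth-first search for every compound that can be made, directly or
    indirectly, from `compound` by following reactant -> product links."""
    used_in = defaultdict(set)  # compound -> reactions that consume it
    makes = defaultdict(set)    # reaction -> compounds it produces
    for s, p, o in triples:
        if p == "pmr:hasReactant":
            used_in[o].add(s)
        elif p == "pmr:hasProduct":
            makes[s].add(o)
    seen, queue = set(), deque([compound])
    while queue:
        for rxn in used_in[queue.popleft()]:
            for product in makes[rxn]:
                if product not in seen:
                    seen.add(product)
                    queue.append(product)
    return seen


if __name__ == "__main__":
    for compound, uses in involvement(TRIPLES).items():
        print(compound, uses)
    print(sorted(downstream("uc:comp1", TRIPLES)))
```

For the fragment above, compound 5 shows up as the product of 1.1a and a reactant of 1.1b, and everything downstream of compound 1 is compounds 5 and 6.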
That’s only the start: it is possible to ask sophisticated questions of a SPARQL endpoint, and that’s where we are going next…
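As a flavour of the kind of question meant, here is a sketch of a SPARQL-style query: "which reactions consume a product of another reaction?". It is not a SPARQL engine; it is a hypothetical plain-Python matcher for a basic graph pattern (a list of triple patterns where terms starting with `?` are variables), which is the core idea behind a SPARQL `WHERE` clause. A real endpoint would take the query text in the comment (with `PREFIX` declarations added).

```python
# Reaction triples from scheme 1.1, as (subject, predicate, object) strings.
TRIPLES = [("uc:rxn1_1a", "pmr:hasReactant", f"uc:comp{i}") for i in range(1, 5)] + [
    ("uc:rxn1_1a", "pmr:hasProduct", "uc:comp5"),
    ("uc:rxn1_1b", "pmr:hasReactant", "uc:comp5"),
    ("uc:rxn1_1b", "pmr:hasProduct", "uc:comp6"),
]

# Roughly the SPARQL query (PREFIX declarations omitted):
#   SELECT ?r1 ?c ?r2 WHERE { ?r1 pmr:hasProduct ?c . ?r2 pmr:hasReactant ?c . }
PATTERN = [("?r1", "pmr:hasProduct", "?c"), ("?r2", "pmr:hasReactant", "?c")]


def match(pattern, triples, binding=None):
    """Yield every variable binding that satisfies all triple patterns.
    Terms starting with '?' are variables; other terms must match exactly."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for triple in triples:
        new = dict(binding)  # copy so failed matches leave no trace
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # bind a fresh variable, or check it agrees with its binding
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            # solve the remaining patterns under the extended binding
            yield from match(rest, triples, new)


if __name__ == "__main__":
    for row in match(PATTERN, TRIPLES):
        print(row)
```

On this fragment the only answer binds `?r1` to reaction 1.1a, `?c` to compound 5 and `?r2` to reaction 1.1b, which is exactly the link visible in the graph.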
… IFF you make the theses true Open Access

This entry was posted in chemistry, etd2007, open issues, XML.

3 Responses to Useful chemistry thesis in RDF

  1. For some reason I am a big fan of ‘graph mining’ 😉
    Joerg

  2. pm286 says:

    (1) Good – I’ll be showing some RDF tricks soon I hope

  3. Pingback: Unilever Centre for Molecular Informatics, Cambridge - Staudinger’s Semantic Molecules » Blog Archive » Polymer Theses, Polymer Data and a Common Language.
