Microformats in the chemical blogosphere – the Chemical Semantic Web has arrived?

One of my readers writes privately…

Too many acronyms for my poor head in [blog] world. I am beginning to see this as a series of rocks in a swirling sea of T- and F-LAs. People on the solid rocks know where they are and what they’re trying to do; the rest are carried around on the acronymic currents, and continually changing the way they face. Confusing for the rest of us….

Yes.
This gives a true flavour of what I am being bombarded with – a screenful of acronyms, parsers, etc. for GRDDL. The main point of the blog is to leave pointers for the Blue Obelisk community.
The simple message is that this stuff is very powerful if the community wants to use it. I think the chemical blogosphere will. I talked with Dan Connolly and Harry Halpin at tea. Essentially we need a microformat vocabulary of no more than 20 terms (see hCard, etc.). Some time ago Henry and I proposed Dublin Chem, which has concepts such as “does this document talk about substances?”, “are there any calculations?”, etc. So we could write:
<span class="bo:calculation">MOPAC</span>
which is a microformat saying that the document has something to do with the tag “calculation” as defined by the Blue Obelisk community. There is a lot of very clever magic (profiles) which can be added to the top of the HTML or XML document. There are also lots of very clever tools that already exist to process this. All the stuff is in the tutorial material.
I think this stuff is actually easier than adding InChIs to chemical documents. If we use it then the chemical blogosphere becomes extremely powerful. We can then ask questions like:
“how do I make trityl azide”?
and the GRDDL/SPARQL tool will search the chemical blogosphere for
<span class="bo:preparation">trityl azide</span>
(and of course the InChI could also be used)
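
To make the idea concrete, here is a minimal sketch in Python of the simplest possible version of that search. It is not GRDDL or SPARQL: it only scans HTML for spans whose class is a bo: term, using the standard library. The class name BlueObeliskSpanParser, the function find_tagged and the sample post are my own illustrations, not part of any existing tool.

```python
from html.parser import HTMLParser


class BlueObeliskSpanParser(HTMLParser):
    """Collect (term, text) pairs from <span class="bo:..."> elements."""

    def __init__(self):
        super().__init__()
        self.found = []    # (term, text) pairs, in closing order
        self._terms = []   # one entry per open <span>: its bo: term or None
        self._texts = []   # text buffers, parallel to _terms

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        term = next((c for c in classes if c.startswith("bo:")), None)
        self._terms.append(term)
        self._texts.append([])

    def handle_data(self, data):
        # Text belongs to every span that is currently open.
        for buf in self._texts:
            buf.append(data)

    def handle_endtag(self, tag):
        if tag != "span" or not self._terms:
            return
        term = self._terms.pop()
        text = "".join(self._texts.pop()).strip()
        if term is not None:
            self.found.append((term, text))


def find_tagged(html, term):
    """Return the text of every span tagged with the given bo: term."""
    parser = BlueObeliskSpanParser()
    parser.feed(html)
    parser.close()
    return [text for t, text in parser.found if t == term]


# Hypothetical blog post fragment, made up for illustration.
POST = (
    '<p>We made <span class="bo:preparation">trityl azide</span> and ran '
    '<span class="bo:calculation">MOPAC</span> on it.</p>'
)

print(find_tagged(POST, "bo:preparation"))
```

A real GRDDL pipeline would instead turn each tagged span into RDF and leave the querying to SPARQL; the sketch only shows how little markup is needed for a machine to find the right posts.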
How do we know what the “chemical blogosphere” is? We use FOAF (Friend of a Friend) tools. We can define ourselves as friends in the chemical blogosphere and search this bounded set.
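
As a rough sketch of what “bounded set” could mean, assume the foaf:knows links have already been harvested into a plain Python dictionary. The function bounded_set, the max_hops limit and the example.org blog names are all assumptions of mine; real FOAF tools would read the links from RDF.

```python
from collections import deque


def bounded_set(knows, seed, max_hops=2):
    """Everyone reachable from seed in at most max_hops foaf:knows steps."""
    hops = {seed: 0}
    queue = deque([seed])
    while queue:
        person = queue.popleft()
        if hops[person] == max_hops:
            continue  # do not walk further out than the limit
        for friend in knows.get(person, ()):
            if friend not in hops:
                hops[friend] = hops[person] + 1
                queue.append(friend)
    return set(hops)


# Hypothetical foaf:knows links between blogs, made up for illustration.
KNOWS = {
    "http://example.org/blogA": ["http://example.org/blogB"],
    "http://example.org/blogB": ["http://example.org/blogC"],
    "http://example.org/blogC": ["http://example.org/blogD"],
}

print(sorted(bounded_set(KNOWS, "http://example.org/blogA", max_hops=2)))
```

The hop limit is what keeps the set bounded: friends of friends are in, but the walk stops before it wanders across the whole Web.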
So here are some of Dan’s slides that inform our direction:

  1. Everything should have a URI: All entities of interest should be identified by URIs.
  2. Follow Your Nose Principle: URIs should be dereference-able, meaning that an application can look up a URI over the HTTP protocol and retrieve RDF data.
  3. Use standard formats: Data should be provided using the RDF/XML or Turtle syntax. If data is embedded using a format like Microformats, then these documents should include links to automatically extract RDF data from them, à la GRDDL.
  4. Link Your Data: Resource descriptions should contain links to related information in the form of dereference-able URIs within RDF statements and rdfs:seeAlso links.

and the most exciting vision:

The Web as One Big Mashup

Follow your nose and query the whole Web
For each triple pattern, the library executes the following algorithm:

  1. look up URIs that appear in the triple pattern. Add retrieved graphs to the local graph set.
  2. look up any URI y where the graph set includes the triple { x rdfs:seeAlso y } and x is a URI from the triple pattern. Add retrieved graphs to the local graph set.
  3. match the triple pattern against all graphs in the local graph set.
  4. for each triple that matches the triple pattern
    1. look up all new URIs that appear in the triple. Add retrieved graphs to the local graph set.
    2. look up any new URI y where the graph set includes the triple { x rdfs:seeAlso y } and x is a URI from a matching triple. Add retrieved graphs to the local graph set.
  5. match the triple pattern against all newly retrieved graphs.
  6. repeat steps 4 and 5 until the maximum number of retrieval steps or the timeout is reached.
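
The steps above can be sketched in a few lines of Python. This is only my reading of the slide, not the library's actual code: the “Web” is an in-memory dictionary from URI to a set of triples, None is a wildcard in the triple pattern, and a lookup budget (max_lookups) stands in for the retrieval limit and timeout. The names follow_your_nose, matches and the ex: URIs are made up for illustration.

```python
SEE_ALSO = "rdfs:seeAlso"


def matches(pattern, triple):
    """None in the pattern is a wildcard; other terms must be equal."""
    return all(p is None or p == t for p, t in zip(pattern, triple))


def follow_your_nose(web, pattern, max_lookups=20):
    graphs = {}              # local graph set: URI -> set of triples
    remaining = max_lookups  # crude stand-in for the retrieval limit

    def look_up(uri):
        nonlocal remaining
        if uri in graphs or uri not in web or remaining <= 0:
            return
        remaining -= 1
        graphs[uri] = web[uri]

    def all_triples():
        return {t for g in graphs.values() for t in g}

    def expand(terms):
        for term in terms:                 # steps 1 and 4.1
            look_up(term)
        for s, p, o in all_triples():      # steps 2 and 4.2
            if p == SEE_ALSO and s in terms:
                look_up(o)

    expand({t for t in pattern if t is not None})
    results = set()
    while True:
        # steps 3 and 5: match against everything retrieved so far
        new = {t for t in all_triples() if matches(pattern, t)} - results
        if not new:
            break
        results |= new
        before = len(graphs)
        for triple in new:                 # step 4
            expand(set(triple))
        if len(graphs) == before:          # nothing new retrieved: stop
            break
    return results


# Hypothetical documents, made up for illustration.
WEB = {
    "ex:trityl_azide": {("ex:trityl_azide", SEE_ALSO, "ex:blogA")},
    "ex:blogA": {
        ("ex:post1", "bo:preparation", "ex:trityl_azide"),
        ("ex:post1", SEE_ALSO, "ex:blogB"),
    },
    "ex:blogB": {("ex:post2", "bo:preparation", "ex:trityl_azide")},
}

PATTERN = (None, "bo:preparation", "ex:trityl_azide")
print(sorted(follow_your_nose(WEB, PATTERN)))
```

Starting from nothing but the pattern, the query reaches ex:blogB only because a matching triple in ex:blogA pointed to it with rdfs:seeAlso, which is the whole point of linking your data.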

So, “the chemical web as one big mashup”? I think we can.
(Check out the “Tabulator” which is a sort of RDF browser for the web.)

