chem:microformats - what questions would YOU like?

Many years ago Henry Rzepa and I discussed the idea of extending Dublin Core to chemistry and we called it Dublin-Chem. The "Dublin" is Dublin, Ohio, home of OCLC. We discussed this with Stuart Weibel [OCLC] - the DC guru - and it seemed a reasonable approach. An early publication (ca. 1999) listed 11 primary tags (although I thought there were more):

Table 2. A Chemical Metadata Schema
Element Name                      Description of the element                                  Deployment in HTML 4.0
HEAD                              Specifies the location of a meta data profile.
DC.chem.coordinates               Molecular coordinates
DC.chem.substance.formula         Formula constitution
DC.chem.substance.smiles          Connection table for molecule
DC.chem.computation-simulation    Presence of computed or simulated property
DC.chem.biological-activity       Biological activity Type of chemical safety information
DC.chem.characterisation          Characterisation mode of molecule
DC.chem.instrumentation           Associated instrumentation
DC.chem.physicochemical-data      Molecular properties
DC.chem.reaction-data             Reaction classification
DC.chem.crystallography           Crystallographic information

We'd like to put these into the chem:* microformat pool. It's probably a good idea to remove the hierarchy (e.g. chem:formula rather than DC.chem.substance.formula) and some of the verbosity (e.g. chem:reaction rather than DC.chem.reaction-data).
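As a rough illustration of that flattening, here is a minimal Python sketch. The function name flatten_tag and the rule it applies (keep the last dotted segment, then drop anything after the first hyphen) are assumptions made for this example, not an agreed chem:* convention; the rule happens to reproduce chem:formula and chem:reaction, but it would need human review for tags such as DC.chem.biological-activity.

```python
def flatten_tag(dc_name: str) -> str:
    """Map a Dublin-Chem element name to a short chem:* tag.

    Hypothetical rule, for illustration only:
      1. strip the "DC.chem." prefix,
      2. keep only the last dotted segment (removes the hierarchy),
      3. keep only the part before the first hyphen (removes verbosity).
    """
    prefix = "DC.chem."
    if not dc_name.startswith(prefix):
        raise ValueError(f"not a Dublin-Chem element name: {dc_name!r}")
    leaf = dc_name[len(prefix):].split(".")[-1]
    short = leaf.split("-")[0]
    return f"chem:{short}"


if __name__ == "__main__":
    for name in ("DC.chem.substance.formula", "DC.chem.reaction-data"):
        print(name, "->", flatten_tag(name))
```

A lookup table maintained by hand would probably be safer than a mechanical rule, since the short names are what people will actually type.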

I have talked with a future Open collaborator who is keen to try these ideas out on the chemical blogosphere. We calculated that the current blogosphere might contain ca. 1 million triples - this is not a serious problem at this stage - but 3 orders of magnitude more might require more engineering.

So how many tags have we got, and how many might we want? Maybe a good start is to think of hypothetical queries (aimed at present at the blogosphere, but potentially over a much wider set of documents). At present let's assume that there are no synonyms and no numeric computation. Some suggestions:

  • Find posts after [date] with mention of patents from GSK.
  • What posted syntheses mention DCM?
  • Find posted reviews of syntheses which involve author X.

Note that not everything has to be done in chem:* - we can probably rely on dates, bibliography etc. coming from elsewhere.
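To make the first two hypothetical queries concrete, here is a small Python sketch over an in-memory list of (post, predicate, value) triples. Everything in it is an assumption for illustration: the post identifiers, the predicate names (chem:patent, chem:type, chem:mentions) and the data are invented, and dc:date stands in for a date that comes from outside chem:*. Matching is exact string equality, in line with the "no synonyms" assumption above.

```python
from datetime import date

# Invented triples: (post, predicate, value). None of this is real data.
TRIPLES = [
    ("post1", "dc:date", "2007-01-10"),
    ("post1", "chem:patent", "GSK"),
    ("post2", "dc:date", "2006-11-02"),
    ("post2", "chem:patent", "GSK"),
    ("post3", "dc:date", "2007-02-01"),
    ("post3", "chem:type", "synthesis"),
    ("post3", "chem:mentions", "DCM"),
    ("post4", "dc:date", "2007-02-05"),
    ("post4", "chem:type", "synthesis"),
    ("post4", "chem:mentions", "THF"),
]


def posts_with(triples, predicate, value):
    """Posts carrying exactly this predicate/value pair (no synonyms)."""
    return {s for s, p, o in triples if p == predicate and o == value}


def posts_after(triples, cutoff):
    """Posts whose dc:date (ISO format) is strictly later than cutoff."""
    return {
        s for s, p, o in triples
        if p == "dc:date" and date.fromisoformat(o) > cutoff
    }


def gsk_patents_after(triples, cutoff):
    # "Find posts after [date] with mention of patents from GSK"
    return posts_after(triples, cutoff) & posts_with(triples, "chem:patent", "GSK")


def syntheses_mentioning(triples, compound):
    # "What posted syntheses mention DCM?"
    return (posts_with(triples, "chem:type", "synthesis")
            & posts_with(triples, "chem:mentions", compound))


if __name__ == "__main__":
    print(gsk_patents_after(TRIPLES, date(2007, 1, 1)))
    print(syntheses_mentioning(TRIPLES, "DCM"))
```

Each query is just an intersection of sets of posts, which suggests that a handful of well-chosen tags may go a long way before anything cleverer is needed.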

3 Responses to chem:microformats - what questions would YOU like?

  1. I read Egon's suggestion a few months back, and have been waiting for some progress in this area. I think I am much like a blogger you made reference to earlier, in that I am very much confused as to what I should do regarding microformats in the blog I am writing. I have placed InChIs and DOIs in the text - just what should I do now with this microformatting business?

    My hunch is that I should just wait for some of this to simply mature - but the longer I wait, the more work it will be to go back to earlier materials and add this stuff in.

    Any advice?

  2. Jim Downing says:

    Why will 3 orders of magnitude more triples need more engineering? Are you assuming that the only way of getting value is by aggregating all the triples into a single triple-store?

    I suspect the real question is how we structure sources of chem:triples so that we don't have to aggregate them all together to do useful things, and that this is worth thinking about up front.

  3. pm286 says:

    (2) Jim - ignore what I wrote. The world is much bigger and I will post about this in an hour or so

