chem:microformats – what questions would YOU like?

Many years ago Henry Rzepa and I discussed the idea of extending Dublin Core to chemistry and we called it Dublin-Chem. The “Dublin” is Dublin, Ohio, home of OCLC. We discussed this with Stuart Weibel [OCLC] – the DC guru – and it seemed a reasonable approach. An early publication (ca. 1999) listed 11 primary tags (although I thought there were more):

Table 2. A Chemical Metadata Schema
Element Name	Description of the element	Deployment in HTML 4.0
HEAD	Specifies the location of a meta data profile.
DC.chem.coordinates	Molecular coordinates
DC.chem.substance.formula	Formula constitution
DC.chem.substance.smiles	Connection table for molecule
DC.chem.computation-simulation	Presence of computed or simulated property
DC.chem.biological-activity	Biological activity
DC.chem.safety	Type of chemical safety information
DC.chem.characterisation	Characterisation mode of molecule
DC.chem.instrumentation	Associated instrumentation
DC.chem.physicochemical-data	Molecular properties
DC.chem.reaction-data	Reaction classification
DC.chem.crystallography	Crystallographic information

We’d like to put these into the chem:* microformat pool. It’s probably a good idea to remove the hierarchy (e.g. DC.chem.substance.formula becomes chem:formula) and some of the verbosity (e.g. DC.chem.reaction-data becomes chem:reaction).
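As a rough sketch of what that flattening might look like, here is one possible rule in Python: keep only the last segment of the Dublin-Chem name and drop a trailing "-data". The function name `flatten` and the rule itself are my assumptions for illustration; only the two mappings shown in the comments come from the examples above, and the remaining tags would still need deciding by hand.

```python
def flatten(dc_name: str) -> str:
    """Map a Dublin-Chem element name to a flat chem:* name.

    Hypothetical rule (an assumption, not an agreed convention):
    keep the last dot-separated segment and drop a trailing "-data".
    """
    leaf = dc_name.split(".")[-1]      # remove the DC.chem.(substance.) hierarchy
    if leaf.endswith("-data"):         # remove some of the verbosity
        leaf = leaf[: -len("-data")]
    return "chem:" + leaf


# The two examples mentioned in the post:
print(flatten("DC.chem.substance.formula"))  # chem:formula
print(flatten("DC.chem.reaction-data"))      # chem:reaction
```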
I have talked with a future Open collaborator who is keen to try these ideas out on the chemical blogosphere. We calculated that the current blogosphere might contain ca. 1 million triples. This is not a serious problem at this stage, though 3 orders of magnitude more might require more engineering.
So how many tags have we got, and how many might we want? Maybe a good start is to think of hypothetical queries (aimed at present at the blogosphere, but potentially over a much wider set of documents). At present let’s assume that there are no synonyms and no numeric computation. Some suggestions:

Find posts after [date] that mention patents from GSK.
Which posted syntheses mention DCM?
Find posted reviews of syntheses which involve author X.

Note that not everything has to be done in chem:* – we can probably rely on dates, bibliography etc. coming from elsewhere.
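To make the queries above a little more concrete, here is a minimal sketch in Python that treats harvested metadata as plain (subject, predicate, object) triples and answers them by exact matching, in line with the "no synonyms, no numeric computation" assumption. Everything in it is hypothetical: the predicate names (`chem:mentions`, `dc:date`, `rdf:type` values) and the three sample posts are made up for illustration, and a real system would presumably use a proper triple store.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Made-up sample triples; none of these posts or predicates are real.
TRIPLES = [
    ("post:1", "rdf:type", "chem:synthesis"),
    ("post:1", "chem:mentions", "DCM"),
    ("post:1", "dc:date", "2007-05-01"),
    ("post:2", "rdf:type", "chem:synthesis"),
    ("post:2", "chem:mentions", "THF"),
    ("post:2", "dc:date", "2007-04-12"),
    ("post:3", "rdf:type", "chem:patent"),
    ("post:3", "chem:mentions", "DCM"),
    ("post:3", "dc:date", "2007-05-08"),
]


def subjects(triples: Iterable[Triple], predicate: str, obj: str) -> Set[str]:
    """Return all subjects that have exactly this predicate/object pair."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def syntheses_mentioning(triples: Iterable[Triple], term: str) -> Set[str]:
    """'Which posted syntheses mention X?' by exact string match only."""
    triples = list(triples)
    return subjects(triples, "rdf:type", "chem:synthesis") & subjects(
        triples, "chem:mentions", term
    )


def posted_after(triples: Iterable[Triple], iso_date: str) -> Set[str]:
    """'Find posts after [date]'; ISO dates compare correctly as strings."""
    return {s for s, p, o in triples if p == "dc:date" and o > iso_date}


print(syntheses_mentioning(TRIPLES, "DCM"))  # {'post:1'}
print(posted_after(TRIPLES, "2007-05-01"))   # {'post:3'}
```

Note that the date comes from a Dublin Core style predicate rather than from chem:*, and that a search for "dichloromethane" would find nothing here, which is exactly the cost of assuming no synonyms.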

This entry was posted in semanticWeb.

3 Responses to chem:microformats – what questions would YOU like?

Steven Bachrach says:

May 9, 2007 at 9:41 pm

I read Egon’s suggestion a few months back, and have been waiting for some progress in this area. I think I am much like a blogger you made reference to earlier, in that I am very much confused as to what I should do regarding microformats in the blog I am writing. I have placed InChIs and DOIs in the text – just what should I do now with this microformatting business?
My hunch is that I should just wait for some of this to simply mature – but the longer I wait, the more work it will be to go back to earlier materials and add this stuff in.
Any advice?

Jim Downing says:

May 10, 2007 at 2:29 pm

Why will 3 orders of magnitude more triples need more engineering? Are you assuming that the only way of getting value is by aggregating all the triples into a single triple-store?
I suspect the real question is how we structure sources of chem:triples so that we don’t have to aggregate them all together to do useful things, and that this is worth thinking about up front.

pm286 says:

May 10, 2007 at 2:42 pm

(2) Jim – ignore what I wrote. The world is much bigger and I will post about this in an hour or so
