Many years ago Henry Rzepa and I discussed the idea of extending Dublin Core to chemistry, and we called it Dublin-Chem. The "Dublin" is Dublin, Ohio, home of OCLC. We discussed this with Stuart Weibel [OCLC], the DC guru, and it seemed a reasonable approach. An early publication (ca. 1999) listed 11 primary tags (although I thought there were more):
Table 2. A Chemical Metadata Schema
| Element Name | Description of the element | Deployment in HTML 4.0 |
| HEAD | Specifies the location of a meta data profile. | |
| DC.chem.substance.smiles | Connection table for molecule | |
| DC.chem.computation-simulation | Presence of computed or simulated property | |
| DC.chem.safety | Type of chemical safety information | |
| DC.chem.characterisation | Characterisation mode of molecule | |
We'd like to put these into the chem:* microformat pool. It's probably a good idea to remove the hierarchy (e.g. chem:formula) and some of the verbosity (e.g. chem:reaction).
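As a rough illustration of removing the hierarchy, here is a minimal Python sketch that maps a Dublin-Chem style name onto a flat chem:* name by keeping only the last dotted segment. The function name `flatten_tag` and the "keep the last segment" rule are my assumptions, not an agreed convention, and the sketch does nothing about verbosity (shortening long names would still need a hand-made mapping).

```python
def flatten_tag(dc_name: str, prefix: str = "chem") -> str:
    """Map a hierarchical name such as 'DC.chem.substance.smiles'
    to a flat one such as 'chem:smiles'.

    Assumption: the flat tag is simply the last dotted segment.
    """
    parts = dc_name.split(".")
    # Expect at least 'DC', 'chem' and one further segment.
    if len(parts) < 3 or [p.lower() for p in parts[:2]] != ["dc", "chem"]:
        raise ValueError(f"not a DC.chem.* name: {dc_name!r}")
    return f"{prefix}:{parts[-1]}"


for name in ("DC.chem.substance.smiles", "DC.chem.safety"):
    print(name, "->", flatten_tag(name))
```

A name that sits outside the DC.chem.* pattern raises an error rather than being guessed at.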
I have talked with a future Open collaborator who is keen to try these ideas out on the chemical blogosphere. We calculated that the current blogosphere might contain ca. 1 million triples. That is not a serious problem at this stage; 3 orders of magnitude more might require more engineering.
So how many tags have we got, and how many might we want? Maybe a good start is to think of hypothetical queries (aimed at present at the blogosphere, but potentially over a much wider set of documents). At present let's assume that there are no synonyms and no numeric computation. Some suggestions:
- Find posts after [date] with mention of patents from GSK
- Which posted syntheses mention DCM?
- Find posted reviews of syntheses which involve author X.
Note that not everything has to be done in chem:* - we can probably rely on dates, bibliography etc. coming from elsewhere.
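To make the hypothetical queries a little more concrete, here is a minimal Python sketch over a handful of in-memory triples. Everything in it is an assumption for illustration: the predicate names (`chem:type`, `chem:mentions`, `chem:patent-assignee`, `dc:date`), the post identifiers and the data are invented, and a real system would use a proper triple store. Matching is exact string equality, in line with the "no synonyms" assumption, and the date is taken from a non-chem predicate.

```python
from datetime import date

# Invented example triples: (subject, predicate, object).
TRIPLES = [
    ("post1", "dc:date", date(2007, 3, 1)),
    ("post1", "chem:type", "synthesis"),
    ("post1", "chem:mentions", "DCM"),
    ("post2", "dc:date", date(2007, 5, 10)),
    ("post2", "chem:patent-assignee", "GSK"),
    ("post3", "dc:date", date(2006, 1, 15)),
    ("post3", "chem:type", "synthesis"),
    ("post3", "chem:mentions", "dichloromethane"),
    ("post3", "chem:patent-assignee", "GSK"),
]


def subjects(triples, predicate, test):
    """Return the subjects having `predicate` with an object passing `test`."""
    return {s for s, p, o in triples if p == predicate and test(o)}


def syntheses_mentioning(triples, compound):
    """Which posted syntheses mention `compound` (exact match, no synonyms)?"""
    is_synthesis = subjects(triples, "chem:type", lambda o: o == "synthesis")
    mentions = subjects(triples, "chem:mentions", lambda o: o == compound)
    return is_synthesis & mentions


def posts_after_with_patents_from(triples, cutoff, company):
    """Find posts dated after `cutoff` that mention patents from `company`."""
    recent = subjects(triples, "dc:date", lambda o: o > cutoff)
    patents = subjects(triples, "chem:patent-assignee", lambda o: o == company)
    return recent & patents


print(syntheses_mentioning(TRIPLES, "DCM"))
print(posts_after_with_patents_from(TRIPLES, date(2007, 1, 1), "GSK"))
```

Note that post3 is not returned for DCM because it says "dichloromethane"; that is exactly the cost of assuming no synonyms. Each query is just an intersection of subject sets, which is comfortable at a million triples held in memory but is the sort of thing that would need more engineering at a much larger scale.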