Centralisation or Federation?

  1. Greg Pearl Says:
    April 25th, 2007 at 1:08 pm
    So is the World Wide Molecular Matrix planning on collating data? All I was able to find was chemical structure generation. If you are planning on collating the data, ACD/Labs would be happy to also provide the WWMM with logP prediction capabilities, as we have done with Chemspider and eMolecules.

    [...] So here is a quick suggestion: maybe we should focus on assisting the efforts to curate the world’s chemical information, as opposed to simply pointing out flaws in Beta software that is making the first pass at collating the data, which from my experience is the first step required in eventually curating the data. So I would like to personally thank PubChem, eMolecules and Chemspider for making the first attempts at collecting all the chemical data available. FYI, when I was examining the Chemspider website it appeared that anyone can provide feedback, so we could constructively suggest that Chemspider add a new field that would allow users to curate the data, in the same vein as Wikipedia. FYI, the Chemspider substructure search for calcium carbonate does provide a link to Wikipedia.

=== PMR ===

The WWMM is a federated model while Chemspider is a centralised one. The WWMM is based on developing an agreed set of metadata for discovery and re-use, drawing on Dublin Core and, more recently, METS (Metadata Encoding and Transmission Standard) and the newly developed OAI-ORE (Open Archives Initiative Object Reuse and Exchange) protocol for repositories. The federated model is based on sharing data and not duplicating entities, while the centralised model is based on duplicating and possessing data.

The world is moving towards federated models where search engines visit repositories and extract what they wish. There is little point in copying data - the modern approach is to take the software to the data. This relies on the data being public and Open - which PubChem, WWMM and similar sites are. In WWMM we are developing a number of repositories where the data are new and unique to the repository and where the links will be passed to PubChem for indexing and searching (there will also be local search facilities).

OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting, v2.0) and OAI-ORE provide interfaces which allow search engines to extract metadata and data from repositories without human intervention. I am an advisor to the ORE project and will be meeting with the architects tomorrow - more later.
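
To make the harvesting idea concrete, here is a minimal sketch in Python of what an indexer might do with an OAI-PMH repository: build a `ListRecords` request and pull Dublin Core fields out of the response, keeping only links back to the repository and not copies of the data. The verb, the `oai_dc` metadata prefix and the XML namespaces come from the OAI-PMH 2.0 specification; the repository address `repository.example.org`, the sample record and the function names are hypothetical, invented for illustration, and do not describe how WWMM or PubChem actually work. The response is embedded as a string so the sketch runs without network access.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build the URL a harvester would request to list a repository's records."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query


# Hypothetical response from an imaginary repository (assumption, not real data).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.org:mol-1</identifier>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>calcium carbonate</dc:title>
          <dc:identifier>http://repository.example.org/mol-1</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def harvest(xml_text):
    """Extract an index entry (OAI id, title, link) for every record in a response."""
    root = ET.fromstring(xml_text)
    entries = []
    for record in root.iter(OAI + "record"):
        entries.append({
            "oai_id": record.findtext(OAI + "header/" + OAI + "identifier"),
            "title": record.findtext(".//" + DC + "title"),
            # The index stores a link to the repository, not a copy of the data.
            "link": record.findtext(".//" + DC + "identifier"),
        })
    return entries


if __name__ == "__main__":
    print(list_records_url("http://repository.example.org/oai"))
    for entry in harvest(SAMPLE_RESPONSE):
        print(entry)
```

A real harvester would fetch the URL over HTTP and follow the `resumptionToken` that OAI-PMH uses for paging through large result sets; both are left out here to keep the sketch short.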
