I'm at a session at WWW 2007 on Linking Data - which I think will be enormously important for us. Something I had never heard of before (it's new this year) is DBpedia (DBP):
it scrapes 750,000 infoboxes from Wikipedia (WP) and turns them into structured RDF. The message is simple - the more implicit structure there is in WP, the easier it is for DBP to extract it. If there is a template for a given category (e.g. chemical compounds) then we can easily create an interface to extract structured RDF. For example DBP now has:
58,000 persons
75,000 YAGO categories
207,000 WP categories
and I am sure it will be relatively easy to extract the chemistry (Martin, how many compounds are there with infoboxes?)
DBP has a SPARQL endpoint on an OpenLink Virtuoso server (I am sitting next to these guys). A typical query:
"All German musicians born in Berlin in 19th Century"
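A minimal sketch of how a query like this might be sent to a SPARQL endpoint from Python, using only the standard library. The endpoint URL and the class and property names in the query are assumptions made for illustration (the talk did not show them), and the query only covers "musician, born in Berlin, in the 19th century"; nationality is left out.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public DBpedia endpoint URL; it is not given in the post.
ENDPOINT = "http://dbpedia.org/sparql"

# Assumption: these class and property names are illustrative guesses at how
# DBpedia might model musicians, birthplaces and birth dates.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?person ?born WHERE {
  ?person a dbo:MusicalArtist ;
          dbo:birthPlace dbr:Berlin ;
          dbo:birthDate ?born .
  FILTER (?born >= "1801-01-01"^^xsd:date && ?born < "1901-01-01"^^xsd:date)
}
LIMIT 20
"""


def build_request(endpoint, query):
    """Build a SPARQL-protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint, query):
    """Send the query and return the list of result bindings (needs network)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as resp:
        data = json.load(resp)
    return data["results"]["bindings"]
```

Calling `run_query(ENDPOINT, QUERY)` should give one dictionary per match, with the URI under `row["person"]["value"]`, provided the endpoint is reachable and the assumed vocabulary matches what it actually uses.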
Key components are:
- All concepts are identified by URIs
- All URIs are dereferenceable over the web into a small RDF snippet.
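To make "dereferenceable" concrete, here is a rough sketch of fetching the RDF behind a concept URI by asking for RDF/XML through HTTP content negotiation. The example URI is an assumption (a DBpedia-style resource name for Berlin, not taken from the talk), and whether a given server honours the `Accept` header is up to that server.

```python
import urllib.request

# Assumption: a DBpedia-style resource URI, used purely for illustration.
EXAMPLE_URI = "http://dbpedia.org/resource/Berlin"


def rdf_request(uri):
    """Build a request that asks the server for RDF/XML rather than HTML."""
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


def fetch_rdf(uri):
    """Dereference the URI and return the RDF document as text (needs network)."""
    with urllib.request.urlopen(rdf_request(uri), timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

`fetch_rdf(EXAMPLE_URI)` would then return the small RDF snippet describing that concept, which any RDF parser could load.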
The fantastic thing is that we now have a complete RDF resource FOR FREE. One example which was shown was "von Baeyer", so whenever we refer to him we get his date of birth, history, probably even his FOAFs! DBP is becoming one of the central information hubs of the emerging web of data.
In that way DBP can become the "popular" chemical hub, while Pubchem-RDF will become the "specialist" chemical hub. Of course they will be linked and possibly even indistinguishable in some RDF snippets.
The queries are fantastic:
"A soccer player with #11 shirt in a club with a stadium of over 40,000 seats born in a country with over 10 M inhabitants"
Let's think what the Blue Obelisk will be able to do for chemistry. TBL has said we can lash/mash things up "in an afternoon". I am going to find out today what we can do with the chemistry we have got.
The other RDF resources in the same web are books, US census, geonames, CIA factbook, DBLP, dbtune, FOAF, Revyu.
600 RDF triples. This is staggering. 100K links out of DBpedia.
And then in 2 months: music, Gutenberg, SW-lifesci, flickr, eurostat, freebase, HTMLweb GRDDL, blogosphere (SIOC), MusicBrainz...
So - let's do dbchem...!!! There is still a lot for me to learn. There are starting to be several large hubs of links. Which one is the hub for a community will depend on what they want and what they create.