I have blogged about the exciting potential of DBPedia before (dbpedia – structured information from Wikipedia => dbchem, http://wwmm.ch.cam.ac.uk/blogs//?p=316). It is a semistructured RDF triple collection created automatically from Wikipedia. The really exciting thing is that huge numbers of WPedians have contributed to DBPedia without even knowing it. Simply by evolving simple community metadata (tagging and infoboxes) the WPedians have created a top-class semantic resource. A WP category of, say, “1997 deaths” gets translated to a triple which says, roughly, that the object with label “Diana” has a “deathDate” property with value “1997”, which is of type date.
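As a minimal sketch of the kind of triple described above, a subject-predicate-object statement with a typed literal could be modelled in Python as follows. The `example.org` URIs are illustrative assumptions, not DBPedia's actual identifiers, and the year-only datatype `xsd:gYear` is my choice because the category carries only a year.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """A typed RDF literal: a lexical value plus a datatype URI."""
    value: str
    datatype: str


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: Literal


# Hypothetical URIs, chosen only for illustration.
EX = "http://example.org/"
XSD_GYEAR = "http://www.w3.org/2001/XMLSchema#gYear"


def to_ntriples(t: Triple) -> str:
    """Serialise the triple in an N-Triples-like form."""
    return f'<{t.subject}> <{t.predicate}> "{t.obj.value}"^^<{t.obj.datatype}> .'


# The "1997 deaths" category, recast as a statement about one resource.
diana_death = Triple(
    subject=EX + "resource/Diana",
    predicate=EX + "property/deathDate",
    obj=Literal("1997", XSD_GYEAR),
)

print(to_ntriples(diana_death))
```

The point of the typed literal is that a consumer can treat “1997” as a date rather than as an arbitrary string, which is what makes the category queryable like a database field.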
Now the OKFN has blogged:
DBpedia recently released the new version of their dataset. The project aims to extract structured information from Wikipedia so that this can be queried like a database. On their blog they say:
The renewed DBpedia dataset describes 1,950,000 “things”, including at least 80,000 persons, 70,000 places, 35,000 music albums, 12,000 films. It contains 657,000 links to images, 1,600,000 links to relevant external web pages and 440,000 external links into other RDF datasets. Altogether, the DBpedia dataset now consists of around 103 million RDF triples.
As well as improving the quality of the data, the new release includes coordinates for geographical locations and a new classificatory schema based on Wordnet synonym sets. It is also extensively linked with many other open datasets, including: “Geonames, Musicbrainz, WordNet, World Factbook, EuroStat, Book Mashup, DBLP Bibliography and Project Gutenberg datasets”. This is probably one of the largest open data projects currently out there – and it looks like they have done an excellent job at integrating structured data from Wikipedia with data from other sources. (For more on this see the W3C SWEO Linking Open Data project – which exists precisely in order to link more or less open datasets together.)
“So I predict that within a few years DBPedia will become the semantic resource for chemistry.”
Very informative. Who are their angel investors?
I always want to be part of web2.0 finance-related projects!
(1) I meant to qualify this as “general reference chemistry”. It may not be the actual DBPedia group at present, but it will be a semantic transformation of social computing.
Wikifying chemistry…
There has been some interest in using wikis to annotate molecules: e.g. the ChemSpider blog where Antony is interested in using a local wiki to annotate entries, chem-bla-ics where Egon is trying to e……
I am always getting excited when reading about ‘structured data’
And I need help …
http://miningdrugs.blogspot.com/2007/09/wikipedia-dbpedia-and-remaining.html
(5) We are now starting to extract semantic data from theses (we can do this as we have the rights). This is necessarily semi-structured and is a good example of the sort of thing that is possible. A typical query is “which theses have compounds common to both”. Relatively easy to express in SPARQL.
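To give a feel for the sort of query meant here, the sketch below shows one possible SPARQL phrasing (kept as a string) next to an equivalent pure-Python lookup over a tiny in-memory triple set. Everything in it is an assumption for illustration: the `ex:` prefix, the `ex:mentionsCompound` property, the thesis identifiers and the toy data are made up and are not our actual schema. It reads the query as “which compounds occur in both of two given theses”.

```python
# One possible SPARQL phrasing, using a hypothetical vocabulary.
SPARQL_QUERY = """
PREFIX ex: <http://example.org/>
SELECT DISTINCT ?compound WHERE {
  ex:thesisA ex:mentionsCompound ?compound .
  ex:thesisB ex:mentionsCompound ?compound .
}
"""

# Made-up toy triples as (subject, predicate, object).
TRIPLES = {
    ("ex:thesisA", "ex:mentionsCompound", "ex:benzene"),
    ("ex:thesisA", "ex:mentionsCompound", "ex:toluene"),
    ("ex:thesisB", "ex:mentionsCompound", "ex:toluene"),
    ("ex:thesisB", "ex:mentionsCompound", "ex:phenol"),
    ("ex:thesisB", "ex:author", "ex:someone"),
}


def compounds_of(thesis, triples):
    """Return every compound that the given thesis mentions."""
    return {
        o for s, p, o in triples
        if s == thesis and p == "ex:mentionsCompound"
    }


def common_compounds(thesis_a, thesis_b, triples):
    """Mirror the two triple patterns in the query with a set intersection."""
    return compounds_of(thesis_a, triples) & compounds_of(thesis_b, triples)


print(sorted(common_compounds("ex:thesisA", "ex:thesisB", TRIPLES)))
# prints ['ex:toluene']
```

The two triple patterns sharing the variable `?compound` are what make this easy in SPARQL: the shared variable is the join, which the Python version spells out as a set intersection.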