Since I think triple stores will change the way we think about information, the current posts are somewhat of a stream of consciousness. (They are also rather tedious to write, as it is almost impossible to put nicely formatted TT stuff into WordPress.) We shall certainly benefit from HTML-based demos, and Kingsley Idehen from OpenLink, with whom I spent a lot of time at WWW2007, has done this…
Thanks Kingsley. While I’m writing, here are a few more thoughts which the experts are welcome to correct.
A triple can be regarded as a directional arc in a graph. (“Graph” is used topologically, i.e. what is connected to what. Graphs have nodes with arcs between them, and all of these can be labelled with URIs, literals or other primitives.) A triple store is somewhere that you can add your labelled arcs, and it will sort out whether they have nodes or predicates in common with other arcs. I think of the triple store as a huge graph which can be searched with SPARQL. Indeed, if we have a large enough machine, all the world’s triples can be loaded and you can ask any question which can be framed as SPARQL.
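To make the "labelled arcs" picture concrete, here is a minimal sketch of a toy in-memory triple store in Python. It is not how Jena or any real store works internally; the class name `TinyTripleStore`, the `ex:` labels and the sample data are all made up for illustration.

```python
from collections import defaultdict


class TinyTripleStore:
    """Toy store: a set of (subject, predicate, object) arcs (hypothetical class)."""

    def __init__(self):
        self.triples = set()
        # Index arcs by subject so arcs sharing a node are found quickly.
        self.by_subject = defaultdict(set)

    def add(self, s, p, o):
        triple = (s, p, o)
        self.triples.add(triple)
        self.by_subject[s].add(triple)

    def match(self, s=None, p=None, o=None):
        """Return the triples matching a pattern; None acts as a wildcard."""
        candidates = self.by_subject.get(s, set()) if s is not None else self.triples
        return sorted(
            t for t in candidates
            if (p is None or t[1] == p) and (o is None or t[2] == o)
        )


store = TinyTripleStore()
# Made-up arcs: two of them share the object node "ex:conf".
store.add("ex:alice", "ex:worksFor", "ex:acme")
store.add("ex:alice", "ex:attended", "ex:conf")
store.add("ex:bob", "ex:attended", "ex:conf")

attendees = [s for s, _, _ in store.match(p="ex:attended", o="ex:conf")]
print(attendees)
```

The store never needs to be told that the two "attended" arcs are related; they meet because they carry the same label on the same node, which is the sense in which the store "sorts out" what arcs have in common.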
The real excitement is that if you use the same vocabulary as other people then you can combine your graphs. So, for example, if the Blue Obelisk and the chemical blogosphere community all use the same microformats, and the blogs and data files use them consistently, you can ask questions of the whole blogosphere. Remember that things like dates, places, publications, etc. are already catered for; it is up to us to do the chemistry. So remember when I set a puzzle on finding a compound with yellow crystals, an unusual spacegroup and a molecular weight about 250? This is just the sort of query that opens up with WP or the chemical blogosphere.
To do this with relational databases is virtually impossible, but with RDF it is conceptually simple. If the information is in there, SPARQL will find it.
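As a rough illustration of why shared vocabulary matters, the sketch below merges triples from two imaginary blogs and runs a naive pattern match, loosely in the spirit of a SPARQL basic graph pattern with a filter. Everything here is an assumption made for the example: the `chem:` predicates, the compound identifiers, the spacegroup strings and the weights are invented, and `match_patterns` is a hypothetical helper, not part of any RDF library.

```python
def match_patterns(triples, patterns, binding=None):
    """Yield variable bindings that satisfy every pattern (a naive join).

    Terms starting with "?" are variables; anything else must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if isinstance(term, str) and term.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match_patterns(triples, rest, new)


# Two made-up sources that happen to use the same (hypothetical) vocabulary.
blog_a = {
    ("ex:compound1", "chem:crystalColour", "yellow"),
    ("ex:compound1", "chem:spaceGroup", "F-43m"),
    ("ex:compound2", "chem:crystalColour", "yellow"),
    ("ex:compound2", "chem:spaceGroup", "P21/c"),
}
blog_b = {
    ("ex:compound1", "chem:molecularWeight", 251.3),
    ("ex:compound2", "chem:molecularWeight", 410.0),
    ("ex:compound3", "chem:molecularWeight", 250.0),
}

# Combining the graphs is just set union, because the labels line up.
merged = blog_a | blog_b

query = [
    ("?c", "chem:crystalColour", "yellow"),
    ("?c", "chem:spaceGroup", "?sg"),
    ("?c", "chem:molecularWeight", "?mw"),
]
hits = sorted(
    (b["?c"], b["?sg"])
    for b in match_patterns(merged, query)
    if abs(b["?mw"] - 250) <= 10  # "molecular weight about 250"
)
print(hits)
```

Neither blog alone can answer the question, since the colour and spacegroup live in one and the weight in the other. The merged graph can, but only because both sources used the same identifiers and predicates.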
The problem is scale. Triple stores work quite well (I think) if everything fits into memory but start struggling when they get too big. To test this I downloaded the Jena Semantic Web Framework, which is Open Source, and loaded it in Eclipse. I then got persons.nt from dbpedia (about 50 MB), which has triples about people in WP. I found that out-of-the-box I could load about half the file (but I haven’t reset the VM size) and that it would answer questions in a second or two. So for personal use, up to a million triples, Jena looks good.
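Before loading a file into an in-memory store it can help to know how many triples it holds. The sketch below is plain Python rather than Jena, and relies only on the fact that N-Triples puts one statement per line; `count_ntriples` is a hypothetical helper and the sample data is made up, not taken from dbpedia.

```python
import io


def count_ntriples(lines):
    """Count statements in N-Triples text: one per non-blank, non-comment line."""
    count = 0
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


# Made-up sample standing in for a real .nt file.
sample = io.StringIO(
    '# made-up sample data, not taken from dbpedia\n'
    '<http://example.org/alice> <http://example.org/name> "Alice" .\n'
    '\n'
    '<http://example.org/bob> <http://example.org/name> "Bob" .\n'
)
total = count_ntriples(sample)
print(total)
```

For a real file the same function can be given an open file handle, for example `open("persons.nt", encoding="utf-8")`, and it streams line by line, so the file never has to fit in memory just to be counted.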
I have a naive idea that everyone will recognise the value of triple stores and technology will therefore advance so fast that size will not be a problem. I’m also hoping that content will be more important than computational resource and that free triple stores will appear in the cloud. Certainly I would expect to see continuation of free services for dbpedia (I don’t know the basis of the current server).
And we’ll be looking out for molecules. But first we have to convert them to triples, and that is already happening.