From Cameron Neylon:
Semantics in the real world? Part I – Why the triple needs to be a quint (or a sext, or…)
… This definitely comes with a health warning as it goes way beyond what I know much about at any technical level. This is therefore handwaving of the highest order. But I haven’t come across anyone else floating the same ideas so I will have a shot at explaining my thoughts.
The Semantic Web, RDF, and XML are all the product of computer scientists thinking about computers and information. You can tell this because they deal with straightforward declarations that are absolute. X has property Y. Putting aside all the issues with the availability of tools and applications and the fact that triple stores don’t scale well, and regardless of all the technical problems, a central issue with applying these types of strategy to the real world is that absolutes don’t exist. I may assert that X has property Y, but what happens when I change my mind, or when I realise I made a mistake, or when I find out that the underlying data wasn’t taken properly? How do we get this to work in the real world?
[… lots more – on provenance, probability, etc. snipped …]
PMR: In essence Cameron outlines the frustration that many of us find with the RDF model. It makes categorical assertions which have 100% weight and – in its default form – are unattributed. Here are three assertions:
Assuming I have the implicit semantics that freezing does not change the chemical nature of a substance (not always true), these three statements taken at face value create a contradiction.
I can remove the contradiction by introducing the semantic that a formula may be associated with more than one name and that a name may be associated with more than one formula. This taken at face value prevents us from making any useful inferences.
What I have felt a great need for (echoing Cameron) is that the triple should be enhanced with two properties:
- the provenance (the person or software making the assertion)
- the weight of the assertion
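One way to picture such an enhanced triple is as a five-field record. What follows is only a rough sketch: the `Quint` class, the `best_supported` helper, the identifiers and the weights are all invented for illustration and are not part of RDF or of any standard vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quint:
    """A triple plus provenance and weight (hypothetical structure, not an RDF standard)."""
    subject: str
    predicate: str
    obj: str
    provenance: str  # the person or software making the assertion
    weight: float    # confidence in the assertion, from 0.0 to 1.0


def best_supported(quints, subject, predicate):
    """Return the highest-weight assertion for a subject/predicate pair, or None."""
    candidates = [q for q in quints
                  if q.subject == subject and q.predicate == predicate]
    return max(candidates, key=lambda q: q.weight, default=None)


# Illustrative data only: two sources disagree about the same property.
assertions = [
    Quint(":X", ":hasProperty", ":Y", provenance=":sourceA", weight=0.9),
    Quint(":X", ":hasProperty", ":Z", provenance=":sourceB", weight=0.4),
]

winner = best_supported(assertions, ":X", ":hasProperty")
print(winner.obj, winner.provenance, winner.weight)
```

With weights attached, two conflicting assertions no longer have to be a flat contradiction: a consumer can keep both and decide which one to trust, and on whose authority.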
“At Dagstuhl it is continuing to snow.”
If I pass this sentence to OSCAR it may mark up snow as a chemical substance. In doing so it now gives every annotation a weight based on the confidence (I shan’t explain how here). So, for example, it is much more likely that 2-acetylfoobarane is a chemical than HIV (hydrogen-vanadium-iodide) and OSCAR addresses these concerns.
It’s possible to add provenance and confidence to RDF but I don’t know of a standard approach for doing this. If we start doing this we need to make sure we have consistent schemas.
(Interestingly we’ve just been discussing the value of adding “strength of statement” to the results of text mining.)
I was sitting here tonight reading my Semantic Web primer (Antoniou and van Harmelen), and I read something that seemed to relate to this. Paraphrasing, they suggest that in RDF statements like “Chris believes that Peter is the author of this blog entry” can be achieved by “reification”, which is needed because RDF only deals with triples. I think that means we turn the triple “Peter is the author of this blog entry” into a resource, which then becomes the object of the triple “Chris believes that [resource]”. Looks clumsy but hey, multiplication seems like the name of the game in RDF. Just think about turning a decent database into triples!
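To make the multiplication concrete, here is a small sketch of what reification does to a single triple, using plain Python tuples rather than any RDF library. The `rdf:Statement`, `rdf:subject`, `rdf:predicate` and `rdf:object` terms are the standard RDF reification vocabulary; the `reify` helper, the `:believes` predicate and the `:chris`, `:peter` and `:blogEntry` identifiers are made up for the example.

```python
def reify(triple, statement_id):
    """Expand one (s, p, o) triple into the four triples that describe it as a statement."""
    s, p, o = triple
    return [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", s),
        (statement_id, "rdf:predicate", p),
        (statement_id, "rdf:object", o),
    ]


# "Peter is the author of this blog entry" (hypothetical identifiers)
claim = (":blogEntry", "dc:creator", ":peter")

# Turn the claim into a resource, then say something about that resource.
graph = reify(claim, "_:stmt1")
graph.append((":chris", ":believes", "_:stmt1"))  # :believes is an invented predicate

for t in graph:
    print(t)
```

One triple has become five, and the original claim itself is not in the graph: describing a statement is not the same as asserting it.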
The vibe I’ve been picking up from people far more knowledgeable about these things is that reification is so last year. You can achieve the same effect using blank nodes (to which you’ll probably want to assign some kind of URI, but that’s another argument).
:chris :madeAuthorshipAssertion [ :aboutBlog ;
dc:author :pmr].
We’re doing similar things with molecular data:
:compoundX :hasMeltingPoint [ :value "234.5"; :units units:celsius; :measurementTechnique techniques:45; ….].
Jim, I think those are indeed the two options. Topic Maps is more elegant in this matter, but the triple nature allows us to apply all the graph theory mathematics we have around. I just saw an example of reification:
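For anyone wondering what the square brackets expand to, the sketch below flattens that kind of melting-point description into ordinary triples hanging off a generated blank node. It is plain Python, not an RDF library; the `add_node` helper and the `_:b0`-style labels are assumptions of the sketch, and the `:assertedBy` and `:confidence` properties (and their values) are invented to show where provenance and weight could sit.

```python
import itertools

_counter = itertools.count()


def add_node(triples, subject, predicate, properties):
    """Link subject to a fresh blank node and hang the given properties off that node."""
    node = f"_:b{next(_counter)}"
    triples.append((subject, predicate, node))
    for prop, value in properties.items():
        triples.append((node, prop, value))
    return node


triples = []
node = add_node(triples, ":compoundX", ":hasMeltingPoint", {
    ":value": "234.5",
    ":units": "units:celsius",
    ":measurementTechnique": "techniques:45",
    ":assertedBy": ":someLab",  # hypothetical provenance property
    ":confidence": "0.8",       # hypothetical weight property
})

for t in triples:
    print(t)
```

The measurement becomes a node in its own right, so anything we want to say about it, including who said it and how sure they were, is just one more triple on that node.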
John Punin
Gerard Uffelman
at http://www.cs.rpi.edu/~puninj/XMLJ/classes/class8/all.html
But using anonymous resources has the same effect, and thus does not require the introduction of new RDF elements. (Though, they are there anyway…)
I’m in the process of converting the BODR into RDF, and plan to use anonymous resources. I had in mind defining a new class, but this is obviously not needed, as shown in your example.
Ok, now with escaped <’s…
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:s="http://www.schemas.org/schema/">
<rdf:Description>
<rdf:subject rdf:resource="http://www.cs.rpi.edu/~puninj/XMLJ/"/>
<rdf:predicate rdf:resource="http://purl.org/dc/elements/1.1/creator"/>
<rdf:object>John Punin</rdf:object>
<rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"/>
<s:attributedTo>Gerard Uffelman</s:attributedTo>
</rdf:Description>
</rdf:RDF>
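As a quick check of what this example encodes, here is a sketch that pulls the reified statement and its attribution back out using only Python's standard `xml.etree.ElementTree`. It treats the file as plain XML rather than interpreting it as RDF, the `read_reified` helper is a made-up name, and the embedded document is the example above with the XML declaration and the unused `dc` namespace left out.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
S_NS = "http://www.schemas.org/schema/"

DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:s="http://www.schemas.org/schema/">
  <rdf:Description>
    <rdf:subject rdf:resource="http://www.cs.rpi.edu/~puninj/XMLJ/"/>
    <rdf:predicate rdf:resource="http://purl.org/dc/elements/1.1/creator"/>
    <rdf:object>John Punin</rdf:object>
    <rdf:type rdf:resource="http://www.w3.org/1999/02/22-rdf-syntax-ns#Statement"/>
    <s:attributedTo>Gerard Uffelman</s:attributedTo>
  </rdf:Description>
</rdf:RDF>"""


def read_reified(xml_text):
    """Return (subject, predicate, object, attributed_to) for the first rdf:Description."""
    root = ET.fromstring(xml_text)
    desc = root.find(f"{{{RDF_NS}}}Description")
    resource = f"{{{RDF_NS}}}resource"  # namespaced attribute key for rdf:resource
    subject = desc.find(f"{{{RDF_NS}}}subject").get(resource)
    predicate = desc.find(f"{{{RDF_NS}}}predicate").get(resource)
    obj = desc.find(f"{{{RDF_NS}}}object").text
    attributed_to = desc.find(f"{{{S_NS}}}attributedTo").text
    return subject, predicate, obj, attributed_to


subject, predicate, obj, attributed_to = read_reified(DOC)
print(f"{attributed_to} says: <{subject}> <{predicate}> \"{obj}\"")
```

Read this way, the statement “the XMLJ page was created by John Punin” is the thing being described, and Gerard Uffelman is recorded as its source, which is the provenance half of what the post is asking for.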