From Cameron Neylon:
… This definitely comes with a health warning as it goes way beyond what I know much about at any technical level. This is therefore handwaving of the highest order. But I haven’t come across anyone else floating the same ideas so I will have a shot at explaining my thoughts.
The Semantic Web, RDF, and XML are all the product of computer scientists thinking about computers and information. You can tell this because they deal with straightforward declarations that are absolute. X has property Y. Putting aside all the issues with the availability of tools and applications, and the fact that triple stores don’t scale well, and regardless of all the technical problems, a central issue with applying these types of strategy to the real world is that absolutes don’t exist. I may assert that X has property Y, but what happens when I change my mind, or when I realise I made a mistake, or when I find out that the underlying data wasn’t taken properly? How do we get this to work in the real world?
[… lots more – on provenance, probability, etc. snipped …]
PMR: In essence Cameron outlines the frustration that many of us find with the RDF model. It makes categorical assertions which have 100% weight and – in its default form – are unattributed. Here are three assertions:
Assuming I have the implicit semantics that freezing does not change the chemical nature of a substance (not always true), these three statements taken at face value create a contradiction.
I can remove the contradiction by introducing the semantic that a formula may be associated with more than one name and that a name may be associated with more than one formula. This, taken at face value, prevents us from making any useful inferences.
What I have felt a great need for (echoing Cameron) is that the triple should be enhanced with two properties:
- the provenance (the person or software making the assertion)
- the weight of the assertion
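As a rough illustration of what such an enhanced triple might look like, here is a minimal Python sketch. The class name `WeightedTriple`, the helper `strongest`, the 0-to-1 weight scale, and the example values are all assumptions made for illustration; none of them comes from RDF or from any existing tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedTriple:
    """A triple plus the two extra properties: provenance and weight (hypothetical)."""
    subject: str
    predicate: str
    obj: str
    provenance: str  # the person or software making the assertion
    weight: float    # assumed scale: 0.0 (no confidence) to 1.0 (certain)

    def __post_init__(self):
        if not 0.0 <= self.weight <= 1.0:
            raise ValueError("weight must be between 0 and 1")


def strongest(assertions, subject, predicate):
    """Return the highest-weight assertion about (subject, predicate), or None."""
    candidates = [
        a for a in assertions
        if a.subject == subject and a.predicate == predicate
    ]
    return max(candidates, key=lambda a: a.weight, default=None)


# Two competing assertions about X; the values are purely illustrative.
assertions = [
    WeightedTriple("X", "hasProperty", "Y", "a-person", 0.9),
    WeightedTriple("X", "hasProperty", "Z", "some-software", 0.4),
]
best = strongest(assertions, "X", "hasProperty")
```

With both properties attached, conflicting assertions no longer have to be treated as a flat contradiction: a consumer can see who said what and prefer the better-supported statement.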
“At Dagstuhl it is continuing to snow.”
If I pass this sentence to OSCAR it may mark up snow as a chemical substance. In doing so it now gives every annotation a weight based on the confidence (I shan’t explain how here). So, for example, it is much more likely that 2-acetylfoobarane is a chemical than that HIV (hydrogen-vanadium-iodide) is, and OSCAR's confidence weights address these concerns.
It’s possible to add provenance and confidence to RDF but I don’t know of a standard approach for doing this. If we start doing this we need to make sure we have consistent schemas.
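One mechanism that does exist in the RDF vocabulary is reification, where a statement is itself described as an `rdf:Statement` with `rdf:subject`, `rdf:predicate` and `rdf:object`. It says nothing about how provenance or weight should be named, which is exactly the schema problem. The sketch below, in plain Python emitting N-Triples lines, assumes a made-up namespace `http://example.org/schema#` with hypothetical `assertedBy` and `weight` properties, and assumes the object of the triple is an IRI rather than a literal.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/schema#"  # hypothetical namespace, not a standard


def reify(stmt_id, subject, predicate, obj, asserted_by, weight):
    """Return N-Triples lines describing one triple plus provenance and weight.

    All arguments except weight are IRIs given as plain strings.
    assertedBy and weight are illustrative property names only.
    """
    s = f"<{stmt_id}>"
    return [
        f"{s} <{RDF}type> <{RDF}Statement> .",
        f"{s} <{RDF}subject> <{subject}> .",
        f"{s} <{RDF}predicate> <{predicate}> .",
        f"{s} <{RDF}object> <{obj}> .",
        f"{s} <{EX}assertedBy> <{asserted_by}> .",
        f'{s} <{EX}weight> "{weight}"^^<{XSD}double> .',
    ]


lines = reify(
    "http://example.org/stmt/1",
    "http://example.org/X",
    "http://example.org/hasProperty",
    "http://example.org/Y",
    "http://example.org/agents/some-software",
    0.4,
)
print("\n".join(lines))
```

Two groups doing this independently would almost certainly pick different property names and weight scales, which is why consistent schemas matter.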
(Interestingly, we’ve just been discussing the value of adding “strength of statement” to the results of text mining.)