[I have moved machines and as a result overwrote an earlier post. Here it is again]
The meaning and use of words and ideas are critical to the development of the semantic web. I frequently find that the writings of Jorge Luis Borges express a deep truth about how we describe, classify and interrelate concepts. Here, in a single sentence, he sums up the problem of classification. It’s sufficiently compelling that it has a whole Wikipedia entry: http://en.wikipedia.org/wiki/Celestial_Emporium_of_Benevolent_Knowledge%27s_Taxonomy
“These ambiguities, redundancies, and deficiencies recall those attributed by Dr. Franz Kuhn to a certain Chinese encyclopedia called the Heavenly Emporium of Benevolent Knowledge. In its distant pages it is written that animals are divided into (a) those that belong to the emperor; (b) embalmed ones; (c) those that are trained; (d) suckling pigs; (e) mermaids; (f) fabulous ones; (g) stray dogs; (h) those that are included in this classification; (i) those that tremble as if they were mad; (j) innumerable ones; (k) those drawn with a very fine camel’s-hair brush; (l) etcetera; (m) those that have just broken the flower vase; (n) those that at a distance resemble flies.”
I shall be writing one or more blog posts on a proposed architecture for the Quixote project (which will gather computational chemistry output). I’ll be describing a concept, the World Wide Molecular Matrix (WWMM), which is about 10 years old but for which the time is only now right to flourish. It takes the idea of a decentralised web where there is no ontological dictatorship and people collect and republish what they want.
In classifying the output of computational chemistry we can indeed see the diversity of approaches. Here are some meaningful and useful reasons why someone might wish to aggregate outputs:
- Molecules which contain an iron atom
- Calculations of NMR shifts in natural products
- Molecules collected by volunteers
- Calculations using the B3LYP functional
- Studies reported in theses at Spanish Universities
- Work funded by the UK EPSRC
- Molecules with flexible side chains
- Large macromolecules with explicit solvent
- Work published in the Journal of the American Chemical Society
- Calculations which cause the program to crash
These are all completely serious, and collections along some of these axes already exist.
The point is that there is no central approach to collection and classification. For that reason there should be no central knowledgebase of calculations, but rather a decentralised set of collections. Note, of course, that both Borges’s classification and mine have intersections and omissions. This is fundamental to the web: it has no centre and is both comprehensive and incomplete.
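To make the idea of intersections and omissions concrete, here is a minimal Python sketch. It is only an illustration, not the WWMM or Quixote design: the `Calculation` record, its fields, the four sample records and the collection names are all hypothetical. Each collection is an independent rule that some collector might care about, and no rule knows about the others.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Calculation:
    """Hypothetical metadata record for one computational chemistry output."""
    id: str
    elements: frozenset
    functional: str
    funder: str = ""
    crashed: bool = False


# Invented sample records, for illustration only.
CALCS = [
    Calculation("calc-1", frozenset({"Fe", "C", "H"}), "B3LYP", funder="EPSRC"),
    Calculation("calc-2", frozenset({"C", "H", "O"}), "B3LYP"),
    Calculation("calc-3", frozenset({"Fe", "O"}), "PBE", crashed=True),
    Calculation("calc-4", frozenset({"C", "H", "N"}), "PBE"),
]

# Each collector publishes their own rule; there is no shared master taxonomy.
COLLECTIONS = {
    "contains iron": lambda c: "Fe" in c.elements,
    "uses B3LYP": lambda c: c.functional == "B3LYP",
    "funded by EPSRC": lambda c: c.funder == "EPSRC",
    "crashed the program": lambda c: c.crashed,
}


def collect(calcs, collections):
    """Map each collection name to the set of record ids it selects."""
    return {
        name: {c.id for c in calcs if keep(c)}
        for name, keep in collections.items()
    }


def intersections(memberships):
    """Record ids that appear in more than one collection."""
    counts = {}
    for ids in memberships.values():
        for record_id in ids:
            counts[record_id] = counts.get(record_id, 0) + 1
    return {record_id for record_id, n in counts.items() if n > 1}


def omissions(calcs, memberships):
    """Record ids that no collection selects."""
    collected = set().union(*memberships.values())
    return {c.id for c in calcs} - collected


memberships = collect(CALCS, COLLECTIONS)
print(memberships)
print("in several collections:", sorted(intersections(memberships)))
print("in no collection:", sorted(omissions(CALCS, memberships)))
```

In this toy data the iron-containing B3LYP calculation funded by EPSRC lands in three collections at once, while one record is picked up by nobody. Neither outcome is an error; it is simply what independent collecting looks like.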
I’ll be showing how the WWMM now has the technology (and, more importantly, the cultural acceptability) to provide a distributed knowledgebase for chemistry. It knocks down walled gardens through the power of Open Knowledge.