Today I'm putting forward the value of semantics for materials science and more generally computational science. http://blogs.ch.cam.ac.uk/pmr/2013/02/03/the-semantic-web-for-materials-science-and-a-great-day-for-melbourne/
Semantics is not just a technology – it's an attitude of mind. It's about how we can create a world where humans do what they are best at and machines do what they are best at. Currently that doesn't happen in much of science (bioscience gets it, chemistry and materials don't). So here's a simple guide to semantics.
Compound 2a melted at 119 o C
Most scientists (of any discipline) would understand that.
But no machine would. Machines do not currently by default understand human communication. They don'y understand:
- "Compound" (the word has many meanings). Which one?
- "2a" . Why is it bold? Does it matter?
- "melted" -
- "at" – a place? A time?
- "119" – an integer? A house number? A rock band?
- o I've been naughty – it is an "oh". But unless you use U+00B0 it's not "degrees"
- C? so many meanings.
So why should we translate this for a machine?
Because if machines can read and understand this they can act as our assistants.
There's probably 50-500 million pages of scholarly scientific literature relevant to materials science published each year. We'd like to find all the melting points. Humans find this very very boring. They make mistakes.
Machines don't get bored and they don't make mistakes. But we have to translate that into semantic form. That's not trivial. But once we've built the infrastructure it becomes routine. So here's what a machine needs:
We've used XML (Chemical Markup Language) as the machine syntax. Humans do NOT need to author or read this. We have tools for that. But a a brief explanation:
- The "compound" is described by the formal CML concept "molecule" CML defines precisely how machines should interpret "property", "scalar", "dictRef", etc.)
- "2a" is a REFERENCE to a file/document elsewhere
- "melted" is a property defined by "prop:mpt" in a dictionary
- Degrees is now defined by a units dictionary
- 119 is a real number ("float")
A machine can now "understand" this. Here's AMI
AMI can understand:
- XML syntax
- Precisely formulated rules
- Computer code
If we can create or convert information into semantic form, AMI can process it. For example Graphics:
Humans understand the top. AMI understands the bottom.
I shall demo this (I shall run out of time, as I always do!)
That's our challenge.