I’m writing blog posts to collect my thoughts for the wonderful workshop at SePublica http://sepublica.mywikipaper.org/drupal/ where I am leading off the day. [This also acts as a permanent record instead of slides. Indeed, I may not provide slides as such, since I often create the talk as I present it.] My working title is
Why and how can we make Scholarship Semantic?
[If you switch off at “Semantics” trust me and keep reading… There’s a lot here about changing the world.]
Why should we strive to create a semantic web/world? I “got it” when I heard TimBL in 1994. Many people have “got it”. There are startups based on creating and deploying semantic technology. My colleague Nico Adams (who understands much more about the practice of semantics than I do) has a vision of creating a reasoning engine for science (he’s applied this to polymers, biotechnology, chemistry). I completely buy his vision.
But it’s hard to sell this to people who don’t understand it, just as TimBL could not have sold SGML in 1990. (Yes, there were whole industries that bought into SGML, but most didn’t.) So what TimBL did was to build a system that worked (the WWW). And this often seems to be the requirement for Semantic Web projects: build it and show it working.
SePublica will probably be attended by the converted. I don’t think I have to convince them of the value of semantics. But I do have to catalyse:
- The creation of convincing demonstrators (examples that work)
- Arguments for why we need semantics and what it can do.
So why are semantics important for scholarly publishing? The following arguments will hopefully convince some people:
- They unlock the value of the stuff already being published. There is a great deal in a single PDF (article or thesis) that is useful: diagrams and tables are exciting raw resources, as are mathematical equations and chemical structures. Even converting what we have today into semantic form would add billions in value.
- They make information and knowledge available to a wider range of people. If I read a paper with a term I don’t know then semantic annotation may make it immediately understandable. What’s rhinovirus? It’s not a virus of rhinoceroses – it’s the common cold. That makes it accessible to many more people (if the publishers allow it).
- They highlight errors and inconsistencies. Ranging from spelling errors to bad or missing units to incorrect values to stuff which doesn’t agree with previous knowledge. And machines can do much of this. We cannot have reproducible science until we have semantics.
- They allow the literature to be computed. Many of these semantics define objects (such as molecules or phylogenetic trees) which are recomputable. Does the use of newer methods give the same answer?
- They allow the literature to be aggregated. This is one of the most obvious benefits. If I want all phylogenetic trees, I need semantics – I don’t want shoe-trees or B-trees or beech trees. And many of these concepts are not in Google’s public face (I am sure they have huge semantics internally).
- They allow the material to be searched. How many chemists use halogenated solvents? (The word “halogen” will not occur in the paper.) With semantics this is a relatively easy thing to do. Can you find second-order differential equations? Or Fourier series? Or triclinic crystals? (The words won’t help.) AMI2 will be able to.
- They allow the material to be linked into more complex concepts. By creating a database of species, a database of geolocations and links between them we start to generate an index of biodiversity. What species have been reported when and where? This can be used for longitudinal analyses – is X increasing/decreasing with time? Where is Y now being reported for the first time?
- They allow humans to link up. If A is working on Puffinus puffinus (no, it’s not a puffin, that’s Fratercula arctica) in the northern hemisphere and B is working on Puffinus tenuirostris in Port Fairy, Victoria, AU, then a shared knowledgebase will help to bring the humans together. And that happens between subjects – microscopy can link with molecular biology with climate with chemistry.
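To make the search argument concrete, here is a toy sketch in Python of the halogenated-solvent question. Everything in it is an assumption for illustration: the tiny vocabulary SOLVENT_CLASSES, the made-up sentences in PAPERS, and the helper names annotate, semantic_search and keyword_search. It is not how AMI2 works; it only shows why a keyword search for "halogen" finds nothing while a search over annotated terms does.

```python
import re

# Hypothetical mini-vocabulary: each solvent name is annotated with
# the classes it belongs to. A real system would use a curated ontology.
SOLVENT_CLASSES = {
    "dichloromethane": {"solvent", "halogenated"},
    "chloroform": {"solvent", "halogenated"},
    "ethanol": {"solvent", "alcohol"},
    "toluene": {"solvent", "aromatic"},
}

# Made-up sentences standing in for the text of published papers.
PAPERS = {
    "paper-1": "The product was extracted with dichloromethane and dried.",
    "paper-2": "The residue was recrystallised from ethanol.",
    "paper-3": "Spectra were recorded in chloroform solution.",
}


def annotate(text):
    """Return the set of classes for every known term found in the text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    classes = set()
    for term, term_classes in SOLVENT_CLASSES.items():
        if term in words:
            classes |= term_classes
    return classes


def semantic_search(papers, wanted_class):
    """Find papers whose annotated terms belong to the wanted class."""
    return sorted(pid for pid, text in papers.items()
                  if wanted_class in annotate(text))


def keyword_search(papers, word):
    """Plain substring search, for comparison."""
    return sorted(pid for pid, text in papers.items()
                  if word in text.lower())


print(keyword_search(PAPERS, "halogen"))       # finds nothing
print(semantic_search(PAPERS, "halogenated"))  # finds paper-1 and paper-3
```

The hard part in practice is building and maintaining the vocabulary, not the lookup, which is why communal resources matter.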
In simple terms semantics allow smart humans to develop communal resources to develop new ideas faster, smarter and better.
Please add other ideas! I am sure I have missed some.