Today is the first (half) day of Repository Fringe in Edinburgh and we are having Workshops. Chris Gutteridge from Southampton is running one on Consuming Linked Open Data:
RDF & Linked Open Data are terms becoming more common in the repository community, but what are they and why should you care?
Open data is only useful if people are using it! This workshop will give a hyperbole-free introduction to using the technologies. Attendees will learn how to use PHP to consume RDF data and do useful things with it, including doing some stuff with bibliographic data.
This workshop will be run by Christopher Gutteridge, developer of the award winning University of Southampton Open Data Service and more recently founder of data.ac.uk, and Patrick McSweeney, notorious University of Southampton developer.
Requirements: ideally you will bring your own laptop with PHP installed (Apple & many Linux machines will have it installed by default), and a text editor. Don't panic if you don't have any PHP programming experience, you are welcome to buddy up with somebody who does.
Well, I like workshops and want to help make them successful so I signed up and the numbers have since trebled.
I must admit that I have no use or linking for PHP. It's a broken language for Unicode – and I do a lot with Unicode. And it's possible to write very bad code. But hey ho… So I went to find an installation for Windows and got into one of those chains…
"Cannot find MSCV01.dll". Search on the web. "You must install Visual Studio C++". What??? I thought I had seen the last of Visual Studio when Chem4Word finished. So try more search. "You need an Apache server". You can't install that without FTP, Fake Email Server, Tomcat, Perl and goodness knows what. So I have installed about 8 things I didn't want to get something I didn't want running…
But I am approaching the workshop in a very positive spirit and can now help other attendees install the same stuff…
Ok – what and why is Linked Open Data? It's described by a mug:
The most important thing is that
THE DATA ARE ON THE WEB
The problem is that most academics don't put their data on the web. Most academics just let their data decay. Governments are telling them that they must have data management plans. Academics ignore them. This isn't true in bioscience or crystallography or astronomy. But:
THERE IS NO MATERIALS SCIENCE DATA ON THE WEB.
Not quite true. Wikipedia is doing a great job of systematising data. But they can only do what people make available.
OF THE SEVERAL MILLION CHEMICAL SYNTHESES REPORTED EACH YEAR, HOW MANY ARE OPENLY AVAILABLE?
OK, let's move to biodiversity. After all the future of our planet – in some part - depends on knowing about species and ecosystems. 10,000+ phylogenetic trees are published each year. How much of that data is on the web? Guess.
Wrong! It's not zero. It's 4%!
Linked Open Data is a great idea. I support it.
But you can't link data when there isn't any.
Well, like the first telephone, one site by itself is not much use. It's not Linked Open Data, it's LINKABLE. To be linkable the data provider has to:
- Understand the data on the site and have a formal description of each bit ("semantics"). It's no good labelling it "tree" if you don't know whether it's a tree preservation order or a phylogenetic tree. You need some formal of vocabulary. (Posh word is "ontology").
- Give each bit of data a unique identifier. The posh name is "URI". If you make it unique on your site and combine it with the domain name that's roughly what a URI is.
- If the chunks of your data have relevance to other chunks of your data you can add links. Then the site is an example of internally linked open data.
But the real value of LINKED comes when others to link to it. And for that you have to:
- Create data that other people want
- Make it easy for them to find it and use it
And that's very difficult. Because the academic system implicitly tells people not to do this. No reward points. So no-one does it.
Well I and my group did it. We made 200000 computational chemistry calculations available in DSpace. It was too difficult to use. It's months of wasted work.
We've tried again, this time on our own server, with RDF!
No-one want to use it.
We've done the same for crystal structures. And now Ross and I are going to do the same for phylogenetic trees.
We are mad. We are hoping that the huge interest in Open Data (data.gov.uk, data.gov, etc) will lead people to start linking data sources. We are hoping that people want phylogenetic trees. We never learn.
The time is coming. At some stage scholars/universities will realise the value of domain repositories. Will they help support them? In the way they have poured money into Institutional Repositories? I doubt it. So the value has to come from funding bodies governments and – I think – foundations like Wikimedia who are years ahead of academia in their thinking.
Anyway today Chris is pointing us to bibliographic data. JISC supported us to create #openbib some years ago and we helped national libraries to Open their metadata and to create a protocol (BibJSON).
See you there at 1400