It is encouraging to see the increasing use of “Open Data” and the suggestion that we should use CKAN (OKF):
Fiona Bradley, A database of data, Semantic Library, April 16, 2008. (via Peter Suber)
One of the sites recommended by Read/Write Web is CKAN, which is backed by the Open Knowledge Foundation that counts someone who has worked in the library sector amongst their leadership. Are these the types of groups more of us should be involved in to have a role in information access on a larger scale?
Last week there was a flurry of comments around a post by Bret Taylor, We need a Wikipedia for data. Taylor describes a model for a wiki that would aggregate common data in one database that could be cross-searched. Great idea.
One interesting thing about the types of datasets he mentions is that they are all copyrighted – stations own TV schedules, exchanges own market data (the free stuff is usually 20 minutes delayed) and a variety of companies own publishing rights over telephone numbers. This is the data that could be really useful if it was truly free, but given the amount of updating required, I wonder who would do so without a business or legislative imperative.
But that issue is perhaps beside the point. There are many, many incredible datasets out there, everything from Census data to older market information to astronomy. Reading the comments and suggestions on Taylor’s post and Read/Write Web’s post about the topic revealed dozens of sites to find these resources.
I did feel, looking through the list, that libraries may have missed an opportunity. We have been recommending and linking to various datasets on our websites for years, but there is a huge potential to go beyond this and build something collaboratively and use it as an input for different libraries. Many libraries now take in Open Access Journal records to their catalogues and search engines via DOAJ but there is no reason to not do something similar for Open Data.
Certainly, it is an issue that few of these datasets can talk to each other – but perhaps the move towards a more standards-based Semantic Web will encourage standardisation and interoperability, at least within, for example, individual government departments so that Census records can be analysed against education records.
PMR: Open Data is very much about the power of networks. The value of a piece of information is proportional to the number of other sources it can be linked to. Fiona is right that raw data may not be easily linkable, but that problem is much reduced if we convert the data into RDF. RDF removes the syntactic problem immediately (we don’t need to worry whether it’s comma-separated values, etc.). And many tools expect that the vocabulary will be fluid – for example Wikipedia uses at least “birthdate” and “dateofbirth” in its infoboxes. Even simple lexical tools can help bring this together.
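As a rough illustration of what a "simple lexical tool" might look like, here is a minimal Python sketch that reduces variant property names to a common signature. The word list, the stopword list and the function name signature are my own assumptions for the example, not part of Wikipedia or any existing tool, and a real vocabulary would need to be far larger.

```python
import re

# Hypothetical, deliberately tiny vocabulary for illustration only.
WORDS = ("birth", "date", "death", "place", "of")
STOPWORDS = {"of"}


def signature(key):
    """Return an order-independent signature for an infobox-style key.

    The key is lowercased, stripped of non-letters, greedily split into
    known words, and the remaining words are sorted. Keys that cannot be
    fully split are returned unchanged as a one-element tuple.
    """
    text = re.sub(r"[^a-z]", "", key.lower())
    found = []
    pos = 0
    while pos < len(text):
        # Try longer words first so the greedy match prefers them.
        for word in sorted(WORDS, key=len, reverse=True):
            if text.startswith(word, pos):
                found.append(word)
                pos += len(word)
                break
        else:
            return (text,)  # unknown key: leave it as its own signature
    return tuple(sorted(w for w in found if w not in STOPWORDS))


if __name__ == "__main__":
    for name in ("birthdate", "dateofbirth", "Date_of_Birth", "population"):
        print(name, "->", signature(name))
```

Two keys with the same signature are candidates for being the same property. This is only a heuristic: greedy splitting can mis-segment unfamiliar names, so a human (or a cleverer tool) should still review the matches.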
Of course if you have data where there is a known, Open, format (such as FOAF, protein sequences, etc.) use it. But it’s better to carry out very lightweight markup with RDF than not to deposit the data at all. And don’t underestimate the cleverness of the search engines.
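To show how little is needed for "very lightweight markup", here is a sketch that turns comma-separated values into N-Triples, one of the plain-text RDF serialisations, using only the Python standard library. The http://example.org/ namespace, the item/ and prop/ path layout and the function names are placeholders I have assumed for the example. Every value is emitted as a plain string literal with no datatypes or links to other vocabularies, which is about the simplest thing that could work.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, replace with your own


def escape_literal(value):
    """Escape the characters that N-Triples requires inside a string literal."""
    return (value.replace("\\", "\\\\").replace('"', '\\"')
                 .replace("\n", "\\n").replace("\r", "\\r"))


def csv_to_ntriples(text, id_column):
    """Convert CSV text to a list of N-Triples lines.

    Each row becomes a subject named after its id_column value, and each
    other non-empty cell becomes one triple with the column name as the
    predicate and the cell value as a plain literal.
    """
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}item/{quote(row[id_column], safe='')}>"
        for column, value in row.items():
            # Skip the id itself, empty cells, and overflow cells (key None).
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{BASE}prop/{quote(column, safe='')}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


if __name__ == "__main__":
    sample = "id,name,birthdate\n1,Ada Lovelace,1815-12-10\n"
    print("\n".join(csv_to_ntriples(sample, "id")))
```

The column names go straight into the predicates, so a file with a "dateofbirth" column and another with "birthdate" will still produce different properties. That is acceptable for a first deposit, since the vocabulary can be reconciled later.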