Open Data: My apologia

Typed into Arcturus

My blog on the RI meeting has been blogged at

http://bishophill.squarespace.com/blog/2010/6/15/murray-rust-on-pearce.html?currentPage=2#comments

which is factual and fair. I should make it clear that I am not putting anyone on a spectrum (“sceptics”, “nice guys”, “cheats”) etc. I went to a meeting I knew almost nothing about and came back saddened and concerned about an apparent priesthood. This has been confirmed by various public emails and blogs which show that there is concern in the community about this issue.

A comment on the blog above reads:

“I had no idea that this “FOI battle” had been going on for several years and that nothing had been done to try to solve the problem”.

Wikipedia says “Peter Murray-Rust campaigns for open data, particularly in science”.

Now that would be nice if Peter was to make a start on an open data campaign in climate science as he seems to be several years behind.

June 15, 2010 | martyn

And this is a good reason for me to make my apologia for Open Data and why I am active in an area I know little about.

I have no problem about “being several years behind” – I would expect nothing less. Ignorance is not a crime – we are all ignorant of almost everything. [Arguing from known ignorance is less excusable.]

I have spent a lunchtime hour flicking through blogs I have been pointed to (e.g. http://climateaudit.org/2010/06/04/losing-glacier-data/#comment-231990 ). There are many issues but my only comment will be that there is a range of views on how easy it is to preserve data. Some posters express surprise that all data is not preserved for ever, others that historically it has been very difficult to preserve it. My own view is that it depends on the motivation, the tools and the funding. Any missing component leads to data being lost.

So what is Open Data and why am I talking about a discipline (Climate) I don’t know much about? I got involved in Open Data about 5 years ago when I was enraged by publishers who sprayed copyright notices over factual data and who were less than enthusiastic about addressing any problem to do with data. The term “Open Data” was almost unknown then and while I am not the first to put the two words together they were sufficiently rare that I started a Wikipedia page (http://en.wikipedia.org/wiki/Open_science_data ) – [BTW this needs updating].

Since then I have been invited to speak on Open Data at a number of meetings (often Open Access or library meetings), met with many editors and publishers and most recently worked with the Open Knowledge Foundation and Science Commons, resulting in the Panton Principles. Most recently BioMed Central honoured us by presenting Open data prizes and asking us to judge and award them.

I have also worked with the JISC in the general area of Open data and most recently am the PI of a grant award (with OKF, International Union of Crystallography, British Library, Cambridge University Library and PLoS) on “Open Bibliography”. It hasn’t yet started but we’ve made good progress.

My claim to be involved is that there are universal aspects to Openness in science (and usually corresponding benefits) and I’ll summarise them in what I (and I believe colleagues in the OKF) would feel able to do in an objective manner:

  • Inspect data resources and determine whether they were fully Open according to the Open Knowledge Definition (http://www.opendefinition.org/ ). It should be possible to do that in most cases without expert knowledge of the domain.
  • Help to provide a label (button) stating that the resource was Open Data.
  • Inspect a bibliography and determine which of the resources pointed to by the bibliography were Open and comment on appropriate aspects.
  • Work with bibliography creators to ensure that the bibliography itself was Open (even if some of the resources to which it pointed were not)

This list is a first pass – please comment. Note that I myself do not intend to create the bibliography of metadata – that would be inappropriate. A bibliography is an important resource which often represents a point of view and hopefully people in the Climate area have bibliographies (these often emerge when writing theses and reviews). Note that the overall infrastructure of a bibliography and its Openness is independent of whether the science is good or flawed, whether the people quoted have a particular viewpoint or whether they are nice or nasty.

If a resource can be identified as Open, then it can save a great deal of time (and sometimes money) when it is re-used. An Open diagram can be used in a review, book, teaching, etc. without further permission. Data can be mined from it. Text can be quoted from it. These things by themselves can add considerably to the speed and quality of a scientific field.

What if the Open resources are quoted in preference to the Closed ones? Might that give a false view of the field? In that case there is a good incentive for making more resources Open.

Here are examples of Openness for resources in climate:

  • The “Keeling Curve” (http://en.wikipedia.org/wiki/File:Mauna_Loa_Carbon_Dioxide-en.svg ). This carries the licence:
    Own work, from Image:Mauna Loa Carbon Dioxide.png, uploaded in Commons by Nils Simon under licence GFDL & CC-NC-SA ; itself created by Robert A. Rohde from NOAA published data and is incorporated into the Global Warming Art project.
    However NC is NOT Open – you could not use this in a text book, create a movie from it, etc.
  • The IPCC’s AR4 Synthesis report. “The right of publication in print, electronic and any other form and in any language is reserved by the IPCC. Short extracts from this publication may be reproduced without authorization provided that complete source is clearly indicated. Editorial correspondence and requests to publish, reproduce or translate articles in part or in whole should be addressed to: [IPCC]”. This is NOT Open.

     

  • Atmos. Chem. Phys., 10, 9-27, 2010
    www.atmos-chem-phys.net/10/9/2010/
    doi:10.5194/acp-10-9-2010
    © Author(s) 2010. This work is distributed
    under the Creative Commons Attribution 3.0 License.

    A comprehensive evaluation of seasonal simulations of ozone in the northeastern US during summers of 2001–2005

    H. Mao1, M. Chen2, J. D. Hegarty1, R. W. Talbot1, J. P. Koermer3, A. M. Thompson4, and M. A. Avery5

     

    This IS OPEN. The licence (CC-BY) is fully conformant with the OKD. As ACP is an Open Access journal I expect that
    all publications carry this rubric. (Apologies for the cut-n-paste into Word)

     

So it should be possible to annotate any bibliography as to whether the items are Open. I can’t give examples of datasets as I don’t know the field. Certain ones (e.g. works of the US government) may be clearly Open, but many others will be fuzzier.
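As a rough illustration of that annotation step, here is a minimal Python sketch. It is not an OKF tool. The `Entry` record, the `classify` and `annotate` helpers, the short licence labels and the three-way Open / Not Open / Unknown result are all assumptions made for the example, and the rule it applies (NC, ND or "reserved" rights mean Not Open; a small list of known labels means Open) is a simplified reading of the Open Knowledge Definition.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: simplified reading of the Open Knowledge Definition.
# Attribution (BY) and share-alike (SA) are acceptable conditions;
# non-commercial (NC), no-derivatives (ND) and "reserved" rights are not.
NON_OPEN_MARKERS = {"NC", "ND", "RESERVED"}

# Assumption: an illustrative, incomplete list of labels treated as Open.
OPEN_LICENCES = {"CC0", "CC-BY", "CC-BY-SA"}


@dataclass
class Entry:
    """One bibliography item (hypothetical structure for this sketch)."""
    title: str
    licence: Optional[str]  # None when the resource states no licence


def classify(licence: Optional[str]) -> str:
    """Return 'Open', 'Not Open' or 'Unknown' for a licence label."""
    if licence is None or not licence.strip():
        return "Unknown"  # no clear statement: needs a human to check
    normalised = licence.strip().upper().replace(" ", "-")
    parts = set(normalised.split("-"))
    if parts & NON_OPEN_MARKERS:
        return "Not Open"
    if normalised in OPEN_LICENCES:
        return "Open"
    return "Unknown"


def annotate(entries: List[Entry]) -> List[Tuple[str, str]]:
    """Pair each entry's title with its Openness label."""
    return [(e.title, classify(e.licence)) for e in entries]


if __name__ == "__main__":
    bibliography = [
        # Labels below are shorthand for the licence statements quoted above.
        Entry("Keeling Curve (Wikimedia SVG)", "CC-NC-SA"),
        Entry("IPCC AR4 Synthesis report", "rights reserved"),
        Entry("Atmos. Chem. Phys., 10, 9-27, 2010", "CC-BY"),
        Entry("A dataset with no licence statement", None),
    ]
    for title, status in annotate(bibliography):
        print(f"{status:9} {title}")
```

The "Unknown" result matters as much as the other two: a resource with no clear licence statement is flagged for someone to look at rather than guessed to be Open.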

 

 


 


3 Responses to Open Data: My apologia

  1. Nick Barnes says:

    That drawing of the Keeling Curve doesn’t match the Open Knowledge Definition, but the underlying data, here: http://www.esrl.noaa.gov/gmd/ccgg/trends/ almost certainly does. Probably the picture on that same page does too. As is far too common in science, there is no clear statement of ownership on the page, but I think all NOAA data is open.
    You will tread on a lot of toes if you say something is “Not Open” just because it doesn’t meet the Open Knowledge Definition. For instance, a very great deal of Open Source software, which has been using the word for more than a decade, does not meet that definition.

  2. I didn’t quite know what I would set off when I let Peter and a few others know about the meeting with Fred Pearce and Myles Allen at the Royal Institution on Monday! It was very good to be with him and to meet Fred and Roger Harrabin of the BBC – who came under pressure to defend some of his employers’ coverage, from all sides.
    Nick, Peter and I are involved in a fledgling movement that I’ve been calling the Open Climate Initiative since last November. See for example my report of George Monbiot’s very positive response to the idea at a key debate on 3rd December: http://climateaudit.org/2009/11/29/press-coverage/#comment-206449. It’s taken a while to get the attention of some key technical people and open science people. Right now though it’s fizzing!
    I have to say that I’m on Peter’s side in the use of the term “not open”. But I’m learning loads and expect to be doing so for some while. We will be looking to harness people power in opening up climate science. Watch this space!

    • pm286 says:

      Thanks for the tip on the meeting. I have learnt enough over the last 2-3 days to change my view of Climate Change Research. I shall try to write this up – it won’t be what people expect. I believe that I can and should make a small contribution to this area. The only way I can do it is by being strictly neutral. For that reason I shall not comment on things like the emails (though I have read one selection).
