Open Data – the time has come

The term “Open Data” is now becoming commonly used and we (Blue Obelisk) are trying to define it (our mantra being ODOSOS: Open Data, Open Source, Open Standards). It was not commonly used two years ago, although the concept is general enough to have been important. In the last 12-15 months there has been a lot of use, particularly in the techie web logs and meetings. The idea is potentially very much broader and looks set to become very important.
The earliest references I can find are:
Jim Kent on the human genome. An Open Data Consortium was founded in ca. 2003, seemingly concerned with geospatial data. Simon St. Laurent gave an undated presentation that looks a few years old. It has a strong XML flavour.
I became concerned about Open Data in ca. 2003-2004, and Henry and I published a Manifesto for Open Chemistry in 2004. I followed this up in 2005 with several mails (example) and presentations to JISC, OAI, STM Publishers, etc., where I used the term “Open Data”.
Late in 2005 SPARC set up an Open Data list with me as moderator.
Science Commons started in Dec 2004.
In 2005 the term started to emerge, possibly independently, in the XML/tech area, as in XTech 2005.
It is now a hot topic among the Tims Bray and O’Reilly.

There seem to be several related threads:

  • scientific data deemed to belong to the commons (e.g. the human genome)
  • infrastructural data essential for scientific endeavour (e.g. GIS)
  • data published in scientific articles which are factual and therefore not copyrightable
  • data, as opposed to software, which are therefore not covered by OS licenses and are potentially capable of being misappropriated (this is a very general idea)

I think the current usages are sufficiently close that we should try to bring them together. Comments here would be useful. Maybe a Wikipedia article would help?

  1. Peter, thank you so much for this list; it is very helpful, indeed.

  3. Pascal says:

    You missed my favourite link:
    We hope this will mark a new beginning for collaborative efforts towards open standards and open tools.

  4. Keith G Jeffery says:

    Peter, all:
    Eric Zimmerman kindly pointed me to this blog. Although the term open data is rather new, the concept is rather old. The International Geophysical Year of 1957-8 caused the setting up of several world data centres and – more importantly – set standards for descriptive metadata to be used for data exchange and utilisation.
    Somewhat surprisingly, commerce and industry have made more progress in this field, with metadata (and exchanged data) in e.g. supply chains being particularly effective – but proprietary to a group of companies. There are many different metadata standards for ‘open data’, commonly organised by domain of interest, developed over the last 50 years.
    There is a standard (technically an EU recommendation to member states) for metadata and data describing research – a standard format to describe projects, persons, organisations, products, patents, publications, facilities, equipment, funding, etc. It is named CERIF. This is really useful for understanding the context of an open dataset, and usually helps with issues like provenance too.
    And finally a plea: please make open data metadata formal; that is, unlike Dublin Core, it should be machine-understandable as well as machine-readable; then it will scale (automated processes can be used rather than requiring human browsing).

  8. Adrian says:

    I just stumbled over an article that has some additional information on the origin of the term “open data”: Yu, Harlan and Robinson, David G., The New Ambiguity of ‘Open Government’ (February 28, 2012). 59 UCLA L. Rev. Disc. 178 (2012). Available at SSRN.
    (The article upholds the distinction between “open data” and “open government” and criticizes the way the two terms have increasingly been merged.)
    “The earliest appearance of the term ‘open data’ in a policy context appears to come from science policy in the 1970s: When international partners helped NASA operate the ground control stations for American satellites, the operative international agreements required those partners to adopt an ‘open-data policy comparable to that of NASA and other U.S. agencies participating in the program, particularly with respect to the public availability of data.’ The agreements also required that data be made available to NASA ‘in the NASA-preferred format.’
    Later, a 1995 National Academy of Sciences report titled On the Full and Open Exchange of Scientific Data elaborated the idea of sharing data from environmental monitoring satellites, perhaps reflecting its shared lineage with those earlier NASA agreements: ‘International programs for global change research and environmental monitoring crucially depend on the principle of full and open exchange …. Experience has shown that increased access to scientific data, information, and related products has often led to significant scientific discoveries and the opportunity for educational enhancement.'”

