PP3_0.1: Who owns Scientific Data? Anyone?

Typed into Arcturus

This post is a first outline – not even a draft – of a proposed Panton Paper on “Who owns scientific data? Anyone?”

[Note: I am using a blog format to explore these issues. This is partly because it feels natural, and partly because it reaches readers of the blog who may not regularly read OKF lists (of course we hope you start doing so). These posts then seed and catalyze discussion, which will take place on the OKF open-science list and then be communally crafted on one or more Etherpads (temporary scratch pads for Open communal discussion – you'll find the addresses on the list(s)). Note that almost anything I write can and should be edited.]

A very common concern is “Who owns scientific data?” This will not be an easy question to answer and may take some months to explore. This is partly because the borderline of what counts as data is fuzzy, but mainly because it requires grappling with legal, contractual and moral ownership issues. These cannot be ignored, as several recent cases have shown, but we can often avoid taking an overly legal-algorithmic approach.

NOTE: I got so bogged down in the legal issues that I forgot to raise the most fundamental question of all – does or should anyone own data? Claudia Koltzenburg rightly points this out on FriendFeed:

good you make a start in the PP/OKF context, pmr, thanks, and I do welcome the tentativeness of your post. On an equally tentative note, let me add that the title of PP3 (currently “Who owns Scientific Data?”) implies that anyone does (or should?) own data, maybe let’s find a more neutral question without any such implication? Would this be perceived as more neutral? “Are scientific data owned by anyone?” I guess that this would help us move away from the pdf towards xml ;-) – Claudia Koltzenburg

Much of the discussion will use words such as “research” rather than “data”, and we should try to make the distinction where possible. Tangible research outputs include recorded ideas, hypotheses, software, data (at different levels of rawness or filtering), analyses and conclusions. Some or all of these can be protected by copyright or patents.

In some jurisdictions there are explicit or implicit requirements to try to exploit research, perhaps most prominently in the US Bayh-Dole Act. Quoting Wikipedia:

Among other things, [Bayh-Dole] gave US universities, small businesses and non-profits intellectual property control of their inventions and other intellectual property that resulted from such funding. The Act, sponsored by two senators, Birch Bayh of Indiana and Bob Dole of Kansas, was enacted by the United States Congress on December 12, 1980.

Perhaps the most important change of Bayh-Dole is that it reversed the presumption of title. Bayh-Dole permits a university, small business, or non-profit institution to elect to pursue ownership of an invention in preference to the government.

Note the use of “Intellectual Property”, not “data” (which is a subset of the IP). Bayh-Dole is seen by some as emphasizing the need to exploit at the expense of the free flow of research within the community. Staff in many (?most) research institutions have explicit contracts requiring them to attempt to exploit discoveries and tools.

Concerns have been raised about this, e.g. (2005).

I see roughly three aspects of data ownership:

  • Legal. In some (?most) jurisdictions some of the research output (such as computer software) is automatically protected by copyright. The copyright owner may not be obvious. In the UK it could be the employer or the author. A PhD student usually owns the copyright automatically, while for a postdoc it may depend on whether the work is “for hire”.
  • Contractual. It is possible for the employer to require the employee to assign copyright to the institution. This varies from institution to institution.
  • Moral. “Author’s moral rights” is a well established concept in many jurisdictions and gives the author some power to control what is done with their works.

Our immediate problem is that data is uncopyrightable [1]. Data per se is also not patentable (although some scientific “factual” discoveries such as protein structures have been patented, generating strong opinions). Much of the discussion does not really address data, and that is where I think the OKF has an important opportunity to help. Although we cannot remove the legal issues (just as we found with the Panton Principles), we can create alternative ways of conducting research that minimise the concerns.

Here’s a recent exploration from Phil Bourne at UCSD (2010).

There is a general assumption in the wider world that research should be made freely available at the earliest opportunity (and we’ll sketch a different PPaper for that). But in a competitive world many scientists believe they have a right and a necessity to hang onto “their” data and are under no moral obligation to share it. There have been major public examples of the tensions that these approaches cause. In the UK we have Climategate, where data was ultimately forced to be released, but again this was not pure data; it included emails. A few blog posts ago I commented on the forced release of tree-ring data from Queen’s University Belfast, where there are suggestions that requests could require the release of data before publication. (I think this is an over-reaction, and in any case we in the OKF should be able to help suggest appropriate conduct.) Interestingly, the report stated “Dr Keenan won a ruling from the Information Commissioner in April that said that Queen’s owned the data and must release it.” [my emphasis].

The question of timescale is critical. I shan’t discuss it in detail here, but here is an account of how NASA appears to be holding back the “best data” so that a selected group of astronomers can get first pick at it. One implication is that NASA owns the data.

So I have no concrete starting points for this discussion. Here is the draft outline of the Panton Paper:

  • Should anyone own data?
  • What are the current problems?
  • How do we make “owned data” Open?



[1] Trivia: “uncopyrightable” is often said to be the longest word in the English language with no repeated letter.

5 Responses to “PP3_0.1: Who owns Scientific Data? Anyone?”

  1. Nick Barnes says:

    See also this example, the Planck telescope:
    People are scraping data from press-release images because the telescope team won’t release it….

  2. Chris Rusbridge says:

    Peter, you write “Our immediate problem is that data is uncopyrightable”. I think this is slightly too strong; data represents such a continuum from the directly measured “fact” that is probably uncopyrightable to highly interpreted and interpolated constructions with large amounts of skill involved in their creation. I think these higher level forms of science data are more likely to be copyrightable and hence can be owned.

    Second, don’t forget that there are other kinds of intellectual property than copyrights and patents. In our neck of the woods, there does exist the database right, which has much to do with the arrangement of data. It protects against unauthorised extraction of significant portions of the work, although there are arguments that it is weaker than it was believed to be (some court case relating to a Horse Racing Board, or some such; those folk claimed to own the data about horses and riders in races).

    I am also not sure what role trade secrets might play here. They are pretty hard to assert; you have to show evidence, such as employment contracts, that makes clear that trade secrets apply and must not be shared. I know some scientists who would love their data to be considered trade secrets, at least until they have their Nobel Prize (as if, in most cases!).

    Anyway, this is interesting stuff; do keep up the good work.

  3. Chris Rusbridge says:

    Of course I meant to suggest you might say “some data are uncopyrightable”!

  4. Ben Small says:


    Came across your blog by chance. I think you may find the following link interesting.



  5. pm286 says:

    Really valuable – I have fed this back to the OKF…

