Typed into Arcturus
This post is a first outline – not even a draft – of a proposed Panton Paper on “Who owns scientific data? Anyone?”
[Note: I am using a blog format to explore these issues. This is partly because it feels natural, and partly because it reaches out to readers of the blog who may not regularly read OKF lists (of course we hope you start doing so). These posts then seed/catalyze discussion which will take place on the OKF open-science list (http://lists.okfn.org/mailman/listinfo/open-science ) and then be communally crafted on one or more Etherpads (temporary scratch pads for Open communal discussion – you'll find the addresses on the list(s)). Note that almost anything I write can and should be edited.]
A very common concern is “Who owns scientific data?” This will not be an easy question to answer and may take some months to explore, partly because there will be a fuzzy borderline as to what counts as data, but mainly because it requires grappling with legal, contractual and moral ownership issues. These cannot be ignored, as several recent cases have shown, but we can often avoid taking an overly legal-algorithmic approach.
NOTE: I got so bogged down in the legal issues that I forgot to raise the most fundamental question of all: does, or should, anyone own data? Claudia Koltzenburg rightly points this out on FriendFeed:
good you make a start in the PP/OKF context, pmr, thanks, and I do welcome the tentativeness of your post. On an equally tentative note, let me add that the title of PP3 (currently “Who owns Scientific Data?”) implies that anyone does (or should?) own data, maybe let’s find a more neutral question without any such implication? Would this be perceived as more neutral? “Are scientific data owned by anyone?” I guess that this would help us move away from the pdf towards xml – Claudia Koltzenburg
Much of the discussion will use words such as “research” rather than “data”, and we should try to make the distinction where possible. Tangible research output includes recorded ideas, hypotheses, software, data (at different levels of rawness or filtering), analyses and conclusions. Some or all of these can be protected by copyright or patents.
In some jurisdictions there are explicit or implicit requirements to try to exploit research, perhaps most prominently in the US Bayh-Dole Act (http://en.wikipedia.org/wiki/Bayh%E2%80%93Dole_Act ). Quoting Wikipedia:
Among other things, [Bayh-Dole] gave US universities, small businesses and non-profits intellectual property control of their inventions and other intellectual property that resulted from such funding. The Act, sponsored by two senators, Birch Bayh of Indiana and Bob Dole of Kansas, was enacted by the United States Congress on December 12, 1980.
Perhaps the most important change of Bayh-Dole is that it reversed the presumption of title. Bayh-Dole permits a university, small business, or non-profit institution to elect to pursue ownership of an invention in preference to the government.
Note the use of “Intellectual Property”, not “data” (which is a subset of the IP). Bayh-Dole is seen by some as emphasizing the need to exploit at the expense of the free flow of research within the community. Staff in many (?most) research institutions have explicit contracts requiring them to attempt to exploit discoveries and tools.
Concerns have been raised about this, e.g.
I see roughly three aspects of data ownership:
- Legal. In some (?most) jurisdictions some research outputs (such as computer software) are automatically protected by copyright. The copyright owner may not be obvious: in the UK it could be the employer or the author. A PhD student usually owns the copyright automatically, while for a postdoc it may depend on whether the work is “for hire”.
- Contractual. It is possible for the employer to require the employee to assign copyright to the institution. This varies from institution to institution.
- Moral. “Author’s moral rights” is a well-established concept in many jurisdictions and gives authors some power to control what is done with their works.
Our immediate problem is that data is uncopyrightable. Data per se is also not patentable (although some scientific “factual” discoveries, such as protein structures, have been patented, generating strong opinions). Much of the existing discussion does not really address data, and that is where I think the OKF has an important opportunity to help. Although we cannot remove the legal issues (as we found with the Panton Principles), we can create alternative ways of conducting research that minimise the concerns.
Here’s a recent exploration from Phil Bourne at UCSD: http://www.ethicscenter.net/event/who-owns-data (2010).
There is a widespread assumption that research should be made freely available at the earliest opportunity (and we’ll sketch a different Panton Paper for that). But in a competitive world many scientists believe they have both a right and a need to hang onto “their” data, and that they are under no moral obligation to share it. There have been major public examples of the tensions that these attitudes cause. In the UK we have Climategate (http://en.wikipedia.org/wiki/Climatic_Research_Unit_email_controversy ), where data was ultimately forced to be released (http://en.wikipedia.org/wiki/Climatic_Research_Unit_email_controversy#Information_Commissioner.27s_Office ), although again this was not pure data: it included emails. A few blog posts ago (http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=2473 ) I commented on the forced release of tree-ring data from Queen’s University Belfast, where there are suggestions that requests could require data to be released before publication. (I think this is an overreaction, and in any case we in the OKF should be able to help suggest appropriate conduct.) Interestingly, the report stated “Dr Keenan won a ruling from the Information Commissioner in April that said that Queen’s owned the data and must release it.” [my emphasis].
The question of timescale is critical. I shan’t discuss it here, but here is an account of how NASA appears to be holding back the “best data” so that a selected group of astronomers can get first pick of it. One implication is that NASA owns the data.
So I have no concrete starting points for this discussion. Here is the outline of the Panton Paper:
- Should anyone own data?
- What are the current problems?
- How do we make “owned data” Open?
Trivia: “uncopyrightable” is often cited as the longest word in the English language with no repeated letter.