#OSS2010: Reclaiming Our Scholarship – what I said

Typed and edited into Arcturus

There’s an impressive (near verbatim) transcript of the sessions. This is often at least as useful as the slides; in my case it is essential. I do not use PowerPoint and instead click my way through my “slides” in a non-linear order according to how I interpret the session and audience.

This time all slides were projected remotely (i.e. there was no podium computer and no VGA connector). So I hurriedly typed a few links into a blog post and asked the projectionist to click on two or three of them: the OK Definition, the picture of the Panton Arms and Pantonistas, and the Panton Principles. Most of the talk was done with Flowerpoint. What I said is here:

http://gnusha.org/open-science-summit-2010-transcript.html

and my slight editing (e.g. removing “click here” and correcting names) gives:

I am a chemist. I do not do PowerPoint […] My main method of presentation is flowerpoint. I am old enough to have remembered the 60s and not to have been at Berkeley but it has made a huge contribution to our culture. [describing the flowerpoint] The Open Knowledge Foundation will adopt [flowerpoint] as a way of making my points.

We [OKF] have many different areas (maybe 50) that come under Open and relate to knowledge in general. First of all, my petals are going to talk about various aspects of Openness. […] the open knowledge definition. This is the most important thing in [my talk and message].

A piece of knowledge is open if you are free to use, re-use and re-distribute it, subject only [possibly] to attribute and share-alike.

That’s a wonderfully powerful algorithm. If you can do that, it’s open. If not, it’s not open according to this [?definition].
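The definition really can be run as the “algorithm” it is described as. A toy sketch in Python, purely illustrative (the parameter names are mine, not part of the Definition itself):

```python
# Toy encoding of the Open Knowledge Definition as a yes/no test.
# A work is open if use, re-use and re-distribution are all permitted,
# and any conditions imposed are at most attribution and share-alike.

ALLOWED_CONDITIONS = {"attribution", "share-alike"}

def is_open(permits_use, permits_reuse, permits_redistribution, conditions):
    """Return True if the work passes the (simplified) openness test."""
    freedoms = permits_use and permits_reuse and permits_redistribution
    return freedoms and set(conditions) <= ALLOWED_CONDITIONS

# CC-BY grants all three freedoms with attribution only:
print(is_open(True, True, True, {"attribution"}))                    # True
# A non-commercial clause is an extra condition, so CC-BY-NC fails:
print(is_open(True, True, True, {"attribution", "non-commercial"}))  # False
```

This is the power of the definition: the test is mechanical, so reasonable people cannot endlessly argue over whether a given work is open.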

Another picture [the Panton Arms with PP collaborators], the Panton Principles. It’s a place called a pub. It’s 200 meters from the chemistry department where I work, and between the pub and the chemistry lab is the Open Knowledge Foundation. Rufus has been successful in getting people to work on [OKF]. A lot of this is about government, public [knowledge].

[petal 1] How many people have written open source software? [many hands]

[petal 2] What about open access papers? [fewer] How many of them had a full CC-BY license? [fewer still] If they weren’t, they didn’t work as open objects. CC-NC causes more problems than it solves.

[petal 3] How many people have either published or have people in their group who have published a digital thesis? Not many, right? [few hands] How many of those explicitly carry the CC-BY license? [about zero] That’s an area where we have to work. Open Theses are a part of what we’re trying to set up in the Open Knowledge Foundation. Making the semantic [version] available (LaTeX, Word, whatever they wrote it in) would be enormously helpful. The digital landgrab in theses is starting and we have to stop it. There are many things we can do.

[petal 4 + 5] There are two projects, both funded by JISC: Open Bibliography and Open Citations. At the moment we are governed by non-accountable proprietary organizations who measure our scholarly worth by citations and metrics that they invent because these are easy to manage, and who retain control of our scholarship. We can reclaim that within a year or two and gather all of our citation data and bibliographic data. We can then, if we want, do metrics; I am not a fan, but we [emphasis] should be doing them, and not some unaccountable body. Anyone can get involved in Open Bibliography and Open Citations.

[petal 6] The next is open data, and this one is not so straightforward. Jordan Hatcher and John Wilbanks from Science Commons have shown that open data is complex. I think it’s going to take 10 years [to come to terms with Open Data].

[petal 7] This is a group involved in the Panton Principles, including Jenny Molloy. Jenny is a student, and this is the power of our students: undergraduates are not held back by fear and conventions. She has done a fantastic job in the Open Knowledge Foundation. [identifies people in photo] Jordan, then Rufus, John Wilbanks, Cameron, and me. Anyway, we came up with the Panton Principles, [link to] the Panton Principles, and let’s just deal with the first one [due to time and not being able to scroll down].

Data related to public science should be explicitly placed in the public domain.

There are four principles to use when you publish data. What came out of all of this work is that one should use a license that explicitly puts your [data] in the public domain – CC0, or PDDL from the Open Knowledge Foundation. So the motto that I have brought to this [meeting], one which I’ve been using and which has been taken up by JISC, is on the reverse of the flower:

reclaim our scholarship.

That’s a very simple idea, one that’s possible: if a large enough number of people in the world look to reclaiming scholarship, we can do it. Many more difficult things have been achieved by concerted activists. We can bring back our scholarship to where we [emph] control it, and not others.

[petal 8] I would like to thank the people on these projects, Open Citations (David Shotton), and our funders and collaborators: JISC, who funds it; BioMed Central, who also sponsors this; the International Union of Crystallography; and the Public Library of Science. (applause)

Posted in Uncategorized | 1 Comment

PP4_0.1: Comments on Repository Structure and location

Dictated into Arcturus

Responses to PPaper4_0.1.

I have had a number of useful comments on my suggestion that Scientific Data be reposited in domain-specific repositories (and a number of tweets to the effect that “PMR is dissing librarians yet again”. To the latter I’d ask the authors to reread what I actually said, which was that many librarians think that data should be put in IRs; all the scientists I have spoken to think otherwise. This was a factual statement, not an attack.) The meaningful comments are:

Chris Rusbridge says:

July 28, 2010 at 8:57 pm

My real point to make is that Peter suggests an ideal that I fear cannot be realised in the broad. There are comparatively few existing domain-specific repositories, and most are extremely vulnerable. Witness what happened to the AHDS when the makeup of the policy committee changed slightly. Secondly, don’t think (please!) that domains are consistent; there can be endless divisiveness of approach between many subdomains. Thirdly, why should institutional data repositories not work, given the support of the institutional scholars? Fourthly, how can reasonably well-managed institutional data repositories not be federated so that the sub-domain parts of all the world appear as one? Fifthly, institutional data repositories do have a sustainability case, if linked to a library, an institutional mission, and that vital sense of scholarship disclosure.

I would never seek to undermine a domain repository that existed and worked, but I would hesitate to try to establish (and more importantly sustain) a domain repository where none existed. I would aim to establish IDRs and federate them. I’m not saying the former can’t be done, just that it is MUCH harder!

Jim Downing says:

July 29, 2010 at 10:47 am

@Chris

I have to say that I broadly agree with your points, and that the best sustainability and access is offered by federated institutional / sub-institutional repos.

I don’t think this is the easy path, though. There are few IRs tackling data archiving at a significant level, and even fewer aggregated domain-specific meta-repositories.

In the spirit of paving the cow paths, the best route might be to look for ways to deliver institutional support to domain repositories.

Steve Hitchcock says:

July 29, 2010 at 10:53 am

Peter, You mention ‘open data’ twice in this blog entry, in the opening sentence and in the final sentence. In between you do not address how the extensive requirements can be achieved while continuing to provide open data. You propose to disregard the contribution that might be made by researchers’ institutions, yet intimate roles for scientific unions, societies and publishers. These are likely to provide services at a cost that is not compatible with open data. Since open is axiomatic to what you want, it doesn’t seem to add up here. I think we could, and will, see examples of more diversified structures, with IRs at the apex, to provide the expert data management and curation that you seek, but within our research institutions.

Firstly, none of this will be easy and it may well be impossible in most cases. I see no reason why institutions should not provide data repositories, other than the fact that they do not currently do so and there is little sign of them making any progress. I can certainly conceive of a future where this happens; I just don’t see it happening. There *are* a number of domain-specific repositories, and yes, most of them are fragile. But that is to be compared with almost zero equivalents in IRs.

If you read my actual draft for the PPaper (between the rules) there is no mention of where the repository should be and who should finance it. I have simply made the point that data should not be stored in a general-purpose repository where there is no domain expertise. If you wish to make the point that it *should* be stored like that – without effective federation and without domain expertise for ingestion – I will continue to disagree. If you agree that it needs domain expertise then you will have to get that from practising scientists – there is no way that anyone outside the discipline (libraries, Google, Bing) can rightly manage the intricacies and detail.

The last thing that scientists want is their data spread over ca. 10,000 sites (because that is how many HE institutions there are worldwide). No scientist, editor, or journal that I have spoken to would countenance data being reposited in that way.

So if libraries (whom I did not attack) wish to be involved they have to engage with domain scientists. Libraries have the following positives to offer:

  • They have (at least currently) funding for IRs
  • They have some degree of permanency

Domain repositories have the following:

  • They have the technical trust of the community
  • They offer a single point of contact

My (rather tentative) solution is that libraries should actively try to take on one or two domain-specific repositories. Not more. Those repositories should correspond to world expertise on campus. So the Protein Data Bank (RCSB) is located at Rutgers; I have no idea whether the University supports it, but it is a single point of contact for the discipline.

The future is tough, however you look at it. But the fact that scientists are starting to set up their own repositories sends a message.

I am simply the messenger.

Posted in Uncategorized | 3 Comments

Open Science Summit: My homage to Berkeley – Flowerpoint

Typed into Arcturus

Open Science Summit: Update.

The live feed is variable, I gather. There have been many very good tweets and they are archived on:

 http://opensciencefoundation.com/oss2010/

(I guess this will be dynamically updated.)

I wasn’t able to show my own HTML so I showed a few HTML links from the blog. I had, however, my main visual support – FLOWERPOINT:


Some of the younger generation may not appreciate the real change that flower power made to us in the 60s and 70s. I have finally paid my homage to Berkeley.

Posted in Uncategorized | 2 Comments

OSS2010: My slides

Typed into Arcturus

Open Science Summit: I may be presenting from another machine so here is a blog post with critical links. Most of the talk will be given by Flowerpoint, but we shall need:

Reclaiming our Scholarship

Peter Murray-Rust,

Univ of Cambridge, Churchill College and Open Knowledge Foundation

Power Corrupts; PowerPoint Corrupts Absolutely (Tufte)

Flowerpoint Petals:

  • Open Source
  • FULL Open Access
  • Open Theses
  • Open Bibliography + Open Citations
  • Open Data
  • Panton Principles
  • Thanks (JISC, IUCr, PLoS, BMC and many others)

Links:

Posted in Uncategorized | Leave a comment

Berkeley: Reclaiming our scholarship

Dictated into Arcturus

I am giving a 10-minute talk at the Open Science Summit this afternoon at Berkeley. So many things are going round in my head and I still have no clear idea exactly what I’m going to say. The history of freedom on the Berkeley campus is enormous and I’m just off to have my breakfast in the Free Speech Movement Cafe in the library (http://www.lib.berkeley.edu/give/bene55/fsm.html ). Goodness only knows what new ideas I will get during breakfast.

I am absolutely sure that we are in the middle of something momentous. The phrase that I have come up with in the Berkeley context is that “openness is the new flower power”. That may sound pretentious but for somebody who remembers the sixties the influence that Berkeley has given us is enormous. Flower power changed our way of thinking throughout the world and directly addressed the military-industrial-complex. Openness has many of the same aspects. It is compelling, it gathers people round it and if enough people become involved then it is surely unstoppable. And it challenges power by making basic assertions that cannot reasonably be denied.

Our new concern is the publisher-industrial-complex. Of course many publishers are enlightened and working to spread knowledge. Many industries are enlightened and bringing valuable wealth creation to the world. But there is a core of members of the publisher-industrial complex who create income by restricting the flow of knowledge. They innovate only to generate greater control and more income. Our scholarship is hobbled by copyrights, patents, firewalls, portals, restrictive conditions and contracts. This has been a problem for the last 50 years and many people have accepted that this is the appropriate way to manage our scholarship. But it is not.

So we must Reclaim Our Scholarship.

And that will be the primary theme of my short talk. It’s now possible. We create the scholarship. We create the meta-data. We create the tools. We can reclaim and reinvent the way that scientific scholarship is created and disseminated. It only needs enough people.


Posted in Uncategorized | 2 Comments

Semantic destruction should no longer be tolerated

Typed and scraped into Arcturus

A very important comment from Henry Rzepa [http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=2488&cpage=1#comment-471129 ] on how publishers destroy semantics.

[HSR] This, Peter does not mention above, but you will find a fully explained example of how to create a domain-specific (chemistry) repository at 10.1021/ci7004737.

Other comments: We have used this repository to provide complete supporting data in readily accessible form for more than 20 primary scientific publications with publishers such as Nature Publishing Group, VCH (Wiley), RSC, ACS, and Science. Each of these publishers had to be persuaded to a greater or lesser degree to integrate this data into the primary article (rather than more obscurely in supporting information), a location we insisted upon in order to give that data prominent visibility to the readers. If you are interested in seeing the effects, I have blogged on the topic.

I recently acted as a referee for a well known publisher. The article was the analysis of quite large molecules, and I was quite keen to explore the proposed structures. Data had been provided, in the form of a double column, page broken, Acrobat file. I faced not a little work in converting this format to something I could use for the purpose. Since I knew the authors, I contacted them after my review process was complete (yes, thus breaking my anonymity) asking why they had provided the data in such an arcane and relatively unusable format. They were following the publisher guidelines. They did suggest that it should be the publishers themselves who should offer a domain-specific repository for authors to use, since it is non trivial for an author to establish a domain-specific repository themselves (and even within domains, there are many diverse requirements). I have my doubts however that such a model could be effectively deployed by the multiple publishers in chemistry any time soon. Meanwhile, for the vast majority of articles which have associated data submitted with them, the Internet revolution has yet to make much of an impact!

Henry is, of course, a pioneer in this area and jointly and separately we keep running into this.

This example has come at an excellent time as I shall be posting another Panton paper on mining and semantics. I except a few publishers [mainly Open Access in practice and spirit] from the remarks below.

Publishers do not understand semantics and therefore destroy them.

Publishers are generally not interested in innovation unless they drive it, and most of them drive it to enhance their monetary returns rather than for the benefit of the community they purport to serve.

The simplest way to create “publications” is PDF. PDF was not invented/chosen for the benefit of the community but for the benefit of the publishers. PDF destroys semantics and can be used to prevent innovative use of publications such as data- and text-mining.

Many librarians and even Open Access enthusiasts have been sucked into the “PDF is wonderful” syndrome. Repositories, theses and journals use PDF and our semantic scholarship languishes.

And supplemental data, as Henry points out, is even worse. Turning semantic images, tables, molecules, graphs, etc. into mindless PDF is the academic equivalent of opencast coal-mining or logging the rainforests. We destroy our information richness for monetary gain or simple laziness.

Posted in Uncategorized | 6 Comments

PP4_0.1: Repositories for Scientific Data

Dictated into Arcturus

This post is a first outline – not even a draft – of a proposed Panton Paper on “Repositories for Scientific Data”

[Note: This is likely to be controversial.]

Very soon we will need to decide where Scientific Data should be stored. Unless we solve this problem, many fields will not have effective access to open data because it is too large or too complex to be included in conventional publications. Some fields such as bioscience, high-energy physics and astronomy have already made significant and valuable progress in setting up repositories for their scientific outputs (mainly data), but most fields rely on what can be included in the traditional print-like publication. In almost all cases this is inadequate, although the tireless work of organizations such as the International Union of Crystallography shows that it can be achieved in some cases.

I shall be blunt. The only place where Scientific Data should be stored is in domain-specific repositories.


Many people, especially those in the academic library community, have suggested that institutional repositories are the appropriate place to reposit data. Almost every scientist I have talked to believes that we need specialist repositories for each domain and that institutional repositories are not set up for, and cannot be adapted to serve, this purpose. There are several reasons.

  • Scientists expect information to be global. They either assume that Google, Bing, or other search engines will index all sources of information, or they look to specialist domain repositories such as the Protein (Structure) Data Bank or sequence databases such as GenBank and Swiss-Prot. Until repository managers can coordinate their services so that they appear as a single global resource, they will not be used by scientists either for deposition or for discovery.
  • Most Scientific Data needs very expert and careful validation. This cannot be provided by the average institutional repository, which has no knowledge of the particular domain. Arbitrary deposition of data into repositories will simply reduce rather than increase their value.
  • Scientific Data has its own specialized metadata. Again this requires domain experts to create and manage as part of the data deposition process.
  • Scientific Data requires specialist search and discovery tools. For example, biological sequences are normally searched against a very large database of known sequences to see if they are novel. These services are provided by the NIH and the EBI, for example. There is no way that these facilities can be duplicated except in specialist repositories.
  • Scientific Data requires specialist tools for deposition and validation. These are likely to be developed by community efforts centred around global repositories rather than individual academic institutions.

Therefore this paper will address the question of how to build domain-specific repositories for Scientific data.

  • Scientific data should be stored in specialist domain-specific repositories.
  • Every sub-community in science should explore the data management needs of its members. It is certain that they will need to find sources of funding for this.
  • The community will need to dedicate time and energy to the creation of data standards and metadata, such as markup languages and ontologies.
  • The community will also need to create processes for validating data so that there is an expectation of an appropriate quality in their repository.
  • The community will need to build specialist tools for the deposition of data. These tools should be as easy to use as possible (otherwise data will not be reposited), and it should be noted that this is a resource-intensive requirement.
  • The community will also need to develop discovery tools which go beyond text searching. And these tools must be open source so that innovation, correction and validation can be carried out by the community.
  • Arrangements should be made for the transfer of data from theses in institutional repositories into these domain-specific repositories. It is highly likely that this will involve validation which may become part of the future requirements for an acceptable thesis.
  • It is critically important that scientific unions and societies are intimately involved in these repositories but they must not be allowed to gain monopoly status.
  • It is also critical that publishers are active members and amend their processes so that data can be validated before publication and deposited seamlessly into the repositories. Publishers must not be allowed to dominate the data deposition process.
  • It is important to think out the governance model of these domain-specific repositories.
  • It is critical that the data and access to them are open in perpetuity. However the money for the repositories is raised, it can no longer be acceptable to charge the world for access.
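The validation and deposition requirements above amount to a simple pipeline: check a record against a community-agreed schema, then accept it. A minimal sketch in Python; the schema fields, the license whitelist and the function names are all invented for illustration, not a description of any real repository:

```python
# Hypothetical sketch: domain-aware deposition that validates a record
# against a community-agreed metadata schema before accepting it.
# The schema fields and license whitelist below are invented examples.

REQUIRED_FIELDS = {"creator", "instrument", "units", "license"}
OPEN_LICENSES = {"CC0", "PDDL"}  # per the Panton recommendations

def validate_record(record):
    """Return a list of problems; an empty list means the record passes."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - record.keys())]
    if "license" in record and record["license"] not in OPEN_LICENSES:
        problems.append(f"license {record['license']!r} is not an open-data license")
    return problems

def deposit(record, store):
    """Append the record to the (in-memory) store only if it validates."""
    problems = validate_record(record)
    if problems:
        raise ValueError("; ".join(problems))
    store.append(record)

store = []
deposit({"creator": "A. Scientist", "instrument": "diffractometer",
         "units": "angstrom", "license": "CC0", "values": [1.54, 1.39]}, store)
print(len(store))  # 1
```

A real domain repository would of course validate against the domain’s own standards (units, ontologies, file formats) rather than a four-field dictionary; the point is only that validation happens at ingest, with the domain expertise encoded in the schema.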

Posted in Uncategorized | 7 Comments

What I shall say at the Open Science Summit

Dictated to Arcturus

Tomorrow I shall be going to Berkeley to the Open Science Summit. I am creating some audio visual material which is somewhat unconventional and takes us back to the Bay Area of 50 years ago. I’ll say no more and leave it to be revealed at the occasion.


The purpose of my talk is to introduce the Open Knowledge Foundation to people who don’t know about it and who don’t know what its full potential is. As I have said earlier, I think the OKF will become one of the great institutions of this decade. It does not intend to be universal, but it intends to cover all aspects of knowledge where it is important to know what the degree of Openness is and how to manage it. The OKF has at least 50 different activities and divisions (some larger than others) in which interest groups (of course anyone is welcome) work together to define the knowledge structure and practice of a domain.


I am involved in several of these and have only a very short time to present both the OKF in general and my particular interests. So I am listing some of these here as a reference in case I cannot present the detail at the time. The projects that I am particularly interested in are:

  • Open Data. This is now of enormous importance and almost every scientist that I encounter knows the importance of managing data. We have recently been awarded a grant by JISC in their “Managing Research Data” theme, where we will be working closely on the methods of validating data before publication. We are delighted to be working with two major publishers (the International Union of Crystallography and BioMed Central) to see how we can prototype the complete lifecycle of data creation, validation, publication and archival.
  • Open Bibliography. This is extremely exciting as well, because bibliography is key to our management and navigation of the scientific literature. Again we have been funded by JISC and are working with the IUCr and with PLoS. [Both this and the previous project are run in very close collaboration with David Shotton at Oxford. We hope that these projects will be autocatalytic and act as nuclei around which other like-minded people and organizations can congregate. As a result we hope to change the way in which science is published and its outputs are managed.] We see this as the first step in Reclaiming Our Scholarship from the non-accountable organizations who currently manage scientific publication metadata. Here I hope to interest some of the audience in joining this revolutionary activity.
  • Open theses. The dissemination of electronic theses and dissertations is poorly managed for the 21st century (concentrating far more on preserving the past than disseminating the current and future). In this project we hope to be able to coordinate access to theses and to provide a global approach to finding information. The information is of course not just textual but also graphical, chemical, numeric and many of the other scientific data types that are not catered for by the major world search engines. Again this is a project where we hope that many people and organizations will help to provide local and global material for the project and I’ll be expanding more later.


I’ll really only have time to mention these three topics and then to spend a small amount of time on open data, which is probably the most important for the Open Science Summit.


Summarising: the OKF does not intend to do everything, but it is likely to be the best place to look when you need to explore some aspect of openness. The discussions on the lists are often of extremely high quality, with great expertise in many fields, both technical and political. There are of course many participants from the academic library community, but also many from other walks of life.

Posted in Uncategorized | Leave a comment

PP3_0.1: Who owns Scientific Data? Anyone?

Typed into Arcturus

This post is a first outline – not even a draft – of a proposed Panton Paper on “Who owns scientific data? Anyone?”

[Note: I am using a blog format to explore these issues. This is partly because it feels natural, and partly because it reaches out to readers of the blog who may not regularly read OKF lists (of course we hope you start doing so). These posts then seed/catalyze discussion which will take place on the OKF open-science list (http://lists.okfn.org/mailman/listinfo/open-science ) and then be communally crafted on one or more Etherpads (temporary scratch pads for Open communal discussion – you’ll find the addresses on the list(s)). Note that almost anything I write can and should be edited.]

A very common concern is “Who owns scientific data?” This will not be an easy question to answer and may take some months to explore. This is partly because there will be a fuzzy borderline as to what is data, but mainly because it requires grappling with legal, contractual and moral ownership issues. These cannot be ignored as several recent cases have shown but we can often avoid taking an overly legal-algorithmic approach.

NOTE: I got so bogged down in the legal issues that I forgot to raise the most fundamental question of all – does or should anyone own data? Claudia Koltzenburg rightly points this out on Friendfeed

good you make a start in the PP/OKF context, pmr, thanks, and I do welcome the tentativeness of your post. On an equally tentative note, let me add that the title of PP3 (currently “Who owns Scientific Data?”) implies that anyone does (or should?) own data, maybe let’s find a more neutral question without any such implication? Would this be perceived as more neutral? “Are scientific data owned by anyone?” I guess that this would help us move away from the pdf towards xml 😉 – Claudia Koltzenburg

Much of the discussion will use words such as “research” rather than “data” and we should try to make the distinction where possible. Tangible research involves recorded ideas, hypotheses, software, data (in different sorts of rawness/filtering), analyses, and conclusions. Some or all can be protected by copyright or patents.

In some jurisdictions there are explicit or implicit requirements to try to exploit research, perhaps most prominently in the US Bayh-Dole Act (http://en.wikipedia.org/wiki/Bayh%E2%80%93Dole_Act ). Quoting Wikipedia:

Among other things, [Bayh-Dole] gave US universities, small businesses and non-profits intellectual property control of their inventions and other intellectual property that resulted from such funding. The Act, sponsored by two senators, Birch Bayh of Indiana and Bob Dole of Kansas, was enacted by the United States Congress on December 12, 1980.

Perhaps the most important change of Bayh-Dole is that it reversed the presumption of title. Bayh-Dole permits a university, small business, or non-profit institution to elect to pursue ownership of an invention in preference to the government.

Note the use of “Intellectual Property”, not “data” (which is a subset of the IP). Bayh-Dole is seen by some as emphasizing the need to exploit at the expense of the free flow of research within the community. Staff in many (?most) research institutions have explicit contracts requiring them to attempt to exploit discoveries and tools.

Concerns have been raised about this, e.g.

http://www.d.umn.edu/~pschoff/documents/ElliotR05WhoOwnsScientificDatapdf.pdf (2005).

I see roughly three aspects of data ownership:

  • Legal. In some (?most) jurisdictions some of the research output (such as computer software) is automatically copyright. The copyright owner may not be obvious. In the UK it could be the employer or the author. A PhD student usually owns the copyright automatically, while for a postdoc it may depend on whether the work is “for hire”.
  • Contractual. It is possible for the employer to require the employee to assign copyright to the institution. This varies from institution to institution.
  • Moral. “Author’s moral rights” is a well established concept in many jurisdictions and gives the author some power to control what is done with their works.

Our immediate problem is that data is uncopyrightable [1]. Data per se is also not patentable (although some scientific “factual” discoveries such as protein structures have been patented, generating strong opinions). Much of the discussion does not really address data, and that is where I think the OKF has an important opportunity to help. Although we cannot remove the legal issues (just as we found in the Panton Principles), we can create alternative ways of conducting research that minimise the concerns.

Here’s a recent exploration from Phil Bourne at UCSD: http://www.ethicscenter.net/event/who-owns-data (2010).

There is a general assumption that research should be made freely available at the earliest opportunity (and we’ll sketch a different PPaper for that). But in a competitive world many scientists believe they have a right and a necessity to hang onto “their” data and are under no moral obligation to share it. There have been major public examples of the tensions that these approaches cause. In the UK we have Climategate (http://en.wikipedia.org/wiki/Climatic_Research_Unit_email_controversy ), where ultimately data was forced to be released (http://en.wikipedia.org/wiki/Climatic_Research_Unit_email_controversy#Information_Commissioner.27s_Office ), but again this was not pure data: it included emails. A few blogs ago (http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=2473 ) I commented on the forced release of tree-ring data from Queen’s Belfast, where there are suggestions that requests could require the release of data before publication. (I think this is an over-reaction, and in any case we in the OKF should be able to help suggest appropriate conduct.) Interestingly, the report stated “Dr Keenan won a ruling from the Information Commissioner in April that said that Queen’s owned the data and must release it.” [my emphasis]

The question of timescale is critical. I shan’t discuss this here but here is an account of how NASA appears to be holding back the “best data” so a selected group of astronomers can get first pick at it. One implication is that NASA owns the data.

http://top-seminar-topic.blogspot.com/2010/06/in-hunt-for-planets-who-owns-data-once.html and http://www.nature.com/news/2010/100414/full/news.2010.182.html

So I have no concrete starting points for this discussion. Here is the draft of the PantonPaper:


  • Should anyone own data?
  • What are the current problems?
  • How do we make “owned data” Open?



[1] Trivia. This is the longest word in the English language with no repeated letter.
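For what it’s worth, the no-repeated-letter property is trivial to verify mechanically (whether it is really the *longest* such English word is a separate claim):

```python
# Check that "uncopyrightable" uses each of its 15 letters exactly once:
# a set keeps only distinct letters, so the sizes match iff none repeat.
word = "uncopyrightable"
print(len(word), len(set(word)) == len(word))  # 15 True
```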

Posted in Uncategorized | 5 Comments

PP2_0.1: Why Scientific Data should be Open

Typed into Arcturus

This post is a first outline – not even a draft – of a proposed Panton Paper on “Why Scientific Data should be Open”

This outline is largely a rework of material that I first exposed on a Wikipedia page (http://en.wikipedia.org/wiki/Open_science_data ) in late 2006 (when “Open Data” was essentially unknown). I was then invited to publish it in Serials Review 10.1016/j.serrev.2008.01.001

Which will cost you about 31.50 USD to read. I also pre-posted this to Nature Precedings http://precedings.nature.com/documents/1526/version/1/html

which is less pretty but costs 0.00 USD. Take your choice.

I advanced some arguments for why scientific data should be Open and would welcome feedback. They are about 3.5 years old and if nothing else need updating. During these years we have developed the Panton Principles, which showed that there is a need for categorising and formalising both what Open Data is and what to do with it.


The reasons for making data Open are several, and more than one of the following motives may apply to any piece of data:

  • Many (?all) scientific data can be deemed to belong to the commons (“the human race”) (e.g. the human genome, medical science, environmental data).
  • They have an infrastructural role essential for scientific endeavour (e.g. in Geographic Information Systems and maps).
  • Data published in scientific articles are factual and therefore not copyrightable (the previous PPaper made this argument).
  • Public money was used to fund the work and so it should be universally available.
  • It was created by or at a government institution (this is common in US National Laboratories and government agencies).
  • Sponsors of research do not get full value unless the resulting data are freely available.
  • Restrictions on data re-use create an anticommons, and the rate of discovery is accelerated by better access to data.


Posted in Uncategorized | Leave a comment