petermr's blog

A Scientist and the Web


Open data is essential for science

An important set of papers on Open Data and science:

The June issue of the Journal of Science Communication is now available (Peter Suber). OA-related articles:

The first is an editorial overview. Bora’s has been blogged elsewhere and I’ll concentrate on John’s. You should read it rather than relying on me. But what I take is:
  • if we try to apply ANYTHING other than the public domain to scientific facts we shall not be able to manage scientific data. Problems include aggregation, restrictions (however reasonable) on re-use, cascading attribution, and different jurisdictions
  • the public domain is NOT another licensing scheme. It is as free as the air we breathe. No one has to ask permission to breathe. It is NOT copyright
  • It must be supplemented by community norms. Yes, you may legally do anything with this data, but if you do X and Y we shan’t like it and this might affect future funding, collaborations, publishability, etc.
We have no alternative. Everything else descends into infinite recursion and hypotheticals. You cannot control how a marketing analyst might use meteorite data. Or which data sets are useful for developing new machine-learning techniques. Or how word frequency in scientific texts gives a greater understanding of the structure of the brain. Or…
We don’t know how to do some of this in practice but it shouldn’t stop us trying. The simplest thing is to add the “Open Data” sticker from the Open Knowledge Foundation, as we do in CrystalEye. This says “This data is Open”. You can do what you want. If you are the primary user please acknowledge us (we shan’t sue if you don’t but it’s simple human courtesy). If you aggregate into another resource and “our” data was a major input it would be nice to have it acknowledged. If you take a few parts it’s probably overkill to acknowledge the bits.
What can you do to help? If you are a scientist, add the “Open Data” label to your data. This stops it being possessed and controlled by third parties. Even if they try, you can point to the fact that it was originally labelled as Open, which I hope would resolve any lawsuits. If you are a publisher who believes in Open Data, add a statement to your web site that makes it clear that the data are open. DON’T try to control it. DON’T add CC-NC licences. These are impossible to use anyway. Indeed, don’t use any CC licences.
If you are a publisher who understands what I have just written and you fail to label the data that passes through your organ as “open” then you are actively impeding science. This is a strong and accurate statement. Your refusal to help support the free flow and reuse of data makes it harder for scientists to make discoveries, and makes it harder for readers to judge the quality of data in your journals.
And if at the same time you are making money by restricting access to scientific data created and provided by the scientific community (and not by yourself), then not only your effect but also your intent is clear.
