Shared data? Open data?

Heather (Research Remix) asks a key question:

15:49 18/07/2007, Heather Piwowar,

Quick wondering. My research is on data re-use. I struggle with what to call the source datasets. I’d like to call them “open data” but they aren’t, necessarily. Sometimes not free, and usually not open in a licensing sense. I’ve been calling them “shared data” which seems ok, but isn’t mainstream and so doesn’t help link the work in to others who are perhaps interested in the same ideas. Publicly-available data? Even more unwieldy.
I’m on the lookout for a better phrase. Let me know if you have any suggestions?

It’s very clear from recent explorations into mainstream publishers (see many posts on this blog) that the English language is broken for accurate description of right-to-access and right-to-use. “Open”, “access”, “free” and “read” are all essentially Humpty Dumpty words.

“When I use a word”, Humpty Dumpty said, in rather a scornful tone, “it means just what I choose it to mean — neither more nor less.”

Wikipedia describes this as “he discusses semantics and pragmatics with Alice.”
We cannot and must not continue to use common English (or any other natural language) to describe what we mean in access-to and reuse-of data. Bill Hooker writes:

“Open Access” is not a marketing phrase and you are not free to use it as you see fit.

I remember going to a presentation by a closed-source software manufacturer who described their system as “Open architecture”. When I asked if I could see the documentation, the presenter said no, and I was (correctly) informed that he and I meant different things by “open”. He meant that a customer who bought the product now got documentation telling them how to access the functionality, i.e. it was no longer a “black box”. Obviously I overload “open” in a software context to mean “Open Source”.
The problem has been largely (but not completely) solved in software. If I am told something is “Open Source” I immediately ask “what licence?”. I can then go to the Open Source Initiative and find out the terms and conditions of the licence. This is our only way forward.
We therefore have to put precise labels on our research output, initially papers and data. I had (naively) thought that the relative lack of progress from publishers was inertia and ignorance, and that when it became clear this was necessary they would accept the challenge of describing their output more clearly. It is now clear this will not happen and that the publishers (apart from the aggressive Open Access publishers) are part of the problem, not the solution. Publishers copyright data that does not belong to them. Publishers cut off subscribers who try to download data. Publishers blaze around “free”, “choice”, etc., which confuse rather than inform. For a publisher, “open” and “free” are to be used like “low fat”, “energy food” and “healthy”, as a way of legitimising current practice.
Heather, the solution lies with Science Commons (a project of Creative Commons) and the real Open Access publishers. (Classic Creative Commons is the right philosophy, but the licences were created for creative works, not scientific data, and they don’t fit very well. However, I would far rather see a CC-BY on a table of melting points than “Copyright Wiley”.)
To all authors out there who wish their data to be re-used and not owned and resold by a publisher, just add CC-BY to your data. It’s not perfect but it works.
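As a minimal sketch of what “adding CC-BY” to a small table might look like in practice: the function below writes a licence notice into the data file itself and produces a machine-readable record carrying the same label. Everything here is an assumption for illustration, not a prescribed method: the function name `label_dataset`, the metadata field names, the creator name, and the choice of CC BY version 4.0 (this post only says “CC-BY”). The melting points are approximate textbook values.

```python
import csv
import io
import json

# Assumption: CC BY 4.0 is used for illustration; the post only says "CC-BY".
LICENCE = {
    "id": "CC-BY-4.0",
    "url": "https://creativecommons.org/licenses/by/4.0/",
}


def label_dataset(rows, header, creator):
    """Return (csv_text, metadata_json) for a small table plus its licence label.

    Hypothetical helper: the name and the metadata fields are illustrative only.
    """
    buf = io.StringIO()
    # Human-readable notice on the first line, so the label travels with the file.
    buf.write(f"# Licence: {LICENCE['id']} ({LICENCE['url']}); creator: {creator}\n")
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    # Machine-readable record carrying the same label.
    metadata = json.dumps(
        {"creator": creator, "licence": LICENCE, "columns": header},
        indent=2,
    )
    return buf.getvalue(), metadata


if __name__ == "__main__":
    # Hypothetical creator; approximate melting points in degrees Celsius.
    table = [["benzene", 5.5], ["naphthalene", 80.2]]
    text, meta = label_dataset(table, ["compound", "melting_point_c"], "A. Author")
    print(text)
    print(meta)
```

The point of putting the label in both places is that a reader who asks “what licence?” can answer it from the file alone, without relying on the page it was downloaded from.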
I shall return to the actual implicit and explicit licences of publishers in a post in the near future.


One Response to Shared data? Open data?

  1. Jim Downing says:

    To finish off the quote you use: –

    ‘The question is,’ said Alice, ‘whether you can make words mean so many different things.’

    ‘The question is,’ said Humpty Dumpty, ‘which is to be master – that’s all.’

    “Open Source” had to change from “Free Software” in order to be the master of its own name. Will “Open Access” be able to stand its own ground?
