Open Data – preservation

An interchange with a correspondent…

You [PMR] said in your Blog:

It is critical to distinguish between “Free” and Open. “Free”, in this context, simply means that the provider has mounted the data (not necessarily the whole data) on a web page. There is often no licence, no copyright, no guarantee of availability, no commitment to archival, no explicit freedom of re-use. The materials database is in this category – and to be fair it didn’t call itself Open.

The Open Knowledge Initiative says:
1. Access: The work shall be available as a whole and at no more than a reasonable reproduction cost, preferably downloading via the Internet without charge. The work must also be available in a convenient and modifiable form.
================================
[Correspondent] This does not seem to go far enough: if I have all good intentions and post material on a web server and then drop dead tomorrow, the info will disappear pretty soon after that. Possibly lost forever!
The distinction you make between “free” and “Open” suggests that Open means there is some permanency to the arrangement of having it available? Am I interpreting this correctly? How could this be monitored or managed?

PMR: You are absolutely right. I think the problem of preservation has not been addressed. Indeed, until I started thinking about it I hadn’t realised how relatively simple the preservation of text was and how difficult the preservation of data was.
First: Who can one trust? It’s currently easy to deposit material with anyone (Google, Amazon, whoever) and to trust that spinning media will be replicated. But it’s very risky for long-term preservation. Many bodies are working on this, and the simple message is that it’s difficult and depends on what we want to preserve and for how long. There are many levels: the bitstream, the semantic content, the ontological context, etc. Places like the UK’s Digital Curation Centre understand and work on exactly this.
[Correspondent] is worried, like me, about the archival of chemical data within the laboratory. What should be done? My personal answers are:

  1. If the data is valuable enough an international data centre will store it. Biology is strong here and bioinformatics centres have an effective commitment to archival and also work on preservation. Chemistry relies on commercial and quasi-commercial organisations which generally accomplish far less.
  2. My own inclination is towards global domain-specific repositories where the data are difficult and the volume merits it; national ones where the problem is understood but needs supporting (e.g. in mainstream chemistry); and, where possible, departmental ones (e.g. for crystallography, spectroscopy, computational chemistry).
  3. I am not a strong supporter of the idea that terabytes of data should be dumped in institutional repositories without good metadata and analysis software. Maybe future generations will welcome these hidden treasures and will have super-intelligent software.