In the Science Commons meeting “Creating a Vision for Making Scientific Data Accessible Across Disciplines” (see earlier post), Andrew Lawrence (Royal Observatory Edinburgh) illustrated the wide range of “ownership” of data even within a single discipline – physics. I hope my notes do it justice.
Distinguish “ownership” from re-use. I can continue to own data while allowing others to use it. Legal constraints (formal) vs. community practices (informal). Data (private until publication) vs. knowledge (public, universal). Technology and policy must address all of these.
He showed a knowledge chain:
1. raw data (directly from instrument)
2. calibrated data (skymaps, catalogues)
3. physical properties (particular knowledge)
4. understanding (properties in general)
Generally the data in stages 1 and 2 “belong” to the experimental team and are separated from stages 3 and 4 by the “public ownership line”.
In physics he described three communities:
- Condensed matter (solids, liquids, surfaces, etc.). Generally done in small labs by small teams, sometimes needing experiments at a facility such as a reactor or synchrotron. Data are very sensitive until publication, after which they are thrown away. However, there is political pressure (especially from the funders and facilities) to re-use the data.
- Particle physics. The epitome of big science: big facilities (CERN, etc.), big teams, often with “Stalinist” control. Data belong to the project, with elaborate rules for access and re-use. Good data infrastructure (these people gave us the Grid). They assert that data re-use is pointless (who else could re-use it?) but provide the infrastructure for re-use.
- Astrophysics. (Small) Big facilities (telescopes) but small teams – a team might get a few nights’ use at a time. Analyze, publish, throw away. BUT the facilities archive all the data – they are private for a year and then anyone can access them. There are standards for archival, formats, access, and analysis. The “Virtual Observatory” provides data that are “science ready” – i.e. a potential user should be able to understand the provenance and know how to use them. (Big) Systematic surveys (e.g. the Hubble telescope) which produce “science-ready” archives. Everything is public from the start. “The archive becomes the sky.”
The Virtual Observatory has a small set of professional service centres and a large set of end-users. Andrew finished by making a case for global standards, well-funded data centres, infrastructure in software, and data servers.
So where are other subjects in this? I traditionally look to the biosciences with envy for their open data and requirements to publish. But even here we are under threat. Tim Hubbard quoted Graham Cameron as saying that IPR restrictions would have made it impossible to build the European Bioinformatics Institute today. And universities continue to urge IPR protection, which is rapidly creating the anticommons. In crystallography there is an enviable requirement to publish data alongside full-text articles. Some publishers (rightly) regard these data as copyright-free, while others (to be named-and-shamed later) treat them as creative works by adding their copyright to experimental data.
And chemistry… … mainly publish and throw away… …most data are lost. Our SPECTRa project is looking at why this is so – so far our findings show that social factors (“ownership”) are the main barrier. And re-use? We publish hamburgers, so there aren’t many cows.