I am going through the various sections in my presentation to http://cdrs.columbia.edu/cdrsmain/2013/01/esearch-data-symposium-february-27-2013/ . I’ve got to “Problems in Managing Research Data”. Warning: This section is uncomfortable for some. In rough order (I might swap 1 and 2):
-
Vested Commercial interests. There are at least these problems:
-
STM publishers. I’ll concentrate on this because until we have an Open Data Commons we can’t work out how to manage it. STM publishers not only stop me and other getting the data, they stifle innovation. That leaves STM about 15 years behind commerce, civics, and the Open movement in terms of technologies, ontologies, innovation.
-
Instrument manufacturers. Many instruments produce encrypted or proprietary output which cannot be properly managed. In many cases this is deliberate to create lockin.
-
Domain software.
Some manufacturers (e.g of computational chemistry software) legally forbid the publication of results, probably to prevent benchmarking for performance and correctness of science
-
Materials.
Many suppliers will not say what is in a chemical, what the properties of a material are, etc. You cannot build ontologies on guesswork nor create reliable metadata
-
STM publishers. I’ll concentrate on this because until we have an Open Data Commons we can’t work out how to manage it. STM publishers not only stop me and other getting the data, they stifle innovation. That leaves STM about 15 years behind commerce, civics, and the Open movement in terms of technologies, ontologies, innovation.
- Academic apathy and misplaced values. I continue to be appalled by the self-centeredness of academia. The debate is data is “how can my data be given metrics”, not “how can I make data available for the good of humanity”. Yes, I’m an idealist, but it hasn’t always been this way. It’s possible to do good scholarship, that is useful, and that is recognised. But academia is devising systems based on self-glorification. With different values, the publisher problem would disappear. The Super Happy Block Party Hackathon (Palo Alto) shows how academia should be getting out and working for the community.
-
Intrinsic difficulty. Some research data is hard. A lot of bioscience. But they solve that by having meetings all the time on how to describe the data. You can’t manage data that you can’t describe. I’ve been working with Dave M-R on integrating the computational declarative semantics of chemistry and mathematics. That’s completely new ground and it’s hard. It’s essential for reproducible computational chemistry (a billionUSD+ activity). Creating chemical ontologies (ChemAxiom, Nico Adams) is hard. Computational ontologies (OWL) stretch my brain to its limits. Materials science is hard. To understand piezoelectricity you have to understand a 6*6 tensor.
But that’s what the crystallographers have been doing for 30 years. And they have built the knowledge engines.
- Finance. The least problem. If we want to do it, then the costs is a very small proportion of the total research funding. And a miniscule amount of what we pay the STM publishers. Open Street map was built without a bank balance.
It’s simple. You have to WANT TO DO IT. The rest follows.
Pingback: Unilever Centre for Molecular Informatics, Cambridge - #rds2013 Managing Research Data « petermr's blog