John Marks (ESF) introduced our session and set the scene on the need for Open Data and sharing. He stated strongly that it was essential that we had discipline-specific repositories for different branches of science. I share this view and blogged it recently (berlin5 : how to progress Open Data?).
My stance comes from meetings this year where I have talked to many people about institutional repositories. I ask them “why are you setting up an IR?” I have got about 8 distinct answers. Very few of them mention data.
Some of us addressed these issues at ETD2007. There are hundreds of different types of biological data, tens of types of chemistry data, hundreds of geoscience, etc. There is no way that repository managers, with the best will in the world, will know how to manage them all. So I wrote:
although there is quite a lot of activity in institutional digital repositories they won’t (and shouldn’t) address Data. It’s subject-specific and too complex for the average repository manager.
PMR: Dorothea Salo (who has run the Caveat Lector blog for some years and has a strong following) responded to this.
PMR: I haven’t met Dorothea but I’d like to – her blog is insightful and entertaining and she is unafraid to speak out. She’s also technically proficient in the IT skills required – XML, etc. And the last thing I want to do is upset and antagonize people like Dorothea.
But… There is no single human on the planet who knows how to reposit all of protein structures, variable stars, ice sheets, chemical structures. It needs much more than metadata. So what can a repository manager do? Putting the raw data into the repository without understanding it is not an option. It has to go into a system devised by experts in the discipline. And, for me, that means subject repositories. Maybe each university has a different one. Maybe they are national. Some, like the bioscience ones, will be international.
Disagree somewhat that IRs and their managers shouldn’t address data, though I agree that for now it’s impractical because the software is so wretched and the technical infrastructure insufficiently scalable. Just because IR software in its current state is completely broken with regard to data doesn’t mean it must or should stay that way, though. Moreover, the notion that “domain knowledge” is the sole key to data curation is (bluntly) bunk, and nobody’s yet tested the assertion that it’s harder to teach a librarian domain knowledge than to teach a discipline-practitioner info management. Frankly, “it differs by discipline” doesn’t matter. So does everything else in librarianship, from reference transactions to collection development. We cope. It’s our job to. As for “too complex,” says who? And about which librarians? I think I’ve just been insulted.
There’s nothing wrong with telling librarians — and the subset of librarians who are repository managers — that we need to brush up our game to deal with these issues. I have a plan in place to learn the principles of data curation for myself over the next year or so. I want to see more librarians planning the same!
Looks like a good talk. Wish I could be there to hear it!