I have 3-4 important talks to give in the next 2-3 weeks about Openness, Data and how they come together. As I often do, I'm setting out the ideas on this blog in the hope of feedback. I'll highlight the meetings, but first I'm going to explain why Institutional Repositories (IRs) are not valuable for what I want to do.
Institutional Repositories – possibly for the last time
I’ve been writing a lot about “Repositories” and I am going to stop writing about them. My 6+ years of involvement with formal “repositories”, essentially Institutional Repositories (IRs), have convinced me that their design and philosophy are completely wrong for modern scientific research, especially data-driven research. They can’t be mended or adapted for science. Where their purposes are clear (and most do not have a single clear purpose), they are designed and used for static book-like and article-like objects such as theses, e-articles and reports. The purposes are usually either preservation or research management for the university’s benefit. They are heavily institution-oriented and so of interest only to depositors and users closely involved with the institution. There's nothing wrong with that, but some practitioners and advocates go further and suggest that IRs are general solutions. They are not, and they never will be.
I’ve enjoyed being part of this community. I’ve been able to get some funding (JISC, Microsoft, thanks). The developer groups have been imaginative and energetic, and there has been a small spinoff in generic software and protocols (e.g. SWORD2). But it’s clear after this time that the idea of a “Repository”, somewhere you deposit a fixed and final precious digital object unrelated to the rest of the content, is not what we now want in science. The concept of “repository” has never engaged scientists; they think either of “databases” or of systems for sharing community resources. I use software repositories (http://en.wikipedia.org/wiki/Software_repository), but they are very different beasts and have many features that IRs don’t have, such as versioning, automatic installation of software, and distributed repositories.
As a scientist I’ve tried to engage the IR community, but with almost no success (other than words). There may have been opportunities for IRs to provide domain-specific solutions, but these haven’t caught the imagination of IR managers. I have tried to suggest distributed iteration (e.g. for theses) and, more generally, the provision of born-digital theses (e.g. Word, not PDF). Again, no common interest. I’ve tried to start discussion on support for scientists in laboratories (e.g. with distributed versioned data capture). Again, no practical interest.
I was disappointed not to get any substantial feedback on my earlier post on Criteria for Data Repositories. I put quite a bit of effort into thinking about it, and I got one comment. Clearly this is a community which doesn’t discuss things in public and which has no sense of electronic community. Maybe there is occasional discussion on the DCC-list, but repositories shouldn’t be about curation; they should be about people, which they aren’t. There is, of course, no reason why anyone should reply to me. But the silence gives the clear message that IRs should not be involved in data, and they should make this clear.
More generally, there seems to be little public discussion of IRs anywhere; there isn't even an active global mailing list. So I am talking in the wrong direction. It’s saddening how small a part universities have played in the information revolution. IRs seem to be the largest area of university funding in this field; beyond them there’s basically no interest in publication and none in data. There is no way of communicating with universities, even though I am employed by one.
So from now on I’m going to be addressing the following:
- Groups of practising scientists, especially those building new tools for information
- Enlightened scientific publishers (effectively those committed to Open Access/Data and models where the scientist is not just seen as a commodity for creating income)
Rather than use “Repository”, I’ll introduce a new term, “Sharer”, as in CodeSharer and DataSharer. I’ll develop these ideas over the next day or so and present them at the meetings.
First, let me advertise a meeting in Zaragoza (http://grandir.com/EN/debatesessionSTM/):
Next Thursday, Aug 25, a technical session organised under the auspices of GrandIR will be held in Zaragoza, Spain, to address the management of STM research data, a still relatively unexplored field in Spain. During the meeting the current state of development of the Quixote Project will also be presented as an example. Quixote is a pioneering initiative for research data management in Quantum Chemistry in which several Spanish researchers are involved.
The meeting will have two sections: The first one will introduce the Quixote project, as well as existing national and international research data management initiatives. The talks will be short (15-20 minutes), with 10 extra minutes for questions. The second and core section of the meeting will be a discussion session, aiming to evaluate the needs of researchers and repository managers regarding data management repositories and tools, and to plan collaborations for creating a research data management infrastructure in Spain as a collection of repositories.
I am very appreciative of this: it’s less than a year since we conceived the Quixote system, and here is the second formal meeting about it. More later.
Then there is a meeting in Madrid (Int. Union of Crystallography, Triennial), 2011-08-29:
In honour of the 20th anniversary of CIF, the upcoming IUCr meeting in Madrid will feature a COMCIFS-sponsored microsymposium entitled “Scientific Data Archiving, Exchange and Retrieval in the 21st Century”. We have three excellent invited speakers, Brian Matthews, Brian MacMahon and Peter Murray-Rust. These speakers will discuss various topics drawn from the past, present and future of scientific data exchange and management.
Here I am trying to work out how a Data Journal ties into a DataSharer. I don’t yet know what I am going to say; some of it will be controversial and may upset some people. The general emphasis will be that primary scientific data must be universally Open/libre. Since some organisations make their income by selling our data back to us, I think we need a change of thinking.
Finally, I caught a tweet from Gigascience (a new Data Journal? with a DataSharer?):
Lots of useful advice (especially for us) on what makes a successful repository from @petermurrayrust, #opendata
So perhaps this is the direction I should now point in. I’ll analyse their web site in a future post; there are some pluses and minuses.