I had the pleasure of meeting Greg Crane in Phoenix (see below) and again last week at our brainstorm on how to fund digital curation. Greg is a remarkable person – a classicist who is completely at home creating computer applications. He is familiar with many languages – trick question: “what is the most important language for the study of classics in the Near East?” [ans below]. Here he reports on the Phoenix workshop and also questions our first generation of institutional repositories…
Open Access and Institutional Repositories: The Future of Scholarly Communications, Academic Commons,
Submitted by Greg Crane on December 16, 2007 – 10:19am.
Institutional repositories were the stated topic for a workshop convened in Phoenix, Arizona earlier this year (April 17-19, 2007) by the National Science Foundation (NSF) and the United Kingdom’s Joint Information Systems Committee (JISC). While in their report on the workshop, The Future of Scholarly Communication: Building the Infrastructure for Cyberscholarship, Bill Arms and Ron Larsen build out a larger landscape of concern, institutional repositories remain a crucial topic, which, without institutional cyberscholarship, will never approach their full potential.
PMR: Although I’m going to agree generally with Greg I don’t think the stated topic of the workshop was institutional repositories per se. It was digital scholarship, digital libraries and datasets. I would expect to find many datasets outside institutions (witness the bio-databases).
Repositories enable institutions and faculty to offer long-term access to digital objects that have persistent value. They extend the core missions of libraries into the digital environment by providing reliable, scalable, comprehensible, and free access to libraries’ holdings for the world as a whole. In some measure, repositories constitute a reaction against those publishers that create monopolies, charging for access to publications on research they have not conducted, funded, or supported. In the long run, many hope faculty will place the results of their scholarship into institutional repositories with open access to all. Libraries could then shift their business model away from paying publishers for exclusive access. When no one has a monopoly on content, the free market should kick in, with commercial entities competing on their ability to provide better access to that freely available content. Business models could include subscription to services and/or advertising.
Repositories offer one model of a sustainable future for libraries, faculty, academic institutions and disciplines. In effect, they reverse the polarity of libraries. Rather than import and aggregate physical content from many sources for local use, as their libraries have traditionally done, universities can, by expanding access to the digital content of their own faculty through repositories, effectively export their faculty’s scholarship. The centers of gravity in this new world remain unclear: each academic institution probably cannot maintain the specialized services needed to create digital objects for each academic discipline. A handful of institutions may well emerge as specialist centers for particular areas (as Michael Lesk suggests in his paper here).
The repository movement has, as yet, failed to exert a significant impact upon intellectual life. Libraries have failed to articulate what they can provide and, far more often, have failed to provide repository services of compelling interest. Repository efforts remain fragmented: small, locally customized projects that are not interoperable–insofar as they operate at all. Administrations have failed to show leadership. Happy to complain about exorbitant prices charged by publishers, they have not done the one thing that would lead to serious change: implement a transitional period by the end of which only publications deposited within the institutional repository under an open access license will count for tenure, promotion, and yearly reviews. Of course, senior faculty would object to such action, content with their privileged access to primary sources through expensive subscriptions. Also, publications in prestigious venues (owned and controlled by ruthless publishers) might be lost. Unfortunately, faculty have failed to look beyond their own immediate needs: verbally welcoming initiatives to open our global cultural heritage to the world but not themselves engaging in any meaningful action that will make that happen.
The published NSF/JISC report wisely skips past the repository impasse to describe the broader intellectual environment that we could now develop. Libraries, administrators and faculty can muddle through with variations on proprietary, publisher-centered distribution. However, existing distribution channels cannot support more advanced scholarship: intellectual life increasingly depends upon open access to large bodies of machine actionable data.
The larger picture depicted by the report demands an environment in which open access becomes an essential principle for intellectual life. The more pervasive that principle, the greater the pressure for instruments such as institutional repositories that can provide efficient access to large bodies of machine actionable data over long periods of time. The report’s authors summarize as follows the goal of the project around which this workshop was created:
To ensure that all publicly-funded research products and primary resources will be readily available, accessible, and usable via common infrastructure and tools through space, time, and across disciplines, stages of research, and modes of human expression.
To accomplish this goal, the report proposes a detailed seven-year plan to push cyberscholarship beyond prototypes and buzzwords, including action under the following rubrics:
- Infrastructure: to develop and deploy a foundation for scalable, sustainable cyberscholarship
- Research: to advance cyberscholarship capability through basic and applied research and development
- Behaviors: to understand and incentivize personal, professional and organizational behaviors
- Administration: to plan and manage the program at local, national and international levels
For members of the science, technology, engineering, and medical fields, the situation is promising. This report encourages the NSF to take the lead and, even if it does not pursue the particular recommendations advocated here, the NSF does have an Office of Cyberinfrastructure responsible for such issues, and, more importantly, enjoys a budget some twenty times larger than that of the National Endowment for the Humanities. In the United Kingdom, humanists may be reasonably optimistic, since JISC supports all academic disciplines with a healthy budget. Humanists in the US face a much more uncertain future.
PMR: I would agree with Greg that IRs are oversold and underdeliver. I never expected differently. I have never yet located a digital object I wanted in an IR except when I specifically went looking (e.g. for theses). And I went to Soton to see what papers of Stevan’s were public and what their metadata were. But I have never found one through Google.
Why is this? The search engines locate content. Try searching for NSC383501 (the entry for a molecule from the NCI) and you’ll find: DSpace at Cambridge: NSC383501
But the actual data itself (some of which is textual metadata) is not accessible to search engines so isn’t indexed. So if you know how to look for it through the ID, fine. If you don’t you won’t.
I don’t know what the situation is in the humanities, so I looked up the Fitzwilliam (the major museum in Cambridge) newsletter and searched for “The Fitzwilliam Museum Newsletter Winter 2003/2004” in Google and found: DSpace at Cambridge: The Fitzwilliam Museum Newsletter 22. But when I searched for the first sentence, “The building phase of The Fitzwilliam Museum Courtyard”, Google returned zero hits.
So (unless I’m wrong and please correct me), deposition in DSpace does NOT allow Google to index the text that it would expose on normal web pages. Jim explained that this was due to the handle system and the use of one level of indirection – Google indexes the metadata but not the data. (I suspect this is true of ePrints – I don’t know about Fedora).
If this is true, then repositing at the moment may archive the data but it hides it from public view except to diligent humans. So people are simply not seeing the benefit of repositing – they don’t discover material through simple searches.
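To make the indirection concrete, here is a minimal sketch in Python. It is not how DSpace or Google actually work: the landing-page markup, the bitstream path and the function names are all hypothetical, and the "crawler" simply assumes (following Jim's explanation) that only the text on the landing page itself gets indexed, not the file it links to.

```python
from html.parser import HTMLParser


class LandingPageParser(HTMLParser):
    """Collect the visible text and the link targets of an HTML page."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Record where each link points; the linked file is NOT fetched.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text_parts.append(data)


def visible_text(html):
    """Return (whitespace-normalised page text, list of link targets)."""
    parser = LandingPageParser()
    parser.feed(html)
    text = " ".join(" ".join(parser.text_parts).split())
    return text, parser.links


def phrase_is_indexable(html, phrase):
    """True if the phrase occurs in the page's own text (assumed crawler model)."""
    text, _ = visible_text(html)
    return phrase.lower() in text.lower()


# Hypothetical landing page: metadata (the title) plus a link to the stored file.
LANDING_PAGE = """
<html><body>
<h1>The Fitzwilliam Museum Newsletter Winter 2003/2004</h1>
<a href="/bitstream/hypothetical-id/newsletter22.pdf">View/Open</a>
</body></html>
"""

title = "The Fitzwilliam Museum Newsletter Winter 2003/2004"
first_sentence = "The building phase of The Fitzwilliam Museum Courtyard"

print(phrase_is_indexable(LANDING_PAGE, title))           # metadata is on the page
print(phrase_is_indexable(LANDING_PAGE, first_sentence))  # body text sits behind the link
```

In this toy model the title is found because it is part of the landing page, while the first sentence of the newsletter is not, because it lives only in the linked file one step away.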
So I’m hoping that ORE will change all this. Because we can expose all the data as well as the metadata to search engines. That’s one of the many reasons why I’m excited about our molecular repositories (eChemistry) project.
As I said in a previous post, it will change the public face of chemical information. The key word for this post is “public”. In others we’ll look at “chemical” and “information”.
[ans: German. Because the majority of scholarship in the C19 was in German.]