I was just about to blog on the way that institutional repositories hide information rather than advertise it, when I found my thoughts had been anticipated by Dave Haden, Open access search?, Jurn blog, June 12, 2009. He puts the simple question:
Pouring out all this open access content is all very well, but where’s the competition and development in open access search?
And where are the simple common standards for flagging open content for search-engine discovery and sorting, for that matter?
He’s absolutely right – and it’s shameful that academia has almost no systematic search for its content. In 1993 one of the first web search engines, JumpStation, was developed at the University of Stirling, and there were offerings from national labs and the like, but these were soon overtaken by AltaVista (yes, there was life before Google). This was when the web was open and spam-free.
I’m making a partial exception for my collaborators at Penn State, Lee Giles and colleagues, who have developed CiteSeer and ChemXSeer. But those are aimed mainly at formally published scholarly articles and hardly at all at institutional content.
Now of course I’m viewing things from the outside, as an independent curator and social entrepreneur, not a librarian or OA evangelist. But it seems to me that burying your PhD thesis deep in a repository cattle-car, seemingly with only a few keywords, an ugly template and an impenetrable URL for company, isn’t serving it or its author very well, especially in terms of the metadata and tagging that lead to full-text search discovery. As the authors of “Experiences in Deploying Metadata Analysis Tools for Institutional Repositories” recently wrote in Cataloging & Classification Quarterly (No. 3/4, 2009)…
“Current institutional repository software provides few tools to help metadata librarians understand and analyse their collections.”
DH: Which doesn’t bode well for search engines aiming to hook into and sort the same metadata. That might have been acceptable in 1999, but it’s damning to hear from librarians in 2009. And another paper in the same issue concludes that there is…
“a pressing need for the building of a common data model that is interoperable across digital repositories”.
And he goes on to suggest a simple way of exposing metadata.
I recommend taking a look at BASE. It is based (excuse the pun) on OAI-PMH and uses FAST S&T technology, but don’t let that put you off 😉
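For the curious, harvesting that OAI-PMH metadata takes only a few lines. Here is a minimal Python sketch, assuming a placeholder endpoint; a real harvester like BASE’s would also follow resumptionTokens, handle error responses, and harvest incrementally:

    # Minimal OAI-PMH harvesting sketch (the protocol BASE builds on).
    # The endpoint is a placeholder; substitute any repository's OAI-PMH
    # base URL. Deleted records simply yield an empty title list.
    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    OAI = "{http://www.openarchives.org/OAI/2.0/}"
    DC = "{http://purl.org/dc/elements/1.1/}"

    def list_records(base_url):
        """Fetch one page of Dublin Core records from an OAI-PMH endpoint."""
        query = urllib.parse.urlencode(
            {"verb": "ListRecords", "metadataPrefix": "oai_dc"})
        with urllib.request.urlopen(f"{base_url}?{query}") as response:
            root = ET.fromstring(response.read())
        for record in root.iter(f"{OAI}record"):
            identifier = record.findtext(f"{OAI}header/{OAI}identifier")
            titles = [t.text for t in record.iter(f"{DC}title")]
            yield identifier, titles

    # Hypothetical usage:
    # for oai_id, titles in list_records("https://repository.example.edu/oai"):
    #     print(oai_id, titles)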
On the other hand, I totally agree that authors do themselves a disservice hiding things away in unlinked institutional archives.
The alternative at the moment seems to be publishing work on a personal website connected to your institution (which should lend it credibility enough to be included in Google Scholar). Of course, adding some sort of standardized metadata to this would make it easier to find in search engines. I have long been a bit of a fancier of Adobe’s XMP in this respect. Given your opinion of PDFs, I hasten to add that XMP can be embedded in anything, including HTML files 😀
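To make the “standardized metadata” point concrete: even short of a full XMP packet, a handful of <meta> tags in the page’s <head> helps enormously. Here is a minimal Python sketch of the Highwire-style citation_* tags that Google Scholar is documented to read; the record itself is invented for illustration:

    # Sketch: emit citation_* <meta> tags for an article's HTML <head>.
    # These are the Highwire-style tags scholarly crawlers such as
    # Google Scholar read; the record below is purely illustrative.
    from html import escape

    def citation_meta_tags(record):
        """Build <meta> tags for one paper from a plain dict of fields."""
        tags = [("citation_title", record["title"]),
                ("citation_publication_date", record["date"]),
                ("citation_pdf_url", record["pdf_url"])]
        # One repeated tag per author, as the convention expects.
        tags += [("citation_author", a) for a in record["authors"]]
        return "\n".join(
            f'<meta name="{name}" content="{escape(value)}">'
            for name, value in tags)

    print(citation_meta_tags({
        "title": "An Example Thesis Title",
        "authors": ["A. Student", "B. Supervisor"],
        "date": "2009/06/12",
        "pdf_url": "https://repository.example.edu/thesis/1234.pdf",
    }))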
On a similar Semantic Web note: perhaps a Linked Data approach to institutional archives will help matters. I suspect that publishing metadata from traditional archives as linked data is more likely to happen than a sudden change in direction for institutional archives.
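As a sketch of what that could look like, here is one repository record re-expressed as Linked Data with rdflib and Dublin Core terms; the URIs and field values are invented, and a real deployment would mint stable, dereferenceable URIs for each item:

    # Sketch: publish one repository record as Linked Data (Turtle)
    # using rdflib. All URIs and values are invented for illustration.
    from rdflib import Graph, Literal, URIRef
    from rdflib.namespace import DCTERMS, RDF

    g = Graph()
    g.bind("dcterms", DCTERMS)

    item = URIRef("https://repository.example.edu/id/thesis/1234")
    g.add((item, RDF.type, DCTERMS.BibliographicResource))
    g.add((item, DCTERMS.title, Literal("An Example Thesis Title")))
    g.add((item, DCTERMS.creator, Literal("A. Student")))
    g.add((item, DCTERMS.date, Literal("2009-06-12")))

    # rdflib 6+ returns a str here; older versions return bytes.
    print(g.serialize(format="turtle"))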
(Please note, I am not affiliated with any of the companies mentioned above, though I do work for NTNU, the university from which Fast S&T sprang.)