ETD2009; Make Electronic Theses properly visible

This is my last post on ETD2009 #etd09. It was great to meet and re-meet many of the people who have developed the ideas and practice of electronic theses and dissertations. We had a great evening out on the Pittsburgh river and many useful discussions.

So I apologize if I am critical of current practices in theses and institutional repositories. Please argue back. And I acknowledge I was not present for the whole meeting.

I got the overwhelming impression that the major purpose of putting theses in IRs was to preserve them. There was an after-lunch talk from Deanna Marcum of the Library of Congress which stressed preservation and the benefits of copyright (LOC was permitted to copy things so they could be preserved). She mentioned the data deluge (labs producing terabytes/day) and acknowledged this was a preservation problem. Later, when the delegates were asked in straw polls why we should have ETDs in repositories, the largest vote was for preservation. Although there was some appreciation of the fact that theses now had a wider readership, there was little discussion of how repositories could enhance the visibility of theses.

And absolutely no expressed appreciation of the fact that someone might wish to download 10,000 theses at once.

There were far too many presentations about metadata-gateways to theses. The infrastructure still seems to be:

  • precious thesis submitted to IR, in precious PDF.

  • IR-metadata expert spends time indexing this properly as authors are no good at metadata and full-text doesn’t work. [I challenged the latter absolutely and pointed out that for anything other than plain text (maths, chemistry, protein sequences, etc.) human metadata experts are irrelevant].

  • A commercial metadata organisation is allowed access to thesis metadata to create a complex, archaic, arcane metadata structure where users (probably not even readers but subject librarians) search for individual items by metadata.

  • Thesis is embargoed from view if there is any FUD that it might offend a publisher.

This is so far out of step with the C21 that I don’t know where to begin.

In the modern web we are developing Linked Open Data. This linking is largely being done by robots. It works like this:

  • Information provider (e.g. scientist) creates information in web-friendly form. HTML is designed for the web, so use that. As the web evolves we will use RDF, microformats, etc. This information will be added by robots. But for now bog-standard HTML works very well.

  • Expose the information on web pages.

That’s it.

The search engines are smarter and more numerous than metadata specialists. They know how to get the best out of full text. The search engines in our group at Cambridge understand chemical language. Soon, very soon, they will understand chemical diagrams. That is 100x more than can be added by a metadata specialist, even a chemical one.
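To make the full-text point concrete, here is a minimal sketch of what a robot can do with plain HTML theses and no human-supplied metadata at all. It is only an illustration, not how any real search engine (or our group's chemistry-aware indexers) works: the page URLs and contents are invented, and the names `TextExtractor`, `build_index` and `search` are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, skipping script/style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_text(html):
    """Return the text content of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def build_index(pages):
    """Map each lower-cased word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, html in pages.items():
        for word in re.findall(r"[a-z0-9]+", extract_text(html).lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical thesis pages; URLs and content are invented for illustration.
PAGES = {
    "https://example.org/theses/1.html":
        "<html><body><h1>Aromatic hydrocarbons</h1>"
        "<p>Synthesis of substituted benzene rings.</p></body></html>",
    "https://example.org/theses/2.html":
        "<html><body><h1>Protein folding</h1>"
        "<p>Sequence analysis of kinases.</p></body></html>",
}

INDEX = build_index(PAGES)
```

Calling `search(INDEX, "benzene")` finds the first thesis purely from its text. A subject-aware robot would replace the crude word splitter with something that recognises chemical names or sequences, but the thesis itself needs nothing beyond being exposed as HTML.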

The answer is simple. Create Open Theses in HTML and publish them. Use IRs if you think that’s a useful way of making them permanent but it’s not required.

That’s all.

So that we can index the academic web, it would be useful to have a user-accessible list of academic IRs for our robots to scan academia. In RDF, please.
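As a rough idea of what such a list could look like, the sketch below writes a couple of repositories out as N-Triples, one of the simplest RDF serialisations. The `rdf:type`, Dublin Core `title` and FOAF `homepage` properties are real vocabulary terms; everything else is an assumption for illustration: the repositories and their URLs are invented, and `REPOSITORY_CLASS` is a made-up class URI, since I am not assuming any agreed term for "institutional repository".

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCT_TITLE = "http://purl.org/dc/terms/title"
FOAF_HOMEPAGE = "http://xmlns.com/foaf/0.1/homepage"
# Hypothetical class URI, invented for this example.
REPOSITORY_CLASS = "https://example.org/vocab#InstitutionalRepository"


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def repository_triples(repo_uri, title, homepage):
    """Return the three N-Triples lines describing one repository."""
    return [
        f"<{repo_uri}> <{RDF_TYPE}> <{REPOSITORY_CLASS}> .",
        f'<{repo_uri}> <{DCT_TITLE}> "{escape_literal(title)}" .',
        f"<{repo_uri}> <{FOAF_HOMEPAGE}> <{homepage}> .",
    ]


def to_ntriples(repositories):
    """Serialise a list of repository dicts as one N-Triples document."""
    lines = []
    for repo in repositories:
        lines.extend(
            repository_triples(repo["uri"], repo["title"], repo["homepage"])
        )
    return "\n".join(lines) + "\n"


# Invented repositories, for illustration only.
REPOSITORIES = [
    {
        "uri": "https://example.org/ir/alpha",
        "title": 'Alpha University "Open Theses"',
        "homepage": "https://alpha.example.org/theses/",
    },
    {
        "uri": "https://example.org/ir/beta",
        "title": "Beta Institute Repository",
        "homepage": "https://beta.example.org/repository/",
    },
]

if __name__ == "__main__":
    print(to_ntriples(REPOSITORIES), end="")
```

A robot only has to read the `homepage` triples to get its starting points for a crawl; anything richer (subjects, licences, update dates) could be added later as further triples without breaking existing consumers.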

[I will give my own views on preservation later. I do care about it. But not to the exclusion of making material visible.]

4 Responses to ETD2009; Make Electronic Theses properly visible

  1. JURN says:

    The missing link – the URL _is_ the metadata. Use the URL to carry some basic human-readable, machine-sortable metadata in plain English. Stick to a universal standard for directory structures at repository servers, and the URLs never break.

  2. JURN says:

    Example: w*w.ouruniversity.ac.uk/our-repository/free-full-text/theses/chemistry/organic-chemistry/2009-peter-adams-on-aromatic-hydrocarbons.pdf
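    A URL of this shape could be turned back into metadata by a robot with very little code. The sketch below is one possible reading of the pattern, not a standard: it assumes five directory levels (repository, access, document type, subject, subfield) and a filename of the form year-author-on-title, with "-on-" as the separator. The host `repository.example.org` is a placeholder and `metadata_from_url` is a hypothetical name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed meaning of each directory level, read off the example URL above.
DIRECTORY_FIELDS = ("repository", "access", "document_type", "subject", "subfield")


def metadata_from_url(url):
    """Derive a metadata dict from a thesis URL that follows the assumed layout."""
    path = PurePosixPath(urlparse(url).path)
    directories = path.parts[1:-1]  # drop the leading "/" and the filename
    if len(directories) != len(DIRECTORY_FIELDS):
        raise ValueError(f"unexpected directory depth in {url!r}")

    # Filename is assumed to be "<year>-<author>-on-<title>.<format>".
    year, _, rest = path.stem.partition("-")
    author, separator, title = rest.partition("-on-")
    if not year.isdigit() or not separator:
        raise ValueError(f"unexpected filename pattern in {url!r}")

    metadata = dict(zip(DIRECTORY_FIELDS, directories))
    metadata["year"] = int(year)
    metadata["author"] = author.replace("-", " ")
    metadata["title"] = title.replace("-", " ")
    metadata["format"] = path.suffix.lstrip(".")
    return metadata


# Placeholder host; the path follows the example in the comment above.
EXAMPLE_URL = (
    "https://repository.example.org/our-repository/free-full-text/theses/"
    "chemistry/organic-chemistry/2009-peter-adams-on-aromatic-hydrocarbons.pdf"
)

if __name__ == "__main__":
    for field, value in metadata_from_url(EXAMPLE_URL).items():
        print(f"{field}: {value}")
```

    The weak point is the filename: an author or title that itself contains "-on-" would be split in the wrong place, so a real convention would need a less ambiguous separator.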
