Effective digital preservation is (almost) impossible; so Disseminate instead

I was just about to go back to refactoring Chem4Word when I saw this pingback on my blog and just had to comment. It’s really important. More of my comments at the bottom…

Which blogs should be preserved?

Richard M. Davis on 26th June, 2009 at 12:00 pm

You’d think it obvious that my blog should be preserved, though I’m not so sure about yours! According to the poster summarising the fascinating 2007 survey by Carolyn Hank et al.: The majority of bloggers agreed (36%) or strongly agreed (34.9%) that their own blogs should be preserved. Five per cent don’t want their blogs preserved at all; nearly a quarter aren’t fussed either way.

Here’s one of the data tables (which I had to retype as HTML; Peter Murray Rust is right about PDFs and data):

Table 4. Preservation perceptions: general

Response columns: Strongly agree or agree; Neither agree or (sic) disagree; Strongly disagree or disagree

Should preserve

  • Personal blog

  • Every blog

  • Every comment

  • All online content

Should not preserve

  • Some blogs

  • Some comments

  • Some online content
The overall pattern seems a good vindication of our own project approach, which will progressively move from capturing blog content (posts), to addressing comments and content, reflecting the scale of the bloggers’ own priorities.

It also seems a useful juncture in our project to throw open the question: which blogs should we preserve?

With over 5 million active blogs noted by Technorati, it seems daft to even start to enumerate them, but in our field (libraries, archives, information science), several stand out, and it’s the very nature and importance of these that bolster the case for keeping them. I have in mind in particular Peter Suber’s Open Access News blog, but also blogs such as those of Peter Murray Rust, Brian Kelly, Lorcan Dempsey, Dorothea Salo and Jill Walker Rettberg, all ripe with contemporary accounts and robust views on matters of scholarly communication. But in every case, we have cause to wonder: will that information survive, will that link still work tomorrow?

What blogs (or types of blogs) do you think should be preserved, and why?

PMR: This is really important. Blogs are evolving and being used for many valuable activities (here we highlight scholarship). Some bloggers spend hours or more on a post. Bill Hooker has an incredible set of statistics about the cost of Open Access and Toll Access publications, page charges, etc. Normally that would get published in a journal no-one reads (I have even published in such; it was a huge effort and it’s got one citation. Not that I care about citations.) So I tend to work out my half-baked ideas in public. Some people do their early science in the Open. Some are activists. Some review the current landscape, etc.

But preservation is really, really difficult. I don’t know how to tackle it. Since 1993 I have been determined to preserve my digital record.

And I’ve failed.

I’ve created courses, forums, data sets, teaching-learning objects, blogs, preprints, etc.

And I’ve lost most of them.

There are many reasons. First it’s extremely hard to preserve complex digital objects. The problems include:

  • compound documents (and only after 15 years is the web coming round to realising this is important)

  • hyperlinks

  • moving URLs/URIs

  • formats

  • semantic behaviour

  • disorganised humans (me)

  • moving institution (4 times)

  • moving computer (about 10 times)

Henry Rzepa and I have worked hard on this and he is more organized than me. We put early versions of JUMBO on CD-ROMs and got the RSC to distribute them with an issue of the journal. I have saved things on DAT tapes from the SGI. DAT??? SGI??? I don’t have a machine which will read 3.5-inch floppies at home. I have trashed my much beloved BBC Micro.

Every time I change machine I lose large amounts of data.

At some stage someone will invent a true Memex for my digital activities. Until then:

Preservation is effectively impossible.

So what’s the answer? The only one I can think of at the moment is to disseminate as widely as possible. If people want to read your material they will take copies (if that is technically possible). I would urge University Repositories:

Stop agonizing about preservation and start disseminating.

If it’s worth preserving, the web will have a reasonable chance of containing it somewhere. If it’s not, well, history will judge whether our current dross are the jewels of the future. We can’t tell.





  1. Hello Peter, Thanks for noticing our little endeavour. I too have lost untold stuff (I’d hesitate to call it data) over the years; though equally I know I have on this MacBook some files that originated on an Acorn Electron in my student days, and have subsequently been ‘migrated’ through various DOSes, MacOSes and Unices over the years, with much line-ending and charset surgery en route.
    Personal preservation is non-trivial and I agree the dissemination (LOCKSS) approach probably gets the best bang-per-buck; at the other end of the scale, we have massive endeavours by the likes of Google, BL, Internet Archive. The simple proposition at the heart of our project is: what if we made it ridiculously easy for any institution (department, centre, library, whatever) to set up a WordPress blog to sit silently in the background, ping the feeds of affiliated public blogs and bloggers, and import the content into a single blog database, easy to search, sort, cite and preserve as part of the institutional record?
    Personally I think this is just the kind of thing I’d want to see in my Library Of The Future. As the kind of blogger who I think ought to be represented in such an institutional archive, do you think we might be able to persuade you to “strongly agree or agree” that this might be a worthwhile undertaking? 🙂

    • pm286 says:

      @Richard many thanks – I agree with all of this. Let’s assume the primary mechanism of communication was the pamphlet, flyer, bill, etc. and that we stuck these on every wall, lamppost, etc. If copying is cheap that’s the equivalent of the OPEN – and it must be Open – approach.
      I very strongly agree.
      Now – what are we going to DO about it? We are trying to develop software that does this. Peter Sefton is doing this with ICE. If we had this type of system on everyone’s desktop and sprayed our output everywhere then we would have publication and we would have preservation. Maybe Google Wave will do it.
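
The harvesting idea in Richard’s comment (poll the feeds of affiliated blogs and import the posts into one searchable database) can be sketched in a few lines. This is only an illustration under stated assumptions, not the project’s WordPress implementation: it uses Python and SQLite instead, it assumes the feeds are RSS 2.0, it leaves out the actual fetching over the network, and every name in it (`open_archive`, `import_feed`, `SAMPLE_FEED`, the `posts` table) is made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for a real blog's RSS 2.0 output.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <link>http://blog.example.org/1</link>
      <pubDate>Fri, 26 Jun 2009 12:00:00 GMT</pubDate>
      <description>Hello, archive.</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://blog.example.org/2</link>
      <pubDate>Sat, 27 Jun 2009 09:30:00 GMT</pubDate>
      <description>More content.</description>
    </item>
  </channel>
</rss>"""


def open_archive(path=":memory:"):
    """Create (or open) the single archive database."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               link TEXT PRIMARY KEY,  -- the permalink doubles as the dedup key
               blog TEXT,
               title TEXT,
               published TEXT,
               body TEXT
           )"""
    )
    return db


def import_feed(db, feed_xml):
    """Parse one RSS 2.0 document and store any posts not already archived.

    Returns the number of newly stored posts.
    """
    channel = ET.fromstring(feed_xml).find("channel")
    if channel is None:
        return 0
    blog = channel.findtext("title", default="")
    added = 0
    for item in channel.findall("item"):
        link = item.findtext("link", default="")
        if not link:
            continue  # without a permalink we cannot deduplicate, so skip
        row = (
            link,
            blog,
            item.findtext("title", default=""),
            item.findtext("pubDate", default=""),
            item.findtext("description", default=""),
        )
        # INSERT OR IGNORE leaves posts that are already archived untouched.
        cur = db.execute("INSERT OR IGNORE INTO posts VALUES (?, ?, ?, ?, ?)", row)
        added += cur.rowcount
    db.commit()
    return added
```

Calling `import_feed(open_archive(), SAMPLE_FEED)` stores both sample posts; running it again on the same database adds nothing, which is what lets a harvester poll the same feeds repeatedly. Keying on the permalink is a simplification: it would miss posts whose URLs move, which is exactly the problem listed above.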

