Effective digital preservation is (almost) impossible; so Disseminate instead

I was just about to go back to refactoring Chem4Word when I saw this pingback on my blog, and I just had to comment. It’s really important. More of my comments at the bottom…

Which blogs should be preserved?

Richard M. Davis on 26th June, 2009 at 12:00 pm

You’d think it obvious that my blog should be preserved, though I’m not so sure about yours! According to the poster summarising the fascinating 2007 survey by Carolyn Hank et al: “The majority of bloggers agreed (36%) or strongly agreed (34.9%) that their own blogs should be preserved.” Five per cent don’t want their blogs preserved at all; nearly a quarter aren’t fussed either way.

Here’s one of the data tables (which I had to retype as HTML – Peter Murray Rust is right about PDFs and data):

Table 4. Preservation perceptions – general

		Strongly agree or agree	Neither agree or (sic) disagree	Strongly disagree or disagree
Should preserve	Personal blog	70.9%	23.8%	5.3%
	Every blog	35.8%	27.9%	36.3%
	Every comment	31.4%	31.9%	36.7%
	All online content	28.2%	22.3%	49.5%
Should not preserve	Some blogs	44.7%	27.7%	27.7%
	Some comments	48.4%	31.3%	20.2%
	Some online content	51.3%	24.9%	23.8%

The overall pattern seems a good vindication of our own project approach, which will progressively move from capturing blog content (posts), to addressing comments and content, reflecting the scale of the bloggers’ own priorities.

It also seems a useful juncture in our project to throw open the question: which blogs should we preserve?

With over 5 million active blogs noted by Technorati, it seems daft to even start to enumerate them but in our field (libraries, archives, information science), several stand out, and it’s the very nature and importance of these that bolster the case for keeping them. I have in mind in particular Peter Suber’s Open Access News blog, but also blogs such as those of Peter Murray Rust, Brian Kelly, Lorcan Dempsey, Dorothea Salo, Jill Walker Rettberg – all ripe with contemporary accounts and robust views on matters of scholarly communication. But in every case, we have cause to wonder: will that information survive, will that link still work tomorrow?

What blogs (or types of blogs) do you think should be preserved, and why?

PMR: This is really important. Blogs are evolving and being used for many valuable activities (here we highlight scholarship). Some bloggers spend hours or more on a post. Bill Hooker has an incredible set of statistics about the cost of Open Access and Toll Access publications, page charges, etc. Normally that would get published in a journal no-one reads (I have even published in such – it was a huge effort and it’s got one citation. Not that I care about citations). So I tend to work out my half-baked ideas in public. Some people do their early science in the Open. Some are activists. Some review the current landscape, etc.

But preservation is really really difficult. I don’t know how to tackle it. Since 1993 I have been determined to preserve my digital record.

And I’ve failed.

I’ve created courses, forums, data sets, teaching-learning objects, blogs, preprints, etc.

And I’ve lost most of them.

There are many reasons. First it’s extremely hard to preserve complex digital objects. The problems include:

compound documents (and only after 15 years is the web coming round to realising this is important)
hyperlinks
moving URLs/URIs
formats
semantic behaviour
disorganised humans (me)
moving institution (4 times)
moving computer (about 10 times)

Henry Rzepa and I have worked hard on this and he is more organized than me. We put early versions of JUMBO on CD-ROMs and got the RSC to distribute them with an issue of the journal. I have saved things on DAT tapes from the SGI. DAT??? SGI??? I don’t have a machine at home which will read 3.5-inch floppies. I have trashed my much beloved BBC Micro.

Every time I change machine I lose large amounts of data.

At some stage someone will invent a true Memex for my digital activities. Until then:

Preservation is effectively impossible.

So what’s the answer? The only one I can think of at the moment is to disseminate as widely as possible. If people want to read your material they will take copies (if that is technically possible). I would urge University Repositories:

Stop agonizing about preservation and start disseminating.

If it’s worth preserving, the web will have a reasonable chance of containing it somewhere. If it’s not, well, history will judge whether our current dross are the jewels of the future. We can’t tell.

DISSEMINATE, DISSEMINATE, DISSEMINATE

MAKE IT OPEN. FORGET COPYRIGHT. JUST PUBLISH.

CREATE LINKED OPEN DATA. LINKED OPEN DATA

CREATE AND RELEASE HERDS OF COWS, NOT PRESERVE HAMBURGERS IN A DEEP-FREEZE


3 Responses to Effective digital preservation is (almost) impossible; so Disseminate instead

Richard M. Davis says:

June 26, 2009 at 8:34 pm

Hello Peter, Thanks for noticing our little endeavour. I too have lost untold stuff (I’d hesitate to call it data) over the years; though equally I know I have on this MacBook some files that originated on an Acorn Electron in my student days, and have subsequently been ‘migrated’ through various DOSes, MacOSes and Unices over the years, with much line-ending and charset surgery en route.
Personal preservation is non-trivial and I agree the dissemination (LOCKSS) approach probably gets the best bang-per-buck; at the other end of the scale, we have massive endeavours by the likes of Google, BL, Internet Archive. The simple proposition at the heart of our project is: what if we made it ridiculously easy for any institution (department, centre, library, whatever) to set up a WordPress blog to sit silently in the background, ping the feeds of affiliated public blogs and bloggers, and import the content into a single blog database, easy to search, sort, cite and preserve as part of the institutional record?
Personally I think this is just the kind of thing I’d want to see in my Library Of The Future. As the kind of blogger who I think ought to be represented in such an institutional archive, do you think we might be able to persuade you to “strongly agree or agree” that this might be a worthwhile undertaking? 🙂
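
The harvesting idea in the comment above (ping the feeds, import the posts into one database) can be sketched in a few lines. This is only a minimal illustration in Python, not the WordPress-based system the project describes: the `posts` table, the function names `open_archive` and `import_feed`, and the choice of SQLite and RSS 2.0 are all assumptions made for the example, and fetching the feed over the network is left out.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical schema: one row per harvested post, keyed by its permalink.
SCHEMA = """
CREATE TABLE IF NOT EXISTS posts (
    link      TEXT PRIMARY KEY,
    source    TEXT NOT NULL,
    title     TEXT,
    published TEXT,
    body      TEXT
)
"""


def open_archive(path=":memory:"):
    """Open (or create) the archive database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def import_feed(conn, source, feed_xml):
    """Store every <item> of an RSS 2.0 document; return how many were new.

    `source` is a label for the affiliated blog; `feed_xml` is the feed text,
    however it was fetched.
    """
    root = ET.fromstring(feed_xml)
    before = conn.total_changes
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        if not link:
            continue  # without a permalink there is nothing stable to cite
        # INSERT OR IGNORE skips posts whose permalink is already archived.
        conn.execute(
            "INSERT OR IGNORE INTO posts VALUES (?, ?, ?, ?, ?)",
            (
                link,
                source,
                item.findtext("title"),
                item.findtext("pubDate"),
                item.findtext("description"),
            ),
        )
    conn.commit()
    return conn.total_changes - before
```

Calling `import_feed` again with the same feed adds nothing, because the permalink is the primary key, so a harvester built this way could poll on any schedule without duplicating posts.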

- pm286 says:
  
  June 26, 2009 at 8:54 pm
  
  @Richard many thanks – I agree with all of this. Let’s assume the primary mechanism of communication was the pamphlet, flyer, bill, etc. and that we stuck these on every wall, lamppost, etc. If copying is cheap that’s the equivalent of the OPEN – and it must be Open – approach.
  I very strongly agree.
  Now – what are we going to DO about it? We are trying to develop software that does this. Peter Sefton is doing this with ICE. If we had this type of system on everyone’s desktop and sprayed our output everywhere then we would have publication and we would have preservation. Maybe Google Wave will do it.
  
Pingback: ArchivePress » Blog Archive » Our first month
