Sunday, 5 October 2008

What is digital preservation for?

I have been thinking a lot about digital archiving/preservation and what purpose it actually serves. In part I've been doing this to clarify my thoughts: while the technologies of digital archiving and preservation are well understood, the purpose is not, and different purposes are often conflated. So let's step through the various options:

Digital archiving as preservation

Here one essentially wants to keep the data forever, across hardware upgrades and format changes. What one is doing is taking a human cultural artifact, such as a medieval manuscript or a recording of an Aboriginal Dreamtime story, making a digital version of it, and keeping the file available forever.

This has three purposes:

1) Increased access - these artifacts are delicate and cannot be handled by everyone who wishes to see them. Nor can everyone who wants access travel to them or afford to. While the preservation technology is expensive, access is cheap - this is being written on a computer that cost me $83 to put together. This also has the important substrand of digital cultural repatriation - it enables access to the conserved materials by the originators and cultural owners. Thus, to take the case of a project I worked on, Australian Aborigines were too impoverished to conserve photographs and audio recordings of traditional stories and music; digital preservation allows copies of the material to be returned to them without any worries about its long term preservation.

2) Long term preservation. The long term conservation of digital data is a 'just add dollars' problem. The long term preservation of audio recordings and photographs is not. And paper burns. Once digitised we can have many copies in many locations - think CLOCKSS for an example design - and we have access for as long as we have electricity and the Internet.

3) Enabling new forms of scholarly research. This is really an emergent property of #1. Projects such as the Stavanger Middle English Grammar project are dependent on increased access to the original texts. Without such ready access it would have been logistically impossible to carry out such a study - too many manuscripts in too many different places.

Digital archiving as publication

This seems an odd way of looking at it, but bear with me. Scholarly output is born digital these days. It could be in an intrinsically digital medium such as a research group's blog, or a digitally created item such as the TeX file of a research paper or indeed a book.

This covers e-journals and e-press as well as conventional journals, which increasingly also have a digital archive.

These technologies have the twin functions of increasing access - no longer does one have to go to a library that holds the journal one wants - and of massively reducing the costs of publication.

Of course there's a catch here. Once one had printed the books and put them in a warehouse, the only costs were those of storage. These books were then distributed and put on shelves in libraries. Long term preservation costs were those of a fire alarm, an efficient cat to keep the depredations of rats and mice in check, and a can of termite spray. OK, I exaggerate, but the costs of long term digital preservation are probably higher, in part due to the costs of employing staff to look after the machines and software doing the preserving and to make sure that things keep running.

The other advantage is searchability. One creates a data set and then runs a free text search engine over it. From the data come citation rankings, as loved by university administrators to demonstrate that they are housing productive academics (e-portfolios and the rest), and also easy literature searches - no more toiling in the library or talking to geeky librarians.
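To make the free text search idea a little more concrete, here is a minimal sketch of an inverted index in Python. It is only an illustration: the document names and contents are made up, and a real search engine would add stemming, ranking and much else.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return index


def search(index, query):
    """Return the ids of documents containing every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Start from the first word's matches and narrow with each further word.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical archive contents, invented for this example.
documents = {
    "grammar-study": "A study of Middle English grammar across many manuscripts",
    "archive-note": "Digitised manuscripts kept in many locations",
    "lecture-handout": "Reading list for a lecture on Henry II",
}

index = build_index(documents)
print(search(index, "middle english"))
print(search(index, "manuscripts"))
```

The point is simply that once the text is in the archive, finding every item that mentions a phrase is a lookup rather than a trip to the stacks.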

Digital preservation as a record

Outside of academia this is seen as the prime purpose of digital preservation. It is a way, by capturing and archiving emails and documents, of creating a record of business, something that government and business have always done - think the medieval rent rolls of York, the account ledgers of Florentine bankers, and the transcripts of the trial of Charles Stuart in 1649. While today they may constitute a valuable historical resource, at the time they served as a means of record to demonstrate that payment had been made and that due procedure had been followed.

In business, digital preservation and archiving is exactly that: capturing the details of transactions to show that due process has been followed. And because it's searchable, it's possible to do a longitudinal study of a dispute. In the old days it would have been hundreds of APS4s searching through boxes of memo copies; today it's a pile of free text searches across a set of binary objects.

Digital archiving as teaching

When lecturers lecture, they create a lot of learning aids around the lecture, such as handouts and reading lists. The lecture is often a digital event in its own right, with a PowerPoint presentation of salient points or key images, as well as the lecture itself.

Put together, this creates a compound digital learning object, something that is accessible as a study aid some time after the event.

While one may not want to keep the object forever, one may wish to preserve either its components for re-use or even the whole compound object, for instance because the course is only offered in alternate years.

However, these learning objects need to be preserved not only for reuse but also, in these litigious times, to prove that lecturers did indeed say that about Henry II and that consequently students should not be marked down for repeating it in an exam.


So digital preservation and archiving have a purpose, four in fact. The purposes differ, but there are many congruences between them.

Fundamentally the gains boil down to increased accessibility and searchability.

The commercial need for such archiving and search should help drive down the cost of preservation for academic and pedagogic purposes. Likewise, academic research relevant to digitisation, e.g. handwriting recognition and improved search algorithms, should benefit business and justify the costs of academic digital preservation.

