Tuesday, 23 September 2014

Digital Preservation Strategies ...

I came across a beautifully succinct quotation from National Records of Scotland:

‘If digital records are not captured there can be no preservation and
if there is no preservation there can be no access’

That is a concise statement of why we do data capture. If we don’t, there is no way of retracing our steps and no way of substantiating research, because we don’t have the original data.

And of course, if we don’t have the data, all our arguments about preferred archival formats are moot. In a very real sense they are anyway - formats change over time, and so do preferences. Legal documents and court transcripts in WordPerfect from the nineties are a key example.

They may still have validity, but they are in a dead file format. No one creating these transcripts knew that in twenty years the files would be in a dead format - they chose a widely used, well-documented format - it’s just that preferences changed.

Tools such as Tika, PRONOM and FIDO give us the chance, at capture, to record information about the file format as well, which gives us a clue about how we might read the file in the future.
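As a rough illustration of recording format information at capture, here is a minimal Python sketch. It is not how Tika or FIDO work internally; it uses only the standard library and a tiny hand-written signature table as a stand-in for the much larger PRONOM registry. The names `SIGNATURES`, `identify` and `capture_record` are made up for this example.

```python
import hashlib
from pathlib import Path

# Illustrative only: a handful of well-known leading byte signatures.
# Real identification tools consult far more signatures than this.
SIGNATURES = [
    (b"%PDF-", "PDF"),
    (b"\xffWPC", "WordPerfect"),
    (b"PK\x03\x04", "ZIP container (e.g. OOXML or ODF)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. legacy Word)"),
]


def identify(header: bytes) -> str:
    """Return a best-guess format name from the first bytes of a file."""
    for magic, name in SIGNATURES:
        if header.startswith(magic):
            return name
    return "unknown"


def capture_record(path) -> dict:
    """Build a small metadata record to store alongside a captured file."""
    path = Path(path)
    data = path.read_bytes()
    return {
        "filename": path.name,
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),  # fixity check for later
        "format_guess": identify(data[:16]),
    }
```

Storing a record like this next to each captured file means that, even if the extension is wrong or missing, whoever opens the archive later has a starting point for working out what the file is.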

And of course the technology to read files changes as well. All we can do is try to make sensible decisions that make life easier for anyone who wants to access the captured files.

File normalisation is one such decision. What it really means is ‘convert files in a known proprietary format to an open format on ingest’ - usually using something like LibreOffice in batch mode - and storing the converted file along with the original.
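A normalisation step of this kind might be sketched in Python as below. This assumes LibreOffice is installed and its `soffice` binary is on the PATH; the `original`/`normalised` directory layout and the function names are invented for the example, and ODT is just one possible target format.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(src, outdir, target="odt"):
    """Command line for a headless (batch mode) LibreOffice conversion."""
    return [
        "soffice", "--headless",
        "--convert-to", target,
        "--outdir", str(outdir),
        str(src),
    ]


def normalise(src, archive_dir, target="odt"):
    """Keep the original file and store a converted copy beside it."""
    src = Path(src)
    archive_dir = Path(archive_dir)
    original_dir = archive_dir / "original"      # hypothetical layout
    normalised_dir = archive_dir / "normalised"  # hypothetical layout
    original_dir.mkdir(parents=True, exist_ok=True)
    normalised_dir.mkdir(parents=True, exist_ok=True)

    # Always preserve the file exactly as it was captured.
    shutil.copy2(src, original_dir / src.name)

    # Raises CalledProcessError if soffice exits with a non-zero status.
    subprocess.run(build_convert_command(src, normalised_dir, target), check=True)
    return normalised_dir / f"{src.stem}.{target}"
```

The original is copied before any conversion is attempted, so a failed or lossy conversion never costs us the captured file.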

The idea is that the converted file, being in an open format, will be easier to read than one in a proprietary format. Of course, when we say proprietary format we mean Microsoft, because we worry about its dominance of the file format ecology.

And we are most certainly wrong - there is so much material in Microsoft formats that it is difficult to believe there will be a future with no applications to read these files. What one should be worrying about is the less well used formats such as Pages or AbiWord, where there is a greater risk of losing access.

But the point remains that unless we capture the files in the first place we will have no chance of reading them in the future …

