Andrew Jackson of the British Library
has recently published a
study of the use of particular file types over time, focusing on
PDF, image and HTML file versions, in an attempt to determine whether
being widely distributed and in use is a guard against obsolescence.
It's a valuable and interesting piece
of work. However, it's possible to pick a couple of holes in the study:
- it doesn't address the problem of different document formats, e.g. the variation in doc and ppt formats, as recently exemplified by Chris Rusbridge's attempt to recover some PowerPoint 4 format files
- it doesn't explore the problem of legacy formats (my favourite examples are ClarisWorks and Ami Pro files), nor that of legacy formats without a MIME type, such as the data formats used by specialist dataloggers
However, what it does show is that once
a format is in common use it is protected against obsolescence. The
real problem is with formats from the
days before storing documents on the web became the default for
many people, when the conventions were not fully established.
For example, I recently needed to check
some documentation about a legacy file format. The manufacturer had
put the documentation on the web as TeX files. While perfectly
readable, this did entail installing OzTeX to read the downloaded
file.
Andrew's study also did not address the
problem of legacy media formats such as Exabyte tapes and the rest.
To be fair, he explicitly looked only at the UK web corpus, which by
definition is online, so he was concerned only with
file formats, not media formats.
It would be interesting to run a
similar study over the filestore of a medium to large university and
see how great a diversity of file types there is, as well as
rerunning the study to look at document formats ...
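
As a rough idea of what a first pass over a filestore might look like, here is a minimal Python sketch. It is not the method used in Andrew's study: it simply walks a directory tree and tallies files by the MIME type that Python's standard mimetypes module guesses from the file name, keeping a separate count of extensions that have no known MIME type. The function name survey_file_types and the "(unknown)" and "(no extension)" labels are my own inventions for illustration.

```python
import mimetypes
import os
from collections import Counter


def survey_file_types(root):
    """Tally files under root by MIME type guessed from the file name.

    Returns two Counters: one keyed by guessed MIME type (with a
    hypothetical "(unknown)" bucket), and one keyed by the extensions
    of the files that had no known MIME type.
    """
    by_mime = Counter()
    unknown_ext = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # guess_type only looks at the name; it never opens the file
            mime, _encoding = mimetypes.guess_type(name)
            if mime is None:
                ext = os.path.splitext(name)[1].lower() or "(no extension)"
                unknown_ext[ext] += 1
                by_mime["(unknown)"] += 1
            else:
                by_mime[mime] += 1
    return by_mime, unknown_ext
```

Calling survey_file_types on the top of a filestore and printing by_mime.most_common() would give a crude diversity table, and unknown_ext would be a first list of candidates for the legacy and datalogger formats. Because the guess is based purely on the extension, this says nothing about format versions (a PowerPoint 4 file and a recent one would look identical); a real study would need content-based identification as well.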