Wednesday 22 May 2019

The joy of bibtex ...

The project's been chugging along nicely, and I've nearly finished documenting the dispensary and the back shop - we originally thought that there would be around 4000 items in total, but I've already documented around three and a half thousand, and there's still the shop to do.

Recently, one of the groups of items documented was a set of reference books - pharmacopoeias mainly, the earliest from 1914, the latest from 1963.

They were all too early to have ISBNs, and some were different editions of the same pharmacopoeia.

So, how to document them and provide a unique reference, and preferably one that was machine readable?

BibTeX!

All the books, and the correct editions, were on the National Library of Australia's catalogue, which provides a handy download of the BibTeX reference. That gives us a professionally compiled description of the item, plus a reference to the NLA's catalogue so that someone in the future can do a simple double check.

The one exception was a book which I couldn't find in the NLA, or any of the state libraries in Australia, but did find in the British Library, which unfortunately doesn't provide a handy citation export in BibTeX format.

I could, I suppose, have downloaded the citation in the BL's preferred format and run it through one of the EndNote to BibTeX, or MARC to BibTeX, conversion tools, but as it was only one entry, downloading, installing, and then checking the output seemed almost as much work as creating an entry by hand. So I ended up hand creating an entry based on the BL's RIS output.
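For anyone faced with more than one such entry, the hand conversion could be roughly sketched in Python as below. This is only a minimal sketch: the tag mapping is my assumption about which RIS tags matter for a book, the function names are made up for illustration, and the sample record is entirely invented rather than being the BL's actual export.

```python
# Minimal RIS -> BibTeX sketch. The tag mapping and the sample record are
# illustrative assumptions, not the actual British Library export.
RIS_TO_BIBTEX = {
    "TI": "title", "T1": "title",
    "AU": "author", "A1": "author",
    "PB": "publisher",
    "CY": "address",
    "PY": "year", "Y1": "year",
    "ET": "edition",
}


def parse_ris(text):
    """Collect RIS tags into a dict of lists (tags such as AU can repeat)."""
    record = {}
    for line in text.splitlines():
        # RIS lines look like "TI  - Some title": tag, two spaces, hyphen, space.
        if len(line) < 6 or line[2:6] != "  - ":
            continue
        tag, value = line[:2], line[6:].strip()
        if tag == "ER":  # end of record
            break
        if value:
            record.setdefault(tag, []).append(value)
    return record


def ris_to_bibtex(text, key):
    """Build a @book entry from the first RIS record in text."""
    authors = []
    fields = {}
    for tag, values in parse_ris(text).items():
        name = RIS_TO_BIBTEX.get(tag)
        if name is None:
            continue  # ignore tags this sketch does not know about
        if name == "author":
            authors.extend(values)
        elif name not in fields:
            fields[name] = values[0]
    if authors:
        fields["author"] = " and ".join(authors)
    if "year" in fields:
        fields["year"] = fields["year"][:4]  # RIS years can look like 1932///
    lines = ["@book{%s," % key]
    lines += ["  %s = {%s}," % (name, value) for name, value in fields.items()]
    lines.append("}")
    return "\n".join(lines)


# Invented sample record, for illustration only.
SAMPLE_RIS = """TY  - BOOK
TI  - An Example Pharmacopoeia
AU  - Example, A.
PB  - Example Press
CY  - London
PY  - 1932
ET  - 2nd
ER  -
"""

if __name__ == "__main__":
    print(ris_to_bibtex(SAMPLE_RIS, "example1932"))
```

The output would still need checking by eye against the catalogue record, which is much the same work as I did by hand.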

And why BibTeX?

Two reasons: (1) it's a common, well documented format and (2) as well as being machine readable, it's also human readable - more or less - which makes it easy for any future researcher or archivist using the data I've created to be sure that it was this edition and not that edition ...

Wednesday 8 May 2019

recovering data from garages

Earlier today a former colleague retweeted this:


and strangely, I've been here before.

When I was managing the ANU's various ANDS funded data capture projects we made use of a company in Perth that specialised in reading old tapes - in particular for the mining industry, but they would read anything - for a fee of course.

As part of the DC7A project, we used this company to read seismological data that was locked away on piles of DAT tapes that no one could read any more, because no one on campus had suitable hardware.

As is the way in universities, a researcher in social sciences who worked in PNG heard of this.

He'd recently found some old 9 track tapes in a colleague's garage, and he recognised them as likely to hold a copy of some data from the PNG government. More importantly he thought that it might be data that the PNG government had lost as a result of a hardware failure.

Details were sketchy. There were some paper labels that identified them as 9 track ASCII tapes, but that was about it.

Anyway, I talked to our data recovery company and they were happy to give it a go.

Fortunately, despite languishing in a Canberra garage, the tapes were readable and were in a straightforward comma delimited format, rather than some old proprietary compressed data format, or some strange format used by some now forgotten data manipulation software.

So the data was recoverable and could be returned to the PNG government.

Now I'm not blogging about this seven years after the event to show how good I am, but rather to show that old data can (with a bit of luck) be recovered.

But to simplify your task, do the following:


  • Try to find out if there's anyone still left who remembers the days of tapes - hopefully they might be able to help interpret the (paper) labels stuck on the tapes.
  • Talk to the people who are going to read the tapes. Chances are the tape will be in a 9 track format and be ASCII encoded, unless it came from somewhere that used IBM or Amdahl mainframes, in which case it might be EBCDIC.
  • Don't be put off by people mentioning dead manufacturers like Prime or Data General. 9 track was a fairly standard format.
  • When you finally get to read the data, remember that just because it's been recorded in a standard way doesn't mean the data isn't in some proprietary format - again, if you can find someone who knew about the original data they might remember the name of the software package used.

Do some detective work, and chances are you might luck out, and don't be afraid to ask questions ...
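
Once the tape contents are sitting on disk as a plain file, the ASCII or EBCDIC question can be given a rough first check in a few lines of Python. This is a sketch under assumptions rather than anything we actually ran: the function name is made up, cp037 is only one of several EBCDIC code pages (a real tape may well use another), and the sample data is invented.

```python
def guess_tape_encoding(block):
    """Return 'ascii' or 'ebcdic', whichever decoding looks more like plain text.

    Assumes cp037 as the EBCDIC code page, which is only one of several.
    """
    def printable_share(text):
        # Share of characters that are printable ASCII or common line endings.
        if not text:
            return 0.0
        good = sum(
            1 for ch in text
            if (ch.isascii() and ch.isprintable()) or ch in "\r\n\t"
        )
        return good / len(text)

    as_ascii = block.decode("ascii", errors="replace")
    as_ebcdic = block.decode("cp037", errors="replace")
    if printable_share(as_ascii) >= printable_share(as_ebcdic):
        return "ascii"
    return "ebcdic"


if __name__ == "__main__":
    # Invented sample line, encoded both ways for illustration.
    sample = "PROVINCE,YEAR,POPULATION"
    print(guess_tape_encoding(sample.encode("ascii")))
    print(guess_tape_encoding(sample.encode("cp037")))
```

It only tells you which guess looks more like text. It says nothing about whether the data underneath is in some proprietary format, which is where the detective work comes in.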