Friday, 31 October 2025

Guerilla cataloguing part 1

The nineteenth-century Prussian general von Moltke the Elder is reputed to have said 'No battle plan survives contact with the enemy'.

Well, it wasn't quite as bad as that, but my first problem when I tried out our tentative cataloguing methodology was that LibraryThing's link to the British Library catalogue kept timing out on me.

However, the link to LibraryThing's own Overcat database of library records was robust, so our plan has changed: we now use Overcat in preference to the British Library.

Not all records of books published in the nineteenth century are perfect, so sometimes a little editing was required. But using Overcat, with a little cross-checking against the National Library of Scotland and British Library catalogues along the way, our plan seems to work well, even if things went a bit slower than we had hoped.

With only ten records so far, it didn't seem worth doing a MARC export and then using something like FastMRCView to validate the output.

However, actually handling the books was quite valuable. By chance a number of the books we catalogued today were editions of Mary Elizabeth Braddon's novels.

Interestingly they still had paper stickers on the covers saying they were supplied through Mudie's Circulating Library.

Charles Mudie ran an important chain of circulating libraries in England in the mid to late nineteenth century. He charged his subscribers an annual fee of a guinea (£1 1s, or a little over A$220 today using the Bank of England's inflation calculator) to borrow one book at a time. By comparison, Netflix costs $120 a year with ads or $250 ad-free.
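For anyone who never used pre-decimal money: there were 12 pence to the shilling and 20 shillings to the pound. A guinea (£1 1s) was therefore 21 shillings, or £1.05 in decimal terms. A minimal Python sketch of that conversion follows. The function name is just for illustration. It makes no attempt at the inflation adjustment, which comes from the Bank of England's calculator.

```python
from fractions import Fraction

PENCE_PER_SHILLING = 12
SHILLINGS_PER_POUND = 20


def lsd_to_pounds(pounds: int, shillings: int = 0, pence: int = 0) -> Fraction:
    """Convert pre-decimal pounds, shillings and pence to decimal pounds.

    Fraction keeps amounts like 1d (1/240 of a pound) exact.
    """
    pence_per_pound = PENCE_PER_SHILLING * SHILLINGS_PER_POUND  # 240
    return (Fraction(pounds)
            + Fraction(shillings, SHILLINGS_PER_POUND)
            + Fraction(pence, pence_per_pound))


# A guinea: one pound and one shilling (£1 1s), i.e. 21 shillings.
GUINEA = lsd_to_pounds(1, 1)

if __name__ == "__main__":
    print(f"One guinea = £{float(GUINEA):.2f}")  # One guinea = £1.05
```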

Circulating libraries were a middle-class thing because of the up-front subscription fee. They had a possibly undeserved reputation as suppliers of sensation novels to middle-class women, and as places where men and women could interact unchaperoned.

Mudie is also reputed to have been responsible for the three-volume novel format being so common in the Victorian period, as it allowed his libraries to lend out the volumes separately rather than having to stock multiple copies of in-demand books.

And, as he bought so many copies of books, he became an important wholesaler in his own right, supplying books to overseas circulating libraries, including, quite obviously, the Athenaeum.

Incidentally, the books had green covers with gold stamping. Fortunately they don't turn up in the list of known books bound in arsenic-green bookcloth. The list isn't exhaustive, though, so we followed the sensible course of wearing nitrile gloves when handling them, rather than cloth gloves or bare hands.

I also learned a little bit about the business of publishing new editions of books in the late nineteenth century.

At that time books were still typeset by hand using movable type, much as they had been in Caxton's time.

However, there was one important difference - once a page had been set and proofed, the printers would make a papier-mâché mould of it. They then used the mould to cast a single metal plate to print the page from, and these cast plates were called stereotypes.

This of course meant that the type could be quickly broken up and reused, and that, if they kept the moulds, they could quickly make a new set of printing plates if a book needed to be reprinted.

If you look at a late nineteenth-century book, it will sometimes have 'Stereotype Edition' on the title page. This means that the book was printed by reusing the moulds from a previous edition rather than by having the type reset.

Interesting what you can learn from cataloguing a few old books...

Thursday, 30 October 2025

Facebook, again

Two and a half years ago I abandoned social media, or more accurately these big behemoths that rule our lives, and I've been a lot better for it.

Sure, I've kept on blogging and posting links to Mastodon, but I've not really engaged with any of 'the socials'.

Unfortunately a lot of local history and community groups have continued to use Facebook and it has got to the point that I need (reluctantly) to rejoin Facebook, if only to lurk and look at posts ...

(I'm leaving it as an exercise for the interested to find my account, and I'm not going to send any friend requests or anything like that - the whole experience of rejoining has been quite unsettling, with people I don't know being suggested as friends and a waterfall of suggested posts that are at best irrelevant and at worst fascist right-wing flag-waving nonsense.)

Wednesday, 29 October 2025

Cataloguing postcards

A few days ago I wrote about a procedure we had developed for cataloguing removable media at the Athenaeum.

I had been quite impressed by the sophistication of the documentation provided by some of our contributors, with human-readable and self-explanatory directory and file names. Inevitably, though, there are going to be cases where the filenames and directory names are not human readable.

Now, obviously we could rename and reorganise the files, but that's probably not sensible, as there may be references to the original filenames in the files or accompanying documentation.

So I thought the best solution would probably be to create a manifest file to be stored alongside the directory listings.

As an experiment, I thought I'd use a German postcard from 1914 that I'd recently acquired as an example.

As with the original methodology for cataloguing removable media, making a manifest file is actually quite easy if you use the command prompt (cmd.exe).

I'm quite systematic about how I document the various Victorian and Edwardian postcards I've collected over time. I store the scans and information about each postcard in a separate directory under an overall Postcards directory.

In this case the listing looks like this:

Volume in drive C is Windows-SSD
 Volume Serial Number is 62F4-DEE9

 Directory of C:\Users\doug_\OneDrive\Victoriana\Postcards\Schwerin 1914


28/10/25  03:37 PM         3,141,869 2025_10_28 15_36 Office Lens.pdf
28/10/25  03:55 AM         1,905,786 IMG_0347.JPG
28/10/25  03:10 PM         1,851,390 IMG_0349.JPG
28/10/25  03:31 PM               377 schwerin.mkd
28/10/25  03:15 PM           545,497 schwerin.png
               5 File(s)      7,445,392 bytes
               2 Dir(s)  369,753,280,512 bytes free

Not particularly meaningful.

However, using the tree command from the command prompt, you can create a directory listing with

tree /f > manifest.txt

This will give you a nice little tree listing in the directory, which you can then annotate using Notepad or similar to create something like this:

Folder PATH listing for volume Windows-SSD
Volume serial number is 62F4-DEE9
C:\USERS\DOUG_\ONEDRIVE\VICTORIANA\POSTCARDS\SCHWERIN 1914
    2025_10_28 15_36 Office Lens.pdf - pdf scan of postcard
    IMG_0347.JPG - face of postcard showing address
    IMG_0349.JPG - rear of postcard showing message
    schwerin.mkd - description of postcard in markdown format
    schwerin.png - montage of IMG_0347.jpg and IMG_0349.jpg

which gives you a human readable description of the contents stored in the same directory as the material you are documenting.

As always, procedure is everything - if you always call the annotated file listing manifest.txt, it will be consistent across all examples.
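Once a listing has been annotated by hand, it is easy to miss a file. Below is a minimal Python sketch, not part of the procedure itself, that checks an annotated manifest against the directory it describes. It reports files that are missing from the manifest or listed without a description. It assumes each annotated line takes the form "filename - description", as in the example above. It matches on file names only, so two files with the same name in different subdirectories would not be told apart. The function name check_manifest is hypothetical.

```python
from pathlib import Path

# Characters stripped from the start of each line: indentation plus the
# box-drawing characters tree uses when a directory has subdirectories.
TREE_PREFIX_CHARS = " \t│├└─"


def check_manifest(root: Path, manifest: Path) -> dict[str, list[str]]:
    """Compare an annotated manifest with the files actually under root.

    Returns a dict with two lists of file names:
      "missing"     - files that do not appear in the manifest at all
      "unannotated" - files that appear but have no description after " - "
    """
    text = manifest.read_text(encoding="utf-8", errors="replace")
    lines = [line.lstrip(TREE_PREFIX_CHARS).rstrip() for line in text.splitlines()]

    missing: list[str] = []
    unannotated: list[str] = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if path.resolve() == manifest.resolve():
            continue  # the manifest does not need to describe itself
        name = path.name
        entries = [ln for ln in lines if ln == name or ln.startswith(name + " -")]
        if not entries:
            missing.append(name)
        elif not any(ln[len(name):].lstrip(" -").strip() for ln in entries):
            unannotated.append(name)
    return {"missing": missing, "unannotated": unannotated}


if __name__ == "__main__":
    import sys
    # Usage: python check_manifest.py <directory> <manifest file>
    if len(sys.argv) == 3:
        report = check_manifest(Path(sys.argv[1]), Path(sys.argv[2]))
        for kind, names in report.items():
            for name in names:
                print(f"{kind}: {name}")
```

Matching on the start of each line, rather than splitting on the first " - ", means a description that itself mentions other file names (like the montage line in the example) does not confuse the check.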

(And as a note for command prompt nerds, I deliberately used tree /f rather than dir /b to create the directory listing. Using the tree command makes the process more general purpose, as it takes account of subdirectories and their contents if present.

As the Linux tree command works similarly, it also makes the procedure more portable than relying on the traditional DOS dir command.)

The procedure under Linux is slightly different.

Under Linux, the output file is created (by the shell's redirection) before tree enumerates the file list, so you can end up with manifest.txt appearing in the listing.

To avoid this, use the command

tree -i > ../manifest.txt

which will create the file in the directory above the current working directory. The -i option suppresses the line-drawing characters that give a representation of the directory structure. This creates a simple file that can be annotated as before, and once annotated the file can be moved to your preferred location.
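Another way round the self-listing problem is simply to collect the file list before the manifest file is created. Below is a minimal, cross-platform Python sketch, assuming Python 3.9 or later. It writes a skeleton manifest.txt in the same indented style as the Windows example, with " - " after each file name ready for annotation. It is an illustration rather than a replacement for tree: directories get a trailing slash instead of line-drawing characters, and the function names are hypothetical. It also refuses to overwrite an existing manifest, since re-running a redirected tree command would silently replace any annotations.

```python
from pathlib import Path

MANIFEST_NAME = "manifest.txt"
INDENT = "    "


def manifest_lines(root: Path) -> list[str]:
    """List everything under root, files before subdirectories, ready for annotation."""
    lines = [str(root.resolve())]

    def walk(directory: Path, depth: int) -> None:
        # Files first, then directories (roughly as Windows tree /f lays them out),
        # each group sorted case-insensitively by name.
        entries = sorted(directory.iterdir(), key=lambda p: (p.is_dir(), p.name.lower()))
        for entry in entries:
            if directory == root and entry.name == MANIFEST_NAME:
                continue  # never list the manifest itself
            if entry.is_dir():
                lines.append(f"{INDENT * depth}{entry.name}/")
                walk(entry, depth + 1)
            else:
                lines.append(f"{INDENT * depth}{entry.name} - ")

    walk(root, 1)
    return lines


def write_manifest(root: Path) -> Path:
    """Write root/manifest.txt, listing the directory before the file exists."""
    target = root / MANIFEST_NAME
    if target.exists():
        raise FileExistsError(f"{target} already exists; not overwriting annotations")
    lines = manifest_lines(root)
    target.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return target


if __name__ == "__main__":
    import sys
    # Usage: python make_manifest.py "<directory>"
    if len(sys.argv) == 2:
        print(write_manifest(Path(sys.argv[1])))
```

Because the listing is built in memory first and only then written, the manifest can live in the directory it describes without ever listing itself.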



Sunday, 26 October 2025

Baked beans and digital preservation

It was a wet, cold Sunday morning here in North East Victoria, so we had beans on toast for breakfast and listened to the radio.

We prefer Wattie's beans, a New Zealand brand, because they are not quite as sweet as some of the other common brands.

Wattie's beans are almost unique in that they still come in a non-ring-pull can, meaning that if you don't have the required access technology, in this case a can opener, you can't get at the beans.

And this is the first part of digital preservation - you need access to the appropriate technology to read the media. That means either knowing someone with the correct kit or getting hold of a suitable access device, such as a floppy drive, a CD drive or a card reader.

And of course, you need to know how to use them.

Which is why events like the Cambridge Festival of Floppies are important. Old buggers like me, who have worked with digital preservation and file format conversion for almost all their working lives, are either retired or getting close to it. After all, 3.5" floppies dropped out of use roughly twenty-five years ago, and computers stopped coming with CD drives sometime in the early 2010s. And we won't mention Apple and the weird variable-speed floppy drives in pre-OS X Macs.

So, somehow, the message needs to be passed on, which is why technology workshops are valuable. I might remember how to cable up a floppy disk controller and access the media, but I'm not going to be around for ever, and neither are these super-convenient USB-based floppy drives you can find on eBay.


Some day they're going to stop selling them as there's no profit in them. Anyway, no one makes 3.5" drives any more, so most of the external USB 3.5" drives you can buy are made using recycled components. (5.25" drives, and the weird 3" drives used by some Amstrad word processors in the UK in the nineties, are another problem entirely - recycled 5.25" and 3" drives in working condition are almost impossible to find.)

Once you've recovered the files there's also the problem of file format.

For more recent content it's not really a problem. Digital cameras' use of JPEG format, and the dominance of Microsoft's file formats and Adobe's PDF, have created a monoculture: if you can read the device, you can almost certainly access the content.

And if you can't, LibreOffice and AbiWord between them support a wide range of legacy formats.

But that's by no means the whole problem. What do we do with the content once we have recovered it and have assured ourselves we can read it?

This is actually a live problem: up at the Athenaeum we are increasingly receiving donations of people's family history research material on removable media. It is almost all on USB sticks, although we do have a few CDs and external hard disk drives.

As we are a volunteer organisation with fairly minimal external funding, preserving the data long term is a real problem. At least the format monoculture means that we are able to read the scanned letters and look at the old photographs without difficulty.

So, we can read the data, look after the media, and try to find a long-term storage solution. And, while the content may be digital, it's mostly derived from non-digital sources.

The future, of course, will be different.

As we know, no one writes letters any more, and everyone's photographs are saved to the cloud somewhere, which will make the whole business of family history and biography rather more difficult.

In fact, there was an article on the ABC's website this morning bewailing the death of the biography, precisely because no one writes diaries or letters any more. And of course there is also the question of what happens to our digital content when we die.

What this means is that there is no assurance of long term access to digital content as it increasingly moves to the cloud. 

For people working in the field of family history this increasingly means that all their material is stored in the cloud. Even if it was originally in a non digital format it will have been scanned, indexed and stored.

If someone does some oral history work the recordings will be digital, as will any transcriptions. I could go on, but you get the picture.

Creating a portfolio of your work, writing it to a USB stick and giving it to a memory institution such as the Athenaeum is not a solution. We don't have a long-term preservation solution of our own, and if we found somewhere to lodge the work, that somewhere would of course be dependent on external funding for the foreseeable future. And as we have seen with the failure of projects like the Florence Nightingale digitisation project to deliver, even funding does not guarantee either access or continuity of access...

Friday, 24 October 2025

Cataloguing removable media

Up at the Athenaeum, people are increasingly donating USB sticks containing family history information. Usually, as well as family trees, they contain scanned photographs and documents, including birth, death and marriage certificates, as well as immigration records and picture pages from old passports.

All valuable stuff.

And it's not a rare occurrence - today we had two in the space of half an hour.

In some cases they come with quite detailed documentation. The best follow all the guidelines on human-readable filenames for directories and files, and provide some descriptive information.

Others, perhaps less so, and we need to think about how we document them.

For the moment we need a little procedure to ensure that we catalogue and record the items in a standard way, so that we can keep the USB sticks safely, and make sure that the connection between any printed documentation and the USB is preserved.

After all, people have entrusted us with their family history research, and the very least we can do is look after it for them in as professional a way as possible.

It has also revealed that we didn't actually have a procedure for managing donated electronic material, so I made one up.

As a procedure it owes something to the one we developed some years ago for ingesting field research data when I was at ANU. People would bring us data that they wanted to archive - examples include species abundance data and digitised historical documents.

The difference here is that at the Athenaeum we have no content management solution - while the data may eventually end up in Victorian Collections or Trove, at the moment our focus is simply on the safe storage of the donated data.

When I wrote the procedure I had in mind the differing skill levels of our volunteers. I tried to make it as mechanical as possible and not too different from the way we ingest data about artefacts - the draft is available to download as a pdf.

The document is very much a work in progress, and may be subject to revision. In the meantime, please feel free to take a look, and reuse the content if it seems appropriate.




Sunday, 12 October 2025

iPads versus Android tablets

Just under a year ago, I bought myself a refurbished iPad, as some applications had stopped working on my pandemic-era Huawei MediaPad, basically because it was stuck on an old version of Android.

I expected that over the course of this year I'd gradually change over to using the iPad exclusively, and the MediaPad would go to the ewaste centre.

A great pity, as it is an excellent device, but facts have to be faced, and Apple own the tablet space in Australia and Android devices are not even a minority sport.

However, due to my being a total gonk and failing to realise that if you buy a subscription to a news website through Google, in most cases you only get access to the Android app, I've kept on using the Huawei to read the news in the morning and check the weather.

This has given me an opportunity to compare both devices over the longer term.

Tablets don't really have to do much other than run an application, download and display content, so things like memory and processor power are not important - as long as they have enough to do the job in a timely manner it doesn't matter if one has a higher performance benchmark than the other.

In fact both are roughly the same age and roughly the same specification - the Huawei has a bit more memory - certainly you don't feel any significant difference in performance when using YouTube or Spotify.

Where you do see a difference is in switching between applications, or indeed cutting and pasting content between them.

The iPad is simply clunkier. It does the job, but it's clunkier, and I put this down to the fact that Android is inherently multitasking, while older versions of iPadOS are not.

This isn't a showstopper by any means - if all you want is a device to review documents on or watch videos, you probably don't care that much.

Strangely, the one real differentiator is long-term operating system support - Apple are still pushing out updates for a five-year-old device, while the MediaPad has dropped off Huawei's update list.

So, if I was to go out and buy a replacement device today, which would it be?

A current model iPad bought from Apple in Australia is A$600, while the current Honor Pad is around A$550 bought from Amazon in Australia. (Since I bought my MediaPad, Huawei have both rebranded their phone and tablet business unit as Honor and sold it to another Chinese electronics manufacturer to avoid US sanctions on the Huawei parent company.)

Amazon also sell grey market imports of the previous Honor Pad, the 8a, for around A$250. The 8a is based around Android 14, which is still supported.

Given the price advantage of the grey market import of the previous model, I think that's the one I would go for, if I wanted a new and competent device and didn't want to spend six hundred bucks on a tablet.

Refurbished Huawei and Honor devices are not really an option - you're unlikely to get any operating system updates. Refurbished iPads are competent, but more recent models attract a price premium, meaning there's little advantage over buying new.

So, there we have it. As always your mileage may vary, especially depending on exactly how you intend to use the device. What I would steer clear of are some of the remaindered Huawei-branded MediaPads floating around various online marketplaces - the supported operating systems are simply too old, even though the hardware is still good and performs well.

Friday, 10 October 2025

Guerilla cataloguing - part 0

 I've mentioned before that we planned to recatalogue the heritage book collection using LibraryThing, the heritage book collection being the contents of the Athenaeum when it functioned as the town library in Stanley.

As far as we can tell, they hardly ever deaccessioned anything, giving us a picture of changing reading tastes from sometime around 1862 to 1971, when it ceased to function as a library.

Actually, I suspect tastes haven't changed much, given the number of early copies we have of novels by Louisa M Alcott, Mary Elizabeth Braddon, Wilkie Collins and the rest - clearly the nineteenth-century subscribers to the library had the same liking for mysteries and sensation novels as we do today.

Until we try it, we've no real idea how well recataloguing with LibraryThing and our proposed methodology is going to work.

To refine and document our procedures, we are going to run a pilot project on a few shelves to see how well it works. If it works well, we'll turn it into a guerilla project where we basically just do it, and don't worry overmuch about deadlines or formal project plans.

There is an intention to try to get other people involved so we can turn the project round fairly quickly, so we need a simple and robust set of procedures that let us bring people on board and get them up to speed. This is quite different from the documentation of Dow's and Lake View, where there was only me, and the main reason for documenting procedures was to avoid drift and capture any changes to the methodology.

So today was part 0 of the exercise: creating an account on LibraryThing for the Athenaeum, and identifying some tools for verifying and manipulating MARC records. One of the things we want out of the project is a set of MARC records that would let us port the catalogued data to another library system. Being able to check those records matters all the more because, instead of class marks or any standard cataloguing scheme, the original spreadsheet used shelf position.

This is worse than it sounds. For the third book from the left on the front row of shelf C the shelfmark is C3F, and for the third book from the left on the back row it is C3B. Unfortunately there's no guarantee that there are the same number of books in the front and back rows - as a scheme it's almost as eccentric as the Cotton Collection classification scheme.
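Because these marks are positional rather than classificatory, even sorting them needs a little care: a plain string sort puts C10F before C3F. A minimal Python sketch of parsing and ordering the marks follows. It assumes, hypothetically, that every mark is one or more shelf letters, a position number counted from the left, and F or B for the front or back row. It also assumes the front row should sort before the back row. The names are illustrative only. Nothing in it can tell you which front-row book a back-row book sits behind, since, as noted above, the two rows need not hold the same number of books.

```python
import re
from typing import NamedTuple

# Hypothetical pattern: shelf letter(s), position from the left, F(ront) or B(ack) row.
SHELFMARK_RE = re.compile(r"([A-Z]+)(\d+)([FB])")


class Shelfmark(NamedTuple):
    shelf: str
    position: int  # 1 = leftmost book in that row
    row: str       # "F" = front row, "B" = back row


def parse_shelfmark(mark: str) -> Shelfmark:
    match = SHELFMARK_RE.fullmatch(mark.strip().upper())
    if match is None:
        raise ValueError(f"not a shelf-position mark: {mark!r}")
    shelf, position, row = match.groups()
    return Shelfmark(shelf, int(position), row)


def shelf_order_key(mark: str) -> tuple[str, int, int]:
    """Sort by shelf, then front row before back row, then numerically by position."""
    s = parse_shelfmark(mark)
    return (s.shelf, 0 if s.row == "F" else 1, s.position)


if __name__ == "__main__":
    marks = ["C3B", "C10F", "C3F", "A1B"]
    print(sorted(marks))                       # ['A1B', 'C10F', 'C3B', 'C3F']
    print(sorted(marks, key=shelf_order_key))  # ['A1B', 'C3F', 'C10F', 'C3B']
```

Parsing the marks into fields up front also catches malformed shelfmarks in the old spreadsheet, which would otherwise just sort into odd places.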

So basically, we need to be able to validate the MARC output.

MARC is a binary format dating from the early days of library computing and, like BibTeX, is essentially a lowest common denominator format, i.e. one that most other systems can read and process.

So, what we need is a utility that can read the binary MARC file and display the file in a human readable form - something that with MARC is a bit of an exaggeration.
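To give a feel for what such a utility does, here is a minimal Python sketch of reading records in the ISO 2709 structure that binary MARC files use. Each record has four parts:

- a 24-character leader, whose first five digits give the record length and whose characters 12-16 give where the field data starts
- a directory of 12-character entries (tag, field length, starting position)
- the fields themselves, each ended by byte 0x1E, with subfields introduced by byte 0x1F
- a final 0x1D byte ending the record

This is a learning aid under stated assumptions, not a validator. It assumes well-formed records and decodes text as UTF-8, so MARC-8 encoded records with diacritics would come out garbled. Its display only loosely resembles the mnemonic form that tools like MarcEdit produce.

```python
import sys

FIELD_TERMINATOR = b"\x1e"
SUBFIELD_DELIMITER = b"\x1f"


def iter_records(data: bytes):
    """Split a file of MARC records using the length stored in each leader."""
    pos = 0
    while pos < len(data):
        length_digits = data[pos:pos + 5]
        if not length_digits.strip():
            break  # trailing whitespace/newline after the last record
        length = int(length_digits)
        yield data[pos:pos + length]
        pos += length


def parse_record(record: bytes):
    """Return (leader, fields); control fields are (tag, value),
    data fields are (tag, indicators, [(code, value), ...])."""
    leader = record[:24].decode("ascii")
    base = int(leader[12:17])          # where the field data starts
    directory = record[24:base - 1]    # drop the terminator after the directory
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        raw = record[base + start:base + start + length].rstrip(FIELD_TERMINATOR)
        if tag.startswith("00"):       # 001-009: control fields, no subfields
            fields.append((tag, raw.decode("utf-8", "replace")))
        else:
            indicators = raw[:2].decode("ascii", "replace")
            subfields = [(chunk[:1].decode("ascii", "replace"),
                          chunk[1:].decode("utf-8", "replace"))
                         for chunk in raw[2:].split(SUBFIELD_DELIMITER) if chunk]
            fields.append((tag, indicators, subfields))
    return leader, fields


def format_record(record: bytes) -> str:
    """Render one record as text, one field per line, blank indicators shown as '\\'."""
    leader, fields = parse_record(record)
    lines = [f"=LDR  {leader}"]
    for field in fields:
        if len(field) == 2:
            tag, value = field
            lines.append(f"={tag}  {value}")
        else:
            tag, indicators, subfields = field
            shown = indicators.replace(" ", "\\")
            body = "".join(f"${code}{value}" for code, value in subfields)
            lines.append(f"={tag}  {shown}{body}")
    return "\n".join(lines)


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:   # e.g. a MARC export file
        for rec in iter_records(fh.read()):
            print(format_record(rec), end="\n\n")
```

Using the length in each leader, rather than splitting the file on the record terminator, mirrors how the format is meant to be read. A length that doesn't line up with the data is itself a sign of a malformed export.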

Now, the last time I did any serious work with MARC was twenty years ago when I wrote a simple parser in Perl to take a set of MARC records and format them so that the records looked like old fashioned card catalogue images.

I forget why I was asked to do this, but I remember looking out at the rain coming down on the museum car park while I fiddled with regular expressions.

So we needed something to let us examine the contents of MARC files, and given that we have a budget of zero dollars and zero cents for this exercise it had to be both free and public domain.

Well, there's not a lot of choice - basically it seems to come down to Terry Reese's MarcEdit, which has the merit of being endorsed by the Library of Congress, FastMRCView, produced by the Russian State Library (formerly the Lenin Library) in Moscow, and the online only MRV MARC Record viewer.

Otherwise there don't seem to be a lot of options out there. It's quite possible that I've missed a couple of other public domain applications, but playing with MARC seems to be very much a minority sport.

I've decided, quite unilaterally, to use both MarcEdit and FastMRCView in the pilot and compare the output. While both seem to do what it says on the tin, there's always a risk that one application interprets the data slightly differently from another.

FastMRCView is a Windows-only application, while MarcEdit comes in Windows, Linux and OS X flavours. As most of the prior work on the catalogue has been done on Windows, there's no pressing need to change operating systems.

So, we have our account and some software that looks as if it might help with the gnarly stuff; all that remains is to try it and see how well our proposed methodology works in practice ...