Sunday, 31 May 2020

Making a wordcloud for the Waterloo Bridge mystery

Remember wordclouds?

A few years ago they were incredibly popular as a way of visualising the key themes in a document.

Just for fun, and out of curiosity, I decided to use the Mount Alexander Mail's accounts of the murder and the inquest to pull out the key themes.

There's nothing special about using the Mount Alexander Mail - they had more or less the same syndicated reports as other newspapers, but the OCR'd text in Trove was among the cleanest.

For the wordcloud software I used the IBM Java wordcloud package, the same one I used some years ago. I'd forgotten that it was (a) tortuous to install, as for some reason my Xubuntu machine did not have OpenJDK 8 installed by default, and (b) needed some modifications to the startup script to work on Xubuntu. But I got there, and you can see the results from using the default stopwords file at the top of this post.

I then 'borrowed' a stopwords list from a nineteenth century literature research site, rather than using the default, and came up with a slightly different wordcloud:

I don't think you actually learn much from either wordcloud, other than the stories were concerned with bones, blood, the bag, and the clothes, but it was a fun exercise for a wet and blowy Sunday afternoon ...
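The IBM tool's internals aren't described here, but at heart a wordcloud is a word-frequency count with stopwords removed, with each word drawn at a size scaled to its count. A minimal Python sketch, assuming a simple regular-expression tokeniser, a tiny hypothetical stopword list (neither the IBM default nor the nineteenth-century list), and a made-up sample sentence, shows why swapping stopword lists changes the result:

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only;
# real stopword files run to hundreds of words.
DEFAULT_STOPWORDS = {"the", "a", "of", "and", "to", "in", "was", "he", "it", "that"}


def word_frequencies(text, stopwords, top_n=10):
    """Count lower-cased words, skipping stopwords and tokens under 3 letters."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords and len(w) > 2)
    return counts.most_common(top_n)


# Made-up sample text, not a quotation from the newspaper reports.
sample = ("The bag was found on the bridge. In the bag were bones and clothes, "
          "and the clothes were stained with blood.")

print(word_frequencies(sample, DEFAULT_STOPWORDS))
# Extending the stopword list drops filler words from the "cloud".
print(word_frequencies(sample, DEFAULT_STOPWORDS | {"were", "with"}))
```

With the small default list, filler words such as "were" and "with" survive into the counts; adding them to the stopword set removes them, much as a different stopwords file changes which words make it into the cloud.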

Thursday, 28 May 2020

Coronavirus, lockdown, and old machines

Earlier today, the ABC had a post about the difficulties poorer families encountered while trying to homeschool their kids during lockdown.

Naively, one might have thought that using older hardware with a less demanding operating system would help them.

Not really. It's true that using Linux and LibreOffice would give you a low-cost, licence-free environment, and as most online schooling is web based, it looks like a no-brainer.

But there are costs associated with moving to linux. The first, and most obvious, is that not everyone has the technical skills to install linux onto an old machine, and ideally you need a decent internet connection.

In fact, many people would need someone to do the install for them.

Consider: if you've only got a 10GB a month data allowance, a 1.5-2GB download is a big chunk of your monthly allowance.

And remember, if you are renting, and on benefits, as so many people are now, an extra $60 a month for even a basic NBN package is a lot of money, and you may be worried about being able to pay for it over the 24 month contract term.
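To put rough numbers on these two costs, here is a small Python calculation using only the figures above (a 10GB allowance, a 1.5-2GB download, and $60 a month over a 24 month contract); the function names are just for illustration:

```python
def share_of_allowance(download_gb, allowance_gb):
    """Percentage of a monthly data allowance used by a single download."""
    return 100 * download_gb / allowance_gb


def contract_total(monthly_cost, months):
    """Total amount committed to over a fixed-term contract."""
    return monthly_cost * months


for size in (1.5, 2.0):
    print(f"{size}GB download = {share_of_allowance(size, 10):.0f}% of a 10GB allowance")
print(f"$60 a month for 24 months = ${contract_total(60, 24):,}")
```

That is 15-20% of the month's data gone on one download, and a commitment of $1,440 over the life of the contract.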

And surprisingly, a lot of people get by with a phone or an iPad on a monthly cellular contract. One of the reasons the queues at Centrelink offices were so long with people filing claims was not only that the system was over capacity, but also that the public libraries had been closed - a lot of people rely on the public machines when they need to do something that needs a keyboard.

Whenever I've been doing research in a public library I've been amazed at how heavily these public access machines are used - now I know why.

And of course, there's the question of ongoing support - people will need it. Things go wrong, and sometimes people simply won't have the knowledge to work round a problem.

There needs to be a support framework.

And also some commitment from the educational system. Telling kids to put their work in an Excel spreadsheet or Word document isn't helpful.

The teaching materials need to be product agnostic. By all means tell them that if they use LibreOffice they need to save that essay in docx format for upload, or export the spreadsheet as xlsx from Google Sheets, but don't tell them they should be using particular software packages.

Of course, this requires teachers, educators, and support staff to be familiar with a range of software and environments, which means training, which probably isn't going to happen any time soon.

But we should take a look at what actually happened during the great home schooling experiment, and think very hard about what we can do to end the digital divide.

It doesn't have to be expensive. It doesn't need a lot of shiny new kit. But it does need some careful thought.

Building a Xubuntu machine for research

Earlier this month I posted about building a Xubuntu based machine for fieldwork.

At the time it was difficult to get hold of a suitable machine - basically all the reasonable second-hand machines had disappeared, as people working from home, or with kids having to be home schooled, snapped up anything that would do as a cheap extra machine.

However, I managed to get hold of an old Dell E6320 (i5, 4GB RAM, 500GB HDD) for a little under two hundred bucks, which I reckoned would be powerful enough to run R for some (very simple) text analysis to establish the vocabulary used in Victorian murder reports, and OCRFeeder to extract the text from PDFs of Victorian murder reports.
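The analysis itself was to be done in R, but as a rough illustration of what establishing a vocabulary involves, here is a minimal Python sketch (not the R code) that tokenises a set of reports, counts the total number of words, and records how many reports each distinct word appears in. The three short "reports" are made-up placeholder sentences, not quotations from the newspapers:

```python
import re
from collections import Counter


def tokenise(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def vocabulary(documents):
    """Return (document frequency of each word, total number of tokens)."""
    doc_freq = Counter()
    total_tokens = 0
    for doc in documents:
        tokens = tokenise(doc)
        total_tokens += len(tokens)
        doc_freq.update(set(tokens))  # count each word once per document
    return doc_freq, total_tokens


# Hypothetical placeholder "reports" for illustration.
reports = [
    "The deceased was found in the river.",
    "The inquest heard the deceased was stabbed.",
    "A verdict of wilful murder was returned.",
]

doc_freq, total = vocabulary(reports)
in_every_report = sorted(w for w, n in doc_freq.items() if n == len(reports))
print(f"{len(doc_freq)} distinct words in {total} tokens")
print("in every report:", in_every_report)
```

Counting documents rather than raw occurrences makes it easier to spot the stock vocabulary shared by most reports, as opposed to words that are frequent in just one long article.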

I decided I needed a newer machine because while capable, my old Inspiron did grind a little when asked to do anything serious.

An odd hobby to be sure, but fairly harmless.

The machine arrived yesterday. In practice it was a little bigger than I had imagined - lesson learned: don't go on the screen size, go on the dimensions given in an old review - and slightly more battered than ideal, but everything worked and the screen was nice and bright.

I'd previously made myself a bootable Xubuntu USB stick for my initial experiments on my old Inspiron, so I used that.

First problem - I couldn't change the boot device from the internal hard disk to the USB stick. On most Dells I've installed linux on, you just need to hit F12 once the Dell logo appears, and after a few seconds you end up in the boot selection menu.

Not this one. I tried holding down F12, which works on some manufacturers' machines, and I even tried an external keyboard to check whether the key was faulty. I ended up resorting to Google. It turns out that on the E6320 (and some similar machines) you need to tap the F12 key rapidly during startup until a yellow message appears in the top right-hand corner, and then you end up in the device selection menu.

After that, installation was basically a case of following the bouncing ball. The machine had come with Windows 7, and I decided to keep a small Windows 7 install just in case I ever needed to run some Windows-only software.

The Xubuntu installer included a nice graphical partitioning tool that lets you keep your existing Windows install. The default was to split the disk half and half, which would give each more than enough, but as I only needed a small Windows install I went for a 25:75 split between Windows and Xubuntu. On a 500GB hard disk this gave Windows a more than reasonable 120GB.
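As a quick check on the arithmetic, a 25:75 split of a nominal 500GB disk works out at 125GB in decimal gigabytes, which is roughly 116GiB in the binary units some tools display, so 120GB is a fair round figure. The exact size will depend on where the installer's slider ends up; the helper names here are purely illustrative:

```python
def split_disk(total_gb, windows_share):
    """Split a disk between Windows and Xubuntu by a fractional share."""
    windows = total_gb * windows_share
    return windows, total_gb - windows


def gb_to_gib(gb):
    """Convert decimal gigabytes (10^9 bytes) to binary gibibytes (2^30 bytes)."""
    return gb * 1000**3 / 1024**3


win, xub = split_disk(500, 0.25)
print(f"Windows: {win:.0f}GB (about {gb_to_gib(win):.0f}GiB)")
print(f"Xubuntu: {xub:.0f}GB (about {gb_to_gib(xub):.0f}GiB)")
```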

Everything just worked. Well, except for the trackpad, which was a little slow and draggy. Strangely, the little joystick thingie on the keyboard worked as well as an external mouse.

As it was, a $5 mouse from the office supplies store solved the problem.

Software install was fine, network support was fine - it took me a couple of hours at most, including the F12 problem, to end up with a usable machine:

Sunday, 24 May 2020

Researching the Waterloo Bridge mystery

I've recently blogged about the 1857 Waterloo Bridge murder, but I thought I would summarise how I researched the post:

My initial searches were fairly dumb ones using Google and Wikipedia - usually, if it's a well-known case, such simple searches turn up articles and books about it. This time I drew a blank, so I moved on to Welsh Newspapers Online, which is free to search, and which turned up a number of reports from the period.

Unfortunately, Welsh Newspapers is not the easiest site to print articles from - it operates more like a virtual microfilm viewer than anything else, meaning that if an article is so long, or spread over so many columns, that you can't easily do a screen grab using Snip'n'Sketch, you have to resort to taking notes on a second device such as an iPad, which is a bit silly.

This time, I didn't do that. Instead I repeated my search using the State Library of Victoria's historic newspaper databases, and printed out the relevant articles. I also used the print to OneNote feature in Windows to save them to OneNote as well, in case I ever needed to go back to them.

Printing the articles wasn't the whole of the story - I then worked through the articles, taking notes and summarising as I went.

As it was a nice, sunny, late autumn afternoon, I did this sitting outside on the deck and used my MSI netbook for the task.

The MSI has a very nice keyboard to type on, which is why I keep it around, but is distinctly underpowered for 2020 with only 1GB of RAM and a 2012 vintage Atom processor.

That said, it runs BunsenLabs linux perfectly well, and with ReText it's highly responsive - add in two or three browser tabs and a running mail client and it does grind a little, but for most writing purposes it's fine, and considerably nicer to work with than an iPad or my old Alcatel Android tablet and keyboard combo.

The finished notes were then transferred to my Windows laptop via Dropbox, converted to Word, and inserted into OneNote.

Not the most elegant workflow, but one which I found to be the best given the various constraints (and the sunny weather)!

Friday, 22 May 2020

Books and deliveries in a time of coronavirus

Almost two months ago now I wrote about the problems of ordering books from overseas, given the near total disappearance of international flights.

While there is still a surface mail service with packages sent by sea, in reality most small packages are normally sent by air, and the absence of flights has led Australia Post, for one, to suspend its international economy service and warn that standard rate packages may be sent by sea.

What happens to economy rate packages in transit from overseas is anyone's guess. Different postal organisations may have dealt with mail differently. Given that most of my books in transit originate from the UK I did check the Royal Mail's website, which was singularly unhelpful.

However, it's more complex than it first appears - companies often outsource their deliveries to fulfillment companies, who then use different countries' postal services to send the mail on.

I've had packages posted from Switzerland, Sweden, Belgium and the Netherlands, despite having been ordered from bookshops in the UK. And I'm guessing that different postal authorities may be handling matters in their own way. But what we can say is that the post will be delayed.

As a result we are living in a nineteenth century style world of uncertainty. Packages trapped in the shutdown will undoubtedly arrive, but when and by what route is anyone's guess.

Letters (should anyone still need to write by hand) will take between four and five weeks, and yes, occasionally there are documents that need to be signed and returned rather than scanned and emailed. As I say, we have returned to nineteenth century style isolation.

However, there are signs that the mail is beginning to move again. About a month ago, I ordered a book from Blackwell's in Oxford. It arrived today courtesy of Jersey Post.

I have a couple of books that I ordered earlier in April that have yet to arrive. Having received one item, I'm now confident the others will arrive (eventually).

Sunday, 10 May 2020

Xubuntu, fieldwork and deja dup ...

Yesterday afternoon, having failed to find a suitable machine for linux experimentation, I got out my elderly Dell Inspiron 1545, and tried updating it from Ubuntu 18.04 to 20.04.

I tried to do this manually from the command line with the do-release-upgrade command, which insisted that I had to upgrade by first going via version 19.x.

Well, I got as far as a working version 19 and gave up - basically spending two hours doing a command line upgrade is incredibly boring.

But just before I gave up, I noticed that Ubuntu had acquired an inbuilt backup utility that lets you back up to Google Drive, among other targets.

So, this morning, I downloaded the latest version of Xubuntu - I went for Xubuntu, despite its quirks, because I really don’t like the default window manager in Ubuntu - and upgraded the Dell to Xubuntu 20.04. This only took about 20 minutes, as the installer realised that there was already a working older install of Ubuntu and only upgraded those things it had to.

Unfortunately, one of the things it doesn’t install is Deja Dup, the backup utility, but once you realise what the damn thing’s called, installation is utterly straightforward.

Download, select what directories you want backed up, schedule your first backup, and away you go.

It’s important to realise that this is a backup - it’s an encrypted backup of some directories on your machine, not a way of syncing a directory with your Google drive. There are some products that cost money to do that, and various magic spell solutions involving rsync, but there’s no easy to use synchronisation product.

However, what it does mean is that you have got a copy of these important documents backed up in case of disaster.
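To make the backup-versus-sync distinction concrete, here is a minimal Python sketch of a one-way sync (a hypothetical helper, roughly what rsync does with its --delete option). Because a sync makes the copy match the original, deleting a file locally deletes it from the copy too, which is exactly why a sync is not a backup:

```python
import filecmp
import shutil
from pathlib import Path


def mirror(src: Path, dst: Path) -> None:
    """One-way sync: make dst match src, including deletions.

    Sketch only: assumes entries never switch between file and directory.
    """
    dst.mkdir(parents=True, exist_ok=True)
    src_names = {p.name for p in src.iterdir()}
    # Anything no longer in the source is removed from the mirror.
    for item in list(dst.iterdir()):
        if item.name not in src_names:
            if item.is_dir():
                shutil.rmtree(item)
            else:
                item.unlink()
    # Copy new or changed files, recursing into subdirectories.
    for item in src.iterdir():
        target = dst / item.name
        if item.is_dir():
            mirror(item, target)
        elif not target.exists() or not filecmp.cmp(item, target, shallow=False):
            shutil.copy2(item, target)
```

Run it after accidentally deleting a file and the mirror faithfully deletes its copy as well; a backup tool that keeps older versions would still let you get it back.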

Now, one of my use cases for a linux machine is for a minimal travel and fieldwork machine - you have your writing tools of choice installed, perhaps a notes manager like Standard Notes, copies of anything important downloaded and cached locally, and otherwise the machine is pretty content free, except for work in progress.

The idea has always been that such machines tend to be older, tend to be used offline some of the time, and are to some extent disposable.

And, as I used to tell students - the data on them has cost more in terms of time and effort to generate than the machine, and in the case of fieldwork, may be irreplaceable. After all, in normal times you can get a basic machine for fieldwork for a couple of hundred bucks.

Not that you intend to lose them, but they’re the machine that gets bounced around in the back of the truck, or taken to some reasonably grubby location. Security checks at airports tend not to be a problem - on the very rare occasions I’ve been asked about a machine, the fact it’s a linux machine excites curiosity more than anything.

I used to suggest keeping everything crucial in a directory, religiously backing up its contents to a USB stick, and then uploading the contents of this work directory to some cloud storage - basically the same model I follow with the Dow’s Pharmacy documentation project. Having an automated utility like Deja Dup simplifies matters: you can be assured that your data will be backed up automatically on a regular basis, and because it keeps old backups for a designated period, you can backtrack if you accidentally delete some crucial files.
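The "keep old backups for a designated period" idea can be sketched in a few lines of Python: copy the work directory into a new timestamped folder, then prune snapshots older than the retention window. This is a hypothetical illustration with made-up names and a full copy each time; Deja Dup itself is considerably more sophisticated, doing incremental (and optionally encrypted) backups:

```python
import shutil
import time
from pathlib import Path


def snapshot(work_dir: Path, backup_root: Path, keep_days: float, now=None) -> Path:
    """Copy work_dir into a timestamped folder, then prune old snapshots.

    Assumes only this function writes "snapshot-*" folders in backup_root,
    and that snapshots are taken at least a second apart.
    """
    now = time.time() if now is None else now
    backup_root.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    dest = backup_root / f"snapshot-{stamp}"
    shutil.copytree(work_dir, dest)

    cutoff = now - keep_days * 86400  # retention window in seconds
    for old in sorted(backup_root.glob("snapshot-*")):
        if old == dest:
            continue  # never prune the snapshot we just made
        taken = time.mktime(time.strptime(old.name[len("snapshot-"):], "%Y%m%d-%H%M%S"))
        if taken < cutoff:
            shutil.rmtree(old)
    return dest
```

Unlike the one-way sync, deleting a file from the work directory only affects future snapshots; older copies stay put until they age out of the retention window.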

Saturday, 9 May 2020

Lockdown, linux and refurbished machines

Recently, I've been looking for a refurbished laptop to use as a linux machine.

Not a chance. The WFH (Working From Home) scenario that we've all been pushed into in the last couple of months has meant that all the decent units around the $200-250 mark have disappeared off the market.

I suspect that this has been driven more by schools moving online than anything else, and that's triggered a sudden rush to buy anything reasonable that does the job.

Not that there aren't machines for sale, but they're obviously from the bottom of the bin, eight or nine years old, and believe it or not, I even saw one on offer with Windows Vista Business for around $150!

A couple of months ago, before the lockdown, you could get something reasonable - say around six years old with a smallish hard disk and Windows 7 - for about $200, but not at the moment.

So, I'll wait - I have some other old machines to play with ...

Thursday, 7 May 2020

22 years of the iMac G3

The Register had a post this morning noting that it was 22 years to the day since Apple unveiled the iMac G3, the machine that arguably saved Apple from going under.

I had a couple of these machines, but never as OS X machines:

One, a genuine 1998 G3 iMac, I got given as I was known to enjoy experimenting with extending the usable life of redundant hardware by installing linux (otherwise known as buggering about with linux on old machines), the other I actually bought from a government disposal site for a few dollars.

Both were useful, and I actually used the original old G3 machine as my main desk machine for a couple of years.

But of course, what killed it was the lack of mainstream linux support for the PowerPC architecture.

Yes, even now you can still get distros that work on old PowerPC Macs, but most of these are purely volunteer supported - nothing wrong with that - and a little rough around the edges due to the need to reverse engineer support for Apple's varying mix of hardware and boot environments.

Most work well, but sometimes there needs to be a bit of goat sacrifice involved to get it to run, and that's not really suitable for a production environment. It's no use extending the life of hardware if your support costs go through the roof by having to cosset machines - they're machines, not cats.

Without a reliable, easy to install version, it wasn't tenable to advocate to people that they extend the lives of older iMac hardware by installing Linux.

So, I moved on, and the machines went to the recycler. But I still remember them with affection. Definitely fun ...