Sunday, 31 May 2020

Making a wordcloud for the Waterloo Bridge mystery

Remember wordclouds?

A few years ago they were incredibly popular as a way of visualising the key themes in a document.

Just for fun, and out of curiosity, I decided to use the accounts from the Mount Alexander Mail of the murder and the inquest to pull out the key themes.

There's nothing special about using the Mount Alexander Mail - they had more or less the same syndicated reports as other newspapers, but the OCR'd text in Trove was among the cleanest.

For the wordcloud software I used the IBM Java wordcloud package - the same one I used some years ago. I'd forgotten that it was (a) tortuous to install, since for some reason my Xubuntu machine did not install OpenJDK 8 by default, and (b) needed some modifications to the startup script to work on Xubuntu. But I got there - you can see the results from working with the default stopwords file at the top of this post.

I then 'borrowed' a stopwords list from a nineteenth-century literature research site, rather than using the default, and came up with a slightly different wordcloud:

I don't think you actually learn much from either wordcloud, other than that the stories were concerned with bones, blood, the bag, and the clothes, but it was a fun exercise for a wet and blowy Sunday afternoon ...
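For anyone curious why swapping the stopwords list changes the picture, here is a minimal Python sketch of the counting step that sits behind any wordcloud. It is not how the IBM package works internally (I haven't looked), and the function name `top_words`, the sample sentence, and the tiny stopword set are all made up for illustration rather than taken from the newspaper text.

```python
import re
from collections import Counter


def top_words(text, stopwords, n=10):
    """Return the n most frequent words in text that are not stopwords."""
    # Lowercase the text and pull out runs of letters,
    # keeping internal apostrophes so "don't" stays one word.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    # Count only the words that are not in the stopword set.
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)


# Hypothetical sample text and stopword list, for illustration only.
sample = "The bag held the bones, and the bones were wrapped in the clothes."
stopwords = {"the", "and", "in", "were", "held"}

for word, count in top_words(sample, stopwords, 3):
    print(word, count)
```

A different stopword set simply removes a different set of words before counting, which is why the two wordclouds differ in the small words but agree on the big ones.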
