Thursday, 17 May 2012

Sighelm and Orbis

Orbis, an interactive route finder, has just been released. Essentially it's a tool that calculates routes between various places in the Roman world, and the time it would take to get from one to another using various modes of transport. The Register has quite a good backgrounder on how it works.

I immediately thought of Sighelm and those other Anglo-Saxon travellers who ventured far and wide in search of holiness (or spices). And my second thought was: how does it measure up to what we know of the time it took pilgrims following the Via Francigena from the coast of northern France to get to Rome?


The green line is the closest I could get to Sigeric's route - Orbis suggests that it would take about 60 days walking. Sigeric's diary records it took him 80 days. The discrepancy is probably fair - the route would have been harder in Sigeric's time and he probably couldn't have spent as much of each day walking as a Roman traveller following a similar route in more peaceable times.

However, being totally unscientific (we'd have to have a lot more data on pilgrims' travel and plot the routes rather more accurately) we can say that travel in the early medieval period was slower, but perhaps not quite as remarkably slower as one might expect ...
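
As a rough check on the figures above, the gap can be put as a simple ratio. This is only a sketch using the two numbers quoted in this post (about 60 days from Orbis, 80 days from Sigeric's diary); the function name is made up for illustration:

```python
def slowdown(recorded_days: float, modelled_days: float) -> float:
    """Return how many times longer the recorded journey took than the modelled one."""
    if modelled_days <= 0:
        raise ValueError("modelled_days must be positive")
    return recorded_days / modelled_days


orbis_days = 60    # approximate Orbis walking estimate quoted above
sigeric_days = 80  # duration recorded in Sigeric's diary, as quoted above

ratio = slowdown(sigeric_days, orbis_days)
print(f"Recorded journey took {ratio:.2f}x the modelled time "
      f"({sigeric_days - orbis_days} days longer)")
```

So on these two data points the medieval journey took about a third longer, which is slower but hardly dramatically so.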

Tuesday, 15 May 2012

just killed my facebook ...

Well I've gone and deactivated my facebook account.

Never used the damned thing (much). Yes, when I set it up it was kind of gratifying to friend and be friended by old acquaintances, but they were really just that. Old acquaintances. Don't get me wrong, there's quite a few I'd have been happy to have dinner with and catch up with but they weren't part of my life and I wasn't part of their life.

Google plus is another - got it because I use a lot of the tools from big G and it's kind of mandatory to have it, but I don't use it.

The one bit of social network technology I use is twitter - but really as a curated rss feed, a way of sharing useful links and information with people I work with, even if they're in another timezone or another continent even.

And twitter is useful for rolling news updates as in emergency management and useful for 'presence updates' - if someone tweets that they're in an airport you know there's a good chance emailing them might be more useful than calling them, especially if it's something knotty.

So, if you're missing me on facebook it's not personal, 'cos it wasn't that personal to start with ...

[also see my post 'Withdrawing from Social Media']


[update 20/05 - of course facebook has sneaked into our lives by providing authentication services for other sites - had to turn the damned thing back on ...]

Monday, 14 May 2012

History as data


J's Great^3 grandfather was a self-employed seedsman in Barnard Castle in County Durham in the early 1800s.

We know this because we have one of his project management books from 1819. There must have been more but we only have the one.

The first thing that's really remarkable is that it looks just like one of my project management books, except that it doesn't have any Post-its inside. Otherwise it's pretty much the same. You get the feeling he'd have loved Evernote. It's all there. Notes, scribbles, rough calculations, even when he went to the pub on a wet afternoon to do business with someone.

Yesterday, because J has been doing some family history stuff, we spent a couple of hours going through it in detail. And it contains a wealth of information:


  • who his clients were, what they bought and what they paid
  • what he planted when
  • who he worked for as a jobbing gardener at slack times and what the going rate was
  • drafts of letters asking for payment and offering stuff for sale to other suppliers
  • what sold for what in the local market
  • the weather - when it was too wet or cold to work outside, for example when there was spring snow
  • recipes for traditional remedies to kill blackfly


and on and on.

Tabulated, it would give a picture of life in the early 1820s. Prices of seeds, prices of labour, what people spent money on, and how work was divided up. Combined with other sources it probably would tell you a lot about rural life in County Durham in the 1820s.

Of course it doesn't, as it's sitting in an archive box in the study at home. Which is fine as an heirloom but not as a history resource.

So, having gone on about digitisation ad nauseam it's probably time to practise what I've been preaching and try a bit of home digitisation ... The first stage is probably to build a diy bookscanner - easier for us to save handling the books, and when we have the images put them on the web and add transcriptions as we do or find them ...
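
Once pages are transcribed, the entries could be tabulated along these lines. This is only a sketch: the field names and the `LedgerEntry` record are my own invention for illustration, the sample rows are placeholders rather than anything from the real notebook, and amounts are held in old pence on the assumption that prices are written in pounds, shillings and pence:

```python
import csv
import io
from dataclasses import dataclass, asdict, fields


@dataclass
class LedgerEntry:
    """One transcribed line from the notebook (field names are illustrative)."""
    date: str         # ISO date as transcribed, e.g. "1819-04-12"
    kind: str         # e.g. "sale", "labour", "market price", "weather"
    party: str        # client or employer, blank if not applicable
    description: str  # what was sold, planted or observed
    pence: int        # amount in old pence, 0 if no money is involved


def lsd_to_pence(pounds: int = 0, shillings: int = 0, pence: int = 0) -> int:
    """Convert pre-decimal pounds/shillings/pence to pence (20s = 1 pound, 12d = 1s)."""
    return (pounds * 20 + shillings) * 12 + pence


def to_csv(entries: list[LedgerEntry]) -> str:
    """Serialise entries as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(LedgerEntry)])
    writer.writeheader()
    for entry in entries:
        writer.writerow(asdict(entry))
    return buffer.getvalue()


# Placeholder rows only: these are NOT transcriptions from the real notebook.
sample = [
    LedgerEntry("1819-01-01", "sale", "EXAMPLE CLIENT", "example seed", lsd_to_pence(0, 2, 6)),
    LedgerEntry("1819-01-02", "weather", "", "example: too wet to work", 0),
]
csv_text = to_csv(sample)
print(csv_text)
```

Keeping everything in pence makes it easy to sum and compare prices later, and a plain CSV file can be opened in any spreadsheet or put on the web alongside the page images.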

[update 15/05/2012]

I've just found a $3.50 bike camera mount on ebay. With a couple of old retort stands, clamps and a horizontal bar I could probably use the mount to accurately position a camera above the page to take good quality images without building a diy book scanner ...

Friday, 11 May 2012

Australian National Corpus Officially Launched

Extremely late to the party, but the Australian National Corpus was officially launched just before Easter.

Interesting to see that it's built around Plone ....

Wednesday, 9 May 2012

Burma Burmah or Upper Burma

I've been having a little play with Tim Sherratt's QueryPic tool (which searches digitised versions of Australian newspapers before 1960).

Searching for 'Upper Burma' shows some activity, but not as much as you would expect given that the Third Anglo-Burmese War of 1885 led to the conquest of what the British termed Upper Burma and the capture of its capital Mandalay.

You also get a nice little spike in the 1850s around the Second Anglo-Burmese War.

And sure enough, if you search for the older usage "Burmah" and for "Mandalay" you do get a peak around 1885. If as a correction you also search for the more modern term "Burma" you of course get a massive peak coincident with the second world war, but also little usage of the more modern spelling prior to 1910.

If you then run a similar query on Google's Ngram viewer against the British English corpus, which includes books and not just newspapers, you get a different picture:


Interest only begins to develop post 1885 and already by then most people were using 'Burma' as opposed to 'Burmah', and that interest remained roughly constant until the second world war.

If you rerun it against the American English corpus you see (quite predictably) that interest in Burma prior to the second world war was mostly a British thing:


Run it against the English fiction corpus and we see something slightly different:


a little peak around 1885 and then not a lot of change until the 1940s (incidentally meaning that however seminal George Orwell's publication of Burmese Days in 1934 was to his development as a writer, it didn't cause a ripple overall). Interestingly if you ignore the 1885 peak the fiction corpus curve is sort of the same shape as the American corpus, perhaps suggesting that the first wave of (British) publication on Burma was non-fiction ....
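
All of the comparisons above come down to counting how often a term turns up per year. The sketch below shows that idea in its simplest form. It is not how QueryPic or the Ngram viewer work internally, the function name is hypothetical, and the snippets are made up purely to exercise the code:

```python
import re
from collections import Counter, defaultdict


def hits_per_year(articles, terms):
    """Count articles per year that mention each term (case-insensitive, whole word).

    articles: iterable of (year, text) pairs
    terms: iterable of search terms
    Returns {term: Counter({year: number_of_matching_articles})}.
    """
    patterns = {t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms}
    counts = defaultdict(Counter)
    for year, text in articles:
        for term, pattern in patterns.items():
            if pattern.search(text):
                counts[term][year] += 1
    return counts


# Made-up snippets purely to exercise the function; not real newspaper text.
articles = [
    (1885, "News from Burmah: troops advance on Mandalay."),
    (1885, "Further reports from Mandalay."),
    (1942, "The campaign in Burma continues."),
]
counts = hits_per_year(articles, ["Burma", "Burmah", "Mandalay"])
for term in ("Burma", "Burmah", "Mandalay"):
    print(term, dict(counts[term]))
```

The whole-word match matters here: without it a search for "Burma" would also count every "Burmah", which would hide exactly the spelling shift the graphs show. A fairer comparison would also divide by the total number of articles per year, since the volume of digitised text varies a lot over time.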




Mining for meanings

Photo: Cath Styles
Went to the NLA last night to listen to the excellent Tim Sherratt give his Harold White Fellowship preso on Mining for meanings (the first slide is blank - click rightwards to see).

Tim's preso was entertaining, funny and deeply interesting. I'm not going to summarise it but rather make a different and equally serious point. A lot of this text mining work was done on a laptop with scripts.

No Amazon elastic compute, no farms of VMs crunching data, just a laptop, some inspiration and access to some large data sets.

We often talk about 'citizen science' and that usually brings up a picture of geeky individuals doing species censuses. What Tim has shown is that given time and inspiration it is possible to do citizen data science.

As I've said elsewhere the tools are there. The datasets are starting to be there. All that's needed is the inspiration ...

[Update: and if you want to listen to Tim's preso the audio is online at the NLA website.]

Friday, 4 May 2012

Abiword and me

Yesterday I tweeted that Abiword didn't seem to play nicely with the latest Ubuntu release, Precise Pangolin, perhaps because they've shipped a version of 2.9.2, which is a development version rather than the more usual 2.8.6.

I haven't had time to investigate, all I can say is that I had two crashes in ten minutes, which, given I was working on a document at the time, was a little irritating.

Abiword for me has the great virtue of being lightweight and running nicely on Windows 7 including Home starter, which runs on my netbook, meaning that I can have a reasonably flexible offline editor for working on drafts. Essentially all the functionality of the Google Docs editor while being offline.

Documents can then be cut and pasted into other documents, pasted into wordpress, or indeed saved to the dropbox or google drive folders for upload when next online.

And this is one of my tricks for getting work done, when travelling or working in places without decent wi-fi (or indeed places with expensive wi-fi). It also meant that I didn't need to lug a full size laptop around (one of the real hassles of travel) when an inexpensive netbook would do.

To be sure anything half decent that allows the layout of text would work - Kate, the KDE text editor, would be fine (in fact better if I was fiddling with stuff), but there isn't anything free that works on a windows netbook (and it has to be windows as I need to use my Huawei virgin mobile broadband key with the system - I'm sure I could get it to work with linux, but I don't have the time) so I end up using abiword.

This isn't that bad, or rather wasn't. I could also use it on linux, share the documents to myself, work on them at home and at the end of the process export the document in a more mainstream format (and if I forgot, abiword worked just about enough on the Mac to allow a resaving in a more useful format).

Working on linux remains an essential. While I've long given up any hope of linux on the desktop achieving world domination the fact remains that I still need to play with linux systems as my forays with Omeka and desktop OCR show. If you don't play with this stuff it remains brochureware and you have no understanding of the realities of working with particular products.

Abiword gave me a common lightweight document editor between platforms that allowed me to lay out text nicely.

There are of course alternatives, Libre/Open office for one, but it's just a little annoying when you find that something that worked stops working for you ...


Thursday, 3 May 2012

Omeka ...

I used to build solutions for digital asset management for people but now really only manage other people doing so.

However, today I had a need to get my hands dirty - someone wanted to have us host data but allow them to front end exhibition sites with Omeka.

Now I must admit that Omeka was no more than brochureware to me but after spending yesterday afternoon upgrading my linux laptop to Precise Pangolin aka Ubuntu 12.04 LTS I had a go at installing Omeka this morning.

Using the instructions on the Omeka website (which need a little tweaking for Pangolin) I had a running install on my laptop in around 20 minutes and a dummy collection of nineteenth century faux medieval romantic paintings up and running in a further 10, complete with sensible looking metadata.

As a test it's fairly trivial, but the power and ease of use are quite amazing. If I was running a local art gallery or museum and wanted an online exhibition this would be a complete no-brainer to install and use and, just like doing your own ocr or digitisation programme, well within the capabilities of a local art gallery or history research group ...