Wednesday 30 August 2017

Tightening the folksonomy

Well, after a couple of months on the documentation project I can say

(a) the methodology is working
(b) benchmarking the data captured against the publicly available data on Museums Victoria shows we seem to be capturing the right sort of information
(c) I'm getting really good at recognising nineteenth century pharmacists' bottles

which is kind of where I'd hoped to be.

Having benchmarked the data, I spent the morning reviewing the first tranche of entries. As I would have expected, the earlier records basically have all the information but are not structured as tightly as the later ones, so as part of the review process I went back, restructured the data, and filled in anything that was missing.

Besides documenting the remaining three and a bit thousand objects, I guess the next stage is to write some perl (or python) to transform the records into a true csv file, rather than one with sections separated by commas and subsections by colons, which would potentially allow me to spit the file out in any other format (bibtex for artefacts anyone?)
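As a rough idea of what that transformation might look like in python, here is a minimal sketch. It assumes each record follows the "description, image, comments" layout described in the methodology post of 4 August. The column names (object, closure, label, contents) are my guesses from a single example entry, not a settled schema, and the extra column is just somewhere to park anything that doesn't fit.

```python
import csv
import io

# Hypothetical column names, guessed from one example entry.
DESCRIPTION_FIELDS = ["object", "closure", "label", "contents"]
ALL_FIELDS = DESCRIPTION_FIELDS + ["extra", "image", "comments"]


def entry_to_row(entry):
    """Turn one 'description, image, comments' record into a flat dict."""
    # Split on the first two commas only, so commas inside comments survive.
    sections = [s.strip() for s in entry.split(",", 2)]
    sections += [""] * (3 - len(sections))  # pad records with missing sections
    description, image, comments = sections

    parts = [p.strip() for p in description.split(":")]
    row = dict.fromkeys(DESCRIPTION_FIELDS, "")
    row.update(zip(DESCRIPTION_FIELDS, parts))
    # Keep any unexpected description parts rather than dropping them.
    row["extra"] = "; ".join(parts[len(DESCRIPTION_FIELDS):])
    row["image"] = image
    row["comments"] = "; ".join(
        p.strip() for p in comments.split(":") if p.strip()
    )
    return row


def entries_to_csv(entries):
    """Write the records out as properly quoted csv text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=ALL_FIELDS)
    writer.writeheader()
    for entry in entries:
        writer.writerow(entry_to_row(entry))
    return out.getvalue()


example = (
    "clear glass bottle ~200mm: glass stopper: printed label wintergreen: "
    "contents (liquid) present, 20170803_105032.jpg, label 20170803_105131.jpg"
)
print(entries_to_csv([example]))
```

Once the records are in a list of dicts like this, writing them out as JSON, or anything else, is mostly a matter of swapping the writer.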

The other fun idea is to build a little online exhibit using Omeka of the more interesting bottles, and again there's enough data to generate object descriptions ....

Tuesday 8 August 2017

Repurposing an old Eee netbook for research

A long time ago, when I upgraded my EEE pc to crunchbang (which is no longer maintained), one of my ideas was to use it as a distraction free writing machine.

Having fiddled about with cherrytree as a note manager, I’ve come to the conclusion that using the machine as a distraction free research machine works:

It has an excellent, if slightly cramped keyboard, will run for a couple of hours without being plugged in to the wall, and with focuswriter for writing, and cherrytree for notes management, as well as something like retext or gedit for markdown work, and kate (or gedit) for general text file editing, the whole bundle works well, especially with opera as browser (for some reason it works better than Firefox or Chromium, coping with the EEE’s non standard screen size), and sylpheed as a mail client - not my favourite, but sufficiently lightweight to run quickly.

As before, no dropbox or other external storage - things are kept as minimal as possible.

The result?

A distraction free research machine with just enough connectivity to check items, but without all the booming buzzing confusion that a more fully specified machine would lead you into. That and its small form factor make it highly portable.

Sunday 6 August 2017

CherryTree ....

Having suggested that you could use something like cherry tree as an alternative to OneNote or Evernote in a barebones documentation solution running on linux, I thought I’d better try it out on my Xubuntu netbook.

And it’s not bad.

It’s essentially a node based note taker where one starts from the beginning and builds a set of tree structured notes, which is fine for me, as it’s how I tend to work, and it’s fairly easy to move nodes and restructure things.

As a practical exercise I built myself a set of notes about the murder of Mary Dobie, which occurred in November 1880 in Taranaki in New Zealand against a backdrop of settler/indigenous conflict. (If you are interested in reading further, there's an excellent book available from the University of Auckland press)

Basically, I did what I always do: build myself a root node, then add some subnodes containing the results of a QueryPic search to confirm that there was a spike of interest, plus a couple of relevant newspaper articles from the period, garnered from Trove or PapersPast.

This was a little more fiddly than you might think - nodes can be richtext, plaintext or markdown (yay!), but you can only attach objects to rich text nodes.

Display of the objects is dependent on external viewers, which is a little clumsy, but it does work, and it of course means that you need to put some descriptive text in the node otherwise you end up wondering quite what verylongname.png really is.

Cherry tree doesn’t really do document sharing, but you can share an individual database between members of a research group (or multiple machines), say via dropbox or any other filesharing platform, and that’s probably good enough for most purposes.

The application comes with an impressive set of options to import from other note managers, but unfortunately none of the mainstream ones, and exporting again avoids the mainstream but it does allow pdf and html export, which again covers most options, including creating either a set of html pages or a single unified document.

So, not perfect, but perfectly usable as a lightweight alternative to one of the big boys ...

[update 07 August 2017]

Just for fun I added cherrytree to my old Eee PC701 linux netbook and imported my test database, and everything worked well, the only downside being that the slightly newer version I installed no longer supports markdown ...

Friday 4 August 2017

Documenting artefacts - the methodology

Yesterday, while working on the documentation project that I’ve volunteered for, I had a couple of interesting conversations about what I was doing and what I’d found so far, one with a lady from a local history society in New Zealand, and the other with a postgraduate history researcher.

The New Zealand lady was particularly interested in the how, and thinking about it, while the final destination of the information is an InMagic artefact catalogue, you could use the methodology for just about anything - I have, for example, thought about extracting the more interesting items and building a little exhibition with Omeka.

The data is entered into an excel spreadsheet in a semi structured manner so an entry will look something like this:

clear glass bottle ~200mm: glass stopper: printed label wintergreen: contents (liquid) present, 20170803_105032.jpg, label 20170803_105131.jpg

so the structure is basically

description, image, comments

with the three components separated by commas. Inside each section, subcomponents are delimited by colons to make it easy to split up the text. The description section always follows the same structure, and uses a tacit controlled vocabulary (a folksonomy) of standard terms, so the label can be no label, handwritten, printed or typewritten, to make parsing easy. The description is always followed by the name of a jpeg file, which also gives you the date of the description. That is purely accidental: I’m using my Samsung Galaxy to photograph the objects as I go, the camera being good enough for documentation purposes, and that’s just how it names its files - a useful accident.
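To show how easily that structure splits up, here is a small python sketch that parses the example entry above and recovers the date and time from the image name. The function names are my own, and it assumes the phone always names files YYYYMMDD_HHMMSS.jpg, as it does in the example.

```python
from datetime import datetime


def parse_entry(entry):
    """Split one entry into description parts, image name and comment parts."""
    # Sections are separated by commas: description, image, comments.
    # Only the first two commas are used, so comments may contain commas.
    sections = [s.strip() for s in entry.split(",", 2)]
    sections += [""] * (3 - len(sections))  # tolerate missing sections
    description, image, comments = sections
    return {
        # Subcomponents inside a section are separated by colons.
        "description": [p.strip() for p in description.split(":")],
        "image": image,
        "comments": [p.strip() for p in comments.split(":") if p.strip()],
    }


def image_timestamp(image_name):
    """Read the capture time out of a YYYYMMDD_HHMMSS.jpg file name."""
    stem = image_name.rsplit(".", 1)[0]
    return datetime.strptime(stem, "%Y%m%d_%H%M%S")


entry = (
    "clear glass bottle ~200mm: glass stopper: printed label wintergreen: "
    "contents (liquid) present, 20170803_105032.jpg, label 20170803_105131.jpg"
)
record = parse_entry(entry)
print(record["description"])
print(record["comments"])
print(image_timestamp(record["image"]).date())  # 2017-08-03
```

If a file name doesn't match the pattern, strptime raises a ValueError, which is a handy way of spotting entries where the image name was mistyped.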

The comments section is basically a ragbag, extra images, text embossed on a bottle, and so on, but where possible standard terms are used and everything is always colon delimited.

Rather than go for a complex entry form, I thought it better to go for a simple bare bones approach and structure it so that the information could be post processed and fixed up later with a bit of perl and regex.

Much the same applies to using a folksonomy - as at the start I didn’t know what terms to use, it struck me as simpler to make one up and let it evolve. Most times the terms are standard, but if a new one is needed, so be it.
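One way of keeping an eye on an evolving folksonomy is to tally which label terms actually turn up and flag any that aren't on the list yet. The python sketch below does that for the four label terms mentioned earlier. It assumes the label part of a description always contains the word "label", which is my assumption rather than a rule of the methodology, and the second and third sample descriptions are invented purely for illustration.

```python
from collections import Counter

# The standard label terms listed in the post.
LABEL_TERMS = ["no label", "handwritten", "printed", "typewritten"]


def classify_label(description):
    """Return the label term used in a description, or flag it for review."""
    for part in (p.strip().lower() for p in description.split(":")):
        if "label" in part:  # assumption: the label part mentions "label"
            for term in LABEL_TERMS:
                if part.startswith(term):
                    return term
            return "unrecognised: " + part
    return "missing"


descriptions = [
    # From the example entry in the post.
    "clear glass bottle ~200mm: glass stopper: printed label wintergreen: "
    "contents (liquid) present",
    # Invented for illustration only.
    "green glass bottle: no stopper: no label: empty",
    "brown glass bottle: cork: stencilled label: empty",
]

tally = Counter(classify_label(d) for d in descriptions)
for term, count in tally.items():
    print(term, count)
```

Anything reported as unrecognised is either a typo to fix or a new term to write up in the documentation file.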

Of course this all needs to be documented, so in parallel to the spreadsheet I have a markdown file which records progress and any changes. I chose markdown as it lends itself to structured documentation and can be easily converted to other formats.

In addition, working notes, background information on various suppliers and the like is stored in OneNote, for the simple reason that I’m using a windows machine that came with OneNote, and I’m not supposed to install additional software, otherwise Evernote would be an obvious alternative.

Data is backed up to a USB stick and then to a OneDrive account. While I do have access to the internet on the project, the connection has quite limited bandwidth - enough for email and web searches, but not enough for backup. Even a OneDrive sync can be tedious.

In the ideal world, I would have access to some secondary local filestore, but I don’t have that, so I back up the data at home, where I have a reasonably fast connection, to my personal OneDrive store, purely because I have storage to burn at the moment as our ISP gave us a fairly generous chunk of online storage as part of our package, and it’s stupid not to backup the data.

However, while it’s not ideal, it shows that the methodology is adaptable, and while it would be preferable to have an internet connection, it can be used for documentation work offline, something possibly important for onsite documentation in remote locations.

The same goes for the software used. I’m using a windows machine with office, onenote and the windows markdown editor. I could equally well use libre calc, evernote, or typora on either windows or a mac. I could even use an old repurposed machine with ubuntu installed. The only crucial parts of the methodology are bluetooth to transfer pictures from my phone and support for some sort of external storage device. Otherwise, it’s a spreadsheet, a text editor and some sort of note taking tool such as Laverna, Simplenote or Cherrytree.

Due to the lack of dependence on paid for software, the access cost is fairly minimal. It’s possible to pick up a refurbished thinkpad, admittedly with Windows 8, but with a decent warranty, for around three hundred dollars, and if Windows 8 isn’t your thing, Ubuntu is a simple and pretty automatic install.

As I said, now that smartphone cameras are as good as point and shoot there’s no need to invest in a separate camera, but if required, small end of range point and shoot cameras from manufacturers you’ve heard of are fairly easy to find at an affordable price.

So, the methodology is straightforward and has few prerequisites. How do I use it?

Well, the artefacts are documented one by one, initially in longhand in a notebook, which has turned out to be easier than typing directly when transcribing faded labels and embossed inscriptions using the Leiden conventions, and then photographed.

The images are then transferred to the laptop via bluetooth and the image names recorded in the notebook. And then the record is added to the spreadsheet. Every half hour or so I save copies of the spreadsheet, markdown documentation file and the images to the usb stick for later transfer and backup to OneDrive using my home laptop.
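I do the half hourly save by hand, but it could be scripted. The python sketch below copies the spreadsheet, markdown file and images into a timestamped folder on the stick. The folder layout, the file extensions and the file names in the demo are all assumptions for illustration, and the demo runs against temporary directories rather than a real usb stick.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_working_files(work_dir, stick_dir, suffixes=(".xlsx", ".md", ".jpg")):
    """Copy working files into a new timestamped folder under stick_dir."""
    target = Path(stick_dir) / datetime.now().strftime("backup_%Y%m%d_%H%M%S")
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(work_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes:
            shutil.copy2(path, target / path.name)  # copy2 keeps timestamps
            copied.append(path.name)
    return target, copied


# Demo with throwaway directories and hypothetical file names.
with tempfile.TemporaryDirectory() as work, tempfile.TemporaryDirectory() as stick:
    for name in ("catalogue.xlsx", "progress.md", "20170803_105032.jpg", "scratch.tmp"):
        (Path(work) / name).write_text("placeholder")
    target, copied = backup_working_files(work, stick)
    all_present = all((target / name).exists() for name in copied)
    print(copied)
```

Using a new folder for each save keeps the earlier copies, at the cost of space on the stick, so old folders would need clearing out from time to time.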

Anything interesting, such as an unfamiliar manufacturer's name, is googled and a note added to OneNote. Bottles are highly collectable, so besides standard resources such as Collections Victoria, collectors' personal sites can be a useful resource, as can ebay, since collectors often have more detailed knowledge of particular bottles than museum curators.

So, in essence, keep it simple, use formats that can be easily read and document everything ....