Sunday, 13 June 2010

History, archaeology and metadata

At the recent ANDS bootcamp the discussion veered towards the use and reuse of data and metadata in the humanities, and I was hard put to come up with an example.

I said something about using the findspots of Byzantine coins and sixth-century amphorae to show that trade from the Mediterranean was primarily with the west of England and revolved around the tin trade, with the Celtic successor states swapping tin for Tunisian red slipware, among other things.

Other examples could have been Alessia Rovelli's work on early medieval coin use (and reuse), or the work of Nathalie Villa and colleagues to reconstruct a network of social obligation across space and time in the medieval Languedoc, using techniques akin to cluster analysis.

There are others. For instance, I have heard of people combining maps of Roman settlement sites in East Yorkshire with topographical and ecological data to show that the Romans did not settle near wet or marshy ground. This is interesting because one of the major activities in Roman East Yorkshire was growing grain for shipment by sea to feed the army on the Rhine frontier, and wheat, of course, prefers dry, well-drained soil.

All interesting, fascinating even, but none of them really sexy.

I then began to think about my favourite example of impermanence: Roman army pay dockets.

Broadly, between the accession of Augustus and the death of Septimius Severus, a period of roughly 250 years, the Roman army (excluding the Auxilia) consisted of 30 legions of 5,000 men. Three times a year (later four times a year) each soldier was paid and issued with a statement of account: how much he had been paid, how much had been docked for broken equipment, how much had gone to his (compulsory) savings account, how much to the burial club, and so on.

So the Roman army must have produced something like 250 × 30 × 3 × 5,000 pay dockets: just under half a million a year (450,000), or well over a hundred million (112.5 million) over the whole period. Assuming the Auxilia were paid in a similar way, we might expect to have a substantial number of these.
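The arithmetic is easy to check. The short Python sketch below simply multiplies out the post's own round figures (30 legions, 5,000 men, three pay days a year, 250 years). The constant and function names are illustrative only, and the figures are broad approximations, since the number and strength of the legions varied over the period.

```python
# Rough estimate of legionary pay dockets, using round figures:
# 30 legions of 5,000 men, 3 pay days a year, over roughly 250 years.
# These are approximations; legion count and strength varied over time.

LEGIONS = 30
MEN_PER_LEGION = 5_000
PAYDAYS_PER_YEAR = 3  # later four a year; pass paydays=4 to see the effect
YEARS = 250


def dockets_per_year(legions=LEGIONS, men=MEN_PER_LEGION, paydays=PAYDAYS_PER_YEAR):
    """One statement of account per soldier per pay day."""
    return legions * men * paydays


def total_dockets(years=YEARS, **kwargs):
    """Dockets produced over the whole period."""
    return dockets_per_year(**kwargs) * years


if __name__ == "__main__":
    print(f"Per year: {dockets_per_year():,}")  # Per year: 450,000
    print(f"Total: {total_dockets():,}")  # Total: 112,500,000
    # Upper bound on the surviving fraction if ten dockets survive.
    print(f"Surviving fraction: {10 / total_dockets():.1e}")  # 8.9e-08
```

Even taking ten survivors as an upper bound, that is fewer than one docket in ten million.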

We don't. We have fewer than ten.

Which is a pity, as these were semi-structured documents with a predictable format. Having this sort of information would tell us the actual strength of the army and let us plot the impact of major defeats on it. It would also help us understand what happened to the Legio IX Hispana, which was once thought to have marched out of York sometime around AD 100 and been massacred by the Picts, never to be reformed. That story forms the background to Rosemary Sutcliff's children's novel 'The Eagle of the Ninth', one of the things that got me started with the Romans.

The truth appears to be different. From the evidence of legionary stamps on bricks, it now looks as though the legion spent time on the Rhine frontier before disappearing on active service in Armenia, in a battle during one of the hotter phases of the alternately hot and cold war between Rome and Parthia. What crime, if any, the Legio IX Hispana committed has been lost.

The main takeaway, however, is that the availability of semi-structured documents is what makes the construction of datasets possible.

Medieval land-sale and marriage charters are another good example, as Nathalie Villa's work shows. Because they were written to a formula, it is relatively easy to extract the pertinent facts and build a dataset showing who exchanged what with whom.
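As a rough illustration of why a formula matters, here is a minimal Python sketch that pulls grantor, grantee, object and price out of charter-like sentences with a regular expression. The charter wording, the names and the `CHARTER_RE` pattern are all hypothetical, invented for this example: real charters are in Latin and far less regular, and this is not a description of Villa's method. Lines that do not fit the template are skipped, which is where a human reader would still be needed.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical template (invented, in English for readability):
#   "<grantor> sells|gives to <grantee> a|an <thing>[ for <n> solidi]."
CHARTER_RE = re.compile(
    r"(?P<grantor>.+?) (?P<act>sells|gives) to (?P<grantee>.+?) "
    r"(?:an|a) (?P<thing>.+?)(?: for (?P<price>\d+) solidi)?\."
)


@dataclass
class Transfer:
    grantor: str
    act: str
    grantee: str
    thing: str
    price_solidi: Optional[int]


def extract_transfers(texts: List[str]) -> List[Transfer]:
    """Turn formulaic charter sentences into structured records."""
    records = []
    for text in texts:
        m = CHARTER_RE.fullmatch(text.strip())
        if m is None:
            continue  # off-formula text: leave it for a human reader
        price = int(m["price"]) if m["price"] else None
        records.append(
            Transfer(m["grantor"], m["act"], m["grantee"], m["thing"], price)
        )
    return records


# Invented sample sentences, not transcriptions of real charters.
SAMPLE_CHARTERS = [
    "Bernard sells to the abbey of Aniane a vineyard at Gignac for 30 solidi.",
    "Ermengarde gives to Peter a mill on the Hérault.",
    "Witnessed by Raymond and Guillem.",
]

if __name__ == "__main__":
    transfers = extract_transfers(SAMPLE_CHARTERS)
    # Print an edge list of who dealt with whom, and over what.
    for t in transfers:
        print(t.grantor, "->", t.grantee, ":", t.thing, t.price_solidi)
```

The resulting grantor/grantee pairs form an edge list, which is the raw material for the kind of network or cluster analysis mentioned earlier.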

Likewise, one might be able to correlate the feudal obligations of landlords with the rents they received, and start to work out how much it cost to put a knight or a band of well-armed men in the field, or indeed to plot the rise of the monasteries and other church institutions.

Of course, we tend to think about medieval records because they are immediately interesting and part of our own culture, but in places the data is incomplete. If what we want to do is validate our methodology, however, there are some potentially complete datasets to work with, the records of the East India Company for one. Here was a commercial organisation that financed armies, fought wars and kept accounts, and its data was lovingly written in copperplate by clerks, making it relatively easy to digitise and OCR, and hence to automate data ingest.

If the techniques work on the late eighteenth and early nineteenth century records, it would be possible to extend them back in time to the records of the Dutch East India Company, testing them further and building up a set of proven methodologies.

We already have an example of the value of digitising old records like this. The digitisation of British naval logbooks from the eighteenth and nineteenth centuries has allowed us to improve the climate record in areas where observations are distinctly spotty.

For example, until the late 1880s, with the advent of people such as Clement Lindley Wragge, meteorological observation in Australia was distinctly hit or miss, with little of it funded by the colonial governments.

This means that we have perhaps 120 years of data at most. The logbooks of naval and convict ships could push this back a further hundred years and help us establish whether the recent drought in New South Wales is evidence of climate change, or whether the area is simply subject to periods of crippling drought such as the Federation drought around 1900.

