Tuesday 30 March 2010

Universities, Cuts and the GFC

Back in February I blogged about the end of paleography at KCL and generally about how when things are tough it's humanities and the creative arts that suffer. Since then I've posted links to news of protests against university cuts as far apart as California and Sussex, not to mention the possible end of classics at Leeds and plans to reduce the humanities at ANU.

All very lamentable.

However one also needs a degree of perspective:
  • Universities are expensive to run, no matter how tatty they may look
  • Arts faculties, while cheaper to run than the sciences, still consume resources
  • It is genuinely difficult to measure the "impact" or "value" of the arts, whereas the financial benefits of research and its applications can be calculated
  • in a financially limited world teaching Sanskrit, for example, does not look like the best use of resources
  • saying "it's disgraceful" has never fixed anything however true it may be
And the arts faculty do not always sell themselves. Too posh, too useless.

So let's look at classics. Why should we study them? After all it seems to be mostly about dead, mostly male, people who either fought interminable wars, created unstable undemocratic hegemonies, or walked around wearing bedsheets and engaged in sexual practices that would make Aunt Hermione blush.

Well yes, but they're conveniently dead, wrote a lot about what they did and so form a convenient test bed for thought experiments about politics and power. And they have had a powerful hold on our imaginations since the Renaissance.

And if we don't understand them we can't really teach ourselves Latin and Greek, because learning a language is really about learning a culture - something that the experience of learning Russian during the cold war has shown me. Twenty years on, knowing what UniverMag and Gastronom were, and why they were different, is irrelevant. They don't exist, but then that knowledge encompasses knowledge about a particular form of social and economic organisation, and its pluses and minuses.

So with Latin. It helps us get inside the minds of the cultural ancestors of the west, and incidentally helps you learn Portuguese, Spanish, Italian or Catalan (it has always fascinated me that I can speak what is actually bad Spanish with some Italian vocabulary and be understood anywhere from Lisbon to Barcelona, and from Milan to Bari), which given the importance of the European Union, plus the rise of Brazil, Chile and Argentina as economic powers, must be worth something.

It'll also get you a decent cup of coffee at the cafe at the end of P Street in Washington.

And so with History, English, Drama and whatever. It lets us look at ourselves. Or indeed Fine Art. Completely useless. Except where do graphic designers get their ideas of light and shade from? And advertising executives their design ideas?

So, the arts need to make their case. In hard times they need to demonstrate that they are not without value. And yes, Paleography is always going to be a difficult sell - but languages, or classics, surely not that difficult?

Saturday 27 March 2010

Archival filesystems

Over the past few years of this blog one of the threads has been that of filestore design, in particular student filestores, which have the annoying attributes of being unstructured, chatty and having lots of small files.

In contrast filestores designed for archival purposes are

  • not subject to rapid change
  • objects are only deleted according to business rules
  • ~ 90% of the content is accessed only once

However they are unstructured and do consist of lots of small files, be they PDFs of research papers, invoices, recordings of Malian ritual chants, or whatever.

As these files need to be kept for a long time, and we’ll say that a long time is more than one technology refresh, we need to guard against corruption, and also have an easy means of migrating content faithfully from one storage platform to another. Equally, as most of the content is unchanging, conventional backup doesn’t really work: because the files are rarely if ever accessed, one never knows whether a given version of a file is corrupt or not.

So how to do it – it’s really quite simple:

Most archival file systems sit under a document or content management system of some sort, which implies a workflow and ingest process. Most of these document or content management systems will have a database that holds information – metadata – about objects in the filestore.

  • Modify the workflow so that you compute the MD5 checksum on ingest, and record that in the metadata held about the object.
  • Write the object to the filestore.
  • Copy the object to a second filestore, preferably in a different city or, failing that, on the other side of the city.
  • Copy the database transaction to a copy of the database in this other location. Remember that without a copy of the database you can’t tell what’s in your filestore if you lose your primary copy.
  • Create a cron job that periodically checks the MD5 checksum of the objects on disk against the value stored in the database.
  • If an object fails the checksum check, run the same check against the remote copy. If that passes, copy the good copy back.
  • Run database consistency checks.
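As a rough illustration of the steps above, here is a minimal Python sketch. It is not taken from any particular content management system: the function names (`md5_of`, `ingest`, `audit`) are made up for the example, the two filestores are assumed to be plain directories, and an in-memory dict stands in for the metadata database, which in practice would be replicated to the second site as well.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ingest(source: Path, primary: Path, replica: Path, catalogue: dict) -> str:
    """Record the checksum on ingest, then write the object to both filestores."""
    checksum = md5_of(source)
    catalogue[source.name] = checksum  # hypothetical stand-in for the metadata database
    for store in (primary, replica):
        store.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, store / source.name)
    return checksum


def audit(primary: Path, replica: Path, catalogue: dict) -> list:
    """Periodic check (the cron job): verify each object against its recorded checksum.

    Objects that fail locally are restored from the replica if the replica
    still matches. Returns (name, outcome) pairs for every object that failed.
    """
    problems = []
    for name, expected in catalogue.items():
        local = primary / name
        if local.exists() and md5_of(local) == expected:
            continue  # object is intact, nothing to do
        remote = replica / name
        if remote.exists() and md5_of(remote) == expected:
            shutil.copyfile(remote, local)  # copy the good copy back
            problems.append((name, "repaired"))
        else:
            problems.append((name, "unrecoverable"))
    return problems
```

In this sketch `audit` only repairs the primary from the replica; a real deployment would presumably run the same check in the other direction too, and log the results somewhere a human will see them.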

Now this is very simple. No fancy clustered fault tolerant filestores. But it ticks the boxes. It’s essentially an updated version of the consistency checking found in SAMFS/QFS for long term storage of data to tape to guard against bitrot.

The UK mirror service, as implemented at the end of the nineties, worked essentially like that: two geographically separate nodes with a consistency check and replication.

And the joy of it is that it could potentially be built with cheap off the shelf technology, say two separate Apple Xsan implementations. It’s also extensible, as nothing in the design precludes adding a third node for increased resilience, or indeed for migrating content.

Equally, abstracting the automated data curation from the properties of the filesystem both allows easy migration to new filesystems and avoids expensive vendor lock-in, two things that are important in any long-term, curation-centric design.

Sunday 21 March 2010

The $83 machine lives

Rumours of its demise appear to be a bit previous to the event. Yesterday, out of curiosity I turned it on and it booted, admittedly only seeing 64MB of RAM, and incidentally proving that CrunchBang Linux will run with a window manager in 64MB, albeit rather slowly.

Reseating the memory cured that, and there it was, back in the land of the living. Its apparent demise coincided with a weekend of exceptionally heavy rain in Canberra, and I suspect it just got damp, even though the garage apparently remained dry.

The question of course, is how long will it continue to be with us, and whether this was an indication of mortality …

In the meantime, despite my initial disquiet about being reassimilated by the borg, I have come to appreciate the strengths of Windows 7 and the advantages of using a mainstream operating system, for despite the growing popularity of OS X it simply doesn’t have the penetration of Microsoft operating systems.

Linux on the desktop remains an oxymoron, despite its great power and versatility. Despite the similarities in window managers between Windows 7 and Gnome, people are more comfortable with the Microsoft environment, and with installing software within it.

And until either a major vendor seriously backs it, or there is a must have killer app that exists only on Linux, desktop Linux will remain the preserve of a few oddballs like myself.

Thursday 18 March 2010

digital decor

I’ve recently touched a couple of times on the uses of Flickr photostreams as an artistic resource for teaching and learning, and as a means of presenting a virtual exhibition.

Equally there are a lot of good quality digital images of art out there, and as I waited at the lights on the way home I was suddenly struck by the following ideas:

1) get a cheap digital photoframe, download a set of images, and hey presto – instant art for the student apartment. And given people don’t have time to do this themselves, you could imagine a whole industry selling pre-rolled SD cards of Renaissance Italian paintings or medieval manuscript illustrations …

2) couple this to one of these new fangled cheap mini data projectors, aim it at a suitable surface, and hey presto – instant art on your wall, in the loo, on the dining room ceiling or whatever …

I’m strangely taken with this idea.

Medievalism, Orientalism, and a view of reality

The Pre-Raphaelites, the faux medievalism of William Morris, and the Orientalism of Rossetti have two things in common. Colour and exoticism. Things sorely lacking in mid Victorian England.

And as such they had a tremendous influence on Victorian England, letting colour back into a very staid, monochrome, hidebound society. They also perpetrated a view of medieval history and of the Orient that was colourful and exotic. No open drains, flies, or disfiguring diseases, as that spoils the aesthetic power of the image. All elegant and colourfully robed people, plus, and only where appropriate of course, a bit of female nudity.

You can see the same at work in the classical paintings of Alma-Tadema, which despite their claims to historical accuracy could be construed as essentially a means to legitimize looking at paintings of naked late Victorian middle class women - his tepidarium painting being a particularly blatant example.

Nor was France immune from the charms of Orientalism and the resultant image of Algeria as a rose-tinted place of exotic carpet dealers and odalisques. (They also went to work on Indochina, and even today in Vientiane you can buy postcards of exotic half naked hill tribe women - sex and exoticism - a powerful combination.)

Now all of this wouldn't matter a damn if it wasn't for the very powerful hold Orientalism has taken on our collective imagination. As far as medieval and classical history is concerned it's probably been fairly harmless, and turned as many people on to history as it has annoyed.

But as regards the Orient, or more particularly what the Pre-Raphaelites would have known as the Levant and points east and we call the Middle East, it has been a disaster, obscuring our understanding of these cultures and societies, with people falling back on 100 year old stereotypes.

And to be fair, the Middle East has played along - look at any carpet shop in Turkey catering to western tourists - making and selling souvenirs that speak of exoticism, no matter that off duty the salesmen wear hoodies and jeans, and drive Toyota pickups.

However this failure has crippled us, blinding us to reality while we go blindly on, pretending to be Lord Curzon, enthralled by the remnants of a vanished world ...

Wednesday 10 March 2010

Cardstar and Library cards

Ah the joys of working for a university. So many bright, young and enquiring minds, and these days seemingly all equipped with an iPhone.

So I guess it was only a matter of time that some of them would think of using Cardstar, an iPhone app for managing barcoded cards like supermarket loyalty cards, to keep a copy of their library card barcode and turn up at the issue desk presenting their iPhone to be scanned.

Of course, the devious could have easily added their girl/boyfriend's card to the collection as well, not to mention the barcodes of their last two ex-g/bf's in order to ensure that they get all the relevant books for that term paper. (And indeed the malicious or devious could run up fines on the ex-g/bf's account or hog books to stop others using them - libraries - all human life is there.)

This isn't a problem per se for manual book checkouts - issue desk staff can either demand to see supporting id, e.g. a driver's licence, or indeed if they don't have their official card tell them to bugger off (in the nicest possible way, of course).

But of course we are all high tech and have automatic checkout machines which work by scanning the item barcode and the library card which are placed together in the scan zone, meaning it would be quite easy to take books out under someone else's barcode.

Fortunately, more by accident than design, the book issuing terminals seem not to cope with scanning iPhones and fail to issue a book nine times out of ten.

This doesn't mean that it isn't a problem, just that we've got a little time to come up with a solution.

Zentity

Possibly late to the party again, I've just happened across Zentity - a scholarly outputs repository platform from Microsoft Research.

And conceptually it's rather interesting - basically take information search and build connections between papers, and between researchers, based on citations, using social network style analyses to describe those connections. That allows one to come up with results such as 'A and B of the School of Z have cited papers O, P and Q by G of the School of Y - shouldn't you guys be collaborating?' etc etc - basically computing the entity relationships just as Entity Cube allows the discerning of weak ties.

Why is it valuable? Firstly because it allows people to be more productive, and secondly because it actually assesses the relevance of research - as in the following crude example:
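To make the 'shouldn't you guys be collaborating?' idea concrete, here is a toy Python sketch. It has nothing to do with how Zentity is actually implemented: the function name `suggest_links`, the three input structures and the `min_papers` threshold are all assumptions made up for the illustration. It simply groups citations by (citing school, cited author) and reports cross-school cases where enough distinct papers have been cited.

```python
from collections import defaultdict


def suggest_links(citations, author_of, school_of, min_papers=2):
    """Find schools whose members repeatedly cite an author from another school.

    citations  -- iterable of (citing_author, cited_paper) pairs
    author_of  -- dict mapping each cited paper to its author
    school_of  -- dict mapping each author to their school
    Returns a list of (citing_authors, citing_school, cited_papers, cited_author).
    """
    citers = defaultdict(set)
    papers = defaultdict(set)
    for citing_author, paper in citations:
        cited_author = author_of[paper]
        if school_of[citing_author] == school_of[cited_author]:
            continue  # same school: they presumably already know each other
        key = (school_of[citing_author], cited_author)
        citers[key].add(citing_author)
        papers[key].add(paper)
    return [
        (sorted(citers[key]), key[0], sorted(papers[key]), key[1])
        for key in sorted(papers)
        if len(papers[key]) >= min_papers
    ]


# Usage, mirroring the example in the text (all names are placeholders).
citations = [("A", "O"), ("A", "P"), ("B", "Q"), ("B", "O"), ("C", "O")]
author_of = {"O": "G", "P": "G", "Q": "G"}
school_of = {"A": "Z", "B": "Z", "C": "Y", "G": "Y"}
for who, school, cited, author in suggest_links(citations, author_of, school_of):
    print(f"{' and '.join(who)} of {school} have cited {', '.join(cited)} by {author}")
```

A real system would of course weight the links, disambiguate author names and so on; the point is only that the raw material is nothing more exotic than a citation list and a staff directory.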

A researcher publishes a research paper on gut parasite remains in Viking cess pit deposits (and yes, I do know someone who has published such research) and can statistically demonstrate the likelihood of a population being infected with which parasites. An epidemiologist separately looks at the occurrence of gut parasite related diseases in itinerant goat herders in upland Turkey. Bringing the two together could possibly show
  • eighth century users of Viking privies had a lot to do with goats (or not)
  • the expected pattern of diseases has changed over time (or not)
  • and if it hasn't changed much a suitable course of treatment for these Turkish shepherds may be valuable to goat herders from Spain to Kazakhstan
and you can only make such an argument by bringing together these two disconnected bits of research. As they say - very interesting ...

Sunday 7 March 2010

Document formats

I have recently begun playing with the Office 2010 beta in an effort to understand the implications of a migration from Office 2003 direct to 2010.

The problem of migration is not one of document formats per se, but rather of compatibility for things such as Word and Excel macros.

Document formats are a done deal. Like it or loathe it the 97/2000/2003 doc and xls formats are the de facto standards for document interchange. I work quite happily, never using Word or Excel, but instead using Open Office and Google Docs for most purposes. In fact to be honest, as far as word processing goes, I find that if I want a local application AbiWord does everything I need – fast, lightweight and responsive.

Couple this with Google’s recent purchase of DocVerse, a company which made its money from enabling the easy sharing of Office documents across the web, and we start to see a direction – and one that’s important for cloud computing and collaborative working. While not connected, these two seem to go hand in hand, as the one promotes the sharing of resources and simplifies the mechanics of the other.

Get away from file systems and start having collections of stuff, like pictures here, writing here, notes here, work related material there and so on, and one reaches the point where it actually becomes irrelevant where things are. Essentially a SharePoint style workspace abstraction - rather than the use of products such as Sakai and Moodle to provide a sharing and collaboration platform with their lack of tool and desktop integration.

So with Gladinet I have my Google Docs folder, my Windows Live filestore and my home drive at work all connected, and I have Dropbox as effectively an online thumb drive of live crucial documents.

And while it’s not quite all there yet, it means that I can work on a document on my Mac or XP machine at work, on my Windows 7, Mac or one of my Linux machines at home, or indeed on my little travel computer from a coffee shop in town. All can use Google Docs, all can upload to the SkyDrive, even if they can’t all mount it directly (yet).

This of course means that for the sharing of documents across platforms the document format is crucial.

In an ideal world this should be XML based, and should probably be ODT, as it is well known and passes the archivability test of having a range of independently developed applications able to make use of files created in that format.

Important, as it means that in principle a Martian with a compiler could write a program to display the text, the formatting and the associated metadata accurately.

Well unfortunately it isn’t. That battle has been lost long ago due to Microsoft’s success in selling earlier versions of Office such that the world and his cat used files in these formats – and created a quasi monopoly for these older office formats by disposing of the opposition. In fact their success with the older office formats was such that it inhibited the takeup of their newer OOXML formats due to the chaos that would potentially ensue from having Sales on 2003 and Engineering on 2007.

However, these older office formats are now well known, and the success of applications such as Open Office, AbiWord, Google Docs and Zoho Docs in rendering them means they pass the archivability test.

It also means that we are stuck with them for archive purposes, meaning that it is possibly time to realise that, imperfect as they are, they are also de facto preservation formats, and while applications such as Xena that seek to normalise them are laudable, they are really only performing a conversion from the de facto to the de jure.

It also means that anywhere that wants to start a collaborative environment needs to understand that the thing about sharing (and collaborating) that you can’t predict is what sort of computer your collaborator has, and hence what software they have access to. In short it means that you need to agree a set of acceptable document format standards, not application standards as in the past.

Friday 5 March 2010

that damned arts faculty ...


link to original on flickr :: Originally uploaded by moncur_d.

seen on campus this morning - the students are protesting cuts to humanities courses...

(for those of you not at ANU the man portrayed is our VC, Ian Chubb)


Monday 1 March 2010

Entity Cube

Happened across Entity Cube, an interesting research project by Microsoft, where they are looking to search web pages, identify entities (people, things, cats or whatever) and build a guanxi or connection map between them - in other words graph out the links (sociology speak: weak ties) between people on the basis of their snail trail through the web, rather in the way that some researchers in France did a year or so ago working through medieval land transactions.

It's in the early stages and neither the data set they are working with nor some things such as disambiguation are that great, but it's definitely one to watch...

moodle, flickr, you tube and learning object repositories

Building out from my previous post, we face a similar case in the increasing use of Flickr images, and Flickr sets, in courses, be it art history or histology. The same is true of YouTube videos, which are again increasingly embedded in online courses.

Now we could just simply back up the local components of the course material and trust that Flickr, YouTube and the rest will just be there - certainly that has the merit of saving on disk space. Apart from the risk of the material disappearing off YouTube or Flickr, or whatever, that's probably just about tenable if all you want to do is make it available to the end of semester as a revision aid, much in the same way we do with lecture recordings. But if we want to archive it for reuse, or at least re-editing, we have all the problems of archiving that we have for long term preservation.

This also brings me to a second point. Much of academic digital preservation is focused on the low hanging fruit of journal articles combined with open access policies. Undoubtedly laudable, undoubtedly important, but very rooted in a model of scholarly discourse in the sciences where the model is:
  • get funding
  • do the research
  • publish it in a reputable journal
  • get more funding
and where we are talking about single objects and a model that has not much changed since Charles Darwin presented a paper to the Linnean Society. And as such this model has informed bibliometric attempts to quantify the impact of research, which can at worst become a self fulfilling prophecy - Professor A does good work, Professor A gets his work published in the Journal of Important Things, Professor A gets more funding, Professor A hires more RAs, Professor A's team does good work.

This starts to fall apart for the humanities and creative arts, and areas such as computer science where books, presentations, conferences, exhibitions are the main means of building reputation. And it also fails for learning technologies.

More importantly, if we were to revisit the question of teaching quality assessment in place of research quality assessment, how would we do it for online, or online supported, learning without comprehensive archiving of teaching resources?

Under TQA art history departments would show things like the comprehensiveness and quality of their slide libraries to show the degree of support that they had for particular courses.

Today, you would present the material electronically, and you would definitely not want a 404 at a critical point in proceedings ...