Tuesday, 23 December 2008

hmong in french guiana

Almost exactly three years ago we went to Laos and northern Thailand, one of our most fascinating trips ever. Certainly as fascinating as byzantine and hellenistic period ruins in Turkey, and that's saying something.

Desperately poor, and utterly fascinating, the history of that area got well under my skin and I've maintained an interest in it ever since, including in the region's various hill tribes, among them the Hmong.

The Hmong basically backed the wrong side. Some stayed, a lot left. While I knew that many had relocated to Nebraska, not always happily, what I didn't know was that some were happily farming vegetables in French Guiana, and generally making a go of it ...

Monday, 22 December 2008

metaphors, not interfaces

fluxbox screenshot

A few weeks ago I blogged about how interfaces had to look real - and I'd still stand by that - but I've been thinking some more about this.

When we talk about interfaces for computer operating systems we really mean the desktop/window manager, and what we're really talking about is a set of expectations, which are to a large extent governed by XP.

Everyone has used XP, everyone knows how to find their way around XP via the start menu. Other well known interfaces work the same way: KDE has a start menu, gnome has pull-downs, as does OS X.

Interestingly, though apocryphally, it's apparently easier to move first year information sciences students across to kubuntu than to straight ubuntu, for the simple reason that it's more XP-like in appearance, and XP is what they overwhelmingly use in school.

So we could say we have two common metaphors, the XP metaphor and the gnome/OS X metaphor. Makes one wonder how quite a different, minimalist desktop, eg icewm, would fare in usability testing - given that it breaks the set of expectations, the metaphor, that makes something intuitive.

Same goes for browsers. Same goes for word processors. People could move to Open Office more easily than to Word 2007 purely because it was closer to their expectations of how menus were structured. Even mobile phones are prone to the same problem - most people know how to find their way round a nokia - give them a samsung and they're stumped.

So metaphors are like memes - the collection of ideas that people have about how things are going to work and how things are going to be structured. Step away from the metaphor and people perceive the result as difficult, needing extra training etc, and hence the cost goes up, etc etc.

And this need for metaphor conformance means that everything ends up being the same - great for transition, poor for innovation, and makes radical change difficult, which goes back to my remarks about the linpus interface - they could have used the native xfce and probably got away with it. They could have customised it to make it look like XP. They didn't: they made something simple and self evident. What they didn't do was use a lightweight manager such as fluxbox or icewm.

Obviously I'm not privy to why, but I'm guessing metaphors had something to do with it ...

Saturday, 20 December 2008

austlang ...

Austlang, the online database of Australian indigenous language resources is finally live, thanks to sterling work by Kazuko Obata.

Good to see a project that you were in at the beginning of finally ship.

Wednesday, 3 December 2008

Tom Graham

Tom Graham, the university librarian at Newcastle and previously York, has died.

Tom was a decent man, a confirmed cyclist and most of all, an old school librarian, but one who understood fully the implications of digital media. I didn't know him particularly well, nor did our paths cross that often, but when they did he was always fair and never took himself too seriously. Famous, or perhaps infamous, for once asking a puzzled 'what is telnet?' in the middle of a university computing committee meeting, Tom championed the use of digital resources in libraries and helped push the adoption of digital media.

And, underneath it all Tom was still a country boy. I'll always remember the summer morning when I turned up for work to find an escaped herd of cows blocking University Road and Tom in his suit enthusiastically shooing them off the road and onto some open grass beside the library.

Memories like that make working at a university utterly unique ...

Friday, 28 November 2008

social networking tools get the news out

Clearly not the happiest topic, but there was an interesting article in the Telegraph about how flickr, twitter and blogs spread the news of the terrorist events in Mumbai.

The same sort of thing was seen in the Californian bushfires, and I dare say the Thai blogs are full of news from inside Bangkok International airport.

The point is that the technology makes it simple to get the information out and enables the viral spread of news. It's not journalism, it's information, and while there are risks in rumours, misreports and too many reports overwhelming people, it provides a channel to get the information out. No more can bad things happen behind closed doors - as we see in the rise of social media in China, or the Burmese government's ham-fisted attempts to block uploads of mobile phone footage. If the death of privacy also means the end of secrecy, that might be a deal worth living with ...

Wednesday, 26 November 2008

It's the interface stupid

Over the past few months I've been blogging about things like Zonbu and Easy Neuf, building a whole range of linux vm's, installing linux on old ppc imacs and also playing with a little Acer Aspire One I've got on loan.

Building the imacs was useful - it demonstrated just how little cpu and memory you needed to run a desktop to do the standard things - a bit of text processing, web mail, google docs, and in my case a usenet news reader for the two posts a week I'm interested in. I'm now convinced that an eight year old machine can make a perfectly useful web centric device.

Likewise, having built various vm's I've come to the conclusion that gnome and xfce are usable window managers. Fluxbox and IceWM, both of which work and use even less in the way of resources than xfce, are usable too, but they're a bit rough round the edges, and a desktop manager needs to be slick. Apple did this with aqua and convinced a whole lot of people that unix on the desktop was a viable option precisely by not mentioning unix and hiding it behind a nice interface. Microsoft to an extent tried to sell Vista on the idea that it had a slicker look than the XP interface, and of course we all know that the version of gnome that ships with ubuntu has more than a passing resemblance to XP.

Interfaces need to be intuitive, and while they can be different from what people are used to they can't be too different - and everyone knows xp.

And that takes us to the Acer Aspire. A fine machine - so fine I might even buy one, but the interface, based on xfce, is dumbed down - so dumbed down that it doesn't look like a 'real' operating system, and consequently makes the xp version of the aspire look like a 'proper computer'.

Run xfce natively and it looks like a 'real' machine. The functionality is exactly the same, the applications work the same, but the interface makes it look second rate.

Now linux has a scary reputation out there involving beards, sandals and unfortunate trousers. As I said, Apple got away with things by not mentioning that OS X is really BSD, so I'm guessing Acer decided that using Linpus was simple and wouldn't have a support load. But the fact that if you want to do something sensible, like install skype, you're back playing with the command line shows that it's dumbed down too far - a simple version of xfce (or fluxbox - after all, if you can develop linpus you have the resources to sleek up fluxbox) would have been better.

That would make it look like a real computer - and that's what people want. They don't want to look different, or like a freak.

Tuesday, 25 November 2008

OpenSolaris 2008.11 RC2

Way back in May, I built an open solaris install using 2008.05 on virtualbox.

And after some faffing around it worked pretty well, even if it was just another unix desktop. Could have been Suse, could have been Fedora. Six months later, and while it's still only rc2 and not the final distribution version, I decided to have a go at building 2008.11. This is not a like for like comparison: while I'm again building it on VirtualBox I'm now using version 1.6, and there's always the possibility that Sun have improved its ability to run OpenSolaris distros.

The live cd image booted cleanly, and the installer used to install the system to disk was clean, self evident and slicker than the previous version, running reasonably quickly with no errors encountered. Ignoring the false start caused by my forgetting to unmount the cd image, the boot process was fairly slick and professional looking, with a single red line in a rotating slider to let you know things were still going on (that and the hdd light). Like many modern distros there was little or no information about what was happening during the process (personally I like the reassurance of the old startup messages).

The boot and login process was utterly uneventful, with the vm connecting straight to the network (unlike the slightly annoying debian and suse extra mouseclick). The default software install was utterly standard and gnome like, and again open office was not installed by default - probably because it won't fit onto the distribution disk.

The package installer is noticeably slicker, with more software available, and is intuitive to use.

Generally, it looks and feels faster than the previous 2008.05 distro and somehow looks more polished and professional. Definitely worth trying and building on a 'real' pc. Like Suse it probably wants more resources than ubuntu and really legacy hardware might struggle, but on recent hardware it should be fine.

Monday, 24 November 2008

google earth as a survey tool for roman archaeology?

Earlier today I twitted a link about an English Heritage study of aerial photographs of the landscape around Hadrian's wall to pick out evidence of the landscape's history, including celtic settlements and post roman early medieval settlements.

And then I remembered a report from earlier this year of archaeologists using Google Earth to search for the remains of sites of archaeological interest in Afghanistan.

And then I had a thought. Is google earth good enough to search for missing Roman period remains, not to mention celtic hillforts and the like? If so, one could imagine people doing detailed scans of individual 1km blocks to look for likely sites.

After all, they tried the same thing looking for Steve Fossett. It might just work as a strategy.

perhaps I should try it ...

Sunday, 23 November 2008

PPC imac redux

A few months ago I installed ubuntu on a 1998 imac. The machine turned out to be so useful as a web/google terminal that when I chanced across a 2001 G3/500 for ten bucks on a disposal site I bought it as sort of a spare/additional model with half an idea of using it as a replacement, given that it's twice the speed of the original model, even though it only came with 128MB RAM.

So I got it home, turned it on, and found it whined - although it did get better as it settled down, so possibly it's just that the bearings in the fan have dried out, or possibly something more serious - after all, when you pay $10 for something it's no great loss if you end up cannibalizing it for spares.

So, I put the ubuntu cd in and held down C as it booted to force it to read the cd. Not a bit of it - it had been configured to boot off the network by default. OK: reboot holding the option key down to get it to scan for a bootable system on either the disk or a cd.

And this is where it got fun. The firmware was password protected, and of course I didn't know the password. However, Google was my friend here and came up with a couple of sensible links on how to reset the password. The instructions say to first add or remove memory from the machine. The machine only had a single 128MB DIMM. At this point I had a senior moment and removed the DIMM. Not surprisingly, the machine failed self test as it didn't have any memory in it at all.

So the problem became where to get additional memory. Fortunately it used pc100 DIMMS just like the $83 linux machine, so I 'borrowed' some memory from it, brought up the new imac, reset the PRAM, returned the borrowed memory, and then rebooted into open firmware, forced a scan, and hey presto we were installing linux.

Well, we were trying to install linux. Since it was (a) a newer machine and (b) I'm a glutton for punishment, I thought I'd try a newer version. Silly boy! I had as much success as I did when I tried to upgrade the original 1998 imac - ie none. So it's back to the trusty 6.06 distribution. And it works. Runs fine. Possibly I need to buy another one to get more memory, or maybe I should break this one up for parts, but at the moment I have 2 ppc imacs running linux. Which is perhaps a little excessive, but hey, I had some fun and learned some things getting it running ...

Tuesday, 11 November 2008

Getting Data off an old computer

Well, since I've mentioned the dreaded problem of getting data off legacy computers, I thought I'd write a quick how-to. This isn't a complete answer, it's more a description of how to go about it.

First of all, build up a computer running linux to do a lot of the conversion work on. There's a lot of open source bits of code out there that you will find invaluable. 

Make sure it has at least one serial port. If you can install a modem card that's even better.

Install the following software:

  • Network
    • ftpd
    • samba
  • Serial communications
  • Conversion
    • Open Office
    • Abiword plus plugins

It helps if you're happy with command line operations, and are old enough to remember the days of asynchronous communications. Depending on the sort of conversion you are looking at, you might also need a Windows pc to run windows-only software for part of the conversion. If your linux machine is sufficiently powerful you could run a virtual machine on your linux box instead.

Now turn to the computer you want to get data off. Check if it will boot up. Examine it carefully to see if it has a serial port, or a network port or an inbuilt modem.

If it has a network port, plus network drivers you're home and dry. Configure up the drivers and get the data off either by binary ftp or by copying it to a network drive - this is only really an option for older microsoft operating systems.

If you have an internal modem connect that to the modem in your linux machine. You will need a clever cable to do this. If you have a pair of serial connections you will need a null modem cable, basically a crossover cable. You may be able to find one - old laplink data transfer cables are good - or you may find that your old machine has an 'eccentric' connection. Google may be your friend here for finding the pinouts, but you may have to make up that special cable yourself. Dick Smith or Radio Shack should have all the bits required, but you may have to learn to solder.

On your old machine you need to look for some file transfer software. Often software like Hyperterm (windows) or Zterm (Macintosh) includes xmodem type capabilities, and quite often they were installed by default on older computers. If not, and if the computer has a working dialup connection, google for some suitable software, and download and install it. On old windows machines, including 3.1, Kermit 3.15 for DOS is ideal and freely available.

Also if you're using pure serial communications you need to set up the serial port to something sensible. As some old serial hardware isn't the fastest, something like 9600 baud, 1 stop bit and no parity is a conservative choice. If you're using modem to modem communication they should autonegotiate sensibly.

Then, on your client, configure the connection to the same settings, 9600,n,1 and hit return. Hopefully you should see the login banner of your linux machine.

Login, connect and transfer files. Remember to transfer the files in binary mode - if you don't you will lose the top bit, and the files may be utterly garbled and useless for any further work.
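To see why binary mode matters, here's a toy Python illustration - not part of any transfer tool, just a simulation of what a 7-bit ASCII-mode transfer does to bytes with the top bit set (ASCII mode may also rewrite line endings, which is just as destructive for binary data):

```python
def strip_top_bit(data: bytes) -> bytes:
    """Simulate a 7-bit transfer: clear bit 7 of every byte."""
    return bytes(b & 0x7F for b in data)

# 'Hi' followed by two high-bit bytes, as a word processor file might contain
original = bytes([0x48, 0x69, 0xD2, 0x93])
mangled = strip_top_bit(original)

assert mangled[:2] == b"Hi"          # plain 7-bit ASCII survives unchanged
assert mangled[2:] != original[2:]   # high-bit bytes are silently corrupted
```

Plain text looks fine afterwards, which is exactly why the damage to everything else is so easy to miss.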

Then comes the conversion stage.

The first thing to check is if the software on the old machine can export the files in a format that AbiWord or Open Office can read. If so, give thanks.

Export the files, transfer, import and save as the desired more modern format.

However, life is not always kind. But there are a lot of free if old third party converters out there - for example there are scads for wordstar. Sometimes it will make sense to do the conversion on the old host, other times to run the converter on a more modern pc. If it's a windows only conversion application export the disk space from the linux machine via samba and mount it via a windows pc.

Sometimes there simply isn't a converter. This can often involve writing a simple parser using something like perl. If the file format is something with embedded control characters it's fairly simple to write a 'convert to xml' routine. Alternatively you can take the data, strip out all the control characters, and recover the text. What you want to do depends very much on the requirements of the job and how important the formatting is. I've written various examples over the years, but this simple example to fix Microsoft smart quotes should give you a pointer. How easy it is to write a parser depends to a large extent on how well documented the file format is, and you will need to make some decisions about what is an acceptable level of fidelity.

Sometimes, it really can be quicker to take the text, strip out all previous formatting and re mark it up by hand!

NASA and the 1973 Corolla fan belt

Important lesson in data continuity here. Even though you have the data available it's no use unless you can read it, as the report in yesterday's Australian showed.

With rewritable magnetic media you need to keep on transferring the physical data, the 1's and 0's, to new media otherwise you risk losing access to the data.

To ram the point home consider the Amstrad PCW8256 - absurdly popular among research students in the late 80's. Cheap, good, and it got wordprocessing out among the masses. But it had a couple of drawbacks:

  • It used a very nonstandard 3 inch disk format
  • Its own word processor was not particularly compatible with anything else though third party export software did appear in time

At the time I was looking after data and format conversion for the university I worked at, and we realised the Amstrad would be a problem down the track, so we went through the process of encouraging people to do their work using a third party word processor that wrote wordstar compatible files, and providing facilities to transfer data off the disks onto other media with greater longevity.

Some did, some didn't. Which probably means that there's piles of documents, including drafts of papers, out there that are totally inaccessible. And it's not just Amstrad. Vax WPS+ or Claris Works on the Mac Classic are just as much a problem - dependent on having suitable hardware if you only have the media.

Of course if you have 1's and 0's you then have a different set of problems around your ability to parse the file and the lossyness of any conversion process ...

And this can be a real problem in a lot of the humanities - when I worked in the UK I kept coming across people working on things like tudor church records who had kept on using old computers and old software for transcription, because they were doing it as something in addition to their day job, or because they were wildly underfunded, or whatever - which basically meant that their data was inaccessible the moment they created it, and that getting the data off these elderly systems and onto something more sustainable was a major challenge ...

Monday, 10 November 2008

Stanza from Lexcycle - a first look

On and off I've been blogging on e-books and dedicated e-book readers. In the random way one does I stumbled across Stanza, an e-book reader both for the iPhone and the desktop. 

I've yet to try the desktop version, but I installed the iPhone version on my iPhone (where else) and downloaded a copy of Tacitus's Germania from Project Gutenberg to try it out.

Basically, it works, the font is clear and readable and the 'touch right to go forward, touch left to go back' interface is natural and intuitive. Like all electronic displays you have to get the angle right to read it easily, but then it's easier than a book to read in poor light.

Basically it looks good and worth playing with some more ...

Friday, 7 November 2008

mobile printing and the page count problem

I've outlined elsewhere my suggestion for a pdf based/http upload style mobile printing solution.

The only problem is that microsoft word does not natively support pdf export, which means installing something like CutePDF on students pc's, or alternatively getting them to use Open Office, which does do native pdf export.

Both are problematic as they involve students having to install software on their (own) pc's. This is generally a bad thing as we end up having to field version incompatibilities, general incompetence, and "open office ate my thesis" excuses.

However, open office can be driven in command line mode, so the answer might well be to provide a service that takes the input files and converts them to pdf (zamzar.com is an example of a similar service). This isn't an original idea: PyOD provides one possible solution. (There's also software that runs happily in command line mode to deal with the docx problem if we want to expand this out to a general conversion service ...)
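As a sketch of what such a service might shell out to - note that the `--headless --convert-to` syntax belongs to later OpenOffice.org/LibreOffice builds (older versions needed a UNO macro, which is what PyOD wraps), so treat the exact flags and paths as assumptions:

```python
import subprocess
from pathlib import Path

def to_pdf_command(doc: Path, outdir: Path) -> list[str]:
    """Build a headless office-to-pdf conversion command line.

    The flags here are from more recent LibreOffice/OpenOffice builds;
    this is a sketch of the approach, not a tested recipe for OOo 2.x.
    """
    return [
        "soffice", "--headless",
        "--convert-to", "pdf",
        "--outdir", str(outdir),
        str(doc),
    ]

cmd = to_pdf_command(Path("thesis.doc"), Path("/var/spool/printprep"))
# subprocess.run(cmd, check=True)  # uncomment on a host with soffice installed
print(cmd)
```

The conversion service would run this per uploaded file and drop the resulting pdf into the student's print workspace.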

Users can then queue the files for printing or not as the case may be. We use pdf as that gives an ability for users to check the page count before printing, and more importantly for us to display a list of the files awaiting printing and their size in terms of pages, which given that students pay by the page for printing, is all they really care about ...
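One low-tech way to get that page count is to parse the output of the pdfinfo tool from xpdf/poppler, which prints a 'Pages:' line for each file. A small Python sketch - it parses captured output rather than shelling out, and the sample text is invented:

```python
def page_count(pdfinfo_output: str) -> int:
    """Pull the page count out of pdfinfo's text output."""
    for line in pdfinfo_output.splitlines():
        if line.startswith("Pages:"):
            return int(line.split(":", 1)[1].strip())
    raise ValueError("no Pages: line found")

# Captured (invented) pdfinfo output for a queued file
sample = "Title:     essay.pdf\nPages:          12\nEncrypted:      no\n"
assert page_count(sample) == 12
```

Multiply the count by the per-page rate and you have the figure students actually care about, shown before they commit to printing.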

Medieval social networking ...

A few days ago I twitted a link about a project to reconstruct a medieval social network.

At first it may seem a little odd, but bear with it - it's quite fascinating. They took advantage, much as Ladurie did with Montaillou, of extant medieval records to work out the network of social obligation. In this case they used 250 years' worth of land tenancies (around 5000 documents) to trace the web of obligation between lords and tenants, and its changes, between approximately 1260 and 1500, a period which encompassed both the black death and the hundred years' war. They also made use of supporting documents like wills and marriage contracts.

What is also interesting is the way they used a record corpus created for another project as input data for this Kevin Bacon style study, along the way demonstrating the need for long term archiving and availability of data sets.

The important thing to realise is that medieval France was a society of laws and contracts rather than the Hollywood view of anarchy, rape, pillage and general despoliation. Sure, there was a lot of that during the hundred years war, but outside of that there was widespread use of written agreements, which were drawn up by a notary and lodged appropriately.

The other useful thing is that these documents were written to a formula. Notaries processed hundreds of them in their careers and they wrote them all in more or less the same way. This means that even though they're written on calfskin in spidery late medieval script they can be codified in a database and treated as structured information, and analysed using network theory, making it possible to plot the closeness of particular relationships between individuals.
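As a toy Python sketch of the general idea (the names and contracts here are entirely invented, and the real project obviously used a proper database): each contract links a lord and a tenant, and stacking the contracts up gives a network you can query for the number of distinct partners someone has, or the strength of a particular tie:

```python
from collections import defaultdict

# Each codified contract becomes a (lord, tenant) edge - invented examples.
contracts = [
    ("Lord Foix", "Peasant Arnaud"),
    ("Lord Foix", "Peasant Bernard"),
    ("Lord Foix", "Peasant Arnaud"),      # a renewed tenancy strengthens the tie
    ("Lord Comminges", "Peasant Bernard"),
]

partners = defaultdict(set)   # who is connected to whom
ties = defaultdict(int)       # how many contracts link a given pair
for lord, tenant in contracts:
    partners[lord].add(tenant)
    partners[tenant].add(lord)
    ties[(lord, tenant)] += 1

print(len(partners["Lord Foix"]))              # distinct tenants of one lord
print(ties[("Lord Foix", "Peasant Arnaud")])   # strength of one lord-tenant tie
```

Scale that up to 5000 documents and changes in degree and tie strength over time become measurable, which is essentially what the study did.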

So what did they find?

Nothing startling. It confirmed the previous suppositions of historians. Slight letdown, but still very interesting as much of history is based on textual records of the seigneurial (landowning) class and the doings of senior ecclesiastics, for the simple reason that they were part of the literate universe, and the peasants who made up 85-90% of the population were not. That's why we have tales of courtly love but not 'swine herding for fun and profit'.

Broadly, they found that the seigneurial class contracted during the hundred years war, that relationships became more linear, with a number of richer, more successful peasants buying up smaller and abandoned farms, and that in the course of the war some peasants developed wider social networks themselves, by taking over a number of tenancies and effectively becoming rentiers.

We also see the seigneurial class renewing itself over time, and also people moving in from outside to take over vacant tenancies.

As I said, nothing remarkable. However the technique is interesting, and it might be worthwhile to run the same sort of analysis on town rent rolls etc to try and get a more accurate gauge of the impacts of the black death and the like.

Sunday, 2 November 2008

drawing together some book digitisation strands ...

Melbourne university library's decision to install an espresso book machine in the library and Google's recent settlement with authors probably means increased access to electronic versions of texts, which can either be printed as on demand books, or refactored as e-books, given the apparent increasing acceptance of e-book readers.

What's also interesting is Google's use of bulk OCR to deal with those books irritatingly scanned as graphic images, meaning the text can be extracted from the pdf for re-factoring with considerably greater ease.

All of which I guess is going to mean a few things:

  • The international trade in hard to find second hand books will die
  • Someone somewhere will start a mail order print on demand service to handle people without access to espresso book machines
  • e-readers will become more common for scholarly material - interesting or difficult texts will become solely print on demand or e-texts
  • Mass market books will always be with us, economies of scale and distribution cancel the warehousing costs

a slice of migrant life

A couple of months ago I blogged about our purchase of a Skype phone, and damned useful it's been, both for business and pleasure. However, the concept of a wi-fi phone seems to be beyond most people, as Skype is seen as something you use at your computer to make scheduled calls rather than a replacement for an international phone service.

So my use of a wi-fi phone remains an oddity. But perhaps less so than it seems.

Yesterday we were running late doing house things and lunch was a bowl of pho at a vietnamese grocery store in Belco market. While we were slurping, this guy appears, speaks to the owner, is handed a phone, and he then disappears out into the carpark for a bit of quiet to make his call. If we thought anything of it, it was along the lines of 'son calling girlfriend back home'.

By chance I was paying the bill when the guy reappeared and handed the phone back - a wi-fi phone, just like the one I bought. More interestingly, he handed the shop owner five bucks. He wasn't family, he spoke to the shop owner in English, not Vietnamese, and he then asked to buy a discount phone card for calls to China (basically a prepaid card that gives you so many minutes at a discount rate to a given destination) - ie he was Chinese, not Vietnamese.

So I'm guessing that the store had a skype phone for their own use and would lend it for a fee to other people they know to let them make calls overseas if they had to do so urgently.

And a wi-fi phone is ideal for this, as you can leave it on, pass it around family members, people can call you, and you don't tie up the computer when you need it for business.

Makes sense ...

Monday, 27 October 2008

Genghis Khan and the optiportal

For those of you paying attention in the back, the link I twitted about searching for the tomb of Genghis Khan makes reference to the use of an Optiportal to display the results of the findings:

Explains Lin: "If you have a large burial, that's going to have an impact on the landscape. To find Khan's tomb, we'll be using remote sensing techniques and satellite imagery to take digital pictures of the ground in the surrounding region, which we'll be able to display on Calit2's 287-million pixel HIPerSpace display wall. ...

which sounds an interesting use of the technology for large scale visualisation work.

Wednesday, 22 October 2008

mobile printing (again!) ...

Going round in circles on this one. Way back last year I suggested a mobile printing solution that essentially had students printing to a pdf file and uploading the pdf file for later printing. The idea of using pdf rather than postscript was to get round the problem of students having to install a generic postscript driver and configure it.

It was a fair solution, and one that bears some more work. The scenario is that the student uploads the pdf file to their workspace and then logs in to a web based application to select the file and queue it for printing, with file conversion handled by running pdf2ps and queueing the result as a print job.

However the landscape has changed between then and now. What is required is a mobile printing solution that will interwork with whatever print/quota management solution is deployed.

The crude solution is quite simple: we create an lpd holding queue as a generic postscript queue and users spool to it. Messy, and users need to authenticate in some way, which probably rules it out given the diversity of operating systems out there. However the idea is attractive - users then log in to a web application, authenticate again, and select the job they want to print, which is passed to the backend print system. Lpd is not a good solution as it doesn't do authentication. IPP does, and in practice we would use ipp based printing with authentication. However installing an ipp printer means having to distribute a print driver, or having users configure a printer locally. To judge the complexity or otherwise of doing this, have a look at this example from a US university.

We could combine the ideas into something more elegant. Users still need to generate pdf files, which probably means windows users installing CutePDF (or using open office to export to pdf).

Once the user has generated the files they then open a connection and login to a web app.

They are presented with a list of files and an option to print them now.

They also get an option to (a) upload a file to print later (b) print a file now.

Option (a) puts the file in the spool space for printing later.

Option (b) takes the pdf file and spools it with pdf2ps to a printer. Users can also get an option as to which printer to print to. The print job is handed over to the backend print system, users print quota debited and so on.
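Option (b) might look something like this Python sketch - the printer name and file path are invented, pdf2ps's '-' argument sends the postscript to stdout, and a real implementation would also debit the quota and report errors back to the web app:

```python
import subprocess

def conversion_command(pdf_path: str) -> list[str]:
    # '-' tells pdf2ps to write the generated postscript to stdout
    return ["pdf2ps", pdf_path, "-"]

def print_command(printer: str) -> list[str]:
    return ["lpr", "-P", printer]

def print_pdf(pdf_path: str, printer: str) -> None:
    """Pipe pdf2ps output straight into lpr, ie pdf2ps file - | lpr -P printer."""
    converter = subprocess.Popen(conversion_command(pdf_path),
                                 stdout=subprocess.PIPE)
    subprocess.run(print_command(printer), stdin=converter.stdout, check=True)
    converter.stdout.close()
    converter.wait()

# print_pdf("/var/spool/webprint/essay.pdf", "library-laser")  # on a real host
```

The web app just calls this with the spooled file and the printer the user picked, then hands accounting over to the backend print system.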


Advantages:

  • Users can upload and print the files from anywhere
  • Users can upload but defer printing of files until they are ready to print
  • Users only need to login to the print management application once

Disadvantages:

  • Users need to create pdf files
  • Printing is not seamless - users need to login to print management application

If we go for an ipp only solution:


Disadvantages:

  • Users need to install and configure remote printer
  • Users need to login to separate print management application to release print job
  • Users do not have the option of immediate print

Advantages:

  • Printing is seamless

And yes, we probably could offer both options in some weird combination ...

Monday, 20 October 2008

Shibboleth and the shared cloud

Increasingly, universities, and by extension university researchers, are in the collaboration game, not just within institutions but across them. And the collaboration game requires access to shared resources - a blog server there, disk space here, a compute resource over there - essentially a cloud of shared resources.

And then one comes up against the 800kg gorilla of a problem: authentication. Originally all this collaboration was a bit ad hoc - a research group needs a blog, so install ubuntu in server mode on an old pc in a corner of the lab, install apache, php and wordpress. Create some local accounts, not forgetting one for Joe who'd moved to another institution, and hey presto, we were blogging. Spontaneous, effective and fun - well at least for those research groups that had an IT literate postgrad, who then moved on elsewhere, leaving no one who knew much about maintaining it.

Or take the other scenario, the middle aged lecturer in middle english who starts this really interesting blog on the divergence of Frisian, English and Dutch around the time of the great vowel shift. Of course he doesn't know much about blog software so he gets a hosted wordpress or blogspot account and then invites a few of his mates to contribute posts. Very ad hoc, very spontaneous and totally invisible to any research quality assessment exercise.

But using small scale resources means that we don't need to have complex authentication, and because everyone's mates, everyone trusts each other.

The only problem is that these blogs increasingly represent a research output and a form of scholarly communication, which means that universities want to host them, if only to ensure they're backed up. And hosting of course means hosting them in a multiuser environment with properly provisioned accounts, something that a lot of blog software isn't really designed for, hence projects like lyceum, wordpress-mu and their various ldap plugins.

And that's where it stops. Authentication works only within the institution's ldap domain, or in the case of some of the large system universities in the States, the particular college's ldap domain, effectively killing the spontaneity of collaboration stone dead.

Fortunately there is an answer - shibboleth, which has always been a kind of wide area authentication mechanism without a purpose. Everyone thinks it's a good idea, but really, until collaboration is required, it remains mere geekery. The joy of shibboleth is that while the implementation is no easy exercise, for the end user it can be as easy as using openID to gain access to services. It does also require that the shared resources are shibbolized to work with a shibboleth based mechanism. Given that many of the products required are open source this is less of a problem than it might be.

Also as shibboleth gives users control over what attributes are released the amount of information disclosed is consensual.

The only real problem remaining is a mechanism to provide access for people affiliated to institutions with no IdP, or visitors from outside the academic world. There also needs to be a mechanism for institutions to 'sponsor' non-affiliated invitees, giving them a non-affiliated shibboleth account that may be more restricted in scope but which will allow them to work with groups of affiliated researchers across institutions. Such a service could be provided on a per institution basis, or on some other basis, for example one provided by the government for employees engaged in collaborative research rather than on a per agency basis.

The mechanics don't actually matter - it can be one solution or a mix of solutions. The main problem is to ensure that whatever solution is found encourages openness and the free flow of communication both within institutions and across institutions.

technology meets the electoral process

 A small win for technology in the electoral process last Saturday at the ACT elections for a new territory government. 

Up to now the process has been resolutely nineteenth century: turn up, give your name to the electoral clerk, who riffles through a massive bound listing of the electors' register, asks you to confirm your address, rules off your entry to show you've voted, and then hands you a ballot paper.

As voting's compulsory, I guess that some group of poor bastards had to collate all the entries to make sure all us good citizens had voted - preferably only once - and issue infringement notices for anyone who didn't and didn't have a good excuse.

Well, last Saturday was different. Queue up, talk to the elections clerk, who uses a wireless pda to confirm your registration details against the voters' database before handing you your ballot paper. No riffling, no collating. No costs for printing the electoral roll.

What's even nicer is that they borrowed the pdas from the Queensland electoral commission, so there was no acquisition cost, just the software development and network costs.

What I especially liked about it was its sensible, low key, pragmatic use of technology and unlike voting machines with their endless complications and audit requirements, totally non-controversial.

Saturday, 18 October 2008

Putting Twitter to work

Over the last few weeks I've been experimenting with twitter, or more accurately using twitter to experiment with microblogging.

So far so geek.

Now I have a real world application for it - providing live system status updates. One thing we lack at work is an effective way of getting the information that there is a system problem out to people. Basically we email notifications to people.

However if we can apply sufficiently rigorous filters to the output from various monitoring systems such as nagios, we can effectively provide an input feed into a microblogging service. This then produces an RSS output feed which people can refactor in various ways elsewhere on campus.
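A filter of this sort needn't be complicated - something like the following sketch, where the alert fields are my guess at what a nagios notification hook would hand over:

```python
def to_status_update(alert, limit=140):
    """Reduce a parsed monitoring alert to a short microblog post.
    Only WARNING and CRITICAL states get through the filter; everything
    else (OK states, recovery chatter) is dropped so the feed stays
    readable. The dict fields are assumptions about what a nagios
    notification command would pass in."""
    if alert.get('state') not in ('WARNING', 'CRITICAL'):
        return None
    text = "[{state}] {host}/{service}: {output}".format(**alert)
    return text[:limit]  # keep within a microblog-sized post
```

Whatever the filter passes is then posted to the microblogging service, which takes care of producing the RSS feed.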

And of course we can inject human generated alerts into the service, and use our own tiny url service to pass urls of more detailed technical posts on a separate blog when we have a real problem.

Also, we could glue the feeds together, much as I have with this blog, into a webpage where people can check and see if there is a problem, or indeed what's happening in the background - good when you have researchers in different timezones wanting to know what's going on, and it gives a fairly live dynamic environment.

All pretty good for a couple of hours' creative buggering about to get my head round Twitter ...

Wednesday, 15 October 2008


This morning I went to a presentation by Larry Smarr  on Optiportal.

Optiportal is one of the spinoffs of the Optiputer project. Essentially it's a large scale visualisation system built out of commodity hardware: cheap to build, and essentially made out of some standard LCD displays and off the shelf mounting brackets. The displays are meshed together to give a single large display. So far so good - your average night club or gallery of modern art could probably come up with something similar. However it is more than that: it allows large scale shared visualisation of data, and this was the more interesting part - using high speed optical links to move large amounts of data across the network allows people to see the same data presented in near real time, rather like my former colleague's rather clever demo of audio mixing across a near-zero latency wide area network. It could equally be a simulation of neural processing or an analysis of radio telescope data (or indeed, continuing the digital preservation of manuscripts theme, a set of very high resolution images of source documents for input into downstream processing).

And it was the high end data streaming that interested me most. Data is on the whole relatively slow to ship over a wide area network, which is why we still have people fedexing terabyte disks of data round the country, or indeed people buying station wagons and filling them with tapes. Large data sets remain intrinsically difficult to ship around.

When I worked on a digital archiving project one of our aims was to distribute subsets of the data to remote locations where people could interrogate the data in whatever way they saw fit. As these remote locations were at the end of very slow links, updates would have been slow and unpredictable; a set of update dvds would work better. If we're talking about big projects such as the square kilometre array project we're talking about big data, bigger than can be easily transferred with the existing infrastructure. For example transferring a terabyte over a 100 megabit per second link takes (8 × 1024^4 bits) / (100 × 10^6 bits per second) ≈ 88,000 seconds, or around 24 hours - almost a day - and that's without allowing for error correction and retries. Make it a 10 megabit link and we're talking 10 days - even slower than Australia Post!
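The arithmetic is easy to check - a rough back-of-the-envelope calculator, ignoring protocol overhead, error correction and retries:

```python
def transfer_hours(size_bytes, link_mbps):
    """Naive transfer time in hours: size in bytes times 8 bits, over a
    link rated in megabits per second, with no allowance for overhead."""
    seconds = (size_bytes * 8) / (link_mbps * 1_000_000)
    return seconds / 3600

terabyte = 1024 ** 4
print(round(transfer_hours(terabyte, 100), 1))       # about 24.4 hours
print(round(transfer_hours(terabyte, 10) / 24, 1))   # about 10.2 days
```

Real links do rather worse, of course, which is why the station wagon full of tapes keeps winning.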

What you do with the data at the other end is the more interesting problem, given the limitations of computer backplanes and so on. Applications such as Optiportal serve to demonstrate the potential; it's not necessarily the only application ...

[update: today's Australian had a different take on Smarr's presentation]

Tuesday, 14 October 2008

the twitter feed

If you look at this blog in a browser you'll notice a list entitled 'interesting links' on the right hand side of the window. This is actually fed from twitter via twittermail, and I've been doing this as an experiment for just over a month now.

As an experiment I think it's quite successful, if only for me to (a) keep a list of interesting links in my sent mail and (b) share interesting things I've been reading with anyone interested enough to follow this blog.

Of course it doesn't have to be twitter and twittermail; various other microblogging services provide the tools to create a similar effect - here it's twitter and twittermail purely because I found the right off the shelf tools in their software ecology.

Personally I think this anonymous sharing model is more useful than the social networking 'share with friends' closed group model - it allows people who may be interested in a topic and who follow your blog also to follow the links you find interesting. Social network providers of course want to use the links to add value for people who are part of the network to keep them hooked, or to sell targeted advertising, or whatever.

In fact it's probably worthwhile also providing a separate rss feed for the interesting links, as it's not beyond the bounds of probability that someone finds the collection of links more useful than the original blog.
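Generating such a feed is simple enough - a hand-rolled sketch of a minimal RSS 2.0 document (a real deployment would more likely use a feed library, and the feed title and urls here are invented):

```python
from xml.sax.saxutils import escape

def links_rss(feed_title, feed_url, items):
    """Emit a minimal RSS 2.0 feed for a list of (title, url) pairs,
    escaping anything that might break the XML."""
    entries = ''.join(
        '<item><title>{}</title><link>{}</link></item>'.format(
            escape(t), escape(u))
        for t, u in items)
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            '<rss version="2.0"><channel>'
            '<title>{}</title><link>{}</link>'
            '<description>interesting links</description>'
            '{}</channel></rss>').format(
                escape(feed_title), escape(feed_url), entries)
```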


People have accused me of being fixated on google products. Not true, even if it may look that way at times. For example I still think that Zoho provides a richer more flexible set of online tools than google apps.

Likewise I've stuck with Bloglines as a blog aggregator in preference to google reader. In fact I've been using bloglines since 2004, which must mean something. And I've been happy with it - performance was rock solid. Since the last upgrade, though, it has claimed that feeds do not exist (including well known ones like guardian.co.uk), and if you re-add a feed it will work for a bit then stop. Response is poor compared to before the upgrade (being ever so anal I tend to read the feeds at the same time each morning - so I think I can claim this even if it's anecdotal).

So one more bit of me has been assimilated by the borg - I've moved over to google reader. Not such an elegant interface, but more reliable, and given that my major reason for reading rss feeds is industry news and updates that's worth trading elegance for performance ...

Monday, 13 October 2008

not installing plan 9 ...

People react to having a blank day in their calendar in various ways. Some people (apparently) go looking on the web for pictures of people of the opposite gender in various states of undress, some play interactive Sudoku games. Being a sad anorak I do neither, I build test installs of various operating system distros  or play with various bits of software.

Today was an operating system day, and finally it was time for Plan 9.

Plan 9 irritates me as an operating system. It gnaws away at the corners of my mind saying 'look this could be interesting' like an itch you want to scratch and never do.

The whole idea of a simple distributed operating system to link together nodes for grid based applications is interesting and perhaps useful, but I'd never gone so far as to build an installation. Well this morning I did, building an instance on VirtualBox rather than a real physical machine.

After a little trial and error with the network settings (PCnet-PCI II (NAT) works fine) I started the install, taking the defaults all the way through on the basis that I didn't know what I was doing, though I had read the documentation. Installation was glacially slow, much slower than usual when building things under virtualbox, but basically worked, with only the odd exception, usually caused by disk retries. Basically worked, that is, until almost the last stage, unpacking the software image, when it failed.

Probably due to running it on a vm more than anything else. Add it to the list to try next time I get another old machine to play with...

Tuesday, 7 October 2008

Nine months sans Microsoft ...

Well, today marks nine months since we became a microsoft free household (not quite true, we still have an old w2k laptop hidden away just in case, but I've only had to get it out once, to configure a linksys wireless router).

Now, I am not an anti microsoft zealot. Yes, I think Microsoft's business practices were not the best, but all through the nineties and well into the first few years of the present decade there was no real alternative as a mass market desktop operating system - linux wasn't (isn't?) there, and Apple seemed to lose the plot, and it took a long time to come back with OS X. The same in the application space. The competitors were as good if not better, and they took a long time a-dying. Microsoft got where it was by either having products to which there was no serious alternative, or by convincing people that there was no serious alternative to Office and the rest. That was then and this is now; we work with the present reality.

Judi isn't an anti anything zealot as far as computers go. Computers are tools to her. Email, web, grading student reports, writing course notes and assignments and that's it. Providing she can send emails and buy stuff online, research stuff and get into the school email system she's fine.

Our decision to go microsoftless was purely pragmatic. I can do most of what I need to do with abiword, open office, google apps, firefox, zoho and pan, and I can do this on a couple of fairly low powered machines - an old ppc imac and a pc I put together for $83. Judi likes to play with digital photography, so we bought an imac purely because the screen was nicer. I thought we might have to buy a second hand windows pc as well but that hasn't turned out to be the case. Firefox, safari, google apps and neo office have let her get her work done, even coping with the docx problem.

The only problems we've had were with bathroom design catalogues (canned autorun powerpoint on a cd) and an ikea kitchen design tool - things we could work around easily and which turned out to be totally inessential. Other than that it's fine. Emails get written, appointments made, books bought, assignments graded, documents formatted.

So we've proved you can live without windows. We've also proved we could live with windows and without linux or OS X - no operating system has a set of killer apps for middle of the road usage.

And more and more we're using online apps - at what point does a zonbu or an easy neuf become a true alternative? (and when do we get vendor lock in and all the rest in the online space?)

Sunday, 5 October 2008

what is digital preservation for?

I have been thinking a lot about digital archiving/preservation and the purpose behind it. In part I've been doing this to clarify my own thoughts, as while the technologies of digital archiving and preservation are well understood, the purpose is not, and often different purposes are conflated. So let's step through the various options:

Digital Archiving as preservation

Here one essentially wants to keep the data for ever, across hardware upgrades and format changes. Essentially what one is doing is taking a human cultural artifact, such as a medieval manuscript or an aboriginal dreamtime story as recorded, making a digital version of it, and keeping the file available for ever.

This has three purposes:

1) Increased access - these artifacts are delicate and cannot be accessed by everyone who wishes to, nor can everyone who wishes access afford it. While the preservation technology is expensive, access is cheap - this is being written on a computer that cost me $83 to put together. This also has the important substrand of digital cultural repatriation - it enables access to the conserved materials by the originators and cultural owners. Thus, to take the case of a project I worked on, Australian Aborigines were too impoverished to conserve photographs and audio recordings of traditional stories and music; digital preservation allows copies of the material to be returned to them without any worries about its long term preservation.

2) Long term preservation. The long term conservation of digital data is a 'just add dollars' problem. The long term preservation of audio recordings and photographs is not. And paper burns. Once digitised we can have many copies in many locations - think clockss for an example design - and we have access for as long as we have electricity and the Internet.

3) Enabling new forms of scholarly research. This is really simply an emergent property of #1. Projects such as the Stavanger Middle English Grammar project are dependent on increased access to the original texts. Without such ready access it would have been logistically impossible to carry out such a study - too many manuscripts in too many different places.

Digital archiving as publication

This seems an odd way of looking at it but bear with me. Scholarly output is born digital these days. It could be as an intrinsically digital medium such as a research group's blog, or digitally created items such as the TeX file of a research paper or indeed a book.

This covers e-journals and e-press as well as conventional journals, which increasingly also have a digital archive.

These technologies have the twin functions of increasing access - no longer does one have to go to a library that holds the journal one wants, and likewise one has massively reduced the costs of publication.

Of course there's a catch here. Once one had printed the books and put them in a warehouse the only costs were of storage. These books were then distributed and put on shelves in libraries. The long term preservation cost was that of a fire alarm, an efficient cat to keep the depredations of rats and mice in check, and a can of termite spray. OK, I exaggerate, but the costs of long term digital preservation are probably higher, in part due to the costs of employing staff to look after the machines and software doing the preserving and making sure that things keep running.

The other advantage is searchability. One creates a data set and then runs a free text search engine over it. From the data come citation rankings, as loved by university administrators to demonstrate that they are housing productive academics (e-portfolios and the rest), and also easy literature searches - no more toiling in the library or talking to geeky librarians.

Digital preservation as a record

Outside of academia this is seen as the prime purpose of digital preservation. It is a way, by capturing and archiving emails and documents, of creating a record of business, something that government and business have always done - think the medieval rent rolls of York, the account ledgers of Florentine bankers, and the transcripts of the trial of Charles Stuart in 1649. While today they may constitute a valuable historical resource, at the time they served as a means of record, to demonstrate that payment had been made and that due procedure had been followed.

In business, digital preservation and archiving is exactly that: capturing the details of transactions to show that due process has been followed and, because it's searchable, making it possible to do a longitudinal study of a dispute. In the old days it would have been hundreds of APS4's searching through boxes of memo copies; today it's a pile of free text searches across a set of binary objects.

Digital archiving as teaching

When lecturers lecture, they create a lot of learning aids around the lecture itself, such as handouts and reading lists. The lecture is often a digital event in its own right, with a PowerPoint presentation of salient points or key images, plus perhaps a recording of the lecture itself.

Put together this creates a compound digital learning object, something that is accessible as a study aid sometime after the event.

While one may not want to keep the object for ever, one may wish to preserve either individual components for re-use or even the whole compound object, as the course may only be offered in alternate years.

However these learning objects need to be preserved for reuse, and in these litigious times, to also prove that lecturers did indeed say that about Henry II and consequently students should not be marked down for repeating it in an exam.


So digital preservation and archiving has a purpose, four in fact. The purposes differ but there are many congruences between them.

Fundamentally the gains boil down to increased accessibility and searchability.

The commercial need for such archiving and search should help drive down the cost of preservation for academic and pedagogic purposes. Likewise academic research relevant to digitisation, eg handwriting recognition and improved search algorithms, should benefit business and justify the costs of academic digital preservation.

Thursday, 2 October 2008

Viking mice, the black death and other plagues

Interesting article [New Scientist, Scotsman] about how most house mice in Scotland, Wales and Ireland have a characteristically Scandinavian genome, while mice in other parts of Britain have a genome associated with the bronze age migrations of the first farmers, suggesting that it was only in Viking times, with more intensive grain cultivation, that populations were sufficiently dense to sustain a mouse population.

Likewise, prior to the black death we know the population had undergone a fairly rapid expansion, and hence could support the large rodent population in towns required as a reservoir for the plague bacterium. It has been hypothesised that plague was not a significant problem until it ended up infecting the large urban populations of Alexandria and Constantinople as part of the Plague of Justinian in the 540s. Large population, seaport, lots of rats.

There's also recently been a suggestion that the same sort of thing happened with HIV in Africa - it had probably always been there, but it only took off with significant urbanisation in the twentieth century, with its accompanying population densities and greater opportunity for random sexual encounters.

So what does this mean for seventh century Britain? It's often argued that there was a plague event that preferentially devastated the sub-Roman communities in the west of the island rather than the areas under Saxon control, providing an opportunity for further expansion westward by the Saxons. Does this mean that the plague was in the population and the outbreak was a result of higher population densities in the west capable of supporting a plague reservoir, or does it simply mean that continued contact with the Byzantine east exposed the sub-Roman population to greater risk of infection, as they had ports and the remnants of urban communities surrounding the ports to form an initial entry point for the plague?

If the former, it suggests that the sub-Roman successor states were capable of holding their own despite the loss of a lot of the prime agricultural territory, but to make a decision we need to know more about trade patterns - for example, were the sub-Roman populations also trading grain with northern France?

Tuesday, 30 September 2008

Putting some medieval digitisation strands together

Over the past few weeks I've posted various links and updates to posts around digitising medieval manuscripts, character recognition, and then using the material to build up a corpus for textual analysis.

Now with the Stavanger Middle English Grammar project we see how such a solution would work. Crucially we need to go back and digitise the sources - later editors 'smoothed' the text in places and regularised transcriptions, meaning that sources like Project Gutenberg simply don't work. The sources are not actually transcriptions of a single document - medieval books are more like open source projects, with the same basic text but some bits added in or taken out. Think ubuntu, kubuntu, xubuntu - all basically the same but with different utilities and window managers. So we have to identify common passages for analysis. Not intellectually difficult, but it does take longer.

The other source we have is legal texts, such as the records of the Scottish Parliament, where transcriptions are likely to be more accurate - if rather less interesting to read. Of course accuracy is not necessarily a help here, as it's the mistakes that are interesting, not the fidelity of the copy, but as legal texts contain a lot of stock bits of boilerplate we can probably see the evolution of grammatical changes.

The other, unanswered question is how good auto recognition of medieval handwriting is. Clerks who produced manuscripts as an act of devotion tended to have nice handwriting. Commonplace books and legal records less so, sometimes quite a lot less so ...

Saturday, 20 September 2008

Adam of Usk and the Espresso book machine ...

Yesterday I tweeted a link about an Espresso print on demand machine being installed at the University of Michigan. By chance, yesterday's Australian also had an article about how A&R, Australia's largest bookseller, is deploying them at a number of their stores. So far so interesting.

As I've said elsewhere many times, print on demand is the ideal solution for rare and obscure books and out of print titles. Basically all you need is a computer, a digitised version of the book, and a printer with an auto binder. And the technology to do this is cheap, when a basic laser printer costs a couple of hundred dollars and a rather more meaty one under a thousand.

And there's a lot of material out there. Project Gutenberg has been happily digitising old out of copyright texts, and now that many of the texts have markup they can be processed easily and reformatted for republication.

And we see that publishers have begun to use this to exploit their backlists, as in Faber Finds. And certainly when I helped put together a similar project for a scholarly publisher, that seemed to be the way to go. No warehousing, no startup costs for printing; just churn them out when required, and only digitise and work on the original text when requested. That way, while the first copy was expensive in terms of time and effort, any subsequent copy was free other than the cost of paper and toner.

Not e-texts?

Well, once you've a digitised marked up text it's relatively easy to convert it into any of the formats commonly used by book readers. Texts are hard to read and annotate on the screen, and I would assume the same is true on a Kindle or Sony Reader - I'm hypothesising here, I've never seen either of these devices; they're not available in Australia, but clearly they are supposed to be the ipods of the book world. Anyway, while they may work for fiction or any other book read from beginning to end, I suspect they've not quite got the utility of a book. And you probably can't read one in the bath :-). An e-text reader that allows you to export the text to an sd-card and then take it to a print and bind machine for backup or reference purposes might hit the sweet spot for scholarly work. That way you could have a paper reference copy and a portable version to carry around.

And Adam of Usk ?

Adam of Usk was a late fourteenth/early fifteenth century cleric, lawyer, historian and chronicler. If he'd been alive today he'd have been a blogger. He wrote a long rambling gossipy chronicle - part diary, part history - that covers a whole range of key events, from the visit of the Emperor Manuel II of Byzantium to Henry IV of England to drum up support for Byzantium's war against the Turks, to Adam's time serving on the legal commission charged with coming up with justifications for the forced deposition of Richard II, to the events of the Welsh wars of Owain Glyndwr and the Peasants' Revolt.

A book that you'd think there'd be a Penguin classic edition of. Nope, you're wrong. There's an 1876 translation (Adam wrote in Latin) and a newer 1997 translation published at the cost of a couple of hundred bucks a copy - purely because this sort of book is probably only really of interest to scholars, and the costs of short run conventional publishing are horrendous and self defeating.

Why there's no readily available edition is just one of these mysteries. Gossipy and rambling, yes, but then the Alexiad is not exactly a model of conciseness and tight structure. But basically there's no readily available edition to dip in and out of. In short it's the ideal example of a text tailor made for print on demand publishing. The thirty bucks that a print on demand copy would cost is a damn sight cheaper than the cost of even a tatty second hand copy of the 1876 edition (cheapest I found was GBP45 - say a hundred bucks).

Friday, 12 September 2008

Peek email reader ...

Yesterday I tweeted a link to a report about the Peek email reader.

My immediate reaction was that I want one. My second one was 'what a simple device!'.

Working backwards, its fundamental constraint is its reliance on GPRS, which is slow and typically priced in terms of the amount of data transferred. So to be economic (or to make a profit) you don't want to stream too much data. However the speed's not unlike a dialup modem's, and while big complex documents were slow to transfer over dialup, simple character mode email programs such as pine worked just fine. Pine, or indeed mutt, are good examples - a relatively simple interface coupled with a basic editor such as pico or nano is very usable.

And you don't need complex formatting to send emails. It's about text after all. 

But increasingly email is sent in rich formats these days, usually html based but not always. This adds to the payload - the amount of data transferred - but not to the content - what's said. Emails are now usually multipart, including mime encoded attachments, the sexy version of the email, and hopefully a plain text version.

So you interpose a server that polls the user's mailservers with a fetchmail like process, decomposes each message into its component parts, and throws away everything but the plain text parts, stripping out any extraneous formatting. (Yes, I've seen examples of messages that would die if this was done to them, but not that often.)
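The decomposition step is only a few lines with Python's standard email module - a sketch, assuming the server already has the raw message in hand:

```python
from email import message_from_string

def plain_text_only(raw_message):
    """Parse a MIME message and keep only the text/plain parts,
    discarding html alternatives and attachments - the payload-stripping
    step a Peek-style relay would perform before forwarding."""
    msg = message_from_string(raw_message)
    if not msg.is_multipart():
        return msg.get_payload()
    return '\n'.join(part.get_payload() for part in msg.walk()
                     if part.get_content_type() == 'text/plain')
```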

Suddenly you've got a lightweight message to forward on.

As I say, very simple - breathtakingly so. You could imagine doing a similar service based on qemu and dsl on a usb key, booting into pine.

In fact, many years ago I built something very similar. 

We had a pile of old computers with limited memory, network cards and no hard disks. And we had boot roms which did pxe style requests to let you transfer down and execute a 1.44Mb floppy image. What I did was put an operating system on it (actually freedos), a tcp stack, and a locked down version of kermit (ms-kermit) in terminal emulation mode that logged into a sun server and forced the user into pine. Logging out of the system forced the pc to reboot (basically we waited till we saw the word logout go past) to ensure a clean session for the next user - basically a quick and dirty university email reading terminal: login, read mail, exit and walk away.

Peek is an enhancement of this concept, and a damned interesting one ...

what transcription mistakes in manuscripts might tell us

words change over time. That's how language changes and diverges. Sometimes the change is rapid and sometimes it's slow. That's why we can more or less follow Shakespeare but not Chaucer, and it's doubtful whether Shakespeare would have had any less difficulty understanding Chaucer than we have.

Equally, language changes over geography as well as time - the English spoken in Kingston, Jamaica is very different from that spoken in Kingston in the ACT or Kingston-upon-Thames in London, although the latter two are not very different, for a whole lot of reasons such as consistent recent bi-directional migration, a greater degree of education, etc.

By looking at language over time we can see how it changes and show that it evolves, with less common words changing their forms quicker than more common words, as people are more likely to make mistakes with the rare ones than the common ones.

Anecdotally, you can observe this in Australia, where the English spoken, while almost the same as that in the south of England, is a simpler version, probably due to the need to absorb migrants from non-English speaking backgrounds, whose command of the language may be a little shaky.
And then I got to wondering. Projects such as the Canterbury Tales project transcribe old manuscripts and collate the differences in an attempt to build a consensus about what Chaucer originally wrote. But these manuscripts also tell us something about how people spoke, because the transcription 'mistakes' the scribes made were often unconscious corrections to usage.

They are in fact a frozen record of language change. Of course it's more complicated than that: we need to know the provenance of manuscripts to work out which are temporal corrections - reflecting changes in language over time - and which are dialectal corrections, reflecting geographic distance. And we need a big corpus.

So how do we get a big corpus of text? Typically these texts have been transcribed by hand, but advances in character recognition algorithms, plus cheap digitization technology, should give us a large corpus to subject to genetic analysis.
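By way of illustration (a toy sketch only - the Canterbury Tales project uses proper phylogenetic software, not this), 'genetic analysis' starts from pairwise distances between witnesses, which you could approximate with word level edit similarity:

```python
from difflib import SequenceMatcher
from itertools import combinations

def distance(a: str, b: str) -> float:
    """Dissimilarity between two transcriptions (0 = identical, 1 = unrelated),
    based on word-level sequence matching."""
    return 1.0 - SequenceMatcher(None, a.split(), b.split()).ratio()

def distance_matrix(witnesses: dict[str, str]) -> dict[tuple[str, str], float]:
    """Pairwise distances between manuscript witnesses, keyed by sigil pair."""
    return {(x, y): distance(witnesses[x], witnesses[y])
            for x, y in combinations(sorted(witnesses), 2)}
```

A matrix like this then feeds into standard tree building methods to group manuscripts into families.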

This could be very interesting (in a geeky sort of way) ...

Java Toasters

in 2001 Robin Southgate came up with the idea of the java toaster that burns the weather forecast into your morning slice.

Time and technology move on but the idea stays the same - Electrolux now have come up with a USB version that's essentially a thermal printer for toast ...

Thursday, 11 September 2008

interesting twitter behaviour ...

I've noticed something clever in twitter (or twittermail - I haven't worked out which).

If I create a mail message containing a tinyurl link, it gets passed via twitter unaltered to my twitter page. If I do the same thing but use our in-house short url service, which is based on nano url, the resulting twitter display is a tinyurl link.

Odd. Actually, if you look at the flow, what has happened is that the non-tinyurl link has had a tinyurl created for it - implying that rather than simply truncating the message there's intelligence to process it and automatically generate tinyurls for all non-tinyurl links.
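The inferred behaviour could be sketched like this (purely my guess at the logic - the shorten function here stands in for whatever tinyurl API call twitter actually makes):

```python
import re

URL_RE = re.compile(r"https?://\S+")

def rewrite_urls(message: str, shorten) -> str:
    """Replace every URL that isn't already a tinyurl link with a
    shortened one, leaving existing tinyurl links untouched."""
    def replace(match):
        url = match.group(0)
        if url.startswith("http://tinyurl.com/"):
            return url  # already short - pass through unaltered
        return shorten(url)
    return URL_RE.sub(replace, message)
```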

Tuesday, 9 September 2008

Twitter ...

My problem with playing with social networking is my shy and retiring nature. No, really. I genuinely don't think that the world is gagging to know that I've just spent $800 on the brakes and steering on my car, or indeed what I'm doing on a day to day basis.

And then I had a thought. One thing I do do is skim blogs and online newsfeeds for things that interest me. And I've often thought about doing a daily post on today's interesting things. Instead I'll use twitter and either tinyurl or our in-house short form url service to post links to things I find interesting ...

ambient intimacy

interesting and thoughtful post from the IHT on ambient intimacy - or how social networking influences society and allows people to feel connected, despite being physically disconnected.

The role of web 2.0 technologies is a fairly interesting topic - blogging has replaced samizdat in repressive societies, but few people have really commented on what function these technologies will have in non-repressive societies.

And of course one thing it provides is the social network - the range of contacts - and the ability to track how things are going. For example I can follow what's happening with some projects I'm no longer involved with but am still interested in, and keep a general connectedness with old friends and colleagues, so one knows when people change jobs and all the other minutiae that help maintain contacts.

Wednesday, 3 September 2008

SMC Skype WiFi phone

At home we have a problem. We're on the side of a narrow canyon without decent line of sight to a cell phone tower. This means that phones ring but the signal quality is too bad to talk unless one goes and stands at the top of the block in the back yard.

At the same time I've also become a convert to Skype for overseas calls - not because it's a lot cheaper than Telstra - three cents a minute for Skype out versus five cents for Telstra - but because the call quality is better when calling overseas. And of course I use Skype to call home when I'm away.

But there's a problem with Skype - it's cumbersome to use. It means donning a headset and being tethered to one computer while making the call, and arranging overseas Skype to Skype calls across timezones requires careful prior co-ordination by email. This means that using Skype is not spontaneous - it's ok for the occasional conference call and the regular overseas call, but you lose the versatility of a phone call. And while you can get cordless handsets for your computer, it means leaving the computer powered up and connected to Skype.

So I cracked and bought a Wi-Fi phone from CE Compass. I ended up with the SMC version,  the same phone is available from Belkin and from Edge but ignore the branding - they're all the same phone.

So what are they like?

Slightly clunky - a bit like a 1999 cell phone, with rubbery keys and a slightly crude user interface. That said, it was fairly straightforward to set up: basically your skype account details and your wireless access point security details. A nice touch is that it goes looking for open access points and can be made to search at any time - useful if you want to work from a coffee shop, train station or airport lounge with free open internet access. You can also add multiple home networks for it to try, should you need one at home and one at work, say. You can get that nineties feeling back again and use it like the late and unlamented Rabbit phone, which only worked at local access points - leading to groups of rabbit users clustered round locations with an access point sign.

Other than that it just works. Rings when people call you and sits as a device on the network.

Boot up and initial connection is a little slow, but call quality is reasonable and it saves having to be tethered to a computer. Basically it works all round the house and out into the yard - wherever there's a signal, including the garage. Battery life could be better as well, but as it's an alternative, not a substitute, that's no big deal.

Tuesday, 2 September 2008

mobile printing redux ...

Back in August 2007 I blogged about how to design a mobile printing solution [1] [2]. Like many IT projects it went nowhere, and then suddenly resurfaced in a slightly different form.
What we still need to do is provide a means for people using their own machines to upload and print but making it as simple as possible for the user. Making it simple means it can't be seamless as we need to make as few assumptions about the user's machine and browser as possible. So here's the cartoon:
  1. User logs in to system
  2. System presents the user with a web page listing the user's files and an option to do an http upload of a file (analogy is the geocities website manager)
  3. Beside each non-pdf file we have two options - convert to pdf, and print. All printing is done by converting to pdf and then running pdftops, with the conversion done in either OpenOffice command line mode or abiword command line mode as appropriate, followed by print/export to produce the pdf. The analogy is with Zoho's or Google Docs' print and import options
  4. Pdf files have two options - view or print - meaning users can check the layout before printing
  5. Printing is done by passing the print job through pdftops and then queueing it to a holding queue with lprng.
We then have a second web based print release application that can be accessed either separately or as a flow on from the web based printing solution. This application basically allows the user to requeue the print job from the holding queue to an active queue or delete the job. An added refinement would be to add an estimate for the number of pages and hence an estimate of the cost to print.
It's not elegant but it does allow users a way to print from any device with some local store and a browser.
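The conversion and queueing steps could be sketched as below - a hypothetical illustration only, with the exact command line flags for soffice, abiword, pdftops and lprng's lpr being assumptions that vary between versions:

```python
from pathlib import Path

# Converter command per upload type (assumed flags - these differ
# between OpenOffice/abiword releases).
CONVERTERS = {
    ".doc": ["soffice", "--headless", "--convert-to", "pdf"],
    ".odt": ["soffice", "--headless", "--convert-to", "pdf"],
    ".abw": ["abiword", "--to=pdf"],
}

def print_pipeline(upload: str, holding_queue: str = "holding") -> list[list[str]]:
    """Commands that take an uploaded file to the lprng holding queue:
    convert to pdf (if needed), run pdftops, then queue with lpr."""
    path = Path(upload)
    steps = []
    if path.suffix != ".pdf":
        steps.append(CONVERTERS[path.suffix] + [str(path)])
    pdf = str(path.with_suffix(".pdf"))
    ps = str(path.with_suffix(".ps"))
    steps.append(["pdftops", pdf, ps])
    steps.append(["lpr", "-P", holding_queue, ps])
    return steps
```

Building the command list separately from running it also makes the page count (and hence cost) estimate easy to bolt on later.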

Sunday, 31 August 2008

Byzantine links with post roman britain ...

The simple view of the post roman history of Britain is that the army left sometime before 410, and in 410 the cities and communities of Britain were told to fend for themselves. This they failed to do, collapsing under the weight of hordes of land hungry anglo saxon migrants. What this view of course obscures is that there must have been an ongoing conflict for at least two centuries as the anglo saxon communities pressed westward and the romano british retreated, yet were still capable of mustering the effort to build fortifications such as Wansdyke.

We also know, from literary sources such as Gildas and Nennius, that there were kingdoms in the west of britain, perhaps based originally on old roman local government divisions, themselves based, loosely, on pre-Roman tribal boundaries. And these statelets contained towns - there is certainly evidence that a roman style town was functioning at Wroxeter till sometime after 500. There are arguments as to how romanized Roman britain really was, and to what extent romanization was only skin deep - towns being constructed in Roman Britain mainly because one 'had to have them', while the majority of the population in the west and north, areas less romanized than the south and east, continued to live in tribal villages.

Interestingly, there's a similar example from Morocco. Most of Tingitania was abandoned by Rome in the face of the Vandal advance in the 400's, but Volubilis remained occupied until an earthquake, after which a smaller walled settlement was re-established on the edge of the town next to some fresh water springs, the town aqueduct being one of the casualties of the earthquake. These people were not Romans, even though some of them were buried under gravestones with latin inscriptions, and their deaths were still dated from the founding of the province. Nor were they Arabs - their arrival had to wait for the coming of Islam.
Most likely they were berbers, whose great grandparents may have had a patina of romanization but who were not themselves romanized, yet who treated Latin as the 'official' language for business. And if anyone should doubt that Rome had an influence on the Berbers, simply look to the Berber calendar - the names of the months, and the celebration of Yennayer 1 as New Year's day on 14 January, neatly paralleling the Orthodox Julian calendar.

So one can say that it is quite probable that there were functioning post Roman statelets in the west of Britain. Like Berber Volubilis, they were probably Roman in name only, even if their elites gave themselves titles such as 'protector', derived from late Roman official titles, and the towns were only large native villages, perhaps with a few Roman style buildings built of wood, not stone.

Now these statelets cannot have existed in a vacuum. Historians tend to concentrate on the saxon ascendancy and the conflict with the Romano British, yet we know that churchmen travelled from the still british west to mainland europe, and given that these churchmen sailed on boats, there must have been some sort of trade. And not just with Gaul. Byzantine coin finds are more common in the west than the east of england, suggesting greater trade links.
(Question - how does this compare to the distribution of frankish coins?) It certainly suggests that there was direct contact between the Byzantine empire and the British successor state in Cornwall, based around Tintagel.

Of course Byzantium does not mean Istanbul. The sixth century Byzantine empire had successfully reconquered North Africa, and the grain ships sailed from Egypt and Carthage to feed the population of Constantinople. And paralleling the coin distribution, North African pottery is more common in the west of england than the east - pointing to a trade route via north africa for the supply of items such as wine and olives. Again the finds are focused around Cornwall and Tintagel.

But why would anyone bother to sail to Tintagel from Carthage to trade with a gang of smelly celts who spoke bad latin and claimed to be Roman? Certainly not out of altruism. The smelly celts had one thing that was in short supply elsewhere - tin - needed for making bronze. And in much the same way that minoan and phoenician traders before them found it worthwhile to risk the long sea journey to trade for tin, so must it have been for the byzantines, trading luxuries for tin ingots.

There are modern parallels to this scenario. During the second world war the danish colony in Greenland was cut off from Denmark, but managed to keep going and pay for the necessary imports by having something to trade - in the Greenland case cryolite, which they could sell to the US and then use to pay for imports.

Now this is all circumstantial. But someone with links to the Byzantine Empire was trading with Tintagel, where people did also make grave markers with inscriptions in bad latin. And the journey must have been worth their while - the more interesting question is what other forms of contact were there, and did they include any degree of cultural exchange.


Books get published, get read and then go out of print. You then end up trawling second hand bookshops, and these days the internet, to track down a copy at a reasonable price. And of course publishers are faced with the costs of warehousing the inventory, something that's increasingly expensive, so the old, the obscure, and the plain boring end up being dumped on the second hand and remainder market really early, or if you're unlucky, pulped.

Some university and academic publishers have gone to a print on demand model, where the text is prepared for printing and copies are only printed as one offs as required, which in these days of cheap high volume laser printing is a really compelling way to go - no warehousing or inventory management costs.

Now comes news of Faber Finds - a mainstream UK publisher offering print on demand for its back list - basically you get a bound printed copy of a book from the back list on request. Of course this costs money, but it does provide an interesting new way of providing access to out of print texts, and incidentally an incentive to scan and digitally archive these books.

Tuesday, 26 August 2008

Viral spread of webmail...

Back in April I blogged about how, for students at least, email had come to mean webmail.  

Well we've seen an interesting phenomenon. Our current public webmail system is Sun Java Communications Suite 5 using the UWC client, but we also ran up the new version 6 (Convergence) on a test system a few weeks ago to see how it went and to play with it internally. We didn't bother protecting it, or restricting access, as we were wanting to do some user testing on the interface.

Well we've certainly got that. Somehow, even before we've started any formal testing of the system the url has leaked out and spread through the student community with sixty or seventy people logged into it at any time during the day. I think we might have got user acceptance ...

[if you're interested in Convergence, Sun have a demo system. You will need a username and password. I don't think they're very secret, either your local Sun account manager can get you them or else look on the Sun Communications suite website : at a pinch mail me]

Monday, 25 August 2008

Reconstructing Minoan wall paintings ...

Interesting article about a group at Princeton who have developed a computer system that uses pattern matching to reconstruct Minoan wall paintings on Thera. Very similar in concept to the system developed in Germany to digitally glue together the shredded Stasi files. I suppose the questions are: (a) could one use such a system to combine papyrus or manuscript fragments held in different collections to do virtual reconstructions of the documents, and (b) could one then pipe the reconstruction into a handwriting recognition system to recover the text for further analysis?

Language diversity in the Caucasus

On the back of the unfortunate conflict in Georgia, an interesting piece in the IHT on the degree of linguistic diversity in the Caucasus. Despite having studied Russian years ago, including slogging through and later enjoying Tolstoy's short stories set in the Caucasus, I'd never quite clicked that there were that many ethnicities, languages, cultures in the area. Nor had my interest in Byzantine history helped much despite the close links between Armenia and Georgia, to name but two and the Byzantine empire.

One thing that did resonate with me was the comment about the Ossetian lexicon being burned. Nearly twenty years ago, shortly after the Iraqi invasion of Kuwait, a small Yemeni man turned up at the Computer Centre at York, where I was working on data recovery and document format conversion, with a bag of floppy disks. He had been working in Kuwait and the disks contained all that was left of his research notes - most had been stored securely on a server at Kuwait University, as they should have been, where they were backed up properly and so on - except that the Iraqis decided to use the server and its disk stack for target practice.

I did get most of his data back, and he thanked me with a present of wonderful fresh coffee beans from his father's farm - a wonderful thank you and something that makes it all worthwhile.

Tuesday, 19 August 2008

reading old documents

reading old documents can be difficult - incomplete text, blobby type and all the rest. There have been fairly successful automated attempts, about which I've previously blogged elsewhere:

Searching Manuscripts Electronically posted Mon, 13 Feb 2006 09:29:51 -0800

Digitisation of historical records is fine, but all you end up with from digitisation projects for historical documents is a series of high resolution images, which may be easier to work with and may increase access, but does nothing for search. Printed documents are more or less OK for search. Scanned and OCR'd versions of printed books, even very old printed books such as those from renaissance Italy, are fine, even if you do sometimes need to 'teach' the OCR software how to deal with a non standard font. Manuscripts have, however, up to now been a no-no.

The only way to make an electronic text was to type it in by hand and mark it up using an encoding schema such as those developed by the Text Encoding Initiative Consortium.  


Now comes news of a really clever idea. Alan Smeaton's research group has been looking at shape recognition software - basically software to recognise objects such as cars and planes in photographs - trickier than it seems, as you need to be able to find an object and recognise it from any angle, something humans can do easily but machines find hard. For example, from the window of my office I can see a car park containing a mixture of sedans, hatchbacks and SUVs, all by different manufacturers and all different colours, but I can recognise them all as cars. The clever thing about Smeaton's software is that it can look at an image and twist it to match a category, so it can tell that a Peugeot hatchback, a big Ford sedan, and a Subaru Forester are all cars.

On a whim Smeaton fed digitised images of George Washington's letters into his software, and it recognised an 'A' as an 'A', a 'B' as a 'B' and so on - all of which was pretty impressive, because while George Washington was taught to write in an age where legibility was prized - handwriting being the only real means of communication other than face to face discourse - like all of us his handwriting got sloppier (and more variable) as he got older and busier.

Smeaton has also tried this on digitised medieval manuscripts. These were actually easier to handle, as the monks were going for legibility, and hence repeatability.

Smeaton has now obtained funding from Google, among others, to develop this as a search tool for digitised manuscripts - essentially a sort of plastic OCR that copes with variation. Copperplate and other highly repeatable handwriting - and I would guess not just in Latin script - appears to be in reach, but I would guess that highly variable scribbly script, such as in diaries, especially from the C20 and C21, is not. This is because in the last hundred years or so handwritten documents were usually for personal consumption only, with most other documents being typescript or latterly computer printed, and hence the handwriting is subject to greater variation (aka scribble). This may mean that the TEI-C folks are still in business, either doing difficult cases by hand or correcting errors in shape recognised texts.

[Years ago, I came across another object recognition project, which was to write a naked people detector. Apart from the algorithm's use as an engine for censorware, it's a genuinely hard problem, given that people are all shapes and sizes, are photographed from all sorts of different angles, and come in two sexes - both of which have nipples - meaning you can't simply cheat by guessing something is a torso and then deciding that if it's got one or two round red-brown circles two thirds of the way up, it's naked and not for public consumption.]

Now there's an interesting alternative method - use the human eyeball, by extracting text from old, difficult to process documents and then using the extracted text in captchas. Which is an interesting idea.
Now if you put these two approaches together, would it work for reading ancient manuscripts or cuneiform tablets, and could you make the system self learning?