Stuff, geeky stuff: 10/01/2008

Monday, 27 October 2008

Genghis Khan and the optiportal

For those of you paying attention in the back, the link I twitted about searching for the tomb of Genghis Khan makes reference to the use of an Optiportal to display the findings:

Explains Lin : "If you have a large burial, that's going to have an impact on the landscape. To find Khan's tomb, we'll be using remote sensing techniques and satellite imagery to take digital pictures of the ground in the surrounding region, which we'll be able to display on Calit2's 287-million pixel HIPerSpace display wall. ...

which sounds an interesting use of the technology for large scale visualisation work.

Wednesday, 22 October 2008

mobile printing (again!) ...

Going round in circles on this one. Way back last year I suggested a mobile printing solution that essentially had students printing to a pdf file and uploading the pdf file for later printing. The idea of using pdf rather than postscript was to get round the problem of students having to install a generic postscript driver and configure it.

It was a fair solution, and one that bears some more work. The scenario is that the student uploads the pdf file to their workspace and then has to log in to a web based application to select the file and queue it for printing, file conversion being done by running pdf2ps and queueing the result as a print job.

However the landscape has changed between then and now. What is required is a mobile printing solution that will interwork with whatever print/quota management solution is deployed.

The crude solution is quite simple. We create an lpd holding queue as a generic postscript queue and users spool to it. This is messy, and users need to authenticate in some way, meaning that it's probably not on given the diversity of operating systems out there. However the idea is attractive - users then log in to a web application, authenticate again, and then select the job they want to print, which is passed to the backend print system. Lpd is not a good solution as it doesn't do authentication. IPP does, and in fact we would use ipp based printing with authentication. However installing an ipp printer means either distributing a print driver or having users configure a printer locally. To judge the complexity or otherwise of doing this, have a look at this example from a US university.

We could combine the ideas into something more elegant. Users still need to generate pdf files, which probably means windows users installing CutePDF (or using open office to export to pdf).

Once the user has generated the files they then open a connection and log in to a web app.

They are presented with a list of files and an option to print them now.

They also get an option to (a) upload a file to print later (b) print a file now.

Option (a) puts the file in the spool space for printing later.

Option (b) takes the pdf file and spools it with pdf2ps to a printer. Users can also be given an option as to which printer to print to. The print job is handed over to the backend print system, the user's print quota is debited, and so on.
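As a rough sketch of what option (b) might look like behind the web app, assuming Ghostscript's pdf2ps and the lp command are available on the server: the quota table here is a hypothetical in-memory dictionary standing in for whatever print/quota management system is actually deployed, and the function names, paths and printer name are made up for illustration.

```python
import subprocess
from pathlib import PurePosixPath


def build_commands(pdf_path, printer):
    """Return the two command lines for option (b): convert, then queue."""
    pdf = PurePosixPath(pdf_path)
    ps = pdf.with_suffix(".ps")
    convert = ["pdf2ps", str(pdf), str(ps)]  # Ghostscript's PDF to PostScript wrapper
    queue = ["lp", "-d", printer, str(ps)]   # hand the PostScript file to the named queue
    return convert, queue


def print_now(pdf_path, printer, quota, user, pages, run=subprocess.run):
    """Convert and queue a job, then debit a hypothetical quota table.

    `quota` is a plain dict of user -> pages remaining. It is an assumption
    for this sketch, not the interface of any real quota system.
    """
    if quota.get(user, 0) < pages:
        raise ValueError(f"{user} has insufficient print quota")
    for cmd in build_commands(pdf_path, printer):
        run(cmd, check=True)  # stop if conversion or queueing fails
    quota[user] -= pages
    return quota[user]
```

Option (a) would simply store the uploaded file in the user's spool space and call something like print_now later, when the user selects it.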

Advantages:

Users can upload and print the files from anywhere
Users can upload but defer printing of files until they are ready to print
Users only need to log in to the print management application once

Disadvantages:

Users need to create pdf files
Printing is not seamless - users need to log in to the print management application

If we go for an ipp only solution:

Disadvantages:

Users need to install and configure a remote printer
Users need to log in to a separate print management application to release the print job
Users do not have the option of immediate print

Advantages:

Printing is seamless

And yes we probably could offer both options in some weird combination ...

Monday, 20 October 2008

Shibboleth and the shared cloud

Increasingly universities, and by extension university researchers, are in the collaboration game, not just within institutions but across institutions. And the collaboration game requires access to shared resources, a blog server there, disk space here and a compute resource over there - essentially a cloud of shared resources.

And then one comes up against the 800kg problem of authentication. Originally all this collaboration was a bit ad hoc - a research group needs a blog - install ubuntu in server mode on an old pc in a corner of the lab, install apache, php and wordpress. Create some local accounts, not forgetting one for Joe who'd moved to another institution, and hey presto, we were blogging. Spontaneous, effective and fun - well at least for those research groups that had an IT literate postgrad, who then moved on elsewhere and no one knew much about maintaining it.

Or take the other scenario, the middle aged lecturer in middle english who starts this really interesting blog on the divergence of Frisian, English and Dutch around the time of the great vowel shift. Of course he doesn't know much about blog software so he gets a hosted wordpress or blogspot account and then invites a few of his mates to contribute posts. Very ad hoc, very spontaneous and totally invisible to any research quality assessment exercise.

But using small scale resources means that we don't need to have complex authentication, and because everyone's mates, everyone trusts each other.

The only problem is that these blogs increasingly represent a research output and a form of scholarly communication, which means that universities want to host them, if only to ensure they're backed up. And hosting of course means hosting them in a multiuser environment with properly provisioned accounts, something that a lot of blog software isn't really designed for, hence projects like lyceum, wordpress-mu and their various ldap plugins.

And that's where it stops. Authentication is only within the institution's ldap domain, or in the case of some of the large system universities in the States, the particular college's ldap domain, effectively killing the spontaneity of collaboration stone dead.

Fortunately there is an answer - shibboleth, which has kind of been a wide area authentication mechanism without a purpose. Everyone thinks it's a good idea, but really, until collaboration is required, it remains mere geekery. The joy of shibboleth is that while the implementation can be no easy exercise, for the end user it can be as easy as using openID to gain access to services. It does also require that the shared resources are shibbolized to work with a shibboleth based mechanism. Given that many of the products required are open source this is less of a problem than it might be.

Also, as shibboleth gives users control over what attributes are released, the amount of information disclosed is consensual.

The only real problem remaining is a mechanism to provide access for people affiliated to institutions with no IdP, or visitors from outside of the academic world. There also needs to be a mechanism for institutions to 'sponsor' non-affiliated invitees to get a non-affiliated shibboleth account that may be more restricted in scope but which will allow them to work with groups of affiliated researchers across institutions. Such a service could be provided on a per institution basis, or on some other basis, for example one provided by the government for employees engaged in collaborative research rather than on a per agency basis.

The mechanics don't actually matter; it can be one solution or a mix of solutions. The main problem is to ensure that whatever solution is found encourages openness and the free flow of communication both within institutions and across institutions.

technology meets the electoral process

A small win for technology in the electoral process last Saturday at the ACT elections for a new territory government.

Up to now the process has been resolutely nineteenth century, turn up, give your name to the electoral clerk, who then riffles through a massive bound listing of the electors' register, asks you to confirm your address, and rules off your entry to show you've voted and then hands you a ballot paper.

As voting's compulsory, I guess that some group of poor bastards had to collate all the entries to make sure all us good citizens had voted - preferably only once - and issue infringement notices for anyone who didn't and didn't have a good excuse.

Well last Saturday was different. Queue up, talk to the elections clerk, who uses a wireless pda to confirm your registration details against the voters' database before handing you your ballot paper. No riffling, no collating. No costs for printing the electoral roll.

What's even nicer is that they borrowed the pdas from the Queensland electoral commission so no acquisition cost, just the software development and network costs.

What I especially liked about it was its sensible, low key, pragmatic use of technology and unlike voting machines with their endless complications and audit requirements, totally non-controversial.

Saturday, 18 October 2008

Putting Twitter to work

Over the last few weeks I've been experimenting with twitter, or more accurately using twitter to experiment with microblogging.

So far so geek.

Now I have a real world application for it - providing live system status updates. One thing we lack at work is an effective way of getting information that there is a system problem out to people. Basically we email notifications to people.

However if we can apply sufficiently rigorous filters to the output from various monitoring systems such as nagios we can effectively provide an input feed into a micro blogging service. This then produces an RSS output feed which people can then refactor in various ways elsewhere on campus.
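To give a flavour of what 'sufficiently rigorous' might mean, here is a minimal sketch of such a filter. It assumes alerts arrive as simple dictionaries with service, state and detail fields (a made-up shape for illustration, not the format nagios itself emits), that only a hand-picked set of user-facing services should ever reach the feed, and that messages are cut to Twitter's 140 character limit. Posting to the microblogging service is left out.

```python
MAX_LEN = 140  # Twitter's message limit at the time of writing


def to_status_updates(alerts, public_services):
    """Filter monitoring alerts down to short, user-facing status messages."""
    updates = []
    last_state = {}  # last announced state per service
    for alert in alerts:
        service, state = alert["service"], alert["state"]
        if service not in public_services:
            continue  # internal noise, not for the public feed
        if state not in ("CRITICAL", "OK"):
            continue  # ignore WARNING and UNKNOWN
        previous = last_state.get(service)
        if previous == state:
            continue  # repeat notification, already announced
        last_state[service] = state
        if state == "OK" and previous is None:
            continue  # never reported as down, so no recovery to announce
        summary = "problem reported" if state == "CRITICAL" else "service restored"
        updates.append(f"{service}: {summary} - {alert['detail']}"[:MAX_LEN])
    return updates
```

The point of the state tracking is that people following the feed see one message when something breaks and one when it comes back, rather than every repeat notification the monitoring system generates.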

And of course we can inject human generated alerts into the service and use our own tiny url service to pass urls of more detailed technical posts on a separate blog when we have a real problem.

Also, we could glue the feeds together, much as I have with this blog in a webpage where people can check and see if there is a problem, or indeed what's happening in the background - good when you have researchers in different timezones wanting to know what's going on, and it gives a fairly live dynamic environment.

All pretty good for a couple of hours creative buggering about to get my head round Twitter ...

Wednesday, 15 October 2008

Optiputer

This morning I went to a presentation by Larry Smarr on Optiportal.

Optiportal is one of the spinoffs of the Optiputer project. Essentially it's a large scale visualisation system built out of commodity hardware, cheap to build, and essentially made out of some standard LCD displays and off the shelf mounting brackets. The displays are meshed together to give a single large display. So far so good. Your average night club or gallery of modern art could probably come up with something similar. However it was more than that: it allowed large scale shared visualisation of data, and this was the more interesting part - using high speed optical links to move large amounts of data across the network would allow people to see the same data presented in near real time - rather like my former colleague's rather clever demo of audio mixing across a near-zero latency wide area network. It could equally be a simulation of neural processing or an analysis of radio telescope data (or indeed, continuing the digital preservation of manuscripts theme, a set of very high resolution images of source documents for input into downstream processing).

And it was the high end data streaming that interested me most. Data is on the whole relatively slow to ship over a wide area network, which is why we still have people fedexing terabyte disks of data round the country, or indeed people buying station wagons and filling them with tapes. Large data sets remain intrinsically difficult to ship around.

When I worked on a digital archiving project one of our aims was to distribute subsets of the data to remote locations where people could interrogate the data in whatever way they saw fit. As these remote locations were at the end of very slow links, updates would have been slow and unpredictable; a set of update DVDs would work better. If we're talking about big projects such as the square kilometre array project we're talking about big data, bigger than can be easily transferred with the existing infrastructure. For example transferring a terabyte over a 100 megabit per second link takes (8*1024*1024*1024*1024)/(100*1024*1024)/60/60 hours, roughly 23 hours - or almost a day - and that's without allowing for error correction and retries. Make it a 10 megabit link and we're talking 10 days - even slower than AustraliaPost!
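For anyone who wants to check the back-of-envelope figure, here it is as a few lines of Python. The function name is just for illustration, and it uses the same binary units as the sum above (a terabyte as 1024^4 bytes, a megabit as 1024*1024 bits) and assumes the link runs flat out with no protocol overhead.

```python
def transfer_hours(size_bytes, link_megabits_per_second):
    """Idealised time to push size_bytes down a link, in hours."""
    bits = size_bytes * 8
    bits_per_second = link_megabits_per_second * 1024 * 1024
    return bits / bits_per_second / 60 / 60


TERABYTE = 1024 ** 4

print(round(transfer_hours(TERABYTE, 100), 1))      # 23.3 (hours)
print(round(transfer_hours(TERABYTE, 10) / 24, 1))  # 9.7 (days)
```

Using decimal units instead (10^12 bytes over 10^8 bits per second) gives 80,000 seconds, or a little over 22 hours, so the conclusion is the same either way.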

What you do with the data at the other end is the more interesting problem given the limitations of computer backplanes and so on. Applications such as Optiportal serve to demonstrate the potential; it's not necessarily the only application ...

[update: today's Australian had a different take on Smarr's presentation]

Tuesday, 14 October 2008

the twitter feed

If you look at this blog in a browser you'll notice a list entitled 'interesting links' on the right hand side of the window. This is actually fed from twitter via twittermail, and I've been doing this as an experiment for just over a month now.

As an experiment I think it's quite successful, if only for me to (a) keep a list of interesting links in my sent mail and (b) share interesting things I've been reading with anyone interested enough to follow this blog.

Of course it doesn't have to be twitter and twittermail, various other microblogging services provide the tools to create a similar effect - here it's twitter and twittermail purely because I found the correct off the shelf tools in their software ecology.

Personally I think this anonymous sharing model is more useful than the social networking 'share with friends' closed group model - it allows people who may be interested in a topic and who follow your blog also to follow the links you find interesting. Social network providers of course want to use the links to add value for people who are part of the network, to keep them hooked, or to sell targeted advertising to them, or whatever.

In fact it's probably almost worthwhile also providing a separate rss feed for the interesting links as it's not beyond the bounds of probability that someone finds the collection of links more useful than the original blog.

bloglines.com

People have accused me of being fixated on google products. Not true, even if it may look that way at times. For example I still think that Zoho provides a richer more flexible set of online tools than google apps.

Likewise I've stuck with Bloglines as a blog aggregator in preference to google reader. In fact I've been using bloglines since 2004 which must mean something. And I've been happy with it - performance is rock solid, or rather was. Since the last upgrade it has claimed that feeds do not exist (including well known ones like guardian.co.uk) and if you re-add a feed it will work for a bit then stop. Response is poor compared to before the upgrade (being ever so anal I tend to read the feeds at the same time in the morning - so I think I can claim this even if it's anecdotal).

So one more bit of me has been assimilated by the borg - I've moved over to google reader. Not such an elegant interface, but more reliable, and given that my major reason for reading rss feeds is industry news and updates, it's worth trading elegance for performance ...

Monday, 13 October 2008

not installing plan 9 ...

People react to having a blank day in their calendar in various ways. Some people (apparently) go looking on the web for pictures of people of the opposite gender in various states of undress, some play interactive Sudoku games. Being a sad anorak I do neither, I build test installs of various operating system distros or play with various bits of software.

Today was an operating system day, and finally it was time for Plan 9.

Plan 9 irritates me as an operating system. It gnaws away at the corners of my mind saying 'look this could be interesting' like an itch you want to scratch and never do.

The whole idea of a simple distributed operating system to link together nodes for grid based applications is interesting and perhaps useful, but I'd never gone so far as to build an installation. Well this morning I did, building an instance on VirtualBox rather than a real physical machine.

After a little trial and error with the network settings (PCnet-PCI II (NAT) works fine) I started the install, taking the defaults all the way through on the basis that I didn't know what I was doing, though I had read the documentation. Installation was glacially slow, much slower than usual when building things under virtualbox, but basically worked with only the odd exception, usually caused by disk retries. Basically worked, that is, until almost the last stage, unpacking the software image, when it failed.

Probably due to running it on a vm more than anything else. Add it to the list to try next time I get another old machine to play with...

Tuesday, 7 October 2008

Nine months sans Microsoft ...

Well, today marks nine months since we became a microsoft free household (not quite true, we still have an old w2k laptop hidden away just in case, but I've only had to get it out once to configure a linksys wireless router ).

Now, I am not an anti microsoft zealot. Yes, I think Microsoft's business practices were not the best, but then all through the nineties and well into the first few years of the present decade there was no real alternative as a mass market desktop operating system - linux wasn't (isn't ?) there, and Apple seemed to lose the plot, and it took a long time to come back with OS X. The same in the application space. The competitors were as good if not better, and they took a long time a dying. Microsoft got where it was by either having products to which there was no serious alternative, or by convincing people that there was no serious alternative to Office and the rest. That was then and this is now, we work with the present reality.

Judi isn't an anti anything zealot as far as computers go. Computers are tools to her. Email, web, grading student reports, writing course notes and assignments and that's it. Providing she can send emails and buy stuff online, research stuff and get into the school email system she's fine.

Our decision to go microsoftless was purely pragmatic. I can do most of what I want to do with abiword, open office, google apps, firefox, zoho and pan, and I can do this on a couple of fairly low powered machines - an old ppc imac and a pc I put together for $83. Judi likes to play with digital photography, so we bought an imac purely because the screen was nicer. I thought we might have to buy a second hand windows pc as well but that hasn't turned out to be the case. Firefox, safari, google apps and neo office have let her get her work done, even coping with the docx problem.

The only problems we've had are with bathroom design catalogues (canned autorun powerpoint on a cd) and an ikea kitchen design tool. These were things we could work around easily and that turned out to be totally inessential. Other than that it's fine. Emails get written, appointments made, books bought, assignments graded, documents formatted.

So we've proved you can live without windows. We've also proved we could live with windows and not linux or OS X - no operating system has a set of killer apps for middle of the road usage.

And more and more we're using online apps - at what point does zonbu or an easy neuf become a true alternative? (and when do we get vendor lock in and all the rest in the online space?)

Sunday, 5 October 2008

what is digital preservation for?

I have been thinking a lot about digital archiving/preservation and what the use behind it is. In part I've been doing this to clarify my thoughts, as while the technologies of digital archiving and preservation are well understood the purpose is not, and often different purposes are conflated. So let's step through the various options:

Digital Archiving as preservation

Here one essentially wants to keep the data for ever, across hardware upgrades and format changes. Essentially what one is doing is taking a human cultural artifact, such as a medieval manuscript or a recorded aboriginal dreamtime story, making a digital version of it and keeping the file available for ever.

This has three purposes:

1) Increased access - these artifacts are delicate and cannot be accessed by everyone who wishes to. Nor can everyone who wishes to have access afford it. While the preservation technology is expensive, access is cheap - this is being written on a computer that cost me $83 to put together. This also has the important substrand of digital cultural repatriation - it enables access to the conserved materials by the originators and cultural owners. Thus, to take the case of a project I worked on, Australian Aborigines were too impoverished to conserve photographs and audio recordings of traditional stories and music; digital preservation allows copies of the material to be returned to them without any worries about its long term preservation.

2) Long term preservation. The long term conservation of digital data is a 'just add dollars' problem. The long term preservation of audio recordings and photographs is not. And paper burns. Once digitised we can have many copies in many locations - think clockss for an example design - and we have access for as long as we have electricity and the Internet.

3) Enabling new forms of scholarly research. This is really simply an emergent property of #1. Projects such as the Stavanger Middle English Grammar project are dependent on increased access to the original texts. Without such ready access it would have been logistically impossible to carry out such a study - too many manuscripts in too many different places.

Digital archiving as publication

This seems an odd way of looking at it but bear with me. Scholarly output is born digital these days. It could be as an intrinsically digital medium such as a research group's blog, or digitally created items such as the TeX file of a research paper or indeed a book.

This covers e-journals and e-press as well as conventional journals, which increasingly also have a digital archive.

These technologies have the twin functions of increasing access - no longer does one have to go to a library that holds the journal one wants - and of massively reducing the costs of publication.

Of course there's a catch here. Once one had printed the books and put them in a warehouse the only costs were of storage. These books were then distributed and put on shelves in libraries. Long term preservation costs were those of a fire alarm, an efficient cat to keep the depredations of rats and mice in check and a can of termite spray. OK, I exaggerate, but the costs of long term digital preservation are probably higher, in part due to the costs of employing staff to look after the machines and software doing the preserving and making sure that things keep running.

The other advantage is searchability. One creates a data set and then runs a free text search engine over it. From the data come citation rankings, as loved by university administrators to demonstrate that they are housing productive academics (e-portfolios and the rest), and also the creation of easy literature searches - no more toiling in the library or talking to geeky librarians.

Digital preservation as a record

Outside of academia this is seen as the prime purpose of digital preservation. It is a way, by capturing and archiving emails and documents, of creating a record of business, something that government and business have always done - think the medieval rent rolls of York, the account ledgers of Florentine bankers, and the transcripts of the trial of Charles Stuart in 1649. While today they may constitute a valuable historical resource, at the time they served as a means of record to demonstrate that payment had been made and that due procedure had been followed.

In business digital preservation and archiving is exactly that, capturing the details of transactions to show that due process has been followed, and because it's searchable, it's possible to do a longitudinal study of a dispute. In the old days it would have been hundreds of APS4s searching through boxes of memo copies, today it's a pile of free text searches across a set of binary objects.

Digital archiving as teaching

When lecturers lecture, they create a lot of learning aids around the lecture itself, such as handouts and reading lists. The lecture is often a digital event in its own right, with a PowerPoint presentation of salient points or key images, plus the lecture itself.

Put together this creates a compound digital learning object and something that is accessible as a study aid sometime after the event.

While one may not want to keep the object for ever, one may wish to preserve either components for re-use or even the whole compound object, as the course is only offered in alternate years.

However these learning objects need to be preserved for reuse, and in these litigious times, to also prove that lecturers did indeed say that about Henry II and consequently students should not be marked down for repeating it in an exam.

Conclusion

So digital preservation and archiving has a purpose, four in fact. The purposes differ but there are many congruences between them.

Fundamentally the gains boil down to increased accessibility and searchability.

The commercial need for such archiving and search should help drive down the cost of preservation for academic and pedagogic purposes. Likewise academic research relevant for digitisation, e.g. handwriting recognition, and improved search algorithms should benefit business and justify the costs of academic digital preservation.

Thursday, 2 October 2008

Viking mice, the black death and other plagues

Interesting article [New Scientist, Scotsman] about how most house mice in Scotland, Wales and Ireland have a characteristically Scandinavian genome, while mice in other parts of Britain have a genome associated with bronze age migrations of the first farmers, suggesting that it was only in Viking times that the populations were sufficiently dense to sustain a mouse population with more intensive grain cultivation.

Likewise, prior to the black death we know the population had undergone a fairly rapid expansion, and hence could support the large rodent population in towns required as a reservoir for the plague bacterium. It has been hypothesised that the black death was not a significant problem until it ended up infecting the large urban populations of Alexandria and Constantinople as part of the Plague of Justinian in 548. Large population, seaport, lots of rats.

There's also recently been a suggestion that the same sort of thing has happened with HIV in Africa - it had probably always been there but was not a significant problem until significant urbanisation in the twentieth century, with accompanying population densities and greater opportunity for random sexual encounters.

So what does this mean for seventh century Britain? It's often argued that there was a plague event that preferentially devastated the sub-Roman communities in the west of the island rather than the areas under Saxon control, providing an opportunity for further expansion westward by the Saxons. Does this mean that the plague was in the population and the outbreak was a result of higher population densities in the west capable of supporting a plague reservoir, or does it simply mean that continued contact with the Byzantine east exposed the sub-Roman population to greater risk of infection, as they had ports and the remnants of urban communities surrounding the ports to form an initial entry point for the plague?

If the former, it suggests that the sub-Roman successor states were capable of holding their own despite the loss of a lot of the prime agricultural territory, but to make a decision we need to know more about trade patterns, for example were the sub-Roman populations also trading grain with northern France?