Wednesday, 24 August 2011

Help wanted

It's fair to say this blog has a fairly mixed audience - some read it for the historical speculation and some read it for the information science stuff.

If you read it for the latter you might be interested in knowing that as part of my day job, I'm currently looking for two people to work on a set of ANDS funded data capture and Seeding the Commons projects (and incidentally to build an archive solution along the way). Details and information on how to apply at if you're interested. For background on what we're trying to accomplish here take a gander at Extracting, Transforming and Archiving Scientific Data at

Please note that the timescale is fairly tight on submitting applications. Applications need to be lodged by 1700 Canberra time (UTC+10) on 04 September (which coincidentally is the traditional anniversary of the deposition of the last western Roman emperor ...)

SSO, social and institutional identities

I've periodically ruminated on the future of university information services and how if everything is outsourced, all we are left with is access mediation, ie providing institutional logins that allow access to institutional resources, federated institutional resources such as a research cloud, and external resources such as Jstor, for which you gain access because your home institution has paid a subscription to provide access to the resource.

Now if everything was happily outsourced via the institution, say Google Apps, or Live for Edu, we would have something quite nice and tidy. The id you use would still be tied to the institution and therefore you can have institution level access.

But in fact we see that people are starting to blend resources accessed through their social identities with their institutional work. Essentially self outsourcing.

And of course there is no reason why they shouldn't - for example if someone finds Windows Live and OneNote ideal for managing research notes why shouldn't they, especially as they can access it from almost everywhere and from a range of platforms?

Your social identity is your google id, twitter id, Live login etc etc, and as increasingly there are a lot of services that invite you to login with your google id or twitter id, people start doing exactly that, rather than creating yet another id and password.

The UX lesson is that people don't like having multiple id's and passwords. The other lesson is that Google and Twitter are winning the account federation war.

The trouble is that anyone can have a social id - even my cat has a blog and a twitter account and consequently social id's are reputation free.

This of course doesn't matter in 99% of cases. It doesn't even matter when creating a virtual organisation, say an online collaboration, until we get to the problem of access to resources.

Now as people self outsource, this will be a growing problem - the Internet 2 wiki has some real world use cases and I'm sure you can add your own without too much thought.

The solution is probably a gateway of some type where people self declare their preferred social id's and link them to their institutional id's - this is not so different from what happens with a reverse proxy service, where for example, if you want to work from home and access a resource that allows access on the basis of ip address, you log into an institutional reverse proxy service with your institutional credentials to gain access to the resource.

The logic is quite instructive:
  • you log in with your uni-id to prove you belong to the university
  • the service provides you with a connection as if you have a campus ip address
You could then imagine this enhanced scenario:
  • you are logged into google and your google credentials are cached locally
  • you connect to the proxy server.
  • The proxy server inspects your machine and notices your google credentials are set. It looks up your gmail address and sees you have previously linked that to your institutional id
  • it asks you to type your institutional password to confirm it's really you
  • just as in CAS it creates an obfuscated token it passes to all services that request it to allow access to institutional resources including federated resources
And, given the Jasig/CAS Sakai tie up, allows you access to a collaboration platform which allows you to share resources with your colleagues in a collaboration or virtual organisation, providing they all have an electronic institutional affiliation.
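
The enhanced scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class and method names (IdentityGateway, link, login, resolve) are made up for the example, the in-memory dictionaries stand in for whatever directory a real gateway would use, and the random token is merely in the spirit of a CAS ticket rather than an implementation of the CAS protocol.

```python
import hashlib
import hmac
import secrets


class IdentityGateway:
    """Toy gateway linking self-declared social ids to institutional ids.

    Hypothetical sketch: names and storage are assumptions, not a real service.
    """

    def __init__(self):
        self._passwords = {}  # institutional id -> (salt, password hash)
        self._links = {}      # social id -> institutional id
        self._tokens = {}     # opaque token -> institutional id

    @staticmethod
    def _hash(password, salt):
        return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

    def register(self, inst_id, password):
        """Stand-in for the institution's existing account database."""
        salt = secrets.token_bytes(16)
        self._passwords[inst_id] = (salt, self._hash(password, salt))

    def _check_password(self, inst_id, password):
        if inst_id not in self._passwords:
            return False
        salt, expected = self._passwords[inst_id]
        return hmac.compare_digest(expected, self._hash(password, salt))

    def link(self, social_id, inst_id, password):
        """Self-declare a social id; the institutional password proves ownership."""
        if not self._check_password(inst_id, password):
            return False
        self._links[social_id] = inst_id
        return True

    def login(self, social_id, password):
        """Look up the linked institutional id, confirm the password, issue a token."""
        inst_id = self._links.get(social_id)
        if inst_id is None or not self._check_password(inst_id, password):
            return None
        token = secrets.token_urlsafe(32)  # opaque: reveals nothing about the user
        self._tokens[token] = inst_id
        return token

    def resolve(self, token):
        """What a resource would call to map a token back to an institutional id."""
        return self._tokens.get(token)
```

A resource behind the gateway would only ever see the token and would call resolve() to find out which institutional id it stands for; the social id never needs to carry any reputation of its own.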

Tuesday, 23 August 2011

Operating system streaming

I'm having trouble here. There are a couple of news stories circulating [ 1, 2] today about Microsoft seeking a patent for operating system streaming.

From the patent description it doesn't seem that much different from staged remote booting of diskless workstations - something that seems very 1990's.

To recap, a long time ago, when God wore short trousers and I didn't have white hair, Edmund Sutcliffe and I implemented a staged remote boot environment at the University of York - Edmund had previously done some work on this at Bangor.

The way it worked was this:

Machines had no internal hard disks. When a machine fired up, the boot rom on the network card sent out a broadcast message and a bootserver replied and allowed the client machine to download a boot image, which was essentially a dd'd 1.44Mb floppy image containing an operating system, and network drivers, plus some configuration information.

In the original version there was a single boot image. In later versions there were multiple boot images due to the need to support different hardware configurations. We had a backend database that allowed the bootserver to lookup the ether address of the calling device, and provide a boot image based on that ether address.

This proved rather useful as it allowed us to have special purpose as well as general boot volumes.
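
As a rough illustration of the lookup, here is a minimal Python sketch. It is not the original implementation: the dictionary stands in for the backend database, the ether addresses and image names are invented, and falling back to a general image for unknown machines is an assumption made for the example.

```python
# Hypothetical table standing in for the backend database:
# ether address -> boot image to serve. All values are invented.
BOOT_IMAGES = {
    "00:00:c0:12:34:56": "mail-terminal.img",
    "00:00:c0:ab:cd:ef": "catalogue.img",
}

# Assumption: machines not listed get a general purpose image.
DEFAULT_IMAGE = "general.img"


def normalise_mac(address):
    """Reduce common ether address spellings to lower case aa:bb:cc:dd:ee:ff."""
    digits = "".join(ch for ch in address.lower() if ch in "0123456789abcdef")
    if len(digits) != 12:
        raise ValueError(f"not an ether address: {address!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def select_boot_image(address, table=BOOT_IMAGES, default=DEFAULT_IMAGE):
    """Return the boot image for the calling device, or the general one."""
    return table.get(normalise_mac(address), default)
```

Keying on the ether address is what makes special purpose machines cheap: adding a mail terminal is one more row in the table rather than one more thing to configure on the machine itself.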

One example of a special purpose volume was a dedicated workstation access for email reading alone where we took some old computers with limited memory, network cards and no hard disks. I then developed a specialist 1.44Mb floppy image with an operating system on it, a tcp stack and a locked down version of kermit (actually ms-kermit and freedos) in terminal emulation mode that logged into an old Sun server and forced the user into pine. Logging out of the system forced the pc to reboot (basically we waited till we saw the word logout go past) to ensure a clean session for the user - basically a quick and dirty university email reading terminal - login, read mail exit and walk away. Another example was a variant on the same theme for dedicated library catalogue access.
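
The "wait till the word logout goes past" trick is easy to get subtly wrong, because the word can be split across two reads from the line. The original was done with ms-kermit, not Python, so the following is purely an illustrative sketch with an invented function name:

```python
def saw_logout(chunks, marker=b"logout"):
    """Return True as soon as marker has gone past in a stream of byte chunks."""
    keep = len(marker) - 1  # bytes to carry over between chunks
    tail = b""
    for chunk in chunks:
        window = tail + chunk
        if marker in window:
            return True
        # Keep just enough trailing bytes to catch a marker split across chunks.
        tail = window[-keep:] if keep else b""
    return False
```

In the email terminal the True result would be the cue to reboot the pc and hand the next user a clean session.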

Unlike these specialist environments, what the general models did, after starting the operating system and network, was to mount a virtual disk image that held the other operating system components and continue booting. We also experimented with creating a virtual disk in memory and copying components to that. Memory constraints meant that it didn't give us a significant advantage, so we went back to network volumes on topologically close servers for better performance. It also mounted a second disk image for applications and a third, writable, volume on a per user basis to allow users access to filestore to save their work.

As the user storage volume was sitting on a Unix server, they could potentially access it over the internet. As this was 1992 that meant ftp and terminal access, but nowadays it would mean webDav.

So, the boot sequence for general machines was:
  1. mount hardware specific startup volume
  2. mount the rest of the operating system and continue booting
  3. mount the user and application volumes
In the case of machines with hard disks, the sequence was much the same with the local disk being treated as cache and scratch space. This allowed a portable machine to have files copied to a directory on the local disk from a user's network storage, the machine disconnected from the network, and, provided it had a valid local install of the operating system, it could boot up locally and still access these files. I'd like to claim we provided some clever synchronisation a la Dropbox, but alas no, we didn't think of it at the time.

Now what we did was arguably clever for its time, as it meant that we had only a small number of boot volumes to maintain, and the separation between specific and component environments meant application deployment was pretty simple, but it was by no means unique.

While we used PC-NFS you could easily have built it using a range of other network solutions. What I am struggling to understand however is how does this differ in concept from Operating System Streaming ?

Wednesday, 17 August 2011

When did the great war become the first war?

Neatly blending the Long War theme and what Google's Ngram can tell us about Turkestan, my friend Tim Sherratt has, through his work on methods for harvesting data from Trove, produced a rather nice graph based on Australian newspaper articles to demonstrate when people started referring to the First War as the First World War rather than the Great War.

Interestingly, using Google's Ngram viewer one can see that the results are broadly similar for the English Corpus:

and for American English:

although the Brits were a little later to the piece:

Friday, 12 August 2011

30 years of the IBM PC

Today, 12 August 2011, marks 30 years since the introduction of the original IBM PC.

Personally, I've a lot to thank the PC for as it has kept me more or less gainfully employed for the last 30 years, through the rise of the clones - when just about anyone with a wrist strap and a torx driver seemed to be making clone pc's in their shed, the wordprocessor format wars - wordstar/word/wordperfect - I still have a wordperfect mug - operating systems - dos, windows, windows 95, NT, OS/2, Windows 2000, XP and windows 7 and of course not forgetting peripherals along the way - principally printers.

And while I like to accentuate the sexy, doesn't everyone, it's been the humble desktop computer and laptop that's been my mainstay. And even when I've strayed to other architectures and operating systems, I've got to admit that the original concept behind the IBM PC design has stayed good - allowing you to easily make and upgrade systems out of standard components as seen in my $83 linux PC - which was actually only $20 for the box, memory and disks, with bits stolen, begged or borrowed from other dead machines - the principal cost being a second hand Sun LCD screen.

And that's it exactly - the design took the world by storm because it was so open, and in the end it had so much momentum behind its general purpose open extensible architecture that many of the non Wintel suppliers ended up using the design and the components to get costs down. And whatever anyone tells you about PC's being dead, ignore them. Have you ever seen an ATM boot up? Or an airport self check in terminal, not to mention all these train station screens telling you windows has failed to restart correctly ...

Thursday, 11 August 2011

Sharing or Syncing ?

Thinking about the sharing large files conundrum, it's actually more complex than it first appears, as sharing with others is different from sharing with yourself.

Dropbox of course lets you share files with yourself, applications such as and yousendit make it easier to share files with others.

We therefore have two slightly orthogonal use cases:

Use case 1 (the scenario)
  • user wishes to share files or data with another user (or group of users) on campus or elsewhere. The files are too large to send via email, nor are they lodged on an open repository

Use case 2 (the syncany scenario)
  • user wishes to ensure that files held on a local directory on their personal computing device are backed up to a central location on a periodic basis
  • user wishes to share these files across multiple computing devices in various locations
  • user does not necessarily wish to share contents with other users
Use case #2 is essentially the same as an on demand or periodic backup, and also allows you to provide a synced backup store for devices with local storage that are periodically off the network such as travelling netbooks.
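
To make the "periodic backup" reading of use case #2 concrete, here is a minimal one-way sketch in Python. It is an assumption-laden illustration rather than how Dropbox, syncany or Asus's webstorage actually work: the function names are invented, the "central location" is just another directory, and deletions and conflicts are ignored entirely.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path):
    """SHA-256 of a file's contents, read in blocks to cope with big files."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def backup_changed(local_dir, backup_dir):
    """Copy new or changed files from local_dir to backup_dir.

    Returns the relative paths that were copied on this run.
    """
    local_dir, backup_dir = Path(local_dir), Path(backup_dir)
    copied = []
    for src in sorted(local_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(local_dir)
        dst = backup_dir / rel
        if dst.exists() and file_digest(dst) == file_digest(src):
            continue  # already backed up and unchanged
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        copied.append(rel.as_posix())
    return copied
```

Run from a scheduler whenever the machine is on the network, this gives the travelling netbook case: nothing happens while offline, and only the files that changed are copied on the next run.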

Incidentally Asus's webstorage does exactly that. If I hadn't let my subscription lapse (my bad) I could have experimented with doing this with the Ookygo for local files. However given that 95% of everything that I use the Ookygo for on the road is done via a browser I probably wouldn't have much in the way of a valid experiment ...

Friday, 5 August 2011

sending big files (part ii)

Arthur (@fatrat) suggested as a solution to the problem of sending big files. It certainly seems to tick the boxes:

  • you run it on a server owned by you - no nasty questions of trust and third party intermediaries using transit servers in Ktoznaetistan
  • you tie it into local authentication making it easy for your users to exchange files
  • it presumes that if you are a home user and are sending the file to an external user the external user is known to you
  • it does some standard address verification if the external user wishes to send a file to an internal user - again there is a presumption that the recipient will know/recognise the email address of the originator
and this works rather well in a consenting adults sort of way allowing people to exchange files. It also bypasses the rather ticklish problem of what to do with users outside of your shibboleth access federation, say from commercial partners of small research institutions without an IdP, or indeed from institutions overseas whose IdP is not part of a federation your home federation recognises.

I also guess it would be possible to add shibboleth authentication as an option for originators for added verification for cross institutional collaboration...

Thursday, 4 August 2011

file sharing - It's about trust

Building on my previous post about sending big files, I happened across the following in the wikipedia article on the Somali Shilling...

... Traders avoid the need to carry large amounts of Somali shillings by converting them to U.S. dollars and then wiring them to money houses in Somalia. Because identification can be easily forged, those seeking to pick up wired money are required to answer questions about their clan and kinship relations ...

What's interesting is that the de facto solution is based on private knowledge. So for data transfers, rather than send the key, one should perhaps think of a two or three factor response system, similar to those used by banks to establish your identity for online banking.
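
A minimal sketch of such a private knowledge check, in Python, might look like the following. The function names and the choice of asking two questions are assumptions for illustration; a real service would also need rate limiting and a sensible way of enrolling people, which is not shown.

```python
import hashlib
import hmac
import secrets


def _digest(answer, salt):
    # Normalise case and spacing so "  Bangor " and "bangor" match.
    normalised = " ".join(answer.lower().split())
    return hashlib.pbkdf2_hmac("sha256", normalised.encode(), salt, 100_000)


def enrol(answers):
    """answers: dict of question -> answer. Only salted hashes are kept."""
    record = {}
    for question, answer in answers.items():
        salt = secrets.token_bytes(16)
        record[question] = (salt, _digest(answer, salt))
    return record


def pick_challenge(record, count=2):
    """Choose which questions to ask this time."""
    return secrets.SystemRandom().sample(sorted(record), count)


def verify(record, challenge, responses):
    """True only if every challenged question is answered correctly."""
    ok = bool(challenge)
    for question in challenge:
        if question not in record:
            return False
        salt, expected = record[question]
        given = _digest(responses.get(question, ""), salt)
        # Check every answer, without stopping at the first wrong one.
        ok &= hmac.compare_digest(expected, given)
    return ok
```

The recipient of a file would be asked the challenged questions before the download is released, so nothing resembling a key ever has to travel with the file.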

I've also suggested in the past that such a solution would help in digital cultural repatriation in Aboriginal communities where, as custodian of digitised cultural heritage, you need to maintain the trust of the traditional owners of that heritage by putting in place measures that require people requesting access to demonstrate that they have the right of access under traditional law.

As a content sharing solution it also has the merit of not requiring people to remember passwords or do something sophisticated with encryption keys, but of course it does mean people having to register with the file sharing service, ie establish their bona fides, before being able to use it - which would constitute a barrier to adoption ...

sending big files

In this wonderful synced everywhere world we're sliding towards, there's a problem - sending or sharing big files. Or more accurately doing it between different workgroups located at different geographic or institutional locations.

In the old days of course it was simple - mailboxes were small and any (relatively) big files were transferred by ftp either from your machine, or via some server.

Nowadays mailboxes are considerably larger, but quite a few systems impose limits on the size of attachments, which can be a problem when sending verbose files such as scanned pdf's of contracts.

Now we could simply use ftp, or better sftp, but of course this has a problem - distribution. Using an ftp solution is reliant on the end user bothering to download the file. A proportion won't. And these days a greater proportion won't know how to use ftp. Email wins as all they need do is click on an icon and the file opens in Acrobat, Preview, or whatever.

Commercial services like Yousendit are a bit better. Even though conceptually simple as an http file upload/download service they generate an email with a link that you click on and the download happens. It's immediate and almost as good as clicking on an attachment.

However, to send a file you are entrusting your content to a third party. A third party elsewhere. And the server is in?

The Dropbox sign in fiasco tells us that however good third party services are we need to reserve judgement. Sure 90% of the data shared is cat pictures, but what about the 10% that is contracts or X-rays or something equally private or confidential?

In Australia we do at least have Cloudstor that uses Shibboleth to ensure that those you share with are members of the club but this has a disadvantage - you can't share with non Australian Access Federation people (or the NZ equivalent), ie you can't share with non university people or people in the Northern hemisphere eg UK or US.

So, do we have to trust a third party - for the moment yes, otherwise it's down to sending encrypted USB sticks through the mail. What perhaps we need is a third party encrypt and submit service that sends the key separately to the uploaded file ...

Wednesday, 3 August 2011

Waiting for the Dalai Lama

If you’re a regular follower of this blog, you’ll realise that one of the themes is central Asia. So when Librarything offered me the chance to review Waiting for the Dalai Lama by Annelie Rozeboom I was more than happy to do so.

You can find both my, and other, reviews on the Librarything website, or if you just want to read my review, the original is online as a Google Doc.

Some of the review was written on the plane back from my recent trip to Evanston. Any misspellings and mispunctuations are mine, even if I would like to blame the airline for the rather cramped journey …