Monday 23 June 2008

Open Suse 11 ...


I built an Open Suse 11 machine today. Admittedly it was on a VM, but it took me less time to build than to download and burn the DVD.

The install was slick, and by the use of preconfigured images and overlays pretty fast. Like a fool I chose KDE 4 rather than KDE 3 or GNOME as my desktop and consequently spent a few minutes faffing around working out how to drive the interface. I'm not sure I'm convinced that KDE 4 is an advance, but then I often feel like that about window managers.

What was impressive was the 'just worked' aspect of the install and its speed. Ubuntu and its variants take about four times as long to install as a VM, and I won't even start on OpenSolaris. I am quietly impressed by how good a product it is. I installed a few apps, opened and closed documents, and there was nothing curly to be seen. I might still prefer Ubuntu, but the Suse user experience is pretty slick - you really could just give it away free with PCs and be sure that real people could install and use it.

medieval emoticons?

Sydney Shep, a New Zealand scholar, has found evidence of typeset emoticons in nineteenth-century printing. Interesting.
However, the nineteenth century is not the fourteenth. Shep argues that some of the annotations in medieval manuscripts are the precursors of emoticons. Other scholars argue differently.

Whatever the truth, it's interesting how different cultures deal with texts and structure them. For me the question is: do you see similar scribbles in Greek and Roman manuscripts, be they highlighting or analogues of emoticons, adding metadata to the text?

University Computing part ii


SWW posted a follow-up comment to my post on Whither University Computing.

The essence of his comment was that by building shared services, a really good software distribution solution, and a really good process for replacing broken machines, IT services still have a role.

He's correct. You can do that. In fact I've built and run a service elsewhere that did exactly that. It was good and it was, on the whole, valued by users. One of my better moments was when a professor, who shall remain nameless, emailed me from his new university to apologise for all the grief he'd caused us. But we never got more than 60% of the staff on board.

But some time along the way I began to suffer doubts and wonder whether all that had been created was a job preservation behemoth. And then I changed jobs several times. The first move was to a research institute that, in the main, gave staff laptops and expected them to work offline part of the time. Yes, they shipped out a standard build of core applications, but on the whole staff added extra applications as they needed them. That worked well, and the really key services were folder and mail synchronisation when staff came back onsite after fieldwork.

Then I went to another research institute. There the desktop was locked down, standard apps, and IT staff had to install anything extra that was needed. That also worked well, and because it was simple and well locked down it was easy to deal with staff, some of whom did not have great skills in IT. Of course we had a whole lot of specialist boxes that did odd things but we didn't talk too much about that.

And then I came back to work at a (different) university. One where the centre had never really done desktop provision, the faculties had. And this of course meant that we had a range of qualities of service provision.

And services in common, or core services, call them what you will, are probably the way to go. Allowing staff to have (and pay for) the applications they need, self-installed, gives great flexibility. Setting standards for document interchange and providing core facilities for collaboration lets staff work in the way they find optimal, while letting us avoid the massive problem of managing hugely different sets of needs and aspirations. Providing the key services in common - storage, web, email - means we manage the services that scale well and are best managed in big chunks.

And of course it can be cost effective to outsource some of these chunks ...

Thursday 19 June 2008

Whither computing in universities?

So what of the future of university computing?

In earlier posts I've expressed my view that student labs will go virtual, and that increasingly student computing provision will be a combination of services and facilities - basically an enhanced LMS, filestore, access to printing, and access to specialist software and services. We don't expect students to fork out for expensive software, nor do we expect them to have a Sun V440 in the basement of their share house.

But students are only half the equation. Universities also contain academic staff. (They also contain administrators, but administrators need the fairly standard business systems that any enterprise requires.)

Now why do we provide computing services for academics?

Well most universities don't. They provide services. Most academics buy their computer out of some sort of funding and do their work on it. It would be quite possible to be a historian and live out of one's laptop with little or no use of university computing provision, except in relation to the LMS and teaching.

And this is to do with the nature of what university computing was, and what it now is. Originally computing was expensive, and there was a resource, the central computer, that was shared between staff and needed specialist staff to look after it. The central facility was used primarily by people in numerically based disciplines, although you did get people doing analyses of language usage and word frequencies in Anglo-Saxon texts and statistical analyses of Neolithic tomb orientation.

Then these pesky PCs arrived, and the initial attempt was to support them like mainframes, but gradually the focus moved to the shared components of the infrastructure such as the network and filestore. Standard operating environments were thought to be a good thing, though more often by the centre, because they made end-user support simpler, than by the users, who wanted to do things. This view was a continuum - the English faculty, despite their irritations with an SOE, preferred a degree of standardisation, as they otherwise needed either to buy external support or spend budget employing their own support staff, quite unlike Physics, who had an endless supply of geeky, computer-literate graduate students to provide in-house support.

And now that everybody has a computer, and that computer is relatively powerful, what most academics need is a set of services:

  • filestore and repository services - somewhere work in progress can be stored with the assurance that it will be backed up, and a place for the long-term storage of published work, be it papers, seminar videos or whatever
  • access to the LMS and allied teaching systems
  • access to discipline specific network resources - be they online journals, specialist datasets, or whatever
  • email and web access - including resources such as collaboration servers

and that's about it. Notice no mention of the network. There's a reason for that - network provision can be bought from anyone; universities only provide their own because it's (usually) cheaper than outsourcing it. And no mention of SOEs or end-user support. Anyone can buy a computer, anyone can install software. And no mention of big numerically intensive computing. Sure, it's still around, but it's used by a small number of researchers. Our general staff login server averages fewer than six concurrent logins. And the rise of Linux makes it as easy for people to run computations on a box under their desk as on a central system.

And realistically, with facilities like SkyDrive from Windows Live and Google Docs, not to mention Google itself, most humanities and social science academics can do without central resources, and the scientists have enough tame geeks to run what they need themselves.

So basically computing lives to support the corporate function, plus provide a small number of specialist services, such as collaboration and repository. The rest can be sourced at minimal cost from elsewhere. 

The concept of university computing as something distinct is essentially dead. University computing takes place on students' and academics' laptops, in Google, in the cloud - shared editing in Google Docs and Zoho, for example - or on boxes under people's desks.




Sunday 15 June 2008

iambic pentameters and decryption

Along with my fixations on filesystems and digital preservation, to name but two, I've always been fascinated by the history of the Roman Empire and by writing systems - even as a child I was fascinated by different alphabets.

So a few days ago I was reading about the Oxyrhynchus papyri and was suddenly struck by something:

Latin was originally written with dots as word separators, so text would look something like this:

the.cat.sat.on.the.mat

and Greek was written without separators, like this:

thecatsatonthemat

and some time round about the first century the Romans changed from the original dots-as-separators style to the Greek style, which must have made text very difficult to read, even if in an inflected language like Latin you have some intrinsic help puzzling out the relations between words. This is why people read out loud, working out which clauses formed a sentence and so on.

The use of spaces as separators didn't come in until the tenth century, something forever immortalised in, of all things, the Sun FORTRAN manual:

"Consistently separating words by spaces became a general custom about the tenth century A.D., and lasted until about 1957, when FORTRAN abandoned the practice." —Sun FORTRAN Reference Manual

Anyway - why would they have such a system in Greek? Word separators seem a good idea, even if you do find languages, such as Thai, that don't use them.

And then I had my idea. Word separators are good for prose, which is unstructured text. Inflected languages are reasonably structured, which is why it more or less worked for Latin. Modern English or Dutch would be another matter, where the only real structure is word order.

But if what you were writing down is poetry, it's easier. Providing you know the rhythm and the rules for syllables, it's easy to split up the character groups, and when you know, for example, how many syllables there are to a line, it's easy to work out - i.e. when the text is very structured - just as it's easy to parse text with a computer program when it's structured and a nightmare when it's unstructured.
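To make the parsing analogy concrete, here is a toy sketch (the vocabulary and the text are purely illustrative) of how a program might recover word boundaries from undelimited, Greek-style text - it only works at all because the dictionary supplies the missing structure:

# Toy word segmenter: recover word boundaries from undelimited text,
# given a known vocabulary. Purely illustrative.
WORDS = {"the", "cat", "sat", "on", "mat"}   # assumed vocabulary

def segment(text, words=WORDS):
    """Return one possible segmentation of text, or None if none exists."""
    if not text:
        return []
    for i in range(1, len(text) + 1):
        head, tail = text[:i], text[i:]
        if head in words:
            rest = segment(tail, words)
            if rest is not None:
                return [head] + rest
    return None

print(segment("thecatsatonthemat"))
# -> ['the', 'cat', 'sat', 'on', 'the', 'mat']

With metrical verse you would also know roughly how many syllables each line should contain, which prunes the candidate segmentations even further - the same point about structure making parsing tractable.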

Anyway my $0.02 on this.

Friday 6 June 2008

Re-engineering the student filestore design


So, thinking about the previous NAS-related posts, we come to the following conclusions:

  • We like using Xserves as front ends because of the superior AFP performance
  • The Xserve front ends also do a good job providing Windows shares
  • It is easier to emulate CIFS with third-party software (Samba) than it is AFP (Netatalk)

So the question would be: can we use the Apple infrastructure to front end the storage solution?
We have not done this, but the Queensland Brain Institute has done some interesting work front ending a Sun SAM/QFS HSM with Xserves to give AFP functionality.

The interesting trick from the QBI document is NFS mounting large chunks of the Sun filestore and then using the Xserves to re-offer user-specific shares.
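Purely as a sketch of the idea - the hostnames, paths and Samba include file below are invented, and this shows the CIFS half of the trick rather than the AFP re-share the QBI work describes - it amounts to mounting the backend export once on each front end and then generating per-user shares on top of it:

# Sketch only: mount a large NFS export from the backend filer and
# re-offer per-user shares from a front-end server. Hostnames, paths
# and the Samba include file are hypothetical.
import subprocess
from pathlib import Path

BACKEND_EXPORT = "filer.example.edu:/export/students"   # assumed backend export
MOUNT_POINT = Path("/srv/students")                      # assumed mount point
SMB_INCLUDE = Path("/etc/samba/student-shares.conf")     # assumed include file

def mount_backend():
    MOUNT_POINT.mkdir(parents=True, exist_ok=True)
    subprocess.run(["mount", "-t", "nfs", BACKEND_EXPORT, str(MOUNT_POINT)],
                   check=True)

def write_share_definitions():
    # One share stanza per user directory found under the NFS mount.
    stanzas = []
    for home in sorted(p for p in MOUNT_POINT.iterdir() if p.is_dir()):
        stanzas.append(
            f"[{home.name}]\n"
            f"   path = {home}\n"
            f"   valid users = {home.name}\n"
            f"   read only = no\n"
        )
    SMB_INCLUDE.write_text("\n".join(stanzas))
    # Ask the running smbd processes to pick up the new share definitions.
    subprocess.run(["smbcontrol", "all", "reload-config"], check=False)

if __name__ == "__main__":
    mount_backend()
    write_share_definitions()

On the Xserves the same pattern would apply, with AFP sharepoints defined over the same NFS mount instead of Samba stanzas.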

The NFS solution doesn't have to be Sun - it could be a BlueArc or NetApp box - and the storage backend doesn't have to be SAM/QFS. It could be something like Honeycomb if we want persistent storage, or possibly storage we don't ever want to back up, or it could just be some classic storage, or something like Thumper with additional inherent resilience from using ZFS.

Some experimentation may be necessary.




Do it Yourself NAS?

So the question has got to be, can we build the filestore solution ourselves? 

Now, when we talk about providing student filestore we are talking about providing shares (aka mount points), and lots of them, to a filestore that has a lot of small files with a lot of churn.
This has typically been done with NAS devices, purely because SAN storage is expensive. Of course you don't actually need a NAS device - you could build one yourself. Linux + Samba + JBOD would give you something that could export shares over NFS and CIFS. In fact Auspex, one of the early NAS manufacturers, did exactly this - if you opened up their boxes there was a Sun file server inside the case controlling the disks and the shares.

There are various write-ups around describing how you could do this with the various implementations of OpenSolaris, and the choice of OpenSolaris is a good one given ZFS's ability to do copy-on-write and snapshots.

Why? Because backing up big, rapidly changing filestores is difficult due to the treewalk problem. If you snapshot the filestore and then do an image-level backup of it, you can post-process that to produce a synthetic incremental file-level backup, which is kind of useful given that users want to restore files, not volumes.
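As a rough illustration of the snapshot-then-send half of that - the filesystem name and the dump path are made up, and retention and error handling are glossed over - the mechanics look something like this:

# Sketch: snapshot a ZFS filesystem and send an incremental stream
# relative to the previous snapshot, so the backup system only sees
# changed blocks rather than walking the whole tree.
# The filesystem name and output path are assumptions.
import subprocess
from datetime import date

FILESYSTEM = "tank/studentfiles"   # hypothetical ZFS filesystem

def snapshot_and_send(previous_snap):
    today = date.today().isoformat()
    snap = f"{FILESYSTEM}@{today}"
    subprocess.run(["zfs", "snapshot", snap], check=True)

    # Incremental stream between the previous snapshot and today's,
    # written somewhere the backup software can pick it up.
    with open(f"/backup/{today}.zfs", "wb") as out:
        subprocess.run(["zfs", "send", "-i", previous_snap, snap],
                       stdout=out, check=True)
    return snap

# e.g. snapshot_and_send("tank/studentfiles@2008-06-05")

Turning that image-level stream into per-file incrementals is the post-processing step, and the harder part; the point here is simply that the treewalk drops out of the backup window.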

Now the reason that everyone doesn't simply build their own NAS is that simple home-built solutions don't scale well. Using a full-size operating system is inefficient, as is using a conventional filesystem. ZFS addresses some of the filesystem problems, which was why NetApp got all antsy.

Big NAS solutions can handle lots of connections very efficiently, and that's down to the combination of a very efficient filesystem and a very efficient, minimalist operating system that is optimised to handle files and not do a lot else. FreeNAS tries to address this by using a cut-down version of BSD and getting some efficiency gains that way. And because it's BSD, you can run Samba and Netatalk to give CIFS and AFP support.

But it's a single-box solution, and therefore limited in the number of connections it can handle. Still, it might be fun to build one to see what it can do.

Now, our Apple-based design tries to address these problems:

The client machine connects through a content services switch (CSS) to one of the front-end servers. The CSS load balances on a round-robin basis, which means we can spread the load between multiple servers, add new servers, drop servers out for maintenance and so on. Even though this is a NAS-style solution, the storage is provided by an Apple Xsan solution.

It's more complicated than I've drawn it, with metadata controllers and so on, but essentially it's connected together by fibre through a switch, meaning that all the filers can see all the storage. This granular design makes adding more storage reasonably simple.

We also have some other things behind the scenes, such as the backup drive, which is essentially provided by running rdiff when users log out to copy their changes across, so we have a live copy we can then back up to tape or whatever.
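For what it's worth, the logout hook boils down to something like the following sketch - the paths and the staging host are invented, and rsync stands in for whichever delta-copy tool is actually used:

# Sketch of a logout hook: copy a user's home directory to a staging
# area that gets backed up to tape later. Paths and the staging host
# are hypothetical; rsync is used as a generic delta-copy tool.
import subprocess
import sys

STAGING = "backup.example.edu:/staging"   # assumed staging host and path

def backup_home(username):
    src = f"/Users/{username}/"
    dst = f"{STAGING}/{username}/"
    # -a preserves permissions and timestamps; --delete keeps the copy
    # in step with the live home directory.
    subprocess.run(["rsync", "-a", "--delete", src, dst], check=True)

if __name__ == "__main__":
    backup_home(sys.argv[1])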

It works. My problem when I start reading about clustered filestores such as Exanet and Isilon is twofold:

  1. Is their solution any more than our solution placed in a box with a badge on the outside?
  2. If it is more, what is the extra?
Now I'm sure they can do other clever things and the management is slicker, but is the functionality noticeably greater?
And does that functionality provide a unique advantage?
 

Thursday 5 June 2008

Supporting heterogeneous clients with a common filestore, or how to make OS X, Linux and Windows clients live together in harmony...

The problem:

We need a common filestore to support Windows and Macintosh clients, and potentially in the future Linux clients. The filestore needs to allow people to access their files from all three platforms, as students may use a Mac lab for one course module and be working on a term paper in a Linux lab.

This isn't a new problem. Been here before and all that. First with DEC Pathworks to share Windows 3.0 filestore with VMS using DECnet, and then with Sun's PC-NFS to share Windows 3.11 filestore with Unix filestore over NFS. This was fifteen years ago and it worked robustly. In fact it worked so well that we tried to integrate Apple System 7 machines into the equation. And that was a failure, as Apple files actually consist of two objects, a resource fork and a data fork (what's known as AppleDouble when represented on a foreign filesystem). The data fork contains the information; the resource fork contains all sorts of extra stuff about which application can open it and so on. In essence, the structured metadata.

Unix, of course, is not so sophisticated: files are collections of bytes which may contain a combination of metadata and data, or just raw data. Windows fakes it with file type associations and application-specific registry keys, but really the files are just collections of bytes. This means that Unix systems can access Windows files without too much difficulty and vice versa. The only real problem is that (post NT) Windows has a richer and more complex permissions model than Unix/Linux.

So one has to fake up the AppleDouble file format on Linux by storing two files, one for the data fork and one for the resource fork. The typical way this is done is to create a file called .fred for the resource fork alongside fred for the data fork. Filenames starting with a dot are normally invisible under Unix, much as if they had the hidden (+h) attribute set under Windows.
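Roughly - and leaving aside the details of the real AppleDouble on-disk format - the dot-file convention amounts to this (the filenames are just for illustration):

# Sketch of the dot-file convention for faking a forked file on a plain
# Unix filesystem: the data fork lives in 'fred', the resource fork in
# a hidden companion '.fred'. Ignores the real AppleDouble header
# format; for illustration only.
from pathlib import Path

def write_forked_file(directory, name, data_fork, resource_fork):
    d = Path(directory)
    (d / name).write_bytes(data_fork)             # visible data fork
    (d / f".{name}").write_bytes(resource_fork)   # hidden resource fork

def read_forked_file(directory, name):
    d = Path(directory)
    data = (d / name).read_bytes()
    companion = d / f".{name}"
    resource = companion.read_bytes() if companion.exists() else b""
    return data, resource

# write_forked_file("/tmp", "fred", b"the document text", b"type and creator info")

The fragility described below follows directly: anything that copies or deletes fred without knowing about .fred silently orphans or loses the resource fork.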

Back in 1993 our approach was to use GatorBoxes to act as protocol converters. It wasn't very reliable - we had problems with resource fork corruption, load, and hardware failures. Fortunately, round about the time this was happening, Apple looked about to go down the toilet, so we canned Macintosh support and proceeded to ignore the problem.

Over the years we carried on with the Linux/Windows common filestore using Network Appliance boxes with native sharing, or standard Unix boxes with Samba. Both work well, both have kinks, but basically you can share Unix filestore between Linux boxes on NFS and Windows boxes on CIFS without too much difficulty, and if you want to share Windows filestore, Linux boxes can mount CIFS shares.

Fast forward to today.

OS X is now commonplace. However, OS X still has the Apple resource fork/data fork model, which makes sharing files between platforms just as messy as it was fifteen years ago. OS X can mount CIFS filestores, and by extension a Unix filestore exported using Samba, but that's not the point - you've lost the richness of the resource fork. NetApp have a paper describing an approach you could take, but it's not a true solution and involves installing extra software on the client. The alternative would be a 'Samba for AFP', i.e. an application to do protocol translation the way that Samba does for CIFS, and which can also cope with AppleDouble.
There is such an application - Netatalk. Netatalk solves the resource fork problem by storing resource forks in a .AppleDouble directory, i.e. a hidden magic directory. And that brings all the problems of damaged resource forks, changed documents, and luddites who delete the .AppleDouble directory to free up filespace.

Up to now we've been running another sort of solution: a cluster of Apple Xserves and Xraids that run Samba and also export the filestore using AFP. This means that we have native handling of Apple files, and Samba provides access for Windows and Linux hosts. That works well, but it has scaling problems, and Apple's canning of the Xraid doesn't help.

So potentially we need another solution. If we didn't have OS X, NetApp or BlueArc would do the job perfectly. Exanet hints that they can support AFP, which would certainly address our heterogeneous client problem. But of course it may just be Netatalk in disguise - after all, a lot of CIFS emulators are built around the Samba source, so I see no reason why the same shouldn't be true for AFP support. In fact it's even more likely, as AFP support is a minority religion and until recently hadn't been a 'must have' for corporates. Macs simply lived in their own world and mounted foreign filestores if they really had to. The fact that things really haven't moved on since 1993 makes this painfully obvious.

Now we not only have to share the filestore between platforms, but also cope with users using different platforms to access the same files on different occasions. And that's the rub. If all we needed was a common filestore, and Mac users only ever used Macs and mostly accessed their own files, something like Netatalk could work. If, however, users flit between platforms, we buy ourselves a bag of problems.

The cynic in me says 'thin client and WebDAV'. It might be the simplest, if not the cheapest ...

Tuesday 3 June 2008

Open Solaris revisited


Following on from my incompetence in getting OpenSolaris running under VirtualBox, I happened across a post explaining how to get OpenSolaris to play nice with VirtualBox.

This allowed me to finally test the package manager by downloading and installing OpenOffice. Well, it works, though not as slickly as Synaptic under Ubuntu, the major difference being that a reboot is required before the menu items are updated to show what new software has been installed.

Annoying, and not as slick. And playing with the package manager revealed the paucity of software packaged and available for OpenSolaris. Poor compared to Ubuntu. Poor compared to Suse - and between them those two probably soak up 90% of the desktop Linux market.

And that's a worry. In the desktop space it's access to a big software base that makes a desktop competitive with the established desktop OSes of OS X, XP and Vista. People run desktops for the applications, not the operating system, which means any alternative operating system has got to give people the range of applications that they need and use. And that means more than an office suite, a pretty slick mail application, a browser and a PDF viewer.

Ubuntu definitely has this. Suse is rich enough to play in the corporate desktop space. OpenSolaris is distinctly meagre, which potentially makes it an also-ran if you're looking for an alternative desktop OS.


Monday 2 June 2008

Kidaro


MoMo left a comment on my post about North Carolina University going to VDI for student labs, to the effect that Kidaro (now owned by Microsoft) was a much better solution than VMware's ACE.

As MoMo's profile was private I couldn't follow up on the reasons why.

All these virtual application delivery solutions work like Metropipe's original Personal Virtual Privacy Machine - boot an emulator, e.g. QEMU, boot an operating system, e.g. Damn Small Linux, start the desktop and run the applications. And of course this can be distributed on CD or over the web as a download.

Now the idea of distributing applications like this is quite clever. For a start, you could build a virtual company with one distribution server and a server to check completed work back into - which could be as simple as Sakai - and, providing everyone has a suitable desktop or laptop, no support infrastructure in terms of application distribution servers and the like. Coupled with something like Groove it could be a very powerful way of working.

All true. But you can build the solution in other ways, even from the ground up using bits and pieces of open source code. And nothing I've read suggests that there is anything remarkable in Kidaro that isn't in ACE.

So, if I'm wrong, could someone explain why?