Thursday, 31 December 2009
Originally uploaded by moncur_d.
We'd hoped to have tomatoes for New Year but we haven't quite made it. A few reddish ones, but no really ripe tomatoes yet.
However our apricot tree has risen to the occasion. Fresh apricots off the tree for the New Year's morning fruit salad.
Happy New Year everyone!
Wednesday, 30 December 2009
And this led me to another thought - we're always very unclear, when we talk about institutional repositories and digital archiving, about what we want to do. If you work in a university, as I do, the probable answer is something along the lines of "capturing the scholarly outputs of the university to increase access and enhance the reputation of the institution". And this is basically thought of as something like a preprint server with some indexing and metadata, so you can easily find out who is working on chimpanzee tool use, for example.
And this is a model that works reasonably well for the sciences - after all it's basically what scientific scholarly publishing has been doing for years.
The preprints are documents in their own right, rarely subject to modification, and can be happily distributed in a non-revisable format such as pdf.
And then we turn to what these days are called the 'humanities and the creative arts'. And it all gets messy but perhaps our friends Dunbar and Kennedie can help us.
The Flyting was originally a court entertainment cast as a duel of wit (and copious obscenity) between two poets of the day. Imagine it as a sort of Medieval Scots rap name-calling contest (or as an early analogue of Commedia dell'arte). Even better, imagine it done by the Baba Brinkmans of the day. The text was written down and appeared in one of the first books printed in Scotland.
It's important as it was done in Scots, and the book is an early record of upper-class spoken Scots usage - before then we really only have charters and legal documents, and the language used in those is, shall we say, a little more restrained.
Now let us say we want to digitise the work. Yes, but what do we want to do?
If what we want to do is look at a representation of the book to reduce wear and tear on the original, then what we would do is take some high resolution pictures of the pages of the book and perhaps write a clever flash application to let you turn the pages. We might also accompany the images with a transcription of the text, as the original typeface is hard on modern eyes.
Oh look we've just made two objects out of one. If we add a modern translation we've got three distinct objects - the digital representation of the object itself, the text, and the translated text.
Now how should we store them? The pictures are simple - we store them in a lossless, well known image format. But the other two - should we store them in a non-revisable format such as pdf, or a revisable format such as epub or odt (or indeed provide an option to provide the text in a range of formats)? And of course if we're treating these as scholarly outputs, what should we then do with the mp3 recording of the reading in two voices? Done as part of a language or theatre studies project it's arguably a scholarly output, and part of the digital patrimony of the institution. Oh, and we edited and abridged the text to fit it into thirty minutes. Do we archive that as well, and do we archive it as a set of edits or just the final edit?
Add mixed media art works and it becomes even more complex...
Wednesday, 23 December 2009
Monday, 21 December 2009
This is an interesting move, as one of the problems with having multiple smart devices - computers, phones etc - is that your stuff is, well, everywhere. Services like Dropbox help keep individual stores in sync (as well as providing a silent source of information leakage).
The next stage is cloud based storage so that your stuff is accessible from everywhere. Doing it through a browser provides a universal access mechanism, and coupling this with virtual compute allows you to execute applications without worrying too much about local host architectures or capabilities - something that ajax heavy applications like google docs or wikidot do care about.
It's interesting - cheap host agnostic computing - all you need is a (recent) browser ....
Sunday, 20 December 2009
Originally uploaded by moncur_d.
Been off for a few days bushwalking in Victoria, during which time we lived in a rather nice tin hut on the edge of the bush, and very much off net - no internet connection, no 3G, although strangely enough GPRS worked, allowing us push email. In effect the phone worked a little like a portable email reader, but without any web capability, and our trusty travel computer was simply a kilogram of nothing. While I'd thought of buying a prepaid USB 3G modem, I simply hadn't got round to it - not that it would have helped.
Other than that we had to be so last century - watch the tv news to get the weather forecast, read the Age (and as always wonder why it is so much better a paper than the Canberra Times, despite them both being owned by the same company).
And for a few days it was fun to be disconnected from the illusion of being branché, of being in the flow ...
Saturday, 12 December 2009
Wednesday, 9 December 2009
excellent way to share files between home, work and other computers. Just bloody works.
Asus PC 701 SD
proved its worth while travelling overseas - reliable, light, effective. Don't leave home without one.
Interead Cool-er ebook reader
despite my initial reluctance I've warmed to this. It's allowed me to store and have a collection of medieval research (ok, dilettante research) texts on hand while saving untold trees in the process, not to mention allowing me to visit the unknown corners of project gutenberg
Books from the UK at UK prices and free delivery. Incredibly good value - at least something good has come out of the gfc
fast light effective - the way linux used to be
By turns inane, infuriating, grabbing a handful of sand frustrating, somehow it has turned into something useful if only I could put my finger on it ...
Skype wi-fi phone
a belated mention for last year's undoubted winner. Once you've sampled being able to use skype from anywhere the wi-fi signal goes you won't look back
Tuesday, 8 December 2009
The presentation layer is the application, or applications which people use to add data to the repository, search for data within the repository, and retrieve data.
The database stores information about the individual objects (files) in the repository and the relationship between them as well as information describing the file and the contents.
The files themselves are stored in a file system, usually with unique system-generated names redolent of Babylonian prophets. As you only ever search for the object you don't need to know its name - all that is required is that the system does. This is why, for example, files downloaded from flickr have weird long hexadecimal names. It also means that the filestore is unstructured, and contains lots of files of similar size, the majority of which are only rarely accessed.
The filesystem is part of the storage layer.
Repositories are interesting in that one typically only adds files to them and never deletes content, but one needs to guard against corruption and data loss. Typically this is done by making multiple copies of each file, checksumming the files, storing the checksum in the database, and periodically rerunning the checksum operation and comparing the answers with the answer stored at time of ingest. If one of the copies is corrupt, it's replaced by copying a good file.
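A minimal sketch of that checksum-and-verify cycle in shell - the /tmp paths and the naming scheme are invented for illustration, not any particular repository's layout. As a bonus, storing the object under its own checksum also produces exactly the kind of opaque hexadecimal filename described above:

```shell
# Fixity-check sketch; /tmp/repo_demo and the naming scheme are
# illustrative assumptions, not any particular repository's layout.
store=/tmp/repo_demo
mkdir -p "$store"

# Ingest: store the object under its own checksum (an opaque hex name)
printf 'object payload\n' > /tmp/incoming.bin
name=$(sha256sum /tmp/incoming.bin | cut -d' ' -f1)
cp /tmp/incoming.bin "$store/$name"

# Record the checksum at ingest time
( cd "$store" && sha256sum "$name" > manifest.sha256 )

# Later, on a schedule: recompute and compare against the stored values;
# a non-zero exit flags a corrupt copy to be replaced from a good one
( cd "$store" && sha256sum -c --quiet manifest.sha256 ) && echo 'fixity ok'
```

In a real repository the manifest would of course live in the database rather than alongside the files, but the check is the same recompute-and-compare loop.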
Typically, in the old days, one would use a product like SAM-FS/QFS to do this. It also used to be expensive to license, so most repositories didn't, and instead trusted in tape backups and rsync.
Of course backing up repository stores to tape is an interesting exercise in itself, given that they consist of lots of small files in a flat structure - after all, the database doesn't need a directory structure. This can be extremely inefficient and slow to back up. Much better in these days of cheap disks to copy several times.
And of course suddenly what one starts looking at looks like a distributed clustered filestore, like the googlefs or Amazon S3. And there have been experiments in running repositories on cloud infrastructure.
But of course, that costs money.
Building your own shared distributed clustered filestore may be a viable solution. And given that not just repositories but LMS applications are moving to a repository style architecture there may be a use case for building a local shared pool, using an application such as glusterfs - a distributed self healing system that is tolerant of node crashes.
Doing this neatly decouples the storage layer from the presentation layer - as long as the presentation layer can write to a file system, the file system has the smarts to curate the data. This means it then becomes easy for separate applications running on separate nodes to both write and share data - after all, each item is just a database entry pointing to an object already stored elsewhere on the system.
Definitely one to take further - other than the slight problem that while people have tried running dspace and fedora against systems such as the sun honeycomb, no one seems to have considered glusterfs...
I've always thought in terms of connections and lists and related facts - not very visual I'm afraid - and I've recently started using wikis as dot-pointed lists (see here for an example on the Tudor ascendancy and here for one on early medieval travel) as a way of organising facts and links, any one of which could be expanded out to some text, as seen in this slightly more complex example.
Using a wiki this way allows you to build a more complex living document piece by piece.
Next question - ignoring its inherent funkiness, what does google wave bring to the piece that a shared wiki doesn't?
Monday, 7 December 2009
As always there were drinks - a decent dry white and an Australian champagne - and canapés while standing around chatting in the Sculpture Garden. As always it was an interesting cross section of society - those who were there to see and be seen, those who were there for the art, and those who were there because their friends were. Dress styles ranged from the artistic with frightening lipstick or beards - according to gender - to the amazingly normal.
Despite the inherent pretentiousness the exhibition was rather good - I'd actually say better - for being smaller and more tightly picked - than the collection as exhibited in Paris last July.
A few nice Van Goghs showing how his technique in Arles was alternately frenetic or controlled, a few Pissarros, Gauguins from both his Breton and South Pacific periods, some nice little Seurat pointillist sketches, a Monet pretending to be a Turner, and some paintings by Vuillard and Emile Bernard that I hadn't seen before. The Bernard had the same economy of line that I like in some 1920s and 30s posters - just line and colour.
Definitely worth a visit.
Friday, 4 December 2009
Well, in our typical anglophone arrogance we're ignoring what is happening elsewhere. French language newspapers are also under threat and reacting in equally incoherent ways.
This morning, I was looking for an update on the Paris museums strike. Because I'm a left leaning liberal I went immediately to Liberation, typed in my search terms, found the article, found it was restricted to subscribers, and was invited to sign up for a EUR12 per month package including internet access and delivery of the printed paper at the weekend - my friends, a little reality please: I live in Australia!
So I went to Le Monde, found the article I wanted and retweeted it.
The point being Le Monde makes its content free to casual users and Liberation does not, and that in aggregate means that Le Monde gets more contacts and can sell more online ads, and even might sell the odd extra copy now and then. And they probably don't lose that much by it.
Liberation doesn't get any of that. In fact I guess it drives people who might also have bought the printed paper on a one off basis away.
And of course there's a halfway house such as shown by the New York Times which tries to promote a reader community so that the idea that the paper is worth buying spreads virally.
Thursday, 26 November 2009
However there's also been a thread of people trying to make the classroom experience (or perhaps even more the virtual classroom experience) better by providing ways for students to interact - for example there's Wild, and also a piece in the Chronicle of Higher Education about someone using twitter for classroom interaction.
All good, but why not use some in-house micro-blogging service, or in the case of online learning a chat and post facility? Can it only be because of the hold twitter has on the group mind?
Wednesday, 25 November 2009
At the time I blogged that generally EasyJet were helpful and Ryanair out to gouge.
Well on one of our EasyJet flights one of our bags came back from baggage reclaim minus a couple of new items of clothing which had gone in still with the shop tags on. EasyJet were wonderful - even though we couldn't provide receipts (first rule of travel - throw nothing away) they were prepared to accept our reasonable and honest estimates and proposed a perfectly sensible settlement value. Not only that, they sent us a bank draft in Australian dollars to save us bank charges.
Somehow I feel if it had been Ryanair we'd have been told to piss off long ago. So the morals of the story are:
- If you have a choice - don't fly Ryanair - check for alternatives on BMIBaby, AerLingus and the rest - with Ryanair's extra fees you may not be as out of pocket as you think
- EasyJet are good people to deal with
- Keep every receipt and ATM docket - you might need them when you get home
Riding my bike to work isn't really an option - too far, too hot, too much traffic - one of the downsides of living out in Fadden. (Weetangera was ok, only a couple of scary bits and they weren't really that bad)
So I got myself a pair of shiny new shoes from Running Warehouse and off I went. The first time I tweeted I'd started running again - something I put down to the adrenaline high, because my performance was pretty pathetic.
Today was better, not much but a little better - and in true geek fashion I've started keeping a note of how I'm going - check out http://scribbled.wikidot.com/running if you're interested ...
Tuesday, 24 November 2009
The lessons learned can be summarised as:
- you don't need a whole lot of compute power
- you do need good internet access
- the operating system is basically just a launcher for the browser
- you spend most of your time in a browser
ChromeOs apparently uses a debian-derived distribution (apparently partly via Canonical/Ubuntu) to provide a fast browser launch - so your browser is the user interface, and as life's about the cloud these days that's all you should need (other than a 3G modem and a decent data contract ...)
The more interesting question is whether we will see ChromeOs bundled on machines to provide an instant-on environment, the way Splashtop is on some devices - the instant-on environment providing an alternative to other more heavyweight environments with a marked startup time. The use cases include using the instant-on environment for flight check-ins and email checking in snatched moments in a cafe, but the heavyweight OS to run a preso ...
Monday, 16 November 2009
Typically, by the time we got there the sky was full of dark, threatening, stormy-looking clouds. Still, squid and chips at the Quarterdeck in Narooma, and a rootle round Tilba followed by a long walk along the beach at Camel Rock, made for an excellent day.
What made it even more excellent was to be rewarded by the sight of a group of whales offshore waving their flukes. We watched to see if one would breach, but of course they didn't.
Fortunately I'd forgotten my camera, elsewise we'd have a montage of black blobs against a wine dark sea. We thought we might swim but the ocean was still cold after winter's storms, so after our free whale watch it was back up Brown Mountain in thick mist and back across the high plains to a cooler Canberra ...
Saturday, 14 November 2009
Stanza doesn't care - it opens the file you ask it to open and so if you've created a file called thing.epub it will open the file as thing.epub.
The Cool-er is annoyingly cleverer. It knows the file is an epub, so it opens the epub file to display the document name - not the filename - and of course since the document name was never set in the original pdf that's what you get: a document called unknown (and often created by unknown). If you have two such documents it picks the first one it finds and doesn't display the second - logical but annoying.
So the next stage is to unpick an existing epub - edit the files appropriately and then zip them back together ....
a little later ...
which following the jedisaber.com tutorial turned out to be fairly easy.
Let us say we're editing a file called lnt.epub - my sequence was:
$ mv lnt.epub lnt.zip
$ unzip lnt.zip
edit the following files in the OEBPS directory using the text editor of your choice
replacing all the unknowns with appropriate text - title, author etc
$ zip -r lnt.zip OEBPS/content.opf OEBPS/toc.ncx OEBPS/title.xhtml
(if you're feeling paranoid, or your version of zip doesn't support multiple arguments, this could be done as three separate commands)
$ mv lnt.zip lnt.epub
Check the result with Stanza. Now this is where there's a problem - in my version there are two spurious unknowns. Originally I didn't edit the title.xhtml file, and that produced an epub with four spurious unknowns. Editing the title.xhtml file got rid of two of them, so I'm guessing there are a couple of other fields - and certainly changing the appropriate values in part1.xhtml got rid of the unknowns, even though I didn't change the values in part2.xhtml - something I thought might provoke an error in the application parsing the epub.
Now I'm hacking this - the jedisaber tutorial doesn't mention doing this, so I'm going to guess I'm working with a revision to the format which I don't quite understand as yet; certainly there should be a way of suppressing these values ...
What is also clear is that, given a suitable application to extract text from a pdf and convert it to xhtml, it would be reasonably simple to write a perl script to build the supporting files programmatically from a template.
It also means that it is relatively straightforward for anyone with text that can be converted to xhtml to package their books as epubs - an ideal way for small specialist publishers to redistribute their back list.
Friday, 13 November 2009
My initial thought was, given I've a reasonable amount of stuff in pdf format, to convert it. But how?
PDF files are essentially modified postscript with some embedded metadata but epub is a zip file based format with a manifest, formatting css and the document source material in xhtml - conceptually not unlike an open office document file in structure.
My initial thought experiment, based in part on a very useful howto on hand creation of epub files, was to write a print driver (ok, a ppd) to print the pdf to xhtml based on public domain pdf-to-text and pdf-to-html code, apply a default style, and create a manifest based on the embedded metadata.
However Stanza also allows the saving of pdf files as epub documents. They clearly have the technology, and I suspect their epub conversion is perhaps a little more sophisticated, given both that their native format is epub and that they are now an amazon subsidiary.
A bit of creative play might be in order ...
Wednesday, 11 November 2009
Be that as it may, I've always found David Starkey's television shows models of explanation, wit, acidic clarity and an understanding of realpolitik.
About the same time as his recent interview with Varsity in Cambridge, he also gave a lecture on Henry VIII as part of the Guardian festival of ideas. Immensely entertaining and provocative as ever, it's now available as a download.
Monday, 9 November 2009
- the filesystem doesn't suppress ._ files if you download via a mac
- books in epub format are definitely better than using pdf
- it's not an ipod moment but it is good, and is better than reading from a computer
- reading quickly becomes natural - even allowing for the 'shazam' page changes
- bright sunlight makes reading out of doors impossible
- battery life is good
- download and loading of texts is straightforward
The real revelation is just how much easier epub formatted documents are to work with than pdf's - certainly epub is something I'll have to investigate.
Am I glad I bought one? The answer's yes - I've found it invaluable for reading stuff without having to print it out. Was it the best $300 I've spent? Possibly not, but certainly the money wasn't wasted - I've probably saved some of that in paper and in buying reprints of translations of classical texts from DoDo. Incidentally I've used it for some recreational reading as well, and that flows pretty well too, even if you do feel yourself to be a bit of a gink at first, sat in bed reading from it.
I'm certainly not done with buying conventional printed books but e-readers offer a decent alternative and certainly have a role as an adjunct to print on demand as a distribution model.
Sunday, 8 November 2009
However, it's fairly clear that a lot of people have similar doubts, and for that reason we get twibes and lists to let people find communities of interest, and that's a very interesting phenomenon, that the need of groups of like minded people to contact and share with each other is sufficiently powerful that it supports self organising add on technologies.
The other thing is that making use of things like twibes is a fairly deliberate action - it's declaring yourself a member of a community that wishes to exchange links and news about particular topics.
The other question is, will these self organising groups end up like either usenet - a minority playground for trolls and sad anoraks - or like some of the specialist mailing lists, little inward looking communities of self selection - or will it turn into something generally useful and self sustaining?
Twitter is clearly useful - as a way of getting instant information and sharing links it's invaluable.
One of the nicer uses I've seen of it is by the Computing Service at York, where they use it to get service status updates out.
This is particularly nice as (a) it means that even if everything is down you can get a service update as it's independent of all local servers, and (b) it provides an rss feed making it easy to repurpose into other things - such as displaying on a faculty web page.
As such it's a genuinely useful use of twitter. As to twitter itself, it has also gone through massive growth in the last eighteen months and is continually changing through the development of add-on services. So far the signal to noise ratio seems to be acceptable ...
Friday, 6 November 2009
However, I found a company called goosync.com that provides a basic sync service for GBP5.99 (say $11) for a year, using syncml. They also do a 7 day trial.
I tried the trial service - it seemed to work pretty well, so I signed up for the full service. Obviously with any sync service you need to make sure you sync via wi-fi where possible, as mobile data tends to rack up charges - the raw ics file out of my calendar is around 850k, just to give an idea of things.
Setting up was fairly easy - they send a magic sms message to configure sync, and once you've provided login information and the like you're away. Once set up it just works. Synchronisation is on demand, allowing you to choose which network to sync over.
The basic service doesn't transfer tasks - you pay extra for that. While I use tasks inside google calendar as a set of reminders, I tend to work off the screen while ploughing through the to-dos, so I reckoned I didn't need it and didn't pay for it, figuring I can always upgrade later.
Wednesday, 4 November 2009
Nothing computationally heavy, and I haven't used any heavyweight Ajax applications. All on a 700MHz, 192MB PC - less grunt than a netbook.
- fast to load
- application install (kwrite) no worse than standard ubuntu
- text input via an editor is responsive
- web browser performance no worse than under more heavyweight installs
- menu configuration is a bit of a black art
- networking configuration likewise
- the pre-installed apps function well
- fast to shut down
Well, inefficient, not bad. The post makes the very good point that sending the same big attachment to a group of people is a highly inefficient use of bandwidth, and it would be better to use embedded links to reference the same document from a webserver or whatever. If nothing else it saves on local mail storage.
True, and if they can convince marketing departments we'll all be a lot happier.
But point-to-point sending of attachments? I doubt it. For a start it's not an ideal use case for a webserver, and the 'I send you a link and you download it' type services detract from the immediacy of email; while they probably save a little bandwidth and storage, the efficiency losses probably more than offset this.
And of course there's digital faxing - which of course isn't faxing, it's scanning a document to pdf and emailing it straight from a copier to an individual - exactly as the original HP digital senders did ten or so years ago - which is a much underrated trick. Contracts, delivery notes, signed tax submissions - all can be sent via email, saving the desperate hunt for a fax machine, which seems to be something that increasingly only banks and telcos use.
And as they're usually pdfs of graphic images, these attachments are reasonably large. Not massive, but large. But I doubt if they're any less efficient than a fax machine, and the aggregate costs are no more than maintaining a special device and an analogue phone line ...
Tuesday, 3 November 2009
Initially Twitter was essentially a generic version of the Facebook friend feed - great if you're eighteen and think the world revolves around you and your mates, but not so great as a communication medium. Most of the content was fairly inane (just like Facebook), and disjointed. To be sure there were amusing people such as Stephen Fry to follow, and some interesting ideas such as Cry for Byzantium, but depth was generally lacking.
The use of hashtags allowed the construction of folksonomies, such that if you were interested in the medieval period you could include #medieval in your posts. Now we could argue about early versus late, high middle ages, whether we mean 476-1453, when the renaissance became truly distinct etc etc, but basically everything posted would be medieval - it reduced the noise.
And as a folksonomy we can make it up as we go along, making readily guessable tags such as #viking, #anglosaxon, and perhaps less guessable ones such as #britannia.
And as you use the search you start to find people you wish to follow, as they post regularly on related topics. You can now create a list to allow people who follow you to follow the people you follow - possibly a bit incestuous and circular, but it means that your feed starts having crowd-sourced characteristics, so if one of the people you are following misses something there's a good chance you'll pick it up anyway from someone else on their lists.
For example, let's say I see a blog posting about the Jorvik viking festival next year. Despite regularly tweeting with #viking I don't because (a) I used to live in York and (b) I don't now and (c) won't be visiting when the festival is on (all true). However I also have someone called Pete on my list who doesn't have my hangups about that and who posts the link. And maybe I have someone else called Karen on my list who posts a different but related link. That way the message still gets through in a structured and sensible way.
Of course I'm ignoring the role of reputation. All of us #medieval users have to assess the worth of someone's postings before adding them to the list, just as fifteen years ago with usenet news we wrote complex kill files to filter out the known loonies and people with fixations about leather underwear.
But in a sense reputation is implicit - if the quality of the information I post is good, I have a reputation and am thought to be 'serious' - and so very gradually it turns into a community of people sharing common interests - a social network by any other name.
Wednesday, 28 October 2009
By sheer chance I happened across The Irish Catullus, a project to produce a translation of Catullus in Irish English, Scots Gaelic and Ulster Scots.
My immediate reaction was that this sounded fun, fascinating and the rest of it, in just the same way that Baba Brinkman's rap Canterbury Tales were fun.
So I went googling for more information, expecting to find a project home page, or a facebook group with updates as to where they were up to with plans for publication (or how likely that was, given the gfc).
Rien, zilch, nada. Not a link, just references in other pages suggesting that the project is still a goer. But no information on publication ...
Sunday, 25 October 2009
I chose the Interead Cool-er as it seemed the most open - ie not tied to any book distributor - yet supported a wide range of common formats including pdf and text, which are particularly important to me as I have a large number of pdfs that I occasionally need to refer to and that currently live inaccessibly on a Windows Live Skydrive.
Basically we're talking about product manuals, documentation and the like, as well as pdf's of medieval texts and other historical stuff.
So I had a sensible use case for the device.
So I duly plugged it into my mac and charged it up. This revealed that the documentation supplied was minimal to the point of obscure, but it did charge, and the mac did see it as a valid volume to copy files to - even if something along the lines of 'while charging you will see a red light and a display saying "usb connection"' would have helped.
One slight problem is that files copied from (or actually via) a Mac are created with a data fork and a resource fork, the resource fork files having a filename of the format ._filename - just the way nfs does it when you write files from a mac to a unix filesystem, a problem that has bedevilled the support of heterogeneous environments.
Unfortunately the Cool-er isn't quite sophisticated enough to hide files prefixed by ._ (which is strange given it's based on a linux kernel, and most unix systems do not display filenames prefixed by . by default) - something that didn't faze me but might confuse less technical users of the device.
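If the clutter bothers you, the AppleDouble ._ files can be cleared from the reader's volume with a one-liner. The sketch below uses a temporary directory as a stand-in for the mounted device - on a real Mac the mount point would be something like /Volumes/COOL-ER, which is an assumption on my part:

```shell
# Stand-in for the reader's mounted volume; on a real Mac the mount
# point would be something like /Volumes/COOL-ER (assumed, not checked).
vol=/tmp/cooler_demo
mkdir -p "$vol"
touch "$vol/book.epub" "$vol/._book.epub"   # data file plus resource-fork file

# Delete the AppleDouble ._ files the Cool-er would otherwise display
find "$vol" -type f -name '._*' -delete
```

Run it after ejecting and before unplugging, as the Finder recreates the ._ files on the next copy.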
Otherwise on initial tests of a book in epub format downloaded from project gutenberg and of a pdf of a research paper it worked well.
Page changes involve a shazam-style refresh - they're not fast enough to be instantaneous - and the buttons require definite pressure, but otherwise the device works well with a legible display.
In use the device is lighter than expected, and definitely more plasticky in feel. While it didn't come with a carry sleeve, you'd probably want to lay hands on something suitable if you were carrying it about regularly in a backpack or briefcase.
However after a few minutes the reading experience starts to feel natural enough.
I'll experiment further and report on my experience in due course.
Installation was as smooth and as straightforward as it was on a VirtualBox VM earlier this week, with the machine coping correctly with the Australian locale and the fact that, for reasons lost in the mists of history, the $83 machine has an old UK-format Digital Equipment Corporation keyboard.
The system disk was repartitioned sensibly and the installer claimed to import and pick up my various application settings. What it failed to do was recognise that my home directory was installed on a physically separate volume, ie a second hard disk in the machine - something I'd done originally to allow multiple operating system installs.
The other oddity was that it when updating its libraries from the Ubuntu repositories it seemed to have the gb locale hardcoded rather that the more sensible au locale. (Given I'm in Australia)
Other than that the install seems reasonably fast and responsive, and would seem a reasonable choice for an older laptop, assuming that wireless support works well.
Friday, 23 October 2009
Anyway, The ANU has started building student residences out of shipping containers, which is really cool - pour the slab, stack up the containers like lego, connect up power, data, water, and there you go.
Of course it's not quite like the Jetsons - being real life it involves contractors moving earth, pouring concrete, and plumbers and sparkies to connect things up, but it's still pretty impressive...
Wednesday, 21 October 2009
Tuesday, 20 October 2009
I know the phone I want, the plan I want, the network I want. What could be easier than to order it online?
And the order process was straightforward. Except that they wanted to deliver it to my home address between 0900 and 1700, ie when I would be at work. And it had to be me, to sign for it in person.
Obviously this was silly, so I phoned the call centre to ask if I could collect it from their local shop on campus, or alternatively have it delivered to my office address. After all, the courier service they use is the same one that other vendors use to deliver spare server parts, so none of it would be a surprise to them.
Calling the call centre was a big, I mean big, mistake.
The first call centre operator kept on putting me on hold while she found out that she couldn't do any of these things, and then lost the call. I redialled and got someone else. He claimed he could do all these things. He couldn't. More time wasted.
They now claimed that the order could not be cancelled and that the only way was to reject the delivery. Given their general incompetence at everything else I asked them to email me to confirm this. No they couldn't do that either.
I even got to talk to a supervisor who claimed they would credit my account with $10 as compensation for wasted time - not helpful when they couldn't actually deliver my phone to somewhere sensible.
The annoying thing is it is the deal I want (and no, for once it's not telstra or optus)...
Well things are looking up - first of all I found almost the same deal from Virgin, and then the company in question rang me to apologise and to agree to give me a $50 credit on my account in compensation (mind you I haven't seen the confirmation email yet).
What's more the courier company took the phone to the local (and slightly inconvenient) post shop for collection. The question remains of course why they couldn't let me collect it from a post box - again I'd still have had to show id to collect it ...
Sunday, 18 October 2009
Something I've been playing with recently is the crunchbang linux live cd perhaps with a view to loading it on the $83 machine instead of classic ubuntu.
After all 90% of what I do is web based - blogger, gmail, and google docs or zoho for writing - and comparatively little is done with a classic locally installed app, and when I do it's usually kwrite or abiword - so crunchbang seemed a likely possibility.
Crunchbang is basically like an up to date fluxbuntu - lightweight and enigmatic, with a right-click-on-the-desktop metaphor that isn't immediately obvious. In fact it's based on the ubuntu 9 base.
Now having run fluxbuntu in a vm I'm quite happy with this. So Crunchbang looked a natural, and certainly everything seemed to work on the live cd, except it was slow - very slow. Now it is the $83 machine with only 192MB RAM but lightweight distros should surely run well - after all ubuntu 8 runs pretty well.
So I have a dilemma - crunchbang looks good, looks promising, but should I trash what I've got for a new distro?
Not sure, probably the go would be to build a vm using virtual box on the 'proper' computer which has plenty of grunt and see how that goes ...
Which is exactly what I did, configuring a machine with a purposely small disk and small amount of physical memory.
The install is easy. It asks some very simple questions, including one about setting up autologin for that instant-on netbook experience, sets a mac-like default name for the host and you're up and running.
It's reasonably responsive. Installing the non-standard kwrite was straightforward once I remembered to re-initialise the libraries and it looked good.
Generally performance is very much like fluxbox, but with the advantage it's built on a more recent code base.
Would I change? Possibly. The $83 machine is very much a play machine and running something different might be fun, and as I said all I need for 90% of everything is a web browser ...
Friday, 16 October 2009
Most either have a blog site running on a machine under their desk (I exaggerate, but there are cases frighteningly like this) or use an externally hosted service such as blogger or WordPress.com.
As we know from the JournalSpace fiasco, and as we've seen with other services such as GeoCities and Macmail closing down or just plain disappearing, we do need to think about the long term preservation of blog content, if for no other reason than that we cannot be assured of the quality or reliability of other people's backups, or of the likelihood of external free hosting providers wishing to continue to provide a service.
However I have just happened across a paper presented earlier this month at iPres09 which suggests a breathtakingly simple methodology - essentially take the RSS feed of a blog and repost it to a private instance, where the content can then be backed up in a manner that guarantees its long term availability.
It's an extremely neat idea as
- it makes no assumptions about the blog being backed up other than the availability of an RSS feed
- no special software or configuration is required on the host, rendering it ideal for archiving externally hosted blogs where it's unlikely that blog authors have system level access
- posting to a private instance means that the archive can be kept dark until such time as access to the original content is lost
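As a rough illustration of how little machinery the approach needs, here's a sketch that pulls an RSS 2.0 feed using nothing but the Python standard library and files each entry away locally. The feed URL and the dict-as-archive are my own placeholder assumptions, not anything the paper prescribes - a real dark archive would post into a private blog instance or a database:

```python
import xml.etree.ElementTree as ET
from urllib.request import urlopen

def parse_feed(xml_text):
    """Return (title, link, body) tuples for each <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    return [
        (item.findtext("title", ""),
         item.findtext("link", ""),
         item.findtext("description", ""))
        for item in root.iter("item")
    ]

def archive_feed(url, store):
    """Fetch the feed and keep a copy of each entry, keyed by permalink."""
    with urlopen(url) as resp:
        for title, link, body in parse_feed(resp.read()):
            store[link] = (title, body)  # repost to a private instance here
```

Run periodically, this captures everything the blog publishes without any access to the hosting platform at all.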
Tuesday, 13 October 2009
As always the article was accompanied by the faint sound of empires being defended as reputations, and years of literary scholarship, were felt to be under threat by the rude mechanicals.
Actually it's quite a clever technique. For example the Chronicle of Fredegar is thought to be in part derived from Gregory of Tours' History of the Franks and compiled by three separate authors.
Statistical analyses of phrase frequency (for that's all the plagiarism detection packages really are), would let us show whether that was plausible.
Equally, for many medieval texts there is no definitive source. All there are are copies of copies from which we synthesise a likely translation. Nothing wrong with that, translation has always been in part a creative activity to make texts read well.
However, what the plagiarism detection systems could possibly do is allow us to see which texts most closely resemble each other.
So if we have four texts, A, B, C and D and we can show that B closely resembles A, and that both have reasonable resemblance to C as also C has to D, but that D differs from A/B more than it does from C we could guess that A/B are copies of each other, that one of them was copied from C and that D was copied from C separately, perhaps by someone else entirely.
Scholars have been doing this by hand for years, and possibly with greater accuracy. However computers are good at counting things, and with cheap OCR and the digitisation of transcriptions of the various manuscripts via programs such as GoogleBooks, it would be possible to run these analyses relatively simply and cheaply.
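A toy version of such an analysis fits in a few lines: treat each transcription as a set of word trigrams and score pairs of texts by the overlap of those sets (Jaccard similarity). This is my own illustrative simplification, not the method of any particular plagiarism detection package:

```python
def ngrams(text, n=3):
    """The set of word n-grams (phrases) occurring in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def similarity(a, b, n=3):
    """Jaccard overlap of two texts' phrase sets: 0.0 (disjoint) to 1.0 (identical)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)
```

Computed over texts A, B, C and D, the pairwise scores give exactly the kind of resemblance matrix described above, from which a copying tree can be guessed at.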
Even if all it does is confirm existing scholarship we have learned something. If it throws up something else that could be rather interesting ...
Monday, 12 October 2009
Over the years, people have tried various solutions, such as fake eyes and cable ties attached to bike helmets, none of which really work, even if they do help brighten up the urban landscape.
Some guys here at ANU tried some experiments to see what works and what doesn't:
http://www.youtube.com/watch?v=9wHreVKgOT4 and http://www.youtube.com/watch?v=ES_n4DgJDHs
I'm now looking forward to a craze for day glo afro bike helmet covers ...
After various programs ended in the early eighties they took the dish down and removed the infrastructure, just leaving the concrete foundations and the iron stubs of the dish mount.
Apart from some sign boards and a stainless steel marker to commemorate its role in history it's utterly enigmatic - sure you can see the remains of a shower base and tell what was concrete floor and what was car park, but it's basically all going back to bush, other than a small area used as a national park campsite.
And that's interesting as it shows just how enigmatic and difficult to analyse sites are. Here is a well known site, where the plans are known. But over thirty years tree roots have started to lift the concrete, the tarmac has started to decay, soil and leaves have started to build up and the scrub has started to grow. And that's on a site visited by tourists and bushwalkers, and occasionally tidied up by national parks staff ...
Monday, 5 October 2009
Our journey took us through Gundaroo, which we've always liked, and once even thought about buying a house in, when suddenly the penny dropped - with a wireless data connection you really don't need infrastructure. With a skype in number to let people who only have landlines call you, and a decent enough data plan, you can live anywhere within range of a 3G service - you really have cut the cord, just like those irritating ads for unwired where the woman cuts the phone cable and drops her old phone in the bin, and indeed just like our asian floor installers have done.
Sociologically this is quite interesting - in Canberra, where there are a lot of people who rent, not to mention a large floating student population, the idea of a box you take with you, plug into the wall wherever you lay your head, and there's your phone, your internet, your life - why would you bother with a fixed connection?
And there are other implications. Just as in Morocco, where the cellphone network has effectively replaced the fixed wire network - greater reliability, greater coverage, greater penetration - one could imagine wireless broadband in country Australia being a sensible alternative to stringing fibre optic cabling round country towns.
However, once one gets to the cities you need FTTH just because of the population densities - and it means you can defray your costs by renting spare bandwidth to the cable tv companies, in just the same way as in France you get adverts for phone+tv+internet packages for EUR40 a month.
This cuts the other way too - wireless broadband will never have the bandwidth to deliver these additional services, meaning the bush is stuck either with the free to air digital channels or the overpriced football obsessed satellite services.
So wireless is a valuable stopgap for lots of reasons. What it isn't is a replacement for FTTH infrastructure, which leaves the problem of how you get fibre out to remote towns ...
Saturday, 3 October 2009
Last time, which is four years ago now, they were in a short term temporary lease shopfront in Dickson, in Chinatown, but that's now a stylish Vietnamese restaurant.
Googling for them brought up their website and their current address, now in a rather tatty industrial unit in Mitchell.
Well we went out there, looked at the flooring displays, chatted about costs, etc etc.
But what struck me was the sheer minimalism of the operation - a storage unit, a painted out office with the displays, a couple of mobile phones, and a wireless broadband connection - and I'd guess a skype account, both for overseas chats with suppliers and to provide a skype in number for people to call as if to a landline.
And that was it. The website, some technology, and the flooring. Yes sure they had stock, storage and a truck, but basically the whole enterprise would have fitted into the truck, enabling them to move from one cheap short term leased facility to another, and as all their technology was mobile no disconnection, reconnection or cabling fees.
And of course, as people only buy wood flooring a couple of times in their life, almost every customer is a first time customer, so the fact they keep changing physical addresses isn't a problem.
Strangely impressive as a business model...
Tuesday, 29 September 2009
Like the man said - who has that many gold sword pommels?
Almost all the material is of Anglo-Saxon origin, but much has been made of the presence of the rolled up crosses and the gold helmet strip with a quote from the book of Numbers as evidence of there being christianity in west Mercia rather earlier than previously thought - Mercia being considered a rather pagan sort of place in the seventh century.
I think the date of the find is the real key to its possible origin. If it dates from the late 600s/early 700s, as some writers have supposed, it dates from a time when Mercia was expanding westwards into the territory of the Welsh successor states, and it could be argued that the christian artefacts were booty from some frontier raid, and that the raiders were themselves attacked - by whom is an open question - and the hoard buried for safe keeping, to be recovered later. Certainly 67 seems low for a large raiding party, but 200 - say sixty-odd thegns and a couple of retainers each - seems about right.
Push the dates back to the late 700s/early 800s, when the frontier had stabilised around Offa's Dyke, and the argument becomes difficult to sustain, especially as Mercia was thought to be considerably less pagan by then. It doesn't mean however that group A didn't engage in some attack on group B and were later attacked by group C - rather that the scenario built around a raid into the welsh lands is less likely.
However the presence of some christian material in the hoard does not imply anything about the beliefs of the hoard takers, only that there were some people about who professed christianity and had people around who could write in bad latin. If the possible earlier date for the hoard stands up, I don't think the people who made these artefacts were necessarily the same as those who owned the sword pommels.
But this is all just speculation - what the hoard reveals is how little we know of the development of Saxon settlement in Mercia.
Friday, 25 September 2009
But there's one interesting feature - photographs of the objects are hosted on Flickr, not on some institutional repository somewhere. Clearly this will have been done to avoid having an individual server overwhelmed, but it does start having implications for the rest of us.
These pictures are available out there as a resource. Newer learning management systems such as Moodle 2 can harvest from data sources such as flickr, and the material can then be incorporated in course material and the like.
However this starts having implications for bandwidth and storage. Local caching gives predictable performance and ensures continued availability of the digital object at the expense of disk space. Repeated fetching of the object has an impact on bandwidth, a concern here on the dark side of the world.
Both of these options have pros and cons but both have cost implications. How significant these will be is still opaque (to me at least).
Thursday, 24 September 2009
Well I've just had one.
Twenty or more years after I last taught mailmerge using Wordstar's macro language I've just realised why the comment command was .ig - of course it was for ignore. Doh!
Wednesday, 23 September 2009
One of the more interesting findings was that students liked having a separate application - the learning management system was, well, the learning management system, and felt to be owned by the institution, while the eportfolio application was felt to be private, sort of like an academic facebook or myspace page in which the student could generate personas and reveal what they wished to whom - something that accords with the Helsinki university finding that people tend to manage their privacy on social networking sites naturally.
This suggests that there may be a requirement for a service that allows people to make a personal statement, or more accurately personal statements, about themselves, and to separate their lists of friends/contacts accordingly. After all, to some people I'm a man who likes cats, to others I'm a roman and early medieval history geek, and to yet others a respected IT professional ...
Saturday, 19 September 2009
Originally, given its association with beef rearing, Mount Panorama and motor racing, I'd tended to assume that it was heaving with men with shaven heads and straggly beards, utes, and, to misquote that Victoria Bitter ad, brought to you by the men who've had their hand up a cow's bum.
And while it's true it was a little bit country, it was a perfectly pleasant little town with some decent restaurants and pubs, and some nice looking streets of Federation and art deco houses. Definitely not the last place on earth you'd stop - probably the presence of Charles Sturt University helps keep things a bit more snazzy.
The conference itself was held in a typical university conference venue with the same fairly soulless conference accommodation. If it wasn't for the screeching cockatoos it could have been Birmingham, Warwick, Nijmegen, Stanford or any of half a dozen other places I've been to conferences at.
In fact the only downside was wireless access. This was severely overloaded and went through a portal that seemed to be a bit snotty about talking to the ookygoo. Definitely a downside, and the first time I've felt the need for a 3G phone and a data contract to allow me to check my email ...
Given CSU's geographically distributed multi campus nature a robust VLE solution is clearly key for them.
Why did I go?
Currently we run an outsourced version of moodle as our VLE but use Sakai 2.4 as our collaboration platform, basically a poor man's sharepoint. We are currently updating this to 2.6. However I have a number of questions:
- Do we have too many products in the mix?
- Could Sakai replace Moodle, or vice versa?
- Collaboration requires social networking like features - does either Moodle2 or Sakai3 support these?
- To what extent will these products interwork with Googledocs, flickr and the rest?
- Could either product be used in place of a student (or indeed staff) portal eg Melbourne graduate studies portal?
- How likely is either product to deliver?
With Moodle the answer is quite clear. Moodle2 will become available sometime in the December 2009 / February 2010 timeframe, will have a new repository style content focused architecture, will have support for plugins to import and export to flickr, google docs and the rest, and is aimed at the shared experience. With plugins for gmail and live@edu, calendaring and the rest, moodle could provide the equivalent functionality to a student portal. (See my Moodleposium post for more details)
And while moodle could undoubtedly be run as a collaboration suite, its new architecture is very much aimed at promoting object reuse and repurposing.
Sakai is somewhat different. Sakai 2.4 and earlier versions tended to be more buggy than moodle and rather more freewheeling in their approach, leading to a degree of confusion as to how to get tool x to do y. As of 2.6 the foundation has been putting in place management measures to ensure better QA on the code, and as of 3.0 not only will there be a new repository style architecture promoting object reuse, and integration with flickr, gdocs etc, but there will be a strict style guide to ensure that all applications conform to a common look and feel - basically if 2.4 was linux, 3.0 will be the mac - everything works more or less the same way.
Architecturally, sakai3 makes more use of other open source projects, eg shindig, apache jackrabbit, to become more modular and more component based, making the environment easier to maintain and easier to build. Objects in the repository store will have metadata controlling their reuse - basically access control is on a per object basis, rather than on a per site basis as in 2.4 and 2.6.
Sakai 3 includes the concept of scholarly social networking in recognition that much academic work involves collaboration, including integration with google docs, and has an architecture that allows the building of connector apps. One nice aspect is its idea of a universal inbox, one where there are sakai instant messages, and imap and pop connectors to pull email out of other mail systems.
Coupled with shibboleth and credential caching for single sign on and ical calendar subscription this would appear to give the ability to provide a portal out of the box. Sakai3 also is cloud storage aware and will scale to support very large sites with many many objects in the repository.
So basically sakai 3 looks on paper to be the ideal application.
However, timing is everything. Unlike Moodle2 Sakai3 will not ship until sometime in 2011, with 2.6 going EOL in 2011. The last version currently planned off the Sakai 2 code base will be 2.7 and that will go EOL in 2012.
Sakai 2.6.0 shipped without a Wysiwyg editor for the wiki tool, zip and unzip functionality, or an enhanced version of the FCKeditor.
All are envisaged for a 2.6.1 service pack.
The question therefore is: if Moodle2 ships without problems, and given that there is a strong market out there for VLE solutions, will sakai start looking old and tired with its 2.6/7 architecture? Sakai could start to look very poor in comparison as a collaboration architecture, although features such as the new profile tool, which is much more facebook in its approach - a facebook for learning - enhance collaboration within an institution.
Sakai 2.4 certainly works for us as a collaboration platform, and the same is true of 2.6 for Melbourne's graduate studies portal. The question really is: will Moodle 2 be good enough to replace these use cases, given the desire to reduce the number of products? Only time, and testing, will tell. Certainly the demo version of sakai 3 at 3akai looks promising.
Health warning: The views in this post are my own and not those of my institution.
Well I've just finished Ruso and the Disappearing Dancing girls by Ruth Downie and it was good, very good. Got the same rush that I got when I first happened across the Silver Pigs by Lindsey Davis in the late lamented Murder One bookstore in London twelve or so years ago.
Try it - if you like early Falco novels you'll like this one ...
PS if you live in Australia or New Zealand order from the excellent Bookdepository - much cheaper than Amazon and excellent service
I'm now waiting (a) for the LuLu's and CreateSpaces of the world to start offering this as a mail order service, and (b) for your local Borders or A&R to start offering a service where you can bring in an sd card with a book downloaded from Project Gutenberg or whoever - originally for your e-reader - which they can then turn into a printed book for a few bucks
It's going to be an interesting couple of years ...
Friday, 11 September 2009
I happened across this while playing with ning to make a dummy social network, such as you would do for conference participants. What Huddle does is allow you to create mini workspaces for project groups that are private to that group, but visible to the owner. Thus you could create a set of workspaces for individual project groups - invisible to other members of the network who are not part of the project team, but visible to the network owner - eg a translation workshop working on different bits of the same text.
Their education flier can be downloaded from http://www.huddle.net/business-solutions/huddle-for-education, and there's a canned presentation at http://www.slideshare.net/ULCCEvents/fote-alastair-presentation.
The real value is being able to graft a more closed environment to an open environment, so that you get this mix of workspaces in a social environment. Certainly the idea of closed and open is extremely powerful and, unlike BuddyPress doesn't make your social network too social.
Certainly worth further investigation.
Wednesday, 9 September 2009
A long time ago I responded to a Guardian Notes and Queries query about togas and horse riding.
Now when I do a search for my name all sorts of things I have no memory of saying or writing appear, but never this one - I'm guessing because no one has ever linked to the page so it floats as a little content orphan ...
So in theory, now there's a link to it it should appear in due course - we'll see.
The event was full to bursting with over 270 people attending. Made me quite nostalgic for the mid nineties when I used to organise similar events on the uptake of windows in UK universities [Internet archive link]. Plenty of people were taking pictures and live blogging, and doubtless tweeting - searching for moodleposium on google and flickr will bring up alternative takes (for example this one from UC) on the event. (I'm old school - notes on an A5 pad plus a bit of reflection and summary)
Now being a geek, I mainly concentrated on the technical sessions including Martin Dougiamas's presentations on Moodle 2 but my key takeaways from the non technical sessions I attended are:
- Social networking is implicit in most LMS use
- a proportion of users expect social interaction
- can also deliver this via a shared blog, wiki, or some other experience like shared google docs spreadsheet
- social interaction cannot be imposed - users all use the system in a different way
- distance ed students have a greater need of social networking to build a community
- Need to expect use of repurposed material from YouTube, flickr etc
- Need to move from a text centric to a mixed media environment
- Need to be clear what is pedagogy and what is there to enhance the student experience
Learning and teaching is changing, and while LMS's were originally viewed as a framework to deliver reading lists, lecture notes and podcasts of the lectures in a uniform format for students who couldn't make every class, the LMS solution has changed the university experience.
Likewise the use of wikis, blogs, and shared editing in Google Docs has changed the nature of teaching, with non-linear electronic (and by implication mixed) media replacing classic linear paper media. For example, a wiki used as an online daybook can show how a student researching a special topic has come to grips with the material etc etc.
The other interesting thing is that no one seemed terribly concerned about privacy - technology conquered all.
And then there's Moodle 2:
Essentially some architectural changes - more modular and also more repository like (database and pointers to file objects, with implicit single instancing), user and group level access controls and a hierarchical structure - again as it's database driven this is not reflected in any underlying disk structure.
At which point your learning management system starts looking like a content management system.
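A repository of the "database and pointers to file objects, with implicit single instancing" kind is usually built as a content-addressed store: each file is saved once under a hash of its contents, and the database rows just point at hashes, so two courses uploading the same PDF share one physical copy. A minimal sketch - the class and method names are mine, not Moodle's:

```python
import hashlib

class FilePool:
    """Toy content-addressed file store: identical content is kept only once."""
    def __init__(self):
        self.blobs = {}     # content hash -> bytes (files on disk in a real system)
        self.pointers = {}  # logical path -> content hash (the database rows)

    def add(self, path, data):
        digest = hashlib.sha1(data).hexdigest()
        self.blobs.setdefault(digest, data)  # single instancing happens here
        self.pointers[path] = digest
        return digest

    def get(self, path):
        return self.blobs[self.pointers[path]]
```

It also shows why third party tools that poke at the old directory layout break: the "hierarchy" lives entirely in the pointer table, not on disk.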
The other thing that Moodle 2 does is it embraces mashup technology and has repository connectors allowing the import and reuse of content from flickr, Google docs, YouTube etc etc. There is also a portfolio API to allow export to GoogleDocs, flickr, alfresco and the like allowing the creation of day books, private collections and private views - say a course portfolio for tutors, another for prospective employers, etc - not just a CMS but a meta cms.
There will also be reasonably tight integration to both Google Apps (including a Gmail block) and at Microsoft's request and similar layer for live@edu and exchange.
This does beg the question however at which point the LMS becomes the student portal - after all all the basic functionality is there.
Moodle 2 also introduces the concept of a hub server - essentially a server to which courses are exported for reuse, to create a global course collection.
To upgrade to Moodle 2 you will require PHP 5.2.8 or greater and MySQL 5.0.25/Postgres 8.3, although MSSQL and Oracle are also supported. Early adopters might find Linux performance to be better than in the Microsoft environment as Microsoft development tends to lag slightly.
As the underlying filesystem has changed to a database and pointer style repository model any third party filesystem tools will break, as will any database based tools.
Currently testing is scheduled to be finished by December 2009 for delivery in February 2010, although no commitment has been made to meeting these dates.
Tuesday, 8 September 2009
Outsourcing of student email is very much a theme for this year. And I personally can never remember who has outsourced to whom.
Consequently, to keep track of this I've started a little Google Docs spreadsheet to list which university is doing what.
It's now reasonably complete, but there may be a number of errors as my methodology was pretty crude - go to each university's home page, search for webmail, and from the login screen try and guess the system used. Consequently I'm sure there are some inaccuracies. As 'other' encompasses a range of solutions, from webmail clients to other mail systems, eg dovecot, I haven't attempted to categorise things further.
Of the universities checked, outsourcing is in the minority, with most preferring to host in house.
A number of universities, eg UTS, have used exchange labs as a staff email system while maintaining an older in house system for students. One might think this was a prelude to an end of semester migration of student email, but then one might be completely wrong.
Of the outsourced, microsoft is in the clear majority with Windows live.
Please advise me of any updates or errors in the spreadsheet.
Sunday, 6 September 2009
Once when watching Xena Warrior Princess (yes, I know) I was suddenly struck by the fact that the centaurs were also smiths. Now Xena had plotlines that were clearly plundered from Roman and Greek myth and history - I always suspected a rogue classicist in the script writing department - so I checked it out. Yes the Centaurs knew how to work iron, and then I suddenly realised that the centaur tale was about the invasion of a people who made iron tools and rode horses - and probably had its origins in the greek dark ages.
Such a story shouldn't surprise us. The Incas, we know, were initially confused by the Conquistadores, reputedly taking horse and rider for a single creature.
So folk tales usually encompass something that someone thought important enough to preserve - either as history to be remembered, such as the earlier sections of the Anglo-Saxon Chronicle, or as a way of imparting a warning or a truth in a way it would be remembered.
What the Telegraph article doesn't reveal is how they worked it out - certainly it would be interesting to do some cluster analysis to see how the variations were distributed through space (and time).
Friday, 4 September 2009
Digitisation coupled with Print on Demand means that old, or obscure, books can be printed when someone wants them, meaning no inventory, no stockholding, no warehousing, and thus should substantially reduce the cost of scholarly publishing.
But of course low production and distribution costs don't really help if no one knows that the book is available, meaning that the whole PoD thing doesn't really happen.
In the last fifteen or so years the process of book buying has changed due to the rise of online retailers such as Amazon, and so has book searching - the Amazon catalogue now being used as a resource to track down books as their stockholding and marketplace listings have attained that critical mass that means just about everything can be got via Amazon.
And that's what is cool about university libraries getting their print on demand editions listed on Amazon - it makes them accessible.
It won't make them rich, it won't generate scads of custom, as let's face it, the books they're running as print on demand were never that popular, but then university presses were never meant to do popular.
Adam of Usk would have been chuffed!
Tuesday, 1 September 2009
Now most of these finds were 'informal' finds made by metal detectorists, ie people who go out looking for things at the weekend. And if they find an Anglo-Saxon penny, say, it's really cool. But if all that happens is that the penny goes into a finds box in someone's house, that's the end of it. Interesting, even fascinating, but useless.
The value in the find lies in recording it in a database. That way we know that a coin of a certain type was found at a certain location, ie its context. If more, similar coins are found in roughly the same area, it suggests that something important was happening in terms of a cash economy - remember a silver penny was worth something like $50 - with large amounts of cash being handled.
(In fact let's just say that a silver penny was worth looking for if you dropped it - like the man in rural Morocco I tipped 5DH - something like a dollar - for helping me. He said thank you, tossed it in the air and promptly dropped it in a pile of rocks. To me it was a dollar and if I couldn't find it easily not worth looking for - to him it was 5DH and extra bread for his family, and so he set about fossicking enthusiastically to find it.)
So context and aggregation of data. Of course the datasets need to be preserved and publicly accessible to allow them to be cross referenced - meaning we can ask questions like 'do we find pennies on trade routes?', 'do we find pennies in locations where we find wine jars?' and so on.
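To make the cross-referencing idea concrete, here's a minimal sketch in Python of the sort of query you might run across two such datasets. The field names (`id`, `loc`, `name`) and the crude flat-earth distance estimate are my own illustrative assumptions, not any real finds database's schema:

```python
import math

def finds_near(finds, sites, max_km=5.0):
    """Cross-reference two datasets: which coin finds lie within
    max_km of a known trade-route site?  Locations are assumed to
    be (lat, lon) pairs; distance is a rough flat-earth estimate."""
    def km_between(a, b):
        # ~111 km per degree of latitude; scale longitude by cos(lat)
        dlat = (a[0] - b[0]) * 111.0
        dlon = (a[1] - b[1]) * 111.0 * math.cos(math.radians(a[0]))
        return math.hypot(dlat, dlon)
    return [(f["id"], s["name"]) for f in finds for s in sites
            if km_between(f["loc"], s["loc"]) <= max_km]
```

The point isn't the geometry, which is deliberately crude - it's that a question like "do we find pennies on trade routes?" only becomes askable at all once both datasets are recorded, preserved and accessible.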
And from a digital preservation point of view, the power of Julian Richards' Internet Archaeology paper is showing what significant synthetic research can be carried out using publicly accessible but properly archived data sets - basically the power of dataset reuse.
And that is why we need to preserve datasets and make them publicly accessible - elsewise they're just a pile of spinning 1's and 0's ...
Monday, 31 August 2009
Well I've now seen another version - the student hunched in a corner of the library having a really loud animated conversation with his laptop - of course what they're actually doing is using Skype.
Question is, will peer pressure kill the trend, or will it, like the guy who's slightly too loud on his mobile, become a feature of wireless hotspots everywhere?
Wednesday, 26 August 2009
Just now the Musee du Moyen Age is running an exhibition on personal care from antiquity onwards which gives some clues. And, having seen the exhibition, I'd recommend it if you're in Paris.
It won't tell you about people's toilet habits, but does tell you about their attention to detail and personal care.
Wednesday, 19 August 2009
(And so it was in Europe this summer. The only reason (apart from not having a printer with us) we did real checkins with Ryanair and EasyJet was having bags with us. Elsewise we'd have used the checkin machines ...)
Tuesday, 18 August 2009
Do we then see the Espresso Book Machine simply becoming another delivery mechanism - or indeed people turning up with e-books they've already purchased, stored on an SD card, and having a printed version run off for them? That would mean the bookstore becomes a copy shop (rather than a coffee shop). The implication is that people buy and download books almost exclusively online, as already happens with music (eg iTunes), and then choose whether to burn them to CD etc.
(In this scenario online purchases of real books represent a sort of halfway house until such time as everything is available digitally.)
After all, one of the things that makes books expensive is distribution and shipping costs. If we get to a situation where people print only what they need to have in a portable non-electronic format - much in the way people print PDFs of journal articles they need to refer to - what will the book trade look like in five or six years' time?
This was a really interesting project to procure and implement a digital asset management system, which bore a close resemblance to a digital repository, to store for all time the digitised patrimony of the Aboriginal cultures of Australia. Now, Aboriginal cultures were oral cultures, and while poor in terms of physical cultural artefacts they were immensely rich in terms of stories, songs, dance and the rest.
As the traditional societies broke down during the nineteenth and twentieth centuries there was great risk of loss of this material. Social dislocation, disruption, breakup of kinship groups etc etc.
However AIATSIS had built up a great store of anthropologists' field notes, recordings on cassette and quarter inch tape, film, often 8mm or 16mm, and video.
Much of this material was in a poor state as it had not been conserved at all - in one case a box of tapes was discovered in a tin shed on someone's property after the original owner died. Being stored for forty years in a tin shed in the desert does not do anything for the longevity of quarter inch tape.
So the decision was made to digitise the materials, as recording technologies had moved from analog to digital. The result of this was a large amount of data that needed to be properly indexed and stored with appropriate metadata, and also made available to the societies whose data it originally was - digital cultural repatriation.
My part in this was to acquire a solution to do this. Previous to this I'd done a lot of work on backup solutions and had been on the UK Mirror Service steering group, so I wasn't new to the technology, although perhaps new to the concepts, but then everyone was in 2004.
Digital repositories are fairly simple. They consist of a database containing the metadata for the digital objects, plus the objects themselves. The metadata is in two parts: the technical metadata - things like where the object is stored, what format it is stored in, and so on - and the informational metadata, which contains stuff like provenance and access rights. The object itself lives in an object store, or more precisely a persistent object store, ie some form of filesystem with inbuilt resilience, such as SAM-FS, or one built by replicating the filesystem multiple times and using things like MD5 checksums to prove the copies are accurate. You then periodically rerun the checksum on the files to see (a) whether the answer is the same as it was previously and (b) whether all copies continue to give you the same answer. This is basically a check against corruption, and is part of what SAM-FS/QFS will do for you.
Such repositories can be very large, as you are not limited by filesystem size: the object store can be spread across multiple filesystems, and you search for things rather than addressing the individual file objects directly.
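As an illustration of the replication-plus-checksum idea - not SAM-FS itself, just a hand-rolled sketch of the same fixity check in Python:

```python
import hashlib
from pathlib import Path

def md5sum(path, chunk_size=65536):
    """Compute the MD5 checksum of a file, reading in chunks
    so large objects don't have to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_replicas(replica_paths, recorded_checksum):
    """A basic fixity check: every replica must still match the
    checksum recorded at ingest time.  Rerun this periodically to
    catch silent corruption in any copy."""
    return all(md5sum(Path(p)) == recorded_checksum
               for p in replica_paths)
```

In a production store you'd record the checksum in the technical metadata at ingest and schedule the verification sweep, but the core of the corruption check is no more than this.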
What you don't do is back them up. Backup is, at its most basic, a process where you copy the contents of a filesystem to another filesystem, giving you a copy of the filesystem as it was at a single point in time. And you do this with sufficient frequency that your copies are reasonably accurate, so on a slow-changing filesystem you might make less frequent copies than on a busy one.
Of course there is the risk that the filesystem contents change while you are copying them, which is why databases are typically quiesced and dumped, and the dump is backed up rather than the live database.
However, if you have a 1 terabyte filesystem and you back it up once a week, keeping your backups for six months, you have to store 26 terabytes. Decide that you need nightly backups because the filesystem changes so much, and that you're going to keep those for the first month, and you suddenly find yourself storing 30+22, ie 52 terabytes. It doesn't scale, and it starts becoming expensive in terms of storage media and so on.
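The arithmetic is worth making explicit, since it's the whole argument against full serial backups. A tiny Python sketch, assuming every backup is a complete image of the filesystem:

```python
def backup_storage_tb(fs_size_tb, daily_copies, weekly_copies):
    """Total storage needed for full (non-incremental) backups:
    each retained copy is a complete image of the filesystem."""
    return fs_size_tb * (daily_copies + weekly_copies)

# 1 TB filesystem, weekly fulls kept for six months (26 weeks):
weekly_only = backup_storage_tb(1, 0, 26)    # 26 TB

# nightly fulls for the first month (~30 copies), plus weekly
# fulls for the remaining ~22 weeks of the retention period:
nightly_plus = backup_storage_tb(1, 30, 22)  # 52 TB
```

Fifty-odd terabytes of media to protect one terabyte of data - which is exactly why deduplication and change tracking replace full copies at scale.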
Of course there are ways to mitigate this, so as to cope with the changes in a big filesystem. But if your big filesystem contains lots of rapidly changing small files, such as a student filestore, you have a different bag of problems, as the filestore is different every time you look to see what's changed. So you end up with tricks like writing every file to two separate filesystems so you've got an automatic backup. And if you track changes you can then build a synthetic point-in-time copy.
Now the point is that conventional serial backup doesn't scale. And if you track the changes (in a database perhaps) you can regenerate a synthetic copy of any filesystem at a specific time (within reason).
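A synthetic point-in-time copy is easier to see in code than in prose. Here's a minimal sketch, assuming the change log is simply a list of (timestamp, path, content) records, with `None` content meaning the file was deleted - a toy stand-in for what a real backup product keeps in its database:

```python
def snapshot_at(change_log, when):
    """Rebuild the filesystem state as it was at time `when` from a
    change log of (timestamp, path, content_or_None) tuples.
    Replaying the log in time order, the latest entry at or before
    `when` wins for each path; None means the file was deleted."""
    state = {}
    for ts, path, content in sorted(change_log):
        if ts > when:
            break  # log is sorted, so nothing later applies
        if content is None:
            state.pop(path, None)   # deletion
        else:
            state[path] = content   # create or overwrite
    return state
```

No full copy of the filesystem is ever taken: any point in time (within the retention window) is regenerated on demand from the tracked changes, which is precisely why the 52-terabytes-of-full-copies model stops being necessary.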
And suddenly your filesystem starts looking like a repository.
Now there's a reason why I'm telling you this. After doing AIATSIS's digital repository they asked me to be their IT manager, and from there I moved to ANU to be an Operations manager looking after servers, storage and backup, and making sure that the magic kept working. I've now had another left turn and am doubling up as ANU's repository manager.
OK, and ?
Well I went today to hear Bob Hammer, the CEO of Commvault, the company that produces our backup solution, speak. I'd gone as an Operations manager to hear about the technology enhancements and so on that were on the horizon.
What Bob Hammer had to say was more interesting than that, and very interesting from the repository point of view. In summary: conventional linear backup is going to disappear. All the information lifecycle management, and the clever stuff around replication, deduplication and cheap disk storage, is going to give you an indexed store of persistent objects, where you search for objects against their metadata - essentially e-discovery. The content, ie the value of the information, is what you are preserving, not the files.
The other interesting point was that such a model means you can decouple storage from the repository, and that the repository could live in a data cloud somewhere, because what matters is fast search - as long as the results come back fast, retrieval time matters much less - the Google experience. It also means we can bridge different vendors' storage and no longer care desperately about filesystems and their efficiencies. The key is the metadata database.
He also said a great many more interesting things, but it was the idea of decoupling and going for a metadata approach that piqued my interest - here was the CEO of a backup company saying it was all going to change and this was his view of the changes.
There is also the implication that the filestore contains resilience and everything is based on the metadata approach - a bit like the Google File System.
Of course the implication is that if conventional backup goes away and persistent storage looks a lot like a digital archive, what happens to repositories let alone filesystems?
In a sense the persistent store lets you build a collection, with object reuse, by querying the store's metadata - using the access and search metadata to identify suitable objects.
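A sketch of what that collection-building query might look like, assuming the catalogue is just a list of metadata records (the field names here are illustrative, not any real repository's schema):

```python
def build_collection(catalogue, **criteria):
    """Select object identifiers from a metadata catalogue (a list
    of dicts) where every criterion matches.  The objects are never
    touched - a collection is just a saved metadata query, so the
    same object can appear in any number of collections."""
    return [rec["id"] for rec in catalogue
            if all(rec.get(k) == v for k, v in criteria.items())]
```

Note that nothing is copied or moved: two different queries can both pick up the same object, which is the object-reuse point made below.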
So the questions are:
1) At what point does a digital archive just become a set of logic that controls the formats objects are loaded (ingested) in and allows the recording of informational metadata? (The same is of course true of retrieval.)
2) If archives are collections of metadata that point at objects distributed across multiple object stores, is that a problem - provided, of course, the objects are properly persistent?
3) Object reuse just becomes a special case of #1: the same object can be ingested into multiple collections - eg a podcast could be ingested into both a history collection and an archive of recordings made in a particular year.
4) And we need to think about what happens if an institution suffers a catastrophic failure of an object store. If all we have is a set of collections reusing objects, what happens when we lose the objects? Do we need to think about dark clones of these object stores, not unlike what CLOCKSS provides for e-journals?