Stuff, geeky stuff: clouds and repositories

Tuesday, 29 June 2010

clouds and repositories

Cloud storage would seem to be the ideal fit for repositories, given that (a) no one ever knows how much storage they need, (b) response time needs merely to be adequate, and (c) moving the content to the cloud allows one to outsource all the curatorial questions about storage, backup and so on to someone else, in an architecture that looks something like this:

where the metadata server remains in house and the object store sits in the cloud.
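To make that split concrete, here is a minimal sketch of the idea in Python. Everything in it is an assumption for illustration: the names `ObjectStore`, `InMemoryStore`, `Repository` and `Record` are hypothetical, and the in-memory store merely stands in for a cloud bucket (a real deployment would wrap a provider's SDK behind the same two methods). The catalogue stays local and holds only a pointer and a checksum for each object.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Protocol


class ObjectStore(Protocol):
    """Anything that can hold bytes under a key (hypothetical interface)."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for a cloud bucket, used here so the sketch runs offline."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


@dataclass
class Record:
    """Metadata kept in house: descriptive fields plus a pointer to the object."""

    title: str
    object_key: str
    sha256: str
    size: int


class Repository:
    """Local metadata catalogue in front of a remote object store."""

    def __init__(self, store: ObjectStore) -> None:
        self.store = store
        self.catalogue: Dict[str, Record] = {}

    def deposit(self, item_id: str, title: str, data: bytes) -> Record:
        digest = hashlib.sha256(data).hexdigest()
        key = f"objects/{digest}"
        self.store.put(key, data)  # the bytes go to the (cloud) store
        record = Record(title, key, digest, len(data))
        self.catalogue[item_id] = record  # the metadata stays local
        return record

    def retrieve(self, item_id: str) -> bytes:
        record = self.catalogue[item_id]
        data = self.store.get(record.object_key)
        # Fixity check: the local checksum lets us verify what comes back.
        if hashlib.sha256(data).hexdigest() != record.sha256:
            raise ValueError("fixity check failed for " + item_id)
        return data
```

Keeping the checksum on the metadata side is a deliberate choice in this sketch: even with storage outsourced, the in-house server can still verify that the provider returns what was deposited.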

Certainly this is the idea behind Fedorazon, and it would be quite easy to emulate using the repository, collection management or content management software of your choice and something akin to Gladinet to connect the cloud storage.

And for something like Occams, which is designed for rapid collection assembly and development, this sort of architecture makes a lot of the provisioning problems go away.

And, not to be unsubtle, the large cloud providers can provide terabytes of storage for less than most institutional IT services can, purely through economies of scale. I haven’t done the maths, but I’ve seen figures from one cloud provider that suggest they are substantially cheaper than the storage we can provide using conventional SAN technology. On the other hand, a repository doesn’t need SAN: something like Isilon, or indeed a pile of Dell EqualLogics with StorNext, or Apple Xsan, might be adequate, with perhaps replication into a private LOCKSS infrastructure for preservation/backup.

However, there’s another variation, which I hadn’t thought of.

If you look at my recent post on the Kiandra mailman you’ll see that the image is sourced from Flickr.

However, that’s not the only way of accessing the image: you can also access it, with richer metadata, from the Powerhouse Museum’s server. By placing it in the Flickr Commons, they’ve solved curation. Flickr is good at looking after digital images and can provide a range of resolutions, and the image is searchable and findable with the rudimentary metadata (tags) Flickr allows, yet the conservators and cataloguers are free to add as much rich metadata as they wish.

Of course this only really works for images, and more particularly images in public collections, but it’s an interesting example of a public-private partnership in the repository space. And given that we know academics routinely put collections of material in Flickr for teaching/research purposes, it’s worth asking whether giving something like Occams a Flickr connector would allow the capture and harvesting of that material.
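As a rough idea of what such a connector might involve, here is a sketch in Python. It is not anything Occams actually provides. It assumes Flickr's public REST endpoint and its `flickr.photos.search` method, and the response layout (`photos` containing a `photo` list with `id`, `owner`, `title` and, when requested via `extras`, `tags`) should be checked against Flickr's API documentation before relying on it. The function names `build_search_url` and `to_records` and the shape of the harvested records are hypothetical. The sketch only builds the request URL and maps an already-decoded response; it makes no network call.

```python
from typing import Dict, List
from urllib.parse import urlencode

# Assumed endpoint for Flickr's REST API.
FLICKR_REST = "https://api.flickr.com/services/rest/"


def build_search_url(api_key: str, user_id: str, tags: List[str]) -> str:
    """Build a search request for one account's photos carrying the given tags."""
    params = {
        "method": "flickr.photos.search",
        "api_key": api_key,
        "user_id": user_id,
        "tags": ",".join(tags),
        "extras": "tags",       # ask for each photo's tags in the response
        "format": "json",
        "nojsoncallback": "1",  # plain JSON rather than a JSONP wrapper
    }
    return FLICKR_REST + "?" + urlencode(params)


def to_records(response: Dict) -> List[Dict]:
    """Map a decoded search response to minimal repository records."""
    records = []
    for photo in response.get("photos", {}).get("photo", []):
        records.append({
            "source": "flickr",
            "identifier": photo["id"],
            "title": photo.get("title", ""),
            # Flickr's rudimentary metadata: a space-separated tag string.
            "tags": photo.get("tags", "").split(),
            "page_url": "https://www.flickr.com/photos/%s/%s"
                        % (photo["owner"], photo["id"]),
        })
    return records
```

The harvested record deliberately holds only a pointer and the tags, leaving the image itself in Flickr, which mirrors the Powerhouse arrangement: richer metadata can be layered on locally afterwards.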

But then, as the name of the game is federation and we are increasingly seeing more complex architectures such as

one can begin to envisage a mixture of cloud and local storage and a range of hybrid solutions …