Wednesday 16 May 2007

What pictures of naked people tell us about tagging

Flickr is a wonderful tool for studying tagging/folksonomies - as pictures are, well, pictures, it's only by looking at tags that we can find content. Now one of the more common (85,000 plus entries) tags is 'naked'. Search for it and you get a whole range of images, from amateur cheesecake shots of teenage girls with no clothes to pictures of mole rats, by way of an overweight young woman using Flickr in the nude - incidentally confirming my prejudice that America is a deeply weird place.

Anyway the four most common sorts of pictures are:

  • pictures of young women not wearing any clothes

  • pictures of young children playing on the beach/in the yard

  • participants in the world naked bike ride

  • participants in a Japanese religious festival



Clearly 'naked' meant not wearing clothing to everyone who posted these pictures and tagged them that way. That's what we would expect where the meaning of the word is well known in English.

However the tacit metadata (or the associations) of the tag were different for the various groups, ie the way people thought about 'naked' was different. To some it was associated with childhood and innocence. To others it had a sexual dimension, and to others still it meant solidarity and shared action.

And that's the point. In a controlled vocabulary we might distinguish between naked, nude and unclothed depending on context. In a folksonomy that's not the case unless we have a degree of common understanding where to use which synonym to more closely convey meaning.

A folksonomy implies a degree of commonality. For a small group of people working on a shared purpose that's probably fair. For a large random group of people that's not the case, even in such an apparently simple case as pictures of people who aren't wearing clothing.

Where there is no commonality there is no tacit controlled vocabulary, and hence you get different classes of images tagged the same, meaning that the tags lose their value as a discriminant.
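One rough way to put a number on "value as a discriminant" is the Shannon entropy of the image classes that share a tag. The sketch below is only an illustration: the class labels and counts are made up (they are not taken from Flickr), and `tag_entropy` is a hypothetical helper name.

```python
import math
from collections import Counter

def tag_entropy(classes):
    """Shannon entropy (in bits) of the image classes sharing one tag.

    0 bits means the tag always points at the same kind of picture;
    higher values mean the tag tells you less about what you will find.
    """
    counts = Counter(classes)
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())

# Hypothetical sample: a small group with a shared purpose...
shared_purpose = ["bike ride"] * 8
# ...versus a large random crowd using the same tag four different ways.
random_crowd = (["young women"] * 2 + ["children"] * 2
                + ["bike ride"] * 2 + ["festival"] * 2)

print(tag_entropy(shared_purpose))  # 0.0
print(tag_entropy(random_crowd))    # 2.0
```

With four equally likely classes behind one tag you need two more bits of information (another tag, a description, a look at the picture) before you know what you have found.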

Singapore Airlines provides Star Office on flights

Check this out - Singapore Airlines now provides Star Office as part of their inflight entertainment system.

Makes sense - a lot of airline entertainment systems run on Linux (certainly Malaysian's does - there's something about seeing it reboot when you're 30,000 ft over Bali) so Star Office would be the easiest to provide in a networked environment.

Tuesday 15 May 2007

Flickr, tags, folksonomies and the logic of crowds ...

Some time ago I went to a presentation on library technology in which all sorts of people started getting all enthusiastic about tagging and how you could get students to rate courses, content, modules, background reading etc etc.

Won't work. Just won't. People are first of all selfish, and there are enough of them to skew the results of a small group. The whole wisdom of crowds thing depends on having a large enough group to have a marked central tendency so that the mavericks and the oddities cancel out. That's the theory behind opinion polls. Unfortunately you probably won't get a thousand people tagging eighteenth century novels in English 101 and a damn sight fewer tagging Middle English love poems. And anyway why should they - what's in it for them?
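A bit of arithmetic shows why group size matters. The sketch below assumes a made-up 1 to 5 rating scale where honest raters all give 4 and a couple of mavericks give 1; the scores and the function name are illustrative assumptions, not data from any real course.

```python
def mean_rating(honest_raters, mavericks, honest_score=4, maverick_score=1):
    """Mean rating when a few raters deliberately mark an item down."""
    scores = [honest_score] * honest_raters + [maverick_score] * mavericks
    return sum(scores) / len(scores)

# The same two mavericks in a class of 10 and in a crowd of 1000.
small_group = mean_rating(honest_raters=8, mavericks=2)
large_group = mean_rating(honest_raters=998, mavericks=2)

print(round(small_group, 3))  # 3.4
print(round(large_group, 3))  # 3.994
```

Two people drag a ten-person rating down by more than half a point, while in a crowd of a thousand they barely register.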

For once this isn't cynicism on my part. I came across three articles that read together basically tell the same story, and they're based on empirical research rather than prejudice:


So all these ideas of creating a folksonomy just don't work. Anyway those who control access to knowledge may have an opinion about this - tacit metadata as per a post of mine a couple of years ago:

Tacit metadata
posted Mon, 30 May 2005 10:49:39 -0700

Went to an interesting seminar on metadata at the ANU on Friday by Matthew Allen from Curtin.

His basic thesis is that most metadata contains an implicit categorisation model and that the model is quite rigid. Most formal metadata models are highly prescriptive, with the use of controlled vocabularies etc implying a particular view of how data is organised and categorised.

Formal metadata models are supposed to make explicit what is implicit, but actually it is more complex than that.

His point was that we all know that Journal X is more prestigious than Journal Y, or that such and such a university has a better reputation than another in a particular field. Access to this knowledge is controlled by leading practitioners who impart knowledge over the years by an initiation ritual.

At this point I was struck by the immediate resemblance to indigenous knowledge systems – for initiation rituals read graduate scholarship and for leading researchers read senior old men – ie there are people who have position because they are thought to hold knowledge of value and control access.

(As an aside, in the Arts and Humanities this is based on perception and not on some quasi objective ranking – eg science citation rankings – as in the Sciences. Which leads to the question of when research stops being an interesting synthesis of ideas in a discursive conversation – otherwise known as plausible bullshit – and becomes part of human knowledge. I've often wondered this about the arts.)

To return to the seminar.

Digitisation has created a vast demand for metadata categorisation, such that we could imagine that the digitisation process could never be completed. Equally this 'objective' categorisation would eventually overwhelm researchers as any online search would produce a vast number of results.

We need some way to interpret the results. To an extent we rely on tacit metadata for an implicit ranking of the value of each result.

One approach to sidestep this may be to use a folksonomy style approach where practitioners label results – this would use an implicit controlled vocabulary and would build a collection of resources within a particular field of knowledge – the more it is ranked by scholars in the field the more accurate the description would be and the greater the index of value of the resource – allowing the tacit to be made explicit.

Interestingly the NLA/Arrow project are encouraging people to add their own folksonomy type terms to any documents lodged.

Also struck by the possible relevance to indigenous knowledge projects and the means of soliciting knowledge by allowing people within the community to annotate objects – and the annotations then contain metadata and knowledge.

So tags and folksonomies use the logic of crowds to create an implicit controlled vocabulary to describe the object, and if the same people tag many objects we end up with a set of common words and terms. Trouble is, and I'm repeating myself here, you need a critical mass, elsewise you end up with shit as one of the key descriptors - which may be critically appropriate but doesn't help decide on the relevance of the material ...

Thursday 10 May 2007

Citrix, SGD and compression technologies

Had a lightbulb moment this morning - went to a presentation from Riverbed on their WAN optimisation kit.

Now a problem we have been grappling with is this:

Students do not necessarily use university provided computers these days; by preference they use their own to access university facilities. This is in part because most of them have part time jobs and can't just drop into a computer lab for a couple of hours any more.

So the solution is to provide them access to a standard computing environment.

One way is to provide a classic thin client environment using Citrix or Sun Global Desktop (SGD), which means we provide the apps, the disk and the execution space. SGD and Citrix have low demands, run on anything (more or less) and make few or no assumptions about the line speed. Because they are lightweight as far as the end machine is concerned, they don't impact heavily on its performance.

Using VMPlayer and a pre rolled virtual desktop makes the execution happen locally and comes with a whole bag of assumptions about the architecture of the machine and the amount of grunt available. Coupled to this we can't predict the line speed, so accessing remote documents might just suck. Even using compression technology like the Riverbed desktop client (or the Blue Coat one) tends to make assumptions about the amount of grunt and resources available locally.

In a corporate environment where you control the environment you can make the environment predictable by ensuring everyone has adequate hardware.

Not in a university. Students have everything from super sexy MacBooks through boring but adequate Dell or Acer laptops (or indeed any of the nameless rebadged brands from office superstores - Medion anyone?) through to tatty old desktops running Linux. All in all, an environment that is only predictable in its unpredictability.

Under these circumstances the only strategy to go for is the most lightweight, lowest impact, most multi platform solution, and that looks like Citrix (or SGD).

shredding != security

If you haven't already seen it, check out this article in the Guardian about an interesting solution to put shredded Stasi files back together electronically.

(Also I can heartily recommend 'The Lives of Others' as a movie, especially if you like dark deep movies)

securely wiping disks

Recently there was some conversation on one of the lists I subscribe to about the best way to wipe disks. The biggest gripe was the amount of time it took. It's not really about time, it's about data security. Here's my two cents on the subject:

Wiping disks takes time. Disks can also contain potentially valuable information. Deciding how to wipe and what to wipe is a value judgement.
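To get a feel for the time side of that judgement, here's a back of the envelope sketch. The disk size, number of passes and write speed are assumptions picked for illustration, not measurements from DBAN or any other tool, and `wipe_hours` is a made-up helper.

```python
def wipe_hours(disk_gb, passes, write_mb_per_s):
    """Rough hours needed to overwrite a whole disk `passes` times.

    Assumes a constant sustained write speed and 1 GB = 1000 MB;
    real disks slow down towards the inner tracks, so treat the
    result as a lower bound.
    """
    total_mb = disk_gb * 1000 * passes
    return total_mb / write_mb_per_s / 3600

# Hypothetical 250 GB disk writing at a sustained 50 MB/s.
one_pass = wipe_hours(disk_gb=250, passes=1, write_mb_per_s=50)
three_pass = wipe_hours(disk_gb=250, passes=3, write_mb_per_s=50)

print(round(one_pass, 2), round(three_pass, 2))
```

Multiply that by a room full of machines and you can see where the gripe comes from, but the number of passes is still a decision about risk, not about convenience.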

For most purposes something like DBAN will give you a wipe to a standard that will satisfy most auditors (it conforms to standards, standards are good, auditors have to cover their backsides too), and it has the added security of making sure that that credit card number in a cache really has gone. Important, as you never know where your disks end up. One time in Morocco I saw a whole pile of second user disks (some still with vendor stickers on them suggesting they came from a large facility manager) on a market stall.

Occasionally, you (or your masters) want to be really certain the data is gone. I once worked on a project where we engaged a company to dispose of our hardware securely. This involved breaking down machines, zeroing any static RAM and having the disks cut in half by a very large man with an even larger angle grinder. You then accompanied said man to a very hot furnace where you watched him put the bits of disk in the furnace and shut the door. That _was_ data disposal.

Wiping disks is about managing risk, not time

Friday 4 May 2007

OLPC, Gmail, and the communication thing

Kids who get OLPCs get Gmail accounts.

Kenya and Rwanda have bought Google Apps for their universities.

Put it together and suddenly you have a whole lot of email access allowing people to ask questions.

Of course a lot of it will be flim flam, but suddenly people in the bush have communication with people in the cities. Families are reconnected via webmail. School teachers can ask questions when they don't know things. Suddenly this communication thing starts happening, breaking down the isolation of the rural poor.

And that can only be a good thing.

Dr Nick's stalking horse

Last night I was surfing eBay for no good reason and happened across a gorgeous yellow, and I mean yellow, 1970s typewriter made in Holland. And only $35. I lusted after it, because I'm (a) sad and (b) like retro things. However, cute as it was, I didn't bid for it, in the sure knowledge that my wife would kill me if we had to make space for such a retro device. A trained artist, she firmly believes things like that belong in design museums, not lounge rooms.

And in a funny kind of way the gorgeous yellow typewriter is linked to Dr Nick's One Laptop per Child programme. A lot of people in developed countries want one because it's cute, looks good and is a talking point.

But it's also a stalking horse. People might start using these cutesy Linux powered low cost boxes seriously as 'take anywhere' machines, and with links to things such as Google Docs they provide lightweight cheap portable computing, just as various boxes like the Compaq Aero did 10 years ago.

Couple that with some thin client stuff and you're away.

And suddenly these cutesy boxes aren't so cutesy anymore, and repackaged in grey and silver start looking like business machines.

And that's what Microsoft is worried about with its $3 give away to education users in poor countries. An old operating system on second user computers in the third world isn't a corporate threat. A cheap low cost Linux based platform in the first world is ....


Thursday 3 May 2007

last week


windswept tree i
Originally uploaded by moncur_d.


Took last Friday off, went bushwalking up Guthega Trig, felt really relaxed and had all these interesting blog entries planned.

Then on Sunday got a call that both the web caches were dead, and it's been kinda downhill from then: fixes, reports, meetings, other fixes, so much so that all these really cool ideas have just gone out of my head.