Stuff, geeky stuff: 07/01/2010

Monday 26 July 2010

Locals and Tourists

Something absolutely fascinating:

Locals and Tourists is a Flickr set that analyses the location data of photos posted of particular cities, together with the hometown data of the photographers, to show where tourists take their pictures and where the locals take theirs.

Basically:

tourists take more pictures
most tourists photograph the same locations
locals tend to be more diverse in the range of locations photographed

No surprises in any of this, but it shows very neatly the power of metadata.

Logins for life

An interesting project from the University of Kent in the UK:

... by Kent University to assist in the Logins for Life project. I am seeking to talk with others in HE who might be considering researching, or already have researched, similar areas. The project is concerned with introducing a life-long digital identity for those who come into contact with the university. We will also explore the use of social networking technologies to access University services and look at the use of OpenID, Facebook Connect etc. as authentication methods. You can read more here:

http://www.kent.ac.uk/is/projects/loginsforlife/index.html

...

Now I've written elsewhere about the problems of knowing who you are, or more accurately being able to demonstrate who you are.

Increasingly, these services have started to become the default authentication providers for a number of other services, especially as we now have something like 500 million facebook members on the planet, 500 million hotmail/messenger users, a little under 200 million twitter users, and about the same number of gmail users.

The only problem is that they don't have a great deal of quality control on their data. For example, my cat has a gmail account, so one wouldn't necessarily trust having a gmail account to mean anything more than that you exist.

However, if you can show that you have a gmail account and, say, a paypal account, we can presume that you are a reasonable and trustworthy person, if for no other reason than that having a paypal account requires divulging verifiable financial information, and the banks are quite good at checking that you are who you say you are.

So maybe that's the answer, allow people to create accounts willy-nilly, but for anything with access rights require proof of something such as a paypal account or other trusted provider ...
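As a rough sketch of that two-tier rule (anyone can create an account, but access rights need proof from a trusted provider), something like the following would do. The provider lists, the Identity class and the function names are all made up for illustration; which providers count as "verified" is a policy decision, and paypal is simply the example used above.

```python
from dataclasses import dataclass, field

# Hypothetical provider lists, for illustration only.
ANY_PROVIDER = {"gmail", "facebook", "hotmail", "twitter", "paypal"}
VERIFIED_PROVIDERS = {"paypal"}


@dataclass
class Identity:
    name: str
    linked_providers: set = field(default_factory=set)


def can_create_account(identity: Identity) -> bool:
    """Anyone with at least one recognised login may create an account."""
    return bool(identity.linked_providers & ANY_PROVIDER)


def can_hold_access_rights(identity: Identity) -> bool:
    """Access rights additionally need proof from a verified provider."""
    return can_create_account(identity) and bool(
        identity.linked_providers & VERIFIED_PROVIDERS
    )


cat = Identity("the cat", {"gmail"})
person = Identity("a person", {"gmail", "paypal"})
print(can_create_account(cat), can_hold_access_rights(cat))        # True False
print(can_create_account(person), can_hold_access_rights(person))  # True True
```

The cat gets an account but no access rights, which is about the right level of trust for a cat.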

Bark paintings context and dreamtime

Yesterday I tweeted a link about the discovery of Aboriginal rock art in the Northern Territory that manages to date (approximately) initial contact between the Macassans from Sulawesi and the Yolngu from Arnhem Land.

Now we've known about these contacts for some time - there's a bark painting of a Macassan fishing boat in the National Gallery for one obvious example, and a few years ago when I was talking to a Yolngu elder about my interest in history and archaeology, he countered by telling me about his interest in the history of the Macassan contacts.

And doubtless there are dreamtime stories about these contacts. Finding art datable to the 1600s pushes back the date of contact and hence the relative date of other dreamtime events which are dated from their mention in other stories, just as being able to date the Trojan War allowed us to date other events in Bronze Age Greece a little more securely.

Saturday 24 July 2010

Windows messenger live beta

In a moment of weakness I installed it. This was a mistake. I don’t like it.

The original messenger was a useful lightweight tool for one on one conversations with people about technical issues when debugging things. Like good old VMS phone or Unix talk it was a valuable tool, and also let people know when you were online.

The new messenger is bloated, wants to take over your life, burrow into your social networks and take over your onboard camera. No, sorry, I use skype for video chat. Messenger is for simple text based purposes only.

In short, the new messenger is not a communications tool, it’s bloatware.

Oh yes, and it changed bing to be the default search engine on my firefox install, installed the bing toolbar without asking, and changed some other settings such as my nice handcrafted id picture.

Now it is true that after about ten minutes of fiddling I turned off all the bits I didn't want and got it back to performing as the old lightweight text based chat application. But I shouldn't have had to do that.

And I'm technically astute - what does someone who's not do?

Personally, I’m going to ditch it as soon as I find a useful lightweight alternative to this bloated bit of monkey poo. Your mileage may of course vary ...

Friday 23 July 2010

Cloud storage on a budget ...

Very interesting and incredibly cool (supports my prejudices against logo'd boxes) and they include a parts list:

http://www.backblaze.com/petabytes-on-a-budget-how-to-build-cheap-cloud-storage.html

Might be interesting to compare costs against Capricorn - who used to make the google storage nodes (and may still do) - and Iron Systems who have provided the hardware for some digital archiving projects including Clockss. It would also be interesting to know how they play with Gluster ...

(hat tip to Arthur for the initial link)

Thursday 22 July 2010

Open Solaris 2009.06

Some time ago I built OpenSolaris 2008.05 and 2008.11 on VirtualBox and had various arguments with the software along the way. To be fair, 2008.11 was pretty good; it was '05 that was rough.

Well today I've built the latest version on VirtualBox and I'm pleased to say it just worked, and installed quickly and cleanly. As always you have to install OpenOffice post install, but apart from that things were pretty slick.

The question really is, I guess, where will Open Solaris go post the Oracle takeover of Sun?

It would be a shame to see a project that was finally delivering so successfully just fade away ...

Monday 19 July 2010

Wordpress as content management

Blindingly simple idea ...

Most corporate websites consist of some pretty pictures and some text. Some of the text never changes, some of it does, either to a set frequency or not.

The blindingly simple idea is to replace the text that changes with an rss feed display of a blog's content, and any rapidly changing stuff with a twitter feed display, a little like the twitter feed on this blog. In other words, use wordpress and twitter to feed the dynamic content and leave the static content alone; that can be managed using something like plone.

Advantages:

content comes from a qa'd source
content can be reused, eg the Arts website might want to show the last three posts, the central site only the latest
content creation is simple and straightforward
can use tools like live writer if necessary to simplify content creation
content can be updated from just about anywhere
can easily segment content and authorship responsibilities
central display website is easy to maintain

Disadvantages:

perhaps slightly more restrictive than other ways
still need to maintain static content repository using a product such as plone
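To make the reuse point a bit more concrete, here is a minimal sketch of pulling the most recent posts out of a blog's RSS feed so that one site can show three and another only one. It uses only the Python standard library and a made-up sample feed (the titles and example.org links are invented), and it assumes the feed lists the newest post first, which is the usual convention but not something the format guarantees.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 feed standing in for a departmental blog.
SAMPLE_FEED = """<rss version="2.0"><channel>
<title>Arts news</title>
<item><title>Post C</title><link>http://example.org/c</link></item>
<item><title>Post B</title><link>http://example.org/b</link></item>
<item><title>Post A</title><link>http://example.org/a</link></item>
<item><title>Older post</title><link>http://example.org/old</link></item>
</channel></rss>"""


def latest_posts(rss_xml: str, count: int) -> list[tuple[str, str]]:
    """Return (title, link) for the first `count` items in the feed."""
    root = ET.fromstring(rss_xml)
    items = root.findall("./channel/item")[:count]
    return [(item.findtext("title", ""), item.findtext("link", "")) for item in items]


# The Arts site shows three posts, the central site just the latest one.
print(latest_posts(SAMPLE_FEED, 3))
print(latest_posts(SAMPLE_FEED, 1))
```

In practice the feed would be fetched from the blog rather than held in a string, and the display side would be whatever templating the static site already uses.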

we interrupt this blog ...

Hello,

This blog started out as a sort of work blog and drifted into being a mixture of work and personal. The trouble was that some people wanted the work stuff and some the personal stuff.

I've been toying with the idea of splitting the blogs and have decided to start a second friends and family type blog, which is where you should go for cat pictures, and updates on what we've been doing. This is where you should come for the information technology/archiving/historical stuff.

Hopefully I'll do better in maintaining the split than I have between workwork and semiwork wikis ...

Lease purchase of student pc's

I've been thinking further about the Wits PC tender.

When I was at York in the 1990's we tried to put in place something like that as well and went out and talked to various manufacturers and distributors. Basically it didn't work - no one could come up with a deal that was sufficiently attractive over three years and that had a repayment level that let the vendor and their finance company make some profit and cover themselves for the perceived risk of the students absconding. It probably didn't help that we (a) wouldn't indemnify the vendor against loss and (b) couldn't guarantee a minimum uptake.

To put this in perspective, at the time mobile phone companies were pushing pay as you go, as the customer paid up front for the phone and the minutes and they didn't have to worry about customers not paying. Now, of course, as phones are relatively cheap they love the $49/mo deal as they essentially make their money out of the minutes you don't use (that and persuading people to buy an iPhone at $49/mo when another vendor's phone at $29/mo would better meet their needs).

Now I simply don't know anything like enough about how things work in South Africa to comment on the likely success or otherwise of the Wits scheme, other than to say we tried it in the UK in the 1990's and it didn't work, and other people have tried it more recently in Australia and again it didn't fly. Possibly if Vodafone (or whoever) could do an attractive netbook+3G deal it would work, but then again selling iPads with their intrinsic shininess factor helping them walk out the door probably makes that unlikely.

However, what did work at York was a computer recycling scheme. Essentially, if PC's are on a three year replacement cycle you rapidly end up with a lot of PC's that are still usable but are just that little bit too old to have much resale value, making disposal a problem. Getting a group of students together to recycle computers worked well, and in combination with open source packages allowed students who couldn't afford one any other way to get a competent machine.

Now these were desktops, not laptops, but then it was also the late nineties and people didn't walk around with laptops then. But it did solve two problems - pressure on student computer labs for bread and butter computing and disposal of older equipment, using a model that probably everybody came out reasonably well from ...

Friday 16 July 2010

Student computer labs again

Came across a tweet from Witwatersrand University in South Africa. They have an RFI out for the provision of all Wits students with laptops or netbooks.

Now that's provocative. They are roughly twice the size of us with half as many staff and only around 50% of their students have regular access to a computer. Crucially, they argue that rather than provide access via student computer labs they want to put in place a solution where students have their own computing device, with uninterrupted access to enhance the learning experience.

I'm going to guess that Wits students are more like our demographic 10 or 15 years ago, relatively poor and hence unable to tin up for a computer in one hit, and that Wits is hoping to ease the costs of computer ownership, perhaps through something like a mobile phone solution - you know the deal, pay $49/mo, get a certain number of free calls and free data and after 24 months you get to keep the phone - effectively a combination of service charges and lease-purchase. (And crucially saving you the costs of recovery and disposal of end of lease devices.)

As readers of this blog are aware I periodically rant on about how student computer labs are antiquated, and in one sense we only have them because once we had rooms full of terminals connected to time sharing computer systems. If we'd never had sheds full of vt220's we might have done something different. But we already had the rooms, so it was easier to re-equip them with computers than think outside of the box.

Now unlike Wits who assume they have 50% penetration, we tend to assume we have around 85% penetration, which would mean we only have around 2000 students without regular access to a computer.

That figure of 85% is of course based on what we ask incoming students. I suspect that in fact penetration is close to 100%, ie all students who require a computer to pursue their coursework have access to one. Of course some of them may be second hand ex government cast offs, but given you can buy a Samsung netbook for $500 and a reasonable laptop from Dell for $700 I would be surprised if there were more than a very few financially disadvantaged students without access.

And of course all these users are self supporting, printing at home, buying their consumables from officeworks or sevenstar etc

That then raises the question of why we continue to provide student labs. Certainly not to print - we have a solution to allow students to print on campus. And certainly not for wordprocessing: if you can't afford Office, there's OpenOffice or Abiword, or GoogleDocs at a pinch. Not for email, as most of them use either our webmail service or their own - essentially at least half the student body have self outsourced. And not for the VLE, which is web based. In fact basically the only reasons left are access to specialist software and acting as a computing provider of last resort when a possum pees on your laptop.

So let's look at the cost of providing public access labs.

We have around 1000 pc's and 300 iMacs. Let's say the pc's cost us around $1200 each to purchase with three years onsite maintenance bundled and the macs $2400. That is a little under $2M in aggregate every three years, or an average of around $640k per annum. Add in the salaries of the people who check the labs and swap out broken pc's and we're probably running at around $1M a year.
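For anyone who wants to play with the numbers, here is the hardware arithmetic as a small sketch. The counts and prices are the round figures quoted above; the constant and function names are just mine, and staff costs are left out because the post only gives a rough all-in total for those.

```python
# Round figures from the post: machine counts, unit prices, refresh cycle.
PC_COUNT, PC_PRICE = 1000, 1200
MAC_COUNT, MAC_PRICE = 300, 2400
REFRESH_YEARS = 3


def hardware_cost_per_cycle() -> int:
    """Total purchase cost of all lab machines for one refresh cycle."""
    return PC_COUNT * PC_PRICE + MAC_COUNT * MAC_PRICE


def hardware_cost_per_year() -> float:
    """Purchase cost averaged over the refresh cycle."""
    return hardware_cost_per_cycle() / REFRESH_YEARS


print(hardware_cost_per_cycle())  # 1920000
print(hardware_cost_per_year())   # 640000.0
```

On those figures the hardware alone comes to $1.92M per cycle, or $640k a year, before any salaries are added.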

And a million dollars a year would buy a lot of virtual desktop infrastructure. Assume for a moment that the software contract administration, software licensing costs and the costs of packaging and distributing software are the same with a vdi solution. And suppose that, with near 100% penetration and good to excellent wireless coverage everywhere, everyone on campus can access a virtual desktop.

People still with orthodox pizza box machines would of course access the vdi infrastructure via their isp, with possible cost implications. But I'll fly a kite here and say that most students have laptops or netbooks - unlike 10 years ago the entry cost for a reasonable laptop (or netbook) is considerably less than that for a desktop, and they all have adequate grunt - remember all they really need to do is run a browser.

In fact, if we assume that we can get 24 virtual pc's on a blade (not an unreasonable estimate as a blade will support 16 virtual servers) and we can put 10 blades costing $1500 each into an enclosure that costs us $20,000 including maintenance, we get 240 virtual pc's for $35,000, or roughly 1000 for $140,000. That leaves us with around half a million to play with for extra virtualisation software licensing costs, establishing a loan pool to deal with the 'possum peed on my laptop' problem, staff training etc.
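The same back-of-envelope sum as a sketch, so the assumptions can be varied. The densities and prices are the guesses from the paragraph above, not vendor figures, and the names are mine. One thing it makes visible is that four enclosures give 960 desktops for $140,000, so a full 1000 strictly needs a fifth enclosure.

```python
import math

# Guessed figures from the post, not vendor quotes.
DESKTOPS_PER_BLADE = 24
BLADES_PER_ENCLOSURE = 10
BLADE_PRICE = 1500
ENCLOSURE_PRICE = 20000  # including maintenance


def enclosure_capacity() -> int:
    """Virtual desktops one fully populated enclosure can host."""
    return DESKTOPS_PER_BLADE * BLADES_PER_ENCLOSURE


def enclosure_cost() -> int:
    """Cost of one enclosure plus a full set of blades."""
    return BLADES_PER_ENCLOSURE * BLADE_PRICE + ENCLOSURE_PRICE


def cost_for(desktops: int) -> tuple[int, int]:
    """Return (enclosures needed, total cost) for a number of desktops."""
    enclosures = math.ceil(desktops / enclosure_capacity())
    return enclosures, enclosures * enclosure_cost()


print(enclosure_capacity(), enclosure_cost())  # 240 35000
print(cost_for(960))                           # (4, 140000)
print(cost_for(1000))                          # (5, 175000)
```

Either way the hardware bill is a fraction of what the labs cost, which is the point of the exercise.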

Now that is provocative ...

Tuesday 13 July 2010

Electronic laboratory notebooks

I was recently in conversation with some geneticists about this and it's an interesting problem.

Lab notebooks are much more than a statement of what was done when, but in fact can be a statement of record as part of commercially focused research, especially in the biomedical sciences.

Equally, human factors come into play and it is unrealistic to expect researchers to enter data at the end of the day and associate metadata with it, which is why the old hardback notebook is still with us. Paper is extremely versatile as a recording medium and, unlike hi tech devices, can usually survive close contact with reagents.

One could imagine a thought design for an electronic solution however:

Provide a custom application running on a device such as an ipad - it's on, it's on the bench, data can be entered as the day goes on. On startup each day the application asks the researcher to confirm name, date, and project code. The device needs to be simple to use, versatile, relatively stateless and disposable. Labs are messy places and things go wrong. A web based application is probably better than a locally running application. A netbook could be equally suitable.
Researcher enters notes sequentially throughout the day. The notes are stored in a buffer on the server and retrievable in the event of a crash, network, or end node device failure.
At end of day, researcher hits 'commit'. Posts are nicely formatted and posted to a Wordpress blog. Grant code and researcher id are encoded in post header. Blog is a private blog only viewable by members of the research group for review etc
The RSS feed from the blog is monitored by a second application. This takes the blog post, generates appropriate metadata based on lookup of researcher and grant id's, generates a pdf and ingests it into a dark repository, perhaps a private instance of dspace, as a statement of record.
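The steps above can be sketched in a few lines to show how the pieces hang together. This is a thought experiment in code rather than a design: the lookup tables, ids and class names are invented, the in-memory list stands in for the server-side buffer, and the blog posting, pdf generation and repository ingest are left out entirely since they depend on whichever products are chosen.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical lookup tables; in practice these would come from HR and grants systems.
RESEARCHERS = {"r042": "A. Researcher"}
GRANTS = {"G-1001": "Example genetics grant"}


@dataclass
class NotebookSession:
    """One researcher's notes for one day, confirmed at startup."""
    researcher_id: str
    grant_code: str
    day: date
    notes: list = field(default_factory=list)  # stands in for the server-side buffer

    def add_note(self, text: str) -> None:
        """Append a note in the order it was entered."""
        self.notes.append(text)

    def commit(self) -> dict:
        """Turn the day's buffered notes into one post with the ids in its header."""
        header = {
            "researcher_id": self.researcher_id,
            "grant_code": self.grant_code,
            "date": self.day.isoformat(),
        }
        body = "\n".join(f"{i}. {note}" for i, note in enumerate(self.notes, start=1))
        return {"header": header, "body": body}


def archive_record(post: dict) -> dict:
    """Second stage: derive metadata by looking up the ids in the post header."""
    header = post["header"]
    return {
        "researcher": RESEARCHERS[header["researcher_id"]],
        "grant": GRANTS[header["grant_code"]],
        "date": header["date"],
        "content": post["body"],
    }


session = NotebookSession("r042", "G-1001", date(2010, 7, 13))
session.add_note("Prepared samples")
session.add_note("Ran gel, photographed result")
record = archive_record(session.commit())
print(record["researcher"], record["date"])  # A. Researcher 2010-07-13
```

The useful property is that the researcher only ever types notes and hits commit; all the metadata comes from the ids confirmed at startup.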

Now this isn't just for genetics and other bench sciences - one could imagine such a solution being usable for anthropological field notes, in the behavioural sciences, and even as a research diary in paleography, in fact in any discipline where there is a degree of repeated work and observation.

Anyway, after the discussion, I was sure that there would be a solution out there. And a number of people have tried this, mostly using wiki software but as yet there seems to be no killer solution.

However it does appear to be a live problem. I've put together a page of links on this topic which I'll maintain as my understanding of the requirements develops. Comments and suggestions for additional links are welcome.

Thursday 8 July 2010

Only disconnect ...

Every year about this time of year there's at least one post or newspaper article about someone going to spend the northern summer in a beach shack with a composting toilet and - gasp! - no internet - and how they're going to spend a fulfilling summer reading, painting, going skinny dipping or whatever.

And it's an attractive idea. But nowadays, when airlines like you to check in online, and you need to call people, email people, and use other booking sites you can't really disconnect. You need a travel computer, and just possibly a 3G modem though so far I've resisted that. (A week in the depths of rural Greece later this year may change my view on this however ...)

However, last week when I was signed off recuperating from having my wisdom teeth out I had a revelation.

To explain, a lot of my work involves knowing about things, essentially technological evangelism to work out what our strategy is or should be and then getting people with the right technical skills to do the work to make it so. This means I read a lot - blog posts, press releases, vendor presentations, email, twitter. And it would be terribly easy to get overwhelmed by this great big booming confusion. And of course 90% is shit, but the key is knowing which 90% to discard.

Anyway, when I was signed off, I still kept up with things, and would spend about 90 minutes going through the rss feeds, email and twitter stuff, perhaps an hour or so in the morning, and another thirty minutes before dinner. And I covered all the stuff I would normally cover, even though I didn't do all the follow up work I would normally do.

The rest of the time I read books, ie real paper books, not the electronic sort, watched a really interesting series on planetary astronomy I hadn't got round to watching, stacked the wood fire and fed the cat.

And that taught me a lesson. Structure your day. In a world where apparently over 30% of young women check facebook before having a pee in the morning the key point is discipline.

Yes you need to do these things, but you also need to do other things. And that includes making time for yourself. So as of now I have a walk across campus to pick up my mail, pick up my discounted copy of the Australian, and then come back to my office and sit down and read the paper while I eat lunch.

I don't go and stare at a screen, which I must admit was my terribly slack habit previously. I've also given up listening to work related podcasts on my ipod over lunch.

Simple things, but it gives me some fresh air and time away from the screen.

While I've always been disciplined - for example I never answer my phone if I'm in a meeting with someone - what I learned is that things will always wait, and that if one plans one's working day you can get everything done and more. And if something really won't wait, someone will tell you.

Wednesday 7 July 2010

haiku on virtualbox

I've kind of been neglecting playing with operating systems ever since the $83 machine died, closely followed by my mac at work, but I've recently started playing with running images under virtualbox again. Actually I'm being lazy - I can't be bothered building islandora from scratch so I thought I'd try their virtual appliance.

Just for fun, while waiting for islandora to download, I tried the latest haiku (BeOS) pre-rolled vmdk appliance with the latest virtualbox and ... it doesn't work. It doesn't fail spectacularly, there's just a blank desktop with a mouse cursor. Enough to run, not enough to work out what's not working.

Googling seems to suggest that it works for some people but not for others, but neither the haiku nor the virtualbox website has anything later than 2009, so back to crunchbang for a lightweight desktop ...

Tuesday 6 July 2010

mendeley, zotero and the cloud

Sometimes I feel as if I've been asleep for a thousand years.

So with zotero. I saw it first in 2007 at Educause in Seattle, thought it was really neat, would make a good EndNote killer, told a few people about it and then did nothing with it. Mendeley I've never played with. All my interesting, work related pdf's sit on my windows live skydrive in a great chaotic heap. Call me Nennius - but even when I was a researcher and one built collections by writing the details on index cards and organising them I was never particularly diligent. More a pack rat with a good memory than organised.

Not good.

Of course what Zotero and Mendeley do is allow you to build collections, and put metadata around them, ie impose structure. And of course once you've imposed structure you can make your collection available and searchable. Couple this with BibApp and Vitro/Vivo and you've begun to build a social network for e-research which allows people to track down people working in a related field and browse their reference material and share your own.

And of course to do this you need shared storage, which means the cloud. Repositories might reduce the number of items in storage, but will never quite replace the need for some cloud storage, as there will always be idiosyncratic material that doesn't quite end up in repositories - pdf's of New Scientist review articles, for example.

As for me, I reckon I need to start using these tools properly in order to understand them, and in fact become more structured myself ....