Thursday, 28 March 2013

Getting an old iMac G3 to be useful


Well of course I couldn’t resist.

Having got an old G3 to boot to the command prompt I had to try and get it running a desktop.

First of all I got myself some more memory for it. Everything I read suggested that even a lightweight distribution like Lubuntu needed a bit more than the 256MB that Xubuntu 6.04 had been happy with, and 512MB cost less than twenty bucks.

I had half hoped that the desktop would start working with the extra memory plus a decent xorg.conf to cope with the G3 display, but no such luck - I had the framebuffer problem. Reading the FAQs revealed that there was a known problem with framebuffers in 12.04, meaning that the video basically didn't work.

The solution was to upgrade to 12.10. Easier said than done. The live CD was anything but, and the alternate install disk image for 12.10 didn't want to play, so after failing to install an older PowerPC Linux distribution, I reinstalled 12.04 and started the painful process of upgrading.
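
For the record, the upgrade path amounted to roughly the following - a sketch rather than a recipe, and the release-upgrade step assumes the update-manager-core package is installed:

# refresh the package lists and retry anything that failed to download last time
sudo apt-get update
sudo apt-get upgrade --fix-missing

# then jump from 12.04 to 12.10
# (you may need Prompt=normal in /etc/update-manager/release-upgrades to be offered a non-LTS release)
sudo do-release-upgrade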

This was a painful exercise, as the old G3 was connected to the internet via a wireless bridge to our home network and then to the internet by our extremely flaky ADSL link - too attenuated, too much noise. Let’s just say that downloads and streaming media can cause our link to go and sulk severely, and trust me, it can sulk for Australia.

However, persistence does pay off: sudo apt-get upgrade --fix-missing is a wonder of resilience, and you get there eventually.

Actually you don't. I still can't get a display manager to work despite upgrading everything fully. Currently sudo start lightdm bombs out just after saned starts. I suspect it's still not starting the video drivers correctly.

I probably still need to spend a little more time with xorg.conf and tweak a few other things and hopefully it will burst into life.
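
For what it's worth, the sort of tweak I have in mind is forcing X onto the plain framebuffer driver - this is a guess rather than a known fix, and the identifier and device path below are only examples:

Section "Device"
    Identifier "Framebuffer card"       # whatever the G3's video card identifies itself as
    Driver     "fbdev"                  # fall back to the generic framebuffer driver
    Option     "fbdev" "/dev/fb0"       # assumes the framebuffer lives at /dev/fb0
EndSection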

The sensible question to ask about this exercise is why. Why bother spending time getting a 12-year-old machine running, when you secretly know it will sit in the garage and be used occasionally to check email and write the odd bit of text? That's something that could be done from the command prompt - for example by installing alpine and configuring it to work with Gmail, creating Markdown-structured text with nano, and then mailing it to myself to turn into decent quality odt using pandoc, or indeed writing a script to push it to Dropbox.
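
As a sketch of that command-line workflow, the pandoc step is the only non-obvious bit - the file names here are just examples:

# write the note in markdown with nano
nano notes.md

# turn it into a decent quality odt with pandoc
pandoc -f markdown -t odt -o notes.odt notes.md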

And I have to agree, rationally there's no sense. If I wanted or needed a Linux machine at home I should have bought myself a refurbished ThinkPad, or even an old ex-government PC for less than $100 - after all, if it will run XP it will run Linux.

But then there's the learning aspect. Under Unix (or Linux) I've always been a command line person when it comes to actual computing, and while I've played with X Windows (and DECwindows) I was actually a bit vague about the internals - my approach being to install an Ubuntu or Debian CD and go from there.

Doing this I've learned things, and had a little bit of fun along the way, which was really the whole point of the exercise …

Wednesday, 27 March 2013

Reading things later

Google Reader is, as we know, finishing.

Like every other Reader devotee I've gone through the agony of deciding what to do. I've personally decided on NetNewsWire for OS X and FeedReader for Windows as things that do what I need. I've not been able to find a web based application that does it for me, though Feedly comes close.

Hopefully I'll be able to keep them in sync - as a transition tool they're both good as they sync nicely with Google Reader. What happens in July is anyone's guess.

However, this is not what this post is about. It's about reading things later.

Up to now I've seen things on the web that were interesting but that I didn't have time to read, so I clipped them to Evernote to read later, and on some occasions I actually did. The whole 'read it later' app scene had kind of passed me by. However both of my candidate newsreaders come with Instapaper buttons - basically a service for caching posts to read later.

Being able to put all the posts I want to read later in a single place is useful, and with an Android client, it means I can download the content to a tablet and take it to read in the sunshine - or indeed the bus.

The other nice thing is that Instapaper lets you push content to Evernote, or share it in a variety of other ways. I personally find doing it this way easier than clipping the page to Evernote, reviewing the content and then deleting it, as I always have a small cache of pending stuff. Both complement each other and it does make managing the information flow considerably easier ...

Tuesday, 26 March 2013

Persisting identity


Digital identity is an interesting problem. If, for example, you google for me you’ll turn up a number of dead email addresses, including


as well as some ex-work addresses, some others which might or might not work, and at least one that has been reallocated to someone else entirely.

In other words, email addresses are not persistent.

Which is a pity, because if you are going to assert identity - that is, to say you are this uniquely identifiable person - email addresses would seem to be a good candidate. It's where all the standard things like your gas bill go; just as in the old days of pre-electronic bills, a utility bill could be accepted as evidence of identity and that you might actually live at the street address listed.

So email addresses don't work. Twitter handles might, and Facebook logins might, purely because there are 800 million of them and they are fairly universal.

Now, why am I interested in this problem? Can’t we just use their names?

Well, no. People change their names. In some cultures names are not fixed things and change according to context. And sometimes people change them for convenience or just for the hell of it.

For example I once had a young Chinese woman work for me. She gave herself an exotic sounding first name, Rainbow, which sounded vaguely like her Chinese forenames, and used the transliteration of her Chinese family name as her surname. And, as is common, her email address was firstname.lastname@institution.edu - or more accurately western_name.lastname@institution.edu.

Yet to the government her name was her Chinese name, and her personal email address was based on that name.

Now let's get hypothetical here. Let's suggest she published a conference paper under her western name and then a journal paper under her Chinese name. If we wanted to build a bibliography we would need a key, such as ORCID or the NLA Party identifier, to tie them together.

But that leaves us with the problem of those that came before. For many years journals have recorded the email addresses of the authors of papers. As we've seen, email addresses are not persistent, and most institutions turn off accounts sometime between zero and ninety days after you leave. The institution may remember you were there, but there's no link between the old pre-ORCID address and your new address.

Some services like academia.edu use name matching to try and come up with lists of publications, but of course this all falls apart, as only the names of white Anglo males are reasonably persistent.

ORCID and name matching may provide a solution for academics.

But academics are very much a minority population.

If you start looking at the ownership of cultural materials by people who are not white Anglo males it gets difficult. People can have different names at different times in their life, or be referred to by one name if they are kin, and another if they are not.

In other words you need a common key, and while Facebook IDs might be a good starting point, they tend only to reflect someone's current public identity - one needs a service to tie someone's identities together, just like ORCID.

This is of course important for people such as indigenous artists, musicians and story tellers, especially as, in these days of digitisation, tracing the origin and ownership of material is increasingly a formal process.

In the old days one could presume that someone who allowed themselves to be recorded for anthropological or linguistic research was doing it as a private agreement between researcher and subject - nowadays, because dissemination is so much easier, you need to both obtain permission for reuse and be able to identify the author or performer, or be able to say that reuse is not permitted.

I don't have a solution. However, what it does mean is that when recording identity related data, even in something as apparently straightforward as a consent form, we need to be aware of the possibility of reuse and also that names are not truly persistent.

Friday, 15 March 2013

Exiting Google Reader

Ok, we've whinged, signed the petitions, and annoyed loved ones by going on and on about it, but big G is killing Reader. Fact.

So what to do?

Well, there are three months yet, so procrastination may bring benefits - something may happen - but you always need to have a plan B. In my case doubly so, as I plan to be away for part of June.

First things first. Use Google Takeout and save your configuration data - essentially the list of feeds. I saved mine to dropbox so I could move it easily between machines.

Google doesn't give you a single list of feeds but a zip file containing a whole lot of geeky stuff and a feedlist. The file you want is subscriptions.xml.

Some newsfeed readers want a file with an .opml extension as opposed to .xml. subscriptions.xml is in fact an OPML file, so simply copy it to something.opml, e.g.

cp subscriptions.xml myrssfeeds.opml

Then the next thing to do is to find some possible substitute applications. If it wasn't for the fact that it's not quite there yet, Feedly would look like a good option, as it works as a Chrome application and has an Android client, allowing it to be used natively on my tablets.

However Feedly isn't yet bulletproof, so I decided to try a couple of desktop applications: Newswire on OS X and, as an extra backup plan, Liferea on Linux.

Both have the simple three pane 'Outlook' style interface and are eerily reminiscent of the old Pan Usenet news reader. Newswire will sync with your Reader account; Liferea wants an OPML file. Both work reasonably well, and Newswire has a nice feature to clean out inactive feeds.

I've not yet tried a Windows application. My other thought is to have the various applications use a common OPML file on Dropbox, to let me hop between platforms, but I've not tried that yet either.

Last time I changed feed readers it was from Bloglines to Google Reader, and, as the world was a simpler place in 2007, it was a simple move from one webpage and subscription list to another. Now that we're considerably more multi-platform and multi-device it's a tad more tricky ...

Thursday, 14 March 2013

Google Reader going down the cosmic toilet

I'm sure we can all see the irony of using Google's blogging platform to report that Google are sending Reader to the garbage can in the sky.

I must admit I'm fairly hacked off about this: being someone who flits from machine to machine, having a synced reader across all machines and platforms was incredibly useful. Reader itself was useful as a means of aggregating newsfeeds and tracking what was happening out in the big wide world.

I've been using Reader for years, with the result that I'm not up on the alternatives, but there are some likely looking candidates - I've tried Newswire on OS X and it certainly looks feasible.

This whole issue of course raises the general question of what happens when a 'free' service goes away. Google are of course under no obligation to continue services ad infinitum, and have both (a) given fair warning and (b) provided an export mechanism.

We're dependent on similar 'free' services for email, photo sharing and the rest. We tend to assume they'll be around for ever. This is clearly not the case and we always need to have a plan B in place ...

Tuesday, 12 March 2013

Newspaper archives and standards

As mentioned, I've been playing with QueryPic to track down contemporary Australian newspaper reports of the American Civil War commerce raiders Shenandoah and Alabama.

None of this is new research - it's well known to historians - but as well as being an intrinsically interesting story, what I find interesting is the effect of the lag caused by the lack of direct telegraphic connections.

For example, when the Shenandoah appeared off Cape Otway it was headline news, not only because it brought far off events home but because it was completely unexpected - the nearest contemporary comparison I can draw is with an incident during the Falklands conflict in 1982, when a British Vulcan V-bomber suffered mid-air refuelling problems and made an emergency landing in Brazil - to the accompaniment of fascinated Brazilian media coverage.

The reason I can fiddle about and do this is that it is all free as a result of various Australian and New Zealand digitisation initiatives.

Elsewhere, newspapers have built and financed their own digital archives, and quite obviously want to recoup some of the cost by charging.

For example, if I want to look at how the Irish Times reported the arrival of the Shenandoah in November 1865, I can search the archives and the Irish Times archive service will show me snippets of possibly relevant articles and charge me 10 euros for a day's access if I want to go further.

All this is perfectly reasonable. Servers, digitisation, indexing and OCR'ing text cost money to deliver and to maintain - and given that I look after similar operations in my professional life and know how much these things cost, I'd say that the Irish Times is not making a massive profit out of it.

Let's say I pay my 10 euros and find what I want and then decide to access the London Times archive. It has (fortunately) a search interface very similar to the Irish Times, and is less interested in charging me.

It would of course have been quicker and simpler if I could have run the search against the two newspapers' archives (and others) simultaneously.

As it looks as if both are using the same content management software, one might think there could be a common API. If so, this should make developing something like QueryPic relatively simple - all I need is a common API to let me retrieve the search results and related information, plus a way of indicating whether there is a charge to access the article itself - rather in the manner of the New York Times.
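
To make that concrete, the sort of thing I have in mind is no more than a shared query URL plus a flag in the results saying whether the full article is behind a paywall. The endpoint and parameters below are entirely made up - they're there purely to illustrate the shape of it:

# purely hypothetical - no such shared endpoint actually exists
curl "https://archive.example.com/api/search?q=Shenandoah&from=1865-11-01&to=1865-12-31"

# an imagined response: article id, date, a snippet, and whether payment is required
# { "id": "18651113-004", "date": "1865-11-13", "snippet": "...", "paywalled": true }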

Let's be clear - all that is required is that the newspaper archives concerned provide an API. I'm not expecting them to do any of the development work or indeed forgo any of their charges, only to provide a mechanism to aid the development of external tools as an aid to research ...

Friday, 8 March 2013

Being an OS tart ...


There's a bit of a meme going around about why people choose or advocate particular operating systems or desktop environments, and indeed why Linux on the desktop never made it out of sandal-land. Here's my story.

In the beginning I was a command line person. It's an age thing.

MS-DOS, VMS, VM/CMS, Unix, even George 3. And for around ten years my life revolved around getting MS-DOS systems to play nicely with VMS and Unix systems, including exchanging files, mounting disks and booting from them.

The other thing I used to do was convert files from one format to another. At the same time we used WordPerfect at work, which had a common file format between Windows (DOS really, in the early days), VMS (the multi-user version) and the Mac – not quite common, you had to do a magic import – which meant that you could write a file on Windows, upload it to VMS, work on it some more, convert it to the Mac version, work on it even more, and then upload it back to VMS to print it. (There was a Solaris version and a Java version as well at one stage, but I never played seriously with them.)

So, despite using Windows at work, at home I had a Mac. First a Classic, which was stolen, and latterly an LC II. The reason being that I've always been interested in writing as a recreational activity. In the nineteenth century I'd have been a diarist.

Both provided a superb writing environment and I had a lot of fun with modems and the internet in the early, pre www days.

But then Apple fell into a deep hole, and I bought a PC for home and started using StarOffice for writing. I even bought a support licence from the rather weirdly named Star Division before they sold it to Sun.

And for quite a few years I used a PC at home and a PC at work.

Then I found myself managing a web migration project, taking content from a classic hand built website to a CMS-based solution. And I ended up using a desktop Linux machine (running College Linux), purely because it was infinitely easier to script content conversions and build test environments.

It was also a lot easier to do a lot of the Solaris management I was doing from Linux than from a PC.

Then I moved and they put a Mac laptop on my desk, and I discovered that Apple had climbed out of the deep hole and OS X was pretty good.

At the same time I started building virtual machines, keeping a couple of virtual Linux boxes (and an OpenSolaris one) on my Mac, so that I didn't need extra machines round the place.

I was so impressed with the Mac that I ditched my old Windows laptop and bought an iMac for home. At the time most of my time at home was being spent on the web and in Google Docs or OpenOffice, so when J needed to spend a lot of time using the iMac in the evening, I moved over to using an old PowerPC iMac I'd installed Xubuntu on.

And that worked well for a couple of years, until I found that I couldn't take the Mac any further, so I bought myself a Windows 7 laptop and was pleasantly surprised at quite how much better 7 was compared to XP – somehow I just missed the whole Vista thing.

And that's more or less where we are today. A Mac and a Linux laptop on my desk at work – and a PC and a Mac at home, plus an old Linux travel netbook, a slightly newer Windows netbook, and a couple of tablet computers.

The thing is, they're tools to get things done. If you spend a lot of time inside Chrome or in LibreOffice there's precious little difference between Linux, OS X and Windows, and in fact I push files between all three via Dropbox all the time.

The real discriminants are:

  • Linux doesn't have a native Evernote client, and this does limit its usefulness.
  • Linux is much better for 'play' experiments.
  • Kate is still my editor of choice.
  • OS X is stable and now has a decent LibreOffice implementation.
  • TextWrangler is a pretty good editor and comes a close second to Kate.
  • Windows, well, it has some nice applications, but really I just use it for web productivity.

So there we have it, OS X and Linux for productivity and Windows as an enabler. Probably if I had to lose one out of the equation it would be Windows, and if I could only keep one it would be OS X.

As always your mileage may vary. We all do different jobs and work in different ways.



Thursday, 7 March 2013

Managing work in progress data


As I've mentioned elsewhere, I've recently become interested in the story of the Confederate commerce raiders during the American Civil War, and I've been using QueryPic to search the Australian and New Zealand newspapers of the time.

Of course, I'm not a historian, I'm a dilettante - in fact I'm a digital dilettante.

And in the course of searching for newspaper articles, one thing that is amazingly useful is the ability of the Trove newspaper database to create a PDF of an article, and that of Evernote to grab the PDF and upload it to a notebook with my own folksonomy of tags.

Now, of course, Evernote is not the only game in town, Zotero is also a pretty good product, but Evernote is what I use and know.

One of the problems I face in these amateur projects is the 'big heap of everything' problem – when one starts with an idea and clips a few things of interest, the odd JPEG, without any clear idea of what you're doing or even whether it's going to turn into something half serious.

When I was a psychophysiology researcher – yes I was a proper scientist once – I had the same problem – except then it was differentiating the interesting from the relevant, but again it was all down to categorisation and organisation.

I've talked to enough researchers in a range of disciplines to know that this is a common problem.

The problem comes down to the accumulation of material and then its organisation and reorganisation, at which point it becomes a body of evidence to support what we rather grandly these days call 'scholarly outputs'.

In the old days people would file their material in old envelopes, write something relevant on the envelope and if they were very organised write some relevant stuff on an index card and file it. Basically they saved the data and created some metadata around the article.

Resources have of course now gone electronic, and this is where tools like Evernote come in – they allow us to capture and organise material, and annotate it – and we can reorganise it to our heart's content.

So, when we come to repositories or data archives, we tend to think of places to put finished outputs, be it a conference paper or a dataset. We don't tend to think of work in progress stuff, like my Evernote notebook of 1860s press cuttings about Raphael Semmes, yet of course it is just this work in progress material that enables the scholarly outputs.

Any work in progress storage is necessarily an active filestore as the material is subject to reorganisation – something that has implications for its backup and management.

As a data manager the real question is how to support this activity. As I said, Evernote and Zotero do it well, but should we also be trying, on an institutional basis, to provide some sort of workspace that allows people to accumulate and save material, and tag it as they go?

As Evernote and the like already do a good job, trying to replace them is probably a waste of time and money, but providing a general mechanism that lets users export the material to a local archive server once they are happy with it is probably a good thing, as it ensures that the data is backed up and available for reuse.

The other thing is that recently we looked at data management practices in a cohort of beginning Arts and Humanities researchers. Frighteningly, a lot of them were just storing material on their laptops and dumping it out to a USB disk. Some did use Dropbox, but none made much use of Evernote or Zotero.

So as well as helping provide a resource for the organised, we also need to consider what to do with the less organised. Training would help - training focused on managing your data rather than simply backing it up - but again there is a need for a work in progress archive solution.

The question is what to provide and how best to do it – probably some sort of relaxed content management solution would provide a starting point ...

Wednesday, 6 March 2013

Sticky drives and digital preservation

The other learning from installing Lubuntu is that old hardware fails, particularly when it has been in store.

Drives stick, drives fail, and finding replacement bits on eBay and the like can be fun. Now while the CD format is well known and fairly generic, that's not the same with other media formats - you need the hardware and you need a tool to read the data in the correct way, and with some of the older tape and disk formats this can be a challenge.

But, because hardware is fallible, the idea of simply keeping 'one of everything' doesn't work. They'll die on you. Even if the files you are trying to preserve are in an 'eccentric format' - my favourite example is 1990s ClarisWorks files - the thing to do is to get them archived/deposited somewhere well before you do any document recovery on them.

In other words users need to be encouraged to use online storage where possible, rather than local storage. A properly curated gzipped tar file, even if we can't read the documents, is a hell of a lot more useful than a pile of floppies or a corrupt CD-RW - at least then we can try and work out which files we should be recovering and have a guess as to their format ...
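
By way of example, 'properly curated' need be no more than the archive itself plus a listing and some checksums, so whoever picks it up later knows what's inside and whether it's intact - the file names here are purely illustrative:

# bundle the recovered files, keep a listing of the contents, and record checksums
tar -czf clarisworks-recovery-1996.tar.gz recovered-files/
tar -tzf clarisworks-recovery-1996.tar.gz > clarisworks-recovery-1996.list
md5sum clarisworks-recovery-1996.tar.gz > clarisworks-recovery-1996.md5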

[update 07/03/2013]

Of course I couldn't resist the temptation to play with old hardware and have ordered myself half a gig of G3-compatible memory for a little under twenty bucks - if it gives me adequate performance I've got myself a basic Linux test machine ...

Other things an old iMac showed me

Since rebooting various of my stable of old machines for the first time since 2009, I've been progressively powering them up and playing with them.

Of the Intel-based ATX PCs, one has a dodgy disk, and the other - the old self-built $83 machine - complains about dodgy RAM and doesn't have quite enough of it to install a recent version of Ubuntu. And while one can run old versions of the operating system, they are singularly useless - since 2008/9 we've all become more dependent on browser-based applications, and old browsers simply don't cope with modern sites such as Google Docs.

Both are, I fear, destined for the great network in the sky. That left the two CRT-based iMacs. The older of them is limited to 192MB of memory and not sensibly upgradeable - so while it works, it's come to the end of the road.

The newer one though had some promise of being upgradable to a more recent operating system.

PowerPC distributions of Ubuntu are now very much a minority sport. While they are out there, most of them have various problems - for example Xubuntu doesn't actually fit on a CD, which is a problem when you're trying to install it on a machine that only boots from CD, not USB.

I finally settled on Lubuntu 12.04 - lightweight, and the distribution fitted on a CD. Installing it on the old iMac was fairly straightforward - the main problem was that the CD drive was sticky and it took a couple of goes until the Mac could mount and install from it.

Other than that, installation just worked - except that it installed in text-only mode, with no window manager, almost certainly due to there not being enough RAM in the machine.

Well, I'm scrapping the old ATX PCs, and pulling the memory from the one with the dodgy disk might be a solution; or, if I really want to get the machine working, eBay provides one for the cost of a couple of beers.

However, while dealing with sticky drives and old machines was fun, what it showed me was important.

I've always been an advocate of using Linux to extend the life of old hardware, and I've used these older machines to do some fairly reasonable work in my time. However, what we need to recognise is that there are limits. As the average spec of a five-year-old machine improves - and if we're using Linux to extend the life of old machines, that's probably the age we're going for, as parts and spares are probably still obtainable - Linux distributions will tend to reflect this and be targeted, whether consciously or unconsciously, at that spec (1 or 2GB of RAM, a reasonable CPU, say an 80GB HDD).

Older machines may work - as I showed with a 12-year-old PPC iMac - but they will be limited, simply because modern kernels and window managers expect more.

So when planning to reuse old machines you need to (a) test them with your preferred environment, (b) ensure that the operating system chosen will be supported for a reasonable period of time, say something like the two years you would expect from an LTS version of Ubuntu, and (c) plan your migration/exit strategy - the thing about reusing old hardware is that it will fail and it will become unsupported ...