Saturday 29 December 2012

Copying as preservation

I’ve just posted a link to a rather worrying story about what is happening to the manuscripts in Timbuktu after the jihadist takeover, but there is also a positive message in the story.

Once digitised, artefacts, be they manuscripts, oral history recordings, or whatever, can be copied, and quite simply, the more copies the greater the likelihood of the material surviving.
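
As a back-of-envelope illustration of why more copies help, here is a minimal sketch. It assumes (my assumption, not something taken from the Timbuktu story) that each copy is lost independently and with the same probability over some period. Copies kept in the same building or run by the same organisation will not be independent, so treat the numbers as an upper bound. The function name and the figures are purely illustrative.

```python
def survival_probability(copies: int, loss_probability: float) -> float:
    """Chance that at least one copy survives, assuming independent losses."""
    if copies < 0:
        raise ValueError("copies must be non-negative")
    if not 0.0 <= loss_probability <= 1.0:
        raise ValueError("loss_probability must be between 0 and 1")
    # The material only disappears if every single copy is lost.
    return 1.0 - loss_probability ** copies


if __name__ == "__main__":
    # Illustrative only: a 50% chance of losing any one copy.
    for n in (1, 2, 3, 5, 10):
        print(n, round(survival_probability(n, 0.5), 4))
```

Even with a coin-flip chance of losing any individual copy, a handful of genuinely independent copies makes total loss unlikely, which is the whole argument for letting material be replicated.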

Museum directors tend to find the idea of letting people copy their digital holdings worrying for a whole host of reasons, such as misuse of the material (do you want your Byzantine manuscript being reused in a design for a coffee can?), reduction in visitor numbers, and even conservation budgets (it’s digitised, stuff the original).

In fact what they worry about is losing control.

There are a lot of ways round this – a CLOCKSS type solution for museum collections would be one, although in the ideal world a more open solution may be preferable. Likewise one could imagine using something like Amazon Glacier for an escrow service.

Solutions like CLOCKSS are cheap – the hardware is not remarkable and other than the salaries of the core technical team, all of whom would fit in a minivan with space over, the organisational costs are not high.

Otherwise we are left to ad hoc peering arrangements and arbitrary decisions as to what is worth replicating. And that tends not to be a good thing – when the monks copied manuscripts they were selective, copying the things they thought valuable, plus of course the odd salacious passage to enliven those cold dark winter nights, and that’s not a particularly good set of selection criteria …

Tuesday 11 December 2012

Fun with email

I've previously written how at the day job we've changed to Office 365 for email, and how to be subversive and connect your Gmail account to harvest your Office 365 email and thus allow you to consolidate your email on Gmail.

As an act of pure geekery, I installed Alpine, the command line mail client, on my Linux machine and then connected it to my Gmail account - as you can see below this actually lets you see your Office 365 email in a terminal session:


No, I don't have a use case for this. I suppose if I wave my hands it's a way of reading your consolidated email on a cli only unix box, but that's pretty tenuous ...


Monday 10 December 2012

2012 - what worked ...


For the last two or three years I've done a 'what worked' post at the end of the calendar year – this is the 2012 version:


Undoubtedly the success of 2012. My no name seven inch tablet and keyboard combo worked incredibly well, especially as I can sit in meetings and type notes and then post straight to evernote, or indeed pull up reference information from evernote.

Virgin mobile data dongle

The star of our trip to South Australia. Data access problems solved just about everywhere apart from the seriously remote – shame though it's only for Windows or the Mac.


Light, versatile, portable. Revolutionised my reading habits, not because it's better than the Cooler, but because it has an e-book delivery solution behind it.

Samsung smart phone

It just works, and coupled with a decent data contract, is doubly useful for checking email etc on the go.

Still Delivering


my 9inch android tablet is still useful – allowing me to read the morning paper electronically and check email and rss feeds in comfort, not to mention read and post to Evernote


still the best note management solution around. I must admit I'm less keen on the new interface, but it is still highly effective

Back from the dead


after my travails with AbiWord I've switched back to Libre Office and find it a good text editor, and one which makes an excellent tool for the creation and posting of pdf based notes to evernote

Hanging in there

The Asus lightweight travel notebook and my Cooler e-reader haven't seen the use that they have seen formerly, but still have a role to play even if they have been superseded by a Windows netbook and a Kindle respectively ...


Wednesday 5 December 2012

Setting up Gmail to work with Office 365

At the day job, we've changed from Oracle Convergence as an email solution to Office 365.

Prior to the change I had set up my Gmail account to harvest my email from Oracle Convergence into Gmail so I could use it easily with the native Gmail client on my Android phone and tablets without having to configure them individually.

It also had the advantage of allowing me to quickly scan the email headers from Gmail at home without having to log into Convergence separately at the weekend.

Of course, the change of mail solution meant I couldn't carry on doing the same thing. Actually it didn't - setting up Gmail to harvest email from Office 365 was pretty straightforward.

I've written a basic guide as to how to get Gmail to harvest your email from Office 365 so you can do the same if you find it useful to your way of working.

The guide is a publicly shared Google Doc. The guide assumes that your work email alias is of the form firstname.lastname@work.edu.au and that this is mapped onto an actual account of the form u123457@domain.work.edu.au where domain is the name of your Windows sub domain.

Your local sysadmin should be able to tell you what the actual values are for your environment if you are unsure.
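
For anyone who prefers to see the alias versus account distinction written down, here is a minimal sketch of the values you would type into Gmail's 'check mail from other accounts' form. It is not the guide itself. The class and function names are made up for illustration, and the server name outlook.office365.com, port 995 and the use of POP over SSL are my assumptions about a typical Office 365 setup, so check them with your sysadmin as suggested above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PopSettings:
    """Hypothetical container for the fields Gmail's mail fetcher asks for."""
    email_address: str  # the alias people send mail to
    username: str       # the actual account you log in with
    pop_server: str     # ASSUMPTION: varies by site, ask your sysadmin
    port: int
    use_ssl: bool


def gmail_pop_settings(alias: str, account: str,
                       pop_server: str = "outlook.office365.com",
                       port: int = 995) -> PopSettings:
    """Pair the public alias with the real login name for the fetcher."""
    for value in (alias, account):
        if value.count("@") != 1:
            raise ValueError(f"not a valid address: {value!r}")
    # The alias goes in the 'email address' box, the account in 'username'.
    return PopSettings(alias, account, pop_server, port, use_ssl=True)


if __name__ == "__main__":
    print(gmail_pop_settings("firstname.lastname@work.edu.au",
                             "u123457@domain.work.edu.au"))
```

The point of the sketch is simply that the address you give Gmail and the username you log in with are two different strings, which is the step most likely to trip people up.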


Thursday 29 November 2012

User interfaces and memes


Most people who use computers (and tablets) have very little technical knowledge. And most of them have only used one or two user interfaces.

People are of course conservative, which is why change is difficult.

Most people know how to use the classic Windows interface – from 95 through NT, 2000, XP, Vista and 7 it became a meme – this is an interface and if I do that this will happen, so double clicking on a little picture starts an application, and there's a menu thing down at the left hand bottom corner.

Now people on the whole don't know Windows, they know how to do particular tasks.

This meant that you could give someone who 'knew' XP a computer with a Linux distro like College Linux and once they'd found a word processor and so on they were happy – in the main because KDE 2 kind of looked like XP and things worked the same.

The same people tended to find Xfce, as found in Xubuntu, a step too far, though Mac users didn't have a lot of trouble adapting.

I'm basing this on real anecdotal experience – I've tried out both on naïve non-technical users with a reasonable degree of success.

The same people, on the whole, don't like Openbox or similar user interfaces like Fluxbox - 'what do you mean I right-click on the desktop to open a menu??'

And of course when we look at Ubuntu, which is most probably the desktop Linux in widest use by non-technical users, the major complaint is about the Unity UI – not because it's difficult to use, it isn't, but because non-technical people have difficulty applying past experience. It's just a little too different.

Macs of course are different, but are consistent across the range. iOS is again consistent across iPads and iPhones. Android is close enough to allow you to transfer skills – just like XP and KDE 2. In fact the move to smartphones has made the user experience consistent across models, and we now differentiate phones on capabilities, in much the same way we differentiate laptops on disk vs memory vs weight, and actually we know they are all much the same.

Compare this to a few years ago, when each phone manufacturer had their own menu system and called things by different names - changing or upgrading your phone was a major challenge.

So we can say there are three user interface memes out there – the Mac meme, the tablet meme and the XP meme.

And then we come to Windows 8.

Now I haven't used Windows 8 but I've seen enough screen shots to know it looks different. How different I don't know but it looks different.

(Confession time – prior to writing this post someone from Microsoft did ask me if I'd had a chance to play with it – I was momentarily tempted to ask to borrow a Surface, but then decided that was a little too cheeky and anyway I'd like to do any usability tests on generic kit).

Looking different isn't bad for a tablet. After all chunky tiles kind of look like icons so tapping on them should work. On a touchscreen device with a keyboard you evolve a hybrid mix of swipes and keyboard commands – or at least that's my experience of using a no-name seven inch Android tablet with a keyboard.

On a non-touchscreen laptop the experience will be different.

Remember that most people silo their experience and expect to transfer past laptop skills to the new environment, not either learn new skills or transfer tablet style skills.

So just like putting people in front of Openbox they're going to boggle. This is why people complain of no start menu in Windows 8.

People will of course learn new things. Judging by the number of MacBooks round campus compared to even two years ago, a lot of people have moved across from Windows.

My guess is that this is not because of any inherent technical superiority on the part of OS X but because Macs are seen in some subjective way as being 'better' and thus it's worth moving your skills across and learning new things.

As I said above, people are conservative with regard to computers. They will only do something, like change or upgrade, if they perceive it has value. If I was Microsoft I'd be trying to work out how to sell Windows 8 as being cool rather than better.

Most people don't care if it's better as long as they can get their stuff done, but, if it's cool that's even better.

Apple is of course the classic example of being cool rather than better. In the mid nineties Apple was nearly broke. At the time I was doing a lot of IT procurement. The early old style CRT iMacs and iBooks looked distinctly poor in comparison to what you could get from the likes of Dell, Toshiba and Compaq. At the time I thought the products were interesting, especially with OS X as a unix clone, but felt that they were a last twitch of the corpse, and if anything Linux might be the serious competitor.



Round about the same time I'd spent a lot of time on building Microsoft-lite environments, as a way of controlling or minimizing licensing costs by using alternative products, either as open source or commercial.

Given that experience and the fairly rich set of desktop applications, including business tools, that Linux came with, building an open source desktop seemed not an unreasonable idea, and if it hadn't been for the power of the Microsoft Office brand to the exclusion of all others, it might well have had legs. The lesson being that people won't transfer to another product if the product they're using has 90% of the market. They'll worry about compatibility and being different. Like I said, people are intensely conservative about desktop computing – they don't want to be guinea pigs – they just want it to work for them.

I was also wrong about Apple.

Apple came back from the dead by building a brand on being cool and being easy to use. If I was Microsoft I'd be studying how they did it, because in doing it, Apple established the OS X meme – and Microsoft needs to establish a Windows 8 meme ...

Monday 12 November 2012

MOOCs are not the only disruptors

As well as MOOCs and their potential to change the whole university experience, which I've written about, there's another disruptor out there - data publication.

No, really!

Up to now scientific publication has followed a nineteenth century model. Write a paper, send it to a learned journal, who send it to some other people established in the field to make sure it's not complete bollocks and who then publish it and retain copyright, meaning that people then have to buy the journal to read the content.

Academic journals are not cheap - several thousand dollars for an annual subscription. Basically universities and research institutions have to have them,  and yet the content has been produced by their own researchers.

The model has been very successful. The only problem has been that the multiplicity of journals has meant that research libraries can increasingly no longer afford them. The answer to this was bibliometrics - identify the journals with the most widely cited papers and buy them - that way you were getting best bang for your buck.

So less impressive journals withered and died and the more prestigious journals found they could ramp up their prices and still people would buy them as they were the 'must haves' of the scientific publication scene. You also got strange aberrations where researchers were ranked by the number of papers published in high ranked journals - 5 points for a paper in the Journal of Very Important Things but only one point for a publication in Proceedings of the Bumpoo Classical Society irrespective of the worth of the actual paper.

Things like open access publication represent only a partial fix - basically the researcher pays for the publication process and the validity checking. That way the content stays free or very cheap and libraries can afford them.

But they are still recognisably journals as we have grown to love them.

Possibly this is an important thing, but we are also beginning to see the emergence of a journal free publication model:

Sites like arxiv.org have shown that all digital publication is very cheap. Sure all the researchers do the work, but importantly the validity checking is done by the market place. If your research stands up people will cite it. If it's defective they won't. Basically a market driven model whereby what's good is cited and what's not isn't.

Arxiv.org is just one site. Reputedly it started on a server under someone's desk. The important thing about data publication is that it's building a web of trust and the infrastructure to let you identify and follow up with individual researchers.

Apply it to the arxiv.org model and you end up with a working model for journal-less publication in the sciences.

Interestingly we sort of have this in the humanities where researchers are quite often  ranked on how well their books are received and how often they are invited to give seminars - effectively post publication peer review ...

MOOCs and disruptive change


Over the weekend, the Guardian published an article on the disruptive effect of MOOCs, massive open online courses.

As someone who's pontificated about universities in the past I read it with a degree of interest.

And it does contain a word of warning to existing universities. For example, at the university where I work we've put a vast amount of coursework onto our VLE, which allows students to catch up when they miss classes and simplifies and speeds up marking of assignments, and also means that large classes can be taught more easily.

Interestingly, we have arrangements where students from some other universities that do not teach some of our specialities study them via our VLE but get credited for the module by their home institution. And it's not one way, we do the same for specialities we lack the resources to teach.

MOOCs are an extension of this. They represent a step change because of their scale, but they are only an evolution of what's already happening.

The other thing to understand is that VLE based courses have limitations. They're great for all the basic knowledge functions, like naming anatomical structures or describing chemical reactions.

Great for what used to be called General degrees some thirty years ago in Scotland, where people studied a range of subjects and once they had enough credits qualified for a degree.

The real difference is where you want people to think and discuss material. In my discipline of animal behaviour it consisted of trying to work out what a behaviour meant.

In languages it consists of trying to understand better in order to better communicate complex material.

I'm sure anyone with a different academic heritage will have other examples, but it has the common thread of moving from demonstrating knowledge and competence by dealing with closed questions to being able to apply it to the analysis of open questions – something for which discussion and interaction is essential.

In other words, I've no doubts that MOOCs can replace lectures but not special topic tutorials. I may be being snotty and out of sorts with the times but I always thought the purpose of a university education was being able to think and analyse, and along the way being extremely knowledgeable about a specialist subject or two.

It's like IT training courses – it's one thing to learn how to install and configure an application – it's another thing entirely to understand the end to end design of the process in which it will be used. The second is analytical, the first is not.

So MOOCs will be disruptive. But not in the way people expect. Some universities will use them as a way to supplement their teaching. Others will undoubtedly give credit for successfully completing them – either as foundation material or to allow students to skip some of the entry requirements for an advanced or honours course.

And some universities will stop teaching a whole range of courses purely because the MOOCs are better.

But the thing to remember about disruptive change is that it's disruptive – things will undoubtedly turn out differently to how we expect ….

Update

While we're on this theme, Clay Shirky has a well argued post that's well worth a read, and much of what he says has resonance with the above.

Tuesday 6 November 2012

A dilemma for these digital times


I am the proud owner of two e-book readers – a Cool-er and a Kindle.

Of the two the Kindle is the newer, sleeker and more responsive and my preferred reading device.

The Cool-er natively uses epub while the Kindle uses Amazon's azw and mobi formats natively.

Now I have a fair number of books in epub format – most came from Project Gutenberg meaning that I can simply re-download them if I want to reread them on the Kindle. Or I could convert them using Calibre.
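
For the DRM-free Gutenberg files the conversion can be batched. The sketch below is one way it might be done, assuming Calibre is installed and its ebook-convert command line tool is on the path. The function names and the choice of mobi as the target are mine, purely for illustration, and it will not (and is not meant to) do anything useful with DRM-protected files.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_convert_command(source: Path, target_format: str = "mobi") -> List[str]:
    """Build the ebook-convert call for one file, writing alongside the source."""
    target = source.with_suffix("." + target_format.lstrip("."))
    # ebook-convert takes the input file followed by the output file.
    return ["ebook-convert", str(source), str(target)]


def convert_folder(folder: Path, target_format: str = "mobi") -> List[Path]:
    """Convert every .epub in a folder; returns the paths that were written."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; is Calibre installed?")
    converted = []
    for source in sorted(folder.glob("*.epub")):
        command = build_convert_command(source, target_format)
        subprocess.run(command, check=True)  # stop on the first failure
        converted.append(Path(command[-1]))
    return converted


if __name__ == "__main__":
    # Hypothetical folder name, for illustration only.
    print(build_convert_command(Path("gutenberg/dracula.epub")))
```

Keeping the command building separate from the running makes it easy to check what would be executed before letting it loose on a whole folder.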

The problem really comes with the DRM-epubs obtained legitimately from other ebook vendors. In technical terms the solution is simple – I could take the files and process them to remove the DRM and then convert them appropriately.

I haven't done this as, whatever my opinions about DRM, it's unethical to break the conditions of use. But this does raise an interesting point:

You don't really own an ebook. You rent it on a long term basis. This means that if you change platforms, say from epub to Kindle, you have to re-rent any of the content from a new supplier.

Ninety percent of the time this doesn't matter as most people don't reread most of their books, or if they do, fairly soon after acquiring them.

The problem is, where do you stand if your reading device dies (and remember that in the case of the Cool-er the manufacturer has gone to the great stock market in the sky) or the ebook vendor has likewise ceased to be (as in the case of Borders Australia) or stopped selling ebooks (Bookdepository)?

Yes, you still have your content, but you can't necessarily access it in your preferred manner unless you break the conditions of use.

This is different from a book – once you buy it, you have it, you can lend it to anyone you wish, or you can sell it if you no longer want it.

With an ebook that's not necessarily the case. And when I look at my collection of a thousand odd travel and history books I begin to wonder about what would happen if they were all digital and I was to lose access to them ...

Monday 5 November 2012

Spam Spam Spam !

Ever since I mentioned our possible trip to Siberia next year, I've noticed a sudden uptick in the amount of Russian spam in the spam sump.

Most of it is of the fairly transparent 'hello my name is Elena ... ' type or stuff pretending to be from an internet dating site 'Yuliya has updated her profile ...'.

Now spam's spam. The interesting thing is that instead of just broadcasting at random to a gazillion email addresses they must be running searches for keywords to build lists of likely targets.

Obviously it's a fairly simple process but it's interesting that they seem to be trying to build target lists ...

Friday 2 November 2012

Are e-readers dying?

With the launch of the iPad mini and various smaller format Android tablets there seems to be a growing belief out there that e-readers - by which most people mean dedicated single purpose reading devices such as the standard Kindle - are transitional devices - i.e. devices once popular but destined to fall by the wayside, in much the same way that the once universal iPod has been supplanted by the iPhone as a music player.

Rather than concentrate on the e-reader and try and list its advantages and disadvantages over a tablet let's look at its predecessor - the book.

Books are on the whole light, have a compact form factor, are superbly portable, do not require an external powersource, and can be accessed anywhere there is an external lightsource.

A Kindle, or indeed any e-ink based reader, comes close to this usability in that it is compact, light, portable and does not require an external powersource very often - a month between charges is normal. And like a book it can be used anywhere with an external lightsource.

And when I look at people reading on the bus, those using reading devices other than their phones are overwhelmingly using e-readers.

Compared with a book - or a dedicated e-reader - tablets are on the whole inconveniently heavy. Fine for resting on one's knees in bed but just a tad too heavy for prolonged use on public transport. Some tablets have a rubber grippy back to make holding them one handed an easier proposition but most do not.

Tablets also have a shorter battery life - not so short as to make them unusable on a commute, but enough to mean that you need to think about charging - unlike a Kindle you can't just whip it out and have a reasonable expectation of having more than enough battery for an hour or so's reading.

So, I have no doubt that classic e-readers will lose ground to tablets, especially small form factor devices. After all not everyone is an obsessional reader, and why buy two devices when one will do most of the time?

However there will be a core of users, hard core readers, librarything members, travellers, who will continue to prefer the e-ink device for its usability as a device for linear reading ...

Friday 19 October 2012

Samsung's ChromeBook spoiler

It's interesting that Samsung has decided to announce the new ARM based Chromebooks just after the Microsoft Surface price announcement and before the rumoured announcement of the iPad mini.

The Chromebook is an excellent idea spoiled by reality - as I've said elsewhere the lack of reliable always on internet makes its use problematic in a lot of locations, and to be honest, after my experiments with the seven incher I can confidently say a no name Android tablet with a keyboard provides a functionally equivalent experience for a lower price, and you can still type on the bus.

So what's the use case for a chrome book?

A low cost laptop for the kids to do their homework on ? Not that stupid a scenario as they're probably relatively unbreakable and if one dies you just get another one the same.

And that's the key - the education market, standard plug interchangeable hardware, the ideal for a repeatable and consistent environment - the thing that's proved difficult to engineer in the iPad world ...

Thursday 11 October 2012

Changes in file format use over time ...


Andrew Jackson of the British Library has recently published a study of the use of particular file types over time, focusing on pdf, image and HTML file versions in an attempt to define whether being widely distributed and in use is a guard against obsolescence.

It's a valuable and interesting chunk of work. However it's possible to pick a couple of holes in the study:

  • it doesn't address the problem of different document formats, eg the variation in doc and ppt formats as recently exemplified by Chris Rusbridge's attempt to recover some PowerPoint 4 format files
  • it doesn't explore the problem of legacy formats – my favourite examples are Claris Works and AmiPro files, and also those legacy formats without a MIME type – such as data formats used by specialist dataloggers

However what it does show is that once a format is in common use it is protected against obsolescence. The real problem is with formats from the days before storing documents on the web became the default for many people and the conventions were not fully established.

For example I recently needed to check some documentation about a legacy file format. The manufacturer had put the documentation on the web as TeX files. While perfectly readable this did entail installing OzTeX to read the downloaded file.

Andrew's study also did not address the problem of legacy media formats such as exabyte tapes and the rest. To be fair he explicitly only looked at the UK web corpus, which by definition is online, which meant that he was only concerned with file formats, not media formats.

It would be interesting to run a similar study over the filestore of a medium to large university and see how diverse the file types were, as well as rerunning the study to look at document formats ...
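
A first pass at such a filestore survey need not be complicated. The following is a minimal sketch, not the method used in the British Library study: it walks a directory tree and tallies files by extension and by the MIME type Python's standard mimetypes module guesses from the file name. The function name is mine, and a real study would want to look at file contents and format versions as well.

```python
import mimetypes
import os
from collections import Counter
from pathlib import Path


def survey_file_types(root):
    """Tally files under root by extension and by guessed MIME type."""
    by_extension = Counter()
    by_mime_type = Counter()
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Lower-case the extension so .PDF and .pdf count together.
            extension = Path(name).suffix.lower() or "(none)"
            by_extension[extension] += 1
            # guess_type works from the name alone, not the file contents.
            mime_type, _encoding = mimetypes.guess_type(name)
            by_mime_type[mime_type or "(unknown)"] += 1
    return by_extension, by_mime_type


if __name__ == "__main__":
    extensions, mime_types = survey_file_types(".")
    for extension, count in extensions.most_common(10):
        print(extension, count)
    print("files with no recognised MIME type:", mime_types["(unknown)"])
```

The '(unknown)' bucket is probably the interesting one, since that is where datalogger output and other formats without a registered MIME type would end up.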

Tuesday 9 October 2012

Microsoft surface ...

I was idly musing about my tablet experience over the weekend, and suddenly I had a light bulb moment.

The iPad is clearly a consummate device for consuming content. It doesn't work for work because work involves doing things, such as replying to emails, producing notes, other short form documents etc.

My experiments with my no name seven inch tablet show there's a use case there. But it is kind of hobbled by the software base. Android (and iPad) lacks the software base required for serious full out work; at best they can be said to be useful tools for supporting work. That's why people on the whole buy MacBook Airs and other similar ultrabooks rather than Asus Transformers.

Enter the Surface. A computer that's supposed to have a more business oriented software base, but which has the virtues of tablet computers - low power consumption, long battery life, intuitive interface (in a year we'll all be using the interface formerly known as Metro and will have forgotten about start menus).

So Microsoft have identified a gap in the market, produced a solution, and given Microsoft's near total ownership of business computing, grabbed a chunk of market real estate.

Very clever - wish I'd thought of it ...

Friday 5 October 2012

Using the seven inch tablet as a note taker

As anyone following this thread will know I bought myself a no name seven inch android tablet and keyboard combo as a note taker for work.

I've had a long hunt for suitable tools over the years, and until I tried the tablet and keyboard combo the most effective and portable solution I found was a Palm Pilot and keyboard combo back in 2003. Everything else in the intervening years has either been too heavy, had poor battery life or some combination of the two. I also find typing on glass keyboards on tablets a frustrating experience for extensive note taking.

The original use case was that the reasonably long battery life and use of a keyboard would allow me to take notes in meetings and either put them straight into dropbox or evernote.

The long battery life is a definite plus - making it more usable than a netbook - basically I charge it using the car charger when I drive in to work and again on the way home and that gives me adequate power for a working day unless it's one dominated by seminars or workshops where I find myself taking lots of notes.

It also of course lets me check email and twitter, and sends me calendar reminders etc.

Basically it works, and definitely improves my effectiveness.

However the actual use pattern  has turned out to be not quite what I expected.

Originally I thought I would use epistle - a simple text editor - and save the files to dropbox to be cleaned up at a later date.

In fact (having tried ted and rejected it due to its poor export capabilities) I found myself going back to textedit to take notes and then mailing the files to myself as a more effective method.

I found I was more likely to put the effort into cleaning things up if the files were in my mailbox. (I'm not an inbox zero person, but I've evolved my own technique that has the same effect using gmail's starring and labelling).

Once I've got the file in my inbox I cut and paste the text into an empty Libre Office document, spell check it, tidy up the text and the document structure, and once I'm happy with it put it into evernote as a pdf. If I want to keep the original Libre Office document - for example to reuse the text - I save that to Dropbox.

Spellchecking is important - unlike say my original Asus netbook the keyboard has genuinely small keys close together. It also suffers from keybounce or even misses characters - basically a better keyboard would improve matters.

The small keys are also a problem. Having large european male fingers I find myself sympathising with the late Steve Jobs over the need to file down my digits to type effectively. J, who has small hands, doesn't suffer the key mistrike problem, or indeed hit two keys at once.

I am however getting better - practice helps.

So, in conclusion,


  • the long battery life and lighter weight make it preferable to a netbook in the same form factor
  • concentrating on just being a note taker is less distracting in meetings
  • you need to be disciplined - notes do need to be cleaned up later
  • a netbook or ultrabook is still preferable for trips away due to its versatility


Would I recommend it? Yes, but be realistic, it's a tool, not a transforming experience ...

Thursday 27 September 2012

Save icons ...

Some time ago there was some chatter on twitter about why the save icon in lots of programs was still a floppy disk, given that no one has actually used floppies for around 10 years now:


Part of the answer is of course lack of agreement on what to replace it with. Libre Office uses something that's supposed to be a file being saved to a hard disk

but that really doesn't work. It's odd, does it mean download, is there a different icon for a network save etc etc (and if you're like me and need glasses, it can look like a printer icon on a small screen).

I guess we're kind of stuck with the floppy as the universal save icon until someone comes up with something with universal appeal. I find it slightly amusing that some android text editors (text edit for example) still use the floppy disk icon to mean save.

The other thing would be to simply change the default action - when you close a document you are asked to save it. That's ok, except that I quite often want to save a copy of the file I'm working on to dropbox so I can review it at home, and then come back to the version on my machine at work the next day. So I'd need a special push to dropbox icon.

Rather than agonise I think we need to be more like road signs and simply live with them as conventions. In the UK and much of Europe the sign for an ungated level crossing looks like this:


which works, even though steam trains are thin on the ground these days, purely because almost everyone knows what a steam train looks like even if they've only seen them on TV in period dramas.

In contrast the usual Australian sign is truly bizarre


it's a graphic representation of a crossing marker with flashing lights, and not something universally recognisable. It may, like the Libre Office save icon, make intellectual sense but it isn't immediately recognisable as it's not part of our graphic lexicon ...

Google Cloud Print ...

I've previously written that I'd never found printer sharing apps on printers terribly useful, and the same held for Google cloud print. Great idea, but in practice not that useful.

Being able to print to any printer you've registered sounds great, but the obvious use is printing when you're somewhere other than close to your computer. Sure I can print my boarding pass, but that's no bloody good when I'm in a hotel in Adelaide and my printer's in Canberra ...

That was until yesterday afternoon.

I was working on a document and had got to the stage where I really needed to print out a draft and scrawl all over it.

Except the office printer was down with a dead fuser unit.

To explain, we have a big shared multi function device. I'm an 'eat our own dogfood' guy, ie I believe that we should use our own services rather than bypass them by having a private correspondence printer. The confidentiality argument doesn't apply to me so I've sent the last two printers that people tried to give me back.

The only downside with the big shared device is that supplies and maintenance are outsourced, and the printer is supposed to send status messages to the maintenance company who are then supposed to pre-emptively arrive with new bits before it goes offline. Most of the time this works pretty well, but sometimes just in time turns out to be just too late.

We do keep an old laserjet on the network as a standby, but rather than use that, I thought I'd just drop a copy in dropbox and print it at home.

Then I had a lateral thought - use Google's cloud print. So I downloaded the webabode cloud printer app for OS X, clicked on the printer I wanted to print to and clicked on the file to queue it, locked my computer and went home.

At home I powered up the printer and opened up my Dell laptop. I did have to restart chrome to get the printing to work, but sure enough, once I'd restarted chrome the printer whirred and out came my document.

Quietly impressive given it was an odt file created with Libre Office on OS X and printed via Chrome on a Windows 7 machine (which for historical reasons has Apache Open Office, not Libre Office installed)

Tuesday 25 September 2012

Skydrive killed the student filestore ...


Over the years I've said various things about student filestores, but now is probably the time to finally lay the corpse out in its box.

The reason? The Apple iPad. (actually it's not the iPad's fault, it could just as easily be an Android tablet or a smartphone of some description)

As soon as we get to a situation where students routinely have multiple computing devices they need to share information between them. Historically they did this by walking about with floppy disks in the bottom of their work bags and more recently with usb sticks. However as soon as you start finding people using information access devices that lack usb ports the game changes. They start sharing files over the network between their various devices. And of course sharing information between devices is really just a restricted version of sharing information with other users.

So what do we have in the way of alternatives:

Dropbox – can share files between computers. Automatic synchronisation. Widely supported. Webclient for accessing your files when you don't have dropbox installed. Can share files via magic link. Totally agnostic as to file type.

Google drive. Functionality much as dropbox but with the added advantage of a web based editor and co-operative editing facilities. Can export documents in standard formats (ie Microsoft, Libre Office and PDF).

Skydrive. Uses Microsoft style formats by default. Generous amount of storage (7GB), and accessible via a standard browser or via Apple or Android clients. Plays nicely with local Office installs.

There are other services, for example Box.net that provides much the same functionality as dropbox. Even the latest version of AbiWord comes with a built in document collaboration and sharing service.

So we can say that student filestores are dead. No one needs to use them and some of the alternatives are possibly better. Even the old canard about getting files back from a backup doesn't really apply – use google docs or skydrive and your deleted files live on in a recycle bin and even dropbox caches deleted files. In fact things are considerably better than if you walk around with your data on a USB stick – and potentially safer.

So we could quite sensibly say that we could run without student filestore provision – all we need do is nominate a preferred service. This isn't as radical a move as it might have been a few years ago – people are accepting of using cloud storage for their music, so having them self outsource their data to the cloud is not such a stretch.

All that is lost is the joy of managing student filestore ….

Wednesday 19 September 2012

Tungle.me is closing

Strictly speaking this is a service announcement - for quite a few years now I've used Tungle.me to let people schedule meetings with me.

It's been especially useful when scheduling skype calls with colleagues overseas as it's avoided the 'if it's 5 PM in Canberra it's just a little too early in Manchester' to and fro.

As of 3 December Tungle.me will close. Unfortunate, but then as I never paid for the service, always using the free version, I've no reason to feel more than mildly irritated. It also points to one of the inherent problems with using many of the free services out there - sometimes they fail and sometimes they just go away. For example, in the past year I've lost both of my third party unix shell accounts, ie my command line login on time sharing machines not on any of the networks I normally connect to.

Not essential, but mildly useful to check things out, or make sure that a firewall somewhere was doing what it should.

More concretely, I've now changed to a Doodle service that lets you file meeting requests and check my availability. If you have my tungle.me link squirrelled away somewhere, please update your contacts file appropriately ....

Monday 17 September 2012

Two weeks with the seven incher ...


I've been using my little seven inch tablet for a fortnight or so, and I must say it doesn't disappoint.

The ability to type notes as we go and then save them to dropbox or evernote is incredibly useful, as is having a browser to check things, or be able to search one's evernote documents (to give some perspective I have around 3000 separate notes).

On the downside, battery life and power management have been interesting to say the least. Let's just say that using wifi chews battery and leaving it in standby mode is not the answer – it has a tendency to suddenly wake up. On the other hand, if it's reasonably charged it will cope with three or four hours' use reasonably well.

Having a car charger has been useful – I plug it in on the way in and on the way home to give it a jolt which helps me get through the day.

I've also pruned the application choice somewhat. I'd added a printershare app and then found it less than useful. LaTeX has gone to be replaced with TeD. I've added the connectBot ssh client, but I've also got rid, for the moment, of the newspaper apps.

It also managed (or maybe I did accidentally) to screw up its internal storage to the extent of needing an OS reset, which at least gave me the opportunity to comprehensively de-crud it.

I'll persist and post again about how useful it is as what is essentially a netbook replacement ...

Tuesday 4 September 2012

Actually using the seven incher for work ...


Today I used the seven inch tablet in anger to take notes during a four hour workshop session this morning.

While it could be argued that in the way I've set up the tablet I've basically created a netbook style environment, or more accurately an environment like that of the original Eee netbooks, there are a couple of key differences.

Data is saved to the cloud where possible. This morning I saved it straight to Dropbox from epistle. Once the workshop was over I headed back to my office and used Libre Office to fix up the myriad typos and pretty up the layout for legibility and then saved them as a pdf direct into evernote.

The other key point is it was a four hour session. I took around four pages of notes. My battery was more or less 100% when I started and just under forty when I finished – better than a netbook, better than my MacBook pro.

Size. Like a netbook it's light, and hence easy to carry. And having Evernote on it, it's easy to show people related material in an informal chat afterwards and if necessary email it to them.

So, looks promising. Will post an update after a couple of weeks' use ...

Monday 3 September 2012

Work, Apps and the cloud ...


For the seven inch tablet I decided only to install a small set of apps, all strictly work related:

  • Evernote – note organiser/documentation manager
  • Skydrive – access to documents stored on skydrive
  • Dropbox – document exchange service
  • Epistle – text editor that writes straight to Dropbox
  • TextEdit – minimalist text editor with email capability
  • LaTeX – technical editor
  • Wordpress – blogging application
  • Eduroam – network finder application

as well as the usual twitter, gmail and calendar applications. Nothing much else other than a weather app and some newspaper apps in case I end up stuck somewhere.

The idea is to only add applications that deliver value – so I imagine there will be a number of document viewers for Libre/Open Office and Microsoft Office files stored on skydrive and dropbox.

The secondary idea is to make this device as state free as possible with all documents pushed to cloud services as much as possible – that way documents should always be accessible from any device – an 'any time, anywhere' martini style service.

One thing I haven't considered is printing.

While there are a number of printershare applications out there I've never actually ended up using them, despite having installed them on my existing tablet – basically if you want to print it, you probably want to save it somewhere meaning you can print from some other host.

The interesting question will be to see what other applications I end up adding ...

Seven inch tablets and active filestore ...


Over the past few months I've written about both using seven inch tablets as note takers and the use of Dropbox as an active filestore.

As an experiment, I've invested in a no name seven inch tablet which came bundled with a usb keyboard in a handy case (all for around $125 including additional memory and a car charger).

I ordered the device through ebay – delivery took longer than expected, not due to any problems at the seller's end but more due to the postal authorities at Shenzhen rejecting the package first time around, and taking two weeks to do so. The second time the vendor sent it EMS and it arrived in under ten days.

As a consequence, I've only just received and configured it – I still need to use it seriously to see how useful it is in practice.

However, it's not just me that's discovered the power of no name Android tablets from China. Last weekend we went down to the coast and stopped off in Bateman's Bay to pick up some bits and pieces. We were a bit later than planned and most of the stores were winding down but Aldi was still open.

While I was lining up to pay for our salad and fruit I noticed that they had a list of this week's specials which included a nine inch Android tablet for around two hundred bucks – eminently affordable and less than I paid this time last year for my zPad ...

Monday 20 August 2012

So, what to do with this data stuff ?


So, having said that we can treat the literary canon as data for some analyses, what is data?

My view is that it is all just stuff.

Historically researchers have never cared overmuch about data and its retention - the obvious exception being the social and health sciences where people have built whole careers out of reanalysing survey data and combining sets of survey data.

In other disciplines attitudes have varied widely, often driven by journal requirements to retain data for substantiation purposes. And in the Sciences Humaines no one has really thought about their research material as data until very recently.

What we can be sure of is that the nature of scholarly output is changing with the increased use of blogs to communicate work in progress and videos to present results etc etc. I'm sure we can all come up with a range of examples. What we can be sure of is that the established communication methods of the last 150 years are breaking down and changing. And to be fair they only really applied to the sciences and social sciences. In the Humanities and other disciplines it has never been the universal model for scholarly communication.

Likewise teaching and learning, the rise of the VLE and its implicit emphasis on the use of electronic media, has changed the way that learning resources are presented to students.

Data is just another resource and being electronic, highly volatile. We can still read Speirs Bruce's antarctic survey data because it was hand written in bound paper notebooks. Reading data stored as ClarisWorks spreadsheets written on a 1990's Mac is rather more complex - for a start you need a working machine and a copy of the software in order to export the files ...

However not all is total gloom, sometimes one can recover the data.

Just recently I've helped recover some data from the 1980's. It was written on magnetic tape by machines built by a manufacturer that no longer exists. Fortunately the tapes had been written in a standard tape format which could be read by a specialist data recovery company, and the files had reasonably self describing names - and the original people who had carried out the research were still around to explain what the files were.

In 10 years time recovering this data might well be near impossible.

Once recovered, looking after the data is relatively simple - electronic resources are just ones and zeros. No files or content needs special handling in order to preserve them – the techniques to enable this are well understood, as are those to provide large data stores.

It is the accompanying metadata that adds value and makes content discoverable. And that of course is where the people come in. Electronic resources are incomprehensible without documentation, be it technical information on file format and structure or contextual information.

So, if we are to attempt to preserve legacy data we need to start now while the people who can explain and help document the data are still around.

It also means that you have to make the process of deposit easy. Users will not understand the arcane classification rules we may come up with based on discipline or data type.

While it is perfectly possible, and indeed sensible, to implement a whole range of specialist collections and repositories, if you are taking a global view you need to start with a service of last resort. Such a service should be designed to be agnostic as regards content, and should also be able to hold content by reference, ie it can hold the metadata without any restrictions as to type and point to the content stored on some other filesystem not managed by it.

It is essentially a digital asset management system. It is not a universal solution to enable all things to all people.

It is perfectly sensible to group digital assets in a variety of ways that reflect the Institution's internal processes. PDF's of research papers live in ePrints. Learning objects live in the learning object repository should we have one. A specialist collection of East Asian texts live in a dedicated collection, etc.

This means three things:

1) A framework round data governance and data management is a must have. It needs to say things like 'if it's not in a centrally managed repository you need to make sure it's backed up and has a sensible management plan'
2) The institution concerned requires a collection registry that holds information about all the collections and where they are located to aid searching and discovery
3) We need as simple and as universal a submission and ingest process as possible. If it's not simple for people to use, people won't use it. We might have a model as to what goes where and have demarcation disputes but these are irrelevant to the users of the system(s)
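
To make the 'content by reference' and collection registry ideas a little more concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions: the class names (AssetRecord, Collection, CollectionRegistry), the free-form metadata dictionary and the example locations are all hypothetical, and a real digital asset management system would do a great deal more.

```python
from dataclasses import dataclass, field


@dataclass
class AssetRecord:
    """Metadata about one item; the content itself lives elsewhere."""
    title: str
    location: str  # pointer to the content, possibly on a filesystem we don't manage
    metadata: dict = field(default_factory=dict)  # free-form, no restriction on type


@dataclass
class Collection:
    """A named group of records, e.g. a repository or specialist collection."""
    name: str
    location: str
    records: list = field(default_factory=list)


class CollectionRegistry:
    """Holds information about all collections to aid searching and discovery."""

    def __init__(self):
        self._collections = {}

    def register(self, collection):
        self._collections[collection.name] = collection

    def search(self, term):
        """Return (collection, title, location) for records mentioning the term."""
        term = term.lower()
        hits = []
        for coll in self._collections.values():
            for rec in coll.records:
                values = [str(v) for v in rec.metadata.values()]
                haystack = " ".join([rec.title] + values).lower()
                if term in haystack:
                    hits.append((coll.name, rec.title, rec.location))
        return hits


# Hypothetical example data, purely for illustration.
registry = CollectionRegistry()
legacy = Collection(name="Legacy data", location="store://last-resort")
legacy.records.append(
    AssetRecord(
        title="Recovered tape files",
        location="file:///mnt/other-filesystem/tapes/",
        metadata={"decade": "1980s", "format": "magnetic tape"},
    )
)
registry.register(legacy)
hits = registry.search("magnetic")
print(hits)
```

The point of the design is that the registry only ever stores descriptions and pointers, so the content can stay wherever it already is.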

Text Analysis, neither snake oil nor a cure all ...


A lot of this text analysis stuff is about treating text documents as data.

And while you can get valuable insights from these analyses it's important to understand that the original creators of these documents were not creating content or depositing data. Jane Austen did not create content. She wrote novels.

When she set out to write these novels, which we could describe as comedies of manners in the main, she inadvertently described the society in which she lived, one in which, for women, a 'good' marriage was necessary to ensure financial security, and one in which communication and travel were difficult, leading to a small and compressed social circle.

Critical reading of these novels allows us to build a portrait of how they lived.

I've picked on Jane Austen as an example, but I could just as easily have chosen Aristophanes or Juvenal.

It is important to understand that the two approaches are complementary. When for example I used the Google Ngram viewer to plot the use of the term Burmah, you could get some measure of the significance of use of the term to the ordinary reader at the time.

It doesn't tell you anything about how colonial society functioned.

This isn't of course to rubbish topic modelling or other such techniques. It lets you identify topics of concern within a corpus, just as looking at the frequency of medieval property transfers might identify times of social turmoil and change.

So we need to be critical in our approach. Topic modelling and other text mining techniques are now possible due to the sheer amount of digitised text available, and they definitely give an index of popular concern.

They are however not a substitute for critical analysis. Rather, they complement it ...

Thursday 16 August 2012

Geeking about with wordcloud...


People may be wondering why I've been geeking about with Wordcloud.

I actually do have a rational reason - information presentation. Research paper titles are sometimes wonderfully opaque and the keywords are sometimes not much better.

People also don't have a lot of time to read lots of abstracts.

So I thought, why not generate a wordcloud on the paper and store it just in the same way that we might store an image thumbnail.

Doing this is actually more complex than it might be.

First of all, as my Gawain experiment showed, you really need a discipline specific stopwords file. What they should be I don't know, but feeding in a whole lot of research papers on a particular topic and doing a frequency count should help generate a set of common discipline specific terms that are essentially 'noise'. The human eyeball would also need to play a part - if you have a set of papers on primatology for example you don't want the term 'baboon' to end up in the noise file just because it's common.
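
As a rough illustration of the frequency count idea, here is a minimal Python sketch. It assumes each paper is already available as plain text; the function names, the sample sentences and the threshold are made up for the example, and counting how many papers a word turns up in is only one possible way of finding 'noise' terms.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_stopwords(documents, min_share=0.8):
    """Return words that occur in at least min_share of the documents."""
    doc_freq = Counter()
    for doc in documents:
        # Count each word once per document, however often it is repeated.
        doc_freq.update(set(tokenize(doc)))
    threshold = min_share * len(documents)
    return sorted(w for w, n in doc_freq.items() if n >= threshold)


def review(candidates, keep):
    """The 'human eyeball' step: drop words the reviewer wants to keep visible."""
    return [w for w in candidates if w not in keep]


# Made-up sample texts standing in for a set of primatology papers.
papers = [
    "The baboon troop foraged near the river during the study",
    "This study shows the baboon diet changed in the drought",
    "In the study area the baboon population declined",
]
candidates = candidate_stopwords(papers, min_share=1.0)
stopwords = review(candidates, keep={"baboon"})
print(candidates)  # ['baboon', 'study', 'the']
print(stopwords)   # ['study', 'the']
```

Note how 'baboon' turns up as a candidate simply because it is common, which is exactly why the manual review step is needed.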

Equally you need to be able to classify papers in some way. Going back to my baboon paper, is a paper on changes in foraging behaviour in baboon troops as a result of drought ethology, ecology  or climate science?

Hence the idea behind topic modelling and the 'fridge poetry' output - my idea is to do something like the following:

Feed the text of a research paper through topic modelling software. Compare the results with discipline specific lists that you made earlier by feeding a whole set of papers sorted by discipline through the same software. This should give you some measure of 'likeness', and allow you to allocate it to no more than three fields of research.

Then, taking the top scoring field of research for a paper, feed it through the wordcloud software with the  appropriate stopword list.

This will give you a visual representation of the key themes in a paper, and allow people to rapidly flick through material and identify the papers they are interested in. Of course you also store the classification words to allow people to search for, to continue the example, 'climate + baboon'.

I say papers. I do mean papers, but at the back of my mind is the fact that scientific communication is changing - blogs as research diaries are becoming important, videos of conference sessions are bubbling up and we need some sort of way of classifying them and producing an easy to read visual representation of the themes - for example here's one of one of my other blogs:
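
The 'likeness' step could be sketched along the following lines. This is a guess at one workable approach, not a description of any particular tool: it assumes the topic modelling output has been boiled down to a plain list of terms, uses simple set overlap (the Jaccard index) as the measure of likeness, and the discipline term lists are invented for the example.

```python
def jaccard(a, b):
    """Share of terms two lists have in common, from 0.0 to 1.0."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def allocate_fields(paper_terms, discipline_terms, max_fields=3):
    """Rank disciplines by likeness and keep at most max_fields non-zero matches."""
    scores = [
        (name, jaccard(paper_terms, terms))
        for name, terms in discipline_terms.items()
    ]
    matches = [s for s in scores if s[1] > 0]
    # Highest score first; ties broken alphabetically for a stable result.
    matches.sort(key=lambda s: (-s[1], s[0]))
    return matches[:max_fields]


# Invented term lists; in practice these would come from running sets of
# papers sorted by discipline through the same topic modelling software.
disciplines = {
    "ethology": ["behaviour", "foraging", "troop", "social"],
    "ecology": ["habitat", "foraging", "drought", "vegetation"],
    "climate science": ["drought", "rainfall", "temperature", "climate"],
    "astronomy": ["galaxy", "star", "telescope", "orbit"],
}
paper = ["baboon", "troop", "foraging", "behaviour", "drought"]

fields = allocate_fields(paper, disciplines)
for name, score in fields:
    print(name, score)

# The top scoring field decides which stopword list to use for the wordcloud.
top_field = fields[0][0]
```

With these made-up lists the baboon paper comes out as ethology first, then ecology, then climate science, and astronomy is dropped because it has no terms in common.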



Is this a good idea? I honestly don't know. There is no substitute for reading the material, but finding relevant material has always been a problem. As publications move away from the established journals to self deposit in various repositories we need to think of ways to make research more discoverable.

Gawayne the Green Knight meets the wordcloud

I've always had an affection for Sir Gawayne the Green Knight (first bit of real middle English I read), so I thought as a final act of fiddling about with wordcloud I'd feed the Gutenberg version into the IBM wordcloud software just to see what came out


which neatly demonstrates the need for a proper middle english stopwords file. Hacking my original file to produce an extended though very incomplete file one gets something a little better:

which shows that one of the things we need to take this outside of playing with nineteenth and twentieth century English text is a set of agreed stopword files for analyses.

This would clearly also apply to analyses with other languages, be it Malay or Old Irish...

Wednesday 15 August 2012

Chaucer wordcloud

And finally,

for fun I fed the Gutenberg Collected works of Chaucer into the wordcloud software ...

this is actually quite interesting.

I didn't have a middle english stopwords file so of course we see that common forms of speech (ye, thou, thee, thy etc) predominate. So, I made myself a very simple supplementary stopwords file consisting of the obvious bits of middle english (thee, thy, thou, ye, eke, gan) for the wordcloud software and then reran the generation process:


which I think we can agree is possibly a bit better though it needs more work - for example quoth, hath, anon and may should probably be excluded.

Using an extended stopwords list one can come up with something like this:

which is possibly a more accurate model of Chaucer's drivers. I must say that I'm quietly impressed with the power of this to display the themes in a body of text ...

Wuthering Heights wordcloud

and just for fun this is what you get when you feed Wuthering heights into the IBM wordcloud software:


More fun with text analysis


A couple of days ago I blogged about my rather brain dead experiments with topic modelling software and a couple of books by W G Burn Murdoch.

While the experiments were not mature they certainly taught me a lot about text mining and topic modelling.

Essentially we take a body of texts, strip out the common words (the a's the the's etc that glue a sentence together) and search for statistically significant combinations of words that differ in abundance from their usage in a body of reference texts. Cluster analysis with attitude in other words.
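
A very stripped down version of that idea can be sketched in a few lines of Python. To be clear about the assumptions: this is not what the topic modelling packages actually do internally, it looks at single words rather than combinations, the stopword list is a toy one, and it uses a smoothed ratio of word frequencies against a reference text rather than a proper significance test.

```python
import re
from collections import Counter

# Toy stopword list, for illustration only.
STOPWORDS = {"a", "the", "and", "of", "to", "in", "on", "was", "is"}


def word_counts(text, stopwords=STOPWORDS):
    """Count the words in a text, ignoring the common 'glue' words."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stopwords)


def overrepresented(target, reference, top=5):
    """Rank words by how much more frequent they are in target than in reference."""
    t, r = word_counts(target), word_counts(reference)
    t_total, r_total = sum(t.values()), sum(r.values())
    vocab = len(set(t) | set(r))

    def ratio(word):
        # Add-one smoothing so words missing from the reference don't divide by zero.
        t_rate = (t[word] + 1) / (t_total + vocab)
        r_rate = (r[word] + 1) / (r_total + vocab)
        return t_rate / r_rate

    return sorted(t, key=lambda w: (-ratio(w), w))[:top]


# Made-up sample texts.
target = "the ice and the whales and the ice pack was cold"
reference = "the house was cold and the garden was green"
distinctive = overrepresented(target, reference, top=3)
print(distinctive)  # ['ice', 'pack', 'whales']
```

Words such as 'cold' that are about equally common in both texts fall to the bottom of the ranking, which is the behaviour the reference texts are there to provide.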

Just by coincidence there is an excellent recent post by Ted Underwood that reiterates and amplifies most of what I've worked out by myself.

Since my initial experiments I've confirmed my findings with Voyant. Basically I got the same results.

I was going to try the Stanford text mining tools as well, but I need to teach myself a little Scala first.

The point which I want to make is that this exploratory research (aka geeking about) was trivial on my part – I downloaded the software onto an old Dell laptop running Ubuntu, chmod'd stuff where appropriate and I was doing it. Installation and execution didn't demand a lot of effort.

People may of course object that I've been playing with computers for years and that these things are easy for me. Well they are. But so they are for everybody. And the uses you can make of them can be innovative.

I started with a fairly simple question. Because I understand a little about cluster analysis etc I had no real problems in understanding what the data gave me – and incidentally came up with a different question – the role of critical reading in all of this.

However the exercise does have some value in itself

When I showed J the 'fridge poetry' topic lists and wordcloud stuff she immediately downloaded the software to her Mac and fed the Gutenberg Bronte texts through it – she already had them lying around as she was in the process of hacking out passages as discussion texts for her English Novels students – just for curiosity to see if it gave anything that looked useful as a supplement for teaching. Questions like why do we get these words and not their synonyms.

Now J is not technical – perhaps even less so than she needs to be as she's got me around – but again the effort involved in doing it was minimal – even less than that required on Ubuntu. In other words she thought maybe this might be a fun approach – let's see if it spits out anything I can use ...

So the cost of curiosity with this stuff is minimal as are the computing resources required.

There's a message in there somewhere ...

Monday 13 August 2012

W G Burn Murdoch meets topic modelling ...


Quite some time ago I blogged about W G Burn Murdoch's From Edinburgh to Burmah, chronicling a trip he made in 1908.

As well as being an enjoyable bit of Edwardian travel writing one thing that struck me was the writer's developing sense of Scottishness and also his sympathy with the Burmese people and the annexation of Upper Burma.

At the time I suggested that one could trace a scottishness meme through his friendship with William Speirs Bruce and the Scottish Antarctic Expedition.

So today I decided to test this out.

First of all I downloaded and installed the gui version of the mallet topic modelling software and fed the texts of both his books through it. Some beautiful fridge poetry resulted but not much of a hint of Scottishness.

Edinburgh to Burmah topics:

1.water sand great left grey soft burmah till evening top
2.home sea made good night indian things natives pretty dinner
3.light hot country burmese brown told thought royal figures faces
4.white black time morning dark board long put head steps
5.red trees back yellow small colours feel train flowers notes
6.people men man air women fish music chinese young coloured
7.day colour sun house full prince deck golden hair low
8.blue half gold sky miles big shore pass hand days
9.side feet work project high ladies native gutenberg pleasant make
10.river round green india place night line open east ground


Edinburgh to Antarctica topics:

1.great night boat seals feet doctor till found called hours
2.crown white man illustrations south put weather warm ship op
3.water snow grey left seal line hard option balaena thought
4.antarctic vo edinburgh round vols back crew svo fcp red
5.wind air work boats world mate sea top brought cabin
6.long men small blue islands sky birds penguins rev turned
7.sea made black life cr home half dark colour green
8.ice day days light whale north amp bergs whales sun
9.board ship good works lay heard cold pack making end
10.time land deck make head side skins mist hands sir


The second list is a little odd as Burn Murdoch's Antarctica book is a non-proofread ocr version, which contains what are obviously font or word recognition errors.

So to sanity check what I was seeing I installed the ibm word cloud software and fed the books through that, not a hint of Scottishness standing out.

Edinburgh to Antarctica wordcloud


Edinburgh to Burmah wordcloud

Now I'm not about to rubbish topic modelling as a technique, however it possibly is not a complete substitute for critical reading. In his 1908 Burmah book I certainly got a sense of W.G's developing sense of Scottishness as opposed to Britishness and that this informed his feelings about Upper Burmah. It doesn't show up in these analyses.

And that I think is important. Applied to newspaper reports or scientific publications it quite clearly can pull out important themes. What it doesn't pull out is the subjective and impressionistic ...



Monday 6 August 2012

Reading books on the bus on the phone


Way back in September last year I blogged about what sort of devices people were using to read books on the bus.

A few days ago I had to drop the car off for an oil change, so I caught the bus into work. The thing that struck me was the number of young Asian women (I'd guess the usual Canberra mix of Vietnamese, Chinese and Korean) who were reading books on their smartphones.

It of course makes perfect sense - carry one device, and of course getting books in your preferred language, let's say Chinese for sake of argument, as ebooks means that you can get the latest novel/romance or whatever from an online store without having to trawl round local Chinese language bookstores and wait for them to get a delivery.

It's not just an Asian phenomenon.

J is teaching nineteenth century novels this semester, and was about to berate a girl for fiddling with her phone during a session, thinking that Emily Bronte had been displaced by facebook.

But no, the student said she'd forgotten her version of the text, but she'd found a copy on Project Gutenberg, downloaded it and was now flicking through to find the section under discussion (no page numbers you see ...)

Friday 27 July 2012

Social networks and Beowulf

There's a research paper which argues that Beowulf, the Iliad and the Tain are more like reality than Shakespeare or Harry Potter. This seems to be causing a bit of a flutter and some angst among literary types.

Basically two physicists worked out the social graph in Beowulf, the Iliad, and the Tain and compared it to the social graph in well known works of fiction and demonstrated that it was different, and that the epics had graphs that were more like each other than the works of fiction.

They then compared these with the social graphs of people like company directors, given that leaders of warbands are not that common these days.

From there they made the leap to say that the epics were more rooted in 'real' messy events than works of fiction. In real life shit happens and sometimes people make what look to be astoundingly stupid errors.

Fiction flows more and has less left field type stuff.

What would be interesting is to take various of the norse sagas and compare them to something like the Tain.

Why? Because we know that most of the personages in the sagas were real. In the Tain, which is essentially the retelling of a war over cattle in pre Christian (and pre literate) Ireland we don't know if it's based on real events or not.

And if that works we could then apply the analysis to a range of other stories, such as the dreamtime stories or various semi mythological south east asian epics ....

Monday 23 July 2012

Dropbox as an active filestore


I've written previously about what you could do with cheap android tablets with an external keyboard as note takers but one of the problems has always been getting the data off of the tablet without ending up with multiple versions of the same document on different platforms.

I think I now have the answer.

I've been using Writebox, a minimalist text editor for Chrome that saves text files straight to your dropbox folder, which then syncs next time you are online - very useful for spotty internet connections. They are text files, which means that they can be read by just about anything under the sun and are highly portable between platforms. This also solves my abiword instability problem, allowing me to work on a variety of platforms with ease.

You can of course get the same effect by saving notes from any text editor you like to your dropbox folder. Writebox's advantage is that it's a native Chrome app that you can install into Chrome on any machine without you having to remember to save to dropbox, or indeed learn multiple text editors if you switch between platforms.

And then I thought 'I wonder if you can get the same effect with Android?'

Up to now I've been using text edit on Android, which lets you email files, but a couple of minutes googling uncovered Epistle, an app that saves files from Android to your Dropbox account. This means that you can write notes on something basic, and then combine/edit/prettify them on a full size machine before saving them wherever.

This is not very new as a concept. In the late 1980s DEC had a product known as Pathworks. Basically what it meant was that your VAX filestore was connected to your networked PC as a disk and that your PC could read and write to these files. This turned out to be incredibly useful as it meant one could create a VAX text file from home in a terminal session and then import and edit the document at work the next day. In fact one of my tricks was to write reports as text files on my Mac Classic at home, paste them into a VAX editing session in a terminal window, save the files and then format them up the next day.

And in fact all through the dial up years of the nineties it was the same. While we ditched Pathworks and went to PC-NFS as a campus networking product, the principle was unchanged - dial into a unix server, transfer a file and there it was sitting in your PC filestore. (Which was the same as your unix filestore - in fact the filestore was common to anything that could NFS mount the filestore. At the time I used to rave about the concept of the one-touch filestore - save it once, open it anywhere.)

Of course the world changed, personal computers took over the world, and the need to move documents in this way disappeared. Now that typically (and I know I'm not typical) everyone has more than one computing device, the need to move data between devices is more and more common. It's not surprising to learn that the whole Dropbox concept was dreamed up as a solution to not having to carry documents back and forth between home and work on a USB stick.

Using it actively as a filestore rather than just as a transfer tool means that data can be moved to where it's most appropriate - basic notes on Android or a Linux netbook, formatting and structuring on a PC or Mac. It's the same way one could get the VAX to output the results of a data manipulation onto disk and then import it into Excel for further analysis.

In my continual quest to avoid having to carry reams of paper to meetings I've just ordered myself an Android tablet and keyboard combination. With a combination of Evernote to hold scanned copies of reference materials and Epistle for writing notes I should be sorted. The only possible downside I can see is those people who persist in sending out Word attachments rather than something platform agnostic like PDF.

I'll post further about this once I've tried it for real ...