Friday, 17 October 2014

Yosemite ...

Normally I'm very cautious about OS upgrades but this morning I felt wild and impetuous and upgraded my work Mac to Yosemite, Apple's latest version of OS X - one of the advantages of living on the dark side of the world is we tend to get first go at updates.

The process basically just worked, although it took a bit longer than Apple claimed, and the system was pretty unusable during the post install optimization phase - budget at least a couple of hours, or perhaps a little longer.

A few things needed nudging:

  • Dropbox needed a reinstall - just like after the 10.9.5 upgrade
  • Amazon Cloud drive broke and needed a java install
  • GanttProject (admittedly I was running quite an old version) broke and needed an update
Everything else seems to work. Switching between apps and app start times seem slow, but then they were with Mavericks, and once running everything seems reasonably responsive on what's now a four year old MacBook Pro with 4GB of RAM.

My only real gripe so far is the use of Helvetica Neue as a system font - yes it's legible, but it's a bit too heavy for my taste. Still, if that's the only annoyance, things are not too bad ...

Monday, 13 October 2014

University rankings and altmetrics

I was on holiday last week, and we drove down to Victoria for a few days. On the way there I listened to the radio and someone (I forget who) was talking about the THES rankings and making the very good point that individually the various ranking tables don’t mean a lot, as they all use different algorithms and try to measure different things, but that if a university scores consistently high in a number of these tables it suggests that in some way it is better than one with either inconsistent scores or consistently low scores.

Better here means that it is good at teaching, good at research, and is effective in promoting this.

So with altmetrics, impact rankings and the rest. Individually the various scores don’t mean a lot, but collectively they are an indicator of engagement. I’ll say engagement because this is still a nebulous topic. There are people who publish highly cited research but don’t promote it. Typically these people are well known in their discipline. Then there are those who communicate well about their field and have an impact through teaching, through social media, and the rest. And of course there are some people who are somewhere in between.

Like ranking tables, high scores are good. However before dismissing inconsistent scores (high say on social media, low on research impact) we need to actually ask a very difficult question: What are we trying to measure, and how will we know we’ve measured it ?

The first part is probably relatively simple to answer, the second one rather less so, as we need to decide on what we will accept as evidence and what it tells us about engagement.

Written with StackEdit.

Friday, 26 September 2014

Almost instant ebook production

For reasons described elsewhere I was searching for pictures of Louise Bryant - who was a witness to some of the events following the 1917 revolution in Russia and was married to John Reed - best known for Ten days that shook the world but with a pedigree in left wing journalism and agitation in the US in the years before the first world war.
However, this post is only tangentially about my interest in the history of the Bolshevik revolution and subsequent civil war. While searching for pictures of Louise Bryant I came across an unpublished biography at (unsurprisingly), which I thought might be worth reading.
The biography is published as a set of long html pages - as if it was a set of blog posts, and I really wanted to read it offline using either a tablet or an ebook reader. So I decided to make it into an epub file for personal use.
It turned out to be really easy to do, and as there are quite a few other books and texts out there that have been transcribed and published as a set of web pages, I thought I’d publish my recipe …
  • Using firefox download and save each of the web pages in the document
    as a file - choose web page as the format - this will save the html
    and also save any embedded images in a subdirectory named after the
    web file, eg if the first section of the book is called part_one it
    will save a file called part_one.html and create a directory called
    part_one_files containing the embedded images.
  • Then create a new blank document in libre office and using the insert command select and insert each of the files in turn into the document. This will give you a document containing the entire text.
    • Automagically the images will also be embedded in more or less the
      correct place.
  • Save the file as somefile.odt onto your dropbox
  • Go to CloudConvert and connect it to your
    dropbox account.
  • Select somefile.odt as your import document and choose either epub (for most generic ebook readers) or mobi (for kindle) as the output format.
  • Select save to dropbox as your output target, click convert and it
    will write the output file out to ~/dropbox/apps/cloudconvert/somefile.epub
    • (Under windows it will be in the apps\cloudconvert directory in My
    • If you chose mobi as the output format the output file will be
At that point you can transfer the file to your chosen device, either by sideloading it, by emailing it to your kindle, or by opening the file from dropbox in an ebook reader application on your tablet.
The whole exercise took me about ten minutes.
I see no reason why this solution should not work equally well for other texts transcribed and published as web pages.
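
Before merging the pages in libre office it can be worth checking that every part was actually saved. What follows is only a rough sketch in Python, not part of the recipe I used: it assumes the pages were saved into one folder in the layout described in the first step (part_one.html next to part_one_files), and the folder name "book" and the function name saved_pages are made up for illustration.

```python
from pathlib import Path


def saved_pages(folder):
    """Pair each saved .html page with its '<name>_files' image directory.

    Returns a list of (page, images_dir_or_None) tuples sorted by file name.
    """
    pairs = []
    for page in sorted(Path(folder).glob("*.html")):
        # The recipe describes part_one.html sitting next to part_one_files
        images = page.with_name(page.stem + "_files")
        pairs.append((page, images if images.is_dir() else None))
    return pairs


if __name__ == "__main__":
    # "book" is a hypothetical folder holding the pages saved from firefox
    for page, images in saved_pages("book"):
        print(page.name, "->", images.name if images else "no images directory")
```

A page without an images directory is not necessarily a problem, as it may simply contain no embedded images, but a missing page is easier to spot here than halfway through reading the ebook.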

Written with StackEdit.

Tuesday, 23 September 2014

Digital Preservation Strategies ...

I came across a beautifully succinct quotation from National Records of Scotland:

‘If digital records are not captured there can be no preservation and
if there is no preservation there can be no access’

Which is a beautifully concise description of why we do data capture. If we don’t there is no way of retracing our steps, no way of substantiating research, because we don’t have the original data.

And of course, if we don’t have the data all our arguments about preferred archival formats are moot. And in a very real sense they are anyway - formats change over time, and preferences change over time. Legal documents and court transcripts in WordPerfect from the nineties are a key example.

They may still have validity, but they are in a dead file format. No one when they created these transcripts knew that in twenty years the files would be in a dead format - they chose a widely used well documented format - it’s just that preferences changed.

Tools such as Tika, Pronom and Fido give us a chance, at capture time, to also record information about the file format, which gives us a clue about how we might read the file in the future.
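
To make the idea concrete, here is a toy sketch in Python of signature based identification, which is the general principle behind this kind of tool. It is emphatically not how Tika, Pronom or Fido are implemented, nor any of their APIs: the tiny signature table and the function names are my own assumptions for illustration, and real registries hold far more signatures and far more detail.

```python
# A deliberately tiny signature table: (leading bytes, human readable label).
# Real format registries are much larger and more precise.
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\xffWPC", "WordPerfect document"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 container (e.g. legacy Microsoft Office)"),
    (b"PK\x03\x04", "ZIP container (e.g. ODF, OOXML, epub)"),
]


def identify(data):
    """Return a label for the first signature that matches the leading bytes."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


def identify_file(path):
    """Read only the first few bytes of a file and identify them."""
    with open(path, "rb") as handle:
        return identify(handle.read(16))
```

Recording the result of something like identify_file alongside the captured file is the sort of clue a future reader would need. Note that a container format such as ZIP only narrows things down, which is one reason the real tools look deeper than the first few bytes.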

And of course technology to read files changes as well. All we can do is try and make sensible decisions to make life easy for anyone who wants to access captured files.

File normalisation is one such decision - what it really means, of course, is ‘convert files in a known proprietary format to an open format on ingest’ - usually using something like libre office in batch mode, and storing the converted file along with the original.
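
As a rough sketch of what that batch mode step might look like, the Python below builds a headless libre office conversion command and runs it. The assumptions are mine: that the soffice binary is on the PATH, that odt is the chosen open format, and that the function names and the "normalised" output folder are purely illustrative. The original file is left where it is, so both copies can be stored.

```python
import subprocess
from pathlib import Path


def normalise_command(source, outdir, target="odt"):
    """Build the headless libre office command that converts one file."""
    return [
        "soffice",            # assumes libre office's soffice is on the PATH
        "--headless",         # no user interface, suitable for batch use
        "--convert-to", target,
        "--outdir", str(outdir),
        str(source),
    ]


def normalise(source, outdir, target="odt"):
    """Convert source into outdir, leaving the original file untouched."""
    Path(outdir).mkdir(parents=True, exist_ok=True)
    subprocess.run(normalise_command(source, outdir, target), check=True)
    # The converted file is expected to keep the original name with a new extension
    return Path(outdir) / (Path(source).stem + "." + target)


if __name__ == "__main__":
    # Only print the command here; call normalise() to actually run it
    print(" ".join(normalise_command("transcript.wpd", "normalised")))
```

Whether a given input format converts cleanly depends on the import filters in the installed version, so the converted copy is a convenience and not a substitute for keeping the original.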

The idea is, of course, that the converted file will be easier to read because it’s in an open format rather than a proprietary one. Of course, when we say proprietary format we mean Microsoft because we worry about its dominance of the file format ecology.

And we are of course most certainly wrong - there is just so much material in Microsoft formats that it is difficult to believe that there will be a future in which there are no applications to read these files - what one should be worrying about is the less well used formats such as Pages or AbiWord where there is a greater risk of losing access.

But the point remains, that unless we capture the files in the first place we will have no chance of reading them in the future …

Written with StackEdit.

Tuesday, 2 September 2014

Moving people away from commercial cloud services

A few days ago I posted an update on my thoughts about Eresearch support services.
One of the points I made was that no matter how desirable it was to move people off commercially hosted services such as Dropbox, it wouldn't be easy:

This ease of sharing and the fact that Dropbox is hosted
outwith Australia is something that of course gives intellectual
property managers the willies, but it is also a fact of life, and
something that has to be dealt with - in other words, as Dropbox
is already out there in the wild, whatever is provided as a
replacement has to be at least as good, and at least as flexible
- which of course means it will bring the same intellectual property
concerns.

Dropbox, and the others, such as Evernote and Box, are in with the woodwork as they already have widespread adoption.

I’ve just had a real world example in which a researcher shared data with me via Dropbox that he wanted to have uploaded to our data repository, and have a Digital Object Identifier minted for that data so that it was citable.

In my conversations with him I followed the party line and suggested he use Cloudstor, AARNET’s file transfer service, which is based on FileSender, to transfer the data to me.

As a service, it’s pretty easy to use. However, my client used Dropbox instead, simply because it’s what he was familiar with and he knew that it worked.

I am, of course, as bad as everyone else. I routinely share documents and notebooks stored in Evernote with colleagues, and share Google documents with colleagues, so I’m most definitely not going to complain about using Dropbox here - after all it’s exactly what I would have done, and as I’ve said before I’ve had publishers share material for review in exactly the same way.

Instead of complaining, I’m going to take this as a learning experience:
  • services like Cloudstor are not going to succeed without a major educational campaign to raise awareness among the user community
  • competitor services like Dropbox are already well established and users have a high degree of familiarity with them - any educational campaign needs to focus on Cloudstor’s unique features
  • whatever value proposition is made needs to be relevant to the users - so if we want to build a unique selling proposition around keeping intellectual property onshore we’d better make it relevant and explain that as well
The last point is something that we would need to think carefully about. My client was passing me his data as he wanted not only to make it citable, but also to make it open access, as he was publishing a paper in a journal that required this.

And if it were me my first question would be

If it’s open access does it matter it’s gone via Dropbox ?

And I must admit, I’d be hard pressed to find a reason why it mattered …
Written with StackEdit.

Tuesday, 26 August 2014

Eresearch services

About a year ago I posted my two cents worth on what an eresearch support service should look like.

A year or so on, and after innumerable conversations with users, potential users and people who are interested, I find my views are not much changed.

User wants can be broadly summarised as:

  • storage
    • dropbox like sharing capability
    • lots of it
    • handling of diverse media types (agnostic)
    • assurance it is secure, backed up and accessible
  • virtual machines
    • data analysis & manipulation
  • secure long term storage of data
    • publication of data for substantiation
    • digital object identifiers
  • advice on legacy data
    • format conversion
    • media conversion
    • digitisation
    • some bespoke programming, data wrangling etc

Dropbox is extremely popular because of its ease of use and universality, meaning people can share data from the field with colleagues, with colleagues overseas etc.

I have a second life in which I review books - it’s noticeable that in the past year publishers have moved from sending you the epub or mobi version to sharing it with you via dropbox. I don’t see any reason why researchers should be any different in their habits.

This ease of sharing and the fact that Dropbox is hosted outwith Australia is something that of course gives intellectual property managers the willies, but it is also a fact of life, and something that has to be dealt with - in other words, as Dropbox is already out there in the wild, whatever is provided as a replacement has to be at least as good, and at least as flexible - which of course means it will bring the same intellectual property concerns.

And of course it’s not just Dropbox, we can say the same about Evernote, OneDrive, OneNote and Google Drive.

However, in the course of my conversations one thing that comes up over and over again is the need for decent work in progress storage, and work in progress storage into which it is easy to load data, either by direct capture from instruments, or by some easy finder/file manager like process - people expect to be able to drag’n’drop and telling them about some command line incantation with rsync doesn’t play.

There is an interest in data publication. At the moment it’s basically driven by journals requiring that data be made available, and I expect that this will build as more and more journals require this. I also expect to see more interest in publishing source code and things like R scripts as part of the whole substantiation and open review thing.

There’s also an undercurrent of people wanting to return to research they did earlier and finding themselves locked out of their data because it’s been stored on media no longer in common use - such as zip drives, or in older data formats that made sense at the time. We could rehearse the open formats argument here, but that doesn’t fix the problem, which needs to be addressed. Allied to this is the need for a little bespoke programming or data wrangling to get data into a usable format, or to clean data.

So, one year on I’d say change hasn’t happened, but there’s nothing to say that it won’t …

Written with StackEdit.

Wednesday, 20 August 2014

Munich to ditch Linux ?

The internet has been all a-twitter today with the news that Munich was considering dumping Linux and going back to Microsoft.

I’m not surprised. Saddened perhaps, but not surprised. Much as Apple through the iPad owns the tablet space, Microsoft still owns the office desktop, and this means that if you want to do something different you have to not only do it as well as Microsoft, you have to do it better.

So let’s look at the Linux software environment and compare it with Microsoft. And of course when we’re talking about local government we’re largely talking about administrative and management tasks, which means word processing, spreadsheets, email and workflows - in other words office applications.

Libre Office and Open Office basically do everything Microsoft Office does, but slightly more clunkily. Clever formatting in Office documents sometimes comes out a little weird, especially if the original document has been edited with two or three different versions of Office, but in the main it’s perfectly usable. You’d be being snippy to say it wasn’t.

Ditto for evolution as a mail and calendar client. Not as polished as outlook but perfectly usable. And if you were a private individual or running a little home business there’s no reason why Linux wouldn’t work for you. The same argument applies to Macs and OS X. Or running anything with Google Docs.

And then there’s collaboration, workflows, business automation, call it what you will. Sharepoint does that pretty well. And in the Linux world?

Sure there are solutions but they usually involve keeping squads of wild eyed sandal wearing geeks in the basement - ie you can’t just license it, get some nice consultants in at inflated prices to configure it for you and leave it running the way you want.

And there’s lots of things out there to integrate. Useful things like invoicing and payment management solutions. Move to something definitely not mainstream and you have to re-engineer every damn thing …

Written with StackEdit.