Friday 23 March 2012

digital reading ...

There's a story going around at the moment that most US college students prefer digital reading to reading paper books.

As you'd expect, there are people who view this as a sign that the barbarians are about to storm the humanities faculty, and equally there are those who see this as an opportunity for a sociological study of reading in the twenty-first century.

Two things are clear - digital books are here to stay (just look at the Amanda Hocking phenomenon), and there's widespread adoption of digital reading for recreational reading - as I blogged last (Australian) spring.

With all sociological studies context is key. What happens in one place may not happen or be appropriate elsewhere.


Reading paper books may be a more sensual experience and, if you tend to read obscure books, is sometimes essential. But if you want to read a well known, widely available nineteenth century novel, say Wilkie Collins's 'The Woman in White', for some escape time while commuting, electronic wins hands down.

The Penguin edition of The Woman in White weighs around half a kilo, and is far too bulky to sit in a jacket pocket or be stowed comfortably in a messenger bag. You are going to read the Project Gutenberg edition for free, even if you have the paper version at home.

And you would naturally expect it to carry over into other aspects of daily life, so that people sitting in a cafe on campus are using their computers and tablets, or if reading are using an e-device. What they do at home or when they've gone down the coast for the weekend might be quite different.

Likewise their choice of technology might be different. If I'm travelling I'll use an e-reader because of its superior battery life. And quite frankly, if I'm rattling in a bus across the city (or northern Thailand for that matter), the e-reader is lighter, and if it gets damaged I'd much rather lose a $100 Kindle than a $500 tablet computer.

The northern Thailand thing is more than just hyperbole - if you're travelling somewhere where recharging your device is not an option, be it a 24h flight to Europe or a trip to rural Laos, the long battery life of an e-reader really does count for something.

Back in 2010, after having taken an e-reader on vacation instead of a pile of processed dead tree, I blogged about the convenience factor and also included links to a couple of relevant newspaper articles about the convenience of e-readers. It's interesting (and relevant) to note that a bookstore owner in Ireland has accused an airline of damaging his sales because their policy on hand baggage means people will not impulse buy a book to take with them on a flight.

So, if I wanted to rerun the study here in Australia I'd like to compare the adoption of digital reading between literature students, a cohort of mixed arts students, and a cohort of general students. I'd also be asking questions about context and controlling for availability. I'd guess you'd find more e-reading among literature students, purely because it's a way of getting a lot of the standard eighteenth and nineteenth century texts for free, and for general reading perhaps a very slight preference for paper as a more sensual experience ...

Monday 19 March 2012

three inflection points in text based media consumption

This is an idea which has been rattling round my head for a while, but I think that there are three real inflection points for the consumption of English language text based media:

1650

Due to the spread of militant Protestantism there is an increasing interest among people in reading religious arguments for themselves, along with increasing access to books in English and printed pamphlets. Personal content creation is handwritten.

1885

Widespread literacy due to the 1870 Education Acts. Improvements in printing processes, and distribution by rail and steamship, have reduced costs substantially and made books increasingly affordable and accessible. Newspapers are generally available. Typewriters start being used for personal content creation.

2011

Widespread literacy. Increasing content is distributed and stored digitally, and is consumed in digital form via e readers and tablet computers. Most personal content creation is digital via word processors and social media technologies such as blogging.

The dates are indicative rather than definitive. For example I chose 1650 for no other reason than that it's the middle of the seventeenth century, which seems to be the century when literacy took off in England, and sits neatly bracketed by the Putney debates, the execution of Charles I, and the publication of Mr Playford's Dancing Master.

The Putney debates because they represent the 'serious' use of literacy, with manifestos and records taken by stenographers, and the Dancing Master as it shows that literacy is not just dismal and serious.

1885 I chose as it's a candidate date for the beginning of the recognisably modern world, with trains and steamships making long distance travel and communication possible, coupled with a near universal and cheap postal service. It also marked the arrival of the Rover safety bicycle, allowing individual empowerment - ie you could get on your bike if you so wished. And while the first viable commercial typewriters had appeared 12 years earlier, it took some time for their adoption to spread.

2011. Well it might be 2012 or 2010. To misquote Trotsky it's too early to tell, but it's clear that the digital revolution that began in the 1980s has had a massive transformative impact on all aspects of society, including the way we consume media. I chose 2011 as it's the year tablet computers and e-readers became common, something reflected in more and more newspapers moving to digital only or combined digital and print subscriptions - such as digital Monday to Friday, and the weekend papers in print to allow you to lounge about with them.

At the same time books are increasingly available in a digital format, and print is rapidly becoming the poor relation ...

Thursday 8 March 2012

Office Live ...

I've been rude about Office Live in the past - certainly, when I last tried to use it from home it seemed slow and cumbersome, and finicky about browsers, something which convinced me that the more lightweight Google Apps environment was the way to go.

Recently something bizarre happened to my office MacBook Pro. It screwed up some clusters, messed up some files and permissions, and since then Office has never been quite the same, occasionally refusing to open files and sometimes refusing to save files to anywhere but the desktop. (Actually, to be fair it's not just Office, LibreOffice and OpenOffice seem similarly screwed).

What I should of course do is wipe the disk, reinstall the OS and reload everything else from Time Machine. That will take at least a day, and just now is not the time.

So I've been using Office 365. And you know, it's not bad. Being on a fast campus network surely helps, but it copes with reasonably complicated spreadsheets and documents, and seems to work fine with Chrome.

In fact I'll go further and say the experience is not that different from using Office on a slow network - it really is a viable alternative to a local install. The only problems would come if I was to go somewhere with a slower performing network (like home), but then I can download my documents from SkyDrive, work on them locally and save them back, and still work on the same ones using Office 365.

In light of my earlier post about Android laptops it would be interesting to know how well Office 365 would play on such a machine - if it played well it could be close to being a serious competitor to the MacBook Air ....

Android laptops (again)

I recently mentioned these 7 inch Android laptops you can find on eBay. At the time I wondered who, if anyone, buys them, but I recently saw someone using one in a cafe so I asked them how they found it.

The answer is that it has a browser, email and you can add a simple text editor to it and a calendar app. The major differentiation is that with a keyboard people can create longer emails and can write reasonable documents. The small format means that it can be shoved in a backpack or a bag, and long battery life means that it can last a day. The person concerned was a student so something that works like that as a second carry about machine makes sense. Just as my old Asus Eee that I use when travelling makes sense.

Talking of Asus, a colleague was good enough to let me play with their Transformer Prime. It's a tablet, it's a computer. Again it's basically a tablet with a keyboard, but it's light, thin, robust and allows you to do everything that you need to do while out and about (email, write notes, short documents etc), and yet it can become a tablet when that form factor's more convenient, say for reading a newspaper or a book on the train.

The other thing is it's light, thin, and damned sexy looking, sexy enough to compete with a MacBook Air or an iPad.

However, at the end of the day it's the software base that sells hardware. Android is rich enough but lacks a decent office suite except perhaps for Polaris Office which seems promising (again purely on the basis of reviews).

Given that the Prime would be up against products such as the MacBook Air it needs to have equivalent capability - which these days means MS Office capability.

However, coupled with a cloud service to back up your files it does look like an incredibly attractive option ....


Friday 2 March 2012

How big is your data?

When you're trying to design a data archive one of the problems you face is trying to decide on how much storage you need.

This is in fact almost impossible. The real answer is 'Lots, as much as possible'. Providing a justifiable number is a little more tricky. This is because people really have no idea how much data they have.

For example, a few years ago I was responsible for the design and specification of an anthropological archive. Some of the material was digitised, some wasn't. The digitised information consisted of photographs (say 500k each), TIFF images (something between 1-2MB each), MP3s of language (say 20MB each), and some video (God alone knows).

To make a plausible number, do something like:

([number digitised] x [average size]) + ([estimated annual digitisation rate] x [average size] x 3)

ie how big is what we have, how quickly will it grow, and how long will it be till we can buy more storage.

Of course no one knows how quickly it will grow and there probably isn't a business plan that gives you a number, but using the past rate of digitisation is probably a good start (2000 images in three years say).

Take this number. Double it (or triple it), as you want multiple copies for redundancy and to guard against bit rot, then add 50% for head room, and that's your answer.
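The steps above can be sketched in a few lines of Python. This is only a rough illustration, not a sizing tool: the function name and its parameters are made up for the example, the three year horizon, the number of copies and the 50% head room come from the text, and the figure of 2000 existing images at 2MB each is purely hypothetical.

```python
def estimate_storage_mb(items_digitised, avg_item_mb, annual_rate,
                        years=3, copies=2, headroom=0.5):
    """Rough storage estimate in megabytes, following the steps above."""
    # How big is what we have already?
    current = items_digitised * avg_item_mb
    # How much will it grow before we can buy more storage?
    growth = annual_rate * avg_item_mb * years
    # Multiple copies for redundancy, plus head room on top.
    return (current + growth) * copies * (1 + headroom)


# Hypothetical inputs: 2000 existing images at 2MB each, with the past
# digitisation rate of 2000 images in three years carried forward.
estimate = estimate_storage_mb(2000, 2.0, 2000 / 3)
print(f"{estimate / 1000:.1f} GB")
```

Passing copies=3 gives the 'triple it' variant. Either way the answer for a collection like this comes out in gigabytes rather than terabytes.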

For an average research institute or archive it's probably in the low tens of terabytes, or if video is involved possibly a hundred or so max. Remember, on the basis of these figures a two hundred page handwritten journal scanned as TIFF is still less than half a gigabyte. The same goes for a thousand images. In comparison I have a 16GB SD card in my e-reader. The average epub without graphics is under 500MB, ie I can have a year's reading on a single card, that coincidentally costs around $16. The point here is that storage has grown much more than the size of individual items and that the cost of storing stuff is minimal when costed on a per item basis.
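As a back of the envelope check on the per item cost, here is a small Python sketch using the figures quoted above (a 16GB card for around $16, TIFF pages at up to 2MB each). The names are invented for the example, and it assumes decimal units (1GB = 1000MB) and consumer SD card pricing, which is obviously not what archive grade storage costs.

```python
CARD_GB = 16            # capacity of the SD card mentioned above
CARD_COST_DOLLARS = 16  # approximate price of that card
TIFF_MB = 2             # upper end of the 1-2MB range for a TIFF image


def cost_per_item(item_mb, card_gb=CARD_GB, card_cost=CARD_COST_DOLLARS):
    """Raw media cost in dollars of storing one item of the given size."""
    cost_per_mb = card_cost / (card_gb * 1000)  # assumes 1GB = 1000MB
    return item_mb * cost_per_mb


# A two hundred page journal scanned at one TIFF per page.
journal_mb = 200 * TIFF_MB
print(journal_mb, "MB for the journal")
print(cost_per_item(TIFF_MB), "dollars per TIFF page")
```

On these assumptions a scanned page costs a fraction of a cent to hold, which is the sense in which per item storage cost is minimal.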

The takeaway is that the data, while substantial, actually is small in terms of storage.

Now think about archiving scientific data. Astronomical data is big. We know this. So is Genomics/Phenomics data, and some of the earth sciences and climatology data.

The rest, well a lot of it is spreadsheets and analyses and probably not that big. I did try running a survey to find out, but with little success.

My guess is that most items are less than a gigabyte, and aggregated into a dataset or collection, much less than a terabyte. Certainly this is true for legacy data as until recently we have not been able to store (or back up) that much.

Given the size of storage devices available now this probably means we can store almost everything except for the data already known to be big, and video data.

This of course isn't the whole answer, as there are operational costs with backup and with migration to new filestore as the old stuff ages, but what it means is that if Microsoft can give everyone with SkyDrive 25GB, so potentially could any research archive.

The major cost is not in storing it, it's in getting the stuff in and ingested and catalogued in the first place, and as such we probably shouldn't worry overmuch about allocation models. What's more important is to understand the data and its storage requirements ...