Thursday, 19 December 2013

Office 365 Wave 15, Gmail and Evolution

During 2013 Microsoft upgraded Office 365 to Wave 15. I’ve previously written about using both Gmail and Evolution with Office 365. The good news is that both still work.

However, there is a major change: Wave 15 includes new standard host settings for POP and IMAP access. The online configuration guide I wrote last year has been updated to take account of the changes to the server settings.

For POP access the new settings are:

Access type   Host                              Port   Encryption
POP           outlook.office365.microsoft.com   995    SSL
SMTP          smtp.office365.microsoft.com      587    TLS

If you are not sure if you are using the new settings, click on the settings (cogwheel) symbol and then click on

Options > Settings > Account > Settings for POP, IMAP and SMTP access

If you have previously configured your client with the pre-Wave 15 settings it will continue to work, but obviously Microsoft may withdraw this feature at a future date. If you have rolled this out in a production environment it might be worth thinking about an upgrade strategy.

If you wish to configure an IMAP client such as Alpine, the settings are almost the same:

Access type   Host                              Port   Encryption
IMAP          outlook.office365.microsoft.com   993    SSL
SMTP          smtp.office365.microsoft.com      587    TLS

The only difference is the port used for IMAP access.
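If you want to sanity check the settings before reconfiguring a client, something like the following Python session will do it - a minimal sketch, using the hostnames quoted above and obviously placeholder credentials:

    import imaplib, poplib, smtplib

    USER, PASSWORD = 'someone@example.com', 'secret'   # placeholders

    # POP over SSL on port 995
    pop = poplib.POP3_SSL('outlook.office365.microsoft.com', 995)
    pop.user(USER); pop.pass_(PASSWORD)
    print(pop.stat())                  # (message count, mailbox size)
    pop.quit()

    # IMAP over SSL on port 993
    imap = imaplib.IMAP4_SSL('outlook.office365.microsoft.com', 993)
    imap.login(USER, PASSWORD)
    print(imap.select('INBOX'))
    imap.logout()

    # SMTP submission with TLS on port 587
    smtp = smtplib.SMTP('smtp.office365.microsoft.com', 587)
    smtp.starttls()
    smtp.login(USER, PASSWORD)
    smtp.quit()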

As always your mileage may vary.
Written with StackEdit.

Monday, 16 December 2013

Living with a Chromebook

Three or four months ago I bought myself a chromebook - really in frustration, after my main laptop at home once took 45 minutes to boot after a series of Windows updates. At the same time I’d become frustrated with the slow start up time of my Windows netbook, so I thought a change of technology might be appropriate.

I could have bought myself an iPad with a keyboard, but a Chromebook seemed to fit the bill as
  • it was cheap
  • I use gmail as my primary mail provider
  • Ditto for Google Calendar
I find Google Docs good enough for most basic writing, and the Google spreadsheet application is similarly good enough for most basic operations.

At the same time I’m perfectly happy to use the web version of twitter, and I use the InoReader web application as my RSS feed reader. In short I can spend a whole afternoon quite productively inside Chrome.

This was also intended as a second computer - the one that goes to meetings and goes travelling - so being stateless and having everything synced to the web was a plus.

After a few months, I can say with some confidence that it was a good choice. Evernote has been perfectly usable via the web, and I have never yet found myself in a situation where I wished I’d got something else.

In practice the chiclet style keyboard is no worse to type on than any other budget computer’s, and the screen is reasonably bright and sharp.

I’ve also severely tested its offline capability. Mainly I’ve used it at home, which has been an interesting experience, as it’s fair to say that our home ADSL link is groaning under the strain of supporting multiple chatty network connections, while competing for bandwidth with all of our neighbours’ increasingly busy connections.

Our landline connection goes via the neighbour’s apple tree to a junction box on an old phone pole, and then via the legacy overground copper network back to an exchange about 3km away. As a result we’ve always had dropouts and slowdowns. It is however getting steadily worse - as the ownership of computers and tablets rises, and as the network gets older and busier, the attenuation and loss of sync pulses have got markedly worse, meaning that using the network connection can be a fairly stochastic experience - especially in late afternoon.

I was worried about how this would affect the Chromebook. Obviously web browsing and email need a network connection to be present, but on the other hand the google docs app copes pretty well with the network going walkabout - I’m yet to lose anything significant. It’s here though that not having a local Evernote client does become a pain - basically I store all my notes and supporting documentation in Evernote, and having to rely on the web version means you need to be online to search it.

If we didn’t have these network problems I’d say that for most day to day work the Chromebook is perfectly adequate. I’d also say that the device copes well with flaky infrastructure.

I have a 3G USB modem that I originally bought eighteen months ago for our trip to South Australia. That was fine as it worked well with my netbook at the time. Of course since then we have taken to using tablets more, and the Chromebook does not support USB modems. I’ve just bought myself a noname 3G router that you plug the USB modem into to create a wireless network. In theory this means that we will have a backup network for when our main link is being stupid, and can take a network (and multiple computing devices - sad people that we are) with us.

The 3G router is buried in the Christmas post somewhere between Melbourne and Canberra, I’ll write how it turns out once I’ve had it in service for a month or so …
Written with StackEdit.

Thursday, 12 December 2013

Postal services going the way of the bookstore

That great nineteenth century invention, the universal postal service, is slowly dying.

Canada Post is phasing out delivery to the home, and the New Zealand postal service is reducing deliveries to three days a week in urban areas. In the UK, the newly privatised postal service is getting rid of posties’ bikes, while Australia Post soldiers on with five day a week deliveries for the moment, albeit by charging an arm and a leg for a fairly basic service.

And of course, it’s all due to email. Not online shopping, email. In fact most postal services are keeping themselves alive by delivering packages for the various online shopping services. Everything from second hand books and laser toner to running shoes and business shirts.

It’s truly staggering the amount, and variety, of stuff we buy online, and it all needs to be delivered, and in the main it’s the postal service that does the delivering.

The fact that the UK is getting rid of the red post office bike is a symptom - fewer letters and more parcels and packets means that the bike is not as useful as a trolley.

Here in Canberra we still get a postie on a moped, but with Australia Post encouraging people to collect their own parcels from a mail centre, one wonders how much longer it will be economic to provide a mail delivery service.

Certainly the amount of standard mail we get has fallen off a cliff. All our bills come electronically, as do all our bank and credit card statements. No one much writes to us anymore; it’s all email. About the only mail we get, other than junk mail, is a postcard from our dentist reminding us about checkups, plus an annual letter from the vet about the cat’s vaccinations.

In fact, we don’t get our mail delivered at home; we have a private post office box and have our mail delivered there, simply because we get a lot of books and other small packets and it was easier to have them delivered somewhere we could collect them from, rather than have to trek to some post shop in some nameless suburb and line up to collect our mail.

At the same time we send very few letters. Christmas cards and the occasional official document, and that’s it. Even when I write to my elderly father - who, despite trying valiantly, has never come to grips with email - I use a service that takes my letter, prints it out, and posts it overseas, for a dollar less than it would cost me to do the same at home.

So, like bookstores, the postal service will gradually wither away. Delivery services in the cities and country towns will probably disappear, and the postal service will be part of history …

Written with StackEdit.

Wednesday, 11 December 2013

Using Word

I’ve just tweeted a link to an article by Charlie Stross on why we still use Word. And he’s dead right.
Over my thirty or so years of working with computers I’ve had a lot to do with text processing - converting documents from one format to another, creating documents that could be used as structured input but still read well as print, and the rest.
I still even have a blue and gold Word Perfect coffee mug ...

WordPerfect Mug

I’ve also seen a lot of horrorshows in the print and typesetting world, including a room of typesetters (people, not machines) marking up text in fullscreen vi on lovely shiny Macs, or the time a certain printer manufacturer explained how to override the defaults by creating a firmware macro (again in vi) and prepending it to the front of a file.

But, at the end of the day we come back to Word. Almost everything uses and understands the docx format, and Word of course handles it better and more quickly than products with reverse engineered docx support.

There is little or nothing in any of the document workflows out there that could not be done by something else - TeX perhaps, which is equally powerful, but unfortunately no one wrote a decent graphical front end for it in the eighties when Word first came on the scene.

Word came to dominate the marketplace. Other, equally good products, e.g. WordPerfect, fluffed the transition to windows environments, and some, such as AmiPro, were just plain outcompeted. And so it came to pass that everyone who wanted your text documents expected a Word document.

Which is why I have a copy of Word. Libre Office might be as powerful. Markdown might be faster, Google Docs good enough for meeting notes and brainstorming, but at the end of the day, if the text has to go through a workflow it means Word.
Written with StackEdit.

Thursday, 5 December 2013

Mining the social web [book review]

The social web is a phenomenon of our times - the point at which the web started to reflect our interactions and communications.

Who speaks to whom, who says what about what, how many people talk about what: information that marketers want, the information underlying the altmetrics movement in academia and, it would appear, of interest to the various security agencies.

Mapping out interactions is not new - the Republic of Letters project did much the same by analysing the correspondence of eighteenth century savants - but what is new is both the scale of the social web and the complexity of the analyses made possible by cheap processing power.

This book covers the major social networks such as Twitter, LinkedIn, Facebook, and Google+, with an emphasis on Twitter. The author also discusses mailbox corpus creation and analysis, the analysis of semantic web data and, interestingly, GitHub as a social platform.

This book is not a book for the dilettante. More than half the text consists of Python code, and the reader really needs to work with the code examples to gain full value from the book. The book also provides a rapid introduction to OAuth, and ranges over topics as diverse as simple text analysis, cluster analysis, natural language processing, and the use of applications such as MongoDB.
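To give a flavour of the level at which the code starts (my own minimal sketch, not an example from the book), here are term frequencies over a couple of tweets using nothing but the Python standard library:

    from collections import Counter

    tweets = [
        'mining the social web is fun',
        'the social web reflects our interactions',
    ]
    # crude whitespace tokenisation - real analyses quickly need better
    terms = Counter(word for t in tweets for word in t.lower().split())
    print(terms.most_common(5))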

This is however a very good book for anyone seeking to work with the social web and would serve as a very useful primer or as a textbook for a module on data mining. The code examples are clear and nicely structured, making them easy to follow and work with.

Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More
Matthew A. Russell
O'Reilly Media (2013), second edition, paperback, 448 pages
- also available as an ebook in most common formats

Tuesday, 3 December 2013

2013 - what worked

Over the past few years I’ve been doing a ‘what worked’ post every December, but this year I’ve also been doing quarterly updates on what I actually use. So this year’s post is also my quarterly update on the tools used.

  • Dropbox – used mainly to sync files across computers irrespective of file format
  • Libre Office – platform agnostic document editor for off line writing. Often used in conjunction with Dropbox
  • Evernote – used as a notes and document management system (Nixnote is used on Linux to access my evernote files)
  • Wunderlist for ‘to do’ list management
  • Chrome – browser extraordinaire
  • Gmail – email solution
  • Postbox - lightweight email client for windows to cope with slow connections - used with great success in Sri Lanka
  • Evolution - linux email client principally used in conjunction with Libre Office
  • Google docs – fast means to create quick and dirty documents irrespective of platform
  • Windows Live Writer – offline blog post creation
  • TextEdit – android text editor for note taking; integrates nicely with evernote and Gmail
  • Kate - my favourite editor
  • TextWrangler - my secondmost favourite editor
  • Stackedit - Google chrome markdown editor (and blog posting tool)
  • Pandoc - converts markdown to a range of other formats
  • Microsoft Skydrive – used for document backup
  • Amazon cloud drive - also used for documents
  • Excel Web App – for those occasions when Google Spreadsheets or Libre Office Calc will not do
  • GanttProject for Gantt chart generation
  • InoReader for RSS feed tracking
  • Twitter for tracking interesting things – rarely for messaging
  • Hosted Wordpress and blogger for blogging, and wikidot for creating structured web pages
  • Hojoki for tracking documents and tasks (Gives unified visibility of GoogleDocs, Skydrive, GitHub, Dropbox and Evernote)

The real change has been to the hardware used. My trusty old Android tablet is still in use for checking email and reading news websites at breakfast time - as evidenced by some of the gluckier marks on the screen. The newer seven inch device is still in use as a note taker and I see no reason to change for the moment although I do admit being tempted by the new iPad mini - more because of the software base and the availability of decent keyboard solutions than anything else.

Textedit - the android text editor is now unsupported and while I’m continuing to use it successfully I fear that one day there will be an api change on google drive or evernote that will break things.

The real change has been the Chromebook. It allows me to check my email, create quick and dirty drafts using either Google Docs or StackEdit, as well as surf the web and research things. If anything has ever demonstrated how much of my day to day reading and specification checking has moved to the web, the Chromebook certainly has.

It’s also fast - well, fast enough - boots quickly and shuts down quickly. It’s not a full featured computer but it most definitely provides on the go functionality.

In fact it shows why my original Asus netbook was such an effective tool and the windows netbook a bit of a clunker - basically load time. The platform is irrelevant, it’s access to a browser that counts.

However I still use my windows netbook - the Chromebook’s dependence on the internet makes it useless for off net travel, and my windows netbook does support my Virgin 3G dongle, though admittedly I seem to have been staying in places with poor Virgin coverage lately.

My Kindle has become my recreational reading device of choice, although my venerable Cool-er has taken on a new life as a means of reading Gutenberg epub texts.

The Asus netbook has finally reached the end of its useful life, but I’m tempted to try Crunchbang Linux on it as a basic writing/note taking machine, especially as it has a nicer keyboard than my seven inch Android tablet.

Despite several attempts to resurrect them I’m forced to admit that my pair of ppc imacs are too old and slow to be much use and are most probably headed for the great data centre in the sky …

Written with StackEdit.

Monday, 2 December 2013

Ed Summers - Experiments in access

NLA Innovative Ideas talk, 02 Dec 2013

Ed Summers @edsu

This talk was held at the National Library of Australia. I went out of curiosity, expecting a demo of cool things from the Library of Congress. Well, there were certainly some cool things, but given my current interest in the quantification of impact I came away with something else - a set of arguments and positions about access and impact and what exactly that means.

This post is basically my edited and cleaned up notes - any opinions or asides are my own and this is my interpretation of Ed’s talk. Comments and asides are marked up like this.

Bio

  • Ed Summers has worked on digitisation at the Library of Congress
    • Linkypedia developer, among other tools
    • Now a software dev at the Library of Congress

Introduction

  • Library of Congress is de facto US National Library

    • includes repository development centre - essentially a digital preservation group
    • but no final view on what is preservation or repository
    • could make the argument that a digital repository is just all the infrastructure for storage and access facilitation
    • use as justification to focus on doing useful things

    • could make similar argument for eresearch - rather than focus on grand initiatives focus on being useful

Doing useful things

  • role of access especially web based access and what access means in the context of digital preservation

    digital preservation is access in the future

  • preservation means access as a way of enabling preservation

  • access really is the same as web based access - no brainer
  • if people engage with your content it will be sustained

  • balanced value impact model - Simon Tanner

  • think about how preservation has impact
  • what is the benefit? eg cultural preservation and return of digital patrimony to the originating communities, such as Aboriginal groups - may not show formal cost/benefit result

  • idea of web as customer service medium - the great success of the web has been around involvement and engagement on a mass scale eg social media

  • example nla newspaper ocr correction by crowd sourcing

  • success of wikipedia by author engagement

  • GLAM galleries archives libraries and museums

  • wikipedia glam effort to engage with GLAM community
  • use of GLAM content in wikipedia

demos

  • American memory - 1990 effort to digitise Library of Congress data and distribute on laser disc to universities to provide access
    • innovative move to web 1993
  • very hierarchical content model oriented round collections - very taxonomic view
    • lots of clicks to get to an item
  • wondered about content use - moved to Flickr to make a more searchable photo stream - no massive click frenzy to get to an item
    • simplify access - got a 200% increase in access
    • flickr allowed people to tag and reuse content ie engage with content
  • click counting does not measure impact

  • linkypedia - shows how web content is used on wikipedia - find how many articles on wikipedia use a particular resource for citation

    • counts give the number of secondary links - an indicator of the degree of value and reuse
  • usage can be monitored by rss
  • see reuse of data in sites such as pinterest

  • wikistream - harvests updates from wikipedia via irc

  • wikipedia content very dynamic - lots of changes
  • easy to build an application by gluing tools together - unix style building

  • wikipulse - shows edit activity using a speedometer metaphor

  • wikipedia community is active and engaged

  • Chroma - tool running on amazon to give impression of Wikipedia activity and site usage - question is what use people made of a resource

  • visual representations give a richer, more impressionistic view of site use

Others

  • use of a twitterbot to provide an automatic ‘100 years ago today’ feed to build engagement

  • mechanical curator - BL tumblr feed of image details

Conclusion

It’s all about usefulness - if content is useful, people will engage with it, cite it and make use of it, and much of the repository space should be about providing access to that content.

Thursday, 28 November 2013

Impact and Impact Story

At Tuesday’s ANDS/Intersect meeting one piece of software getting some traction was Impact Story.

So I thought I’d check it out and make myself an ImpactStory profile. After all the only way to find out about these things and their usability, or otherwise, is to experiment on oneself.

In very crude terms, ImpactStory is an attempt to build a Klout like application for academia. Like Klout it attempts to quantify that nebulous thing ‘influence’. Unlike Klout, it is an open source publicly funded initiative (by the NSF and Alfred P. Sloan Foundation, no less) and transparent - the code and the algorithms used are up for scrutiny.

And of course it’s an important initiative. Scholarly communication has changed. Researchers blog and tweet about work in progress and this forms an important source of preliminary announcements, as well as the usual crosstalk about ideas and interpretations. Project websites have increasingly become a source of research information, etc etc.

So what about ImpactStory ?

I think it’s fair to say it’s a work in progress - it harvests data from twitter, orcid, wordpress, slideshare, vimeo and a few others. Crucially it doesn’t harvest material from blogger, academia.edu or scribd, all of which are often used to host preprints and grey literature, such as working papers and various reports - for example Rebecca Barley’s work on late Roman copper coins in South India.

In short it’s a first or second pass at a solution. It shows us what the future might be as regards the quantification of ‘impact’ - basically the buzz around a researcher and their research work - but it is not the future.

In its current state it probably under-reports a researcher’s activity: it can show that a researcher has some impact, but it cannot demonstrate a lack of impact, and we should consequently be wary of including its metrics in any report.

Written with StackEdit.

Monday, 25 November 2013

Student Computer Use ii

Way back in September this year I blogged about student computer use. Slightly to my surprise that post garnered quite a high readership, despite its extremely informal methodology:

  • walk round campus at lunch time
  • note what computers you see students using
  • blog about it

Over the weekend a colleague from York posted some interesting figures from there on computer use:

OS stats for desktop OSs:

                          Win8.1   Win8   Win7   Vista   XP    Linux   OS X
Connections INTO campus   1%       9%     50%    5%      15%   3%      17%
Computers ON campus       5%       20%    19%    7%      16%   6%      26%

Which is kind of interesting. Now I don’t know how the figures were filtered, but they probably reflect a mix of staff and student connections - with the second set of figures reflecting student use more accurately than the first, given that York is a university where a lot of students live on campus.

From this I think it’s fair to say students have a strong preference for OS X. The presence of Vista and XP is interesting, and I think a reflection of my long held suspicion that students buy themselves a laptop at the start of their degree course and never upgrade the OS over their three or four years.

If I’m right, it probably means that the end of support for XP is less of a problem than it might be, as the XP and Vista machines will age out of the population by the end of the northern hemisphere academic year (Of course here in Canberra, our academic year is already over).

This also explains why a quarter of the machines are windows 8 or 8.1 - they represent new purchases.

Connections into campus probably reflect a mix of staff and graduate student connections - and the dynamics of machine replacement are probably different for them - they probably use a machine for more of its natural life of four to five years, and given the initial distaste for Windows 8, they probably tried to replace machines with Windows 7 where possible.

The numbers of Vista and XP are concerning, but given that most people never upgrade computers anyway one would need to take human factors into account in any upgrade campaign.

Sidegrading to Ubuntu is probably a step too far for most of these users, given the current low penetration of Linux among that community.

However, the key takeaways are that OS X has made substantial inroads into the student computer community, Linux hasn’t, and despite OS X’s advance, Windows OSs are still in the majority.

Written with StackEdit.

Friday, 22 November 2013

Impact (again!)

I’ve been thinking some more about impact and what it means for datasets.

From a researcher’s point of view, impact is about trying to create something akin to a Klout score for academics.

Klout is a social media analytics company that generates a ranking score using a proprietary algorithm that purports to measure influence through the amount of social media engagement generated.

Because I experiment on myself I know what my Klout score is - an average of 43 (+/- 1), which is respectable but not stellar. Now the interesting thing about this score is that:
  • While I’ve connected my Facebook account to it I don’t actively use Facebook
  • I have a small band of loyal twitter followers (230 at the last count)
  • Google Analytics shows my blogs have a reasonable penetration with an average readership of around 30 per post
In other words, while I am active in social media terms, I’m not massively so. So Klout must be doing some qualitative ranking as well as some quantitative ranking, perhaps along the lines of: X follows you, X has N followers, and these N followers have an average of M followers. I’m of course guessing - I actually have no idea how they do it. The algorithm is proprietary and the scoring system could be completely meaningless.
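To make that guess concrete, here is a toy calculation in Python - emphatically not Klout’s algorithm, just the shape of a followers-of-followers weighting, with every number except my own follower count invented:

    my_followers = 230           # my actual count
    avg_their_followers = 150    # invented
    avg_third_degree = 100       # invented

    # direct audience, plus heavily discounted second and third degree reach
    score = (my_followers
             + 0.01 * my_followers * avg_their_followers
             + 0.0001 * my_followers * avg_their_followers * avg_third_degree)
    print(round(score))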

It is however interesting as an attempt to measure impact, or in Klout terms social media influence.

Let’s turn to research papers. The citation rates of scientific papers reflect their influence within a particular field, and we all know that a paper in the Journal of Important Things gets you a nice email from the Dean, and one in Proceedings of the Zagistani Cybernetics Institute does not. And this of course is the idea behind bibliometrics and attempting to quantify impact. Crudely, a paper in a well respected journal is likely to be more widely read than one that is not.

Even if it is not widely cited, such a paper has probably had more influence than one less widely read.
And of course we know this is probably more or less true. If you’re an ethologist you’re probably going to want some publications in Animal Behaviour on your CV.

So we can see it could sort of work within disciplines, or at least those in which journal publication is common. There are others, such as computer science, where a lot of the material is based around conference proceedings, and that’s a slightly different game.

Let’s now think about dataset citation. By its nature, data that is widely available is open access, and there is no real established publication infrastructure, with the exception of a few dedicated specialist repositories such as the Archaeological Data Service in the UK and IRIS in the US for Earth Sciences.

These work because they hold a critical mass of data for the disciplines, and thus archaeologists ‘know’ to look at the ADS just as ethologists ‘know’ to look at Animal Behaviour.

Impact is not a function of the dataset, but of some combination of its accessibility and dissemination. In other words it comes down to:
  • Can I find it?
  • Can I get access to it?
Dataset publication and citation is immature. Sites such as Research Data Australia go some way by aggregating the information held in institutional data repositories, but they are in a sense a half way house - if I was working in a university in the UK would I think to search RDA - possibly not - and remember that most datasets are only of interest to a few specialists so they are not going to zoom up the Google page rank.

At this stage of the game there are no competing repositories in the way that there are competing journals, which means that we can simply use raw citation rates to compute influence. And to use citation rates we need to be able to identify individual datasets uniquely - which is where digital object identifiers come in: not only do they make citation simpler, they make counting citations simpler …
Written with StackEdit.

Raspberry Pi as a media player

I’ve recently been looking at buying myself a Raspberry Pi - admittedly more as a toy than for any serious purpose - as I’ve been feeling the lack of a linux machine at home ever since my old machines built out of recycled bits died in a rainstorm.

(To explain - I have a bench in the garage where I play with such things, and not unnaturally I had the machines on the floor. We had a rainstorm of tropical dimensions, one of the garage downpipes blocked, and the rainwater backed up and then got in under the tin and flowed across the floor, straight through my linux machines).

Anyway, to the point, I’ve been researching options to buy a Pi, especially as we don’t really have much of a local ecosystem in Australia.

And one thing that became very obvious is that they have a major role in powering media players and displays - which kind of makes sense given that they have HDMI output and run a well known operating system, making them ideal for streaming content from a local source or powering a display system - run a kiosk app on the Pi, and push your content out onto a display device - wonderful what you can do with cheap technology.

Again, by pure coincidence I came across a post describing the role of cheap Android devices and how in the main they are used as ways of viewing video content or else as embedded devices.

In other words there is a lot of under the radar demand for content viewing which is different from how we think tablets are used - for more engaged activities such as web surfing and email, as well as routine tasks like online banking.

And here we have the key takeaway - tablets, like raspberry pi’s, are versatile computing devices, just as pc’s are. And just as pc’s have a lot of uses other than general purpose computing, so do tablets and other such devices.

PC’s became general purpose computing devices in the main because of their open architecture and the fact that various factories in the Far East could make bits for them relatively cheaply, meaning that if you wanted to make a gene sequencer, say, rather than having to design embedded hardware, and then face the difficulty of maintaining and upgrading it, you could write software and use the standard interfaces available on a pc - thus significantly reducing your development and delivery costs.

Android and the Raspberry Pi, both of which are open systems like the original PC, are giving us a similar effect - cutting development and delivery costs for embedded systems, as the software environment is already there …

Written with StackEdit.

Wednesday, 6 November 2013

Measuring impact

Recently, I’ve been thinking a lot about how to measure impact in a research data repository.

Impact is a fairly loose concept - it’s not something objectively countable such as citation rates - it is rather more some expression of perceived value.

Here, perceived value is some way of producing numbers (and numbers of course don’t lie) that seem to indicate that the data is being used and accessed by people.

Counting accesses is easy. You can use something like AWStats - this will tell you who from where is accessing what. Actually, of course, it doesn’t: it tells you that a computer at a particular ip address has accessed a particular document.

There is of course no way to tell if that is a result of someone idly surfing the web and following a link out of curiosity, or if it’s the start of a major engagement. Both have impact but there’s no way of quantifying or distinguishing the two.

Likewise, relying on ip addresses is of little value: in this always on, contracted out world there is no way of telling who is accessing you from a revered academic institution’s network and who is on the number 267 bus. The fact that the number 267 bus terminates next to the revered institution is probably unknown to you.

Basically all web statistics gives us is crude counts. This url is more popular than that url. We cannot assess the value of the access.

For example, if I look at the google analytics figures for this blog I can say that most posts are read by around 30 individual ip addresses. Some are read by considerably more, a few by considerably fewer. If I look at the originating ip addresses I can see that a few are read from recognisable academic institutions, but most of the accesses come from elsewhere.

For example, I know that a friend of mine at Oxford has read a particular post, but no Oxford University ip address is reflected in the accesses. I’m going to guess he read it on the bus, or at home.

And then of course there is the question of exactly what do the crude counts tell us. Two of the most popular posts on this blog have been on using office365 with gmail and using google calendar with orage. Both have clearly had some impact as people have emailed me both to compliment and to complain about them. Interestingly, most people seem to have found them via a search engine, not through being passed on from individual to individual via some other mechanism such as twitter.

And that perhaps explains the problem with impact. People nowadays search for things rather than look them up (I know that seems a tautology, but what I mean is that they use google in preference to looking at a research paper and following the citations).

Which of course means that impact is at the mercy of the search engine algorithm. And in the case of datasets, or other digital objects, which are basically incomprehensible blobs to the search engine, we are at the mercy of the quality of the metadata associated with these objects …

Written with StackEdit.

Wednesday, 30 October 2013

A gang of seventeenth century puritans and research impacts ...

I’ve recently become interested in the history of the Providence Island Company.

In abbreviated terms, in the late sixteenth and early seventeenth centuries there was a slew of Merchant Venturers companies set up to fund and initiate exploration for new lands.

This was in the main a reaction to the Spanish conquest of the Aztec and Inca polities and the resultant flood of wealth. There was a range of companies, including the East India Company, but one of the most interesting was the Providence Island Company.

The what ? Well if you know anything of the disputes between the king and parliament, and look at the names of the principal investors in the Providence Island Company, some names leap out at you - John Pym for one, and others less well known such as Gregory Gawsell, who was later the treasurer of the Eastern Association, one of the most effective Parliamentary military organisations in the early stages of the civil war.

In short the Providence Island Company provided a legitimate vehicle for men who went on to lead the Parliamentary side in the early stages of what became the first English Civil War.

None of this is of course new. Historians have known this for years, just as they know that various country houses owned by the protagonists, such as Broughton Castle in Oxfordshire, are known as the scenes of various conversations and resolutions in the run up to the wars.

In short they all knew each other, and many of them were connected to the people who signed Charles the first’s death warrant.

So being a geek, I thought it might be fun to try and build a social graph and then feed it through a network analysis tool such as Gephi.

There is of course no convenient list of names and relationships, so I started building one using YAML - perhaps not the ideal choice, but it lets me make little index card entries for each person, with information like who participated in which body, and who knows whom. Due to its flexibility, YAML allows me to create a little folksonomy rather than trying to make a formal database while I’m working out what I want to do.

At some point I’ll probably need to write a little code to express the YAML content as RDF triples. The great virtue of YAML is that it’s text based, which means that I can use regexes and suchlike to extract information from the file.
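A minimal sketch of the sort of thing I have in mind - the field names are my own invention, and it uses PyYAML rather than regexes for brevity:

    import textwrap
    import yaml   # pip install pyyaml

    card = textwrap.dedent("""
        name: John Pym
        member_of:
          - Providence Island Company
        knows:
          - Gregory Gawsell
    """)

    person = yaml.safe_load(card)
    # dump the index card as crude subject-predicate-object triples
    for body in person.get('member_of', []):
        print((person['name'], 'memberOf', body))
    for other in person.get('knows', []):
        print((person['name'], 'knows', other))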

As a data source I’m using wikipedia and following links to compile my YAML folksonomy. Very geeky, but it keeps me amused.

And it’s quite fascinating in a geeky sort of way. For example, Thomas Rainsborough, a Leveller leader (in so far as the Levellers had leaders) was related by marriage to John Winthrop, the Puritan governor of Massachusetts and had also visited the Providence Island colony, even though he had no direct relationship with the directors of the Providence Island Company.

Once I’ve got a big enough data set I’ll transform it and feed it into Gephi and see what comes out.

However this is not just an exercise in geekery, it does have a degree of more general applicability.

Universities are very interested these days in the impact that their researchers have. Using similar social network analyses it ought to be possible to show who has collaborated with whom, and how regularly.

As a result of our Metadata stores project we actually have a lot of this data, and will shortly have it in an RDF expression.

Potentially, by analysing information such as the email addresses used in subsequent papers, it might be possible to show where secondary authors (typically graduate students and postdocs) have moved to. Coupled with some bibliometric data, this might just give us a measure of the impact of graduate students and postdocs within, say, five years of their moving elsewhere.

In other words trying to gauge the impact of researchers, not just research papers …

Thursday, 24 October 2013

Further thoughts on eresearch services

I’ve been watching the eresearch services thread from the eresearch 2013 conference. I’m beginning to regret not having gone - it looks to have been more interesting than I expected, but that’s life.
A lot of people seem to be getting interested in eresearch services. I’ve already expressed my opinions about such things, but generally a heightened level of interest is probably a good thing.
However there seems to be a bit of conflation going on, so here’s my $0.02:
  • eresearch is not the same as big data - big data refers to the handling of very big data sets and their analysis using inferential analyses, eresearch refers to the application of computational and numerical techniques to research data
  • eresearch is not the same as digital humanities - digital humanities really refers to the move to using electronic resources in the humanities - this move may enable the application of eresearch techniques to humanities research
  • astronomers, physicists, economists and many more have been using inferential analyses such as cluster analysis for many years - eresearch is not new, but its spread and penetration is
  • the rise of cheap (cloud) computing and cheap storage are key drivers in the adoption of eresearch by allowing bigger datasets to be handled more easily and cheaply
In short, you can do perfectly good eresearch with an old laptop and an internet connection, you don’t need all the gee-whizzy stuff, all you need is a problem, the desire to solve it, and a little bit of programming knowledge.
Any eresearch service is going to be there to support such work, by facilitating access and providing advice and support to researchers. In fact it’s taking on the research support role that belonged to computing services before the commodification of computing and the rise of the internet, when computing mainly meant time on a timesharing system to run some analytical software …
Written with StackEdit.

Wednesday, 23 October 2013

Data Science for business [book review]

Data Science for Business: What you need to know about data mining and data-analytic thinking
Foster Provost and Tom Fawcett
O'Reilly Media 

Data science is the new best thing, but like Aristotle’s elephant, people struggle to define exactly what data science is and what the required skills are.

When we see data science we tend to recognise what it is, that mixture 
of analysis, inference and logic  that pulls information out of numbers, be it social 
network analysis, plotting interest in a topic over time, or predicting the impact of the 
weather on supermarket stock levels.

This book serves as an introduction to the topic. It’s designed for use as a 
college textbook and perhaps  aimed at business management courses. It starts at a very 
low level, assuming little or no knowledge of statistics or of any of the more advanced 
techniques such as cluster analysis or topic modelling.

If all you ever do is read the first two chapters you’ll come away with enough 
high level knowledge to fluff your way through a job interview as long as you’re 
not expected to get your hands dirty.

With chapter three, things get a bit more rigorous. The book noticeably changes
gear and takes you through some fairly advanced mathematics, discussing
regression, cluster analysis and the overfitting of mathematical models, all of
which are handled fairly well.

It’s difficult to know where this book sits. The first two chapters are most 
definitely ‘fluffy’, the remainder demand some knowledge of probability theory 
and statistics of the reader, plus an ability not to be scared by equations embedded 
in the text.

It’s a good book, it’s a useful book. It probably asks too much to be ideal for the
general reader or even the non-numerate graduate; I’d position it as an
introduction to data analysis for beginning researchers and statisticians,
rather than as a backgrounder on data science.

[originally written for LibraryThing]

Tuesday, 22 October 2013

What does an eresearch service look like ?

There has been a lot of discussion about eresearch and eresearch services. However when you try and pin down what constitutes an eresearch service it seems to be all things to all people.

In an effort to try and find some consensus I did a very simple survey: I typed 'eresearch services' into Google and chose pages from Australian universities. I've tabulated the results of this fairly unscientific survey in a google spreadsheet.

Each institution of course described the service on offer differently, so the spreadsheet is purely my interpretation of the information available on the web.

There are some clear trends - all sites offer help with
  • storage
  • access to compute/virtual machines
  • cloud services
  • collaboration (which includes data transfer and video conferencing)
Other services tend to be more idiosyncratic, perhaps reflecting the strengths of individual institutions. However it's clear that a lot of the effort revolves around facilitation.

My personal view is that we do not try to second guess researchers. Instead of prescribing we facilitate by helping researchers get on with the business of research.

This is based on my experience. Over the course of the data commons project we fielded a number of out of band questions such as
  • Access to storage for work in progress data
  • Data management and the use of external services like dropbox and skydrive
  • Access to a bare vm to run an application or service
  • Starting a blog to chronicle a project
  • What is the best tablet for a field trip
  • How can I publish my data on the web
which suggests what researchers want is advice and someone to help them do what they want to do - a single point of contact.

Provision of a single point of contact hides any internal organisational complexity from the researcher, it becomes the contact’s problem to manage access to services and not the researcher’s.

There are of course other views - for example this presentation from eresearch 2013 but I think we can agree that what researchers want is easy access to services and a small amount of support and help ...

Monday, 14 October 2013

Recovering data serially

Over the past few weeks I've noticed a number of posts along the lines of

we've an old XYZ machine without a network connection, can anyone help with recovering data from it?

Not having an ethernet connection is a problem, but assuming that the machine still powers up and the disk spins, it might not be too much of one.

The key is to go looking to see if it has a terminal application. This isn't as odd as it sounds: a lot of computers were used to access timesharing systems in those pre-web days, and a terminal application was fairly standard.

The good thing about terminal applications back then is that they usually incorporated a serial file transfer protocol such as xmodem, ymodem, zmodem or kermit. Of these kermit is perhaps the best, not least because it can be put into server mode and you can push files from your host in batches.

The good news is that both lrzsz, the ?modem tools for linux, and ckermit are available for install on ubuntu from the repositories via apt-get.

Then all you need is a usb to 9 pin serial adapter cable and a nine pin serial null modem cable - both available from ebay for a few dollars - and you should be able to transfer data from your old machine to the new.

You will of course need to set up things like parity and baud rate, and it might be an idea to practise transferring data first by setting up a second linux machine and transferring data between the two - see http://www.stlinux.com/install/getting-started/serial for an example.
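For what it's worth, a receiving session on the linux box looks roughly like this - a sketch only, where the device name and speed are assumptions that need to match your adapter and the old machine:

    sudo apt-get install lrzsz ckermit   # the serial transfer tools
    kermit
    C-Kermit> set line /dev/ttyUSB0      # the usb to serial adapter
    C-Kermit> set speed 9600             # must match the old machine
    C-Kermit> set carrier-watch off      # null modem cable, so no carrier
    C-Kermit> set flow-control none
    C-Kermit> receive                    # then start a send from the old machine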

Despite this sounding a bit of a black art, it's actually quite easy. The other good thing is that a number of embedded communications devices are still configured over a serial port, so most network technicians still know something about debugging serial connections.

Once you have managed to establish a working connection you should then be able to get the serial communications software on your problematical machine to talk to your newly enabled serial host.

From there it's simply a matter of transferring the files across one by one and converting them to something usable - if they're wordprocessor files, LibreOffice can read most of the legacy formats and web based services like cloudconvert and zamzar can read many more ...

Written with StackEdit.

Thursday, 10 October 2013

Archiving, persistence and robots.txt

Web archiving is a hazardous business. Content gets created, content gets deleted, content gets changed every minute of every day. There's basically so much content you can't hope to archive it all.

Also, a lot of web archiving assumes that pages are static, even if they've been generated from a script - pure on the fly pages have no chance of being archived.

However you can usually assume that if something was a static web page and was there long enough, it will be on the wayback machine in some form.

Not necessarily it turns out. I recently wanted to look at some content I'd written years ago. I didn't have the original source, but I did have the url and I did remember searching successfully for the same content on the wayback machine some years ago. (I even had a screenshot as proof that my memory wasn't playing tricks).

So, you would think it would be easy. Nope. Access is denied because the wayback machine honours the site's current robots.txt file, not the one current at the time of the snapshot - meaning that if your favourite site changes its robots.txt between then and now to deny access, you are locked out.
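For example, a site that ships something like this today (ia_archiver being the Internet Archive's crawler) retroactively hides every snapshot ever taken of it:

    User-agent: ia_archiver
    Disallow: /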

Now there's a lot of reasons why they've enacted the policy they have but it effectively locks away content that was once public, and that doesn't seem quite right ...

Written with StackEdit.

Wednesday, 9 October 2013

Bufferapp

If you're someone who follows my twitter stream you may have noticed that I seem to post bursts of tweets around the same time every day.

This is because I've taken to using Bufferapp to stage some of my tweets. Basically Bufferapp is a little application, integrating nicely with Chrome and AddThis, which allows you to put tweets into a buffer to be reposted later in the day.

I only use the free version, which means that my buffer is only 10 deep, but that seems to cover most of the tweets I'm likely to make in a day. I'm not obsessive compulsive about twitter, no matter what it seems like.

Why use it?

One could imagine lots of scenarios, including making it look as if one was online when one wasn't, but my reasons are a little different. Basically I tweet about two topics - geeky stuff to do with computing and data storage, and equally geeky stuff about history and archaeology. There is of course an overlap - big digitisation projects and computational text analysis, for example - but in the main there are two topic groups and two topic audiences. (I had the same thing with my blogs, which is why I split them - this one is more technically focussed, while the other one is a bit more discursive and random.)

When I look at my twitter followers I can say very roughly that the computing and data people are in the same or adjacent timezones to me, but the people interested in the geeky history stuff are clustered in North America and Western Europe - of course that's not quite true, I have followers in South Africa and Chile to name but two, but it's a good enough approximation.

In other words the history followers tend to be between eight and eighteen time zones away from me on the east coast of Australia, and hence unlikely to be awake when I'm tweeting (well except for Chile and the west coast of America where there's a few hours of overlap).

So I've taken to using Bufferapp to delay the tweets for that audience, which has the effect of de-cluttering the feed for the computing and data people.

I'm still tweaking the schedule and I'm conscious (because some of my followers have said so) that some of both communities like a leavening of the other sort of information so it's not a hard split, and of course there's always the paper.li daily summary of the most popular tweets from both me and the people I follow ...
Written with StackEdit.

Tuesday, 8 October 2013

Usenet, VM's and Pan

Like most people who were in at the beginning of the internet as something widespread (for me, sometime around 1991, when JANET connected to the Internet and abandoned Coloured Books for TCP/IP), Usenet News filled the niche taken nowadays by twitter and blog feeds.

Usenet news fell apart and lost popularity in the main due to its being hijacked by trolls and other malefactors, with the result that people walked away from it when the signal to noise ratio got too low.

In fact I closed down work's usenet news server a few years ago. It was quite an interesting experience as we had a couple of downstream servers elsewhere that we provided a feed to under an SLA. Finding someone at the downstream sites who could remember what a usenet news server was and why we should agree to terminate the SLA (and the feed) was a task in itself. People really don't use it anymore.

However, despite that there's still a couple of technical newsgroups I still find useful, especially now the trolls have abandoned it for twitter and facebook, making the experience kind of like the old days.

To access them I use pan running on a minimal crunchbang linux vm.

This of course has the problem of getting the information out of pan and into somewhere useful - having that useful post sitting on a vm you run up once a week isn't really that useful.

There's lots of ways of solving that problem, but I didn't want to spend a lot of time installing extra software such as dropbox on the vm. My answer is incredibly simple and incredibly old school - install alpine on the vm, set up a dummy account on outlook.com, manually attach the usenet posts as text files and email them to my evernote account, my work account, or wherever suits.

Remarkably old school, but remarkably efficient ...

Written with StackEdit.

Monday, 23 September 2013

So what do you actually use - Q3 update

Since my Q2 update I have of course become a Chromebook user - and that's the major change this quarter ...
  • Dropbox – used mainly to sync files across computers irrespective of file format
  • Libre Office – platform agnostic document editor for off line writing. Often used in conjunction with Dropbox
  • Evernote – used as a notes and document management system (Nixnote is used on Linux to access my evernote files)
  • Wunderlist for 'to do' list management
  • Chrome – browser extraordinaire
  • Gmail – email solution
  • Postbox - lightweight email client for windows to cope with slow connections
  • Evolution - linux email client principally used in conjunction with Libre Office
  • Google docs – fast means to create quick and dirty documents irrespective of platform
  • Windows Live Writer – offline blog post creation
  • TextEdit – android text editor for note taking; integrates nicely with evernote and Gmail
  • Kate - my favourite editor
  • TextWrangler - my secondmost favourite editor
  • Stackedit - Google chrome markdown editor (and blog posting tool)
  • Pandoc - converts markdown to a range of other formats
  • Microsoft Skydrive – used for document backup
  • Excel Web App – for those occasions when Google Spreadsheets or Libre Office Calc will not do
  • GanttProject for Gantt chart generation
  • InoReader for RSS feed tracking
  • Twitter for tracking interesting things – rarely for messaging
  • Hosted Wordpress and blogger for blogging, and wikidot for creating structured web pages
  • Hojoki for tracking documents and tasks (Gives unified visibility of GoogleDocs, Skydrive, GitHub, Dropbox and Evernote)
The real change has been to the hardware used. My trusty old Android tablet is still in use for checking email and reading news websites at breakfast time - as evidenced by some of the gluckier marks on the screen. The newer seven inch device is still in use as a note taker and I see no reason to change for the moment. The real change, though, has been the Chromebook. It allows me to check my email, create quick and dirty drafts using either Google Docs or StackEdit, as well as surf the web and research things. If anything has ever demonstrated how much of my day to day reading and specification checking has moved to the web, the Chromebook certainly has.
It's also fast - well, fast enough - boots quickly and shuts down quickly. It's not a full featured computer but it most definitely provides on the go functionality.
In fact it shows why my original Asus netbook was such an effective tool and the windows netbook a bit of a clunker - basically load time. The platform is irrelevant, it's access to a browser that counts.
Incidentally, over the last month I've been failing to upgrade the memory in the windows netbook. It turns out that some netbooks use DDR2 and some DDR3; I got myself some DDR3 and it turns out mine uses DDR2. What is hopefully the correct module is currently somewhere between Shenzhen and here. If it works I'll write up the whole upgrade saga ...
Written with StackEdit.

Tuesday, 17 September 2013

Using Alpine with outlook.com

On Friday I wrote that you could now use imap with Microsoft's Outlook.com mail service.

I also wrote that I'd had problems getting it to work with evolution on a virtual box vm running crunchbang. I still don't know why it didn't work but I'm happy to report that Alpine - the newer updated replacement for Pine - works just fine on the same vm.

Pine is a mail program with a heritage going back to the early nineties and was one of the first mailers to use imap.

It was extensively used on multi user unix systems, and when I was managing York's early nineties managed pc desktop service we used pc-pine as the pc mail client, due to licensing and performance problems with the other windows imap clients available at the time. (This wasn't as much of a problem as it might have been: existing Unix pine users made the shift pretty seamlessly, we could reuse the training materials and documentation from the Unix version, and I wrote a program (in Turbo Pascal no less) to automatically populate a user's configuration file the first time they ran the application.)

I of course haven't used Pine seriously for years, and had never used Alpine in production, so I used Sanjeeva Wijeyesakere's post on setting up Alpine with gmail as a starting point. Basically, if you follow his advice but set the inbox path to

{imap-mail.outlook.com/ssl/user=myusername@outlook.com}inbox

and the smtp server to

smtp-server=smtp-mail.outlook.com/tls/user=myusername@outlook.com

as well as setting the domain name

user-domain=outlook.com

and the personal name

personal-name=myusername

it all worked. Obviously you replace myusername with your account name. Being old school I edited the .pinerc file directly with nano rather than using the Alpine configuration menu. You could of course use gedit, vi, or any other text editor.
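
For convenience, here are the relevant .pinerc entries collected in one place (again with myusername standing in for your account name, and assuming an otherwise default .pinerc):

personal-name=myusername
user-domain=outlook.com
inbox-path={imap-mail.outlook.com/ssl/user=myusername@outlook.com}inbox
smtp-server=smtp-mail.outlook.com/tls/user=myusername@outlook.com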
Written with StackEdit.

Specially for text analysis people

Project Gutenberg have just released A Middle English Vocabulary by John Ronald Reuel Tolkien, which looks to be a nicely structured document and one from which it would be comparatively easy to extract wordlists etc ...
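
For anyone who wants to try, a minimal R sketch for pulling a crude wordlist out of the plain-text version might look like the following - the filename is my assumption, so substitute whatever Gutenberg calls the download:

# read whitespace-separated tokens from the Gutenberg plain text
words <- scan("middle-english-vocabulary.txt", what = "character", quote = "")
# strip anything that isn't a letter and fold to lower case
words <- tolower(gsub("[^a-zA-Z]", "", words))
# keep the unique words, sorted alphabetically
wordlist <- sort(unique(words[words != ""]))
writeLines(wordlist, "wordlist.txt")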

Written with StackEdit.

Friday, 13 September 2013

Outlook.com and evolution

Microsoft have recently announced that their Outlook.com email service now supports IMAP.
So I thought I'd try it with that well-known Linux GUI email client, Evolution. I've previously got Evolution to work with Office 365, so I thought it would be straightforward - change the name of the servers, set the client to IMAP, and it should work (and it royally confused me to boot).

It does - or more accurately, with Evolution on Ubuntu on a standalone, real machine it works fine. On a CrunchBang Linux virtual machine it doesn't, for reasons I haven't got to the bottom of.

However, setting it up is quite simple. Basically, if you give it a Microsoft mail service style address, Evolution will try to set you up for POP.

To override this, change the server type under receiving mail to IMAP, set the server to imap-mail.outlook.com and the port to 993, with SSL encryption enabled. For sending mail, set the server to smtp-mail.outlook.com with port 587 and TLS encryption. It should then just work ...
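
In tabular form:

Access type Host Port Encryption
IMAP imap-mail.outlook.com 993 SSL
SMTP smtp-mail.outlook.com 587 TLS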

As I've said before, the only real use case for this is LibreOffice/OpenOffice integration and the ability to send emails from inside the application. I suspect that the number of habitual Linux users with outlook.com accounts is fairly small - unless, like me, you've been using the service since 1998.
Written with StackEdit.

Students and Public access computers

In the building in which I work we have a raft of public access computers for students.

In the old days - like last year - students would come up to them, log in, do work, such as writing papers or running some specialist software, log off, and go away.

Recently I've noticed an increasingly common trend whereby they put their laptop on the desk, work on their laptop, and at the same time use the public access computer to access some specialist resource.

It's not a rational behaviour but I've seen it often enough to reckon that it's a thing now.

It looks like either students have not worked out how to share content from their university filestore (it's a webdav mount), or we've failed in the communication business by either not telling them about this or not making it easy for them to push data back and forth.

It also lends credence to my belief that students have, to a large extent, self-outsourced their computing needs, and that the name of the game is connectivity and access to specialist services...
Written with StackEdit.

Thursday, 12 September 2013

Using the Scots stopword list on Barbour's Brus

Well, having made a stopwords file the thing to do is test it.

I chose to use the text of Barbour's Brus, as the Oxford Text Archive copy was fairly clean of inline markup - clean enough to fix by hand rather than by modifying my original text cleaning code.

The first time around the results were not quite what I expected:

[wordcloud image]
so I modified the stopword list by removing the following words:

king
kingis
lordis
lord
haly
and adding:

fayis
ner
yen
schyr
yan
yis
gan
towart
swa
her
gert
which gave a better representation:

[wordcloud image]
A little more tweaking might be required, but it has promise as a technique.
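
For what it's worth, the basic recipe is only a few lines of R. What follows is a minimal sketch rather than my exact script - it assumes the tm and wordcloud packages, the cleaned text in brus.txt, and the stopword list, one word per line, in scots-stopwords.txt:

library(tm)
library(wordcloud)

text <- readLines("brus.txt")
scots.stopwords <- readLines("scots-stopwords.txt")

corpus <- Corpus(VectorSource(text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, scots.stopwords)

# word frequencies over the whole poem
tdm <- TermDocumentMatrix(corpus)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
wordcloud(names(freq), freq, max.words = 100, random.order = FALSE)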

This statistical generation of stopword lists could also be applied to bodies of scientific literature: by generating discipline-specific extra stopword files, one could filter out the common noise words to get a better impression of a research group's strengths and focus from their published papers. That is increasingly important, as at least one study of search practices among researchers suggests a dependence on Google and, by implication, on its search algorithms.

Building topic or keyword extraction models may help counter this by allowing the generation of 'other related' lists ...



Making a Middle Scots stopword file ...

Over a year ago I played with topic modelling and wordclouds. As always the reason has not quite gone away, and a year on, I thought I'd better teach myself how to do it properly using R.

Now, one of the things I found when I played about with wordclouds is that if you feed Middle English text into a wordcloud it does help to have a Middle English stopword file.

Playing with the Gutenberg version of Troilus and Cressida I found it was quite easy using R to come up with a stopwords file based on the 100 most common words in the file excluding the names of the protagonists.

The choice of 100 words is purely arbitrary - Ranks.nl links to some example stopword files for modern English, and they sit around the 200 +/- 50 mark. Chaucer used just over 5600 distinct words in Troilus, so we'll assume that the hundred most common words are a valid stopword list. (In fact, applying the eyeball test, a stopword list of around 70 is probably close enough.)

Now, a stopword list based on a single poem might be interesting, but it's not very useful. You need a number of poems to come up with a stopwords file that's valid for a particular author.

Then you can do such tricks as comparing the frequency of words (minus the stopwords) between poems. If one has a very different distribution of words it might be by a different author.
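
As a sketch of the idea - my illustration, not something I've run yet - with the stopwords removed you can compare the frequency profiles of two poems over their shared vocabulary with a simple chi-squared test:

# words.a and words.b are cleaned word vectors for the two poems,
# with the stopwords already removed
freq.a <- table(words.a)
freq.b <- table(words.b)
shared <- intersect(names(freq.a), names(freq.b))
counts <- rbind(as.numeric(freq.a[shared]), as.numeric(freq.b[shared]))
# in practice you'd restrict this to the commonest shared words,
# to avoid cells with very low expected counts
chisq.test(counts)  # a very small p-value suggests different word use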

So, having discovered how to make a stopwords file, I thought I'd make one for Middle Scots and then see if I can find frequency differences between poems by various authors, as well as using it to generate wordclouds.

For the corpus I chose the works in the Oxford Text Archive's Early Scottish Texts collection. I chose Middle Scots quite deliberately, as (a) it was different enough from contemporary English in its spelling to treat as if it were a different language, (b) there was a decent body of online text available, and (c) it didn't do anything complicated with word endings other than using -is for a plural rather than -s.

This meant that I could use standard off-the-shelf programs written for contemporary English, but simulate running them over a different language with all the default assumptions turned off, rather than relying on someone else's choice of stopwords.

The files came with some angle-bracket-delimited non-standard markup, probably intended to be read by some other program. I wrote a simple perl script to remove this markup, strip irrelevant bracketed inline text such as ( ITM ) and a few other stray characters, and, while I was at it, convert the files to lower case for future processing.

I didn't try to fix any orthographic quirks - I made the assumption that all the likely stopwords would be words in common use with an agreed spelling. Given that I'd ended up with a sample of around 860,000 words, I was working on the basis that any really common variant would probably turn up in the stopwords file anyway.

After some final processing with R, the source text contained just under 50,000 unique items, which is probably a rich enough corpus, although this may be masking orthographic quirks. The resulting stopword list consists of the first 200 words in the frequency list.
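
The generation step itself is short. Again, this is a sketch rather than my exact code - it assumes the cleaned, lower-cased texts have been concatenated into corpus.txt:

# read the corpus as whitespace-separated tokens
words <- scan("corpus.txt", what = "character", quote = "")
words <- gsub("[^a-z]", "", words)   # drop stray punctuation
words <- words[words != ""]
# frequency table, most common first
freq <- sort(table(words), decreasing = TRUE)
# the stopword list proper, and the top 500 for inspection
writeLines(names(freq)[1:200], "scots-stopwords.txt")
write.csv(data.frame(word = names(freq)[1:500],
                     count = as.numeric(freq)[1:500]),
          "scots-top500.csv", row.names = FALSE)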

Looking further down the frequency list, I'd say there are four words you might also wish to exclude (that is, add to the stopword list):

prince (444th most common)
kingis (462nd)
knycht (565th)
lordis (526th)

The csv file containing the 500 most common words is also available for download if you wish to make your own decisions as to what should be in the stopwords file ...

Monday, 9 September 2013

Skydrive and the student filestore

A year ago I blogged that Skydrive had killed the student filestore. At the time I argued that, as students increasingly had multiple computing devices, they would tend to self-outsource and store their work on solutions such as Google Drive, Skydrive and Dropbox.

And a year on, I see no reason to think otherwise. Students do seem to be using such solutions to store their work. They probably start at high school or college, and carry the habit on to university.

For the rest of this post I'll talk about skydrive, but it's really shorthand for all the options out there.

The one thing that skydrive doesn't do is provide network shares. You cannot use it interactively with an application as if it was a bit of mounted filestore.

As always, that's not quite true: there are solutions like Gladinet that let you do this, which is useful for things like automated background backup, but it really doesn't give you a truly interactive service like a share, purely because it's just too slow.

So that got me thinking again about filestores and why we have them. In part it's tradition, just like providing an interactive time-sharing Unix box. We've done it, so we keep on doing it, ignoring the fact that the box never has more than two or three sessions live on it at any one time.

We started providing filestore on a large scale to students twenty or more years ago, when computers started becoming really common in education. In the main we did it because we couldn't provide a computer for everyone, and so went for the public toilet model of computer provision - lots of more or less similar computers with more or less similar software. It didn't matter which one you used; it was all the same.

Students of course needed somewhere to store their work between sessions, and making them use floppy disks or other local removable storage was impractical for a whole range of reasons, so we took to providing filestore.

In the meantime, computers have become cheap enough so that anyone who can afford course fees can afford a computer, and one that contains more storage than anyone is likely to use over their course.

The result is that students have self-outsourced all the routine tasks like essay writing and project reports.

In fact the only reason for providing filestore is to allow access to specialist software, whether we deliver this via some sort of VDI solution or via the classic public toilet model. In short, students need enough storage for coursework that requires the use of specialist facilities, and a means of getting data off it.

They have more storage available through services like skydrive than we are likely to provide.

A few days ago I trawled some UK university websites (I chose the UK because it is the start of the academic year there, and thus what they say about provision is current).

Most sites seem to offer between 1 and 2GB of storage - quite a lot offer only 1GB - significantly less than Skydrive's default of 7GB and Amazon's 5GB. However, they all offer ways of easily moving data to and from the filestore, i.e. there is a tacit admission that they are no longer the primary storage provider.

So, what does this mean?

As long as students need access to specialist facilities they will still need filestore as a place to write out their work and to store work in progress between lab sessions.

This storage requirement is fairly modest as students have ready access to other storage and consequently we should actively expect them to wish to upload and download data.

The storage requirement then ceases to be onerous - a few terabytes at most - and one that can easily be met by off the shelf NAS technology. Implicit in this is moving responsibility for looking after their work to students, rather than looking after it for them, meaning that we no longer put substantial resources into mirroring or backing up the student filestore, as we treat it purely as work-in-progress filestore.

Given that students have already self-outsourced, this is not as big a change as it might seem, but it is a move that should not happen by default.

Such a move should also be accompanied by a push for better education about data management, the sensible use of commercial storage services, and the risks involved ...

Wednesday, 4 September 2013

When your desktop is in the cloud ...

... you don't care about the operating system on the device in front of you.

The old arguments about how the software base is what sold machines and put Windows in such a dominant position no longer apply. If (and it is still a big if) everything you use is abstracted to the cloud, you really don't care.

You then start buying the thing in front of you on other factors, such as look, feel, street cred and the rest. This may in part explain the growing preference for Macs - the triumph of design over utility. (I'm not immune to this, as my ten dollar watch story shows.)

It then starts to matter which cloud software ecology you belong to - Google has one, Microsoft and their hardware friends have one, Zoho has an interesting set of tools. And Apple doesn't really have one ...

Written with StackEdit.

Tuesday, 3 September 2013

Student Computer use

It's spring here in Canberra, with the days in the low to mid twenties, even if the nights are still chilly.
On campus this means that the students finally emerge from their winter burrows and start sitting out during the day, even though they are still doing work.

In my day this usually meant intimidatingly thick textbooks, or xeroxed copies of research papers, plus a notepad or two. The technology of the mid seventies didn't allow for much more, although in one share house we bought an old desk and put it in the back yard, and I have happy memories of banging away on a portable typewriter in the evening sun with a glass of cheap Romanian red ...

Well, these days students seem to have got over the textbook and notepad thing and use their laptops out of doors. Given we've got wifi just about everywhere, this means they can sit outside and work, providing it's not too far from a building.

So, in the spirit of my informal surveys of how people read books on the bus, here's my totally unscientific survey:
  • MacBooks are the computer of choice, overwhelmingly so
    • The most preferred MacBook appears to be the Air
  • There is no dominant recognisable preferred Windows-based computer brand
    • All the common retail brands are represented, plus some Vaios
    • There's a preference for ultrabooks
  • No one seemed to be using a tablet out of doors, with or without a keyboard

Given that students carry them around all day, I can see why ultrabook-style computers are popular. I was surprised at the lack of tablets with keyboards, given their better battery life and their potential as note takers.


Of course it could be that only the cool people are out of doors and the rest are inside beavering away on their clunky old laptops ...
Written with StackEdit.