Friday 22 November 2013

Impact (again!)

I’ve been thinking some more about impact and what it means for datasets.

From a researcher’s point of view, impact is about trying to create something akin to a Klout score for academics.

Klout is a social media analytics company that generates a ranking score using a proprietary algorithm, one that purports to measure influence from the amount of social media engagement a person generates.

Because I experiment on myself, I know what my Klout score is - an average of 43 (+/- 1), which is respectable but not stellar. Now the interesting things about this score are:
  • While I’ve connected my Facebook account to it, I don’t actively use Facebook
  • I have a small band of loyal Twitter followers (230 at the last count)
  • Google Analytics shows my blogs have a reasonable penetration, with an average readership of around 30 per post
In other words, while I am active in social media terms, I’m not massively so. So Klout must be doing some qualitative ranking as well as some quantitative ranking - perhaps along the lines of: X follows you, X has N followers, and these N followers have an average of M followers. I’m of course guessing - I have no idea how they actually do it. The algorithm is proprietary, and the scoring system could be completely meaningless.
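
To make that guess concrete, here is a minimal sketch of the kind of scoring I’m imagining. The function, the log weighting and the follower numbers are all my own invention - the real algorithm is proprietary and could work quite differently:

```python
import math

def influence(follower_counts):
    """Toy influence score: each follower contributes log(1 + their own
    follower count), so being followed by well-followed people counts
    for more than raw follower numbers alone."""
    return sum(math.log1p(n) for n in follower_counts)

# e.g. my 230 followers, assumed to have an average of 150 followers each
print(round(influence([150] * 230)))
```

The point of the log weighting is just that a score like this mixes the quantitative (how many followers) with a crude stand-in for the qualitative (how connected those followers are).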

It is however interesting as an attempt to measure impact, or in Klout terms social media influence.

Let’s turn to research papers. The citation rates of scientific papers reflect their influence within a particular field, and we all know that a paper in the Journal of Important Things gets you a nice email from the Dean, and one in Proceedings of the Zagistani Cybernetics Institute does not. And this of course is the idea behind bibliometrics and attempting to quantify impact. Crudely, a paper in a well-respected journal is likely to be more widely read than a paper in a less respected one.

Even if it is not widely cited, such a paper has had more influence than one that is less widely read.
And of course we know this probably is, more or less, true. If you’re an ethologist, you’re probably going to want some publications in Animal Behaviour on your CV.

So we can see that this could sort of work within disciplines, or at least those in which journal publication is common. There are disciplines, such as computer science, where a lot of the material is published in conference proceedings, and that’s a slightly different game.

Let’s now think about dataset citation. By its nature, data that is widely available is open access, and there is no real established publication infrastructure, with the exception of a few dedicated specialist repositories such as the Archaeological Data Service (ADS) in the UK and IRIS in the US for Earth Sciences.

These work because they hold a critical mass of data for the disciplines, and thus archaeologists ‘know’ to look at the ADS just as ethologists ‘know’ to look at Animal Behaviour.

Impact is not a function of the dataset alone, but of some combination of its accessibility and dissemination. In other words it comes down to
  • Can I find it?
  • Can I get access to it?
Dataset publication and citation is immature. Sites such as Research Data Australia go some way towards this by aggregating the information held in institutional data repositories, but they are in a sense a halfway house. If I was working in a university in the UK, would I think to search RDA? Possibly not. And remember that most datasets are only of interest to a few specialists, so they are not going to zoom up the Google page rank.

At this stage of the game there are no competing repositories in the way that there are competing journals, which means that we can simply use raw citation rates to gauge influence. And to use citation rates we need to be able to identify individual datasets uniquely - which is where digital object identifiers (DOIs) come in: not only do they make citation simpler, they make counting citations simpler …
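
As a minimal sketch of why unique identifiers matter here: once every dataset carries a DOI, counting citations is literally just tallying identifier occurrences. The DOIs below are made up for illustration:

```python
from collections import Counter

# Hypothetical citation records harvested from papers; the DOIs are
# invented for illustration. Because a DOI identifies exactly one
# dataset, counting citations reduces to counting occurrences.
cited_dois = [
    "10.1234/dataset.alpha",
    "10.1234/dataset.alpha",
    "10.5678/dataset.beta",
]

for doi, n in Counter(cited_dois).most_common():
    print(doi, n)
```

Without a unique identifier, the same dataset cited under three slightly different names would look like three different datasets, and any raw citation count would be meaningless.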
