Thursday, 13 March 2014

Disambiguation and Altmetrics

Elsewhere, I’ve posted about the fact that, as part of the day job, we’ve developed a prototype solution to create ORCID identifiers for researchers programmatically, rather than each researcher having to go and create one individually.

Why did we do this?

In a word: disambiguation. Increasingly funding bodies require evidence of activity, and as the nature of scholarly publication changes this increasingly includes grey literature, conference presentations, exhibition catalogues and the like.

And people are just not consistent in the name they use. They change their surnames due to marriage (or divorce), they abbreviate their forenames, they use a different forename informally from formally, they reverse their surname/forename order because they publish in a Hungarian excavation report, they adopt an informal western sounding name to use among their peers but publish under their formal Asian name, etc etc.

Names basically are completely useless as a consistent and persistent identifier. While most people use only a small set of variations on their name, the sum of all the possible variations is amazing.

ORCID solves this problem. As a sixteen digit number it is totally free of cultural biases, and also too big to be easily remembered. It even covers the problem of what to do about those cultures who refer to someone by another name once they have passed on. (To explain: the Greeks and Romans had a belief that someone only lived on in the afterlife as long as their name was remembered - this is why they put names on headstones, and indeed why we do the same thing. They are not unique in this belief, but a number of other cultures have an equally deep-seated belief that referring to someone by the name they used when they were alive encourages their ghost to hang around and annoy people.)

So ORCID works as an identifier, and it would be possible to build a database of other parallel identifiers to allow us to say that a given name or identifier maps to a particular ORCID identifier, and that would let us measure impact.
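To make the idea of a database of parallel identifiers a little more concrete, here is a minimal sketch in Python. It is not what we built in the day job, just an illustration. The class name `AliasIndex`, the alias kinds ("name", "twitter", "blog"), the sample researcher and the blog address are all made up for the example, and the ORCID used is the fictitious sample identifier from ORCID's own documentation, standing in as a placeholder.

```python
import unicodedata
from collections import defaultdict

# Placeholder only: ORCID's documented fictitious sample identifier,
# not the id of anyone mentioned in this post.
PLACEHOLDER_ORCID = "0000-0002-1825-0097"


def normalise(alias: str) -> str:
    """Fold case, strip accents and collapse whitespace so trivial variants match."""
    decomposed = unicodedata.normalize("NFKD", alias)
    without_accents = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    return " ".join(without_accents.casefold().split())


class AliasIndex:
    """Hypothetical lookup table mapping (kind, alias) pairs to one ORCID id."""

    def __init__(self):
        self._by_alias = {}
        self._by_orcid = defaultdict(set)

    def register(self, orcid: str, kind: str, alias: str) -> None:
        key = (kind, normalise(alias))
        existing = self._by_alias.get(key)
        if existing is not None and existing != orcid:
            # The same alias claimed by two ids is exactly the ambiguity
            # we are trying to surface, so refuse rather than overwrite.
            raise ValueError(f"{key} already maps to {existing}")
        self._by_alias[key] = orcid
        self._by_orcid[orcid].add(key)

    def resolve(self, kind: str, alias: str):
        """Return the ORCID id for an alias, or None if it is unknown."""
        return self._by_alias.get((kind, normalise(alias)))

    def aliases_for(self, orcid: str):
        """Return every known (kind, alias) pair for an ORCID id, sorted."""
        return sorted(self._by_orcid.get(orcid, ()))


index = AliasIndex()
# Every variant has to be recorded explicitly: normalisation handles case
# and accents, but not reversed name order or an unrelated pseudonym.
index.register(PLACEHOLDER_ORCID, "name", "Mária Kovács")
index.register(PLACEHOLDER_ORCID, "name", "Kovács Mária")
index.register(PLACEHOLDER_ORCID, "twitter", "mkovacs")
index.register(PLACEHOLDER_ORCID, "blog", "diggingnotes.example.org")

print(index.resolve("name", "KOVÁCS  Mária"))
print(index.resolve("twitter", "someone_else"))
```

Normalisation only catches the trivial differences. Reversed name order, married names and pseudonyms each need their own row, which is the real point: somebody, or something, has to assert that each alias belongs to the identifier.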

We then have to deal with altmetrics. Altmetrics is a move to try and measure the chatter around a researcher and by implication the degree of influence. This is an area fraught with difficulty, but there is also a problem - most influence measurers are self selecting - you have to sign up to Klout or ImpactStory or some other service.

This requires that individuals do this, which means that they have to be sufficiently interested and invested in the process to do so. And being human some will and some won’t and some will have to be induced with carrots.

At the moment, though, there are precious few carrots. There is also half an assumption that people keep their professional blogging and tweeting separate from their personal tweeting and blogging.

I suspect this is not the case. I started with one blog and then split it out. My twitter feed reflects my interests - some work based and technical, others to do with my interests in history and archaeology.

And this gives us a problem. How do we know which blog or tweet counts for impact?

We don’t. We can’t.

We can ask people to nominate particular RSS feeds, be they tweets or blogs, but that’s as far as it goes, and that brings us back to motivation.

And once we’ve solved that problem we have the interesting problem of attaching twitter handles and blog authors to names.

Some people, like me, are fairly boring and predictable in their choice of handle; in my case it’s usually dgm or moncur_d, or more rarely moncurdg.

Outliers like dougM are usually the result of some weird automatic allocation rule.

Other people have flights of fantasy - it’s a bit like the problem of Asian researchers who adopt a western style pseudonym - sometimes it’s a rendering of the meaning of their name, sometimes it’s something that sounds like their formal Asian name, and sometimes it’s completely random.

Essentially this means one of two things - we either have a year zero for altmetrics where everyone agrees that we all link our blogs and twitter handles to something like our ORCID iDs, or we have a mess. If we have a mess, altmetrics will only ever give us a partial measure of impact.

I’m betting on a mess …

