Thursday, 14 April 2011

ISBNs as persistent identifiers

Ever since I discovered LibraryThing, evenings chez Moncur have involved an hour or so squatting on the floor with our Asus netbook, adding our books to our LibraryThing collection. And it's a strangely therapeutic activity.

Basically, the routine is something like this:
  • make pot of tea
  • fire up netbook and connect to LibraryThing
  • decide which shelf to enter
  • pull books off shelf
  • find and enter each book's ISBN in turn, while drinking tea
  • return books to shelf
  • log out and help cook dinner
The reason it's so mindlessly enjoyable is the power of the ISBN. ISBNs uniquely identify each book (sometimes each edition) published since round about 1970. Because of this they're in all the major public catalogues, so adding a book simply involves getting LibraryThing to look up the ISBN against a reference source (in my case usually Amazon's UK catalogue), and there's the data.
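
As a rough illustration of why the ISBN works so well as a lookup key, here is a minimal Python sketch. The check-digit rules are the standard ISBN-10 and ISBN-13 ones; everything else (the function names, the tiny in-memory `CATALOGUE`, the placeholder record) is made up for the example and is not how LibraryThing or Amazon actually do it.

```python
DIGITS = set("0123456789")


def clean(isbn: str) -> str:
    """Strip hyphens and spaces, and upper-case a trailing ISBN-10 'x'."""
    return isbn.replace("-", "").replace(" ", "").upper()


def is_valid_isbn10(isbn: str) -> bool:
    s = clean(isbn)
    if len(s) != 10 or not set(s[:9]) <= DIGITS or s[9] not in DIGITS | {"X"}:
        return False
    values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    # Weights run 10, 9, ..., 1; the weighted sum must be divisible by 11.
    return sum(w * v for w, v in zip(range(10, 0, -1), values)) % 11 == 0


def isbn13_weighted_sum(digits: str) -> int:
    # Weights alternate 1, 3, 1, 3, ... from the left.
    return sum((1 if i % 2 == 0 else 3) * int(c) for i, c in enumerate(digits))


def is_valid_isbn13(isbn: str) -> bool:
    s = clean(isbn)
    if len(s) != 13 or not set(s) <= DIGITS:
        return False
    return isbn13_weighted_sum(s) % 10 == 0


def to_isbn13(isbn: str) -> str:
    """Return one canonical 13-digit key for either form of ISBN."""
    s = clean(isbn)
    if is_valid_isbn13(s):
        return s
    if not is_valid_isbn10(s):
        raise ValueError(f"not a valid ISBN: {isbn!r}")
    body = "978" + s[:9]  # drop the old check digit, add the 978 prefix
    check = (10 - isbn13_weighted_sum(body) % 10) % 10
    return body + str(check)


# Hypothetical stand-in for a reference catalogue, keyed on ISBN-13.
CATALOGUE = {
    "9780306406157": {"title": "(placeholder record)"},
}


def look_up(isbn: str):
    """Return the catalogue record for an ISBN, or None if it isn't there."""
    return CATALOGUE.get(to_isbn13(isbn))
```

With this, `to_isbn13("0-306-40615-2")` gives `"9780306406157"`, so the same book is found whether you type the old ten-digit number from the back cover or the thirteen-digit one under the barcode. A mistyped digit fails the check and raises an error instead of silently matching the wrong book.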

So far I've found only one mistake in the Amazon database, and have only had to use an alternative catalogue source, the National Library of Australia, on three or four occasions. I've been quietly amazed at what's in there: things that you would think difficult, such as a book published in Thailand (with a Thai ISBN) about Laos, are just there. I've only had to do serious detective work on one book, an English translation of a German art book on Egon Schiele.

So ISBNs are the poster child of the persistent identifier world. But when we look at institutional repositories, the use of persistent identifiers is spotty to say the least, and governed by the choice of repository software. Archives of research data are pretty spotty as well, and some registries (the catalogues of the dataset world) are not that great either.

ISBNs have been a success because they filled a need for the book trade and libraries: to ensure that books are correctly described, so that when someone orders a book they get what they ordered, rather than something with a similar-sounding title or author ...

In the data archiving world, ideally one would like to link datasets to publications and also to researchers, but we lack unambiguous primary keys, which is what decent persistent identifiers would give us. We also currently seem to lack a clear driver to introduce such a scheme to enable unambiguous dataset citation ...
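
To make the primary key point concrete, here is a small Python sketch. All the identifiers (`researcher:0001`, `dataset:0001`, `publication:0001` and so on), names, and titles are invented placeholders, not any real identifier scheme; the point is only the difference between matching on a name and matching on a key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Researcher:
    researcher_id: str  # hypothetical persistent identifier
    name: str


@dataclass(frozen=True)
class Dataset:
    dataset_id: str          # hypothetical persistent identifier
    title: str
    creator_ids: tuple       # researcher identifiers, not names
    publication_ids: tuple   # identifiers of publications using the data


# Two different people who happen to share a name.
RESEARCHERS = [
    Researcher("researcher:0001", "J. Smith"),
    Researcher("researcher:0002", "J. Smith"),
]

DATASETS = [
    Dataset("dataset:0001", "Survey results", ("researcher:0001",), ("publication:0001",)),
    Dataset("dataset:0002", "Sensor readings", ("researcher:0002",), ("publication:0002",)),
]


def researchers_named(name: str) -> list:
    """Matching on a name can return several different people."""
    return [r for r in RESEARCHERS if r.name == name]


def datasets_for_researcher(researcher_id: str) -> list:
    """Matching on an identifier returns exactly that person's datasets."""
    return [d for d in DATASETS if researcher_id in d.creator_ids]


def datasets_for_publication(publication_id: str) -> list:
    """The same key lookup links a publication back to its data."""
    return [d for d in DATASETS if publication_id in d.publication_ids]
```

Here `researchers_named("J. Smith")` returns two people, so a citation by name alone can't say whose data it is, whereas `datasets_for_researcher("researcher:0001")` returns just the one dataset. That is the job ISBNs do for books, and the job a dataset identifier would need to do.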
