Tuesday, 12 March 2013

Newspaper archives and standards

As mentioned, I've been playing with QueryPic to track down contemporary Australian newspaper reports of the American Civil War commerce raiders Shenandoah and Alabama.

None of this is new research - it's well known to historians - but as well as being an intrinsically interesting story, what fascinates me is the effect of the lag caused by the lack of direct telegraphic connections.

For example, when the Shenandoah appeared off Cape Otway it was headline news, not only because it brought far off events home but because it was completely unexpected - the nearest contemporary comparison I can draw is with an incident during the Falklands conflict in 1982 when a British Vulcan V-bomber suffered mid-air refuelling problems and made an emergency landing in Brazil - to the accompaniment of fascinated Brazilian media coverage.

The reason I can fiddle about and do this is that it is all free as a result of various Australian and New Zealand digitisation initiatives.

Elsewhere, newspapers have built and financed their own digital archives, and quite obviously want to recoup some of the cost by charging.

For example, if I want to look at how the Irish Times reported the arrival of the Shenandoah in November 1865, I can search the archives and the Irish Times archive service will show me snippets of possibly relevant articles and charge me 10 euros for a day's access if I want to go further.

All this is perfectly good. Servers, digitisation, indexing and OCR'ing text all cost money to set up and cost money to maintain - and given I look after similar operations in my professional life and know how much these things cost to deliver, I'd say that the Irish Times is not making a massive profit out of it.

Let's say I pay my 10 euros and find what I want and then decide to access the London Times archive. It has (fortunately) a search interface very similar to the Irish Times, and is less interested in charging me.

It would of course have been quicker and simpler if I could have run the search against the two newspapers' archives (and others) simultaneously.

As it looks as if both are using the same content management software, one might think that there might be a common API. If so, this should make developing something like QueryPic relatively simple - all I need is a common API to let me retrieve the search results and information, and a way of indicating if there is a charge to access the article itself - rather in the manner of the New York Times.
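To make that a bit more concrete, here is a rough Python sketch of what a tool might do with such an API. Everything in it is an assumption on my part: neither archive publishes anything like this as far as I know, and the Hit record, the field names (date, headline, snippet, fee_eur) and the helper functions are all made up for illustration. It only shows the two things I'm asking for: search results in a common shape, and a flag saying whether the full article is charged for.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Hit:
    """One search result in a hypothetical common format."""
    archive: str               # which newspaper archive the hit came from
    published: date            # publication date of the article
    headline: str
    snippet: str
    fee_eur: Optional[float]   # None means the full article is free to read


def parse_hit(archive: str, record: dict) -> Hit:
    """Turn one JSON-style record (field names are assumptions) into a Hit."""
    return Hit(
        archive=archive,
        published=date.fromisoformat(record["date"]),
        headline=record["headline"],
        snippet=record.get("snippet", ""),
        fee_eur=record.get("fee_eur"),
    )


def merge_hits(*result_sets: Iterable[Hit]) -> List[Hit]:
    """Combine hits from several archives into one list, oldest first."""
    merged = [hit for hits in result_sets for hit in hits]
    return sorted(merged, key=lambda h: (h.published, h.archive))


def describe(hit: Hit) -> str:
    """One-line summary that flags whether the full article is charged for."""
    access = "free" if hit.fee_eur is None else f"{hit.fee_eur:.2f} EUR"
    return f"{hit.published.isoformat()} [{hit.archive}] {hit.headline} ({access})"
```

With something like this, a QueryPic-style tool would call each archive's search endpoint, pass every returned record through parse_hit, and hand the lot to merge_hits to get a single timeline across newspapers, with the charge (if any) shown next to each hit.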

Let's be clear - all that is required is that the newspaper archives concerned provide an API - I'm not expecting them to do any of the development work or indeed forgo any of their charges, only to provide a mechanism to aid the development of external tools as an aid to research ...
