Tuesday, 13 October 2009

Plagiarism and authorship

Yesterday I tweeted a link to an article about someone who had used a plagiarism detection package to show that perhaps Shakespeare did not work alone, at least on his early plays.

As always, the article was accompanied by the faint sound of empires being defended, as reputations and years of literary scholarship were felt to be under threat from the rude mechanicals.

Actually it's quite a clever technique. For example, the Chronicle of Fredegar is thought to be derived in part from Gregory of Tours' History of the Franks, and to have been compiled by three separate authors.

Statistical analyses of phrase frequency (for that's all the plagiarism detection packages really are) would let us show whether that was plausible.
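To make "phrase frequency" concrete, here is a minimal sketch in Python. It is not how any particular plagiarism package works; it simply counts two-word phrases in each text and compares the counts with cosine similarity, which is one common choice among several. The function names and the choice of two-word phrases are my own assumptions for illustration.

```python
import math
import re
from collections import Counter


def phrase_counts(text, n=2):
    """Count every run of n consecutive words (a 'phrase') in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def cosine_similarity(counts_a, counts_b):
    """Compare two phrase-count tables: 0.0 = no shared phrases, 1.0 = same proportions."""
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[p] * counts_b[p] for p in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # one of the texts was too short to contain a phrase
    return dot / (norm_a * norm_b)
```

A real analysis would need much longer passages than a sentence or two, and would have to cope with variant spellings and scribal abbreviations before any counting made sense.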

Equally, for many medieval texts there is no definitive source. All we have are copies of copies, from which we synthesise a likely translation. Nothing wrong with that: translation has always been in part a creative activity to make texts read well.

However, what the plagiarism detection systems could possibly do is allow us to see which texts most closely resemble each other.

So suppose we have four texts, A, B, C and D. If we can show that B closely resembles A, that both reasonably resemble C, as C does D, but that D differs from A and B more than it does from C, we could guess that A and B are copies of each other, that one of them was copied from C, and that D was copied from C separately, perhaps by someone else entirely.
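The four-text comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four sentences are invented stand-ins rather than real manuscript readings, and the measure used (Jaccard overlap of shared two-word phrases) is one simple option, not the method of any actual package. The names shingles, jaccard and rank_pairs are hypothetical.

```python
import re
from itertools import combinations


def shingles(text, n=2):
    """Return the set of n-word phrases that occur in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Shared phrases divided by all phrases found in either text."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_pairs(texts, n=2):
    """Score every pair of texts and list the pairs from most to least alike."""
    sets = {name: shingles(body, n) for name, body in texts.items()}
    scores = {
        (x, y): jaccard(sets[x], sets[y])
        for x, y in combinations(sorted(sets), 2)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Invented toy readings, purely for illustration.
texts = {
    "A": "the king rode north to the city with all his men",
    "B": "the king rode north to the city with all his host",
    "C": "the king rode north to the old city with many of his men",
    "D": "the king went north to the old city with many of his men",
}

ranked = rank_pairs(texts)
for (x, y), score in ranked:
    print(f"{x}-{y}: {score:.2f}")
```

With these toy readings A and B come out as the closest pair, C and D next, and B and D furthest apart, which is the pattern described above. Note that a similarity score says nothing about who copied from whom; working out the direction still needs other evidence.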

Scholars have been doing this by hand for years, and possibly with greater accuracy. However, computers are good at counting things, and with cheap OCR and the digitisation of transcriptions of the various manuscripts via programs such as Google Books, it would be possible to run these analyses relatively simply and cheaply.

Even if all it does is confirm existing scholarship, we have learned something. If it throws up something else, that could be rather interesting ...

