Thursday, 16 August 2012

Gawayne the Green Knight meets the wordcloud

I've always had an affection for Sir Gawayne and the Green Knight (the first bit of real Middle English I read), so as a final act of fiddling about with wordcloud I thought I'd feed the Gutenberg version into the IBM wordcloud software, just to see what came out.

The result neatly demonstrates the need for a proper Middle English stopwords file. Hacking my original file to produce an extended, though still very incomplete, one gives something a little better.

This shows that one of the things we need, if we are to take this beyond playing with nineteenth- and twentieth-century English text, is a set of agreed stopword files for analyses.
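The IBM software does its counting internally, so the following is only a rough Python sketch of the general idea: a word cloud is essentially sized by word frequency, and a stopword file is applied before counting. The function names and the stopword list are hypothetical. The list is a deliberately incomplete illustration, not the extended file used for the word cloud and not any agreed standard.

```python
import re
from collections import Counter

# Hypothetical, deliberately incomplete Middle English stopword list.
# Illustrative only: not the file used for the word cloud, not a standard.
ME_STOPWORDS = {
    "and", "the", "of", "in", "to", "a", "as", "at", "on", "for",
    "he", "hit", "his", "hym", "hem", "ho", "þay", "þe", "þat",
    "wyth", "with", "watz", "was", "is", "so", "ful",
}


def load_stopwords(lines):
    """Build a stopword set from an iterable of lines (e.g. an open file).

    One word per line; blank lines and lines starting with '#' are skipped.
    """
    words = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.add(word)
    return words


def word_frequencies(text, stopwords, top_n=10):
    """Return the top_n most common words in text, ignoring stopwords.

    Tokens are runs of Unicode letters, so Middle English letters such as
    thorn and yogh stay inside words. Text is lower-cased before matching.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(t for t in tokens if t not in stopwords)
    return counts.most_common(top_n)


if __name__ == "__main__":
    line = "Siþen þe sege and þe assaut watz sesed at Troye"
    print(word_frequencies(line, ME_STOPWORDS))
```

To use an edited stopword file instead of the built-in sample, something like `load_stopwords(open("middle_english_stopwords.txt", encoding="utf-8"))` would work, where the filename is purely hypothetical. Keeping the list in a plain one-word-per-line file makes it easy to extend and share, which is the kind of agreed resource argued for here.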

This would clearly also apply to analyses of other languages, be it Malay or Old Irish.
