Wednesday, 15 August 2012

Chaucer wordcloud

And finally, for fun, I fed the Project Gutenberg collected works of Chaucer into the wordcloud software...

This is actually quite interesting.

I didn't have a Middle English stopwords file, so, unsurprisingly, common Middle English pronouns (ye, thou, thee, thy, etc.) predominate. So I made a very simple supplementary stopwords file containing the obvious bits of Middle English that showed up in the wordcloud (thee, thy, thou, ye, eke, gan) and then reran the generation process:

which I think we can agree is a bit better, though it still needs work: for example, quoth, hath, anon and may should probably be excluded.
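Mechanically, a supplementary stopwords file is just a second set of words merged with the standard list before word frequencies are counted. I don't know how the wordcloud software does this internally, so what follows is only a minimal Python sketch under that assumption: `BASE_STOPWORDS` is a hypothetical, deliberately tiny stand-in for a real English list, `load_stopwords` and `word_frequencies` are hypothetical helpers, the one-word-per-line file format is assumed, and the sample sentence is invented pseudo-Middle-English for illustration, not a quotation from Chaucer.

```python
import io
import re
from collections import Counter

# Hypothetical, deliberately tiny stand-in for a standard English stopword list.
BASE_STOPWORDS = {"the", "and", "of", "to", "a", "in", "that", "he", "it",
                  "is", "was", "with", "his", "for", "as", "not", "but"}

# A supplementary file in an assumed one-word-per-line format,
# holding the Middle English forms excluded in this post.
SUPPLEMENT_FILE = io.StringIO("thee\nthy\nthou\nye\neke\ngan\n")


def load_stopwords(lines):
    """Read one stopword per line, ignoring blank lines and '#' comments."""
    return {line.strip().lower() for line in lines
            if line.strip() and not line.lstrip().startswith("#")}


def word_frequencies(text, stopwords):
    """Lower-case the text, split it into alphabetic words and count
    every word that is not in the stopword set."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stopwords)


# Invented sample text for illustration only; not a quotation from Chaucer.
sample = ("Thou knowest wel, quoth he, that ye and thy knyght gan ryde, "
          "and eke the knyght was glad.")

middle_english = load_stopwords(SUPPLEMENT_FILE)

base_counts = word_frequencies(sample, BASE_STOPWORDS)
combined_counts = word_frequencies(sample, BASE_STOPWORDS | middle_english)

print("base list only: ", base_counts.most_common(5))
print("with supplement:", combined_counts.most_common(5))
```

With only the base list, thou, ye, thy, gan and eke all survive into the counts; merging in the supplement removes them but leaves quoth behind, which is exactly the kind of word the next round of the list has to catch.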

Using an extended stopwords list, one can produce something like this:

which is possibly a more accurate picture of what drove Chaucer's writing. I must say that I'm quietly impressed by how well this brings out the themes in a body of text...

