Monday, 20 August 2012

Text Analysis, neither snake oil nor a cure-all ...


A lot of this text analysis stuff is about treating text documents as data.

And while you can get valuable insights from these analyses, it's important to understand that the original creators of these documents were not creating content or depositing data. Jane Austen did not create content. She wrote novels.

When she set out to write these novels, which we could describe in the main as comedies of manners, she inadvertently described the society in which she lived: one in which, for women, a 'good' marriage was necessary to ensure financial security, and one in which communication and travel were difficult, leading to a small and compressed social circle.

Critical reading of these novels allows us to build a portrait of how people lived in that society.

I've picked on Jane Austen as an example, but I could just as easily have chosen Aristophanes or Juvenal.

It is important to understand that the two approaches are complementary. When, for example, I used the Google Ngram viewer to plot the use of the term Burmah, the result gave some measure of how significant the term was to the ordinary reader at the time.

It doesn't tell you anything about how colonial society functioned.
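
To make concrete what such a count does and does not measure, here is a minimal Python sketch of the kind of per-year relative frequency that an n-gram plot is built on. It is not how the Google Ngram viewer is implemented; the function name relative_frequency and the tiny sample corpus are invented purely for illustration.

```python
import re
from collections import Counter


def relative_frequency(docs_by_year, term):
    """Return {year: occurrences of `term` per 1,000 words}.

    `docs_by_year` maps a year to a list of text strings. This is a
    hypothetical helper for illustration, not part of any library.
    """
    term = term.lower()
    result = {}
    for year, texts in sorted(docs_by_year.items()):
        # Crude tokenisation: lower-case runs of letters and apostrophes.
        words = [w for text in texts for w in re.findall(r"[a-z']+", text.lower())]
        counts = Counter(words)
        result[year] = 1000 * counts[term] / len(words) if words else 0.0
    return result


# Invented sample sentences, standing in for a real digitised corpus.
corpus = {
    1885: ["War in Burmah is reported", "The Burmah campaign continues"],
    1925: ["A quiet year in the colonies"],
}
freq = relative_frequency(corpus, "Burmah")
for year, per_thousand in freq.items():
    print(year, round(per_thousand, 1))
```

All this yields is one number per year: how often the word turns up, not what the writers meant by it or what was going on around them.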

This isn't, of course, to rubbish topic modelling or other such techniques. They let you identify topics of concern within a corpus, just as looking at the frequency of medieval property transfers might identify times of social turmoil and change.

So we need to be critical in our approach. Topic modelling and other text mining techniques are now possible due to the sheer amount of digitised text available, and they definitely give an index of popular concern.

They are, however, not a substitute for critical analysis. Rather, they complement it ...
