Sunday, 15 June 2008

iambic pentameters and decryption

Along with my fixations about filesystems and digital preservation, to name but two, I've always been fascinated by the history of the Roman Empire and by writing systems. Even as a child I was fascinated by different alphabets.

So a few days ago I was reading about the Oxyrhynchus papyri and was suddenly struck by something:

Latin was originally written with dots as word separators, so text would look something like this:

the.cat.sat.on.the.mat

and Greek was written without separators, like this:

thecatsatonthemat

and some time round about the first century the Romans changed from the original dots-as-separators style to the Greek style. That must have made text very difficult to read, even if in an inflected language like Latin you have some intrinsic help puzzling out the relation between words. This is why people read out loud: to work out which clauses formed a sentence, and so on.
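To make the difference concrete, here is a minimal Python sketch. With dots, splitting is a one-liner; without them, the reader has to try every split that yields known words. The tiny `LEXICON` and the `segmentations` helper are my own illustrative assumptions, not a model of how anyone actually read Greek or Latin.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]  # one way to split the empty string: no words at all
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Keep this word and recursively split whatever is left.
            for rest in segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results


# Hypothetical mini-vocabulary, chosen only for this example.
LEXICON = {"the", "cat", "sat", "on", "mat", "them", "at"}

# Dots as separators: the boundaries are given.
dotted = "the.cat.sat.on.the.mat".split(".")

# No separators: every dictionary-compatible split is a candidate.
continuous = segmentations("thecatsatonthemat", LEXICON)

print(dotted)
for words in continuous:
    print(" ".join(words))
```

Even with seven words in the vocabulary the unseparated text has two readings, "the cat sat on the mat" and "the cat sat on them at", and the reader has to use sense or grammar to pick one.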

The use of spaces as separators didn't come in until the 10th century, something forever immortalised in, of all things, the Sun FORTRAN manual:

"Consistently separating words by spaces became a general custom about the tenth century A.D., and lasted until about 1957, when FORTRAN abandoned the practice." - Sun FORTRAN Reference Manual

Anyway, why would they have such a system in Greek? Word separators seem a good idea, even if you do find languages such as Thai that don't use them.

And then I had my idea. Word separators are good for prose, which is unstructured text. Inflected languages are reasonably structured, which is why it more or less worked for Latin. Modern English or Dutch, where the only real structure is word order, would be another matter.

But if what you are writing down is poetry, it's easier. Provided you know the rhythm and the rules for syllables, it's easy to split up the character groups: when you know, for example, that there are five syllables to a line, it's easy to work out where the words fall. In other words, it's easy when the text is very structured, just the way it's easy to parse text with a computer program when it's structured and a nightmare when it's unstructured.
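Here is a rough Python sketch of that idea, under some loud assumptions: the `SYLLABLES` table (word to syllable count) is a made-up mini-lexicon, and "five syllables to a line" is just the example figure from above, not a real metre. The point is only that a known line length throws out readings that plain dictionary lookup cannot.

```python
# Hypothetical lexicon: word -> syllable count (values assumed for illustration).
SYLLABLES = {"the": 1, "cat": 1, "sat": 1, "on": 1, "me": 1, "at": 1, "meat": 1}


def segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results


def fits_line(words, syllables_per_line):
    """True if the words add up to exactly the syllable count the line requires."""
    return sum(SYLLABLES[w] for w in words) == syllables_per_line


# Dictionary lookup alone leaves two candidate readings...
candidates = segmentations("thecatsatonmeat", SYLLABLES)

# ...but knowing the line has five syllables keeps only one of them.
metrical = [words for words in candidates if fits_line(words, 5)]

for words in metrical:
    print(" ".join(words))
```

Without the constraint both "the cat sat on me at" (six syllables) and "the cat sat on meat" (five) survive; with it, only the second does. A real metre would constrain stress or vowel length as well, which should prune even harder.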

Anyway, that's my $0.02 on this.
