Lerna IIIc: Why the Greek scales are rigged

Even if you allow for the fact that Greek is a heavily inflected language, a literary corpus of Greek is going to have a lot more morphological variety than the corpora of most other literary languages. That doesn't tell you anything about the superiority of the Greek language. But it does tell you a bit about Greek culture. And it does mean that, if the word form and lemma counts of Greek come out better than expected, the comparison is not exactly fair.

The first catch is that the literary corpus spans three thousand years, as many a Greek ideologue likes to remind you: a trick only Chinese has gotten away with. Does that prove it's the same language? That's a loaded question, of course: if you believe the Moderns are the same people as the Ancients, you'll call both Greek, as everyone does now; and if you don't, you'll distinguish Hellenic from Romeic, as everyone did three centuries ago. (That's unless you were calling Romeic Graecobarbaric, which was also all the rage in some circles.) More to the point, if you believe the Moderns are the same people as the Ancients, your language will reflect that belief. A lot of that in the contemporary Standard is engineered: it results from the conscious efforts of Puristic to bring older forms of the language back. Some of it comes from older conservative forces, notably the language of the church.

Greek is no Icelandic: the written literary tradition has had much more of an effect on the spoken language up North, and Iceland is a much smaller place. Greek may be on the conservative side morphologically, compared to say English; but the morphology has still changed quite a bit. Which means, if you count Homeric morphology and contemporary morphology in the same word form count, you're going to get a lot more word forms than if you were doing one millennium at a time. And most counts of what a language's word forms are take just a decade or so at a time, because most counts are synchronic: they're snapshots of a language, not the whole Theseus' Boat ten-part series. A synchronic count of Greek is going to show you a lot less variation, because people don't normally have conversational command of three millennia's worth of speech.

Normally, no one does: that's not the language people have in their skulls, which is what most linguists deal with. Of course, you could compile a corpus of three millennia's worth of language spoken in Rome; and you'd get Classical Latin, Vulgar Latin, several stages of Romanesco, and Standard Italian in the one list. With lots and lots of morphological variation. There's a reason why you wouldn't call that one language's worth of morphology over three millennia in Rome, but three: so the different morphology shouldn't be on the same listing. There's a reason why you may choose to call it one language's worth of morphology over three millennia in Athens (as long as you leave out the Arvanitika of Pllaka). The reasons for that aren't entirely linguistic. They aren't entirely non-linguistic, and the development of Greek has been affected by the underlying thinking. But these are all gradients and slippery slopes; and Greek is at one extreme of the slope. It proves Greek covers a long period; it doesn't prove Greek-speakers have their brains wired differently.

There's not just the three millennia upping the word count. All languages have regional variation, with different grammar and lexicon, up until they get spoken in just one place, or the mass media convince you that they are. People normally speak one dialect at a time, just like they speak one century at a time; so having a corpus span 3000 km of language doesn't tell you more about what language is contained in a single skull than does having a corpus span 3000 years of language. So including 3000 km bumps up your word form count more than is strictly speaking fair.
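To make the pooling effect concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function names are hypothetical, and the toy corpus uses placeholder labels rather than real attested Greek forms. The "slice" label can stand for a period or a dialect; the arithmetic is the same either way.

```python
from collections import defaultdict


def distinct_forms(tokens):
    """Return the set of distinct word forms in an iterable of tokens."""
    return set(tokens)


def forms_by_slice(tagged_tokens):
    """Group (slice_label, form) pairs into {slice_label: set of forms}."""
    slices = defaultdict(set)
    for label, form in tagged_tokens:
        slices[label].add(form)
    return dict(slices)


# Toy data only: placeholder labels, not real attestations.
# Each pair is (slice label, word form); a slice is a period or a dialect.
toy_corpus = [
    ("slice_1", "form_a"),
    ("slice_1", "form_b"),
    ("slice_1", "form_c"),
    ("slice_2", "form_b"),
    ("slice_2", "form_d"),
    ("slice_3", "form_d"),
    ("slice_3", "form_e"),
    ("slice_3", "form_e"),  # repeated token, counted once as a form
]

per_slice = forms_by_slice(toy_corpus)

# Pooled count: every slice thrown into the one list.
pooled = distinct_forms(form for _, form in toy_corpus)

# Synchronic-style count: the most forms any single slice has.
largest_slice = max(len(forms) for forms in per_slice.values())

print("pooled:", len(pooled))
print("largest single slice:", largest_slice)
```

The pooled count can never be smaller than the count for any one slice, and it grows with every slice whose forms differ from the others. That is the sense in which a corpus spanning many centuries or many dialects overstates what any one speaker commands.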

The thing about Greek is, the literary culture made the same language span not just thousands of years, but thousands of kilometres. Literary Greek is pretty distinctive in having no fewer than six literary dialects: Epic (which is Old Ionic with other bits), New Ionic, Doric, Aeolic, Attic, and Koine. They're conventionalised in the literary texts, and are not always linguistically reliable; but someone with a literary grounding by Hellenistic times was expected to be conversant in the lot of them; and the literary corpus does need to reflect them all. The literary corpus is not reflecting what was in any one Greek's skull as their native speech, so comparing its morphological diversity to what other language corpora tell you is artificial. But once a language is literary, artifice happens: there's more King James and Shakespeare in contemporary English than there should be, too, and more pepperings of American in Australian English than would have made sense a century ago. And at least some Byzantine scholars did have some command of much of this inflated repertoire of Greek morphology, as artificial as it got.

All this, though, is reason why counting word forms in Greek is misleading. I'm still going to attempt it, because it raises some further interesting questions, and we're going to see the 1.5 million word forms I quoted whittled down a fair bit. (Having to control for spelling variation, for starters.) Last stop in the ritual abjuring of grocery calculations: lemmata.


