Lerna IIId: Why we do not count lemmata

Now, the whole point of any word counting venture, such as Lerna attempts and gets galumphingly wrong, is not the corpus size, which is contingent and always less than infinity; nor is it the number of word forms, which tells you about morphological happenstance but not about vocabularies. When people talk about words, they mean dictionary words.

This veers off into Eskimo Words For Snow territory, so it's even more fraught for a linguist to talk about. Especially because, even more than for word forms, there is a lot of arbitrariness to be had about how you count lemmata. Enough arbitrariness to make the whole venture deeply problematic. It's especially problematic if, like the artificially inflated corpus of the TLG (or the OED, or indeed any dictionary), the corpus spans more than the vocabulary contained in one skull, and ranges over more than one region, and more than one decade. That brings together all the words you might need to know if you ever come across them, in a literate culture that preserves words in print for centuries. It does not bring together all the words you ever will have in your skull: it's not modelling the vocabulary that any speaker will ever command. Dictionaries are documenting an inflation inherent in any written language; it is particularly pronounced for Greek, for reasons already seen.

Now, it's reasonable to assume that if your language gets used by more people, to talk about more stuff, in a culture where more stuff is around, and in contact with lots of other languages and their speakers' stuff, then that language will have more words. The Greek of the Roman Empire was like that. The English of the Globalisation Empire is much more like that. So if the guesstimates are that contemporary English has twice as many dictionary words as contemporary Spanish, that's plausible.

The Greek of the Classical Age invented much of how the West understands the world. But it was not exploding with words. The Spartans weren't the only Greeks to be Laconic: Classical Greek was frugal with its words—enough for its philosophy to look basic (or unsophisticated), compared to the German experience. As we'll see, the vocabulary explosion happened much later. Look at how Plato writes about philosophy, how a speech in Euripides works—how insistently Aristophanes snipes at Socrates' and Euripides' new-fangled words, and how unremarkable those new-fangled words turn out to be. "Verse" στίχος, Frogs 1239, was such a new-fangled word, for goodness' sake, as Andreas Willi writes: The Languages of Aristophanes, p. 58; yet it's merely reusing the word for "line".

Plausibility was never the point of the Lernaean text, nor is it perturbed by any actual familiarity with Classical Greek. But even with three millennia of vocabulary buildup pitted against 500 years' worth of Modern English, the world is working out in such a way that Greek is not going to beat English in the "my lexicon is bigger than your lexicon" games. The information overload explosion is being engineered in English, and involves English coinings. Where the vocabularies are growing, other languages are struggling to keep up, and most don't bother: IT done outside of English is now all about the codeswitching. Lernaeanists hear the codeswitching and see the scriptswitching all around them, yet still they assert in their Letter to the Editor that English having more words than Greek must be some kind of joke. ("Και προβάλλεται ως τέτοια η Αγγλική, που μόνο σαν ανέκδοτο μπορεί να θεωρηθεί.") That... must be some kind of joke itself.

But as the about.com answerer hastened to add, English having double the words of Spanish doesn't mean Spanish doesn't have nuances which English can't readily express. Or that any other language doesn't. There are still notions particular to any given culture, which that culture's vehicle language will have words for, and another culture's language won't have had a reason to come up with a word for. That's true of farm implements vs. modem protocols, and it's true of all the subtle constructs that each language's poets embrace zealously, and that the Meaning of Tingo book series did such a superficial job on. (At least the guy has a blog, so there's some avenue for the readership to fine-tune things.)

It always struck me as amusing, btw, that most such "untranslatable" Modern Greek words... are Turkish or Venetian. Although of course, whatever meaning they've since picked up is quite distinct from when they first entered the language. It's a long way from merak "hypochondria" to μεράκι "outburst of creativity" [EDIT: better, "sustained creative effort"]. The sequence, from what I surmise, is: hypochondria > lovesick > yearning > fastidious about one's work > taking pride in one's work. By a similar pathway with a last-minute detour, meraklı "hypochondriac" > μερακλής "bon vivant, connoisseur"... Come to think of it, those French words are untranslatable too, aren't they.

There's likely more animal husbandry terms in Masai than Pitjantjatjara, and more terms for kinds of cheese in Italian than Laotian, and more terms for intellectual property arrangements in English than in Sorbian. That's the anodyne version of the Eskimo Words For Snow business, and not particularly surprising. Again, it doesn't mean brains are wired differently. You can translate μεράκι with some work. Unlike Spanish (and Old English), German does not have separate verbs to distinguish essential being from contingent being (ser/estar, bēon/wesan). That didn't put the brakes on German philosophy (!), and it didn't prevent its philosophers from making a noun for persistent existence, Dasein. Not having as many words as the language up the road is not such a deal-breaker in the end.

But as to this urge to have more words than English, in a game that can't be won and makes no sense anyway... it's malicious of me, but I'm compelled to recount the 1980 Richard Feynman in Greece episode (Link 1, Link 2):
They were very upset when I said the development of the greatest importance to mathematics in Europe was the discovery by Tartaglia that you can solve a cubic equation: although it is of little use in itself, the discovery must have been psychologically wonderful. It therefore helped in the Renaissance, which was freeing man from the intimidation of the ancients. What the Greeks are learning in school is to be intimidated into thinking they have fallen so far below their ancestors.

Tartaglia's work was done more than 1000 years after the Greeks and showed to the Greeks that a modern man could do something no ancient Greeks could do.
(Richard Feynman, What Do You Care What Other People Think?)

Lerna is a hoax, and Lerna is an annoyance, and Lerna is an embarrassment; but it will not die, because more than anything else, Lerna is a symptom. It's a symptom of what Feynman found. And the way to singe the head of the Hydra is to get over that nagging sense of not measuring up to the Hellenes. Generations have failed to make headway there; but Lerna's not making the job any easier, by attributing to a literature already bestriding the world a vocabulary 1000 times larger than life.


  1. What an excellent series of posts and a sublime ending, thank you very much! Have to re-read it, though, to visit all the links etc.

  2. Yes, I thoroughly enjoyed it, and I hadn't previously encountered the Lerna nonsense.

  3. Bless you for your sanity, Nick! There's a good deal of elementary questions Lernaeanists don't even bother to ask, such as whether "Ancient Greek has 90 gazillion words" really is a meaningful sentence. Exactly what do we mean by "Ancient Greek"? The Iliad is in "Ancient Greek", but so is Aeschylus, Aristophanes, Plato, Plutarch, Nonnus... To take the sum of all the words used in all Greek texts from, say, the 8th century BC down to the 6th century AD, and then pronounce Greek the world's richest language is so elementary an error as to defy belief.

  4. George B. rightly pointed out to me I've glossed μεράκι too loosely: he words it as "a continuous effort, even if fruitless, to create something—or simply building on others' creation—in a specific direction." It's the passion of craftsmanship, in other words.

  5. Thanks Nick, that's an excellent post. So much information... at the end I was lost in the words of Feynman and the history of solving polynomial equations.

