Monday, February 21, 2011

"Zen" and the art of Google N-gram Viewing

Over on the WordSquared blog, WordSquarers are pondering what should be a legal word in a word game. In particular, they are asking whether ZEN should be an allowed word in their game. "zen" is probably the most frequently asked-about word because many people (myself included) initially expect that the Word2 game will accept it, but it never does...

The argument in favor of admitting "zen" to the dictionary is that usage suggests that there are two kinds of "Zen": capital-Z "Zen", which refers to Zen Buddhism, and lowercase-z "zen", which refers to a state of extreme calm and centeredness. Of course, the idea of this calm state is a reference to what is considered to be a result of the practice of Zen meditation.

It turns out that "Zen" is sometimes capitalized even in phrases like "a Zen outlook on life" or when something is said to be or feel "so Zen". This usage is consistent with "Zen" being a proper adjective (like "British").

To pursue this question further, I used Google's Ngram Viewer (which really ought to be spelt "N-gram Viewer") to compare the frequency of usage of the words "Zen" and "zen" in books over the last 200 years. The capitalized version completely dominates. (The oscillations in the appearance of "Zen" in English-language books seem to reflect periodic variations in Western interest in Eastern mysticism. Roughly similar oscillations can be seen in the usage of "Tao".)

If you look at just the usage of "zen" over time,

you see that back in the 1800s, long before the concept of Zen was even popularized in Western society, instances of "zen" are present in print like some kind of background noise. And indeed, closer examination reveals that these "zen"s have nothing to do with Zen. They are frequently word fragments (like cases where the word "citizen" has been broken between pages and the OCR did not rejoin the hyphenated "-zen" half before it reached Google N-gram Viewer) or abbreviations of names in plays ("Zen." = Zenobia in some plays).
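To make the word-fragment effect concrete, here is a small Python sketch. It is only an illustration under my own assumptions: the sample text is made up, `count_tokens` is a hypothetical helper, and this is not how Google actually builds its N-gram counts. It simply shows how a "citizen" split across a line break can turn into a stray lowercase "zen" when the hyphenation is not undone before counting.

```python
import re
from collections import Counter

def count_tokens(text, rejoin_hyphens):
    """Count case-sensitive word tokens in text.

    If rejoin_hyphens is True, a word split as "citi-\nzen" is glued
    back together before counting; otherwise the two halves are
    counted as separate tokens.
    """
    if rejoin_hyphens:
        # Remove a hyphen followed by a line break (and any indentation).
        text = re.sub(r"-\s*\n\s*", "", text)
    return Counter(re.findall(r"[A-Za-z]+", text))

# Made-up sample text standing in for scanned pages (an assumption,
# not real corpus data). The last line mimics a speaker label in a play.
sample = (
    "Every citi-\n"
    "zen of the town had heard of Zen Buddhism.\n"
    "Zen. Speak, my lord.\n"
)

naive = count_tokens(sample, rejoin_hyphens=False)
fixed = count_tokens(sample, rejoin_hyphens=True)

print("without rejoining:", naive["zen"], "zen,", naive["Zen"], "Zen")
print("with rejoining:   ", fixed["zen"], "zen,", fixed["Zen"], "Zen")
```

Note that rejoining hyphens removes the spurious lowercase "zen" but does nothing about the "Zen." speaker label, which still counts as a capitalized "Zen" even though it has nothing to do with Buddhism.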

The same search, done on the American English corpus (rather than the overall English corpus, as above),

also fails to show a decisive increase in the usage of "zen" in English books in the U.S.

In contrast, many words admitted into the dictionary show usage patterns that clearly surpass their background noise levels. The first Google N-grams image above shows a good example of this behavior for the word "Zen". And consider "supersize",

a word accepted into the Merriam-Webster Collegiate Dictionary in 2006. Words that have such an abrupt exponential gain in usage must be the easiest for lexicographers to deal with.

I thought I had found a good argument in favor of making "zen" a word when I realized that there is another word with a nearly identical usage pattern. Both "Zen" and "Christian" refer to specific religions, and both are also used in a more relaxed fashion as adjectives (roughly meaning "placid" and "humane, altruistic", respectively). But the question of the correct case is the same with "christian": it is not frequently found in an uncapitalized form, and dictionaries nearly universally include only the capitalized form of the word.

It's possible that editors are keeping the uncapitalized "zen" out of books because it does not appear in dictionaries. And then, to the extent that dictionary inclusion reflects usage in print, "zen" is doomed to be perceived as a common misspelling and locked out of dictionaries forever. Of course, these days lexicographers search for new words in lots of other media, including the less rigorously edited Internet, so "zen" may yet be recognized as a legitimate word.

If you want to express your opinion about "zen", the comments on that Word2 blog post are still open, and you can always post here in the shiny new Bananagrammer comments area.