Wednesday, April 20, 2011

Innergrams and what they can tell us about word favoritism

So I was playing Word2, and I had a rack of A R T T U _ _ and a desire to build down from the word LEND. Reflexively, I spelled DART and was about to move on when I paused and asked myself, "Why didn't I make DRAT?" DRAT is a perfectly fine word, and unlike DART, I don't recall ever playing it before. After building off the end of LEND, I was planning to build another word off the end of DART to extend my slaloming, weaving string of words off toward the horizon. This style of wordcrafting is not atypical, so many people must have previously encountered a similar choice between two words that start and end with the same letters. I began to wonder how they chose.

Fortunately, WordSquared has a new feature that allowed me to find out. Clicking on a word on the board pulls up a pop-up box with a little information about the word, including definitions, who has recently played it, and how many times it has been played (counted since statistics started being kept, which was about a month ago as of late March).

DART had been played 4789 times.

DRAT had been played 1723 times.

So it was not just me.

I decided to study this a little more. I compiled a long list of anagrams that share the same first and last letters (e.g., FORTH and FROTH, SEAHORSE and SEASHORE). Since it is only the inner letters that are scrambled, I decided to call them "innergrams".
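For anyone who wants to hunt for innergrams in their own word list, here is a minimal sketch of the definition above. It is not how I compiled my list; the function name `innergram_groups` and the sample words are just for illustration, and I'm assuming a word needs at least four letters to have any inner letters worth scrambling.

```python
from collections import defaultdict


def innergram_groups(words):
    """Group words sharing a first letter, a last letter, and the same inner letters."""
    groups = defaultdict(set)
    for word in words:
        w = word.strip().upper()
        if len(w) < 4:
            continue  # fewer than two inner letters: nothing to rearrange
        # Key: first letter, last letter, and the inner letters in sorted order.
        key = (w[0], w[-1], "".join(sorted(w[1:-1])))
        groups[key].add(w)
    # Keep only keys matched by two or more distinct words.
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Illustrative sample only; a real run would use a full dictionary file.
sample = ["dart", "drat", "forth", "froth", "seahorse", "seashore", "lend", "tart"]
for group in innergram_groups(sample):
    print(group)
```

Sorting the inner letters gives every innergram of a word the same key, so a single pass over the word list is enough.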

The table below shows the resulting innergram pairs with each word's play count (as taken from Word2 statistics pages like this one). Each pair is ordered so that the more frequently played word is always in the first column. The fifth column shows the ratio of the two play counts.

Since I wanted to flag data that was too weak to draw conclusions from, I used formal hypothesis testing. The chi-square goodness of fit test I used is described in detail here. The essence of it is that the more data you have and the farther the ratio of word counts is from 1, the stronger the evidence is that one word is preferentially being used over the other. The chi-square statistic (in column 6) measures how strong this evidence is. I've sorted the table by increasing evidence strength.
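To make that concrete, here is a rough sketch of the calculation for a single pair, using the DART and DRAT counts from above. I'm assuming the null hypothesis is that both words are played equally often, so each word's expected count is half the combined total. The function name `chi_square_equal_use` is made up for this example.

```python
def chi_square_equal_use(count_a, count_b):
    """Chi-square goodness-of-fit statistic against a 50/50 split of the plays."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("need at least one play to test")
    expected = total / 2  # assumption: no preference means an even split
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


dart, drat = 4789, 1723  # play counts quoted earlier in this post
print(f"ratio = {dart / drat:.2f}")
print(f"chi-square = {chi_square_equal_use(dart, drat):.1f}")
```

With only two categories, the statistic simplifies to (a - b)^2 / (a + b), which shows directly why both a lopsided ratio and a large total push the value up.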

Admittedly, there are lots of situations where one of these words would be favored over another for in-game reasons (like, CRAVE was already on the board and CRAVEN was made by just adding an N, or maybe a triple-letter score square made CAVERN a higher scoring choice). Averaged over many instances, some of these effects should cancel out.

The first three rows have such a small chi-square value that there is no real evidence that people are (on average) favoring one of these words over the other. (Maybe for every person who makes CRAVEN by adding an N to CRAVE, there is someone else making CAVERN by adding an N to CAVER.) The gray rows are weakly supported. The rest of the rows have a big enough chi-square statistic that we can say with greater than 95% confidence that Word2 players favor the first word over the second word. In the last column of the table, I suggest reasons why.
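The 95% cutoff can be sketched as follows, assuming one degree of freedom (two words per pair), for which the standard critical value at the 5% level is about 3.841. The chi-square values in the loop are made-up illustrations, not rows from my table, and the cutoff I used for the gray "weakly supported" rows is not reproduced here.

```python
import math

# Critical value for a chi-square test with 1 degree of freedom at the 5% level.
CRITICAL_95_DF1 = 3.841


def p_value_df1(chi_square):
    """Upper-tail probability of a chi-square value with 1 degree of freedom."""
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so the tail probability is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi_square / 2))


def favors_first_word(chi_square):
    """True when the evidence of a preference clears the 95% level."""
    return chi_square > CRITICAL_95_DF1


# Hypothetical chi-square values, for illustration only.
for chi_square in (0.5, 3.0, 10.0):
    verdict = "preference" if favors_first_word(chi_square) else "inconclusive"
    print(f"chi-square {chi_square:5.1f}  p = {p_value_df1(chi_square):.3f}  {verdict}")
```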

Essentially, this is a listing of possible word blind spots. DART is a far more popular choice than DRAT, and unlike many of the examples on this list, this asymmetry cannot be explained by ART being a more frequently available hook than RAT. (RAT has been played 19,000 times and ART only 12,000 times.)

I have highlighted with orange the rows where there is strong evidence that the second word is a blind spot word. The green rows indicate that I suspect a blind spot exists, but other explanations could also account for the imbalance.

All innergrams are potentially useful tools for Bananagrams players, since being able to rearrange your grid quickly can be the difference between winning and losing a game. Blind spot words are just the innergrams that you are least likely to have immediately at hand... until now!

Blind spot words:
blot, causal, citric, coral, drat, garb, labile, prefect, reserve, rogue, slat, sloe, snag, stanch

Other possible blind spot words:
brunt, clod, froth, gird, median, recuse, sidle, spilt

In the interest of completeness, below are the innergram pairs that I left out of the table because their word counts were too low. (No count exceeded 8.)


The last two pairs had word counts of zero. Build one of these words in Word2 and you may be the first!