A blog for fans of Bananagrams, word games, puzzles, and amazing things

Wednesday, April 20, 2011

Innergrams and what they can tell us about word favoritism

So I was playing Word2, and I had a rack of A R T T U _ _ and a desire to build down from the word LEND. Reflexively, I spelled DART and was about to move on when I paused and asked myself, "Why didn't I make DRAT?" DRAT is a perfectly fine word, and unlike DART, I don't recall ever playing it before. After building off the end of LEND, I was planning to build another word off the end of DART to extend my slaloming, weaving string of words off toward the horizon. This style of wordcrafting is not atypical, so many people must have previously encountered a similar choice between two words that start and end with the same letters. I began to wonder how they chose.

Fortunately, WordSquared has a new feature that allowed me to find out. Clicking on a word on the board brings up a pop-up box with a little information about the word, including definitions, who has recently played it, and how many times it has been played (since statistics started being kept, which was about a month ago as of late March).

DART had been played 4789 times.

DRAT had been played 1723 times.

So it was not just me.

I decided to study this a little more. I compiled a long list of anagrams that share the same first and last letters (e.g., FORTH and FROTH, SEAHORSE and SEASHORE). Since it is only the inner letters that are scrambled, I decided to call them "innergrams".
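
For the programmatically inclined, the definition above can be sketched in a few lines of Python. This is only an illustration: the function names and the tiny sample word list are assumptions made for the example, not the method actually used to compile the list.

```python
from collections import defaultdict
from itertools import combinations


def innergram_key(word):
    """Key shared by all innergrams of a word: first letter, sorted inner letters, last letter."""
    w = word.lower()
    return (w[0], "".join(sorted(w[1:-1])), w[-1])


def find_innergram_pairs(words):
    """Return sorted pairs of distinct words that differ only in the order of their inner letters."""
    groups = defaultdict(set)
    for w in words:
        # Words shorter than 4 letters have at most one inner letter, so nothing can be scrambled.
        if len(w) >= 4:
            groups[innergram_key(w)].add(w.lower())
    pairs = []
    for group in groups.values():
        pairs.extend(combinations(sorted(group), 2))
    return sorted(pairs)


# Hypothetical sample list; a real run would use a full game dictionary.
sample = ["dart", "drat", "forth", "froth", "seahorse", "seashore", "lend"]
for first, second in find_innergram_pairs(sample):
    print(first.upper(), second.upper())
```

Grouping by a shared key avoids comparing every word against every other word, which matters once the word list grows to dictionary size.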

The table below shows the resulting innergram pairs with each word's respective word count (as taken from Word2 statistics pages like this one). The words are sorted so the more frequently used one is always in the first column. The fifth column shows the ratio of the two word counts.

Since I wanted to identify data that was not strong enough to draw conclusions from, I used formal hypothesis testing. The chi-square goodness of fit test I used is described in detail here. The essence of it is that the more data you have and the farther the ratio of word counts is from 1, the stronger the evidence is that one word is preferentially being used over the other. The chi-square parameter (in column 6) measures how strong this evidence is. I've sorted the table by increasing evidence strength.
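
As a rough illustration of the idea, here is a minimal Python sketch of a chi-square goodness of fit calculation for a single pair, using the DART and DRAT counts quoted earlier. It assumes the null hypothesis is that both words are played equally often (one degree of freedom); the linked description of the test may differ in its details, and the function name is made up for this example.

```python
def chi_square_equal_use(count_a, count_b):
    """Chi-square statistic for two observed counts against an expected 50/50 split."""
    total = count_a + count_b
    if total == 0:
        raise ValueError("need at least one observation")
    expected = total / 2  # assumption: no favoritism means equal counts
    return ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected


# Standard chi-square critical value for 1 degree of freedom at the 5% level.
CRITICAL_95_DF1 = 3.841

dart, drat = 4789, 1723  # play counts quoted above
stat = chi_square_equal_use(dart, drat)
print(f"ratio = {dart / drat:.2f}")
print(f"chi-square = {stat:.1f}")
print("strong evidence of favoritism:", stat > CRITICAL_95_DF1)
```

Under these assumptions the DART/DRAT statistic lands far above the 3.841 cutoff, while a pair with nearly equal counts, or with very few plays in total, would land below it.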

Admittedly, there are lots of situations where one of these words would be favored over another for in-game reasons (like, CRAVE was already on the board and CRAVEN was made by just adding an N, or maybe a triple-letter score square made CAVERN a higher scoring choice). Averaged over many instances, some of these effects should cancel out.

The first three rows have such a small chi-square value that it's pretty certain that people are not (on average) favoring one of these words over another. (Maybe for every person who makes CRAVEN by adding an N to CRAVE, there is someone else making CAVERN by adding an N to CAVER.) The gray rows are weakly supported. The rest of the rows have a big enough chi-square parameter that we can say with greater than 95% certainty that Word2 players favor the first word over the second word. In the last column of the table, I suggest reasons why.



Essentially, this is a listing of possible word blind spots. DART is a far more popular choice than DRAT, and unlike many of the examples on this list, this asymmetry cannot be explained by ART being a more frequently available hook than RAT. (RAT has been played 19,000 times and ART only 12,000 times.)

I have highlighted with orange the rows where there is strong evidence that the second word is a blind spot word. The green rows indicate that I suspect a blind spot exists, but other explanations could also account for the imbalance.

All innergrams are potentially useful tools for Bananagrams players since the ability to most quickly rearrange your grid can be the difference between winning and losing a game. Blind spot words are just the innergrams that you are most likely to not have immediately at hand... until now!

Blind spot words:
blot, causal, citric, coral, drat, garb, labile, prefect, reserve, rogue, slat, sloe, snag, stanch.

Other possible blind spot words:
brunt, clod, froth, gird, median, recuse, sidle, spilt.



In the interest of completeness, below are the innergram pairs that I left out of the table because their word counts were too low. (No count exceeded 8.)

scalar / sacral
martial / marital
converse / conserve
eternity / entirety
preserve / perverse
coagulate / catalogue
seashore / seahorse
parental / paternal
compliant / complaint
observe / obverse
reprise / respire
metronome / monotreme
percept / precept

The last two pairs had word counts of zero. Build one of these words in Word2 and you may be the first!

Sunday, April 10, 2011

Lexicographer: the iPhone version of Guess My Word

As an addendum to my review of the online Guess My Word game, I'm reviewing the iPhone companion game, Lexicographer. Just as with Guess My Word, Lexicographer allows you to guess words and tells you whether your guess is before or after the secret word ("my word").

But rather than having two daily words to guess, Lexicographer will let you keep playing new rounds of "Guess my word!" all day long.


Once you start, there is a running timer (counting tenths of a second!). I found that this totally changed the guessing experience for me. Without a timer, I try to minimize the number of guesses I need to guess the word, choosing guesses in a calculating but leisurely fashion. With the timer constantly ticking away, I guessed words in a more frenzied manner.

There is an option that allows you to see how many words are left in the range that you have bracketed. This is a useful way of sharpening your sense of how words are distributed in the alphabet and where the best bisection point is.
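
To make the bisection idea concrete, here is a small Python sketch that counts the words left in a bracketed range and always guesses the middle one. It is only a sketch under stated assumptions: the eleven-word dictionary is a made-up stand-in, the function names are hypothetical, and nothing here reflects how Lexicographer itself is implemented.

```python
from bisect import bisect_left, bisect_right


def words_between(sorted_words, low, high):
    """Return the dictionary words strictly between the low and high brackets."""
    start = bisect_right(sorted_words, low)
    end = bisect_left(sorted_words, high)
    return sorted_words[start:end]


def best_guess(sorted_words, low, high):
    """Pick the middle word of the bracketed range, or None if the range is empty."""
    candidates = words_between(sorted_words, low, high)
    if not candidates:
        return None
    return candidates[len(candidates) // 2]


def solve(sorted_words, secret):
    """Play the game by bisection and return the list of guesses made."""
    low, high = "", "\uffff"  # sentinels that sort before and after every word
    guesses = []
    while True:
        guess = best_guess(sorted_words, low, high)
        if guess is None:
            return guesses  # the secret is not in this word list
        guesses.append(guess)
        if guess == secret:
            return guesses
        if secret < guess:
            high = guess  # feedback: the secret word is before the guess
        else:
            low = guess  # feedback: the secret word is after the guess


# Hypothetical mini-dictionary; a real game would use a much larger one.
WORDS = sorted(["blot", "coral", "dart", "drat", "forth", "froth",
                "garb", "lend", "rogue", "slat", "snag"])

print(len(words_between(WORDS, "dart", "froth")), "words left between dart and froth")
print(solve(WORDS, "drat"))
```

Because each guess roughly halves the bracketed range, the number of guesses grows only with the logarithm of the dictionary size, which is why a good sense of where the midpoint lies pays off.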

We increased the difficulty level from 1 (where a typical word was "talent") to 10 and then struggled on the last few guesses until we narrowed the word down to four possible words (between "parapets" and "paraphrase"). But we were totally stumped at that point. After guessing "paraphobia" (which I hoped to be defined as the fear of parallel lines) and finding that it was not actually a real word, I randomly guessed "paraph". And it was, to my stunned triumph, correct.

(It turns out that a paraph is a flourish someone adds below or to the end of their signature. The example that first comes to mind is below John Hancock's signature on the Declaration of Independence:


It's believed that paraphs originated during the Middle Ages to discourage forgery. I wondered how effective this might have been until I read in Joe Nickell's Detecting Forgery (browsable in Google Books) the following:
It might also be noted that the concept of individuality that today may be expressed in a distinctive signature was less valued in the penmanship of an earlier time, when adherence to strict copybook form was regarded as a virtue. As Jonathan Goldberg notes in Writing Matter: From the Hands of the English Renaissance, "in fact, what differentiated one italic signature from another is more often a paraph, flourish, than the letter itself." [...] Indeed, sometimes so distinctive was the eighteenth-century paraph ([...] like that of John Hancock's or Benjamin Franklin's) that it was sometimes used instead of the signature, thus concealing one's identity except to the initiate.

That was a really interesting word to learn about, but given that I got it through sheer luck, I think I'll be reducing the Lexicographer difficulty level to something more like 5 for now...)

I think that each version of the game has its merits. Guess My Word is fun because you can compete against other people (mostly just names on the leaderboard, unless you happen to know them or invite your friends to play) and see what sequence of words they chose by mousing over their guess history. Guess My Word words often feel more special, as though I can develop an intuition about the kind of words that are likely to be chosen; it's possible that Lexicographer words have a similar property, and I just haven't played enough to pick up on it. Lexicographer's strength is that it can be pulled out anytime and played with a group of friends, as many times as you like. Showing the number of words that you have bracketed allows you to get better at picking the word to bisect the range. Not only does this improve your word-guessing ability, it also adds an extra layer to the game, making for a nice complement to Guess My Word.



Update:
  • You may notice that Lexicographer no longer seems to be available on its former iTunes Store page. My guess is that the developer decided not to renew his ($99/year) iOS developer license, and Apple has nixed his apps. Which is a pity.
  • You can still play the original Guess My Word game online. You can also read my previous post about Guess My Word.