A blog for fans of Bananagrams, word games, puzzles, and amazing things

Sunday, December 15, 2013

The amazing ENABLE word-list project

While looking for a good word list to use for a project that I am working on, I discovered ENABLE (which stands for Enhanced North American Benchmark LExicon), a word list that seems to have been compiled mainly by Alan Beale (with some help from Mendel Cooper) in order to create a reference that can be used when playing word games. Since it is an open and freely available list, it has served as the basis for the word lists used in many games, such as Words with Friends. What distinguishes this word list from the many others out there is how thoroughly its creation has been documented in the many files in the ENABLE package and its supplemental archive.

For this reason, many of the disadvantages of the Scrabble Tournament Word List can be eliminated. For instance, as the compilers themselves note:

"In contrast to other word lists, the ENABLE list has not been crippled by being limited to words under an arbitrary length. The ENABLE list is eminently suitable for most word games, such as Anagrams and Clabbers, and for crossword puzzle solving, rather than just for Scrabble. A great deal of research has gone into removing this limitation, however the list is much the better for it."

Another critique of the Scrabble Word Lists and Dictionaries is that they are carrying around many words that were in dictionaries back in the 1970s but have long since disappeared from both usage and lexicons. The ENABLE supplement includes a list of 9,768 stale words (which it defines as words that appear in the Scrabble Tournament Word List but not in modern dictionaries).

Most of these stale words (like AXAL (an obsolete form of "axial") and WHERVE ("a round piece of wood put on a spindle to receive the thread")) were words I had never heard of and therefore had no problem eliminating from the word list for my project. There were also some words that I thought needed to be retained based on being in common usage including SPELUNK/SPELUNKED/SPELUNKING (which, according to the Google Books Ngram Viewer, has been used with increasing frequency since about the 1940s) and UPSTANDING (which peaked in popularity in the 1920s, reached a local minimum around 1970, but has been on the upswing since 1990).
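
The pruning I describe here is essentially a set difference with a short keep-list. Here is a minimal Python sketch of that idea, assuming both the main word list and the stale list are plain text with one word per line. The helper names (`load_words`, `prune_stale`) are mine, not anything from the ENABLE package, and the example uses a handful of in-memory words rather than the real files.

```python
def load_words(path):
    """Read one word per line, skip blank lines, and normalize to upper case."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip().upper() for line in handle if line.strip()}


def prune_stale(words, stale, keep=()):
    """Drop stale words from `words`, except those explicitly kept."""
    keep = {w.upper() for w in keep}
    return {w for w in words if w not in stale or w in keep}


# Tiny in-memory example using words mentioned in this post.
# (BANANA is just a stand-in for an ordinary, non-stale word.)
words = {"AXAL", "WHERVE", "SPELUNK", "UPSTANDING", "BANANA"}
stale = {"AXAL", "WHERVE", "SPELUNK", "UPSTANDING"}
kept = prune_stale(words, stale, keep=["spelunk", "upstanding"])
print(sorted(kept))  # ['BANANA', 'SPELUNK', 'UPSTANDING']
```

With real files, `load_words` would be called once for each list (the paths are whatever the downloaded files happen to be named) and the two resulting sets passed to `prune_stale`.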

This is only a sampling of what makes ENABLE so useful. Amateur lexicographers and other interested parties can find and download the whole ENABLE package through this page.

Monday, November 4, 2013

A sesquipedalianist Boggle puzzle

During my analysis of the effects of playing Boggle with different letter distributions, I simulated more than 50,000 games of standard 4-by-4 Boggle. One statistic that I was tracking was the longest word found across each of the data sets. Invariably, each data set got stuck at a maximum word length of 11 letters. (In the Big Boggle simulations, the solver found words as long as 13 letters.) I was really hoping for something longer, but rather than keep running simulations until I finally find some 12-letter word and have it turn out to be something disappointing, like BORINGNESSES, I've decided to embed one 16-letter word in a Boggle grid and present it below as a puzzle for your solving pleasure.

Good luck!

Thursday, October 17, 2013

The Boggle cube redesign and its effect on the difficulty of Boggle

I wanted to buy a copy of Boggle. This seemingly simple mission was complicated by the facts that a) there are different kinds of Boggle out there and b) I like to make things complicated.

If you exclude variations such as Big Boggle and the recently introduced 6-by-6 Super Big Boggle and just limit yourself to the original 4-by-4 Boggle configuration, there are three principal versions of Boggle:

1) Boggle Reinvention (now sold as just "Boggle") - While the new sealed case design and the integrated timer mean that you don't need to worry about losing any of the pieces, there are reports that it is possible for two dice to become jammed together in such a way that it is essentially impossible to separate them without opening up the case and destroying the game. In my opinion, Boggle should not be a game that can break.

2) Plain old Boggle, made from about 1976 to 1986 (which I will call "classic Boggle").

3) The version of Boggle sold from 1987 to ~2008 - essentially the same as classic Boggle except that the letter distribution on the cubes was completely redesigned. I'll call this "New Boggle".

Below, you can see a side-by-side comparison of the classic and new sets of Boggle dice.

[Images: classic and new Boggle dice]

To help visualize the differences between these distributions, I sorted the classic letter distribution by number of letters (shown on the left below) and used that order to sort the new letter distribution (shown on the right).

<= Classic New =====>
      DDDD DDD
      UUUU UUU
       BBB BB
       CCC CC
       GGG GG
       HHH HHHHH
       MMM MM
       PPP PP
       YYY YYY
        FF FF
        KK K
        VV VV
        WW WWW
         J J
         Q Q
         X X
         Z Z
In some ways (such as increasing the number of Ts and Hs), the new distribution is closer to the letter frequency in English words, but that motive alone would not explain why the number of As was decreased and the number of Os was increased. It has been suggested that this change was designed to reduce the frequency of harder letters (like K and G) and make finding words easier.

One other interesting property of the new set of dice is that since it concentrates certain letters all on the same die, it is never possible to make words that combine F and K (like FAKE, FORK, SKIFF,...) or words that combine B and J (JOB, JAB, BANJO,...). It is also not possible to make words with three Ps (like PINEAPPLE) or two Ks (like SKOOKUM, which is a slang term in the Pacific Northwest, derived from the Chinook language, and having multiple meanings: as an adjective it refers to something that is massive or powerful or reliable or simply really cool; as a noun, it can refer to an evil spirit or demon or a monster somewhat like Bigfoot or Sasquatch; it is pronounced /SKOO kum/).
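
The "impossible word" observations can be checked mechanically: a word is ruled out if its letters cannot each be assigned to a different die. Below is a rough Python sketch of that check. The dice faces in `NEW_DICE` are an assumption on my part (the set commonly listed online for the 1987 to ~2008 edition, not something spelled out in this post), and the function names are hypothetical. The check ignores board adjacency, so a True result only means the dice do not rule the word out.

```python
# ASSUMPTION: faces of the "New Boggle" dice as commonly listed online.
# "Q" stands for the "Qu" face.
NEW_DICE = [
    "AAEEGN", "ABBJOO", "ACHOPS", "AFFKPS",
    "AOOTTW", "CIMOTU", "DEILRX", "DELRVY",
    "DISTTY", "EEGHNW", "EEINSU", "EHRTVW",
    "EIOSST", "ELRTTY", "HIMNQU", "HLNNRZ",
]


def faces_needed(word):
    """Split a word into die faces; 'QU' becomes the single face 'Q'.

    Returns None if the word has a Q that is not followed by U.
    """
    text = word.upper().replace("QU", "*")
    if "Q" in text:
        return None
    return list(text.replace("*", "Q"))


def dice_can_spell(word, dice=NEW_DICE):
    """True if each face of the word can come from a different die."""
    faces = faces_needed(word)
    if faces is None or len(faces) > len(dice):
        return False
    # Try the most constrained faces first to keep the search small.
    faces.sort(key=lambda f: sum(f in d for d in dice))
    used = [False] * len(dice)

    def assign(i):
        if i == len(faces):
            return True
        for j, die in enumerate(dice):
            if not used[j] and faces[i] in die:
                used[j] = True
                if assign(i + 1):
                    return True
                used[j] = False  # backtrack and try another die
        return False

    return assign(0)


for w in ["FAKE", "JOB", "PINEAPPLE", "SKOOKUM", "TOAST"]:
    print(w, dice_can_spell(w))
```

Under the assumed dice, the first four words come back False and TOAST comes back True, which agrees with the examples above.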

So does this change in letter distribution have an effect on the game? To find out, I ran some simulated Boggle games, generating random boards with each set of dice and using a Boggle solver (written by GitHub user cespare) to determine the number of words in each board, the resulting Boggle score, and the longest word in each grid.

[The default word list used by the Boggle solver only contains words that are 15 letters long or shorter. While it's highly unlikely to find a random 4-by-4 Boggle board containing a 16-letter word, I decided to augment the word list to include 16-letter words, as well as the 17-letter words that could be made with the Qu cube (like QUATTUORDECILLION [which means 10^45 in the U.S. and 10^84 in Britain] and SESQUIPEDALIANISM).]
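
For anyone who wants to try this kind of experiment, here is a rough Python sketch of the simulation loop: roll the dice into a grid, find every word with a depth-first search, and average the three statistics. This is not cespare's solver, just a simple stand-in. The function names are made up for illustration, the dice and word list are passed in as parameters, and the scoring follows the standard Boggle rules (1 point for 3 or 4 letters, then 2, 3, 5, and 11 for 8 or more).

```python
import random


def boggle_score(word):
    """Standard Boggle score for one word, by length."""
    n = len(word)
    if n < 3:
        return 0
    if n <= 4:
        return 1
    return {5: 2, 6: 3, 7: 5}.get(n, 11)


def roll_board(dice, size=4, rng=random):
    """Shuffle the dice into cells and pick one random face per die."""
    order = list(dice)
    rng.shuffle(order)
    faces = [rng.choice(die) for die in order]
    return [faces[r * size:(r + 1) * size] for r in range(size)]


def solve(board, words, min_len=3):
    """Return the words traceable through adjacent cells, each used once."""
    size = len(board)
    prefixes = {w[:i] for w in words for i in range(1, len(w) + 1)}
    found = set()

    def walk(r, c, prefix, seen):
        face = board[r][c]
        prefix += "QU" if face == "Q" else face  # the Q face reads as "Qu"
        if prefix not in prefixes:
            return  # no word starts this way, so stop early
        if len(prefix) >= min_len and prefix in words:
            found.add(prefix)
        seen.add((r, c))
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if 0 <= nr < size and 0 <= nc < size and (nr, nc) not in seen:
                    walk(nr, nc, prefix, seen)
        seen.discard((r, c))

    for r in range(size):
        for c in range(size):
            walk(r, c, "", set())
    return found


def simulate(dice, words, games, seed=0):
    """Average word count, score, and longest-word length over random boards."""
    rng = random.Random(seed)
    totals = [0, 0, 0]
    for _ in range(games):
        found = solve(roll_board(dice, rng=rng), words)
        totals[0] += len(found)
        totals[1] += sum(boggle_score(w) for w in found)
        totals[2] += max((len(w) for w in found), default=0)
    return [t / games for t in totals]


# Fixed toy board and word list, just to show the solver at work.
board = [list("CATS"), list("ORXX"), list("XXXX"), list("XXXX")]
found = solve(board, {"CAT", "CATS", "CAR", "COAT", "COST"})
print(sorted(found), sum(boggle_score(w) for w in found))
# ['CAR', 'CAT', 'CATS', 'COAT'] 4
```

To reproduce an experiment like mine, `simulate` would be called with a list of 16 six-letter dice strings and a full word list loaded into a set. COST is not found on the toy board because O and S are not adjacent.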

Dice set          Average # of words   Average score   Average length of longest word
New Boggle        ~104                 ~150            6.8
Classic Boggle    ~93                  ~128            6.6

The results show that there are about 12% more words to be found in a New Boggle board. These results are from simulating 10,000 boards for each set of dice, so the numbers in the table may be off by a few percent.
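
The 12% figure is just the ratio of the two averages. As a quick check, here is the arithmetic in Python, reading the table values as roughly 104 versus 93 words and roughly 150 versus 128 points (the helper name `percent_more` is only for illustration):

```python
def percent_more(a, b):
    """How much larger a is than b, as a percentage of b."""
    return 100 * (a - b) / b


new_words, classic_words = 104, 93      # approximate averages from the table
new_score, classic_score = 150, 128     # approximate averages from the table

print(round(percent_more(new_words, classic_words), 1))   # 11.8
print(round(percent_more(new_score, classic_score), 1))   # 17.2
```

Since the inputs are rounded and come from a finite number of random boards, these percentages are approximate as well.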

To try to give a little more insight into the difference between these versions of the game, I ran simulations for a New Boggle board in which a randomly chosen cube from a corner of the board was removed. The corresponding results,

Dice set                          Average # of words   Average score   Average length of longest word
New Boggle w/o one corner cube

are really close to the classic Boggle results, suggesting that if you want to make your New Boggle game about as hard as a classic Boggle game, you can just remove a die from the corner of the board before hunkering down to find words.

Out of curiosity, I also ran simulations for 5-by-5 Big Boggle, using the requirement that words be at least four letters long (unlike the three-letter minimum in regular Boggle) and using the Big Boggle scoring system, which yielded these results:

Dice set          Average # of words   Average score   Average length of longest word
Big Boggle        ~190                 ~395            8.3

(For comparison with 4-by-4 Boggle, if you include the three-letter words that are in Big Boggle boards, the average number of words increases to about 260.)

Of course, these calculations only confirmed what I already knew: the older version of the game is harder and is the one for me. I bought a copy of classic Boggle from eBay. The dice are made of wood rather than plastic. The timer has sand in it and doesn't make any noise to tell me when time is up. Succinctly, I think it is skookum.

Sunday, September 15, 2013

The most difficult Bananagrams challenge I've encountered

Recently, I played a few rounds of Bananagrams. At the beginning of the last game, I flipped over my tiles and only had three vowels. As I continued to peel mostly consonants, I realized that the optimum strategy was probably to dump consonants until I obtained a more reasonable consonant-to-vowel ratio, but I wanted the challenge of trying to finish the game without dumping tiles. But by the end of the game, the situation had not improved: I had 23 consonants and 6 vowels. Furthermore, I also had a Q (with no U), an X, and a Z. I was nowhere close to finishing my grid by the time someone else won.

Here was the set of tiles that I had at the end:



I decided to save the tiles and try to work out a solution later. I spent some time working on this problem on two consecutive nights. The second night I found a solution that used all the letters but one N, but that seemed to be the best that was possible.

Finally, several days later, I found a true solution. It's possible to vary some of the peripheral words and get alternate solutions, but there is a core structure that I have not been able to alter without rendering the grid uncompletable.

This puzzle can be solved without using any two-letter words or any vowelless words. (I think that violating these constraints would make the puzzle too easy.)

I leave this as a challenge. I will post a solution at some point in the future.