This post discusses a computer program that you can download to try yourself (and get the source code if you want to make your own version).

At our huge family reunion (Mom’s side) earlier this summer, we were handed a wordfind that someone had generated somewhere on the Internets that contained the names of the family founders. While solving mine I noticed, as I’m sure everyone has, that any given wordfind contains words that are not in the list. Presumably, this is because the randomly assorted letters spell out an unplanned word by chance. Of course, the wordfind makers might also stick those in on purpose (for example, the family wordfind contained the website name multiple times) or purposely prevent some random words (profanity). Regardless, I began to wonder how often a word might appear in a wordfind just by chance. So I used the margins to scratch out a formula for the chance of finding a word of a certain length within a matrix of random letters.

Here’s what I got (I’ve lost the derivation, but I kept the answer in my wallet):

For a square matrix of random letters with dimensions *n* (meaning *n* by *n* letters) and a word of length *w*, the chance of finding that word is

**(4/26^{w})(12w^{2} + 10w + 5nw - 4n - 9)**

This includes searching forwards, backwards, and diagonals (i.e. any starting letter can spell the word in eight possible directions).

Because deriving that thing was so tedious (and error-prone), I decided to test my C++ skills and write a program that would actually create wordfinds and then see how frequently words showed up. That way I could compare the results of the program against the results of the derivation. The program, called wordFind Statistics, gives options to choose the dimensions of your wordfinds, how many you want to search, what word you want to search for, and whether or not to allow diagonal or reverse directions.

So, let’s try it out! How about a wordfind of dimensions 20 by 20, looking for the word “zomg” (four letters)? For the formula, that gives us *n*=20 and *w*=4. Plugging these in, we get a chance of 4.75×10^{-3}, meaning that, on average, we will find the word “zomg” in about half a percent of 20×20 wordfinds.
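The arithmetic behind that figure can be sketched in a few lines of C++. This simply evaluates the formula as written above; the function name `formulaChance` is made up for illustration and is not part of the wordFind Statistics program.

```cpp
#include <cmath>

// Evaluates the formula from the post exactly as written:
//   (4 / 26^w) * (12w^2 + 10w + 5nw - 4n - 9)
// n: side length of the square grid, w: length of the word.
// (Hypothetical helper, named here only for illustration.)
double formulaChance(int n, int w) {
    const double polynomial =
        12.0 * w * w + 10.0 * w + 5.0 * n * w - 4.0 * n - 9.0;
    return 4.0 / std::pow(26.0, w) * polynomial;
}
```

For *n*=20 and *w*=4 the polynomial part comes to 543, so `formulaChance(20, 4)` is 2172/456976, or about 4.75×10^{-3}.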

Now let’s check this by simulation using the wordFind program.

Height: 20

Width: 20

Word to Find: zomg *[Note: any 4-letter word will do]*

Number of wordfinds: 100000 *[Note: need a big number, since the likelihood is low]*

Diagonals and Reverse allowed (as in the formula)
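For anyone who wants to see roughly what such a simulation involves, here is a minimal sketch. It is not the wordFind Statistics source: every name in it (`randomGrid`, `containsWord`, `estimateChance`) is hypothetical, and it assumes each letter is drawn independently and uniformly from a to z and that a wordfind counts as a hit if the word appears at least once.

```cpp
#include <random>
#include <string>
#include <vector>

using Grid = std::vector<std::string>;

// Fill a height x width grid with independent, uniformly random lowercase letters.
Grid randomGrid(int height, int width, std::mt19937& rng) {
    std::uniform_int_distribution<int> letter(0, 25);
    Grid grid(height, std::string(width, 'a'));
    for (auto& row : grid)
        for (auto& cell : row)
            cell = static_cast<char>('a' + letter(rng));
    return grid;
}

// True if `word` can be read starting at some cell in one of the allowed directions.
bool containsWord(const Grid& grid, const std::string& word,
                  bool allowDiagonal, bool allowReverse) {
    if (word.empty() || grid.empty()) return false;
    const int height = static_cast<int>(grid.size());
    const int width = static_cast<int>(grid[0].size());
    const int len = static_cast<int>(word.size());

    struct Direction { int dRow, dCol; bool diagonal, reverse; };
    static const Direction directions[] = {
        {0, 1, false, false},  {1, 0, false, false},   // right, down
        {1, 1, true, false},   {-1, 1, true, false},   // down-right, up-right
        {0, -1, false, true},  {-1, 0, false, true},   // left, up
        {-1, -1, true, true},  {1, -1, true, true},    // up-left, down-left
    };

    for (const Direction& d : directions) {
        if ((d.diagonal && !allowDiagonal) || (d.reverse && !allowReverse)) continue;
        for (int row = 0; row < height; ++row) {
            for (int col = 0; col < width; ++col) {
                // Skip starting cells from which the word would run off the grid.
                const int endRow = row + d.dRow * (len - 1);
                const int endCol = col + d.dCol * (len - 1);
                if (endRow < 0 || endRow >= height || endCol < 0 || endCol >= width)
                    continue;
                int i = 0;
                while (i < len && grid[row + d.dRow * i][col + d.dCol * i] == word[i])
                    ++i;
                if (i == len) return true;
            }
        }
    }
    return false;
}

// Fraction of `trials` random grids that contain `word` at least once.
double estimateChance(int height, int width, const std::string& word, int trials,
                      bool allowDiagonal, bool allowReverse, unsigned seed) {
    std::mt19937 rng(seed);
    int hits = 0;
    for (int t = 0; t < trials; ++t)
        if (containsWord(randomGrid(height, width, rng), word,
                         allowDiagonal, allowReverse))
            ++hits;
    return static_cast<double>(hits) / trials;
}
```

With the settings listed above, the call would look like `estimateChance(20, 20, "zomg", 100000, true, true, 1)`, where the last argument is an arbitrary seed. Whether the original program counts every occurrence or only whether the word shows up at all is not stated, so treat the "at least once" rule here as a guess.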

I repeated this 6 times and got an average of 5.5×10^{-3}, which is of the same order of magnitude as the formula’s 4.75×10^{-3} but about 16% higher. So, either my formula is wrong or there is an error in my program (or both!). My bet is on the formula, though I’m astonished it was this close. I’ll have to redo the derivation. Any thoughts?
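One independent cross-check, offered as a sketch and not as a reconstruction of the lost derivation: count the ways a word of length *w* can be laid into an *n* by *n* grid, then divide by 26^{w}. That gives the expected number of occurrences per wordfind, which for rare words is close to (and never below) the chance of at least one occurrence. It assumes independent uniform letters, a word of at least two letters, and a word that is not a palindrome. The names `placements` and `expectedOccurrences` are made up for illustration.

```cpp
#include <cmath>

// Number of ways to lay a word of length w into an n x n grid,
// counting all eight reading directions. Assumes w >= 2.
// (Hypothetical helper, named here only for illustration.)
long long placements(int n, int w) {
    if (w < 2 || w > n) return 0;
    const long long starts = n - w + 1;                // valid start offsets along one line
    const long long straight = 2LL * n * starts;       // rows plus columns, read forwards
    const long long diagonal = 2LL * starts * starts;  // both diagonal orientations, read forwards
    return 2 * (straight + diagonal);                  // doubled to include reversed reading
}

// Expected number of times a fixed w-letter word appears in a random n x n grid.
double expectedOccurrences(int n, int w) {
    return static_cast<double>(placements(n, w)) / std::pow(26.0, w);
}
```

For *n*=20 and *w*=4 this counts 2516 placements, so `expectedOccurrences(20, 4)` is 2516/456976, or about 5.5×10^{-3}, which is in line with the simulated average. The count also grows with *n*^{2}, whereas the formula above is only linear in *n*, which may be a useful place to start when redoing the derivation.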