a simple model of selection

Inspired by Dawkins’ METHINKS IT IS LIKE A WEASEL program (hereafter just weasel) described in his book “The Blind Watchmaker,” and wanting to practice my blossoming C++ skills, I decided to write my own version of weasel. It was successful enough, and I found the results interesting enough to warrant discussion. Download the program (Windows .exe file) so you can try it out for yourself (and you can also get the source code if you want). In this post I’ll discuss what the program does and why. In the next post I’ll talk a bit about the results of the program.

weasel_title

Weasel is programmed to provide a (very) simple model of natural selection. Natural selection is, essentially, what we call the fact that things that are better at reproducing reproduce more. Oversimplified? Perhaps, but the reality is that the concept of natural selection really is that simple, it is the just the ramifications of this fact that are relatively difficult to understand. So when you run weasel, you will provide a target phrase made up of letters and spaces (e.g. “methinks it is like a weasel”). This is the target of our selective process. After you have provided a target, weasel will then make a random phrase of the same length. This is your starting generation. Essentially, we want to use a process of selection to turn the garbled phrase into the target phrase.

Now how could we go about doing that? There are likely many ways, but the only two that are important for this discussion are the random option and the option that mimics natural selection. Indeed, Dawkins’ whole point with his discussion of this program was that natural selection is hugely powerful and efficient compared to random processes. The following discussion on randomness vs. selection  is, in large part, my paraphrasing of this section of Dawkins’ book.

the random approach

The random option: change the letters of the garbled phrase at random and see if they match the target phrase. This is the same idea as the monkeys-at-typewriters business. The random process is easy enough for a computer program to do, and is equivalent to typing letters at random until you manage to hit on the target sequence of letters. The phrase “methinks it is like a weasel” consists of 28 characters (including spaces). How difficult would it be to have a monkey type 28-character sequences until he happened onto the phrase “methinks it is like a weasel”? Each position can contain 1 of 27 possible characters (the 26 lower-case letters plus the space character). This means that there are 2728 possible phrases that are 28 characters in length(the vast majority being complete gibberish). In scientific notation, this number is about 1.2×1040. For those of you who aren’t familiar with scientific notation, here is that number in standard notation:

12,000,000,000,000,000,000,000,000,000,000,000,000,000

Yeah. The national debt of a trillion trillion USAs. That’s a huge number. So large, in fact, that it barely has any meaning. So what are the chances of that monkey randomly typing “methinks it is like a weasel”? Well, 1 divided by that giant number. So we can safely call it zero. This doesn’t mean it can’t or won’t happen (remember, every single time the monkey types something, whatever he typed had the same tiny chance of occurring, but it occurred all the same), it just means it’s incredibly unlikely. Even at the fast speed that a computer processor can randomly generate a phrase and test it against the target phrase, it would take something much larger than the human lifespan to get the desired result.

Clearly randomness would not be an efficient way to change from one phrase to another, much less one organism to another.

the selection approach

Our second option is selection. We are still going to make random changes, but we will add a second step. With our random process, all that mattered was that the phrase either was or was not the target. We didn’t care how close it was. In essence, we didn’t differentiate between the phrases “methinks it is like a weasee” (just one letter off) and “asdfalsdfjoijfdbjnqertbqwaxs” (all letters wrong). Now what if we had some sort of scoring system so that we could select the closest match after a randomization, and then start with that for the next randomization? This is, essentially, what selection is.

As a simple example, let’s take the word “hello”. There are 275 possible phrases of this length, only one of which is the word “hello”. Let’s start with something random, like “poker”. Same length, no letters in common. We want to turn “poker” into “hello” using a selective process. There are different ways to do this, but for our purposes we’ll use the one that mimics selection in nature. So our process will let the phrase “reproduce”, which in this case means that the phrase will make copies of itself. But we will also introduce errors into the offspring (the new copies) by randomly changing some number of letters.

Let’s say our phrases will have 5 offspring in each generation, with an average error rate of 1/5 per copy (so each new phrase will, on average, have one changed letter). “poker” would then give us something like “pokir”, “paker”, “xoker”, “pokes”, “poler”. Now we see if any of the offspring are a closer match to “hello”. Indeed, “poler” shares an “l” with “hello”. So now we select the phrase “poler” and do the same thing again, hoping that we might get another common letter. And so on.

This is what the weasel program does. You tell it the target phrase, and it randomly generates a starting phrase of the same length. You then tell it how many offspring you want per generation and how many errors (on average) per offspring there should be. Then the program will make the offspring, select the one closest to the target, have that one have offspring, and so on until it ends up at the target. The selected offspring that seed the next generation, along with generation number, are printed on your screen.

weasel_result

With the power of selection, you will see that you can obtain a long phrase in pretty short order, as long as you have a fair number of offspring and a low mutation rate. Try it out with “methinks it is like a weasel”, 15 offspring per generation, and 1 mutation per offspring. Then try more interesting things, like really high numbers of offspring, or higher mutation rates, etc. You will find that some parameters will cause you to never get to the target phrase (at least in your lifetime) and that others get you there extremely quickly. This actually reveals a lot about how selection works in real life, and so I’ll discuss these results in a later post.