I’ve been perusing a pretty good book by Michael McRoberts called “Beginning Arduino”, and after putting together one of the first projects I decided to have fun with it and write some more interesting code than what was provided. The original project used a series of three LEDs that turn on in sequence like a stop light, plus a button that someone can press to make the light change to red so that another LED, representing the pedestrian walk sign, can turn on. This wasn’t very interesting to me, so I turned it into a reaction game instead, using the same circuit. See the video immediately below, and the circuit diagram (made using Fritzing) and code below the fold.

# programming

# The Birthday “Problem” in Python

A few days ago I found myself having a vague recollection of an interesting statistics problem. All I could remember was that it had to do with having a room full of people and the probability that any two people in that room would have the same birthday. I remembered the point, which was that it is much more likely than you might think, but I was fuzzy on the details.

After trying to define the problem and find an answer mathematically, I remembered that I suck at statistical reasoning about as much as the average person. So I decided to model the problem with a short Python script and find the answer that way.

Sure, I could’ve looked it up, but where’s the fun in that?

**The problem:** There are *n* people (say, at a party) drawn randomly from a population in which the chance of having a birthday on any given day is equal to the chance of having it on any other day (which is probably not true of real populations). What is the probability that at least two people in the sample share a birthday?

To put this thing together, I figure we need three things:

- The ability to generate random numbers (provided by Python’s random module);
- An object representing each person;
- A party object full of those people.

Then we can add things like the ability to choose how many people we want at the party and how many parties to have, as well as some output for making plots!

First, the Person object. All each person needs is a birthday:

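    import random

    random.seed()  # seed the generator from the system clock / OS entropy


    class Person:
        def __init__(self):
            # Birthdays are days of the year, 1 through 365 (no leap years)
            self.birthday = random.randint(1, 365)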
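Here is a minimal sketch of how the rest of the plan above might look: a party object full of people, and a loop over many parties. The names `Party`, `has_shared_birthday`, `estimate_shared_probability`, and `exact_shared_probability` are my own assumptions for illustration, not the code from the full post, and `Person` is restated (taking an explicit random generator) so that the block runs on its own. The closed-form function is only there as a sanity check on the simulation, under the same uniform-birthday assumption.

```python
import random


class Person:
    """A party guest; the only attribute is a birthday (day 1..365)."""

    def __init__(self, rng):
        self.birthday = rng.randint(1, 365)


class Party:
    """A group of n randomly generated people."""

    def __init__(self, n, rng):
        self.people = [Person(rng) for _ in range(n)]

    def has_shared_birthday(self):
        birthdays = [p.birthday for p in self.people]
        # A set drops duplicates, so a shorter set means at least one match.
        return len(set(birthdays)) < len(birthdays)


def estimate_shared_probability(n, parties=10000, seed=None):
    """Fraction of simulated parties of size n with a shared birthday."""
    rng = random.Random(seed)
    hits = sum(Party(n, rng).has_shared_birthday() for _ in range(parties))
    return hits / parties


def exact_shared_probability(n):
    """Closed-form answer, assuming all 365 days are equally likely."""
    p_all_different = 1.0
    for k in range(n):
        p_all_different *= (365 - k) / 365
    return 1.0 - p_all_different


if __name__ == "__main__":
    for n in (10, 23, 50):
        simulated = estimate_shared_probability(n, seed=1)
        print(n, simulated, round(exact_shared_probability(n), 3))
```

With enough parties, the simulated fraction should settle close to the exact value; for 23 people both come out at roughly one half.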

# Python: Monty Hall modeling

You’ve all heard this classic statistics problem, based on an old game show:

A contestant is shown 3 doors. Only one of those three doors hides something of value to the contestant (perhaps a new car), while the other two hide nothing. The contestant chooses one door, but that door remains closed. The host then opens one of the two other doors, and this door is *always* a losing door. At this point, the contestant may either open the originally chosen door, or switch to and open the last remaining door.

So why is this interesting? It turns out that the way to maximize your chances of winning is to *always switch*, which wins 2/3 of the time (about 67%). It also turns out that this is totally non-intuitive, and that most people think that, if the contestant always switches, the chances of winning are at best 50%. If you haven’t heard the solution to this problem before, you should think through it and see what you expect the chances of winning to be under the two conditions: after the contestant chooses a door, and is subsequently shown that one of the other two is a losing door, [1] the contestant *always* switches to the remaining door, or [2] the contestant *never* switches. After the jump, I’ll explain this intuitively and then show a Python script to simulate this problem.
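In the meantime, here is a minimal sketch of what such a simulation could look like. It is not the script from the full post; the function names `play_round` and `win_rate` are assumptions made up for illustration. It also assumes that when the host has two losing doors to choose from, he picks one of them at random.

```python
import random


def play_round(switch, rng):
    """Play one game; return True if the contestant wins the prize."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    first_pick = rng.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the prize.
    opened = rng.choice([d for d in doors if d != first_pick and d != prize])
    if switch:
        # Exactly one door is left that is neither picked nor opened.
        final_pick = next(d for d in doors if d != first_pick and d != opened)
    else:
        final_pick = first_pick
    return final_pick == prize


def win_rate(switch, rounds=100000, seed=None):
    """Fraction of rounds won when always (or never) switching."""
    rng = random.Random(seed)
    wins = sum(play_round(switch, rng) for _ in range(rounds))
    return wins / rounds


if __name__ == "__main__":
    print("always switch:", win_rate(True, seed=0))
    print("never switch: ", win_rate(False, seed=0))
```

Over many rounds, the always-switch rate should hover near 2/3 and the never-switch rate near 1/3.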

# Python: Clean up and translate nucleotide sequences

*[If you are more familiar with biology than with Python or computer programming, I highly recommend this book.]*

*[Note: A lot of you are finding this post through Google searches. Let me know in the comments if you found it helpful and, if not, what it was you were looking for!]*

Some simple, hopefully useful, and totally non-optimized functions for working with nucleotide sequence data (note that there are many more tools as part of the biopython distribution, if you’re interested in learning the library):

First, for cleaning up a sequence (preferably in FASTA format):

    def clean_sequence(sequence):
        """Given a sequence string, return a crap-free, standardized DNA version."""
        s = sequence.replace('\r', '').split('\n')  # separate each line
        if s[0].startswith('>'):  # also safe when the first line is empty
            s = s[1:]  # remove defline
        s = ''.join(s)  # make one long string
        s = s.replace(' ', '').replace('\t', '')  # remove spaces and tabs
        return s.upper().replace('U', 'T')  # uppercase; convert RNA (U) to DNA (T)

Then, a function to let you know if there are characters in your sequence that shouldn’t be:

    def report_bad_chars(sequence):
        """Given a string 'sequence', print and return a dictionary of any non-AGCT characters."""
        bad_chars = {}
        for l in sequence:
            if l not in 'AGCT':
                bad_chars[l] = bad_chars.get(l, 0) + 1  # tally each unexpected character
        if bad_chars:
            print(bad_chars)
        return bad_chars
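If you would rather get the counts back as a value and skip the printing, the same tally can be sketched with `collections.Counter` from the standard library. The name `count_bad_chars` and its `allowed` parameter are my own assumptions for illustration, not part of the original functions.

```python
from collections import Counter


def count_bad_chars(sequence, allowed="AGCT"):
    """Return a dict mapping each character not in `allowed` to its count."""
    return dict(Counter(ch for ch in sequence if ch not in allowed))


if __name__ == "__main__":
    # Two ambiguous bases (N) and one purine code (R) in an otherwise clean sequence
    print(count_bad_chars("ATGNNCGTRA"))  # {'N': 2, 'R': 1}
```

An empty dictionary means the sequence contains nothing but A, G, C, and T.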

After the jump, functions for translation, calculating amino acid and nucleotide frequencies, and making random DNA sequences.

# More Puzzling

In a previous post, I discussed my attempt to write a program to solve a puzzle. I never updated that post because, well, I ran the program all night and it didn’t find the solution!

I had made up a fake puzzle that I knew had a solution for testing, and the program could solve it in 15 minutes. But it couldn’t solve the one I had recorded for the real puzzle. I figured (and hoped) that I had simply recorded the pieces wrong, so to check, I re-recorded them and tried again. And it worked! Here’s how:

# A simple model of selection

Inspired by Dawkins’ METHINKS IT IS LIKE A WEASEL program (hereafter just *weasel*) described in his book “The Blind Watchmaker,” and wanting to practice my blossoming C++ skills, I decided to write my own version of *weasel*. It was successful enough, and I found the results interesting enough to warrant discussion. Download the program (Windows .exe file) so you can try it out for yourself (and you can also get the source code if you want). In this post I’ll discuss what the program does and why. In the next post I’ll talk a bit about the results of the program.