Python: shared birthdays

A few days ago I found myself having a vague recollection of a statistics problem presented at some unknown level in my education. All I could remember was that it had to do with having a room full of people and the probability that any two people in that room would have the same birthday. I remembered the point, which was that it is much more likely than you might think, but I was fuzzy on the details.

After trying to define the problem and find an answer mathematically, I remembered that I suck at statistical reasoning about as much as the average American. So I decided to model the problem with a short Python script and find the answer that way.

The problem: There are n people (say, at a party) drawn randomly from a population in which a birthday is equally likely to fall on any day of the year (which is probably not true of real populations). What is the probability that at least two people in the sample share a birthday?
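For reference, the uniform version of this problem has a simple closed form that any simulation can be checked against. The chance that all n birthdays are distinct is (365/365) × (364/365) × … × ((365 − n + 1)/365), and the answer is one minus that. Here is a minimal sketch of that calculation; the function name `p_shared_birthday` and the `days` parameter are my own choices for illustration, and leap days are ignored.

```python
def p_shared_birthday(n, days=365):
    """Exact probability that at least two of n people share a birthday,
    assuming every day of the year is equally likely."""
    if n > days:
        # More people than days: a shared birthday is guaranteed.
        return 1.0
    p_all_distinct = 1.0
    for k in range(n):
        # The (k+1)-th person must avoid the k birthdays already taken.
        p_all_distinct *= (days - k) / days
    return 1.0 - p_all_distinct


if __name__ == "__main__":
    for n in (10, 23, 50):
        print(n, round(p_shared_birthday(n), 3))
```

With these assumptions, the probability first crosses 50% at n = 23, which is the "much more likely than you might think" part.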

To put this thing together, I figure we need three things:

  1. The ability to generate random numbers (provided by Python’s random module);
  2. An object representing each person;
  3. A party object full of those people.

Then we can add things like the ability to choose how many people we want at the party and how many parties to have, as well as some output for making plots!

First, the Person object. All each person needs is a birthday:

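import random

# Seed from system entropy (the random module also does this on import).
random.seed()

class Person:
    """A party guest; all a person needs is a birthday."""
    def __init__(self):
        # Day of the year, 1 through 365 inclusive (leap days ignored).
        self.birthday = random.randint(1, 365)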
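
The rest of the plan (a party full of people, plus a way to throw many parties) could be sketched roughly as follows. This is not the script from the full post; the names `Party`, `has_shared_birthday`, and `estimate_probability`, along with the `parties` and `seed` parameters, are hypothetical and only meant to illustrate the approach.

```python
import random


class Person:
    def __init__(self):
        # Day of the year, 1 through 365 inclusive (leap days ignored).
        self.birthday = random.randint(1, 365)


class Party:
    """Hypothetical container for n randomly generated people."""

    def __init__(self, n):
        self.people = [Person() for _ in range(n)]

    def has_shared_birthday(self):
        birthdays = [person.birthday for person in self.people]
        # A set drops duplicates, so a shorter set means a repeat occurred.
        return len(set(birthdays)) < len(birthdays)


def estimate_probability(n, parties=10000, seed=None):
    """Fraction of simulated parties of size n with a shared birthday."""
    if seed is not None:
        random.seed(seed)  # optional, for reproducible runs
    hits = sum(Party(n).has_shared_birthday() for _ in range(parties))
    return hits / parties


if __name__ == "__main__":
    for n in (10, 23, 50):
        print(n, estimate_probability(n))
```

Throwing more parties tightens the estimate at the cost of run time; the printed values will wobble from run to run unless a seed is supplied.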

Filed under computers/software, science

cloning trick: ligation of multiple inserts

I’ve spent the last couple months building a plasmid library, and in the process I thought of a trick. Ligations, perhaps the worst part of cloning, are notoriously finicky reactions. The goal is to take several pieces of linear DNA, where the ends of the pieces can only connect in a certain way, and then use an enzyme (T4 Ligase) to sew them all together into one piece (in my case, a circular plasmid).

Figure 1. Ligase (2HVQ.pdb) rendered in PyMOL.

I needed to insert three fragments at once into a single backbone. Out of inexperience, I assumed that ligating four fragments would work just as well as two, so I just threw them all together and ran the reaction. The result was a mess: when I tested 40 different clones afterwards, not a single one was correct. So I started adding the inserts one at a time, which was obviously going to take three times as long.

Filed under HowTo, science

secrecy and biological research

It is becoming increasingly clear to me that my ideal picture of “doing science” is following the fate of all ideals: death at the hands of reality.

While I was working away at WashU, preparing for graduate school, I imagined myself as a grad student. In that imagination land I was working my ass off, learning all kinds of things, and sharing every bit of that with others via this platform. My work would be totally open. In reality, I am doing only those first two things.

Every couple of days I have an awesome research experience or an interesting new idea and come home planning to write about it. Then I start thinking about how I can present the experience while maintaining the proper level of censorship. Perhaps unsurprisingly, my motivation always quickly evaporates. You may be wondering: why is there any censorship required at all?

Filed under life, science

Mindforge Tutorial: Cleartype Tuner

As part of the continuing closet-cleaning series: Windows XP text is ugly and induces eye-pain and headaches. Here’s what you can do about it:

Filed under computers/software, HowTo

Mindforge Tutorial: Try out Linux using Wubi

Still cleaning out the closet… I made this video tutorial a couple years ago for my brother’s and my short-lived computer company. It’s a little outdated, but (probably) still accurate.

Filed under computers/software, HowTo

Mindforge Tutorial: Disk Images

If your first thought, when reading the title of this post, was “What in the hell is a disk image?”, you probably aren’t alone. But before you decide that you don’t care enough to read on, let me quickly tell you what the big deal is so that your decision to stop reading will be an informed one!

A disk image is a file that holds a complete virtual copy of a real disk (a virtual CD, for example).

That’s it. This is cool because you can make backup copies of your CDs/DVDs (though some have protections that may make this difficult), video games, operating systems, etc etc.

On top of that, you can use virtual CD-drives to play your virtual disks! This has a few advantages: [1] your computer reads virtual disks faster than physical ones, and [2] you won’t ever have to worry about carrying CDs around. Of course, CDs are going the way of the dodo, so this post may already be useless.

In any event, my middle brother and I once started a custom-PC company called Mindforge Technologies. I made a screencast tutorial for the company that shows how to use disk images, which is the YouTube video above. Enjoy!

Filed under computers/software, HowTo

LabTeX

Lab notebooks are the linchpin of any scientific endeavor, since they serve as proof for everything that an investigator has done (and as a personal reference for long-forgotten protocols). The standard is to use a bound notebook with handwritten (in pen!) notes, the idea being that these are more difficult to fake and easier to organize than, say, loose-leaf paper with pencil scribblings.

However, current biological research is done more and more by computer, and a lot of this stuff does not translate well to a hard-bound notebook. For instance, I’m regularly taking thousands of microscopy images (in a single day) for my projects, and printing all of these off would be tremendously stupid. The value of spreadsheets, scripts, plots, and all kinds of computer-generated data is in the fact that they are digital, so why try to convert them to an outdated medium?

Filed under computers/software, LaTeX, science

Make Notepad++ your default editor (Windows7)

I previously showed you (without a screencast) how to make NP++ default in XP. Of course, people have successfully done this for W7/Vista as well, but the various tutorials I saw were all a huge pain in the ass. Except for one: a comment on a more-complicated method. I made the following short screencast to demonstrate that method. This one has the advantages of being easy to reverse, easy to do, and not in any way intimidating!

Filed under computers/software

Python: Monty Hall modeling

You’ve all heard this classic statistics problem, based on an old game show:

A contestant is shown 3 doors. Only one of the three hides something of value to the contestant (perhaps a new car), while the other two hide nothing. The contestant chooses one door, but that door remains closed. The host then opens one of the other two doors, and this door is always a losing door. At this point, the contestant may either open the originally chosen door or switch to and open the last remaining door.

So why is this interesting? It turns out that the way to maximize your chances of winning is to always switch, which wins 2/3 (about 67%) of the time. It also turns out that this is totally non-intuitive: most people think that, even if the contestant always switches, the chances of winning are at best 50%. If you haven’t heard the solution to this problem before, you should think it through and decide what you expect the chances of winning to be under two strategies. After the contestant chooses a door and is shown that one of the other two is a losing door, either [1] the contestant always switches to the remaining door, or [2] the contestant never switches. After the jump, I’ll explain this intuitively and then show a Python script to simulate this problem.
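
In the meantime, here is a minimal sketch of how the two strategies could be compared by simulation. It is not the script from the full post; the function names `play` and `win_rate` and the `trials` and `seed` parameters are hypothetical. It assumes the host picks at random when both unchosen doors are losers.

```python
import random


def play(switch):
    """Play one round with 3 doors; return True if the contestant wins."""
    doors = [0, 1, 2]
    prize = random.choice(doors)
    choice = random.choice(doors)
    # The host opens a door that is neither the contestant's pick nor the prize.
    opened = random.choice([d for d in doors if d != choice and d != prize])
    if switch:
        # Exactly one door is left that is neither chosen nor opened.
        choice = next(d for d in doors if d != choice and d != opened)
    return choice == prize


def win_rate(switch, trials=100000, seed=None):
    """Fraction of rounds won when always switching (or never switching)."""
    if seed is not None:
        random.seed(seed)  # optional, for reproducible runs
    wins = sum(play(switch) for _ in range(trials))
    return wins / trials


if __name__ == "__main__":
    print("always switch:", win_rate(True))
    print("never switch: ", win_rate(False))
```

With enough trials, the always-switch rate should settle near 2/3 and the never-switch rate near 1/3.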

Filed under computers/software, science

useful bioinformatics resources

I’ll try to keep this page updated with the kinds of things that I’m familiar with and find the most useful. The resources in this list will be used extensively in tutorials on this site.

General

Genomes:

Online Tools:

Filed under computers/software, HowTo, science