Allele frequency problem in “Looper”

Time travel movies are always full of bad physics and contradictory logic, though certainly some do it better than others. I usually just try not to think about them too hard so that I can take in the entertainment value. Looper is no exception, but the most glaring error in the movie’s science was not in the physics; it was in the biology.

The beginning of the movie tells of a new mutation, the “TK mutation”, that has crept into the population to give people weak telekinetic powers. The idea of a gene, and more importantly a mutation in an existing gene, somehow allowing telekinesis is of course absurd, but that isn’t what I’m talking about.

I’m talking about the allele frequency. The movie takes place in the 2040s. Only thirty years from now. And, at that time, the movie says that 10% of the human population has the TK mutation. This frequency is fantastically improbable.
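
To make "10% of the population" concrete as an allele frequency, here is a rough sketch. It assumes Hardy-Weinberg proportions and a single gene with two alleles, neither of which the movie states; the function name and the dominant/recessive switch are purely illustrative.

```python
import math

def allele_frequency(carrier_fraction, dominant=True):
    """Allele frequency p implied by the fraction of people showing the trait.

    Assumes Hardy-Weinberg proportions (an assumption, not from the movie).
    """
    if dominant:
        # One copy is enough: 1 - (1 - p)^2 = carrier_fraction
        return 1.0 - math.sqrt(1.0 - carrier_fraction)
    # Recessive: two copies are needed: p^2 = carrier_fraction
    return math.sqrt(carrier_fraction)

p_dominant = allele_frequency(0.10, dominant=True)
p_recessive = allele_frequency(0.10, dominant=False)
print(f"dominant:  p = {p_dominant:.3f}")
print(f"recessive: p = {p_recessive:.3f}")
```

Under these assumptions, a dominant mutation would need roughly 1 in 20 of all copies of the gene in the population, and a recessive one close to 1 in 3.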

Why? Well, right now 0% of the human population has this mutation. The thirty years between now and then have to bring that to 10%. That sounds impossible – let’s see if my suspicion is correct.

Cloning trick: ligation of multiple inserts

[2013.02.26 Edit: A number of people are finding this through Google searches. I don’t have an updated post on the topic, but if you’re trying to assemble multiple DNA fragments then I suggest looking into Gibson Assembly. NEB sells* a dead-simple mastermix, which is a bit pricey per reaction (I just make my reactions half the size) but comes out cheap when you take into account the cost of labor (so long as your PI values your time…).]

I’ve spent the last couple months building a plasmid library, and in the process I thought of a trick. Ligations, perhaps the worst part of cloning, are notoriously finicky reactions. The goal is to take several pieces of linear DNA, where the ends of the pieces can only connect in a certain way, and then use an enzyme (T4 Ligase) to sew them all together into one piece (in my case, a circular plasmid).

Figure 1. Ligase (2HVQ.pdb) rendered in PyMOL.

I needed to insert three fragments at once into a single backbone. In my ignorance (from my lack of experience) I thought ligating four fragments should work just as well as two, so I just threw them all together and ran the reaction. The result was a mess, and when I tested 40 different clones afterwards not a single one was correct. So I started adding them one piece at a time which, obviously, was going to take three times as long.

Average gene length in prokaryotes (part 1)

One of my side research projects involves processing large numbers of genomes (specifically, all fully-sequenced prokaryotic genomes). Since I’m playing with the data anyway, sometimes I end up with random questions that can be answered with what I already have on hand. One such question is this: “What is the average length of a prokaryotic gene?” We could figure this out fairly directly, but it’s always best to have a prediction in hand first. After all, if we have no idea what kind of values to expect, how can we trust the accuracy of a more direct (and experimental) method?

So what do we know? There are 4 possible bases (A, G, C, and T) and three such bases make up a codon. This means that each position of the codon can be any of 4 bases, so there are 4*4*4 = 64 possible codons. Of these, 3 are stop codons (meaning that they mark the end of a gene). We generally think of there being only 1 start codon (ATG, coding for methionine), but it turns out that prokaryotes often use other codons instead. Plus, if there are multiple ATGs in the same stretch of DNA, how do we know which is the actual start?
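
The codon counts above are easy to check, and they give a naive baseline for how far apart stop codons should be. This sketch assumes every codon is equally likely and independent of its neighbors, which real genomes with skewed base composition will not satisfy, so treat the number as a rough expectation only.

```python
from itertools import product

BASES = "ACGT"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code

# Enumerate every three-base combination.
codons = ["".join(c) for c in product(BASES, repeat=3)]
n_codons = len(codons)
n_stops = sum(1 for c in codons if c in STOP_CODONS)

# Assumption: each codon is drawn uniformly and independently.
p_stop = n_stops / n_codons

# Mean of a geometric distribution: codons read up to and including the first stop.
expected_codons_to_stop = 1 / p_stop

print(n_codons, n_stops)
print(f"expected codons until a stop: {expected_codons_to_stop:.1f}")
```

If the uniform assumption held, random sequence would hit a stop about every 21 codons, so stop-free stretches much longer than that are unlikely to occur by chance.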

For example, take the sequence:

(Sequence 1)  ATG AGT TGA ATG GTA TTG TAA TTT AGA TAA

This sequence has two potential start sites (in bold) and two stop codons (in bold italics). We can unambiguously choose the first stop codon, but we have no way of knowing without more evidence which start codon is the real one.

To get around this, let’s take a conservative approach in calling sequences a “gene”. Instead of anything beginning with a start codon and ending with a stop, let’s take the entire genome and blast it to bits by cutting at every stop codon.
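
A minimal sketch of the "cut at every stop codon" idea, assuming a single reading frame on one strand (a real genome would need all six frames). The function name `split_at_stops` and the toy sequence are made up for illustration and are not from my actual scripts.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def split_at_stops(seq, frame=0):
    """Cut seq into stop-free fragments, reading codons in one frame."""
    fragments, current = [], []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(frame, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in STOP_CODONS:
            if current:
                fragments.append("".join(current))
            current = []
        else:
            current.append(codon)
    if current:
        fragments.append("".join(current))
    return fragments

# Hypothetical toy sequence: ATG AAA TAA GGG CCC TGA TTT
toy = "ATGAAATAAGGGCCCTGATTT"
fragments = split_at_stops(toy)
print(fragments)
print([len(f) // 3 for f in fragments])  # fragment lengths in codons
```

Each fragment is a stretch with no stop codon in it, whether or not it contains a start, which is what makes the approach conservative.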

Comp Bio is complicated

I finished up my first lab rotation two Fridays ago, here at UT Southwestern. It was a pleasant few months with an interesting project, consisting mostly of staring at a computer screen and writing Python scripts, running BLAST searches, and so on. To summarize, but leaving things vague (both for most-people-don’t-care reasons and the-data-is-unpublished reasons), the project was this:

There are currently a crap-ton (“crap-ton” is a standard scientific prefix) of bacterial and archaeal genomes published and available on NCBI’s servers. Archaea, like bacteria, are single-celled prokaryotic organisms. However, they differ from bacteria genomically (and therefore metabolically) in many ways. Some archaeal properties are like those in eukaryotes (like us!), while others are like those in bacteria. So one of the huge unanswered questions in evolution is: how are bacteria, archaea, and eukaryotes related to each other? Or, how would we make a tree of life relating these three domains?
