A few years ago I wrote a post on making Notepad++ your default text editor, wherein I expressed my extreme love for this little piece of open source software. To date, I use it for just about everything and, after extensive use in a computational biology lab rotation, I figured I should share one of its most amazing features: Regular Expressions.
I’m not going to take the time to define Regular expressions (“REs” from now on) and will instead send you to Wikipedia. Just know that they are totally badass. In a nutshell, REs let you do really fancy search-and-replace in a text document. Perhaps that doesn’t excite you, but let me give some hypothetical situations in which you may find yourself:
- You have a file of contact information for everyone you know (say, 1000 people) and want to get just the email addresses so that you can spam everyone.
- You have a large FASTA file and want to pull out all of the organism names.
- You want to convert a file from one format to another.
- You want to combine multiple lines into a single line.
- You want to separate a line into multiple lines.
- Other pain-in-the-ass sounding stuff.
Sure, you could manually copy-paste all of those email addresses or organism names, and you could go through and hit the ENTER key to put things on separate lines. OR, you could write a few characters into NP++’s Find & Replace box. I think an example is the best way to make this work.
As a biologist, I’m rather fond of FASTA files. FASTA is simply a way to format DNA or protein sequence data so that people and programs can easily do stuff with that data. The format is: