If you’re like me, you *love* data.

Ok, so it’s likely that you aren’t like me, at least in the data-loving regard, but you have to admit that (1) data can be beautiful, and (2) data is how we understand the world. And I really mean you *have to* admit that; to do otherwise would be quite deluded. Or, euphemistically, *New Age-y*.

Unfortunately, a lot of data is presented in boring, confusing, and/or misleading ways. In a vain attempt to enable you, the reader, to make data pretty, or, at the least, understandable, I want to strongly encourage a movement away from the crappy graphs generated by Excel and OpenOffice and toward the fancy ones made by R. This transition is not simple and so, as with my Python introduction, I’m going to get you started and then feed you to the sharks. I’ll continue to post little tidbits as time goes on, but there are enough tutorials around the Webs already, so I will not do anything in depth. Plus, I hardly understand it myself.

At this point, you may be wondering what the letter “R” has to do with data or graphs. Well, R is essentially a programming language for statistics. And you can make pretty graphs with it. Let me demonstrate:

library(random)  # randomNumbers() comes from the add-on package "random"
plot(1:100, randomNumbers(min=0, max=10, n=100, col=1),
     col=1:100, pch=1:100, main="Random Numbers",
     ylab="Some Random Numbers", xlab="As a Function of X")

Yields the following graph:

*x*, *y*, and *z* are sets of 100 random numbers:

scatterplot3d(x,y,z)

Yields

*That* was simple. In fact, I could have made the previous plot with the command plot(1:100,y) and it would have looked exactly the same, except that the axes would be labeled differently. Of course, there is one problem with that 3d scatterplot: it’s awfully hard to figure out where 3d points are sitting on a 2d surface. I wonder if there’s a way to get around that in R…
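The post doesn’t say how *x*, *y*, and *z* were made, so here is a minimal sketch that assumes base R’s runif() (uniform random numbers) in place of whatever was used originally. The set.seed() call is only there so you get the same numbers every time, and the seed value is arbitrary. The 3d plot needs the add-on package scatterplot3d, so the sketch only attempts it if that package is installed.

```r
# Assumption: runif() stands in for however the original x, y, z were generated.
set.seed(42)                       # arbitrary seed, just for reproducibility
x = runif(100, min = 0, max = 10)  # 100 random numbers between 0 and 10
y = runif(100, min = 0, max = 10)
z = runif(100, min = 0, max = 10)

# The bare-bones 2d plot: position (1 to 100) against y
plot(1:100, y)

# The 3d version; skipped quietly if scatterplot3d is not installed
if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  scatterplot3d::scatterplot3d(x, y, z)
}
```

If the 3d plot doesn’t appear, the package is probably missing; install.packages("scatterplot3d") should fetch it from CRAN.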

Badass, right?

So, how do we get from not knowing what R is to making 3d plots? Sadly, the learning curve is a little steep unless you’re already familiar with using text to make your computer do stuff. Let’s get started.

First, if you are running Linux you may already have R installed. If not, go to Synaptic and find it, or join the Windows users in downloading it from CRAN. Then install it. As with Python, you can work with R in an interactive mode or by writing files for R to read and implement. We’ll stick with interactive mode for now.

In Windows, launch R from your start menu (it should be a folder called ‘R’). In Linux, just type ‘R’ in the console (it must be capitalized) followed by a vigorous smack of the Enter key. In either case, you’ll see something like this:

1. r = 'awesome'
2. x = 1:100
3. mean(x)
4. sd(x)
5. r
6. x
7. paste( x, r )
8. y = c(1,2,3,4,5)
9. y
10. rep( y, 20 )
11. plot( x, rep(y,20))
12. plot( x[11:90], rep(y,16))
13. summary(lm(rep(y,20)~x))

In statements **1**, **2**, and **8**, we *assigned values* to the variables *r*, *x*, and *y*. This means that when you have R use those variables (they’re called *objects*), R will return whatever values you made them contain. If you do something in R that isn’t an assignment, it’s going to give you some kind of output. If you assign *r* to the string 'awesome' (note: you can surround strings with " " or ' '), and then just type the expression in (**5**), R returns the values assigned. In statement **2**, we assigned *x* a range of numbers. In R, two numbers with a colon in between (e.g. 1:100) means a list of those two numbers with every number in between. So when you call *x* (statement **6** above) you should see all the values from 1 to 100.
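To see the difference between assigning and just asking, here is a small sketch reusing *r* and *x* from the statements above. The length() and head() calls are my additions, included only to peek at *x* without printing all 100 values.

```r
r = 'awesome'     # an assignment: R stores the value and prints nothing
x = 1:100         # the colon builds every whole number from 1 to 100

r                 # not an assignment, so R prints the stored value
length(x)         # how many values x holds
mean(x)           # the average of 1 to 100, which is 50.5
head(x, 5)        # just the first five values: 1 2 3 4 5
```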

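Next, look at the function c() (statement **8**). This basically says “stick the elements in between the parentheses into one list” (strictly speaking, R calls the result a *vector*). You can make lists of strings, numbers, and even combine other lists!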
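A quick sketch of what c() does with different inputs. The variable names *words*, *both*, and *mixed* are made up for illustration. Two things are worth noticing: combining the output of one c() with another gives a single flat sequence, and mixing numbers with strings quietly turns everything into strings.

```r
y = c(1, 2, 3, 4, 5)                # numbers
words = c('data', 'is', 'pretty')   # strings (made-up example values)

# A c() inside another c() gives one flat sequence,
# not a list-within-a-list.
both = c(y, c(6, 7))
both
length(both)                        # 7

# Mixing numbers and strings turns everything into strings.
mixed = c(1, 'two')
class(mixed)                        # "character"
```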

Statement **10** shows you how to make a list repeat itself. The syntax is rep( values , # ), where values can be a single item or a list, and # is the number of times you want it repeated. paste() (statement **7**) can be used to stick multiple items together, with an output that is a string (this can be useful for labeling graphs). plot(), as you’ve seen before, generates a pleasant, bare-bones graph of your data.
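Here is a short sketch of rep() and paste() on small inputs, so the output fits on one line. The strings and the name *title_text* are hypothetical, chosen only for illustration.

```r
y = c(1, 2, 3, 4, 5)

rep(y, 2)                  # 1 2 3 4 5 1 2 3 4 5
rep('na', 3)               # a single item works too: "na" "na" "na"
length(rep(y, 20))         # 5 values x 20 repeats = 100

# paste() joins its arguments with a space by default and returns strings
paste('Trial', 1:3)        # "Trial 1" "Trial 2" "Trial 3"

# which makes it handy for building a graph title
n = length(rep(y, 20))
title_text = paste('Plot of', n, 'points')
title_text                 # "Plot of 100 points"
```

A string built this way can be handed to plot() through its main argument, as in the first example in this post.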

In statement **12**, I showed that you can selectively choose any elements you want out of a list. Normally, *x* contains the values 1-100. If you only want the middle 80 values (i.e. not the first or last 10), you can have R return only those using the square-bracket notation ‘[ ]’. The numbers inside that bracket refer to the positions you want. So, if you said *x[1]*, you would return the first value in the list *x* (which is 1). Since *11:90* is a list of the numbers 11-90, the expression *x[11:90]* is going to return all items in *x* between the 11th and 90th positions.
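Here is the square-bracket indexing from statement 12 on its own, followed by two other ways of picking elements that the statements above don’t use. The name *middle* is hypothetical.

```r
x = 1:100

x[1]                    # the first value: 1
middle = x[11:90]       # positions 11 through 90
length(middle)          # 80
range(middle)           # smallest and largest values: 11 90

# Two other ways to pick elements (not used in the statements above):
x[c(1, 50, 100)]        # specific positions: 1 50 100
x[x > 95]               # only the values that pass a test: 96 97 98 99 100
```

Because *x* holds the numbers 1 to 100 in order, each value happens to equal its position. That coincidence is specific to this example.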

The last item shows a glimpse of the statistics that R can do (stats is the main reason people use R in the sciences). This expression is more complicated than the others; I put it there to show how short an expression in R can be while still giving you quite detailed and valuable information.
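To make that last statement less of a black box, here is a sketch that splits it into steps. The names *y20* and *fit* are my own. Read the ~ as “as a function of”: lm() fits a straight line through the points, and summary() reports on how well that line fits.

```r
x = 1:100
y = c(1, 2, 3, 4, 5)
y20 = rep(y, 20)          # hypothetical name for the repeated values

fit = lm(y20 ~ x)         # fit a straight line: y20 as a function of x
coef(fit)                 # the two numbers that define the line: intercept and slope
summary(fit)$r.squared    # fraction of the variation in y20 that the line explains

# The full report, same as the last statement in the list
summary(fit)
```

Storing the fit in an object first means you can pull out individual pieces, like the slope, without rereading the whole report.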

Alright, now dive in! As with Python, I’ll continue to post more useful things on using R, though I will refrain from doing a thorough tutorial. I’ll list some useful tutorials in a later post. Good luck!