hipsteR: re-educating people who learned R before it was cool
I was an early adopter of R, having first
learned S (yay!) and then S-plus (yuck!). But at times my knowledge of
R seems stuck in 2001. I keep finding out about “new” R
functions (like replicate, which was new in 2003).
This is a tutorial for people like me, or people who were taught by people like me.
Switch to knitr
A number of Sweave annoyances have been eliminated, but most importantly you can use knitr with R Markdown or with AsciiDoc for writing simple reports. The markup is much simpler than LaTeX, and you don’t have to worry about page breaks.
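To give a feel for how little markup is involved, here is a minimal sketch that writes a tiny R Markdown file and knits it to Markdown. It assumes the knitr package is installed; the file name tiny_report.Rmd and its contents are made up for illustration.

```r
library(knitr)

# Three backticks, built programmatically so the chunk delimiters are easy to read
fence <- strrep("`", 3)

# A minimal R Markdown document: a heading, a sentence, and one code chunk
rmd_lines <- c(
  "# A tiny report",
  "",
  "The sum below is computed when the document is knit.",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence
)

# Hypothetical file names in a temporary directory
rmd_file <- file.path(tempdir(), "tiny_report.Rmd")
md_file  <- file.path(tempdir(), "tiny_report.md")
writeLines(rmd_lines, rmd_file)

# knit() runs the chunk and writes Markdown with the results woven in
knit(rmd_file, output = md_file, quiet = TRUE)
report <- readLines(md_file)
```

In practice you would write the .Rmd file in an editor rather than with writeLines(); the point is just that the whole document is a few lines of plain text.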
Learn Hadley Wickham’s packages
These are the main packages for what’s now called the “tidyverse”, which has grown beyond Hadley. Also check out
- lubridate for handling dates
- stringr for handling strings
- forcats for handling factors
- readr for reading csv/tsv files
- readxl for reading Excel files
- broom for tidying statistical analysis objects
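A small taste of three of the packages in the list above, as a sketch that assumes lubridate, stringr and forcats are installed. The dates, strings and factor values are made up for illustration.

```r
library(lubridate)
library(stringr)
library(forcats)

# lubridate: parse dates written in different orders
d1 <- ymd("2001-03-04")
d2 <- mdy("03/04/2001")
same_date <- d1 == d2      # both are 4 March 2001
yr <- year(d1)

# stringr: consistent function names, with the string always the first argument
langs <- c("S", "S-plus", "R")
has_dash <- str_detect(langs, "-")
no_plus  <- str_replace(langs, "-plus", "")

# forcats: reorder factor levels by how often each one occurs
f <- factor(c("b", "a", "b", "c", "b", "c"))
by_freq <- levels(fct_infreq(f))
```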
Adopt the pipe operator
You’re old school, so you’re used to writing stuff like this:
x <- c(0.109, 0.359, 0.63, 0.996, 0.515, 0.142, 0.017, 0.829, 0.907)
round(exp(diff(log(x))), 1)
Seems perfectly fine, but note how it’s read from the inside out. With the pipe operator, you can do the same series of steps, written in the order that they’re actually performed.
library(magrittr)
x %>% log %>% diff %>% exp %>% round(1)
The pipe operator does some magic that makes the bit on the left be the first argument of the function call on the right.
If you need the bit on the left of the pipe to be somewhere other than the first argument, you can use a period. For example, here’s a wacky way to get the log (base 2) of 5.
2 %>% log(5, base=.)
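If you don't trust the magic, you can check both rules directly. This is a small sketch, assuming the magrittr package is installed; the variable names are just for illustration.

```r
library(magrittr)

x <- c(0.109, 0.359, 0.63, 0.996, 0.515, 0.142, 0.017, 0.829, 0.907)

# The nested call and the piped chain compute the same thing
nested <- round(exp(diff(log(x))), 1)
piped  <- x %>% log %>% diff %>% exp %>% round(1)
same_result <- identical(nested, piped)

# x %>% round(1) is treated as round(x, 1): the left side becomes the first argument
first_arg <- identical(x %>% round(1), round(x, 1))

# With a period, the left side goes where the period is instead
log2_of_5 <- 2 %>% log(5, base = .)
dot_arg <- isTRUE(all.equal(log2_of_5, log(5, base = 2)))
```

All three of same_result, first_arg and dot_arg should come out TRUE.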
Note: Jenny Bryan suggests that we use the parentheses on the functions even when they’re not formally required, like this:
library(magrittr)
x %>% log() %>% diff() %>% exp() %>% round(1)
If you’re still using the R GUI (for Windows or Mac), you should switch to RStudio. Everything about it is better.
But I use RStudio for teaching: for demonstrations, and I have the students use it; it’s the best environment for learning R.
CRAN is huge, and there’s also GitHub
And there are even more packages that live on GitHub (solely, or in
addition to CRAN), and with the install_github() function in the
devtools package, you can skip CRAN and install packages straight from
GitHub. devtools also has an install_bitbucket() function for
installing from Bitbucket.
I’d better mention Bioconductor; oodles of bioinformatics/genomics-related packages live there rather than CRAN.
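Installing from GitHub might look something like the sketch below. The helper install_from_github() and the repository name "someuser/somepackage" are hypothetical, made up for illustration; only install.packages() and devtools::install_github() are real functions.

```r
# Sketch only: a small wrapper that makes sure devtools is available first
install_from_github <- function(repo) {
  # devtools itself is installed from CRAN
  if (!requireNamespace("devtools", quietly = TRUE)) {
    install.packages("devtools")
  }
  # repo has the form "user/repository"
  devtools::install_github(repo)
}

# Not run here, since it needs network access and a real repository:
# install_from_github("someuser/somepackage")
```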
You can put underscores in names
It used to be that _ was a shortcut for <-. (That was always a bad
idea. And it led me to use dots in function names, which has been
problematic due to the S3 class system.)
Then they started allowing = in place of <-.
And then they got rid of _ as a shortcut for <-. Good idea, and
now we can have underscores in function names.
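Here is a small sketch of why dots in function names can clash with S3. The function names mean.squared and mean_squared and the class "squared" are invented for this example.

```r
# Meant as an ordinary helper: the mean of the squared values
mean.squared <- function(x, ...) mean(unclass(x)^2)

# Later, someone creates objects with class "squared"
y <- structure(c(1, 2, 3), class = "squared")

# mean() is an S3 generic, so it now dispatches to mean.squared(),
# because the name looks like the "mean" method for class "squared"
surprise <- mean(y)

# With an underscore there is no such ambiguity
mean_squared <- function(x) mean(unclass(x)^2)
intended <- mean_squared(c(1, 2, 3))
```

Here surprise is the mean of 1, 4 and 9 rather than the mean of 1, 2 and 3, which is probably not what the person calling mean(y) had in mind.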
Read about new features
Read about new features in R here.
New apply-type functions
You probably know about the basic apply-type functions, but
did you know about mapply?
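As a quick illustration of mapply, here is a sketch in base R with made-up inputs. The contrast with sapply is my own choice of comparison.

```r
# sapply() walks along a single vector
squares <- sapply(1:3, function(i) i^2)

# mapply() walks along several vectors in parallel:
# here it calls the function on (1, 4), then (2, 5), then (3, 6)
sums <- mapply(function(a, b) a + b, 1:3, 4:6)

# Arguments can be matched by name; SIMPLIFY = FALSE keeps the result as a list
reps <- mapply(rep, x = c("a", "b"), times = 2:1, SIMPLIFY = FALSE)
```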
Parallel and Rcpp
I searched through the
NEWS files (mentioned above) and
wrote down some of the functions that were new since 2002.
(Note that I have little experience with many of these, and some are
not entirely recommended. For example,
left_join in dplyr can be 10× faster than
merge. Ben Bolker recommends the
replicate, as you get to define the return structure.)
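For anyone who hasn't met replicate, here is a minimal sketch in base R. The simulation is made up, and the vapply version is just one way I know of to state the return structure explicitly; it is not necessarily the alternative recommended above.

```r
set.seed(2001)

# replicate() evaluates an expression repeatedly and simplifies the results
means <- replicate(1000, mean(rnorm(10)))

# vapply() does the same job, but you declare what each result must be:
# here, a single number per iteration
means_v <- vapply(seq_len(1000), function(i) mean(rnorm(10)), numeric(1))
```

With vapply, a result of the wrong type or length is an error rather than a silently differently shaped output.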
The source for this tutorial is on GitHub.
I would be glad for suggestions, corrections, or additions.