I was an early adopter of R, having first learned S (yay!) and then S-plus (yuck!). But at times my knowledge of R seems stuck in 2001. I keep finding out about “new” R functions (like replicate, which was new in 2003).

This is a tutorial for people like me, or people who were taught by people like me.

Switch to knitr

If you use Sweave, it’s time you switched to knitr. You’ll find that the transition is easy.

A number of Sweave annoyances have been eliminated, but most importantly you can use knitr with R Markdown or with AsciiDoc for writing simple reports. The markup is much simpler than LaTeX, and you don’t have to worry about page breaks.

Learn Hadley Wickham’s packages

Start with dplyr, tidyr, reshape2, and ggplot2.

Also devtools, roxygen2, and testthat.

Also lubridate and stringr.

Also, read his book, Advanced R.

Consider RStudio

If you’re still using the R GUI (for Windows or Mac), you should switch to RStudio. Everything about it is better.

Personally, I stick with Emacs + ESS, because I’m writing code in multiple languages (not just R). (Another IDE option for R that many recommend: Eclipse with StatET.)

But I use RStudio for teaching, both for my own demonstrations and for the students' work; it’s the best environment for learning R.

Note that RStudio makes it easy to use knitr with Markdown and to develop R packages. It also has some nice debugging features, like the ability to set breakpoints.

RStudio, the company, produces a number of other great tools, like shiny and ggvis.

CRAN is huge, and there’s also GitHub

CRAN has over 6000 packages, with lots of great stuff like data.table, magrittr, RSQLite, XML, rCharts, animation, and slidify.

And there are even more packages that live on GitHub (solely, or in addition to CRAN). With the install_github() function in the devtools package, you can skip CRAN and install packages straight from GitHub. devtools also has install_bitbucket() for installing from Bitbucket.

I’d better mention Bioconductor; oodles of bioinformatics/genomics-related packages live there rather than CRAN.

And while I’m talking packages, I should mention rOpenSci, an effort to create packages to access all kinds of data repositories from R. Take a look at their list of packages.

You can put underscores in names

It used to be that _ was a shortcut for <-. (That was always a bad idea. And it led me to use dots in function names, like calc.genoprob, which has been problematic due to the S3 class system.)

Then they started allowing = in place of <-.

And then they got rid of _ as a shortcut for <-. Good idea, and now we can have functions named like calc_genoprob.
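
Here is a small sketch of why dots in function names can clash with S3. The names `describe`, `describe.table`, and `describe_table` are made up for illustration; the point is only that S3 finds methods by the pattern generic.class, so a helper with a dot in its name can be picked up as a method without your intending it.

```r
# A hypothetical generic with a default method
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "generic description"

# Meant as a stand-alone helper, but the name matches generic.class
describe.table <- function(x, ...) "helper called as a method"

tab <- table(c("a", "b", "a"))    # an object of class "table"
dotted <- describe(tab)           # S3 dispatches to describe.table

# With an underscore there is no such ambiguity
rm(describe.table)
describe_table <- function(x, ...) "helper, called only by name"
plain <- describe(tab)            # now falls through to describe.default
```

With the dotted name, `describe(tab)` lands in the helper; with the underscore name, dispatch goes to the default method as intended.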

Read about new features

Read about new features in R in the NEWS file.

Also look at what was new in older versions and even older versions.

New apply-type functions

You probably know about apply, lapply, sapply, and tapply. But did you know about vapply and mapply? And how about replicate?
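
In case these three are unfamiliar, here is a minimal sketch using only base R. The data and the seed are arbitrary choices for illustration.

```r
x <- list(a = 1:3, b = 4:6)

# vapply: like sapply, but you declare the type and length of each result,
# so you get an error rather than a surprise if a result doesn't match
means <- vapply(x, mean, numeric(1))

# mapply: apply a function over several vectors in parallel, element by element
sums <- mapply(function(u, v) u + v, 1:3, 4:6)

# replicate: evaluate an expression repeatedly, handy for simulations
set.seed(2003)
sim <- replicate(100, mean(rnorm(10)))
```

Here `means` is a named numeric vector, `sums` has one value per pair of inputs, and `sim` holds 100 simulated sample means.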

Parallel and Rcpp

Look at the parallel package, and perhaps read the Parallel R book.
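
As a rough sketch of what the parallel package offers, the following spreads a slow computation over two worker processes. The function `slow_square` and the choice of two workers are assumptions made for the example.

```r
library(parallel)

# A stand-in for some slow computation (hypothetical)
slow_square <- function(i) {
  Sys.sleep(0.1)
  i^2
}

cl <- makeCluster(2)                       # start two worker processes
res <- parLapply(cl, 1:4, slow_square)     # like lapply, but split across workers
stopCluster(cl)                            # always shut the workers down

squares <- unlist(res)
```

On Linux and Mac, `mclapply` is a fork-based alternative that needs no explicit cluster; it does not run in parallel on Windows.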

Also look at Rcpp, a simpler way to call C/C++ functions from R. Read the Rcpp book.
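
To give a flavor of Rcpp, here is a minimal sketch using `cppFunction`, which compiles a C++ function from a string and makes it callable from R. It assumes Rcpp and a working C++ compiler are installed; the function `sum_squares` is made up for illustration.

```r
library(Rcpp)

# Compile a small C++ function and expose it to R under the same name
cppFunction('
double sum_squares(NumericVector x) {
  double total = 0.0;
  for (int i = 0; i < x.size(); i++) {
    total += x[i] * x[i];
  }
  return total;
}')

ss <- sum_squares(c(1, 2, 3))   # 1 + 4 + 9
```

For anything beyond a quick experiment, you would more likely put the C++ code in its own file and use `sourceCpp`, or in the `src` directory of a package.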

Various

I searched through the NEWS files (mentioned above) and wrote down some of the functions that were new since 2002.

(Note that I have little experience with many of these, and some are not entirely recommended. For example, rickyars noted that inner_join and left_join in dplyr can be 10× faster than merge. Ben Bolker recommends the plyr::r*ply functions over replicate, as you get to define the return structure.)

Vectorize

which.min, which.max

stopifnot

strwrap

unsplit

rowSums, colSums, rowMeans, colMeans

slice.index

runmed

addmargins

head, tail

arrayInd

droplevels

saveRDS, readRDS

paste0

anyNA

rowsum

aggregate

by, merge, with

stack, reshape, relist
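
Here is a quick sketch of a handful of the functions listed above, using base R only. The data are made up for illustration.

```r
x <- c(3, 9, NA, 4)
stopifnot(is.numeric(x))         # stop with an error unless all conditions are TRUE
has_na <- anyNA(x)               # faster than any(is.na(x))
i_max <- which.max(x)            # index of the largest value (NAs are skipped)
i_min <- which.min(x)

m <- matrix(1:6, nrow = 2)
rs <- rowSums(m)                 # much faster than apply(m, 1, sum)
cm <- colMeans(m)
pos <- arrayInd(5, dim(m))       # row and column of the 5th element of m

labels <- paste0("chr", 1:3)     # paste() with sep = ""
first3 <- head(letters, 3)
last2 <- tail(letters, 2)

f <- factor(c("a", "b", "c"))[1:2]
lev <- levels(droplevels(f))     # the unused level "c" is dropped

# saveRDS/readRDS: save a single object, and read it back under any name
tmp <- tempfile(fileext = ".rds")
saveRDS(m, tmp)
m2 <- readRDS(tmp)

# Vectorize: turn a scalar-only function into one that accepts vectors
pick_larger <- function(a, b) if (a > b) a else b
larger <- Vectorize(pick_larger)(1:3, c(2, 2, 2))
```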


The source for this tutorial is on GitHub.

I would be glad for suggestions, corrections, or additions.

Also see my git/GitHub guide, knitr in a knutshell tutorial, minimal make tutorial, and initial steps towards reproducible research.