I was at the useR! Conference at The University of Warwick in Coventry, UK, last week. My goal in going was to learn the latest things regarding (simple) dynamic graphics, (simple) web-based apps, parallel computing, and memory management (dealing with big data sets). I got just what I was hoping for and more. There are a lot of useful tools available that I want to adopt. I’ll summarize the high points below, with the particular areas of interest to me covered more exhaustively than just “highlights”.
I left feeling that my programming skills are crap. My biggest failing is in not making sufficient use of others' packages, but rather just building what I need from scratch (with great effort) and skipping dynamic graphics completely.
General
There were 440 participants from 41 countries (342 Europe; 60 North America).
Prof. Brian Ripley spoke about the R development process.
- There are now >3000 packages on CRAN, with 110 submissions per week (of which 80 are successful), basically all handled by Kurt Hornik.
- CRAN will throw out binaries of packages that are more than two years old.
- What’s within the base of R will shrink rather than grow.
- There have been a lot of improvements in the rendering of graphics.
- R is heavily dependent on a small number of altruistic developers, many of whom feel their contributions are not treated with respect.
- library() is to be replaced by use().
- There will soon be a parallel package for parallel computing.
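To give a feel for what the parallel package might make possible, here is a minimal sketch. It assumes the package exposes a snow-style interface (makeCluster(), parLapply(), stopCluster()); the toy function slow_square and the choice of two workers are my own hypothetical placeholders, not anything from the talk.

```r
library(parallel)

# Hypothetical toy workload: any function of one argument would do.
slow_square <- function(x) {
  Sys.sleep(0.01)  # stand-in for real work
  x^2
}

# Start two local worker processes (the count is an arbitrary choice).
cl <- makeCluster(2)

# Apply the function to each element, spreading the calls over the workers.
res_parallel <- parLapply(cl, 1:8, slow_square)

# Always shut the workers down when finished.
stopCluster(cl)

# The serial version should give exactly the same list.
res_serial <- lapply(1:8, slow_square)
identical(res_parallel, res_serial)
```

For a job this small, the cost of starting the workers will likely swamp any gain; the point is only the shape of the code.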
David Smith from Revolution Analytics, in his talk on the R Ecosystem, claimed that there are more than 2 million R users.
Barry Rowlingson gave a great lightning talk, “Why R-help must die!” He suggested the Q-and-A type sites Stack Overflow (on programming) and Cross Validated (on statistics), both part of Stack Exchange; they hope to create an R-specific site soon (currently R things are split between the two).
Tal Galili (founder of R-bloggers) gave a lightning talk about blogging and R. He emphasized that one need not write frequent posts. At the 2010 useR! Conference he gave a more comprehensive introduction to blogging available here.
Andreas Leha talked about Emacs org-mode for reproducible research: sort of like Sweave, but it handles a large number of languages (R, Python, Java, …) and can produce HTML as well as PDF.
Tobias Verbeke from OpenAnalytics talked about StatET, an Eclipse-based IDE for R (especially useful for debugging).
Patrick Burns talked about random input software testing and had a great analogy: if writing test suites is like digging ditches, random input testing is like digging in the sand (i.e., fun). (I do random input testing, but with my users providing the inputs.) [slides (with transcript)]
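To make the idea concrete, here is a minimal sketch of random input testing in base R. The function under test (my_range) and the properties I check are hypothetical examples of my own, not taken from Patrick's talk: generate many random inputs and check that things that must always be true really are.

```r
# Hypothetical function under test: returns c(min, max) of a numeric vector.
my_range <- function(x) c(min(x), max(x))

set.seed(1)  # keep the random inputs reproducible
n_trials <- 200
failures <- 0

for (i in seq_len(n_trials)) {
  # Generate a random input: random length, random scale, random values.
  x <- rnorm(sample(1:50, 1), sd = 10^runif(1, -3, 3))

  out <- my_range(x)

  # Properties that must hold for any valid input.
  ok <- length(out) == 2 &&
    out[1] <= out[2] &&
    all(x >= out[1]) && all(x <= out[2]) &&
    identical(out, range(x))  # compare against a trusted reference

  if (!ok) failures <- failures + 1
}

failures  # number of trials that violated a property
```

A real version would also throw in nasty inputs (NA, Inf, zero-length vectors) and record the offending input whenever a check fails.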
Olaf Mersmann gave a cool talk about microbenchmark to get accurate nanosecond timings of R expressions. Why do that? Because if you repeat something 1000 times in a for loop and then time it with system.time(), you are including the overhead of the for loop.
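A rough sketch of the two approaches, side by side. The expression being timed (sqrt(x)) and the number of repetitions are arbitrary choices of mine, and the second half runs only if the microbenchmark package is installed.

```r
x <- runif(100)
n_rep <- 1000

# Approach 1: repeat in a for loop and time the whole thing.
# The elapsed time includes the loop's own overhead.
loop_time <- system.time(
  for (i in seq_len(n_rep)) sqrt(x)
)
per_call_sec <- loop_time[["elapsed"]] / n_rep

# Approach 2: let microbenchmark time each evaluation separately.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  mb <- microbenchmark::microbenchmark(sqrt(x), times = n_rep)
  print(summary(mb))  # min, median, max, etc. for the timed expression
}
```

For an expression this quick, the system.time() figure may well come out as zero at the resolution of the clock, which is another reason to want a finer-grained timer.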
The next useR! Conference will be at Vanderbilt in Nashville, TN, June 12-15, 2012.
Graphics
Toby Dylan Hocking had a poster about the directlabels package for putting labels near curves or clusters of points in a plot rather than have a separate legend. I really like the technique and am eager to try out the package. For figures for a publication, one will probably want to edit things by hand, but for day-to-day, the package looks extremely useful.
Sina Rüeger presented uniPlot to reduce time to polish reports, by making base graphics, ggplot2, and lattice all use the same style (otherwise the reader may be distracted by the differences). [CRAN]
Alexander Kowarik presented sparkTable for creating html tables with very small graphs included.
Simon Urbanek mentioned several new features in R graphics:
- R will soon include dev.hold() and dev.flush() (written by Prof. Brian Ripley) so that you can tell a graphics device when you actually want to see a plot. This should improve graphics rendering.
- rasterImage() is way faster than image()
- osmap() to get OpenStreetMap maps in one line, with different zoom levels. [It’s in the snippets package.]
- polypath() to plot polygons with holes.
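Here is a small sketch of how three of these might fit together in base graphics. It assumes dev.hold() and dev.flush() are available in your version of R; the data and coordinates are made up for illustration, and I draw to a null PDF device so that it also runs non-interactively.

```r
# Draw off-screen; with file = NULL no file is created.
pdf(NULL)

plot(c(0, 10), c(0, 10), type = "n", xlab = "x", ylab = "y")

# Hold output until everything is drawn
# (has no effect on devices that don't support holding).
dev.hold()

# rasterImage(): draw a matrix of grey levels as a bitmap in one call.
m <- matrix(runif(100), nrow = 10)
img <- as.raster(m)            # values in [0, 1] become grey levels
rasterImage(img, 0, 0, 4, 4)   # xleft, ybottom, xright, ytop

# polypath(): two sub-paths separated by NA.
# With the even-odd rule, the inner square becomes a hole.
px <- c(5, 9, 9, 5, NA, 6, 8, 8, 6)
py <- c(5, 5, 9, 9, NA, 6, 6, 8, 8)
polypath(px, py, col = "grey", rule = "evenodd")

# Release the hold so the device shows the finished plot.
dev.flush()
invisible(dev.off())
```

On a screen device you would drop the pdf(NULL) and dev.off() lines; the hold/flush pair is what should keep you from watching the plot being built piece by piece.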
Dynamic graphics
Paul Murrell gave an especially cool talk on turning a PDF of a campus map of the University of Auckland into a dynamic graph where mousing over a number in the legend lights up the number at the corresponding building (and vice versa). The PDF was converted to PostScript, loaded into R via grImport and written to an SVG via gridSVG. There was a bit of JavaScript to write. [slides | page with the interactive maps (you may need Firefox for the maps)]
Adrian Bowman gave a talk on modeling 3d surfaces with some great dynamic graphics using RGL and rpanel. He had some cool animations in his PDF, too, developed with the animate package for LaTeX.
Simon Urbanek talked about iPlots eXtreme (currently: codename Acinonyx) which has fabulous and easy-to-create dynamic graphics. You basically just prefix the usual plot functions with an “i” (iplot, ihist). Super fast and can handle big data sets. It uses OpenGL (a solution developed by the gaming industry).
TIBCO Spotfire has some ways to develop interactive graphics tools, but it’s commercial and Windows only.
Ian Hansel talked about Rocessing, for combining Processing with R.
Richie Cotton talked about the use of gWidgets for easy interactive graphics. [data and code here; also see this blog post]
Adrian Waddell talked about RnavGraph for interactive graphics. He had some neat ideas about navigating among multiple scatterplots: a graph where nodes are images and where moving along an edge between two nodes involves morphing from one image to another. Moving from one scatterplot to another is like rotating a 3d scatterplot. [CRAN]
Simple web applications
Timothee Carayol gave a lightning talk about how to use RGoogleDocs and rApache for quick and easy deployment of a web interface. You set up a spreadsheet, which acts as a configuration file for rApache, so RGoogleDocs handles the inputs in place of what could be complex web programming. It sounds neat but I don’t fully understand it. But Timothee wrote to say that he would write a tutorial in the coming weeks. [slides]
Wolfgang Huber gave a cool talk on the analysis of images of cells from a large experiment in which each gene is knocked down, one at a time, by RNAi. They are creating interactive reports that are like Sweave but use HTML5 to give dynamic reports viewed in a browser. He gave examples using the arrayQualityMetrics package. One can identify points via tooltips; multiple plots are linked; click to select/highlight; collapse/expand sections. Callback processing is in JavaScript. It uses the gridSVG package (which makes it mostly easy).
Eleni-Anthippi Chatzimichali talked about iWebPlots for making dynamic, web-based scatterplots.
Comprehensive web applications and GUIs
E. James Harner spoke about Rc2 for collaborative use of R (including shared R sessions with voice chat), aimed to support distance learning. It seems really complicated and not easily adapted for others' use.
Naim Matasci had a poster about iPlant which has a fancy web front end with interactive analysis (using R). They have something like 10 developers working on it, but he said that the source code will be available and that it could be adapted for other purposes.
Xavier de Pedro Puente discussed the use of Tiki with R to make comprehensive web sites with wiki-like pages including R (converted to output or graphs on the fly) or web-based forms. It uses his PluginR package. It seems a great idea, but is likely too complicated for me. The key coding is with Smarty. [slides]
Jason Waddell and Tobias Verbeke from OpenAnalytics talked about the use of the R service bus [documentation] for lab automation.
David Nicolaides from Accelrys talked about browser-based applications using Pipeline Pilot.
Sheri Gilley from Revolution Analytics presented a GUI that they’re working on. It looks like it will be superb for the novice who wants a GUI. They’ll have a beta by the end of 2011, with the real release in 2012. Sheri spent 25 years doing UI design at SPSS (so I guess she was 10 or 15 when she started).
Companies
I was surprised by the large number of companies forming around R.
- Revolution Analytics: have code for handling large datasets and parallel computing and are developing a GUI.
- RStudio: aimed to be an IDE (supporting programmers) rather than a GUI. Upcoming features (including quick traversal of code across multiple files) look cool, but I’ll probably stay with emacs.
- OpenAnalytics: R service bus and StatET (IDE via Eclipse)
- TIBCO: purchased Insightful (who had bought Splus) in 2008.
- CloudNumbers: cloud-based computing, including the use of R.
Talks I wish I’d seen
Andrej Blejec talked about his animatoR package for creating animations in R.
Jonathan Rougier talked about nomograms (and donkeys).
Markus Gesmann talked about using the Google visualization API with R: the googleVis package. [detailed info]
Things (particularly packages) I need to try out
- Emacs org-mode
- directlabels package for automatically putting labels directly next to curves or clouds of points.
- hexbin package for dense scatterplots.
- animation for making animations in R
- grid and ggplot2 (I’m still just using base graphics)
- gridSVG for making complex web-based dynamic graphics
- sparkTable for making html tables with small figures inserted
- uniPlot for making base graphics, ggplot2, and lattice all use the same style. [CRAN]
- compareGroups for making complex tables with confidence intervals and p-values and such, like epidemiologists (and my collaborators) often want
- arrayQualityMetrics which creates fancy web-based dynamic reports
- osmap (contained in snippets) for making maps.
- animatoR for making animations