I’m at UCLA for the UseR Conference. I attended once before, and I really enjoyed it. And I’m really enjoying this one. I’m learning a ton, and I find the talks very inspiring.
In my comments below, I give short shrift to some speakers (largely by not having attended their talks), and I’m critical in some places about the conference organization. Having co-organized a small conference last year, I appreciate the difficulties. I think the organizers of this meeting have done a great job, but there are some ways in which it might have been better (e.g., no tiny rooms, a better time slot for the posters, and more space for the posters).
Day 1: Tutorials
The first day was devoted to tutorials. I arrived later than intended, missed the morning tutorials, and was just in time for the afternoon ones. But the tutorial I wanted to attend, Ramnath Vaidyanathan’s tutorial on interactive documents, was in a tiny room that was already filled beyond capacity. And by the time I tried to switch, the others were well underway, and I’d lost my momentum. Plus I was distracted by ice cream. And seeing Yihui, Karthik, Hilary, and Sandy.
Materials for many of the tutorials are online; grab them while you can.
The evening reception was interesting. I listened in to Joe Cheng talking to Tal Galili about the value of dplyr and magrittr’s pipe operator (%>%) for data manipulation in R. Joe was quite persuasive: replace a nested series of function calls, to be read from the inside out, with a stream of pipes.
I was excited to meet Scott Chamberlain and to see Jenny Bryan and Vince Vu, and of course Hadley.
The opening talk was by John Chambers. I’d not heard him speak before. (Back in 2005, I think, at JSM in Minneapolis, I tried to attend a session with John Chambers, Robert Gentleman, and Duncan Temple Lang, on the future of statistical computing, but it was assigned a room for 30 people, and by the time I got there there was a mass of more than 30 people outside the room.)
John talked about the origin of S, including a handwritten “viewgraph” from the first discussion of the idea in 1976. His basic point was that R was conceived not as a language but as an interface to algorithms: to make it easier to use Fortran-based routines for things like linear regression. And R continues to have much value in that regard: user-friendly interfaces for statistical calculations.
He gave a shout out to Rcpp, Rllvm, and h2o.
The first contributed session was on interactive graphics from R.
Winston Chang introduced ggvis, which is an exciting effort at RStudio to make interactive graphics readily accessible from R. It uses a ggplot2-style syntax, but with the magrittr pipe operator, and using Vega as the underlying engine.
There were additional talks about plotly and iwPlot.
plotly is an effort to make D3-based graphs accessible from R (and python and matlab), as well as editable and collaborative. But I don’t like plotly’s Terms of service, nor that it’s closed source. And the presentation was all demonstration with little exposition and so a bit hard to follow.
I skipped a couple of important talks in here: on RCloud and OpenCPU. I needed a walk and an espresso.
And then Karthik Ram gave a great talk about rOpenSci, to foster open science: a large set of packages providing access to a variety of data sources.
Martin Machler gave a superb talk on “Good practices for R programming,” including seven principles:
Work with source files (rather than just typing in the console)
Keep R source readable and maintainable
Read the documentation
Learn from the masters (read others’ code)
Do not copy and paste (write functions)
Strive for clarity and simplicity
Test your code
He made a number of further points:
Leave optimization for later; clear code is more important than optimized code, and if you do optimize, measure rather than guess about speed.
Use version control
R CMD BATCH should work on your R script.
Don’t rely on .RData; your code should do all of the work. But use save for big computations and then attach. (Later he clarified, don’t use attach with data frames, but do use it with the files written by save.)
Make use of the log argument to density functions, to avoid underflow, and use functions like log1p, for accuracy.
Use drop=FALSE when subsetting a matrix, so that the outcome remains a matrix.
Don’t use x == NA.
Don’t use 1:n when n is not a positive integer.
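A few of these points are easy to check at the console. The sketch below is a minimal base R illustration, not code from the talk: the specific values (x = 40, 1e-20, the 2 x 3 matrix, n = 0) are made up for the example, and the last case is the usual illustration of the positive-integer caveat.

```r
# 1. Density functions: the log argument avoids underflow far out in the tail.
x <- 40
naive_log <- log(dnorm(x))         # dnorm(40) underflows to 0, so this is -Inf
safe_log  <- dnorm(x, log = TRUE)  # computed on the log scale, stays finite

# 2. log1p(e) computes log(1 + e) accurately for tiny e.
eps <- 1e-20
naive_small <- log(1 + eps)  # 1 + eps rounds to 1, so the result is 0
safe_small  <- log1p(eps)    # keeps the information carried by eps

# 3. drop = FALSE keeps a one-row subset as a matrix.
m <- matrix(1:6, nrow = 2)
row_vec <- m[1, ]                # dimensions are dropped: a plain vector
row_mat <- m[1, , drop = FALSE]  # still a 1 x 3 matrix

# 4. Comparing with NA yields NA; is.na() is the test for missingness.
v <- c(1, NA, 3)
wrong_na <- v == NA   # all NA, regardless of the data
right_na <- is.na(v)  # FALSE TRUE FALSE

# 5. 1:n counts down when n is 0; seq_len(n) gives an empty sequence.
n <- 0
bad_index  <- 1:n         # c(1, 0), so a loop over it runs twice
good_index <- seq_len(n)  # integer(0), so a loop over it never runs
```

In each pair the first line is the tempting version and the second is the safer one; none of them raises an error, which is part of why they are easy to miss.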
Hadley Wickham gave a persuasive and instructive talk on dplyr and magrittr’s pipe operator, for data manipulation tasks in R. Joe Cheng had largely convinced me the night before, but now I’m thoroughly convinced of the value of these tools. The bottlenecks in analysis tasks are thinking, coding, and doing. For data manipulation, dplyr helps with the thinking part, by having you focus on a set of basic operations: filter, select, arrange, mutate, summarize, and “group by,” and then also the left join, right join, semi-join, and anti-join. Magrittr’s pipe operator (%>%) helps with the coding: you create a stream of pipes rather than a nasty mess of nested function calls that need to be read from the inside out. And Rcpp helps with the doing: dplyr is super-fast (though not quite as fast as data.table). Further, dplyr takes advantage of R’s “non-standard evaluation” in a way that makes it easy to connect to external data sources, like databases, as if they were regular objects.
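The contrast between nested calls and a stream of pipes can be sketched with R’s built-in mtcars data, assuming the dplyr package is installed. The data set, the hp > 100 cutoff, and the column names mean_mpg and n_cars are choices made for illustration, not taken from the talk.

```r
library(dplyr)  # also makes magrittr's %>% available

# Nested form: to follow it, start at filter() in the middle and work outward.
nested <- arrange(
  summarize(
    group_by(filter(mtcars, hp > 100), cyl),
    mean_mpg = mean(mpg),
    n_cars = n()
  ),
  desc(mean_mpg)
)

# Piped form: the same steps, written in the order they happen.
piped <- mtcars %>%
  filter(hp > 100) %>%                                 # keep the more powerful cars
  group_by(cyl) %>%                                    # one group per cylinder count
  summarize(mean_mpg = mean(mpg), n_cars = n()) %>%    # one row per group
  arrange(desc(mean_mpg))                              # best mileage first

print(piped)
```

Both forms produce the same table; the difference is only in how easily a reader can follow the sequence of operations.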
I skipped the rest of the session, though I should have at least stayed for Matt Dowle’s presentation on data.table. (I understand that he could have an alternate career as a leader of group meditations.) I’ve just found that I have to limit myself. But a tweet by Tim Triche had a photo of a slide that showed the speed advantage of data.table’s fread function over R’s read.csv. I need to adopt fread.
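A rough way to try the comparison yourself, assuming the data.table package is installed. The file here is a hypothetical temporary CSV generated on the spot, and timings vary by machine and by file, so no particular speedup is claimed.

```r
library(data.table)

# Hypothetical test data: a temporary CSV written just for this comparison.
path <- tempfile(fileext = ".csv")
n <- 1e5
write.csv(
  data.frame(
    id    = seq_len(n),
    group = rep(c("a", "b"), length.out = n),
    value = seq_len(n) %% 7L
  ),
  path,
  row.names = FALSE
)

# Read the same file both ways and record the elapsed time of each.
t_base  <- system.time(df <- read.csv(path, stringsAsFactors = FALSE))
t_fread <- system.time(dt <- fread(path))

print(c(read.csv = t_base[["elapsed"]], fread = t_fread[["elapsed"]]))

unlink(path)  # remove the temporary file
```

Note that fread returns a data.table rather than a plain data frame, which matters if the rest of your code assumes data frame subsetting behavior.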
The poster sessions were a tragedy. Seriously.
First, it was immediately after a long session (well, for those who stuck it through), and at the slot where one would want to eat dinner. (Martin’s talk was 3-4, the 3rd contributed session was 4-5:30, the posters were 5:30-7, and two panel sessions/mixers were 6:30-9.)
Second, the posters were arranged along a narrow hallway, and not on the walls but on stands which stuck out into the hallway. There really was no room for people to view the things.
It was too frustrating. I left to eat dinner.
It was great that the meeting included a well-attended panel discussion and mixer to discuss how to encourage more women R users. But I was a bit disappointed with the event.
We can all agree that women are underrepresented as contributors of packages to CRAN and as participants in R conferences. But why? And how to change this? I didn’t gain much insight.
I think I just don’t like panel discussions. The five women on the panel, and the moderator, are admirable and their views interesting. But they don’t have a unique understanding of the problem or solutions. I think I would have preferred a more open discussion, to hear much more from all of the other women in the room.
I felt the same way about the panel discussion at the meeting I co-organized in 2013: I would have preferred to have heard much more from the general audience and less from the panel.
The most shocking revelation concerned the unnecessarily extensive effort that Amelia McNamara had to expend in order to include a harassment policy on the meeting web site. The value of such a policy should be obvious from the history of bad behavior at scientific and technical conferences.
I would have been happier had the discussion continued for another hour. The audience was just getting going when the event ended.
My personal opinion is that there are a lot more women R users out there, but they aren’t contributing R packages because it has long been a very cumbersome process. (But see Hilary Parker’s tutorial.) And they don’t necessarily identify as “R users” but rather as more general applied statisticians. And they’ve not yet learned of the great value in attending UseR conferences, for learning and networking.
UseR conferences are fabulous: for inspiration, learning, and networking. They’re not for gurus, but for normal people who want to share ideas and learn about new and exciting developments. (The next UseR meeting will be June 30-July 3, 2015, in Aalborg, Denmark.)
The smallest meeting room shouldn’t be very small.
Poster sessions need a proper time slot, and they need a lot of space.
I don’t much like panel discussions but rather prefer more open discussions.
I need to adopt dplyr and magrittr immediately. Also data.table’s fread.
I need to learn shiny, ggvis, rCharts, and some of the rOpenSci packages (and ggplot2 for that matter; I still just use base graphics), at least so that I can get students using them.