Terry Speed recently gave a talk on the role of statisticians in “Big Data” initiatives (see the video or just look at the slides). He points to the long history of statisticians’ discussions of massive data sets (e.g., the Proceedings of a 1998 NRC workshop on Massive data sets) and notes that this history is being ignored in the current Big Data hype, and that statisticians, generally, are being ignored along with it.
I was thinking of writing a polemic on the need for reform of academic statistics and biostatistics, but in reading back over Simply Statistics posts, I’ve decided that Rafael Irizarry and Jeff Leek have already said what I wanted to say, and so I think I’ll just summarize their points.
Following the RSS Future of the Statistical Sciences Workshop, Rafael was quite optimistic about the prospects for academic statistics, as he noted considerable consensus on the following points:
- We need to engage in real, present-day problems
- Computing should be a big part of our PhD curriculum
- We need to deliver solutions
- We need to improve our communication skills
Jeff said, “Data science only poses a threat to (bio)statistics if we don’t adapt,” and made the following series of proposals:
- Remove some theoretical requirements and add computing requirements to statistics curricula.
- Focus on statistical writing, presentation, and communication as a main part of the curriculum.
- Focus on positive interactions with collaborators (being a scientist) rather than immediately adopting the referee attitude.
- Add a unit on translating scientific problems to statistical problems.
- Add a unit on data munging and getting data from databases.
- Integrate real and live data analyses into our curricula.
- Make all our students create an R package (a data product) before they graduate.
- Most important of all, have a “big tent” attitude about what constitutes statistics.
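To give a concrete sense of scale for that “data product” proposal: a minimal R package requires little more than a DESCRIPTION metadata file and an R/ directory of functions, so it is a realistic graduation requirement rather than a heavyweight one. A hypothetical minimal DESCRIPTION, with the package name and all field values as placeholders:

```
Package: mypkg
Title: Tools for Analyzing an Example Dataset
Version: 0.1.0
Authors@R: person("Jane", "Doe", email = "jane@example.org",
    role = c("aut", "cre"))
Description: A small collection of functions for cleaning and
    summarizing an example dataset from a course project.
License: MIT + file LICENSE
Depends: R (>= 3.0.0)
```

With this file plus at least one function in R/, the standard `R CMD check` and `R CMD INSTALL` tools will build a working, installable package.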
I agree strongly with what they’ve written. To make it happen, we ultimately need to reform our values.
Currently, we (as a field) appear satisfied with
- Papers that report new methods with no usable software
- Applications that focus on toy problems
- Talks that skip the details of the scientific context of a problem
- Data visualizations that are both ugly and ineffective
Further, we tend to get more excited about the fanciness of a method than its usefulness.
We should value
- Usefulness above fanciness
- Tool building (e.g., usable software)
- Data visualization
- In-depth knowledge of the scientific context of a problem
In evaluating (bio)statistics faculty, we should consider not just the number of JASA or Biometrics papers they’ve published, but also whether they’ve made themselves useful, both to the scientific community and to other statisticians.